As the U.S. Food and Drug Administration considers opening the door to a wider array of artificially intelligent medical devices, new research suggests that medical AI systems can perpetuate racial bias in health care in ways humans don’t understand and can’t detect.
An international group of doctors and computer scientists recently announced that AI systems trained to analyze X-rays, CT scans, mammograms, and other medical images were able to predict a patient’s self-reported race with a high degree of accuracy based on the images alone. The systems made accurate race predictions even when the images they were analyzing were degraded to the point that anatomical features were indistinguishable to the human eye.
Most concerningly, according to the paper’s authors, the team was unable to explain how the AI systems were making their accurate predictions.
“That means that we would not be able to mitigate the bias,” Dr. Judy Gichoya, a co-author of the study and radiologist at Emory University, told Motherboard. “Our main message is that it’s not just the ability to identify self-reported race, it’s the ability of AI to identify self-reported race from very, very trivial features. If these models are starting to learn these properties, then whatever we do in terms of systemic racism ... will naturally populate to the algorithm.”
In recent years, other research has exposed racial bias in medical AI algorithms, but in those cases the cause of the bias could be explained. For example, one high-profile study found that a health care algorithm was underestimating how sick Black patients were because its predictions were based on historical cost of care data. Hospitals, in general, spend less on treating Black patients, so lower past costs were read as lower medical need.
Gichoya and the team went to great lengths to find a similar explanation for their findings. They examined whether the race predictions were influenced by biological differences such as denser breast tissue. They investigated the images themselves to see whether the AI models were picking up on differences in quality or resolution to make their predictions, perhaps because images of Black patients came from lower-quality machines. None of their experiments explained the phenomenon.
The findings “really complicate” the prevailing thinking about how to mitigate bias in diagnostic AI systems, Kadija Ferryman, a New York University anthropologist who studies ethics and health care technologies, told Motherboard. Many AI developers in the field hope that they can build systems that don’t perpetuate medical racism by removing data that can be used as a proxy for race (such as zip code) and training their systems on datasets that are actually representative of the population.
“What this article is saying is that might not be good enough,” Ferryman said. “These initial solutions that we’re thinking might be sufficient actually might not be.”
The research comes at a critical time for AI in medicine. In January, the FDA announced that it’s considering changing the way it regulates medical device software in a way that would open the door for more advanced and complicated algorithms.
Currently, the FDA only approves fixed AI medical devices—those that are trained on a specific set of data and don’t change or improve as they process more data. But the agency will likely soon begin allowing non-fixed algorithms that learn and evolve as they work.
Even algorithms that have already been approved, though, often have not been tested to see whether they perform differently for people of different races, and they aren’t monitored on an ongoing basis.
Dr. Bibb Allen, chief medical officer for the American College of Radiology’s Data Science Institute, told Motherboard the new research from Gichoya and the team is another warning that the FDA should require medical AI to undergo bias testing and regular monitoring.
“The concept of autonomous AI is really concerning because how do you know if it breaks if you’re not watching it?” Allen said. “Developers are going to have to pay closer attention to this kind of information.”
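For readers wondering what the bias testing Allen calls for might look like in practice, here is a minimal sketch in Python. It is not the method used in the study or anything the FDA has specified. The function names, the group labels "A" and "B", and the toy records are all hypothetical, chosen only to illustrate the idea: compute how often a diagnostic model catches true cases of disease (its sensitivity) separately for each self-reported group, then compare the results.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return the share of true disease cases the model caught, per group.

    `records` is an iterable of (group, has_disease, model_flagged) tuples.
    This layout is an assumption made for illustration only.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, has_disease, model_flagged in records:
        if has_disease:
            positives[group] += 1
            if model_flagged:
                true_positives[group] += 1
    # Only groups with at least one true case appear, so no division by zero.
    return {g: true_positives[g] / positives[g] for g in positives}


def largest_gap(rates):
    """Difference between the best-served and worst-served group."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up data: (group, has_disease, model_flagged)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", False, False),
]

rates = sensitivity_by_group(records)
gap = largest_gap(rates)
print(rates, gap)
```

In this made-up example the model catches two of three true cases in group A but only one of three in group B. Rerunning a check like this on fresh patient data at regular intervals is one simple form the ongoing monitoring described above could take.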