Data sets that train artificial intelligence and machine learning technology may not be representative of the population as a whole, studies have revealed.
Data that is too narrow in focus can sharpen racial disparities, leading to worse outcomes for marginalized populations. This is especially true when caregivers are unaware of the biases such technology can introduce.
Dr. Art Papier, CEO of vendor VisualDx and associate professor of dermatology at the University of Rochester in New York, believes it’s time for a change. He is concerned about the dangers of over-reliance on AI, and believes there should be a shift from “artificial” to “augmented” intelligence.
In this Healthcare IT News interview, he draws from his technical and clinical experiences to explain how to train models on more diverse image and data sets, and why he believes this strategy is key for providing clinicians with reliable and equitable resources that augment decision making, overcome knowledge gaps, and promote greater health equity and outcomes.
Q. Studies reveal that data sets used to train artificial intelligence and machine learning often lack representative data. How does this happen?
A. Training sets for artificial intelligence and machine learning technology are often determined by the demographic of the geographically based health system. Health systems and hospitals are seeing patients in particular parts of the United States or world, and these unique populations can differ vastly from one location to the next.
For example, health systems operating in areas without large communities of patients of color may lack representative data to adequately train models to treat such populations. In this type of scenario, training sets are skewed toward the overall distribution of the demographics within that particular health system’s patient population.
This can be problematic if the training sets are tailored to one portion of the population over another, which leads to knowledge gaps and biases that may creep into the system or impair clinical decision making.
Developers of machine learning models should strive to access training sets from diverse organizations. To avoid limited training sets, the collection and identification process needs to be thorough and purposeful to ensure training sets are diverse and representative of the entire patient population being treated.
It is important to compile training sets from multiple geographic locations to represent all populations and skin types. Machine learning developers might consider partnering with global organizations to ensure these models are equitable and result in fair, consistent outcomes, regardless of skin color.
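One way to act on this advice is to audit a training set's composition before building a model. The sketch below is a minimal, hypothetical example (the `skin_type` field and the Fitzpatrick-style group labels are assumptions, not from any actual VisualDx data set): it tallies the share of each skin-type group and flags any group that falls below a chosen minimum share.

```python
from collections import Counter

def audit_skin_type_coverage(records, min_share=0.10):
    """Report each skin-type group's share of a training set and flag
    groups falling below min_share as under-represented."""
    counts = Counter(r["skin_type"] for r in records)
    total = sum(counts.values())
    report = {}
    for skin_type, n in sorted(counts.items()):
        share = n / total
        report[skin_type] = (share, share < min_share)
    return report

# Toy data set skewed toward lighter skin types (Fitzpatrick-style labels).
records = (
    [{"skin_type": "I-II"}] * 70
    + [{"skin_type": "III-IV"}] * 25
    + [{"skin_type": "V-VI"}] * 5
)
for skin_type, (share, flagged) in audit_skin_type_coverage(records).items():
    print(f"{skin_type}: {share:.0%}{'  <- under-represented' if flagged else ''}")
```

In this toy run the V-VI group sits at 5% of the data and is flagged, which is exactly the kind of skew a single-region health system's data might show.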
Q. Relying on tools built on narrow data can exacerbate racial disparities and lead to worse outcomes for marginalized populations, especially if providers are not aware of the common biases and pitfalls stemming from this type of technology. What are steps healthcare provider organizations can take to avoid this happening?
A. Healthcare professionals using machine learning models developed on limited populations will inevitably be exposed to biases. Health systems implementing these technologies must understand the patient base the models were developed from, and the criteria of the test sets used to measure the models' accuracy.
Healthcare professionals have numerous and rigorous ways to evaluate the accuracy of these models, such as developing a test set that represents different populations. In the case of dermatology, experts should make sure that the imagery training the models includes a robust collection of images on light, dark and mid-tone skin types.
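The evaluation idea above can be sketched in a few lines. This is an illustrative example, not any vendor's actual method; the record fields (`skin_type`, `predicted`, `actual`) and the diagnoses are hypothetical. The point is that accuracy computed separately per group surfaces disparities that a single overall accuracy number would hide.

```python
def accuracy_by_group(examples):
    """Accuracy computed separately for each skin-type group, so a model
    that does well overall but poorly on one group is not masked."""
    totals, correct = {}, {}
    for ex in examples:
        g = ex["skin_type"]
        totals[g] = totals.get(g, 0) + 1
        if ex["predicted"] == ex["actual"]:
            correct[g] = correct.get(g, 0) + 1
    return {g: correct.get(g, 0) / totals[g] for g in totals}

# Toy test set: overall accuracy is 75%, but the per-group breakdown
# shows the model is much weaker on darker skin.
test_set = [
    {"skin_type": "light", "predicted": "cellulitis", "actual": "cellulitis"},
    {"skin_type": "light", "predicted": "eczema", "actual": "eczema"},
    {"skin_type": "dark", "predicted": "eczema", "actual": "cellulitis"},
    {"skin_type": "dark", "predicted": "cellulitis", "actual": "cellulitis"},
]
print(accuracy_by_group(test_set))  # {'light': 1.0, 'dark': 0.5}
```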
Additionally, the images training the models must be accurately labeled and tagged to define the pictures. Accurate assignments of metadata tags are paramount to depicting and differentiating possible diagnoses of infectious disease and other complex conditions presenting on the skin, especially on different skin types.
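A lightweight way to enforce the labeling discipline described above is to validate each image's metadata record before it enters a training set. The schema below is a hypothetical sketch (the field names and the use of Fitzpatrick skin-type codes are my assumptions, not a real product's format), showing how missing or malformed tags can be caught early.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for skin type (Fitzpatrick scale).
VALID_SKIN_TYPES = {"I", "II", "III", "IV", "V", "VI"}

@dataclass
class ImageTag:
    """Metadata for one training image; all fields are illustrative."""
    image_id: str
    diagnosis: str
    fitzpatrick_type: str
    body_site: str

    def validate(self):
        """Return a list of labeling problems; empty means the tag is usable."""
        errors = []
        if not self.diagnosis:
            errors.append("missing diagnosis label")
        if self.fitzpatrick_type not in VALID_SKIN_TYPES:
            errors.append(f"unknown skin type {self.fitzpatrick_type!r}")
        return errors

good = ImageTag("img-001", "cellulitis", "V", "forearm")
bad = ImageTag("img-002", "", "VII", "forearm")
print(good.validate())  # []
print(bad.validate())   # two problems reported
```

Requiring a clean validation pass for every image keeps the metadata consistent enough to train and evaluate models across skin types.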
Q. You believe there are dangers of over-reliance on AI that healthcare professionals and technologists must be wary of, and that the perception should shift from “artificial” to “augmented” intelligence. Please explain.
A. It is dangerous for healthcare professionals to treat software as 100% accurate, because no software is infallible. Diagnosis and clinical reasoning are fraught with ambiguity and complexity, and because of this grayness, we are far from having a single artificial intelligence system that can accurately diagnose conditions across all medical specialties.
In the future, narrow domains will have specific tools created for them. For example, we now have FDA-approved algorithms for diabetic retinopathy, with the software determining whether or not an image shows the disease. I expect this type of application will expand over time to cover other retinal diseases.
As these tools are created, it is unrealistic to think one will be introduced and can make accurate diagnoses across all of medicine. The industry must ensure that medical professionals know which narrow domains are covered within this technology, and to what degree of accuracy.
The real danger is when healthcare professionals over-rely on the technology and assume its capabilities are fully precise. Clinicians need to be prepared for technology in the exam room as more of a means to augment their intelligence and not replace it.
As medical professionals, we have these incredibly impactful tools at our disposal to become even more precise, but ultimately, the human is the deciding factor. We must enhance medical education so healthcare professionals are trained in how to use these tools and understand their power as well as their weaknesses.
Q. Based on your technical and clinical experiences, please explain how to train models on more diverse image and data sets.
A. In training imagery, we have different image types such as radiologic, ophthalmologic, endoscopic, as well as images of the skin and mucosa, which are my areas of expertise. Machine learning models that are used to detect skin conditions need high-quality images of the skin and correct tags on the imagery.
A challenge with collecting diverse image sets for training is the wide range of pigmentation in skin. Recognizing the signs of a disease can be harder across skin types: increased blood flow to the skin presents as redness in light skin but as a darker brown in more deeply pigmented skin.
I have seen deeply pigmented patients who have very faint clues of important infectious diseases or drug reactions, clues that are subtle to the naked eye and often missed by generalists. We need to make sure these patients are not misdiagnosed. This is a challenging, ever-evolving area of clinical practice that will continuously need to be improved.
Another key and understated element is the quality of the imagery and how the images themselves were photographed or captured. Images need to be taken with care and properly illuminated so you can see the subtle redness or differences in complex conditions across all skin types. There are real photography, optical and perceptual issues one has to consider.
Capturing these images also means providing the equipment, education and training essential to taking quality photos and gathering effective data. In telemedicine and tele-dermatology, we often have to contend with poorly taken pictures, which shows a clear need to improve image quality and data sets for effective machine learning.
Email the writer: firstname.lastname@example.org
Healthcare IT News is a HIMSS Media publication.