Skin condition image databases are notoriously biased toward lighter skin. Rather than waiting for the slow process of collecting more photos of conditions such as cancer or inflammation on darker skin, one group intends to use artificial intelligence to fill in the gaps. The group is developing AI software that generates synthetic images of diseases on darker skin, and it plans to use those images in a tool to help diagnose skin cancer.
“Having real photographs of darker skin is the ultimate solution,” says Eman Rezk, a machine learning expert working on the project at McMaster University in Canada. “We need to find a means to close the gap until we obtain that data.”
However, other specialists in the field are concerned that using synthetic images may create new problems. According to Roxana Daneshjou, a clinical scholar in dermatology at Stanford University, the emphasis should be on adding more varied real photographs to existing databases. “Creating synthetic data appears to be a less difficult path than completing the hard work of creating a diverse data set,” she says.
AI is being used in dermatology in a variety of ways. Researchers are developing systems that analyze photos of rashes and moles to determine the most likely type of condition. Dermatologists can then use the results to aid in diagnosis. However, most tools are built on image databases that either include few examples of conditions on darker skin or lack good information about the skin tones they contain. As a result, it is difficult for organizations to be certain that a tool will be as accurate on darker skin as it is on lighter skin.
That’s why Rezk and the team turned to synthetic images. The project is divided into four major parts. The team has already evaluated existing image sets to determine how underrepresented darker skin tones were in the first place. It has also created an AI algorithm that uses photographs of skin conditions on lighter skin to generate images of those conditions on darker skin, and it has validated the images the model produced. “We were able to build high-quality synthetic photos with varying skin tones from the available white scan data thanks to breakthroughs in AI and deep learning,” Rezk explains.
Next, the researchers will combine the synthetic photos of darker skin with real photographs of lighter skin to develop skin cancer detection software. According to Rezk, the system will continually search image databases for new, real images of skin conditions on darker skin that can be added to a future version of the model.
The team isn’t the first to develop synthetic skin photos. In 2019, a group led by Google Health researchers published a paper describing a method for generating them, including images with varied skin tones. (Google is interested in dermatological AI and unveiled a tool last spring that can identify skin conditions.)
According to Rezk, synthetic photos are a stopgap measure until more real images of conditions on darker skin become available. Daneshjou, on the other hand, is concerned about using synthetic images at all, even as a temporary solution. Researchers would have to carefully examine AI-generated photographs for subtle quirks that people would not be able to detect with the naked eye. Quirks of that kind could, in theory, skew the results of an AI program. The only way to ensure that synthetic images perform as well as real images in a model is to compare them with real photographs, which are in short supply. “Then it gets back to, well, why not just work on getting more real images?” she adds.
Daneshjou is also concerned about a diagnostic model that is based on synthetic images for one group and real photos for another, even if only temporarily. Such a model could perform differently on different skin tones.
She also worries that relying on synthetic data may make people less motivated to push for real, diverse images. “Are you going to keep performing the work if you’re going to do that?” she says. “Instead of trying to perform this workaround, I’d prefer to see more people work on acquiring real, diverse data.”