A pixelated image of Barack Obama upsampled into the image of a white man has sparked another discussion on racial bias in artificial intelligence and machine learning.

Posted to Twitter last weekend, the image was generated with an artificial intelligence tool called Face Depixelizer, which takes a low-resolution image as input and creates a corresponding high-resolution image using generative machine learning models. The online tool uses an algorithm called PULSE, originally published by a group of undergraduate students at Duke University.
In the example posted online, a pixelated yet recognizable image of Obama is put through Face Depixelizer, producing a high-resolution image of a white man's face. Robert Osazuwa Ness, a research engineer and the creator of altdeep.ai, ran PULSE locally on his computer and encountered the same issue discussed online: the algorithm output the face of a white person when given a down-sampled image of a person of color. Pixelated images of Alexandria Ocasio-Cortez and Lucy Liu also produced images of white women when processed with PULSE.

The results illustrate the known problem of racial bias in AI, showing how algorithms can perpetuate the biases of their creators and of the data they work with, in this case by turning people of color white.
Denis Malimonov, the programmer behind Face Depixelizer, told Motherboard in an email that the tool is not meant to recover the original photo from a low-resolution image, but rather to create a new, imagined one through artificial intelligence.

“There is a lot of information about a real photo in one pixel of a low-quality image, but it cannot be restored,” Malimonov said. “This neural network is only trying to guess how a person should look.”

Still, as the side-by-side images spread online, along with other instances in which Face Depixelizer misread the faces of people of color, people took the opportunity to address the racial bias often found in artificial intelligence and machine learning.
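Malimonov's point that the lost information "cannot be restored" comes down to the fact that shrinking an image is many-to-one: different high-resolution images can collapse to exactly the same low-resolution one. The sketch below illustrates this under simple assumptions of our own: tiny 4x4 grayscale "photos" with made-up pixel values, shrunk by averaging each 2x2 block. This is not necessarily the downscaling Face Depixelizer or PULSE uses.

```python
def downsample(img, factor):
    """Shrink a grayscale image (a list of rows) by averaging each factor x factor block."""
    out = []
    for top in range(0, len(img), factor):
        row = []
        for left in range(0, len(img[0]), factor):
            block = [img[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Two different 4x4 "photos" with made-up pixel values (0-255).
photo_a = [
    [0, 255, 10, 20],
    [255, 0, 30, 40],
    [100, 100, 50, 50],
    [100, 100, 50, 50],
]
photo_b = [
    [128, 127, 25, 25],
    [127, 128, 25, 25],
    [0, 200, 0, 100],
    [200, 0, 100, 0],
]

low_a = downsample(photo_a, 2)
low_b = downsample(photo_b, 2)
print(low_a)               # [[127.5, 25.0], [100.0, 50.0]]
print(low_a == low_b)      # True: the low-resolution versions are identical
print(photo_a == photo_b)  # False: the originals are not
```

Given only the 2x2 result, nothing distinguishes `photo_a` from `photo_b`, so any upscaler has to pick one, or invent a third, based on what it has learned to expect.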
In an email to Motherboard, the authors of PULSE explained that the algorithm itself was never trained on a dataset. Instead, it uses a model called StyleGAN, which was trained on a dataset of faces taken from Flickr. PULSE then searches through possible StyleGAN outputs for realistic faces of imaginary people that correspond to each low-resolution input image.

“This dataset is more diverse than CelebA (a common dataset used in computer vision, comprised of nearly 90% white faces) but still has its own biases, which StyleGAN inherits and are well documented,” they said.

As a result, PULSE produces white faces much more frequently than faces of people of color, which the authors acknowledge. They also repeated Malimonov’s point that PULSE is not meant to identify individuals whose faces have been blurred, or to recover an image from a low-resolution file.

“It makes high-resolution faces of imaginary people, rather than recovering the original,” they said.

“I wasn’t surprised at all. This is a known problem across machine learning,” Ness told Motherboard. He said what made this particular case stand out was the subject of the photo that first spread on Twitter.

“Barack Obama is clearly an iconic person, so that when you looked at the blurred image, when you looked at the down-sampled image, it was clear to you who it was,” Ness said. “And so when it then turned it into something else it just became a much more poignant example of the problem.”

Alexia Jolicoeur-Martineau, a PhD student in artificial intelligence, said more diversity among researchers is needed to tackle bias in machine learning.

“Dataset is generally biased which leads to bias in the models trained. The methodology can also lead to bias,” Jolicoeur-Martineau said in an email. “However, bias inherently comes from the researcher themselves which is why we need more diversity. If an all-white set of male researchers work on [a] project, it's likely that they will not think about the bias of their dataset or methodology.”
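The search the PULSE authors describe, finding a generated face whose downscaled version matches the input, can be illustrated in toy form. This is a rough sketch under assumptions of our own, not the PULSE implementation: the real method searches StyleGAN's continuous space of possible outputs, while here a short, hypothetical list of candidate images stands in for "what the generator can produce," and matching is scored with a plain sum of squared pixel differences after 2x2 averaging.

```python
def downsample(img, factor):
    """Shrink a grayscale image (a list of rows) by averaging each factor x factor block."""
    out = []
    for top in range(0, len(img), factor):
        row = []
        for left in range(0, len(img[0]), factor):
            block = [img[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def squared_error(a, b):
    """Sum of squared pixel differences between two same-sized images."""
    return sum((p - q) ** 2
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))

def search(low_res, candidates, factor):
    """Return the candidate whose downscaled version best matches low_res."""
    return min(candidates,
               key=lambda c: squared_error(downsample(c, factor), low_res))

# The real (hypothetical) photo that was pixelated.
original = [
    [128, 127, 25, 25],
    [127, 128, 25, 25],
    [0, 200, 0, 100],
    [200, 0, 100, 0],
]
low_res = downsample(original, 2)

# Hypothetical stand-ins for generator outputs; the original is NOT among them.
candidates = [
    [[200] * 4 for _ in range(4)],
    [[0, 255, 10, 20], [255, 0, 30, 40], [100, 100, 50, 50], [100, 100, 50, 50]],
]

best = search(low_res, candidates, 2)
print(downsample(best, 2) == low_res)  # True: a perfect low-resolution match
print(best == original)                # False: but it is a different image
```

The search can only ever return something from the candidate set, so whatever skew that set has carries straight through to the output. That mirrors the authors' point that StyleGAN's inherited biases shape what PULSE produces.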