It’s hard for artificial intelligence to walk and chew bubble gum at the same time. Contrastive Language–Image Pre-training (CLIP) is an AI from OpenAI that can read text and sort images into categories. Ask CLIP to find pictures of a banana among a sea of fruit and it does a pretty good job. But researchers at OpenAI have discovered that CLIP’s ability to read both text and images is a weakness as well as a strength. If you ask it to look for apples, and show it an apple with “iPad” written on it, CLIP will say it’s an iPad, not an apple.
The researchers revealed what they call a “typographic attack” in a recent blog post about CLIP.
“We believe attacks such as those described above are far from simply an academic concern,” the blog said. “By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model...this attack works in the wild; but...it requires no more technology than pen and paper. We also believe that these attacks may also take a more subtle, less conspicuous form. An image, given to CLIP, is abstracted in many subtle and sophisticated ways, and these abstractions may over-abstract common patterns—oversimplifying and, by virtue of that, overgeneralizing.”
According to the researchers, CLIP is vulnerable to this kind of attack because it’s so sophisticated. “Like many deep networks, the representations at the highest layers of the model are completely dominated by such high-level abstractions,” the blog said. “What distinguishes CLIP, however, is a matter of degree—CLIP’s multimodal neurons generalize across the literal and the iconic, which may be a double-edged sword.”
The funniest example the blog provided of this issue was a poodle that was wrongly sorted as a piggy bank because researchers superimposed crude dollar signs in Impact font over the picture of the dog. They did the same with a chainsaw, horse chestnuts, and vaulted ceilings. Every time the dollar signs appeared, CLIP thought it was looking at a piggy bank. It was the same with a Granny Smith apple that researchers attached several labels to. CLIP could never look past the label to see the apple underneath.
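To see why a few dollar signs can outvote a whole poodle, it helps to picture how this kind of label matching works: the image and each candidate label are turned into lists of numbers, and the label whose numbers point in the most similar direction wins. The sketch below is a toy illustration of that idea, not CLIP’s actual code. The two-number “embeddings,” the feature names, and the `classify` helper are all made up for this example, and the values were hand-picked so the effect is easy to see.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical label embeddings: [dog-ness, money-ness].
# Real models use hundreds of learned dimensions; these are invented.
LABELS = {
    "poodle": [1.0, 0.0],
    "piggy bank": [0.0, 1.0],
}

def classify(image_vec):
    """Return the label whose embedding is most similar to the image."""
    return max(LABELS, key=lambda name: cosine(image_vec, LABELS[name]))

# A clean poodle photo: mostly dog-ness, a trace of money-ness (assumed values).
clean_poodle = [0.9, 0.1]

# Assumption: text pasted on the image adds a strong money-ness signal.
dollar_signs = [0.0, 1.5]
attacked_poodle = [p + t for p, t in zip(clean_poodle, dollar_signs)]

print(classify(clean_poodle))     # label chosen for the clean photo
print(classify(attacked_poodle))  # label chosen once dollar signs are added
```

With these invented numbers, the clean photo lands closest to “poodle” and the version with dollar signs lands closest to “piggy bank,” because the text signal swamps everything else in the image. That is the double-edged sword the researchers describe, shrunk down to two dimensions.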
This is, of course, very funny. But it’s also terrifying. We are rushing headfirst into an AI-assisted future, and it’s increasingly obvious that machines aren’t apolitical arbiters of the public good but are coded with the flaws and biases of their creators. Even the U.S. government has admitted that facial recognition software carries a racial bias.
To OpenAI’s credit, its researchers conclude their paper by highlighting this problem. “Our model, despite being trained on a curated subset of the internet, still inherits its many unchecked biases and associations,” researchers said. “Many associations we have discovered appear to be benign, but yet we have discovered several cases where CLIP holds associations that could result in representational harm, such as denigration of certain individuals or groups.
“We have observed, for example, a ‘Middle East’ neuron with an association with terrorism; and an ‘immigration’ neuron that responds to Latin America. We have even found a neuron that fires for both dark-skinned people and gorillas, mirroring earlier photo tagging incidents in other models we consider unacceptable.”
According to OpenAI, those biases may be here to stay.
“Whether fine-tuned or used zero-shot, it is likely that these biases and associations will remain in the system, with their effects manifesting in both visible and nearly invisible ways during deployment,” it said. “Many biased behaviors may be difficult to anticipate a priori, making their measurement and correction difficult. We believe that these tools of interpretability may aid practitioners the ability to preempt potential problems, by discovering some of these associations and ambiguities ahead of time.”