AI Can Be Fooled With One Misspelled Word
When artificial intelligence is dumb.
Computers are getting really good at learning things about the world and applying that knowledge to new situations, like being able to identify a cat in a photo. But researchers have recently discovered that there's a hitch: by adding some digital noise to an image, imperceptible to the human eye, it's really easy to trick a machine.
Now, researchers have engineered a similar way to fool AI trained to understand human language. One can imagine how this might pose a risk if, for example, machines are one day autonomously vetting legal documents.
In a paper posted to the arXiv preprint server this week, a team of computer scientists from the Renmin University of China in Beijing describes their system for fooling a computer trained to understand language. In short, by adding one very specific word to a particular part of a long sentence, or slightly misspelling one word in a phrase, a computer trained to say whether a sentence is about buildings or corporations will completely flip its analysis.
But that's a bit abstract, so here's how it works in practice. In one example, the researchers took a long passage containing multiple sentences describing the 1939 film Maisie. On the first pass, the AI was 99.6 percent sure that the passage was about film, which is correct. But when the researchers slightly misspelled one word ("film" to "flim"), the computer suddenly decided, with 99 percent confidence, that the passage was about companies. (Why companies came up remains a bit of a mystery.) This is despite the passage containing other, correctly spelled instances of the word "film."
Clearly, something is wrong, and it's called an adversarial example.
These are, broadly speaking, inputs that an AI has been trained to recognize but that become inscrutable to it after a few tweaks that humans either can't perceive or can easily ignore. Previous research has shown that a computer can be made to think a stop sign is a yield sign, which would lead to serious problems if this were to happen to a self-driving car.
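To make the text version of the trick concrete, here is a minimal Python sketch of the idea. It is not the researchers' system: the classifier is a toy that adds up hand-picked word weights, and the weights, the sample passage, and the helper names (classify, misspell, find_flipping_typos) are all assumptions invented for illustration. It tries a typo on each word in turn and reports which single change flips the label.

```python
import math

# Hypothetical, hand-set word weights per category. A real model learns
# its parameters from data; these numbers exist only for illustration.
WEIGHTS = {
    "film": {"film": 2.0, "comedy": 1.0},
    "company": {"produced": 0.6, "released": 0.5, "studio": 0.6},
}

# Made-up sample passage, loosely modeled on the article's example.
PASSAGE = "Maisie is a 1939 comedy film produced and released by a studio"


def classify(text):
    """Return (label, confidence) using a softmax over summed word weights."""
    words = text.lower().split()
    scores = {
        label: sum(weights.get(word, 0.0) for word in words)
        for label, weights in WEIGHTS.items()
    }
    total = sum(math.exp(score) for score in scores.values())
    probs = {label: math.exp(score) / total for label, score in scores.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


def misspell(word):
    """Swap the second and third characters, e.g. 'film' -> 'flim'."""
    if len(word) < 3:
        return word
    return word[0] + word[2] + word[1] + word[3:]


def find_flipping_typos(text):
    """Try a typo on each word in turn; collect the ones that flip the label."""
    original_label, _ = classify(text)
    words = text.split()
    flips = []
    for i, word in enumerate(words):
        typo = misspell(word)
        candidate = " ".join(words[:i] + [typo] + words[i + 1:])
        label, confidence = classify(candidate)
        if label != original_label:
            flips.append((word, typo, label, confidence))
    return flips


print("original:", classify(PASSAGE))
for word, typo, label, confidence in find_flipping_typos(PASSAGE):
    print(f"{word!r} -> {typo!r} flips the label to {label!r} ({confidence:.2f})")
```

In this toy, the misspelled word simply falls out of the classifier's vocabulary, so the evidence for "film" disappears. The model in the paper is a trained neural network, and its flip happened even though other correctly spelled instances of "film" remained in the passage, which this sketch does not reproduce.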
The researchers undertook this study because, they write, research into adversarial examples—like the aforementioned cat picture sprinkled with digital noise—has so far largely ignored ways to trick a machine trained to understand language. But how worried should anybody be about this? Right now, not very.
The researchers note that their approach was highly specialized and they did a good bit of reverse-engineering to discover specific ways to trick their AI. Moreover, it took a lot of human work to generate the adversarial examples.
Now that the researchers know the attack is possible, they have to get ahead of attackers who might exploit this vulnerability. Up next: figuring out how to launch a large-scale attack in the wild.