How Cellphone Camera Images Can Fool Machine Vision
Grainy images that would be clear enough to human eyes confuse a machine.
Finding some grainy imperfections in your smartphone photo is an unavoidable reality of digital photography (especially in low light conditions), but it's not going to stop you from recognizing who or what you've photographed. However, that might not be true of machines that use Google's computer vision software to "see."
According to a new report, Google researchers found that the company's image recognition algorithms often failed when they were challenged with grainy, less-than-perfect pictures.
We're already using machine vision for all sorts of purposes, including facial recognition, image identification, and self-driving vehicles. This study looked at Google's software specifically, which is one of the best machine vision systems out there. It suggests that there could be real limitations to a growing number of such systems, which is important to deal with as we decide how much to trust the devices that technology makers claim can "see" for us.
Adversarial images aren't a new problem in the field of machine learning. These pictures, which have a specifically engineered type of grainy noise, have been used to throw a wrench into image classification software: changes to an image that would be just about imperceptible to a human eye, like blurry pixelation, can totally mess up a computer's ability to correctly identify what the image shows.
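To make the idea concrete, here is a minimal sketch of how engineered noise can flip a classifier's answer. It assumes a toy linear model over a 16-pixel "image" and uses a gradient-sign step, one common way to build such noise. The article does not say which method the researchers used, and every name and number below is hypothetical.

```python
# Toy illustration only: a linear "classifier" over a 16-pixel image.
# All weights, pixel values, and names are hypothetical.

def score(weights, image):
    """Positive score means the toy model gives the correct label."""
    return sum(w * x for w, x in zip(weights, image))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def perturb(weights, image, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score.

    For a linear model, the gradient of the score with respect to the pixels
    is the weight vector itself, so the step is -epsilon * sign(weight).
    Results are clipped to the valid pixel range [0, 1].
    """
    return [min(1.0, max(0.0, x - epsilon * sign(w)))
            for w, x in zip(weights, image)]

weights = [0.5, -0.5] * 8   # 16 hypothetical model weights
image = [0.6, 0.5] * 8      # 16 pixel intensities in [0, 1]

adversarial = perturb(weights, image, epsilon=0.06)

print("original correctly labeled: ", score(weights, image) > 0)
print("perturbed correctly labeled:", score(weights, adversarial) > 0)
print("largest pixel change:", max(abs(a - b) for a, b in zip(image, adversarial)))
```

No pixel moves by more than 0.06 on a 0-to-1 scale, yet the sign of the score flips. Real image classifiers are far from linear, so this shows only the principle, not the scale of the effect.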
"It was found out, a few years ago, that it is possible to modify the input image and it will confuse the image recognition system," said Alexey Kurakin, one of the authors of the report and researcher at Google Brain, who spoke to me over the phone from California.
"Let's say an image of an elephant," he continued. "You modify the image slightly with this noise that is hard for the human eye to see very well. [If] you give it to your image recognition system, now the image recognition system thinks it's no longer an elephant, but an airplane or a car," even though a human eye likely wouldn't be confused by the same trick.
Before now, this vulnerability had only been tested by uploading an image to the classification system directly. Kurakin and his team tried something different. They took cellphone pictures of printed images that were increasingly modified with that special kind of noise: engineered changes that, again, would not stop us from seeing an elephant, but would throw the computer vision system for a loop.
He and his team found that the software still misclassified items, in the noisiest cases as often as 97 percent of the time.
According to the paper, the cellphone images, which were fed into the Inception v3 neural network (Google's really, really smart image identification algorithm), were captured "without careful control of lighting, camera angle, distance to the page."
In other words, these images looked a whole lot like what would be produced not in the lab, for research purposes, but out in the real world.
"Prior to my paper, they directly fed the image to neural network," said Kurakin. "This is important because, if you have a file with the image, you have fine-grain control over each pixel." In other words, previous research had generated the noise in the individual pixels and then let the machine analyze the picture.
Kurakin and his team have now shown that an ordinary camera snapshot of a printed adversarial image, with no further digital tweaking, has the same worrisome effect.
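The difference between feeding in a file and photographing a printout can be sketched in code. The snippet below is a rough simulation, not the researchers' setup. The hypothetical `photograph` function stands in for printing and re-shooting an image by dimming it, adding a little sensor noise, and rounding to 8-bit values. The toy linear model and all the numbers are assumptions made for illustration.

```python
import random

# Toy illustration only: all weights, pixel values, and names are hypothetical.

def score(weights, image):
    """Positive score means the toy model gives the correct label."""
    return sum(w * x for w, x in zip(weights, image))

def photograph(image, brightness=0.95, noise=0.005, seed=0):
    """Crude stand-in for print-and-photograph.

    Dims the image, adds small uniform sensor noise, clips to [0, 1],
    and rounds each pixel to one of 256 levels.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    result = []
    for x in image:
        x = x * brightness + rng.uniform(-noise, noise)
        x = min(1.0, max(0.0, x))
        result.append(round(x * 255) / 255)
    return result

weights = [0.5, -0.5] * 8      # 16 hypothetical model weights
clean = [0.6, 0.5] * 8         # unmodified image
perturbed = [0.54, 0.56] * 8   # same image with small engineered noise added

for name, img in [("clean", clean), ("perturbed", perturbed)]:
    direct = score(weights, img) > 0
    photo = score(weights, photograph(img)) > 0
    print(name, "| direct file correct:", direct, "| photographed correct:", photo)
```

In this toy setup, the clean image is labeled correctly both ways, while the perturbed one stays misclassified after the simulated photo. The degradation here is deliberately mild, and a real print-and-photograph step may distort an image much more than this.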
To get over this hurdle, scientists will need to tinker with both the image recognition software itself and the data that's used to train it. Maybe machines can become familiar with flawed images that way.
Until then, it's a reminder that while we begin to realize the incredible promise of new machine learning technology, it still has fundamental weaknesses that could be exploited.