Fooling Image Classification Networks Is Really Easy

March 28, 2017, 8:00am

Machine learning image recognition techniques can be beaten with only the slightest bit of near-invisible distortion, according to a paper posted earlier this month to the arXiv preprint server by a group of Swiss computer scientists. Suitable image perturbations are easy to reproduce and are, moreover, universal. The same tweak will foil a neural network regardless of the particular image that it’s been tasked with classifying. It’s a hell of a security flaw: Imagine, for instance, the ease with which someone might pollute the object-recognition involved in driverless cars.

Image classification is a central problem in state-of-the-art machine learning. The basic idea is to show an algorithm enormous datasets of labeled images (photos of dogs labeled “dog,” and so forth) until it starts to abstract the features in those images that are most likely to indicate the presence of an actual dog (or whatever). Eventually, we wind up with great big machine learning models that can be used to classify new, unlabeled images.

These models are far from perfect, but we get some really impressive results. What the Swiss researchers are saying is that, given the introduction of their perturbation, we can have two nearly identical images of a dog—nigh indistinguishable to the human eye—and wind up with dramatically different predictions from the image classifier. A naive observer would really have no idea why the classifier returned erroneous results for one of the images.

“The existence of these perturbations is problematic when the classifier is deployed in real-world—and possibly hostile—environments, as they can be exploited by adversaries to break the classifier,” Pascal Frossard and colleagues write. “Indeed, the perturbation process involves the mere addition of one very small perturbation to all natural images, and can be relatively straightforward to implement by adversaries in real-world environments, while being relatively difficult to detect as such perturbations are very small and thus do not significantly affect data distributions.”

This is what the perturbations look like on their own, with each one corresponding to a different image classification network. When these are blended into an image, they become mostly invisible. The similarity here between the different perturbations means that the same basic idea generalizes well across the various classification networks.

To understand classification problems in machine learning, it helps to imagine some points on a graph. Some of those points are one thing (“dog,” say) while the other points are another thing (“cat”). As a classification algorithm learns the difference, it parses these points and their positions on the graph, which correspond to different image features. Maybe y-axis values correspond to the sizes of different animals. Assuming animal size correlates with animal class, the points for dogs and the points for cats will each wind up roughly grouped together. The task of the image classifier is then to find a boundary between these groups so that, given a new, unlabeled data point, it can predict whether it’s a dog or a cat based on where the point sits in relation to the boundary.
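The points-on-a-graph picture can be sketched in a few lines of Python. This is only a toy under stated assumptions: the two features (weight and height), the made-up numbers, and the helper names `centroid`, `fit`, and `predict` are all hypothetical, and real image classifiers learn far more complicated boundaries over thousands of features.

```python
# Toy data: (weight_kg, height_cm). Made-up numbers for illustration only.
cats = [(3.5, 24.0), (4.2, 25.0), (5.0, 23.0)]
dogs = [(20.0, 55.0), (30.0, 60.0), (25.0, 58.0)]

def centroid(points):
    """Average position of a group of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit(cats, dogs):
    """Return (w, b) for the straight-line boundary w.x + b = 0 that sits
    halfway between the two group centers (a deliberately simple choice)."""
    c_cat, c_dog = centroid(cats), centroid(dogs)
    w = tuple(d - c for d, c in zip(c_dog, c_cat))
    mid = tuple((d + c) / 2 for d, c in zip(c_dog, c_cat))
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def predict(w, b, x):
    """Classify a new point by which side of the boundary it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "dog" if score > 0 else "cat"

w, b = fit(cats, dogs)
print(predict(w, b, (4.0, 24.0)))   # a small animal
print(predict(w, b, (28.0, 57.0)))  # a large animal
```

Everything the classifier "knows" here lives in `w` and `b`, which is why the shape and position of the boundary matter so much.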

Because real-world data is complicated, the shape of this boundary is crucial. The Swiss group hypothesizes that their universal perturbation works by exploiting “geometric correlations between different parts of the decision boundary of the classifier.” They don’t seem entirely certain how it has such an outsized effect.
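To get a feel for how one fixed, tiny change could flip many predictions at once, here is a toy sketch in Python. It is not the researchers' algorithm and says nothing about deep networks specifically. It assumes a hypothetical linear classifier over 1,000 "pixels" and random stand-in images, with all names and numbers (`D`, `EPS`, `w`, `v`) invented for illustration.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
D = 1000                 # number of "pixels" per toy image (assumed)
EPS = 0.05               # per-pixel change: 5% of the 0..1 pixel range (assumed)

# Hypothetical linear classifier: score = w . x, "dog" if score > 0, else "cat".
w = [1.0 if i % 2 == 0 else -1.0 for i in range(D)]

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(x):
    return "dog" if score(x) > 0 else "cat"

# One fixed perturbation shared by every image: nudge each pixel by EPS
# against the sign of its weight. (No clipping to 0..1, to keep the toy short.)
v = [-EPS if wi > 0 else EPS for wi in w]

# Random stand-in "images" with pixel values between 0 and 1.
images = [[rng.random() for _ in range(D)] for _ in range(200)]
dogs = [x for x in images if predict(x) == "dog"]

# Add the same v to every "dog" image and count how many flip to "cat".
fooled = [x for x in dogs if predict([xi + vi for xi, vi in zip(x, v)]) == "cat"]
fooling_rate = len(fooled) / len(dogs)
print(f"{len(dogs)} 'dog' images, fooling rate {fooling_rate:.0%}")
```

No pixel moves by more than 0.05, yet the shifts add up across all 1,000 pixels and lower every score by `EPS * D = 50`, far more than the typical spread of scores in this toy (roughly 9). In this two-class setup the perturbation only pushes in one direction, from "dog" toward "cat".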

In any case, the takeaway is that image recognition can be broken almost effortlessly. As such, we can imagine an adversary using such a perturbation to foil, say, a gun detection algorithm. On the other hand, we can imagine it as a future tool for protecting privacy in public spaces. At the very least, it points to the unsettledness of the technology in the first place.
