This Handwriting Algorithm Could Help Robots Understand Causality

If a robot sees a crushed can, how will it know what to do with it?

Dec 10 2015, 6:27pm

Image: Science

Imagine a crushed can of soda, lying on the ground. Intuitively, you probably know what happened to it. You can visualize it, even. Maybe someone crumpled it in their hands, or angrily crushed it under their heel, or even slammed it onto their forehead while shouting "DETONATOOOOR!"

Whatever the actual explanation, humans have the absolutely unremarkable ability to run through all of these possible scenarios in our heads and establish the most likely series of events that led to a particular outcome. But while humans do this almost without thinking, it's actually exceedingly difficult for a computer to do the same.

These days, complex programs called neural networks are actually pretty good at recognizing objects like a crushed can of soda, but they can't necessarily infer how the can got that way, or why. In the future, it won't be enough for a robot to look at a crushed can and simply recognize it for what it is and move on—nobody wants a future full of robotic litterbugs.

If the robot knows that the can was once perfectly serviceable and filled with soda, but that someone drank it and crushed it, then it could infer that the can is garbage and recycle it. For machines to act as competently as humans in the real world, they have to be able to infer an object's history in order to properly interact with it; essentially, they have to understand and infer causality.

A team of computer scientists at MIT, New York University, and the University of Toronto believe they've made a breakthrough when it comes to computers mimicking this very human ability. In a paper published today in Science, the researchers describe how they trained an algorithm on a large set of pen strokes from hand-drawn characters in 50 different languages, and then presented the algorithm with images of hand-drawn characters that it hadn't seen before. The algorithm then guessed at several possibilities for how each character was drawn, pen stroke by pen stroke, and then drew the character itself.

In a very limited way, the computer was able to "look" at a thing—a letter—and infer how it was made.

"Imagine you have a robot that you want to have human-level object recognition, recognizing the objects around it," Ruslan Salakhutdinov, one of the paper's co-authors at the University of Toronto, said. "You have to build a system that can learn tens of thousands of different categories, and look at novel objects and very quickly understand them and recognize them. The question is how to do that."

After the initial training, the algorithm was able to recognize, deconstruct, and re-draw a new character after being exposed to it once. Salakhutdinov calls this "one-shot learning." The trick lies in using a Bayesian algorithm, which is a simpler kind of machine learning than more generalist neural networks with potentially dozens of "layers" of simulated neurons. The Bayesian approach that the researchers took, which is based on probabilities and inferences, is also far less data-hungry than most neural networks, which require thousands upon thousands of examples before they can reliably recognize an image or generate a new one.
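
To get a rough feel for what "based on probabilities and inferences" means, here is a toy sketch of Bayes' rule in Python. It is not the researchers' model: the three candidate explanations and every number in it are invented for illustration. The idea it shows is simply that each possible way of drawing a character gets a score combining how plausible it was beforehand (the prior) with how well it explains the image (the likelihood).

```python
# Toy illustration of Bayes' rule; NOT the model from the Science paper.
# All hypothesis names and probabilities below are invented for the example.

def posterior(priors, likelihoods):
    """Return P(hypothesis | observation) for each hypothesis.

    priors:      P(hypothesis), e.g. learned from earlier experience
    likelihoods: P(observation | hypothesis)
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical ways a never-before-seen character could have been drawn.
priors = {
    "one_stroke": 0.5,
    "two_strokes": 0.3,
    "three_strokes": 0.2,
}

# Hypothetical: how well each stroke sequence would reproduce the image.
likelihoods = {
    "one_stroke": 0.05,
    "two_strokes": 0.60,
    "three_strokes": 0.30,
}

# Rank the explanations from most to least probable.
ranked = sorted(
    posterior(priors, likelihoods).items(),
    key=lambda item: item[1],
    reverse=True,
)
for hypothesis, probability in ranked:
    print(f"{hypothesis}: {probability:.2f}")
```

With these made-up numbers, the two-stroke explanation comes out on top even though a single stroke was the more common guess going in, because it accounts for the image far better.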

"People are very good at doing this task," Salakhutdinov said. "If I give you a Korean character, you don't need to see thousands of characters to recognize what it is, or how to draw it. But it's very difficult for a computer."

The possible applications of the researchers' algorithm aren't all so futuristic, Salakhutdinov said. An algorithm that can reliably break an image of an object down into its constituent parts and make a pretty good guess as to how it got that way could be used in robotic speech recognition.

"Say you hear the word 'Chewbacca,' for the first time—what's a 'Chewbacca?'" Salakhutdinov explained. "And then you get the answer, and you know what it is. In language acquisition, these things come up all the time, and humans are very good at it. In the space of images it's the same problem when you see a new object. These same principles could be applied to other domains."

After all, what good is a robot that can recognize a crushed can of soda if it doesn't know that it's trash?