There's something thrilling, something human, about watching a Rube Goldberg machine go through its elaborate motions. When a ball zips down a length of track to hit one thing, knocking over another, and so on, there's a tension between the known and the unexpected. The known is how objects will react to being pushed around, which we can infer after years of experience with how physics works on Earth. The unexpected is how the mechanical elements are arranged to exploit that physics.
Computers, however, don't have the same experiences that humans do, such as the years of touching and pushing things around in physical space that we draw on when we look at an object and predict where it will go next. In short, computers don't understand physics in the same intuitive way that humans do. Could an AI ever understand a system like a Rube Goldberg machine the way humans do, much less make one of its own?
Researchers at MIT's Computer Science and Artificial Intelligence Lab (CSAIL) are one step closer to solving this problem. Their AI system, called "Galileo," uses an open-source physics engine called Bullet, which has been used in Rockstar games and Hollywood movies alike. Galileo uses Bullet to predict the physical properties of an object, such as weight and density, based on visual input, and then uses this data to determine how the object will react in certain situations.
"The idea is that to get an understanding of your environment, you mimic or simulate unfolding events in the brain," said Ilker Yildirim, co-author of a paper describing the work. "Here, literally making use of a physics engine is trying to mimic what is happening in the brain."
In other words, the AI does something similar to what you do when you look at a rock on the beach and think about what will happen if you drop it in the water: you estimate how much the rock weighs and how dense it is, then mentally project the situation forward in time, concluding that it'll most likely sink.
Galileo is kind of a Rube Goldberg machine itself, in that it's a complicated system designed to do what humans easily can. The first part of the system is a visual tracking algorithm that recognizes objects in a moving scene (in this case, videos recorded by the researchers) and estimates their shape and velocity. Next, the physics engine takes this data, generates estimated values for the objects' mass and friction coefficient, and simulates the scene forward in time: what would happen, say, when one object is dropped down a ramp and hits another.
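The estimation step can be pictured with a toy example. The sketch below is not Galileo's code and does not use Bullet. It assumes a closed-form formula for a block sliding down a ramp in place of a physics engine, and it recovers a friction coefficient by trying candidate values until the simulated outcome matches an observed speed. The function names, ramp angle, ramp length, and friction value are all hypothetical, chosen only for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def speed_at_ramp_bottom(mu, angle_deg, ramp_length):
    """Toy stand-in for a physics engine (an assumption, not Bullet).

    Returns the speed of a block that starts at rest and slides the full
    length of a ramp, given a friction coefficient mu.
    """
    theta = math.radians(angle_deg)
    accel = G * (math.sin(theta) - mu * math.cos(theta))
    if accel <= 0:
        return 0.0  # friction is strong enough that the block never moves
    return math.sqrt(2 * accel * ramp_length)


def infer_friction(observed_speed, angle_deg, ramp_length, steps=1000):
    """Pick the candidate friction value whose simulated speed best
    matches the observation (a simple grid search over 0.0 to 1.0)."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda mu: abs(
            speed_at_ramp_bottom(mu, angle_deg, ramp_length) - observed_speed
        ),
    )


# Hypothetical scene: a 30-degree ramp, 2 meters long, true friction 0.3.
true_mu = 0.3
observed = speed_at_ramp_bottom(true_mu, 30, 2.0)
estimate = infer_friction(observed, 30, 2.0)
print(f"estimated friction coefficient: {estimate:.3f}")
```

A real system would have to cope with noisy observations and more than one unknown property at a time, which is why a full physics engine and a more careful search are needed in practice.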
Finally, the physics engine's predicted values are mapped onto static images of the objects picked out by the tracking algorithm and fed into a neural network: "layers" of simulated neurons that run calculations on input data and self-correct until a desired output is achieved. After this training period, the network is primed to "look" at objects it hasn't seen before and guess what will happen to them. In tests, the network could guess what was going to happen in videos of objects being dropped down a ramp from only the first frame, and it did so nearly as well as humans.
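The "self-correcting" part of training can be shown in miniature. The sketch below is a deliberately tiny illustration, not the network the researchers used: it assumes a single simulated neuron with one weight and one bias, a made-up visual feature as input, and labels that stand in for values a simulator might produce. All names and numbers are hypothetical.

```python
# Hypothetical training pairs: (visual feature, label from a simulator).
# The labels follow 2 * feature + 1, chosen purely for illustration.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(11)]

weight, bias = 0.0, 0.0  # the neuron starts out knowing nothing
learning_rate = 0.1

for epoch in range(2000):
    grad_w = 0.0
    grad_b = 0.0
    for feature, target in data:
        # How far off is the current guess for this example?
        error = (weight * feature + bias) - target
        # Accumulate the gradient of the mean squared error.
        grad_w += 2 * error * feature / len(data)
        grad_b += 2 * error / len(data)
    # Self-correct: nudge the parameters to shrink the error.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

# After training, guess the label for a feature value not in the data.
unseen_feature = 0.55
prediction = weight * unseen_feature + bias
print(f"weight={weight:.3f} bias={bias:.3f} prediction={prediction:.3f}")
```

A deep network stacks many such units in layers and learns from images rather than a single number, but the loop is the same in spirit: guess, measure the error, adjust, repeat.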
"One potential application for this is robotics," said Yildirim. "Often, robots operate in settings where they have to deal in uncertainties in the environment, say, in an assembly line. One of the implications of understanding things like friction in robotics, is that robots will be able to cope with uncertainty in their environment."
But, like other deep learning techniques that seek to help robots one day understand the world around them, the system built by Yildirim and his colleagues has so far been tested only in highly limited, tightly defined scenarios, he explained.
It'll be a while yet before our computational Rube Goldberg machines—deep learning neural networks—are building Rube Goldberg machines of their own.