If you've ever had a sibling who plays video games, this should be a familiar scene: they're playing the game, and you're sitting beside them on the floor, shoving Doritos into your face by the handful. You're also shouting, "UP! OK, NOW GO DOWN! NO, DOWN! WATCH OUT FOR THAT GUY BEHIND YOU!" Maybe, just maybe, you'll beat the game together.
This is essentially how undergraduate students at Stanford University recently taught AI to play a notoriously challenging game, Montezuma's Revenge for the Atari 2600. These fledgling computer scientists hope that this approach could one day let advanced robots and AI learn about the real world from average schmoes like me.
"Everyone in the developed world is interacting with AI every day, whether they know it or not," said Russell Kaplan, a Stanford computer science student and one of the study's co-authors, in an interview. "Regular humans need to be able to talk to their machines."
To teach the AI to play Montezuma's Revenge, the researchers skipped the usual approach of training the model on pixel data from the game until it figures out how to win. Instead, they trained the model to understand natural human language and how it relates to in-game actions, then guided the AI with instructions like "get to the coin" or "jump to the rope."
After training the AI this way, the students report in a paper posted to the arXiv preprint server, their AI was able to reach a high score of 3,500. That's impressive, but not the best: an AI model from Google's DeepMind lab achieved a score of 6,600 in Montezuma's Revenge last year, the highest ever reported for a machine. However, the Stanford students note that they couldn't train their AI as thoroughly as DeepMind did, due to a "limited computational budget." They are only undergrads, after all, but their approach still far outperformed the AI with the next-highest score for Montezuma's Revenge in the OpenAI Gym.
Montezuma's Revenge is particularly difficult for computers because the game has very sparse rewards—like obtaining a key in a room filled with dangers—which makes it tough for machines to learn what the winning game actions are. DeepMind conquered this by building in new mathematical bonuses to drive AI exploration. The Stanford team's approach is different: instead of pseudo-rewards, they put a human in the loop and trained the computer to understand what the human's instructions meant.
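The human-in-the-loop idea can be sketched in a few lines. This is a toy illustration, not the authors' code: the environment's sparse reward gets topped up with a bonus whenever the agent satisfies the current spoken instruction. The names `shaped_reward` and `instruction_satisfied`, and the idea of checking instructions against a target position, are hypothetical stand-ins for the learned language-action model described in the paper.

```python
# Toy sketch of instruction-based reward shaping (illustrative only).

def shaped_reward(env_reward, state, instruction, instruction_satisfied):
    """Combine the game's sparse reward with a bonus for following
    the human's current instruction."""
    bonus = 1.0 if instruction_satisfied(state, instruction) else 0.0
    return env_reward + bonus

def make_checker(targets):
    """Build a trivial instruction checker: each instruction maps to a
    target position the agent should reach. A stand-in for a model that
    actually understands language."""
    def instruction_satisfied(state, instruction):
        return state["position"] == targets.get(instruction)
    return instruction_satisfied

checker = make_checker({"get to the coin": (3, 4), "jump to the rope": (7, 2)})

# Agent reached the coin: no game reward yet, but the instruction bonus fires.
print(shaped_reward(0.0, {"position": (3, 4)}, "get to the coin", checker))  # 1.0
# Agent is elsewhere: no bonus.
print(shaped_reward(0.0, {"position": (0, 0)}, "get to the coin", checker))  # 0.0
```

The point of the bonus is density: the agent gets useful feedback every few steps from the human's instructions, rather than only at the rare moments the game itself hands out points.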
The really exciting thing about this student project, though, is how it envisions better human-robot interactions. After all, people learn about the world from other people all the time. In the future, teaching your robot buddy to cook might actually mean teaching them to cook, as in, "Here, grab the spoon. Now, mix up the ingredients. You're getting it!"
"The way traditional AIs work is randomly mashing buttons until they get a reward from the environment, and then learning to mash buttons in the future," Kaplan said. "But in the real world, you'd have to get so lucky to do the correct random sequence of actions that it's virtually impossible to apply existing approaches productively."
But with natural language instructions from a human in the training loop, the search gets a little less random, and the AI or robot can learn to navigate the challenges of daily life.
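A back-of-the-envelope calculation (mine, not the paper's) shows why Kaplan's "so lucky" isn't an exaggeration. The Atari 2600 joystick supports 18 distinct actions, so the odds of randomly stumbling onto even one specific short sequence of correct moves collapse fast:

```python
# Odds of random button-mashing producing one specific sequence of moves.
actions = 18          # size of the full Atari 2600 joystick action set
sequence_length = 20  # a modest run of "correct" actions

p = (1 / actions) ** sequence_length
print(p)  # on the order of 1e-25
```

Twenty correct moves in a row is nothing in a game like Montezuma's Revenge, yet a blind random policy would need on the order of ten septillion attempts to hit that sequence once, which is exactly the gap that human instructions help close.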
Whoever said playing video games in university is a waste of time?