This Guy Made a ‘Game Boy Supercomputer’ That Can Handle 1 Billion Frames Per Second

Teaching AI how to transfer knowledge from one game to another is a tricky task, but some clever hardware hacking offered a solution.
The Distributed A3C Setup at IBM
Image: Kamil Rocki

Machine learning algorithms are able to dominate humans at our toughest board games, our classic video games, and even some modern ones. But they still have some big limitations, largely having to do with memory—or rather, a lack of it.

Kamil Rocki, a computer scientist who works on AI at IBM Research, recently built a “supercomputer” for machine learning applications out of thousands of virtual Game Boys. In an October blog post, he described it as “arguably the fastest 8-bit console cluster in the world.” The cluster is designed to fuel an AI that is learning to play Game Boy games, part of a project that Rocki hopes will help develop more efficient machine learning algorithms and robust memory for AI.


But let’s back up—what’s all this about AI and memory? Computer scientists train an artificial neural network (a type of computing architecture loosely based on the human brain) so that it can perform a particular task, like beating a specific game. If a neural net that had mastered Tetris tried to learn a more complex game like Super Mario Bros., however, that neural network would basically be starting from scratch and wouldn’t be able to draw upon its experience with Tetris to learn Super Mario Bros. faster.

Rocki’s Game Boy computing cluster is a tool that he hopes will help solve this major problem in contemporary AI research: knowledge transfer between machine learning domains. In 2015, Google’s DeepMind demonstrated that a single neural network was capable of mastering several different Atari 2600 games. This was a step toward memory for neural networks, yet as Rocki explained in the blog post describing his Game Boy “supercomputer,” those games did not differ much from one another in complexity.

Space Invaders being played at 100MHz, one-fourth of the FPGA's full speed. Image: Kamil Rocki

For most of the Atari games mastered by DeepMind’s neural net, the relationship between a player’s action with the joystick and its result on screen is made explicit through immediate feedback. This is markedly different from games like Prince of Persia, where a player’s actions might not produce immediate feedback and there is not an explicit score on the screen at all times.


To tackle these games with AI, Rocki explained in his blog post, he realized he would need a neural net that could play games fast and run several games at the same time. “Imagine you could finish Prince of Persia in 1/100th of the time and run 100,000 games at the same time,” Rocki wrote.

The neural network would also have to be implemented on a console that has a wide variety of games available to it that aren’t super resource intensive to run, so that its ability to use prior knowledge on more complex games could be tested.

Rocki considered several different consoles for his research, he wrote, including an arcade version of Space Invaders, the Atari 2600, the Nintendo Entertainment System (NES), and the Game Boy Classic. The problem, however, was that each of these systems maxed out at about 3,000 frames per second. If Rocki wanted to truly boost the pace of machine learning, he would have to figure out how to run hundreds of millions of frames per second on one of these platforms.

Ultimately, Rocki settled on the Game Boy because the console boasts over 1,000 games to choose from with a lot of variance in terms of complexity. The Game Boy Classic’s 160-by-144 two-bit color screen is also easy to process, which makes things easier on the system running the games.
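To give a sense of why such a screen is cheap to process, here is a minimal sketch that works out the size of one raw frame from the resolution and color depth quoted above. It assumes pixels are packed tightly, four to a byte, which is an illustrative simplification and not a description of how the Game Boy or Rocki's hardware actually stores video data.

```python
# Size of one raw Game Boy Classic frame, from the figures in the article.
WIDTH, HEIGHT = 160, 144   # screen resolution in pixels
BITS_PER_PIXEL = 2         # two-bit color depth

pixels = WIDTH * HEIGHT
shades = 2 ** BITS_PER_PIXEL            # distinct values a pixel can take
frame_bits = pixels * BITS_PER_PIXEL
frame_bytes = frame_bits // 8           # ASSUMPTION: tightly packed, 4 pixels per byte

print(f"{pixels} pixels, {shades} shades, {frame_bytes} bytes per frame")
```

Under that assumption a full frame fits in under 6 KB, small enough that feeding many frames to a learning algorithm is unlikely to be the bottleneck.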

Here, Kamil Rocki has tried to organize Game Boy games by their complexity according to his own subjective judgment. Games like Space Invaders are on the left and less complex, whereas Pokemon and Prince of Persia are in the upper right and are more complex. Image: Kamil Rocki


To do this, Rocki implemented the Intel 8080 CPU, the chip at the core of the original Space Invaders arcade machine (the Game Boy used a CPU that was nearly the same as the 8080, with a few small console-specific tweaks), in a field-programmable gate array (FPGA), a reconfigurable computer chip that can be used to emulate other hardware at scale.

For example, the 8080 CPU used in the original Space Invaders arcade console ran at 1 million cycles per second (1 MHz). Yet when this CPU is emulated in an FPGA, it can be clocked up to 400 MHz, which is like running a game at around 24,000 frames per second. Not only that, but a single FPGA can emulate 100 of these CPUs at a time, giving a total of 2.4 million frames per second on a single FPGA chip.
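The arithmetic behind those figures can be sketched in a few lines. The clock speeds and the count of 100 emulated CPUs come from the numbers above; the native rate of 60 frames per second is an assumption (it is what makes the 24,000 figure work out), as is the idea that frame rate scales linearly with clock speed.

```python
# Back-of-the-envelope throughput estimate for one FPGA chip.
NATIVE_CLOCK_HZ = 1_000_000    # original 8080 clock quoted in the article (1 MHz)
FPGA_CLOCK_HZ = 400_000_000    # emulated clock on the FPGA (400 MHz)
NATIVE_FPS = 60                # ASSUMPTION: native frame rate, not stated in the article
CPUS_PER_FPGA = 100            # emulated CPUs per chip, per the article


def emulated_fps(native_fps, native_clock_hz, fpga_clock_hz):
    """Scale the native frame rate by the clock speedup (assumes linear scaling)."""
    speedup = fpga_clock_hz // native_clock_hz
    return native_fps * speedup


fps_per_cpu = emulated_fps(NATIVE_FPS, NATIVE_CLOCK_HZ, FPGA_CLOCK_HZ)
fps_per_chip = fps_per_cpu * CPUS_PER_FPGA

print(f"{fps_per_cpu:,} frames per second per emulated CPU")
print(f"{fps_per_chip:,} frames per second per FPGA chip")
```

Changing `FPGA_CLOCK_HZ` to 100 MHz gives the one-fourth-speed setting mentioned in the image captions.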

Rocki told me in an email that 1,296 FPGA chips were wired together to produce around 1 billion frames per second from hundreds of emulated Game Boys. (This could theoretically be accomplished on as few as 50 FPGA chips, but Rocki said he and his colleagues didn’t “push them to their limits.”) Only one physical Game Boy was used for testing, Rocki said.
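As a rough illustration of how much headroom that leaves, the figures Rocki gave can simply be divided out. This is only a sketch: it treats "around 1 billion" as exactly 1 billion and assumes the load is spread evenly across chips, neither of which the article confirms.

```python
# Average per-chip throughput implied by the figures Rocki gave.
TOTAL_FPS = 1_000_000_000   # "around 1 billion" frames per second, treated as exact
CHIPS_USED = 1296           # FPGA chips in the cluster
CHIPS_MINIMUM = 50          # theoretical minimum Rocki mentioned

# ASSUMPTION: frames are spread evenly across all chips.
avg_fps_per_chip = TOTAL_FPS / CHIPS_USED
limit_fps_per_chip = TOTAL_FPS / CHIPS_MINIMUM
headroom = CHIPS_USED / CHIPS_MINIMUM

print(f"as built: about {avg_fps_per_chip:,.0f} frames per second per chip")
print(f"at the limit: {limit_fps_per_chip:,.0f} frames per second per chip")
print(f"headroom: about {headroom:.0f}x")
```

On those assumptions, each chip in the cluster runs at roughly one twenty-sixth of what it would need to deliver in the 50-chip scenario.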

Hardware accelerated Tetris, played at 100MHz, roughly one-fourth of the full speed. Image: Kamil Rocki

As Rocki wrote in his blog post, tests using his Game Boy supercomputer have so far been quite successful. According to the October post, he sees his tool as part of a wider trend that will wed AI algorithms with advanced hardware solutions in the next decade. This may well be the trend that finally sets us on the road to stronger artificial intelligence, which could, at least in part, have been created with a ’90s video game console.