Scientists Found a Way to Defeat a 'Near-Superhuman' Go-Playing AI

Researchers developed a rival adversarial AI to trick the fearsome KataGo model into losing games.

Scientists have created a computer program capable of defeating a Go-playing AI that’s so good at winning, it’s long been called “near-superhuman.” 

One of the world’s oldest board games, Go is much like chess in that black and white pieces (in this case, stones) represent opposing players, but different in that the goal is to gain territory, not to capture the other player’s king. Once considered a pastime that could teach people about human nature, the game has ironically been popularized in recent years by KataGo, a publicly available AI that can play the game at the level of a “top human professional.”

Players have often used KataGo to test their skills, train for other matches, and even analyze past games. Yet in a study posted recently on the preprint server arXiv, researchers report that by using an adversarial policy, a kind of machine-learning algorithm built to attack or learn weaknesses in other systems, they’ve been able to beat KataGo at its own game between 50 and 99 percent of the time, depending on how much “thinking ahead” the AI does.

Funnily enough, the new system doesn’t win by outplaying KataGo outright, but by forcing KataGo into a corner, essentially tricking it into offering to end the match at a point favorable to its adversary. “KataGo is able to recognize that passing would result in a forced win by our adversary, but given a low tree-search budget it does not have the foresight to avoid this,” co-author Tony Wang, a Ph.D. student at MIT, said of the study on the site LessWrong, an online community dedicated to “causing safe and beneficial AI.”

According to Adam Gleave, a Ph.D. student at the University of California, Berkeley and another co-author of the paper, the new AI system learned this strategy through reinforcement learning, a machine-learning training method that rewards desired behaviors and ignores undesired ones as an AI takes actions that change its environment. “The nice thing about this [method] is that it lets you just specify what the objective of the task is without specifying how the AI system achieves it,” Gleave told Motherboard. “It’s just learning that by trial-and-error.”
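The trial-and-error idea Gleave describes can be sketched with a toy example, far simpler than the paper's actual setup: an agent repeatedly tries one of three actions, is told only the reward it got, and gradually settles on whichever action pays best. Everything here (the three actions, their hidden payout probabilities, the learning schedule) is an illustrative assumption, not anything from the study.

```python
import random

# Toy reinforcement-learning sketch (NOT the paper's method): learn which
# of three actions pays best purely by trial and error. The hidden payout
# probabilities below are made up for illustration.
PAYOUTS = [0.2, 0.5, 0.8]  # hidden probability of reward for each action

def pull(arm, rng):
    """Environment step: reward 1 with the chosen arm's hidden probability."""
    return 1.0 if rng.random() < PAYOUTS[arm] else 0.0

def train(episodes=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0, 0.0]  # running estimate of each action's reward
    counts = [0, 0, 0]
    for _ in range(episodes):
        # Explore occasionally; otherwise exploit the best estimate so far.
        if rng.random() < epsilon:
            arm = rng.randrange(3)
        else:
            arm = max(range(3), key=lambda a: values[a])
        reward = pull(arm, rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values

if __name__ == "__main__":
    estimates = train()
    best = max(range(3), key=lambda a: estimates[a])
    print("learned best action:", best)
```

Note that, as in the quote, we only specified the objective (maximize reward); the agent discovered on its own which action achieves it.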

Trained to handle a variety of boards and rule-sets, KataGo learns via self-play, in which an algorithm improves by playing against itself, in this case millions of times. Gleave’s algorithm, by contrast, was trained via a method called victim-play: it learns from games between the adversary (itself) and a fixed victim agent, using data only from the turns where it is the adversary’s move, because the researchers wanted the algorithm to exploit the other player, not mimic it.
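The victim-play idea can be illustrated with a deliberately tiny stand-in game: a frozen "victim" policy with a fixed, exploitable bias, and an adversary that learns only from its own moves and their outcomes, ending up countering the victim rather than copying it. The rock-paper-scissors setting and the victim's bias are illustrative assumptions, not the paper's Go setup.

```python
import random
from collections import Counter

# Toy victim-play sketch (NOT the paper's Go setup): the victim is FROZEN,
# and the adversary learns only from its own (move, outcome) pairs.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
MOVES = ["rock", "paper", "scissors"]

def victim_policy(rng):
    """Frozen victim, heavily biased toward 'rock' -- its exploitable flaw."""
    return rng.choices(MOVES, weights=[0.7, 0.2, 0.1])[0]

def train_adversary(games=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    wins, plays = Counter(), Counter()
    win_rate = lambda m: wins[m] / plays[m] if plays[m] else 0.0
    for _ in range(games):
        # Pick our move: explore occasionally, otherwise best so far.
        if rng.random() < epsilon or not plays:
            mine = rng.choice(MOVES)
        else:
            mine = max(MOVES, key=win_rate)
        theirs = victim_policy(rng)       # the victim never learns or changes
        plays[mine] += 1
        if BEATS[theirs] == mine:         # record only OUR move's outcome
            wins[mine] += 1
    return max(MOVES, key=win_rate)       # best response, not an imitation

if __name__ == "__main__":
    print("adversary settles on:", train_adversary())
```

Because the victim mostly plays rock, the adversary converges on the counter-move instead of reproducing the victim's style, which is the point of training against a fixed victim rather than via self-play.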

The unspoken rule of competitive gaming would have us believe that skill is transitive: if Player A can beat Player B, and Player B can beat Player C, then Player A should beat Player C. But what makes the team’s algorithm especially fascinating is that even though it can easily beat a superhuman AI, it surprisingly doesn’t stand a chance against human amateurs.

“In some ways, there’s no reason to expect that it would beat humans. It’s never played against a human; it’s never played against anything other than this very specific AI system,” Gleave says. Though such a shortcoming could be put down to humanity’s innate unpredictability, the study actually concludes that the attack’s success in exploiting KataGo’s blind spots shows that even professional-level AIs have flaws to exploit.

Going forward, the team plans to try to translate the attack to other games, like chess or shogi. Yet Gleave says their research isn’t just about training a game-playing AI to fool a digital opponent. The main takeaway is that their work reveals new details about how AI systems think about and tackle problems.

“Even if an AI system is performing at a very high level and seems to be as good as a human or better in many tasks,” Gleave says, “it might be performing that task in a way very different from a human, so you should expect that it’s going to fail, and in actually very surprising and alien ways.”