
How an AI Called the "World Tester" Helped Make 'Baldur's Gate 3' a Reality

We think of bug testing as people in an office, playing a game over and over. These days, it's more complicated.
October 23, 2020, 1:00pm
A screenshot from the video game Baldur's Gate 3.

The long wait for Baldur's Gate 3 to enter early access got a little longer last month, when developer Larian announced it would need another week to get things in order. What caught my eye, however, was Larian's explanation for why it needed more time:

"…the game still has to pass the 'World Tester.' The World Tester is a sort of AI super-gamer that plays through the game at incredible speed, stress testing everything and pushing it to its limits. This super-gamer is currently playing through, and the results are looking good but not perfect yet. We know that if the super-gamer doesn't break the game, there's less chance you will."

The, uh, world tester?

It's a fancy name that conjures images of SHODAN from System Shock, but in reality it's another name for something that happens all the time in modern game development: automated QA. QA, or quality assurance, is how we usually think about squashing bugs in games, which itself conjures images of people in a room playing the same game for hours.


In reality, a lot of today's games are made with human testers and automated testing working in conjunction with each other.

"It isn’t feasible for humans to run a full test sweep on every new code or content submission," said League of Legends developer Jim Merill in a 2016 blog about Riot's own automation efforts, "and, even if it were, it would require an army of testers to return results sufficiently quickly.'

You can even read an academic paper that explains how Electronic Arts came up with automated testing to help development with its football games all the way back in 2012.

Larian producer Octaaf Fieremans told VICE Games the world tester is the studio's "first line of defense against the most apparent issues a particular version of the game can have." It's a way of finding early red flags. When a new build of the game is available, the world tester takes a look, starts playing through the game, and can quickly find big issues like crashes.

The computers pull the new build, start playing, and issue a report on what they find. Red flags, like pesky crashes, are automatically added to a database and simultaneously sent to the teams within Larian tasked with investigating them. It all happens without anyone lifting a finger.
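Larian hasn't published how the world tester is built, so what follows is only a rough sketch of the loop described above: play a build, collect a report, then file and route the red flags without human help. Every name here (`Issue`, `Report`, `run_world_tester`, the fake play session) is hypothetical, and a plain list and callback stand in for the database and team notifications.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    kind: str    # e.g. "crash" or "warning"
    area: str    # hypothetical label for the part of the game involved
    detail: str


@dataclass
class Report:
    build_id: str
    issues: list = field(default_factory=list)

    @property
    def red_flags(self):
        # Assumption: crashes are what count as red flags.
        return [i for i in self.issues if i.kind == "crash"]


def run_world_tester(build_id, play_session, issue_db, notify):
    """Play one build, then file and route every red flag automatically."""
    report = Report(build_id=build_id, issues=list(play_session(build_id)))
    for issue in report.red_flags:
        issue_db.append((build_id, issue))  # stand-in for a real database
        notify(issue.area, issue)           # stand-in for alerting a team
    return report


# Usage with fake data: a pretend play session that hits one crash.
def fake_session(build_id):
    yield Issue("warning", "dialog", "missing line")
    yield Issue("crash", "combat", "null reference in turn order")


db, sent = [], []
report = run_world_tester(
    "build-001", fake_session, db, lambda area, issue: sent.append(area)
)
print(len(report.red_flags), sent)
```

The point of the shape is that filing and notifying happen in the same pass as the play session, which is what lets the whole thing run unattended.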

There are several computers at Larian that run the world tester, which means there is not, in fact, a single "world tester." There are several! They don't even have screens attached.

A version of the world tester has existed at Larian for years now, going as far back as 2013's Divinity: Dragon Commander. Then, the world tester required a lot of hands-on attention.

"When we were making Dragon Commander, it was automatically doing combat on several maps until the game eventually would crash," said Fieremans. "This still required someone to manually check if the game was still running, debugging the issue locally and then restarting the world tester. We’ve come a long way since then."

What the world tester is capable of changes based on the project in development. It wasn't always able to save and load games, for example, or run through dialog options. Nor could it track performance and produce a readable heat map that shows where problems are occurring and how frequently they occur. And it couldn't participate in combat on its own.
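The article doesn't say how Larian's heat map is produced. One common way to show "where problems are occurring and how frequently" is to bucket problem locations into grid cells and count them, as in this minimal sketch. The cell size, the coordinates, and the function names are all assumptions made for illustration.

```python
from collections import Counter

CELL_SIZE = 10.0  # hypothetical grid cell size, in world units


def build_heat_map(events, cell_size=CELL_SIZE):
    """Count problem events per grid cell from (x, y) world positions."""
    heat = Counter()
    for x, y in events:
        cell = (int(x // cell_size), int(y // cell_size))
        heat[cell] += 1
    return heat


def render(heat, width, height):
    """Draw the grid as text: '.' for no problems, else the count (capped at 9)."""
    rows = []
    for row in range(height):
        rows.append("".join(
            str(min(heat[(col, row)], 9)) if heat[(col, row)] else "."
            for col in range(width)
        ))
    return "\n".join(rows)


# Made-up positions where a problem (say, a frame-rate drop) was recorded.
events = [(3.0, 4.0), (5.5, 8.0), (27.0, 12.0), (2.0, 1.0)]
heat = build_heat_map(events)
print(render(heat, width=3, height=2))
```

With these made-up events, three land in the top-left cell and one in the bottom-right, so the hot spot is visible at a glance.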

The advanced features of the world tester don't eliminate the need for humans. Generally, if the world tester does its thing on a build and finds no problems after a few hours, that build is considered good to pass along and be played by normal QA. They're complementary tools.

In most cases, human QA is handling the more advanced, stable versions of the game, trying to dig in and find quirks the world tester cannot. But sometimes they will cross paths.

"When under time pressure and on a build that we know does not have many changes, and in theory should be stable, we start regular QA and the world tester at the same time," said Fieremans. "This has proven to be very efficient near times of release, as both scenarios have already happened in the past: the world tester could pass a build while our QA rejects it, as well as our very human QA passing a build and the world tester finding issues purely due to it being better in some cases at covering a lot of ground and scenarios in an automated way."

One of the biggest advantages of the world tester, and automated QA in general, is the ability to repeat a task over and over. The world tester does not get bored.

"[People] are not as good at repeating the same steps over and over on each build in exactly the same way," said Fieremans.

All of this would be important and useful even if Larian were making a video game the traditional way, where it's quietly developed and polished for several years before it's released into the world. But recently, Larian has embraced early access, opting to develop large parts of its games in conjunction with the community over the course of several years. Baldur's Gate 3 is the most high-profile release from Larian yet, but it's still being made the exact same way.

Early access means players expect games to be rough and incomplete, for there to be more bugs than normal, but it doesn't mean Larian is just uploading the latest build off the assembly line. Fieremans described Larian's bar for early access, what the studio considers okay to release to its players, as "a playable and enjoyable experience." The world tester helps achieve that because of its ability to find crashes and other instability issues so fast.

"If our world tester flags several major instability issues during its first few hours, we cannot release a build like that to our players as they would also be crashing every five to ten minutes if applied to a large community," said Fieremans.

The same logic is applied to the tiny details its human QA teams are picking over, like ensuring that equipment stays consistent when saving and loading.

While early access is useful for finding bugs, a sort of crowdsourced QA, Fieremans said that's not what Larian is after during this period. The world tester helps keep players from having to do that job.

"What we want most, is to offer players a version of the game that allows them to play and generate the feedback we don’t know about yet," said Fieremans. "This can only be achieved by removing the firstline issues that we know will make the game too hard to play or unenjoyable to them."

Follow Patrick on Twitter. His email is patrick.klepek@vice.com, and he's available privately on Signal (224-707-1561).