One of the highest compliments you can pay a fan mod for a game is that it can feel indistinguishable from something made by the original developer. This helps explain the enthusiastic response to a recent trailer for A Night to Remember, a new story for The Witcher 3. It looks great, and more specifically, sounds great, especially the new lines of dialogue for the main character, Geralt. Maybe they managed to get the original voice actor?
Nope. Instead, it sounds so scarily accurate and realistic thanks to a combination of lines spliced together from existing game dialogue and, importantly, new lines made with a speech synthesis tech called CyberVoice by a Russian company called Mind Simulation.
There are samples of Geralt on the CyberVoice website, and they're impressively eerie.
"All voices are parodied by AI," reads the website. "The demo is for demonstration purposes only and is not intended to offend anyone."
It's actually impossible to tell which lines in the trailer have been made with the help of CyberVoice and which lines are clever audio editing. Not knowing is, of course, the point.
Geralt's longtime voice actor, Doug Cockle, did not respond to a request for comment.
It's not the first time we've seen this kind of tech used, either. An ambitious Skyrim fan project, Wyrmstooth, used Google's text-to-speech AI Tacotron 2 to simulate the voices of Skyrim voice actors who, understandably, were not involved in the project. It was featured in a trailer.
The designer behind this Witcher 3 mod, nikich340, told Waypoint that until recently they were not familiar with how CyberVoice worked, but that the resulting lines came out "pretty well." nikich340 wasn't sure how many lines of dialogue were in the mod, which takes about an hour to finish, but speculated there were roughly 45 lines of dialogue made by the algorithm.
"I am not sure how they did it and what AI or algorithm [was] used," said nikich340, who said a Mind Simulation representative appeared on a community Discord and brought up the CyberVoice tech, which led to him investigating whether it would be useful for upping the quality of the mod.
The problem for creators telling new stories in high-end productions like The Witcher 3 is that fans have certain expectations, like hearing from Geralt. Another option, nikich340 pointed out, was to find a voice actor who could emulate Cockle's lines and provide their services for free, since mods are largely made by fans without extra financial resources. That's what nikich340 did for one of the mod's other characters, the vampire Orianna.
Automation and algorithms are quickly (and often problematically) integrating into different parts of the world. The same is true of game development, and like anything else, it comes with a lot of unanswered ethical questions. There are areas where it makes immediate sense, like the way a number of big budget video games with massive worlds, including Baldur's Gate 3, have automated ways of poking and prodding their games for glitches.
"CyberVoice does not seek to replace professional voice actors," said Mind Simulation CEO Derikyants in an email. "AI can’t play a role. The first stage of voice acting is only an actor."
The website for CyberVoice says it "shares the royalties with the voice authors," and seeks to be more than just a piece of software that pulls off a cool trick or exploits voice actors.
In theory, the way the system works is an actor registers with CyberVoice and, according to Derikyants, "receives royalties depending on the volume of future synthesized speech." The company envisions a world where instead of fans building their own AI-driven Geralt voices, developers are uploading voices for fans to play with, encouraging modders to use them.
In theory, anyway.
"I was both impressed, and...queasy?" said voice actress Sarah Elmaleh, upon watching the Witcher 3 mod trailer. "Tech reaching a persuasive level of sophistication is a singularity point which forces deeper concerns about how (or whether) people understand our craft into the open."
Elmaleh has had prominent roles in a variety of games, recently Anthem and Gears 5.
She mentioned some "alarm" among performers aware of this relatively new tech, and in particular, wondered what it would mean for how actors are paid. What if a game developer needs a new line, the actor is unavailable, and they decide to give this fancy new AI a try? And what are the creative implications when the tech eventually becomes "good enough"?
SAG-AFTRA, the union that represents many voice actors, is already aware of similar kinds of tech, coming out publicly in support of legislation related to exploitative use of deepfakes.
"There will always be stuff that lingers around the edges and grey areas that are harder to deal with," said SAG-AFTRA COO and general counsel Duncan Crabtree-Ireland to Motherboard in 2019, "but if we could make a dent in the rather extraordinary volume of stuff that's going on that will be welcomed and beneficial to the people who are the targets of it. It's not going to be absolute or perfect, but it's better than the Wild West situation we're in at the moment."
"As an actor, a performance consultant, and a voice director, I'm constantly advocating for the spontaneous alchemy that happens inside an actor, in a session," said Elmaleh. "When I'm in a flow state as an actor, even I have no idea what I'm about to sound like, I may not even know how I'm about to feel—and that's more or less how you know it's really working."
There's a real difference between the implications for fan projects and professional endeavors, too. Before Elmaleh was hired to actually be in a BioWare game, she voiced several characters for a Dragon Age: Origins mod. This happens pretty frequently.
"I think the more developers think of performance as a collaboration," she said, "rather than just a rote process by which you extract an asset, the better for everyone—actors, devs and players alike."
The impact of AI on voice actors is no longer a question of when. Instead, it's how much.