It was only a matter of time before the wave of artificial intelligence-generated voice startups became a plaything of internet trolls. On Monday, ElevenLabs, founded by ex-Google and Palantir staffers, said it had found an “increasing number of voice cloning misuse cases” during its recently launched beta. ElevenLabs didn’t point to any particular instances of abuse, but Motherboard found that 4chan members appear to have used the product to generate voices that sound like Joe Rogan, Ben Shapiro, and Emma Watson spewing racist and other offensive material. ElevenLabs said it is exploring more safeguards around its technology.
The clips uploaded to 4chan on Sunday are focused on celebrities. But given the high quality of the generated voices, and the apparent ease with which people created them, they highlight the looming risk of deepfake audio clips. In much the same way that deepfake video started as a method for people to create non-consensual pornography of specific people before branching into other use cases, the trajectory of deepfake audio is only just beginning.
In one example, a generated voice that sounds like actor Emma Watson reads a section of Mein Kampf. In another, a voice very similar to Ben Shapiro's makes racist remarks about Alexandria Ocasio-Cortez. In a third, someone saying “trans rights are human rights” is strangled. In another, Rick Sanchez from the animated show Rick & Morty says “I’m going to beat my wife Morty. I’m going to beat my fucking wife Morty. I’m going to beat her to death Morty.” (Justin Roiland, who voices Sanchez, recently appeared in court for a pre-trial hearing on charges of felony domestic violence. Roiland pleaded not guilty in 2020.)
Do you know anything else about abuse of AI-generated voices? We'd love to hear from you. Using a non-work phone or computer, you can contact Joseph Cox securely on Signal on +44 20 8133 5190, Wickr on josephcox, or email firstname.lastname@example.org.
The clips run the gamut from harmless to violent, transphobic, homophobic, and racist. One 4chan post that included a wide range of the clips also contained a link to the beta from ElevenLabs, suggesting ElevenLabs’ software may have been used to create the voices.
On its website, ElevenLabs offers both “speech synthesis” and “voice cloning.” For the latter, ElevenLabs says it can generate a clone of someone’s voice from a clean sample recording more than one minute long. Users can quickly sign up to the service and start generating voices. ElevenLabs also offers “professional cloning,” which it says can reproduce any accent. Target use cases include voicing newsletters, books, and videos, the company’s website adds.
ElevenLabs’ website says the company was founded by Piotr Dabkowski, an ex-Google machine learning engineer, and Mati Staniszewski, an ex-Palantir deployment strategist. This month the company announced a $2 million pre-seed round led by Czech venture capital firm Credo.
On Monday, shortly after the clips circulated on 4chan, ElevenLabs wrote on Twitter: “Crazy weekend—thank you to everyone for trying out our Beta platform. While we see our tech being overwhelmingly applied to positive use, we also see an increasing number of voice cloning misuse cases.” ElevenLabs added that while it can trace any generated audio back to a specific user, it was exploring more safeguards. These include requiring payment information or “full ID identification” in order to perform voice cloning, or manually verifying every voice cloning request.
ElevenLabs did not respond to a request for comment.