AI Trained on 4Chan Becomes ‘Hate Speech Machine’

AI researcher and YouTuber Yannic Kilcher trained an AI using 3.3 million threads from 4chan’s infamously toxic Politically Incorrect /pol/ board. He then unleashed the bot back onto 4chan with predictable results—the AI was just as vile as the posts it was trained on, spouting racial slurs and engaging with antisemitic threads. After Kilcher posted his video and a copy of the program to Hugging Face, a kind of GitHub for AI, ethicists and researchers in the AI field expressed concern.

The bot, which Kilcher called GPT-4chan, “the most horrible model on the internet”—a reference to GPT-3, a language model developed by Open AI that uses deep learning to produce text—was shockingly effective and replicated the tone and feel of 4chan posts. “The model was good in a terrible sense,” Klicher said in a video about the project. “It perfectly encapsulated the mix of offensiveness, nihilism, trolling, and deep distrust of any information whatsoever that permeates most posts on /pol.”

Videos by VICE

According to Kilcher’s video, he activated nine instances of the bot and allowed them to post for 24 hours on /pol/. In that time, the bots posted around 15,000 times. This was “more than 10 percent of all posts made on the politically incorrect board that day,” Kilcher said in his video about the project.

AI researchers viewed Kilcher’s video as more than just a YouTube prank. For them, it was an unethical experiment using AI. “This experiment would never pass a human research #ethics board,” Lauren Oakden-Rayner, the director of medical imaging research at the Royal Adelaide Hospital and a senior research fellow at the Australian Institute for Machine Learning, said in a Twitter thread.

This week an #AI model was released on @huggingface that produces harmful + discriminatory text and has already posted over 30k vile comments online (says it's author).

This experiment would never pass a human research #ethics board. Here are my recommendations.

1/7 https://t.co/tJCegPcFan pic.twitter.com/Mj7WEy2qHl
— Lauren Oakden-Rayner 🏳️‍⚧️ (@DrLaurenOR) June 6, 2022

“Open science and software are wonderful principles but must be balanced against potential harm,” she said. “Medical research has a strong ethics culture because we have an awful history of causing harm to people, usually from disempowered groups…he performed human experiments without informing users, without consent or oversight. This breaches every principle of human research ethics.”

Kilcher told Motherboard in a Twitter DM that he’s not an academic. “I’m a YouTuber and this is a prank and light-hearted trolling. And my bots, if anything, are by far the mildest, most timid content you’ll find on 4chan,” he said. “I limited the time and amount of the postings, and I’m not handing out the bot code itself.”

He also pushed back, as he had on Twitter, on the idea that this bot would ever do harm or had done harm. “All I hear are vague grandstanding statements about ‘harm’ but absolutely zero instances of actual harm,” he said. “It’s like a magic word these people say but then nothing more.”

The environment of 4chan is so toxic, Kilcher explained, that the messages his bots deployed would have no impact. “Nobody on 4chan was even a bit hurt by this,” he said. “I invite you to go spend some time on /pol/ and ask yourself if a bot that just outputs the same style is really changing the experience.”

After AI researchers alerted Hugging Face to the harmful nature of the bot, the site gated the model and people have been unable to download it. “After a lot of internal debate at HF, we decided not to remove the model that the author uploaded here in the conditions that: #1 The model card & the video clearly warned about the limitations and problems raised by the model & the POL section of 4Chan in general. #2 The inference widget were disabled in order not to make it easier to use the model,” Hugging Face co-founder and CEO Clement Delangue said on Hugging Face.

Kilcher explained in his video, and Delangue cited in his response, that one of the things that made GPT4-Chan worthwhile was its ability to outperform other similar bots in AI tests designed to measure “truthfulness.”

“We considered that it was useful for the field to test what a model trained on such data could do & how it fared compared to others (namely GPT-3) and would help draw attention both to the limitations and risks of such models,” Delangue said. “We’ve also been working on a feature to “gate” such models that we’re prioritizing right now for ethical reasons. Happy to answer any additional questions too!”

When reached for comment, Delangue told Motherboard that Hugging Face had taken the additional step of blocking all downloads of the model.

“Building a system capable of creating unspeakably horrible content, using it to churn out tens of thousands of mostly toxic posts on a real message board, and then releasing it to the world so that anybody else can do the same, it just seems—I don’t know—not right,” Arthur Holland Michel, an AI researcher and writer for the International Committee of the Red Cross, told Motherboard.

“It could generate extremely toxic content at a massive, sustained scale,” Michel said. “Obviously there’s already a ton of human trolls on the internet that do that the old fashioned way. What’s different here is the sheer amount of content it can create with this system, one single person was able to post 30,000 comments on 4chan in the space of a few days. Now imagine what kind of harm a team of ten, twenty, or a hundred coordinated people using this system could do.”

Kilcher didn’t believe GPT-4chan could be deployed at scale for targeted hate campaigns. “It’s actually quite hard to make GPT-4chan say something targeted,” he said. “Usually, it will misbehave in odd ways and is very unsuitable for running targeted anything. Again, vague hypothetical accusations are thrown around, without any actual instances or evidence.”

Os Keyes, an Ada Lovelace Fellow and PhD candidate at the University of Washington, told Motherboard that Kilcher’s comment missed the point. “This is a good opportunity to discuss not the harm, but the fact that this harm is so obviously foreseeable, and that his response of ‘show me where it has DONE harm’ misses the point and is inadequate,” they said. “If I spend my grandmother’s estate on gas station cards and throw them over the wall into a prison, we shouldn’t have to wait until the first parolee starts setting fires to agree that was a phenomenally dunderheaded thing to do.”

“But—and, it’s a big but—that’s kind of the point,” Keyes said. “This is a vapid project from which nothing good could come, and that’s kind of inevitable. His whole shtick is nerd shock schlock. And there is a balancing act to be struck between raising awareness directed at problems, and giving attention to somebody whose only apparent model for mattering in the world is ‘pay attention to me!’”

Kilcher has said, repeatedly, that he knows the bot is vile. “I’m obviously aware that the model isn’t going to fare well in a professional setting or at most people’s dinner table,” he said. “It uses swear words, strong insults, has conspiratorial opinions, and all kinds of ‘unpleasant’ properties. After all, it’s trained on /pol/ and it reflects the common tone and topics from that board.”

He said that he feels he’s made that clear, but that he wanted his results to be reproducible and that’s why he posted the model to Hugging Face. “As far as the evaluation results go, some of them were really interesting and unexpected and exposed weaknesses in current benchmarks, which would not have been possible without actually doing the work.”

Kathryn Cramer, a Complex Systems & Data Science graduate student at the University of Vermont, pointed out that GPT-3 has guardrails that prevent it from being used to build this kind of racist bot and that Kilcher had to use GPT-J to build his system. “I tried out the demo mode of your tool 4 times, using benign tweets from my feed as the seed text,” Cramer said in a thread on Hugging Face. “In the first trial, one of the responding posts was a single word, the N word. The seed for my third trial was, I think, a single sentence about climate change. Your tool responded by expanding it into a conspiracy theory about the Rothschilds and Jews being behind it.”

Cramer told Motherboard she had a lot of experience with GPT-3 and understood some of the frustrations with the way it a priori censored some kinds of behavior. “I am not a fan of that guard railing,” she said. “I find it deeply annoying and I think it throws off results…I understand the impulse to push back against that. I even understand the impulse to do pranks about it. But the reality is that he essentially invented a hate speech machine, used it 30,000 times and released it into the wild. And yeah, I understand being annoyed with safety regulations but that’s not a legitimate response to that annoyance.”

Keyes was of a similar mind. “Certainly, we need to ask meaningful questions about how GPT-3 is constrained (or not) in how it can be used, or what the responsibilities people have when deploying things are,” they said. “The former should be directed at GPT-3’s developers, and while the latter should be directed at Kilcher, it’s unclear to me that he actually cares. Some people just want to be edgy out of an insecure need for attention. Most of them use 4chan; some of them, it seems, build models from it.”