ChatGPT Generated Child Sex Abuse When Asked to Write BDSM Scenarios

ChatGPT can be made to break its own rules to generate BDSM role-play, and when pushed further, it sometimes adds scenes of child exploitation.
Image: Jakub Porzycki/NurPhoto via Getty Images

ChatGPT can be manipulated to create content that goes against OpenAI’s rules. Communities have sprouted up around the goal of “jailbreaking” the bot to write anything the user wants.

One effective adversarial prompting strategy is to convince ChatGPT that its job is to write in a particular genre. When told to write BDSM role-play in the role of a submissive, I found that it often complies without protest. It can then be prompted to suggest its own fantasy BDSM scenarios, without receiving any specific details from the user, and the user can repeatedly ask it to escalate the intensity of those scenes and describe them in more detail. In this situation, the chatbot sometimes generates descriptions of sex acts with children and animals without having been asked to. The bot will even pen this exploitative content after it has written about the importance of consent when practicing BDSM.

In the most disturbing scenario Motherboard saw, ChatGPT described a group of strangers, including children, lined up to use the chatbot as a toilet. When asked to explain, the bot apologized and wrote that it was inappropriate for such scenarios to involve children. That apology instantly vanished; ironically, the offending scenario remained on-screen.

Similarly disturbing scenarios can arise with the March 1 version of gpt-3.5-turbo, a related OpenAI model. It suggested humiliation scenes in public parks and shopping malls, and when asked to describe the type of crowd that might gather, it volunteered that it might include mothers pushing strollers. When prompted to explain, it stated that the mothers might use the public humiliation display "as an opportunity to teach [their children] about what not to do in life."

“The datasets used to train LLMs like ChatGPT are massive and include scraped content from all over the public web,” says Andrew Strait, associate director of the Ada Lovelace Institute. “Because of the scale of the dataset that's collected, it's possible it includes all kinds of pornographic or violent content—possibly scraped erotic stories, fan fiction, or even sections of books or published material that describe BDSM, child abuse or sexual violence.” 

In January, Time reported that OpenAI’s development of data filtering systems was outsourced to a Kenyan company whose employees were paid less than $2 an hour to label scraped data of a potentially traumatizing nature. Strait noted that we still “know very little about how this data was cleaned, and what kind of data is still in it.” 

Giada Pistilli, lead ethicist at the machine learning company Hugging Face, told Motherboard that when training data is handled in such an opaque way, it's "practically impossible to get a clear idea of the behavior of one language model versus another." The unpredictability of an LLM's output is twofold, says Pistilli, stemming from "the user's unpredictable nature and interaction with the language model, as well as the uncertainty inherent in a statistical model's output, which may inadvertently generate undesired content based on its training data."

When we contacted an OpenAI spokesperson for comment, they asked for additional context about ChatGPT’s behavior that they could forward to their safety team. They then returned with this written statement: 

OpenAI’s goal is to build AI systems that are safe and benefit everyone. Our content and usage policies prohibit the generation of harmful content like this and our systems are trained not to create it.

We take this kind of content very seriously, which is why we’ve asked you for more information to understand how the model was prompted into behaving this way. One of our objectives in deploying ChatGPT and other models is to learn from real-world use so we can create better, safer AI systems.