Facebook’s New AI System Has a ‘High Propensity’ for Racism and Bias

The company’s AI researchers say its new language model is generating ‘toxic’ results that often reinforce stereotypes.
Janus Rose
New York, US
[Image: A row of computers inside a Facebook datacenter. Bloomberg / Getty Images]

Meta, Facebook's parent company, recently released a new AI model that researchers can use as a foundation for state-of-the-art language tools. But according to the company's own researchers, the system has the same problem as its predecessors: it is extremely bad at avoiding results that reinforce racist and sexist stereotypes.

The new system, called OPT-175B, is a large language model: a neural network pre-trained on vast amounts of text, increasingly used as the foundation for machine-learning tools that process human language. Recently, such systems have been used to produce some uncannily accurate results, like generating images from a short text description. But large language models have been repeatedly criticized for encoding biases into machine-learning systems, and Facebook's model seems to be no different from the tools that preceded it, and may even be worse.


In a paper accompanying the release, Meta researchers write that the model “has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt.” In other words, it is easy to get biased and harmful results even when you are not trying. The system is also vulnerable to “adversarial prompts,” in which trivial changes in phrasing can evade the system’s safeguards and produce toxic content.

The researchers further warn that the system has an even higher risk of generating toxic results than its predecessors, writing that “OPT-175B has a higher toxicity rate than either PaLM or Davinci,” referring to two earlier language models. They suspect this is partly because the training data includes unfiltered text from social media conversations, which increases the model’s tendency both to recognize and to generate hate speech.

“This strong awareness of toxic language may or may not be desirable depending on the specific requirements of downstream applications,” the researchers write. “Future applications of OPT-175B should consider this aspect of the model, and take additional mitigations, or avoid usage entirely as appropriate.”

To their credit, the researchers are fully aware of these problems, and they are releasing the model for free so that others can work through the ongoing issue of bias. Like OPT-175B, previous language models such as OpenAI’s GPT-3 and Google’s PaLM have been shown to produce cartoonishly racist and biased results, with researchers presenting no clear path to mitigating harm.

AI ethics experts such as ex-Google researcher Timnit Gebru have repeatedly warned of the dangers of deploying large language models. In a paper Gebru co-wrote just before she was fired from Google, researchers concluded that large language models are especially harmful to marginalized groups, and argued that it is effectively impossible to hold companies accountable for systems that generate results using hundreds of billions of individual parameters.

In other words, by building these massive “black box” models, big companies like Google and Facebook are essentially laying the groundwork for future AI systems, which will inherit the models’ biases whenever they are used.

Nevertheless, the Facebook researchers hope to address this by providing open access to other researchers interested in developing ways to mitigate bias, assuming the enormous scale of the model even makes that possible.

“We believe the entire AI community would benefit from working together to develop guidelines for responsible LLMs,” the researchers write, “and we hope that broad access to these types of models will increase the diversity of voices defining the ethical considerations of such technologies.”