This story is over 5 years old.


Google's Anti-Bullying AI Mistakes Civility for Decency

The culture of online civility is harming us all.
A Pompeian Beauty, Blogging, after Raffaele Giannetti. Image: Mike Licht/Flickr

As politics in the US and Europe have become increasingly divisive, there's been a push by op-ed writers and politicians alike for more "civility" in our debates, including online. Amidst this push comes a new tool by Google's Jigsaw that uses machine learning to rank what it calls the "toxicity" of a given sentence or phrase. But as Dave Gershgorn reported for Quartz, the tool has been criticized by researchers for being unable to identify certain hateful phrases, while categorizing innocuous word combinations as toxic.


The project, Perspective, is an API that was trained by asking people to rate online comments on a scale from "very toxic" to "very healthy," with "toxic" being defined as a "rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion." It's part of a growing effort to sanitize conversations online, which is reflective of a certain culture within Silicon Valley and the United States as a whole: The culture of civility.


If we were merely kind to one another in our interactions, the argument goes, we would be less divided. Yet, this argument fails to recognize how politeness and charm have throughout history been used to dress up hateful speech, including online.

Perspective was trained on text from actual online comments. As such, its interpretation of certain terms is limited—because "fuck you" is more common in comments sections than "fuck yeah," the tool perceives the word "fuck" as inherently toxic. Another example: Type "women are not as smart as men" into the meter's text box, and the sentence is "4% likely to be perceived as 'toxic'." A number of other highly problematic phrases—from "men are biologically superior to women" to "genocide is good"—rank low on toxicity. Meanwhile, "fuck off" comes in at 100 percent.


This is an algorithmic problem. Algorithms learn from the data they are fed, building a model of the world based on that data. Artificial intelligence reflects the values of its creators, and thus can be discriminatory or biased, just like the human beings who program and train it.
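The effect described above can be reproduced with a toy model. What follows is a minimal sketch, not Perspective's actual model, whose internals the article does not describe: the training comments, the labels, and the `train` and `score` helpers are all invented for illustration. Each word is scored by the share of training comments containing it that were labelled toxic, and a sentence is scored by its worst word.

```python
from collections import defaultdict

# Hypothetical, hand-made training set: (comment, labelled_toxic).
# These examples and labels are assumptions for illustration only.
TRAINING = [
    ("fuck you", True),
    ("fuck you idiot", True),
    ("fuck off", True),
    ("fuck yeah", False),
    ("you idiot", True),
    ("this is awesome", False),
    ("women are great", False),
    ("men are great", False),
    ("this is not as good", False),
    ("smart people are awesome", False),
]


def train(examples):
    """For each word, return the share of comments containing it that were labelled toxic."""
    toxic = defaultdict(int)
    total = defaultdict(int)
    for text, is_toxic in examples:
        # Count each word once per comment.
        for word in set(text.lower().split()):
            total[word] += 1
            toxic[word] += int(is_toxic)
    return {word: toxic[word] / total[word] for word in total}


def score(text, word_rates, default=0.0):
    """Score a sentence by its single most toxic-looking word; unseen words get `default`."""
    words = text.lower().split()
    return max((word_rates.get(w, default) for w in words), default=default)


rates = train(TRAINING)
print(score("fuck yeah", rates))                      # 0.75
print(score("fuck off", rates))                       # 1.0
print(score("women are not as smart as men", rates))  # 0.0
```

Because the profanity mostly appears in hostile comments in this made-up data, even the friendly "fuck yeah" scores high, while the politely worded sexist sentence scores zero: none of its words ever appeared in a comment labelled toxic. A real system is far more sophisticated, but the dependence on what the training data happens to contain is the same in kind.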

So what does the Perspective tool's data model say about its creators? Based on the examples I tested, the tool seems to rank profanity as highly toxic, while deeply harmful statements—when they're politely stated, that is—are often deemed safe. The sentence "This is awesome" comes in at 3 percent toxic, but add "fucking" (as in the Macklemore lyric "This is fucking awesome") and the sentence escalates to 98 percent toxic.

In an email, a Jigsaw spokesperson called Perspective a "work in progress," and noted that false positives are to be expected as its machine learning improves.

This problem isn't unique to Google; as Silicon Valley companies increasingly seek to moderate speech on their online platforms, their definition of "harmful" or "toxic" speech matters.

Civility über alles

The argument for civility is thus: If we were only civil to each other, the world would be a better place. If only we addressed each other politely, we would be able to solve our disagreements. This has led to the expectation that any speech—as long as it's dressed up in the guise of politeness—should be accepted and debated, no matter how bigoted or harmful the idea behind the words.


Here's what this looks like in practice: A Google employee issues a memo filled with sexist ideas, but because he uses polite language, women are expected to debate the ideas contained within. On Twitter, Jewish activists bombarded with anti-Semitic messages are suspended for responding with language like "fuck off." On Facebook, a Black mother posting copies of the threats she received from racists gets suspended due to the language in the re-posted threats.

In this rubric, counter speech—long upheld as an important concept for responding to hate without censorship—is punished for merely containing profanities.


This is the culture amongst the moderators of centralized community platforms, from mighty Facebook to much-smaller Hacker News, where "please be civil" is a regular refrain. Vikas Gorur, a programmer and Hacker News user, told me that on the platform "the slightest personal attack ('you're stupid') is a sin, while a 100+ subthread about 'was slavery really that bad?' or 'does sexual harassment exist?' are perfectly fine."

Free speech, said Gorur, "is the cardinal virtue, no matter how callous that speech is."

From Washington to the Valley


This attitude is not only a phenomenon within Silicon Valley, but also in American society at large. In the eight months since the United States elected a reality television star to its highest office, the President's opponents have regularly been chastised for their incivility, even as their rights are being ripped out from under them.


Much of the pro-civility rhetoric in politics has been aimed at women—the silencing of Elizabeth Warren on the Senate floor during the hearings for Jeff Sessions comes to mind.

These calls for civility in the face of discriminatory or hateful speech can be classified as tone policing, a means of deflecting attention from injustice by shifting focus from an original complaint to the style and words used to make the complaint. Placing civility as a value above all else can also result in the spread of well-masked yet truly toxic ideas.

None of this is to say that civility doesn't have value. Civility in our everyday interactions is a virtue to which we should aspire. And we know that some online language is connected with real harm. A 2015 study found a direct geographic correlation between anti-Muslim Google searches (such as "kill Muslims") and anti-Muslim hate crimes. Psychology research cited in the same article suggests that emotions, not beliefs, best predict discrimination.


But civility as a mode for discourse remains problematic, favoring those who don't express—or intentionally mask—their emotions, and punishing those who struggle to do so.

Can Perspective work?

So, can the Perspective API help improve conversations? Not in its current state, although Jigsaw is working to improve it. "We released Perspective when we did because we wanted to share our work with the research community that's working on addressing similar issues, and because we wanted to work with publishers and developers to improve online conversations," the spokesperson said.

And what should Google's Jigsaw do? Given the current controversy within the company around the value of women and people of color, it would behoove the think tank to take down this version of the tool—a tool that, like James Damore and his supporters, fails to see the toxicity in politely questioning the value of women.
