In March, Twitter announced a partnership with IBM’s Watson to combat one of its most vexing and longest-simmering problems: harassment. Twitter has been the scene of some of the highest-profile and most egregious recent acts of online harassment, from Gamergate to Leslie Jones to the so-called troll storms of the alt-right.
Ending this problem, or at least mitigating it, has been a stated priority for CEO Jack Dorsey. But Twitter's deal with Watson, a computer best known for beating humans on "Jeopardy!", reflects a Silicon Valley ideal that is actually making harassment even tougher to solve: the belief that technology itself can root out the doxing, rape and death threats, and slurs that plague open platforms.
Silicon Valley’s belief in this is understandable. Tech, after all, is the answer to hard problems, and even harder problems deserve better tech. Moreover, tech scales from a business perspective, whereas hiring lots of people doesn’t.
But hate speech in particular is a tough problem for algorithms — much tougher than screening for obscene images. That’s because harassment isn’t necessarily the words themselves, but the intent behind them, and that is very hard for current technology to identify.
“We need human intervention to label new data as hate speech because algorithms are not good at surmising intent,” said Angelina Fabbro, senior technical lead at the smart to-do list company Begin. “Maybe they will be one day, but they aren’t yet, and we need solutions today and not tomorrow.”
Watson's natural language processing is very literal, working mostly at the sentence level, and its emotion analysis covers five categories: anger, disgust, fear, joy, and sadness. Because that analysis reads words at face value, it's going to be easier for Watson to understand the intent behind "Pop music for $800, Alex" than "I think men are much smarter than women, it's science."
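A toy sketch makes the limitation concrete. The lexicon and sentences below are hypothetical, not Watson's actual model: a literal, word-matching emotion scorer registers overt anger but scores a dog-whistle sentence as emotionally empty, even though a human reads the intent immediately.

```python
# Toy illustration only -- a crude lexicon-based scorer in the spirit of
# literal sentence-level emotion analysis. The word lists are invented
# for this sketch; real systems are far more sophisticated, but the gap
# between words and intent remains.
EMOTION_LEXICON = {
    "anger":   {"hate", "furious", "rage"},
    "disgust": {"gross", "disgusting", "vile"},
    "fear":    {"scared", "terrified", "threat"},
    "joy":     {"love", "great", "happy"},
    "sadness": {"sad", "miserable", "crying"},
}

def score_emotions(sentence: str) -> dict:
    """Count lexicon hits per emotion in a lowercased, de-punctuated sentence."""
    words = set(sentence.lower().replace(",", "").split())
    return {emotion: len(words & vocab) for emotion, vocab in EMOTION_LEXICON.items()}

# Overt hostility registers ("hate", "furious" both hit the anger list):
print(score_emotions("I hate this, it makes me furious"))
# A dog-whistle sentence registers nothing at all:
print(score_emotions("I think men are much smarter than women, it's science"))
```

The second sentence scores zero on every emotion, which is exactly the problem: the harm lives in the intent, not in any individual word a scorer can flag.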
This isn’t to say Watson can’t help. But this is where you need people, and specifically researchers, to guide machine-learning systems in order to get to the intent of the words. Humans are good at understanding nuance, reading a situation, and identifying double meanings. Some algorithmic intervention can help mitigate harassment and alleviate pain for human moderators, but it needs to be balanced with human intervention.
A big challenge with understanding harassment is understanding irony — a major tenet of the internet is trolling culture, which involves humor. “Internet slang changes quickly, so a word may not have a violent or threatening meaning, but over time, it may develop one,” Fabbro said.
Consider the string of words the Anti-Defamation League classifies as the most popular white supremacist slogan in the world: "We must secure the existence of our people and a future for white children." What makes this sentence hateful isn't that the word "secure" precedes "future" and "white children." What makes it hateful is who says it and how they use it.
Google’s Jigsaw group has spent the past year or two researching online harassment and harassing speech in commenting sections of the New York Times, the Guardian, and the Economist, as well as in interactions with Wikipedia editors. The result was the creation of the Perspective API, which created new ratings for understanding “toxic” interactions by looking beyond the regular sentiment or tone and studying slang and casual hate speech.
Google specifically designed Perspective to be used with moderators. “The reason we suggest that people don’t use this for fully automated moderation is we don’t think the quality is good enough,” said Lucas Dixon, lead engineer on Perspective API. “It’s really easy to have biases in models, and it’s hard to tease out.”
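In practice, "used with moderators" means the score routes a comment rather than deciding its fate. The sketch below shows that division of labor; the request shape follows Perspective's documented API as I understand it (verify against Google's current docs), and the 0.8 threshold is an arbitrary assumption for illustration, not a Google recommendation.

```python
# Sketch: Perspective as a triage signal in a human moderation workflow.
# Endpoint and field names reflect Perspective's public AnalyzeComment
# API as documented; the threshold and routing labels are assumptions.
import json

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(comment_text: str) -> str:
    """Build the JSON body for a TOXICITY analysis request."""
    return json.dumps({
        "comment": {"text": comment_text},
        "requestedAttributes": {"TOXICITY": {}},
    })

def route_comment(toxicity_score: float, review_threshold: float = 0.8) -> str:
    """Send high-scoring comments to a human reviewer instead of
    auto-removing them -- the balance Dixon describes, since model
    biases make fully automated moderation unreliable."""
    return "human_review" if toxicity_score >= review_threshold else "publish"
```

Nothing is deleted by the machine; the score only decides which queue a comment lands in, which is roughly how the New York Times folds it into its moderators' workflow.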
Perspective is currently part of the moderators' workflow at the New York Times; so far, it has helped the paper open comment sections on more articles.
Harassment isn’t just “how toxic something sounds” or how “angry” a user is. Harassment can be innocuous; it can be inherited racism or internalized misogyny. It’s “wow, you’re so articulate,” or “Did your boyfriend code that for you?” in addition to “I hope you get raped.”
Facebook recently announced it is hiring 3,000 human moderators for Facebook Live, on top of the 4,500 already working for the company, to mitigate the suicides, murders, and other violent acts turning up in users' feeds.
But even that may not be enough. The Facebook Files, a Guardian series based on a leaked copy of Facebook’s guidelines for moderators, shows human workers crushed under the weight of rapid-fire decision-making over revenge porn, beheadings, hate speech, Holocaust denialism, and other things that violate Facebook’s standards. But at least they’re trying, and if CEO Mark Zuckerberg can understand the power of people when it comes to keeping digital platforms civil, anyone can.
Caroline Sinders is a user researcher and artist who works in machine learning, memes, and online harassment.