Google and Wikipedia Are Making an AI-Powered Internet Beef Detector
By looking at how a conversation begins on Wikipedia, neural networks are learning to predict how they will end.
Wikipedia is the world’s largest open repository of general information, but it is also a battleground. The internet encyclopedia is maintained by volunteer editors dubbed Wikipedians who write and edit articles, and there is often disagreement about what information to include in an article. Occasionally, this leads to “edit wars” in which Wikipedians go back and forth editing and re-editing a disputed piece of information (e.g., was Chopin Polish, French, Polish-French, or French-Polish?). This extremely prime internet beef can go on for years.
Wikipedians’ preferred way to resolve disputes is to take the conflict to an article’s “talk page,” where they can civilly discuss a disagreement with other editors. Now, researchers from Google, Cornell University and the Wikimedia Foundation are training AI on Wikipedia talk page discussions so that it can predict whether a conversation is about to get heated. In other words—at long last, science is developing an automated beef detector.
As detailed in a paper posted to the arXiv preprint server last month, the researchers found that it is possible for a machine learning algorithm to tease out signals in the very first exchange of a conversation that indicate whether that conversation is going to go awry later on. It’s kind of like Minority Report, but instead of predicting crimes before they happen, this AI can predict nerd fights. The goal, according to the researchers, is to develop tools that can help shepherd at-risk conversations towards civility to cut down on online harassment, hate speech, and other anti-social behavior.
So far, most attempts at combating hate speech online have relied on analyzing speech after it’s been posted. When deployed, these tools have been relatively successful at removing anti-social screeds from the internet, but often the damage is already done. According to the researchers, this is the first attempt to predict this kind of anti-social speech before it happens.
To see why this is such an ambitious goal, consider two real conversations between Wikipedia editors mentioned in the paper:
Conversation 1:

Editor A: Why there’s no mention of it here? Namely, an altercation with a foreign intelligence group? True, by the standards of sources some require it wouldn’t even come close, not to mention having some really weak points, but it doesn’t mean that it doesn’t exist.

Editor B: So what you’re saying is we should put a bad source in the article because it exists?

Conversation 2:

Editor A: Is the St. Petersburg Times considered a reliable source by Wikipedia? It seems that the bulk of this article is coming from that one article, which speculates about missile launches and UFOs. I’m going to go through and try and find corroborating sources and maybe do a rewrite of the article. I don’t think this article should rely on one so-so source.

Editor B: I would assume that it’s as reliable as any other mainstream news source.
If you had to guess which of these conversations would eventually devolve into one of the Wikipedians calling the other a “total dick,” which would you pick? For humans, it’s easy to see that Conversation 1 involves a more aggressive exchange than Conversation 2. Even if we can’t always articulate which linguistic cues made one conversation more aggressive than another, our socialization means most of us have a pretty good intuition about these sorts of things. Machines, on the other hand, do not. Instead, they must develop rules of thumb that help them determine what makes a conversation civil or not.
To do this, the researchers trained their machine learning algorithm on a total of 1,270 talk page conversation pairs (one that was “awry-turning” and one that was “on-track”) taken from 582 different talk pages. Next, the researchers formally defined politeness strategies for the algorithm. These included things like expressing gratitude, greetings, and the use of “please,” as well as “negative” politeness strategies that attempt to limit one person’s imposition on another by being indirect or expressing uncertainty.
As the algorithm analyzed the sentiment of editor conversations, it found that linguistic cues like the directness of one person in a conversation are significant markers that a conversation will eventually turn awry, especially when they include a direct question or the use of “you.”
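To get a feel for how features like these can be extracted from raw text, here is a minimal, hypothetical sketch in Python. The cue names and regex patterns below are illustrative stand-ins loosely inspired by the strategies described in the article, not the researchers’ actual feature set, which is considerably more sophisticated:

```python
import re

# Illustrative cue patterns (assumptions, not the paper's actual features).
POSITIVE_CUES = {
    "gratitude": r"\b(thanks|thank you)\b",
    "greeting": r"\b(hi|hello|hey)\b",
    "please": r"\bplease\b",
}
NEGATIVE_POLITENESS_CUES = {
    # Indirectness/hedging that limits imposition on the other person.
    "hedge": r"\b(maybe|perhaps|i think|it seems)\b",
}
RISK_CUES = {
    # Cues the study linked to conversations turning awry.
    "direct_question": r"^(why|what|who|how)\b",
    "second_person": r"\byou\b",
}

def extract_cues(comment: str) -> dict:
    """Count how often each cue type appears in a comment."""
    text = comment.lower()
    all_cues = {**POSITIVE_CUES, **NEGATIVE_POLITENESS_CUES, **RISK_CUES}
    return {name: len(re.findall(pattern, text))
            for name, pattern in all_cues.items()}

# The opening line of Conversation 1 trips the "direct question" cue.
print(extract_cues("Why there's no mention of it here?"))
```

In a real pipeline, counts like these would become one feature vector per opening exchange and be fed to a classifier; this sketch only shows the cue-counting step.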
“This effect coheres with our intuition that directness signals some latent hostility from the conversation’s initiator,” the researchers wrote.
Altogether, when the algorithm was trained to guess which conversations would turn toxic based on an initial exchange, it was accurate about 65 percent of the time. When humans were asked to do the same thing, they guessed which conversations would turn toxic about 72 percent of the time, which “confirms that humans have some intuition about whether a conversation might be heading in a bad direction,” the authors wrote. Interestingly, the algorithm was able to correctly predict which conversations were going to go awry 80 percent of the time when it was exposed to the same examples that were given to the human test subjects.
The research is an interesting approach to the problem of anti-social behavior online, but don’t expect to see it deployed on Wikipedia, Reddit, or any other internet forums anytime soon. The data was limited to a group of people who have an incentive to collaborate (Wikipedians), rather than relative strangers who don’t have this incentive. More work needs to be done on different populations on the internet before this sort of technology can be released into the wild. In the future, the researchers wrote, they would like to explore what sorts of strategies could be deployed to bring uncivil conversations back on track.
The days of the troll may be numbered, but for now, they still walk among us.