

This Algorithm Could Show When the Next Genocide Is About to Happen

Called Umati, or “crowd” in Swahili, the program monitors dangerous speech on Twitter and Facebook.
Skulls at Nyamata Memorial Site, Rwanda. Photo: Fanny Schertzer/Wikimedia Commons

Take note, Nostradamus: There could be another prognosticator of potential violence on the market, except this one has the potential to be more accurate. And it speaks in 1s and 0s.

Researchers at the iHub Data Lab in Kenya are building an algorithm that has the potential to show early warning signs of violence across the world. Called Umati, or "crowd" in Swahili, the program monitors dangerous speech on Twitter and Facebook. Experts say that inflammatory speech can foreshadow ethnic violence, and even genocide. The algorithm is expected to be released for public use at the beginning of 2016, and the team will first use the algorithm in South Sudan, home to a brutal civil war that has displaced more than 2.2 million people.


"I am a believer that AI will someday be better able to track hate speech than humans, although I don't know how long it will take," Sidney Ochieng, a project coordinator of Umati, told me. "Much of speech relies around context, which is hard to code, but humans also have biases when we monitor hate speech."

The Umati algorithm scavenges the internet in search of a "bag of words," or key phrases that a human inputs. The list reads like a rap sheet of hate speech: words targeting tribe, nationality, gender, and sexual orientation. Once it has collected this bottom-feeder language from the web, the algorithm factors in how influential the speaker is and how hateful the speech is, then gives each post a ranking.

Basically, the user inputs a set of phrases, the algorithm searches for them, and then ranks the results by how dangerous the speech is. Think of it as pressing Ctrl+F on Twitter and Facebook, then ranking each post based on the number of hits it gets. A human then goes through and double-checks.
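The process described above can be sketched in a few lines of code. This is a hypothetical illustration, not Umati's actual implementation: the phrase list, the hatefulness weights, and the follower-count proxy for influence are all assumptions made for the example.

```python
# Hypothetical sketch of the "bag of words" search-and-rank step.
# BAG_OF_WORDS maps a human-curated phrase to an illustrative
# hatefulness weight between 0 and 1 (not Umati's real list).
BAG_OF_WORDS = {
    "example slur": 0.9,
    "example insult": 0.5,
}

def score_post(text, follower_count):
    """Score one post: match phrases, then weight by speaker influence."""
    text_lower = text.lower()
    hits = [w for phrase, w in BAG_OF_WORDS.items() if phrase in text_lower]
    if not hits:
        return 0.0
    hatefulness = max(hits)  # severity of the worst phrase matched
    # Crude influence proxy: saturate at 10,000 followers (an assumption).
    influence = min(follower_count / 10_000, 1.0)
    return hatefulness * (0.5 + 0.5 * influence)

def rank_posts(posts):
    """posts: list of (text, follower_count) pairs.
    Returns flagged posts, most dangerous first, for a human to double-check."""
    scored = [(score_post(text, followers), text) for text, followers in posts]
    return sorted((s, t) for s, t in scored if s > 0)[::-1]
```

A moderator would then review the top of the returned list, which mirrors the human double-checking step the article describes.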

"Dangerous speech has preceded episodes of terrible intergroup violence, over and over in many different contexts"

Because Umati tracks online sentiment and dangerous speech, the accuracy of the algorithm's ability to detect ethnic violence depends on the connection between online speech and offline action in a society.

"Dangerous speech has preceded episodes of terrible intergroup violence, over and over in many different contexts," said Susan Benesch, a faculty associate of the Berkman Center for Internet and Society at Harvard University.


In Rwanda, the word "Inyenzi," or cockroach, traditionally described Tutsi militia members who carried out cross-border raids on the Rwandan army. Like the bug, the Tutsi militia were hard to eliminate. But after its repeated use on radio stations in the run-up to the 1994 genocide, the meaning of "Inyenzi" changed to encompass all Tutsi, militia or not. A human would have easily picked up that "Inyenzi" became a keyword for dangerous speech, but an algorithm might have had more trouble.

Much of speech depends on context, and programming an algorithm to decipher new coded phrases like "Inyenzi" isn't easy. The hope is that the Umati algorithm will eventually learn on its own and analyze large networks to identify dangerous speech before humans can. But for now, Umati will rely on humans to select the "bag of words" when the team unveils the algorithm in South Sudan.

Social media in South Sudan has exacerbated divides between ethnic groups, according to the United States Institute of Peace, and the radio stations in the country have reportedly urged men to rape women based on perceived ethnic and political loyalties.

There is a risk that the algorithm could be used by some, particularly authoritarian governments, to vilify members of the political opposition. Because hate speech laws in some countries are arbitrary, they can serve as a pretext for arresting opposition members. And there is still no proof that dangerous speech directly causes violence.

"We have to be careful not to claim a direct causal connection between speech online and violence," Benesch told me. "Someone might be more likely to commit violence because of something he heard, but we can't say that he did the violence only because of what he heard."

In other words, if authoritarian governments—or anyone—used the algorithm as a surveillance tool, its utility would be limited. In 2013, a previous version of Umati found little connection between online threats and real-world action in the run-up to Kenya's presidential election. Wary of a repeat of the ethnic violence that broke out after the 2007 elections, users on Twitter and Facebook regularly confronted and condemned dangerous speech, one of the best ways to counter its effects.

Yet that phenomenon might be unique to Kenya at that time. Looking ahead to the algorithm's release, Ochieng, the project coordinator, mused about the uses journalists or political campaigns might find for Umati. "It would be interesting to track online sentiment and measure the effect of Donald Trump."