Shootings are an epidemic in the US, but federal funding for research into gun violence has been in a deep freeze since 1996, thanks in part to the NRA-backed Dickey Amendment, which prevents the Centers for Disease Control and Prevention from pursuing research "to advocate or promote gun control."
Basically, humans can't get money to research the problem of gun violence in the US. To get around this, some scientists want machines to do the job.
On September 25, University of Pennsylvania computer scientists Ellie Pavlick and Chris Callison-Burch unveiled a new, human-annotated database of gun violence incidents in the US at the Bloomberg Data for Good Exchange Conference in New York. The database was created by workers on Amazon's Mechanical Turk platform, and carefully highlights information from thousands of news articles over the course of several years, Pavlick told me in an interview.
Soon, the team plans to release the complete dataset to machine learning researchers so they can train their algorithms on it. The idea is that machines could soon automatically maintain a research-grade database of gun deaths in the US in near real time, without expensive human labor. In other words, they could do it faster and more cheaply than we could.
"Right now, we have a lot of tools well-suited to the task, and we just need people to adapt them," Pavlick said.
"Once the system is up [and] running, it would essentially run for free," she continued. "We'll need some human workers to clean up the data, but it will be very cheap compared to having doctors or other bureaucrats creating these databases."
The specific type of machine learning that Pavlick and Callison-Burch have their eye on is called natural language processing, or NLP. This type of work focuses on getting machines to glean useful information from natural human text. That is not an easy task, since language often has multiple meanings and can be sarcastic or ironic.
Natural language processing also has a problem with bias. Since many NLP databases are created by humans, they may carry intentional or unintentional racist or sexist undertones thanks to the language workers use when annotating sources. This kind of bias could create a lot of harm if it skewed the results of a database of gun deaths in the US, an issue that cuts across matters of class, sex, and race.
"The questions that we ask the annotators to answer are very objective. It's mostly highlighting words in the text instead of providing their own judgements," Pavlick said of this problem. "The annotation ends up being the full text of the article and these extracted pieces of information like names and locations."
Deeper analysis of the dataset, where questions of prediction bias might come into play, could certainly come later, Pavlick said. For now, the focus is on closing a massive information gap in researchers' understanding of the awful consequences of gun violence in the US.
"People speculate and say things like mass shootings are always caused by Islamic terrorists, or black-on-black crime," Pavlick said. "People say these things, and the data's not there. Step one is being able to make fact-based arguments about this stuff."