A Data Scientist Was Sick of Seeing Spam on His Facebook so He Built a Fake News Detector
It's still a work in progress, but it shows the possibilities for machine learning in the age of fake news.
Image: Screengrab, YouTube
Tired of seeing his friends and family sharing questionable content on his Facebook feed, data scientist Zach Estela decided to take action. He built a tool that scans a website’s 100 most recent posts and analyzes them to determine whether the site is fake news, heavily biased, or a legit news source.
“I see my friends post, sometimes, complete garbage or articles recommended to me that are complete garbage,” Estela told me over the phone.
As fake news purveyors have become more ubiquitous, they’ve also gotten more sophisticated. It’s sometimes hard to tell if a news source is a small local paper, a Russian-backed propaganda forum, or a semi-accurate hype blog that only reports the facts when they align with its agenda. While companies like Google and Facebook have tried to come up with ways to flag shady content, a lot of stuff can still fall through the cracks, especially when we rely on human judgement. Artificial intelligence and machine learning could be the key to helping us separate fact from fiction going forward.
To build his tool, areyoufakenews.com, Estela pulled data from two separate open-source projects that rate websites on a litany of different data points, from how right- or left-leaning the content is to whether it espouses hateful points of view such as homophobia or sexism.
OpenSources.co is a project led by a research team at Merrimack College, and Media Bias Fact Check uses a detailed methodology to categorize each site. Both projects are also open source, allowing the public to play a role in the accuracy, kind of like how Wikipedia maintains a generally accurate record.
After pulling the data from these two sources, Estela trained a neural network to recognize patterns based on all the examples in the database and their respective tags. It looked for similarities between sites labelled “fake news” or “extremely left wing,” and began to learn the subject matter, language, and patterns of speech associated with each tag. Once it had enough examples, the model was able to look at previously unseen sites and make a judgement call on a number of categories. Here is part of the results for Alex Jones’s InfoWars site, for example:
Estela told me his tool’s results shouldn’t be considered the final verdict on any site; he likened it more to asking a trusted friend, one you consider relatively objective, for their opinion. David Carroll, an associate professor and the director of the design and technology program at Parsons, agreed.
“Several sites I tested I found to be represented fairly accurately, but others didn't meet my expectations,” Carroll told me via email. “For example, it didn't classify sputniknews.com as propaganda, but given that the US government required the company to register with the Foreign Agents Registration Act, you might expect the site to be measured accordingly by the analytics.”
Carroll said the project is “very rough,” but added that it has “some promise at establishing a rubric to assess sites based on comparable criteria and signals. The fact that the code is open source means that the analytics can be somewhat scrutinized by critics and researchers.”
Though it’s not perfect, the tool is an attempt to set up a usable metric for recognizing patterns and raising potential red flags for news consumers who don’t want to be duped. Of course, some of the categories will matter more or less to you depending on your worldview, which is why Estela made an effort to display all the results without judgement.
“It’s kind of tricky, right?” Estela said. “At some point you have to translate human subjectivity, such as political bias, into something that can be mathematically modelled by a computer. Then you have to translate it back into the human realm, and all with the intention of not introducing any new bias.”
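To make that round trip a little more concrete, here is a minimal sketch of the general idea: human-assigned labels go in, word counts become the math, and a label comes back out. It is not Estela's code and it is not a neural network; it swaps in a much simpler technique, a naive Bayes word-count classifier, so it can run with nothing but Python's standard library. The headlines, the labels, and every function name below are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up toy headlines with made-up labels, purely for illustration.
# A real tool would learn from many posts per site and many more tags.
TRAINING = [
    ("shocking secret cure doctors hide from you", "fake"),
    ("you won't believe this miracle secret", "fake"),
    ("aliens secretly control the shocking election", "fake"),
    ("city council approves new budget for schools", "credible"),
    ("senate committee schedules hearing on budget", "credible"),
    ("local schools report rise in enrollment", "credible"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count how often each word appears under each human-assigned label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict_proba(text, model):
    """Return a probability for each label (naive Bayes, add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        # Start from how common the label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Turn log scores back into probabilities that sum to 1.
    highest = max(log_scores.values())
    exps = {label: math.exp(s - highest) for label, s in log_scores.items()}
    norm = sum(exps.values())
    return {label: value / norm for label, value in exps.items()}


def classify(text, model):
    """Translate the numbers back into a human-readable label."""
    probs = predict_proba(text, model)
    return max(probs, key=probs.get)


model = train(TRAINING)
print(classify("shocking miracle cure secret", model))
print(predict_proba("council approves schools budget", model))
```

Even in a toy like this, the tricky part Estela describes is visible: the model can only echo whatever judgement went into the labels it was trained on, so any bias in the tags carries straight through to the output.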
Similar machine learning models have been introduced already, including a text-analysis tool that detects fake news, browser extensions that flag sites previously identified as fake, and a private service for advertisers that makes sure their ads appear on legit sites. But this is one of the first examples of an easy-to-use, audience-focused tool that lets readers take back a sense of control. It’s a bit like what fact-checking sites such as Snopes did in the early days of the internet, when email chain scams (remember Bonsai Kittens?) were the trolls’ mode du jour and verifying information was harder than it is now.
Maybe, as more of these resources crop up, researchers share tools and code, and users cross-check across many platforms, we’ll finally start seeing a bit less propaganda shared in our social media feeds. At least we don’t get those email scams any more.