Microsoft Used Machine Learning to Make a Bot That Comments on News Articles For Some Reason

The algorithm automatically reads and digests news articles, and posts comments alongside humans.
Two robots. Image via Getty

The social internet has a bot problem. Fake accounts plague Twitter and Facebook, and content designed to misinform readers has become an issue that's drawn the attention of Congress.

This difficult and growing problem hasn't stopped a team of researchers from creating an algorithm that can parse news stories, then bicker with real humans in the comments section.

Engineers at Beihang University and Microsoft China developed a bot that reads and comments on online news articles. They call their model “DeepCom,” short for “deep commenter.”

According to the research, the system is made up of two neural networks: a reading network that "comprehends" an article and picks out important points, and a generation network that writes a comment based on those points and the title of the article. The system is based on how humans typically consume news online: We're likely to read the headline, take away a few key points, and write a comment based on the points that are most interesting to us—or support or contest them based on our own personal views.

DeepCom does the same thing, but automatically. And its creators intend for it to be used to encourage human commentary on articles, to drive more people to read them and engage with the content.
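
To make that two-network setup concrete, here is a minimal, purely illustrative sketch in PyTorch of how a "reading network" and a "generation network" could be wired together. DeepCom's source code was never publicly released (see the correction below), so every module, dimension, and name here is an assumption made for illustration, not the researchers' implementation.

```python
# Illustrative sketch only: DeepCom's code was never released, so the module
# names, sizes, and wiring here are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class ReadingNetwork(nn.Module):
    """Encodes the article body and scores each token for salience ("important points")."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.salience = nn.Linear(2 * hidden_dim, 1)  # per-token importance score

    def forward(self, article_ids):
        states, _ = self.encoder(self.embed(article_ids))          # (B, T, 2H)
        scores = torch.sigmoid(self.salience(states)).squeeze(-1)  # (B, T)
        return states, scores

class GenerationNetwork(nn.Module):
    """Decodes a comment conditioned on the title and the salience-weighted article states."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.title_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRUCell(embed_dim + 2 * hidden_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, title_ids, article_states, salience, comment_ids):
        # Summarize the article as a salience-weighted average of its token states.
        weights = salience / salience.sum(dim=1, keepdim=True)
        article_vec = (weights.unsqueeze(-1) * article_states).sum(dim=1)  # (B, 2H)
        _, title_h = self.title_encoder(self.embed(title_ids))
        title_vec = title_h.squeeze(0)                                     # (B, H)

        h = torch.zeros(comment_ids.size(0), title_vec.size(1))
        logits = []
        for t in range(comment_ids.size(1)):  # teacher forcing over the reference comment
            step_in = torch.cat([self.embed(comment_ids[:, t]), article_vec, title_vec], dim=-1)
            h = self.decoder(step_in, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                                  # (B, T_c, vocab)

# Toy usage with random token ids standing in for a tokenized article, title, and comment.
vocab = 1000
reader, generator = ReadingNetwork(vocab), GenerationNetwork(vocab)
article = torch.randint(0, vocab, (2, 50))
title = torch.randint(0, vocab, (2, 8))
comment = torch.randint(0, vocab, (2, 12))
states, scores = reader(article)
logits = generator(title, states, scores, comment)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

The real system is certainly more elaborate than this sketch; the point is only the division of labor the researchers describe: one network reads the article and scores what matters, the other writes a comment conditioned on those points and the title.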

There are two versions of the DeepCom paper online. One was published to the arXiv pre-print server on September 26, and another on October 1. The earlier version of the paper cites a potential use for DeepCom and comment sections in general: "…encouraging users to browse the comment stream, to share new information, and to debate with one another. With the prevalence of online news articles with comments, it is of great interest to build an automatic news commenting system with data-driven approaches."

Essentially, the paper is suggesting that a system that automatically generates fake engagement and debate on an article could be beneficial because it could dupe real humans into engaging with the article as well.

This statement is left out of the newer version. In its place, the researchers write that they realize there may be risks involved with an AI that pretends to be a human and comments on news stories.

The example they give in the paper is very benign: After reading a news article about FIFA rankings, DeepCom generates two comments. One says, "If it’s heavily based on the 2018 WC, hence England leaping up the rankings, how are Brazil at 3?" The other: "England above Spain, Portugal and Germany. Interesting."

Over the last few years, fake accounts and botnets—interconnected systems of bots working together—have become an epidemic for social media platforms like Twitter and Facebook. On Twitter, fake accounts with stock-image profile photos ran rampant, following each other in the hundreds of thousands and tweeting political propaganda. In May, Facebook removed two billion fake accounts, thousands of which were pushing political views and disinformation.

A spokesperson for Microsoft's research team told Motherboard that the team addresses the risks in the paper. But in the latest version of that paper, from October 1, the most the researchers acknowledge is that the risks exist; they don't say what they plan to do to avoid the potential harms caused by fake commenters.

"We are aware that numerous uses of these techniques can pose ethical issues and that best practices will be necessary for guiding applications…. There is a risk that people and organizations could use these techniques at scale to feign comments coming from people for purposes of political manipulation or persuasion," the researchers write in the updated version of the paper. This statement isn't in the earlier version. "In our intended target use, we explicitly disclose the generated comments on news as being formulated automatically by an entertaining and engaging chatbot."

Basically, the researchers say that the system they propose should clearly label automatically generated comments as automatically generated. But that wouldn't stop someone from reimplementing the approach in their own bot commenter with less transparency.

A commenter-bot like DeepCom also wouldn't be immune to the biases that affect all AI systems. DeepCom is trained on two datasets: a Chinese dataset built by crawling Tencent News, a popular Chinese news and opinion site, and an English dataset built by crawling news articles and comments from Yahoo News. Both mix opinion and editorialized pieces in with straight news reporting, and since all of those articles are (currently) written by humans, they're inevitably biased in some way. The paper doesn't address the potential for bias in the data.

"While there are risks with this kind of AI research, we believe that developing and demonstrating such techniques is important for understanding valuable and potentially troubling applications of the technology," the researchers write in the latest version of the paper.

Whitney Phillips, an assistant professor of Communications, Culture & Digital Technologies at Syracuse University who studies internet trolls, argued that, as with other technologies such as deepfakes, the risks to vulnerable communities are often an afterthought. And sometimes, experiments can be biased toward certain results depending on which questions the researchers seek to answer.

"[H]ow we ask and answer questions (what even occurs to us as a question worth asking) has everything to do with where we’re standing in the world,” Phillips told Motherboard. “Some risks just don’t occur to some people, because they’ve never had to think about that or worry about those things impacting them."

Correction Oct. 7, 2019 at 12:50 p.m.: This article originally said that the source code for DeepCom was shared on Github. DeepCom had a project page on Github, but the source code was never shared external to Microsoft. Motherboard regrets the error.