Automated Journalists Will Bury the Lede

Algorithms for BuzzFeed and the Associated Press are creating data dumps for readers to make sense of.

BuzzFeed just automated its first listicle, and the Associated Press is starting to use software to automatically write certain business articles. Algorithms and humans are now officially working side by side in the newsroom. The shift is also going to usher in big data dumps that force (or allow, depending on how you look at it) the reader to find the story for themselves.

The move to automation has been pretty seamless so far. Journalists have long relied on software to help with their writing (spellcheck and autocorrect have proven particularly useful in this story already), but now we're using software to do the writing for us. The Los Angeles Times infamously used a robot to break news of an earthquake (and has since done the same with many other quakes and also with homicides in the area). BuzzFeed, for its part, has now used one to compile a list.


It's an entirely new type of journalism, according to Jeremy Singer-Vine. Hailing from the Wall Street Journal's investigative reporting team, he's recently been hired as BuzzFeed's data editor. The plan, he says, isn't to replace humans or to create viral lists (like the company might be trying to do in its partnership with Whisper), but to create data dumps that readers can sift through and take whatever they want from. That much is clear in his first experiment: a list of 275 ways Americans hurt themselves playing with fireworks.

It's not your average BuzzFeed list: There's no snark, there are barely any photos, and there's hardly any formatting. Most importantly, there's that number: 275 patients. It's not easily digestible for the reader, and it would be time-consuming for a reporter to go through the official US Consumer Product Safety Commission report to grab a random sampling of 275 incidents. But for a computer, it's a piece of cake.

"I had come across the data this morning, so I wrote an algorithm (the code for it is available here) to see if we could do something fun with it," Singer-Vine told me. "I think it's something people like seeing, primary sources like these—the rawest form of these stories … I wanted to do this first one with as little human intervention as possible."

Basically, it's a data dump.

The difference between what Singer-Vine did for this story and what he's done for countless other ones is just that: its rawness. Data-assisted journalism has been around since, well, forever; it's just that the data has usually taken a backseat to the words reporters use to describe it. Now, instead of an analysis of a database, you're getting the database itself.


Similarly, the AP announced that its robots won't replace reporters (not yet, at least), but will instead be used to churn out tons of content about company earnings reports. The AP says it'll be able to sharply increase the number of stories it writes about businesses, from "300 stories manually" to "up to 4,400 automatically for companies throughout the United States each quarter."

Expect those automated stories to have buried ledes, from time to time.

Likewise, Singer-Vine says that his team, at least, will be focused on using data to tell stories, not on pushing out thousands of computer-generated lists (though they'll be doing a bit of that, too).

"For me, it's less about producing more content and more about producing different types of stories," he said. "I'm looking at the Twitter feed, and seeing people find things that I didn't pick up myself. I'm glad we posted them all. Paring them down [to a smaller number] wouldn't have been the worst thing in the world, but it looks like people are enjoying it even without it being trimmed."

That's for the first one. But do readers always prefer more content instead of stuff that's carefully curated? We'll see. In the Associated Press' case, each automated story will be clearly marked with a line that says it was written by a robot. In BuzzFeed's case, there was no indication that a robot wrote the story. Singer-Vine mentioned the algorithm on Twitter, which is how I ended up seeing it. Are readers going to be cool with not knowing whether their story was written by a human or compiled by a robot?

And that's where we are right now. I don't know if robots will ever be able to write as well as humans, but BuzzFeed and the AP are both looking at using their algorithms to augment reporting, not replace it. If it's up to the readers themselves to find the story buried in a mountain of numbers, then so be it.