Remember all those hours you spent in the library while in school, poring over scholarly articles and journals for a paper that was due the next day? Didn’t you suspect that many of them were full of nonsensical technical jargon? Well, here’s a funny story…
Science and tech publishers Springer and the Institute of Electrical and Electronics Engineers (IEEE) removed over 120 research papers from their databases earlier this week, after discovering that the articles were randomly, and entirely, computer-generated.
French computer scientist Cyril Labbé spent two years compiling the list of articles, all of which had successfully found their way into over 30 published international conference proceedings.
Labbé developed a formula to detect papers composed using SCIgen, a free and downloadable piece of software written in 2005 by three MIT graduate students, which automatically generates full research articles at the click of a button — complete with a title, abstract, body, graphs, citations, and bibliography.
Papers generated by SCIgen are designed to appear legitimate at first glance, but are 100-percent fake. According to the description on the program’s site, the aim “is to maximize amusement, rather than coherence.”
The science journal Nature reported that Springer featured at least 16 SCIgen-produced papers in its publications, and IEEE published over 100 such papers.
Titles on Labbé’s list of fakes include “Application of Amphibious Technology in ReutoMail,” “Analysis of Impact of Highly-Available Archetypes on Robotics,” and “A New Method For the Visualization of Byzantine Fault Tolerance.”
The titles may appear genuine to the untrained eye, but SCIgen co-creator Jeremy Stribling told VICE News that the fake papers would stick out like a sore thumb to anyone familiar with computer science.
“It should be obvious to anyone” in the computer science community “that the papers are nonsense after reading only a few paragraphs,” he said.
So, how the hell did these fake papers not only get published in credible databases, but also make it through the submission process for international computer science conferences?
Labbé told VICE News that he suspects that the field of computer science is engulfed in a “spamming war” due to the extreme pressure on scientists to create and publish conference-worthy papers. He doesn’t know how or why the fake papers were created and submitted, or if the alleged “authors” (many of whom are actual academics) were even aware that their names were used. He attempted to contact authors and conference organizers about the fraudulent papers without success.
Stribling isn’t surprised by Labbé’s revelation, and noted that it’s not uncommon for some conferences to accept completely un-reviewed submissions. He explained that SCIgen’s original purpose was to test how high (or low) the bar was for submissions to the WMSCI Conference. It’s possible that this batch of scientific balderdash was submitted by scientists or graduate students to similarly highlight the shortcomings of conference organizers.
After the publication of Nature’s report, Maxwell Krohn, one of Stribling’s collaborators on SCIgen, explained the rationale behind the program on the online forum Hacker News under the username “maxtaco.”
“At the time, there was an arms race within the systems community to see who could flip a bigger bird to the organizers of the SCI spamference,” he wrote. “We’re pumped that SCIgen is still a useful weapon against charlatans the world over (this now includes you, IEEE and Springer).”
VICE News contacted Springer spokesperson Alexander Brown as the company was pulling the suspect articles and beginning an investigation into how they were published.
“We’re using manpower and detection programs in order to sift through all of our content,” Brown said. “We do publish 2,200 journals and 8,400 books annually, so it’s going to take some time.”
IEEE communications director Monika Stickel told VICE News that IEEE “took immediate action to remove those papers, and also refined our processes to prevent papers not meeting our standards from being published in the future.”
Stickel was vague when asked exactly what refinements IEEE had made to its processes.
“We continue to follow strict governance guidelines for evaluating IEEE conferences and publications,” she said, “including serving a leadership role in sharing best practices and implementing new procedures that ensure that the highest quality possible content is produced for our attendees, members, and volunteers.”
Stribling said the incident confirms what many have suspected for years.
“The exorbitant fees that these organizations charge for subscriptions from universities in order to access these ‘peer-reviewed’ journals and proceedings are a complete rip-off,” he said.
Stribling thinks that universities should stop supporting these organizations, and that researchers should exercise great caution when submitting their work to prospective conferences and journals.
SCIgen isn’t the only problem. There are at least two clone versions of SCIgen — one for physics and one for math — and Labbé noted that there are likely other random text generators being developed.
Is this the beginning of a campaign of misinformation that will debase our collection of scientific research, or merely the tomfoolery of guys who spend a little too much time in front of computer screens?
VICE News isn’t sure. We’re still trying to figure out what the Byzantine Empire has to do with computer science.