Time takes its toll on our bodies—your metabolism is slowing, your face is slowly lining and sagging, even Kobe Bryant’s knees are giving out. But the great devourer and healer also takes its toll on humanity’s collective body of research. Once published, our research data fades precipitously.
Looking at a random set of 516 studies published between 1991 and 2011, researchers from the University of British Columbia found that the data was reliably available for two years after publication. After that, the odds of a data set still being available dropped by 17 percent with each passing year.
Some kinds of time-based erosion make sense. The loss of 70 percent of America’s silent films can be attributed to the fragile nature of the film stock and to apathy from the studios that made them. They just didn’t know that what they had was valuable.
But that doesn’t seem like an adequate reason for scientific data to disappear. The people who collected and published it are alive, and they certainly knew how hard it was to get and how valuable it was. They filled out the damn grant applications, after all. And data sets are easy to store; you don’t even need to put them in a filing cabinet anymore! Just stick ’em on a Zip disk and they’ll be there waiting for as long as we have Zip disks.
And therein lies the problem.
The main obstacles to data retrieval were obsolete storage devices and email addresses going rotten on the vine: the odds of finding a working email address for the first, last, or corresponding author dropped by 7 percent a year.
The study, published in Current Biology, concluded that the storage of research data is too important to leave to researchers. Other studies have found that researchers can be reluctant to share their data, so the UBC team suggests that “policies are needed” to require that data be deposited in public archives.
Of course, the question remains how to store data in those public archives. Motherboard’s Meghan Neal reported that magnetic tape is still more reliable than hard disks, which have a surprisingly short life even when they’re running in cloud servers.
Tim Vines, one of the paper’s authors, told The Telegraph: “Losing data is a waste of research funds and it limits how we can do science. Concerted action is needed to ensure it is saved for future research.”
Might I suggest imprinting data sets on tungsten, then?