On Wednesday, an international team of scientists published the first-ever image of a black hole. It looked like a SpaghettiO, yet the image was an incredible scientific achievement: it gave humanity a glimpse of one of the universe’s most destructive forces and confirmed long-held theories, namely that black holes exist.
Storing the raw data for the image was a feat in itself: five petabytes spread across many hard drives, the equivalent of 5,000 years' worth of MP3s. Katie Bouman, a computer scientist and assistant professor at the California Institute of Technology, led the development of the algorithm that imaged the black hole. An image of her posing with some of the data drives went viral as observers praised her success.
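The MP3 comparison depends on a bitrate the article doesn't state. A quick Python check, assuming decimal units (1 PB = 10^15 bytes) and a few common MP3 bitrates, shows where a figure like 5,000 years comes from:

```python
# Rough check of the "5 petabytes = 5,000 years of MP3s" comparison.
# Assumptions (not from the article): decimal units and the bitrates listed below.

PETABYTE = 10**15                      # bytes, decimal (SI) definition
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year length in seconds


def years_of_audio(total_bytes: float, bitrate_kbps: float) -> float:
    """Return how many years of continuous audio fit in total_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for kbps in (128, 256, 320):
        print(f"{kbps} kbit/s: {years_of_audio(5 * PETABYTE, kbps):,.0f} years")
```

At 256 kbit/s, five petabytes works out to roughly 4,950 years of continuous audio, in line with the 5,000-year figure; at 128 kbit/s it is closer to 9,900 years.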
On Reddit’s /r/datahoarder subreddit, a community dedicated to the passion of hoarding vast amounts of data, the drives were bigger news than the scientific achievement itself.
“Black holes are cool I guess, but imagine all the remuxes [lossless Blu-ray files] you could store on those bad boys, in RAID 1 no less,” Redditor “PretentiousJackass” commented on a picture of Bouman displaying the hard drives containing the black hole image data.
“RAID” stands for redundant array of independent disks, a way of combining several physical drives into one storage unit for redundancy, speed, or both. RAID 1 is one specific configuration: mirroring, in which every piece of data is written identically to two or more drives at the same time. If one drive fails, the other still holds a complete copy. (Mirroring is not a backup, though: delete a file and it disappears from every mirror at once.)
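To make mirroring concrete, here is a toy model in Python. The `Drive` and `Raid1` classes are hypothetical, written only for illustration; real RAID 1 is handled by a disk controller or the operating system, not by application code like this.

```python
# Toy model of RAID 1 mirroring: every write goes to all healthy member drives,
# so any single surviving drive still holds a complete copy.
# Drive and Raid1 are hypothetical illustration classes, not a real RAID API.


class Drive:
    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}  # block number -> contents
        self.failed = False


class Raid1:
    def __init__(self, drives: list[Drive]):
        self.drives = drives

    def write(self, block: int, data: bytes) -> None:
        # Mirror the same block onto every drive that is still working.
        for d in self.drives:
            if not d.failed:
                d.blocks[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy mirror holding the block can serve the read.
        for d in self.drives:
            if not d.failed and block in d.blocks:
                return d.blocks[block]
        raise OSError(f"block {block} unavailable on all mirrors")


array = Raid1([Drive("disk0"), Drive("disk1")])
array.write(0, b"telescope data")
array.drives[0].failed = True  # simulate losing one drive
print(array.read(0))           # b'telescope data', served by disk1
```

The model also shows the limit noted above: a write (or a deletion) is applied to every mirror, so mirroring guards against drive failure, not against mistakes.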
Other commenters had questions about how Bouman and her colleagues kept track of the massive amount of data coming in from all over the world.
“As someone who's subbed for Big Data…all I can think is how are those striped [split across multiple drives so they can be read and written in parallel] and checksummed [tagged with a short value computed from the data, used to detect corruption]? Are those individual RAID6 [striping plus two sets of parity data, so the array survives two drive failures] pods? Is it then JBOD [just a bunch of disks] for each pod? And most importantly, where can I get a storage rack that holds four of these across?” Redditor “postmodest” wanted to know.
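The two ideas in that question, striping and checksumming, can be sketched together in a few lines of Python. This is a hypothetical illustration, not how the black hole team's drives were configured: data is cut into fixed-size chunks, dealt round-robin across drives, and each chunk is stored with a CRC-32 checksum (from Python's standard `zlib` module) that is verified on read. Real RAID 6 additionally stores two parity blocks per stripe so failed drives can be rebuilt; that part is omitted here.

```python
import zlib

# Hypothetical sketch: round-robin striping with a CRC-32 checksum per chunk.
# Parity (what makes RAID 6 able to rebuild lost drives) is deliberately omitted.

CHUNK = 4  # bytes per chunk; tiny so the demo is easy to follow (assumption)

Stripe = list[list[tuple[bytes, int]]]  # per drive: list of (chunk, checksum)


def stripe(data: bytes, n_drives: int) -> Stripe:
    drives: Stripe = [[] for _ in range(n_drives)]
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        # Chunk number j = i // CHUNK lands on drive j % n_drives.
        drives[(i // CHUNK) % n_drives].append((chunk, zlib.crc32(chunk)))
    return drives


def unstripe(drives: Stripe) -> bytes:
    out = bytearray()
    n = len(drives)
    total = sum(len(d) for d in drives)
    for k in range(total):
        chunk, crc = drives[k % n][k // n]  # chunk k sits at position k // n
        if zlib.crc32(chunk) != crc:
            raise ValueError(f"checksum mismatch in chunk {k}")
        out += chunk
    return bytes(out)


if __name__ == "__main__":
    data = b"black hole image data"
    drives = stripe(data, 3)
    assert unstripe(drives) == data   # round-trips intact
    chunk, crc = drives[1][0]
    drives[1][0] = (b"XXXX", crc)     # simulate silent corruption on drive 1
    try:
        unstripe(drives)
    except ValueError as err:
        print(err)                    # checksum mismatch in chunk 1
```

The checksum only detects the bad chunk; recovering it would need a mirror or parity, which is why the commenter asks how the striping and RAID levels are combined.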
The massive amounts of data were essential to creating the image of the black hole. Bouman and other scientists coordinated radio telescopes all over the Earth, all pointed at the black hole and recording at the same time, with each site's data time-stamped by atomic clocks so the recordings could later be lined up. The scientists then pieced this information together and used an algorithm to fill in the blanks and generate a likely image of the black hole.
The five petabytes of data were far too large to send over the internet in any practical amount of time. Instead, the hard drives were flown to processing centers in Germany and Boston, where the data was assembled.
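A rough calculation shows why flying the drives won out. The link speeds below are assumptions chosen for illustration, not figures from the article, and the sketch assumes the link runs flat out with no protocol overhead.

```python
# Back-of-the-envelope: how long would 5 PB take to send over a network link?
# Link speeds are illustrative assumptions; overhead and outages are ignored.

PETABYTE = 10**15  # bytes, decimal (SI) definition


def transfer_days(total_bytes: float, link_gbps: float) -> float:
    """Days needed to move total_bytes over a link of link_gbps gigabits/s."""
    seconds = total_bytes * 8 / (link_gbps * 1e9)
    return seconds / 86400


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        print(f"{gbps:>3} Gbit/s: {transfer_days(5 * PETABYTE, gbps):,.0f} days")
```

Even a sustained 10 Gbit/s link would need about 46 days of uninterrupted transfer, and a 1 Gbit/s link well over a year, while a plane can carry the drives in a day or two.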
“Getting this first picture will come down to an international team of scientists, an Earth-sized telescope, and an algorithm that puts together the final picture,” Bouman said in a TED talk she gave before the image was captured, in which she explained the process.
Others were less articulate and more, ahem, passionate about the sheer amount of data storage on display. “This post should be tagged NSFW,” /r/datahoarder poster “voyagerfan5761” said.