
Archivists Are Saving the History of Internet Piracy

The goal is to create a catalog of more than 5 terabytes of searchable, piracy-related metadata.

An ongoing project to catalog the history of piracy has just topped the 300-gigabyte mark, with the goal of offering a searchable index of more than 5 terabytes of piracy-related metadata once complete. It is now the largest searchable index of piracy metadata in the history of the internet.

The “warez” scene is as old as the internet itself. From the earliest days of bulletin board systems (BBSes) to the rise of BitTorrent, the piracy community has been as vibrant as any on the internet. From the ASCII and other art included in the .nfo files that accompany group releases to piracy group logos and brands, there are decades of residual data documenting the rise and fall of an ocean of different groups and subcultures that might otherwise be lost to the sands of time.

Enter The Eye: a pet project of a man who calls himself the Archivist, whose obsession with cataloging the ever-shifting, impermanent history of the internet has ranged from archiving a petabyte of porn and the entirety of Instagram to preserving 80 gigabytes of old Apple videos deleted by YouTube. The Archivist told Motherboard his efforts are funded entirely by community donations.

“These files contain unique artworks, information about scene groups, the trials and tribulations of those groups be it inter-personal feuds, issues with being raided by law enforcement, law trying to infiltrate the groups, how the groups acquire media, how they crack games and software, how they work on early movie releases to get them looking the best they can for wider release and so on,” he said. “Without archives like this so much history of a huge online world vanishes and that's simply not acceptable,” he added.

As with the YouTube metadata and Instagram archives, the Archivist says the biggest obstacle to tracking and cataloging such content is the sheer volume of files involved. The initial release of this latest piracy dataset included more than 13 million files, and while their total size was just under 400 GB, organizing them remains extremely time-consuming.

“The most recent milestone in this endeavour came only yesterday, when I finally finished the unpacking and compressing of 4,000,000+ SRR files from srrdb.com, a site which now tries their best to thwart scraping,” he said. “This addition comes in at a fair 1.2TB and is still not everything from the site; I've yet to grab files released after February of this year.”

Other internet archival efforts tend to get far more attention, in large part because the press treats piracy as the black sheep of the internet. When piracy is mentioned, analysts and the media usually portray it exclusively as a nefarious, irredeemable phenomenon. But it’s not that simple. Data suggests that piracy is better viewed as an expression of consumer dissatisfaction. Studies indicate that piracy can often act as a form of “invisible competition,” prompting everyone from the cable TV sector to the video game industry to try a little harder, be it by offering better streaming TV services or by backing off obnoxious DRM or monetized DLC.

Regardless of one’s opinion on piracy itself, the surrounding internet subculture’s long history, estimated to have begun around 1975, is well worth preserving. That said, piracy isn’t the only thing the Archivist and the folks at The Eye are working on.

“Recently YouTube is back at their bullshit again removing or forcing the mass removal of content as well as straight up undeniable censorship of opinion, so myself and friends at The-Eye are working on a service to take care of this issue to the best of our underfunded abilities,” he said. The fruit of those efforts should appear next week, when the website is expected to release more than 10 billion YouTube video metadata files and to launch a new service that should let the public lend a hand in organizing vast troves of data.