The Internet Archive is a repository of terabytes of data from thousands of CDs and floppy disks, and some of it can be hard to sort through. Files in the Archive could contain anything, from music and text documents to ancient memes and old Flash animations, and until recently, the only way to figure out what data was on these ancient CDs was to download it and pray you had the software to render it into readable material. DiscMaster just changed all that.
DiscMaster is a new website that is sifting through the CDs and floppy disks in the Internet Archive and turning their contents into a searchable database. Even more incredibly, it’s taking all of the old file formats and making them viewable in a browser. As of this writing, the archive represents more than 7,000 CDs and 11 terabytes of data.
“For a specific group of people this will revolutionize their relationship to the archive,” Jason Scott, an archivist at the Archive and spokesperson for DiscMaster, told Motherboard. “This will be an endless font of information. This will be the biggest thing I’ll work on this year.”
Scott said that DiscMaster is a labor of love from an Archive fan who made contact through Discord. They’d been working on DiscMaster for 18 months when they finally put out the call for help. Scott said he was blown away. “The program is pulling apart every archive,” he said. “It is generating easy to use programs that can preview the material easily.”
One of the most difficult parts of looking through old files is the formats. In the early Wild West days of the online world there were no standardized file formats, no set way to render a video, no agreed-upon audio codec, and no single way to render text. Looking through old files requires you to identify these ancient formats and figure out a way to render them in a modern browser.
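To give a sense of what identifying a format involves, here is a minimal Python sketch of one common approach: checking the first few bytes of a file against known "magic number" signatures. This is an illustration only, not a description of how DiscMaster works; the `MAGIC_SIGNATURES` table and the `sniff_format` function are hypothetical names made up for this example, and a real tool would need far more signatures plus fallbacks for formats that have none.

```python
# Hypothetical signature table: leading bytes -> human-readable format name.
# The byte signatures are the well-known ones for each format; the table
# itself and its layout are assumptions made for this sketch.
MAGIC_SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF-", "PDF document"),
    (b"FWS", "Flash animation (uncompressed SWF)"),
    (b"CWS", "Flash animation (compressed SWF)"),
]


def sniff_format(data: bytes) -> str:
    """Guess a file's format from its leading bytes.

    Returns "unknown" when no signature in the table matches.
    """
    for signature, name in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of a file on disk and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

Many old formats have no reliable signature at all, which is part of why sorting decades-old discs is harder than a lookup table like this suggests.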
DiscMaster does all that for you, and it works in both modern and legacy browsers. Scott said that means someone on an old Commodore 64 with a browser can surf to DiscMaster and view old files without any hassle. And anyone using the latest version of Chrome can view the same file without much trouble. “This thing is a beast,” he said. “It’s 11 terabytes of data right now.”
Scott likes DiscMaster for a lot of reasons, but a big one is that it’s a blow against skeptics who’ve said that no one will ever look through all the Archive’s material and that it’s too difficult to access. Now there’s a tool that’s sorting through thousands of CDs, organizing the data, and making it viewable to a wide audience. “For the group that cares, they will care very deeply about it,” he said.
When Scott did a soft launch of DiscMaster on Sunday, the website had 70 views. On Tuesday it crashed with around 40,000 views. As of this writing, it’s been viewed around 84,000 times. We know this because of an old-school view counter that’s sitting on the front page. “I knew this thing was great when they added the counter,” he said.
The program is slowly working its way through every CD and floppy disk in the Archive, expanding its database as it goes. “It takes a while for the program to grind over a CD-ROM,” Scott said. Depending on the size and type of the files on the CDs, the program can take several hours to sort the data and make it viewable online. According to Scott, the plan is to use the program to sort through old AOL and FTP content sitting unsorted in the Archive as well.
Scott noted it’s possible that personal and private information could be buried in the CDs and inadvertently published. “There’s only so much you can precheck with 93 million files and counting,” he said. But he promised that anyone who reaches out to the Archive can have that personal or sensitive information pulled down. It was one of the first features he requested DiscMaster add, he said.
DiscMaster is an incredible tool for archivists, historians, the curious, and people looking for half-remembered media or work they thought was lost to time. Scott said he found some old songs from the 1990s he thought he’d lost buried in the Archive thanks to DiscMaster. “I encourage everyone to do an ego search,” he said. “If you thought your work was lost, you may be shocked to discover what’s been saved.”