21 Terabytes of Open Source Code Is Now Stored in an Arctic Vault

July 17, 2020, 2:08pm

Those worried about preserving the heritage of open source coding for future generations can rest assured: a deposit of GitHub’s public repositories has made it safely to the Arctic World Archive.

According to a GitHub blog post, the code was successfully deposited on July 8, 2020, in the GitHub Arctic Code Vault, a data repository housed in the Arctic World Archive, which is a data preservation facility in Svalbard, Norway. The effort is part of the GitHub Archive Program and is GitHub’s second deposit made through the program.


“Our mission is to preserve open source software for future generations by storing your code in an archive built to last a thousand years,” GitHub wrote.

According to the blog post, a snapshot of all the active public repositories on GitHub was taken on February 2, 2020, resulting in 21 TB of repository data. In GitHub, a repository is used to organize a project and contains all the folders and files needed for the project to run.

This data was then written onto 186 reels of piqlFilm—a digital photosensitive archival film. According to a spokesperson for Piql, the makers of the film, the technology is a completely self-contained medium and any files stored on it will be recoverable in the future regardless of available technologies.

“All information required to recover stored information is written on the film itself in human readable text, along with file specifications and source code for the retrieval software,” the spokesperson told Motherboard in an email.

This is good news for advocates of open source coding who want to preserve a snapshot of all of GitHub’s public repositories. GitHub says on its website that it has over 50 million users and more than 100 million repositories.

“As today’s vital code becomes yesterday’s historical curiosity, it may be abandoned, forgotten, or lost,” GitHub wrote on its website. “Archiving software across multiple organizations and forms of storage will help ensure its long-term preservation.”

Every reel in the archive will include a guide in five languages. Information documenting the technical history and cultural context of the archive will be included as well.

GitHub’s partners in the GitHub Archive Program include the Internet Archive, Software Heritage, and Project Silica.

According to its website, the Arctic World Archive was established in 2017 and holds a collection of digital artifacts and information from over 15 contributing countries. The archive is located in the permafrost of an Arctic mountain in the Svalbard archipelago and is designed to withstand natural and man-made disasters.

Other artifacts stored in the archive include manuscripts from the Vatican Library and masterpieces from the National Museum of Norway.