Motherboard Made a Tool That Archives Websites on Demand

May 1, 2018, 11:19am

Archiving services, such as the Wayback Machine, may be a staple of online journalism, but they sometimes have a problem. While, say, Archive.is might preserve one particular webpage, perhaps the Wayback Machine can’t, depending on what sort of restrictions the website developer has put in place. For example, someone stopped copies of MSNBC host Joy Reid’s blog, which hosted a stream of homophobic comments, from displaying in the Wayback Machine.

With that in mind, I made a tool that can push a single webpage or URL to multiple archiving sites at once, and fire back the newly minted digital copies in response. Hopefully it will help reporters and researchers more efficiently figure out which service will work best for that particular site.

Videos by VICE

Called mass_archive, the tool is made in Python and is used on the command line. Just download the script, open a terminal where the script is stored, and type “python mass_archive.py example.com”, with example.com being the page or site you want to target.

Got a tip? You can contact this reporter securely on Signal on +44 20 8133 5190, OTR chat on jfcox@jabber.ccc.de, or email joseph.cox@vice.com.

If the script isn’t successful at getting an archive up on the Wayback Machine, it’ll just move onto Archive.is, and then Perma.cc, another archiving service. For the last one you’ll need to create a free account, which gives you 10 archives a month. Perma.cc also records which account archived a particular page. If that sounds like a pain, just take the Perma.cc parts out of the script in a text editor. (The bit that handles requests to Archive.is uses a module from Past Pages, an open-source effort to archive the news; thanks to Past Pages for that).

I’ve already tried the tool on sites that have tried to avoid being archived, such as malware companies, and mass_archive did the job. If you think mass_archive might be useful for you, feel free to use it.

*Each of these three services have their own rules about what websites can be archived and for what purposes; you must comply with their terms of services as well as the terms of any website that you’re archiving*

The Github page for the tool is here, with some installation instructions and other requirements.

Tagged:
Motherboard, python, Tech, Wayback Machine

More
From VICE

Screenshot: YouTube: IShowSpeed, Epic Games

IShowSpeed Fortnite Skin Reportedly Leaked, Including Release Date and Cosmetic Items

3 hours ago

By Brent Koepp
Screenshot: Warner Bros. Games

Lego Batman: Legacy Of The Dark Knight – How To Unlock Early Access

4 hours ago

By Denny Connolly
Tim Roney/Getty Images

How The Pogues Responded to Censorship of Their Hit Song ‘Fairytale of New York’: ‘Times Change’

4 hours ago

By Lauren Boisvert
Screenshot: Ubisoft

Everything Worth Buying in the PlayStation Holiday Sale

5 hours ago

By Denny Connolly

VICE
Editions

Motherboard Made a Tool That Archives Websites on Demand

Videos by VICE

D’Angelo Explains How the Music Industry Caused Delay in ‘Black Messiah’ Release

Travis Scott’s Cactus Jack Foundation Teamed up With NASA for a New Youth Program

Cyberpunk 2077 Sequel Leak Claims Release Date Is Years Away

More Than 20 Skeletons Found Under the Tower of London, Including Plague Victims

Best THC Drinks of 2025: Weed-Infused Seltzers, Tonics, and Cocktails That Hit Just Right

Kylie Minogue Has the No. 1 Holiday Song in the U.K., While Mariah Carey Still Reigns in the U.S.

More
From VICE

IShowSpeed Fortnite Skin Reportedly Leaked, Including Release Date and Cosmetic Items

Lego Batman: Legacy Of The Dark Knight – How To Unlock Early Access

How The Pogues Responded to Censorship of Their Hit Song ‘Fairytale of New York’: ‘Times Change’

Everything Worth Buying in the PlayStation Holiday Sale

Videos by VICE

MoreFrom VICE

Add your account details

More
From VICE