Data Hoarders Are Plotting an Archive of Tumblr Porn

It’s not the first time Reddit’s data hoarding community has saved content that is about to be purged from the internet, but this archive carries some risk.

Dec 8 2018, 1:00am

Image: Shutterstock; Composition: Motherboard

On Monday, the blogging platform Tumblr announced it would be removing all adult content after child pornography was discovered on some blogs hosted on the site. Given that an estimated one-quarter of blogs on the platform hosted at least some not safe for work (NSFW) content, this is a major content purge. Although there are ways to export NSFW content from a Tumblr page, Tumblr’s purge will inevitably result in the loss of a lot of adult content.

Unless, of course, Reddit’s data hoarding community has anything to say about it.

On Wednesday afternoon, the redditor u/itdnhr posted a list of 67,000 NSFW Tumblrs to the r/Datasets subreddit. Shortly thereafter, they posted an updated list of 43,000 NSFW Tumblrs, excluding those that were no longer working, to the r/DataHoarder subreddit, a group of self-described digital librarians dedicated to preserving data of all types.

This is not an exhaustive list of Tumblrs that host adult content, but rather consists of all the NSFW Tumblrs that have been posted to Reddit over the past seven years. Using these web addresses for NSFW Tumblrs, the data hoarders can in principle scrape the content from each of the blogs to preserve on their own hard drives.

Tackling a preservation project of this scale is a well-worn path for the data hoarding community. Sometimes, as with the data hoarder who tried to build an unofficial archive of all of Instagram, these preservation feats are simply about pushing the limits of what's possible. In other cases, data hoarders work to preserve content that is at risk of being purged from the internet, such as Alex Jones' YouTube channel.

The Tumblr preservation effort, however, poses some unique challenges. The biggest concern, based on the conversations on the subreddit, is that a mass download of these Tumblrs is likely to include some child porn. That would put whoever stores the archive at serious legal risk.

Still, some data hoarders are congregating on Internet Relay Chat (IRC) channels to strategize about how to pull and store the content on these Tumblrs. At this point, it’s unclear how much data that would represent, but one data hoarder estimated it to be as much as 600 terabytes.

Trying to preserve the blogosphere's favorite nude repository is a noble effort, but it doesn't change the fact that Tumblr's ban on adult content will deal a serious blow to sex workers around the world. Indeed, the entire debacle is just another example of how giant tech companies like Apple, which pulled Tumblr's app from its App Store shortly before the ban was announced, continue to homogenize the internet and act as the ultimate arbiters of what can and cannot be posted online.

This article originally appeared on Motherboard.