The Data Hoarders
GIF by the author. Video: PushyPixels




In an era of mega-breaches, digital packrats are amassing, swapping, and sourcing leaked passwords and other personal information like any other collectible.

More people than you'll ever know have your online passwords. The underground trade in stolen data isn't just for hackers looking for a payday, but for passionate collectors too—individuals who build up billion-strong archives of website credentials, voting details, physical addresses, and many more pieces of personal information on people all over the world.

These reams of data, which are often pooled together from hacked sites such as LinkedIn and MySpace, could be used to break into accounts or to give spammers a helpful list of contacts. But certain netizens just amass, swap, and source them like any other collectible.


They're data hoarders.

"I gotta be pushing upward 1 billion lines," avid data collector David (not his real name) told me in an instant message. David started collecting a couple of years ago, when he was introduced to a group of like-minded hoarders who gave him 50 files to get going.

"50 turned into 100, 100 turned into 200, that turned into 500, and still growing," said David, who considers data an asset. "Once you start getting money, you want to make more. Once you start getting files and data, you want more."

These collectors may haunt the same forums as people who hack websites to steal information, but they have their own communities for sharing and obtaining data too. Some give parts of their archive away for free, building up trust with others.

"Being the first to show generosity can go a long way in establishing those relationships," David said. When a site gets hacked, friends often send him a new dump.

Others might trade the personal info of tens of millions of people like baseball cards, swapping their dump of a social media site for one from a sought-after forum.

"My dumps are kinda theirs; theirs are kinda mine," another collector who shares his resources told me.

If you have the right connections, are kind, and don't rip people off, it's possible to build up an impressive portfolio.


We are in the era of so-called mega breaches. The ensuing data dumps are so staggeringly large that it's easy to lose sight of the fact that each of those hundreds of millions of records may represent a person, and may grant access to some of their most private communications or digital spaces.


"What is considered as a 'large data breach' has changed drastically as a result of all of these huge sites that have been leaked this year," Keen, the pseudonymous creator of breach monitoring site and an experienced data collector, told me.

Over the past few weeks, previously unknown, and truly massive, data breaches have come to light. MySpace with 360 million full records. LinkedIn had 177 million hashed passwords. A strange cache of 33 million Twitter users' login credentials surfaced. Hackers sold 100 million plaintext passwords for (Russia's Facebook), and a dump of 127 million credentials, many of them belonging to users of dating site Badoo, too. 65 million hashed passwords from Tumblr popped up as well.

In other words, a dizzying amount of user details have been swapped and sold recently.

"I collect whatever I don't think is trash"

Of course, these latest dumps only build on the decades-old trade in stolen data, which has affected countless websites and services. Online stores, porn sites, media outlets: every type of site you can imagine has been hacked. And these only include the breaches that we know about.

Keen has over 1,000 databases in all, he said. Many, such as gaming forums or niche sites, are relatively small, totaling hundreds of thousands of records, and some stretch back nearly ten years. Keen said he knows a handful of people with extremely large collections, probably consisting of around five or six thousand individual databases. Many people with that much data likely hack sites themselves too, he added.


A third data collector told me he has around 5 terabytes of data, including a wealth of criminal forum dumps, as well as larger, richer breaches, such as that of extramarital affairs site Ashley Madison. (For context, the database of 100 million records clocks in at 17 GB, just 0.3 percent of his total collection.)

"I especially love dumps of controversial sites, companies, etc. Like HT [Hacking Team], underground forums, anti-piracy groups," he told me in an online message. Some of these dumps don't necessarily contain any user details, but thousands of emails which provide insight into a company's workings, the collector pointed out.


Indeed, it's curiosity that might motivate data hoarders to seek out new breaches. They may not use the data to break into accounts, as hackers recently did with a smattering of celebrities' Twitter accounts, or sell it off for a profit. But digging through their vast archives can have its own rewards.

Some collectors are interested in the popularity of certain passwords on mainstream sites, how many email addresses belonging to a certain country are in a database, or just seeing firsthand the effect of sloppy internet security. Others use the troves as a personal look-up service, able to return information on someone they might know.

"If I want to find who somebody is, I will use it. It's like my personal XKeyscore," David said, referring to an NSA tool that acts as a Google for mass surveillance programs. For this reason, David prefers much more general datasets, those that might include details on the greater population, such as the recently reported, albeit long-traded dataset of child gaming site Neopets, rather than some obscure hacking forum.


"I collect whatever I don't think is trash," David said. He might not stockpile records from the popular Chinese forum Tianya, for example. When he is collecting, David is generally "thinking about broadening my archive of personally identifiable information," he added.

Although collecting may not constitute hacking, the data can be used for more aggressive purposes.

When Keen started picking up databases, he and a group of other collectors would use the information to intimidate those who ripped others off.

"We would show the scammers that we had their personal info, including their names and that we would release it unless they gave their victims their stuff back," he said. That group disbanded and Keen moved onto trying to inform people about breaches instead. Nowadays, he said he very rarely actually looks into the data itself.

And yet he keeps it anyway, on the off chance he might need it in the future and because it took him so long to accumulate it all. Throwing his data archive away would be like getting rid of a year's work, he said.

The third hoarder I spoke to, the one with 5 terabytes of data, said much the same thing. "Will I ever stop collecting? Maybe one day—more likely than deleting my current ones."

"I'm a packrat," he added.