

The Twitter Archive at the Library of Congress Won't Actually Be Very Useful

An awesome but mostly untappable data gold mine.

A couple of years ago, everybody lauded the Library of Congress when it announced it would be archiving all of the tweets in the world in one convenient place. And who could argue that storing hundreds of terabytes of citizen-created data isn't a praiseworthy undertaking for a stodgy government agency? Just think of all the good things we could do with that tweet data! Cure diseases, predict weather patterns, understand politics: the possibilities are endless. At least they would be if the Library of Congress could figure out a way to let people actually access the data.


They're having some trouble on that front. The Library of Congress released a white paper this week with an update on its tweet-archiving project. The good news is that they've been very successful at archiving tweets. They've been less successful, however, at giving people access to those tweets. Citing a slew of technical difficulties, the paper says that the "Library has not yet provided researchers access to the archive [despite] approximately 400 inquiries." Not that those researchers would've been able to get anything done. Essentially, the tweets aren't searchable, since a single search "could take 24 hours," according to the Library, which means they're sort of useless. Making them searchable "would require an extensive infrastructure of hundreds if not thousands of servers." Which is expensive.

Don't be too hard on the Library of Congress, now. We're talking about a lot of tweets. There are already 170 billion tweets floating around cyberspace, and another 400 million are posted each day. It's also important to understand that a tweet isn't just 140 characters. There are an additional 50 fields of metadata storing everything from the timestamp to the user name. "It's pretty raw," Deputy Librarian of Congress Robert Dizard Jr. recently told The Washington Post. "You often hear a reference to Twitter as a fire hose, that constant stream of tweets going around the world. What we have here is a large and growing lake. What we need is the technology that allows us to both understand and make useful that lake of information."

This brings us to a complicated conclusion about the future of our tweets. Everybody was excited about the Library of Congress archiving everyone's tweets because it's the government and, presumably, they'd do it in such a way that the public would have free and open access to the tweets. This is still true, except for the last part about access. There's a chance that Twitter itself could be tasked with opening up access, and there are signs that they're heading in that direction. Just last month, Twitter started offering users the ability to download their own tweet archives. Again, that doesn't help the researchers who want to crunch the massive amount of data that all of Twitter holds.

Give it time. The Library of Congress might hire some whiz kid from MIT with a penchant for public service and a knack for writing algorithms. Or Twitter could offer access for a price, as it did for marketers and companies like Google about a year ago. Remember, though: They don't have to do a thing. As much as we like to think that all of our tweet data is ours, it's not. Twitter owns it, and everybody else, including the Library of Congress, is at its mercy.