Is Metadata Anonymous? Of Course Not

From the very beginning of the year’s surveillance scandals, the NSA, the Department of Justice, and President Obama himself have argued that the bulk collection of phone call metadata is anonymous, and thus legal. 

“Nobody is listening to your telephone calls,” Obama famously said in June, following the PRISM reveal. “That’s not what this program is about. As was indicated, what the intelligence community is doing is looking at the numbers and durations of calls. They’re not looking at names and they’re not looking at content, but sifting through this so-called meta data, they may identify potential leads with respect to people that might engage in terrorism.”

Earlier this year, NSA Director Gen. Keith Alexander told hackers at the Black Hat conference that the agency “does NOT obtain” subscriber information or names associated with phone numbers. And just this week, President Obama stated unequivocally that the metadata surveillance program has not been abused.

The claim that metadata is anonymous has been instrumental to the NSA’s legal arguments. That claim has since received a smackdown in federal court, and it may end up being examined by the Supreme Court. Regardless, there’s a bit of a logical disconnect in the NSA’s argument: If metadata truly is anonymous, how can it be useful for identifying terrorists? Well, it’s not anonymous at all.

Want to see for yourself? Google your phone number, which can legally be collected as part of bulk record collections authorized by the Patriot Act, and see if you can put a name to it. It took me a couple of format combinations (xxx-xxx-xxxx, xxx.xxx.xxxx, etc.), but I found my number attached to the Whois registration details for a long-ignored personal blog.
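For readers who would rather not retype their number in every format, the search step can be sketched in a few lines of Python. This is only an illustration: the function names `format_variants` and `search_queries` are made up for this example, the number shown is a fictional 555 number, and the code assumes a 10-digit US number. It builds the search strings and leaves the actual searching to you.

```python
def format_variants(number: str) -> list[str]:
    """Return common written forms of a 10-digit US phone number."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading US country code
    if len(digits) != 10:
        raise ValueError("expected a 10-digit US number")
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    return [
        f"{area}-{exchange}-{line}",
        f"{area}.{exchange}.{line}",
        f"({area}) {exchange}-{line}",
        f"{area} {exchange} {line}",
        digits,
    ]


def search_queries(number: str) -> list[str]:
    """Wrap each variant in quotes so a search engine treats it as an exact phrase."""
    return [f'"{variant}"' for variant in format_variants(number)]


# Fictional number used purely for illustration.
for query in search_queries("(212) 555-0100"):
    print(query)
```

Each printed line is one exact-phrase query to paste into a search engine.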

What about my connections? I searched for my girlfriend’s number, which turned up alongside her name in open records from New York City’s taxi lost and found service. I found my mom’s first name and home address thanks to a Craigslist rental listing, deleted months ago, that was auto-reposted on another site and cached by Google.

Scrolling further down my recently-called list, there are the publicly-available numbers for my local hardware store and the car services I use at work and home, both of which give a good idea of where I spend my time in Brooklyn, despite my California area code.

A slide from a presentation earlier this year by NSA Director Gen. Keith Alexander purporting to show how metadata remains anonymous. Photo by Dan Stuckey

But as easy as it is to put a name to a number using legal, public records, that’s just the first step. What happens when you connect those interactions—based on nominally anonymous phone numbers—with the vast amounts of data available on the social web? After all, there’s got to be a reason the NSA collects data on billions of social interactions.

Figuring out how metadata can be used to build a profile is the goal of MetaPhone, a project from Jonathan Mayer and Patrick Mutchler of the Stanford Security Lab. The project is powered by an eponymous Android app that scans through your recent calls, text logs, and Facebook profile and activity data to try to see just how many connections can be made.

I gave the app a spin, and after a thorough disclaimer that collected data will be deleted when the project finishes sometime next year and the option to not be named in any resulting publications, MetaPhone mined through my history. Hey, it’s for science.

“Patrick and I want to bring empirical evidence into the debates surrounding phone metadata surveillance,” Mayer said in an email. “Like many computer scientists, we believe phone metadata is identifiable, interconnected, and intimate. Enough with the slippery talking points—let’s get the facts.”

So how anonymous is metadata really? MetaPhone’s early results show that identifying an anonymous phone number is “trivial.” In a recent blog post, Mayer explains that he and Mutchler pulled 5,000 random phone numbers from their dataset, and cross-queried them with Yelp, Google Places, and Facebook. 

“With little marginal effort and just those three sources—all free and public—we matched 1,356 (27.1%) of the numbers,” Mayer wrote. “Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.”
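The per-source hits Mayer lists add up to 1,680, which is more than the 1,356 numbers matched overall, because a single number can turn up in more than one source. A minimal sketch of that kind of cross-referencing is below. It is not the MetaPhone code: the directories are tiny made-up dictionaries standing in for Yelp, Google Places, and Facebook lookups, and the function name `match_numbers` is hypothetical.

```python
def match_numbers(numbers, sources):
    """Map each matched number to the set of source names that list it."""
    matches = {}
    for number in numbers:
        found = {name for name, directory in sources.items() if number in directory}
        if found:
            matches[number] = found
    return matches


# Toy data: fictional numbers and invented listings, for illustration only.
numbers = ["2125550101", "2125550102", "2125550103", "2125550104", "2125550105"]
sources = {
    "yelp": {"2125550101": "Hardware store"},
    "google_places": {"2125550101": "Hardware store", "2125550102": "Car service"},
    "facebook": {"2125550103": "A. Example"},
}

matches = match_numbers(numbers, sources)
per_source_hits = {
    name: sum(1 for n in numbers if n in directory)
    for name, directory in sources.items()
}
overall_rate = len(matches) / len(numbers)

print(per_source_hits)               # hits counted separately per source
print(sum(per_source_hits.values())) # 4 hits in total...
print(len(matches), overall_rate)    # ...but only 3 distinct numbers matched
```

In the toy data, one number appears in two directories, so four hits identify only three distinct numbers.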

But that’s just the tip of the iceberg. Mayer continues:

What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.

How about if money were no object? We don’t have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.

So with a modicum of effort, the team was able to identify 27.1 percent of phone numbers through a straightforward public search, and 91 percent of a subset using slightly deeper probing. While that blows claims that metadata are anonymous out of the water, the real power lies in building a network of connections from a single starting data point.

The ACLU’s Jay Stanley and Ben Wizner explained the power of metadata in a Reuters op-ed way back in June:

A Massachusetts Institute of Technology study a few years back found that reviewing people’s social networking contacts alone was sufficient to determine their sexual orientation. Consider, metadata from email communications was sufficient to identify the mistress of then-CIA Director David Petraeus and then drive him out of office.

The “who,” “when” and “how frequently” of communications are often more revealing than what is said or written. Calls between a reporter and a government whistleblower, for example, may reveal a relationship that can be incriminating all on its own.

Simply put, using metadata to develop a portrait of a personal network can reveal surprisingly personal information. Currently, the MetaPhone project is looking at how that network detection works. “Our latest results addressed the identifiability of phone numbers,” Mayer said. “We’ve also looked at automated detection of private relationships and the interconnectivity of the call graph.”

Section 215 of the Patriot Act authorizes the NSA to collect phone metadata in bulk, provided that “adequate minimization procedures” are taken to ensure that searches are targeted. However, the NSA has argued that because unknown terrorists use phones, and the only way to identify which phones they use is to look at all of them, the agency can collect as much data as it wants. The secret court that oversees the NSA’s activity has repeatedly called the agency out for lying about its need to collect everything (and of course approved its activities anyway).

Why does that matter? The argument for phone metadata collection is based around its utility in identifying important connections; if known Terrorist A is recorded to regularly call Suspicious Number B, and then Random Guy C gives it a call, then C could be a person of interest, and so on down the line. NSA Director Alexander has previously argued that the agency only focuses on terrorist targets because, essentially, terrorists don’t talk to normal people, hence surveillance is minimized.
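The A-to-B-to-C chain described above is, in graph terms, a breadth-first search outward from a starting number. The sketch below shows the idea on the three-party example, assuming call records are simple pairs of numbers and that a call links both parties regardless of who dialed. The function name `hop_distances` and the labels are illustrative only, not a description of any actual NSA tooling.

```python
from collections import deque


def hop_distances(calls, start, max_hops):
    """Return {number: hops from start} for every number within max_hops."""
    # Build an undirected graph: a call connects both parties.
    graph = {}
    for a, b in calls:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Terrorist A calls Suspicious Number B; Random Guy C also calls B;
# C separately calls D, who is three hops from A.
calls = [("A", "B"), ("C", "B"), ("C", "D")]
print(hop_distances(calls, "A", max_hops=2))
```

With a two-hop limit, C is swept in at distance two while D stays out of range; raising the limit to three pulls in D as well.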

Screenshot from MetaPhone

But early MetaPhone results suggest that networks aren’t neatly divided into bad guy connections and good guy connections. While the app didn’t immediately draw any conclusions from my Facebook data, my call records connected me to 21 percent of MetaPhone users within two “hops,” and 57 percent within three. At the center was T-Mobile’s voicemail number, which connected me to 16 percent of all users.
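To see how much a single shared number can inflate those figures, here is a rough sketch that measures what fraction of users sit within a given number of hops, with and without a hub such as a carrier voicemail line. The data is entirely invented and the function name `reach_fraction` is hypothetical; it illustrates the effect rather than reproducing MetaPhone’s analysis.

```python
from collections import deque


def reach_fraction(calls, start, users, max_hops, ignore=frozenset()):
    """Fraction of other users within max_hops of start, skipping ignored numbers."""
    graph = {}
    for a, b in calls:
        if a in ignore or b in ignore:
            continue  # drop every call that touches an ignored hub
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    others = [u for u in users if u != start]
    return sum(1 for u in others if u in hops) / len(others)


# Invented example: four other users, three of whom share only a voicemail number with "me".
users = ["me", "u1", "u2", "u3", "u4"]
calls = [
    ("me", "voicemail"), ("u1", "voicemail"),
    ("u2", "voicemail"), ("u3", "voicemail"),
    ("me", "friend"), ("friend", "u4"),
]

with_hub = reach_fraction(calls, "me", users, max_hops=2)
without_hub = reach_fraction(calls, "me", users, max_hops=2, ignore={"voicemail"})
print(with_hub, without_hub)
```

In this toy network the voicemail line alone accounts for three of the four two-hop connections, but filtering it out requires first knowing what the number is.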

If the NSA’s argument were correct, I’d be connected to far fewer people in the MetaPhone dataset, because my personal network here in New York would appear fairly isolated from the networks of security researchers in California. Instead, we’re all jumbled up, which makes it difficult to develop the discrete inquiries required by the NSA’s legal mandate to minimize data collection.

So if a terrorist calls a Pizza Hut, is everyone else who ordered a pizza fair game for datamining? It’d be stupid to assume that someone would be connected to a terrorist because they both called a popular pizza place. But remember, the metadata are supposed to be anonymous, so no one is supposed to know that it’s a Pizza Hut in question.  

All an analyst would see, to hear the NSA tell it, would be that a person of interest called a hub that tons of other people call, all of whom would need to be further investigated to look for more connections. Multiply that over the entire network, and it’s easy to see why a former NSA staffer said the agency simply has more data than it can feasibly sift through.

At the heart of the problem is a surveillance program with a legal basis rooted in the false conclusion that metadata is anonymous, and thus not harmful to sift through. Effectiveness aside, such massive data collection can be used to develop information about relationships and connections totally unrelated to terrorist targets, an illegal act that NSA analysts have previously been busted for. With that privacy threat identified, the next step for the MetaPhone project is to see how deep things go.

“Public information can provide lots of biographical detail, but may be limited in revealing relationships, interests, and activities,” Mayer said. “We’re far from finished with our efforts to understand just how much meaning is packed into phone metadata.”

@derektmead