There are lots of apps that make it easy to encrypt your phone calls and chats. But your metadata—the data about who you're talking to, when, and more—is more difficult to obscure.
Which is why I was intrigued to come across a project called Vuvuzela, uploaded to the code-sharing platform GitHub last week. Vuvuzela is a prototype chat app, still under development, that encrypts not only the content of messages between two people, but also as much information as possible about those messages and the people who sent and received them.
"Encryption software can hide the content of messages, but adversaries can still learn a lot from metadata—which users are communicating, at what times they communicate, and so on—by observing message headers or performing traffic analysis," states an academic paper describing the project.
And metadata, as we've learned from revelations about US, Canadian and UK spying operations, can reveal a lot—sometimes more than the contents of communication itself. As former NSA director Michael Hayden once infamously said, "We kill people based on metadata." That's how valuable such information can be.
According to the paper describing Vuvuzela's capabilities, presented in October at the 2015 Symposium on Operating Systems Principles, the goal is to minimize the amount of metadata about a person or their conversation that is leaked, or can be intercepted. The only variables revealed are "the total number of users engaged in a conversation, and the total number of users not engaged in one" (it does not reveal which users are in each group).
The team's work was funded by the National Science Foundation and Google.
According to David Lazar, a PhD student at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the primary author of Vuvuzela's code, the main reason it has traditionally been so difficult to encrypt the metadata associated with our online chats and phone calls is efficiency.
"The most efficient way for a server to deliver a message from A to B is to directly send the message from A to B, but that requires the server to know who A and B are," Lazar explained in an email. "It's much harder for A's message to get delivered to B when the server can't know that A and B are talking to each other."
In an example implementation of Vuvuzela, all users connect to a server, which is connected to other servers in a chain. Messages are routed between servers in this chain to an ever-changing number of "dead drops" that only user A and user B can access.
"In practice, a dead drop is just a pseudorandom 128-bit number. To communicate, two users agree on a dead drop and send messages with the same dead drop ID in the 'header' of the message," explained Lazar (though, to be clear, this all happens in the background). "The last server in the Vuvuzela chain sees all of the incoming messages and their dead drop IDs. When it encounters two messages with the same dead drop ID, it exchanges the messages and sends them back down the chain. This is how users receive each other's messages."
To protect users, dead drops are changed regularly, and because of the way messages are passed between servers, there's no way for an attacker who has compromised one server to tell which dead drops correspond to which users.
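To make the exchange Lazar describes more concrete, here is a minimal Python sketch of what the last server in the chain might do in a single round. It is not Vuvuzela's actual code: the function names (`new_dead_drop_id`, `exchange`) and the message layout are assumptions made for illustration, and the encryption and shuffling done by the earlier servers are left out entirely.

```python
import secrets
from collections import defaultdict


def new_dead_drop_id() -> int:
    """Return a random 128-bit number to use as a dead drop ID (hypothetical helper)."""
    return secrets.randbits(128)


def exchange(requests):
    """Swap the payloads of messages that share a dead drop ID.

    `requests` is a list of (dead_drop_id, payload) pairs for one round.
    The result is a list of replies in the same order; a message whose
    dead drop was not accessed by exactly one other message gets None.
    (How the real system answers unmatched messages is not covered here.)
    """
    positions = defaultdict(list)
    for index, (drop_id, _payload) in enumerate(requests):
        positions[drop_id].append(index)

    replies = [None] * len(requests)
    for indexes in positions.values():
        if len(indexes) == 2:
            first, second = indexes
            # Each side receives what the other side deposited.
            replies[first] = requests[second][1]
            replies[second] = requests[first][1]
    return replies


# Example round: Alice and Bob agreed on `shared`; a third user is idle.
shared = new_dead_drop_id()
round_requests = [
    (shared, b"hi bob"),
    (new_dead_drop_id(), b"no partner this round"),
    (shared, b"hi alice"),
]
round_replies = exchange(round_requests)
```

In this sketch the server only ever sees ID numbers and payloads, never user names. Because dead drops are changed regularly, a fresh ID would be used in the next round.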
Even if an attacker were to compromise the last server in the chain, the server that can see all the dead drop IDs, there's not much they could glean about who is talking to whom. The reason: noise.
Vuvuzela adds noise to all the data flying around the network to make it harder to observe who's communicating with whom. Basically, Vuvuzela's goal is to create a system where, even after observing a large stream of communication over time, an attacker can't reliably distinguish who is talking to whom. According to Lazar, "Vuvuzela generates noise equivalent to 1.2 million users, even if 100 million users are using the system or just 2 users are using it."
In other words, in the Vuvuzela configuration described in Lazar's paper, it will appear to an attacker that there are 1.2 million users communicating.
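The idea of cover traffic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the noise count here is drawn from a Laplace distribution clipped at zero, and the function names and the small `mean` and `scale` values are hypothetical, chosen only to keep the example readable.

```python
import random
import secrets


def noise_count(mean: float, scale: float, rng: random.Random) -> int:
    """Draw how many fake dead drop accesses to add this round.

    Assumption: Laplace noise centred on `mean`, built as the difference
    of two exponential samples, then clipped so it is never negative.
    """
    laplace = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return max(0, round(mean + laplace))


def add_cover_traffic(real_ids, mean: float, scale: float, rng: random.Random):
    """Mix real dead drop IDs with a random number of fake ones."""
    fake_ids = [secrets.randbits(128) for _ in range(noise_count(mean, scale, rng))]
    mixed = list(real_ids) + fake_ids
    rng.shuffle(mixed)  # hide which entries arrived first
    return mixed


# Two real accesses hidden among a varying amount of noise.
rng = random.Random()
observed = add_cover_traffic([secrets.randbits(128), secrets.randbits(128)],
                             mean=50, scale=5, rng=rng)
```

Because the number of fake accesses changes from round to round, an observer counting entries in `observed` cannot tell whether two users or none were really talking.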
"The biggest drawback is the high message latency. In our experiments with 1 million users, the end-to-end message latency was 37 seconds"
Vuvuzela is certainly not the only project attempting to solve this problem, but it is one of the most recent. Out of the University of Waterloo, cryptographer Ian Goldberg has been taking another approach to the problem of chat metadata, and is tackling the problem of presence—the information that chat programs leak before you even initiate a conversation—with a project called DP5.
In other words, how do you privately convey that you're online, and available to chat, to a list of authorized users?
"Vuvuzela is complementary to DP5," Goldberg told me via email. "Vuvuzela protects the delivery of messages, while DP5 protects presence indication. So Alice could use DP5 to privately learn that her friend Bob was online (without revealing to anyone that she is friends with Bob), and then use Vuvuzela to privately send Bob messages (without revealing to anyone that she is communicating with Bob)."
Of course, it's hard to hide absolutely everything. For example, "Vuvuzela cannot hide the fact that a user is connected to the system," the paper reads. One way around this is to leave the client open all the time, so it's not possible to infer that two users always turn their clients on at, say, 9AM each day, and off shortly after their chat.
And for all their efforts, efficiency still poses a formidable challenge. "Running a Vuvuzela server can be expensive due to the cost of bandwidth. However, the biggest drawback is the high message latency," Lazar said. "In our experiments with 1 million users, the end-to-end message latency was 37 seconds."
It's not exactly the most real-time messaging app out there—"Perhaps the higher latency makes Vuvuzela better suited for SMS-style messaging, rather than GChat-style messaging," Lazar offered—but at least your metadata won't be leaking for all to see.
Correction, Dec. 7: A previous version of this article indicated that "it will always appear to an attacker that there are 1.2 million users online." However, Lazar has clarified that the exact amount of noise generated by Vuvuzela changes regularly and is not static, and that the noise obfuscates how many users are communicating and not merely online, as previously stated. Motherboard regrets the error.