Computer scientists at the University of Pennsylvania have developed an algorithmic framework for conducting targeted surveillance of individuals within social networks while protecting the privacy of "untargeted" digital bystanders. As they explain in this week's Proceedings of the National Academy of Sciences (PNAS), the tools could facilitate counterterrorism efforts and infectious disease tracking while being "provably privacy-preserving"—having your anonymous cake and eating it too.
"The tension between the useful or essential gathering and analysis of data about citizens and the privacy rights of those citizens is at an historical peak," the researchers begin. "Perhaps the most striking and controversial recent example is the revelation that US intelligence agencies systemically engage in 'bulk collection' of civilian 'metadata' detailing telephonic and other types of communication and activities, with the alleged purpose of monitoring and thwarting terrorist activity."
Other conflicts mentioned by the Penn group include issues around medical data and targeted advertising. In every case, the friction is between individual privacy and some larger purpose, whether it's corporate profits, public health, or domestic security. Can we really have both?
How can we search for the things we want without revealing information about the population we don't want to know anything about?
Probably not, but we might not have to live with an "all or nothing" approach to privacy either. This is what we have now, according to the researchers: either every person has a right to privacy or no person does. What they propose instead is a population divided and classified. This is already sounding pretty ominous, but let's hear them out.
"There is a protected subpopulation that enjoys (either by law, policy, or choice) certain privacy guarantees," the researchers write. "For instance, in the examples above, these protected individuals might be nonterrorists, or uninfected citizens (and perhaps informants and health care professionals). They are to be contrasted with the 'unprotected' or targeted subpopulation, which does not share those privacy assurances." Still ominous.
There is an interesting and useful question at the root of this: Given a network, probably a social network, modeled as a graph (the graph theory sort of graph), how can we search for the things we want (terrorists, people spreading infections around) without revealing information about the population we don't want to know anything about?
"At the highest level," the group writes, "one can think of our algorithms as outputting a list of confirmed targeted individuals discovered in the network, for whom any subsequent action (e.g., publication in a most-wanted list, further surveillance, or arrest in the case of terrorism; medical treatment or quarantine in the case of epidemics) will not compromise the privacy of the protected."
The algorithms are based on a few basic ideas. The first is that every member of a network (a graph) comes with a sequence of bits indicating their membership in a targeted group. If say, the number two bit was set in your personal privacy register, then you might be part of the "terrorist" target population. For an algorithm searching a network for targets, it doesn't just get to ask to reveal every network member's bits. It has a budget of sorts, where it can only reveal so many bits and no more. The algorithms work to optimize this scenario such that as many bits-of-interest are revealed as possible.
It does this optimization via a notion known as a statistic of proximity (SOP), which is a quantification of how close a given graph node is to a targeted group of nodes. This is what guides the search algorithms.
Using real social networks with stochastically (randomly) generated, artificial target groups, the Penn team found that they could indeed search a network for targeted members while not revealing information about individuals in privacy-protected populations.
"Our work is of course not a complete solution to the practical problem, which can differ from our simple model in many ways," the group concludes. "Here we highlight just one interesting modeling question for future work: Is it possible to give rigorous privacy guarantees to members of the protected population when membership in the targeted population is defined as a function of the individuals' private data? In our model, we avoid this question by endowing the algorithm with a costly 'investigation' operation, which we assume can infallibly determine an individual's targeted status—but it would be interesting to extend our style of analysis to situations in which this kind of investigation is not available."