Earlier this week, DNA screening company 23andMe announced a collaboration with Big Pharma giant Pfizer.
The pharma company will get access to 23andMe's research platform, which includes the genetic info (and other data offered up in questionnaires) provided by some 650,000 consenting customers who have swabbed their saliva and sent it in for analysis.
The value of this data lies in its size. Everyone's information is aggregated into a big dataset, with identifying information stripped out. The point is to find common denominators associated with health conditions, certain mutations that might suggest a connection between genetic traits and disease. For the pharma companies, this could in turn suggest new ways to develop drugs to treat that disease. The usefulness is in the multitude of genomes to compare.
So what role does the individual play? How lost in the crowd is one person's genome?
I reached out to Yaniv Erlich, a genetic researcher at MIT who's known as a bit of a "genome hacker" and has looked at how individuals can be identified through their DNA. It therefore surprised me when Erlich said that he has himself participated in 23andMe.
He explained why you need a lot of data to get useful information. "What we have basically are many, many alleles in the genome, many mutations in the genome, each of which contribute a very minute amount to the predisposition to a disease," he said. "So in order to find these many contributions you need to have a lot of data to be able to see the signal above the noise."
23andMe confirmed to me that "individually identifiable information is not presented, and query results are always rounded up so results are always presented in a group with many others."
But, Erlich said, it is theoretically possible—theoretically, mind—that you could identify someone in pooled data. Take a summary of stats related to a specific genetic trait. "Because you contributed your own genome, the allele sequence in each position is a tiny bit biased toward your genome," he said.
He gave the analogy of a tiny drop of blue dye in a bottle of water: that's the information from your genome among the many. Now imagine doing that for different locations in the genome, so you end up with a whole bunch of water bottles, all with a dot of blue in them. You might not be able to detect the blue in one bottle of water, but if you went back and tested them all you'd see that, yes, there was a bias toward blue dye in those bottles of water. You are in that dataset.
Actually doing anything like this would be another matter. For a start, you'd need to have the DNA of the person you wanted to test. And the larger the sample size (the bigger the water bottles), the harder it gets.
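To make the water-bottle picture concrete, here is a toy simulation in Python. It is only a sketch under loud assumptions, not Erlich's method and not anything 23andMe runs: the genomes are random, the function names (`genome`, `membership_score`, `mean_gap`) are made up for illustration, and the tester is assumed to know both the general population's allele frequencies and the full genotype of each person being tested. For each person it asks whether their alleles sit closer to the pooled frequencies than to the population's, averaged over many sites.

```python
import random

def genome(rng, pop_freq):
    # Share of the tracked allele at each site: 0.0, 0.5 or 1.0 (two copies per person).
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in pop_freq]

def membership_score(person, pool_freq, pop_freq):
    # Positive when the person is closer to the pool than to the general population.
    total = sum(abs(y - p) - abs(y - q)
                for y, q, p in zip(person, pool_freq, pop_freq))
    return total / len(person)

def mean_gap(pool_size, n_sites=2000, n_tested=20, seed=1):
    # Average score of people inside the pool minus average score of outsiders.
    rng = random.Random(seed)
    pop_freq = [rng.uniform(0.1, 0.9) for _ in range(n_sites)]  # hypothetical population
    pool = [genome(rng, pop_freq) for _ in range(pool_size)]
    pool_freq = [sum(site) / pool_size for site in zip(*pool)]  # the published "summary"
    outsiders = [genome(rng, pop_freq) for _ in range(n_tested)]
    members = pool[:n_tested]

    def average(people):
        return sum(membership_score(g, pool_freq, pop_freq) for g in people) / len(people)

    return average(members) - average(outsiders)
```

Calling `mean_gap(20)` and then `mean_gap(200)` compares a small pool with one ten times larger. In this toy setup the gap between members and outsiders should shrink as the pool grows, which is the bigger-water-bottle effect described above: each person's drop of dye is diluted further.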
Speaking entirely hypothetically, Erlich suggested a situation in which you could get DNA from a job interview candidate and then run it against a database related to sensitive personal traits, like sexual orientation or a stigmatised condition. If their DNA was in the database, you'd know that information. But that would require a lot of access, skill, and conditions set just right.
As for 23andMe, Erlich isn't worried. In fact, he said their procedures were very good: For instance, they notify people using the service of how their data could be used, and only users who give permission are included in research. A 23andMe spokesman added that people can choose at any point to withdraw some or all of their information, and that data sharing in the Pfizer collaboration had the same kind of privacy restrictions as other 23andMe participation.
"We employ software, hardware, and physical security measures to protect the computers where customer data is stored," he added. "Personally identifying information is stored separately from genetic data and in line with the highest industry standards for security."
It's not the first time 23andMe has made deals with pharma companies. Last week, the company announced it was working with Genentech in relation to Parkinson's disease. These steps come after the FDA banned it from offering medical analysis to customers, such as suggesting individuals are at risk of certain illnesses. But the service is still available elsewhere, and launched in the UK in December.
Nevertheless, it looks like the focus, at least in the company's eyes, is not on the results of your personal spit test but the research opportunities of hundreds of thousands like it.
"Maybe my data is also aggregated in this sense," said Erlich. "And I love that, because I want to advance scientific studies, and we all want better drugs; we want to have better access to medical treatment. This is exactly how to do that."