The Genome's Big Data Problem
As genome sequencing proliferates, we need to figure out the issues around collecting, storing, and sharing genetic data.
Image: Flickr/Dave Fayram
Medicine will be revolutionised in the 21st century, thanks largely to our increasing understanding and collection of genetic data.
Genetic data is information pertaining to part or all of your genome: the DNA structure that makes you you. This is translated into a massive string of letters—approximately six billion characters in length—that can reveal all sorts of things about you.
Thanks to the rise of genome sequencing, prescription medicines could end up being tailored towards individuals, increasing the drugs' effectiveness and minimising their side effects. Treatments could be developed for previously intractable diseases thanks to the greater volume of information available for research. It could even be possible to predict how predisposed infants are to various conditions as they grow up.
One program already using genetic data is the Personal Genome Project (PGP), an open call to those who wish to contribute to scientific research. If someone decides to participate in the project, they naturally have to sign a consent form—but it's not as easy as blindly clicking ‘I have read and agree to the Terms and Conditions.’ The New York Times reported that participants need to pass a test to make sure they fully understand what they are enrolling in, and what risks they are taking. These include the potential to be refused health insurance, or denied a job, because of a predisposition to a disease revealed by genome sequencing.
A sample kit for the Personal Genome Project. Image: Flickr/Peter Rukavina
Some of the scenarios on the consent form may sound far-fetched, but they're not unfeasible: someone could plant synthetic DNA to implicate you in a crime, for instance, or use your data in cloning.
So for all its benefits, there are still serious concerns around genetic data that need to be addressed before we all jump on the genome bandwagon. How will the data be stored? Who will be able to access it? What security will be in place?
When I was recently at the European Parliament in Strasbourg, I asked an expert panel what problems we were likely to see as genetic data proliferates. The answer I was given amounted to “We don't know.” Unsatisfied, I decided to look into some of the issues myself.
Privacy is the main concern in the debate over genetic data because, by its very nature, the data can be used to identify an individual and their relatives much more accurately than other types of personal information. Indeed, those behind the Personal Genome Project recognize this. As Albert Sun at the Times wrote, “With the amount of data being shared, participants cannot be guaranteed of anonymity or privacy. While their names are not directly associated with their data, other information about them is, including birth dates, genders, ZIP codes, genomes and medical histories.”
Other projects are not so open about the risks. In the UK, the National Health Service (NHS) has proposed a genetic database, and those behind that plan claim that it is possible to remain “pseudo-anonymous” while listed within the system, by omitting certain parts of the data such as name or address. However, others have suggested that people could still be identified by combining their genetic data with other public databases.
A researcher analysing a genome. Image: Flickr/DOE Joint Genome Institute
“Genetic data is not data that can be anonymised,” Pascal Borry, an assistant professor of bioethics at the Centre for Biomedical Ethics and Law, told me.
“I think that everyone agrees that if somebody puts in enough effort, and they have genetic data, they can probably re-identify,” said Tim Caulfield, a professor in the Faculty of Law and the School of Public Health at the University of Alberta. “The disagreement is in the ease with which this could happen.”
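The linkage risk described above can be sketched in a few lines of Python. This is a toy illustration only: every record, name, and field below is invented, and it does not describe the Personal Genome Project, the NHS proposal, or any real database. It assumes a "pseudo-anonymous" research table that still carries birth date, gender, and ZIP code, and a hypothetical public register that lists the same fields next to names.

```python
# Toy linkage attack. All records and field names are invented for illustration.
research_records = [
    {"sample_id": "S1", "birth_date": "1975-03-02", "gender": "F", "zip": "02139"},
    {"sample_id": "S2", "birth_date": "1982-11-19", "gender": "M", "zip": "02139"},
    {"sample_id": "S3", "birth_date": "1990-07-30", "gender": "F", "zip": "94110"},
]

public_register = [
    {"name": "Alice Example", "birth_date": "1975-03-02", "gender": "F", "zip": "02139"},
    {"name": "Bob Example", "birth_date": "1982-11-19", "gender": "M", "zip": "02139"},
    {"name": "Carl Example", "birth_date": "1982-11-19", "gender": "M", "zip": "02139"},
    {"name": "Dana Example", "birth_date": "1961-01-05", "gender": "F", "zip": "94110"},
]

# Fields that are not names, but that together can narrow down a person.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")


def key(record):
    """Build a comparison key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(research, register):
    """Map sample IDs to names when exactly one register entry matches."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for record in research:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match re-identifies the sample
            matches[record["sample_id"]] = candidates[0]
    return matches


matches = reidentify(research_records, public_register)
print(matches)
```

In this invented data, one sample matches a single named person, one matches two people and so stays ambiguous, and one matches nobody. How often real records match uniquely is exactly the "ease" that Caulfield says is in dispute.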
Another concern is how the data is stored. 23andMe, a commercial sequencing company, says it stores its data “with multiple levels of encryption and security protocols protecting your personal information.” Some researchers are developing ‘homomorphic’ protection, a novel approach that would greatly strengthen the security of the data. At the moment, however, such a method requires a massive amount of computing power, so it is unlikely to become widespread any time soon.
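To give a rough sense of what "homomorphic" means here, the sketch below uses a toy version of the Paillier cryptosystem, which lets someone add numbers while they remain encrypted. This is an assumption chosen for illustration: the researchers mentioned above may be using entirely different schemes, Paillier supports only addition rather than arbitrary computation, and the tiny primes used here offer no real security. The function names are hypothetical.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. These primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with generator n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


# Hypothetical use: count carriers of a variant without decrypting any one record.
n, private_key = keygen()
carrier_flags = [1, 0, 1, 1]  # invented data: 1 = carries the variant
ciphertexts = [encrypt(n, flag) for flag in carrier_flags]

encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

total = decrypt(n, private_key, encrypted_total)
print(total)
```

Whoever does the adding never sees an individual flag; only the holder of the private key can read the total. Even in this toy form, each encryption involves arithmetic on numbers much larger than the data itself, which hints at why the computing cost is a barrier.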
If genetic data is not stored securely, its theft could cause a lot of headaches for program participants.
Just as the use of genetic data is a new step for scientific research, so it is for businesses. In the same way that personal data has become a commodity traded by companies, businesses are likely to want to capitalize on this new avenue.
“One of the big pushes to get hold of medical data, including genetic data, is to create personalised risk assessments which try to predict future health, and that can be used for personalised marketing,” Helen Wallace from GeneWatch, a non-profit that monitors developments in genetic technology, told me.
Referring specifically to the NHS plan, the GeneWatch website warns that “a personalised risk assessment is expected to lead to a massive expansion in the market for drugs and other products, such as supplements and cholesterol-lowering margarines, which can be sold using personalised marketing based on an individual's health data.”
Remember the much-publicised case of Target figuring out a girl was pregnant before her father knew, and sending her advertisements for baby products? Well, genetic data has the potential to go beyond that.
A specimen for 23andMe's genome sequencing program. Image: Flickr/Peter Rukavina
Something unique to the commercialization of genetic data—as opposed to internet browsing or purchasing habits—is how advertisements could also be targeted at your relatives. As well as finding out whether someone has a certain predisposition to a disease, “a company could also find out […] who their relatives were, and maybe sell on that information,” Wallace told me.
Government-run projects will realistically cross over into the commercial sector too. With the NHS database, the plan is that once genetic data has been gathered—and the Health Secretary has recommended that all children have their DNA sequenced at birth—this data will be added to a national database.
However, the NHS has a history of selling information to third parties, including drug and insurance firms. The government has also liaised with Google before about displaying hospital stats in its search results. That time, Google pulled out due to public backlash. But the search giant does seem to have an interest in the market for genetic data, and has for instance shown geneticists how to upload DNA data to the cloud.
The increased likelihood of commercialization of your genetic data is not helped by the “hype around the idea that we should all get our genome sequenced in the first place,” Wallace continued. “In fact, most of the scientific evidence is suggesting this is very useful for some people with rare genetic disorders or high familial risk for breast cancer, but it's not actually useful as a screening tool for predicting susceptibility to common disease.”
It is of course in the interest of those who make money from genetic data to make “that market as big as possible,” she said.
ETHICS AND REGULATION
Here's one ethical quandary: If your genetic data can reveal intimate details about your family, shouldn't you obtain their consent before you have your own genome sequenced?
“If I'm getting my whole genome sequenced, and then joining a biobank [a programme that stores your biological data for research or commercial purposes], that information about me is going to have relevance to my brothers for sure,” health law expert Caulfield said. As for whether you are currently required to get their consent, “The answer is, no you don't. There's no technical, legal reason to do that.” But then, he said, that might make you ask, “Is the law appropriate?”
A more personal decision that needs to be considered is that this data will be stored or worked on for longer than your lifetime, and that of your relatives. As Caulfield pointed out, giving up your genetic data results in you “donating your biological story for a very long period of time.”
This leads on to the question of what legal protections should be in place for all of this data. One of the problems already plaguing privacy laws is the huge variation in them across the planet. That becomes an even greater problem when data is being accessed by researchers from different parts of the world. “One of the underlying themes of big data is that the data will be available anywhere,” Caulfield said.
There are attempts at a harmonization of laws that would mitigate this. A powerful new data protection law is being passed in Europe, for instance, but it has yet to gain support from the UK government.
Caulfield pointed out that it's important that whatever laws are applied to genetic data are balanced. “You can also get an over-reaction,” he said. “We saw that with cloning, for example.” Research in that domain was stifled in the US and Europe while it blossomed in countries with different regulations, such as China.
“Having evidence-based, informed laws is really important,” Caulfield said. Similar to protections for facial recognition information, and even more basic biometric data such as fingerprints, legal protections specifically crafted for genetic data are in their infancy. It is “a very complicated issue, and one that needs more investigation,” he concluded.