One of the hardest parts of genetic research is reading DNA. Every cell of our body contains a copy of our entire genetic code, but only some of that genetic code is actually used.
Now, researchers at Harvard University’s Department of Stem Cell and Regenerative Biology, working with GPU manufacturer NVIDIA, have developed a method of quickly and accurately identifying the wadded-up DNA buried in our cells using machine learning and GPUs. It might help us detect cancer and genetic disease earlier and faster.
Researching genomes is a laborious process that requires looking at chromatin, a mix of DNA and protein inside chromosomes. In 2013, scientists invented Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq), a method of rooting around in chromatin to see what’s going on. The problem is that ATAC-Seq takes hours and produces lots of noisy data. Even with high-precision scientific tools, folded-up sequences of DNA are hard to sort through.
One chromatin dataset studied with ATAC-Seq consisted of around 50 million reads of cells, a process that took 15 hours. Enter AtacWorks, a machine learning program that augments ATAC-Seq and makes the data much easier to read. Working from 1 million reads, it can get in 30 minutes the same result that would take ATAC-Seq alone more than half a day.
AtacWorks is a residual neural network that studies past chromatin datasets and builds predictive models based on what it has learned. To train AtacWorks, scientists fed it a raw chromatin dataset and the same dataset after it had been cleaned up using ATAC-Seq. AtacWorks then compares the two, learns how ATAC-Seq cleans up the data, and replicates that cleanup faster than humans can.
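For readers who want a feel for what "learning from a noisy copy and a clean copy" means in practice, here is a minimal Python sketch. It is not the AtacWorks code or architecture. It assumes made-up one-dimensional "coverage" tracks, a single small filter in place of a deep network, and hypothetical function names (`make_pair`, `predict`, `train`). The one idea it borrows is the residual step, in which the model's output is the noisy input plus a learned correction.

```python
import random


def make_pair(length, rng):
    """Build one toy (noisy, clean) pair of 1-D coverage tracks (invented data)."""
    clean = [0.0] * length
    for _ in range(3):  # three flat "peaks" standing in for open chromatin
        start = rng.randrange(0, length - 10)
        for i in range(start, start + 10):
            clean[i] = 1.0
    noisy = [c + rng.gauss(0.0, 0.5) for c in clean]
    return noisy, clean


def predict(noisy, weights, bias):
    """Residual step: output = input + learned correction (a 1-D convolution)."""
    half = len(weights) // 2
    n = len(noisy)
    out = []
    for i in range(n):
        correction = bias
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < n:  # positions past the ends count as zero
                correction += w * noisy[j]
        out.append(noisy[i] + correction)
    return out


def average_loss(pairs, weights, bias):
    """Mean squared error between predictions and the clean tracks."""
    total, count = 0.0, 0
    for noisy, clean in pairs:
        pred = predict(noisy, weights, bias)
        total += sum((p - c) ** 2 for p, c in zip(pred, clean))
        count += len(clean)
    return total / count


def train(pairs, kernel_size=9, lr=0.05, epochs=100):
    """Fit the correction filter with full-batch gradient descent."""
    weights = [0.0] * kernel_size
    bias = 0.0
    half = kernel_size // 2
    count = sum(len(clean) for _, clean in pairs)
    for _ in range(epochs):
        grad_w = [0.0] * kernel_size
        grad_b = 0.0
        for noisy, clean in pairs:
            pred = predict(noisy, weights, bias)
            n = len(noisy)
            for i in range(n):
                err = 2.0 * (pred[i] - clean[i])  # derivative of squared error
                grad_b += err
                for k in range(kernel_size):
                    j = i + k - half
                    if 0 <= j < n:
                        grad_w[k] += err * noisy[j]
        weights = [w - lr * g / count for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / count
    return weights, bias


rng = random.Random(0)
pairs = [make_pair(200, rng) for _ in range(4)]
baseline = average_loss(pairs, [0.0] * 9, 0.0)  # untrained: output equals noisy input
weights, bias = train(pairs)
trained = average_loss(pairs, weights, bias)
print(f"error before training: {baseline:.3f}, after: {trained:.3f}")
```

The error here is measured on the same toy tracks the filter was trained on, so it only shows that the correction is being learned. A real evaluation would score the model on data it has never seen.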
Fundamentally, what is happening here is that AI—powered by GPUs typically used for gaming (and increasingly, of course, research)—is making key genetic research both much easier and much faster.
“With AtacWorks, we’re able to conduct single-cell experiments that would typically require 10 times as many cells,” Jason Buenrostro, assistant professor at Harvard and the developer of the ATAC-Seq method, said in a blog post. “Denoising low-quality sequencing coverage with GPU-accelerated deep learning has the potential to significantly advance our ability to study epigenetic changes associated with rare cell development and diseases.”
Researchers published a study about AtacWorks in Nature Communications on March 8. According to the paper, it’s possible that AtacWorks will greatly speed up the process of epigenetic research and allow scientists to better research Alzheimer's, cancer, and rare diseases.
“Based on these advancements, we anticipate that AtacWorks will broadly enhance the utility of epigenetic assays, providing a powerful platform to investigate the regulatory circuits that underlie cellular heterogeneity,” the paper said.
It might also help develop treatments for those diseases. “With very rare cell types, it’s not possible to study differences in their DNA using existing methods,” lead researcher Avantika Lal said in a blog post. “AtacWorks can help not only drive down the cost of gathering chromatin accessibility data, but also open up new possibilities in drug discovery and diagnostics.”