How Machine Learning Is Identifying New, Better Drugs

When Dr. Robert Murphy first started researching biochemistry and drug development in the late 1970s, creating a pharmaceutical compound that was effective and safe enough to market meant following a strict experimental pipeline, one that was only beginning to be enhanced by large-scale computerized data collection and analysis.

Now head of the Murphy Lab for computational biology at Carnegie Mellon University (CMU), Murphy has watched over the years as data collection and artificial intelligence have revolutionized this process, making the drug creation pipeline faster, more efficient, and more effective.

Recently, that’s been thanks to the application of machine learning (computer systems that learn and adapt by using algorithms and statistical models to analyze patterns in datasets) to the drug development process. This has been especially important in reducing side effects, Murphy says.

“Beginning around 10 years ago or so, the question was starting to be raised of, is there a way to improve that process such that we consider what the characteristics are of drugs on more than one thing at a time,” Murphy said.

In traditional drug creation, scientists identify a target in the body, often a protein that is mutated in a way that causes an illness, and test a large range of chemical compounds on it until they find one that achieves the desired effect, either inhibiting a harmful activity or enhancing a beneficial one. This compound is then tweaked and tested in clinical trials to reduce side effects until it receives approval from regulators (in the U.S., the Food and Drug Administration) for use on patients.

When machine learning is applied, this process gets quicker and more effective: a model can assess the impact of a much larger range of compounds (or sometimes invent new ones altogether) on a desired target in a single pass.

The average drug costs around $2.8 billion in research and development and takes around 10 years to complete clinical trials and win approval for market. Even then, Murphy notes, the process can end with a drug being recalled if enough patients experience adverse side effects.

But machine learning can eliminate kinks in the pipeline and reduce the chance of recall by computationally screening a vastly larger range of compounds against a desired target. ML systems can predict drug-protein interactions, the efficacy of an intervention, and possible side effects, and can help optimize a molecule’s biological response to a drug.

As a leading mind in the field of computational biology and a pioneer of CMU’s program on the topic, Murphy has played a significant role in this shift. In 2011, he penned a commentary predicting that machine learning would become increasingly important in the drug discovery process. But his argument went a step further, advocating for the use of active machine learning, a subset of ML in which the user offers the machine feedback on desired outcomes, improving its efficiency and accuracy over time. In the drug discovery process, the number of experiments required to screen a specific compound on a specific target while monitoring its impact on other targets can quickly become unwieldy. Active ML offers researchers the opportunity to direct the experiment, supervising the computer as it iteratively chooses the experiments that are most likely to improve the model.
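To make the idea of a computer iteratively choosing its next experiment concrete, here is a minimal toy sketch in Python. It is not Murphy's method or any real screening system: the one-number "compounds," the `run_assay` stand-in for a lab experiment, and the hidden activity threshold are all hypothetical, chosen only to show how picking the most uncertain experiment each round can narrow a model faster than screening at random.

```python
import random

# Hypothetical setup: each "compound" is a single number between 0 and 1,
# and a compound is "active" if it sits at or above a hidden threshold.
# Real compounds and assays are vastly more complex; this is a toy.

def run_assay(compound: float, hidden_threshold: float = 0.6205) -> bool:
    """Stand-in for a wet-lab experiment (an assumption, not a real API)."""
    return compound >= hidden_threshold

def active_learning(pool, budget, oracle=run_assay):
    """Spend `budget` experiments, each time testing the compound the
    current model is least sure about (the one nearest the middle of
    the range where the activity boundary could still be)."""
    lo, hi = 0.0, 1.0   # model: the boundary lies somewhere between lo and hi
    tested = {}
    for _ in range(budget):
        candidates = [c for c in pool if lo < c < hi]
        if not candidates:
            break       # nothing left that could change the model
        midpoint = (lo + hi) / 2
        query = min(candidates, key=lambda c: abs(c - midpoint))
        result = oracle(query)
        tested[query] = result
        if result:
            hi = query  # active: boundary is at or below this compound
        else:
            lo = query  # inactive: boundary is above this compound
    return lo, hi, tested

def random_screening(pool, budget, oracle=run_assay, seed=0):
    """Baseline: spend the same budget on randomly chosen compounds."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    tested = {}
    for query in rng.sample(pool, budget):
        result = oracle(query)
        tested[query] = result
        if result:
            hi = min(hi, query)
        else:
            lo = max(lo, query)
    return lo, hi, tested

# A pool of 999 hypothetical compounds: 0.001, 0.002, ..., 0.999
pool = [i / 1000 for i in range(1, 1000)]

if __name__ == "__main__":
    a_lo, a_hi, a_tested = active_learning(pool, budget=12)
    r_lo, r_hi, _ = random_screening(pool, budget=12)
    print(f"active: boundary in ({a_lo}, {a_hi}] after {len(a_tested)} tests")
    print(f"random: boundary in ({r_lo}, {r_hi}] after 12 tests")
```

In this toy, every result shrinks the range in which the activity boundary could lie, and the active learner always tests where a result will shrink it most. Real active learning for drug discovery works over many compounds and many targets at once, with statistical models rather than a single threshold, but the loop has the same shape: predict, pick the most informative experiment, run it, update.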

“Drug discovery and development will be dramatically improved by the ability to assess effects of potential drugs more comprehensively,” Murphy wrote in the commentary at the time.

“Clearly much work remains to be done,” he added. “Not least of which is to convince practitioners of the value of ceding some important decisions to machines.”

Murphy now has his sights on deep learning, a subset of machine learning built on artificial neural networks, which are loosely modeled on the structure of the human brain. It allows scientists to build models that require less human intervention.

Earlier this year, a group of researchers at MIT developed a deep learning drug discovery technique that uses images of binding relationships between drug candidates and target proteins to “yield precise results in a fraction of the time compared to previous state-of-the-art methods.” By feeding a machine images of these binding relationships, the researchers say, they taught the model to calculate how well a drug binds to a protein almost 50 times faster than previous methods.

Murphy’s own institution, Carnegie Mellon, is in the midst of piloting the world’s first university-based cloud lab, a remotely operated research facility that uses laboratory automation to hand day-to-day judgment calls in the experimentation process over to machines.

A $40-million project, the lab will be capable of running more than 100 complex experiments at the same time, 24/7.

“You don’t need to have scientists saying, ‘Well what should we do next, let me think about that,’” Murphy said. “You can have the computer be making that decision, and then executing, and just continuously running to improve your outcomes. And then stopping when it reaches whatever the desired goal is.”

ML technology, he says, is constantly improving to enhance drug creation and, in turn, patient outcomes, though he’s unsure how much time tools like deep learning have shaved off the process since he first entered the field. He is confident, however, that it has made the drug creation process more accurate.

“In terms of experiments being done, it still takes time to do those cycles. But you get better results,” he said. “It may be that the improvement we see in that next phase is much more about getting better, more successful therapeutics than it is getting them faster.”