The Algorithm in the Armchair

This article appears in VICE Magazine’s Algorithms issue, which investigates the rules that govern our society, and what happens when they’re broken.

Meghan’s sleeping patterns were a clue that something was wrong. She would feel excited, happy, and barely sleep at all, her mind buzzing with ideas and creative energy. Then, just a few days later, she’d crash—sleeping 13 hours a day, too depressed and listless to even make conversation with her friends or family.

Now 19, Meghan had experienced symptoms of anxiety and depression since she was 12 years old. But these big swings, between high and low, led to a bipolar disorder diagnosis in 2018. A year later, she decided she wanted to try medication.

“I didn’t know what to do anymore to make myself, you know, want to be a person,” said Meghan, who is sharing her first name only to protect her privacy. But she felt nervous: “The thing that scared me is that medications are so general, you never know what’s going to work for you.”

Her hesitation was warranted. While medication can help many people manage their symptoms, there’s currently no biological test to take for mental illness—either to help arrive at the right diagnosis, or to predict what kind of treatment will work best for you.

When Meghan went to her psychiatrist near where she lives, in Canada, they diagnosed her using the DSM, or the Diagnostic and Statistical Manual of Mental Disorders, sometimes referred to as the “Bible” of psychiatry. The DSM is a collection of observed mental health symptoms sorted into buckets we know as anxiety, depression, schizophrenia, OCD, and more. Yet there’s an acknowledgment throughout the mental health profession that while the DSM is the best we’ve got, its categorizations may not always be the most effective way to reliably recommend treatment.

Most people with mental health issues have symptoms that could apply to many different disorders. Take sadness, for example. Rather than being an indicator of one specific disorder, it’s more like a fever: a general sign of distress that could be caused by any number of illnesses. The same goes for symptoms like an inability to focus, anxiety, and even hallucinations. To make things more confusing, people often have more than one mental health disorder, and two people with the same disorder can have vastly different experiences. To be diagnosed with depression, a person needs to have five of nine depression symptoms in the DSM. That means that two people with depression could have only one symptom in common.
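
The arithmetic behind that last point can be checked directly. The sketch below is a simplification that treats all nine symptoms as interchangeable and only uses the "five of nine" rule stated above; the function and variable names are made up for illustration.

```python
from itertools import combinations

TOTAL_SYMPTOMS = 9   # symptoms listed for depression, per the text above
REQUIRED = 5         # how many a person needs for a diagnosis

def min_shared(total, required):
    """Smallest possible overlap between two sets of `required` symptoms
    drawn from `total`: two sets of 5 need 10 slots, but only 9 exist."""
    return max(0, 2 * required - total)

# Brute-force check: try every pair of qualifying symptom sets.
symptom_sets = list(combinations(range(TOTAL_SYMPTOMS), REQUIRED))
brute = min(len(set(a) & set(b)) for a in symptom_sets for b in symptom_sets)

print(min_shared(TOTAL_SYMPTOMS, REQUIRED), brute)  # 1 1
```

Both the formula and the exhaustive check give one shared symptom as the minimum.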

There’s long been a desire to tap the brain for more meaningful guidance on this front since, after all, that’s where our emotions arise. Perhaps two depression patients have different things going on in their brains, and need different treatments. Yet, we’re diagnosing both of them with depression, and potentially putting them on the same medications. This could be why only 25 to 35 percent of people with chronic depression are able to find relief after they take their first drug.

Massive research efforts are now trying to harness the power of big data and machine learning for a more precise approach to mental health. Given databases of patients’ brain activity to learn from, an algorithm might be able to figure out what clinicians can’t on their own: What makes this mentally ill person’s brain different from a healthy person’s? What treatment will their brain best respond to?

The hope is that one day, just as a visit with a doctor is often followed with blood tests or scans, a mental health visit would be followed with some sort of brain imaging. Then an algorithm, trained on thousands of other people’s brains, could determine what disorder a person has, and what medication should be tried first.

We’re still a long way from this being regular practice. Researchers are still figuring out the best kind of data to use and how to train the algorithms properly, and they are confronting the human bias that overshadows all artificial intelligence: When we, humans, collect, interpret, and make decisions about data, it will inevitably influence what our algorithms learn. The stakes are especially high when we consider that we’re training algorithms to decide who’s healthy, and who’s not; what a “normal” brain looks like, and what a sick one does.

Some researchers feel that at this stage, even with machine learning, we’re no closer to understanding how, and even if, mentally ill brains are different than healthy ones in ways that are useful for patients. There’s some argument for bypassing this desire to find the cause of these disorders, and just applying algorithms to the question of which treatments work best, even if we don’t know why. Some clinicians think we should forget about the brain and apply machine learning to other measures, like smartphone data or patient interviews, as an even faster way to use algorithms to help the mentally ill. It’s agreed, though, that we somehow need to level up our decision-making around mental health.

“This is a really important pivot that our field needs to make,” said Amit Etkin, a neuroscientist at Stanford University.

Meghan recently contributed her brain and its activity to these efforts. She took part in a study that looked at brain scans of people with mood disorders to help train an algorithm to predict a future patient’s responses to antidepressants or mood stabilizers. Though Meghan wasn’t able to access this kind of approach for herself, she was happy to contribute her brain data to help get there. It would have been better than what she had to go through.

Meghan initially tried Prozac, and was on it for about two months. “I felt like a zombie,” she said. “Sometimes it’s like you’re so numb that you can’t even cry if you need to. I had no interest in anything. I was just kind of floating through life.” She stopped taking it. She moved on to mood stabilizers, then a medication for generalized anxiety, then mood stabilizers again.

“I’ve literally cried about medication,” Meghan said. “It just sucks.”

In 1976, a paper found that people with schizophrenia had enlarged cerebral ventricles, which are interconnected cavities within the brain. The discovery “seemed to usher psychiatry into a new era where neuroimaging would help identify mental disorders and ultimately clarify their mechanisms,” explained a 2012 review in the journal Neuron.

But more than three decades later, we still don’t have any reliable biomarkers, or biological red flags, that can help out with diagnosis and treatment. “Compared to many other fields of medicine, psychiatry is actually one of the most left behind, unfortunately,” said Xiaosi Gu, an assistant professor in psychiatry and neuroscience at Icahn School of Medicine at Mount Sinai. “If you go to a psychiatrist, it’s very narrative based. You have a talk. They might do a structured interview on you, or you fill out survey questionnaires. But there’s nothing biological to it.”

A machine learning approach doesn’t aim to strip psychiatry of the human conversation between clinician and patient; it just aims to add biological measures on top of that. Still, we have to be careful about what questions we’re asking the data to answer, and whether the answers we’re receiving are helpful.

Let’s say we collect data from two groups of people—one diagnosed with a certain disorder, the other a group of healthy controls. An algorithm could be trained to capture the key differences between those two groups, and then applied to a patient to help determine whether they have that condition or not.

There’s a problem here: If you were teaching an algorithm to tell the difference between pictures of cars and planes, you would give it pictures of cars and planes to learn from, telling it which ones are cars and which are planes, so it can be trained. Similarly, if we want to develop an algorithm to detect people with schizophrenia, we need lots of people with schizophrenia to collect data from to tell the algorithm how to spot it.

But those people with schizophrenia got their diagnoses using the DSM. When we compare them to controls, even if it leads to the development of a 100 percent accurate algorithm that can differentiate between the two groups, the algorithm is essentially just replicating the DSM categories, since that’s what it was trained on.
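
The labelled-training setup described above, and why it can only echo the labels it was given, can be sketched in a few lines. This is a toy illustration, not any research group's method: the two-number "brain measures" are invented, and the nearest-centroid classifier is just one simple stand-in for a learning algorithm.

```python
def centroid(points):
    """Average each feature across a list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train(examples):
    """examples: list of (features, label). Returns {label: centroid}."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, features):
    """Pick the label whose centroid is closest to the new features."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist2(model[label]))

# Hypothetical measurements. The labels stand in for clinician diagnoses,
# which is the only "ground truth" the algorithm ever sees.
examples = [
    ([0.9, 0.8], "patient"), ([1.0, 0.7], "patient"), ([0.8, 0.9], "patient"),
    ([0.1, 0.2], "control"), ([0.2, 0.1], "control"), ([0.0, 0.3], "control"),
]
model = train(examples)

agreement = sum(predict(model, f) == y for f, y in examples) / len(examples)
print(agreement)                    # 1.0
print(predict(model, [0.7, 0.6]))   # patient
```

Perfect agreement here only means the classifier reproduces the diagnoses it was handed. It says nothing about whether those diagnostic categories were the right ones to begin with.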

Andrea Mechelli, a Professor of Early Intervention in Mental Health at King’s College London, calls this a circularity problem. “We don’t know exactly what we’re looking for,” agreed Vince Calhoun, an engineer and neuroscientist at Georgia State University. “We don’t necessarily know if the answer is the right answer.”

This problem reveals how just training an algorithm and collecting brain data isn’t enough—algorithms need to be applied in ways that are actually useful. Mechelli thinks the algorithms should be dedicated to what clinicians can’t do already, but desperately need: not just determining what disorder people have, but predicting what will happen to patients in the future, how their disease will progress, and what medication they’ll thrive on.

Scott Martin, a 46-year-old in Sacramento, started to feel anxious about ten years ago. His doctor gave him Lexapro, which he tried for six months. It took the edge off, but it had side effects he didn’t like, and so he gradually stopped taking it. About five years ago, he started feeling another swirl of anxious and depressed emotions, but it was hard to figure out exactly what was wrong.

“It seems like I have this laundry list of three or four different things, and none of my symptoms that I could easily identify seem to really match any one obvious diagnosis in particular,” Martin said. He’s tried three or four medications since, all of which had little to no positive effect.

Like Martin, people often bounce from drug to drug, coping with the side effects while the symptoms they originally sought help for fail to improve. For severe depression or psychotic disorders, this lack of progress can result in people dying by suicide. “We are losing lives,” Gu said. “And we’re also losing quality of life during this very mindless fishing expedition.”

Algorithms could help to tackle this issue. By collecting data from people who have a certain disorder, and do well on a specific medication, clinicians could see how well an individual’s brain matches that sample. If they are alike, that person might feel better on that drug too. If they don’t match, it might nudge the clinician and patient to go in another direction.

Calhoun and his colleagues are doing this now: working on an algorithm that looks at fMRI brain scans of people with mood disorders to try to predict their responses to medication. This is the study that Meghan contributed her brain imaging to. They tracked how people responded to medications like antidepressants or mood stabilizers. The goal is that later, when a person comes in with a hodgepodge of symptoms that may indicate depression or bipolar disorder, the algorithm can predict which type of medication people with similar brains did best on.

The algorithm used that information to determine whether a person might have major depressive disorder (MDD) or bipolar I disorder. When tested on a new group of people who had previously been diagnosed with one or the other, it was 92.4 percent accurate. The researchers then used the algorithm on 12 more people whose diagnoses were unclear, to predict both diagnosis and response to medication. Eleven of the 12 responded to the medication the algorithm suggested.

In Etkin’s research, he and his colleagues are using electroencephalography (EEG) to try to predict which people will most likely respond to the antidepressant Zoloft. They’re using data from a large study from 2011, which collected information, including EEG, from 309 people who tried Zoloft as their first medication. Martin contributed his brain activity to this research.

The algorithm they’ve developed looks at a depressed person’s EEG readout, and tries to predict who is going to respond to an antidepressant. They found that their algorithm could predict who responded to Zoloft, and then they replicated their findings at four different locations. Additional work they’ve done showed that people who didn’t do well on Zoloft are more likely to respond to transcranial magnetic stimulation (TMS), a treatment that Martin just finished doing.

“From the clinician’s perspective it’s quite easy to see if someone is ill or not. That’s not really useful,” Mechelli said. “What is difficult is, will they develop bipolar or schizophrenia? Will they have multiple relapses or will they be stable? Will they respond to antipsychotic medication and will their response last over time or will they relapse? Algorithms that are useful are algorithms that make a prediction into the future.”

There are many types of information you can get from a person’s brain. fMRI can measure the activity of the brain by visualizing blood flow, while MRI measures the physical shape. EEG measures electrical activity. Etkin said that some kinds of data, like fMRI or MRI, are better for storytelling, or writing papers, but not necessarily better for getting people the help they need right away.

“We’ve banged away for many years with MRI, finding all sorts of interesting things and publishing all sorts of interesting papers, but never were able to get to the level of robustness that you need for an individual person,” he said.

In the past few years, his group has switched to EEG, which is still a measure of the brain but is much faster (about 20 minutes) and cheaper, and can provide a lot of information to an algorithm. Etkin said that during the study, EEG imaging was also more consistent between various locations. “But fMRI fell apart when you tried to do that, because each scanner is very different,” he said.

Any use of brain data to train algorithms will have to grapple with variability between different research groups, both in patient populations and in the data collection itself. An fMRI doesn’t just take a picture of the brain like a camera; it requires a lot of data analysis and programming to arrive at the final result. A recent international study in Nature highlighted this: 70 teams, made up of more than 200 neuroscientists, all received the same fMRI data, but each did its analysis in a slightly different way, and the teams came to different conclusions about the data.

Tel Aviv University neuroscientist Tom Schonberg, a co-author on the paper, said that their results apply to every branch of empirical research that is done by humans. When humans are involved, variations occur, and the best way to address them is radical transparency in every step of decision making.

“Those are problems and they need to be dealt with,” said Dave Schnyer, a professor of psychology and neuroscience at the University of Texas at Austin. “I think neuroimaging was definitely a bit oversold. It is a scientific technique, just like every scientific technique, that requires that people use it and apply it in a rigorous fashion.”

The Nature paper was a good example of why we need to understand the different analyses that teams can use, Schnyer said. Before we rely on algorithms to make complicated predictions, he added, we need a more universal, standardized approach to data, along with openness and transparency about how people processed their data.

Outside of the processing of the data, it’s crucial that the data itself is representative—meaning it needs to come from a diverse enough group of people.

If an algorithm was trained on people between 20 and 30 years old, from a certain ethnic or socioeconomic background, and the person it’s being used on is not from that group, it probably won’t work for them. “In that case, the algorithm would be more likely to make an incorrect inference,” Mechelli said.

Algorithms can be developed to be highly sensitive to one group of people, but then not work on another. This is called “overfitting,” which means that the algorithm is learning something about the dataset that doesn’t have much to do with what you’re trying to measure. Then, when the algorithm is unleashed on a new dataset, it doesn’t perform as well because what it was measuring before isn’t present.

We also risk defining what a “normal,” “healthy” brain is based on a specific subtype of person. Gu said that’s been true for older psychology studies, which often had white, college-aged students as their primary participants.

“You can’t claim that you have a powerful algorithm unless it has been tested across a diverse range of datasets,” he said. Many research teams, if they have a large data set, will train their algorithms on one half of the data, and then test them on the other. But Schnyer said that’s only one level of reliability. The way to ensure an algorithm is really accurate is to test it on a completely new group of people. “It’s that final step that most people don’t do,” he said.
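
Why performance on the training data alone says so little can be shown with a deliberately extreme toy case. In the sketch below, the "scans" are random numbers and the labels are assigned at random, so there is nothing real to learn; the data, the split, and the memorizing nearest-neighbour classifier are all assumptions made for illustration.

```python
import random

def nearest_label(train, x):
    """1-nearest neighbour: return the label of the closest training example."""
    closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return closest[1]

def accuracy(train, examples):
    """Fraction of examples whose predicted label matches the given label."""
    hits = sum(nearest_label(train, x) == y for x, y in examples)
    return hits / len(examples)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: five random features per "person", label chosen by coin flip.
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(200)]
train, held_out = data[:100], data[100:]   # train on one half, test on the other

train_acc = accuracy(train, train)         # scored on data it has memorized
held_out_acc = accuracy(train, held_out)   # scored on data it has never seen

print(train_acc)      # 1.0
print(held_out_acc)   # expected to hover near chance, since labels are noise
```

The classifier looks perfect on the people it was trained on because it has simply memorized them. Testing on the other half exposes that, and testing on a completely separate group, collected elsewhere, would be a stricter check still.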

There are ways to take an algorithmic approach that don’t scan the brain at all. Spring Health, a company whose services are available to employees of large companies like Gap, Whole Foods, Pfizer, and Equinox as part of their healthcare, uses machine learning to determine what kind of therapy a person might do best with, or what kind of medication they should try. Instead of using brain imaging, Adam Chekroud, a neuroscientist from Yale University and Spring Health’s cofounder, said you can gain a lot of insight from the patients themselves.

Chekroud first started by partnering with scientists conducting research who would share their data. As the company has grown, it has gained more than one million members whose information feeds its algorithm. Patients take assessments, short online quizzes they can complete on their phones, which continually track their progress. The assessments are not that different from what a typical patient would experience going to a psychiatrist. The difference is that when a new patient comes along, the history and experience of every previous person, in aggregate, is factored into the decision making for what the right treatment option may be.

“Instead of just saying, do you want therapy, or do you want meds, or, let’s start with Lexapro—what we’ll do is go through all of the thousands of people we’ve treated and say, let’s find the people who most look like [you] based on your symptom profile. And let’s see what treatments work well for people like you,” Chekroud said.
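
The idea of finding "people like you" can be sketched as a nearest-neighbour lookup over symptom scores. This is only an illustration of the general technique, not Spring Health's actual system: the symptom scores, the record format, the treatment names, and the `suggest_treatment` helper are all hypothetical.

```python
from collections import Counter

def distance(a, b):
    """Euclidean distance between two symptom profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def suggest_treatment(history, profile, k=3):
    """history: list of (symptom_scores, treatment, improved).
    Look at the k most similar past patients and return the treatment
    that helped the most of them, or None if none of them improved."""
    neighbours = sorted(history, key=lambda rec: distance(rec[0], profile))[:k]
    wins = Counter(t for _, t, improved in neighbours if improved)
    return wins.most_common(1)[0][0] if wins else None

# Hypothetical records: scores from 0 to 3 for (low mood, anxiety, sleep problems).
history = [
    ([3, 1, 2], "therapy", True),
    ([3, 1, 3], "therapy", True),
    ([2, 1, 2], "medication A", False),
    ([0, 3, 1], "medication A", True),
    ([1, 3, 0], "medication A", True),
]

print(suggest_treatment(history, [3, 1, 2]))  # therapy
print(suggest_treatment(history, [0, 3, 1]))  # medication A
```

A real system would need far more patients, validated questionnaires, and the kind of testing on new groups described earlier. The sketch only shows how past outcomes for similar profiles can be turned into a suggestion.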

In a 2016 paper in The Lancet, and in a follow-up paper in JAMA in 2017, Chekroud and his colleagues found that their approach could help people feel better in eight weeks, compared to those who didn’t.

Chekroud thinks that while the brain imaging data is important to research for basic science, self-assessments are more accessible in clinical practice, and can help people right now. Brain scans are expensive, and can be hard for people in rural areas to access. It may be years or decades before we fully understand the biological underpinnings of mental illness, or find a true biomarker for schizophrenia or bipolar disorder. Perhaps the underlying biology isn’t distinct enough to help with diagnostics, the brain imaging tools we have aren’t powerful enough yet, or we don’t have enough data.

“The more academic crowd has continued to pursue brain and genetic markers, mostly because it’s like there’s a biological fact,” Chekroud said. “It’s way cooler and way sexier if they can figure it out like a brain biomarker. The reality is that the signal is just not there yet. I think the clinical data is particularly attractive because it seems like it has the most value. It’s like the closest thing to the symptoms.”

Schnyer agreed that in the short term he doesn’t think that machine learning is going to be very helpful to reveal the underlying mechanisms of mental illnesses. Even if an algorithm detects that a depressed person’s brain is different in certain areas than a healthy person’s, does that mean that difference caused the depression, or the depression caused those areas to be different?

But if the goal is just to direct the person to the treatment that might work the best, we don’t have to answer that question today. We can continue to research that; it isn’t one or the other. With algorithms, we can circumvent what we don’t know, and potentially arrive at an effective way to help anyway.

“If all you really want to do is predict reliably whether one group of people will respond to a drug better than another group of people, you can train an algorithm to do that,” Schnyer said. “Even if you have no idea how that algorithm is working. That’s certainly going to be worthwhile in the clinical setting.”

There are ethical issues surrounding this research that are fast approaching, if algorithms prove to be helpful and are used in everyday mental healthcare. How do you tell someone that they have a brain that’s similar to people who needed extensive support for the rest of their lives? Or that their brains are similar to people who have treatment-resistant depression?

Who is responsible when an algorithm is wrong? Is it the doctor who decided to use the algorithm? The company that developed the algorithm? What if a clinician decides to ignore the prediction made by an algorithm and that turns out to be detrimental to the patient?

“These are important questions,” Mechelli said. “And we haven’t spent enough time thinking about all of them, or dedicated resources to really understand what these issues are, for the clinicians as well.”

Would machine learning on brain imaging be too biologically focused, homing in on physiological changes in the brain, and ignore contextual, societal factors that influence mental health?

“It’s a valid concern,” said Devin, a 22-year-old in Sacramento. She was diagnosed with depression at 17 years old, right after she was accepted into college, and is sharing only her first name to protect her privacy. She started with an antidepressant, Cymbalta, that her mom had been using for most of her life. Her doctors thought that because her mom did well with it, it would work for Devin as well. It didn’t.

One day, she was lying in bed, and when she tried to get up she fell to the ground on her hands and knees, overcome with nausea and vertigo. “I shouted to my roommates, ‘I’m going to need a minute.’ Everything was shifting. It was awful,” she said.

Devin also took part in Etkin’s study, and said she enjoyed getting her EEG scans.

She hopes that by participating in the research, someone in the future wouldn’t have to go through as much trial and error as she did. She ultimately found relief from her depression with TMS, and wishes she could have gone straight to that—that maybe something in her brain could have directed her there, a reminder that algorithms could one day indicate who would do best with non-pharmacological treatments too, like TMS or therapy.

“I wish I could have had this shift in mindset a long time ago,” Devin said, about how she feels after TMS. “I wish I wouldn’t have had to wait five years.”

Devin said that as long as a computer didn’t fully replace a human doctor checking in with her, she and others like her needed as much information as possible to make tough decisions about care. It didn’t reduce her to just her brain.

“To say that our mental world is represented by biology, particularly by neurobiology, does not and should not undermine the importance of our surroundings,” Gu said. “The brain is not a static organ. It is the most dynamic organ in your body, constantly responding to the outside world.”

A brain scanner would never fully replace a human person interviewing and talking with a patient. Calhoun envisions doctors being able to reference datasets or programs quickly, and use that to guide their decision making, and have it be another tool in their arsenal.

“I would say the most important tool remains the clinician, the person, the human,” Mechelli said. “That’s more powerful than anything else at the end of the day—that relationship, that projection between the patient and the doctor is probably the most powerful tool that we still have.”

Meghan agrees. “For me, I feel like the human connection part is the most important part because it’s so emotional. So I think it’s just a great combo: pairing the technology alongside the human connection part. But definitely I would always put the human part first.”

Follow Shayla Love on Twitter.