Tech

Startup Uses AI Chatbot to Provide Mental Health Counseling and Then Realizes It 'Feels Weird'

“We pulled the feature anyway and I wanted to unravel the concern as a thought piece, to help reign in enthusiasm about gpt3 replacing therapists.”
Image: Koko

A mental health nonprofit is under fire for running an "experiment" in which an AI chatbot helped provide support to people seeking counseling, testing the technology on real people.

“We provided mental health support to about 4,000 people — using GPT-3. Here’s what happened,” Rob Morris, a cofounder of the mental health nonprofit Koko, tweeted Friday. “Messages composed by AI (and supervised by humans) were rated significantly higher than those written by humans on their own (p < .001). Response times went down 50%, to well under a minute … [but] once people learned the messages were co-created by a machine, it didn’t work. Simulated empathy feels weird, empty.” Morris, who is a former Airbnb data scientist, noted that AI had been used in more than 30,000 messages.


In a video demo posted in a follow-up tweet, Morris shows himself engaging with the Koko bot on Discord, where he asks GPT-3 to respond to a negative post someone wrote about having a hard time. "We make it very easy to help other people and with GPT-3 we're making it even easier to be more efficient and effective as a help provider. … It’s a very short post, and yet, the AI on its own in a matter of seconds wrote a really nice, articulate response here,” Morris said in the video.

In the same tweet thread, Morris said that the messages composed by AI were rated significantly higher than those written by humans, and that response times went down by 50 percent with the help of AI. Yet, he said, when people learned that the messages were written with an AI, they felt disturbed by the “simulated empathy.” 

Koko uses Discord to provide peer-to-peer support to people experiencing mental health crises and those seeking counseling. The entire process is guided by a chatbot and is rather clunky. In a test done by Motherboard, a chatbot asks whether you're seeking help with "Dating, Friendships, Work, School, Family, Eating Disorders, LGBTQ+, Discrimination, or Other," has you write down what your problem is and tag your "most negative thought" about it, and then sends that information off to someone else on the Koko platform. 


In the meantime, you are prompted to provide help to other people going through a crisis; in our test, we were asked to choose from four responses to a person who said they were having trouble loving themselves: "You're NOT a loser; I've been there; Sorry to hear this :(; Other," and to personalize the message with a few additional sentences.

On its Discord server, Koko promises that it "connects you with real people who truly get you. Not therapists, not counselors, just people like you."

AI ethicists, experts, and users seemed alarmed at Morris's experiment.

“While it is hard to judge an experiment's merits based on a tweet thread, there were a few red flags that stood out to me: leading with a big number with no context up front, running the 'experiment' through a peer support app with no mention of a consenting process or ethics review, and insinuating that people not liking a chatbot in their mental health care was something new and surprising,” Elizabeth Marquis, a Senior UX Researcher at MathWorks and a PhD candidate at the University of Michigan, told Motherboard. 

Emily M. Bender, a Professor of Linguistics at the University of Washington, told Motherboard that trusting AI to treat mental health patients has a great potential for harm. “Large language models are programs for generating plausible sounding text given their training data and an input prompt. They do not have empathy, nor any understanding of the language they are producing, nor any understanding of the situation they are in. But the text they produce sounds plausible and so people are likely to assign meaning to it. To throw something like that into sensitive situations is to take unknown risks. A key question to ask is: Who is accountable if the AI makes harmful suggestions? In this context, is the company deploying the experiment foisting all of the accountability onto the community members who choose the AI system?” 


After the initial backlash, Morris posted updates to Twitter and told Motherboard, “Users were in fact told the messages were co-written by humans and machines from the start. The message they received said ‘written in collaboration with kokobot’, which they could decide to read or not. Users on Koko correspond with our bot all the time and they were introduced to this concept during onboarding.” 


“It seems people misinterpreted this line: ‘when they realized the messages were a bot…,’” Morris said. “This was not stated clearly.”

“They rated these (AI/human) messages more favorably than those written just by humans. However, and here’s the nuance: as you start to pick up on the flavor of these messages over time, (at least to me), you can start to see which were largely unedited by the help provider. You start to see which seem to be just from the bot, unfiltered. That changes the dynamic in my opinion,” he added. 

Morris also told Motherboard and tweeted that the experiment is exempt from informed consent requirements, which would have obliged the company to provide each participant with a written document describing the possible risks and benefits of the experiment so they could decide whether to participate. He claimed that because Koko didn’t use any personal information and has no plan to publish the study publicly, the experiment was exempt from needing informed consent. This suggests that the experiment did not go through any formal approval process and was not overseen by an Institutional Review Board (IRB), which is required for research experiments that involve human subjects and access to identifiable private information. 


"Every individual has to provide consent when using the service. If it were a university study (which it’s not), this would fall under an ‘exempt’ category of research," he said. "This imposed no further risk to users, no deception, and we don’t collect any personally identifiable information or protected health information (no email, phone number, ip, username, etc). In fact, previous research we’ve done, along these lines, but with more complexity, was exempt."

“This experiment highlights a series of overlapping ethical problems. The study doesn’t seem to have been reviewed by an Institutional Review Board, and the deception of potentially vulnerable people should always raise red flags in research,” Luke Stark, an Assistant Professor in the Faculty of Information & Media Studies (FIMS) at Western University in London, Ontario, told Motherboard. “The fact that the system is good at formulating routine responses about mental health questions isn’t surprising when we realize it’s drawing on many such responses formulated in the past by therapists and counsellors and available on the web. It’s unethical to deceive research participants without good reason, whether using prompts provided by a natural language model or not.”

“Anything billed as mental health support is clearly a sensitive context and not one to just experiment on without careful ethical review, informed consent, etc,” Bender told Motherboard. “If [experiments] are to be conducted at all, there should be a clear research question being explored and ethical review of the study before it is launched, according to the well-established principles for the protection of human subjects. These review processes balance benefits to society against the potential for harm to research subjects.”

Both Bender and Marquis agreed that if AI is to be used for psychological purposes, impacted communities, people with lived mental health experience, community advocates, and mental health experts need to be key stakeholders in the development process, rather than just anonymous users or data subjects. 

To Morris, Koko’s main goal is to create more accessible and affordable mental health services for underserved individuals. “We pulled the feature anyway and I wanted to unravel the concern as a thought piece, to help reign in enthusiasm about gpt3 replacing therapists,” he told Motherboard. 

“I think everyone wants to be helping. It sounds like people have identified insufficient mental health care resources as a problem, but then rather than working to increase resources (more funding for training and hiring mental health care workers) technologists want to find a short cut. And because GPT-3 and its ilk can output plausible sounding text on any topic, they can look like a solution,” Bender said. 

“From the real need for more accessible mental health resources to AI being a relatively cheap and scalable way to make money in the right application, there are a myriad of reasons that AI researchers and practitioners might want to employ AI for psychological purposes,” Marquis said. “Computer science programs are just beginning to teach ethics and human-centered AI is a relatively new field of study, so some might not see the warnings until too late. Perhaps least charitably, it can be convenient to ignore warnings about not personifying AI when you're in a field that values efficiency and advancement over all.”