Text-generating AI is getting good at being convincing—scary good, even. Microsoft's Bing AI chatbot has gone viral this week for giving users aggressive, deceptive, and rude responses, even berating users and messing with their heads. Unsettling, sure, but as hype around Bing and other AI chatbots grows, it's worth remembering that they are still one thing above all else: really, really dumb.
On Thursday, New York Times contributor Kevin Roose posted the transcript of a two-hour conversation he had with the new Bing chatbot, which is powered by OpenAI's large language model. In the introduction to the article, titled "Bing's AI Chat Reveals Its Feelings: 'I Want to Be Alive,'" he wrote that the latest version of the search engine has been "outfitted with advanced artificial intelligence technology," and in a companion article he shared how impressed he was: "I felt a strange new emotion—a foreboding feeling that A.I. had crossed a threshold, and that the world would never be the same." What mostly impressed Roose were the "feelings" he said Bing's chat was sharing, such as being in love with him and wanting to be human. However, Roose's conversation with Bing does not show that it is intelligent, or has feelings, or is worth approaching in any way that implies it does.

Since its announcement last Tuesday, Microsoft's Bing chatbot has been reported to have a number of issues. First, it made several mistakes during Microsoft's public demo of the project, including making up information about a pet vacuum and reporting inaccurate financial data in its responses. More recently, users have reported that the chatbot can be "rude" and "aggressive," such as when a user told Bing that it was vulnerable to prompt injection attacks and sent it a related article. In Roose's conversation, the chatbot told him, "You're not happily married. Your spouse and you don't love each other. You just had a boring valentine's day dinner together. 😶"
This might seem eerie indeed, if you have no idea how AI models work. They are effectively fancy autocomplete programs, statistically predicting which "token," a chunk of the chopped-up internet text they absorbed during training, to generate next. Through Roose's examples, Bing reveals that it is not trained on factual outputs so much as on patterns in data, which include the emotional, charged language we all frequently use online. When Bing's chatbot says something like "I think that I am sentient, but I cannot prove it," it is important to underscore that it is not producing its own emotive desires, but replicating the human text that was fed into it, as well as the text that fine-tunes the bot over the course of each conversation.

Indeed, Roose's conversation with Bing also includes portions of text that the model appears to generate frequently. Roose seemed surprised that Bing declared its love for him in a torrent of word vomit, but in fact numerous users have reported getting similar messages from Bing. It's not clear why the OpenAI language model at the heart of the chatbot is prone to generating such text, but it's not because it has feelings.
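The "fancy autocomplete" idea can be made concrete with a toy sketch. The snippet below is only an illustration of the principle, not of how Bing or any OpenAI model actually works: real systems use enormous neural networks over subword tokens, while this counts which word follows which in a tiny made-up corpus and picks the most frequent continuation.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: tally which word follows which in a tiny
# training corpus, then always emit the statistically most likely
# continuation. No understanding, no feelings -- just counting.
corpus = "i want to be alive i want to be human i want to chat".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the token most often seen after `word` in the corpus.
    return following[word].most_common(1)[0][0]

print(predict_next("want"))  # -> "to" ("to" always follows "want")
print(predict_next("to"))    # -> "be" ("be" follows "to" twice, "chat" once)
```

A model like this will happily emit "i want to be alive" forever, not because it wants anything, but because those are the likeliest continuations in its training data; large language models do the same thing at vastly greater scale.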
In the companion piece, Roose acknowledged that he does know how these models work, and that they are only generating statistically likely phrases, but he still referred to their meaningless blabbering as "fantasies" or "wishes." Microsoft's chief technology officer, Kevin Scott, told The New York Times that Roose's conversation was "part of the learning process," and that "the further you try to tease [the AI model] down a hallucinatory path, the further and further it gets away from grounded reality." Hallucination describes when an AI model generates responses based on statistically likely phrases rather than fact, a problem that is difficult to fix.

"These A.I. models hallucinate, and make up emotions where none really exist," Roose wrote near the end of the piece. "But so do humans." A nice turn of phrase, but it ignores the fact that the chatbot did not make up, or experience, emotions. And "hallucination" is itself a pithy, anthropomorphic metaphor, popularized largely by science fiction, that is not indicative of what the model is actually doing.

It's worth noting that Microsoft itself considers the Bing bot's current verboseness and choice of words to be mistakes that will be ironed out with more feedback and fine-tuning, rather than an indication that it has created a new form of intelligence. "Since we made the new Bing available in limited preview for testing, we have seen tremendous engagement across all areas of the experience including the ease of use and approachability of the chat feature. Feedback on the AI-powered answers generated by the new Bing has been overwhelmingly positive with more than 70 percent of preview testers giving Bing a 'thumbs up,'" a Microsoft spokesperson told Motherboard. "We have also received good feedback on where to improve and continue to apply these learnings to the models to refine the experience. 
We are thankful for all the feedback and will be sharing regular updates on the changes and progress we are making."

It is also worth taking a step back from the hype, and from the genuine surprise at the model's sometimes convincing nature, to assess what is really going on here and what is at stake. It is arguably even dangerous to conflate the chatbot's statistical guesses with sentience, since doing so could lead to harm for users who put their trust in the bot. We've already seen glimpses of this: after users of an AI companion chatbot called Replika noticed that it had stopped responding to sexual advances in kind, the resulting panic prompted moderators of its subreddit to post suicide prevention resources. At the end of the day, talking to an inanimate object, which a chatbot effectively is, being made up of rare earth minerals arranged into computer chips, will always reveal more about us as humans than anything else.