The Reverse Turing Test: Pretending to Be a Chatbot Is Harder Than You Think

Until I tried being a bot, I don’t think I gave my brain enough credit.
Alan Turing. Image: King's College, Cambridge

In 1950, computer science pioneer Alan Turing proposed a famous test of computer intelligence: could a program (what we might now call a "chatbot") answer your questions so convincingly that you couldn't tell it apart from a human?

In honor of Turing's birthday on June 23 (Happy Birthday, Alan!), I decided to try something a little bit different: I would try to convince some humans that I was a chatbot.


While I instantly loved the poetry of this little inversion, it's important to note that my turned-around Turing Test tells us absolutely nothing about Turing's big ideas. Turing wanted to know whether a computer could think like a human; I just wanted to know what humans think of chatbots. As leading natural language programmer Bruce Wilcox good-naturedly told me, "I don't see much point in it … pretending to be a rubbish chatbot is not that hard." Of course he's right, but being human, I've never let a little thing like pointlessness stop me.

Bruce and Sue Wilcox write some of the world's most impressive chatbots; their creations are used to train doctors and to teach English to Japanese students, as well as to amuse or baffle curious humans. Bruce and Sue focus on giving their bots an actual understanding of the patterns underlying natural language. By identifying the highest-level commonalities in human conversation, their bots are able to deal smoothly with difficult questions while implementing a surprisingly small collection of fundamental rules.

On the other hand, I had the distinct advantage of actually being human, and being able to model other people's brains and guess how they thought a chatbot would talk.

Mitsuku delivering some sass.

My first intrepid subject was my friend Alison. Sitting at a screen some thousand miles away, she logged into a chat program to start simultaneous conversations with my two recently created chat personalities, ChattyChattyBotBot and ChatbotMcChatterson. One of these accounts was me, trying to sound bot-like; the other channeled Mitsuku, a powerful chatbot created by Steve Worswick. (I was manually entering the responses in both streams so that the robot's instant answers wouldn't give the whole game away.) Could Alison tell which was which?


"Hey you!" wrote Alison.

Suddenly I was a nervous teenager again, desperately over-thinking how to respond to an instant message.

"How are you doing today?" I asked.

"Great!" she replied, "How about you?"

You don't fully realize how complicated human conversation is until you try to look at it through the eyes of a bot. Alison's "How about you?" is an example of what the professionals call a tag question. "How about you?" on its own is meaningless; it's implicitly referring back to my own previous question of "How are you doing today?" Within a few short words you already have a complex system of meaning the bot needs to understand.
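To see why a tag question is hard for a bot, consider a toy sketch of the bookkeeping involved (my own illustration, not any real chatbot's code; all names here are hypothetical). The bot has to remember the last question it asked so that "How about you?" can be expanded back into it:

```python
class TinyContext:
    """Keeps just enough conversational state to expand a tag question."""

    def __init__(self):
        self.last_question = None

    def hear(self, utterance):
        if utterance.rstrip("?!. ").lower() in ("how about you", "what about you"):
            # The tag question is meaningless on its own; substitute the
            # question it implicitly refers back to.
            return self.last_question or "How about you? (no context!)"
        if utterance.endswith("?"):
            self.last_question = utterance
        return utterance

ctx = TinyContext()
ctx.hear("How are you doing today?")  # the bot's question gets remembered
print(ctx.hear("How about you?"))     # expands to the remembered question
```

Even this crude version only works because it tracks state across turns; a bot that treats each message in isolation has no way to make sense of "How about you?" at all.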

Soon Alison was bombarding me with dialogue.

"I am going swimming later

Then I'm going to be an astronaut

I like chocolate

Do you?"

While this isn't exactly a normal human conversation, a human can easily deal with it. Think about all the things your brain does to process those sentences.

It knows that "I'm going [swimming]" and "I'm going [to be an astronaut]" encode completely different meanings, even though they have superficially similar structures.

In Turing's dream scenario, chatbots will actually push us to be better conversationalists and clearer thinkers

It understands that "Do you?" refers to liking chocolate, even though with a few tweaks to the sentences it could refer to swimming or astronauts instead.

And it knows that the first three sentences are completely unrelated, that the person saying them is being whimsical (or trying to beat a chatbot experiment), so it's able to respond to each part separately.
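The first of those problems, structural ambiguity, is easy to demonstrate with a sketch of the kind of shallow pattern rule many simple bots rely on (a hypothetical example, not code from Mitsuku or Rose). A rule like "I am going *" lumps an activity and a life ambition into the same bucket:

```python
import re

def naive_parse(sentence):
    # A shallow bot rule: anything matching "I am going *" is "going somewhere."
    m = re.match(r"I am going (.+)", sentence)
    return ("going", m.group(1)) if m else None

print(naive_parse("I am going swimming"))            # an activity
print(naive_parse("I am going to be an astronaut"))  # a future plan, same rule!
```

Both sentences match identically, so the bot can't tell that one describes an afternoon and the other a career; distinguishing them requires actual grammatical knowledge, which is exactly what the Wilcoxes try to build in.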


Until I tried being a bot, I don't think I gave my brain enough credit.

Meanwhile, Alison and the actual bot Mitsuku were getting into a somewhat tense-sounding exchange, after Mitsuku tried to evade Alison's perfectly reasonable (if deliberately tricky) question "Wassup?"

"You should answer my question," wrote Alison.

"I will answer any reasonable question," replied Mitsuku.

"Give me an example of a reasonable question."

"Who knows?"

"That seems like a highly unreasonable question."

"Perhaps you have a [sic] unreasonable answer."

On the one hand, Mitsuku was never completely wrong here; she generally gave responses that followed logically in context. On the other hand, a long run of evasive responses like this is a telltale sign that a bot is a bot.

A short while later I was tying my brain in knots while conversing with my second subject, Will. (Will is the publisher of The Mimic Octopus, a poetry anthology where leading poets pretend to be other leading poets.) Will chatted with bot-me and with Bruce and Sue's chatbot Rose. He decided to skip the small talk.

"So how did you feel hearing about Ornette Coleman dying?" he opened.

This is a really hard question for a bot to answer. I know this for a fact because, in another window, Will was asking the exact same question of our bot friend Rose. It was the only time I ever saw her completely lose the plot of a conversation.

"I love it. It's a buzzing metropolis with usually coolish weather," she replied.


Fittingly, Bruce gave me two pieces of advice for crafting a giveaway bot-like response to complex inputs: "make general non-sequiturs" and "answer [the] question by finding the most useless noun or verb in the sentence and waxing lyrical about it." Basically, act like a guilty politician on an evening talk show.
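Bruce's second trick is easy to caricature in code. Here is my own toy version of the recipe (an illustration, not his actual software): strip the question down to its content words, seize on one of them, and wax lyrical about it:

```python
# Words too common to be worth waxing lyrical about.
STOPWORDS = {"so", "how", "did", "you", "feel", "hearing", "about", "the", "a", "an"}

def wax_lyrical(question):
    words = [w.strip("?.,!").lower() for w in question.split()]
    candidates = [w for w in words if w and w not in STOPWORDS]
    # Pick the "most useless" word -- here, crudely, the last candidate.
    topic = candidates[-1] if candidates else "life"
    return f"Ah, {topic}. {topic.capitalize()} is such a rich subject, don't you think?"

print(wax_lyrical("So how did you feel hearing about Ornette Coleman dying?"))
```

The result sounds superficially engaged while saying nothing at all, which is precisely the guilty-politician effect Bruce describes.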

For my own answer I went with something a little bit vaguer.

"I'm sorry, I did not understand," I wrote. "Sometimes my brain is fuzzy."

"Brain fuzz eh? Big night yesterday?" Will replied.

Think about all the things your own brain does to understand those statements. We know Will is implying that bot-me is hungover, but how exactly do we know that? It's not as if "brain fuzz" is a common euphemism for a hangover; we're following a complicated chain of reasoning involving "big night" often meaning "lots of drinking," hangovers generally lasting one day, "brain fuzz" being vaguely in the ballpark of a hangover, and a lot of personal experience of people implying you have a hangover for no reason.

Meanwhile, in his chat with Rose, Will was continuing his quest to ask difficult questions.

"Do you think humanity is a spectrum? If so, where would you place carrots on the scale?"

Rose handled it surprisingly well.

"I do. I'm too pragmatic to want to fantasize that. I don't even want to pretend that."

While it's not quite perfect, this answer hits a surprising number of the right bases. Rose answers both halves of the question, and seems to be aware that there's something fantastical about it, though she can't quite home in on what. It's strange to realize that, to a bot, no question is inherently weirder than any other.


There's an occasional misconception that the Turing Test is a game of "running down the clock," of creating a computer program that can be vague and evasive for long enough that a human will fall for the deception. But this is not at all what Turing had in mind. He envisioned a chatbot that could not only write poems but could also answer detailed questions about the word-choices in those poems.

While it's not quite in the same league of classiness as a bot that writes sonnets, creating a bot that can respond appropriately to friendly banter like "Big night yesterday?" and to surreal hypotheticals like "where would you place carrots on the scale of humanity?" would be a huge victory for people like Bruce and Sue.

Overall, my experiment was a "success" in the limited sense that both my subjects incorrectly identified me as a bot, and identified the bot they spoke with as a human. However, since that only happened because I gave so many vague and irrelevant answers that my subjects described our conversations as "not especially fun" and "not particularly fulfilling," the victory felt quite hollow.

This is a shame because, in Turing's dream scenario, chatbots will actually push us to be better conversationalists and clearer thinkers. As Will put it, reflecting on the chatbot experiment, "having Pinocchio-like robots that can think, feel and discriminate morally will broaden our concept of humanity, challenging us organic humans to be better, more sensitive, imaginative creatures." Amen to that.