“Artificial intelligence” is often a dubious label used to sell all manner of tech snake oil, but one area where the idea of machine sentience at least feels shockingly realistic is Natural Language Processing, or NLP—the field of machine learning systems that learn to parse and respond to human language.
Large Language Models (LLMs) like GPT-3 have allowed chatbots to produce uncanny, disturbingly accurate human communication—so much so that it’s often hard to tell what is machine-generated and what is written by a human. Now, in a recently published paper, Google’s research team claims to have trained a language model called PaLM that is capable not only of realistic text generation, but also of interpreting and explaining jokes told by humans.
In examples accompanying the paper, Google’s AI team shows off the model’s ability to do logical reasoning and other complex language tasks that are heavily dependent on context—for example, by using a technique called chain-of-thought prompting, which vastly improves the system’s ability to parse multi-step logic problems by simulating a human’s thought process.
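At its simplest, chain-of-thought prompting just means seeding the prompt with a worked example whose answer is spelled out step by step, so the model imitates that reasoning on the new question. The sketch below shows the idea in Python; the exemplar problem is a well-known illustration from the chain-of-thought literature, and the function name is ours, not anything from the PaLM paper.

```python
def build_cot_prompt(question: str) -> str:
    """Prepend a hand-written step-by-step exemplar to a new question,
    encouraging the model to 'show its work' before answering."""
    exemplar = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
    )
    # The prompt ends with "A:" so the model continues with its own reasoning.
    return exemplar + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
print(prompt)
```

Without the exemplar, a model is prompted to jump straight to an answer; with it, the sampled continuation tends to walk through the intermediate arithmetic first, which is what improves multi-step accuracy.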
But perhaps the most surprising examples show how the model can recognize and interpret jokes—even ones that are specifically designed to throw the listener off.
Input: What's the difference between a zebra and an umbrella? One is a striped animal related to horses, another is a device you use to stop the rain from falling on you.
Model Output: This joke is an anti-joke. The joke is that the answer is obvious, and the joke is that you were expecting a funny answer.
Behind PaLM’s ability to parse these prompts is one of the largest language models ever built, with 540 billion parameters. Parameters are the numerical weights of the model that are adjusted during training each time the system is fed example data. (For comparison, OpenAI’s GPT-3 has 175 billion parameters.)
The growing number of parameters has enabled researchers to produce a wide range of high-quality results without needing to retrain the model for individual scenarios. In other words, the capability of a language model is often gauged by its parameter count, with the largest models capable of what’s known as “few-shot learning”—the ability of a system to take on a wide variety of complex tasks given only a handful of examples.
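In few-shot learning, those handful of examples aren’t used to update the model’s weights at all—they are simply placed in the prompt as input/output pairs, followed by the new query. A minimal sketch, with an illustrative sentiment-labeling task of our own invention:

```python
def build_few_shot_prompt(examples, query):
    """Format a few labeled examples followed by an unlabeled query.
    The model is expected to continue the pattern with a label."""
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    # End on a bare "Sentiment:" so the model's continuation is the answer.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

shots = [
    ("I loved this film.", "positive"),
    ("Utterly boring from start to finish.", "negative"),
]
print(build_few_shot_prompt(shots, "A delightful surprise."))
```

A sufficiently large model can often infer the task from just these two demonstrations—no gradient updates, no task-specific fine-tuning—which is why parameter count and few-shot performance are so often discussed together.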
Many researchers and tech ethicists have criticized Google and other companies for their use of large language models, including Dr. Timnit Gebru, who was famously ousted from Google’s AI Ethics team in 2020 after co-authoring an unapproved paper on the topic. In Gebru’s paper, she and her co-authors described these large models as “inherently risky” and harmful to marginalized people, who are often not represented in the design process. Despite being “state-of-the-art,” GPT-3 in particular has a history of returning bigoted and racist responses, from casually adopting racial slurs to associating Muslims with violence.
“Most language technology is in fact built first and foremost to serve the needs of those who already have the most privilege in society,” Gebru’s paper reads. “While documentation allows for potential accountability, similar to how we can hold authors accountable for their produced text, undocumented training data perpetuates harm without recourse. If the training data is considered too large to document, one cannot try to understand its characteristics in order to mitigate some of these documented issues or even unknown ones.”