On Monday, a team of Google employees published a letter explaining the events leading up to Google’s firing of Dr. Timnit Gebru, a prominent AI ethics researcher whose work has helped to reveal racial bias in facial recognition algorithms.
Gebru co-authored an academic paper on the ethical considerations of machine learning models with researchers from the University of Washington, which was submitted to a conference after being approved internally. According to Gebru, Google then objected to the paper and asked her to retract it or remove her name from it. In response, Gebru laid out conditions Google would have to meet for her to stay at the company, including transparency about the paper's internal review, in exchange for removing her name; if the company declined, the parties would negotiate a date for her to leave. Shortly after she sent an email about the situation to an internal listserv, she was fired.
Jeffrey Dean, Google's artificial intelligence chief, has publicly stated that Gebru resigned. But this doesn't square with Gebru's own account, or with the fact that her manager, Samy Bengio, was unaware of any resignation. Moreover, under California law, a worker who expresses an intent to resign and provides a date by which they will, but is then immediately terminated, is considered fired, as Gebru was.
Motherboard obtained a copy of the paper that Gebru co-authored before Google fired her. The paper is about large language models, which are machine learning algorithms with billions of parameters. The paper examines the risks of human biases perpetuated by massive and potentially incomprehensible datasets, the carbon emissions associated with training large language models, and research efforts that develop models for manipulating and mimicking human language, instead of understanding it.
Machine learning models have been getting larger over the years, and more capable with less supervised training. OpenAI's GPT-3 model, for example, has 175 billion parameters and can produce convincing text from a brief prompt. Google, for its part, uses large machine learning models in its own products. These new capabilities are impressive, but experts worry about the biases (whether racial, gendered, or otherwise) that these powerful models may pick up from their underlying training data.
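To give a sense of where figures like "175 billion parameters" come from: the size of a transformer-based language model is dominated by its per-layer weight matrices, and a standard back-of-envelope formula (not from the paper; the layer count and width below are GPT-3's published hyperparameters, and the formula ignores embeddings, biases, and layer norms) reproduces GPT-3's scale:

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a decoder-style transformer.

    Each layer holds ~4 * d_model^2 attention weights (the Q, K, V,
    and output projections) plus ~8 * d_model^2 in the feed-forward
    sublayer (two d_model x 4*d_model matrices), so roughly
    12 * d_model^2 parameters per layer.
    """
    return 12 * n_layers * d_model ** 2


# GPT-3's reported configuration: 96 layers, model width 12,288.
estimate = approx_transformer_params(96, 12288)
print(f"{estimate:,}")  # ~174 billion, close to the reported 175B
```

The estimate lands within about one percent of the headline figure, which is why parameter counts grow quadratically as labs widen these models.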
"How big is too big?" is a question that the paper asks early on in its abstract. "It is important to understand the limitations of language models and put their success in context. This not only helps reduce hype which can mislead the public and researchers themselves regarding the capabilities of these LMs, but might encourage new research directions that do not necessarily depend on having larger LMs."
Gebru sought to present the paper at a computer science conference in March and submitted it for internal review on October 7th, a day before it was approved. Dean said in a statement that research teams are required to allow two weeks for review, but the letter published by Google Walkout for Real Change, the group behind the global 2018 walkouts over the company's handling of sexual harassment, notes that internal review and approval data show most submissions happen right before the deadline, and 41 percent happen after it. Gebru's timing, in other words, was not abnormal.
"There's no question that for whatever reason research censorship was used to retaliate against Timnit and Timnit’s work,” Meredith Whittaker, faculty director of the AI Now Institute, told Motherboard. “I think there is also no question that large scale language models are one of the more profitable forms of AI that are used in fundamental technologies like search. Google has a corporate interest in making sure that they're able to continue using those technologies and gaining that revenue without having to make the type of fundamental structure changes they would need to if they were truly listening to their ethics researchers like Timnit."
In 2018, Google's AI division created its own large language model—BERT—that was eventually incorporated into Search, which alone brought in $26.3 billion in revenue this past fiscal quarter. Would following the suggestions of a paper like Gebru's—which argues large language models have significant risks and are becoming too inscrutable to correct—force Google to put aside profits in pursuit of less harmful, smaller language models?
Audrey Beard, co-founder of the Coalition for Critical Technology and a machine learning researcher herself, suggests Google is choosing profitability here.
"The direction of machine learning has been towards increased performance, something the co-authors called out," Beard told Motherboard after analyzing the paper. "This is the de facto standard for how AI researchers evaluate the contributions of their developments to very tightly constrained technical problems, which lend themselves to quick and easily evaluated research. We, machine learning researchers, have done a really good job of translating this math into the space of actionable business problems, solutions, and money making ventures—take Google’s outfitting of consumer and business-based cloud computing. The models they're deploying are computationally intensive and they're also making a ton of money on the back-end by selling these resources to people building apps but don't want a computer on site."
All of this is amplified by the fact that researchers, especially at Google, have become increasingly dependent on corporations to provide support to formulate research questions and pursue answers. A similar problem has emerged in the gig economy, where corporations refuse to share data about operations that have significant impacts on the daily lives of drivers and riders who rely on them. When any data is revealed, it is either as a sunny PR move or in selective (and secret) excerpts to researchers whose work cannot be adequately peer-reviewed. Predictably, such work more often than not tends to reaffirm the company’s talking points.
"I also think that whatever was going on in this specific case, the overall issue it showed light on is that these companies can spike any research they want, they can retaliate against people who reach conclusions they don't find convenient for whatever reason,” Whittaker told Motherboard. “I think that goes to the point of why should we be trusting these companies and these institutions to create this form of knowledge, especially given the public stakes involved, especially given how important these technologies are becoming to everyday life and the significant risks that go along with that."
Whittaker suggests that new forums for research—whether it is a return to academia, more public funding of research, or greater involvement of philanthropic foundations—might make things better, along with "research integrity standards" at Google and across the private sector to prevent companies from turning research into propaganda. But, she added, it's not enough.
Gebru’s firing should highlight that academic researchers and Google employees need to unionize to realize greater power and autonomy in their own workplace, she said.
"This is a catalyst,” Whittaker said. “Increasingly elite workers—academic and tech workers who have identified themselves with their place within a hierarchy and identified more with the bosses—are recognizing that they've been invited to the table, but it's not their table. They may have some privilege, but that privilege ends when they displease people who are increasingly making decisions that a lot of these workers morally and ethically can't support."
Over the past few years, thousands of workers have taken part in large protests and actions at Google over the company's handling of sexual assault, contracts with government agencies, retaliation against employee organizers, and now Gebru's firing. A petition by the Google Walkout group now has signatures from 2,040 Googlers and 2,658 supporters across academia, the industry, and in civil society groups.
Google did not immediately respond to Motherboard’s request for comment.