Teaching computers to understand human language used to be a tedious and imprecise process. Now, language algorithms analyze oceans of text to teach themselves how language works. The results can be unsettling, such as when the Microsoft bot Tay taught itself to be racist after a single day of exposure to humans on Twitter.

It turns out that data-fueled algorithms are no better than humans; frequently, they’re worse.
“Data and datasets are not objective; they are creations of human design,” writes data researcher Kate Crawford. When designers miss or ignore the imprint of biased data on their models, the result is what Crawford calls a “signal problem,” where “data are assumed to accurately reflect the social world, but there are significant gaps, with little or no signal coming from particular communities.”

Siri, Google Translate, and job applicant tracking systems all use the same kind of algorithm to talk to humans. Like other machine learning systems, NLPs (short for “natural language processors” or sometimes “natural language programs”) are bits of code that comb through vast troves of human writing and churn out something else: insights, suggestions, even policy recommendations. And like all machine learning applications, an NLP program’s functionality is tied to its training data, the raw information that has informed the machine’s understanding of the reading material.

Skewed data is a very old problem in the social sciences, but machine learning hides its bias under a layer of confusion. Even AI researchers who work with machine learning models (like neural nets, which use weighted variables to approximate the decision-making functions of a human brain) don’t know exactly how bias creeps into their work, let alone how to address it.

As NLP systems creep into every corner of the digital world, from job recruitment software to hate speech detectors to police data, that signal problem grows to fit the size of its real-world container. Every industry that uses machine learning language tools risks contamination. Algorithms given jurisdiction over public services like healthcare frequently exacerbate inequalities, excusing the ancient practice of shifting blame onto the most vulnerable populations for their circumstances in order to redirect the best services to those least in need; models that try to predict where crime will occur can wind up making racist police practices even worse.
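To make the link between training data and output concrete, here is a minimal sketch, assuming Python and scikit-learn (neither is named in the article) and using invented toy data: a tiny text classifier trained on skewed hiring labels reproduces the skew as if it were signal.

```python
# A minimal, hypothetical sketch: the sentences and labels below are
# invented for illustration, not drawn from any real hiring system.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy "training data": past decisions that happen to track a gendered
# word rather than anything about qualifications.
texts = [
    "he led the engineering team",      # hired
    "he shipped the backend rewrite",   # hired
    "she led the engineering team",     # rejected
    "she shipped the backend rewrite",  # rejected
]
labels = [1, 1, 0, 0]  # 1 = hire, 0 = reject

vectorizer = CountVectorizer().fit(texts)
model = LogisticRegression().fit(vectorizer.transform(texts), labels)

# The only feature separating the two classes is the pronoun, so the
# model faithfully learns the bias baked into the labels.
print(model.predict(vectorizer.transform(["she led the engineering team"])))  # -> [0]
```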
BIAS FROM THE MACHINE
SEXISM, EMBEDDED
Essentially, word embeddings find parallels in usage, according to María De-Arteaga, a Carnegie Mellon researcher specializing in machine learning and public policy. “Word embeddings are very popular, especially when you don’t have much data,” De-Arteaga told Motherboard.

Embeddings are the reason Google search is so powerful and chatbots can follow a conversation: their insights into how words are used by real people are fundamental to an NLP’s understanding of a language. But those insights, the data contained in Word2vec and other word embedding datasets, are trained on old books and movie reviews, and sometimes tweets. An algorithm that turns to embeddings to understand the real world is still encumbered by the biases of every human voice represented in that data.

“If you want to use word embeddings to analyze how women have been represented in the media, then the presence of that bias is actually useful,” De-Arteaga said, “but if you’re saying you’re using it as a dictionary, as your ground truth, then you’re considering bias as truth.”

De-Arteaga’s recent projects focus on limiting the skewing power of word embeddings in different contexts, for example by using “fairness constraints” to force the model to take away points for accuracy when it relies too much on stereotypes. Another approach she’s tried: “scrubbing” the data, or completely removing gender-linked aspects of words from the dataset before analysis. Both scrubbing and fairness constraints help reduce sexist outputs, but not enough, says De-Arteaga.
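For readers who want to poke at these datasets themselves, here is a minimal sketch, assuming the gensim library and its hosted copy of the Word2vec Google News vectors (the article names Word2vec but not gensim). One query probes the embedding for learned gender associations; a crude regex then stands in for the kind of “scrubbing” De-Arteaga describes, whose real pipeline is more involved.

```python
# A minimal sketch: the Google News vectors are a ~1.6 GB download, and
# the exact neighbours returned will vary; this only illustrates the
# probing technique, not any particular research result.
import re
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# Probe the embedding for learned associations:
# "man is to programmer as woman is to ...?"
print(vectors.most_similar(positive=["woman", "programmer"],
                           negative=["man"], topn=3))

# A crude stand-in for "scrubbing": strip explicit gender markers from a
# text before a downstream model sees it.
GENDERED = re.compile(r"\b(he|she|him|her|his|hers|mr|mrs|ms)\b", re.IGNORECASE)

def scrub(text: str) -> str:
    return GENDERED.sub("_", text)

print(scrub("She is a nurse and he is a surgeon."))
# -> "_ is a nurse and _ is a surgeon."
```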
MORE PRONOUNS, FEWER ASSUMPTIONS
One overlooked cause of that opacity is the fact that known debiasing methods are especially weak in languages with grammatical gender, like Spanish, which puts neckties and women in the same grammatical class (la corbata; la mujer) in contrast to, say, dresses and men (el vestido; el hombre). The whole NLP field is absurdly English-centric, but gender debiasing approaches are especially hard to translate. The models wind up having to choose between gender balance and correct grammar, which means they’re useless either way (the sketch at the end of this article illustrates why).

A key exception to that is Zmigrod’s approach, which has shown promise in gendered languages like Spanish and Hebrew, though it takes a lot of processing power to keep up with all the standard gendered articles and endings, let alone new pronouns to describe nonbinary or gender-neutral identities.

Attempts to regulate the use of NLPs containing bias (that is, all of them) may be somewhere on the distant political horizon. Until then, it’s possible that the best way to intervene in a biased pipeline is to start with the researchers themselves. Yasmeen Hitti and Andrea Eunbee Jang of Mila initially came together as part of the AI for Social Good summer lab, which takes aim at machine learning bias by bringing women researchers onto AI projects early on.

Hitti, Jang, and fellow researcher Carolyne Pelletier are now deep in the data-mining phase of their current project on gender generalizations, but they’re also looking ahead to new ways to build justice into the pipeline.

“In our paper, we talk about […] two genders, male and female, but we also consulted with non-binary activists to see if our model could be adapted to their needs,” explained Pelletier. “But it’s not that different. When you think about it, sentences like ‘A programmer must always carry his laptop’ are biased against both she programmers and they programmers.”

Finally, there is the source of the data: human bias. “Our goal is to train models [to be less sexist], but maybe in the end, it’s easier to train a human,” Hitti told Motherboard. “We’re spending a lot of time trying to teach the machine, but we have intelligence, too. Maybe we could put a little effort into being more inclusive.”
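As a rough illustration of the grammatical-gender problem, here is a minimal sketch, assuming only standard Python; the swap table and the Spanish example are illustrative and not Zmigrod’s actual system. In English, a naive gender swap is close to a dictionary lookup; in Spanish, articles and endings must be re-inflected to keep the sentence grammatical, which is what makes the gendered-language case so much heavier.

```python
# A minimal sketch, assuming only the Python standard library.
EN_SWAP = {"he": "she", "she": "he", "his": "her",
           "him": "her", "her": "his"}

def swap_english(sentence: str) -> str:
    # Naive word-for-word substitution; real systems also need
    # part-of-speech information (e.g. "her" can map to "him" or "his").
    return " ".join(EN_SWAP.get(word.lower(), word) for word in sentence.split())

print(swap_english("A programmer must always carry his laptop"))
# -> "A programmer must always carry her laptop"

# In Spanish, the same trick breaks grammatical agreement:
#   "El programador lleva su portátil"
# Swapping only the noun yields "El programadora ..." (the article "El"
# no longer agrees); a correct counterfactual has to re-inflect the
# whole phrase: "La programadora lleva su portátil".
```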