A cartoon shows a top-down view of rows of cubicle workers.
Image: Shutterstock

Underpaid Workers Are Being Forced to Train Biased AI on Mechanical Turk

Workers who label images on platforms like Mechanical Turk say they’re incentivized to make their responses fall in line with the majority—or risk losing work.
On the Clock is Motherboard's reporting on the organized labor movement, gig work, automation, and the future of work.

Like many other workers on Amazon's Mechanical Turk platform, Riley does the grueling and repetitive work of labeling images used to train artificial intelligence systems. Last year, he was one of the more than 6,000 microworkers assigned to help train ArtEmis, an algorithmic system that analyzes the emotional impact of more than 50,000 artworks from the 15th century to the present.

But this process—which prompted microworkers to respond to artworks with subjective labels like “excitement”, “amusement”, “fear”, “disgust”, “awe”, “anger” or “sadness”—was highly skewed toward producing certain results. Often, these low-paid workers on platforms such as Mechanical Turk are compelled to submit answers that fall in line with the majority—or risk losing jobs by deviating from the norm.
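Each ArtEmis response pairs an artwork with one of those emotion categories and a short written explanation. Here is a minimal sketch of what a single annotation record might look like, in Python; the field names and example values are hypothetical, not ArtEmis’s actual schema.

```python
from dataclasses import dataclass

# The emotion categories workers chose from, as described above.
EMOTIONS = ["excitement", "amusement", "fear", "disgust", "awe", "anger", "sadness"]

@dataclass
class ArtworkAnnotation:
    """One worker's response to one artwork (hypothetical schema)."""
    artwork_id: str    # identifier for the painting or image shown
    worker_id: str     # anonymized ID of the microworker
    emotion: str       # one of EMOTIONS
    explanation: str   # free-text justification for the chosen emotion

# An illustrative record of the kind the task produces.
example = ArtworkAnnotation(
    artwork_id="wikiart_00421",
    worker_id="worker_173",
    emotion="awe",
    explanation="The scale of the mountains makes the figures look tiny.",
)
assert example.emotion in EMOTIONS
```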


“To be honest, a lot of it felt very forced,” Riley told Motherboard. “There were many images that were just formless blobs or of basic objects. It was quite a stretch to come up with emotions and explanations at times.”  

Motherboard granted Riley and several other microtask workers anonymity because they feared retaliation and losing job opportunities on the platforms. Others approached for interviews cited non-disclosure agreements (NDAs) they had signed. 

While data annotations can themselves be affected by the individual unconscious biases of the workers, majority thinking can be perpetuated by employers or the platforms themselves, with the threat of a ban or rejection looming if people’s answers deviate too strongly from the majority. 

“If your answers just differ a little too much from everybody else, you may get banned,” said Sarah, who labels datasets for the Germany-based platform Clickworker and the Massachusetts-based Lionbridge. Sarah lives in a politically repressive country and finds the income from Clickworker essential to her livelihood. 

“I sometimes find myself thinking like, I think this is a wrong answer ... but I know that if I say what I really think I will get booted from the job, and I will get bad scores,” said Sarah. "And I'm like, okay, I will just do what they want me to do. Even though I think it's a shitty choice.” 


A paper published in February by researchers at Cornell, the Université de Montréal, the National Institute of Statistical Sciences, and Princeton University highlighted that “many of these workers are contributing to AI systems that are likely to be biased against underrepresented populations in the locales they are deployed in.” It further noted that “the development of AI is highly concentrated in countries in the Global North for a variety of reasons” (abundance of capital, well-funded research institutions, technical infrastructure). 

Riley said that the Mechanical Turk workers eligible to participate in the ArtEmis study “must be located in Australia, USA, Great Britain, or Canada,” among other requirements. The resulting dataset will therefore be dramatically skewed toward the Global North, even though responses elsewhere in the world could differ drastically. Previous studies have shown that 75 percent of Mechanical Turk workers are from the US. 

Mechanical Turk doesn’t stand alone in the market. Other platforms run by companies like Appen, Clickworker, IBM, and Lionbridge all rely on a similar business model. These platforms are known for recruiting low-paid remote workers, and have seen a boom in worker availability since the beginning of the pandemic. Many are expected to continue growing as they provide services for tech giants such as Google. 


When contacted by Motherboard, Clickworker confirmed Sarah's claims that the majority answers prevail when workers label data. “Usually tasks where answers have to be given are completed by three or more different Clickworkers (depending on the customer's request and quality requirements),” said Ines Maione, Clickworker's marketing manager, in emailed comments sent to Motherboard. “The answers are compared by the system automatically and the correct one can be ensured by majority decision.”
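Clickworker’s description amounts to a simple plurality vote over each task’s answers. Below is a minimal sketch of how such an aggregation step might work, in Python; the task IDs, labels, and tie-breaking behavior are illustrative assumptions, not the platform’s actual system.

```python
from collections import Counter

# Hypothetical example: each task has been labeled by three or more workers.
labels_by_task = {
    "artwork_001": ["awe", "awe", "sadness"],
    "artwork_002": ["amusement", "excitement", "amusement", "amusement"],
}

def majority_label(labels):
    """Return the most common label; ties are broken arbitrarily here."""
    return Counter(labels).most_common(1)[0][0]

for task_id, labels in labels_by_task.items():
    consensus = majority_label(labels)
    # Answers that differ from the consensus are the ones that get flagged.
    dissenters = [label for label in labels if label != consensus]
    print(task_id, consensus, f"{len(dissenters)} dissenting answer(s)")
```

Under a scheme like this, whatever most workers converge on becomes the “correct” answer, and dissenting responses are the ones that stand out, which is the dynamic the workers quoted above describe.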

However, experts warn that by using these methods, platforms like Clickworker and Mechanical Turk are effectively encouraging conformity. 

“You're teaching people to be compliant,” said Dr. Saiph Savage, director of the Human Computer Interaction Lab at West Virginia University and co-director at the National Autonomous University of Mexico’s Civic Tech lab. “You're also affecting creativity as well. Because if you're outside the norm, or proposing new things, you're actually getting penalized.”

Some microworkers recognize the potential issues with majority decisions and deliberately take extra time to clarify their answers, despite the lack of financial incentive to do so. In some cases the median pay is less than $3 per hour, so there is pressure to complete as many jobs as possible in a short space of time. “[Mechanical Turk] are really good about including an ‘anything else’ section at the end of each survey,” said Robert, another worker based in the US. “I sometimes use this section to further explain my position. It does tend to extend your survey time and may not make sense on the financial end,” he added. 

Alexandrine Royer, an educational program manager at the Montreal AI Ethics Institute, recently highlighted the importance of regulating microwork, or what she termed “ghost work”, partly on account of this issue. In an article published by the Brookings Institution, she noted that digital workers on the Mechanical Turk marketplace can spend hours labeling pictures they deem offensive, based on their own judgment calls. “It is difficult to quantify the bias that creeps in due to workers' own predispositions and interpretations. Yet, these workers perform tasks that can be highly subjective and culturally-dependent,” she told Motherboard. 


Savage said that deeply entrenched societal homophobia can affect data annotation tasks too. During an ongoing study of YouTube, her colleagues noticed that some videos were being censored that didn’t seem to violate any of YouTube’s terms of service. “It was basically videos that were related to LGBTQ+ content," she said. “YouTube doesn't ban LGBTQ+ content, but what was happening was that the workers that they had hired to oversee what content gets banned or not banned came from countries where being LGBTQ+ is considered against the law; those workers had their own biases that were influencing how they were labelling content.”

Sarah is often responsible for rating the “featured snippets” that appear as the boxed first answer at the top of search engine results. Robert has been involved in a wide variety of tasks, from “labeling the people in a park” to “talking to chat bots in order to familiarize programs with human interaction,” to slightly stranger tasks such as “ranking photos of human stool samples,” he said. However, some jobs have been downright disturbing, and the data annotation occasionally “exposes workers to graphic and violent images and has been linked to cases of post-traumatic stress disorder,” said Royer. 


A 2018 report by the United Nations’ International Labor Organization (ILO) on microtask workers highlighted the shockingly low pay, the tedious nature of the work, and the apparent negative impact on future employment prospects. It calculated that across five different platforms, workers earned an average of just $4.43 per hour in 2017—not taking into account the unpaid “invisible” work involved, such as searching for tasks, taking unpaid qualification tests, researching clients to mitigate fraud, and writing reviews, all of which can be very time-consuming. “Median earnings were lower, at just $2.16 per hour,” the report stated. 

The compensation is “probably minimum wage if you work through [the tasks] fast,” said Riley, the ArtEmis task worker. “...I can not advocate enough that this is NOT a primary job and do not listen to ANYONE who says otherwise.” 

Workers who label data for machine learning projects can also have their assessments rejected by clients, which means they may not even be paid. “If a requester decides to reject your work, there is no way to contest this and have them make a fair ruling. This is completely up to the requester and you basically did their work for free if they decide to be dishonest,” according to one Mechanical Turk worker cited in the ILO report. 

“Rejecting work means that workers will have completed tasks for an employer, but they're not going to get paid for it,” said Savage. “Being rejected stays forever on the record of the worker. And so, for instance, I had a worker who mentioned that she was rejected [after] she did over 1,000 tasks for an employer.” The worker was completely unable to clear their record and lost subsequent jobs and future opportunities as a result, but had nobody to whom they could complain. However, Amazon has reportedly started to change this by creating reputation systems for the employers themselves.  

Amazon did not respond to Motherboard's request for comment. 

In the end, microtask workers are unlikely to see the products of their labor, even though automated systems are projected to boost rates of profitability by an average of 38 percent by 2035. “The labour of these unseen workers generates massive profits that others capture,” stated the February study. It specifically cited the organization behind the commonly used ImageNet database, which pays workers a median of only around $2 per hour—presumably not a massive incentive to provide detailed, nuanced, and reflective annotations. Ultimately, “a deep chasm exists between workers and the downstream product,” the study concluded. “The exclusion of those from communities most likely to bear the brunt of algorithmic inequity only stands to worsen.”