China Is Achieving AI Dominance by Relying on Young Blue-Collar Workers
To remain the world leader in artificial intelligence, China relies on young “data labelers” who work eight hours a day processing massive amounts of data to make computers smart.
Employees at Jun Peng Technology, the only data labeling shop in Minquan, a town of 318,000 in Henan.
MINQUAN, CHINA - Zhou Junkai’s office sits on the edge of the Dongsha River, a staid body of water that divides the old and new sections of Minquan, a town of 318,000 in central China’s Henan province. It is here that Zhou, 19, founded his small shop of data labelers along with his 26-year-old cousin this summer.
Jun Peng Technology’s office is a rented traditional courtyard home of the kind found in China’s rural areas. These homes are large and two or three stories tall, unlike the ubiquitous apartment towers seen across China. Behind the house, a man is raking dead leaves on a plot of land that Zhou said is still used for crops.
Inside, the only warm room is the office, where a dozen young people sit in front of wide, glowing screens. The screens and the fluorescent lights do little to brighten the room on a November day when pollution has blocked the sun with a dense, smoggy haze.
The young people are “data labelers,” people who sit in front of computers for eight hours a day and click on dozens of photos, outlining backgrounds, foregrounds, and specific objects, all according to the specifications of a client who is working on artificial intelligence. Some may label medical scans; others, photos of landscapes and trees; and still others, pictures of the road for a driverless vehicle. This is the data given to artificial intelligence algorithms to learn to “see.” The artificial intelligence industry relies on this cheap, human labor as algorithms and “machine learning” are in many cases trained by real people.
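To make the task concrete, here is a minimal sketch in Python of what one finished label might look like once it is saved. The record format, field names, file name, and class name are all hypothetical, chosen for illustration; they are not the format used by Jun Peng Technology or its clients.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One outlined object in one image (hypothetical record format)."""
    image_file: str   # which picture the outline belongs to
    label: str        # what the client wants the object called
    outline: list     # (x, y) pixel coordinates clicked by the labeler

    def bounding_box(self):
        """Smallest rectangle (left, top, right, bottom) around the outline."""
        xs = [x for x, _ in self.outline]
        ys = [y for _, y in self.outline]
        return min(xs), min(ys), max(xs), max(ys)


# Made-up example: a lane marking traced on a road photo.
example = Annotation(
    image_file="road_0001.jpg",
    label="lane_marking",
    outline=[(120, 340), (410, 332), (415, 350), (124, 358)],
)
print(example.label, example.bounding_box())
```

An algorithm trained on many such records is shown both the picture and the outline, which is how a hand-drawn shape becomes a training example.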
Artificial intelligence requires large amounts of data, whether pictures, audio, or text, to learn and discern patterns, because algorithms interpret media differently from humans. An algorithm needs thousands to millions of pictures of apples before it can accurately recognize that an apple is an apple. Further, such systems are easily fooled. In one experiment, security researchers found that after they distorted a picture of a school bus in a way invisible to the human eye, the artificial intelligence system could no longer recognize it as a school bus.
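A toy illustration of why small distortions can matter, sketched in Python. This is not the researchers’ method or model: the four-pixel “image,” the weights, and the 0.01 step size are invented for the example, and the classifier is a bare linear score rather than a real neural network.

```python
# Toy linear classifier: a positive score means "school bus".
# The weights and pixel values are made up for illustration.
weights = [0.9, -0.6, 0.4, -0.8]
image = [0.55, 0.50, 0.55, 0.50]   # pixel intensities between 0 and 1


def score(pixels):
    """Weighted sum of the pixel values."""
    return sum(w * p for w, p in zip(weights, pixels))


def classify(pixels):
    return "school bus" if score(pixels) > 0 else "not a school bus"


def perturb(pixels, epsilon):
    """Nudge every pixel by epsilon in the direction that lowers the score."""
    return [p - epsilon * (1 if w > 0 else -1)
            for w, p in zip(weights, pixels)]


distorted = perturb(image, epsilon=0.01)
print(classify(image))       # classification of the original pixels
print(classify(distorted))   # classification after a 0.01 change per pixel
```

Each pixel moves by only 0.01 on a 0-to-1 scale, yet the changes all push the score the same way and together flip the answer. Real images have far more pixels, so the change to each one can be smaller still.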
Money is flowing into China’s artificial intelligence industry, and few places illustrate that better than Henan. In a province known just a few years ago for its Foxconn plant (which makes Apple products) and electronics factories, towns now boast offices of workers who are doing the laborious input work that makes computers smart.
Last year, venture capitalists poured $5 billion into AI startups in China, which raised more money in the sector than the United States for the first time, according to consultancy ABI Research. The Chinese government has made the field a priority, announcing an ambitious policy in the summer of that year to build an industry worth $150 billion by 2030.
AI is also one of the ten key industries outlined in Made in China 2025, an economic master plan that the government is pushing to take the country from a mass-manufacturing, low-value economy to a high-technology, high-value one. China is now home to SenseTime, the world’s most valuable AI company, which focuses on facial and image recognition and works with local governments across the country on surveillance. It’s worth an estimated $4.5 billion, according to research firm CB Insights.
But in an echo of the factories that drove China’s economic development in the 2000s, the country has also found itself home to a growing side industry of labor-intensive data labeling companies, which supply and process the massive amounts of data for the algorithms to consume. Aside from a few established large firms in China’s biggest cities, these companies are mainly growing in smaller cities, towns, and rural areas.
Zhou had the idea of setting up shop after seeing a number of similar outfits in the town of Pingdingshan, a few hours west. The cousins pooled their family’s years of savings ($45,000) to buy a few dozen computers and rent an office space. They are, as far as they know, the only ones in Minquan.
Zhou landed in the industry after graduating from trade school as a mechanic and searching for something to do. The possibilities were limited.
“If you don’t know what you’ll do in the future, you can either go to a big city, and be a white-collar worker and then every day you’re squeezed onto public transport,” he said. “As for other [fields], if you want to be No. 1, you need a lot of knowledge, experience, and education. These are things we don’t have.”
It was difficult to find a job as a car mechanic, he said. He worked in a factory briefly and then quit. Those shifts were grueling: 14-hour days.
“I thought I couldn't stand it anymore,” he said. But “this industry felt like it had potential.”
Many are flocking to the data labeling industry now, said Han Jinhao, who started his data labeling company a little more than a year ago in Zhengzhou, the capital city of Henan province. His company, Dianwokeji, employs more than 100 data labelers.
“Even though labeling is rather low-level work, the barrier to entry is relatively low, and it is still the AI industry,” he said. “So we thought if we can start from here, we can slowly, step by step, move towards something more high-value.”
Han counts more than 6,000 data labeling outfits that have registered on a Craigslist-like platform he built, where smaller outfits can find outsourcing gigs and hire new employees.
Zhao Mengyao, 18, is new to the job. She started working at Zhou’s company in October. When I visit the office she is tracing over the white lines of a parking space in a parking lot. The picture is distorted, with the lines bent as if the camera had a fish-eye lens, but she mouses over them with ease. After 20 minutes, Zhao moves on to the next photo in her set. It’s another photo of a parking lot, from a different angle.
Next to her, a young man draws around the fluffy edges of a singer’s orange dress, pixel by pixel. After that, he starts tracing the outline of a man playing golf.
Zhao used to be a makeup artist at a wedding portrait studio, but quit because she found the work exhausting. There were days when she had to wake up at 4 AM to prepare for a client’s shoot and would not get home until 7 PM.
Now, she says, she starts at 8 AM and leaves at 6 PM, with an hour and a half off in between. During the lunch break, Zhao and her co-workers trade snappy comments as they play games at the same consoles where they label photos.
“I think this is pretty good. There’s a lot of freedom working here,” she said.
Zhao says the salary is okay. She gets paid per set of 20 pictures, at about 20 RMB (roughly $3). She can finish anywhere between four and eight sets, or 80 to 160 pictures, a day. When I asked her where she thought the pictures would go, she said she didn’t know.
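Zhao’s piece rate can be turned into a rough daily and monthly figure. The sketch below, in Python, uses the approximate rates she described (about 20 RMB per set of 20 pictures, four to eight sets a day). The 22 working days per month is an assumption made only for illustration, not a figure from the reporting.

```python
RMB_PER_SET = 20        # approximate pay per set, as Zhao described
PICTURES_PER_SET = 20   # pictures in one set
WORKING_DAYS = 22       # assumed working days per month (illustrative only)


def daily_output(sets_per_day):
    """Return (pictures labeled, RMB earned) for one day of work."""
    return sets_per_day * PICTURES_PER_SET, sets_per_day * RMB_PER_SET


for sets in (4, 8):
    pictures, rmb = daily_output(sets)
    print(f"{sets} sets: {pictures} pictures, {rmb} RMB a day, "
          f"about {rmb * WORKING_DAYS} RMB a month")
```

Under that assumption, the monthly total comes to roughly 1,760 to 3,520 RMB.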
The seven data labelers I talked to received monthly salaries ranging from around 2,000 RMB ($290) to 4,000 RMB ($580). This is on par with a Chinese worker’s average disposable income, or take-home income after taxes, which in 2017 was 2,164 RMB (or about $330) a month. “There are many jobs available at this kind of salary in Zhengzhou,” said Wang Yushuang, a 25-year-old Dianwokeji employee.
The standard for teaching AI photo recognition is to use images from ImageNet, a database of more than 14 million images created by Stanford University professor Li Fei-Fei and her team. The database relies on Amazon’s Mechanical Turk, which outsources labor-intensive tasks such as labeling photos to internet users for a few cents.
But as businesses around the world are racing to find applications for artificial intelligence across industries ranging from driverless vehicles to medical diagnostics, ImageNet and Mechanical Turk are proving to be insufficient.
A healthcare business that aims to provide more accurate diagnoses needs very detailed data points to help the artificial intelligence learn the difference between, say, a tumor and an eyeball in a CT scan, because it wouldn’t be able to distinguish them on its own at first, Peter Yang, the founder of data labeling company Awakening Vector, told me by phone. It needs data that points out what a tumor looks like in a picture, across many different pictures, which requires a human to click and label the photo.
But most AI startups only have a few full-time employees, usually data scientists, Yang said.
“It’s something that requires a lot of physical labor,” Yang said. “You can’t expect people who have such high salaries to do this labor-intensive work, so you have to outsource this.”
Further, there are issues of privacy and quality control. Medical images need to be kept private, for example. Mechanical Turk tasks are performed by any registered user who wants to earn money, not employees with a dedicated salary who work Monday through Friday.
Outsourcing has meant that these businesses are now popping up all over China. Yang’s business is based in China’s Xinjiang Uyghur Autonomous Region, and its clients include Baidu, China’s main search engine, and Novartis, the multinational pharmaceutical company. Han’s company, which serves Chinese firms such as a few driverless vehicle startups, has branches in smaller cities across Henan and neighboring Shandong province.
Conventional wisdom holds that as technology advances, those with low-skilled jobs will lose the most. Academic research has mostly backed that up. But it doesn’t mean that technology will necessarily replace all jobs.
Historical research shows that automation has led to a job boom, James Bessen, the executive director at Boston University’s Technology and Policy Research Initiative, told me. He pointed to the textile industry as an example.
In the early 19th century, most people only had one set of clothing because of the cost of cloth, Bessen said. But as technology advanced, and certain tasks became automated and lowered the cost of making clothing, the demand for cloth grew. More clothes led to more jobs. Although the textile industry was considered “low-skilled,” as it expanded dramatically in size, it also brought on new workers who had to learn to operate complicated machinery. While jobs were outsourced to developing countries, there was no net loss of jobs. It is only when demand is satisfied that the number of jobs starts declining.
China, for now, is cheap relative to the US, and it has the labor to take advantage of this.
The work is also expanding beyond picture labeling. Many companies are also paying for sound recognition, video labeling, and even raw data. Zhou and his team have collected voice recordings of children and of people speaking a Henan dialect.
For some workers, there’s a distinct sense of pride at being a part of a new industry. “We are doing something very basic, but we are [also] a very important part of it, helping the robots learn and see a bunch of data,” said Wang.
What happens when one day the algorithms have learned to recognize things on their own? Will the tens of thousands of low-skilled workers in AI lose their jobs?
Han seems unconcerned. “If it's really at that stage, then maybe humans won't be alive anymore. Do you think mankind will let something that’s not even alive control mankind? We would only teach it to serve us. I wouldn’t teach it so well that one day I serve the machine.”