Mark Zuckerberg's Information Monopoly

Facebook just acquired WhatsApp for $19 billion, a crazy amount of money to spend buying a company with scant revenue—but Mark Zuckerberg isn't buying a business; he's trying to corner the market on data.

Illustration by James Harvey

Facebook’s recent assimilation of WhatsApp (a company I’m too old and too uncool to have heard of) for more money than the contents of Scrooge McDuck’s swimming pool has raised a lot of eyebrows. How can WhatsApp possibly be worth $19 billion when it brings in a relatively paltry $1 per user per year? Why has Facebook CEO and tech boy-king Mark Zuckerberg gone around town telling everyone he got a deal? Well, what a lot of intelligent people don't seem to understand is that it's not about the revenue. The warbling of the punditocracy in recent days has been ridiculous. Their complaints are a bit like a committee of audience members explaining to Kasparov that he should be playing his knight on the double-letter score.

Mark Zuckerberg doesn’t care about WhatsApp’s revenue, because the revenue isn’t important. What matters is the half billion users whom Facebook can now link together and make money from. As Zuckerberg and others have pointed out since the deal, there’s basically no service that’s ever grown that large and not been valuable.
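For anyone who wants to see why the pundits choked, here is the back-of-the-envelope sum as a small sketch. It uses only the figures quoted above, rounds the user base to exactly half a billion, and assumes (unrealistically) that users and revenue stay flat and that costs are zero; the variable names are mine.

```python
# Figures quoted in the article; the user count is rounded to half a billion.
price_usd = 19_000_000_000
users = 500_000_000
revenue_per_user_per_year_usd = 1

# What Facebook paid for each WhatsApp user.
price_per_user = price_usd / users

# Assumption: flat user base, flat revenue, no costs.
annual_revenue = users * revenue_per_user_per_year_usd
years_to_recoup = price_usd / annual_revenue

print(f"Price per user: ${price_per_user:.0f}")
print(f"Years of current revenue needed to cover the price: {years_to_recoup:.0f}")
```

On revenue alone the deal looks absurd, which is rather the point: revenue is not what is being bought.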

As with the personal genomics firm 23andMe, which I wrote about a few months ago, the game here is data. It’s tedious and banal to point out, again, that we live in an information economy, but it’s very much true. The tech companies with the biggest databases are king, whether it’s Google’s copies of the internet, Facebook’s billion users, or 23andMe’s unparalleled collection of gene sequences.

But it’s not just data that these companies are hoarding—increasingly, the means of analyzing and understanding that data are being taken over too. The field of "deep learning" (a hot topic in artificial-intelligence circles) is one of the most striking examples.

What is deep learning? Well, let’s say you have 10,000 photos. Of those, 5,000 are of cats and 5,000 are of lizards, and you're trying to train a computer to look at some pictures of cats and lizards that it hasn't seen before and tell the difference between them. A machine set up for the "shallow learning" approach might convert all those images into rows of numerical data, label each row cat or lizard, and feed the whole lot into a classifier that tries to figure out a way to split the cats from the lizards.
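A minimal sketch of that shallow pipeline might look like the following. Everything in it is an assumption made for illustration: the "photos" are tiny synthetic grids of brightness values rather than real images, and the classifier is a simple nearest-centroid rule that I have picked as a stand-in, since no particular classifier is specified here.

```python
import random

def make_image(base, rng, size=4):
    """Hypothetical toy 'photo': a size x size grid of noisy brightness values."""
    return [[base + rng.uniform(-0.2, 0.2) for _ in range(size)] for _ in range(size)]

def flatten(image):
    """Convert a 2D image into one flat row of numbers."""
    return [pixel for row in image for pixel in row]

def train(rows, labels):
    """Nearest-centroid 'training': average all the rows that share a label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(column) / len(group) for column in zip(*group)]
        for label, group in groups.items()
    }

def predict(model, row):
    """Pick the label whose average row is closest to the new row."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], row))

rng = random.Random(0)
# Assumption for the toy data: 'cat' photos are bright, 'lizard' photos are dark.
images = [make_image(0.8, rng) for _ in range(50)] + [make_image(0.2, rng) for _ in range(50)]
labels = ["cat"] * 50 + ["lizard"] * 50
model = train([flatten(image) for image in images], labels)

# Two pictures the classifier has not seen before.
unseen_cat = flatten(make_image(0.8, rng))
unseen_lizard = flatten(make_image(0.2, rng))
print(predict(model, unseen_cat), predict(model, unseen_lizard))
```

Note that the classifier only ever sees rows of raw numbers. It has no notion of fur, scales, or legs, just a dividing line between two clouds of pixel values.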

It works, and sometimes it works pretty well, but it’s not really how a human brain works. We don’t have a bit of our brain that detects cats or lizards. Instead, we break the problem down and learn individual parts of it. We see four legs, swiveling eyes, scales, the color green, whatever, and we understand a lizard as being made up of those features. Because we understand the assemblage of qualities that make up a creature, we’re better at identifying them in an image that we’ve never seen before. With shallow learning, often a machine is trying to distinguish between cats and lizards when it doesn’t even know the difference between scales and fur.

The idea behind deep learning is that instead of explicitly teaching the algorithm "cats vs. lizards," you allow computers to learn those simpler components and then build on them, the way a child would learn first sounds, then words, then complete sentences. It’s an approach that’s proven remarkably effective, and it has the potential to transform many of the algorithms that power our day-to-day experiences on the net, from a search engine that can understand the web pages it crawls to a photo-sharing site that can recognize the faces of your friends in the photos you upload, to a street-view service that can read the numbers on people’s front doors.
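The layering idea can be sketched with a very small neural network. This is a toy under loud assumptions, not how any production system is built: it has just one hidden layer (real deep networks stack many), it is trained with plain gradient descent, and its data is a four-row puzzle (label 1 when exactly one of two signals is present) chosen because no single straight dividing line solves it, so the network has to learn intermediate features first. All names and settings are mine.

```python
import math
import random

def forward(params, x):
    """Layer 1 computes intermediate features; layer 2 combines them into a probability."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    score = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, 1.0 / (1.0 + math.exp(-score))

def mean_loss(params, data):
    """Average cross-entropy between predictions and labels (lower is better)."""
    total = 0.0
    for x, y in data:
        p = min(max(forward(params, x)[1], 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def init_params(n_inputs, n_hidden, rng):
    """Random starting weights, zero biases."""
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return w1, [0.0] * n_hidden, w2, 0.0

def train(params, data, epochs=2000, lr=0.5):
    """Gradient descent with backpropagation; returns new parameters."""
    w1 = [row[:] for row in params[0]]
    b1, w2, b2 = params[1][:], params[2][:], params[3]
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward((w1, b1, w2, b2), x)
            d_out = p - y  # gradient of the loss at the output layer
            for j, h in enumerate(hidden):
                d_hidden = d_out * w2[j] * (1 - h * h)  # pushed back through tanh
                w2[j] -= lr * d_out * h
                for k, xi in enumerate(x):
                    w1[j][k] -= lr * d_hidden * xi
                b1[j] -= lr * d_hidden
            b2 -= lr * d_out
    return w1, b1, w2, b2

# Toy data: the label is 1 when exactly one of the two signals is present.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
initial = init_params(n_inputs=2, n_hidden=3, rng=random.Random(0))
trained = train(initial, data)
print("loss before training:", mean_loss(initial, data))
print("loss after training: ", mean_loss(trained, data))
```

Nobody tells the hidden layer what its three features should be. It settles on whatever intermediate signals make the final decision easier, which is the small-scale version of learning "scales" and "fur" before learning "lizard" and "cat."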

Deep-learning techniques have been around for 20 or 30 years, but they’re computationally expensive and require large amounts of data, and so it’s only in the last few years that their potential has begun to be realized. Suddenly, deep-learning experts are in huge demand. That much was made clear by another crazy-looking acquisition in recent weeks: Google’s purchase of the London-based startup DeepMind Technologies for $400 million—a bauble the search engine giant had to beat out Facebook to acquire. Only two years old, with a few dozen employees, DeepMind seems to have functioned almost like a radical recruitment agency, drawing some of the world’s leading AI talent onto one team, and then selling the group to Google as a ready-made center of excellence.

The price was so high because world-class experts in the field are scarce, and Google is hoarding a scarily large percentage of them. Peter Norvig, an AI legend and director of research at Google whose genius is only matched by the shittiness of his personal website, was quoted as saying recently that Google employs “less than 50 percent but certainly more than 5 percent” of the world’s supply of machine-learning experts. Factor Facebook, Apple, Microsoft, Netflix, and other big tech companies into the equation, and what we’re seeing is a cutthroat, expensive race for a shrinking number of experts who could hold in their hands not just the future of the internet but our ability to analyze massive data sets in science.

But is this monopoly of data and expertise healthy? What would it mean for the future of AI research if half of the world’s experts were cooped up in the same Silicon Valley pen, assimilated en masse by a company seeking to protect and preserve its position as the King of the Internet? One corporation could have the ability to dictate the course of an important area of science for decades to come, with results that could be either miraculous or catastrophic.

The most remarkable thing about the response to Facebook’s acquisition of WhatsApp is that pundits seemed more worried about Mark Zuckerberg’s bank balance than the potential implications for the public of one company controlling such a vast amount of user data. It’s as if people are so convinced that it’s an outlandish bit of excess between two faddish companies that they’ve ignored the bigger picture—that a company with a billion users is merging its data with a company with half a billion users.

In the world of conventional business, a single entity that captured 25 percent of market revenue would be considered a monopoly. Facebook had 1.19 billion users even before this acquisition, accounting for more than a third of all people on the internet. Its power is so profound that a minor tweak to the news feed algorithm shrank the traffic of aggregation master Upworthy by 46 percent nearly overnight. People's incomes and careers can be affected by something as simple as that, yet nobody seems to be asking whether it is safe for one company to hold that much power.

In centuries past, economists and governments came to understand the need to control economic monopolies for the greater good, to prevent stagnation and avoid unhealthy concentrations of power. In the 21st century, will we come to a similar realization about information monopolies?

Follow Martin on Twitter.