Our phones contain our entire lives. Appointments, photos, loves, losses—hell, even our favourite words and emojis are all stored on our devices. It's all very tantalizing for companies like Google and Apple, which are developing algorithms that can read emotions in faces or auto-complete sentences, but only after being trained on a ton of user data. The million dollar question is: How do you get that data without making a whole bunch of privacy-conscious people really angry?
Google suggested a novel solution in a new blog and research paper this week, which explain what the company is calling "federated learning." If all goes according to plan, Google will be able to train its deep learning AI on sensitive data from millions of phones without Google ever seeing the data itself.
"Beyond Gboard query suggestions, for example, we hope to improve the language models that power your keyboard based on what you actually type on your phone (which can have a style all its own) and photo rankings based on what kinds of photos people look at, share, or delete," a blog by Google scientists Brendan McMahan and Daniel Ramage states.
The idea behind "federated learning" is to train AI on your phone. Normally, AI training has to be done with all of the data sitting on the one server. But with federated learning, the data is spread across millions of phones with a tiny AI sitting on all of them, learning the user's patterns of use. Instead of the raw data being sent to a Google training server, the phone AI transmits an encrypted "update" that only describes what it's learned, to Google's main AI where it's "immediately" aggregated with the updates from every other phone. The individual update itself isn't stored, Google states.
This solves the problem of Google seeing your sensitive information, but it doesn't prevent someone from reverse-engineering what an AI's learned to glean some knowledge about the underlying training data. It's well-known that features from training data can filter up to the decisions an AI makes, especially when it comes to prejudices. To solve this problem, the Google researchers suggest that looking to something called "differential privacy" might be a next step.
Last year, Apple announced that its AI training would start using a branch of cryptography called "differential privacy," which makes relies upon a host of different mathematical techniques to render sensitive information unidentifiable. However, Wired noted at the time, the company's focus on privacy methods likely implied sending more data to Apple than ever before, even if it's obscured.
If differential privacy and federated learning could be combined, Google's research paper states, then we'd have a framework for making sure companies using AI never see actual user data, and nobody could reverse-engineer it.
But that's a research paper for another time.
Subscribe to pluspluspodcast , Motherboard's new show about the people and machines that are building our future.