Facebook Doesn’t Know What It Does With Your Data, Or Where It Goes: Leaked Document

Facebook is facing what it describes internally as a “tsunami” of privacy regulations all over the world, which will force the company to dramatically change how it deals with users’ personal data. And the “fundamental” problem, the company admits, is that Facebook has no idea where all of its user data goes, or what it’s doing with it, according to a leaked internal document obtained by Motherboard.

“We’ve built systems with open borders. The result of these open systems and open culture is well described with an analogy: Imagine you hold a bottle of ink in your hand. This bottle of ink is a mixture of all kinds of user data (3PD, 1PD, SCD, Europe, etc.) You pour that ink into a lake of water (our open data systems; our open culture) … and it flows … everywhere,” the document read. “How do you put that ink back in the bottle? How do you organize it again, such that it only flows to the allowed places in the lake?”

(3PD means third-party data; 1PD means first-party data; SCD means sensitive categories data.)

The document was written last year by Facebook privacy engineers on the Ad and Business Product team, whose mission is “to make meaningful connections between people and businesses,” and which “sits at the center of our monetization strategy and is the engine that powers Facebook’s growth,” according to a recent job listing that describes the team.

This is the team tasked with building and maintaining Facebook’s sprawling ads system, the core of the company’s business. In this document, the team is both sounding an alarm and calling for a change in how Facebook deals with users’ data, to prevent the company from running into trouble with regulators in Europe, the US, India, and other countries that are pushing for more stringent privacy constraints on social media companies.

“We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose.’ And yet, this is exactly what regulators expect us to do, increasing our risk of mistakes and misrepresentation,” the document read. (Motherboard retyped the document from scratch to protect a source.)

In other words, even Facebook’s own engineers admit that they are struggling to make sense of, and keep track of, where user data goes once it’s inside Facebook’s systems, according to the document. Inside Facebook, this problem is known as “data lineage.”

In the last few years, regulators all over the world have tried to limit how platforms like Facebook can use their own users’ data. One of the most notable and significant regulations is the European Union’s General Data Protection Regulation (GDPR), which went into effect in May 2018. In its Article 5, the law mandates that personal data must be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.”

What that means is that every piece of data, such as a user’s location or religious orientation, can only be collected and used for a specific purpose, and not reused for another purpose. For example, in the past Facebook took the phone numbers that users provided to protect their accounts with two-factor authentication and fed them to its “people you may know” feature, as well as to advertisers. Gizmodo, with the help of academic researchers, caught Facebook doing this, and eventually the company had to stop the practice.
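To make the idea of purpose limitation concrete, here is a minimal sketch in Python. It is purely illustrative: the names `PersonalData`, `PurposeViolation`, and `use`, along with the purpose labels, are hypothetical and do not describe Facebook’s actual systems or anything in the leaked document. The sketch assumes each piece of data is stored together with the purposes declared when it was collected, and that every use is checked against that list.

```python
from dataclasses import dataclass


class PurposeViolation(Exception):
    """Raised when data is requested for a purpose it was not collected for."""


@dataclass(frozen=True)
class PersonalData:
    # Hypothetical container: the value plus the purposes declared at collection.
    value: str
    allowed_purposes: frozenset


def use(data: PersonalData, purpose: str) -> str:
    # Release the value only if the requested purpose was declared up front.
    if purpose not in data.allowed_purposes:
        raise PurposeViolation(f"data was not collected for purpose: {purpose}")
    return data.value


# A phone number collected only for two-factor authentication (made-up value).
phone = PersonalData("+1 555 0100", frozenset({"two_factor_auth"}))

use(phone, "two_factor_auth")  # permitted
try:
    use(phone, "ad_targeting")  # a different purpose, so it is refused
except PurposeViolation as err:
    print(err)
```

In a scheme like this, reusing a security phone number for ad targeting would fail at the point of use rather than relying on each team to remember the original commitment.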

According to legal experts interviewed by Motherboard, GDPR specifically prohibits that kind of repurposing, and the leaked document shows Facebook may not even have the ability to limit how it handles users’ data. The document raises the question of whether Facebook is able to broadly comply with privacy regulations because of the sheer amount of data it collects and where it flows within the company.

A Facebook spokesperson denied that the document shows the company is not complying with privacy regulations.

“Considering this document does not describe our extensive processes and controls to comply with privacy regulations, it’s simply inaccurate to conclude that it demonstrates non-compliance. New privacy regulations across the globe introduce different requirements and this document reflects the technical solutions we are building to scale the current measures we have in place to manage data and meet our obligations,” the spokesperson said in a statement sent via email.

Regarding the ink-in-a-lake analogy, the spokesperson said, “this analogy lacks the context that we do, in fact, have extensive processes and controls to manage data and comply with privacy regulations.”

Do you have more information about how Facebook handles user data? You can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, Wickr/Telegram/Wire @lorenzofb, or email lorenzofb@vice.com

A former Facebook employee, who asked to remain anonymous to avoid retaliation, reviewed the document for Motherboard and called it “blunt.”

“Facebook has a general idea of how many bits of data are stored in its data centers,” he said in an online chat. “The where [the data] goes part is, broadly speaking, a complete shitshow.”

“It is a damning admission, but also offers Facebook legal cover because of how much it would cost Facebook to fix this mess,” he added. “It gives them the excuse for keeping that much private data simply because at their scale and with their business model and infrastructure design they can plausibly claim that they don’t know what they have.”

Privacy experts who have been fighting to limit how Facebook uses private data say they believe the document is an admission that the company cannot comply with regulations.

“This document admits what we long suspected: that there is a data free-for-all inside Facebook, and that the company has no control whatsoever over the data it holds,” Johnny Ryan, a privacy activist and senior fellow at the Irish Council for Civil Liberties, told Motherboard in an online chat. “It is a black and white recognition of the absence of any data protection. Facebook details how it breaks each principle of data protection law. Everything it does to our data is illegal. You’re not allowed to have an internal data free-for-all.”

Facebook also made two employees available to discuss how it handles data internally. On the call, company representatives told Motherboard that Facebook is trying to get ahead of more privacy laws and is building infrastructure to meet the requirements it may face. That means investing in tools that make analyzing user data, and figuring out where it can or cannot go, more automated and less reliant on human involvement than it is today. The representatives said that getting to that point will require significant investments, but that this is a priority for the company. They also said that Facebook does not, at this point, have technical control over every piece of data. But it already has some mechanisms to manage user data, they said, such as an opt-out flag that is attached to data a user has opted out of having used for advertising, and that follows the data to make clear it can’t be used for certain purposes.
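The representatives did not describe how the opt-out flag is implemented. As a rough illustration of what a flag that “follows the data” could look like, here is a minimal Python sketch. The names `Record`, `derive`, `combine`, and `ads_eligible` are hypothetical, and the design is an assumption rather than Facebook’s: the flag is stored next to the data, any record derived from flagged data inherits it, and the ads side filters on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record: some user data plus an advertising opt-out flag.
    payload: dict
    ads_opt_out: bool = False


def derive(source: Record, transform) -> Record:
    # A record computed from another record inherits its opt-out flag.
    return Record(transform(source.payload), source.ads_opt_out)


def combine(*records: Record) -> Record:
    # Merged data is opted out if any of its inputs was opted out.
    merged = {}
    for record in records:
        merged.update(record.payload)
    return Record(merged, any(r.ads_opt_out for r in records))


def ads_eligible(records):
    # The ads pipeline only ever sees records without the flag.
    return [r for r in records if not r.ads_opt_out]


likes = Record({"page_likes": ["hiking"]}, ads_opt_out=True)
city = Record({"city": "Dublin"})

interests = derive(likes, lambda p: {"interests": p["page_likes"]})
profile = combine(city, interests)

print(interests.ads_opt_out, profile.ads_opt_out)  # both carry the flag
print(ads_eligible([likes, city, interests, profile]))  # only the city record
```

The hard part the document describes is not this logic but applying it everywhere: the flag only helps if every system that copies or transforms data preserves it.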

Image: A representation of Facebook’s “data lake,” created by Facebook’s engineers.

Jason Kint, CEO of Digital Content Next, a trade organization that represents journalism publishers and an outspoken critic of Facebook, said that “consumers and regulators would and should be shocked at the magnitude and disorder of the data inside of Facebook’s systems.”

Kint said the ink-in-the-lake metaphor shows that Facebook can’t keep track of the “source and purpose” of the user data it collects. Kint is referring to GDPR’s Article 5, which sets out a principle known as “purpose limitation.”

This principle, according to Ryan, means companies like Facebook need to be able to tell users and regulators what they are doing with every specific piece of data and the specific reason they are collecting it. For example, if you provide your religious orientation for your Facebook bio, that shouldn’t be used to target you with ads.

The principle of purpose limitation was created to protect people’s privacy. In 2020, Ryan sued Google in Ireland, accusing the tech giant of violating this principle with its “several hundred processing purposes that are conflated in a vast, internal data free-for-all.”

Ravi Naik, Ryan’s lawyer in that case and a privacy expert himself, told Motherboard that if regulators consider Facebook in violation of GDPR, the company could face not only administrative fines of up to 4 percent of its global revenue but also orders from regulators to stop processing data in a certain way. Individual users could also sue Facebook to demand that it tell them what it does with their data, as Naik and Ryan are doing with Google.

The leaked document also refers to a new, unreleased product called “Basic Ads,” which the document’s authors describe as a “short term” response to the requirements of regulations around the world.

“When launched, Facebook users will be able to ‘opt-out’ from having almost all of their 3P [third party] and 1P [first party] data used by Ads systems – page likes, posts, friends list, etc,” the document reads.

The document said that Basic Ads “needs to be launch-ready in Europe by January, 2022.”

As of this writing, Facebook has yet to launch Basic Ads, meaning the company has missed the deadline its own employees set.

Facebook declined to comment on Basic Ads.

Company representatives said that the name is an internal codename, and that the product will show that Facebook can build advertising that is relevant to users while preserving their privacy.
