Google has removed hundreds of thousands of links from its European search results under "right to be forgotten" laws, and every single request was submitted and evaluated manually.
This might seem archaic in the era of complex algorithms that can parse mind-boggling amounts of data in real time, but it does make some sense. The right to be forgotten was established in a Spanish court case last year; the ruling required Google to set up a system for people to submit links to information they didn't want on the web, and charged Google with deciding whether or not to scrub those links from its search results. If the information in question is found to be inaccurate, outdated, or not in the public interest, Google is required to take action.
The ruling was met with criticism from advocacy groups worried that it could lead to censorship, and ethical quandaries like these are exactly where humans fare better than rule-driven programs.
This meatbag-centric system isn't good enough, however, according to the researchers behind Oblivion, a program that automates the process of verifying personal information found in Google search results. With such a high volume of takedown requests flooding into Google, the humans there need some help from computers, the researchers argue in a paper published this week to the arXiv server, a repository for papers awaiting peer review.
"Clearly, in order to enable efficient enforcement, it is essential to develop techniques that at least partly automate this process and are scalable to internet size," the authors write, "while being censorship-resistant by ensuring that malicious users cannot effectively blacklist links to internet sources that do not affect them."
According to the researchers, a prototype implementation of Oblivion was able to process 278 takedown requests per second while running on a laptop. In practice, the system would require several distributed parties working together—Google, the user, and potentially the government, for example—so for expediency's sake they ran all the modules on the same machine.
Even so, that's pretty impressive. But how do you make sure that the program in question is fast, as well as secure and resistant to phony takedown requests? Oblivion uses a three-tiered system to ensure this.
In the first phase, the user submits digital copies of ID to a centralized certificate authority (CA) that can confirm their attributes—name, age, nationality, and so on—and create a cryptographic signature for every attribute that guarantees its validity. The authors imagine that this authority would be a "national or government-wide CA that issues credentials to citizens." This kind of thing doesn't really exist yet, so for Oblivion to work in practice, a CA would have to be set up for nations with the right to be forgotten.
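To get a feel for what the CA's per-attribute signatures might look like, here is a minimal Python sketch. It is purely illustrative: a real CA would use public-key signatures rather than HMAC, and the secret key, function name, and sample attributes below are all invented for this example.

```python
import hashlib
import hmac

# Hypothetical sketch: the CA signs each verified attribute separately,
# so the user can later prove individual attributes without revealing
# the rest. HMAC with a CA-held secret stands in for a real public-key
# signature scheme to keep the example self-contained.
CA_SECRET = b"ca-demo-secret"  # placeholder key, not a real credential

def sign_attribute(name: str, value: str) -> str:
    """Return a signature binding one attribute name/value pair."""
    message = f"{name}={value}".encode()
    return hmac.new(CA_SECRET, message, hashlib.sha256).hexdigest()

# The CA issues one signature per verified attribute.
attributes = {"name": "Jane Doe", "dob": "1980-01-01", "nationality": "Spain"}
credentials = {k: sign_attribute(k, v) for k, v in attributes.items()}
```

Signing attributes individually, rather than the whole identity document at once, is what lets the later phases check only the attributes that actually appear in a given article.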
When the user submits a link to Oblivion, the software uses natural language processing and image recognition algorithms to scan the article for information that matches the attributes that have been approved by the CA. These attributes are then tagged by Oblivion.
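A toy version of that tagging step might look like the following, where plain substring matching stands in for the paper's natural language processing and image recognition, and the article text and attribute names are invented for illustration.

```python
# Hypothetical sketch of the tagging step: scan an article for the
# user's CA-verified attributes. Oblivion uses NLP and image
# recognition; naive case-insensitive substring matching stands in here.
def tag_attributes(article: str, attributes: dict) -> dict:
    """Return only the verified attributes that appear in the article."""
    text = article.lower()
    return {k: v for k, v in attributes.items() if v.lower() in text}

article = "Local resident Jane Doe, born 1980-01-01, was fined last year."
attributes = {"name": "Jane Doe", "dob": "1980-01-01", "nationality": "Spain"}
tagged = tag_attributes(article, attributes)
# "name" and "dob" match the article text; "nationality" does not,
# so it is left out of the takedown request entirely.
```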
In the second phase, Oblivion submits the links and tagged attributes backed up by the CA's trusted signatures to a third party—say, the Google helpdesk—also running the software. On that end, the tagged attributes are matched up with what appears in the submitted article. If everything checks out, the user is issued an "ownership token" that confirms the articles they submitted contain personal information that affects them.
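Here is a rough, self-contained sketch of that second-phase check. Again, HMAC with placeholder keys stands in for real signatures, and names like `issue_ownership_token` are illustrative, not Oblivion's actual API.

```python
import hashlib
import hmac
import json

# Hypothetical sketch of the verifier's two checks: (1) every tagged
# attribute carries a valid CA signature, and (2) every tagged
# attribute actually appears in the submitted article. Both keys are
# placeholders for this example only.
CA_SECRET = b"ca-demo-secret"        # placeholder CA key
VERIFIER_SECRET = b"verifier-demo"   # placeholder third-party key

def ca_signature(name, value):
    """Recompute the CA's per-attribute signature."""
    message = f"{name}={value}".encode()
    return hmac.new(CA_SECRET, message, hashlib.sha256).hexdigest()

def issue_ownership_token(url, article, tagged, signatures):
    """Return an ownership token if every attribute checks out, else None."""
    for name, value in tagged.items():
        # The attribute must carry a valid CA signature...
        if not hmac.compare_digest(signatures[name], ca_signature(name, value)):
            return None
        # ...and must actually appear in the submitted article.
        if value.lower() not in article.lower():
            return None
    payload = json.dumps({"url": url, "attrs": sorted(tagged)}).encode()
    return hmac.new(VERIFIER_SECRET, payload, hashlib.sha256).hexdigest()

article = "Local resident Jane Doe, born 1980-01-01, was fined last year."
tagged = {"name": "Jane Doe", "dob": "1980-01-01"}
signatures = {k: ca_signature(k, v) for k, v in tagged.items()}
token = issue_ownership_token("https://example.com/story", article, tagged, signatures)
```

The point of the two checks is censorship resistance: a token is only issued when the requester can prove, via the CA's signatures, that the article really contains verified information about them, so one user can't blacklist links that don't affect them.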
In the third and final phase, the ownership token is submitted to Google, along with the user's explanation as to why the links should be deleted. Google staff would still need to decide whether the reason for the takedown request is bullshit or not—for example, if the information is of no public importance—but at least the validity of all the information in question would already have been confirmed by Oblivion.
Oblivion has some noteworthy limitations, the researchers write: First, it can't account for information that might belong to more than one person, like a name. To solve this, it uses multiple points of information—name and date of birth, for example—to confirm someone's identity. Oblivion also can't decide on its own whether or not a piece of information is of public interest and should therefore not be removed from Google search results. Humans are still needed for that.
However, the researchers note, with algorithms that can semantically analyze an article and decide whether information is "sensitive" or not, this could change. And then, of course, there's the whole bit about needing institutions that can verify information and provide signatures.
Even if the right to be forgotten only applies to humans—for now—it could one day be administered by machines.