Using artificial intelligence, researchers have created a tool that crawls privacy policies on popular websites like Facebook, Reddit, and Twitter. But the software’s findings are not as detailed as those produced by human readers.
Nobody actually reads through the privacy policies of every website, which is why researchers recently used artificial intelligence to create a tool that reads them for you and flags anything you might not be psyched to agree to.
Most of us don’t bother to read these policies, despite the fact that the majority of Americans feel very strongly that privacy is important. But we have a good excuse: studies have shown the average internet user would need to take a month off of work every year to read through all the privacy policies of websites they use.
“Even after you’ve read it, sometimes you need special training to fully understand the nuances of the language,” Norman Sadeh, the lead principal investigator on the project and a professor of computer science at Carnegie Mellon University, told me over the phone. “We don’t want people to read these privacy policies because that would be highly unrealistic. Instead, with technology, we can extract statements and match with things people care about.”
But it turns out even AI can’t fully make sense of these dense, jargon-laced documents, and it might miss some important context.
“This is really pushing the envelope so it’s very hard to do this overnight,” Sadeh said. “We’re [actually] able to do more than what we’re showing but we’re trying to be careful because machine learning is not perfect and never will be.”
In a paper published alongside the project, Sadeh and his colleagues stated that when searching for entire paragraphs, the AI was able to identify relevant passages with 79 percent accuracy. Its accuracy was 70 percent when looking for individual, relevant sentences.
“It’s machine learning so you’re building classifiers and these classifiers are trained on as large of a dataset as you can get,” Sadeh said. “But obviously to have the data, you need to rely on humans annotating policies in the first place, which is a very time-consuming process.”
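To make the idea of a classifier trained on human-annotated policies concrete, here is a minimal sketch in Python. It is not the researchers’ system: the naive Bayes approach, the category labels, and the six example sentences are all assumptions invented for illustration, and a real dataset would be far larger. Note that it scores each sentence on its own, ignoring the sentences around it.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """A tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per word.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-annotated sentences (invented for this sketch).
ANNOTATED = [
    ("We share your personal information with third-party advertisers.", "third_party_sharing"),
    ("Your data may be disclosed to our advertising partners.", "third_party_sharing"),
    ("We collect your email address and location when you sign up.", "data_collection"),
    ("We gather information about your device and browsing activity.", "data_collection"),
    ("You may delete your account at any time.", "user_control"),
    ("You can opt out of marketing emails in your settings.", "user_control"),
]

clf = NaiveBayes()
clf.train(ANNOTATED)

print(clf.predict("We may share information with advertisers."))
print(clf.predict("You can delete your account."))
```

With only six training sentences the sketch is a toy, which is the point Sadeh makes: the quality of a classifier like this depends on how many policies humans have annotated for it.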
The researchers noted that at this stage, the AI isn’t able to interpret a sentence in the context of the sentences that precede or follow it. And in case you think AI might be more objective than humans, think again: bias in artificial intelligence is a real issue that researchers are still struggling with.
The team hopes to make Usable Privacy available as a custom browser plugin by the end of the year, Sadeh told me. As this kind of technology develops, he said, it could be used for other headache-inducing online legalese, like terms of service, which state what a user is agreeing to in order to access a site.
Though there are plenty of valid concerns with automation, if researchers can improve the accuracy of this kind of technology, it’s a job I’m sure all of us will have no problem outsourcing to a robot.