OkCupid 'Research' That Exposed 70k Users Removed After Copyright Claim

Open Science Framework has removed the data set that was scraped from the dating site.

Last week, Danish students publicly released a dataset on nearly 70,000 users of the dating site OkCupid, including their usernames, sexual turn-ons, and orientation. Now an open science site has removed the data in response to a copyright notice sent by OkCupid, reports Retraction Watch.

"The repository is currently unavailable due to a DMCA claim sent by OKCupid. It's unclear to me which part they claim copyright on," Emil O. Kirkegaard, a master's student at Aarhus University and one of the people behind the dataset, told Retraction Watch. The Digital Millennium Copyright Act (DMCA) is a US law that companies can use to have data pulled from websites, legitimately or otherwise.
"I have no comments about the DMCA, other than to note that DMCA claims are often used as censorship tools," Kirkegaard told Motherboard in an email. He added that OkCupid had not contacted him directly about the data.

Open Science Framework is a repository for scientific papers as well as full data sets, and researchers post their data here so that others may use it to gain new insights. The students published their OkCupid dataset in the hopes that others could use it for their own work.

DMCA is often used by companies to stop the sharing of hacked data. Avid Life Media, the parent company of dating site Ashley Madison, sent this reporter a DMCA notice for tweeting two cells of a company spreadsheet (Twitter did not enforce the notice). Sony tried something similar back in 2014. Although this OkCupid data wasn't strictly hacked, it was still obtained and distributed by a third party.

OSF is maintained by the non-profit Center for Open Science, and the page where the data was hosted now says "This content has been removed."

According to OkCupid's terms and conditions, "the contents of this website are protected by copyright and may not be copied or otherwise reproduced," without written permission, and "users may not publish or create derivative works from the contents of this website for any public or commercial purposes."

The data was collected using a scraper (an automated tool for saving information from a website) between November 2014 and March 2015. Although the dump didn't include users' real names (unless a user had registered their real name as their username), academics said they could reverse-engineer the dump and identify individuals, along with their sexual preferences and desires. OkCupid told Motherboard that scraping the site violates its terms of service.

The publication triggered a fierce debate around the ethics of collecting and redistributing public or semi-public data, and Aarhus University posted a series of tweets, distancing itself from the research and its authors. OkCupid is reportedly checking if its lawyers might have some work to do. "This is a clear violation of our terms of service—and the Computer Fraud and Abuse Act—and we're exploring legal options," OkCupid spokesperson Matthew Traub told Vocativ.

In an email to Motherboard, Kirkegaard previously referred to his critics as "social justice warriors."

Neither OkCupid nor the Center for Open Science immediately responded to a request for comment.