On Wednesday, OpenAI and Microsoft were sued in a class action lawsuit seeking $3 billion in damages for allegedly stealing “vast amounts of private information” from internet users without consent in order to train ChatGPT. This lawsuit, which was filed on June 28 in federal court in San Francisco, CA, and includes sixteen anonymous plaintiffs, claimed that OpenAI secretly “scraped 300 billion words from the internet” without registering as a data broker or obtaining consent. Microsoft is OpenAI's main customer and corporate partner, licensing AI technology from the company for billions of dollars.
It also claimed that the companies continue to “unlawfully collect and feed additional personal data from millions of unsuspecting consumers worldwide…in order to continue developing and training the products,” referring to the information that is fed into AI models as prompts. The lawsuit compares OpenAI to another AI firm that made headlines for scraping people's information from the internet without their explicit consent: Clearview AI, which gathered social media photos in order to build a facial recognition tool widely used by police. Clearview AI was sued by multiple parties including the ACLU. The firm settled that lawsuit last year, and stopped offering its services to most private U.S. persons and businesses. “Clearview can no longer treat people’s unique biometric identifiers as an unrestricted source of profits,” Nathan Freed Wessler, a deputy director of the ACLU’s Speech, Privacy, and Technology Project, said at the time.
The lawsuit cited popular AI tools developed by OpenAI and used by Microsoft including language models GPT 3.5 and 4.0, image model Dall-E, and text-to-speech model Vall-E. It lists the plaintiffs' internet activity over the years, saying they "did not consent to the use of [their] private information by third parties [to train AI] in this manner," and that the companies stole their "personal data from across this wide swath of online applications and platforms to train the products."
The data that the lawsuit alleges was stolen by OpenAI includes names, contact details, email addresses, payment information, social media information, chat log data, usage data, analytics, and cookies. "Defendants have been unjustly enriched by their theft of personal information as its billion-dollar AI business, including ChatGPT and beyond, was built on harvesting and monetizing Internet users’ personal data," the lawsuit states. "Thus, Plaintiffs and the Classes have a right to disgorgement and/or restitution damages representing the value of the stolen data and/or their share of the profits Defendants earned thereon."
The lawsuit asks that OpenAI and Microsoft be enjoined from violating people's privacy, and take additional steps. Step one is to disclose what data is being collected and how it is being used. Step two, the plaintiffs wrote, is to follow a code of ethical principles and compensate plaintiffs for their stolen data. Finally, the lawsuit stated, internet users should have the right to opt out of any data collection and all illegal taking of data should stop.
Previously, in November, OpenAI and Microsoft were sued in another class action lawsuit filed by GitHub programmers who alleged that GitHub Copilot, an AI coding tool owned by Microsoft, violated their open-source licenses and used their code for training without their permission. This new lawsuit comes after OpenAI had been repeatedly scrutinized for its secrecy regarding its training methods and dataset, as well as possible copyright violations. When GPT-4 was released in March, many AI researchers were vocal about the fact that withholding such information can lead to greater harm and close off any opportunity for external scientists to root out flaws and biases in the system.
The lawsuit also mentions the possible “existential threat” of AI without “immediate legal intervention.” It references the recent calls to action from notable figures who have asked to either pause or regulate the proliferation of AI systems. Examples include the open letter calling for a pause on AI training, signed by experts and tech leaders including Elon Musk, and Italy’s decision to temporarily ban ChatGPT in the country over concerns that it violated European data protection laws. Service was resumed after OpenAI added privacy controls. “The proliferation of AI—including Defendants’ products—pose an existential threat if not constrained by the reasonable guardrails of our laws and societal mores. Defendants’ business and scraping practices raise fundamentally important legal and ethical questions that must also be addressed. Enforcing the law will not amount to stifling AI innovation, but rather a safe and just AI future for all,” the lawsuit states.
OpenAI did not immediately respond to a request for comment. Microsoft declined to comment.