GitHub Users File a Class-Action Lawsuit Against Microsoft for Training an AI Tool With Their Code

This lawsuit represents a growing concern from programmers, artists, and other people that AI systems may be using their code, artwork, and other data without permission.

GitHub programmers have filed a class-action lawsuit against GitHub, its parent Microsoft, and its technology partner, OpenAI, for allegedly violating their open-source licenses and using their code to train Microsoft’s latest AI tool, called Copilot.

GitHub Copilot, which was launched in June, suggests code and functions to GitHub users in real time. Copilot is powered by Codex, an AI system that was created by OpenAI and licensed to Microsoft. According to OpenAI, Codex was trained on “millions of public repositories” and is “an instance of transformative fair use.” However, open-source programmers on GitHub disagree, claiming that Codex has violated their open-source licenses, which permit redistribution and modification of the code only under certain conditions, often including a requirement to preserve the names of the authors.

Lawyer and programmer Matthew Butterick has been leading the action against Microsoft, starting a site dedicated to the GitHub Copilot investigation and teaming up with the Joseph Saveri Law Firm to file the class-action lawsuit. 

"As a longtime open-source programmer, it was apparent from the first time I tried Copilot that it raised serious legal concerns, which have been noted by many others since Copilot was first publicly previewed in 2021," Butterick said in a press release. "Because I'm also a lawyer, I felt compelled to stand up for the open-source community."

Other programmers who have been using Copilot have noted that it generated the incorrect license for code and reproduced users’ copyrighted code verbatim without proper attribution or a license.

“We’ve been committed to innovating responsibly with Copilot from the start, and will continue to evolve the product to best serve developers across the globe,” a spokesperson from GitHub told Motherboard when asked to comment on the lawsuit.

When GitHub was purchased by Microsoft in 2018, many users voiced concern about how the largest open-source community in the world would be affected. In the late 90s and 2000s, Microsoft waged a number of campaigns against Linux, an open-source operating system, claiming in 2007 that it violated 235 Microsoft patents.

"I am grateful to the programmers and users who came forward to bring this case to fruition and ensure that corporations like Microsoft, GitHub, and OpenAI cannot unfairly profit from the work of open-source creators," said Joseph Saveri, the lawyer whose firm is filing the class-action lawsuit. "This case represents the first major step in the battle against intellectual-property violations in the tech industry arising from artificial-intelligence systems. In this case, the work of open-source programmers is being exploited. But this will not be the last community of creators who are affected by AI systems. Our firm is committed to standing up for these creators and ensuring that companies developing AI products are held accountable under the law."

This lawsuit reflects a growing concern voiced by programmers, artists, and others: that AI systems may be using their code, artwork, and other data without permission. Image-generating AI tools such as DALL-E and Stable Diffusion are trained on data scraped from billions of webpages, without considering whether that use violates any ownership or licensing restrictions. Companies like Getty Images and Shutterstock have banned the use of AI-generated images on their platforms due to copyright concerns.

Butterick says that Microsoft’s offering of Copilot as an alternative to open-source code not only violates copyright but also removes the incentive for programmers to explore open-source communities. To Butterick, Microsoft’s compartmentalization of open-source code violates the ethos of open-source programming, in which programmers often voluntarily share code with one another as part of their mutual learning and development.