Microsoft’s GitHub Copilot sued over “software piracy on an unprecedented scale”


Microsoft’s GitHub Copilot is being sued in a class action lawsuit that claims the artificial intelligence product is committing software piracy on an unprecedented scale.

The case was filed on 3 November in the San Francisco federal court by Matthew Butterick, a designer and programmer, together with the Joseph Saveri Law Firm, following their investigation into GitHub Copilot. The class action lawsuit is brought on behalf of potentially millions of GitHub users.

The lawsuit challenges the legality of GitHub Copilot, as well as OpenAI Codex, the model that powers the tool, and has been filed against GitHub, its owner Microsoft, and OpenAI.

GitHub and OpenAI launched Copilot, an AI-based product that aims to help software developers by suggesting or completing blocks of code, in June 2021. The service costs users $10 per month or $100 per year.

“By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more), we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licences on GitHub,” said Butterick.

These licences include a set of 11 popular open source licences that all require attribution of the author’s name and copyright. This includes the MIT licence, the GNU General Public Licence, and the Apache licence.
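The attribution condition these licences share is concrete. The MIT licence, for example, requires that a notice along the following lines accompany every copy or substantial portion of the code (a generic, illustrative example with a placeholder author name, not text taken from the lawsuit). It is this kind of notice that the complaint says is missing from Copilot's output.

```text
MIT License

Copyright (c) 2021 Jane Developer

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software [...], subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
```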

The complaint claims that Copilot violates these licences, and strips the attribution they require, for code offered by thousands, possibly millions, of software developers, and is therefore committing software piracy on an unprecedented scale.


Copilot, which runs entirely on Microsoft Azure, often simply reproduces code that can be traced back to open-source repositories or licensees, according to the lawsuit. The code never contains attributions to the underlying authors, which is in violation of the licences.

“It is not fair, permitted, or justified. On the contrary, Copilot’s goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall. It violates the licences that open-source programmers chose and monetises their code despite GitHub’s pledge never to do so,” detailed the class-action complaint.

Moreover, the case stated that the defendants have also violated GitHub’s own terms of service and privacy policies, section 1202 of the DMCA, which forbids the removal of copyright-management information, and the California Consumer Privacy Act.

“As far as we know, this is the first class-action case in the US challenging the training and output of AI systems,” said Butterick. “It will not be the last. AI systems are not exempt from the law. Those who create and operate these systems must remain accountable. If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still.

“AI needs to be fair and ethical for everyone. If it’s not, then it can never achieve its vaunted aims of elevating humanity. It will just become another way for the privileged few to profit from the work of the many,” he added.

When asked for comment, GitHub pointed to its 1 November announcement that new features are set to come to the Copilot platform in 2023.

One planned feature would show developers an inventory of similar code found in GitHub public repositories whenever the tool suggests a code fragment, with the ability to sort the inventory by filters such as commit date and repository licence.
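GitHub has not published how the planned inventory would work, but the described filtering is straightforward to picture. The sketch below is purely hypothetical: the `CodeMatch` record, its field names, and the sample data are all assumptions for illustration, not GitHub's actual schema or API.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record for one match found in a public repository.
# Field names are illustrative; GitHub has not published the feature's design.
@dataclass
class CodeMatch:
    repo: str
    licence: str
    commit_date: date

def filter_matches(matches, licence=None, since=None):
    """Narrow an inventory of matches by repository licence and commit date."""
    result = matches
    if licence is not None:
        result = [m for m in result if m.licence == licence]
    if since is not None:
        result = [m for m in result if m.commit_date >= since]
    return result

# Illustrative sample inventory.
matches = [
    CodeMatch("octo/widgets", "MIT", date(2021, 5, 1)),
    CodeMatch("acme/tools", "GPL-3.0", date(2022, 8, 15)),
    CodeMatch("example/lib", "MIT", date(2022, 1, 10)),
]

mit_recent = filter_matches(matches, licence="MIT", since=date(2022, 1, 1))
print([m.repo for m in mit_recent])  # ['example/lib']
```

The point of such a feature, for developers, would be exactly what the lawsuit demands: a way to see which repository, and which licence, a suggested fragment may derive from.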

IT Pro has contacted Microsoft and OpenAI for further comment.

In October 2022, developer Tim Davis, professor of computer science at Texas A&M University, wrote on Twitter that GitHub Copilot had emitted large chunks of his copyrighted code, with no attribution to him.

Davis added that he could probably reproduce his entire sparse matrix libraries from simple prompts, aiming to underline the similarity between his work and what the AI tool produced.

“The code in question is different from the example given. Similar, but different. If you can find a way to automatically identify one as being derivative of the other, patent it,” responded Alex Graveley, creator of GitHub Copilot, on Twitter.

This comes at a time when Microsoft is looking at developing Copilot technology for use in similar programmes for other job categories, like office work, cyber security, or video game design, according to a Bloomberg report.

Microsoft's chief technology officer revealed that the tech giant will build some of the tools itself, while others will be provided by its customers, partners, and rivals.

Examples of what the technology could do include helping video game creators write dialogue for non-player characters, while the tech giant’s cyber security teams are investigating how the tool could help combat hackers.

GitHub has acknowledged that Copilot can in some cases reproduce copied code, and the current version of the tool includes a setting aimed at blocking suggestions that match existing code in public repositories.

Zach Marzouk

Zach Marzouk is a former ITPro, CloudPro, and ChannelPro staff writer, covering topics like security, privacy, worker rights, and startups, primarily in the Asia Pacific and the US regions. Zach joined ITPro in 2017 where he was introduced to the world of B2B technology as a junior staff writer, before he returned to Argentina in 2018, working in communications and as a copywriter. In 2021, he made his way back to ITPro as a staff writer during the pandemic, before joining the world of freelance in 2022.