OpenAI is being sued again - why is generative AI such a legal conundrum?
Legal woes continue to bedevil AI technology amid privacy and copyright concerns
OpenAI and Microsoft have both been named in a complaint filed in the Northern District of California US District Court, which alleges that the ChatGPT creator has been using personal data to train its models without first obtaining permission.
It marks the second major lawsuit launched against the company in the space of a month; the first alleged that ChatGPT defamed a radio host by generating untrue statements about him.
The most recent legal accusations relate to the alleged non-consensual use of personal data. The plaintiffs involved have been left anonymous in the complaint, which is seeking class-action status, but $3 billion in damages was mentioned due to the “millions of class members” potentially involved.
Of the defendants, the lawsuit alleges “unlawful and harmful conduct in developing, marketing, and operating their AI products, including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E”.
It goes on to say that the products listed “use stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge”.
OpenAI’s models were trained by scraping data from the internet, according to the complaint.
“Despite established protocols for the purchase and use of personal information, defendants took a different approach: Theft,” it read.
While scraping data from webpages is not a new technique, the failure to seek consent, coupled with the difficulty of extracting personal information from the resulting models, has raised legal concerns.
“Through their AI products, integrated into every industry,” says the complaint, “defendants collect, store, track, share, and disclose private information of millions of users.” This includes:
- All details entered into the products
- Account information users enter when signing up
- Name
- Contact details
- Login credentials
- Emails
- Payment information for paid users
- Transaction records
- Identifying data pulled from users’ devices and browsers, like IP addresses and location, including geolocation of the users
- Social media information
- Chat log data
- Usage data
- Analytics
- Cookies
- Keystrokes
- Typed searches, as well as other online activity data
As well as the exhaustive list of data collected from the plaintiffs, the complaint also mentioned data collected through integrations with tools such as Slack and Microsoft Teams.
Is AI facing other legal issues?
Several lawsuits have been filed related to the performance of AI technology.
One lawsuit challenging the legality of GitHub Copilot is currently working its way through the legal system. At issue is how Copilot, an AI pair programmer capable of serving up coding suggestions, has been trained.
It is alleged that the use of code found in public GitHub repositories has violated the rights of developers or the licences under which the repositories have been made public. An attempt to have the case dismissed in May was denied, in part, meaning the lawsuit has longer to run.
“We firmly believe AI will transform the way the world builds software, leading to increased productivity and most importantly, happier developers,” a GitHub spokesperson told ITPro.
“We are confident that Copilot adheres to applicable laws and we’ve been committed to innovating responsibly with Copilot from the start. We will continue to invest in and advocate for the AI-powered developer experience of the future.”
The training of models is not the only aspect of generative AI to attract the attention of the legal profession.
OpenAI is also currently being sued for defamation after ChatGPT provided an inaccurate legal summary in response to a query.
The case centers on a complaint filed by Florida radio host Mark Walters that ChatGPT defamed him by erroneously claiming that he had been accused of financial crimes.
The technology has also come under the scrutiny of regulators. The Italian data protection regulator recently ordered OpenAI to implement a ‘right to be forgotten’ option amid concerns that the collection and processing of personal data is in violation of GDPR.
Why is AI having these problems?
Generative AI faces two major technical challenges.
The first is sourcing training data without infringing privacy or violating restrictions. OpenAI has in the past taken the approach of feeding any available data, whether scraped from webpages or entered into its tools, into its models.
While this has enabled the company to build up a large data set, questions are being asked regarding the rights it has around this gathering and use. It is this sourcing, or scraping, of training data that forms the basis for many of the complaints.
The second is the trust placed in the output of generative AI platforms, particularly where a ‘hallucination’ has occurred. This is where an AI model generates output that sounds plausible but is false, often because it has insufficient data to produce an accurate answer.
This type of hallucination is what triggered the defamation lawsuit, and underlines the importance of checking the output of tools such as ChatGPT.
Generative AI is a relatively new technology that has only become practical in recent years with improvements in processing power and storage. Lawmakers are therefore working to catch up and understand its implications.
Amid widespread fears regarding the risks posed by AI are genuine concerns around privacy and the use of personal data.
As an example, staffers in the US House of Representatives were recently instructed to only use ChatGPT Plus - which features privacy controls - and not paste sensitive data into the system.
Use of the technology has also been forbidden by a number of financial institutions and enterprises amid fears that sensitive financial information or company secrets might find their way into the chatbot’s models.
OpenAI recently updated its data usage and retention policies to permit customers to opt out of data sharing, although that policy did not apply to data submitted to the API before 1 March 2023 and doesn’t apply to OpenAI’s non-API consumer services, including ChatGPT.

Richard Speed is an expert in databases, DevOps and IT regulations and governance. He was previously a Staff Writer for ITPro, CloudPro and ChannelPro, before going freelance. He first joined Future in 2023 having worked as a reporter for The Register. He has also attended numerous domestic and international events, including Microsoft's Build and Ignite conferences and both US and EU KubeCons.
Prior to joining The Register, he spent a number of years working in IT in the pharmaceutical and financial sectors.