OpenAI and Microsoft have both been named in a complaint filed in the US District Court for the Northern District of California, which alleges that the ChatGPT creator has been using personal data to train its models without first obtaining permission.
It is the second major lawsuit launched against the company in the space of a month, following a defamation suit from a radio host who alleges that ChatGPT generated untrue statements about him.
The most recent legal accusations relate to the alleged non-consensual use of personal data. The plaintiffs are anonymous in the complaint, which seeks class-action status and mentions $3 billion in damages due to the “millions of class members” potentially involved.
Of the defendants, the lawsuit alleges “unlawful and harmful conduct in developing, marketing, and operating their AI products, including ChatGPT-3.5, ChatGPT-4.0, Dall-E, and Vall-E”.
It goes on to say that the products listed “use stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge”.
OpenAI’s models were trained by scraping data from the internet, according to the complaint.
“Despite established protocols for the purchase and use of personal information, defendants took a different approach: Theft,” it read.
While scraping data from webpages is not a new technique, the failure to seek consent, coupled with the difficulty of extracting personal information from the resulting models, has raised legal concerns.
“Through their AI products, integrated into every industry,” says the complaint, “defendants collect, store, track, share, and disclose private information of millions of users.” This information includes:
- All details entered into the products
- Account information users enter when signing up
- Contact details
- Login credentials
- Payment information for paid users
- Transaction records
- Identifying data pulled from users’ devices and browsers, like IP addresses and location, including geolocation of the users
- Social media information
- Chat log data
- Usage data
- Keystrokes
- Typed searches, as well as other online activity data
Is AI facing other legal issues?
Several lawsuits have been filed related to the performance of AI technology.
One challenging the legality of GitHub Copilot is currently working its way through the courts. At issue is how Copilot, an AI pair programmer capable of serving up coding suggestions, was trained.
It is alleged that the use of code found in public GitHub repositories has violated the rights of developers or the licences under which the repositories have been made public. An attempt to have the case dismissed in May was denied, in part, meaning the lawsuit has longer to run.
“We firmly believe AI will transform the way the world builds software, leading to increased productivity and most importantly, happier developers,” a GitHub spokesperson told ITPro.
“We are confident that Copilot adheres to applicable laws and we’ve been committed to innovating responsibly with Copilot from the start. We will continue to invest in and advocate for the AI-powered developer experience of the future.”
The training of models is not the only aspect of generative AI to attract the attention of the legal profession.
OpenAI is also currently being sued for defamation after ChatGPT provided an inaccurate legal summary in response to a query.
The case centers on a complaint filed by Florida radio host Mark Walters that ChatGPT defamed him by erroneously claiming that he had been accused of financial crimes.
The technology has also come under the scrutiny of regulators. The Italian data protection regulator recently ordered OpenAI to implement a ‘right to be forgotten’ option amid concerns that the collection and processing of personal data is in violation of GDPR.
Why is AI having these problems?
Generative AI faces two major technical challenges.
The first is sourcing training data without infringing privacy or violating restrictions. OpenAI has in the past taken the approach of scraping any available data from webpages and feeding it, together with data entered into its tools, into its models.
While this has enabled the company to build up a large data set, questions are being asked regarding the rights it has around this gathering and use. It is this sourcing, or scraping, of training data that forms the basis for many of the complaints.
The second is the trust placed in the output of generative AI platforms, particularly where a ‘hallucination’ has occurred. This is where an AI model produces plausible-sounding but inaccurate output, for example because it has insufficient data to generate an accurate one.
This type of hallucination is what triggered the defamation lawsuit, and underlines the importance of checking the output of tools such as ChatGPT.
Generative AI is a relatively new technology that has only become practical in recent years with improvements in processing power and storage. Lawmakers are therefore working to catch up and understand its implications.
Amid widespread fears regarding the risks posed by AI are genuine concerns around privacy and the use of personal data.
As an example, staffers in the US House of Representatives were recently instructed to only use ChatGPT Plus - which features privacy controls - and not paste sensitive data into the system.
Use of the technology has also been forbidden by a number of financial institutions and enterprises amid fears that sensitive financial information or company secrets might find their way into the chatbot’s models.
OpenAI recently updated its data usage and retention policies to permit customers to opt out of data sharing, although that policy did not apply to data submitted to the API before 1 March 2023 and doesn’t apply to OpenAI’s non-API consumer services, including ChatGPT.
Richard Speed is an expert in databases, DevOps and IT regulations and governance. He was previously a Staff Writer for ITPro, CloudPro and ChannelPro, before going freelance. He first joined Future in 2023 having worked as a reporter for The Register. He has also attended numerous domestic and international events, including Microsoft's Build and Ignite conferences and both US and EU KubeCons.
Prior to joining The Register, he spent a number of years working in IT in the pharmaceutical and financial sectors.