Generative AI training in the crosshairs as ICO set to examine legality of personal data use
Generative AI training methods have become a contentious issue in recent months amid data privacy concerns and a slew of lawsuits against major industry players


The legality of generative AI training methods is set to be examined by the Information Commissioner’s Office (ICO) amid concerns over the use of personal data.
AI training methods have been a key talking point in recent months due to the manner in which large language models (LLMs) are built. LLMs such as those underpinning ChatGPT are typically trained on vast amounts of data collected through web scraping.
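To give a rough sense of how that collection works, the sketch below shows a stripped-down version of scraping public web pages into a text corpus. It is purely illustrative: the seed URLs and crawler name are hypothetical, and real LLM data pipelines use distributed crawlers and extensive filtering rather than a simple loop like this.

```python
# Illustrative sketch only: collect plain text from a handful of public pages.
# Uses the requests and beautifulsoup4 libraries; URLs are placeholders.
from urllib.parse import urljoin, urlparse
import urllib.robotparser

import requests
from bs4 import BeautifulSoup

SEED_URLS = [
    "https://example.com/article-1",  # hypothetical pages for illustration
    "https://example.com/article-2",
]


def allowed_by_robots(url: str, user_agent: str = "example-crawler") -> bool:
    """Check the site's robots.txt before fetching a page."""
    root = f"{urlparse(url).scheme}://{urlparse(url).netloc}"
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(urljoin(root, "/robots.txt"))
    parser.read()
    return parser.can_fetch(user_agent, url)


def scrape_page_text(url: str) -> str:
    """Fetch a page and strip its markup down to plain text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)


if __name__ == "__main__":
    corpus = []
    for url in SEED_URLS:
        if allowed_by_robots(url):
            corpus.append(scrape_page_text(url))
    print(f"Collected {len(corpus)} documents for the training corpus")
```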
However, these practices have raised concerns both about data privacy and the legal repercussions for developers that fall foul of copyright laws.
The ICO said conversations with developers in the AI space have highlighted several areas where organizations seek greater clarity around how data protection laws apply to the development and use of generative AI.
This includes questions over the appropriate lawful basis for training generative AI models, and how the purpose limitation principle plays out in the context of generative AI development and deployment.
There are also lingering questions about complying with the accuracy principle, as well as what is expected when it comes to complying with data subject rights.
Over the coming months, the ICO said it plans to release guidance on its position on the matter, outlining how specific requirements of the UK GDPR and the Data Protection Act 2018 could impact generative AI training methods.
"The impact of generative AI can be transformative for society if it’s developed and deployed responsibly," said Stephen Almond, the ICO's executive director for regulatory risk.
"This call for views will help the ICO provide industry with certainty regarding its obligations and safeguard people’s information rights and freedoms."
Generative AI training and ‘legitimate interest’
Under the UK GDPR, relying on legitimate interests involves a three-part test: the interest being pursued must be legitimate, the data processing must be necessary for that purpose, and the individual’s interests must not override the interest being pursued.
The ICO said its current thinking is that legitimate interests can be a valid lawful basis for training generative AI models on web scraped data, as long as the model developer can ensure they pass this three-part test.
The developer’s interest could simply be the business interest in developing a model and deploying it for commercial gain, or it could be a wider societal interest, as long as the developer can evidence the model’s specific purpose and use.
As for necessity, the ICO recognizes that, currently, most generative AI training is only possible using data obtained through large-scale scraping.
With the 'balancing' test, the data watchdog noted that the assessment becomes more complicated depending on whether generative AI models are deployed by the initial developer, by a third party through an API, or simply provided to third parties.
The ICO said it will engage with stakeholders from across the technology industry as part of the investigation, including developers and users of generative AI, legal advisors and consultants working in the space, civil society groups, and public bodies with an interest in generative AI.
The first consultation is open until 1 March, with future consultations planned during the first half of this year to examine issues such as the accuracy of generative AI outputs.
Emma Woollacott is a freelance journalist writing for publications including the BBC, Private Eye, Forbes, Raconteur and specialist technology titles.