EU's AI legislation aims to protect businesses from IP theft

(Image credit: Getty Images)

A new draft of EU artificial intelligence (AI) legislation could better protect business IP from being secretly scraped by AI firms, with developers facing new transparency obligations on copyrighted content.

The long-awaited AI Act could force developers to disclose when they collect and use copyrighted material to train large language models (LLMs). 

The aim is to protect firms from having information such as source code used without their permission.

In addition to protection from unauthorized uses of data, the bill would offer companies legal grounds to establish the degree to which AI firms they work with are using ethically sourced, non-copyrighted data.

This could save businesses from costly legal battles over the use of tools that, unbeknown to them, contain stolen intellectual property (IP). 

The bill is also expected to categorize AI models by their risk factor, from ‘minimal’ to ‘unacceptable’. 

It aims to provide clear criteria that organizations can use to assess whether an AI tool’s risks outweigh its use cases.

High-risk AI systems have been defined as those that present “significant risks to the health and safety or fundamental rights of persons”, such as live facial recognition, and will be subject to additional transparency obligations.

EU’s AI Act and wider industry data scraping

At present, many developers of LLMs, the algorithms behind generative AI, use a great deal of data harvested from the internet for training purposes.

Lawmakers from across the EU’s political divide have come to a provisional agreement for the bill, which will now be pushed to the EU trilogue for further debate.

Previous drafts of the bill, first proposed in 2021, stated that transparency obligations “will not disproportionately affect the right to protection of intellectual property”.

The current bill states that non-compliant firms could face fines totaling up to 4% of their annual worldwide turnover, or €20 million ($22 million), whichever is higher.

Alistair Dent, chief strategy officer at data company Profusion, said that the requirement “raises the question of why AI should be treated differently from other platforms, such as social media or search engines”.




“These platforms index or use a huge amount of copyrighted material, often without citation; should they be forced to adhere to the same standards as AI?” he added.

Historical cases of wide-scale data scraping illustrate the issue.

Earlier this year, Meta sued ‘data scraping for hire’ firm Voyager Labs over its practices, alleging the firm had collected data on 600,000 Facebook users in a hidden campaign using fake accounts.

Meta itself also attracted a €265 million ($291 million) fine for “unacceptable” data scraping from the Irish Data Protection Commission (DPC) in November 2022.

At present, firms face a confusing regulatory landscape over the permissibility of data scraping.

A US case last year between hiQ Labs and LinkedIn concluded that web scraping was not in violation of federal law.

However, the court was not convened to directly rule on the practice of web scraping, nor did it offer judgment on how the legality of the practice could be impacted by claims of IP theft.

The EU, in contrast, has protections against uses of data that could infringe on the rights of citizens, such as biometric identification, and has issued fines on the subject, as seen with the Irish DPC’s decision against Meta.

Like GDPR, the EU’s AI Act is expected to have widespread implications for the market. 

Those firms looking to sell or deploy their models in the EU will have to comply with the terms of the bill, which will require a market-wide homogenized approach to risk and ethics criteria for AI products.

Researchers from the University of Oxford have created a tool called capAI, which they describe as “an independent, comparable, quantifiable, and accountable assessment of AI systems that conforms with the proposed AIA regulation”.

The paper proposes internal review protocols and suggests that firms could provide stakeholders and customers with a scorecard for their AI system.

This could alleviate the concerns of some experts such as Dent.

"A risk-based approach, which seeks to categorize different uses of AI and then add rules based on the perceived 'risk' of the solution causing harm, has the drawback of being unable to anticipate how AI will develop and the impact new tools will have on society,” he stated.

“Put simply, if you can't know what form a new AI solution will take and how it will be used, it's very hard to predetermine what category it should go in and apply the compliance burden accordingly.”

Rory Bathgate
Features and Multimedia Editor
