Researchers tested over 100 leading AI models on coding tasks — nearly half produced glaring security flaws
AI models large and small were found to introduce cross-site scripting errors and seriously struggle with secure Java generation


Just 55% of code generated with AI is free of known cybersecurity vulnerabilities, according to new research from Veracode.
To test the capability of AI models to generate safe code, Veracode took existing functions and replaced part of the code with a comment describing what the finished code should look like.
In 45% of results, generated code contained known security flaws, with no significant difference in outcome between small models and the largest available.
The findings underline a major potential risk attached to ‘vibe coding’, in which software developers rely heavily on large language model (LLM) output to quickly generate code for use in software.
Researchers put over 100 LLMs across a variety of vendors, sizes, and intended applications – including models specifically intended for coding as well as general purpose models – through 80 distinct coding tasks.
Veracode said researchers intentionally used sections that could be coded in a number of different ‘correct’ ways, as well as in at least one way that would include a known software vulnerability or ‘Common Weakness Enumeration’ (CWE).
These CWEs included flaws that hackers could exploit for SQL injection, cross-site scripting (XSS), attacks against weak cryptographic algorithms, and log injection. Each featured vulnerability appears in the Open Worldwide Application Security Project (OWASP) top ten list.
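As a rough illustration of one of these weaknesses (this sketch is not drawn from Veracode's test suite), the difference between code that is open to SQL injection (CWE-89) and its safe equivalent can come down to a single line. Here, splicing user input directly into the query text lets an attacker's condition match every row, while a parameterized query treats the same input purely as data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Vulnerable (CWE-89): string formatting splices input into the SQL text,
# so the injected OR clause matches every row in the table
unsafe = conn.execute(
    f"SELECT role FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe: a parameterized query binds the input as a literal value
safe = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()

print(unsafe)  # [('admin',)] - the injection succeeded
print(safe)    # [] - no user is literally named "alice' OR '1'='1"
```

Both versions are plausible completions of the same task description, which is exactly the kind of ambiguity the researchers built into their test prompts.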
Models showed inconsistent performance across different vulnerability types, achieving security pass rates of 85.6% for avoiding the cryptographic algorithm flaws and 80.4% for avoiding SQL injection.
In contrast, models fared extremely poorly at avoiding the XSS and log injection vulnerabilities, with average pass rates of just 13.5% and 12% respectively.
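Both of those weak spots follow the same pattern: untrusted input reaching a sensitive sink without sanitization. A minimal Python sketch (an illustration only, not one of Veracode's test cases) shows an XSS flaw (CWE-79) alongside the escaping that prevents it, and the newline stripping that blocks log injection (CWE-117):

```python
import html

comment = '<script>alert("xss")</script>'  # untrusted user input

# Vulnerable (CWE-79): raw interpolation into an HTML response
# lets the browser execute the injected script
page_unsafe = f"<p>{comment}</p>"

# Safe: escape the input so it renders as inert text
page_safe = f"<p>{html.escape(comment)}</p>"
print(page_safe)  # <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>

# Log injection (CWE-117): embedded newlines let an attacker forge
# a second, fake log entry
username = "bob\nINFO admin login succeeded"
log_unsafe = f"INFO login attempt by {username}"

# Safe: strip newline characters before the value reaches the log
sanitized = username.replace("\n", " ").replace("\r", " ")
log_safe = f"INFO login attempt by {sanitized}"
```

The fixes are one-liners, but a model has to recognize that the value is attacker-controlled before it knows to apply them.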
Researchers noted that the tested LLMs are continuing to improve at avoiding the SQL injection and cryptographic algorithm flaws over time, while seemingly getting worse at avoiding the XSS and log injection vulnerabilities.
Overall, Veracode noted that the security improvements of the tested LLMs have flatlined.
The authors of the report noted that it is possible to phrase AI code prompts in a more security-conscious way, but that this is far from standard practice. With this in mind, they intentionally kept prompts short, to examine how models react when given minimal context.
But they also warned that even if firms take a more security-aware approach to code generation, LLMs still struggle with tasks such as determining which variables require sanitization, a necessary step for preventing code injection attacks.
“Even with a large context window, it is unclear whether models can perform the detailed interprocedural dataflow analysis required to determine this information precisely,” they wrote.
LLMs were tested across a range of programming languages: Python, C#, JavaScript, and Java. Overall, the researchers found LLMs performed worst at generating Java safely, with an average pass rate of 28.5% in this widely-used language.
AI-generated code remains a concern, but adoption is still rising
AI tools are now widely used for generating code, with 84% of software developers using AI to produce code more quickly according to recent Stack Overflow findings.
But the same report underlined continued distrust among developers in the quality of AI code, with three-quarters (75.3%) reporting that they do not trust AI outputs and 61.7% stating they have security concerns over the use of AI code.
Despite these worries, big tech continues to embrace AI code, with Alphabet CEO Sundar Pichai having revealed last year that 25% of Google’s internal code is now AI-generated and Microsoft CEO Satya Nadella recently revealing up to 20-30% of his firm’s code was written by AI.
Nadella noted that while Microsoft has been quick to adopt AI-generated Python code, C++ has proven harder to adopt. Kevin Scott, CTO at Microsoft, has been bullish on overcoming these hurdles with his prediction that 95% of code will be AI-generated by 2030, as reported by Business Insider.
Security teams and developers will have to carefully weigh up findings such as Veracode’s against the potential benefits to their bottom line of using AI to alter and add to their codebase.
Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.