‘Frontier models are still unable to solve the majority of tasks’: AI might not replace software engineers just yet – OpenAI researchers found leading models and coding tools still lag behind humans on basic tasks
Large language models struggle to identify root causes or provide comprehensive solutions


AI might not replace software engineers just yet as new research from OpenAI reveals ongoing weaknesses in the technology.
Having created a benchmark dubbed ‘SWE-Lancer’ to evaluate AI’s effectiveness at completing software engineering and managerial tasks, researchers concluded that the technology is lacking.
“We evaluate model performance and find that frontier models are still unable to solve the majority of tasks,” researchers said.
Researchers found that, while AI excels in certain areas, it is limited in others. For example, AI agents are skilled at localizing problems but poor at identifying the root cause.
While they can pinpoint the location of an issue quickly and use search capabilities to access the necessary repositories faster than humans can, they have a limited understanding of how an issue spans multiple components and files.
This frequently leads to solutions that are incorrect or insufficiently comprehensive, and agents can often fail by not finding the right file or location to edit.
In a comparison between two OpenAI models, o1 and GPT-4o, and Anthropic’s Claude 3.5 Sonnet model, researchers found they all failed to entirely solve one particular user interface (UI) problem.
While o1 solved the basic issue, it missed a range of others, and GPT-4o failed to solve even the initial problem. Sonnet was quick to identify the root cause of the issue and fix the bug, but the solution was not comprehensive and did not pass the researchers’ end-to-end tests.
All told, researchers said that while AI coding tools have the capacity to make software engineering more productive, users need to be wary of the potential flaws in AI-generated code.
Are AI coding tools more trouble than they’re worth?
While businesses are ramping up the use of AI coding tools, there have been plenty of warning signs to make firms stop and consider whether the tools are worth it.
Research from Harness earlier this year found that many developers are becoming increasingly bogged down with manual tasks and code remediation due to the increased use of AI coding tools.
The study noted that while these tools may offer huge benefits to software engineers, they are still littered with weaknesses and lack some of the capabilities of human engineers.
“While these tools can boost efficiency, in their current state they often result in a surge of errors, security vulnerabilities, and downstream manual work that burdens developers,” Sheila Flavell, COO of FDM Group, told ITPro.
The risk of vulnerabilities and malicious code being introduced into organizations is also significantly higher when AI coding tools are used, according to Shobhit Gautam, security solutions architect at HackerOne.
“AI-generated code is not guaranteed to follow security guidelines and best practices as defined by the organization standards. As the code is generated from LLMs, there is a possibility that third-party components may be used in the code and go unnoticed,” Gautam told ITPro.
“Aside from the risk of copyright infringement, the code hasn’t been through the company’s validation testing and peer reviews, potentially resulting in unchecked vulnerabilities,” Gautam added.
An overreliance on AI coding tools may also be eroding the skills of human programmers, with research from education platform O’Reilly finding that interest in traditional programming languages is in decline.
Similarly, a post from tech blogger and programmer Namanyay Goel sparked debate on this topic recently when Goel claimed junior developers lack coding skills owing to a heightened use of automated AI tooling.
How can businesses use these tools effectively?
Despite concerns, there are clear signs AI coding tools are delivering value for both software engineers and enterprises. GitHub research from last year revealed AI coding tools have helped engineers deliver more secure software, write better quality code, and adopt new languages.
With this in mind, firms need to prioritize certain processes to deliver success with AI tools. Flavell said businesses need to put upskilling front and center, as well as improving code reviews and quality assurance.
“It is essential that organizations create and implement governance processes to manage the use of AI generated code,” Gautam added.
“When it comes to coding, AI tools and human input will all play their part. Organizations gain the best of both worlds when they integrate these two together. Human Intelligence is essential to tailor coding to specific requirements, and AI can help experts increase their efficiency.”

George Fitzmaurice is a former Staff Writer at ITPro and ChannelPro, with a particular interest in AI regulation, data legislation, and market development. After graduating from the University of Oxford with a degree in English Language and Literature, he undertook an internship at the New Statesman before starting at ITPro. Outside of the office, George is both an aspiring musician and an avid reader.