The world's 'first AI software engineer' isn't living up to expectations: Cognition AI's 'Devin' assistant was touted as a game changer for developers, but so far it's fumbling tasks and struggling to compete with human workers
Devin failed to complete most tasks given to it by researchers


Devin, a coding assistant hailed as the world's 'first AI software engineer', was given 20 coding tasks and managed to complete just three, taking longer than expected and going down strange routes to achieve its goals.
The AI coding tool, developed by Cognition AI, was billed as a transformative way to streamline software development when it was unveiled last year.
Costing around $500 per month, the AI assistant works via Slack so it feels like chatting to a colleague. At the time, Cognition showed a demo of Devin picking up jobs on Upwork, a freelancing platform that is used by software engineers to find work.
However, reports suggest third-party researchers have been unable to replicate those results: one software developer picked apart the Upwork claims, and a team of AI researchers who assessed Devin found it lacking.
Devin was framed as a game-changing AI tool
At Devin's launch last year, Cognition claimed that the tool could "make money taking on messy Upwork tasks," sharing a video purporting to show just that.
But software developer Carl Brown posted his own video in response, arguing that the company was not telling the truth about the tool's abilities and breaking down what "Devin was supposed to do, what it actually managed to do instead, and how bad of a job that it did."
Brown noted that it took him 36 minutes to complete the task himself, while Devin spent six hours failing to do it.
Cognition's claims about Devin were also tested by a team of researchers at Answer.AI, and their results were closer to Brown's than to what the original blog post claimed, with Devin completing only three of 20 tasks.
There were some "early wins", however. Devin could pull a Notion database into Google Sheets with "surprising competence", they noted, completing the task in an hour with only a few minutes of human interaction.
The code worked, but was "a bit verbose." Another task, building a planet tracker, was similarly successful.
"This felt like a glimpse into the future — an AI that could handle the 'glue code' tasks that consume so much developer time," they wrote.
More complicated tasks proved harder, or as the researchers put it: "as we scaled up our testing, cracks appeared."
"Tasks that seemed straightforward often took days rather than hours, with Devin getting stuck in technical dead-ends or producing overly complex, unusable solutions," they noted. "Even more concerning was Devin’s tendency to press forward with tasks that weren’t actually possible."
Over a month, they tasked Devin with creating new projects from scratch, performing research and analyzing or modifying existing projects, but out of 20 such tasks, just three were successful.
"The most frustrating aspect wasn’t the failures themselves - all tools have limitations - but rather how much time we spent trying to salvage these attempts," they said.
How to use Devin
That's a far cry from what was advertised when the AI assistant was first unveiled in March of last year. A blog post on Cognition's website claimed Devin could take on basic tasks for software engineers, allowing them to focus on bigger problems.
The website says Devin can find and fix bugs, build and deploy an entire app end-to-end, and even train and fine-tune an AI model.
"With our advances in long-term reasoning and planning, Devin can plan and execute complex engineering tasks requiring thousands of decisions," the company said. "Devin can recall relevant context at every step, learn over time, and fix mistakes."
Cognition hasn't yet replied to a request for comment from ITPro, but its own blog post does give some context to how the system could be used more successfully than these tests suggest.
The company says Devin "can be an all-purpose tool", but recommends starting with smaller tasks such as simple bugs. Notably, the company said that it works best when you "give Devin tasks that you know how to do yourself" and tell the tool how to test or check its own work.
From there, Devin can help break large tasks down into smaller ones that each take less than three hours.
Given Answer.AI's success using Devin for smaller "glue code" tasks, perhaps such advice about starting small should be heeded.
Indeed, this research challenging the usefulness of the current crop of AI software assistants comes after Meta founder Mark Zuckerberg predicted that AI will be doing the work of mid-level engineers this year, albeit with some serious caveats.
"In the beginning it’ll be really expensive to run, then you can get it to be more efficient and then over time we’ll get to the point where a lot of the code in our apps and including the AI that we generate is actually going to be built by AI engineers instead of people engineers," he said.
Freelance journalist Nicole Kobie started writing for ITPro in 2007 and has bylines in New Scientist, Wired, PC Pro and many more.
Nicole is the author of a book about the history of technology, The Long History of the Future.