Anthropic announces Claude Opus 4.5, the new AI coding frontrunner

The new frontier model marks a leap forward for the firm in agentic tool use and resilience against prompt injection attacks


Anthropic has announced Claude Opus 4.5, its most advanced model to date and the new industry leader for AI code generation.

The leading AI lab has claimed the new model is a strong contender for all agentic workflows, including code generation and autonomous computer use.

Opus 4.5 scored 80.9% on SWE-bench Verified, cementing it as the new state-of-the-art model for code generation.

SWE-bench Verified is one of the most rigorous benchmarks for testing the agentic coding capabilities of AI models: each model is presented with real-world coding problems taken from open source GitHub repositories.

In comparison, GPT-5.1 Codex Max scored 77.9% and Gemini 3 Pro, Google’s latest frontier model, scored 76.2%.

Until now, Claude Sonnet 4.5 had been widely praised as the best AI model for generating code across a variety of programming languages.

In addition to its raw performance, Opus 4.5 offers developers more choice in how to approach a problem.

Via the Claude API, developers can use a new ‘effort’ parameter to control how many tokens the model spends on a given task, which in turn affects both the latency and the cost of the output.

In tests, Opus 4.5 set to ‘medium’ effort matched Claude Sonnet 4.5’s SWE-bench Verified score while using 76% fewer output tokens.
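For illustration, a minimal sketch of what such a request might look like through Anthropic’s official Python SDK is below. The model identifier and the ‘effort’ field name and values are assumptions based on the announcement rather than a confirmed schema, so check Anthropic’s API documentation before relying on them:

import anthropic

# The client reads the ANTHROPIC_API_KEY environment variable by default
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",  # assumed model identifier for Opus 4.5
    max_tokens=2048,
    # 'effort' is passed via extra_body because the field is new; the name
    # and its values ("low" / "medium" / "high") are assumptions here
    extra_body={"effort": "medium"},
    messages=[
        {
            "role": "user",
            "content": "Fix the failing test in this repository and explain the change.",
        }
    ],
)

print(response.content[0].text)  # first content block holds the text reply

Dialing the effort down trades some answer quality for fewer output tokens, which is the knob Anthropic says allowed ‘medium’ to match Sonnet 4.5’s score at a fraction of the token spend.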

An all-around leap forward for Anthropic

Aside from its coding capabilities, Anthropic underlined the overall improvement Opus 4.5 brings to various enterprise tasks.

For example, the model is capable of complex information retrieval, agentic tool use, and deep analysis, as well as Excel automation.

Across agentic tool use benchmarks, Opus 4.5 was found to consistently outclass rival models.

A benchmark table comparing Opus 4.5 against Sonnet 4.5, Opus 4.1, Gemini 3 Pro, and GPT-5.1 (Image credit: Anthropic)

In early testing with Excel automation, Anthropic said its customers measured 20% accuracy improvements and 15% efficiency gains.

Anthropic emphasized these tangible improvements as a sign that the Claude model family has become a strong choice for various enterprise tasks, in addition to its code-generation pedigree.

With the launch of Opus 4.5, Anthropic sees the three models in the Claude family fulfilling distinct roles in the development lifecycle:

  • Opus 4.5 is the go-to model for core agentic tasks and production code, with a focus on maximum sophistication and accuracy.
  • Sonnet 4.5 is the model of choice for agents at scale, particularly customer-facing agents, as well as for generating low-latency code for iterative development.
  • Haiku 4.5 is for businesses seeking free-tier access to Claude, as well as for sub-agents.

Anthropic defines sub-agents as agents given specific, pre-defined tasks that don’t necessarily require its frontier model to accomplish.

Expanding on the model’s computer use capabilities, Anthropic will also make Opus 4.5 available via a new Chrome extension, Claude for Chrome.

This will allow Max subscribers to let Claude take various actions in their browser.

"Claude Opus 4.5 represents a breakthrough in self-improving AI agents,” said Yusuke Kaji, GM of AI for Business at Rakuten.

“For automation of office tasks, our agents were able to autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10.

“They also demonstrated the ability to learn from experience across technical tasks, storing insights from past work and applying them to new challenges like SRE operations."

New resilience to prompt injection

In addition to its benchmark improvements, Opus 4.5 was trained to be as trustworthy as possible and to defend against common prompt injection attacks launched against reasoning models.

When simulated attackers were given 100 attempts with “very strong” prompt injection attacks, they achieved a success rate of 63% against Opus 4.5 Thinking, compared with 87.8% against GPT-5.1 Thinking and 92% against Gemini 3 Pro Thinking.

When limited to a single attempt, just 4.7% of attacks succeeded against Opus 4.5, versus 12.6% against GPT-5.1 Thinking and 12.5% against Gemini 3 Pro Thinking.

Rory Bathgate
Features and Multimedia Editor

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.

In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.