‘We experimented with efforts to differentially reduce these capabilities’: Anthropic toned down Opus 4.7’s cyber uses in wake of Claude Mythos release

Testing of new cyber-related safeguards for Anthropic’s Opus 4.7 model could shape the future public release of Claude Mythos

Anthropic claims it used new techniques to “differentially reduce” the cyber capabilities of its Opus 4.7 model in the wake of the Claude Mythos release.

The company unveiled the new model on 16 April, providing users with marked improvements in areas such as software engineering. The AI developer specifically highlighted gains in coding, as well as improvements on knowledge work tasks.

“Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks,” the company said in a blog post.

“Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence.”

Anthropic noted that Opus 4.7 is capable of handling “complex, long-running tasks with rigor and consistency” and is “substantially better” at following instructions.

Reduced cyber capabilities

The launch of Opus 4.7 comes hot on the heels of the gated release of Claude Mythos, the company’s most powerful model yet, which it claims will “reshape cybersecurity”.

Through Project Glasswing, a host of big tech partners have been given preview access to the model in a bid to test and evaluate its defensive capabilities, which initial testing shows could deliver marked improvements.

Anthropic claimed Mythos identified “thousands of zero-day vulnerabilities” during testing, many of which were critical and some of which had gone undetected for nearly three decades.

Concerns have been raised over its potential for nefarious use, however, and Anthropic said the limited release aimed to prevent misuse by threat actors.

Opus 4.7 is the first model released by the firm in the wake of Project Glasswing, and Anthropic noted that its cyber capabilities are “not as advanced” as those of Mythos.

Indeed, it appears the company actively toned down its capabilities in cybersecurity use-cases.

“During its training we experimented with efforts to differentially reduce these capabilities,” the company said in a blog post.

Opus 4.7 also comes with new safeguards that detect and block requests that “indicate prohibited or high-risk cybersecurity uses”. Long-term, the testing of these safeguards could help steer the company’s approach to a full public release of Mythos.

“What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models,” the company said.

“Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.”

Will Anthropic's gambit work?

Rory Bathgate

Anthropic is working at speed, with just over two months having passed between the release of its last Opus model and this successor. The headline here is, of course, the leap in code generation and agentic computer use – an unsurprising focus for a company that has become widely known for Claude Code and Claude Cowork.

At the same time, Anthropic appears to be more and more concerned about releasing cyber-capable models to the general public. I’m interested in seeing actual testing of Opus 4.7’s cyber safeguards, as hiding capabilities behind model refusal is never a guaranteed method to stop bad actors from accessing them via prompt injection.

Interestingly, Anthropic says it’s paired these traditional guardrails with active efforts to reduce Opus 4.7’s cybersecurity performance during training. This is the first time I can remember a model developer actively trying to scuttle one of its frontier models.

If it proves successful, the ‘Opus’ family could become more fragmented, with sub-models that specialize in certain tasks and fall behind on others.

For example, as Anthropic leans further into computer use capabilities with Claude Cowork, it could be beneficial for the firm to produce a model that excels at computer use but not at code generation, to reduce the potential for malicious misuse.

For now, it seems, Anthropic is saving its best cyber performance for Mythos. That said, we have to be careful about taking everything Anthropic says about Mythos at face value.

Recent analysis from the UK’s AI Security Institute, which rigorously tested Claude Mythos’ capabilities using custom cybersecurity tests, found that Mythos Preview is more capable than previous frontier models in cyber performance but doesn’t exceed human cyber performance.

In a 32-step corporate network attack simulation, Mythos Preview was the first model to reach the end, albeit only in three of its ten attempts.

“Mythos Preview’s success on one cyber range indicates that it is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained,” AISI researchers wrote.

“However, our ranges have important differences from real-world environments that make them easier targets,” they added, explaining that these include the absence of the active defenders, defensive tooling, and security alerts that would be present in the real world.

Ross Kelly
News and Analysis Editor
