AI is getting better at security – and it's doing it faster than expected

UK AISI warns that AI models are already outgrowing the benchmarks used to test them


AI models are getting better at handling complex security tasks, doubling their performance on one benchmark within a matter of months – and that's before accounting for the arrival of security-focused models, notably Anthropic's Claude Mythos and OpenAI's GPT-5.5.

That's according to the UK's AI Security Institute (AISI), which tracks the potential impact of AI on the security industry and on efforts to protect organisations. It found that newer models had doubled the length of cyber tasks they could complete every 4.7 months – much faster than expected.

"In February 2026, we internally estimated that the length of cyber tasks AI models could complete had doubled every 4.7 months since late 2024 – already an acceleration from our November 2025 estimate of 8 months," the organisation said in a blog post. "Since then, AISI reported on two new models, Claude Mythos Preview and GPT-5.5, which substantially exceeded both doubling rate trends."
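The doubling-rate figure AISI describes can be derived from two time-horizon measurements taken some months apart. As a minimal sketch (the function name and the numbers here are illustrative, not AISI's data): if a model's horizon grows from h1 to h2 hours over d months, the implied doubling time is d divided by log2(h2/h1).

```python
import math

def doubling_time_months(h1_hours: float, h2_hours: float,
                         months_elapsed: float) -> float:
    """Months needed for the time horizon to double, given that it
    grew from h1_hours to h2_hours over months_elapsed months."""
    return months_elapsed / math.log2(h2_hours / h1_hours)

# Hypothetical example: a horizon growing from 1 hour to 8 hours
# over 12 months implies a 4-month doubling time.
print(round(doubling_time_months(1.0, 8.0, 12.0), 1))  # → 4.0
```

A shorter doubling time than a prior estimate – as AISI reports moving from 8 months to 4.7 – indicates the growth curve itself is steepening.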

AISI added: "It is unclear whether this represents a new, faster trend."


That follows the release of Claude Mythos, which sparked concerns that companies wouldn't be able to keep up with AI security, as well as GPT-5.5 Cyber last week – a security-focused model OpenAI released in a limited preview, with access restricted to security professionals, amid fears that generative AI was accelerating a security arms race. Indeed, Forescout VP of security intelligence Rik Ferguson last week said AI tools are now "a standard part of the attacker toolkit."

How the AISI tests

These results are based on a time-horizon benchmark, which tracks the success rate of AI models on tasks of different lengths, where length is measured by how long a human expert would take on the same task. For example, one set of tests includes reverse engineering and web exploits in self-contained setups. AISI considers a model capable of tasks of a certain length once it succeeds at them 80% of the time.
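The mechanics of that 80% threshold can be sketched in a few lines. This is a simplified illustration of the general time-horizon approach, not AISI's actual methodology: bucket task attempts by human-expert completion time, compute the model's success rate per bucket, and report the longest task length at which success stays at or above the threshold.

```python
from collections import defaultdict

def time_horizon(results, threshold=0.8):
    """Longest human-expert task length (in hours) at which the model's
    success rate is still >= threshold. `results` is a list of
    (human_hours, succeeded) pairs."""
    buckets = defaultdict(list)
    for hours, ok in results:
        buckets[hours].append(ok)
    horizon = 0.0
    for hours in sorted(buckets):
        rate = sum(buckets[hours]) / len(buckets[hours])
        if rate >= threshold:
            horizon = hours
        else:
            break  # reliability has fallen below the bar
    return horizon

# Hypothetical runs: (task length in human-hours, success flag).
# The model clears the 30-minute and 1-hour tasks but fails at 2 hours.
runs = [(0.5, 1), (0.5, 1),
        (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 0),
        (2.0, 1), (2.0, 0), (2.0, 0)]
print(time_horizon(runs))  # → 1.0
```

As AISI's caveats below suggest, real evaluations are messier: success doesn't fall off monotonically with task length, which is one reason the institute treats the benchmark as a trend indicator rather than a precise capability measure.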

AISI admits the time horizon benchmark is imperfect. "They are inexact predictors of performance; AI struggles with some tasks humans do quickly, and easily completes others that humans find hard," the blog post noted. "However, we use this type of benchmark because it offers a measure of AI autonomy from which we can draw trends."

Plus, the tests include only some of the capabilities that would be necessary to run a real-world attack. Alongside that, AISI caps the models at 2.5 million tokens to keep results comparable.

AISI said that the 2.5 million token cap holds back model performance: without it, "success rates are so high that time horizons become impossible to calculate." At the same time, its tests are now too short to reveal the point at which model reliability would start to fail on longer tasks; the longest task runs to 12 hours.

"No single benchmark result should be read as a precise measure of AI capability," the post noted, adding: "Regardless, the direction of change and rapid growth have been consistent across the models, methodological choices, and independent data we examined."

AISI added that new evaluation methods are in development.

What this means for security

The AISI said it was unclear whether AI's pace of progress would continue, or how the technology's capabilities would translate against real-world systems. But the agency said it was clear that AI brings both opportunities and risks.

"The time to invest in strong security baselines is now," the AISI post warned. "Frontier AI can strengthen attackers as well as defenders, and there is a critical window to build resilience."

That was echoed by Palo Alto Networks this week, with CTO Lee Klarich warning that AI cyberattacks would become the new normal in the next few months. "This impending vulnerability deluge demands urgency," he wrote.

Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.

Nicole is the author of a book about the history of technology, The Long History of the Future.