AI is getting better at security – and it's doing it faster than expected

UK AISI warns that AI models are already outgrowing the benchmarks used to test them


AI models are getting better at handling complex security tasks, doubling their performance on one benchmark within a matter of months – and that's before accounting for the arrival of security-focused models, notably Anthropic's Claude Mythos and OpenAI's GPT-5.5.

That's according to the UK's AI Security Institute (AISI), which tracks the potential impact of AI on the security industry and on efforts to protect organisations. It found that newer models had doubled the length of cyber tasks they could complete every 4.7 months – much faster than expected.

"In February 2026, we internally estimated that the length of cyber tasks AI models could complete had doubled every 4.7 months since late 2024 – already an acceleration from our November 2025 estimate of 8 months," the organisation said in a blog post. "Since then, AISI reported on two new models, Claude Mythos Preview and GPT-5.5, which substantially exceeded both doubling rate trends."
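The doubling-rate figure AISI describes can be derived from two time-horizon measurements taken some months apart. As a minimal sketch (the function name and the numbers here are illustrative, not AISI's data): if a model's horizon grows from h1 to h2 hours over d months, the implied doubling time is d divided by log2(h2/h1).

```python
import math

def doubling_time_months(h1_hours: float, h2_hours: float,
                         months_elapsed: float) -> float:
    """Months needed for the time horizon to double, given that it
    grew from h1_hours to h2_hours over months_elapsed months."""
    return months_elapsed / math.log2(h2_hours / h1_hours)

# Hypothetical example: a horizon growing from 1 hour to 8 hours
# over 12 months implies a 4-month doubling time.
print(round(doubling_time_months(1.0, 8.0, 12.0), 1))  # → 4.0
```

A shorter doubling time than a prior estimate – as AISI reports moving from 8 months to 4.7 – indicates the growth curve itself is steepening.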

AISI added: "It is unclear whether this represents a new, faster trend."


That follows the release of Claude Mythos, which sparked concerns that companies wouldn't be able to keep up with AI security, as well as GPT-5.5 Cyber last week – a security-focused model OpenAI released in a limited preview, with access restricted to security professionals, amid fears that generative AI was accelerating a security arms race. Indeed, Forescout VP of security intelligence Rik Ferguson last week said AI tools are now "a standard part of the attacker toolkit."

How the AISI tests

These results are based on a time-horizon benchmark, which tracks the success rate of AI models on tasks of different lengths, where length is measured by how long a human expert would take on the same task. For example, one set of tests includes reverse engineering and web exploits in self-contained setups. AISI considers a model capable of tasks of a certain length once it succeeds at them 80% of the time.
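The mechanics of that 80% threshold can be sketched in a few lines. This is a simplified illustration of the general time-horizon approach, not AISI's actual methodology: bucket task attempts by human-expert completion time, compute the model's success rate per bucket, and report the longest task length at which success stays at or above the threshold.

```python
from collections import defaultdict

def time_horizon(results, threshold=0.8):
    """Longest human-expert task length (in hours) at which the model's
    success rate is still >= threshold. `results` is a list of
    (human_hours, succeeded) pairs."""
    buckets = defaultdict(list)
    for hours, ok in results:
        buckets[hours].append(ok)
    horizon = 0.0
    for hours in sorted(buckets):
        rate = sum(buckets[hours]) / len(buckets[hours])
        if rate >= threshold:
            horizon = hours
        else:
            break  # reliability has fallen below the bar
    return horizon

# Hypothetical runs: (task length in human-hours, success flag).
# The model clears the 30-minute and 1-hour tasks but fails at 2 hours.
runs = [(0.5, 1), (0.5, 1),
        (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 0),
        (2.0, 1), (2.0, 0), (2.0, 0)]
print(time_horizon(runs))  # → 1.0
```

As AISI's caveats below suggest, real evaluations are messier: success doesn't fall off monotonically with task length, which is one reason the institute treats the benchmark as a trend indicator rather than a precise capability measure.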

AISI admits the time horizon benchmark is imperfect. "They are inexact predictors of performance; AI struggles with some tasks humans do quickly, and easily completes others that humans find hard," the blog post noted. "However, we use this type of benchmark because it offers a measure of AI autonomy from which we can draw trends."

Plus, the tests include only some of the capabilities that would be necessary to run a real-world attack. Alongside that, AISI caps the models at 2.5 million tokens to keep results comparable.

AISI said that the 2.5 million token cap holds back model performance: without it, "success rates are so high that time horizons become impossible to calculate." At the same time, its tests are now too short to reveal the point at which model reliability would start to fail on longer tasks; the longest task runs to 12 hours.

"No single benchmark result should be read as a precise measure of AI capability," the post noted, adding: "Regardless, the direction of change and rapid growth have been consistent across the models, methodological choices, and independent data we examined."

AISI added that new evaluation methods are in development.

What this means for security

The AISI said it was unclear whether AI's pace of progress would continue, or how the technology's capabilities would translate against real-world systems. But the agency said it was clear that AI brings both opportunities and risks.

"The time to invest in strong security baselines is now," the AISI post warned. "Frontier AI can strengthen attackers as well as defenders, and there is a critical window to build resilience."

That was echoed by Palo Alto Networks this week, with CTO Lee Klarich warning that AI cyberattacks would become the new normal in the next few months. "This impending vulnerability deluge demands urgency," he wrote.

Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.

Nicole is the author of a book about the history of technology, The Long History of the Future.