OpenAI turns to red teamers to prevent malicious ChatGPT use as company warns future models could pose 'high' security risk
The ChatGPT maker wants to keep defenders ahead of attackers when it comes to AI security tools
OpenAI says its future AI models could create serious cybersecurity risks – so it's taking drastic action to crack down on malicious use.
The ChatGPT developer has previously detailed attempts by criminals to use its AI models to automate malware campaigns, banning accounts with suspicious activity.
Now, it plans to step up efforts to prevent its AI models being used in cyber attacks by training models to refuse malicious requests, hiring red teaming organizations to test its systems, and setting up a trusted partner program so only vetted groups can access the latest models for security purposes.
In addition, the company said its agentic security researcher Aardvark was now in private beta.
"Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyber defense as well as new dual-use risks that must be managed carefully," the company said in a blog post.
On one capture-the-flag benchmark, GPT-5 scored 27%, but just a few months later GPT-5.1-Codex-Max scored 76%, OpenAI said. The company expects upcoming models to "continue on this trajectory."
Because of that, OpenAI said it was planning for each model as though it would reach "high levels of cybersecurity capability."
"By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects," the post added.
Warnings over the use of AI among cyber criminals have been growing in recent months as threat groups increasingly flock to powerful new tools to supercharge attacks.
Last month, for example, Google said it spotted malware that used AI to adapt to its environment mid-attack, while security researchers in August identified a ransomware strain that made use of an OpenAI model.
More recently, Trend Micro has observed criminals using intelligence reports as input to "vibe code" malware, while NetScout has warned that attackers are already using AI chatbots for DDoS attacks.
Jon Abbott, co-founder and CEO of ThreatAware, noted: "With models that can develop working zero-day remote exploits or assist with complex, stealthy intrusions, the barrier to entry for criminals has been dramatically lowered."
OpenAI is fighting back
OpenAI says the first step in battling these growing risks is ensuring its models are useful for defensive security, helping cyber professionals with tasks such as code auditing and bug spotting.
"Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced," OpenAI said.
However, such tools are also useful for the other side, OpenAI admitted, so it's building in a range of safeguards to help avoid their use by criminals.
"At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring," the post added.
"We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly."
In practice, that means training models to refuse harmful requests, while still allowing what it calls "educational use cases", as well as bolstering detection systems, including blocking outputs that could be used for malicious purposes, with automatic and human-led review for enforcement.
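OpenAI's post doesn't describe its internal enforcement pipeline in detail, but the sketch below illustrates what an application-level screening step of this kind might look like, using the company's publicly available Moderation endpoint; the helper function and example input are illustrative assumptions, not drawn from the post.

```python
# Illustrative sketch only: screens text with OpenAI's public Moderation
# endpoint before acting on it. OpenAI's internal safeguards are not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_allowed(text: str) -> bool:
    """Return False if the moderation endpoint flags the text."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return not result.results[0].flagged


# Hypothetical usage: screen a request before passing it to a model
request = "Explain how certificate pinning protects mobile apps."
if is_allowed(request):
    print("Request passed automated screening")
else:
    print("Request flagged for human review")
```

In a real deployment this kind of automated check would sit alongside the human-led review OpenAI describes, rather than replace it.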
OpenAI’s work with red teamers will also help spot flaws in the system, the company noted, and will play a key role in mitigating future issues.
"Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might," the post noted.
Future plans
OpenAI also announced a "trusted access program" for partners and known customers in security, offering access to the latest models and enhanced features. However, the company said it was still working out the "right boundary" between capabilities that should be broadly accessible and those that should sit behind such restrictions.
Aardvark, OpenAI's agentic security tool, is now in private beta. The AI tool can look for vulnerabilities and suggest patches, and OpenAI said it had already spotted novel flaws in open-source software.
The company said it would make Aardvark free to some non-commercial open source projects to help boost the security of the open source ecosystem and software supply chain.
Beyond that, the company revealed it will establish a Frontier Risk Council, an advisory body to keep a close watch on these issues in its own models, and would work with other AI developers via the Frontier Model Forum, a non-profit that works with labs on threat models.
"Taken together, this is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security," the company added.
In the meantime, ThreatAware's Abbott warned that the best way for businesses to battle the increasing security threat sparked by AI is to focus on the basics like user awareness, multi-factor authentication (MFA), and security controls.
“OpenAI’s warning that new models pose ‘high’ cybersecurity risks is exactly why getting the security foundations right is absolutely critical," he said. "AI might be accelerating the pace of attacks, but our best defence will continue to be nailing the fundamentals first."
He added: "Failing to address the basics should be a far greater concern, and there’s little point trying to implement advanced solutions if they’re not in place.”
