OpenAI turns to red teamers to prevent malicious ChatGPT use as company warns future models could pose 'high' security risk
The ChatGPT maker wants to keep defenders ahead of attackers when it comes to AI security tools
OpenAI says its future AI models could create serious cybersecurity risks – so it's taking drastic action to crack down on malicious use.
The ChatGPT developer has previously detailed attempts by criminals to use its AI models to automate malware campaigns, banning accounts with suspicious activity.
Now it plans to step up efforts to prevent its models being used in cyber attacks by training them to refuse malicious requests, hiring red teaming organizations to test its systems, and setting up a trusted partner program so only vetted groups can access the latest models for security purposes.
In addition, the company said its agentic security researcher Aardvark was now in private beta.
"Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyber defense as well as new dual-use risks that must be managed carefully," the company said in a blog post.
On one capture-the-flag benchmark, GPT-5 scored 27%, but just a few months later GPT-5.1-Codex-Max scored 76%, OpenAI said. The company expects upcoming models will "continue on this trajectory".
Because of that, OpenAI said it was planning for each model as though it would reach "high levels of cybersecurity capability."
"By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects," the post added.
Warnings over the use of AI among cyber criminals have been growing in recent months as threat groups increasingly flock to powerful new tools to supercharge attacks.
Last month, for example, Google said it had spotted malware that used AI to adapt to its environment mid-attack, while security researchers in August identified a ransomware strain that made use of an OpenAI model.
More recently, Trend Micro has spotted criminals using threat intelligence reports as input to "vibe code" malware, while NetScout has warned that attackers are already using AI chatbots for DDoS attacks.
Jon Abbott, co-founder and CEO of ThreatAware, noted: "With models that can develop working zero-day remote exploits or assist with complex, stealthy intrusions, the barrier to entry for criminals has been dramatically lowered."
OpenAI is fighting back
OpenAI's first step in battling these growing risks is ensuring its models are useful for defensive security, helping cyber professionals with tasks like code auditing and bug spotting.
"Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced," OpenAI said.
However, such tools are also useful for the other side, OpenAI admitted, so it's building in a range of safeguards to help avoid their use by criminals.
"At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring," the post added.
"We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly."
In practice, that means training models to refuse harmful requests while still allowing what it calls "educational use cases", as well as bolstering detection systems – including blocking outputs that could be used for malicious purposes – with automated and human-led review for enforcement.
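OpenAI hasn't published how these safeguards work internally, but the pattern it describes – automated classification of outputs, with ambiguous cases escalated to people – is a common moderation design. As a purely illustrative sketch, where the thresholds, function names, and toy keyword classifier are all assumptions rather than details of OpenAI's system:

```python
# Purely illustrative sketch of automated output moderation with a
# human-review fallback. Names, thresholds, and the toy classifier are
# all assumptions, not details of OpenAI's actual system.
from dataclasses import dataclass

BLOCK_THRESHOLD = 0.9   # assumed: auto-block above this score
REVIEW_THRESHOLD = 0.5  # assumed: escalate to a human above this score

@dataclass
class ModerationResult:
    action: str  # "allow", "human_review", or "block"
    score: float

def score_output(text: str) -> float:
    """Toy stand-in for a trained abuse classifier: counts a few
    exploit-related keywords. Real systems use ML classifiers."""
    keywords = ("zero-day", "shellcode", "privilege escalation")
    hits = sum(k in text.lower() for k in keywords)
    return min(1.0, 0.4 * hits)

def moderate(text: str) -> ModerationResult:
    score = score_output(text)
    if score >= BLOCK_THRESHOLD:
        return ModerationResult("block", score)         # automated block
    if score >= REVIEW_THRESHOLD:
        return ModerationResult("human_review", score)  # human-led review
    return ModerationResult("allow", score)             # e.g. educational use
```

The middle band is the key design choice here: fully automated blocking would sweep up legitimate educational queries, so borderline outputs are routed to human reviewers instead.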
OpenAI’s work with red teamers will also help spot flaws in the system, the company noted, and will play a key role in mitigating future issues.
"Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might," the post noted.
Future plans
OpenAI also announced a "trusted access program" for partners and known customers in the security space, offering access to the latest models and enhanced features. However, the company said it was still working out the "right boundary" between capabilities that should be broadly available and those that should sit behind such restrictions.
Aardvark, OpenAI's agentic security tool, is now in private beta. The AI tool can look for vulnerabilities and suggest patches, and OpenAI said it had already spotted novel flaws in open-source software.
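OpenAI hasn't detailed Aardvark's internals, but the general agentic pattern – point a model at source code, have it flag likely vulnerabilities, and propose patches – can be loosely sketched with the openai Python SDK. Everything below is speculative: the model identifier, prompt, and project layout are assumptions, not Aardvark's actual design.

```python
# Speculative sketch of an agentic code-audit loop, NOT Aardvark's
# actual implementation. Assumes the openai Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def audit_file(path: Path) -> str:
    """Ask a model to flag likely vulnerabilities in one source file
    and propose a minimal patch."""
    response = client.chat.completions.create(
        model="gpt-5.1-codex-max",  # assumed model identifier
        messages=[
            {"role": "system",
             "content": "You are a security auditor. Flag likely "
                        "vulnerabilities and propose a minimal patch."},
            {"role": "user", "content": path.read_text()},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for source_file in Path("src").rglob("*.py"):  # assumed project layout
        print(f"--- {source_file} ---")
        print(audit_file(source_file))
```

A real agentic system would go further – verifying candidate flaws in a sandbox and testing patches before suggesting them – but the loop above captures the basic scan-and-suggest shape.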
The company said it would make Aardvark free for some non-commercial open source projects to help boost the security of the wider open source ecosystem and software supply chain.
Beyond that, the company revealed it would establish a Frontier Risk Council, an advisory body to keep a close watch on emerging risks in its own models, and would work with other AI developers via the Frontier Model Forum, a non-profit that works with labs on threat models.
"Taken together, this is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security," the company added.
In the meantime, ThreatAware's Abbott warned that the best way for businesses to battle the increasing security threat sparked by AI is to focus on the basics like user awareness, multi-factor authentication (MFA), and security controls.
“OpenAI’s warning that new models pose ‘high’ cybersecurity risks is exactly why getting the security foundations right is absolutely critical," he said. "AI might be accelerating the pace of attacks, but our best defence will continue to be nailing the fundamentals first."
He added: "Failing to address the basics should be a far greater concern, and there’s little point trying to implement advanced solutions if they’re not in place.”
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more. Nicole is the author of The Long History of the Future, a book about the history of technology.
