OpenAI turns to red teamers to prevent malicious ChatGPT use as company warns future models could pose 'high' security risk
The ChatGPT maker wants to keep defenders ahead of attackers when it comes to AI security tools
OpenAI says its future AI models could create serious cybersecurity risks – so it's taking drastic action to crack down on malicious use.
The ChatGPT developer has previously detailed attempts by criminals to use its AI models to automate malware campaigns, banning accounts with suspicious activity.
Now, it plans to step up efforts to prevent its AI models being used in cyber attacks by training models to refuse malicious requests, hiring red teaming organizations to test its systems, and setting up a trusted partner program so only vetted groups can access the latest models for security purposes.
In addition, the company said its agentic security researcher Aardvark was now in private beta.
"Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyber defense as well as new dual-use risks that must be managed carefully," the company said in a blog post.
On one capture-the-flag benchmark, GPT-5 scored 27%, but just a few months later GPT-5.1-Codex-Max scored 76%, OpenAI said. The company expects upcoming models to "continue on this trajectory."
Because of that, OpenAI said it was planning for each model as though it would reach "high levels of cybersecurity capability."
"By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects," the post added.
Warnings over the use of AI among cyber criminals have been growing in recent months as threat groups increasingly flock to powerful new tools to supercharge attacks.
Last month, for example, Google said it spotted malware that used AI to adapt to its environment mid-attack, while security researchers in August identified a ransomware strain that made use of an OpenAI model.
More recently, Trend Micro has observed criminals using intelligence reports as input to "vibe code" malware, while NetScout has warned that attackers are already using AI chatbots for DDoS attacks.
Jon Abbott, co-founder and CEO of ThreatAware, noted: "With models that can develop working zero-day remote exploits or assist with complex, stealthy intrusions, the barrier to entry for criminals has been dramatically lowered."
OpenAI is fighting back
OpenAI says the first step in battling these growing risks is ensuring its models are useful for defensive security, helping cyber professionals with tasks such as code auditing and bug spotting.
"Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced," OpenAI said.
However, such tools are also useful for the other side, OpenAI admitted, so it's building in a range of safeguards to help avoid their use by criminals.
"At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring," the post added.
"We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly."
In practice, that means training models to refuse harmful requests, while still allowing what it calls "educational use cases", as well as bolstering detection systems, including blocking outputs that could be used for malicious purposes, with automatic and human-led review for enforcement.
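OpenAI's post doesn't describe its internal enforcement pipeline in detail, but the sketch below illustrates what an application-level screening step of this kind might look like, using the company's publicly available Moderation endpoint; the helper function and example input are illustrative assumptions, not drawn from the post.

```python
# Illustrative sketch only: screens text with OpenAI's public Moderation
# endpoint before acting on it. OpenAI's internal safeguards are not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_allowed(text: str) -> bool:
    """Return False if the moderation endpoint flags the text."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return not result.results[0].flagged


# Hypothetical usage: screen a request before passing it to a model
request = "Explain how certificate pinning protects mobile apps."
if is_allowed(request):
    print("Request passed automated screening")
else:
    print("Request flagged for human review")
```

In a real deployment this kind of automated check would sit alongside the human-led review OpenAI describes, rather than replace it.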
OpenAI’s work with red teamers will also help spot flaws in the system, the company noted, and will play a key role in mitigating future issues.
"Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might," the post noted.
Future plans
OpenAI also announced a "trusted access program" for partners and known customers in security, offering access to the latest models and enhanced features. However, the company said it was still working out the "right boundary" between capabilities that should be broadly accessible and those that should sit behind such restrictions.
Aardvark, OpenAI's agentic security tool, is now in private beta. The AI tool can look for vulnerabilities and suggest patches, and OpenAI said it had already spotted novel flaws in open-source software.
The company said it would make Aardvark free to some non-commercial open source projects to help boost the security of the open source ecosystem and software supply chain.
Beyond that, the company revealed it will establish a Frontier Risk Council, an advisory body to keep a close watch on these issues in its own models, and would work with other AI developers via the Frontier Model Forum, a non-profit that works with labs on threat models.
"Taken together, this is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security," the company added.
In the meantime, ThreatAware's Abbott warned that the best way for businesses to battle the increasing security threat sparked by AI is to focus on the basics like user awareness, multi-factor authentication (MFA), and security controls.
“OpenAI’s warning that new models pose ‘high’ cybersecurity risks is exactly why getting the security foundations right is absolutely critical," he said. "AI might be accelerating the pace of attacks, but our best defence will continue to be nailing the fundamentals first."
He added: "Failing to address the basics should be a far greater concern, and there’s little point trying to implement advanced solutions if they’re not in place.”
