OpenAI turns to red teamers to prevent malicious ChatGPT use as company warns future models could pose 'high' security risk
The ChatGPT maker wants to keep defenders ahead of attackers when it comes to AI security tools
OpenAI says its future AI models could create serious cybersecurity risks – so it's taking drastic action to crack down on malicious use.
The ChatGPT developer has previously detailed attempts by criminals to use its AI models to automate malware campaigns, banning accounts with suspicious activity.
Now, it plans to step up efforts to stop its AI models being used in cyber attacks by training them to refuse malicious requests, hiring red teaming organizations to test its systems, and setting up a trusted partner scheme so only known groups can access the latest models for security purposes.
In addition, the company said its agentic security researcher Aardvark was now in private beta.
"Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyber defense as well as new dual-use risks that must be managed carefully," the company said in a blog post.
Using one benchmark, GPT-5 scored 27% on a capture-the-flag challenge, but just a few months later GPT-5.1-Codex-Max scored 76%, OpenAI said. The company expects upcoming models will "continue on this trajectory."
Because of that, OpenAI said it was planning for each model as though it would reach "high levels of cybersecurity capability."
"By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects," the post added.
Warnings over the use of AI among cyber criminals have been growing in recent months as threat groups increasingly flock to powerful new tools to supercharge attacks.
Last month, for example, Google said it spotted malware that used AI to adapt to its environment mid-attack, while security researchers in August spotted a ransomware strain that made use of an OpenAI model.
More recently, Trend Micro has spotted criminals using intelligence reports as input to "vibe code" malware, while NetScout has warned that attackers are already using AI chatbots for DDoS attacks.
Jon Abbott, co-founder and CEO of ThreatAware, noted: "With models that can develop working zero-day remote exploits or assist with complex, stealthy intrusions, the barrier to entry for criminals has been dramatically lowered."
OpenAI is fighting back
OpenAI's first step in battling the growing risks is ensuring its models are useful for defensive security, helping cyber professionals with code auditing and bug spotting.
"Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced," OpenAI said.
However, such tools are also useful for the other side, OpenAI admitted, so it's building in a range of safeguards to help avoid their use by criminals.
"At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring," the post added.
"We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly."
In practice, that means training models to refuse harmful requests while still allowing what OpenAI calls "educational use cases," as well as bolstering detection systems to block outputs that could be used for malicious purposes, with automated and human-led review for enforcement.
OpenAI’s work with red teamers will also help spot flaws in the system, the company noted, and will play a key role in mitigating future issues.
"Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might," the post noted.
Future plans
OpenAI also announced a "trusted access program" for partners and known customers in security, offering access to the latest models and enhanced features. However, the company said it was still working on the "right boundary" between broad access for some capabilities versus which ones should fall behind such restrictions.
Aardvark, OpenAI's agentic security tool, is now in private beta. The AI tool can look for vulnerabilities and suggest patches, and OpenAI said it had already spotted novel flaws in open-source software.
The company said it would make Aardvark free to some non-commercial open source projects to help boost security of their ecosystems and supply chain.
Beyond that, the company revealed it will establish a Frontier Risk Council, an advisory body to keep a close watch on these issues in its own models, and would work with other AI developers via the Frontier Model Forum, a non-profit that works with labs on threat models.
"Taken together, this is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security," the company added.
In the meantime, ThreatAware's Abbott warned that the best way for businesses to battle the increasing security threat sparked by AI is to focus on the basics like user awareness, multi-factor authentication (MFA), and security controls.
“OpenAI’s warning that new models pose ‘high’ cybersecurity risks is exactly why getting the security foundations right is absolutely critical," he said. "AI might be accelerating the pace of attacks, but our best defence will continue to be nailing the fundamentals first."
He added: "Failing to address the basics should be a far greater concern, and there’s little point trying to implement advanced solutions if they’re not in place.”
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.
Nicole is the author of a book about the history of technology, The Long History of the Future.
