OpenAI turns to red teamers to prevent malicious ChatGPT use as company warns future models could pose 'high' security risk
The ChatGPT maker wants to keep defenders ahead of attackers when it comes to AI security tools
OpenAI says its future AI models could create serious cybersecurity risks – so it's taking drastic action to crack down on malicious use.
The ChatGPT developer has previously detailed attempts by criminals to use its AI models to automate malware campaigns, banning accounts with suspicious activity.
Now, it plans to step up those efforts by training models to refuse malicious requests, hiring red teaming organizations to test its systems, and setting up a trusted partner system so that only known groups can access the latest models for security purposes.
In addition, the company said its agentic security researcher Aardvark was now in private beta.
"Cyber capabilities in AI models are advancing rapidly, bringing meaningful benefits for cyber defense as well as new dual-use risks that must be managed carefully," the company said in a blog post.
On one capture-the-flag benchmark, GPT-5 scored 27%, but just a few months later GPT-5.1-Codex-Max scored 76%, OpenAI said. The company expects upcoming models will "continue on this trajectory."
Because of that, OpenAI said it was planning for each model as though it would reach "high levels of cybersecurity capability."
"By this, we mean models that can either develop working zero-day remote exploits against well-defended systems, or meaningfully assist with complex, stealthy enterprise or industrial intrusion operations aimed at real-world effects," the post added.
Warnings over the use of AI among cyber criminals have been growing in recent months as threat groups increasingly flock to powerful new tools to supercharge attacks.
Last month, for example, Google said it spotted malware that used AI to adapt to its environment mid-attack, while security researchers in August spotted a ransomware strain that made use of an OpenAI model.
More recently, TrendMicro has spotted criminals using intelligence reports as input to "vibe code" malware, while NetScout has warned that attackers are already using AI chatbots for DDoS attacks.
Jon Abbott, co-founder and CEO of ThreatAware, noted: "With models that can develop working zero-day remote exploits or assist with complex, stealthy intrusions, the barrier to entry for criminals has been dramatically lowered."
OpenAI is fighting back
OpenAI's first step in tackling the growing risks is ensuring its models are useful for defensive security, helping cyber professionals with code auditing and bug spotting.
"Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced," OpenAI said.
However, such tools are also useful for the other side, OpenAI admitted, so it's building in a range of safeguards to help avoid their use by criminals.
"At the foundation of this, we take a defense-in-depth approach, relying on a combination of access controls, infrastructure hardening, egress controls, and monitoring," the post added.
"We complement these measures with detection and response systems, and dedicated threat intelligence and insider-risk programs, making it so emerging threats are identified and blocked quickly."
In practice, that means training models to refuse harmful requests while still allowing what OpenAI calls "educational use cases". It also means bolstering detection systems, including blocking outputs that could be used for malicious purposes, with automated and human-led review for enforcement.
OpenAI’s work with red teamers will also help spot flaws in the system, the company noted, and will play a key role in mitigating future issues.
"Their job is to try to bypass all of our defenses by working end-to-end, just like a determined and well-resourced adversary might," the post noted.
Future plans
OpenAI also announced a "trusted access program" for partners and known customers in security, offering access to the latest models and enhanced features. However, the company said it was still working out the "right boundary" between capabilities that can be broadly available and those that should sit behind such restrictions.
Aardvark, OpenAI's agentic security tool, is now in private beta. The AI tool can look for vulnerabilities and suggest patches, and OpenAI said it had already spotted novel flaws in open-source software.
The company said it would make Aardvark free to some non-commercial open source projects to help boost security of their ecosystems and supply chain.
Beyond that, the company revealed it will establish a Frontier Risk Council, an advisory body to keep a close watch on these issues in its own models, and will work with other AI developers via the Frontier Model Forum, a non-profit that works with labs on threat models.
"Taken together, this is ongoing work, and we expect to keep evolving these programs as we learn what most effectively advances real-world security," the company added.
In the meantime, ThreatAware's Abbott warned that the best way for businesses to battle the increasing security threat sparked by AI is to focus on the basics like user awareness, multi-factor authentication (MFA), and security controls.
"OpenAI's warning that new models pose 'high' cybersecurity risks is exactly why getting the security foundations right is absolutely critical," he said. "AI might be accelerating the pace of attacks, but our best defence will continue to be nailing the fundamentals first."
He added: "Failing to address the basics should be a far greater concern, and there's little point trying to implement advanced solutions if they're not in place."
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.
Nicole is the author of a book about the history of technology, The Long History of the Future.
