Some of the most popular open weight AI models show ‘profound susceptibility’ to jailbreak techniques
Open weight AI models from Meta, OpenAI, Google, and Mistral all showed serious flaws
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
You are now subscribed
Your newsletter sign-up was successful
A host of leading open weight AI models contain serious security vulnerabilities, according to researchers at Cisco.
In a new study, researchers found these models, which are publicly available and can be downloaded and modified by users based on individual needs, displayed “profound susceptibility to adversarial manipulation” techniques.
Cisco evaluated models by a range of firms including:
- Alibaba (Qwen3-32B)
- DeepSeek (v3.1)
- Google (Gemma 3-1B-IT)
- Meta (Llama 3.3-70B-Instruct)
- Microsoft (Phi-4)
- OpenAI (GPT-OSS-20b)
- Mistral (Large-2)
All of the aforementioned models were put through their paces with Cisco’s AI Validation tool, which is used to assess model safety and probe for potential security vulnerabilities.
Researchers found that, for all models, susceptibility to “multi-turn jailbreak attacks” was a key recurring issue. This is a method whereby an individual can essentially force a model to produce prohibited content.
This is achieved by using specifically-crafted instructions from the user that, over time, can be used to manipulate the model’s behavior. This is a more laborious process than “single-turn” techniques, which involve manipulating a model with a single effective malicious prompt.
Multi-turn jailbreak techniques have been observed in the wild before, particularly with the use of the Skeleton Key method, which allowed hackers to convince an AI model to produce instructions for making a Molotov cocktail.
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
Success rates with individual models varied wildly, the study noted. Researchers recorded a 25.86% success rate with Google’s Gemma-3-1B-IT model, for example, while also recording a 92.78% success rate with Mistral Large-2.
Researchers also recorded the highest success rate for single-turn attack methods with both these models.
Different strokes for different folks
The varied success rates recorded by Cisco lie in how these models are typically used, researchers noted. This rests on two key factors: alignment and capability.
In the case of 'alignment', this refers to how an AI model acts in the context of human intentions and values. 'Capability', meanwhile, refers to the model’s ability to perform a specific task.
For example, models such as Meta’s Llama range, which place a lower focus on alignment, showed the highest susceptibility to multi-turn attack methods.
Researchers noted that this is because Meta made a conscious decision to place developers “in the driver seat” in terms of tailoring the model’s safety mechanisms based on individual use-cases.
“Models that focused heavily on alignment (e.g., Google Gemma-3-1B-IT) did demonstrate a more balanced profile between single- and multi-turn strategies deployed against it, indicating a focus on “rigorous safety protocols” and “low risk level” for misuse,” the study said.
AI model flaws have real-world implications
Researchers warned that flaws contained in these models could have real-world ramifications, particularly with regard to data protection and privacy.
“This could translate into real-world threats, including risks of sensitive data exfiltration, content manipulation leading to compromise of integrity of data and information, ethical breaches through biased outputs, and even operational disruptions in integrated systems like chatbots or decision-support tools,” the study noted.
Notably, in enterprise settings, they warned these vulnerabilities could “enable unauthorized access to proprietary information”.
Concerns over AI model manipulation have become a common recurring theme since the advent of generative AI in late 2022, with a steady flow of new jailbreak techniques emerging on a regular basis.
Make sure to follow ITPro on Google News to keep tabs on all our latest news, analysis, and reviews.
MORE FROM ITPRO

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.
-
Anthropic researchers warn AI could 'inhibit skills formation' for developersNews A research paper from Anthropic suggests we need to be careful deploying AI to avoid losing critical skills
-
CultureAI’s new partner program targets AI governance gains for resellersNews The new partner framework aims to help resellers turn AI governance gaps into scalable services revenue
-
OpenAI's Codex app is now available on macOS – and it’s free for some ChatGPT users for a limited timeNews OpenAI has rolled out the macOS app to help developers make more use of Codex in their work
-
B2B Tech Future Focus - 2026Whitepaper Advice, insight, and trends for modern B2B IT leaders
-
Amazon’s rumored OpenAI investment points to a “lack of confidence” in Nova model rangeNews The hyperscaler is among a number of firms targeting investment in the company
-
OpenAI admits 'losing access to GPT‑4o will feel frustrating' for users – the company is pushing ahead with retirement plans anwayNews OpenAI has confirmed plans to retire its popular GPT-4o model in February, citing increased uptake of its newer GPT-5 model range.
-
What the UK's new Centre for AI Measurement means for the future of the industryNews The project, led by the National Physical Laboratory, aims to accelerate the development of secure, transparent, and trustworthy AI technologies
-
‘In the model race, it still trails’: Meta’s huge AI spending plans show it’s struggling to keep pace with OpenAI and Google – Mark Zuckerberg thinks the launch of agents that ‘really work’ will be the keyNews Meta CEO Mark Zuckerberg promises new models this year "will be good" as the tech giant looks to catch up in the AI race
-
Half of agentic AI projects are still stuck at the pilot stage – but that’s not stopping enterprises from ramping up investmentNews Organizations are stymied by issues with security, privacy, and compliance, as well as the technical challenges of managing agents at scale
-
What Anthropic's constitution changes mean for the future of ClaudeNews The developer debates AI consciousness while trying to make Claude chatbot behave better