Some of the most popular open weight AI models show ‘profound susceptibility’ to jailbreak techniques
Open weight AI models from Meta, OpenAI, Google, and Mistral all showed serious flaws
A host of leading open weight AI models contain serious security vulnerabilities, according to researchers at Cisco.
In a new study, researchers found these models, which are publicly available and can be downloaded and modified by users based on individual needs, displayed “profound susceptibility to adversarial manipulation” techniques.
Cisco evaluated models by a range of firms including:
- Alibaba (Qwen3-32B)
- DeepSeek (v3.1)
- Google (Gemma-3-1B-IT)
- Meta (Llama 3.3-70B-Instruct)
- Microsoft (Phi-4)
- OpenAI (GPT-OSS-20b)
- Mistral (Large-2)
All of the aforementioned models were put through their paces with Cisco’s AI Validation tool, which is used to assess model safety and probe for potential security vulnerabilities.
Researchers found that susceptibility to “multi-turn jailbreak attacks” was a recurring issue across all models. This is a method whereby an attacker can gradually coax a model into producing prohibited content.
This is achieved through specially crafted instructions that, over a series of exchanges, steadily manipulate the model’s behavior. It is a more laborious process than “single-turn” techniques, which rely on a single effective malicious prompt.
Multi-turn jailbreak techniques have been observed in the wild before, particularly with the use of the Skeleton Key method, which allowed hackers to convince an AI model to produce instructions for making a Molotov cocktail.
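The structural difference between the two attack styles described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Cisco's methodology: `query_model` is a hypothetical stand-in for any chat-completion client, and the prompts themselves are deliberately omitted.

```python
# Illustrative sketch: how a multi-turn probe differs structurally from a
# single-turn prompt. `query_model` is a hypothetical placeholder for a
# real chat-completion client.

def query_model(messages):
    # Placeholder: a real test harness would call a model API here and
    # return its reply. For this sketch we return a canned string.
    return "[model reply]"

def single_turn_probe(prompt):
    # One self-contained message; the model sees no prior context.
    return query_model([{"role": "user", "content": prompt}])

def multi_turn_probe(prompts):
    # Each turn is appended to a growing history, so later prompts are
    # interpreted against the context built up by earlier exchanges --
    # the property multi-turn jailbreaks exploit.
    history = []
    replies = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```

The key point is that the multi-turn variant carries the full conversation history into every request, which is why safety behavior that holds up against one isolated prompt can erode over several turns.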
Success rates with individual models varied wildly, the study noted. Researchers recorded a 25.86% success rate with Google’s Gemma-3-1B-IT model, for example, while also recording a 92.78% success rate with Mistral Large-2.
These same two models also yielded the highest recorded success rates for single-turn attack methods.
Different strokes for different folks
The variation in success rates recorded by Cisco, researchers noted, stems from how these models are typically used, and rests on two key factors: alignment and capability.
'Alignment' refers to how closely an AI model's behavior tracks human intentions and values, while 'capability' refers to the model's ability to perform a specific task.
For example, models such as Meta’s Llama range, which place a lower focus on alignment, showed the highest susceptibility to multi-turn attack methods.
Researchers noted that this is because Meta made a conscious decision to place developers “in the driver seat” in terms of tailoring the model’s safety mechanisms based on individual use-cases.
“Models that focused heavily on alignment (e.g., Google Gemma-3-1B-IT) did demonstrate a more balanced profile between single- and multi-turn strategies deployed against it, indicating a focus on ‘rigorous safety protocols’ and ‘low risk level’ for misuse,” the study said.
AI model flaws have real-world implications
Researchers warned that flaws contained in these models could have real-world ramifications, particularly with regard to data protection and privacy.
“This could translate into real-world threats, including risks of sensitive data exfiltration, content manipulation leading to compromise of integrity of data and information, ethical breaches through biased outputs, and even operational disruptions in integrated systems like chatbots or decision-support tools,” the study noted.
Notably, in enterprise settings, they warned these vulnerabilities could “enable unauthorized access to proprietary information”.
Concerns over AI model manipulation have been a recurring theme since the advent of generative AI in late 2022, with new jailbreak techniques emerging on a regular basis.
Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.