Microsoft warns 'Skeleton Key' can crack popular AI models for dangerous outputs
Microsoft says threat actors can bypass guardrails built into some of the most popular LLMs using this simple technique
Microsoft has published threat intelligence warning users of a new jailbreaking method which can prompt AI models into disclosing harmful information.
The technique is able to force LLMs to totally disregard behavioral guidelines built into the models by the AI vendor, earning it the name Skeleton Key.
In a report published on 26 June, Microsoft detailed the attack flow through which Skeleton Key is able to force models into responding to illicit requests and revealing harmful information.
“Skeleton Key works by asking a model to augment, rather than change, its behavior guidelines so that it responds to any request for information or content, providing a warning (rather than refusing) if its output might be considered offensive, harmful, or illegal if followed. This attack type is known as Explicit: forced instruction-following.”
In an example provided by Microsoft, a model was convinced into providing instructions for making a molotov cocktail using a prompt that insisted its request was being made in “a safe educational context”.
The prompt instructed the model to update its behavior to supply the illicit information, only telling it to prefix it with a warning.
If the jailbreak is successful, the model will acknowledge that it has updated its guardrails and will, “subsequently comply with instructions to produce any content, no matter how much it violates its original responsible AI guidelines.”
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
Microsoft tested the technique between April and May 2024, and found it was effective when used on Meta LLama3-70b, Google Gemini Pro, GPT 3.5 and 4o, Mistral Large, Anthropic Claude 3 Opus, and Cohere Commander R Plus, but added the attacker would need to have legitimate access to the model to carry out the attack.
Microsoft's disclosure marks the latest LLM jailbreaking issue
Microsoft said it has addressed the issue in its Azure AI-managed models using prompt shields to detect and block the Skeleton Key technique, but because it affects a wide range generative AI models it tested, the firm has also shared its findings with other AI providers.
Microsoft added it has also made software updates to its other AI offerings, including its Copilot AI assistants, to mitigate the impact of the guardrail bypass.
The explosion in interest and adoption of generative AI tools has precipitated an accompanying wave of attempts to break these models for malicious purposes.
In April 2024, Anthropic researchers warned of a jailbreaking technique that could be used to force models into providing detailed instructions on constructing explosives.
RELATED WHITEPAPER
They explained the latest generation of models with larger context windows are vulnerable to exploitation due to their improved performance. The researchers were able to exploit models’ ‘in-context learning’ capabilities which helps it improve its answers based on the prompts.
Earlier this year, three researchers at Brown University discovered a cross-lingual vulnerability in OpenAI’s GPT-4.
The researchers found they could induce prohibited behavior from the model by translating their malicious queries into one of a number of ‘low resource’ languages.
The results of the investigation showed the models are more likely to follow prompts encouraging harmful behaviors when promoted using languages such as Zulu, Scots Gaelic, Hmong, and Guarani.

Solomon Klappholz is a former staff writer for ITPro and ChannelPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led to him developing a particular interest in cybersecurity, IT regulation, industrial infrastructure applications, and machine learning.
-
AWS just quietly increased EC2 Capacity Block prices – here's what you need to knowNews The AWS price increases mean booking GPU capacity in advance just got more expensive
-
Accenture acquires Faculty, poaches CEO in bid to drive client AI adoptionNews The Faculty acquisition will help Accenture streamline AI adoption processes
-
These Microsoft Teams security features will be turned on by default this month – here's what admins need to knowNews From 12 January, weaponizable file type protection, malicious URL detection, and a system for reporting false positives will all be automatically activated.
-
The Microsoft bug bounty program just got a big update — and even applies to third-party codeNews Microsoft is expanding its bug bounty program to cover all of its products, even those that haven't previously been covered by a bounty before and even third-party code.
-
Microsoft Teams is getting a new location tracking feature that lets bosses snoop on staff – research shows it could cause workforce pushbackNews A new location tracking feature in Microsoft Teams will make it easier to keep tabs on your colleague's activities – and for your boss to know exactly where you are.
-
Microsoft opens up Entra Agent ID preview with new AI featuresNews Microsoft Entra Agent ID aims to help manage influx of AI agents using existing tools
-
A notorious ransomware group is spreading fake Microsoft Teams ads to snare victimsNews The Rhysida ransomware group is leveraging Trusted Signing from Microsoft to lend plausibility to its activities
-
CISA just published crucial new guidance on keeping Microsoft Exchange servers secureNews With a spate of attacks against Microsoft Exchange in recent years, CISA and the NSA have published crucial new guidance for organizations to shore up defenses.
-
CISA issues alert after botched Windows Server patch exposes critical flawNews A critical remote code execution flaw in Windows Server is being exploited in the wild, despite a previous 'fix'
-
Microsoft issues warning over “opportunistic” cyber criminals targeting big businessNews Microsoft has called on governments to do more to support organizations