Anthropic wants to demystify the inner workings of its Claude AI models – and it might force OpenAI’s hand on transparency
Anthropic's decision to publish the system prompts that control the outputs of its Claude AI models marks a rare move for the industry
AI startup Anthropic has elected to publish the system prompts for its flagship Claude large language model as part of a new effort to improve transparency in the private model ecosystem.
System prompts comprise a set of rules or instructions that dictate how a model should respond to queries, outlining exactly what it can and can’t discuss, as well as the tone its output should take.
The instructions are intended to prevent the model from behaving maliciously and to steer its responses toward a uniform tone and style, namely that of a helpful and inquisitive assistant.
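To illustrate how a system prompt shapes a model's behavior in practice, the minimal sketch below uses the Anthropic Python SDK; the model identifier and prompt wording here are illustrative assumptions, not the published Claude prompts the company disclosed.

```python
# Illustrative sketch: passing a developer-supplied system prompt to a Claude
# model via the Anthropic Python SDK. The model name and prompt text are
# examples only, not Anthropic's published system prompts.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model identifier
    max_tokens=256,
    # The system prompt sets tone and boundaries before any user input is seen
    system="You are a helpful, honest assistant. Decline harmful requests.",
    messages=[{"role": "user", "content": "Summarise today's tech news."}],
)

print(response.content[0].text)
```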
The decision to make this information publicly available will help developers, as well as the general public, gain a better understanding of how these often opaque models actually work in practice, Anthropic said.
Experts have broadly welcomed the move, describing it as a positive step in terms of AI ethics, and one that is aimed at giving the company an edge in the battle against competitors such as OpenAI.
The move was announced on 26 August by Alex Albert, head of developer relations at Anthropic, who revealed the newly disclosed system prompts will be included in a new release notes section of Anthropic’s documentation.
Speaking to ITPro, Alastair Paterson, CEO and co-founder of data protection firm Harmonic Security, said the move was likely an attempt to present Anthropic as leading the market in terms of transparency and responsible AI governance.
“Anthropic seem to be trying to position themselves as ‘more-open’ than competitors such as OpenAI and Google which may help to give themselves a differentiator in the market. OpenAI, in particular, has been criticized for not living up to being ‘open’ by none other than Elon Musk – so if anything, it would seem a direct challenge to OpenAI.”
A prominent member of OpenAI’s GPT Builder creator program, Nick Dobos, who has built a number of custom GPTs on the platform, voiced his support for the move on X, contrasting Anthropic’s openness with that of OpenAI.
Criticism of OpenAI’s transparency has not been limited to external parties, with a group of current and former employees penning an anonymous open letter warning that the company had strong incentives to “avoid effective oversight” of its models.
“AI companies possess substantial non-public information about the capabilities and limitations of their systems, the adequacy of their protective measures, and the risk levels of different kinds of harm. However, they currently have only weak obligations to share some of this information with governments, and none with civil society. We do not think they can all be relied upon to share it voluntarily,” the letter stated.
Prompt engineering threats not significantly increased by Anthropic’s decision to go public
Cyber criminals could be unintended beneficiaries of Anthropic’s decision to make Claude’s system prompts public. Some industry stakeholders have warned that threat actors could leverage this information to gain a deeper understanding of system weaknesses, which could then be exploited in future attacks.
This threat should not be exaggerated, however, according to Peter van der Putten, director of the AI Lab at Pegasystems and assistant professor of AI at Leiden University. Van der Putten told ITPro that making these prompts public was more important than any associated risks.
“I see the move to publish system messages as a positive one, and a significant one from an AI ethics principles perspective. On the flip side, one should not overestimate both the importance of the system prompt, nor exaggerate the risks,” he argued.
Paterson came to a similar conclusion, adding that Anthropic likely weighed the potential threats associated with the move against the benefits.
“It is likely that a judgment will have been made that any additional risk posed by providing these system prompts is outweighed by the benefits of publicity and the value of being able to position themselves as more virtuous than their competitors."
Vincenzo Ciancaglini, senior threat researcher at Trend Micro, told ITPro attackers already had various ways to corrupt LLMs without needing access to the system prompts, and in many cases actively try to get the model to disregard those prompts altogether.
“Understanding the system prompt for a specific LLM could give insights into the inner workings of the LLM itself, which might help in some classes of jailbreaking. However, there are plenty of other jailbreaking techniques which are independent of the system prompt. Many times, the prompt injection starts with trying to get the LLM to forget the system prompt.”
Shaked Reiner, principal security researcher at CyberArk Labs, agreed with this assessment, adding that the public benefits of publishing the system prompts were more important than any perceived increase in the threat of malicious prompt engineering.
“Attackers will inevitably get their hands on system prompts, but by making them publicly available, the company empowers normal users who otherwise wouldn't have access to this information,” Reiner told ITPro.
“As humanity is still in the early stages of our AI journey, we have yet to establish adequate safety and security standards. We believe that sharing more information about private models publicly will contribute to the development of these standards.”

Solomon Klappholz is a former staff writer for ITPro and ChannelPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led to him developing a particular interest in cybersecurity, IT regulation, industrial infrastructure applications, and machine learning.