OpenAI says prompt injection attacks are a serious threat for AI browsers – and it’s a problem that’s ‘unlikely to ever be fully solved'
OpenAI details efforts to protect ChatGPT Atlas against prompt injection attacks
OpenAI has updated its browser to boost protection against prompt injection attacks, but it warned the risk may never fully disappear.
Released in October, OpenAI's ChatGPT Atlas browser includes agent mode, which looks at webpages to click its way through transactions, forms and other online tasks.
But OpenAI noted that as a browser agent can do more, it also becomes more at risk to "adversarial attacks" – in particular prompt injections, which are sneaking malicious instructions into the agent to drive its behavior.
"Prompt injection is one of the most significant risks we actively defend against to help ensure ChatGPT Atlas can operate securely on your behalf," OpenAI said in a blog post.
Indeed, days after the release of OpenAI's browser, security researchers spotted several serious flaws, including a prompt-injection technique – no wonder then that analysts at Gartner have warned companies to ban AI browsers for fear of security risks.
OpenAI said it recently updated ChatGPT Atlas's agent security safeguards and gave it a new model that had been "adversarially trained", as well developing a "rapid response loop" to find flaws and address them.
That was sparked by red teaming, in which an internal team acts like threat actors to test the system for flaws or weaknesses. In this instance, what they found suggests prompt injection is a "long-term AI security challenge".
Sign up today and you will receive a free copy of our Future Focus 2025 report - the leading guidance on AI, cybersecurity and other IT challenges as per 700+ senior executives
"Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved'," the company said.
"But we’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time," OpenAI added.
"By combining automated attack discovery with adversarial training and system-level safeguards, we can identify new attack patterns earlier, close gaps faster, and continuously raise the cost of exploitation."
New challenge for AI browsers
Prompt injection is when attackers get between an agent's prompt box and the AI model, changing instructions to create malicious results. It's a new problem for browsers that boast AI features – and there’s a growing array already.
The main issue here is that since agents can take many of the same actions as a user, OpenAI said the potential impact of a successful attack could be “just as broad”.
As an example, OpenAI said an attacker could send a malicious email to trick an agent to ignore the user's actual request in favour of forwarding sensitive documents.
The user gives access to email for a legitimate task, such as summarizing messages, but if the agent also follows the injected instructions, sensitive data could leak.
Fighting back
OpenAI has made previous efforts to protect against such attacks, but is now adding new techniques to help avoid prompt injections.
First, the company has built an AI-powered hacker to use as an automated red teaming tool to proactively hunt out prompt injection attacks – even complicated ones taking hundreds of steps.
"We trained this attacker end-to-end with reinforcement learning, so it learns from its own successes and failures to improve its red teaming skills," the company said.
Beyond that, OpenAI has developed what it calls a "rapid response loop". When that automated red team spots a potential injection technique, that's fed back into the AI via adversarial training.
"We continuously train updated agent models against our best automated attacker—prioritizing the attacks where the target agents currently fail," the company added.
"The goal is to teach agents to ignore adversarial instructions and stay aligned with the user’s intent, improving resistance to newly discovered prompt-injection strategies."
Similarly, when using the agent in the ChatGPT Atlas browser, OpenAI advised using "logged out" mode where possible, signing in only when necessary to complete a task, and told users to review all confirmation requests carefully.
When it comes to prompts, be specific rather than broad: saying "review my emails and take whatever action is needed" gives space for threat actors to meddle.
FOLLOW US ON SOCIAL MEDIA
Make sure to follow ITPro on Google News to keep tabs on all our latest news, analysis, and reviews.
You can also follow ITPro on LinkedIn, X, Facebook, and BlueSky.
Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.
Nicole the author of a book about the history of technology, The Long History of the Future.
-
European Commission approves data flows with UK for another six yearsNews The European Commission says the UK can have seamless data flows for another six years despite recent rule changes
-
Keeper Security expands federal bench with latest senior hiresNews The security vendor has bolstered its federal team to support zero-trust access, operational execution, and government modernization efforts
-
OpenAI says GPT-5.2-Codex is its ‘most advanced agentic coding model yet’ – here’s what developers and cyber teams can expectNews GPT-5.2 Codex is available immediately for paid ChatGPT users and API access will be rolled out in “coming weeks”
-
Google DeepMind CEO Demis Hassabis thinks startups are in the midst of an 'AI bubble'News AI startups raising huge rounds fresh out the traps are a cause for concern, according to Hassabis
-
OpenAI turns to red teamers to prevent malicious ChatGPT use as company warns future models could pose 'high' security riskNews The ChatGPT maker wants to keep defenders ahead of attackers when it comes to AI security tools
-
AWS has dived headfirst into the agentic AI hype cycle, but old tricks will help it chart new watersOpinion While AWS has jumped on the agentic AI hype train, its reputation as a no-nonsense, reliable cloud provider will pay dividends
-
AWS CEO Matt Garman says AI agents will have 'as much impact on your business as the internet or cloud'News Garman told attendees at AWS re:Invent that AI agents represent a paradigm shift in the trajectory of AI and will finally unlock returns on investment for enterprises.
-
Westcon-Comstor partners with Fortanix to drive AI expertise in EMEANews The new agreement will help EMEA channel partners ramp up AI and multi-cloud capabilities
-
Microsoft quietly launches Fara-7B, a new 'agentic' small language model that lives on your PC — and it’s more powerful than GPT-4oNews The new Fara-7B model is designed to takeover your mouse and keyboard
-
Anthropic announces Claude Opus 4.5, the new AI coding frontrunnerNews The new frontier model is a leap forward for the firm across agentic tool use and resilience against attacks
