OpenAI says prompt injection attacks are a serious threat for AI browsers – and it’s a problem that’s ‘unlikely to ever be fully solved'

OpenAI details efforts to protect ChatGPT Atlas against prompt injection attacks

Logo of OpenAI, developer of the GPT-4.1 AI model family, pictured on a smartphone screen placed on a table.
(Image credit: Getty Images)

OpenAI has updated its browser to boost protection against prompt injection attacks, but it warned the risk may never fully disappear.

Released in October, OpenAI's ChatGPT Atlas browser includes agent mode, which looks at webpages to click its way through transactions, forms and other online tasks.

But OpenAI noted that as a browser agent can do more, it also becomes more at risk to "adversarial attacks" – in particular prompt injections, which are sneaking malicious instructions into the agent to drive its behavior.

"Prompt injection⁠ is one of the most significant risks we actively defend against to help ensure ChatGPT Atlas can operate securely on your behalf," OpenAI said in a blog post.

Indeed, days after the release of OpenAI's browser, security researchers spotted several serious flaws, including a prompt-injection technique – no wonder then that analysts at Gartner have warned companies to ban AI browsers for fear of security risks.

OpenAI said it recently updated ChatGPT Atlas's agent security safeguards and gave it a new model that had been "adversarially trained", as well developing a "rapid response loop" to find flaws and address them.

That was sparked by red teaming, in which an internal team acts like threat actors to test the system for flaws or weaknesses. In this instance, what they found suggests prompt injection is a "long-term AI security challenge".

"Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved'," the company said.

"But we’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time," OpenAI added.

"By combining automated attack discovery with adversarial training and system-level safeguards, we can identify new attack patterns earlier, close gaps faster, and continuously raise the cost of exploitation."

New challenge for AI browsers

Prompt injection is when attackers get between an agent's prompt box and the AI model, changing instructions to create malicious results. It's a new problem for browsers that boast AI features – and there’s a growing array already.

The main issue here is that since agents can take many of the same actions as a user, OpenAI said the potential impact of a successful attack could be “just as broad”.

As an example, OpenAI said an attacker could send a malicious email to trick an agent to ignore the user's actual request in favour of forwarding sensitive documents.

The user gives access to email for a legitimate task, such as summarizing messages, but if the agent also follows the injected instructions, sensitive data could leak.

Fighting back

OpenAI has made previous efforts to protect against such attacks, but is now adding new techniques to help avoid prompt injections.

First, the company has built an AI-powered hacker to use as an automated red teaming tool to proactively hunt out prompt injection attacks – even complicated ones taking hundreds of steps.

"We trained this attacker end-to-end with reinforcement learning, so it learns from its own successes and failures to improve its red teaming skills," the company said.

Beyond that, OpenAI has developed what it calls a "rapid response loop". When that automated red team spots a potential injection technique, that's fed back into the AI via adversarial training.

"We continuously train updated agent models against our best automated attacker—prioritizing the attacks where the target agents currently fail," the company added.

"The goal is to teach agents to ignore adversarial instructions and stay aligned with the user’s intent, improving resistance to newly discovered prompt-injection strategies."

Similarly, when using the agent in the ChatGPT Atlas browser, OpenAI advised using "logged out" mode where possible, signing in only when necessary to complete a task, and told users to review all confirmation requests carefully.

When it comes to prompts, be specific rather than broad: saying "review my emails and take whatever action is needed" gives space for threat actors to meddle.

FOLLOW US ON SOCIAL MEDIA

Make sure to follow ITPro on Google News to keep tabs on all our latest news, analysis, and reviews.

You can also follow ITPro on LinkedIn, X, Facebook, and BlueSky.

Freelance journalist Nicole Kobie first started writing for ITPro in 2007, with bylines in New Scientist, Wired, PC Pro and many more.

Nicole the author of a book about the history of technology, The Long History of the Future.