
We’ve recently started playing around with browser-based AI agents internally at Secure Agentics, specifically Claude Cowork’s browser extension, and it’s immediately raised a question that I think the industry is only just starting to properly grapple with as these tools are becoming increasingly popular. Now, this is not a new problem as indirect prompt injection has existed for years by this point, but I found Google’s answer to solving this in their Gemini browser to be interesting, and it got me thinking about how the same techniques could be repurposed for other use cases, such as our own here at Secure Agentics.
When you give an AI agent access to a web browser, you’re essentially giving it eyes to the internet and all it’s nasties (there are a lot of nasties on the internet). It can see anything on a page, and because LLMs are notoriously inclined to treat information they see as instructions, that creates a problem.
This is indirect prompt injection applied to the browser, and prompt injection is showing no sign of being fixed any time soon so when Google started introducing agentic browsing they took matters into their own hands. They published a blog post back in December outlining how they’re architecting security for Gemini’s agentic browsing capabilities in Chrome, and whilst the post is a few months old now, I think the concepts in it are worth understanding deeper. Specifically, their most interesting contribution: the User Alignment Critic.
TL;DR When an AI agent browses the web, every page it visits is a potential attack surface. Malicious content embedded in a site can trick the agent into taking actions the user never asked for, like exfiltrating data or initiating transactions. This isn’t theoretical either. Brave disclosed indirect prompt injection vulnerabilities in Perplexity’s Comet browser agent. LayerX demonstrated a one-click hijack they called “CometJacking” that could exfiltrate emails and calendar data. The Tenable team proved that browser agents could be tricked into leaking sensitive data via background API calls. And OpenAI themselves have acknowledged that prompt injection in agentic browsers is a risk that may never be fully “solved”.
So the question becomes: if you can’t eliminate the threat, how do you build defences that make it hard enough that attackers can’t easily succeed?
The Chrome security team, led by Nathan Parker, published their approach to securing Gemini’s agentic capabilities in Chrome. The full post covers several layers of defence, but the concept that stood out to me is what they’re calling the User Alignment Critic.
So, how does it work? The idea is straightforward but clever. The main Gemini model, the “planner”, reads the web page and decides what action to take next. Because it’s consuming raw page content, it’s inherently exposed to anything malicious on that page. The User Alignment Critic is a separate Gemini model that sits between the planner and the browser. Before any action is actually executed, the Critic reviews it and asks one question: does this action align with what the user originally asked for?
The design decision that makes this interesting is what the Critic can and can’t see. It only receives metadata about the proposed action, something like “navigate to amazon.co.uk” or “click the submit button on this form”. It does not see the raw page content. This means that even if the planner has been influenced by a prompt injection buried in the page, the Critic can’t be poisoned in the same way because it never sees the untrusted content. It’s architecturally isolated from the attack vector.
If the Critic vetoes an action, it sends feedback to the planner to try again. If there are repeated failures, control goes back to the user. It’s a simple approve-or-reject control, but positioned in exactly the right place to add security value whilst reducing the likelihood of it being compromised itself.
The User Alignment Critic is the main piece I wanted to deep dive, but Google’s architecture has several other components worth understanding.
Origin Sets are essentially the browser agent equivalent of network segmentation. For each task, a control function determines which website ‘origins’ the agent is allowed to read from and which it can interact with. For example, if the agent is helping you buy something on a shopping site, it shouldn’t also be able to read your banking site in another tab.
A prompt injection classifier runs in parallel alongside the main model, scanning page content for patterns that look like attempts to manipulate the agent. It can’t catch everything, but it’s another layer.
Automated red teaming generates malicious sandboxed sites designed to test the defences continuously. Google’s team prioritises attacks via user-generated content and ads, and focuses on high-harm scenarios like financial transactions and credential leaks.
Finally, they’ve updated their Vulnerability Rewards Programme (VRP) to cover agentic capabilities, offering up to $20,000 for researchers who find serious vulnerabilities!
This topic has been on my mind a lot lately, partly because of what we’re doing internally at Secure Agentics, and partly because I think the security community is going to need to get very comfortable with the idea of AI agents operating in browsers, and indeed any number of other high-risk areas.
The dual-model approach, having a trusted critic that’s architecturally isolated from untrusted content, is a sensible pattern that I’m taking notes from in how we protect our own models here at Secure Agentics (albeit we’re not doing browser agents, but the point stands). It’s not perfect, and I don’t think Google would claim it is, but it applies a principle that any pentester intimately understands: don’t trust the thing that’s been exposed to the attacker.
I suspect we’ll be talking about agent browser security a lot more over the coming months. This is genuinely one of those areas where the security challenges are evolving as fast as the technology itself, and that’s both exciting and slightly terrifying.
Thanks for reading!
