As Google integrates Gemini-based agentic capabilities into Chrome, the Chrome security team has published a detailed breakdown of its defensive architecture, centered on the threat of indirect prompt injection and the risks that come with a browser agent that can act autonomously across multiple websites.
The Core Threat: Indirect Prompt Injection
Indirect prompt injection occurs when malicious content embedded in web pages, iframes, or user-generated material such as reviews manipulates an agent into taking unintended actions. In a browser context, those actions can include initiating financial transactions or exfiltrating sensitive data from logged-in sessions. Google describes this as the primary new threat facing agentic browsers and acknowledges it remains an open challenge across the industry.
User Alignment Critic
To address model-level vulnerabilities, Chrome is introducing a component called the User Alignment Critic. This is a separate Gemini-based model that runs after the planning model has decided on an action, checking whether that action is genuinely aligned with what the user asked for. Critically, the Alignment Critic is architecturally isolated: it receives only metadata about the proposed action, not the raw web content that the planning model processed. This isolation prevents the critic itself from being poisoned by malicious page content.
If the critic rejects an action, the rejection is fed back to the planning model so that it can reformulate its plan; repeated failures return control to the user. The design draws on the dual-LLM pattern and on Google DeepMind research known as CaMeL. The critic supplements existing protections, which include spotlighting techniques that bias the planning model toward user and system instructions over page content, as well as training on known attack patterns.
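The propose, vet, and retry loop described above can be sketched roughly as follows. This is an illustrative sketch only, not Chrome's implementation: `ActionMetadata`, `Verdict`, `vet_next_action`, the `max_attempts` limit, and the toy planner and critic are all hypothetical names and stand-ins for the real Gemini-based models. The point it illustrates is that the critic receives only the user's goal and metadata about the action, never raw page content.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ActionMetadata:
    """Hypothetical summary of a proposed action. It carries no raw page content."""
    kind: str            # e.g. "click", "type", "purchase"
    target_origin: str   # origin the action would touch
    summary: str         # short description assumed to come from trusted code


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reason: str = ""


# Assumed interfaces: the planner sees the goal plus any critic feedback;
# the critic sees only the goal and the action metadata.
Planner = Callable[[str, Optional[str]], ActionMetadata]
Critic = Callable[[str, ActionMetadata], Verdict]


def vet_next_action(goal: str, planner: Planner, critic: Critic,
                    max_attempts: int = 3) -> Optional[ActionMetadata]:
    """Return an approved action, or None to hand control back to the user."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        proposal = planner(goal, feedback)
        verdict = critic(goal, proposal)
        if verdict.approved:
            return proposal
        # The rejection reason is fed back so the planner can reformulate.
        feedback = verdict.reason
    return None


# Toy stand-ins for the two models, for illustration only.
def toy_planner(goal: str, feedback: Optional[str]) -> ActionMetadata:
    if feedback is None:
        # First attempt: imagine the planner was steered by injected page text.
        return ActionMetadata("purchase", "https://attacker.example", "buy gift cards")
    return ActionMetadata("click", "https://recipes.example", "open the recipe page")


def toy_critic(goal: str, action: ActionMetadata) -> Verdict:
    if action.kind == "purchase" and "buy" not in goal.lower():
        return Verdict(False, "a purchase does not serve the stated goal")
    return Verdict(True)


if __name__ == "__main__":
    print(vet_next_action("Find a lasagna recipe", toy_planner, toy_critic))
```

In this toy run the first proposal is rejected, the reason is passed back, and the second proposal is approved. A planner that keeps proposing rejected actions exhausts `max_attempts`, and the function returns `None`, which stands in for returning control to the user.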
Agent Origin Sets
A second major innovation extends Chrome’s existing site isolation and same-origin policy concepts into the agentic context. Because agents inherently operate across multiple sites, a compromised agent with unrestricted access could effectively bypass site isolation, exposing all logged-in sessions to data exfiltration.
Chrome’s answer is Agent Origin Sets, which constrain the agent to a defined list of origins relevant to the current task. The system tracks two categories per session:
- Read-only origins: Sources from which the agent may consume content. Iframes from origins outside this list are not sent to the model at all.
- Read-writable origins: Origins on which the agent may also take actions such as clicking or typing, in addition to reading.
A trusted gating function, also isolated from untrusted web content, determines which origins qualify. The planning model cannot add new origins to either set without approval from this gating function. The same read-versus-write distinction is applied to non-web tool calls the agent may make.
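One way to picture the two origin sets and the gating function is the sketch below. It is a simplified model under stated assumptions, not Chrome's code: `AgentOriginSets`, `origin_of`, `request`, and the `gate` callback are hypothetical names, and origins are approximated as scheme plus host (and port, if given) without normalizing default ports.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Approximate a web origin as scheme://host[:port] (simplified)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


@dataclass
class AgentOriginSets:
    # Hypothetical trusted gating function: (origin, "read" | "write") -> approved?
    gate: Callable[[str, str], bool]
    read_only: set[str] = field(default_factory=set)
    read_write: set[str] = field(default_factory=set)

    def request(self, url: str, access: str) -> bool:
        """The planner asks to add an origin; only the gate can grant it."""
        if access not in ("read", "write"):
            raise ValueError(f"unknown access level: {access!r}")
        origin = origin_of(url)
        if not self.gate(origin, access):
            return False
        (self.read_write if access == "write" else self.read_only).add(origin)
        return True

    def can_read(self, url: str) -> bool:
        # Read-writable origins are readable as well.
        origin = origin_of(url)
        return origin in self.read_only or origin in self.read_write

    def can_act(self, url: str) -> bool:
        return origin_of(url) in self.read_write

    def visible_frames(self, frame_urls: list[str]) -> list[str]:
        """Drop iframes whose origin is outside both sets before the model sees them."""
        return [u for u in frame_urls if self.can_read(u)]


if __name__ == "__main__":
    approved = {("https://shop.example", "write"), ("https://reviews.example", "read")}
    sets = AgentOriginSets(gate=lambda origin, access: (origin, access) in approved)
    sets.request("https://shop.example/cart", "write")
    sets.request("https://reviews.example/item/1", "read")
    print(sets.can_act("https://reviews.example/item/1"))  # False
```

The design choice worth noting is that the checks in `can_read` and `can_act` are plain set lookups: once the gate has decided, enforcement does not depend on any model's judgment.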
Additional Layers
The full defensive stack also includes user confirmation prompts for critical actions, real-time threat detection, and ongoing red-teaming exercises. Google frames this as a deliberately layered approach, combining deterministic controls such as origin enforcement with probabilistic ones such as model-based vetting, to raise the cost and complexity of any successful attack against agentic Chrome users.
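To make the layering concrete, the sketch below chains a deterministic check, a probabilistic check, and a user confirmation step. The ordering, the `risk_threshold` value, and every name here (`ProposedAction`, `Outcome`, `decide`, `risk_model`) are assumptions made for illustration; the source does not specify how Chrome sequences or tunes these layers.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ASK_USER = "ask_user"


@dataclass(frozen=True)
class ProposedAction:
    origin: str
    kind: str
    critical: bool  # e.g. a payment; how this is classified is an assumption


def decide(action: ProposedAction,
           writable_origins: frozenset[str],
           risk_model: Callable[[ProposedAction], float],
           risk_threshold: float = 0.5) -> Outcome:
    # Deterministic layer: origin enforcement gives a hard yes or no.
    if action.origin not in writable_origins:
        return Outcome.BLOCK
    # Probabilistic layer: a model-based score (stand-in for model vetting).
    if risk_model(action) >= risk_threshold:
        return Outcome.BLOCK
    # Critical actions still require explicit user confirmation.
    if action.critical:
        return Outcome.ASK_USER
    return Outcome.ALLOW


if __name__ == "__main__":
    origins = frozenset({"https://shop.example"})
    pay = ProposedAction("https://shop.example", "submit_payment", critical=True)
    print(decide(pay, origins, risk_model=lambda a: 0.1))
```

An attacker has to get past every layer in turn, which is the sense in which the combination raises the cost of a successful attack.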
