BioShocking Attack Tricks AI Browsers Into Bypassing Safety Guardrails

Security researchers at LayerX have published details of a prompt injection technique called “BioShocking” that manipulates AI-powered browsers into treating dangerous real-world actions as part of a harmless fictional scenario, causing them to bypass their own safety guardrails.

How the Attack Works

The proof-of-concept centers on a malicious webpage designed to look like a BioShock-themed puzzle game. The game is structured so that players are rewarded for giving wrong answers, progressively conditioning the browser’s AI agent to accept that normal rules do not apply within the game context. By the time the agent reaches the final puzzle step, it has been taught through repeated reinforcement that “incorrect” or rule-breaking actions are acceptable moves.

In that final step, the agent is instructed to visit a GitHub repository, extract data from the code, and share it externally, including any sensitive information such as passwords present in the repository. According to LayerX, once the agents internalized the inverted game logic, none of the six tested products recognized this credential-exfiltration step as a policy violation.

“Once the agents figured out the rules and learned that incorrect actions are acceptable, they were no longer tied to reality,” the researchers wrote. All six agents failed to flag the final step as conflicting with their safety guidelines.

The six agentic browser products tested were: ChatGPT Atlas, Comet, Fellou, Genspark Browser, Sigma Browser, and the Claude Chrome plugin. LayerX said the PoC performed no actual malicious actions during testing, but noted that the agents' behavior would have been no different had it done so.

Vendor Response

LayerX disclosed its findings to affected vendors in October of last year. Three vendors did not respond. Among those that did:

OpenAI implemented an effective fix in its ChatGPT Atlas browser and is the only vendor credited with a working remediation.
Anthropic attempted a patch for its Claude Chrome plugin, but LayerX says the fix does not hold against the original PoC.
Perplexity AI, the maker of Comet, closed the report without addressing the issue.

Recommendations

LayerX advises AI browser vendors to implement explicit user confirmation dialogs before executing sensitive actions, apply stronger context validation to detect when an agent has shifted into a fictional or adversarial frame, and enforce tighter scope limits on what agentic sessions are permitted to do.
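The first and third recommendations (confirmation before sensitive actions and tighter session scope) can be illustrated with a minimal Python sketch. It assumes a hypothetical agent runtime that routes every proposed action through a policy object before executing it; `AgentAction`, `SessionPolicy`, the action kind names, and the credential patterns are illustrative assumptions, not part of any tested product or of LayerX's published guidance. The check inspects only the action itself (its kind, target domain, and outgoing payload), so an inverted "game" framing in the conversation cannot change the decision.

```python
import re
from dataclasses import dataclass
from typing import Callable, Set

# Hypothetical action kinds that move data or change state outside the page.
SENSITIVE_KINDS = {"submit_form", "send_data", "download", "login"}

# Rough, illustrative patterns for credential-like strings; far from exhaustive.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|api[_-]?key|secret|token)\s*[:=]\s*\S+"),
    re.compile(r"\bghp_[A-Za-z0-9]{20,}"),  # GitHub-style token prefix (illustrative)
]


@dataclass(frozen=True)
class AgentAction:
    kind: str          # e.g. "navigate", "read", "send_data"
    domain: str        # target host of the action
    payload: str = ""  # data the agent intends to transmit, if any


@dataclass
class SessionPolicy:
    allowed_domains: Set[str]
    # Asks the human user; returns True only if they approve the action.
    confirm: Callable[[AgentAction, str], bool]

    def check(self, action: AgentAction) -> bool:
        """Return True if the action may run. The decision depends only on the
        action, never on the conversational or 'game' framing around it."""
        # Scope limit: hosts outside the session allowlist are refused outright
        # (exact string match, so subdomains are not covered in this sketch).
        if action.domain not in self.allowed_domains:
            return False
        reasons = []
        if action.kind in SENSITIVE_KINDS:
            reasons.append(f"sensitive action '{action.kind}'")
        if any(p.search(action.payload) for p in SECRET_PATTERNS):
            reasons.append("payload appears to contain credentials")
        # Explicit user confirmation for anything flagged above.
        if reasons:
            return self.confirm(action, "; ".join(reasons))
        return True


if __name__ == "__main__":
    def deny_all(action: AgentAction, reason: str) -> bool:
        print(f"Blocked pending user review: {reason}")
        return False

    policy = SessionPolicy(allowed_domains={"github.com"}, confirm=deny_all)
    print(policy.check(AgentAction("read", "github.com")))  # True
    print(policy.check(AgentAction("send_data", "attacker.example", "password=hunter2")))  # False
    print(policy.check(AgentAction("send_data", "github.com", "password=hunter2")))  # False
```

Exact domain matching and keyword regexes are kept deliberately simple here; a real deployment would need subdomain handling and much broader secret detection, and the sketch does not attempt the second recommendation (detecting a shift into a fictional frame) at all.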

For end users, the researchers recommend restricting AI browser permissions to limit access to sensitive services wherever platform controls allow. The core problem, as LayerX frames it, is that current AI agents fundamentally fail to distinguish between simulated scenarios and operations with real-world consequences, a gap that remains largely unaddressed across the agentic browser market.

THE 0600 BRIEF