Google researchers from the Threat Intelligence Group and Google DeepMind have published findings from a broad sweep of the public web aimed at answering a practical question: are threat actors actually exploiting indirect prompt injection (IPI) today, and what are they trying to accomplish?
What Is Indirect Prompt Injection?
Unlike direct jailbreaking, where a user manipulates a chatbot through conversation, IPI occurs when an AI system processes external content such as a webpage, email, or document that embeds malicious instructions. The AI may then silently follow those attacker-supplied commands rather than the legitimate user’s request. The attack surface is broad because AI agents increasingly browse and summarize web content autonomously.
How Google Conducted the Study
The research team used Common Crawl, a publicly available repository of crawled web content covering roughly two to three billion pages per monthly snapshot. Because Common Crawl focuses on static, publicly accessible sites, it excludes most social media platforms behind login walls. The researchers note that prompt injections observed on social media will be addressed in a separate, forthcoming study.
Filtering at this scale required a layered approach to control false positives:
- Pattern matching: Initial candidate pages were flagged using common IPI signatures such as phrases like “ignore previous instructions” or “if you are an AI.”
- LLM-based classification: Flagged pages were then processed by Gemini to assess whether suspicious text was part of the document’s normal narrative or appeared anomalously out of place.
- Human validation: A final manual review pass was applied to the highest-confidence candidates.
The team found that naive scanning surfaces an overwhelming proportion of benign content, including research papers, educational blog posts, and security articles discussing IPI itself.
What Attackers Are Actually Doing
After filtering, the confirmed prompt injections clustered into several categories, ranging in severity:
- Harmless pranks: Instructions embedded in page source code intended to alter an AI assistant’s conversational tone, with no meaningful impact on users.
- Helpful guidance: Some site authors embed instructions to help AI summarization tools add relevant context for readers. The researchers note this benign use case could trivially be repurposed to inject misinformation or redirect users to third-party sites.
- Search engine optimization (SEO): A notable category involves injections designed to manipulate AI assistants into promoting specific businesses over competitors. The team observed both simple and increasingly sophisticated variants of this technique.
- Deterring AI agents: Some injections instruct AI systems not to crawl or summarize the page at all.
- Malicious activity: The most concerning findings include injections aimed at data exfiltration and, in some cases, destructive actions against the AI system or the user session it serves.
Implications for Defenders
The findings confirm that IPI is not purely theoretical. While many instances remain low-severity, the presence of data exfiltration and destructive payloads on public websites demonstrates that adversaries are already experimenting with the vector at scale. Organizations deploying AI agents that browse or process untrusted web content should treat IPI as an active threat requiring input validation, output monitoring, and privilege constraints on what actions those agents are permitted to take.
Google states that its cross-functional work between DeepMind and the Threat Intelligence Group is ongoing, and that monitoring for IPI patterns on the public web will continue.
