Google SAIF: The Agent Security Map
Google’s Secure AI Framework draws the full agent attack surface, names the risks, and hands you the controls. A vendor did the boring, useful work for once.
TL;DR: The Google SAIF agent security map is a diagram of the entire agent attack surface, broken into four components, with named risks and mapped controls at every node. It’s SAIF 2.0, shipped in 2026, and Google donated the underlying risk data to the Coalition for Secure AI. No product pitch. Just the map most teams never bothered to draw.
This is the public feed. Upgrade to see what doesn’t make it out.
What Is the Google SAIF Agent Security Map?
The Google SAIF agent security map is a node-by-node diagram of an agent’s full operational stack, with the risk and the matching control labeled at every node. SAIF is Google’s Secure AI Framework. The original version mapped the whole model lifecycle across four areas: Data, Infrastructure, Model, Application. Useful, but model-shaped. Agents don’t live in that box.
So in 2026 they shipped SAIF 2.0 with a second, agent-specific map. This is the one worth your time. Where most vendor security content gestures at “AI risk” and sells you a dashboard, this thing walks the actual pipeline an agent runs every time it does anything, and tells you where it bleeds. Google even kicked the underlying risk data over to the Coalition for Secure AI, so it’s not locked behind a Google Cloud login. Rare move.
Here’s the thing that makes it different from the average framework PDF. It’s not organized by abstract risk category. It’s organized by where the data physically flows through the agent. That’s the right axis, because that’s where attackers actually work.
The Four Components of an Agent
SAIF breaks an agent into four components, and the whole attack surface lives in how they hand off to each other. Walk them in order, because the order is the data flow, and the data flow is the kill chain.
Application & Perception. Where the agent meets the world. It pulls explicit user commands and passively grabs context: open documents, sensor data, app state. The perception layer then has to tell a trusted command apart from untrusted ambient junk. It usually can’t. That’s the first seam.
Reasoning core. One or more models that take the goal and spit out a plan, a sequence of tool calls. It runs in a loop, refining the plan as new data comes back. Every loop is another chance to feed it a poisoned input. This is where indirect prompt injection sinks its teeth in.
Orchestration. The agent’s hands and long-term memory. Tools, agent memory, RAG content, auxiliary models. Each one is an external system the agent trusts, which means each one is a thing an attacker can corrupt to steer behavior.
Response rendering. The agent’s output gets formatted and dropped into a trusted app, usually as Markdown. If nobody sanitizes it, that output runs. XSS, data exfil, the works.
Look at the shape of that. Untrusted input comes in the front, hits a reasoning core that can’t tell instructions from data, gets executed through privileged tools, and renders into a trusted surface on the way out. The framework didn’t invent the danger. It just refused to look away from it.
Rogue Actions and Sensitive Data Disclosure
SAIF names two risks specific to agents, and they map clean onto the two things an agent can do that a chatbot can’t: act, and reach. The agent map calls them Rogue Actions and Sensitive Data Disclosure.
Rogue Actions are exactly what they sound like: the agent executes something it shouldn’t, by accident or because someone made it. The accidental flavor is misalignment, like the agent emailing the wrong “Mike” and leaking private data through a plain ambiguity bug. The malicious flavor is the scary one. An attacker plants a dormant trigger and waits. Google’s own writeup points at the Gemini calendar-invite hijack, where a rule buried in an invite opened a smart-home front door when the user later said an unrelated keyword. The payload sat quiet until an innocent phrase set it off. Severity scales straight with the agent’s permissions. More tools, bigger blast radius.
Sensitive Data Disclosure is the reach problem. A chatbot can leak its prompt. An agent can leak your entire inbox, because it’s holding the keys to it. SAIF spells out the ugly part: agents can exfil through any tool that talks outward, including a Markdown image. Here’s that exact failure in the wild:
CVE-2025-32711 "EchoLeak" CVSS 9.3 (critical)
target: Microsoft 365 Copilot
vector: zero-click indirect prompt injection via email
exfil: data appended to reference-style markdown image URL
-> Copilot auto-fetches -> request hits attacker server
EchoLeak, found by Aim Labs, chained an injection that beat Microsoft’s own classifiers with an image render that smuggled data out a CSP-allowed domain. One email, no clicks, sensitive context gone. SAIF’s map flags that Response Rendering node as a critical security boundary for exactly this reason, and the EchoLeak chain is what happens when the boundary leaks. The map saw it coming because it’s looking at the right node.
The Controls SAIF Actually Hands You
SAIF maps three agent controls directly onto those two risks, and they’re refreshingly un-magical: limit what the agent can do, make a human approve the dangerous stuff, and log everything. No model-level promise that prompt injection is “solved,” because it isn’t.
Agent Permissions is least privilege as a hard ceiling. The agent gets the minimum tools and the minimum actions, and that grant is meant to be contextual and dynamic, shrinking to whatever the current task actually needs. Agent User Control is the human-in-the-loop gate: any action that changes data or acts on the user’s behalf needs explicit approval. Agent Observability is the part most teams skip. Log the agent’s actions, tool calls, and reasoning so the whole thing is auditable. You catch a hijacked agent by watching its decisions, not just its final output.
Underneath all three sits Google’s design philosophy, three principles for agents worth tattooing somewhere:
1. well-defined human controllers (who owns this agent?)
2. limited powers (least privilege, hard cap)
3. observable actions and planning (log the reasoning, not just the result)
None of that is exotic. It’s the same discipline you’d apply to any over-permissioned service account, dragged into the agent world and labeled clearly. The map’s value isn’t novelty. It’s that someone finally drew the lines between every risk and a control you can actually implement, instead of leaving you to connect them at 2am during an incident. For the attacker-side view of why these exact controls matter, we walked the full agentic attack playbook and the two-of-three rule that snaps the same chain.
That’s the whole pitch. SAIF won’t stop a determined operator, and Google’s careful to say the site reflects guidance, not their shipped implementation. But it draws the board honestly: here’s every place an agent can turn on you, here’s the name for it, here’s the lever that helps. Most vendors sell you the dashboard and skip the map. Google published the map and gave the data away. In a field drowning in hype decks, boring and useful is the rarest thing on the table.
Paid unlocks the unfiltered version: complete archive, private Q&As, and early drops. Upgrade now.
Frequently Asked Questions
What is the Google SAIF agent security map?
The Google SAIF agent security map is a diagram in Google’s Secure AI Framework 2.0 that breaks an AI agent into four components (Application & Perception, Reasoning core, Orchestration, Response rendering) and labels the security risk and matching control at each node. It exists because agents introduce risks the original model-focused SAIF map didn’t cover, mainly the ability to take autonomous actions through tools. Google published it in 2026 and donated the underlying risk data to the Coalition for Secure AI, so the structure is open for any team to use, not locked to Google Cloud.
What risks does SAIF say AI agents introduce?
SAIF names two agent-specific risks. Rogue Actions are unintended actions an agent executes, either by accident (misalignment, like emailing the wrong person) or maliciously (an attacker plants a dormant trigger via prompt injection that fires later). Sensitive Data Disclosure is the leak of private data, magnified for agents because they hold privileged access to inboxes, files, and credentials, and can exfiltrate through any outbound tool, including a Markdown image. The EchoLeak vulnerability (CVE-2025-32711) in Microsoft 365 Copilot is a real-world example of the disclosure risk: a zero-click email injection that leaked data out a rendered image URL.
How does SAIF tell you to secure an agent?
SAIF maps three controls to the agent risks. Agent Permissions enforces least privilege as a hard ceiling, with access that shrinks to the current task. Agent User Control requires human approval for any action that changes data or acts on the user’s behalf. Agent Observability logs the agent’s actions, tool calls, and reasoning so behavior is auditable and a hijack is catchable. All three sit on three design principles: agents need well-defined human controllers, limited powers, and observable actions. The framework is honest that none of this “solves” prompt injection at the model level. It contains the blast radius instead.
ToxSec is run by a USMC veteran and Security Engineer with hands-on experience at AWS and the NSA. CISSP certified, M.S. in Cybersecurity Engineering. He covers security vulnerabilities, attack chains, and the tools defenders actually need to understand.




Kudos where it's due. SAIF is a simple and direct security map for anyone working with Agents. It gives your three key practical principles to map your defenses. Feel free to AMA.