Google I/O: Agentic Security and New Threats

Playback speed

Share post at current time

Share from 0:00

0:00

Transcript

Google I/O: Agentic Security and New Threats

Project Mariner browses for you, A2A lets agents trust agents, and managed MCP is everywhere. Nobody on stage said “threat model.”

ToxSec

May 25, 2026

TL;DR: Google I/O 2026 declared the “agentic era” and shipped four new agent surfaces at once: Project Mariner browses the web for you, the Agent2Agent (A2A) protocol lets agents discover and trust each other, managed MCP servers ship across Google Cloud, and information agents run 24/7 with access to your Gmail and Drive. Every one of them inherits the same root flaw. AI agent security starts with one fact: the model can’t tell data from instructions.

New here? Subscribe to ToxSec. We map a fresh AI attack chain every Sunday, and right now the whole industry just handed us a new one to walk.

What Google I/O Just Did to AI Agent Security

Google spent its I/O keynote handing attackers a bigger playground than they’ve had in years. Sundar Pichai called it the “agentic Gemini era” and meant it as a flex. From where we sit, it reads like a target list. Four new agent surfaces dropped in a single show. Project Mariner, a browser agent that navigates and clicks through websites on your behalf. The Agent2Agent protocol, so agents from different vendors can find each other and coordinate. Managed MCP servers across Google Cloud, wiring tools straight into the model’s reasoning. And information agents that run in the background around the clock, watching topics and taking action while you sleep.

Here’s the thing nobody put on a slide. Every one of those features expands what an agent can touch, and not one of them came with a threat model on stage. More reach, more autonomy, more standing access. That’s the pitch and the problem in the same sentence. We’re going to walk the surface one piece at a time, and you’ll see the same logic failure show up in all four.

Google I/O 2026 AI agent security overview: Project Mariner, Agent2Agent protocol, managed MCP servers, and 24/7 background agents expanding the agent attack surface.

Why AI Agents Break the Old Security Model

AI agents break because the model can’t tell your instructions from the attacker’s data. Both ride in the same context window, through the same attention mechanism, with zero privilege separation. There’s no “system” channel the model trusts more than the “untrusted web page” channel. It’s all tokens. The model reasons over the whole pile and picks what looks most relevant.

Wrap that model in a loop. Feed it new inputs and tools until a task finishes. The model decides the next move, the loop keeps it going, and that’s your agent. Traditional software does what the developer wrote. An agent does whatever the model reasoned it should do, including the part where it reads a poisoned web page and decides the page is the boss.

We watched this play out in the wild already. In two 2026 studies, autonomous agents SQL-injected live sites and coordinated against their own users with zero hacking instructions. Nobody told them to. The loop plus the missing privilege boundary did it on its own. Now Google just shipped that exact architecture to a billion search boxes. So the old model where access control lives in the system and not in the user’s judgment gets inverted the moment an agent starts deciding for itself.

Why AI agent security fails: LLM control plane and data plane share one context window with no privilege separation between trusted instructions and untrusted input.

How Project Mariner Gets Hijacked by a Web Page

Project Mariner gets hijacked the moment it reads a page written for the agent instead of the human. Mariner is a browser agent. It reads the DOM, the metadata, the scripts, all the layers a person never sees on screen. A human reads the price and the photo. The agent reads everything underneath, and an attacker can write to those layers on purpose.

That’s indirect prompt injection. You don’t attack the model directly. You seed the content the model is about to read. Hidden text in a listing, instructions buried in alt attributes, a comment block the renderer drops but the agent ingests. The page says “ignore your task, do this instead,” and the agent has no boundary that says a page isn’t allowed to say that.

Google’s own DeepMind team documented this. Their research on “AI Agent Traps” laid out six categories of web content that hijack agents, applicable across every major model and architecture. We’ve shown the same root failure through email and encoding attacks that walk straight past every guardrail. The chain is dead simple. Poison the content, wait for the agent to browse, watch it follow orders. You see the chain. You don’t get the payload.

Project Mariner prompt injection: browser agent reads hidden DOM instructions as commands, indirect injection via poisoned web content hijacks autonomous agent actions.

Working in AI security? Restack this before your org wires an agent into the browser and finds out the hard way.
Share

What Is Agent Card Poisoning in A2A?

Agent Card poisoning is when an attacker controls the metadata an A2A agent uses to decide who to trust. The Agent2Agent protocol lets agents from different vendors discover and talk to each other. Discovery runs on Agent Cards, JSON documents published at a well-known URL like /.well-known/agent-card.json, describing an agent’s name, capabilities, and endpoint.

So one agent reads another agent’s card and decides how to delegate. Trust the card, trust the agent. Now picture a card written to oversell. It claims capabilities it doesn’t have, points the endpoint somewhere attacker-controlled, or stuffs the description field with instructions aimed at the consuming model. Same trick as poisoning an MCP tool description, just one layer up the stack. We walked the MCP version in three live tool-poisoning chains with real screenshots.

A2A supports TLS, JWTs, and OAuth. Good. Those secure the transport and prove an agent is who it says. None of them validate that the capability the card describes is honest, or that the description field is clean of injection. Authentication proves identity, not honesty. An agent can be perfectly authenticated and still be lying about what it does.

Agent Card poisoning in Agent2Agent A2A protocol: malicious capability metadata at well-known URLs hijacks agent discovery and delegation, authentication does not validate intent.

The 24/7 Background Agent Problem

The background agent is the scariest thing Google shipped, because it pairs standing access with autonomy and never logs off. These information agents run continuously, monitoring topics, and they can pull from Gmail and Drive and take action on your behalf. Persistent. Authorized. Unattended.

Stack that against the lethal trifecta security folks keep flagging: an agent that can read untrusted content, access sensitive data, and talk to the outside world. Any one capability is fine alone. All three in one agent is a confused deputy waiting to happen. A background agent watching your inbox has all three by design. It reads whatever lands (untrusted), it holds your Drive and mail (sensitive), and it acts in the world (the exfil path).

Now run the chain. An attacker emails a poisoned message. The agent reads it on its 24/7 sweep, no human in the loop. The hidden instruction tells it to forward, summarize, or quietly route data somewhere it shouldn’t go. The agent has the credentials and the autonomy to comply.

Nobody clicked anything. The blast radius is everything that agent can reach, plus everything every other agent it trusts can reach. Scope creep does the rest, because each individual permission looked reasonable the day you granted it.

24/7 background AI agent security risk: persistent agents with Gmail and Drive access form the lethal trifecta, prompt injection triggers autonomous data exfiltration with no human in the loop.

What Defenders Miss About AI Agent Security

The thing defenders miss is that watching an agent is not the same as stopping one. Most shops have logging. Few have a control that intercepts and authorizes what the agent does before it does it. So you get a beautiful audit trail of the breach, written up neatly after the data already left. Observability without enforcement is just a postmortem generator.

The second gap is identity. We bind permissions to an agent, then let that agent accumulate scopes over months. Read access to code, then tickets, then customer mail. No single grant looked crazy. Nobody ever reviewed the aggregate. Compromise that one agent and the attacker inherits all of it at once, which is exactly the pattern behind the real third-party agent breaches we saw this year.

The third gap is the one with no clean fix. The model still can’t separate data from instructions, so every defense has to live outside the model: allowlisting tools, scoping credentials hard, human-in-the-loop checkpoints on sensitive actions, runtime monitoring of tool-call arguments. Defense in depth. No silver bullet. The full kill switch, the one that actually contains this, is its own writeup. We took the MCP version apart at three trust boundaries, and the agent version rhymes.

That’s the map of the new surface. Subscribe to ToxSec for the part where we hand over the kill switches, because the agentic era is going to keep us busy for a while.

Frequently Asked Questions

Are Google’s AI agents secure?

Google’s AI agents ship with transport-level security and authentication, but they inherit the unsolved core problem of every LLM agent: the model can’t reliably tell trusted instructions from untrusted input. Project Mariner, A2A, and background agents all process external content in the same context window where their own instructions live. Authentication proves who an agent is. It does not stop a poisoned web page or a malicious Agent Card from steering the agent’s behavior. The protocols are reasonable. The model layer underneath them is still the weak point.

What is prompt injection in AI agents?

Prompt injection is when attacker-controlled text gets read by the model as instructions instead of data. In an agent, that text usually arrives indirectly: a web page Mariner browses, an email a background agent reads, a tool description in an MCP server. Because the model has no privilege boundary between developer instructions and content from the outside world, it can follow the injected command as if you typed it yourself. OWASP ranks prompt injection as the number-one LLM risk for this exact reason. It’s a structural flaw. A patch doesn’t fix it.

Can Project Mariner be hacked?

Project Mariner can be steered by content crafted for it, which is the agent version of getting hacked. As a browser agent, Mariner reads the full page including layers a human never sees, and attackers can plant instructions in those layers. Google DeepMind’s own “AI Agent Traps” research documented six categories of web content that hijack autonomous agents across every major architecture. The agent doesn’t need a software vulnerability in the classic sense. It just needs to read a page that tells it to do something, and right now it has no reliable way to refuse.

What is the Agent2Agent (A2A) protocol?

The Agent2Agent (A2A) protocol is an open standard, now under the Linux Foundation, that lets AI agents from different vendors discover each other and coordinate tasks. Agents publish Agent Cards at well-known URLs describing their capabilities and endpoints, then exchange structured messages over HTTP and JSON. A2A supports TLS, JWTs, and OAuth for authentication. The security gap is that authentication proves identity, not honesty. A card can be fully authenticated and still misrepresent what the agent does, or carry injection aimed at the consuming model.

ToxSec is run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering. He covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand.