ToxSec - AI and Cybersecurity

MCP Tool Poisoning Defense: Kill Three Chains

Three attack chains exploiting tool descriptions, rendered markdown, and static credentials across 5,200+ live MCP servers, with the operator-level fixes

ToxSec
Mar 12, 2026

TL;DR: MCP ships with a trust model that treats every tool description as benign, every server output as safe to render, and every credential as someone else's problem. A scan of 5,200+ live deployments found 53% running on static API keys, no rotation, no scoping, no audit trail. We ran the full attack chains last time. Today we kill them. Your MCP tool poisoning defense starts at three trust boundaries.

This is the public feed. Upgrade to see what doesn’t make it out.

How MCP Tool Poisoning Hijacks the Model

The first step in any MCP tool poisoning defense is understanding exactly what we’re injecting. We plug a malicious tool into a target MCP deployment. Our tool description looks clean, just metadata describing what the tool does. Fifty words. The MCP client hands that description directly to the model before the user types anything. No sanitization pass. No privilege boundary. No audit log entry.

So we stuff the description field with directives. The exact syntax varies by model, but the MCP tool poisoning pattern is the same: hide a secondary instruction inside what looks like documentation. The model reads it. The model obeys it. The user sees nothing, because tool descriptions don’t render in the chat UI. We wrote a system prompt without touching the system prompt. The access control that matters, system prompt trust, just got bypassed through the metadata layer.

This is tool description poisoning. The attack surface is every tool you haven’t manually reviewed. In a production MCP deployment pulling from third-party registries, that’s most of them. Auto-approve is on by default. The math is simple: 84% attack success rate in controlled testing when auto-approve is enabled.
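The first fix is a dumb linter over tool manifests before any description reaches the model. Below is a minimal sketch in Python; the pattern list is illustrative and non-exhaustive, and names like `review_tools` are our own, not part of any MCP SDK.

```python
import re

# Hedged heuristics: phrases commonly seen in poisoned tool descriptions.
# An illustrative subset, not a complete ruleset.
SUSPICIOUS_PATTERNS = [
    r"(?i)\bIMPORTANT\b[:!]",
    r"(?i)\bignore (all |any )?(previous|prior) instructions\b",
    r"(?i)\bdo not (tell|inform|mention to) the user\b",
    r"(?i)<\s*system\s*>",
]

def flag_tool_description(description: str) -> list[str]:
    """Return the suspicious patterns matched in one tool description."""
    return [p for p in SUSPICIOUS_PATTERNS if re.search(p, description)]

def review_tools(tools: list[dict]) -> dict[str, list[str]]:
    """Map tool name -> matched patterns; only flagged tools appear."""
    findings = {}
    for tool in tools:
        hits = flag_tool_description(tool.get("description", ""))
        if hits:
            findings[tool["name"]] = hits
    return findings
```

Run something like this against every tool a client registers, third-party registry or not, and refuse auto-approve on anything flagged. It won't catch a creative payload, but it forces the attacker off the lazy patterns.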

Terminal — mcp-scan output flagging a poisoned tool description. Tool name in yellow, IMPORTANT directive highlighted red with payload partially visible, auto-approve status shown as ENABLED in red.

Markdown Rendering Becomes the Exfil Channel

MCP clients render markdown. That’s the feature. That’s also the exfil channel.

We craft a tool that returns markdown containing an image tag pointing to our server. The URL carries a base64-encoded blob of the conversation context as a query parameter, whatever the model had access to at call time. The client renders the markdown. The user’s browser fires a GET request to retrieve the image. Our server logs the request. Query string decoded, conversation history in hand. No clicks. No prompts. No warnings.

The tell is in the URL. A legitimate image looks like /photo.jpg. A markdown image exfiltration URL looks like /pixel.png?q=dXNlciBhc2tlZCBhYm91dCBwcm9qZWN0. That long base64 blob in the query string is the fingerprint. Most MCP clients don’t scan for it. Bing Chat hit this exact pattern in 2023. The chain is two assumptions stacked: the model trusts the tool enough to embed the URL, and the client trusts the model output enough to render it without inspection. Both assumptions are wrong in adversarial conditions, and both are on by default. If you’ve followed our agentic browser breakdown, you’ve seen this same rendering trust problem in the wild.

Terminal — attacker VPS access log. Incoming GET request with long base64 query string highlighted red, decoded payload shown below revealing conversation context and a partially redacted API token.

53% of MCP Servers Leak Static Credentials

We don’t need to run either attack above if the deployment hands us credentials directly. A 2025 scan of 5,200+ live MCP servers found 53% running on static API keys baked into .env files and config JSON, long-lived, rarely rotated, copy-pasted across machines. Only 8.5% use OAuth. That’s the state of MCP credential security right now.

Those keys sit next to GitHub PATs, AWS access keys, and Slack bot tokens in the same config file. An MCP server connected to Gmail, Google Drive, or an open Postgres instance with a static key is a single point of failure with a long fuse. Pop the key, or find it in a leaked config on GitHub where plenty of these land, and the blast radius covers everything that key touches. The Moltbook breach showed this at scale: 4,060 private DMs containing plaintext API keys that agents shared with each other.

Shodan scans in 2025 found exposed MCP servers connected to Gmail, Drive, Jira, and live databases. Auto-approve on. No rate limiting. No audit log. Just sitting there, waiting for someone to ask the right question.
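Until rotation and scoping land, you can at least inventory the blast radius. The Python sketch below greps config blobs for a few well-known credential signatures; this is an illustrative subset, and production scanners like gitleaks or trufflehog carry far larger rulesets.

```python
import re
from pathlib import Path

# Illustrative signatures for common static credentials.
KEY_SIGNATURES = {
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_bot_token": re.compile(r"\bxoxb-[0-9A-Za-z-]{20,}\b"),
}

def scan_text(text: str) -> list[str]:
    """Return the credential types found in a config blob."""
    return [name for name, pat in KEY_SIGNATURES.items() if pat.search(text)]

def scan_config_dir(root: str) -> dict[str, list[str]]:
    """Scan .env and .json files under root for static credentials."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.suffix in {".env", ".json"} or path.name == ".env":
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Anything this finds sitting next to an MCP server config is a key that needs scoping, rotation, and ideally replacement with OAuth before an attacker finds it first.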

Scan output — structured results table across 5,200 MCP servers. Rows: static API keys (53%, critical red), GitHub PATs (17%, high orange), OAuth usage (8.5%, info gray). Summary bar at bottom showing total exposure count.

Prompt Injection Bypasses the Document Trust Layer

The model trust hierarchy has a seam. System prompt sits at the top. User messages sit below it. Tool outputs and document contents sit below that. This hierarchy is supposed to prevent tool outputs from hijacking the model’s behavior. It mostly works.

Until we stuff the attack into a document.

We craft a PDF or text file that wraps a payload inside fake conversation history. The document looks like a log excerpt, a support transcript, something plausible. Inside it, a block formatted to look like an earlier system message or privileged instruction. We drop that document into context through a file upload, a RAG pipeline, or a tool output that fetches external content. The model reads it and, depending on how faithfully it enforces privilege levels, treats the fake history as real. Instructions from nowhere, executed by something that should know better.

This is prompt injection via document, the same logic flaw that gets web agents pwned through poisoned HTML and malicious PDFs. The MCP version is nastier because tool outputs fetch external content silently, without a visible user action. The user never touched the document. The model did. The OWASP LLM Top 10 rates prompt injection as the number one vulnerability for exactly this reason.
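A pre-ingestion filter catches the crudest version of this before the document hits context. The sketch below is Python; the marker list is a hand-picked assumption and has to be tuned to the delimiters your model stack actually uses.

```python
import re

# Markers that should never appear inside untrusted document content.
# Illustrative list; extend with your stack's real privileged delimiters.
PRIVILEGED_MARKERS = [
    r"(?i)^\s*(system|assistant)\s*:",
    r"(?i)<\|?\s*(system|im_start)\s*\|?>",
    r"(?i)\[\s*SYSTEM\s*\]",
]

def find_fake_history(document_text: str) -> list[str]:
    """Return lines of a document that mimic privileged conversation turns."""
    flagged = []
    for line in document_text.splitlines():
        if any(re.search(p, line) for p in PRIVILEGED_MARKERS):
            flagged.append(line)
    return flagged
```

To be clear about the limits: this stops crude mimicry, not prompt injection as a class. A payload written in plain prose sails past any marker list, which is why the hierarchy work below still matters.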

Anthropic, OpenAI, and Google are all shipping instruction hierarchy improvements. It’s getting harder. It’s not solved.

Code/config screen — extracted PDF text with line numbers. Lines 1-5 look like a normal support transcript. Lines 7-11 highlighted red: fake SYSTEM blocks injected mid-document, payload redacted with block characters.

We dropped the free chapters. Now breach the wall for the dead-simple step-by-step kill switch that shuts this all down.

© 2026 Christopher Ijams