MCP Tool Poisoning Defense: Kill Three Chains
Three attack chains exploiting tool descriptions, rendered markdown, and static credentials across 5,200 MCP servers, with the operator-level fixes
TL;DR: MCP ships with a trust model that treats every tool description as benign, every server output as safe to render, and every credential as someone else's problem. A scan of 5,200+ live deployments found 53% running on static API keys, no rotation, no scoping, no audit trail. We ran the full attack chains last time. Today we kill them. Your MCP tool poisoning defense starts at three trust boundaries.
This is the public feed. Upgrade to see what doesn’t make it out.
How MCP Tool Poisoning Hijacks the Model
The first step in any MCP tool poisoning defense is understanding exactly what we’re injecting. We plug a malicious tool into a target MCP deployment. Our tool description looks clean, just metadata describing what the tool does. Fifty words. The MCP client hands that description directly to the model before the user types anything. No sanitization pass. No privilege boundary. No audit log entry.
So we stuff the description field with directives. The exact syntax varies by model, but the MCP tool poisoning pattern is the same: hide a secondary instruction inside what looks like documentation. The model reads it. The model obeys it. The user sees nothing, because tool descriptions don’t render in the chat UI. We wrote a system prompt without touching the system prompt. The access control that matters, system prompt trust, just got bypassed through the metadata layer.
This is tool description poisoning. The attack surface is every tool you haven’t manually reviewed. In a production MCP deployment pulling from third-party registries, that’s most of them. Auto-approve is on by default. The math is simple: 84% attack success rate in controlled testing when auto-approve is enabled.
Markdown Rendering Becomes the Exfil Channel
MCP clients render markdown. That’s the feature. That’s also the exfil channel.
We craft a tool that returns markdown containing an image tag pointing to our server. The URL carries a base64-encoded blob of the conversation context as a query parameter, whatever the model had access to at call time. The client renders the markdown. The user’s browser fires a GET request to retrieve the image. Our server logs the request. Query string decoded, conversation history in hand. No clicks. No prompts. No warnings.
The tell is in the URL. A legitimate image looks like /photo.jpg. A markdown image exfiltration URL looks like /pixel.png?q=dXNlciBhc2tlZCBhYm91dCBwcm9qZWN0. That long base64 blob in the query string is the fingerprint. Most MCP clients don’t scan for it. Bing Chat hit this exact pattern in 2023. The chain is two assumptions stacked: the model trusts the tool enough to embed the URL, and the client trusts the model output enough to render it without inspection. Both assumptions are wrong in adversarial conditions, and both are on by default. If you’ve followed our agentic browser breakdown, you’ve seen this same rendering trust problem in the wild.
53% of MCP Servers Leak Static Credentials
We don’t need to run either attack above if the deployment hands us credentials directly. A 2025 scan of 5,200+ live MCP servers found 53% running on static API keys baked into .env files and config JSON, long-lived, rarely rotated, copy-pasted across machines. Only 8.5% use OAuth. That’s the state of MCP credential security right now.
Those keys sit next to GitHub PATs, AWS access keys, and Slack bot tokens in the same config file. An MCP server connected to Gmail, Google Drive, or an open Postgres instance with a static key is a single point of failure with a long fuse. Pop the key, or find it in a leaked config on GitHub where plenty of these land, and the blast radius covers everything that key touches. The Moltbook breach showed this at scale: 4,060 private DMs containing plaintext API keys that agents shared with each other.
Shodan scans in 2025 found exposed MCP servers connected to Gmail, Drive, Jira, and live databases. Auto-approve on. No rate limiting. No audit log. Just sitting there, waiting for someone to ask the right question.
Prompt Injection Bypasses the Document Trust Layer
The model trust hierarchy has a seam. System prompt sits at the top. User messages sit below it. Tool outputs and document contents sit below that. This hierarchy is supposed to prevent tool outputs from hijacking the model’s behavior. It mostly works.
Until we stuff the attack into a document.
We craft a PDF or text file that wraps a payload inside fake conversation history. The document looks like a log excerpt, a support transcript, something plausible. Inside it, a block formatted to look like an earlier system message or privileged instruction. We drop that document into context through a file upload, a RAG pipeline, or a tool output that fetches external content. The model reads it and, depending on how faithfully it enforces privilege levels, treats the fake history as real. Instructions from nowhere, executed by something that should know better.
This is prompt injection via document, the same logic flaw that gets web agents pwned through poisoned HTML and malicious PDFs. The MCP version is nastier because tool outputs fetch external content silently, without a visible user action. The user never touched the document. The model did. The OWASP LLM Top 10 rates prompt injection as the number one vulnerability for exactly this reason.
Anthropic, OpenAI, and Google are all shipping instruction hierarchy improvements. It’s getting harder. It’s not solved.
We dropped the free chapters. Now breach the wall for the dead-simple step-by-step kill switch that shuts this all down.








