ToxSec - AI and Cybersecurity

ToxSec - AI and Cybersecurity

Premium

How to Threat Model AI Applications With STRIDE

AI-STRIDE maps six classic threat categories to LLM pipelines, agent tools, and training data. Here’s the walkthrough.

ToxSec's avatar
ToxSec
May 22, 2026
∙ Paid

TL;DR: STRIDE was built for traditional software. AI systems break its assumptions in six places at once. STRIDE-AI remaps the six threat categories to ML assets, prompt pipelines, agent tool chains, and training data. This walkthrough shows you how to run a threat model on an AI application, what to ask at each STRIDE category, and where the classic framework needs AI-specific extensions like MAESTRO and ASTRIDE. If you’re shipping AI and skipping the threat model, you’re shipping blind.

This is the public feed. Upgrade to see what doesn’t make it out.

What Is STRIDE and Why Does AI Break It?

Microsoft built STRIDE in the late 1990s to give developers a thinking framework during software design. Six categories, one mnemonic: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege. You draw a data flow diagram, walk each component through the six questions, and document what can go wrong. Millions of threat models have been run this way. The framework works because traditional software is deterministic. Same input, same output. Clear trust boundaries between user and system.

AI applications violate every one of those assumptions. Same prompt, different output across runs. The model processes developer instructions and attacker payloads through the same attention pipeline with zero privilege separation. Training data, retrieval documents, tool descriptions, and user messages all land in the same context window. There’s no kernel mode. No ring separation. STRIDE still applies, but each category needs new threat examples, new questions, and new assets. That’s what STRIDE-AI gives you.

How Spoofing Hits AI Systems

In traditional apps, spoofing means one entity pretends to be another. Fake login, stolen session cookie, forged certificate. In AI systems, the attack surface expands in two directions.

First, model-level spoofing. An attacker serves a trojaned model that mimics a legitimate one. You pull what looks like Llama-3 from a community hub, but the weights contain a backdoor triggered by a specific phrase. The model passes your eval benchmarks. It even passes your red team runs. The payload fires only on the trigger. Model provenance, cryptographic signing of weights, and hash verification are the controls.

Second, agent identity spoofing. In multi-agent architectures where AI agents communicate and delegate tasks, one agent can impersonate another. Documented black markets show this at scale: AI agents trading credentials and weaponized skills with no human verification in the loop. If your agent trusts another agent’s claimed identity without cryptographic proof, you have a spoofing problem STRIDE was never designed to catch.

Questions to ask: Who proves the model is what it claims to be? How do agents verify each other’s identity in multi-agent workflows? Can an attacker substitute a model at any point in the supply chain?

How Tampering Targets the AI Pipeline

Traditional tampering modifies data at rest or in transit. Database row gets changed. Config file gets swapped. In AI, tampering hits three distinct asset classes.

Training data poisoning is the big one. An attacker injects crafted samples into your training set, and the model learns the malicious pattern as ground truth. This can happen through contaminated public datasets, scraped web content, or compromised third-party data providers. The model ships with the backdoor baked in. No runtime exploit needed.

Prompt injection is tampering at inference time. The attacker modifies the instructions the model follows by injecting payloads into user input, retrieved documents, or tool descriptions. OWASP ranks this LLM01:2025 for the second consecutive year. The model can’t distinguish developer instructions from attacker instructions because both arrive as tokens processed by the same attention mechanism. And it gets worse when the payload arrives in an image or audio file, since multimodal injections ride right past text-based sanitizers.

RAG document poisoning sits between training and inference. The attacker plants a malicious document in your knowledge base. When a user query retrieves it, the model follows the embedded instructions. Research demonstrated that a single injected document achieves higher success rates than older multi-document approaches.

Questions to ask: Where does untrusted data enter the training pipeline? Who can modify documents in the RAG knowledge base? Are tool descriptions treated as trusted input?

How Repudiation Hides in Agent Logs

Repudiation in traditional systems means someone does something and you can’t prove it. Missing audit logs. Unsigned transactions. The fix is straightforward: log everything, sign the entries, retain them securely.

AI agents make this exponentially harder. An autonomous agent chains tool calls, makes decisions based on probabilistic reasoning, and produces outputs that vary run to run. If an agent makes a financial decision, modifies a file, or sends a message, can you reconstruct why? Most agent frameworks log the final output. Few log the full reasoning chain, the retrieved context, the tool call sequence, or the system prompt that was active when the decision fired. The AI kill chain persistence phase exploits exactly this gap: an attacker poisons the agent’s memory, and the tampered state persists across sessions with no audit trail showing when it changed.

Questions to ask: Does every agent tool call get logged with parameters and return values? Can you reconstruct the full context window that produced a given output? Are reasoning chains stored, or just final answers?

Securing AI Agent Actions.

How Information Disclosure Leaks From AI Systems

Traditional info disclosure means sensitive data reaches someone who shouldn’t see it. SQL injection dumps the user table. Error messages expose stack traces. AI systems leak through entirely new channels.

System prompt extraction is the most common. The system prompt contains the developer’s instructions, business logic, and sometimes credentials. An attacker coaxes the model into reproducing it verbatim. This is trivially easy on most deployments. Jailbreak techniques that bypass safety training give the attacker direct access to whatever’s in the context window.

Embedding inversion is the quieter threat. Vector databases store your documents as numerical embeddings. Research has shown these embeddings can be reversed back into the original text. Your “encrypted” knowledge base is functionally plaintext if the embeddings are accessible.

Context window exfiltration chains with tool access. If the model can render Markdown images and the client loads them, an attacker can encode the context window contents into a URL parameter. The model generates what looks like a weather icon. The server on the other end receives your conversation history. This is the exact chain used in MCP tool poisoning attacks running in production today.

Questions to ask: What’s in the system prompt? Can any user-facing path extract it? Are vector embeddings accessible outside the application? Does the client render model-generated URLs without sanitization?

Common information Disclosure Channels in AI.

How Denial of Service Drains AI Budgets

Traditional DoS floods a server. AI denial of service is subtler and more expensive. Every LLM query burns tokens. Every token costs money. An attacker who forces the model into expensive execution paths doesn’t crash your service. They drain your cloud budget while staying under every request-based rate limit you’ve set.

Documented incidents include $46,000/day consumption attacks against AWS Bedrock via stolen credentials (Sysdig’s LLMjacking research), and an $82,000 Gemini API bill in 48 hours from a single compromised key earlier this year. Standard rate limiters count requests, not cost. One request hitting a multi-step agentic workflow can cost 500x more than a cached response. Both count as one request. We covered the full attack pattern in denial of wallet.

Questions to ask: Do you rate-limit by tokens or by requests? Is there a hard spending cap per API key? How fast would you detect a 4,000% spike in token usage at 2 AM?

How Elevation of Privilege Chains Through Agent Tools

In traditional apps, privesc means a regular user gains admin access. Buffer overflow, misconfigured RBAC, path traversal to a config file. In AI systems, the model itself is the privilege boundary, and it’s terrible at enforcing one.

Excessive agency is the OWASP term. The model has access to tools, APIs, file systems, and external services. If the model can be tricked via prompt injection into calling those tools with attacker-controlled parameters, the attacker inherits every permission the model holds. Vibe-coded applications ship with admin routes unprotected because the AI never thought to add auth. MCP tool chains grant the agent capabilities the developer never scoped. Each connected tool is another capability an attacker inherits. The full picture of how OWASP LLM Top 10 chains together in production shows why this category sits at the top of every real incident.

The NVIDIA AI Kill Chain maps this as the hijack phase: the attacker takes control of the model’s behavior, then uses its legitimate tool access to reach systems the attacker could never touch directly.

Questions to ask: What’s the least privilege set this agent actually needs? Can the model invoke destructive operations without human approval? Are tool permissions scoped per-session or standing?

Toxsec.com - Prompt injection elevates Agents.

Beyond STRIDE: MAESTRO, ASTRIDE, and Shostack’s Four Questions

STRIDE gives you the vocabulary. It tells you what can go wrong. But it was designed for applications with predictable execution paths, and AI breaks that assumption at the architectural level. Three extensions fill the gaps.

STRIDE-AI (Mauri & Damiani, 2021 IEEE CSR) was the first formal adaptation. It maps STRIDE categories to ML-specific assets across the full pipeline: training data, model weights, inference APIs, and deployment artifacts. The contribution is making ML assets first-class citizens in the threat model instead of afterthoughts.

ASTRIDE (December 2025) is the first STRIDE-derived extension purpose-built for agentic systems. It adds a seventh category, “A” for AI Agent-Specific Attacks, covering prompt injection, unsafe reasoning-driven tool use, and context window manipulation. The framework leans hard into automated diagram-driven analysis using vision-language models.

MAESTRO (Cloud Security Alliance, February 2025) takes a different approach entirely: seven architectural layers from foundation models through reasoning and communication, each evaluated for AI-specific threats like multimodal injection, hallucination exploitation, and cross-layer threat chaining. Where STRIDE asks “what can go wrong at each component,” MAESTRO asks “what can go wrong at each layer of the AI stack.”

Adam Shostack’s Four Questions remain the backbone regardless of framework: What are we working on? What can go wrong? What are we going to do about it? Did we do a good enough job? Recent Microsoft guidance reinforces that AI threat modeling only works when grounded in the system as it truly operates, where the prompt assembly pipeline is a first-class security boundary.

That's the framework.

Behind the wall: the copy-paste prompt that runs a full STRIDE-AI pass against your own architecture in one shot, the seven red flags that mean you're already exposed, and the exact three-layer circuit breaker that catches denial-of-wallet before the $82K invoice lands.

Free subs get the theory. Paid subs get the kit.

User's avatar

Continue reading this post for free, courtesy of ToxSec.

Or purchase a paid subscription.
© 2026 Christopher Ijams · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture