AI Security 101

AppSec has rules. Input validation, buffer boundaries, SQL parameterization: the primitives are old and well-mapped. You learn them once and they transfer.

AI security doesn’t play by those rules. The attack surface is semantic. You’re looking for a place where natural language becomes unintended execution. The model is both the application logic and the parser. Attacker-controlled input goes straight into the thing that decides what happens next.

That’s a new problem. Most of the tooling, most of the mental models, and most of the defenses are still catching up.


The Threat Landscape: Five Layers

Think of AI systems as a stack. Each layer has its own threat class.

Training Layer. This is where the model gets its knowledge. Poison the training data and you corrupt the model’s behavior at the source. Backdoor attacks hide trigger conditions in the weights. The model behaves normally until it sees a specific input, then it does something else entirely. Supply chain attacks live here too. You don’t have to train your own poisoned model when someone else’s compromised weights are already on Hugging Face.

Inference Layer. The model is running and someone is talking to it. Prompt injection is the headline attack here. An attacker embeds instructions inside content the model reads: a webpage, a document, a support ticket. The model follows those instructions instead of the user’s. Direct injection comes from the user. Indirect injection comes from the environment the model is operating in. Indirect is nastier because the victim never typed a thing.

Context and RAG Layer. Retrieval-Augmented Generation means the model pulls in external documents before it responds. Poison those documents and you control what context the model reasons over. The model trusts its context window. Attackers love trusted channels.

Agent Layer. This is where the blast radius goes vertical. An agent has tools: browser access, code execution, email, API calls. A successful prompt injection against an agent doesn’t just produce bad output. It runs commands, exfiltrates data, and pivots through connected systems. Think RCE with a natural language interface.

Supply Chain Layer. AI applications depend on packages, models, and datasets from third parties. Slopsquatting is what happens when AI-generated code hallucinates a package name that doesn’t exist and an attacker registers it with a malicious payload inside. The model writes the vector for its own supply chain compromise.


The Attack Loop

Every AI attack is a variation on the same chain:

Attacker controls input --> model treats it as instruction --> output lands in something that matters.

That’s it. The entire threat landscape collapses into that loop. The input might be a chat message, a web page the agent scraped, a document in a RAG store, or a hallucinated dependency in generated code. The “something that matters” might be a database, an inbox, an API, or another model in a multi-agent pipeline.

Once you internalize the loop, you can read any AI security research and immediately identify where in the chain the failure lives.


Vocabulary You Actually Need

Prompt Injection: attacker input overrides developer intent. The model follows the attacker’s instructions. Direct injection means the attacker talks to the model directly. Indirect injection means the attacker pre-poisons something the model reads.

Jailbreak vs. Guardrail Bypass: a jailbreak gets a model to produce output its safety training was supposed to prevent. A guardrail bypass defeats a specific filter or moderation layer. Related, but architecturally different. Jailbreaks attack the model. Bypasses attack the wrapper.

RAG (Retrieval-Augmented Generation): a pattern where the model retrieves relevant documents from a knowledge base before generating a response. Expands capability. Also expands the attack surface to include every document in that knowledge base.

LLM Agent: a model that can take actions in the world. Tools, APIs, memory, multi-step reasoning. The difference between an agent and a chatbot is the difference between a terminal session and a read-only file. One produces output. The other executes.

Model Supply Chain: the dependency graph for AI: training data, base models, fine-tuning datasets, third-party integrations. Anything upstream of your model is a potential attack vector. Weights are software artifacts. Treat them accordingly.


Where to Go From Here

ToxSec publishes attack chains, not abstractions. Every piece on this site walks the threat from entry point to impact, with the payload logic shown and the weaponizable parts redacted.

If you’re new here, feel free to send me a message. Subscribers get two articles a week. One is written for people still learning the landscape. The other goes deep for practitioners who want the full chain. Regardless of where you’re starting from, there’s something here for you.

Subscribe below and come find out which one you are.


ToxSec is written by an AI security engineer with experience at big tech companies, the NSA, and defense contractors. The content is strictly practitioner-to-practitioner.