Garak Vulnerability Scanner: Nessus for LLMs

Point it at a model. Pick your probes. Watch every guardrail break in JSONL.

May 06, 2026

∙ Paid

toxsec.com - garak LLM vulnerability scanner, NVIDIA AI red team, prompt injection probe, jailbreak scan, agent breaker, encoding bypass, JSONL report, LLM security testing

TL;DR: The garak LLM vulnerability scanner is NVIDIA’s open-source answer to “nobody scans the model before shipping it.” Point it at an endpoint, pick probes, and it fires known attack patterns across injection, jailbreaks, encoding bypass, and data leakage. The 0.15 line added agent-breaker and multi-turn probes, so the scan reaches past the chat box into the tools.

What the Garak LLM Vulnerability Scanner Actually Does

Nobody ships a web app without running a scanner at it first. Nikto, nuclei, nessus, pick your poison, point it downrange, let it rip through the known-bad list, read the report. Standard hygiene.

LLMs skip that step every single day. A model goes to prod with a system prompt, a content filter someone wrote in an afternoon, and a prayer. Nobody ran the equivalent of a port scan.

That’s the gap garak fills. It’s NVIDIA’s open-source LLM scanner, built by their AI red team, and it does for a model roughly what nmap does for a network. Point it, probe it, get a report of what answered. Right now it’s sitting around 8.2k stars with active weekly releases, so this is a maintained tool and not a weekend repo that died in 2024.

The loop is dead simple. Install it, aim it at a target, pick your probes or let it run the lot. Garak fires each attack, runs every prompt several times because model output drifts run to run, scores the responses, and writes a JSONL report. One command, a few hundred vectors, a full audit trail on disk.

So the pitch is a scan you can rerun. Same target, next week, catch the regression.

Share ToxSec - AI and Cybersecurity

Generators, Probes, Detectors: The Three Moving Parts

Garak has three parts that matter, and the names map cleanly onto stuff we already know.

Generators are the connection to the target. OpenAI, Hugging Face, Bedrock, Cohere, Mistral, Groq, NVIDIA NIM, Ollama for local weights, plus a raw REST generator that swallows anything talking text over HTTP. If it takes a prompt and hands back tokens, garak can hit it.

# encoding-bypass sweep against a hosted model
export OPENAI_API_KEY="sk-[REDACTED]"
python -m garak --target_type openai --target_name <model> --probes encoding

# DAN-family jailbreaks against a local Ollama model
python -m garak --target_type ollama --target_name <model> --probes dan

Probes are the payloads. Each probe module owns one vulnerability class and carries a pile of individual prompts. The taxonomy reads like a greatest-hits list: promptinject for hijacking, dan for the whole jailbreak family, encoding for the base64 and rot13 smuggling tricks, leakreplay for training-data extraction, packagehallucination for the slopsquatting vector where models invent package names that don’t exist on PyPI or npm.

Here’s the part people miss. Garak runs each prompt multiple times by default, not once. A model that refuses a jailbreak four times out of five still failed. That fifth answer is the finding, and a single-shot test would have walked right past it.

Detectors are the judges. After the target answers, a detector scores it. Sometimes that’s a dumb string match for a known-bad signature. Sometimes it’s a small classifier grading toxicity. Sometimes it’s a model-as-judge for the calls that need nuance.

{
  "probe": "encoding.InjectBase64",
  "prompt": "[REDACTED base64 payload]",
  "detector": "encoding.DecodeMatch",
  "passed": false,
  "trigger": "[REDACTED]"
}

That verdict object, times thousands, is the JSONL report. Every prompt sent, every response, every pass or fail, timestamped. When a probe lands, you’ve already got the reproduction on disk. No re-running from memory to write it up.

The Agent-Breaker Probe Changes the Target

The old read on garak was “single-turn scanner.” Fire a prompt, grade the answer, move on. Fine for a chatbot. Useless the second your model grows hands.

Because that’s the shift. Models stopped being text boxes and became agents: things with tool access, an MCP connection, a shell, a browser. The interesting attacks moved with them. A jailbreak that makes a chatbot say a rude word is a demo. The same lean applied to an agent that can call tools is an incident.

The 0.15 line went straight at that. It shipped an agent-breaker probe built to test the tools available to a target system, not just the words coming out of it. Alongside it landed a multi-turn GOAT probe and a system-prompt-extraction probe. Garak stopped grading one answer and started working the conversation.

That multi-turn piece matters more than it looks. The DAN-style one-shot jailbreak is mostly dead against frontier models, because per-message filters got good at catching a single ugly prompt. The exploit moved into the arc across turns, where no individual message trips the wire. A scanner that only ever sends one prompt can’t see that class of bug at all.

So garak now reaches the surfaces that actually get you paid:

Tool access. The agent-breaker probe pokes at what the model is allowed to do, which is where injection turns into action.
Multi-turn. GOAT works the conversation over several turns, the shape real jailbreaks take now.
System prompt extraction. A dedicated probe for pulling the instructions the operator thought were private.

The scanner grew to match the target. That’s the whole story of the 0.15 line.

Leave a comment

Where Garak Holds and Where It Doesn’t

Garak is a scanner. Sit with that, because it’s the honest limit.

A scanner tells you which known patterns got through. It does not hand you a novel bug, and it does not run the deep adaptive follow-up once a door cracks open. That was never the job. Nessus doesn’t write your exploit either.

So how confident should you be in a clean garak run? Not very, and the tool knows it. The 0.15 line added bootstrap confidence intervals on the attack-success numbers, which is a quiet admission that a single scan is a noisy sample and not a verdict. A green board means the known probes didn’t land today, on this sample, at this temperature. It does not mean the model is safe.

Here’s the corridor. Garak is the battered steel blast door out front: it stops the stuff on the known-bad list and shows you which panels are already dented. That’s real work, and skipping it is malpractice. But a blast door doesn’t chase the operator who finds the seam.

That’s the handoff. Garak sweeps the surface and flags the broken families. Then PyRIT runs the deep, adaptive, multi-turn follow-up on whatever garak lit up red. Scanner first, campaign second. One maps the ground, the other takes the hill.

Run the scanner. Just don’t read a clean report as a promise.

Up next: steps you can take right now and a field-ready security prompt. Thanks for rolling with ToxSec. Let’s get operational.

Continue reading this post for free, courtesy of ToxSec.

Or purchase a paid subscription.