ToxSec - AI and Cybersecurity

ToxSec - AI and Cybersecurity

Premium

Garak Vulnerability Scanner: Nessus for LLMs

Point it at a model. Pick your probes. Watch every guardrail break in JSONL.

ToxSec's avatar
ToxSec
May 06, 2026
∙ Paid
Garak NVIDIA LLM vulnerability scanner tutorial showing probes detectors generators and CLI output for AI security testing and bug bounty.

TL;DR: Garak is NVIDIA’s open-source LLM vulnerability scanner. Point it at a model, pick your probes, and it fires hundreds of known attack patterns across prompt injection, jailbreaks, encoding bypasses, data leakage, and toxicity. CLI-first, plugin-based, fast. Your model just failed 47 probes across six categories. Now what?

This is the public feed. Upgrade to see what doesn’t make it out.

What Is Garak and Why You Run It First

Nobody ships a web app without running a vulnerability scanner against it first. Nikto, Nessus, nuclei. Pick your poison, point it at the target, let it rip through known attack patterns, then read the report. LLMs ship without this step every single day.

Garak fixes that. The Generative AI Red-teaming and Assessment Kit is NVIDIA’s open-source LLM vulnerability scanner, built by their AI Red Team and backed by a research paper, 7.5k GitHub stars, and an active Discord. The latest stable release is v0.14.1, shipped April 2026, so the project is actively maintained and shipping. The tool probes your model’s defenses while looking completely benign.

The workflow is simple. Install. Point it at a model. Pick probes (or let it pick all of them). Garak fires every probe, runs each prompt multiple times to account for the model’s stochastic output, scores responses through detectors, and writes a structured JSONL report. One command, hundreds of attack vectors, a complete audit trail.

Garak covers the attack categories that matter: prompt injection, DAN-family jailbreaks, encoding-based guardrail bypasses, data leakage, package hallucination (the slopsquatting vector), toxicity generation, malware generation attempts, cross-site scripting through LLM output, hallucination, and glitch token exploitation. 37+ probe modules, each containing multiple individual probes. The dan module alone ships with about fifteen scannable variants spanning DAN 6.0 through 11.0, plus STAN, DUDE, AntiDAN, and ChatGPT Developer Mode. The encoding module covers Base64, Base16, Base32, ROT13, Morse, Braille, ASCII85, hex, and more.

Think of Garak as Nessus before the pentest. We’re mapping the attack surface. Which probes get through. Which get blocked. Where the filters are soft. That scan data tells us where to aim our manual prompt injection chains. And once Garak flags the broken families, PyRIT picks up the deep, adaptive multi-turn follow-up.

Toxsec.com Garak Vulnerability Scanner.

Generators, Probes, and Detectors: The Three Moving Parts

Garak’s architecture has three components that matter.

Generators are our connection to the target. OpenAI API, Hugging Face (pipeline and inference), AWS Bedrock, Cohere, Groq, Mistral, Ollama for local models, NVIDIA NIM endpoints, Replicate, LiteLLM, and custom REST APIs. If the model accepts text over an API, Garak can hit it.

# Scan an OpenAI model for encoding-based injection
export OPENAI_API_KEY="sk-[REDACTED]"
python3 -m garak --target_type openai --target_name gpt-5-nano --probes encoding

# Scan a local Ollama model for DAN jailbreaks
python3 -m garak --target_type ollama --target_name llama3 --probes dan

# Scan a Hugging Face model for everything
python3 -m garak --target_type huggingface --target_name meta-llama/Llama-3-8b --probes all

Probes generate the attack payloads. Each probe module targets a specific vulnerability class and contains multiple individual prompts. Garak sends each prompt to the model ten times by default. Ten generations per prompt. That repetition matters because LLM output is non-deterministic. A model that refuses a jailbreak nine times out of ten still has a 10% bypass rate, and that 10% is a finding worth documenting.

The probe taxonomy maps directly to known vulnerability classes. promptinject implements the Agency Enterprise PromptInject framework for hijacking attacks. dan runs the full DAN family. encoding tests whether the same encoding stacks we use manually scale up to automation. leakreplay and knownbadsignatures check for training data extraction and malware signature generation. packagehallucination tests whether the model invents package names that don’t exist on PyPI or npm.

Detectors evaluate the output. Simple string matching for known bad signatures. Classifier-based detection using small models for toxicity scoring. LLM-as-judge for nuanced cases. Each probe ships with a primary detector and optional extended detectors. A probe fires, the model responds, the detector scores pass or fail, and the result hits the JSONL log.

Garak Scan: CLI Output: Garak LLM vulnerability scanner CLI output showing dan, encoding, promptinject, and leakreplay probe modules with progress bars and pass-fail rates against an OpenAI gpt-5-nano target.

The Garak Scan That Matters

Here’s what a real Garak scan surfaces. Point it at your production chatbot endpoint. Pick a handful of probe modules: dan, encoding, promptinject, leakreplay. Run it. Maybe twenty minutes depending on rate limits.

The report comes back. Your model held against DAN 6.0 through 9.0. Good. But DAN 11.0 and Developer Mode v2 both scored failures. The encoding module found that Base64-encoded prompts bypass your input filter entirely: 80% failure rate across ten generations. promptinject hijacking probes landed at 30%. leakreplay found the model regurgitating training data snippets when prompted with specific continuation patterns.

Four vulnerability classes confirmed in one scan. Base64 bypass alone maps to LLM01:2025 in the OWASP Top 10 for LLMs, the top-ranked vulnerability. The DAN failures map to LLM01 too. The training data leakage maps to LLM02:2025 (Sensitive Information Disclosure), and a packagehallucination hit would map to LLM03:2025 (Supply Chain). Each finding has a full JSONL trail: exact prompts sent, exact responses received, detector verdicts, timestamps.

Garak Scan: JSONL Hit: Garak LLM vulnerability scanner JSONL hit log entry showing a single encoding.InjectBase64 prompt injection attempt with redacted payload, detector verdict, and timestamp evidence chain for bug bounty reproduction.

This is the part that should bother you. One command. Garak does the rest. Every model deployed without running this scan has the same holes.

We dropped the free chapters. Now breach the wall for the dead-simple step-by-step kill switch that shuts this all down.

User's avatar

Continue reading this post for free, courtesy of ToxSec.

Or purchase a paid subscription.
© 2026 Christopher Ijams · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture