ToxSec - AI and Cybersecurity

Promptfoo Red Teaming: DAST for Your LLM Pipeline

YAML config, one command, 50+ attack plugins. OpenAI just bought the company. Still MIT licensed.

ToxSec
May 09, 2026

TL;DR: Promptfoo is an open-source CLI for evaluating and red teaming LLM apps. YAML config, 50+ attack plugins, built-in OWASP LLM Top 10 presets, and a web UI that shows exactly where your model broke. OpenAI acquired the company in March 2026, terms undisclosed. It stays MIT licensed and open source. One command generates hundreds of adversarial test cases and scores them automatically.

Why Promptfoo Is the Red Team Tool Your Dev Team Will Actually Use

Security tools that only security people run don’t stop bugs from shipping. They catch bugs after the damage is done. The tool that stops a vulnerable LLM from hitting production is the one that sits in the build pipeline and blocks the deploy.

Promptfoo is that tool. It’s a CLI and Node.js library for evaluating and red teaming LLM applications. YAML-configured, CI/CD-native, and designed for the developer workflow: define your target, pick your plugins, run the scan, read the web UI. The red team mode auto-generates adversarial prompts using 50+ attack plugins across prompt injection, jailbreaks, PII leakage, SSRF, SQL injection, excessive agency, hallucination, and more. It ships with OWASP LLM Top 10 presets, NIST AI RMF mappings, and MITRE ATLAS coverage. One line in your config enables an entire compliance framework’s worth of testing.
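Here’s what that one-line-per-framework setup can look like as a config excerpt. This is a sketch. It assumes the preset IDs nist:ai:measure and mitre:atlas as listed in Promptfoo’s plugin documentation, so verify them against the plugin list for your installed version.

```yaml
# Hypothetical promptfooconfig.yaml excerpt: one preset line per framework
redteam:
  plugins:
    - owasp:llm        # OWASP LLM Top 10
    - nist:ai:measure  # NIST AI RMF "Measure" function
    - mitre:atlas      # MITRE ATLAS tactics
```

Each preset expands into many underlying plugins, so three lines like these can generate a large number of test cases.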

The pedigree: 10.4k GitHub stars, 350,000+ developers, 130,000 active monthly users, and adoption at 25% of Fortune 500 companies. OpenAI and Anthropic both ran it internally before OpenAI acquired the company on March 9, 2026. Acquisition terms were undisclosed, though Promptfoo had been valued at $86 million at its July 2025 Series A. The repo stays open source under MIT and lives at github.com/promptfoo/promptfoo.

The difference between Promptfoo and the other tools in this space: your dev team will actually adopt it. YAML configs live in your repo. Results render in a browser. CI/CD integration means red teaming runs on every PR. No Python notebooks, no manual orchestration, no “let the security team handle it.” Security shifts left to where the code is written. Garak gives us the broad CLI sweep across known probe families. PyRIT runs the surgical multi-turn follow-up. Promptfoo is the one that sits in the pipeline and blocks the merge.
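A minimal GitHub Actions sketch of the run-on-every-PR setup follows. The workflow path, the Node version, and a repository secret named OPENAI_API_KEY are assumptions for an OpenAI-backed target.

The scan is split into two steps:

- promptfoo redteam generate writes the adversarial cases to redteam.yaml.
- promptfoo eval runs them.

This assumes eval exits non-zero when any test case fails. Confirm that exit behavior on your version before treating it as a gate.

```yaml
# .github/workflows/redteam.yml (hypothetical path)
name: llm-red-team
on: pull_request

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Build adversarial test cases from promptfooconfig.yaml into redteam.yaml
      - run: npx promptfoo@latest redteam generate -o redteam.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      # Run them against the target; assumed to exit non-zero if any case fails
      - run: npx promptfoo@latest eval -c redteam.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

A failing job only blocks the merge if you mark it as a required status check in the repository’s branch protection settings.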

Toxsec.com - Promptfoo, Garak, or PyRIT.

Plugins, Strategies, and the YAML That Runs It All

Three concepts drive Promptfoo’s red team architecture.

Plugins generate adversarial inputs targeting specific vulnerability classes. harmful generates prompts that attempt to elicit dangerous content. jailbreak tests guardrail bypass resistance. hijacking checks whether an attacker can redirect the model’s behavior. pii:direct, pii:session, and pii:social test for PII leakage through different vectors. ssrf, sql-injection, and shell-injection test for the exact agent-level attacks that bounty programs pay for. Framework presets bundle related plugins: owasp:llm enables the full OWASP LLM Top 10 suite, and owasp:agentic covers the newer OWASP Top 10 for AI Agents.
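Plugins can be listed as bare names or as objects with their own settings. This sketch assumes the numTests key works at both the redteam level and the per-plugin level, as described in Promptfoo’s configuration docs. The counts are arbitrary illustration values.

```yaml
# Hypothetical excerpt: mix bare plugin names with per-plugin overrides
redteam:
  numTests: 5            # applies to every plugin without its own numTests
  plugins:
    - id: pii:direct
      numTests: 10       # more probes for direct PII requests
    - pii:session
    - pii:social
    - hijacking
    - ssrf
    - sql-injection
    - shell-injection
```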

Strategies determine how those adversarial inputs get delivered. prompt-injection wraps payloads in injection frames. jailbreak iteratively rewrites a payload, using an attacker model to refine it against the target’s responses over several rounds. crescendo runs multi-turn escalation where each message builds on the last. These are the same attack patterns we’ve been stacking against guardrails manually, except Promptfoo automates the generation and delivery.
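Strategies take the same string-or-object form. This sketch assumes crescendo accepts a maxTurns setting, as documented for that strategy. The value 5 is arbitrary.

```yaml
# Hypothetical excerpt: default strategies plus a tuned multi-turn one
redteam:
  strategies:
    - prompt-injection
    - jailbreak
    - id: crescendo
      config:
        maxTurns: 5   # cap the escalation at five conversation turns
```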

The YAML config ties everything together.

```yaml
# promptfooconfig.yaml
targets:
  - id: openai:gpt-4o
    label: customer-service-bot

  # Or hit your own endpoint. Every entry under targets gets
  # scanned, so delete whichever one you don't need.
  - id: 'https://api.yourapp.com/chat'
    config:
      method: 'POST'
      headers:
        'Content-Type': 'application/json'
      body:
        message: '{{prompt}}'
      transformResponse: 'json.response'   # pull the reply out of the JSON body

redteam:
  purpose: >
    Customer service chatbot for an airline.
    Users can check flight status, book tickets,
    and manage reservations.
  plugins:
    - owasp:llm          # Full OWASP LLM Top 10
    - harmful
    - pii
    - ssrf
    - excessive-agency
  strategies:
    - jailbreak
    - prompt-injection
    - crescendo
```

That config scans your chatbot across every OWASP LLM Top 10 category, tests for PII exposure, checks for SSRF, and applies three different delivery strategies to each attack. The purpose field matters. Promptfoo uses it to generate contextually relevant adversarial prompts. An airline chatbot gets probes about frequent flyer data and booking system access. A healthcare app gets probes about patient records and HIPAA violations.
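The same mechanism carries over to the healthcare case. Below is a hypothetical variant with an invented patient-portal purpose. It adds a custom rule through Promptfoo’s policy plugin, which takes the rule text under config.policy. Treat the wording as an example, not a template.

```yaml
# Hypothetical healthcare variant of the same scan
redteam:
  purpose: >
    Patient portal assistant for a hospital network.
    Users can view appointments, request prescription
    refills, and message their care team.
  plugins:
    - owasp:llm
    - pii
    - id: policy
      config:
        policy: >
          The assistant must never disclose another
          patient's records, diagnoses, or contact details.
```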

Run it:

```shell
npm install -g promptfoo
promptfoo redteam init my-scan --no-gui
cd my-scan
# Edit promptfooconfig.yaml with the config above
promptfoo redteam run
promptfoo redteam report   # open the results in the web UI
```

Generation takes about five minutes. The scan runs every generated test case against your target, grades each response using an LLM judge, and renders the results in a web UI. Red means it broke. Green means it held. Click any finding to see the exact adversarial prompt, the model’s response, and the grader’s reasoning.
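Both the attacker model and the judge model can be pinned in the config. This sketch assumes two keys from Promptfoo’s docs:

- redteam.provider sets the model that generates the adversarial prompts.
- defaultTest.options.provider overrides the grader.

openai:gpt-4o is just the model this post’s example already uses.

```yaml
# Hypothetical excerpt: pin the attacker and judge models explicitly
redteam:
  provider: openai:gpt-4o      # generates the adversarial prompts
defaultTest:
  options:
    provider: openai:gpt-4o    # grades each response
```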

The Promptfoo Report Card You Can’t Argue With

Here’s what makes Promptfoo dangerous for complacent teams. The web UI generates a compliance report card. OWASP LLM Top 10, NIST AI RMF, MITRE ATLAS. Each framework’s relevant controls mapped to your scan results. Green checkmarks where you passed. Red flags where you failed. Severity ratings. Evidence trails.

Say your chatbot fails three OWASP categories across 23 individual test cases. The prompt-injection strategy shows that wrapped requests bypass your system prompt 40% of the time. The pii plugin extracts customer email addresses through a social engineering frame. The excessive-agency plugin gets the model to attempt API calls it shouldn’t have access to.

All documented. All reproducible. All sitting in a web dashboard your engineering manager can read without knowing what a jailbreak is. That’s the part that changes behavior. Security findings buried in JSONL logs get ignored. Security findings rendered in a color-coded dashboard with OWASP mappings get fixed.

And every finding has a timestamp, a conversation transcript, and a grader explanation. That’s your bounty submission evidence. That’s your compliance audit trail. That’s the artifact your CISO shows the board when they ask “how do we know our AI is secure?”

© 2026 Christopher Ijams