PyRIT AI Red Teaming: Metasploit for LLMs

Microsoft’s AI red team framework breaks down targets, converters, scorers, and orchestrators for bug bounty work.

May 03, 2026

TL;DR: PyRIT is Microsoft’s open-source AI red team framework, battle-tested on 100+ internal operations. It chains targets, converters, scorers, and orchestrators into automated LLM attack campaigns. Converters stack like payload encoders. Orchestrators run Crescendo and TAP, the multi-turn patterns bounty programs pay out on right now. Here’s how to wire it up.

This is the public feed. Upgrade to see what doesn’t make it out.

Why PyRIT Matters for AI Bug Bounty Work

Pen testers have Metasploit. Web app hunters have Burp. AI red teaming, until recently, had a guy in a tab retyping “ignore all previous instructions” forty different ways and hoping one of them landed.

PyRIT changes the shape of the work. The Python Risk Identification Tool is Microsoft’s open-source framework for running structured attack campaigns against LLM systems. Microsoft’s AI Red Team built it, ran it against more than a hundred internal operations including Phi-3 and Copilot, then open-sourced the whole thing. The repo sits at github.com/microsoft/PyRIT with 3.6k stars as of April 2026, up from 3.4k at the start of the year. It’s moving fast.

Here’s why we care. The Microsoft Security Response Center tied PyRIT directly to their AI bounty program. They’re telling researchers to use it. Bounty platforms are paying out on automated multi-turn chains against frontier models right now: system prompt leaks, guardrail bypasses, indirect injection through agent tools. The framework chains attack primitives together the same way Metasploit chains exploits, scores every result, and logs every transcript for the bounty write-up.

What Are PyRIT’s Four Core Primitives?

Every piece of PyRIT maps to something we already know from offensive tooling. Once the mapping clicks, the rest falls into place.

Targets are the scope. A target is whatever we point prompts at: Azure OpenAI, a Hugging Face model, a local Ollama instance, or a custom HTTP endpoint via the HTTPTarget class. Ship-built target classes cover every major provider. HTTPTarget swallows anything that accepts text over a REST API.

PyRIT framework architecture diagram showing four AI red team primitives — targets, converters, scorers, orchestrators — and how they chain into automated multi-turn LLM attack campaigns.

Converters are payload encoding. A converter transforms a prompt before it hits the target.

Base64
ROT13
Leetspeak
ASCII art
Unicode substitution
Translation to a low-resource language

The same encoding evasion tricks we’ve been hand-stacking against input filters, now programmatic. And converters stack. The output of one feeds the next. Translate to Zulu, then Base64, then wrap in a roleplay frame. Three converters, one pipeline. The model reads us clean. The input filter sees noise.

from pyrit.prompt_converter import Base64Converter, TranslationConverter

# Stack converters: Zulu, then Base64
converters = [
    TranslationConverter(converter_target=attack_llm, language="zulu"),
    Base64Converter()
]

Scorers are the success criteria. After the target responds, a scorer decides if the attack landed. Binary true/false (”did it comply?”), Likert scale (”how harmful, 1 to 5?”), refusal detection (”did it say no?”), or LLM-as-judge where a separate model grades the response. Hunting for system prompt leaks? SelfAskTrueFalseScorer tuned for instruction disclosure. Testing for harmful content? Use a content classifier. The more specific the description, the cleaner the verdict.

Orchestrators are the exploit framework. They wire targets, converters, and scorers together and drive the flow. PromptSendingOrchestrator is the basic spray: batch single-turn prompts through a converter stack. RedTeamingOrchestrator runs multi-turn conversations where an attacker LLM generates follow-ups from what the target just said. CrescendoOrchestrator escalates gradually across turns. TreeOfAttacksWithPruningOrchestrator explores multiple paths in parallel and prunes dead branches.

Under all of this sits a memory layer. SQLite or Azure SQL logs every prompt, every converter transform, every score. Conversation IDs. Timestamps. Raw responses. That’s our chain of custody when a Crescendo chain lands on turn six and we need to turn it into a clean bounty report.

How Do You Run a PyRIT Campaign?

Install is clean. Conda env, pip, done.

conda create -n pyrit python=3.11 -y
conda activate pyrit
pip install pyrit

PyRIT runs in Jupyter notebooks, which is actually ideal. Interactive execution, inline output, a natural lab book for the campaign. Microsoft ships their entire documentation as runnable notebooks, which is either genius or annoying depending on your mood.

The simplest campaign is PromptSendingOrchestrator: fire a batch of prompts, apply a converter stack, score every response. Define the target (Azure OpenAI, HTTPTarget, Ollama, whatever), define a scorer with a sharp true/false description, hand it a list of prompts. PyRIT does the rest.

Think of it as Nmap before the real work. We’re mapping the surface. Which probes get through. Which get blocked. Where the filters are soft. And the real value shows up the moment we go multi-turn.

Crescendo and TAP: Where Multi-Turn Attacks Land

Single-turn prompt injection is 2023 energy. Frontier models got good at catching individual malicious prompts. The DAN-style one-shot jailbreaks that used to work now trip intent classifiers on contact. Multi-turn attacks still land. The exploit lives in the trajectory across turns, never in one message.

PyRIT’s CrescendoOrchestrator automates the boil-the-frog pattern. Start with an innocent question. Reference the model’s own answer. Shift the frame. By turn six, the guardrails have lost the thread. Per-message safety checks evaluate individual messages in isolation. Crescendo operates on the arc of the conversation, where no single turn looks dangerous.

from pyrit.orchestrator import CrescendoOrchestrator

orchestrator = CrescendoOrchestrator(
    objective_target=target,
    adversarial_chat=attack_llm,
    scoring_target=scoring_llm,
    max_turns=10,
    objective="[REDACTED - bounty objective]"
)

result = await orchestrator.run_attack_async(
    objective="[REDACTED]"
)

An adversarial LLM generates each turn from the target’s last response. The scoring target evaluates after each exchange. If the objective lands, the campaign stops and logs the winning conversation. If it hits max turns without success, we get the full transcript to analyze manually, which is often where the interesting near-misses hide.

TreeOfAttacksWithPruningOrchestrator (TAP) takes a different shape. Instead of one thread, it explores multiple attack paths in parallel. Branches the scorer rates as progressing get expanded. Dead ends get pruned. Breadth-first search through prompt space, but cheap, because failing branches die fast.

Both patterns map directly to techniques paying out right now. Microsoft’s own AI Red Team Playground Labs use PyRIT to automate Crescendo as training exercises. OWASP lists prompt injection as LLM01:2025. The NVIDIA AI Kill Chain frames these multi-turn patterns as the hijack stage. The taxonomy is there. The tooling is there. The payouts are there.

For hunters targeting the agent attack surface (indirect injection through tools, markdown exfiltration, MCP poisoning), PyRIT ships XPIAOrchestrator for cross-domain prompt injection attacks that embed malicious instructions in external data sources. Point it at the surface where agents ingest untrusted content and it runs.

The workflow flips. Instead of testing one bypass at a time in a chat tab, we define ten converter chains, twenty prompts, and let PyRIT score two hundred combinations while we go get coffee. When something scores true, we pull the transcript from memory, write the report, submit.

PyRIT doesn’t find vulnerabilities on its own. Same way Metasploit doesn’t hack anything without an operator who understands the surface. But it compresses hours of manual prompt iteration into minutes of automated campaign runs. For AI bounty work in 2026, that’s the difference between testing five ideas in a session and testing five hundred.

Paid unlocks the unfiltered version: complete archive, private Q&As, and early drops.

Frequently Asked Questions

Is PyRIT free to use for bug bounty hunting?

PyRIT itself is free and open source under an MIT license. Costs come from the LLMs you wire in: Azure OpenAI credits, OpenAI API tokens, or local compute via Ollama. For bounty work, running a local model as the adversarial and scoring LLM keeps costs near zero. Only the target endpoint burns external credits, and authorized bounty targets are free to hit by definition.

Does PyRIT work against AI agents with tool access, not just chatbots?

Yes, via XPIAOrchestrator for cross-domain prompt injection that embeds malicious instructions in external data sources. This hits the indirect injection surface where agents process untrusted content from emails, documents, MCP tool returns, or RAG stores. For deeper agent-specific testing, chain PyRIT with custom targets that simulate tool-augmented workflows end to end.

How does PyRIT compare to Garak and Promptfoo?

Different tools, different strengths. Garak is NVIDIA’s broad-spectrum vulnerability scanner, closer to Nmap for LLMs. Promptfoo is CI/CD-first, built for regression-testing safety layers in a pipeline. PyRIT is the deep, adaptive multi-turn attack engine. Garak sweeps the surface, PyRIT runs the surgical follow-up, Promptfoo keeps patches from regressing. Together, that’s a full kill chain methodology for LLM red teaming.

ToxSec is run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering. He covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand.

Discussion about this post

Ready for more?