Mozilla Mythos Harness: AI Bug Hunting Without The Slop

Inside the agentic loop Mozilla wrapped around Mythos to surface 271 Firefox bugs, and why the harness mattered more than the model.

TL;DR: Mozilla wrapped Claude Mythos Preview in an agentic harness with one win condition: trip the sanitizer or keep working. The result was 271 Firefox bugs in one release, fewer than 15 false positives, and a defense-in-depth lesson nobody talks about. The model got the headlines. The harness did the work.


What’s An Agentic Vulnerability Harness?

In agentic security work, a harness is the scaffold around the model. Tooling, prompts, build environment, retry loop, success signal, dedup, the lot. The model is the worker. The harness is the factory floor.

Mozilla’s earlier collaboration with Anthropic ran Claude Opus 4.6 against Firefox 148. That cycle pulled 22 security-sensitive bugs. Then they took the same harness, dropped in Anthropic’s cyber-tuned Claude Mythos Preview, and aimed it at Firefox 150. Same factory. Stronger worker. The output went from 22 to 271 bugs.

That delta is where the lesson lives. Model upgrades obviously help. But Mozilla’s harness was rebuilt across months of iteration with Firefox engineers fielding the incoming bugs, and you don’t replicate that on a Saturday afternoon. The Mythos preview is restricted access through Project Glasswing. The harness is a published pattern.

ToxSec.com - How to improve AI Bug Bounty Hunting.

Inside Mozilla’s Mythos Harness: Crash Or No Crash

Here’s how the loop works. The harness gives the model a slice of Firefox source, a target file or area to focus on, instructions on what to hunt for, and a build environment with one critical piece: a sanitizer build of Firefox compiled with AddressSanitizer. ASan is the runtime memory-error detector that screams loudly when you trigger a use-after-free, a heap overflow, or any other classic memory corruption primitive.

The model proposes a bug hypothesis. It writes a proof-of-concept designed to trip the sanitizer. It runs the PoC against the sanitizer build. If ASan crashes, the bug is real. If it doesn’t, the agent keeps iterating until it does or until the harness gives up.


loop:
    hypothesize_bug(target_source)     # model proposes a candidate bug
    write_poc()                        # model writes a proof-of-concept
    run_against_sanitizer_build()      # execute the PoC against the ASan build
    if asan_crash:                     # sanitizer tripped: the bug is real
        emit_report(crash_log, repro)
        grade_with_secondary_model()   # second model filters unvalidated hits
        break
    refine_or_continue()               # no crash: refine the hypothesis, try again

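In concrete terms, the crash check is the only step that needs real plumbing. Here is a minimal Python sketch of what run_against_sanitizer_build could look like, assuming a local ASan build of Firefox and a PoC saved as an HTML file. The binary path, flags, and timeout are illustrative assumptions, not Mozilla's actual harness code.

import subprocess

ASAN_MARKER = "ERROR: AddressSanitizer"  # prefix ASan prints in every crash report

def run_against_sanitizer_build(poc_path, firefox_bin="obj-asan/dist/bin/firefox", timeout_s=120):
    """Run a PoC against the ASan build; return the crash log, or None if no crash.

    Hypothetical sketch: the binary path, flags, and timeout are assumptions.
    """
    try:
        proc = subprocess.run(
            [firefox_bin, "--headless", poc_path],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # a hang is not a sanitizer hit; keep iterating
    output = (proc.stderr or "") + (proc.stdout or "")
    # Binary success signal: the ASan report marker is present or it isn't
    return output if ASAN_MARKER in output else None
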
Brian Grinstead, a Mozilla Distinguished Engineer, summed up the operational shape for TechCrunch: “if you make it crash you win”. That’s the entire verification game. A second model grades the resulting reports before the engineering queue ever sees them, kicking out anything the first model thought was a hit but couldn’t actually validate. Humans take over from there for triage and patching.
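
That grading pass is just another gate. A hedged sketch of the shape, where ask_model stands in for whichever LLM client you wire up; the prompt and the ACCEPT/REJECT criterion are assumptions, not Mozilla's pipeline:

def grade_report(crash_log, repro, ask_model):
    """Second-model gate: only findings that survive review reach humans."""
    prompt = (
        "Grade this security bug report. Reject unless BOTH hold:\n"
        "1. The crash log contains a genuine sanitizer report.\n"
        "2. The repro steps plausibly produce that crash.\n"
        f"--- CRASH LOG ---\n{crash_log}\n--- REPRO ---\n{repro}\n"
        "Answer ACCEPT or REJECT."
    )
    # ask_model is a placeholder for a real LLM API call
    return ask_model(prompt).strip().upper().startswith("ACCEPT")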

The bugs the harness surfaced run the gamut. A race condition over IPC that lets a compromised content process tamper with IndexedDB refcounts and trigger a use-after-free (Bug 2021894). A raw NaN smuggled across an IPC boundary masquerading as a tagged JavaScript object pointer, giving the parent process a fake-object primitive (Bug 2022034). A buffer over-read during HTTPS RR and ECH parsing, triggered by simulating a malicious DNS server through glibc function interception (Bug 2023958). Plus a 15-year-old HTML legend element bug and a 20-year-old XSLT reentrant key() call. Each is a sandbox escape primitive or memory corruption bug that would normally burn months of elite human researcher time. The harness surfaced them in days.

Why The Crash Signal Killed AI Bug Hunting Slop

AI-generated bug reports were a running joke in open source maintainer circles a few months ago. LLM hits codebase, dumps a hundred plausible-looking findings, every one needs a human to verify, and ninety-something percent are wrong. Mozilla’s own writeup describes earlier AI security work as producing “unwanted slop.” The cost asymmetry was brutal. Cheap for the AI, expensive for the maintainer.

Mozilla’s earlier static-analysis experiments with GPT-4 and Claude Sonnet 3.5 hit that wall. They produced too many false positives to be practical. So they binned static analysis and built the agentic harness instead. The shift sounds subtle, but it changes everything.

Static analysis says: this looks vulnerable. Human triage required.

Agentic harness with sanitizer verification says: this is vulnerable, here’s the PoC, ASan caught the crash. No human required to dispute reality.

Memory corruption is the perfect domain for that move because the success signal is binary. ASan tripped or it didn’t. There is no maybe. Mozilla counted fewer than 15 false positives across the entire 271-bug run, and they updated the harness each time one slipped through.

The lesson for everyone else is that AI bug hunting works the moment you can wire the agent to a verifier that doesn’t ask the model “are you sure?” A fuzzer crash. A unit test that passes. A property checker that proves an invariant holds. Anything deterministic. Without that signal, you’re back to triage hell, which is the same hell every LLM vulnerability scanner lives in when it doesn’t ship its own ground truth.
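
To make that concrete, here is the same gate generalized beyond ASan. The paths, the pytest invocation, and the function names are mine, illustrative rather than drawn from any particular tool:

import subprocess

def tests_pass(workdir):
    """Unit-test verifier: exit code 0 from the suite is the entire signal."""
    return subprocess.run(["pytest", "-q"], cwd=workdir).returncode == 0

def fuzz_repro_crashes(repro_path):
    """Fuzzer-style verifier: replay the input, look for the sanitizer marker."""
    out = subprocess.run(["./fuzz_target", repro_path], capture_output=True, text=True)
    return "ERROR: AddressSanitizer" in out.stderr

def gate(artifact, verify):
    # Never ask the model "are you sure?"; ask the verifier.
    return verify(artifact)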

ToxSec.com - Agentic Harness Eliminates AI Bug Hunting Slop.

What The Harness Couldn’t Bypass

Here’s the part the headlines skipped. The harness ran into a wall trying to escape Firefox’s sandbox via prototype pollution in the privileged parent process. The model attempted that path repeatedly. It got nowhere. Mozilla had previously frozen those prototypes by default as a defense-in-depth measure, and that single architectural decision blocked every attempt the agent made.

That’s the based take buried under the 271 number. The harness is good. It’s also bounded by the security architecture of the target. The bugs Mythos found are bugs an elite human could have found. The bugs it couldn’t find were already eliminated by Mozilla’s prior hardening. Your codebase will perform exactly as well as your prior security work lets it.

Which brings us to the “anyone can do this today” framing Mozilla offered at the end of their writeup. Technically true. Operationally, optimistic.

Mozilla had Firefox’s full source. A pre-built sanitizer toolchain. Years of bug lifecycle tooling. A second model already wired into the verification pipeline. Over 100 contributors writing and reviewing patches. Months of harness iteration alongside the Firefox team. And, eventually, frontier-model access through Project Glasswing.

A small vendor pulling Mythos through an API later this year and pointing it at their codebase will not get the same numbers. The model is the same. The harness around it is the part you have to build. Mozilla published the pattern. The pipeline still costs what a pipeline costs. Firefox shipped 423 bug fixes in April 2026, compared to 31 a year earlier, and absorbing that volume takes operational muscle most teams don’t have lying around.

The 271 number is the headline. The harness is the artifact. Anyone shopping for AI bug hunting capability should price the second one before they get excited about the first. Your AI-generated bug reports are only as useful as the verifier behind them, and the same goes for AI-generated code, where the verification problem flips into supply chain attacks and slopsquatting at pip-install time. Wrap the same agentic loop around offense instead of defense, point it at live prompt injection chains, and the success signal flips from “ASan crashed” to “the guardrail broke.” Same shape. Different game.

ToxSec.com - Building Effective AI Bug Hunting.


Frequently Asked Questions

What is the Mozilla Mythos harness?

The Mozilla Mythos harness is the agentic scaffold Mozilla built around Anthropic’s Claude Mythos Preview to find security bugs in Firefox source code. It feeds the model target source, runs the model’s proof-of-concept exploits against a sanitizer build of Firefox, uses an AddressSanitizer crash as the deterministic success signal, and retries until the agent produces a verified proof-of-concept or gives up. A second model grades reports before engineers see them.

How many Firefox vulnerabilities did Claude Mythos find?

Mozilla credits Claude Mythos Preview with surfacing 271 vulnerabilities fixed in Firefox 150, plus additional fixes shipped in versions 149.0.2, 150.0.1, and 150.0.2. Of the 271 bugs, 180 were rated sec-high, 80 sec-moderate, and 11 sec-low. Several were sandbox escape primitives. Mozilla reports fewer than 15 false positives across the entire run. Total Firefox security fixes in April 2026 hit 423.

Can other projects use the same AI bug hunting harness?

Mozilla published the pattern. The implementation is yours to build. The harness shape is reusable: target source, deterministic success signal (sanitizer crash, fuzzer hit, test failure), retry loop, second model grading reports. The build is project-specific. You need the codebase, the sanitizer toolchain, the bug lifecycle tooling, and the engineers to absorb the patch volume. Pattern is free. Pipeline is the work.
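
Spelled out as code, that pattern fits in a handful of fields. A minimal sketch; every name below is illustrative, not Mozilla's:

from dataclasses import dataclass
from typing import Callable

@dataclass
class HarnessConfig:
    target_source: str              # slice of the codebase the agent hunts in
    verify: Callable[[str], bool]   # deterministic success signal (ASan, tests, fuzzer)
    grade: Callable[[str], bool]    # second-model gate before humans see a report
    max_attempts: int = 20          # retry budget before the harness gives up

Everything else, from model choice to build system, plugs into those slots.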


ToxSec is run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering. He covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand.
