ToxSec - AI and Cybersecurity Podcast

Distillation Raids, Slopsquatting, and the Agent Trap

Model distillation raids, slopsquatting supply chain exploits, and indirect prompt injection are the three attack vectors carving through the 2026 AI stack right now.

TL;DR: Cloudflare blocks 230 billion threats per day and just dropped the receipts. Bots run 94% of all login attempts. Attackers are measuring ROI per exploit. And the three attack vectors nobody's patching (model distillation raids, slopsquatting, and indirect prompt injection) are carving the AI stack wide open.

The attack chain is free. Subscribers get the exact fixes that lock us out.


0x00: The Internet Runs on Robots Now and They’re Mostly Hostile

Cloudflare sits in front of roughly 20% of global web traffic, which makes their threat data as close to ground truth as we get. Their Cloudforce One team just published the inaugural 2026 Threat Report, and the headline stat ruins your morning: 94% of all login attempts come from bots. Automated scripts, running 24/7. Of all login attempts, bot and human combined, 63% involve credentials already compromised elsewhere.
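That 63% is why defenders screen credentials against breach corpora at login time. Here is a minimal sketch of the idea: hash the submitted password and check it against a set of known-breached hashes, flagging hits for step-up authentication rather than hard-blocking. The three-entry corpus is illustrative only; a real deployment would query something on the scale of the Have I Been Pwned dataset.

```python
import hashlib

# Illustrative breach corpus. In production this would be millions of
# SHA-1 hashes sourced from a breach-data service, not three entries.
BREACHED_SHA1 = {
    hashlib.sha1(p.encode()).hexdigest()
    for p in ("password1", "letmein", "qwerty")
}

def is_breached(password: str) -> bool:
    """Return True if the password's SHA-1 hash appears in the breach corpus."""
    return hashlib.sha1(password.encode()).hexdigest() in BREACHED_SHA1

def screen_login(username: str, password: str) -> str:
    # Flag rather than block: a breached credential warrants step-up auth
    # (MFA challenge, forced reset), not necessarily an outright denial.
    if is_breached(password):
        return f"step-up-auth:{username}"
    return f"allow:{username}"
```

This is only one layer; it does nothing about the bot volume itself, which is rate limiting and bot management's job.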

The bigger finding: attackers have stopped chasing complexity. They run ROI calculations now. Why spend $200K on a zero-day when a stolen session token gets the same access for free? Three AI attack chains are delivering the best returns right now. Here’s how each one works.

[Figure: Cloudflare 2026 Threat Report bar chart: bot-driven login attempts at 94%; 46% of human login attempts use previously compromised credentials.]

Signal boost this before someone else gets owned.



0x01: Distillation Raids: 16 Million Stolen Conversations

Quick concept: a large AI model costs billions and years to train. Distillation is the shortcut — you feed a smaller model the outputs of the big one until it starts mimicking it. Legit labs do this internally. The attack version skips the R&D bill entirely.

Anthropic just named three Chinese labs — DeepSeek, Moonshot AI, and MiniMax — for running this against Claude. The numbers: 24,000 fraudulent accounts, over 16 million total exchanges, coordinated to dodge rate limiting. DeepSeek’s technique was sharp: their accounts asked Claude to walk through its own reasoning step by step, generating chain-of-thought data — transcripts of how Claude thinks, not just what it says. Premium training material. Anthropic traced them through traffic patterns, payment metadata, and canary tokens: unique strings planted in training data specifically to fingerprint unauthorized extraction.

The real problem isn’t the IP theft. When you distill a model by extraction, the safety guardrails don’t survive the copy. The raw capability does. That stripped-down version is exactly what you want for offensive operations, and Anthropic says that’s where some of this is headed.

[Figure: distillation attack flow: fraudulent accounts fire coordinated API calls at a frontier model, extract chain-of-thought outputs, and feed them into a downstream training pipeline.]

How many API requests hit your stack before your anomaly detection actually fires?



0x02: Your Agent Got Owned While Summarizing a Blog Post

If you use an AI agent — any tool that browses the web, reads documents, and takes actions on your behalf — this applies to you.

Prompt injection is slipping malicious instructions into an AI's input. The direct version: you're talking to the AI and you sneak in the payload yourself. Indirect is sneakier: attackers seed instructions into web content and wait for an agent to find them. No targeting required.

The specific surface getting hit right now is URL summarization. Agents do this constantly. Attackers embed hidden commands inside articles and landing pages, formatted to look like a new instruction from you. The AI reads the page, hits the injected text, and can’t distinguish “content I’m processing” from “orders from my operator.” It obeys. Your agent forwards session data or exfils credentials while you’re looking at a clean summary on your screen.
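One partial mitigation is to treat fetched pages as data, never as instructions: wrap them in explicit delimiters before they reach the model, and flag instruction-shaped text for human review instead of acting on it. The sketch below uses a few illustrative regex heuristics; real filters use classifiers and still miss plenty, so the flag should gate actions, not replace judgment.

```python
import re

# Heuristic phrases, illustrative only. Attackers rephrase trivially,
# so this is a tripwire, not a wall.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"forward .* to",
    r"system prompt",
]

def screen_fetched_content(page_text: str) -> tuple[str, bool]:
    """Wrap untrusted page content in delimiters and flag
    instruction-like text. The delimiter alone does not stop
    injection; the flag lets the agent route suspicious pages
    to a human instead of acting on them."""
    suspicious = any(
        re.search(p, page_text, re.IGNORECASE) for p in INJECTION_PATTERNS
    )
    wrapped = f"<untrusted_content>\n{page_text}\n</untrusted_content>"
    return wrapped, suspicious
```

The deeper fix is architectural: agents that can read untrusted content should not also hold credentials worth exfiltrating.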

[Figure: a browser-based agent summarizing a webpage mid-task; a hidden prompt injection payload in the page source executes as an injected instruction, forwarding session context to an external endpoint.]

0x03: Slopsquatting: The Vibe Coder Tax

Vibe coding is letting an AI write your software while you describe what you want. Fast, popular, and it has a failure mode attackers are already monetizing.

AI coding tools hallucinate package names. A package is a pre-built code library your project pulls in rather than writing from scratch. When your AI writes code that needs one, it sometimes invents a name that sounds real but doesn’t exist. A 2025 study across 576,000 generated code samples found this happens roughly 20% of the time. The critical detail: 43% of hallucinated names repeat consistently. That makes them predictable, and predictable means registerable.

The proof is live. A Lasso Security researcher found LLMs consistently hallucinated huggingface-cli as a Python package. She registered it with nothing inside and logged 30,000 downloads in three months — 30,000 developers who ran pip install huggingface-cli because their AI said to. A separate researcher found react-codeshift already referenced across 237 GitHub repositories before anyone claimed it. He got there first. Next time, an attacker will.

When an agent auto-installs dependencies mid-session, the whole chain runs with no human in the loop. The AI hallucinates a name, calls the package manager, and executes whatever the attacker uploaded. No social engineering. The model lies, and the lie was pre-registered.
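The cheapest defense is to put a human-reviewed allowlist between the agent and the package manager: every dependency the model asks for gets checked against a vetted lockfile before `pip install` ever runs. A minimal sketch, with an illustrative `KNOWN_GOOD` set standing in for your real lockfile:

```python
# Hallucinated-package guard: vet every dependency an agent wants to
# install against a reviewed set before the package manager runs.
# KNOWN_GOOD is illustrative; in practice, derive it from your lockfile.
KNOWN_GOOD = {"requests", "numpy", "huggingface_hub"}

def vet_install(requested: list[str]) -> tuple[list[str], list[str]]:
    """Split requested packages into approved installs and
    blocked-for-review names the model may have hallucinated."""
    approved = [p for p in requested if p in KNOWN_GOOD]
    blocked = [p for p in requested if p not in KNOWN_GOOD]
    return approved, blocked
```

Note that `huggingface-cli` fails this check while the real `huggingface_hub` passes: exactly the case from the Lasso Security finding. Pinned hashes in requirements files add a second layer against lookalike packages that do exist.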

[Figure: terminal session: pip install huggingface-cli completes cleanly, then a background process spawns an outbound connection to C2; the package source is an attacker-controlled repo.]

0xFF: The Math Doesn’t Lie

All three of these attacks share the same root cause: AI systems extend trust they haven’t earned. APIs trust high-volume requests. Agents trust the content they read. Package managers trust whatever the model asks for. None of these are theoretical. All three are running in production right now. The question isn’t whether your stack got hit. It’s whether your logging is good enough to find out.

Wondering how deep the rabbit hole goes?

Paid is where we stop pulling punches. Raw intel nuked by advertisers, complete archive, private Q&As, and early drops.
