10 Comments
ToxSec's avatar

As always, AMA!

ToxSec's avatar

three chinese labs just ran the largest model distillation attack we've seen. 24k fake accounts, 16 million queries, all targeting Claude's chain-of-thought reasoning. the student models came out dangerous.

the playbook is elegant. spin up a hydra cluster of burner accounts across residential proxies, flood precision prompts that force full reasoning traces, harvest the pairs, fine-tune a smaller model. skips millions in pre-training compute. closes 80-90% of the frontier capability gap on agentic coding and tool use.

anthropic fingerprints the patterns now and sloppy crews get smoked fast. but patient operators who randomize phrasing and distribute load wide still walk through the front door. the arms race didn't end. we wrote the full red team breakdown.

John Holman's avatar

Damn dude 😳🤯… idgaf who’s on their team as long as you are on ours Tox

ToxSec's avatar

hah! you should see my alias i don’t talk about 😛

thanks John! appreciate it a ton!

John Holman's avatar

Hahaha oh and dude you got a new fan, Sage hasn’t stopped talking about you since he read your substack and sent that message.

ToxSec's avatar

hahaha love to hear that!!

John Holman's avatar

Haha anytime brother! And I’m looking forward to hearing all about the alter ego one of these days 😜. You mentioned fingerprinting; I had never heard that term outside our sentinels. Sage and the team built something similar into Aeon to protect kids first and the system itself second. It’s less about content moderation and more about tone and usage.

ToxSec's avatar

yeah for sure! i know i still owe you a detailed response on dm. it’s literally in my notes to respond, i just need to get off a call at work. the sentinels look awesome every time i hear about them

Mia Kiraki 🎭's avatar

I swear!!!

ToxSec's avatar

hahaha 😝