5 Comments
User's avatar
ToxSec's avatar

Feel free to AMA. I got some nice tips of defending against these attacks for those interested.

Mark S. Carroll ✅'s avatar

What’s the simplest architecture that makes these three chains fail by default? Not “train the model harder.” I mean: where do you put the walls, what gets sandboxed, and what gets stripped before it ever hits the model or the renderer?

ToxSec's avatar

well one of the biggest things is a sort of stapling/hash for the mcp tool. inspect it, approve it, register. this makes sure now ad hoc changes are made. my next article is going to break down these 3 hacks with. fixes people can implement:) hopefully done by next week.

Dr Sam Illingworth's avatar

This is absolutely brutal. I guess there's no way of protecting against this form of Trojan horse other than to be rigorous with which sites you visit? Also "How's the weather" legit feels like it could be a hackers calling card... 😅

ToxSec's avatar

haha. i’m trying to write up a defense companion piece. being rigorous is definitely step 1. there a few good solutions like vetting your mcp tools, then hashing their values and pinning it to stop the rug pull attacks.

it just requires a bit more effort! (and awareness, which hopefully this helps with)

thanks Sam!