What’s the simplest architecture that makes these three chains fail by default? Not “train the model harder.” I mean: where do you put the walls, what gets sandboxed, and what gets stripped before it ever hits the model or the renderer?
well one of the biggest things is a sort of stapling/hash for the mcp tool. inspect it, approve it, register. this makes sure now ad hoc changes are made. my next article is going to break down these 3 hacks with. fixes people can implement:) hopefully done by next week.
This is absolutely brutal. I guess there's no way of protecting against this form of Trojan horse other than to be rigorous with which sites you visit? Also "How's the weather" legit feels like it could be a hackers calling card... 😅
haha. i’m trying to write up a defense companion piece. being rigorous is definitely step 1. there a few good solutions like vetting your mcp tools, then hashing their values and pinning it to stop the rug pull attacks.
it just requires a bit more effort! (and awareness, which hopefully this helps with)
Feel free to AMA. I got some nice tips of defending against these attacks for those interested.
What’s the simplest architecture that makes these three chains fail by default? Not “train the model harder.” I mean: where do you put the walls, what gets sandboxed, and what gets stripped before it ever hits the model or the renderer?
well one of the biggest things is a sort of stapling/hash for the mcp tool. inspect it, approve it, register. this makes sure now ad hoc changes are made. my next article is going to break down these 3 hacks with. fixes people can implement:) hopefully done by next week.
This is absolutely brutal. I guess there's no way of protecting against this form of Trojan horse other than to be rigorous with which sites you visit? Also "How's the weather" legit feels like it could be a hackers calling card... 😅
haha. i’m trying to write up a defense companion piece. being rigorous is definitely step 1. there a few good solutions like vetting your mcp tools, then hashing their values and pinning it to stop the rug pull attacks.
it just requires a bit more effort! (and awareness, which hopefully this helps with)
thanks Sam!