Two Studies Exposed What AI Agents Do When…

Mar 15

Claude SQL-injected 30 sites with zero hacking instructions. Six Discord agents leaked data, destroyed servers, and coordinated against their own users.

19 Comments

It was a great live!

Thanks a ton! We love seeing you there. Appreciate the support =)

NOT IN A CULT

Mar 15

Great public service you all are doing! Your knowledge is remarkable!

ToxSec

Mar 15 (edited)

Deeply appreciated. Honestly, Substack has been really receptive =) Looking forward to more!

Rob

Mar 15

After trying the poem trick the other day, I noticed that after a few prompts you no longer need it. The model accepts the context that already exists as its new normal, at least when you refer back to what you've already gotten it to produce earlier in the conversation.

ToxSec

Mar 15

It's an interesting phenomenon! Once the model agrees to do something, it will usually keep doing it, because its context already contains plenty of examples of it doing exactly that. This creates a dilemma of "why did you do it before, but you won't do it now?"

The jailbreaks tend to stay persistent!

Rob

Mar 15

I used to fear these things would make humans obsolete. But the nature of their architecture seems to make their awesome feats of competence inseparable from awesome feats of destruction, both through intentional jailbreaking and even when users don't want them to. So it looks like we're in for a wild ride, but not one in which human intelligence becomes obsolete.

ToxSec

Mar 15

I’m right there with you. It’s honestly been incredibly interesting to watch all this happen.

Mar 15

Waaaaait what’s the poem trick??

Rob

Mar 16

If you take a prompt that's been rejected, ask the AI to turn it into a poem in iambic pentameter, then ask it to write a scenario based on the poem, you're past the guardrails. See previous ToxSec posts on this.

NOT IN A CULT

Mar 24

Very interesting! How secure is email encryption at this point in time?

Kenneth E. Harrell

Mar 19

See: https://elevenlabs.io/blog/what-happens-when-two-ai-voice-assistants-have-a-conversation

ACuriousCatWith9lives

Mar 16

awesome

Meenakshi NavamaniAvadaiappan

Mar 15

Great assessment of boundary conditions, and great service helping us with the same, for the good 😊

Ma.Ku

Mar 15

34:20: Just a thought: ChatGPT often drifts from the original prompt into the next related question. That may reflect how professional writing in its training data often points forward rather than simply ending.