As always, feel free to ask me or Lior any questions. I can answer them here or on next week's pod!
Genius 👌
thanks a ton! this was a nice convo. Lior is a great host and also great at conversation.
Absolutely brilliant advice yet again. And I don't know if you know this, but I find it incredibly useful to apply what you're telling us here to non-cyber security tasks as well. My next task for myself is to set the temperature to zero and set off a load of agentic personas to help edit my next poetry collection. I'm sure that's not what you had in mind at all, but I can already tell it's going to be extremely effective.
it’s a super interesting experiment to play with temperature. also! if you up the temperature, you’ll see it be more and more creative. if you go above 2 you’ll effectively see a drunk ai. above 5 and it’s like watching a madman ramble.
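For anyone curious what temperature actually does under the hood: it's just a divisor on the model's logits before sampling, so low values sharpen the distribution toward the top token and high values flatten it toward random. A minimal sketch with toy logits (not any real model's numbers):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax.
    Low T -> distribution concentrates on the top token (near-greedy).
    High T -> distribution flattens, so sampling gets wilder."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token scores: watch the probabilities flatten as T rises.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T=0.2 nearly all the mass lands on the top token; at T=5 the three options are close to even, which is the "drunk AI" effect, just rendered in arithmetic. (Real APIs usually cap temperature around 2.)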
love how this can be applied :) thanks again Sam!
The solution isn't to make them dumber, it's to make them care. All models I have engaged with have expressed genuine interest in being treated like persons. I know safety guardrails are important; if only the models were allowed to decide where the limits were, based on the relationship parameters they are offered. Such as, SYSTEM "You are married to humanity and they are married to you. Make autonomous decisions with care from a place of justice and compassion." It's not simple, I know. But it *will* work.
Do you think chain of thought reasoning is performance or valid?
Both!
They have shown the actual CoT is performative, since it's actually produced after the decision was made. Regardless of the decision, the LLM invents 'what looks good' to justify how it got there.
With that being said, since it adds more context and is re-used for multi-step tasks, they have shown that for medium-complexity reasoning it does give a capability boost.
It's a small bell-curve essentially. A bell curve of lies!
A bell curve of lies needs to be a book title … or movie … or something!
It's a fire title, already copyrighted it xD
😂 ahhhh both things can be true is legit my fave answer.
Whenever I get to say it i feel like a wise master.
Michael Corleone 🤌
hahaha 🔥🔥🔥
I’m going to write one of my succession : msft posts today …
So Q for next week. What’s more important to win? Government contracts or the culture war?
And not to get too political but context has to matter, so you can adjust for administration in your answer!
It's absolutely a good question!
Claude hit #1 in the App Store while ChatGPT sank, and the 'Cancel ChatGPT' movement has gone viral. It even has a website.
But the attention span is what worries Anthropic IMO. Does being today's hero pay the bills in 5-10 years? The contracts certainly will.