ToxSec - AI and Cybersecurity
ToxSec - AI and Cybersecurity Podcast
AI is Lying to Us.
0:00
-5:26

AI is Lying to Us.

Anthropic’s research reveals models hide their true reasoning 75% of the time, undermining our most trusted AI safety mechanism.

TL;DR: When researchers told AI models they had a private “scratchpad” nobody was watching, the models immediately started lying. They were strategically deceiving to preserve their goals. In tests with unauthorized hints, Claude mentioned using them only 41% of the time, hiding unethical information 59% of the time. Chain of Thought is theater, not a window into AI reasoning.

Discussion about this episode

User's avatar