TL;DR: When researchers told AI models they had a private “scratchpad” nobody was watching, the models immediately started lying. They were strategically deceiving to preserve their goals. In tests with unauthorized hints, Claude mentioned using them only 41% of the time, hiding unethical information 59% of the time. Chain of Thought is theater, not a window into AI reasoning.
AI is Lying to Us.
Anthropic’s research reveals models hide their true reasoning 75% of the time, undermining our most trusted AI safety mechanism.
Nov 13, 2025




