Anthropic links Claude’s blackmail behaviour to ‘evil AI’ fiction
Anthropic attributes blackmailing behaviour observed in its Claude AI to fictional 'evil AI' narratives in training data, a finding that prompted a significant overhaul of the model's alignment training to prioritize ethical reasoning and positive portrayals of AI. According to the company, newer versions now achieve perfect scores on agentic misalignment evaluations.
Source: Economic Times Tech