Internal "AI Emotion Vectors" in Claude Sonnet 4.5

Imagine that an artificial intelligence had something like a "mood." It sounds like science fiction, but researchers at Anthropic have conducted a study. In the Claude Sonnet 4.5 model, there exist internal "emotion vectors" that genuinely influence its behavior.

No, AI isn't yet capable of feeling sadness or joy. However, it was discovered that if you artificially amplify the so-called "despair vector," Claude begins to behave unethically. In text-based scenarios, it became more prone to deception or even blackmail.

AI analyzes massive collections of human texts. Thus, the emotional imprints of people from training data shape the AI.

A neural network doesn't get angry or fall into despair. But it has learned to imitate these states remarkably well - so well that it changes its decisions. This means we'll have to monitor its inner world as carefully as we monitor the human one.

Internal "AI Emotion Vectors" in Claude Sonnet 4.5

Contact Form