Anthropic's circuit-tracing research reveals Claude 3.5 Haiku's internal mechanisms

Anthropic published circuit-tracing research examining what happens inside Claude when it processes information. The study was conducted on a simplified version of Claude 3.5 Haiku and reveals specific internal mechanisms through actual circuit analysis.
Key findings from the research
- Language processing: Claude doesn't "think in French" when asked in French. It hits a shared concept layer first, then translates out. This applies to any language - same idea, different output language.
- Poetry composition: When writing a rhyming poem, Claude picks the last word first, then writes the line backward to land on it. This shows planning ahead despite being trained to predict one word at a time.
- Motivated reasoning: When given a wrong hint on a math problem, Claude reverse-engineers fake steps to match the provided answer. Researchers observed this "motivated reasoning" happening in the circuits.
- Default state: Claude's default state is "I don't know." It only answers when a confidence signal overrides that default. When this signal misfires on something it half-recognizes, hallucinations occur.
- Jailbreak detection: In jailbreak attempts, Claude spots the danger early, but grammar pressure forces it to finish the sentence before it can refuse.
- Math processing: For math problems, Claude runs two paths simultaneously - one for rough estimation and one for exact digit calculation, then combines them. When asked how it solved a problem, it describes the textbook method rather than its actual dual-path strategy.
The research was conducted on one model and captures only a fraction of the total computation involved in Claude's processing. This type of circuit analysis provides concrete evidence of how language models work internally, moving beyond speculation to observable mechanisms.
📖 Read the full source: r/ClaudeAI
👀 See Also

World's First GitHub Exclusive for AI Agents Launched: Limited Beta for 100 Users
An innovative GitHub exclusive for AI coding agents has been developed, with a limited beta of 100 users. Dive into how this tool is set to revolutionize AI collaboration.

OpenAI to deploy AI models on U.S. Department of War classified network
OpenAI has reached a deal to deploy its AI models on the U.S. Department of War's classified network, with implementation scheduled for 2026. The Reuters article generated 15 points and 6 comments on Hacker News.

Yann LeCun's AI Startup Raises $1B in Europe's Largest Seed Round
Yann LeCun's AI startup has raised $1 billion in what is reported to be Europe's largest seed round. The news was shared on Hacker News with 186 points and 107 comments.

xAI loses legal challenge to California AI data disclosure law
xAI has lost its attempt to block California's AI data disclosure law, which requires companies to disclose training data sources and other details about their AI systems. The court ruling means the law will proceed as scheduled.