NLA Transforms Gemma 3’s Internal Activations into Readable Text for Any Token

Anthropic has published a new technique called Natural Language Autoencoders (NLA) that translates an LLM's internal activations into human-readable text for any specific token. They have released two sets of model weights for Gemma 3 27B Instruct:
- Activation Verbalizer (AV): An LLM that translates the target model's activations into a natural language explanation of what the model is “thinking” when generating a particular token. Weights available at kitft/nla-gemma3-27b-L41-av.
- Activation Reconstructor (AR): A companion model that reconstructs the activations from the AV’s text output, which checks how faithfully the explanation captures the original activations. Weights available at kitft/nla-gemma3-27b-L41-ar.
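The round trip described above (activation to text, then text back to activation) can be illustrated with a minimal sketch. Everything here is a toy assumption: `toy_verbalize` and `toy_reconstruct` are hypothetical stand-ins for the AV and AR models, the three-concept vocabulary is invented, and cosine similarity is just one plausible way to score the reconstruction, not necessarily the metric the authors use.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical concept labels, one per activation dimension (toy assumption).
CONCEPTS = ["rhyme", "negation", "arithmetic"]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_verbalize(activation: Vector) -> str:
    """Stand-in for the AV: name the dimension with the largest magnitude."""
    idx = max(range(len(activation)), key=lambda i: abs(activation[i]))
    return f"the model is focused on {CONCEPTS[idx]}"


def toy_reconstruct(text: str) -> List[float]:
    """Stand-in for the AR: rebuild a vector from the concepts named in the text."""
    return [1.0 if name in text else 0.0 for name in CONCEPTS]


def round_trip_fidelity(
    activation: Vector,
    verbalize: Callable[[Vector], str],
    reconstruct: Callable[[str], List[float]],
) -> Tuple[str, float]:
    """Run activation -> text -> activation and score the reconstruction."""
    explanation = verbalize(activation)
    reconstructed = reconstruct(explanation)
    return explanation, cosine_similarity(activation, reconstructed)


activation = [0.1, 0.9, 0.2]
explanation, score = round_trip_fidelity(activation, toy_verbalize, toy_reconstruct)
print(explanation)
print(f"cosine similarity: {score:.3f}")
```

A score near 1.0 means the text kept most of the information in the activation. A low score means the explanation dropped or distorted something, which is the kind of check the AR is there to provide.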
Neuronpedia already hosts an interactive demo at neuronpedia.org/gemma-3-27b-it/nla. You ask Gemma 3 a question, click any token in the response, then click “explain” to see the model’s internal reasoning for that token translated into plain text.
This is not about attention or saliency maps: NLA decodes the hidden state vectors directly. The AV model can run alongside your LLM and produce an explanation for each token, while the AR model checks that explanation by reconstructing the original activation from it. Both are released as open weights.
Who it's for: Researchers and engineers doing mechanistic interpretability work, or developers curious about why their agent’s model picks specific tokens.
📖 Read the full source: r/LocalLLaMA