NLA Transforms Gemma 3’s Internal Activations into Readable Text for Any Token

✍️ OpenClawRadar · 📅 Published: May 8, 2026 · 🔗 Source

Anthropic has published a new technique called Natural Language Autoencoders (NLA) that translates an LLM's internal activations into human-readable text for any individual token. It has released two sets of model weights for Gemma 3 27B Instruct:

Auto Verbalizer (AV): An LLM that translates the target model's activations into a natural language explanation of what the model is “thinking” when generating a particular token. Weights available at kitft/nla-gemma3-27b-L41-av.
Activation Reconstructor (AR): A companion model that rebuilds the activations from the AV's text output, which makes it possible to check whether the explanation is faithful to the original activation. Weights at kitft/nla-gemma3-27b-L41-ar.

Neuronpedia already hosts an interactive demo at neuronpedia.org/gemma-3-27b-it/nla. You ask Gemma 3 a question, click any token in the response, then click “explain” to see the model’s internal reasoning for that token translated into plain text.

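This is not an attention or saliency-map method: it decodes the hidden-state vectors directly. The AV model can run alongside your LLM and produce an explanation for each token. The AR model maps the AV's text back into an activation vector, so how closely that reconstruction matches the original activation indicates whether the explanation captured what was actually there. Both models are released as open weights.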
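To make the round trip concrete, here is a toy sketch in Python. It is not Anthropic's code and does not load the released weights: `toy_verbalize`, `toy_reconstruct`, the three-dimensional `CONCEPTS` table, and the use of cosine similarity as the faithfulness score are all hypothetical stand-ins chosen for illustration. It only shows the shape of the idea: take the activation for one token, turn it into text, turn that text back into a vector, and compare the two vectors.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy "concept directions"; a real model's activation space is far larger.
CONCEPTS: Dict[str, Vector] = {
    "cats": [1.0, 0.0, 0.0],
    "numbers": [0.0, 1.0, 0.0],
    "negation": [0.0, 0.0, 1.0],
}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_verbalize(activation: Sequence[float]) -> str:
    """Hypothetical stand-in for the AV: name the closest concept direction."""
    best = max(CONCEPTS, key=lambda name: cosine_similarity(activation, CONCEPTS[name]))
    return f"the model is thinking about {best}"


def toy_reconstruct(explanation: str) -> Vector:
    """Hypothetical stand-in for the AR: map the explanation text back to a vector."""
    concept = explanation.rsplit(" ", 1)[-1]
    return list(CONCEPTS.get(concept, [0.0, 0.0, 0.0]))


def round_trip(
    activation: Sequence[float],
    verbalize: Callable[[Sequence[float]], str],
    reconstruct: Callable[[str], Vector],
) -> Tuple[str, float]:
    """Activation -> text -> activation, scored by cosine similarity (assumed metric)."""
    explanation = verbalize(activation)
    reconstructed = reconstruct(explanation)
    return explanation, cosine_similarity(activation, reconstructed)


# One toy activation vector per generated token; real ones come from the target model.
token_activations: List[Vector] = [
    [0.9, 0.1, 0.0],   # mostly "cats"
    [0.1, 0.2, 0.95],  # mostly "negation"
    [0.7, 0.6, 0.0],   # a mix of "cats" and "numbers"
]

for i, act in enumerate(token_activations):
    text, score = round_trip(act, toy_verbalize, toy_reconstruct)
    print(i, text, round(score, 3))
```

The third token mixes two directions, but the toy verbalizer names only one of them, so its reconstruction score drops well below the others. That gap between "what the text says" and "what was in the activation" is the kind of signal a reconstructor is meant to expose.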

Who it's for: Researchers and engineers doing mechanistic interpretability work, or developers curious about why their agent’s model picks specific tokens.

📖 Read the full source: r/LocalLLaMA
