Steerling-8B: An Interpretable Language Model with Token-Level Attribution

Model Architecture and Capabilities
Steerling-8B is built on a causal discrete diffusion backbone, which allows generation to be steered across multi-token sequences rather than only at the next-token level. Its key design choice decomposes the model's embeddings into three explicit pathways: approximately 33,000 supervised "known" concepts, approximately 100,000 "discovered" concepts that the model learns on its own, and a residual component that captures the remaining information.
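To make the three-pathway split concrete, here is a minimal Python sketch. It assumes that each concept has a fixed direction vector in embedding space and that concept activations are already available. The helpers `combine` and `decompose` and the toy sizes are hypothetical, not Steerling's actual API. The residual is defined as whatever the known and discovered concepts leave unexplained, so the three pathways always sum back to the original embedding.

```python
from typing import List, Tuple

Vector = List[float]

def combine(activations: List[float], directions: List[Vector], dim: int) -> Vector:
    """Activation-weighted sum of concept direction vectors (hypothetical helper)."""
    out = [0.0] * dim
    for act, direction in zip(activations, directions):
        for i in range(dim):
            out[i] += act * direction[i]
    return out

def decompose(
    embedding: Vector,
    known_acts: List[float],
    known_dirs: List[Vector],
    disc_acts: List[float],
    disc_dirs: List[Vector],
) -> Tuple[Vector, Vector, Vector]:
    """Split an embedding into known-concept, discovered-concept, and residual parts."""
    dim = len(embedding)
    known = combine(known_acts, known_dirs, dim)
    discovered = combine(disc_acts, disc_dirs, dim)
    # Residual: whatever the two concept pathways do not account for.
    residual = [e - k - d for e, k, d in zip(embedding, known, discovered)]
    return known, discovered, residual

# Hypothetical toy setup: 3-dim embedding, 2 known concepts, 1 discovered concept.
known, discovered, residual = decompose(
    [1.0, 2.0, 3.0],
    known_acts=[0.5, 1.5], known_dirs=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    disc_acts=[2.0], disc_dirs=[[0.0, 0.0, 1.0]],
)
print(known, discovered, residual)  # [0.5, 1.5, 0.0] [0.0, 0.0, 2.0] [0.5, 0.5, 1.0]
```

In the real model the concept counts are in the tens of thousands; the toy sizes only illustrate that known + discovered + residual reconstructs the embedding exactly.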
The model is trained with loss functions designed to ensure that signal is routed through the concepts without a fundamental performance tradeoff. Concepts feed into the logits through a linear path, so every prediction decomposes exactly into per-concept contributions, and these contributions can be edited at inference time without retraining.
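Because the concept-to-logit path is linear, a single token's logit can be written as a sum of per-concept terms plus a residual term. The sketch below assumes a hypothetical readout in which each concept has one weight per vocabulary entry; `readout`, `residual_logits`, and the concept names are illustrative placeholders, not Steerling's real interface.

```python
from typing import Dict, List

def contributions(
    acts: Dict[str, float],
    readout: Dict[str, List[float]],
    residual_logits: List[float],
    token: int,
) -> Dict[str, float]:
    """Per-concept contributions to one token's logit, plus the residual term."""
    contrib = {name: a * readout[name][token] for name, a in acts.items()}
    contrib["<residual>"] = residual_logits[token]
    return contrib

def logit(acts, readout, residual_logits, token: int) -> float:
    # Linear path: the logit is exactly the sum of its contributions.
    return sum(contributions(acts, readout, residual_logits, token).values())

# Hypothetical toy values: 2 concepts, 3-token vocabulary.
acts = {"analytical": 2.0, "genetics": 0.5}
readout = {"analytical": [1.0, -1.0, 0.0], "genetics": [0.0, 4.0, 2.0]}
residual_logits = [0.25, 0.25, 0.25]

print(contributions(acts, readout, residual_logits, token=1))
# {'analytical': -2.0, 'genetics': 2.0, '<residual>': 0.25}
print(logit(acts, readout, residual_logits, token=1))  # 0.25
```

Under this assumed form, editing a contribution at inference time amounts to changing an entry in `acts` before the sum is taken.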
Performance and Interpretability Metrics
Despite being trained with significantly less compute than comparable models, Steerling-8B achieves competitive performance across standard benchmarks. It outperforms both LLaMA2-7B and DeepSeek-7B on the overall benchmark average while using fewer FLOPs, and it remains within range of models trained with 2-10× more compute.
On a held-out validation set, over 84% of the token-level contribution comes from the concept module, indicating that the model is not simply relying on the residual to make predictions. Removing the residual pathway has only a small effect on performance across several LM Harness tasks, suggesting that the model's predictive signal is routed largely through concepts rather than through hidden channels.
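The source does not say exactly how the 84% figure is computed. One plausible reading, assumed here, is to take each token's logit contributions, measure the share of absolute contribution that comes from concepts rather than the residual, and average that share over tokens. The helper names and the `<residual>` key are hypothetical.

```python
from typing import Dict, List

RESIDUAL_KEY = "<residual>"  # hypothetical key for the residual pathway's contribution

def concept_share(contrib: Dict[str, float]) -> float:
    """Fraction of a token's absolute logit contribution that comes from concepts."""
    concept_mass = sum(abs(v) for k, v in contrib.items() if k != RESIDUAL_KEY)
    residual_mass = abs(contrib.get(RESIDUAL_KEY, 0.0))
    total = concept_mass + residual_mass
    return concept_mass / total if total > 0 else 0.0

def mean_concept_share(per_token: List[Dict[str, float]]) -> float:
    """Average concept share across a set of tokens (e.g. a validation set)."""
    return sum(concept_share(c) for c in per_token) / len(per_token)

# Hypothetical per-token contributions.
tokens = [
    {"a": 2.0, "b": -1.0, RESIDUAL_KEY: 1.0},  # concepts 3.0 of 4.0 -> 0.75
    {"a": 1.0, RESIDUAL_KEY: 0.0},             # concepts 1.0 of 1.0 -> 1.0
]
print(mean_concept_share(tokens))  # 0.875
```

Absolute values are used so that a concept pushing a logit down still counts toward the concept pathway's share; a different weighting would give a different number.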
Steerling can detect known concepts in text with an AUC (area under the curve) of 96.2%.
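Assuming the reported figure is the usual ROC AUC, it can be read as the probability that a randomly chosen text containing a concept receives a higher detection score than a randomly chosen text without it. The sketch below computes that quantity directly using the pairwise (Mann-Whitney) formulation; the scores are made up for illustration, and the quadratic loop is only suitable for small examples.

```python
from typing import Sequence

def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical detector scores for texts with (pos) and without (neg) a concept.
print(auc([0.9, 0.8, 0.4], [0.5, 0.1]))  # 0.8333333333333334 (5 of 6 pairs ranked correctly)
```

An AUC of 1.0 means every positive outscores every negative; 0.5 is chance level.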
Practical Features
For any group of output tokens that Steerling generates, users can trace these tokens to:
- Input context: The specific prompt tokens that influenced the output
- Concepts: Human-understandable topics in the model's representations (both tone like "analytical, clinical" and content like "Genetic alteration methodologies")
- Training data: The training data sources that drove the output, showing the distribution across sources such as ArXiv, Wikipedia, and FLAN
The model enables inference-time alignment via concept control, replacing thousands of safety training examples with explicit concept-level steering. It also allows suppressing or amplifying specific concepts at inference time without retraining.
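Concept-level steering can be sketched in the same way: rescale selected concept activations before the linear readout, then recompute the logits. This is a minimal illustration under assumed names (`steer`, `logits`, the concept labels, and the toy readout are all hypothetical); it does not reflect how Steerling's released package exposes this feature.

```python
from typing import Dict, List

def steer(acts: Dict[str, float], scales: Dict[str, float]) -> Dict[str, float]:
    """Copy of the concept activations with selected concepts rescaled.

    A scale of 0.0 suppresses a concept; a scale above 1.0 amplifies it.
    Concepts not listed in `scales` are left unchanged.
    """
    return {name: a * scales.get(name, 1.0) for name, a in acts.items()}

def logits(acts: Dict[str, float], readout: Dict[str, List[float]], residual_logits: List[float]) -> List[float]:
    """Assumed linear readout: residual logits plus activation-weighted concept rows."""
    return [
        residual_logits[t] + sum(a * readout[name][t] for name, a in acts.items())
        for t in range(len(residual_logits))
    ]

# Hypothetical toy values: 2 concepts, 2-token vocabulary.
acts = {"hostile": 3.0, "clinical": 1.0}
readout = {"hostile": [2.0, 0.0], "clinical": [0.0, 1.0]}
residual_logits = [0.0, 0.0]

print(logits(acts, readout, residual_logits))                           # [6.0, 1.0]
print(logits(steer(acts, {"hostile": 0.0}), readout, residual_logits))  # [0.0, 1.0]
print(logits(steer(acts, {"clinical": 2.0}), readout, residual_logits)) # [6.0, 2.0]
```

Here, suppressing the hypothetical `hostile` concept flips the top-scoring token from 0 to 1 without touching any weights, which is the sense in which this kind of steering needs no retraining.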
Available Artifacts
- Model weights available on Hugging Face
- Companion code on GitHub
- Package on PyPI
📖 Read the full source: HN AI Agents