UI and Server for Anthropic's Natural Language Autoencoders on llama.cpp
Anthropic's first open-weight models, Natural Language Autoencoders (NLAs), are finetunes of popular open-weight architectures. Because they do not modify the underlying model architecture or modeling code, inference with llama.cpp is straightforward. A developer has packaged all four NLA features (activation extraction, activation explanation, activation reconstruction, and explanation-edit steering) into a custom llama.cpp server, paired with a Mikupad UI for token-level activation explanation and steering.
Key Features
- Activation extraction: Extract internal activations from any layer of the base model.
- Activation explanation: Get human-readable explanations for extracted activations.
- Activation reconstruction: Reconstruct activations from their explanations.
- Explanation-edit steering: Modify explanations and steer the model's output accordingly.
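The extraction, explanation, and reconstruction features form a round trip: an activation goes out as text and comes back as a vector. One way to judge such a round trip is to compare the original activation with its reconstruction. The sketch below is not taken from the project; the two metrics (cosine similarity and normalized squared error) and the sample vectors are assumptions chosen for illustration, and the server may report fidelity differently or not at all.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two activation vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def normalized_mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Squared reconstruction error divided by the squared norm of the original.

    0.0 means a perfect reconstruction; 1.0 is what an all-zero guess would score.
    """
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    scale = sum(x * x for x in original)
    if scale == 0.0:
        raise ValueError("original vector must be non-zero")
    return error / scale


# Hypothetical 4-dimensional vectors standing in for real activations,
# which would have the model's hidden size.
original = [1.0, 2.0, 0.0, -1.0]
reconstructed = [0.5, 2.0, 0.0, -1.0]

print("cosine similarity:", cosine_similarity(original, reconstructed))
print("normalized MSE:   ", normalized_mse(original, reconstructed))
```

In practice the two vectors would come from the server's extraction and reconstruction features; how those are exposed is documented in the project's own setup instructions, not here.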
Technical Details
The server is built on top of llama.cpp and requires three models to be loaded simultaneously: the base model, the actor model, and the critic model. Holding three sets of weights in memory at once makes this a memory-intensive setup. The developer is working on a LoRA-based version that would allow loading a single model into memory, reducing the footprint significantly.
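A rough back-of-the-envelope calculation shows why a single-model setup would help. The sketch below rests on several assumptions that the source does not state: that the base, actor, and critic models have the same parameter count and quantization, that the LoRA version would keep one shared set of base weights plus two small adapters, and that KV cache and runtime overhead can be ignored. The parameter counts and bit widths are hypothetical placeholders, not measurements.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Size of a set of weights in GiB, ignoring KV cache and runtime overhead."""
    return n_params * bits_per_weight / 8 / 2**30


def three_model_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Base, actor, and critic loaded as three full models (assumed equal size)."""
    return 3 * weights_gib(n_params, bits_per_weight)


def lora_footprint_gib(
    n_params: float,
    bits_per_weight: float,
    adapter_params: float,
    adapter_bits: float = 16.0,
    n_adapters: int = 2,
) -> float:
    """One shared base model plus small adapters (assumed layout, not confirmed)."""
    base = weights_gib(n_params, bits_per_weight)
    adapters = n_adapters * weights_gib(adapter_params, adapter_bits)
    return base + adapters


# Hypothetical numbers: an 8B-parameter model at about 4.5 bits per weight,
# with adapters of 100M parameters each stored at 16 bits.
full = three_model_footprint_gib(8e9, 4.5)
lora = lora_footprint_gib(8e9, 4.5, adapter_params=1e8)
print(f"three full models: {full:.1f} GiB")
print(f"base + 2 adapters: {lora:.1f} GiB")
```

Under these assumptions the weight footprint falls from roughly three times the base model to a little more than one times it. The actual saving depends on adapter rank and on how the LoRA version is implemented.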
The Mikupad UI provides a token-level interface for activation explanation and steering. You can inspect the explanation of the activation at an individual token and adjust the model's behavior by editing that explanation in real time.
Getting Started
Source code and setup instructions are available through the Reddit post. For now, you need the three NLA model checkpoints (base, actor, critic) and must compile the custom llama.cpp server yourself. The LoRA version is forthcoming.
📖 Read the full source: r/LocalLLaMA