DeepSeek-V4-Flash Makes LLM Steering Practical for Local Models

Sean Goedecke's latest post argues that DeepSeek-V4-Flash changes the calculus for LLM steering, the technique of manipulating model activations mid-inference to guide outputs. The key driver is DwarfStar, a stripped-down llama.cpp fork by antirez that runs only DeepSeek-V4-Flash and bakes steering in as a first-class feature.
What's steering?
Steering extracts a concept (like "respond tersely") from the model's internal activations. One method: feed a hundred prompts twice, once normal and once with "respond tersely" appended, then subtract the normal activations from the "terse" ones, averaged over the prompts, to get a steering vector. Add that vector to any prompt's activations and the model becomes terse. A more advanced approach uses sparse autoencoders (like Anthropic's) to learn feature patterns, at greater cost.
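The difference-of-activations method described above can be sketched in a few lines of plain Python. This is a toy illustration, not DwarfStar's implementation: the activations are random numbers standing in for real hidden states, the function names (`mean_vector`, `steering_vector`, `apply_steering`) and the `strength` parameter are hypothetical, and the "terse" concept is assumed to shift a single dimension.

```python
import random

def mean_vector(rows):
    """Average a list of equal-length activation vectors, element by element."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(plain_acts, concept_acts):
    """Difference of means: mean(concept activations) - mean(plain activations)."""
    plain_mean = mean_vector(plain_acts)
    concept_mean = mean_vector(concept_acts)
    return [c - p for c, p in zip(concept_mean, plain_mean)]

def apply_steering(activation, vector, strength=1.0):
    """Add the scaled steering vector to one activation vector."""
    return [a + strength * v for a, v in zip(activation, vector)]

# Toy data standing in for real hidden states: 100 prompts, 8 dimensions.
# Assumption for illustration: appending "respond tersely" shifts dimension 3 by +2.0.
random.seed(0)
DIM = 8
plain = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(100)]
concept = [[x + (2.0 if i == 3 else 0.0) for i, x in enumerate(row)] for row in plain]

vec = steering_vector(plain, concept)                   # recovers the shift on dimension 3
steered = apply_steering(plain[0], vec, strength=0.5)   # half-strength steering
```

In a real model the vectors would come from a chosen layer's hidden states, and `strength` would act as the kind of slider the post imagines: larger values push harder toward the concept.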
Why it matters
Steering promises direct control over model behavior without prompt engineering. Instead of writing "you MUST" qualifiers, you'd have a slider for succinctness or conscientiousness. It's also fascinating from an interpretability perspective — think Golden Gate Claude's fixation, but yours to tweak.
Why not before?
Steering has been a middle-class idea: too crude for big labs (they just retrain the model) and inaccessible to API users (no access to weights or activations). Open-weights models were too weak to bother with — until DeepSeek-V4-Flash, which is strong enough for agentic coding. Even then, prompting often trumps steering for simple traits like verbosity; the real win is steering an unpromptable concept like intelligence.
Goedecke plans to follow DwarfStar closely. At the time of writing, its steering support is rudimentary (just a verbosity toggle akin to prompting), but the release was only eight days ago.
📖 Read the full source: HN LLM Tools