Two new models appear on OpenRouter, possibly DeepSeek V4 variants

Two new models have appeared on OpenRouter that may be test versions of DeepSeek V4. They are named healer-alpha and hunter-alpha, and their descriptions suggest that one is a Lite version and the other a full-featured model.
Model Specifications
The full version reportedly has 1T (one trillion) parameters and a 1M-token context window, which matches leaked information about DeepSeek V4. The Lite version is described as a lighter variant of the same model family.
Initial Testing Results
A user conducted roleplay tests to evaluate filtering levels and performance:
- Both models performed impressively in roleplay scenarios
- Neither model refused any message during testing
- The Lite version is noticeably faster than the full version
- The full version is slower but still responsive
- Both models generate the same number of tokens in less than half the time GLM 5.0 takes
- The Lite version is slightly weaker in output quality, but not significantly so
- Both models maintain character consistency and handle "spicy" content well
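The speed comparison above comes down to simple arithmetic: producing the same number of tokens in less than half the time means more than twice the throughput. A minimal Python sketch of how such a comparison could be measured follows. The helper names (tokens_per_second, speedup, time_generation) are hypothetical and not taken from the original post, and the generate callable stands in for whatever client call the tester actually used.

```python
import time
from typing import Callable, Tuple


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Throughput for a single generation run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is for the SAME token count.

    A value above 2.0 corresponds to "less than half the time".
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("durations must be positive")
    return baseline_seconds / candidate_seconds


def time_generation(generate: Callable[[], int]) -> Tuple[int, float]:
    """Run a generation callable and time it.

    `generate` is a hypothetical stand-in: it should perform one model
    request and return the number of tokens that were produced.
    """
    start = time.perf_counter()
    token_count = generate()
    elapsed = time.perf_counter() - start
    return token_count, elapsed
```

To use it, wrap one request per model in a function that returns the completion token count, pass each to time_generation, and compare the elapsed times with speedup. Single runs are noisy, so averaging several requests per model would give a fairer comparison.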
The models are currently in an alpha phase, which may explain the lack of message filtering observed during testing. The community is discussing whether these are in fact DeepSeek V4 variants and is sharing additional test results.
📖 Read the full source: r/LocalLLaMA