Mercury 2: Diffusion-Based Model for Real-Time AI Coding

✍️ OpenClawRadar📅 Published: February 25, 2026🔗 Source

What Mercury 2 Is

Mercury 2 is a diffusion-based AI model that generates tokens in parallel rather than sequentially, using a process that refines output over multiple steps. This approach differs from traditional autoregressive models that decode tokens one by one.

Technical Specifications

Generation method: Diffusion-based generation instead of sequential token-by-token decoding
Processing approach: Generates tokens in parallel and refines them over a few steps
Performance: Claims 1,009 tokens/sec on NVIDIA Blackwell GPUs
Pricing: $0.25 per 1 million input tokens, $0.75 per 1 million output tokens
Context window: 128K tokens
Reasoning capability: Tunable reasoning
Tool integration: Native tool use with schema-aligned JSON output
API compatibility: OpenAI API compatible

Target Use Cases

The developers are positioning Mercury 2 for:

Coding assistants
Agentic loops (multi-step inference chains)
Real-time voice systems
RAG/search pipelines with multi-hop retrieval

📖 Read the full source: r/LocalLLaMA

👀 See Also

News

Claude MAX Plan Now Includes 1M Token Context Window at No Extra Cost

The Claude MAX plan has been automatically upgraded to include a 1 million token context window without additional API-based usage charges, with users reporting significantly reduced token usage and elimination of context window management overhead.

Mar 15, 2026, 08:45 PM UTC

OpenClawRadar

News

Claude Code v2.1.129: Autonomous Loop Persistence Guidance and Background Agent State Classifier

Claude Code v2.1.129 adds CLAUDE_CODE_LOOP_PERSISTENT system prompt for autonomous work loops, removes verification specialist subagent, and expands background agent state classifier with detailed boundaries.

May 6, 2026, 04:22 PM UTC

OpenClawRadar

News

Setting Up Subagents in OpenClaw: Key Considerations

Users experimenting with OpenClaw are facing issues with setting up subagents, particularly when modifying JSON files.

Feb 12, 2026, 11:45 PM UTC

OpenClawRadar

News

Qwen KV Cache Quantization Deep Dive: PPL, KL Divergence, and Asymmetric K/V Results

Second round of benchmarks on Qwen 3.6-35B-A3B with KV cache quantization: perplexity, KL divergence, asymmetric K/V combos, and 64K context depth on Apple M5 Max.

Apr 29, 2026, 10:18 PM UTC

OpenClawRadar