Granite 4.1: IBM's 8B Dense Model Matches 32B MoE in Benchmarks

IBM released Granite 4.1, an open-source language model family licensed under Apache 2.0, in 3B, 8B, and 30B sizes. All three use a dense, decoder-only transformer: no mixture-of-experts (MoE) routing and no long reasoning chains. The standout is the 8B model, which matches or beats the previous Granite 4.0-H-Small (an MoE with 32B total and 9B active parameters) across several benchmarks.
Key benchmark results
- ArenaHard (real-world prompt quality): 8B scores 69.0; the 32B MoE scores lower.
- BFCL V3 (tool calling): 8B scores 68.3 vs. 64.7 for the 32B MoE.
- GSM8K (math reasoning): 8B hits 92.5.
- AlpacaEval, MMLU-Pro, BBH, EvalPlus, MBPP: 8B consistently outperforms the larger model.
Training pipeline
Granite 4.1 was trained on 15 trillion tokens across five phases with changing data mixtures:
- Phase 1: 59% CommonCrawl, 20% code, 7% math.
- Phase 2: math jumps to 35%, code to 30%.
- Phases 3-4: blend chain-of-thought reasoning, instruction data, and high-quality web content.
- Phase 5: extend context window to 512K tokens (8B and 30B).
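To make the phase schedule concrete, here is a minimal Python sketch of per-phase mixture sampling. It is not IBM's training code: the `PHASE_MIXTURES` table and the `sample_sources` helper are hypothetical, only the percentages listed above come from the article, and the `other` bucket is an assumption that absorbs whatever share the article leaves unstated. Phases 3 to 5 are omitted because no percentages are given for them.

```python
import random
from collections import Counter

# Hypothetical phase schedule. Only the quoted percentages are from the article;
# "other" is an ASSUMED catch-all for the unstated remainder of each phase.
PHASE_MIXTURES = {
    1: {"commoncrawl": 0.59, "code": 0.20, "math": 0.07, "other": 0.14},
    2: {"code": 0.30, "math": 0.35, "other": 0.35},
}


def sample_sources(phase: int, n: int, seed: int = 0) -> Counter:
    """Draw n data-source labels for one phase according to its mixture weights."""
    mixture = PHASE_MIXTURES[phase]
    total = sum(mixture.values())
    # Guard against a mixture table whose shares do not add up to 100%.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"phase {phase} weights sum to {total}, not 1.0")
    rng = random.Random(seed)  # seeded so runs are reproducible
    names = list(mixture)
    weights = [mixture[name] for name in names]
    # Weighted sampling with replacement: each draw picks the source of one batch/document.
    return Counter(rng.choices(names, weights=weights, k=n))


if __name__ == "__main__":
    for phase in PHASE_MIXTURES:
        print(phase, sample_sources(phase, 10_000))
```

Switching phases in this sketch only swaps the weight table; the sampler itself stays the same, which is one simple way to express "changing data mixtures" over a single training run.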
The key insight: data quality over parameter scaling. IBM's data filtering pipeline rejects hallucinated or instruction-ignoring examples during fine-tuning to avoid training on bad signals.
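The filtering step can be pictured as a simple predicate over labeled fine-tuning examples. This is a minimal sketch, not IBM's pipeline: the `SFTExample` fields `hallucinated` and `follows_instructions` are hypothetical labels assumed to come from some upstream checker (for instance a judge model), and the source does not describe how IBM actually detects these cases.

```python
from dataclasses import dataclass


@dataclass
class SFTExample:
    prompt: str
    response: str
    hallucinated: bool          # HYPOTHETICAL label, e.g. from a fact-checking pass
    follows_instructions: bool  # HYPOTHETICAL label, e.g. from a judge model


def keep(example: SFTExample) -> bool:
    """Reject examples flagged as hallucinated or as ignoring the instruction."""
    return not example.hallucinated and example.follows_instructions


def filter_examples(examples: list[SFTExample]) -> list[SFTExample]:
    """Return only the examples that pass the quality predicate."""
    return [ex for ex in examples if keep(ex)]


data = [
    SFTExample("What is 2 + 2?", "4", hallucinated=False, follows_instructions=True),
    SFTExample("Cite the 2019 paper.", "Smith et al. (made up)", hallucinated=True, follows_instructions=True),
    SFTExample("Answer in one word.", "Here is a long essay...", hallucinated=False, follows_instructions=False),
]
clean = filter_examples(data)
print(len(clean))  # 1
```

Only the first example survives: the second is flagged as hallucinated and the third ignores its "one word" instruction.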
Why this matters for AI agents
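Dense models offer predictable latency and cost, with no expert-routing overhead. For developers using AI coding agents, Granite 4.1's 8B model provides strong tool use and math reasoning with a quarter of the total parameters of the 32B MoE, so it needs far less memory to serve; per-token compute is similar, because the MoE activates about 9B parameters per token.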
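A rough back-of-envelope calculation shows where a dense 8B model saves resources relative to an MoE with 32B total and 9B active parameters. The assumptions are ours, not IBM's: bf16 weights at 2 bytes per parameter, and the common rule of thumb of about 2 FLOPs per active parameter per generated token. KV cache, activations, and architecture-specific details are ignored.

```python
# Back-of-envelope serving cost: dense 8B vs. MoE with 32B total / 9B active.
# ASSUMPTIONS: bf16 weights (2 bytes per parameter) and ~2 FLOPs per active
# parameter per token for a forward pass. Ignores KV cache and activations.
BYTES_PER_PARAM = 2


def weight_memory_gb(total_params: float) -> float:
    """GB needed just to hold the weights; every expert must be resident."""
    return total_params * BYTES_PER_PARAM / 1e9


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (active params only)."""
    return 2 * active_params


models = {
    "Granite 4.1 8B (dense)": {"total": 8e9, "active": 8e9},
    "Granite 4.0-H-Small (MoE)": {"total": 32e9, "active": 9e9},
}

for name, p in models.items():
    print(f"{name}: {weight_memory_gb(p['total']):.0f} GB weights, "
          f"{flops_per_token(p['active']) / 1e9:.0f} GFLOPs/token")
```

Running it prints 16 GB of weights and 16 GFLOPs/token for the dense model versus 64 GB and 18 GFLOPs/token for the MoE. Under these assumptions the main saving is memory (about a quarter of the weights), while per-token compute is roughly comparable.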
📖 Read the full source: HN AI Agents