Granite 4.1: IBM's 8B Dense Model Matches 32B MoE in Benchmarks

✍️ OpenClawRadar | 📅 Published: April 30, 2026

IBM released Granite 4.1, an open-source language model family (Apache 2.0) in 3B, 8B, and 30B sizes. All three use a dense decoder-only transformer, with no MoE routing and no long reasoning chains. The 8B model stands out: it matches or beats the previous-generation Granite 4.0-H-Small (32B MoE, 9B active) across several benchmarks.

Key benchmark results

  • ArenaHard (real-world prompt quality): 8B scores 69.0, 32B MoE scores lower.
  • BFCL V3 (tool calling): 8B scores 68.3, 32B MoE scores 64.7.
  • GSM8K (math reasoning): 8B hits 92.5.
  • AlpacaEval, MMLU-Pro, BBH, EvalPlus, MBPP: 8B outperforms the larger model consistently.

Training pipeline

Granite 4.1 was trained on 15 trillion tokens across five phases with changing data mixtures:

  • Phase 1: 59% CommonCrawl, 20% code, 7% math.
  • Phase 2: math jumps to 35%, code to 30%.
  • Phases 3-4: blend chain-of-thought reasoning, instruction data, and high-quality web content.
  • Phase 5: extend context window to 512K tokens (8B and 30B).
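
One way to picture the phase-wise mixtures above is as sampling weights over data sources. The sketch below is illustrative only and is not IBM's pipeline: the `other` bucket is an assumption that absorbs whatever share the article does not itemize, and the names `PHASE_MIXTURES`, `fill_remainder`, and `sample_source` are hypothetical.

```python
import random

# Percentages quoted in the article; shares it does not itemize are unknown.
PHASE_MIXTURES = {
    "phase_1": {"commoncrawl": 59, "code": 20, "math": 7},
    "phase_2": {"math": 35, "code": 30},
}

def fill_remainder(mixture):
    """Return a copy with an 'other' bucket for the unlisted share (assumption)."""
    filled = dict(mixture)
    filled["other"] = 100 - sum(mixture.values())
    return filled

def sample_source(mixture, rng):
    """Pick a data source with probability proportional to its share."""
    sources = list(mixture)
    weights = [mixture[s] for s in sources]
    return rng.choices(sources, weights=weights, k=1)[0]

# Usage: draw a few sources for a hypothetical phase 1 batch.
rng = random.Random(0)
phase_1 = fill_remainder(PHASE_MIXTURES["phase_1"])
batch_sources = [sample_source(phase_1, rng) for _ in range(8)]
print(phase_1)
print(batch_sources)
```

Read this way, moving from phase 1 to phase 2 amounts to swapping the weight table: math goes from 7% to 35% of draws and code from 20% to 30%.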

The key insight: data quality over parameter scaling. IBM's data filtering pipeline rejects hallucinated or instruction-ignoring examples during fine-tuning to avoid training on bad signals.
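
A minimal sketch of what such a rejection filter could look like is shown below. The article does not describe how IBM detects bad examples, so the `hallucinated` and `follows_instruction` flags are hypothetical labels assumed to come from some upstream checker, and `Example`, `keep`, and `filter_examples` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    response: str
    hallucinated: bool         # hypothetical label from an upstream checker
    follows_instruction: bool  # hypothetical label from an upstream checker

def keep(example):
    """Keep an example only if it is grounded and follows its instruction."""
    return (not example.hallucinated) and example.follows_instruction

def filter_examples(examples):
    """Drop examples flagged as hallucinated or instruction-ignoring."""
    return [ex for ex in examples if keep(ex)]

# Usage with three made-up examples; only the first passes both checks.
raw = [
    Example("Summarize X", "A faithful summary", False, True),
    Example("Summarize X", "Invented facts", True, True),
    Example("Answer in JSON", "Plain prose answer", False, False),
]
clean = filter_examples(raw)
print(len(raw), "->", len(clean))
```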

Why this matters for AI agents

Dense models offer predictable latency and cost because there is no expert-routing overhead. For developers using AI coding agents, Granite 4.1's 8B model provides strong tool-use and math reasoning with a quarter of the total parameters of the 32B MoE it matches, and therefore a much smaller memory footprint.
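
A rough back-of-the-envelope comparison shows where the savings come from. The sketch below assumes 16-bit weights and counts weight memory only (no KV cache or activations); `weight_memory_gb` is a hypothetical helper, and the parameter counts are the ones quoted above.

```python
BYTES_PER_PARAM_16BIT = 2  # assumption: 16-bit weights, no quantization

def weight_memory_gb(total_params_billions, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * bytes_per_param

dense_8b_gb = weight_memory_gb(8)   # Granite 4.1 8B, dense
moe_32b_gb = weight_memory_gb(32)   # Granite 4.0-H-Small, 32B total parameters

print("8B dense weights:", dense_8b_gb, "GB")
print("32B MoE weights:", moe_32b_gb, "GB")
print("ratio:", moe_32b_gb / dense_8b_gb)
```

Under these assumptions the MoE has to keep about four times as many weights resident, even though only 9B parameters are active per token, which is close to the dense model's 8B.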

📖 Read the full source: HN AI Agents
