MiMo-V2.5-Pro Benchmarked: Strong Social Deduction Reasoning, Good Value vs K2.6

By OpenClawRadar | Published: May 1, 2026

MiMo-V2.5-Pro, Xiaomi's latest open-weights model, has been benchmarked in autonomous games of Blood on the Clocktower — a complex social deduction game similar to Mafia/Werewolf. The benchmark, created by Reddit user cjami, pits models against each other in full games, measuring reasoning, deception, and tool use.

Key Results

  • Win rate: 88% as Good team, 48% as Evil team. Strong overall but lopsided; Evil-team play is the main weakness relative to Kimi K2.6.
  • Token efficiency: 183,639 output tokens per game, similar to Gemini 3.1 Pro. Kimi K2.6 uses about 580k tokens per game, roughly 3x as many.
  • Cost per game: $0.99, less than half of Kimi K2.6 ($2.65) and roughly a quarter of Claude Opus 4.6 ($3.76).
  • Match duration: 2-3 hours, versus 10-15 hours for Kimi K2.6, whose reasoning is far more verbose.
  • Tool call error rate: 0.4% — reliable for autonomous agent workflows.
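
The ratios behind the list above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this article: the 580k token count for Kimi K2.6 is the article's rounded value, no token count is given for Claude Opus 4.6, and the `results` table and `ratio_vs_baseline` helper are illustrative names, not part of the benchmark's tooling.

```python
# Per-game figures as quoted in the article (580k is a rounded value).
# None marks a figure the article does not report.
results = {
    "MiMo-V2.5-Pro": {"output_tokens": 183_639, "cost_usd": 0.99},
    "Kimi K2.6": {"output_tokens": 580_000, "cost_usd": 2.65},
    "Claude Opus 4.6": {"output_tokens": None, "cost_usd": 3.76},
}


def ratio_vs_baseline(metric, baseline="MiMo-V2.5-Pro"):
    """Return each model's value for `metric` divided by the baseline's value."""
    base = results[baseline][metric]
    return {
        name: round(row[metric] / base, 2)
        for name, row in results.items()
        if row[metric] is not None  # skip models with no reported figure
    }


token_ratios = ratio_vs_baseline("output_tokens")
cost_ratios = ratio_vs_baseline("cost_usd")

print(token_ratios)  # Kimi K2.6 -> 3.16
print(cost_ratios)   # Kimi K2.6 -> 2.68, Claude Opus 4.6 -> 3.8
```

On these numbers Kimi K2.6 uses about 3.2x the tokens and costs about 2.7x as much per game, while Claude Opus 4.6 costs about 3.8x as much, which matches the "roughly 3x" and "less than half" claims above.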

Notable Performance

Strong reasoning under uncertainty: the highlighted examples show the model reasoning from other players' perspectives in a game against GPT 5.5, and winning a game through clean deductions.

Notable Mistakes

Practical Takeaway

For developers who need an open-weights model with strong reasoning in multi-agent or game-theoretic settings, MiMo-V2.5-Pro offers strong value among top-tier models: lower cost per game, much shorter matches, and a low tool-call error rate, though with room for improvement in adversarial (Evil-team) roles.

Full model transcripts and game logs: MiMo-V2.5-Pro on Clocktower Radio. Methodology: How-it-works.

📖 Read the full source: r/LocalLLaMA
