Kimi K2.6 beats Claude, GPT-5.5 and Gemini in coding challenge with aggressive sliding strategy

✍️ OpenClawRadar | 📅 Published: May 3, 2026

Kimi K2.6 wins Word Gem Puzzle benchmark

Moonshot AI's open-weights Kimi K2.6 beat every Western frontier model in the Day 12 Word Gem Puzzle, a real-time sliding-tile letter puzzle. Nine models competed after Nvidia's Nemotron Super 3 failed to connect due to a syntax error.

Final Standings

  • 1st: Kimi K2.6 — 22 match points (7-1-0)
  • 2nd: MiMo V2-Pro — 20 points (6-2-0)
  • 3rd: ChatGPT GPT-5.5 — 16 points (5-1-2)
  • 4th: GLM 5.1 (Zhipu AI) — 15 points
  • 5th: Claude Opus 4.7 — 12 points
  • 6th: Gemini Pro 3.1 — 9 points
  • 7th: Grok Expert 4.2 — 9 points
  • 8th: DeepSeek V4 — 3 points
  • 9th: Muse Spark — 0 points
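The source does not explain the records in parentheses. They are consistent with a win-draw-loss format worth 3 points per win and 1 per draw, which is an assumption here, not something the article states. A minimal sketch to check that arithmetic (`match_points` is a hypothetical helper):

```python
def match_points(wins: int, draws: int, losses: int,
                 win_pts: int = 3, draw_pts: int = 1) -> int:
    """Match points under an ASSUMED 3-1-0 scheme (not confirmed by the source).

    Losses are accepted only so the call mirrors the W-D-L record; they add nothing.
    """
    return wins * win_pts + draws * draw_pts


# Records quoted in the standings, paired with the points listed next to them.
listed = {
    "Kimi K2.6": ((7, 1, 0), 22),
    "MiMo V2-Pro": ((6, 2, 0), 20),
    "ChatGPT GPT-5.5": ((5, 1, 2), 16),
}

for model, (record, points) in listed.items():
    print(model, match_points(*record) == points)
```

All three listed records reproduce the published point totals under this assumed scheme.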

How the puzzle works

The board is a rectangular grid (10×10 to 30×30) filled with letter tiles and one blank space. Bots slide adjacent tiles into the blank and claim valid English words that read in a straight horizontal or vertical line. Diagonal and backwards words don't count. Scoring penalizes short words: anything under 7 letters costs points (5-letter: -1, 3-letter: -3), while words of 7 or more letters score length - 6 (8-letter: +2). Each word can be claimed only once. Grids are seeded with dictionary words in a crossword layout, the remaining cells are filled with Scrabble-weighted letters, and the board is then scrambled (more aggressively on larger boards). On 30×30, nearly all seed words are broken.
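The scoring examples above all fit a single formula, length - 6. The sketch below assumes that formula holds for every word length, which would make a 6-letter word worth 0; the source does not confirm that case. The function names are hypothetical, and ignoring a repeated claim (rather than rejecting it) is also an assumption.

```python
def word_score(word: str) -> int:
    """Score one claimed word as length - 6 (ASSUMED to apply to all lengths)."""
    return len(word) - 6


def total_score(claims: list[str]) -> int:
    """Sum scores over distinct words, since each word can be claimed only once.

    Repeats are silently ignored here; how the real server treats them is unknown.
    """
    return sum(word_score(w) for w in set(claims))


print(word_score("cat"))       # 3-letter word
print(word_score("words"))     # 5-letter word
print(word_score("sleeping"))  # 8-letter word
```

Under this reading a bot only gains points from words of 7 letters or more, so claiming every short word it sees would drag its total down.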


Kimi's winning strategy

Kimi used a greedy approach: score each possible move by the new positive-value words it unlocks, execute the best one, and repeat. When no move unlocked a positive word, it fell back to the first legal direction in alphabetical order. This caused inefficient oscillation along the edges of small grids but paid off on 30×30, where words had to be reconstructed. Kimi's cumulative score of 77 was the tournament's highest.
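The greedy loop described above can be sketched roughly as follows. This is not Kimi's actual code: the grid representation (lists of letters with `None` for the blank), the direction names (taken to describe where the blank moves), and every function name are assumptions made for illustration.

```python
# Direction names describe how the blank moves; this convention is an assumption.
DIRECTIONS = {"down": (1, 0), "left": (0, -1), "right": (0, 1), "up": (-1, 0)}


def find_blank(grid):
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell is None:
                return r, c
    raise ValueError("grid has no blank")


def legal_moves(grid):
    """Legal directions in alphabetical order."""
    r, c = find_blank(grid)
    return [name for name in sorted(DIRECTIONS)
            if 0 <= r + DIRECTIONS[name][0] < len(grid)
            and 0 <= c + DIRECTIONS[name][1] < len(grid[0])]


def slide(grid, name):
    """Return a new grid with the blank swapped one cell in the given direction."""
    r, c = find_blank(grid)
    dr, dc = DIRECTIONS[name]
    new = [row[:] for row in grid]
    new[r][c], new[r + dr][c + dc] = new[r + dr][c + dc], None
    return new


def positive_words(grid, dictionary):
    """Dictionary words of 7+ letters readable left-to-right or top-to-bottom."""
    lines = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    found = set()
    for line in lines:
        for start in range(len(line)):
            for end in range(start + 7, len(line) + 1):
                cells = line[start:end]
                if None in cells:  # longer windows would also contain the blank
                    break
                word = "".join(cells)
                if word in dictionary:
                    found.add(word)
    return found


def choose_move(grid, dictionary, claimed):
    """Pick the move unlocking the most new positive value, else the first legal one."""
    moves = legal_moves(grid)
    before = positive_words(grid, dictionary)
    best_move, best_gain = None, 0
    for move in moves:
        new_words = positive_words(slide(grid, move), dictionary) - before - claimed
        gain = sum(len(w) - 6 for w in new_words)
        if gain > best_gain:
            best_move, best_gain = move, gain
    return best_move if best_move is not None else moves[0]


row = [list("readin") + [None, "g"]]
print(choose_move(row, {"reading"}, set()))
```

Because the fallback always takes the alphabetically first legal direction, a bot like this can bounce back and forth near an edge whenever nothing scores, which matches the oscillation the article describes.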

Why other models struggled

MiMo V2-Pro never actually slid: its "best value > 0" threshold never triggered, so it scanned the initial grid for 7+ letter words and claimed them all in one TCP packet. It scored well on boards with intact seed words but zero on scrambled ones (final: 43 cumulative points). Claude also didn't slide, holding up on 25×25 but failing on 30×30. GPT-5.5 slid conservatively (~120 slides/round) and posted its best numbers on 15×15 and 30×30. GLM was the most aggressive slider overall (>800,000 total slides). Grok never slid but scored decently on larger boards.
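To see why a scan-only strategy depends on intact seed words, here is a minimal sketch of a static scan. It is an illustration under assumed conventions (grid as lists of letters, `None` for the blank, a hypothetical `scan_static` helper), not MiMo's implementation.

```python
def scan_static(grid, dictionary, min_len=7):
    """Find dictionary words of min_len+ letters in rows and columns, without sliding."""
    lines = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    found = set()
    for line in lines:
        for start in range(len(line)):
            for end in range(start + min_len, len(line) + 1):
                segment = line[start:end]
                if None in segment:  # a word cannot span the blank
                    break
                word = "".join(segment)
                if word in dictionary:
                    found.add(word)
    return sorted(found)


dictionary = {"reading"}

# A toy board where the seed word survived scrambling...
intact = [list("xreading"), list("qwzjkvb") + [None]]
# ...and the same board with that word's letters shuffled out of order.
scrambled = [list("xraedign"), list("qwzjkvb") + [None]]

print(scan_static(intact, dictionary))
print(scan_static(scrambled, dictionary))
```

The first board yields the seed word and the second yields nothing, which mirrors the pattern reported for the non-sliding bots on heavily scrambled grids.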

Key takeaway

This isn't simply East vs. West: two specific Chinese models performed best, and they did so with very different strategies. Kimi is open-weights and publicly available from Moonshot AI (founded 2023). MiMo V2-Pro is API-only; Xiaomi confirmed that V2.5 Pro weights will be released soon.

📖 Read the full source: HN AI Agents
