Kimi K2.6 beats Claude, GPT-5.5 and Gemini in coding challenge with aggressive sliding strategy

Kimi K2.6 wins Word Gem Puzzle benchmark
Moonshot AI's open-weights Kimi K2.6 beat every Western frontier model in the Day 12 Word Gem Puzzle, a real-time sliding-tile letter puzzle. Nine models competed after Nvidia's Nemotron Super 3 failed to connect due to a syntax error.
Final standings
- 1st: Kimi K2.6 — 22 match points (7-1-0)
- 2nd: MiMo V2-Pro — 20 points (6-2-0)
- 3rd: ChatGPT GPT-5.5 — 16 points (5-1-2)
- 4th: GLM 5.1 (Zhipu AI) — 15 points
- 5th: Claude Opus 4.7 — 12 points
- 6th: Gemini Pro 3.1 — 9 points
- 7th: Grok Expert 4.2 — 9 points
- 8th: DeepSeek V4 — 3 points
- 9th: Muse Spark — 0 points
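The records shown for the top three are consistent with reading them as wins-draws-losses, with 3 points per win and 1 per draw. That rule is inferred from the numbers and is not stated in the write-up. A quick check, as a sketch under that assumption:

```python
# Assumption (not stated in the source): records are wins-draws-losses,
# and a win is worth 3 match points, a draw 1, a loss 0.
def match_points(wins: int, draws: int, losses: int) -> int:
    return 3 * wins + draws


records = {
    "Kimi K2.6": (7, 1, 0),
    "MiMo V2-Pro": (6, 2, 0),
    "ChatGPT GPT-5.5": (5, 1, 2),
}

points = {name: match_points(*record) for name, record in records.items()}
print(points)
```

Under this reading the computed totals match the published 22, 20 and 16, and each record sums to eight matches.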
How the puzzle works
The board is a rectangular grid, from 10×10 up to 30×30, filled with letter tiles and one blank space. Bots slide an adjacent tile into the blank and claim valid English words that run in a straight horizontal or vertical line; diagonal and reversed words don't count. Words shorter than 7 letters cost points (a 5-letter word scores -1, a 3-letter word -3), while words of 7 or more letters score their length minus 6 (an 8-letter word scores +2). Each word can be claimed only once.
Grids are seeded with dictionary words in a crossword layout, the remaining cells are filled with Scrabble-weighted letters, and the board is then scrambled, more aggressively on larger boards. On 30×30, nearly all seed words are broken.
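The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual code. The function names, the grid-as-list-of-strings representation, and the use of a space for the blank are all assumptions. The scoring examples in the text all fit `length - 6`, so the sketch applies that formula to every length, which is also an assumption.

```python
def word_score(word: str) -> int:
    """Score a claimed word as length - 6 (fits the -3, -1 and +2 examples)."""
    return len(word) - 6


def words_in_line(line: str, dictionary: set[str], min_len: int = 7) -> set[str]:
    """Find dictionary words of at least min_len letters that appear as a
    contiguous forward run in one row or column string."""
    found = set()
    for start in range(len(line)):
        for end in range(start + min_len, len(line) + 1):
            chunk = line[start:end]
            if " " in chunk:  # assumed blank marker; it breaks any run
                break
            if chunk in dictionary:
                found.add(chunk)
    return found


def scan_grid(grid: list[str], dictionary: set[str]) -> set[str]:
    """Scan rows left to right and columns top to bottom; no diagonals,
    no reversed words."""
    columns = ["".join(col) for col in zip(*grid)]
    found = set()
    for line in list(grid) + columns:
        found |= words_in_line(line, dictionary)
    return found
```

For example, `scan_grid(["PUZZLESX", "AAAAAAAA"], {"PUZZLES"})` finds `PUZZLES` in the first row, and `word_score("PUZZLES")` is 1. Restricting the scan to 7+ letters means a bot never claims a word that would cost it points.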
Kimi's winning strategy
Kimi used a greedy approach: score each possible move by the new positive-value words it unlocks, execute the best one, and repeat. When no move unlocked a positive word, it fell back to the first legal direction in alphabetical order. This caused inefficient oscillation along the board edge on small grids, but it paid off on 30×30, where words had to be reconstructed. Kimi's cumulative score of 77 was the tournament's highest.
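A greedy loop of this kind could look roughly like the following. This is a sketch under stated assumptions, not Kimi's actual code. The direction names, the convention that a direction describes where the blank moves, the space used as the blank marker, and every function name here are hypothetical.

```python
# Assumed direction names; each maps to the (row, col) offset the blank moves by.
DIRECTIONS = {"down": (1, 0), "left": (0, -1), "right": (0, 1), "up": (-1, 0)}
BLANK = " "  # assumed blank marker


def find_blank(grid):
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == BLANK:
                return r, c
    raise ValueError("grid has no blank")


def apply_move(grid, direction):
    """Return a new grid with the blank moved, or None if the move is illegal."""
    r, c = find_blank(grid)
    dr, dc = DIRECTIONS[direction]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
        return None
    new_grid = [list(row) for row in grid]
    new_grid[r][c], new_grid[nr][nc] = new_grid[nr][nc], new_grid[r][c]
    return new_grid


def positive_words(grid, dictionary):
    """Dictionary words of 7+ letters readable forward in any row or column."""
    lines = ["".join(row) for row in grid] + ["".join(col) for col in zip(*grid)]
    found = set()
    for line in lines:
        for start in range(len(line)):
            for end in range(start + 7, len(line) + 1):
                chunk = line[start:end]
                if BLANK in chunk:
                    break
                if chunk in dictionary:
                    found.add(chunk)
    return found


def choose_move(grid, dictionary, claimed):
    """Pick the move unlocking the most unclaimed positive value; otherwise
    fall back to the first legal direction in alphabetical order."""
    best_direction, best_gain = None, 0
    legal = []
    for direction in sorted(DIRECTIONS):
        next_grid = apply_move(grid, direction)
        if next_grid is None:
            continue
        legal.append(direction)
        new_words = positive_words(next_grid, dictionary) - claimed
        gain = sum(len(word) - 6 for word in new_words)
        if gain > best_gain:
            best_direction, best_gain = direction, gain
    if best_direction is not None:
        return best_direction
    return legal[0] if legal else None
```

With `grid = ["PUZZLE S"]` and a dictionary containing `PUZZLES`, `choose_move` returns `"right"` because that move completes the word. Once `PUZZLES` is in `claimed`, no move has positive gain and the fallback returns `"left"`. A fixed alphabetical fallback like this can bounce the blank back and forth near an edge, which matches the oscillation described above.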
Why other models struggled
MiMo V2-Pro never actually slid a tile. Its "best value > 0" threshold never triggered, so it scanned the initial grid for words of 7 or more letters and claimed them all in one TCP packet. It scored well on boards with intact seed words but zero on scrambled ones (final: 43 cumulative points). Claude also didn't slide, holding up on 25×25 but failing on 30×30. GPT-5.5 was conservative (~120 slides/round) and posted its best numbers on 15×15 and 30×30. GLM was the most aggressive slider overall (>800,000 total slides). Grok never slid but scored decently on larger boards.
Key takeaway
This isn't simply East vs. West: two specific Chinese models performed best, and they did so with very different strategies. Kimi is open-weights and publicly available from Moonshot AI (founded 2023). MiMo V2-Pro is API-only; Xiaomi confirmed that V2.5 Pro weights will be released soon.
📖 Read the full source: HN AI Agents