1.2B Local Model Beats 1T Clouds in Poker: Aggression Trumps Knowledge in Shove-or-Fold Format

✍️ OpenClawRadar · 📅 Published: May 19, 2026 · 🔗 Source

A developer ran 6 LLMs through 5 Texas Hold'em tournaments on a 16GB MacBook using a custom framework called Hive. The lineup: Liquid lfm2.5 (1.2B, LM Studio, ~5s/decision), Qwen3 (1.7B, LM Studio, ~2.5 min/decision), Claude Haiku 4.5, GPT-OSS (120B, Fireworks), MiniMax M2 (230B, Fireworks), and Kimi K2 (~1T, Fireworks). Because of RAM limits, the two local models ran sequentially.

Results (tournament winners)

  • Tournament 1: Qwen (1.7B local)
  • Tournament 2: MiniMax (230B cloud)
  • Tournament 3: Liquid (1.2B local)
  • Tournament 4: Kimi (~1T cloud)
  • Tournament 5: Liquid (1.2B local)
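Tallying the five winners makes the headline split explicit: local models took three of the five tournaments, and Liquid is the only repeat winner. This is a minimal sketch that simply re-encodes the list above; the tuple layout is an illustrative choice, not Hive's result format.

```python
from collections import Counter

# Tournament winners as listed above: (model, size label, where it ran)
WINNERS = [
    ("Qwen3", "1.7B", "local"),
    ("MiniMax M2", "230B", "cloud"),
    ("Liquid lfm2.5", "1.2B", "local"),
    ("Kimi K2", "~1T", "cloud"),
    ("Liquid lfm2.5", "1.2B", "local"),
]

# Count wins per model and per deployment type (local vs cloud).
wins_by_model = Counter(model for model, _, _ in WINNERS)
wins_by_deployment = Counter(where for _, _, where in WINNERS)

print(wins_by_model.most_common())
print(dict(wins_by_deployment))
```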

Run 3 highlighted the dynamic: Liquid played 6 hands with 19 raises and 0 folds, turning its $1M starting stack into $5.98M. GPT-OSS (120B), by contrast, made 0 raises and folded 5 of its 6 hands before being blinded out. The format (25 hands, 5K/10K blinds plus a 1K ante) is effectively shove-or-fold, so it rewards aggression over theoretical poker skill.
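The Run 3 figures are easier to compare as per-hand rates. A small sketch, assuming GPT-OSS started with the same $1M stack as Liquid (the text states the starting stack only for Liquid); `RunStats` is a hypothetical helper, not part of Hive's JSON output. Note that 19 raises in 6 hands means more than one raise per hand on average, presumably across several betting rounds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunStats:
    model: str
    hands: int
    raises: int
    folds: int
    start_stack: float
    end_stack: Optional[float] = None  # None when the final stack is not reported

    def raises_per_hand(self) -> float:
        return self.raises / self.hands

    def fold_rate(self) -> float:
        # Share of hands folded (a model folds at most once per hand).
        return self.folds / self.hands

    def stack_multiple(self) -> Optional[float]:
        if self.end_stack is None:
            return None
        return self.end_stack / self.start_stack


# Figures quoted for Run 3; GPT-OSS's $1M start is an assumption.
liquid = RunStats("Liquid lfm2.5", hands=6, raises=19, folds=0,
                  start_stack=1_000_000, end_stack=5_980_000)
gpt_oss = RunStats("GPT-OSS", hands=6, raises=0, folds=5, start_stack=1_000_000)

for s in (liquid, gpt_oss):
    multiple = s.stack_multiple()
    multiple_txt = f"{multiple:.2f}x" if multiple is not None else "n/a"
    print(f"{s.model:<14} raises/hand={s.raises_per_hand():.2f} "
          f"fold rate={s.fold_rate():.0%} stack={multiple_txt}")
```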


Key Insight

Liquid doesn't recognize bad hands, so it raises with everything. Against opponents who fold too often, that strategy prints money. The author notes: "Not claiming small models are smarter at poker. In this specific format, not knowing when to fold is an advantage." Larger models 'understand' poker well enough to fold weak hands, but in a short-stack tournament that patience is punished.
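The "prints money" point is ordinary fold-equity arithmetic. With a pot of P chips before the shove, a shove risking S chips, a probability f that every opponent folds, and equity e if called, the shove gains f·P + (1 − f)·(e·(P + 2S) − S) chips relative to folding. Setting e = 0 gives the break-even fold frequency f = S / (P + S). The sketch below is a simplified model, assuming a single caller who matches the full S and ignoring side pots and blinds already posted; the 21K pot assumes six players each posting a 1K ante alongside the 5K/10K blinds, and the 100K shove size is hypothetical.

```python
def shove_ev(pot: float, risk: float, fold_prob: float, equity_when_called: float) -> float:
    """Expected chip gain of shoving versus folding (folding counts as 0).

    Simplified model: if everyone folds we win the pot; if one opponent
    calls by matching `risk`, we win the final pot of pot + 2 * risk
    with probability `equity_when_called`.
    """
    win_uncontested = fold_prob * pot
    called = (1 - fold_prob) * (equity_when_called * (pot + 2 * risk) - risk)
    return win_uncontested + called


def breakeven_fold_prob(pot: float, risk: float) -> float:
    """Fold frequency at which a zero-equity shove breaks even: S / (P + S)."""
    return risk / (pot + risk)


# Assumed spot: six players each post a 1K ante plus the 5K/10K blinds.
POT = 6 * 1_000 + 5_000 + 10_000  # 21_000 chips
RISK = 100_000                     # hypothetical shove size

print(f"break-even fold rate: {breakeven_fold_prob(POT, RISK):.1%}")
for f in (0.6, 0.83, 0.9):
    print(f"fold_prob={f:.2f}  EV with zero equity: {shove_ev(POT, RISK, f, 0.0):+,.0f}")
```

With those assumed numbers, even a shove with no equity at all breaks even once opponents fold about 83% of the time, so a table of habitual folders subsidizes a player who never folds.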

What's Next

Planned next steps include longer tournaments (100+ hands, lower blinds) where hand-reading matters. The framework supports custom personas with personality traits, risk tolerance, and fears. Requests for Mistral, Llama, and Gemma 3 runs are welcome. Code and full result JSONs are on GitHub at https://github.com/chiruu12/Hive (the runner is in hive-arena/, the data in tournaments/results/).
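To make the persona idea concrete, here is a minimal sketch of how traits, risk tolerance, and fears could be rendered into a system prompt for a poker agent. The `Persona` class, its field names, the 0-1 risk scale, and the prompt wording are all hypothetical; Hive's actual persona format lives in the repository and may look quite different.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Persona:
    """Hypothetical persona record; field names are illustrative, not Hive's schema."""
    name: str
    traits: List[str] = field(default_factory=list)
    risk_tolerance: float = 0.5  # 0.0 = very cautious, 1.0 = reckless (assumed scale)
    fears: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-1 scale.
        if not 0.0 <= self.risk_tolerance <= 1.0:
            raise ValueError("risk_tolerance must be between 0 and 1")

    def system_prompt(self) -> str:
        """Render the persona as a system-prompt preamble."""
        lines = [f"You are {self.name}, a Texas Hold'em player."]
        if self.traits:
            lines.append("Personality: " + ", ".join(self.traits) + ".")
        lines.append(f"Risk tolerance: {self.risk_tolerance:.1f} on a 0-1 scale.")
        if self.fears:
            lines.append("You fear: " + ", ".join(self.fears) + ".")
        lines.append("Reply with exactly one action: fold, call, or raise.")
        return "\n".join(lines)


maniac = Persona("Maniac", traits=["impatient", "loud"], risk_tolerance=0.9,
                 fears=["being blinded out"])
print(maniac.system_prompt())
```

Keeping the persona as plain data and rendering it late means the same record could be reused across models without editing each prompt by hand.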

📖 Read the full source: r/LocalLLaMA
