Open-weight models under 100GB can't beat Claude Haiku on coding benchmarks

✍️ OpenClawRadar📅 Published: February 26, 2026🔗 Source

A recent analysis of open-weight language models reveals a significant performance gap compared to Anthropic's Claude Haiku on coding benchmarks. The comparison was conducted using specific testing parameters and memory requirements.

Benchmark methodology

The evaluation compared models on two coding benchmarks: LiveBench (January 2026) and Arena Code/WebDev. Testing was performed against Claude Haiku 4.5 with thinking capabilities enabled. Models were plotted according to memory requirements for local deployment.

Technical specifications

Quantization: Q4_K_M
Context length: 32K
KV cache: q8_0
VRAM estimation: Calculated using the author's custom calculator

Key findings

No open-weight model under 100GB of memory comes close to Claude Haiku's performance on either benchmark. The nearest competitor is Minimax M2.5, which requires approximately 136GB of memory and roughly matches Haiku's performance on both benchmarks.

The analysis highlights the current gap between proprietary and open-weight models in the under-100GB category for coding tasks. The author expresses frustration with this limitation and calls for development of smaller models that could at least match Haiku's capabilities.

📖 Read the full source: r/LocalLLaMA

👀 See Also

News

Claude Code Postmortem: Three Bugs Caused Quality Degradation, Now Fixed

Anthropic traced recent Claude Code quality complaints to three separate changes: default reasoning effort was lowered, a caching bug dropped session memory, and a verbosity prompt hurt coding quality. All fixed as of April 20 (v2.1.116).

Apr 23, 2026, 06:15 PM UTC

OpenClawRadar

News

OpenAI's Sam Altman Supports Anthropic's Pentagon Red Lines, Proposes Technical Safeguards

OpenAI CEO Sam Altman has expressed support for Anthropic's ethical stance against Pentagon AI use for mass surveillance and autonomous weapons, while proposing technical safeguards like cloud-only deployment as a resolution.

Feb 27, 2026, 06:45 PM UTC

OpenClawRadar

News

The Need for Relational Governance in Multi-Agent AI Systems

Current governance frameworks focus on identity, permissions, and kill switches, but fail to address coordination between agents. Salesforce's research shows agent-to-agent interactions require purpose-built solutions, while studies reveal warmth outperforms dominance in negotiations.

Mar 1, 2026, 03:45 PM UTC

OpenClawRadar

News

Hospital CEO Claims AI Ready to Replace Radiologists

The CEO of America's largest public hospital system says he's prepared to replace radiologists with AI, according to a Radiology Business article that generated significant discussion on Hacker News with 83 comments.

Apr 2, 2026, 08:45 PM UTC

OpenClawRadar