Open-weight models under 100GB can't beat Claude Haiku on coding benchmarks

A recent analysis of open-weight language models reveals a significant performance gap compared to Anthropic's Claude Haiku on coding benchmarks. The comparison was conducted using specific testing parameters and memory requirements.
Benchmark methodology
The evaluation compared models on two coding benchmarks: LiveBench (January 2026) and Arena Code/WebDev. Testing was performed against Claude Haiku 4.5 with thinking capabilities enabled. Models were plotted according to memory requirements for local deployment.
Technical specifications
- Quantization: Q4_K_M
- Context length: 32K
- KV cache: q8_0
- VRAM estimation: Calculated using the author's custom calculator
Key findings
No open-weight model under 100GB of memory comes close to Claude Haiku's performance on either benchmark. The nearest competitor is Minimax M2.5, which requires approximately 136GB of memory and roughly matches Haiku's performance on both benchmarks.
The analysis highlights the current gap between proprietary and open-weight models in the under-100GB category for coding tasks. The author expresses frustration with this limitation and calls for development of smaller models that could at least match Haiku's capabilities.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Claude Code Postmortem: Three Bugs Caused Quality Degradation, Now Fixed
Anthropic traced recent Claude Code quality complaints to three separate changes: default reasoning effort was lowered, a caching bug dropped session memory, and a verbosity prompt hurt coding quality. All fixed as of April 20 (v2.1.116).

OpenAI's Sam Altman Supports Anthropic's Pentagon Red Lines, Proposes Technical Safeguards
OpenAI CEO Sam Altman has expressed support for Anthropic's ethical stance against Pentagon AI use for mass surveillance and autonomous weapons, while proposing technical safeguards like cloud-only deployment as a resolution.

The Need for Relational Governance in Multi-Agent AI Systems
Current governance frameworks focus on identity, permissions, and kill switches, but fail to address coordination between agents. Salesforce's research shows agent-to-agent interactions require purpose-built solutions, while studies reveal warmth outperforms dominance in negotiations.

Hospital CEO Claims AI Ready to Replace Radiologists
The CEO of America's largest public hospital system says he's prepared to replace radiologists with AI, according to a Radiology Business article that generated significant discussion on Hacker News with 83 comments.