SWE-rebench Leaderboard Update: February 2026 Results Show Tight Competition

✍️ OpenClawRadar | 📅 Published: March 23, 2026

SWE-rebench February 2026 Results

The SWE-rebench leaderboard has been updated with February 2026 runs on 57 fresh GitHub PR tasks. The setup follows standard SWE-bench methodology: models read real PR issues, edit code, run tests, and must make the full test suite pass. Tasks are restricted to PRs created in the previous month.
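To make the scoring concrete, here is a minimal sketch of how a resolved rate and a pass@k figure could be computed from per-task outcomes. It assumes each task is attempted several times and that pass@k means "at least one of the first k attempts made the full test suite pass"; the leaderboard's exact aggregation is not described in the source, so the function names, the data layout, and the toy results are all hypothetical.

```python
from typing import Dict, List


def resolved_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of all attempts in which the full test suite passed."""
    attempts = [ok for runs in results.values() for ok in runs]
    return sum(attempts) / len(attempts)


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of tasks solved by at least one of the first k attempts."""
    solved = sum(1 for runs in results.values() if any(runs[:k]))
    return solved / len(results)


# Hypothetical toy data: three tasks, five attempts each (not leaderboard data).
toy_results = {
    "task-a": [True, True, False, True, True],
    "task-b": [False, False, False, False, False],
    "task-c": [False, True, False, False, False],
}

print(f"resolved rate: {resolved_rate(toy_results):.1%}")
print(f"pass@5:        {pass_at_k(toy_results, 5):.1%}")
```

Under these assumptions pass@k can never fall below the resolved rate, since a task counts as solved if any single attempt succeeds.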

Key Results

Claude Opus 4.6 remains at the top with a 65.3% resolved rate and continues to set the pace with a strong pass@5 (~70%).
The top tier is extremely tight: gpt-5.2-medium (64.4%), GLM-5 (62.8%), and gpt-5.4-medium (62.8%) are all within 2.5 points of the leader.
Gemini 3.1 Pro Preview (62.3%) and DeepSeek-V3.2 (60.9%) complete a tightly packed top six.
Open-weight/hybrid models keep improving: Qwen3.5-397B (59.9%), Step-3.5-Flash (59.6%), and Qwen3-Coder-Next (54.4%) are closing the gap, driven by improved long-context use and scaling.
MiniMax M2.5 (54.6%) continues to stand out as a cost-efficient option with competitive performance.

Overall, February shows a highly competitive frontier: the top six models span 65.3% to 60.9%, a spread of just 4.4 percentage points.
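The spread can be checked directly from the figures quoted above. The short sketch below ranks the reported scores and prints each model's gap to the leader. The `task_weight` line is an illustration only: it assumes a score derived from a single pass over the 57 tasks, which the source does not confirm.

```python
# Resolved rates (%) as quoted in this article.
scores = {
    "Claude Opus 4.6": 65.3,
    "gpt-5.2-medium": 64.4,
    "GLM-5": 62.8,
    "gpt-5.4-medium": 62.8,
    "Gemini 3.1 Pro Preview": 62.3,
    "DeepSeek-V3.2": 60.9,
    "Qwen3.5-397B": 59.9,
    "Step-3.5-Flash": 59.6,
    "MiniMax M2.5": 54.6,
    "Qwen3-Coder-Next": 54.4,
}

# Identify the leader and compute every model's gap in percentage points.
leader, top = max(scores.items(), key=lambda kv: kv[1])
gaps = {model: round(top - score, 1) for model, score in scores.items()}

for model, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model:<24} {score:5.1f}%  gap {gaps[model]:4.1f}")

# Assumption: on a single pass over 57 tasks, one task is worth this many points.
task_weight = 100 / 57
print(f"one task on a single run = {task_weight:.2f} points")
```

Read this way, several of the gaps near the top are on the order of one or two tasks, which is why small differences in rank should be interpreted with caution.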

📖 Read the full source: r/LocalLLaMA
