SWE-rebench Leaderboard Update: February 2026 Results Show Tight Competition

The SWE-rebench leaderboard has been updated with February 2026 runs on 57 fresh GitHub PR tasks. The setup follows standard SWE-bench methodology: models read real PR issues, edit code, run tests, and must make the full test suite pass. Tasks are restricted to PRs created in the previous month.
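As a rough illustration of the scoring described above, the sketch below computes a resolved rate and an any-of-k pass rate (the pass@5 figure cited in the results) from per-run outcomes. The data layout, function names, and toy outcomes are assumptions made for illustration, not the leaderboard's actual harness or data; averaging the resolved rate over runs is likewise an assumed convention.

```python
from typing import Dict, List

# Hypothetical layout: task id -> one boolean per independent run,
# True when the full test suite passed after the model's edit.
Outcomes = Dict[str, List[bool]]

def resolved_rate(outcomes: Outcomes) -> float:
    """Share of all (task, run) attempts that were resolved (assumed convention)."""
    total_runs = sum(len(runs) for runs in outcomes.values())
    total_resolved = sum(sum(runs) for runs in outcomes.values())
    return total_resolved / total_runs

def pass_at_k(outcomes: Outcomes, k: int) -> float:
    """Fraction of tasks resolved in at least one of the first k runs."""
    solved = sum(any(runs[:k]) for runs in outcomes.values())
    return solved / len(outcomes)

# Toy outcomes, invented for this example only.
toy = {
    "task-a": [True, True, False, True, True],
    "task-b": [False, False, True, False, False],
    "task-c": [False, False, False, False, False],
    "task-d": [True, True, True, True, True],
}

print(resolved_rate(toy))  # 10 resolved attempts out of 20 -> 0.5
print(pass_at_k(toy, 5))   # 3 of 4 tasks solved at least once -> 0.75
```

Under these assumptions pass@5 can never be lower than the resolved rate, since a task that succeeds in only one of five runs still counts as solved.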
Key Results
- Claude Opus 4.6 remains at the top with a 65.3% resolved rate and a pass@5 of roughly 70%
- The top tier is tight: gpt-5.2-medium (64.4%), GLM-5 (62.8%), and gpt-5.4-medium (62.8%) all sit within 2.5 points of the leader
- Gemini 3.1 Pro Preview (62.3%) and DeepSeek-V3.2 (60.9%) round out a top six that spans just 4.4 points
- Open-weight/hybrid models keep improving: Qwen3.5-397B (59.9%), Step-3.5-Flash (59.6%), and Qwen3-Coder-Next (54.4%) are closing the gap, driven by improved long-context use and scaling
- MiniMax M2.5 (54.6%) continues to stand out as a cost-efficient option with competitive performance
Overall, February shows a highly competitive frontier, with the top six models separated by only 4.4 points.
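To put the spread in perspective, the short sketch below takes the resolved rates reported above and computes each model's gap to the leader, along with how many percentage points a single task is worth on a 57-task set. The variable names are illustrative; the scores are copied from the results list and nothing else is assumed about the leaderboard.

```python
# Resolved rates (%) as reported in the results above.
scores = {
    "Claude Opus 4.6": 65.3,
    "gpt-5.2-medium": 64.4,
    "GLM-5": 62.8,
    "gpt-5.4-medium": 62.8,
    "Gemini 3.1 Pro Preview": 62.3,
    "DeepSeek-V3.2": 60.9,
    "Qwen3.5-397B": 59.9,
    "Step-3.5-Flash": 59.6,
    "MiniMax M2.5": 54.6,
    "Qwen3-Coder-Next": 54.4,
}

NUM_TASKS = 57
points_per_task = 100 / NUM_TASKS  # percentage points one task contributes

leader = max(scores.values())
gaps = {name: round(leader - score, 1) for name, score in scores.items()}

# Print models from smallest to largest gap behind the leader.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:24s} {scores[name]:5.1f}%  gap {gap:.1f}")

print(f"One task is worth about {points_per_task:.2f} points")
```

With one task worth about 1.75 points, the 0.9-point gap between the top two models amounts to less than a single task per run, which is worth keeping in mind when reading the ranking.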
📖 Read the full source: r/LocalLLaMA