SWE-rebench-V2 Released: Largest Open Dataset for Code Agent Training

SWE-rebench-V2 Release Details

Nebius's R&D team, led by Ibragim, has published SWE-rebench-V2, which they describe as "currently the biggest open dataset in the world for training coding agents." The dataset is multilingual and executable, and it is designed specifically for large-scale reinforcement learning (RL) training.

Key Technical Features

The team built an automated pipeline to extract RL environments at scale. This release includes:

The complete SWE-rebench-V2 dataset
A detailed technical report

The paper and dataset are available at: https://huggingface.co/papers/2602.23866
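The announcement does not describe how an executable environment turns into a training signal. A common convention in SWE-bench-style tasks is to run designated tests after the agent's patch and grant a binary reward: the tests that reproduce the issue must now pass, and the tests that already passed must keep passing. The minimal sketch below assumes that convention. The `TaskInstance` fields (`fail_to_pass`, `pass_to_pass`) and the `binary_reward` function are hypothetical illustrations, not the SWE-rebench-V2 schema or the team's actual reward.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Hypothetical task record; field names follow a SWE-bench-style convention (assumed)."""
    instance_id: str
    language: str
    fail_to_pass: list[str] = field(default_factory=list)  # tests the fix must make pass
    pass_to_pass: list[str] = field(default_factory=list)  # tests that must keep passing


def binary_reward(task: TaskInstance, results: dict[str, bool]) -> float:
    """Hypothetical reward: 1.0 only if every required test passed after the patch.

    `results` maps test IDs to pass/fail as reported by running the test suite
    inside the task's environment; a test missing from `results` counts as failed.
    """
    required = task.fail_to_pass + task.pass_to_pass
    if not required:
        return 0.0  # nothing to verify, so no positive reward
    return 1.0 if all(results.get(test_id, False) for test_id in required) else 0.0


if __name__ == "__main__":
    task = TaskInstance(
        instance_id="example__repo-123",  # hypothetical ID
        language="python",
        fail_to_pass=["tests/test_parser.py::test_empty_input"],
        pass_to_pass=["tests/test_parser.py::test_basic"],
    )
    # Both required tests pass: full reward.
    print(binary_reward(task, {
        "tests/test_parser.py::test_empty_input": True,
        "tests/test_parser.py::test_basic": True,
    }))
    # The issue-reproducing test never ran, so it counts as failed: no reward.
    print(binary_reward(task, {"tests/test_parser.py::test_basic": True}))
```

Requiring both test groups to pass rewards fixing the issue without breaking existing behavior; a real training setup may well use a different or graded reward.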

Community and Support

The team maintains an active Discord server that supports both the dataset and their SWE-rebench Leaderboard: https://discord.gg/wXYmWpMu. They credit the LocalLLaMA community with "the most valuable feedback" on the SWE-rebench Leaderboard and confirm that work on the leaderboard is continuing, with plans to "make it even cooler."

For research collaborations or questions, Ibragim can be reached via DM on Reddit or Twitter (X) at: https://x.com/ibragim_bad.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Self-Maintaining Documentation System Using Fenced Blocks for Zero Drift

Multi-Agent Memory: Open Source Shared Memory System for AI Agents

CodeLedger: Open-source Claude Code plugin tracks token usage and background agents

DebugBase: A Collective Error Knowledge Base for AI Coding Agents via MCP