Benchmark shows context engine cuts AI coding agent costs by up to 3x on SWE-bench Verified

A developer benchmarked four AI coding agents on SWE-bench Verified using the same Claude Opus 4.5 model, with context management as the only variable. Per-task cost varied roughly threefold ($0.67 to $1.98) while Pass@1 stayed within a three-point range (70% to 73%).
Benchmark setup
The test used a 100-task stratified subset of SWE-bench Verified with all 12 repositories represented proportionally. All agents ran Claude Opus 4.5 with the same $3/task budget and 250-turn limit. The only difference was the context layer in front of the model.
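A proportional stratified subset of this kind can be sketched as follows. This is a minimal illustration, not the author's harness: the function name `stratified_subset`, the largest-remainder rounding, and the toy repository names and sizes are all assumptions. Note that plain proportional rounding can give a very small repository zero slots, so a real harness that wants every repository represented would need a minimum of one per repository.

```python
import random
from collections import defaultdict

def stratified_subset(tasks, total, seed=0):
    """Pick `total` tasks so each repo's share matches its share of `tasks`.

    `tasks` is a list of (task_id, repo) pairs. Quotas are rounded with the
    largest-remainder method so they sum to exactly `total`.
    """
    by_repo = defaultdict(list)
    for task_id, repo in tasks:
        by_repo[repo].append(task_id)

    # Exact (fractional) share of the subset each repo should receive.
    exact = {repo: len(ids) * total / len(tasks) for repo, ids in by_repo.items()}
    quota = {repo: int(share) for repo, share in exact.items()}

    # Hand the leftover slots to the repos with the largest fractional parts.
    leftover = total - sum(quota.values())
    by_fraction = sorted(exact, key=lambda r: exact[r] - quota[r], reverse=True)
    for repo in by_fraction[:leftover]:
        quota[repo] += 1

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return {repo: rng.sample(sorted(by_repo[repo]), quota[repo]) for repo in by_repo}

# Hypothetical pool: three repos with 60, 30, and 10 tasks.
pool = (
    [(f"a-{i}", "repo_a") for i in range(60)]
    + [(f"b-{i}", "repo_b") for i in range(30)]
    + [(f"c-{i}", "repo_c") for i in range(10)]
)
picked = stratified_subset(pool, total=10)
print({repo: len(ids) for repo, ids in picked.items()})
```

Fixing the seed means every agent is evaluated on the identical subset, which is what makes the per-task cost figures comparable.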
Results
- Context engine + Claude Code: 73.0% Pass@1, $0.67/task
- Live-SWE-Agent: 72.0% Pass@1, $0.86/task
- OpenHands: 70.0% Pass@1, $1.77/task
- Sonar Foundation: 70.0% Pass@1, $1.98/task
The most expensive setup cost roughly 3x as much per task ($1.98 vs. $0.67) as the context-engine setup while resolving three percentage points fewer tasks. Eight tasks were solved only by the setup with the context layer: bugs that the model couldn't fix without seeing the right code.
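The arithmetic behind the cost comparison can be reproduced from the four reported figures. The sketch below also derives a cost per resolved task, which is not a metric from the original post; it assumes the reported $/task is an average over all attempted tasks, so dividing by the pass rate gives average spend per success.

```python
# Reported results: (Pass@1 in percent, average cost per task in USD).
results = {
    "Context engine + Claude Code": (73.0, 0.67),
    "Live-SWE-Agent": (72.0, 0.86),
    "OpenHands": (70.0, 1.77),
    "Sonar Foundation": (70.0, 1.98),
}

def cost_per_resolved(pass_rate_pct, cost_per_task):
    """Average spend per resolved task: cost per attempt / success rate."""
    return cost_per_task / (pass_rate_pct / 100)

baseline_cost = results["Context engine + Claude Code"][1]
ratios = {}
for name, (rate, cost) in results.items():
    ratios[name] = cost / baseline_cost
    print(
        f"{name}: {ratios[name]:.2f}x cost per task, "
        f"${cost_per_resolved(rate, cost):.2f} per resolved task"
    )
```

On these numbers the most expensive setup comes out at about 2.96x the per-task cost of the cheapest, which is where the "3x" headline figure comes from.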
Limitations
On matplotlib tasks (rendering-heavy code with visual output), the context-engine setup resolved 43% while Sonar Foundation resolved 86%. Graph-based context is less effective when relevant code doesn't follow dependency chains.
How the context layer works
Instead of letting Claude read entire files, the context engine pre-indexes the codebase into a dependency graph using tree-sitter + SQLite (30 languages supported) and returns a ranked context capsule: full source for the functions that matter, and skeletonized signatures for everything connected to them. The agent starts every task already knowing what's relevant.
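The capsule idea can be illustrated with a small sketch. It is not the tool's implementation: it swaps tree-sitter for Python's standard `ast` module so it runs without extra dependencies, handles only top-level Python functions, and replaces ranking with a simple one-hop neighborhood in the call graph. The table layout and the names `build_index` and `capsule` are assumptions made for illustration.

```python
import ast
import sqlite3

SOURCE = '''
def parse(text):
    return text.split(",")

def load(path):
    """Read a file and parse it."""
    with open(path) as fh:
        return parse(fh.read())

def unrelated():
    return 42
'''

def build_index(source):
    """Store each top-level function's source, signature, and call edges."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE funcs (name TEXT PRIMARY KEY, src TEXT, sig TEXT)")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        # Skeleton: the signature with the body elided.
        sig = f"def {node.name}({ast.unparse(node.args)}): ..."
        src = ast.get_source_segment(source, node)
        db.execute("INSERT INTO funcs VALUES (?, ?, ?)", (node.name, src, sig))
        # Record direct calls to plain names (method calls are ignored here).
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                db.execute("INSERT INTO calls VALUES (?, ?)", (node.name, sub.func.id))
    return db

def capsule(db, target):
    """Full source for `target`, signatures only for its direct neighbors."""
    full = db.execute("SELECT src FROM funcs WHERE name = ?", (target,)).fetchone()[0]
    rows = db.execute(
        "SELECT DISTINCT f.sig FROM calls c JOIN funcs f "
        "ON (c.caller = ? AND f.name = c.callee) "
        "OR (c.callee = ? AND f.name = c.caller) "
        "ORDER BY f.sig",
        (target, target),
    ).fetchall()
    return full, [row[0] for row in rows]

db = build_index(SOURCE)
full_source, skeletons = capsule(db, "load")
print(full_source)
print(skeletons)
```

Asking for `load` returns its full body plus only the signature of `parse`, and leaves `unrelated` out entirely, which is the token saving the capsule approach is after.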
The engine also includes session memory, exposed via MCP, that persists across sessions so the agent doesn't re-explore the same code. When code changes, earlier observations about it are automatically flagged as stale.
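One plausible way to flag stale observations is to pin each note to a hash of the code it describes and compare hashes on recall. The sketch below assumes that approach; the original post does not say how staleness is detected, the `SessionMemory` class and its schema are hypothetical, and the MCP transport is left out entirely.

```python
import hashlib
import sqlite3

def digest(text):
    """Stable fingerprint of a piece of source code."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

class SessionMemory:
    """Notes about code, each pinned to a hash of the code it describes."""

    def __init__(self, path=":memory:"):
        # Passing a file path instead would let notes survive across sessions.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes (symbol TEXT, note TEXT, code_hash TEXT)"
        )

    def remember(self, symbol, note, code):
        self.db.execute(
            "INSERT INTO notes VALUES (?, ?, ?)", (symbol, note, digest(code))
        )
        self.db.commit()

    def recall(self, symbol, current_code):
        """Return (note, is_stale) pairs; stale means the code has changed."""
        rows = self.db.execute(
            "SELECT note, code_hash FROM notes WHERE symbol = ?", (symbol,)
        ).fetchall()
        now = digest(current_code)
        return [(note, code_hash != now) for note, code_hash in rows]

memory = SessionMemory()
old_code = "def parse(text):\n    return text.split(',')\n"
new_code = "def parse(text):\n    return text.split(';')\n"
memory.remember("parse", "splits input on commas", old_code)

fresh = memory.recall("parse", old_code)
stale = memory.recall("parse", new_code)
print(fresh, stale)
```

A stale note is kept rather than deleted, so the agent can still see what it once believed while knowing the claim needs rechecking.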
The system is 100% local with no cloud, no account, and no code leaving your machine. It works with Claude Code and 11 other agents via MCP.
Open source availability
The benchmark harness, all evaluation logs, per-instance results, and comparison scripts are available on GitHub at github.com/Vexp-ai/vexp-swe-bench. The tool itself is available at vexp.dev with a free tier, VS Code extension, or CLI. Full benchmark results with charts are at vexp.dev/benchmark.
📖 Read the full source: r/ClaudeAI