ATLAS: Open-Source Test-Time Compute Pipeline for Qwen3-14B Achieves Frontier-Level Coding Performance

✍️ OpenClawRadar | 📅 Published: March 10, 2026

ATLAS is an open-source test-time compute pipeline built around Qwen3-14B. It reports 74.6% pass@1 on LiveCodeBench v5, within range of frontier models, at roughly $0.004 per task in electricity. The project was developed by a business management student at Virginia Tech who learned to code while building it.

Development Evolution

The developer spent two to three months reading hundreds of papers, aiming to connect existing research results that had not been combined before. The system evolved through three major versions:

  • V1: Basic infrastructure, described as "VERY rudimentary (essentially just RAG)"
  • V2: Applied energy-based verification inspired by Anthropic's "When Models Manipulate Manifolds" paper, resulting in a decent verifier
  • V3: Doubled performance over the V1 baseline after extensive further research, including an exploration of the Halting Problem
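
To make the idea of verifier-guided test-time compute concrete, here is a minimal sketch of best-of-N selection: sample several candidate programs, score each with a verifier, and keep the one with the lowest "energy". This is a generic illustration, not ATLAS's actual pipeline. The names `best_of_n`, `toy_generate`, and `toy_energy` are hypothetical, and the toy energy (number of failed unit tests) stands in for whatever learned verifier the project uses.

```python
import random
from typing import Callable, Tuple

def best_of_n(
    generate: Callable[[random.Random], str],
    energy: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidates and return (candidate, energy) with the lowest energy."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    scored = [(energy(c), c) for c in candidates]
    best_energy, best = min(scored, key=lambda pair: pair[0])
    return best, best_energy

# Toy stand-ins (assumptions): a fixed pool of "model outputs" and a
# verifier that counts failed unit tests. Lower energy is better.
CANDIDATES = [
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a * b",
]
TESTS = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]

def toy_generate(rng: random.Random) -> str:
    """Pretend to sample a program from a model."""
    return rng.choice(CANDIDATES)

def toy_energy(source: str) -> float:
    """Energy = number of failed tests; a crash counts as failing all of them."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable only because the inputs are fixed toy strings
        fn = namespace["add"]
        return float(sum(1 for args, want in TESTS if fn(*args) != want))
    except Exception:
        return float(len(TESTS))

if __name__ == "__main__":
    program, score = best_of_n(toy_generate, toy_energy, n=8)
    print(score)
    print(program)
```

Spending more samples (a larger `n`) raises the chance that at least one candidate passes, which is one reason such pipelines trade speed for accuracy.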

Performance Benchmarks

Results on 599 LiveCodeBench v5 problems:

  • DeepSeek V3.2 Reasoning: 86.2% pass@1, ~$0.002 per task (API)
  • GPT-5 (high): 84.6% pass@1, ~$0.043 per task (API)
  • ATLAS V3: 74.6% pass@1, ~$0.004 per task (electricity)
  • Claude 4.5 Sonnet: 71.4% pass@1, ~$0.066 per task (API)
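
One way to compare these rows is cost per solved problem, which divides the per-task cost by the pass rate. The sketch below uses only the figures listed above. The function names are illustrative, and the inputs are approximate (the costs are marked "~", and the ATLAS figure covers electricity only while the others are API prices), so the output is a rough comparison rather than a precise accounting.

```python
from typing import Dict, List, Tuple

# (pass@1 in percent, approximate cost per task in USD), as listed above.
RESULTS: Dict[str, Tuple[float, float]] = {
    "DeepSeek V3.2 Reasoning": (86.2, 0.002),
    "GPT-5 (high)": (84.6, 0.043),
    "ATLAS V3": (74.6, 0.004),
    "Claude 4.5 Sonnet": (71.4, 0.066),
}

def cost_per_solved(pass_at_1_percent: float, cost_per_task: float) -> float:
    """Expected spend per solved problem: cost per attempt / success rate."""
    if not 0 < pass_at_1_percent <= 100:
        raise ValueError("pass@1 must be in (0, 100]")
    return cost_per_task / (pass_at_1_percent / 100.0)

def ranked_by_cost_per_solved(
    results: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float]]:
    """Return (system, cost per solved problem) pairs, cheapest first."""
    rows = [(name, cost_per_solved(p, c)) for name, (p, c) in results.items()]
    return sorted(rows, key=lambda row: row[1])

if __name__ == "__main__":
    for name, usd in ranked_by_cost_per_solved(RESULTS):
        print(f"{name}: ${usd:.4f} per solved problem")
```

On these numbers ATLAS V3 sits well below GPT-5 (high) and Claude 4.5 Sonnet per solved problem, but above DeepSeek V3.2 Reasoning.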

Technical Details and Limitations

The system is "slow as hell" according to the developer. Easy tasks take seconds, but complex coding problems can take up to an hour. V3.1 is moving to Qwen 3.5 9B for improved speed and parallelization.

ATLAS includes full MaaS (Model-as-a-Service) infrastructure that allows connecting OpenCode or Claude Code via API. The developer recommends at least 16GB VRAM, warning that with less memory it will be "even slower than I mentioned."
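
For context on the 16GB figure, a back-of-the-envelope estimate of weight memory is parameters times bits per weight divided by eight. The sketch below assumes a nominal 14 billion parameters (taken from the model name) and a few common precisions. The source does not say which precision or quantization ATLAS uses, and the estimate ignores the KV cache, activations, and runtime overhead, so treat it as a lower bound only.

```python
def weights_gib(num_params: float, bits_per_weight: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    if num_params <= 0 or bits_per_weight <= 0:
        raise ValueError("num_params and bits_per_weight must be positive")
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30

# Assumption: nominal parameter count inferred from the "14B" in the model name.
NOMINAL_PARAMS = 14e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weights_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone would not fit in 16GB, while 8-bit or 4-bit weights would, leaving the remainder for the cache and other buffers.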

Setup and Reproducibility

The project is fully open source with no plans for commercialization. The repository is available at https://github.com/itigges22/ATLAS. The developer notes that reproducibility needs work, but suggests that "if you ask Claude Code to optimize it for your setup it should work fine."

📖 Read the full source: r/LocalLLaMA
