ATLAS: Open-Source Test-Time Compute Pipeline for Qwen3-14B Achieves Frontier-Level Coding Performance

ATLAS is an open-source test-time compute pipeline built around Qwen3-14B. On LiveCodeBench v5 it approaches the coding performance of frontier models, at a per-task cost well below that of GPT-5 and Claude 4.5 Sonnet. The project was developed by a business management student at Virginia Tech who learned to code while building it.
Development Evolution
The developer spent two to three months reading hundreds of papers, looking for existing research results that had not been combined before. The system evolved through three major versions:
- V1: Basic infrastructure, described as "VERY rudimentary (essentially just RAG)"
- V2: Applied energy-based verification inspired by Anthropic's "When Models Manipulate Manifolds" paper, resulting in a decent verifier
- V3: Doubled performance over the V1 baseline after extensive research, including exploration of the Halting Problem
Performance Benchmarks
Results on 599 LiveCodeBench v5 problems:
- DeepSeek V3.2 Reasoning: 86.2% pass@1, ~$0.002 per task (API)
- GPT-5 (high): 84.6% pass@1, ~$0.043 per task (API)
- ATLAS V3: 74.6% pass@1, ~$0.004 per task (electricity)
- Claude 4.5 Sonnet: 71.4% pass@1, ~$0.066 per task (API)
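One way to read the list above is cost per solved problem rather than cost per attempted task. The sketch below is illustrative only and is not part of ATLAS: the function names `cost_per_solved` and `summarize` are made up for this example, and it assumes a single attempt per problem. The ATLAS figure covers electricity only while the others are API prices, so the comparison is rough.

```python
# Figures copied from the benchmark list above:
# (pass@1 in percent, approximate USD per task).
RESULTS = {
    "DeepSeek V3.2 Reasoning": (86.2, 0.002),
    "GPT-5 (high)": (84.6, 0.043),
    "ATLAS V3": (74.6, 0.004),
    "Claude 4.5 Sonnet": (71.4, 0.066),
}


def cost_per_solved(pass_rate_pct, cost_per_task):
    """Expected spend per solved problem, assuming one attempt per problem."""
    return cost_per_task / (pass_rate_pct / 100.0)


def summarize(results):
    """Return (name, pass@1, cost/task, cost/solved) rows, cheapest per solve first."""
    rows = [
        (name, rate, cost, cost_per_solved(rate, cost))
        for name, (rate, cost) in results.items()
    ]
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for name, rate, cost, solved in summarize(RESULTS):
        print(f"{name:<26} {rate:5.1f}%  ${cost:.3f}/task  ${solved:.4f}/solved")
```

Under these assumptions, ATLAS V3 ranks second on cost per solved problem, behind DeepSeek V3.2 Reasoning and ahead of GPT-5 (high) and Claude 4.5 Sonnet.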
Technical Details and Limitations
The system is "slow as hell" according to the developer. Easy tasks take seconds, but complex coding problems can take up to an hour. V3.1 is moving to Qwen 3.5 9B for improved speed and parallelization.
ATLAS includes full MaaS (Model-as-a-Service) infrastructure that allows connecting OpenCode or Claude Code via API. The developer recommends at least 16GB VRAM, warning that with less memory it will be "even slower than I mentioned."
Setup and Reproducibility
The project is fully open source with no plans for commercialization. The repository is available at https://github.com/itigges22/ATLAS. The developer notes that reproducibility needs work, but suggests that "if you ask Claude Code to optimize it for your setup it should work fine."
📖 Read the full source: r/LocalLLaMA