Claude Code vs. Codex: Real-World Build Test – 36 Files vs. 28, Infinite Loop, and $0.46 Cost Difference

A developer ran a head-to-head comparison of Claude Code and Codex (via Cursor) using identical prompts and the same MCP setup (GitHub + Slack). No hints, no extra help. Two tasks:
- Task 1: PR triage bot – read open PRs, score complexity, write report, ping Slack for high priority. Required retry logic, error logging, strict TypeScript (no `any`).
- Task 2: Real-time code review UI – React, WebSocket, inline comments, optimistic updates with rollback, virtualized diff viewer, reconnect with backoff. No UI libraries, everything from scratch.
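Both tasks call for some form of retry (retry logic in Task 1, reconnect with backoff in Task 2). The source does not show either tool's implementation, so the following is only a minimal sketch of exponential backoff in strict TypeScript. The names `withRetry`, `backoffDelay`, and `RetryOptions`, and all numeric values, are hypothetical and chosen for illustration.

```typescript
// Hypothetical helper: retry an async operation with capped exponential backoff.
// Names and defaults are illustrative assumptions, not taken from the test.
type RetryOptions = {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // wait after the first failed attempt
  maxDelayMs: number; // upper bound on any single wait
  onError?: (error: unknown, attempt: number) => void; // error-logging hook
};

// Wait that follows a failed attempt (1-based): base * 2^(attempt - 1), capped.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
}

async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error: unknown) {
      lastError = error;
      options.onError?.(error, attempt); // log every failure, including the last
      if (attempt < options.maxAttempts) {
        const delay = backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Typing the caught value as `unknown` rather than `any` keeps the helper compatible with a no-`any` rule. A production version would likely add random jitter to the delay so that many clients do not reconnect at the same instant.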
Results
- Claude Code: Verified MCP tools were live before writing code. Built 36 files in 12 minutes. Included a two-client WebSocket smoke test not asked for. Broadcast latency: 3ms. Zero `any`. Passed typecheck first try.
- Codex (Cursor): Couldn't access GitHub MCP on Task 1 (Cursor's execution path didn't expose tool descriptors). Got `tool not found` after 3 retries, but logged and handled cleanly – environment issue, not model quality. Task 2 shipped a working UI in ~15 min, 5ms latency. First compile had TypeScript errors and an infinite React loop (`useEffect` calling `hydrate` repeatedly) that needed a ref guard patch.
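The source does not include the actual patch, so the following is a framework-free model of how a ref guard can stop an effect from re-running `hydrate` on every render. In real React code the ref would come from `useRef` and the effect from `useEffect`. Here both are simulated, and `simulateRenders` is a hypothetical name, so the sketch runs on its own.

```typescript
// Simulated ref: an object that persists across renders, like the value from useRef.
type Ref<T> = { current: T };

type RenderStats = { renders: number; hydrateCalls: number };

// Models a component whose effect calls hydrate(), where hydrate() updates state
// and therefore schedules another render. maxRenders stops the unguarded loop.
function simulateRenders(useGuard: boolean, maxRenders: number): RenderStats {
  const hydratedRef: Ref<boolean> = { current: false };
  let hydrateCalls = 0;
  let renders = 0;
  let needsRender = true; // the initial mount

  // Hypothetical hydrate: loading state triggers a re-render.
  const hydrate = (): void => {
    hydrateCalls++;
    needsRender = true;
  };

  while (needsRender && renders < maxRenders) {
    needsRender = false;
    renders++;
    // The "effect" runs after every render in this model.
    if (useGuard) {
      if (hydratedRef.current) continue; // already hydrated: skip the call
      hydratedRef.current = true;
    }
    hydrate();
  }
  return { renders, hydrateCalls };
}

console.log(simulateRenders(false, 50)); // unguarded: runs until the render cap
console.log(simulateRenders(true, 50)); // guarded: hydrate is called once
```

Without the guard, every render calls `hydrate`, which schedules the next render, so the loop only ends at the cap. With the guard, `hydrate` runs once and the second render settles.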
Cost
API cost across both tasks: Claude ~$2.50, Codex ~$2.04. Claude was ~23% more expensive but delivered more granular architecture and a first-run clean UI.
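The $0.46 and ~23% figures follow directly from the two totals. A quick check, assuming the rounded costs quoted above (variable names are illustrative):

```typescript
// Approximate API costs across both tasks, as quoted in the article.
const claudeCostUsd = 2.5;
const codexCostUsd = 2.04;

// Absolute difference and Claude's premium relative to the Codex total.
const differenceUsd = claudeCostUsd - codexCostUsd;
const percentMore = (differenceUsd / codexCostUsd) * 100;

console.log(differenceUsd.toFixed(2)); // "0.46"
console.log(percentMore.toFixed(1)); // "22.5"
```

Because both inputs are approximate, the percentage is only good to about a point, which is consistent with the article's ~23%.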
Key Takeaways
The author notes the two tools aren't really competing for the same use case. Claude Code feels like pairing with someone who reads the docs first; Codex feels like a senior dev who wants to ship fast. Neither leaked `any` into the code, neither hallucinated a tool name, and both got WebSocket broadcast latency under 10ms, which the author calls a clear improvement over six months ago.
📖 Read the full source: r/LocalLLaMA