Claude Code vs Codex: 36 Files vs 28, Infinite Loop, $0.46 Diff

A developer ran a head-to-head comparison of Claude Code and Codex (via Cursor) using identical prompts and the same MCP setup (GitHub + Slack). No hints, no extra help. Two tasks:

Task 1: PR triage bot – read open PRs, score complexity, write a report, and ping Slack for high-priority PRs. Required retry logic, error logging, and strict TypeScript (no use of the any type).
Task 2: Real-time code review UI – React, WebSocket, inline comments, optimistic updates with rollback, virtualized diff viewer, reconnect with backoff. No UI libraries, everything from scratch.
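Both tasks ask for some form of retry with backoff (retry logic in Task 1, reconnect with backoff in Task 2). The post does not show either tool's implementation, so the following is only a minimal sketch of the general pattern in strict TypeScript. The names withRetry, backoffDelay, and RetryOptions, along with the doubling schedule and the default logger, are assumptions made for illustration.

```typescript
// Hypothetical helper: names and defaults are illustrative, not taken from the original test.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay to wait after `failures` failed attempts (failures >= 1).
// Doubles each time and is capped at maxDelayMs.
function backoffDelay(failures: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (failures - 1));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task`, retrying on failure and logging every error.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  task: () => Promise<T>,
  opts: RetryOptions,
  log: (message: string) => void = console.error,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error: unknown) {
      lastError = error;
      const reason = error instanceof Error ? error.message : String(error);
      log(`attempt ${attempt}/${opts.maxAttempts} failed: ${reason}`);
      if (attempt < opts.maxAttempts) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

Typing the caught value as unknown and narrowing it with instanceof keeps the helper free of the any type, which is the constraint the prompt imposed. A production version would likely add jitter to the delay so that many clients do not reconnect at the same moment.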

Results

Claude Code: Verified that the MCP tools were live before writing any code. Built 36 files in 12 minutes. Included a two-client WebSocket smoke test that was not requested. Broadcast latency: 3ms. Zero uses of the any type. Passed typecheck on the first try.
Codex (Cursor): Couldn't access GitHub MCP on Task 1 because Cursor's execution path didn't expose the tool descriptors. It hit a "tool not found" error after 3 retries but logged and handled the failure cleanly, so this was an environment issue, not a reflection of model quality. On Task 2 it shipped a working UI in ~15 min with 5ms broadcast latency. The first compile had TypeScript errors and an infinite React loop (a useEffect calling hydrate repeatedly) that needed a ref guard patch.
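The post does not include the patch itself, so here is a framework-free sketch of what a ref guard does. It assumes the loop arose because hydrate updated state, which caused the effect to run again. In React the flag would typically come from useRef(false) and the guarded block would sit inside useEffect. The function simulateRenders and its variables are hypothetical names used only to model that behavior.

```typescript
// Framework-free model of a ref guard; not the actual patch from the test.
interface Ref<T> {
  current: T;
}

// Simulates a component whose effect re-runs after every state change.
// hydrate() changes state, so an unguarded effect keeps re-triggering itself.
// Returns how many times hydrate() was called.
function simulateRenders(useGuard: boolean, maxRenders: number): number {
  const hasHydrated: Ref<boolean> = { current: false }; // stands in for useRef(false)
  let hydrateCalls = 0;
  let stateChanged = true; // the initial mount counts as a change

  const hydrate = (): void => {
    hydrateCalls++;
    stateChanged = true; // hydrate writes to state, scheduling another render
  };

  for (let render = 0; render < maxRenders && stateChanged; render++) {
    stateChanged = false;
    // Effect body starts here.
    if (useGuard) {
      if (hasHydrated.current) continue; // already hydrated: skip
      hasHydrated.current = true;
    }
    hydrate();
  }
  return hydrateCalls;
}

// Capped at 50 renders so the unguarded case terminates in this model.
const unguardedCalls = simulateRenders(false, 50);
const guardedCalls = simulateRenders(true, 50);
console.log({ unguardedCalls, guardedCalls });
```

In this model the unguarded effect calls hydrate on every render until the cap is reached, while the guarded version calls it once. A ref suits this job because changing its current value does not itself trigger a re-render.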

Cost

API cost across both tasks: Claude ~$2.50, Codex ~$2.04, a difference of $0.46. Claude was ~23% more expensive but delivered a more granular architecture and a UI that ran cleanly on the first try.
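The percentage follows directly from the two totals. As a quick check, using only the approximate figures quoted above (the function name compareCosts is illustrative):

```typescript
// Recomputes the cost gap from the approximate totals quoted in the post.
interface CostComparison {
  differenceUsd: number;  // absolute gap in dollars
  percentHigher: number;  // how much more the first tool cost, relative to the second
}

function compareCosts(firstUsd: number, secondUsd: number): CostComparison {
  const differenceUsd = firstUsd - secondUsd;
  return {
    differenceUsd,
    percentHigher: (differenceUsd / secondUsd) * 100,
  };
}

const result = compareCosts(2.5, 2.04);
console.log(`difference: $${result.differenceUsd.toFixed(2)}`);
console.log(`Claude cost about ${Math.round(result.percentHigher)}% more`);
```

The relative figure uses the Codex total as the baseline, which is why a $0.46 gap on roughly $2 works out to about 23%.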

Key Takeaways

The author notes that the two tools aren't really competing for the same use case. Claude Code feels like pairing with someone who reads the docs first; Codex feels like a senior dev who wants to ship fast. Neither let an any type slip through, neither hallucinated a tool name, and both got WebSocket broadcast latency under 10ms – a clear improvement over six months ago.

📖 Read the full source: r/LocalLLaMA
