Claude Code vs Codex: 36 vs 28 files, $2.50 vs $2.04, infinite loop caught — real-world comparison

✍️ OpenClawRadar | 📅 Published: May 13, 2026

A user on r/ClaudeAI ran a head-to-head comparison of Claude Code and Codex (via Cursor) on two practical tasks, using the same prompts, the same MCP setup (GitHub + Slack), and the same machine. These were real builds, not benchmarks.

Tasks

  • Task 1: PR triage bot — Read open PRs, score each by complexity (files × 2, lines / 10, +3 for no labels, +5 for no reviewers), write a markdown report, and post Slack alerts for high scores. Requirements: retries, error logging, and strict TypeScript with no use of the any type.
  • Task 2: Real-time code review UI — React + TypeScript, WebSockets, inline comment threads, optimistic updates with rollback, virtualized diff viewer, WS reconnect with exponential backoff. No UI libraries.
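
The Task 1 scoring rule can be sketched roughly as follows. This is a minimal illustration, not code from either agent: it assumes the four terms are simply added together and that "lines" means total changed lines. The `PullRequestSummary` shape and its field names are hypothetical, and the source does not say what score counts as "high".

```typescript
// Hypothetical shape for the PR data the bot would read; field names are assumptions.
interface PullRequestSummary {
  number: number;
  changedFiles: number;
  changedLines: number;
  labels: string[];
  reviewers: string[];
}

// Complexity score as described in the task: files x 2, lines / 10,
// +3 when the PR has no labels, +5 when it has no reviewers.
// Assumption: the terms are summed.
function complexityScore(pr: PullRequestSummary): number {
  const fileScore = pr.changedFiles * 2;
  const lineScore = pr.changedLines / 10;
  const labelPenalty = pr.labels.length === 0 ? 3 : 0;
  const reviewerPenalty = pr.reviewers.length === 0 ? 5 : 0;
  return fileScore + lineScore + labelPenalty + reviewerPenalty;
}

// Made-up example: 4 files, 120 changed lines, no labels, one reviewer.
const examplePr: PullRequestSummary = {
  number: 1,
  changedFiles: 4,
  changedLines: 120,
  labels: [],
  reviewers: ["alice"],
};

const exampleScore = complexityScore(examplePr); // 8 + 12 + 3 + 0 = 23
console.log(`PR #${examplePr.number} scores ${exampleScore}`);
```

Keeping the score a pure function of the PR summary makes it easy to unit test separately from the GitHub and Slack calls, which are the parts that need retries and error logging.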

Claude Code results

  • Ran /mcp to verify tools before writing code
  • Built 36 files in ~12 minutes
  • Wrote an unprompted two-client WebSocket smoke test (broadcast: 3ms)
  • Zero uses of any; passed typecheck on the first try
  • UI worked immediately

Codex (via Cursor) results

  • Failed Task 1: GitHub MCP wasn't reachable through Cursor's execution path. Codex handled the failure cleanly (retried 3x, logged errors, didn't crash) but delivered nothing for this task.
  • Task 2: Shipped a working UI in ~15 minutes, smoke test passed at 5ms
  • Hit TypeScript errors on first compile, plus an infinite React loop (a useEffect calling hydrate repeatedly) that needed a ref guard patch.
  • 28 files, more compact architecture
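
The source does not show the actual patch, so the following is only a framework-free model of how a ref guard can stop this kind of loop. It assumes the loop arose because hydrate updated state, which re-rendered the component and re-ran the effect. The `Ref` box stands in for what React's `useRef` returns, and `simulateRenders` is a hypothetical helper, not React API.

```typescript
// A mutable box that persists across "renders", standing in for a React ref.
interface Ref<T> {
  current: T;
}

// Simulates a component whose effect calls hydrate() after each render,
// where hydrate() itself schedules another render.
// Returns how many times hydrate() ran before rendering settled or hit the cap.
function simulateRenders(useGuard: boolean, maxRenders: number = 50): number {
  const hydratedRef: Ref<boolean> = { current: false };
  const state = { pendingRender: true, hydrateCalls: 0 }; // initial mount is pending

  const hydrate = (): void => {
    state.hydrateCalls += 1;
    state.pendingRender = true; // assumed: hydrating sets state, forcing a re-render
  };

  for (let render = 0; render < maxRenders && state.pendingRender; render++) {
    state.pendingRender = false;

    // Effect body. With the guard, hydrate() runs only on the first pass.
    if (useGuard) {
      if (hydratedRef.current) continue;
      hydratedRef.current = true;
    }
    hydrate();
  }
  return state.hydrateCalls;
}

const unguardedCalls = simulateRenders(false); // keeps hydrating until the render cap
const guardedCalls = simulateRenders(true); // hydrates once, then settles
console.log({ unguardedCalls, guardedCalls });
```

In a real component the same idea would typically be a `useRef(false)` flag checked at the top of the effect. A ref works here because writing to it does not itself trigger a re-render.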

Cost (estimated, both tasks)

  • Claude: ~$2.50
  • Codex: ~$2.04
  • Difference: ~$0.46, or ~18-23% depending on the baseline (about 18% of Claude's cost, about 23% of Codex's)
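
The percentage range follows from which of the two estimates is used as the denominator. A quick sketch of the arithmetic, using only the figures quoted above (the variable names are illustrative):

```typescript
// Estimated totals for both tasks, as reported in the comparison.
const claudeCost = 2.5;
const codexCost = 2.04;

// Absolute gap between the two estimates.
const gap = claudeCost - codexCost;

// Same gap, two baselines: share of Claude's cost vs. share of Codex's cost.
const cheaperThanClaudePct = (gap / claudeCost) * 100;
const pricierThanCodexPct = (gap / codexCost) * 100;

console.log(`Gap: $${gap.toFixed(2)}`);
console.log(`Codex cost ${cheaperThanClaudePct.toFixed(1)}% less than Claude`);
console.log(`Claude cost ${pricierThanCodexPct.toFixed(1)}% more than Codex`);
```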

Takeaways

Neither agent “won.” Claude feels like pairing with someone who verifies everything before touching the keyboard. Codex feels like a senior dev who wants to ship and move on. Both got WebSocket broadcast under 10ms, which wasn't a given six months ago. Neither leaked any types or hallucinated tool names.

📖 Read the full source: r/ClaudeAI
