Why Codex Still Beats Claude Code for Complex Python Monoliths

Over the last year, a developer working on a complex Python monolith has primarily used Codex. After a month of testing Claude Code with Opus 4.6 and 4.7, they still prefer Codex for this codebase. The application is not a simple CRUD server: it combines a newer DDD-ish layer, older well-structured code, and fragile legacy spaghetti code. The team avoids rewriting old parts unless necessary.
Key Advantages of Codex
- Harness-engineering principles: Codex reliably follows the harness-engineering workflow without explicit instructions. Claude only does so if AGENTS.md contains a directive like “Read exec_plan.md and follow it.”
- Reuses existing tools and patterns: Claude more often creates new tools instead of searching the codebase for existing ones. In a codebase with many project-specific helpers, reuse is critical.
- Better planning and context awareness: Claude frequently reads too little before placing new functionality. The developer had to repeatedly correct:
  - “Put this functionality in module A instead, not in the controller.”
  - “Do not construct the response object using the statuses you sent in the request. The API already returns the updated object: use that response.”
  - “Validate it in the same module that owns this boundary.”
Codex more often notices missing context and asks clarifying questions before making architectural changes.
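The second correction above (use the object the API returns instead of rebuilding it from the request) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the developer's codebase: `Order`, `FakeOrderApi`, and both `update_order*` functions exist only to show why echoing request values can drift from the server's actual state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical domain object; not from the original codebase."""
    id: int
    status: str
    version: int


class FakeOrderApi:
    """Stand-in for a real API client (an assumption for this sketch).

    Like many real services, it normalizes input and bumps a version
    counter, so the stored object can differ from what the caller sent.
    """

    def __init__(self) -> None:
        self._orders = {1: Order(id=1, status="new", version=1)}

    def update_status(self, order_id: int, status: str) -> Order:
        current = self._orders[order_id]
        updated = Order(
            id=order_id,
            status=status.strip().lower(),
            version=current.version + 1,
        )
        self._orders[order_id] = updated
        return updated


def update_order_from_request(api: FakeOrderApi, order_id: int, status: str) -> Order:
    """Anti-pattern: ignores the API response and echoes the request values."""
    api.update_status(order_id, status)
    return Order(id=order_id, status=status, version=1)  # guessed fields


def update_order(api: FakeOrderApi, order_id: int, status: str) -> Order:
    """Preferred: return the updated object the API already provides."""
    return api.update_status(order_id, status)


if __name__ == "__main__":
    print(update_order_from_request(FakeOrderApi(), 1, " SHIPPED "))
    print(update_order(FakeOrderApi(), 1, " SHIPPED "))
```

In this sketch the rebuilt object keeps the raw status string and a stale version number, while the returned object reflects what the fake service actually stored.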
Where Claude Excels
For frontend work, Opus 4.6 was much better than Codex 5.3 and GPT-5.4. The developer currently prefers Claude for UI tasks. They have not tested GPT-5.5 on UI-heavy work yet.
Tool Configuration
Both LLMs use a single shared skill: commands to start and stop Docker Compose and run tests inside the container.
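The source does not show the skill itself, so the following is only a rough sketch of what such a helper might look like. The service name `app`, the use of `pytest` as the test runner, and all function names are assumptions; the `docker compose up -d`, `docker compose down`, and `docker compose exec -T` invocations are standard Docker Compose CLI commands.

```python
import subprocess
from typing import Sequence

SERVICE = "app"  # hypothetical Compose service name; adjust to the project


def compose_up() -> list[str]:
    """Command to start the stack in the background."""
    return ["docker", "compose", "up", "-d"]


def compose_down() -> list[str]:
    """Command to stop and remove the stack's containers."""
    return ["docker", "compose", "down"]


def run_tests(extra_args: Sequence[str] = ()) -> list[str]:
    """Command to run tests inside the running container.

    Assumes pytest is the test runner (not stated in the source).
    -T disables pseudo-TTY allocation, which suits non-interactive agents.
    """
    return ["docker", "compose", "exec", "-T", SERVICE, "pytest", *extra_args]


def run(cmd: Sequence[str], dry_run: bool = False) -> int:
    """Execute a command and return its exit code; only print it when dry_run is set."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(list(cmd), check=False).returncode


if __name__ == "__main__":
    # Dry run so the sketch can be tried without Docker installed.
    for command in (compose_up(), run_tests(["-q"]), compose_down()):
        run(command, dry_run=True)
```

Keeping command construction separate from execution makes the helper easy to check without a running Docker daemon.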
This is not a benchmark, just daily-use experience from one production codebase.
📖 Read the full source: HN AI Agents