Multi-Agent Haiku System Matches Claude Opus on Complex Number Theory Problem at 15x Lower Cost

Experimental Setup and Results
A Reddit user compared two Claude model configurations on a challenging number theory problem. The task was to prove that, for an odd prime p, the sum 1^(p-1) + 2^(p-1) + ... + (p-1)^(p-1) is congruent to -1 (mod p), using Fermat's Little Theorem and properties of primitive roots.
Two configurations were tested:
- Config X (Opus solo): Claude Opus 4.5 with max_tokens: 2048, no auditor
- Config Y (Haiku multi-agent): a Haiku generator produces the full proof and a second Haiku auditor checks every step, with two passes if the auditor flags anything; max_tokens: 1024 for each call
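The generator-plus-auditor flow of Config Y can be sketched roughly as follows. This is a minimal illustration, not the Reddit user's actual code: `solve_with_audit`, `generate`, and `audit` are hypothetical names, each callable stands in for one model call capped at 1024 tokens, and "two passes" is assumed to mean at most two generate/audit rounds.

```python
from typing import Callable, Tuple

# Hypothetical signatures; each callable is assumed to wrap a single model call.
Generator = Callable[[str, str], str]              # (problem, auditor feedback) -> proof
Auditor = Callable[[str, str], Tuple[bool, str]]   # (problem, proof) -> (verified, feedback)


def solve_with_audit(
    problem: str,
    generate: Generator,
    audit: Auditor,
    max_passes: int = 2,  # assumption: "two passes" means at most two rounds
) -> Tuple[str, str, int]:
    """Generate a proof, have the auditor check it, and retry if it is flagged."""
    feedback = ""
    proof = ""
    for attempt in range(1, max_passes + 1):
        # The generator sees the auditor's feedback from the previous round, if any.
        proof = generate(problem, feedback)
        verified, feedback = audit(problem, proof)
        if verified:
            return proof, "VERIFIED", attempt
    # The auditor still disagreed after the last allowed pass.
    return proof, "FLAGGED", max_passes
```

In the run described here the auditor verified the first proof, so a loop like this would exit after one round.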
Scoring and Performance
Both configurations scored 4/4 using this rubric:
- Correctly invokes Fermat's Little Theorem
- Correctly handles primitive root argument
- Summation over complete residue system valid
- Congruence conclusion follows correctly
The Haiku auditor returned VERIFIED with no disagreement. Performance metrics:
- Opus solo: ~8.7 seconds, score 4/4
- Haiku + auditor: ~10.9 seconds, score 4/4
Cost Analysis
Per-query cost, using the prices and approximate token counts the source reports:
- Opus solo: $0.075/1000 tokens × ~800 tokens = ~$0.06 per query
- Haiku + Haiku: $0.0025/1000 tokens × ~1600 tokens = ~$0.004 per query
This works out to approximately 15x lower cost for the same 4/4 rubric score on this one problem. The source described the problem as "genuinely hard" and not training-data-obvious in the way simpler proofs are.
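The cost figures above can be reproduced with a few lines of arithmetic. The sketch below uses only the prices and approximate token counts quoted in this article; the helper name `cost_per_query` is made up for illustration, and actual pricing may differ.

```python
def cost_per_query(price_per_1k_tokens: float, tokens: int) -> float:
    """Cost in dollars for one query at a flat per-1000-token price."""
    return price_per_1k_tokens * tokens / 1000


# Figures as quoted in the article (approximate token counts).
opus_cost = cost_per_query(0.075, 800)      # Opus solo
haiku_cost = cost_per_query(0.0025, 1600)   # Haiku generator + Haiku auditor
ratio = opus_cost / haiku_cost

print(f"Opus solo:     ${opus_cost:.3f} per query")
print(f"Haiku + Haiku: ${haiku_cost:.3f} per query")
print(f"Cost ratio:    {ratio:.1f}x")
```

Note that the multi-agent setup uses roughly twice as many tokens, so the saving comes entirely from the lower per-token price.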
The source notes that on clean problems like this one, where Fermat's Little Theorem does the heavy lifting (each a^(p-1) ≡ 1 (mod p), so the sum is p-1 ones, and p-1 ≡ -1 (mod p)), the auditor pattern adds about a 17% time tax to confirm correctness. The pattern is particularly valuable for problems where the generator might stumble, for example through quantization stutter or hallucinated algebra.
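The congruence itself is easy to sanity-check numerically. The snippet below is a quick check for a handful of small odd primes, not a proof and not part of the original experiment; `power_sum_mod` is a name chosen here for illustration.

```python
def power_sum_mod(p: int) -> int:
    """Return (1^(p-1) + 2^(p-1) + ... + (p-1)^(p-1)) mod p."""
    # pow(a, p - 1, p) is Python's built-in modular exponentiation.
    return sum(pow(a, p - 1, p) for a in range(1, p)) % p


for p in (3, 5, 7, 11, 13, 101):
    # By Fermat's Little Theorem every term is 1 mod p, so the sum is p - 1,
    # and p - 1 is the same residue as -1 mod p.
    assert all(pow(a, p - 1, p) == 1 for a in range(1, p))
    assert power_sum_mod(p) == p - 1
```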
📖 Read the full source: r/ClaudeAI