Two-Agent Haiku System Matches Claude Opus at 15x Lower Cost

Experimental Setup and Results

A Reddit user conducted a comparative test between two Claude model configurations on a challenging number theory problem. The problem required proving that for an odd prime p, the sum 1^(p-1) + 2^(p-1) + ... + (p-1)^(p-1) is congruent to -1 (mod p), using Fermat's Little Theorem and properties of primitive roots.
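The claimed congruence is easy to sanity-check numerically before looking at any proof. The sketch below is not from the Reddit post; the function name `power_sum_mod` is made up for illustration, and the check only covers a handful of small primes, so it is evidence rather than a proof.

```python
def power_sum_mod(p: int) -> int:
    """Return (1^(p-1) + 2^(p-1) + ... + (p-1)^(p-1)) mod p."""
    # pow(a, e, p) is Python's built-in modular exponentiation.
    return sum(pow(a, p - 1, p) for a in range(1, p)) % p


# For an odd prime p the statement says the sum is -1 mod p, i.e. p - 1.
for p in (3, 5, 7, 11, 13, 101):
    assert power_sum_mod(p) == p - 1
```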

Two configurations were tested:

Config X (Opus solo): Claude Opus 4.5 with max_tokens: 2048 and no auditor
Config Y (Haiku multi-agent): a Haiku generator produces the full proof and a second Haiku instance audits every step, with two passes if the auditor flags anything; max_tokens: 1024 per call
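The control flow of Config Y can be sketched as a small loop. This is a minimal sketch, not the poster's code: the model calls are represented by plain callables (`generate`, `audit`) that you would wire to your own client, the feedback prompt wording is invented, and reading "two passes" as "at most one regeneration after a flag" is an assumption. The only detail taken from the post is that the auditor's approving verdict is the token VERIFIED.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins for the two Haiku calls (each capped at 1024 tokens
# in the post). Any function with these shapes will work.
Generator = Callable[[str], str]       # prompt -> proof text
Auditor = Callable[[str, str], str]    # (problem, proof) -> verdict text


def solve_with_audit(
    problem: str,
    generate: Generator,
    audit: Auditor,
    max_passes: int = 2,
) -> Tuple[str, str, int]:
    """Generate a proof, audit it, and retry with feedback if it is flagged.

    Returns (final proof, final verdict, number of passes used).
    """
    prompt = problem
    for passes in range(1, max_passes + 1):
        proof = generate(prompt)
        verdict = audit(problem, proof)
        if verdict.strip().upper().startswith("VERIFIED"):
            break
        # Assumed retry strategy: hand the auditor's objections back to the generator.
        prompt = (
            f"{problem}\n\nAn auditor flagged the previous proof:\n"
            f"{verdict}\n\nWrite a corrected proof."
        )
    return proof, verdict, passes


# Offline demo with stub functions in place of real model calls.
if __name__ == "__main__":
    stub_generate = lambda prompt: "By Fermat's Little Theorem, ..."
    stub_audit = lambda problem, proof: "VERIFIED"
    print(solve_with_audit("Prove the congruence.", stub_generate, stub_audit))
```

Keeping the model calls behind plain callables means the loop can be tested without network access, and the generator and auditor can be swapped for different models independently.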

Scoring and Performance

Both configurations scored 4/4 using this rubric:

Correctly invokes Fermat's Little Theorem
Correctly handles primitive root argument
Summation over complete residue system valid
Congruence conclusion follows correctly

The Haiku auditor returned VERIFIED with no disagreement. Performance metrics:

Opus solo: ~8.7 seconds, score 4/4
Haiku + auditor: ~10.9 seconds, score 4/4

Cost Analysis

Estimated per-query cost for each configuration:

Opus solo: $0.075/1000 tokens × ~800 tokens = ~$0.06 per query
Haiku + Haiku: $0.0025/1000 tokens × ~1600 tokens = ~$0.004 per query

That is approximately 15x lower cost for identical rubric scores on this one problem. The problem was described as "genuinely hard" and not training-data-obvious like simpler proofs.
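The per-query figures above can be reproduced with a few lines of arithmetic. The rates and token counts are the article's own flat estimates, not official pricing, and the helper name `cost_per_query` is invented for this sketch.

```python
def cost_per_query(price_per_1k_tokens: float, tokens: int) -> float:
    """Flat-rate cost estimate: price per 1000 tokens times tokens used."""
    return price_per_1k_tokens / 1000 * tokens


opus_cost = cost_per_query(0.075, 800)      # Opus solo, ~800 tokens
haiku_cost = cost_per_query(0.0025, 1600)   # generator + auditor, ~1600 tokens total
ratio = opus_cost / haiku_cost

print(f"Opus solo:     ${opus_cost:.3f}")
print(f"Haiku + Haiku: ${haiku_cost:.3f}")
print(f"Cost ratio:    {ratio:.1f}x")
```

Note that the Haiku configuration uses twice as many tokens, so the 15x gap comes entirely from the 30x difference in the quoted per-token rate.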

The source notes that on clean problems where Fermat's Little Theorem does the heavy lifting (each a^(p-1) ≡ 1 (mod p), so the sum is p-1 ones, and p-1 ≡ -1), the auditor pattern adds about a 17% time tax to confirm correctness; the timings reported above (8.7 s versus 10.9 s) work out closer to 25%. The pattern is most valuable on problems where the generator might stumble with quantization stutter or hallucinated algebra.
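Written out in full, the Fermat's Little Theorem shortcut described above is a two-line derivation. This is the standard argument, typeset here for reference; it is not quoted from either model's output.

```latex
\begin{align*}
a^{p-1} &\equiv 1 \pmod{p}
  && \text{for } 1 \le a \le p-1 \text{, since } p \nmid a, \\
\sum_{a=1}^{p-1} a^{p-1} &\equiv \sum_{a=1}^{p-1} 1 = p - 1 \equiv -1 \pmod{p}.
\end{align*}
```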

📖 Read the full source: r/ClaudeAI
