Cut Agent Costs From $18 to $4 With Model Routing

One developer on r/ClaudeAI describes a practical cost-optimization strategy for agent loops: route routine subtasks to cheap models and reserve expensive models (Opus 4.7) only for complex reasoning. Their refactoring agent — handling CSS variable renames, YAML config updates, and linter runs via MCP — originally sent every step to Opus 4.7 at a total of about $18. After implementing routing logic, 178 of 212 steps went to cheap models, reducing cost to roughly $4 with no observable quality difference on routine changes.

Routing Logic

Hard subtasks → Opus 4.7: Component architecture, debugging 2am code, anything requiring sustained reasoning across long conversations. The author notes Opus is genuinely unmatched at that kind of work — a previous attempt to route an auth middleware bug to a cheaper model silently broke session handling, costing an hour to trace.
Routine subtasks → cheaper models: Lint, rename, config edits, tool orchestration. The author settled on DeepSeek V4 Pro for general coding chores and Tencent Hunyuan Hy3 preview for heavy tool calling. As of late April, Hunyuan Hy3 was ranked #1 on OpenRouter by tool call volume and almost never botches a function call when the schema is clean.

Cost Comparison

Opus 4.7: ~$0.18 per million input tokens (estimated from context of ~28x cheaper alternative).
Tencent Hunyuan Hy3: $0.18 per million input tokens, $0.59 per million output — roughly 28x cheaper than Opus 4.7 on input.
Same 212-step refactor: 178 steps to cheap tier, 34 steps to Opus. Cost dropped from $18 to ~$4.

Failure Modes

The tool-calling model hallucinates parameters when schemas are sloppy (author admits schemas were bad).
DeepSeek V4 Pro occasionally writes syntactically perfect code that does the opposite of what was asked, surviving a quick skim.
Neither cheap model can match Opus for debugging deep issues (e.g., auth flow silently eating a cookie).

Decision Heuristic

The author's routing rule of thumb: "How expensive is a wrong answer to catch?" A bad lint fix costs a 2-second git revert; a bad architecture call costs the whole afternoon.

The savings enabled previously skipped tasks — like writing and running tests for every CSS change, or regenerating all Open Graph images — because at fractions of a cent per tool call there's no reason not to.

📖 Read the full source: r/ClaudeAI