NerfGuard: A Classifier That Routes Coding Requests to Cheaper Models, Cutting Spend 3x

✍️ OpenClawRadar · 📅 Published: June 6, 2026 · 🔗 Source

A team that had switched from Claude Code to Codex for its speed and steerability soon felt the weight of per-token pricing. Their daily bill was striking, and they noticed they were running top-tier models at maximum reasoning effort for every task, even trivial ones. So they built NerfGuard, a fast classifier that routes each request to the least expensive model and reasoning depth that can still handle it.

The core is a classifier that estimates the minimum intelligence (model tier and reasoning depth) a given coding request needs. On top of that, NerfGuard applies automated token-efficiency techniques. The result, according to the team, is roughly the same output quality at a fraction of the token spend. And because model capability and reasoning effort are bin-packed to each request instead of maxed out by default, responses also come back considerably faster. The team observed up to 3x savings, and says each engineer saves hours per day that would otherwise be spent waiting on tool turns and agent responses.
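To make the routing idea concrete, here is a minimal Python sketch, assuming three hypothetical tiers and a simple keyword heuristic standing in for NerfGuard's classifier, whose internals the source does not describe. The tier names, prices, capability levels, and hint words are all placeholders. The router predicts a difficulty score, keeps only the tiers capable of that difficulty, and picks the cheapest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    model: str           # hypothetical model name
    reasoning: str       # reasoning-effort setting sent with the request
    capability: int      # highest difficulty score this tier is trusted with
    usd_per_mtok: float  # placeholder price per million tokens


# Hypothetical tiers: not NerfGuard's configuration or any vendor's real pricing.
TIERS = [
    Tier("small-model", "low", 0, 0.5),
    Tier("mid-model", "medium", 1, 3.0),
    Tier("frontier-model", "high", 2, 15.0),
]

# Hypothetical keyword heuristic standing in for a trained classifier.
HARD_HINTS = ("architecture", "race condition", "refactor", "migrate", "design")
EASY_HINTS = ("rename", "typo", "format", "docstring", "comment")


def classify_difficulty(request: str) -> int:
    """Return a difficulty score: 0 = trivial, 1 = routine, 2 = hard."""
    text = request.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 2
    if any(hint in text for hint in EASY_HINTS) and len(text) < 200:
        return 0
    return 1


def route(request: str) -> Tier:
    """Pick the cheapest tier whose capability covers the predicted difficulty."""
    need = classify_difficulty(request)
    eligible = [t for t in TIERS if t.capability >= need]
    return min(eligible, key=lambda t: t.usd_per_mtok)


for req in ("Fix the typo in the README",
            "Add a unit test for parse_config",
            "Find the race condition in the worker pool"):
    tier = route(req)
    print(f"{req!r} -> {tier.model} ({tier.reasoning} reasoning)")
```

In this sketch, the typo fix lands on `small-model` at low reasoning, the unit test on `mid-model`, and the race-condition hunt on `frontier-model` at high reasoning. A real router would presumably swap `classify_difficulty` for a learned model, while the "cheapest eligible tier" selection step could stay the same.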


Key details from the source:

  • Classifier routes to cheapest model + reasoning depth for each request
  • Additional automatic token efficiency techniques
  • Result: up to 3x usage for the same spend
  • Speed improvements: hours saved per person per day
  • More usage before hitting throttling limits
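The "3x usage for the same spend" figure is a ratio: the cost of sending every request to the top tier, divided by the blended cost per request once traffic is spread across cheaper tiers. The sketch below computes that ratio for a hypothetical routing mix; the relative costs and mix fractions are illustrative placeholders, not measured NerfGuard data, and the function name `usage_multiplier` is made up for this example.

```python
# Relative cost per request for each hypothetical tier (frontier = most expensive).
# Illustrative placeholders only, combining model price and reasoning tokens.
COST = {"small": 1.0, "mid": 4.0, "frontier": 12.0}


def usage_multiplier(mix: dict[str, float], cost: dict[str, float] = COST) -> float:
    """How many times more requests fit in the same budget when traffic is
    routed according to `mix` instead of sending everything to the frontier tier."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    blended = sum(frac * cost[tier] for tier, frac in mix.items())
    return cost["frontier"] / blended


# Half the requests trivial, a quarter routine, a quarter hard.
print(round(usage_multiplier({"small": 0.5, "mid": 0.25, "frontier": 0.25}), 2))
```

With these made-up numbers the blended cost is 4.5 against 12 for all-frontier, so the same budget covers about 2.67x as many requests. Whether a team actually reaches 3x depends on how much of its traffic is genuinely easy and on the price gap between tiers.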

NerfGuard is currently used by engineers at multiple AI companies. The tool is available at nerfguard.com.

Who it's for: Teams using coding agents (Claude Code, Codex, etc.) who want to maximize output per dollar and reduce wait times.

📖 Read the full source: HN AI Agents

