Infracost cuts Claude token usage 79% by redesigning CLI for AI agents

Infracost, a CLI tool that estimates cloud infrastructure costs from Terraform, CloudFormation, and CDK, has redesigned its output for AI coding agents like Claude Code and Cursor. The result: up to 79% fewer output tokens and 67% lower API costs vs a bare-Claude baseline. The redesign revolves around two techniques: predicate pushdown into the CLI and a token-efficient output format.
Benchmark details
- 16 questions over a 3-project Terraform fixture with 1,171 resources
- Model: Claude Opus, 5 repeats per question
- Baseline: bare Claude with Bash and Read tools, no skill loaded
- Compared against the Infracost skill with the `--llm` output flag
Key results
| Metric | Bare Claude | With Infracost skill (--llm) | Change |
|---|---|---|---|
| Correct answers | 5 / 11 (45%) | 11 / 11 (100%) | +6 |
| Total cost (USD) | $16.41 | $9.63 | -41% |
| Output tokens | 207,017 | 81,697 | -61% |
| Wall time | 50 min | 50 min | tied |
One example: the question "count distinct resources failing the tagging policy, deduplicated across projects" cost $3.51 with bare Claude and hit the 25-turn cap, returning no answer. With the redesigned CLI, the same question cost $0.25 and returned the correct answer.
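The percentage changes in the table can be recomputed from the raw figures. The short check below uses only numbers quoted above; the helper name `pct_change` is ours, not part of the benchmark harness.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the results table and the tagging-policy example.
total_cost = pct_change(16.41, 9.63)
output_tokens = pct_change(207_017, 81_697)
tagging_question = pct_change(3.51, 0.25)

print(f"total cost:       {total_cost:.0f}%")        # -41%
print(f"output tokens:    {output_tokens:.0f}%")     # -61%
print(f"tagging question: {tagging_question:.0f}%")  # -93%
```

The single tagging-policy question shows a much larger cost drop than the aggregate, which suggests the savings are concentrated in questions where the baseline runs out of turns.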
Technical approach
- Predicate pushdown: Instead of having the agent pipe JSON through `jq` or write Python parsers, the CLI accepts filtering flags (e.g., `--tag-policy`), offloading computation to the tool itself. This reduces the number of turns and token consumption.
- Token-efficient output format: The `--llm` flag returns a compact, agent-friendly format rather than verbose human-readable tables or full JSON. This alone accounts for a significant share of the reduction.
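As a rough illustration of both ideas, the sketch below contrasts dumping everything as JSON with filtering and deduplicating inside the tool. Everything in it is an assumption made for illustration: the `Resource` type, the fixture data, the two function names, and the compact output layout are hypothetical and are not Infracost's data model or its actual `--llm` format. Character count stands in as a crude proxy for tokens.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Resource:
    project: str
    address: str
    monthly_cost: float
    tags_ok: bool


# Hypothetical fixture, invented for this example.
RESOURCES = [
    Resource("prod", "aws_instance.web", 62.05, True),
    Resource("prod", "aws_s3_bucket.logs", 4.10, False),
    Resource("dev", "aws_instance.web", 15.51, False),
    Resource("dev", "aws_s3_bucket.logs", 1.02, False),
]


def dump_everything(resources):
    """No pushdown: emit full JSON and leave the filtering to the agent."""
    return json.dumps([asdict(r) for r in resources], indent=2)


def failing_tag_policy(resources):
    """Pushdown: filter and deduplicate in the tool, then emit compact lines.

    Deduplication here is by resource address across projects, which is
    an assumption about what "distinct" means.
    """
    addresses = sorted({r.address for r in resources if not r.tags_ok})
    return "\n".join([f"failing={len(addresses)}", *addresses])


full = dump_everything(RESOURCES)
compact = failing_tag_policy(RESOURCES)
print(compact)
print(len(full), "chars of JSON vs", len(compact), "chars of compact output")
```

With pushdown, the agent reads an answer instead of raw material it still has to parse, so it needs neither extra turns nor the context to hold the full dump.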
Benchmark harness gotchas
Infracost open-sourced their harness setup to help others avoid pitfalls:
- Sandbox `HOME` for baseline runs to avoid accidental skill loading
- Set `TMPDIR` to a project-local directory to circumvent macOS ACL issues
- Prepend the test binary to `PATH` rather than relying on system install
- Use 5+ repeats per cell due to 20-30% token variance
- Re-run cells that hit turn caps (`--rerun-failed`) and re-score if the verifier changes (`--rescore`)
If you maintain a CLI that AI agents call as a subprocess, the same two moves — predicate pushdown and a dedicated agent output format — likely apply. The redesign also improved the human-facing CLI, though the article focuses on the agent path.
📖 Read the full source: HN AI Agents