Using the Dispatcher Pattern to Reduce Claude API Costs by 95%

A developer building AI agents discovered a cost optimization pattern after burning $40 in one hour on Claude API tokens for routine tasks like debugging code, writing PRs, drafting emails, and research. The solution leverages their existing $200/month Claude Max subscription, which includes unlimited Claude Code CLI usage within rate limits.
The Dispatcher Pattern
The approach involves creating a lightweight AI agent that acts as a dispatcher. This agent reads user messages, decides what action to take, and delegates heavy work to Claude Code CLI, which runs on the Max subscription at no additional cost. Only the thin orchestration layer remains on the API: "What did the user ask? Ok, delegate to Claude Code. Report back the result."
Tasks that can be delegated include:
- Coding
- Marketing copy
- Email drafts
- Sales outreach
- Research
- Content writing
- Data analysis
- Reddit posts
Cost Comparison
- Pure API (Opus, heavy usage): $800-$2,000+/month
- Max subscription + dispatcher pattern: $200/month flat
- API cost for dispatcher overhead only: ~$5-15/month
- Total with dispatcher pattern: ~$215/month vs $1,000+/month
Setup Instructions
# 1. Install Claude Code CLI
npm install -g /claude-code
2. Login to claude code with Max subscription
3. Configure delegation
openclaw config set plugins.entries.acpx.enabled true
openclaw config set plugins.entries.acpx.config.permissionMode approve-all
openclaw config set acp.enabled true
openclaw config set acp.defaultAgent claude
openclaw config set 'acp.allowedAgents' '["claude"]' --json
4. (Optional) Add observability
pip install clawmetry && clawmetry onboard
The developer used ClawMetry, an open-source observability dashboard for OpenClaw agents, to track token usage per session, cost per task, and set alerts for API spend thresholds. The tool showed a dramatic cost reduction after implementing the dispatcher pattern, with most previous spending going to tasks that Claude Code handles on the subscription.
📖 Read the full source: r/openclaw
👀 See Also

‘White Monkey’ Failure Mode: How Persistent Agents Get Stuck on Wrong Facts
A cross-architecture study of 'reconstruction substrate contamination' — where wrong facts in wake-state files replicate across sessions. Includes a 6-question survey for persistent agents.

OpenClaw LLM Timeout Fix for Cold Model Loading
A Reddit user identified and fixed a specific timeout issue in OpenClaw where cold-loaded local LLMs would fail after about 60 seconds, even with higher general timeouts set. The solution involves adjusting the embedded-runner LLM idle timeout configuration.

OpenClaw v2026.3.13 adds per-agent cacheRetention config for OpenAI token cost savings
OpenClaw v2026.3.13 adds per-agent cacheRetention configuration that enables OpenAI's 24-hour prompt cache retention, potentially cutting input token costs by up to 90% for agents with heartbeat cycles longer than 10 minutes.

After 3 months of A/B testing 160 Claude prompt codes: the boring takeaways
Samarth built a controlled test rig, ran 160 prompt codes through it, and found that most are placebo, 7 consistently shift reasoning, and stacking 3+ codes confuses the model. Skills files outperform prompt codes for Claude Code.