Routing Claude API traffic to control costs after Max subscription change

API billing migration and cost implications
As of noon PT, Anthropic's Max subscription no longer covers usage from third-party tools like OpenClaw. All OpenClaw users are now on API billing with these rates:
- Claude Opus 4.6: $5 per million input tokens, $25 per million output tokens
- Claude Sonnet 4.6: $3 per million input tokens, $15 per million output tokens
- Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens
A heavy OpenClaw session on Opus can cost $1-4, while the same session on Sonnet costs $0.20-0.80 with similar results for most tasks.
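To make the rate list concrete, here is a minimal sketch of the cost arithmetic. The prices come from the list above; the function name `session_cost` and the session size (200,000 input tokens, 40,000 output tokens) are assumptions chosen for illustration, not figures from the source.

```python
# Per-million-token prices in USD, copied from the rate list above.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 1.00, "output": 5.00},
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a session at the listed per-token prices."""
    rate = PRICES[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical session size, used only to illustrate the calculation.
    for model in PRICES:
        print(f"{model}: ${session_cost(model, 200_000, 40_000):.2f}")
    # opus: $2.00
    # sonnet: $1.20
    # haiku: $0.40
```

At identical token counts the listed prices put Sonnet at 60% of the Opus cost, so any larger gap depends on how many tokens each model actually consumes in a session.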
The routing solution
Most OpenClaw operations don't require Opus: heartbeat checks, file reads, summaries, routing decisions, and short tool calls can all be handled by Sonnet. Without a routing layer, every request hits your default model, potentially wasting Opus budget on simple tasks.
A local proxy can route Claude requests by complexity: simple tasks go to Sonnet automatically, and complex ones escalate to Opus. According to the source post, this approach significantly reduced costs without quality loss on important tasks.
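The routing idea can be sketched as a small decision function. This is an illustrative heuristic only, not the actual logic of the proxy: the task labels, the `choose_model` name, and the token cutoff are all hypothetical.

```python
# Task kinds the article lists as not requiring Opus (labels are hypothetical).
SIMPLE_TASKS = {"heartbeat", "file_read", "summary", "routing", "short_tool_call"}

# Hypothetical cutoff: longer prompts are treated as complex regardless of kind.
MAX_SIMPLE_PROMPT_TOKENS = 2_000


def choose_model(task_kind: str, prompt_tokens: int) -> str:
    """Pick a model tier: Sonnet for short, simple tasks, Opus otherwise."""
    if task_kind in SIMPLE_TASKS and prompt_tokens <= MAX_SIMPLE_PROMPT_TOKENS:
        return "sonnet"
    return "opus"


if __name__ == "__main__":
    print(choose_model("heartbeat", 50))      # sonnet
    print(choose_model("refactor", 50))       # opus
    print(choose_model("summary", 10_000))    # opus
```

Defaulting to Opus for anything unrecognized keeps the failure mode on the expensive side rather than the low-quality side; a real router would likely use richer signals than a task label and a token count.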
The proxy is open source and installable via npm:
    npm install -g @relayplane/proxy
Detailed documentation and discussion are available on r/ClaudeCode, where the solution has received 52K views.
📖 Read the full source: r/openclaw