RelayPlane Open Source Proxy Shows 73% Cost Reduction with Claude Model Routing

Open Source Proxy for Claude API Routing
RelayPlane is an open source, npm-native proxy that sits in front of the Anthropic API. The tool was built using Claude Code, which accelerated development. It's free to self-host and designed to handle routing between different Claude models based on prompt complexity.
Benchmark Results and Configuration
The benchmark used a mixed workload with 60% simple tasks and 40% complex tasks. Two scenarios were compared:
- Direct (all Sonnet): p50 latency 1.55s, cost per 10 requests $0.0323
- Via RelayPlane with routing: p50 latency 0.78s, cost per 10 requests $0.0086
Routing cut cost by 73.4% and roughly halved p50 latency. At 10,000 requests per day, the cost difference comes to approximately $712 in monthly savings.
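The arithmetic behind these figures can be checked with a short script. This is a minimal sketch using only the two per-10-request costs reported above; the 30-day month is an assumption, since the source does not say what month length was used.

```python
# Figures reported in the benchmark (USD per 10 requests).
DIRECT_COST_PER_10 = 0.0323
ROUTED_COST_PER_10 = 0.0086

REQUESTS_PER_DAY = 10_000
DAYS_PER_MONTH = 30  # assumption: not stated in the source


def cost_reduction(direct: float, routed: float) -> float:
    """Fractional saving of the routed setup relative to the direct one."""
    return (direct - routed) / direct


def monthly_savings(direct: float, routed: float,
                    requests_per_day: int, days: int) -> float:
    """Dollar savings per month, given costs quoted per 10 requests."""
    saving_per_request = (direct - routed) / 10
    return saving_per_request * requests_per_day * days


reduction = cost_reduction(DIRECT_COST_PER_10, ROUTED_COST_PER_10)
savings = monthly_savings(DIRECT_COST_PER_10, ROUTED_COST_PER_10,
                          REQUESTS_PER_DAY, DAYS_PER_MONTH)

print(f"cost reduction: {reduction:.1%}")
print(f"monthly savings: ${savings:.0f}")
```

With a 30-day month this gives 73.4% and about $711, within a dollar of the quoted figure; the small gap is consistent with rounding in the published per-10-request costs.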
Routing Configuration
The routing configuration is straightforward:
{
  "routing": {
    "complexity": {
      "enabled": true,
      "simple": "claude-haiku-4-5",
      "moderate": "claude-sonnet-4-6",
      "complex": "claude-opus-4-6"
    }
  }
}

The routing logic uses a complexity classifier that examines token count, code indicators, and analytical keywords. Response headers include x-relayplane-routed-model to verify which model actually processed the request.
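To make the idea concrete, here is a rough sketch of how a classifier built on those three signals might look. It is not RelayPlane's actual implementation: the thresholds, keyword list, regex, scoring rule, and function names are all hypothetical, chosen only for illustration. Only the tier-to-model mapping comes from the configuration above.

```python
import re

# Hypothetical thresholds and patterns (assumptions, not RelayPlane's values).
SIMPLE_MAX_TOKENS = 50
COMPLEX_MIN_TOKENS = 400
CODE_PATTERN = re.compile(r"\bdef \w+\(|\bfunction\b|\bclass \w+|=>|#include")
ANALYTICAL_KEYWORDS = ("analyze", "compare", "trade-off", "architecture", "prove")

# Tier-to-model mapping copied from the routing configuration.
MODEL_FOR_TIER = {
    "simple": "claude-haiku-4-5",
    "moderate": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
}


def estimate_tokens(prompt: str) -> int:
    """Crude token estimate: roughly four characters per token."""
    return max(1, len(prompt) // 4)


def classify(prompt: str) -> str:
    """Add one point per triggered signal and map the total to a tier."""
    tokens = estimate_tokens(prompt)
    lowered = prompt.lower()
    score = 0
    if tokens > SIMPLE_MAX_TOKENS:
        score += 1
    if tokens > COMPLEX_MIN_TOKENS:
        score += 1
    if CODE_PATTERN.search(prompt):
        score += 1
    if any(keyword in lowered for keyword in ANALYTICAL_KEYWORDS):
        score += 1
    if score == 0:
        return "simple"
    if score == 1:
        return "moderate"
    return "complex"


def route(prompt: str) -> str:
    """Return the model name configured for the prompt's tier."""
    return MODEL_FOR_TIER[classify(prompt)]


print(route("What is the capital of France?"))
print(route("Compare these two approaches"))
print(route("Analyze this function:\ndef foo(x):\n    return x"))
```

Under these made-up rules, a short factual question scores zero and goes to Haiku, a single triggered signal goes to Sonnet, and two or more go to Opus. A heuristic like this will misclassify some prompts.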
Model Pricing and Routing Logic
The routing system directs prompts to appropriate models based on complexity:
- Simple prompts → Haiku ($0.80 per million tokens)
- Moderate prompts → Sonnet ($3 per million tokens)
- Complex prompts → Opus ($15 per million tokens)
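The effect of these price gaps on a mixed workload can be estimated with a few lines of code. This is an illustrative sketch only: it treats each listed price as a flat per-token rate (the source does not say whether the figures are input or output prices), and the 1,000 tokens per request and the 60/40 Haiku/Sonnet split are hypothetical values, not the benchmark's actual workload.

```python
# Per-million-token prices as listed above (USD).
PRICE_PER_MILLION = {"haiku": 0.80, "sonnet": 3.00, "opus": 15.00}


def cost_usd(model: str, tokens: int) -> float:
    """Cost of processing `tokens` tokens on `model` at a flat rate."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[model]


def blended_cost(mix: dict[str, float], tokens_per_request: int) -> float:
    """Average cost per request when `mix` gives each model's share of traffic."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * cost_usd(model, tokens_per_request)
               for model, share in mix.items())


# Hypothetical workload: 1,000 tokens per request.
TOKENS_PER_REQUEST = 1_000
all_sonnet = blended_cost({"sonnet": 1.0}, TOKENS_PER_REQUEST)
routed = blended_cost({"haiku": 0.6, "sonnet": 0.4}, TOKENS_PER_REQUEST)

print(f"all Sonnet: ${all_sonnet:.5f} per request")
print(f"routed:     ${routed:.5f} per request")
print(f"saving:     {(all_sonnet - routed) / all_sonnet:.0%}")
```

The numbers this prints are not expected to match the benchmark, which depends on real token counts and on the classifier's actual decisions. The sketch only shows why moving simple prompts to a cheaper tier dominates the savings.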
The author notes the classifier isn't perfect but is "good enough to capture most of the savings." The full benchmark methodology is available in a Gist linked in the source material.
📖 Read the full source: r/ClaudeAI