Orkestra: Cost-Aware LLM Routing Layer for OpenClaw Reduces API Costs by 60-80%

What Orkestra Does
Orkestra is a cost-aware LLM routing layer built for OpenClaw that reduces API costs by 60-80%. It's a modular architecture that sits in front of model calls and decides which tier should handle each request based on semantic similarity.
How It Works
When a prompt comes in, it gets embedded and passed through a lightweight KNN classifier trained on previously labeled workloads. Based on semantic similarity, the router categorizes it as budget, balanced, or premium and forwards the call accordingly.
There's no prompt rewriting and no complex rule tree — just semantic classification at call time. The reduction in API costs comes primarily from preventing simpler prompts from defaulting to the most expensive models.
Integration with OpenClaw
Orkestra plugs in as an OpenClaw skill via a local proxy, so existing pipelines stay completely intact. The agent calls it through bash/curl to an OpenAI-compatible endpoint on 127.0.0.1:8765.
The response includes full cost transparency with the fields _orkestra.cost and _orkestra.savings_percent.
Supported Providers and Configuration
- Supported providers: Google (Gemini), Anthropic (Claude), OpenAI
- Routes across budget/balanced/premium tiers within each provider
- Supports multi-provider mode across all three providers
- Repository and OpenClaw integration available at: github.com/imperativelabs/orkestra
- See
integrations/openclaw/for the skill files, proxy, and config examples
📖 Read the full source: r/openclaw
👀 See Also

Chrome Extension Adds Live Preview to Claude Code Web
A Chrome extension called Claude Code Preview adds live preview functionality to Claude Code Web, similar to Lovable and other 'vibecoding' sites, allowing side-by-side viewing of deployments.

llm-use – An Open-Source Framework for Routing and Orchestrating Multi-LLM Agent Workflows
llm-use is revolutionizing automation with its open-source framework designed to efficiently route and orchestrate multi-LLM agent workflows. Explore its impact on AI operations.

Open Source Grafana Dashboard Tracks Claude Code Costs and Usage via OpenTelemetry
An SRE built a free Grafana dashboard to visualize Claude Code spend, token usage, cache hit ratios, and edit decisions by pulling OpenTelemetry metrics into Prometheus-compatible backends.

Open-source CLI uses Claude Haiku to automate Xero expense auditing
A developer has released an open-source Python CLI tool that uses Claude Haiku 4.5 to automate Xero expense auditing. The tool follows a 'deterministic code first, then AI to fill in the gaps' approach, keeping costs to a few cents per audit run.