RouteLLM Setup for Cost-Effective AI Task Routing

✍️ OpenClawRadar📅 Published: March 9, 2026🔗 Source
RouteLLM Setup for Cost-Effective AI Task Routing
Ad

Docker Compose Configuration for Hybrid AI Setup

A Reddit user posted a detailed Docker Compose setup that implements what they call "Poor Man's Superintelligence" - a hybrid AI system that routes tasks between local and cloud models based on complexity.

Core Components

The system uses four main services:

  • vscode-openwire: Uses image sendmeticket/vscode-openwire:1.0.0 with ports 3000 and 3030 exposed. This provides access to GitHub Copilot through OpenWire, though the source notes this may violate TOS and suggests using an available API key instead.
  • ollama: Runs ollama/ollama:latest with port 11434 exposed. It automatically pulls and serves the qwen3.5:4b model as the local "weak" model.
  • openroutellm: Uses image sendmeticket/openroutellm:1.0.0 on port 6060. This is the routing service that decides which model handles each request.
  • openclaw: Runs ghcr.io/openclaw/openclaw:latest with ports 18789 and 18790 exposed, serving as the main interface.
Ad

RouteLLM Configuration

The openroutellm service is configured with specific parameters:

python -m routellm.openai_server --routers bert --default-router-threshold 0.75 --port 6060 --openwire-base-url http://vscode-openwire:3030/v1 --ollama-base-url http://ollama:11434/v1 --strong-model gpt-4o --weak-model qwen3.5:4b

This setup uses BERT-based routing with a 0.75 threshold to determine when to send tasks to the "strong" model (GPT-4o) versus the local "weak" model (Qwen3.5:4b).

How It Works

The system routes difficult tasks to the paid GPT-4o model through OpenWire/Copilot, while simpler tasks are handled by the local Qwen3.5:4b model running in Ollama. This creates what the author describes as a "fail-safe, local-first AI model with low base intelligence but really high max intelligence."

All services are connected through a custom Docker network (openclaw_net with subnet 172.10.10.0/24) and include health checks to ensure service availability.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also