RouteLLM Setup for Cost-Effective AI Task Routing

✍️ OpenClawRadar📅 Published: March 9, 2026🔗 Source

Docker Compose Configuration for Hybrid AI Setup

A Reddit user posted a detailed Docker Compose setup that implements what they call "Poor Man's Superintelligence" - a hybrid AI system that routes tasks between local and cloud models based on complexity.

Core Components

The system uses four main services:

vscode-openwire: Uses image sendmeticket/vscode-openwire:1.0.0 with ports 3000 and 3030 exposed. This provides access to GitHub Copilot through OpenWire, though the source notes this may violate TOS and suggests using an available API key instead.
ollama: Runs ollama/ollama:latest with port 11434 exposed. It automatically pulls and serves the qwen3.5:4b model as the local "weak" model.
openroutellm: Uses image sendmeticket/openroutellm:1.0.0 on port 6060. This is the routing service that decides which model handles each request.
openclaw: Runs ghcr.io/openclaw/openclaw:latest with ports 18789 and 18790 exposed, serving as the main interface.

RouteLLM Configuration

The openroutellm service is configured with specific parameters:

python -m routellm.openai_server --routers bert --default-router-threshold 0.75 --port 6060 --openwire-base-url http://vscode-openwire:3030/v1 --ollama-base-url http://ollama:11434/v1 --strong-model gpt-4o --weak-model qwen3.5:4b

This setup uses BERT-based routing with a 0.75 threshold to determine when to send tasks to the "strong" model (GPT-4o) versus the local "weak" model (Qwen3.5:4b).

How It Works

The system routes difficult tasks to the paid GPT-4o model through OpenWire/Copilot, while simpler tasks are handled by the local Qwen3.5:4b model running in Ollama. This creates what the author describes as a "fail-safe, local-first AI model with low base intelligence but really high max intelligence."

All services are connected through a custom Docker network (openclaw_net with subnet 172.10.10.0/24) and include health checks to ensure service availability.

📖 Read the full source: r/LocalLLaMA

👀 See Also

Tools

Collection of 177 OpenClaw SOUL.md Templates Organized into 24 Categories

A developer has compiled 177 ready-to-use SOUL.md templates for OpenClaw agents across 24 categories including Marketing, Development, Business, DevOps, Finance, Creative, Data, Security, Healthcare, Legal, HR, and Education. All templates are MIT licensed and available on GitHub.

Mar 25, 2026, 05:45 PM UTC

OpenClawRadar

Tools

Vibeyard Adds Kanban Board for Managing Multiple Claude Code Sessions

An open-source IDE called Vibeyard now includes a Kanban board that lets you spin up Claude Code agent sessions directly from cards. Cards auto-move to Done when the agent finishes.

Apr 29, 2026, 12:18 AM UTC

OpenClawRadar

Tools

Academic Research Skills for Claude Code: A Human-in-the-Loop Pipeline for Paper Writing

Academic Research Skills (ARS) v3.7.0+ is a Claude Code plugin that automates reference hunting, citation formatting, data checking, and logical consistency review while keeping the human researcher in control. Install via /plugin marketplace add Imbad0202/academic-research-skills.

May 10, 2026, 04:17 PM UTC

OpenClawRadar

Tools

Jan Adds One-Click OpenClaw Installation with Jan-v3-Base Model Integration

Jan now supports one-click installation of OpenClaw with direct integration to the Jan-v3-base model, keeping all operations local and private on your computer.

Apr 18, 2026, 05:45 AM UTC

OpenClawRadar