Developer shares hybrid AI coding workflow: Claude for planning, local models for execution

Hybrid AI coding workflow reduces cloud costs
A developer on r/LocalLLaMA shared a detailed workflow that combines cloud and local AI models to reduce token costs while maintaining coding quality. The approach rests on the observation that many coding tasks don't require expensive cloud models.
The workflow architecture
The system follows a "Reason in the cloud, Execute locally" logic:
- Planner (Claude 3.5 Sonnet): Receives the task and generates a precise task_context.md file containing instructions, file paths, and logic. This costs approximately 300-500 tokens.
- Coder (Local Qwen2.5-Coder 30B via Ollama): Takes the specification and actual file content to write the code. This runs locally with zero cost.
- Validator: A simple Bash script runs tsc --noEmit or mypy for type checking.
- Reviewer (Local Qwen2.5-Coder 7B): Runs in parallel to check for obvious logic flaws.
- Auto-fix: If the build fails, the error log goes back to the local coder for 2-3 iterations.
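The validate-and-retry part of the steps above could be sketched roughly as follows. This is a minimal illustration, not the developer's code: the original pipeline is written in Bash, and the names auto_fix_loop, run_validator, and VALIDATORS, the "." argument passed to mypy, and the default of three fix attempts are all assumptions made for the example. The generate callable stands in for whatever sends the specification (plus any error log) to the local coder model.

```python
import subprocess
from typing import Callable, Optional, Tuple

# Type-check commands named in the article. The dictionary keys and the "."
# target for mypy are assumptions made for this sketch.
VALIDATORS = {
    "typescript": ["tsc", "--noEmit"],
    "python": ["mypy", "."],
}


def run_validator(language: str) -> Tuple[bool, str]:
    """Run the type checker for `language` and return (passed, combined log)."""
    proc = subprocess.run(VALIDATORS[language], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def auto_fix_loop(
    spec: str,
    generate: Callable[[str, Optional[str]], None],
    validate: Callable[[], Tuple[bool, str]],
    max_fix_attempts: int = 3,
) -> bool:
    """Generate code once, then feed the error log back until validation passes
    or the fix attempts run out. Returns True if the final validation passed."""
    generate(spec, None)  # first pass: specification only, no error log yet
    ok, log = validate()
    attempts = 0
    while not ok and attempts < max_fix_attempts:
        generate(spec, log)  # retry with the failing build log attached
        ok, log = validate()
        attempts += 1
    return ok
```

Passing the validator in as a callable keeps the loop independent of the language: for a TypeScript project it could be called with validate=lambda: run_validator("typescript").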
Implementation details
The entire pipeline is wrapped into a set of Bash scripts using just jq and curl to communicate with the Ollama API. The system auto-detects language standards (TypeScript, Python, C++, etc.) based on the planner's output and doesn't require heavy Python/Node runtimes.
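To make the Ollama communication concrete, here is a hedged sketch of the same idea in Python rather than the jq and curl the developer actually uses. It relies on Ollama's /api/generate endpoint on its default local port, with streaming disabled so the reply arrives as a single JSON object. The helper names (build_payload, ask_ollama, detect_language), the prompt layout, and the extension-based language detection are assumptions for illustration; the source does not say how its scripts detect the language.

```python
import json
import urllib.request
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical extension-to-language table; the real scripts may detect
# the language differently.
EXTENSION_LANGUAGE = {
    ".ts": "typescript",
    ".tsx": "typescript",
    ".py": "python",
    ".cpp": "cpp",
    ".hpp": "cpp",
}


def detect_language(paths: Iterable[str]) -> Optional[str]:
    """Return the most common known language among the planner's file paths."""
    counts = Counter(
        EXTENSION_LANGUAGE[PurePosixPath(p).suffix]
        for p in paths
        if PurePosixPath(p).suffix in EXTENSION_LANGUAGE
    )
    return counts.most_common(1)[0][0] if counts else None


def build_payload(model: str, spec: str, file_content: str) -> dict:
    """Combine the planner's spec and the current file into one request body."""
    prompt = f"{spec}\n\n--- current file ---\n{file_content}"
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(payload: dict, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """POST the payload to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

The model argument should be whichever model tag is installed locally; ask_ollama only works while an Ollama server is listening on that port.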
The developer notes that local models (even 30B ones) often fail at complex architectural reasoning but are surprisingly good at execution when given crystal-clear specifications.
Results and savings
On a recent TypeScript project involving 12 files changed:
- Claude usage was limited to the initial planning phase only
- Local models handled everything else: writing 12 files, linting, and reviewing
- Total savings: approximately 85% token reduction compared to doing everything inside the Claude Code CLI
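For readers who want to check a figure like this against their own usage, the reduction is simply one minus the ratio of cloud tokens spent in the hybrid workflow to cloud tokens spent in the all-cloud baseline. The sketch below shows that arithmetic; the function name and the token counts in the usage lines are hypothetical and were chosen only to illustrate an 85% result. They are not measurements from the post.

```python
def token_reduction(baseline_tokens: int, hybrid_tokens: int) -> float:
    """Fraction of cloud tokens saved relative to an all-cloud baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - hybrid_tokens / baseline_tokens


# Hypothetical numbers for illustration only: 10,000 cloud tokens for an
# all-cloud run versus 1,500 cloud tokens when only planning uses the cloud.
saving = token_reduction(baseline_tokens=10_000, hybrid_tokens=1_500)
print(f"{saving:.0%}")  # prints "85%"
```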
The developer has made the scripts available in a repository called ai-orchestrator on GitHub (username: Mybono) for those interested in implementation details.
📖 Read the full source: r/LocalLLaMA