Voice-Controlled Claude Code: Multi-Agent System on Mac

A developer on r/ClaudeAI built a weekend project that adds voice control to Claude Code on macOS, complete with a wake word, a WebRTC voice loop, and a multi-agent orchestration system. What started as a convenience hack turned into a system where a lead agent decomposes tasks, recruits sub-agents, and runs them in parallel with auto-triggered QA passes.

How it works

Wake word: "Yabby" triggers the voice loop. The developer chose a custom wake word to avoid conflicts with Siri or other assistants.
Voice loop: WebRTC handles real-time audio streaming. The system uses OpenAI's Realtime API for speech-to-text and text-to-speech. The target round-trip latency is under 300 ms, though the API sometimes adds delays that push past it.
Lead agent: Receives the voice request, performs a discovery phase, creates a project plan, and recruits a small team (manager + 2-3 sub-agents) to execute steps.
Parallel execution: Sub-agents run in parallel where their steps are independent and sequentially otherwise. Each agent gets its own Claude Code CLI session with a separate thread, so conversations don't bleed into one another.
Auto-QA: When a sub-agent finishes, a review pass is triggered with a 5-second debounce to prevent pile-ups. During testing, one agent caught a bug written by another agent, an emergent behavior the developer didn't expect.
Plan approval modal: Before any agent executes, a modal pops up for the user to vet the plan. This prevents the system from running unverified actions.
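The post doesn't show how the Auto-QA debounce is implemented. As a minimal sketch, assuming a trailing-edge debounce (every finished agent restarts the timer, and one review covers everything that finished in the window), it could look like this in Python with asyncio. The DebouncedReview class, the review callback, and the agent IDs are hypothetical names for illustration, not taken from the project.

```python
import asyncio


class DebouncedReview:
    """Batch finished-agent events and run one review after a quiet period."""

    def __init__(self, review, delay=5.0):
        self._review = review   # hypothetical async callback: review(list_of_agent_ids)
        self._delay = delay     # quiet period in seconds (5 s in the post)
        self._pending = []      # agents that finished since the last review
        self._handle = None     # currently scheduled timer, if any
        self._tasks = set()     # keeps running review tasks referenced

    def agent_finished(self, agent_id):
        """Record a finished agent and restart the debounce timer."""
        self._pending.append(agent_id)
        if self._handle is not None:
            self._handle.cancel()
        loop = asyncio.get_running_loop()
        self._handle = loop.call_later(self._delay, self._fire)

    def _fire(self):
        """Timer expired: hand the accumulated batch to a single review pass."""
        self._handle = None
        batch, self._pending = self._pending, []
        task = asyncio.get_running_loop().create_task(self._review(batch))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)


async def demo():
    batches = []

    async def review(batch):
        batches.append(batch)

    qa = DebouncedReview(review, delay=0.2)  # short delay so the demo runs quickly
    qa.agent_finished("agent-1")
    await asyncio.sleep(0.02)
    qa.agent_finished("agent-2")  # arrives inside the window, joins the same review
    await asyncio.sleep(0.5)
    qa.agent_finished("agent-3")  # arrives after the window, gets its own review
    await asyncio.sleep(0.5)
    return batches


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [['agent-1', 'agent-2'], ['agent-3']]
```

With this shape, several sub-agents finishing close together produce one review pass instead of several overlapping ones, which is the pile-up the debounce is meant to avoid.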

Pain points

Speaker verification: Uses cosine similarity on speaker embeddings. The threshold is hard to tune: too tight and it rejects the user when they have a cold; too loose and anyone in the room can trigger commands.
Locale issues: French is the default locale because it was hard-coded that way. The developer is gradually fixing it.
Background task lifecycle: When the parent Claude Code CLI process exits, background tasks die silently. The developer wrote an OS-level PID watcher with a bookkeeper shell script to track which long-lived servers have crashed.
Over-planning: The lead agent sometimes produces a four-phase project plan for trivial requests like renaming a file.
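To make the speaker-verification trade-off concrete, here is a minimal sketch of a cosine-similarity check in plain Python. The function names, the 0.75 default threshold, and the three-dimensional toy vectors are assumptions for illustration; the post does not say which embedding model or threshold the developer uses, and real speaker embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def is_enrolled_speaker(sample, enrolled, threshold=0.75):
    """Accept the sample if it is close enough to the enrolled voiceprint.

    The default threshold is a hypothetical value, not the project's setting.
    """
    return cosine_similarity(sample, enrolled) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.4]     # toy voiceprint of the owner
    owner_today = [0.8, 0.2, 0.5]  # same speaker, slightly different voice
    guest = [0.1, 0.9, 0.2]        # someone else in the room

    print(is_enrolled_speaker(owner_today, enrolled))                  # True
    print(is_enrolled_speaker(guest, enrolled))                        # False
    print(is_enrolled_speaker(owner_today, enrolled, threshold=0.99))  # False: too tight
    print(is_enrolled_speaker(guest, enrolled, threshold=0.2))         # True: too loose
```

The last two calls show both failure modes from the post: a threshold near 1.0 locks out the owner on an off day, and a low one lets a bystander through.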

Open questions

The developer is still figuring out how to reduce verbosity in the QA phase, whether to let sub-agents recruit their own sub-agents (recursive delegation), and how to keep voice latency under 300ms when the Realtime API gets cranky. They're also curious how Anthropic's official voice mode (rolled out to 5% of users) will handle multi-agent coordination.

📖 Read the full source: r/ClaudeAI

