Building a voice-controlled multi-agent system on top of Claude Code

A developer on r/ClaudeAI built a weekend project that adds voice control to Claude Code on macOS, complete with a wake word, a WebRTC voice loop, and a multi-agent orchestration system. What started as a convenience hack turned into a system where a lead agent decomposes tasks, recruits sub-agents, and runs them in parallel with auto-triggered QA passes.
How it works
- Wake word: "Yabby" triggers the voice loop. The developer chose a custom wake word to avoid conflicts with Siri or other assistants.
- Voice loop: WebRTC handles real-time audio streaming. The system uses OpenAI's Realtime API for speech-to-text and text-to-speech. The target latency is under 300 ms, but the API sometimes introduces delays.
- Lead agent: Receives the voice request, performs a discovery phase, creates a project plan, and recruits a small team (manager + 2-3 sub-agents) to execute steps.
- Parallel execution: Sub-agents run in parallel where possible, sequentially otherwise. Each agent gets its own Claude Code CLI session with a separate thread, so context from one conversation doesn't bleed into another.
- Auto-QA: When a sub-agent finishes, a review pass is triggered with a 5-second debounce to prevent pile-ups. During testing, one agent caught a bug written by another agent — an emergent behavior the developer didn't expect.
- Plan approval modal: Before any agent executes, a modal pops up for the user to vet the plan. This prevents the system from running unverified actions.
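
The debounce in the Auto-QA step can be sketched roughly as follows. This is a minimal illustration, not the developer's code: the class name `DebouncedReview`, the `review` callback, and the idea of batching finished agents into one review are assumptions. Only the 5-second window comes from the post.

```python
import asyncio

DEBOUNCE_SECONDS = 5.0  # the debounce window mentioned in the post


class DebouncedReview:
    """Collapse a burst of 'agent finished' events into a single review pass.

    `review` is a hypothetical async callback that receives the list of
    agent ids that finished during the debounce window.
    """

    def __init__(self, review, delay=DEBOUNCE_SECONDS):
        self._review = review
        self._delay = delay
        self._pending = []    # agents that finished since the last review
        self._handle = None   # timer handle for the scheduled review, if any
        self._tasks = set()   # keep references so running reviews are not garbage-collected

    def agent_finished(self, agent_id):
        """Record a completion and restart the timer (must run inside the event loop)."""
        self._pending.append(agent_id)
        if self._handle is not None:
            self._handle.cancel()  # a newer event supersedes the old timer
        loop = asyncio.get_running_loop()
        self._handle = loop.call_later(self._delay, self._fire)

    def _fire(self):
        """Timer expired with no new events: start one review for the whole batch."""
        self._handle = None
        batch, self._pending = self._pending, []
        task = asyncio.create_task(self._review(batch))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
```

Because only the timer is cancelled, a review that has already started keeps running when another agent finishes; the new completion simply opens a fresh window. Whether the real system batches completions this way or reviews each agent separately is not stated in the post.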
Pain points
- Speaker verification: Uses cosine-similarity on speaker embeddings. The threshold is hard to tune — too tight rejects the user when they have a cold; too loose allows anyone in the room to trigger commands.
- Locale issues: French is the default locale because it was hardcoded that way when the code was first written. The developer is gradually fixing this.
- Background task lifecycle: When the parent Claude Code CLI process exits, background tasks die silently. The developer wrote an OS-level PID watcher with a bookkeeper shell script to track which long-lived servers have crashed.
- Over-planning: The lead agent sometimes produces a four-phase project plan for trivial requests like renaming a file.
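
To make the speaker-verification trade-off concrete, here is a minimal sketch of a cosine-similarity check on two embedding vectors. The function names and the `0.75` threshold are hypothetical, and the post does not say which embedding model the developer uses; the vectors here are plain lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no match"
    return dot / (norm_a * norm_b)


def is_enrolled_speaker(enrolled, candidate, threshold=0.75):
    """Accept the candidate if it is close enough to the enrolled voice.

    The threshold value is an assumption for illustration. Raising it
    rejects more impostors but also rejects the real user on a bad-voice
    day; lowering it does the opposite.
    """
    return cosine_similarity(enrolled, candidate) >= threshold
```

A single fixed threshold is the simplest design, which is also why it is hard to tune: the same number has to cover both the user's day-to-day voice variation and every other voice in the room.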
Open questions
The developer is still figuring out how to reduce verbosity in the QA phase, whether to let sub-agents recruit their own sub-agents (recursive delegation), and how to keep voice latency under 300ms when the Realtime API gets cranky. They're also curious how Anthropic's official voice mode (rolled out to 5% of users) will handle multi-agent coordination.
📖 Read the full source: r/ClaudeAI