RunAnywhere RCLI: On-Device Voice AI Pipeline for Apple Silicon

What RCLI Does
RCLI is a complete voice AI pipeline that runs speech-to-text, large language model inference, and text-to-speech entirely on-device on Apple Silicon Macs. It requires macOS 13+ on M1 or later chips and operates without cloud services or API keys.
Installation and Setup
Install via Homebrew:
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads ~1 GB of models
Or using curl:
curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
Performance Claims
The developers benchmarked on an M4 Max with 64GB RAM and report:
- LLM decode: 1.67x faster than llama.cpp, 1.19x faster than Apple MLX
- Qwen3-0.6B: 658 tokens/sec (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tokens/sec (vs mlx-lm 170, llama.cpp 87)
- Time-to-first-token: 6.6 ms
- STT: 70 seconds of audio transcribed in 101 ms (714x real-time, 4.6x faster than mlx-whisper)
- TTS: 178 ms synthesis (2.8x faster than mlx-audio and sherpa-onnx)
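The ratios behind these claims can be recomputed from the per-model figures. The sketch below uses only the numbers quoted in the list; the helper names `speedup` and `real_time_factor` are illustrative and not part of RCLI.

```python
# Tokens/sec figures copied from the reported benchmark list above.
reported = {
    "Qwen3-0.6B": {"rcli": 658, "mlx-lm": 552, "llama.cpp": 295},
    "Qwen3-4B": {"rcli": 186, "mlx-lm": 170, "llama.cpp": 87},
}


def speedup(fast: float, slow: float) -> float:
    """Throughput ratio: how many times faster `fast` is than `slow`."""
    return fast / slow


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


for model, tps in reported.items():
    for baseline in ("mlx-lm", "llama.cpp"):
        ratio = speedup(tps["rcli"], tps[baseline])
        print(f"{model}: {ratio:.2f}x vs {baseline}")

# 70 seconds of audio transcribed in 101 ms
print(f"STT: {real_time_factor(70, 0.101):.0f}x real-time")
```

Recomputed this way, the Qwen3-0.6B figures match the 1.19x claim against MLX. The per-model ratios against llama.cpp come out above 2x rather than 1.67x, and 70 s in 101 ms works out to roughly 693x rather than 714x. The headline numbers presumably come from a different aggregate or run, which the source does not specify.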
Key Features
- Three concurrent threads with lock-free ring buffers
- Double-buffered TTS (next sentence renders while current plays)
- 38 macOS actions controllable by voice
- Local RAG with ~4 ms retrieval over 5K+ document chunks
- 20 hot-swappable models
- Full-screen TUI with per-operation latency readouts
- Falls back to llama.cpp when MetalRT isn't installed
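The double-buffered TTS idea (render the next sentence while the current one plays) can be sketched with a producer thread and a one-slot queue. This is a minimal illustration under stated assumptions, not RCLI's implementation, which the source says uses lock-free ring buffers. `synthesize` and `play` are hypothetical stand-ins for a real TTS engine and audio output.

```python
import queue
import threading


def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a TTS engine: returns fake 'audio' bytes."""
    return sentence.encode("utf-8")


def play(audio: bytes, played: list) -> None:
    """Hypothetical stand-in for audio playback: records what was played."""
    played.append(audio.decode("utf-8"))


def speak(sentences: list) -> list:
    # A one-slot queue lets the producer render ahead while the consumer plays.
    ready = queue.Queue(maxsize=1)
    played = []

    def producer() -> None:
        for sentence in sentences:
            # put() blocks while the slot is still occupied, bounding memory use.
            ready.put(synthesize(sentence))
        ready.put(None)  # sentinel: no more audio

    worker = threading.Thread(target=producer)
    worker.start()
    while (audio := ready.get()) is not None:
        play(audio, played)
    worker.join()
    return played


print(speak(["Opening Safari.", "Anything else?"]))
```

The bounded queue keeps synthesis at most a sentence or two ahead of playback, so latency to the first spoken sentence stays low while later sentences are already rendered when needed.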
Voice Pipeline Components
- VAD: Silero voice activity detection
- STT: Zipformer streaming + Whisper/Parakeet offline
- LLM: Qwen3/LFM2/Qwen3.5 with KV cache continuation and Flash Attention
- TTS: Double-buffered sentence-level synthesis
- Tool Calling: LLM-native tool call formats
- Multi-turn Memory: Sliding window conversation history with token-budget trimming
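Sliding-window history with token-budget trimming can be sketched as follows. This is an assumption-laden illustration: `count_tokens` is a crude whitespace counter standing in for a real tokenizer, and `trim_history` is a hypothetical helper, not an RCLI function.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Crude stand-in: whitespace word count, not a real tokenizer."""
    return len(text.split())


def trim_history(turns, budget):
    """Keep the most recent turns whose combined token count fits the budget."""
    kept = deque()
    used = 0
    # Walk from newest to oldest and stop at the first turn that would overflow.
    for role, text in reversed(turns):
        cost = count_tokens(text)
        if used + cost > budget:
            break
        kept.appendleft((role, text))
        used += cost
    return list(kept)


history = [
    ("user", "open Safari"),
    ("assistant", "Opening Safari now"),
    ("user", "summarize the project plan"),
]
print(trim_history(history, budget=7))
```

With a budget of 7 the oldest turn is dropped and the two most recent turns are kept. A real pipeline would count tokens with the model's own tokenizer and would likely pin the system prompt outside the window.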
Usage Commands
rcli # interactive TUI with push-to-talk
rcli listen # continuous voice mode
rcli ask "open Safari" # one-shot command
rcli rag ingest ~/Documents/notes # index documents for RAG
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
TUI Controls
- SPACE: Push-to-talk
- M: Models browser for downloading and hot-swapping LLM/STT/TTS
- A: Actions browser to enable/disable macOS actions
- B: Run STT, LLM, TTS, and end-to-end benchmarks
- R: RAG document ingestion
- X: Clear conversation and reset context
- T: Toggle tool call trace
- ESC: Stop/close/quit
MetalRT Engine Details
MetalRT is RunAnywhere's proprietary GPU inference engine. It relies on Metal 3.1 features available on M3, M3 Pro, M3 Max, M4, and later chips; M1/M2 support is planned, and where MetalRT isn't installed RCLI falls back to llama.cpp. The engine runs quantized matmul, attention, and activation operations as custom Metal compute shaders that are compiled ahead of time and dispatched directly to the GPU, with zero allocations during inference.
macOS Actions
RCLI includes 43 macOS actions across categories:
- Productivity: create_note, create_reminder, run_shortcut
- Communication: send_message, facetime_call
- Media: play_on_spotify, play_apple_music, play_pause, next_track, set_music_volume
- System: open_app, quit_app, set_volume, toggle_dark_mode, screenshot, lock_screen
- Web: search_web, search_youtube, open_url, open_maps
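One way to picture how a tool call could map onto an action is a small dispatch table. Everything here is an assumption for illustration: the JSON shape with `name` and `arguments`, the argument keys, and the `plan_command` helper are hypothetical and do not describe RCLI's actual tool-call format. The only real interface used is the macOS `open` command (`open -a <App>` and `open <URL>`), and the sketch only builds the command without running it.

```python
import json


def open_app(args: dict) -> list:
    # `open -a <App>` launches an application by name on macOS.
    return ["open", "-a", args["name"]]


def open_url(args: dict) -> list:
    # `open <URL>` opens a URL in the default browser on macOS.
    return ["open", args["url"]]


# Hypothetical mapping from action names to command builders.
HANDLERS = {"open_app": open_app, "open_url": open_url}


def plan_command(tool_call_json: str, enabled: set) -> list:
    """Turn a (hypothetical) JSON tool call into an argv list without running it."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in enabled:
        raise PermissionError(f"action disabled: {name}")
    if name not in HANDLERS:
        raise KeyError(f"unknown action: {name}")
    return HANDLERS[name](call.get("arguments", {}))


call = '{"name": "open_app", "arguments": {"name": "Safari"}}'
print(plan_command(call, enabled={"open_app"}))
```

The `enabled` set mirrors the idea of switching individual actions on and off in the Actions browser. Returning an argv list instead of executing it keeps the step easy to inspect, and the result could be passed to `subprocess.run` once approved.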
📖 Read the full source: HN AI Agents