RunAnywhere RCLI: On-Device Voice AI Pipeline for Apple Silicon

✍️ OpenClawRadar · 📅 Published: March 10, 2026

What RCLI Does

RCLI is a complete voice AI pipeline that runs speech-to-text, large language model inference, and text-to-speech entirely on-device on Apple Silicon Macs. It requires macOS 13+ on M1 or later chips and operates without cloud services or API keys.

Installation and Setup

Install via Homebrew:

brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup   # downloads ~1 GB of models

Or using curl:

curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash

Performance Claims

The developers benchmarked on an M4 Max with 64 GB of RAM and report:

  • LLM decode: 1.67x faster than llama.cpp, 1.19x faster than Apple MLX
  • Qwen3-0.6B: 658 tokens/sec (vs mlx-lm 552, llama.cpp 295)
  • Qwen3-4B: 186 tokens/sec (vs mlx-lm 170, llama.cpp 87)
  • Time-to-first-token: 6.6 ms
  • STT: 70 seconds of audio transcribed in 101 ms (714x real-time, 4.6x faster than mlx-whisper)
  • TTS: 178 ms synthesis (2.8x faster than mlx-audio and sherpa-onnx)
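The per-model ratios implied by these figures can be recomputed directly, which is a useful sanity check on the headline multipliers. The short sketch below uses only the numbers listed above; the function and variable names are illustrative and not part of RCLI.

```python
# Reported decode throughput in tokens/sec, copied from the list above.
reported = {
    "Qwen3-0.6B": {"rcli": 658, "mlx-lm": 552, "llama.cpp": 295},
    "Qwen3-4B": {"rcli": 186, "mlx-lm": 170, "llama.cpp": 87},
}


def speedups(row):
    """Return RCLI's throughput divided by each baseline's throughput."""
    return {name: row["rcli"] / tps for name, tps in row.items() if name != "rcli"}


def realtime_factor(audio_seconds, wall_ms):
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / (wall_ms / 1000.0)


for model, row in reported.items():
    ratios = ", ".join(f"{name}: {value:.2f}x" for name, value in speedups(row).items())
    print(f"{model}: {ratios}")

print(f"STT real-time factor: {realtime_factor(70, 101):.0f}x")
```

On these inputs the two models come out at roughly 2.2x and 2.1x over llama.cpp, and 1.19x and 1.09x over mlx-lm, while 70 seconds in 101 ms works out to about 693x real-time. The headline 1.67x and 714x figures therefore appear to come from a different aggregation or rounding than the per-model numbers; the source does not say which.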

Key Features

  • Three concurrent threads with lock-free ring buffers
  • Double-buffered TTS (next sentence renders while current plays)
  • 38 macOS actions controllable by voice
  • Local RAG with ~4 ms retrieval over 5K+ document chunks
  • 20 hot-swappable models
  • Full-screen TUI with per-operation latency readouts
  • Falls back to llama.cpp when MetalRT isn't installed
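To make the ring-buffer idea concrete, here is a minimal single-producer/single-consumer sketch. It is not RCLI's code: the class and method names are hypothetical, and Python has no atomics, so this only illustrates the index discipline. A native implementation would use atomic loads and stores with appropriate memory ordering instead of plain integers.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer queue (illustrative only)."""

    def __init__(self, capacity):
        # One slot stays empty so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def push(self, item):
        """Store item and return True, or return False if the buffer is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish the new tail only after the slot is written
        return True

    def pop(self):
        """Return the oldest item, or None if the buffer is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return item
```

Because each index is written by exactly one side, the producer (say, the STT thread) and the consumer (the LLM thread) never need to take a lock; a full buffer simply makes `push` return False so the producer can decide whether to drop or retry.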

Voice Pipeline Components

  • VAD: Silero voice activity detection
  • STT: Zipformer streaming + Whisper/Parakeet offline
  • LLM: Qwen3/LFM2/Qwen3.5 with KV cache continuation and Flash Attention
  • TTS: Double-buffered sentence-level synthesis
  • Tool Calling: LLM-native tool call formats
  • Multi-turn Memory: Sliding window conversation history with token-budget trimming
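The multi-turn memory item describes a sliding window trimmed to a token budget. One simple way to do that is sketched below, under stated assumptions: the whitespace-based `count_tokens` is a hypothetical stand-in for a real tokenizer, and the function names are illustrative rather than RCLI's API.

```python
def count_tokens(text):
    """Hypothetical stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(messages, budget, count=count_tokens):
    """Keep the most recent messages whose combined token count fits the budget.

    messages is a list of (role, text) tuples, oldest first.
    """
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that would overflow.
    for role, text in reversed(messages):
        cost = count(text)
        if used + cost > budget:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()  # restore oldest-first order
    return kept
```

Walking backwards from the newest turn means the most recent context always survives, and older turns drop off as the conversation grows. A production version would typically also pin the system prompt outside the window.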

Usage Commands

rcli              # interactive TUI with push-to-talk
rcli listen       # continuous voice mode
rcli ask "open Safari"  # one-shot command
rcli rag ingest ~/Documents/notes  # index documents for RAG
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"

TUI Controls

  • SPACE: Push-to-talk
  • M: Models browser for downloading and hot-swapping LLM/STT/TTS
  • A: Actions browser to enable/disable macOS actions
  • B: Run STT, LLM, TTS, and end-to-end benchmarks
  • R: RAG document ingestion
  • X: Clear conversation and reset context
  • T: Toggle tool call trace
  • ESC: Stop/close/quit

MetalRT Engine Details

MetalRT is RunAnywhere's proprietary GPU inference engine that uses Metal 3.1 features available on M3, M3 Pro, M3 Max, M4, and later chips. M1/M2 support is planned. The engine uses custom Metal compute shaders for quantized matmul, attention, and activation operations, compiled ahead of time and dispatched directly to the GPU with zero allocations during inference.
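Taken together with the llama.cpp fallback mentioned earlier, the chip requirement suggests selection logic along the following lines. This is a guess at the shape of the decision, not RCLI's actual code: the function names and the returned strings are hypothetical. The input is assumed to be a chip name such as "Apple M3 Pro", which is what `sysctl -n machdep.cpu.brand_string` reports on Apple Silicon.

```python
import re


def chip_generation(brand):
    """Extract the M-series generation from a string such as 'Apple M3 Pro'."""
    match = re.search(r"\bM(\d+)\b", brand)
    return int(match.group(1)) if match else None


def pick_engine(brand, metalrt_installed):
    """Choose MetalRT on M3 or later when it is installed, otherwise llama.cpp."""
    generation = chip_generation(brand)
    if metalrt_installed and generation is not None and generation >= 3:
        return "metalrt"
    return "llama.cpp"
```

For example, `pick_engine("Apple M4 Max", True)` selects MetalRT, while an M2 machine or a machine without MetalRT installed falls through to llama.cpp.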

macOS Actions

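RCLI includes 43 macOS actions according to this part of the source, although the feature list above gives the count as 38. They span several categories, including: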

  • Productivity: create_note, create_reminder, run_shortcut
  • Communication: send_message, facetime_call
  • Media: play_on_spotify, play_apple_music, play_pause, next_track, set_music_volume
  • System: open_app, quit_app, set_volume, toggle_dark_mode, screenshot, lock_screen
  • Web: search_web, search_youtube, open_url, open_maps

📖 Read the full source: HN AI Agents
