Building a Voice Interface for OpenClaw Agents Using iPhone Shortcuts

A developer on r/openclaw shared their setup for creating a voice interface similar to Siri for OpenClaw agents. The system combines a local Python server with iPhone Shortcuts to enable voice interaction with OpenClaw agents.
System Architecture
The setup requires enabling OpenAI HTTP mode on the OpenClaw gateway and LAN. The core components are:
- Python Server: Originally a script that listened for keywords via microphone, performed speech-to-text, sent text to OpenClaw API, received responses, and performed text-to-speech using the user's voice. This was adapted into a basic server with an endpoint that can receive text from anywhere, send it to OpenClaw, and return the response.
- iPhone Shortcut: Handles speech-to-text and text-to-speech locally on the iPhone. The shortcut workflow includes:
- Dictate text (records voice to text)
- Get contents of URL: url/ask with dictated text in body (sends text to be routed to OpenClaw agent for response)
- Dictionary: Get value for reply in contents of URL (store response text)
- Speak: dictionary value (text-to-speech output)
Implementation Details
The developer runs this through WireGuard and operates entirely on LAN or through VPN when outside the local network. They emphasize a critical security consideration: "Be careful opening an endpoint for your OpenClaw agent to respond through. It can allow anyone to access your agent (computer). Use auth token."
The approach offloads speech processing to the iPhone while keeping the OpenClaw agent interaction centralized through the Python server endpoint. This allows for voice interaction with OpenClaw agents from anywhere while maintaining security through VPN and authentication tokens.
📖 Read the full source: r/openclaw
👀 See Also

Analysis of Anthropomorphism in Claude Pokemon Chat Using Bayesian Models
A researcher analyzed Twitch chat messages from Claude's Pokemon benchmark to study how users anthropomorphize the AI, using Bayesian mixed-effects models on 107k messages annotated by Gemini 2.0 Flash. False belief tags were strong predictors of anthropomorphism, increasing probability from ~11% to ~45%.

Evaluating Multilingual Guardrails with any-guardrail in Humanitarian AI
Mozilla's any-guardrail tool evaluates multilingual guardrails in humanitarian LLMs, focusing on task and domain specificity.

From Zero Code to 25M Game Plays: A Non-Engineer's Journey Building with Claude + Cursor
A developer with no coding experience built three browser games (25M total plays, 200K daily) using Claude via Cursor. Two games are single 8,000-line HTML files. Total tool cost: ~$2K/month.

Building a Technical Book with Claude Code: Process and Pitfalls
A developer created an EPUB book about intermediate Claude Code features by using Claude to collect Anthropic documentation, researching real-world examples in finance, and structuring chapters with technical features followed by practical applications. The process revealed specific workflow constraints when using agents.