OpenClaw WhatsApp Auto-Reply May Skip Media Understanding in 2026.4.2

Issue Overview
A user encountered a problem where OpenClaw's WhatsApp integration failed to transcribe voice notes despite correct configuration. The issue occurs specifically in the WhatsApp auto-reply flow in OpenClaw version 2026.4.2.
Problem Details
The user's setup included:
- WhatsApp inbound messages with valid MediaPath and MediaType
- Audio files being stored correctly as .ogg files
- tools.media.audio enabled in configuration
- An external transcription backend (Groq STT) for speech-to-text
Despite everything appearing correct, the agent received <media:audio> placeholders instead of transcripts. The transcription process never triggered.
Root Cause
After tracing the flow, the user discovered that the WhatsApp auto-reply path doesn't always invoke the standard media understanding pipeline before dispatching messages to the agent. This means:
- tools.media.audio is never executed
- CLI or external backends (like Groq STT) never run
- The agent only sees the <media:audio> placeholder
The issue is most noticeable with models that lack native audio support, because they cannot handle audio implicitly and depend entirely on the transcript being injected.
Solution
The fix involves forcing a call to the media understanding step before the reply is dispatched to the agent. The user patched the WhatsApp inbound auto-reply flow to:
- Build the WhatsApp inbound context
- Explicitly run the same media understanding logic used in the standard reply pipeline
- Continue with normal agent dispatch
After implementing this fix:
- Audio gets picked up correctly
- The CLI (Groq STT in this case) executes
- The transcript is injected into the message
- The agent receives text instead of <media:audio>
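The three steps of the patch can be sketched roughly as follows. This is an illustrative Python sketch, not OpenClaw's actual code: InboundContext, apply_media_understanding, handle_whatsapp_auto_reply, and the transcribe and dispatch callbacks are all hypothetical names, and the real pipeline will differ in its details.

```python
from dataclasses import dataclass
from typing import Callable, Optional

AUDIO_PLACEHOLDER = "<media:audio>"


@dataclass
class InboundContext:
    """Hypothetical stand-in for the WhatsApp inbound context."""
    body: str
    media_path: Optional[str] = None
    media_type: Optional[str] = None


def apply_media_understanding(
    ctx: InboundContext, transcribe: Callable[[str], str]
) -> InboundContext:
    """Swap the audio placeholder for a transcript when audio is attached.

    `transcribe` stands in for whatever backend is configured
    (a CLI or an external API such as Groq STT); it is an assumption here.
    """
    is_audio = bool(ctx.media_path) and (ctx.media_type or "").startswith("audio/")
    if is_audio:
        transcript = transcribe(ctx.media_path)
        if transcript:
            ctx.body = ctx.body.replace(AUDIO_PLACEHOLDER, transcript)
    return ctx


def handle_whatsapp_auto_reply(
    body: str,
    media_path: Optional[str],
    media_type: Optional[str],
    transcribe: Callable[[str], str],
    dispatch: Callable[[InboundContext], str],
) -> str:
    # 1. Build the WhatsApp inbound context.
    ctx = InboundContext(body=body, media_path=media_path, media_type=media_type)
    # 2. Explicitly run media understanding before the agent sees the message.
    ctx = apply_media_understanding(ctx, transcribe)
    # 3. Continue with normal agent dispatch.
    return dispatch(ctx)
```

The point of the sketch is the ordering: media understanding runs between building the context and dispatching, so the agent never receives the raw placeholder when a transcript is available.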
Who This Affects
This issue impacts users who rely on CLI-based transcription, external APIs, or any non-native audio model. These setups depend entirely on media understanding being triggered, and if that step is skipped, nothing downstream will work even with correct configuration.
Key Takeaway
If you're experiencing issues where audio is received and stored correctly, tools.media.audio is enabled, but transcription never happens, check whether your WhatsApp auto-reply path is actually calling the media understanding pipeline before agent dispatch.
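One way to confirm the symptom is to inspect the message just before it reaches the agent. The helper below is a minimal sketch under the assumption that you can log the message body, media path, and media type at that point; the function name and parameters are hypothetical and not part of OpenClaw.

```python
from typing import Optional

AUDIO_PLACEHOLDER = "<media:audio>"


def transcription_was_skipped(
    body: str, media_path: Optional[str], media_type: Optional[str]
) -> bool:
    """Return True when audio was attached but the body still holds the placeholder."""
    has_audio = bool(media_path) and (media_type or "").startswith("audio/")
    return has_audio and AUDIO_PLACEHOLDER in body
```

If this returns True for a voice note, the audio arrived and was stored, but no transcript was injected, which matches the skipped-pipeline behavior described above.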
📖 Read the full source: r/openclaw