Fix OpenClaw 2026.4.2 WhatsApp Auto-Reply Media Skip

Issue Overview

A user encountered a problem where OpenClaw's WhatsApp integration failed to transcribe voice notes despite correct configuration. The issue occurs specifically in the WhatsApp auto-reply flow in OpenClaw version 2026.4.2.

Problem Details

The user's setup included:

WhatsApp inbound messages with valid MediaPath and MediaType
Audio files being stored correctly as .ogg files
tools.media.audio enabled in configuration
An external transcription backend (Groq STT) for speech-to-text

Despite everything appearing correct, the agent received <media:audio> placeholders instead of transcripts. The transcription process never triggered.

Root Cause

After tracing the flow, the user discovered that the WhatsApp auto-reply path can dispatch messages to the agent without first invoking the standard media understanding pipeline. When that step is skipped:

tools.media.audio is never executed
CLI or external backends (like Groq STT) never run
The agent only sees the <media:audio> placeholder

This issue is particularly noticeable with non-native audio models, which cannot process audio input on their own and therefore depend on a transcript.

Solution

The fix involves forcing a call to the media understanding step before the reply is dispatched to the agent. The user patched the WhatsApp inbound auto-reply flow to:

Build the WhatsApp inbound context
Explicitly run the same media understanding logic used in the standard reply pipeline
Continue with normal agent dispatch

After implementing this fix:

Audio gets picked up correctly
The transcription backend (Groq STT in this case) executes
The transcript is injected into the message
The agent receives text instead of <media:audio>
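
The three steps of the patch can be sketched roughly as follows. This is only an illustration of the ordering, not OpenClaw's actual code: every name here (InboundContext, applyMediaUnderstanding, handleWhatsAppAutoReply, the transcribe and dispatchToAgent callbacks) is hypothetical, and the assumption that MediaType holds a MIME type such as audio/ogg is ours.

```typescript
// Hypothetical sketch: none of these names come from the OpenClaw codebase.
interface InboundContext {
  body: string;        // text the agent will see
  mediaPath?: string;  // where the stored .ogg file lives
  mediaType?: string;  // assumed to be a MIME type such as "audio/ogg"
}

// Stand-in for an external STT backend (for example a CLI wrapping Groq STT).
type Transcriber = (mediaPath: string) => Promise<string>;

const AUDIO_PLACEHOLDER = "<media:audio>";

// Step 2: stand-in for the media understanding logic of the standard reply pipeline.
async function applyMediaUnderstanding(
  ctx: InboundContext,
  transcribe: Transcriber,
): Promise<InboundContext> {
  const isAudio = ctx.mediaType?.startsWith("audio/") ?? false;
  if (!isAudio || !ctx.mediaPath) return ctx; // nothing to transcribe

  const transcript = await transcribe(ctx.mediaPath);
  // Use a replacer function so "$" sequences in the transcript are kept literally.
  const body = ctx.body.includes(AUDIO_PLACEHOLDER)
    ? ctx.body.replace(AUDIO_PLACEHOLDER, () => transcript)
    : `${ctx.body}\n${transcript}`.trim();
  return { ...ctx, body };
}

async function handleWhatsAppAutoReply(
  raw: InboundContext,
  transcribe: Transcriber,
  dispatchToAgent: (ctx: InboundContext) => Promise<string>,
): Promise<string> {
  // Step 1: build the WhatsApp inbound context.
  const ctx: InboundContext = { ...raw };
  // Step 2: explicitly run media understanding before dispatch.
  const enriched = await applyMediaUnderstanding(ctx, transcribe);
  // Step 3: continue with normal agent dispatch.
  return dispatchToAgent(enriched);
}
```

The point of the sketch is the ordering: the agent callback only ever receives the enriched context, so a voice note reaches it as text rather than as a placeholder.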

Who This Affects

This issue impacts users who rely on CLI-based transcription, external APIs, or any non-native audio model. These setups depend entirely on media understanding being triggered, and if that step is skipped, nothing downstream will work even with correct configuration.

Key Takeaway

If audio is received and stored correctly and tools.media.audio is enabled, but transcription never happens, check whether your WhatsApp auto-reply path actually calls the media understanding pipeline before agent dispatch.
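
One way to run that check is to inspect the message just before it is handed to the agent. The helper below is a minimal sketch under our own assumptions: DispatchedMessage and mediaUnderstandingWasSkipped are hypothetical names, and MediaType is assumed to be a MIME type. It flags the symptom described above, an audio message whose body still carries the <media:audio> placeholder.

```typescript
// Hypothetical shape of what is about to be dispatched to the agent.
interface DispatchedMessage {
  body: string;
  mediaType?: string; // assumed MIME type, e.g. "audio/ogg"
}

// Returns true when an audio message still contains the raw placeholder,
// which suggests media understanding did not run before dispatch.
function mediaUnderstandingWasSkipped(msg: DispatchedMessage): boolean {
  const hasAudio = msg.mediaType?.startsWith("audio/") ?? false;
  return hasAudio && msg.body.includes("<media:audio>");
}
```

Logging the result of such a check at the dispatch point would show whether the placeholder is still present when the agent is called.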

📖 Read the full source: r/openclaw