Mistral Voxtral Realtime 4B in Pure C for Speech-to-Text

voxtral.c, a repository by antirez, is a pure C inference implementation of Mistral's Voxtral Realtime 4B speech-to-text model. It relies only on the C standard library, so running the inference pipeline requires no Python runtime, CUDA toolkit, or other external library.

Key Features

Pure C Implementation: No external dependencies beyond the C standard library are required, which suits environments where a minimal dependency footprint matters.
Platform-Specific Backends: Two make targets are provided. make mps targets Apple Silicon and is the faster path; make blas targets Intel Macs and Linux systems with OpenBLAS and is slower because the bf16 weights must be converted to fp32.
Audio Processing: A chunked encoder with overlapping windows bounds memory usage regardless of input length. Audio can also be fed through stdin or, on macOS, captured from the microphone, which covers both live and file-based transcription.
Streaming C API: The streaming API, built around vox_stream_t, accepts audio incrementally and returns token strings as they are generated.
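
Two of the mechanisms listed above lend themselves to a short sketch in C. The first function widens a bf16 value to fp32, the conversion the BLAS path is described as paying for; bf16 is the upper 16 bits of an IEEE-754 single-precision float, so the widening is a 16-bit shift. The second computes overlapping chunk boundaries so that only one fixed-size window needs to be held at a time. The names bf16_to_f32, chunk_t, and chunk_at, along with any chunk and overlap sizes passed in, are assumptions made for illustration and are not taken from voxtral.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one bf16 value to fp32. bf16 keeps the sign, the 8 exponent bits,
 * and the top 7 mantissa bits of a float, so we shift it into the high half. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* type-pun without undefined behavior */
    return f;
}

/* Hypothetical descriptor for one window over the input frames. */
typedef struct {
    size_t start;  /* index of the first frame in the window */
    size_t len;    /* number of frames in the window */
} chunk_t;

/* Compute the i-th window over n_frames frames, where consecutive windows
 * share `overlap` frames. Returns 1 and fills *out if the window exists,
 * 0 once the input is exhausted or the parameters are invalid. */
static int chunk_at(size_t n_frames, size_t chunk, size_t overlap,
                    size_t i, chunk_t *out) {
    if (chunk == 0 || overlap >= chunk) return 0;
    size_t hop = chunk - overlap;          /* distance between window starts */
    size_t start = i * hop;
    if (start >= n_frames) return 0;
    /* Stop if the previous window already reached the end of the input. */
    if (i > 0 && (i - 1) * hop + chunk >= n_frames) return 0;
    out->start = start;
    out->len = (n_frames - start < chunk) ? n_frames - start : chunk;
    return 1;
}
```

Calling chunk_at with i = 0, 1, 2, ... until it returns 0 walks the whole input while the working set stays at one window, which is the property that keeps memory bounded for long recordings.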

Usage

Download the model (~8.9GB): ./download_model.sh
Transcribe an audio file: ./voxtral -d voxtral-model -i audio.wav
Transcribe live from the microphone on macOS: ./voxtral -d voxtral-model --from-mic
Transcode with ffmpeg and transcribe from stdin: ffmpeg -i audio.mp3 -f s16le -ar 16000 -ac 1 - 2> /dev/null | ./voxtral -d voxtral-model --stdin
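
The ffmpeg command above emits raw signed 16-bit little-endian mono PCM at 16 kHz. As a minimal sketch of what a consumer of that stream has to do, the function below decodes such bytes into floating-point samples. The function name s16le_to_f32 and the normalization to the range [-1, 1) are assumptions for illustration; how voxtral.c actually represents samples internally is not stated in the source.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n_samples signed 16-bit little-endian PCM samples from `bytes`
 * into floats in [-1, 1). `bytes` must hold at least 2 * n_samples bytes. */
static void s16le_to_f32(const unsigned char *bytes, size_t n_samples,
                         float *out) {
    for (size_t i = 0; i < n_samples; i++) {
        /* Low byte first, then high byte (little-endian). */
        uint16_t u = (uint16_t)(bytes[2 * i] |
                                ((uint16_t)bytes[2 * i + 1] << 8));
        /* Reinterpret as two's-complement without relying on casts
         * whose result is implementation-defined. */
        int32_t s = (u >= 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
        out[i] = (float)s / 32768.0f;
    }
}
```

Assembling each sample from individual bytes keeps the decoder independent of the host's endianness.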

The project needs further testing, since it has so far been exercised on only a limited set of samples. Production readiness may require more work; long transcriptions in particular are needed to exercise the KV cache's circular buffer.
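
To illustrate why long inputs are the interesting case, here is a minimal sketch of a fixed-capacity circular buffer, with one int per slot standing in for a cached key/value entry. The wrap-around path only runs once more entries have been pushed than the buffer holds, which short test clips may never reach. RING_CAP, ring_t, and the ring_* functions are hypothetical and do not reflect the actual data structures in voxtral.c.

```c
#include <stddef.h>

#define RING_CAP 4  /* hypothetical capacity, in cache slots */

typedef struct {
    int slot[RING_CAP];  /* stand-in payload: one int per cached position */
    size_t count;        /* total number of entries ever pushed */
} ring_t;

/* Append an entry, overwriting the oldest one once the buffer is full. */
static void ring_push(ring_t *r, int value) {
    r->slot[r->count % RING_CAP] = value;
    r->count++;
}

/* Number of entries currently retained. */
static size_t ring_len(const ring_t *r) {
    return r->count < RING_CAP ? r->count : RING_CAP;
}

/* Fetch the i-th oldest retained entry (0 = oldest).
 * The caller must ensure i < ring_len(r). */
static int ring_get(const ring_t *r, size_t i) {
    size_t oldest = r->count - ring_len(r);
    return r->slot[(oldest + i) % RING_CAP];
}
```

A test that pushes fewer than RING_CAP entries never hits the overwrite branch, which is the analogue of transcribing only short samples.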

📖 Read the full source: HN AI Agents
