Echo-TTS Ported to Apple Silicon with MLX for Native TTS with Voice Cloning

✍️ OpenClawRadar | 📅 Published: March 7, 2026

Echo-TTS, a 2.4B parameter diffusion transformer (DiT) model for text-to-speech with voice cloning, has been ported from CUDA to run natively on Apple M-series silicon using MLX. The port allows the model to generate speech in a target voice when given text and a short audio clip of someone talking.

Performance and Benchmarks

On a base 16 GB M4 Mac mini, the model generates a 5-second voice clone in about 10 seconds. Clones of up to 30 seconds take approximately 60 seconds to generate.
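Both figures work out to roughly the same real-time factor, meaning seconds of compute per second of audio produced. The snippet below is only arithmetic on the approximate numbers reported above; the function name `real_time_factor` is an illustrative choice, not something from the repository.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute needed per second of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Approximate figures reported for a base 16 GB M4 Mac mini.
short_clip_rtf = real_time_factor(10, 5)   # 5 s of audio in about 10 s
long_clip_rtf = real_time_factor(60, 30)   # 30 s of audio in about 60 s

print(f"short clip: {short_clip_rtf:.1f}x, long clip: {long_clip_rtf:.1f}x")
```

A factor above 1 means generation is slower than playback. On these numbers the cost appears to scale about linearly with clip length.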

Key Features

  • 8-bit quantization: Reduces memory usage from approximately 6 GB to about 4 GB and runs faster, with negligible quality loss.
  • Blockwise generation: Enables streaming and audio continuations.
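To give a feel for what 8-bit quantization does to a weight tensor, here is a generic sketch of group-wise affine quantization in plain Python. It is not the port's code. The group size of 64, the function names, and the sine-based test weights are all assumptions made for illustration. Each group of weights is stored as 8-bit integer codes plus one scale and one bias, so the storage per weight drops to about one byte plus a small per-group overhead.

```python
import math


def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes.

    Returns (codes, scale, bias) such that value ≈ code * scale + bias.
    """
    levels = (1 << bits) - 1            # 255 distinct steps for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Hypothetical stand-in for a slice of model weights.
weights = [math.sin(i) * 0.1 for i in range(128)]
groups = quantize(weights)

restored = []
for codes, scale, bias in groups:
    restored.extend(dequantize_group(codes, scale, bias))

max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.6f}")
```

The rounding error for any weight is bounded by half a quantization step (half of that group's `scale`), which is one reason 8-bit weights can often stand in for higher-precision ones with little audible difference.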

Development Details

This was an AI-assisted port. Claude Opus 4.6 handled specification and validation, GPT-5.3-Codex performed the implementation, and the developer steered the project through OpenClaw.

The repository is available at github.com/mznoj/echo-tts-mlx.

📖 Read the full source: r/LocalLLaMA
