Echo-TTS Ported to Apple Silicon with MLX for Native TTS with Voice Cloning

Echo-TTS, a 2.4B-parameter diffusion transformer (DiT) model for text-to-speech with voice cloning, has been ported from CUDA to run natively on Apple M-series silicon using MLX. Given text and a short audio clip of someone talking, the model generates speech in that speaker's voice.
Performance and Benchmarks
On a base 16GB M4 Mac mini, the model generates a short 5-second voice clone in about 10 seconds. Clones up to 30 seconds take approximately 60 seconds to generate. In both cases, generation takes roughly twice as long as the audio it produces.
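The two timings can be compared with a real-time factor (RTF), the number of seconds of compute spent per second of audio. The helper below is a minimal sketch for illustration only; the function name is hypothetical and not part of the repository, and the inputs are the approximate figures quoted above.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced (hypothetical helper)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Approximate figures reported for the base 16GB M4 Mac mini.
short_clip = real_time_factor(generation_seconds=10.0, audio_seconds=5.0)
long_clip = real_time_factor(generation_seconds=60.0, audio_seconds=30.0)

print(f"5 s clip:  RTF {short_clip:.1f}")
print(f"30 s clip: RTF {long_clip:.1f}")
```

An RTF above 1.0 means generation is slower than playback, so at these figures audio cannot be played back continuously as it is produced without buffering first.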
Key Features
- 8-bit quantization: Reduces memory usage from approximately 6 GB to about 4 GB and runs faster, with negligible quality loss.
- Blockwise generation: Enables streaming and audio continuations.
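As a rough sanity check on the quantization figures, the weight storage for a 2.4B-parameter model can be estimated with back-of-the-envelope arithmetic. The sketch below rests on assumptions not stated in the source: a 16-bit baseline, every weight quantized, and group-wise quantization with 64 weights per group and a 16-bit scale plus a 16-bit bias per group. The function name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits: int,
                     group_size: int = 0,
                     overhead_bits_per_group: int = 0) -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes). Hypothetical helper."""
    total_bits = n_params * bits
    if group_size:
        # Per-group metadata (e.g. scale and bias) stored alongside the weights.
        total_bits += (n_params / group_size) * overhead_bits_per_group
    return total_bits / 8 / 1e9


N_PARAMS = 2.4e9  # parameter count quoted in the article

# Assumption: the unquantized weights are stored in 16 bits.
fp16_gb = weight_memory_gb(N_PARAMS, bits=16)

# Assumption: 8-bit weights, groups of 64, 16-bit scale + 16-bit bias per group.
q8_gb = weight_memory_gb(N_PARAMS, bits=8, group_size=64,
                         overhead_bits_per_group=32)

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f"8-bit weights:  {q8_gb:.2f} GB")
print(f"estimated saving: {fp16_gb - q8_gb:.2f} GB")
```

Under these assumptions the estimated saving is close to the reported drop of about 2 GB. The reported totals are higher than the weight-only estimates, presumably because they also cover activations and any components that are not quantized.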
Development Details
This was an AI-assisted port. Claude Opus 4.6 handled specification and validation, GPT-5.3-Codex performed the implementation, and the developer steered the project through OpenClaw.
The repository is available at github.com/mznoj/echo-tts-mlx.
📖 Read the full source: r/LocalLLaMA