Microsoft VibeVoice: 60-Min ASR and 90-Min TTS Models Open-Sourced

Microsoft has open-sourced VibeVoice, a family of frontier voice AI models covering both automatic speech recognition (ASR) and text-to-speech (TTS). The ASR model (VibeVoice-ASR-7B) transcribes up to 60 minutes of long-form audio in a single pass (64K-token window). It outputs structured transcriptions with speaker IDs, timestamps, and text in over 50 languages, and accepts user-customized hotwords for domain-specific terms. The TTS model (VibeVoice-TTS-1.5B) can synthesize up to 90 minutes of multi-speaker speech with up to 4 speakers. A real-time variant (VibeVoice-Realtime-0.5B) accepts streaming text input and supports long-form generation, with multilingual voices in 9 languages and 11 English style voices.
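The 64K-token window and the 7.5 Hz tokenizer frame rate Microsoft cites make the 60-minute figure easy to sanity-check. This sketch assumes each tokenizer frame occupies one context position and reads "64K" as 65,536. Both are simplifications, and the 50 Hz comparison rate is purely hypothetical.

```python
# Back-of-the-envelope frame budget for VibeVoice's 7.5 Hz tokenizers.
# Assumptions (not from the article): one tokenizer frame = one context
# position, and "64K" = 65,536 positions. Text/prompt tokens are ignored.

FRAME_RATE_HZ = 7.5          # frame rate stated in the article
CONTEXT_WINDOW = 64 * 1024   # assumed size of the "64K" window

def frames_for(minutes: float, rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * rate_hz)

asr_frames = frames_for(60)
print(f"60 min ASR input  -> {asr_frames} frames "
      f"({asr_frames / CONTEXT_WINDOW:.0%} of the window)")
print(f"90 min TTS output -> {frames_for(90)} frames")
# Hypothetical comparison: the same hour at a 50 Hz frame rate.
print(f"60 min at 50 Hz   -> {frames_for(60, 50.0)} frames")
```

Under these assumptions an hour of audio is 27,000 frames, roughly 41% of the window. This leaves headroom, presumably for prompt and transcript tokens. The same hour at 50 Hz would need 180,000 frames and overflow the window, which is the efficiency argument behind the ultra-low frame rate.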
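Multi-speaker synthesis starts from a dialogue script. The helper below is a hypothetical pre-flight check. It assumes a `Speaker <id>: <text>` line format, which is an assumption about the input convention and not a documented VibeVoice API. It enforces the 4-speaker limit before any audio is generated.

```python
import re

MAX_SPEAKERS = 4  # TTS speaker limit stated in the article

# Hypothetical script convention: one turn per line, "Speaker <id>: <text>".
TURN = re.compile(r"^Speaker\s+(\d+):\s*(.+)$")

def parse_script(script: str) -> list[tuple[int, str]]:
    """Split a dialogue script into (speaker_id, text) turns and enforce the speaker cap."""
    turns = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between turns
        m = TURN.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: expected 'Speaker <id>: <text>'")
        turns.append((int(m.group(1)), m.group(2)))
    speakers = {sid for sid, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"{len(speakers)} speakers found; the model supports at most {MAX_SPEAKERS}"
        )
    return turns

script = """Speaker 1: Welcome back to the show.
Speaker 2: Thanks, glad to be here.
Speaker 1: Let's talk about long-form TTS."""
turns = parse_script(script)
print(len(turns), "turns,", len({s for s, _ in turns}), "speakers")
```

Failing fast on a fifth speaker is cheaper than discovering the limit partway through a 90-minute generation.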
Key Technical Details
- Core innovation: Continuous speech tokenizers (Acoustic and Semantic) at an ultra-low frame rate of 7.5 Hz, preserving audio fidelity while boosting computational efficiency for long sequences.
- Architecture: Next-token diffusion framework, in which an LLM handles textual context and dialogue flow while a diffusion head generates high-fidelity acoustic details.
- ASR capabilities: Single-pass transcription of up to 60 minutes of audio, joint ASR + diarization + timestamping (Who, When, What), and customizable hotwords.
- TTS capabilities: 90-minute long-form synthesis with up to 4 distinct speakers; real-time streaming via VibeVoice-Realtime-0.5B.
- Inference speedup: vLLM inference is supported (see vllm-asr).
- Finetuning: ASR finetuning code is available.
- Hugging Face integration: VibeVoice-ASR is now part of the Transformers release (2026-03-06).
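The next-token diffusion loop can be illustrated with a toy, pure-Python sketch. Everything in it is a stand-in. `backbone` plays the LLM, and `denoiser` plays the diffusion head's network. The sampler is a generic deterministic DDIM-style update in the variance-exploding form, not VibeVoice's actual head or noise schedule. What the sketch does show is the data flow. The LLM summarizes the context, the diffusion head turns that hidden state into one continuous acoustic frame by iterative denoising, and the frame is appended to the context before the next step.

```python
import math
import random

DIM = 4  # toy latent size; real acoustic latents are much larger

def backbone(context):
    """Stand-in for the LLM: summarizes the context into one hidden vector.
    (A real model runs a Transformer; this toy just averages the vectors.)"""
    n = len(context)
    return [sum(vec[i] for vec in context) / n for i in range(DIM)]

def denoiser(x_t, sigma, hidden):
    """Stand-in for the diffusion head's network: predicts the clean latent
    conditioned on the LLM hidden state. This toy ignores x_t and sigma."""
    return [math.tanh(h) for h in hidden]

def diffusion_head(hidden, sigmas, rng):
    """Deterministic DDIM-style sampling (variance-exploding form):
    x_t = x0 + sigma_t * eps, so eps_hat = (x_t - x0_hat) / sigma_t and
    the next sample is x0_hat + sigma_next * eps_hat."""
    x = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        x0_hat = denoiser(x, sigma, hidden)
        eps_hat = [(xi - ci) / sigma for xi, ci in zip(x, x0_hat)]
        x = [ci + sigma_next * ei for ci, ei in zip(x0_hat, eps_hat)]
    return x

def generate(text_embeddings, n_frames, sigmas=(10.0, 3.0, 1.0, 0.3, 0.0), seed=0):
    """Next-token diffusion: each acoustic frame is sampled by the diffusion
    head, then appended to the context that conditions the next step."""
    rng = random.Random(seed)
    context = [list(v) for v in text_embeddings]
    frames = []
    for _ in range(n_frames):
        hidden = backbone(context)
        frame = diffusion_head(hidden, list(sigmas), rng)
        frames.append(frame)
        context.append(frame)  # autoregressive feedback
    return frames

text = [[0.5, -0.2, 0.1, 0.9], [0.3, 0.4, -0.6, 0.2]]  # toy "text embeddings"
frames = generate(text, n_frames=3)
print(len(frames), "frames of size", len(frames[0]))
```

Because the toy denoiser ignores its noisy input, each frame collapses to `tanh(hidden)`. A trained head would use `x_t` and `sigma` to produce varied, high-fidelity latents. The split of labor is the point: the LLM decides what comes next, and the diffusion head renders how it sounds.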
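VibeVoice-ASR's joint output answers Who, When, and What for each utterance. This sketch uses a hypothetical `Segment` record whose field names are illustrative, not the model's actual output schema. It shows what downstream code might do with such output: render a timestamped transcript and compute per-speaker talk time.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Segment:
    """One diarized utterance: Who (speaker), When (start/end in seconds), What (text).
    Hypothetical schema for illustration only."""
    speaker: str
    start: float
    end: float
    text: str

def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Total speaking time per speaker, in seconds."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)

segments = [
    Segment("Speaker 1", 0.0, 4.5, "Let's review the Q3 numbers."),
    Segment("Speaker 2", 4.5, 9.0, "Revenue came in above forecast."),
    Segment("Speaker 1", 3590.0, 3595.0, "Any questions before we wrap up?"),
]
for seg in segments:
    print(f"[{fmt_ts(seg.start)}-{fmt_ts(seg.end)}] {seg.speaker}: {seg.text}")
print(talk_time(segments))
```

Because transcription, diarization, and timestamps come out of one pass, no separate alignment step is needed to attach speakers to text before this kind of post-processing.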
Quick links:
- ASR model: HF Link | Playground
- TTS model: HF Link (code disabled)
- Realtime TTS: HF Link | Colab
Note: The VibeVoice-TTS code was removed from the repo on 2025-09-05 due to misuse concerns, but the ASR and real-time TTS code remain available.
📖 Read the full source: HN AI Agents