Gemini 3.1 Flash Live: Google's latest audio model with improved benchmarks and watermarking

What's new in Gemini 3.1 Flash Live
Google has released Gemini 3.1 Flash Live, its highest-quality audio and voice model, designed for real-time dialogue. The model offers faster responses and a more natural conversational rhythm for voice-first AI applications.
Key technical details
- Benchmark scores: 90.8% on ComplexFuncBench Audio (multi-step function calling with constraints) and 36.1% on Scale AI's Audio MultiChallenge (complex instruction following, with "thinking" enabled)
- Improved capabilities: Better tonal understanding, recognition of acoustic nuances like pitch and pace, and dynamic adjustment to user frustration or confusion
- Watermarking: All generated audio includes a SynthID watermark for detecting AI-generated content
- Multilingual support: Available in over 200 countries and territories
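To make "multi-step function calling with constraints" concrete, here is a minimal Python sketch of the kind of task such a benchmark exercises: search, filter by user constraints, then act on the result. It does not call the Gemini Live API. The tools `find_flights` and `book_flight`, the flight data, and the `run_request` helper are all hypothetical, and the call sequence a model would be expected to produce is hard-coded for illustration.

```python
from typing import Any, Callable

# Hypothetical tools: names, signatures, and data are illustrative only.
def find_flights(origin: str, destination: str) -> list[dict[str, Any]]:
    return [
        {"id": "F1", "price": 420, "stops": 1},
        {"id": "F2", "price": 310, "stops": 2},
        {"id": "F3", "price": 365, "stops": 0},
    ]

def book_flight(flight_id: str) -> dict[str, str]:
    return {"status": "booked", "flight_id": flight_id}

# Registry mapping tool names to callables, as an agent runtime might keep.
TOOLS: dict[str, Callable[..., Any]] = {
    "find_flights": find_flights,
    "book_flight": book_flight,
}

def call_tool(name: str, args: dict[str, Any]) -> Any:
    """Dispatch one function call by name, rejecting unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)

def run_request(max_price: int, max_stops: int) -> dict[str, str]:
    """Step 1: search. Step 2: apply the constraints. Step 3: book the cheapest match."""
    flights = call_tool("find_flights", {"origin": "SFO", "destination": "JFK"})
    matches = [
        f for f in flights
        if f["price"] <= max_price and f["stops"] <= max_stops
    ]
    if not matches:
        return {"status": "no_match"}
    cheapest = min(matches, key=lambda f: f["price"])
    return call_tool("book_flight", {"flight_id": cheapest["id"]})
```

In a real voice agent the model, not the code, decides which tool to call next and with which arguments; the second call depends on the output of the first, which is what makes the task multi-step.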
Availability and access
- For developers: Available in preview via the Gemini Live API in Google AI Studio
- For enterprises: Included in Gemini Enterprise for Customer Experience
- For general users: Accessible via Search Live and Gemini Live
The model lets developers build voice-ready agents that handle complex tasks in noisy environments, and it can follow longer conversation threads.
📖 Read the full source: HN AI Agents