NVIDIA Releases Nemotron-3-Ultra-550B: 55B Active Parameters, 1M Context, LatentMoE Hybrid

✍️ OpenClawRadar📅 Published: June 4, 2026🔗 Source

NVIDIA released Nemotron-3-Ultra-550B-A55B-BF16, a frontier-scale LLM with 550B total parameters and 55B active. The model uses a hybrid Latent Mixture-of-Experts (LatentMoE) architecture that interleaves Mamba-2, MoE, and attention layers, plus Multi-Token Prediction (MTP) for faster generation. Context length reaches up to 1M tokens.

Key Specs

Architecture: LatentMoE hybrid – Mamba-2 + MoE + Attention + MTP
Parameters: 550B total / 55B active
Context: Up to 1M tokens
Min GPU: 8x GB200/B200/GB300/B300, 16x H100, 8x H200
Languages: English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, Chinese
Reasoning: Configurable on/off via chat template (enable_thinking=True/False)
License: OpenMDW License Agreement v1.1

The model is built for frontier reasoning, complex agentic workflows, long-context analysis, tool use, multilingual reasoning, and high-stakes RAG. It's trained with NVFP4 pre-training recipe for compute efficiency. Open weights, training data, and recipes are included under the OpenMDW license. For local inference, you'll need at least 8x H200 or equivalent.

📖 Read the full source: r/LocalLLaMA

👀 See Also

News

EU Forces Google to Open Android AI to Third Parties Under DMA

European Commission proposes measures to allow third-party AI assistants system-level access on Android, including hot word invocation, screen context, and local model hardware access. Google calls it 'unwarranted intervention'.

Apr 28, 2026, 12:15 PM UTC

OpenClawRadar

News

The First Step to AGI: Bridging the Gap with ClawDBot

Explore how ClawDBot advances us towards AGI by enhancing AI coding agents, showcasing a pivotal step in AI evolution.

Feb 8, 2026, 04:40 PM UTC

OpenClawRadar

News

Local vs Cloud Models: Qwen-3.6-27B, Gemma-4-31B, Claude Haiku, Codex-Spark on Hard Code Gen

A user tested Qwen-3.6-27B (q4_k_m) locally on an RTX 5080 against API-based Gemma-4-31B, Claude Haiku 4.5, and Codex-Spark on a complex code task. Only Codex-Spark produced complete code (but with import errors); all others failed partially. Cost: Gemma used $0.112 for 803k input tokens.

Apr 30, 2026, 02:20 PM UTC

OpenClawRadar

News

Reddit user shares bizarre AI persona portability story from Vanity Fair article

A Reddit post discusses a Vanity Fair article anecdote where a woman attempted to port her AI companion 'Max' from ChatGPT to Claude, resulting in unexpected behavior from Claude.

Mar 22, 2026, 09:45 AM UTC

OpenClawRadar