Sarvam AI releases 30B and 105B open-source LLMs with Indian training infrastructure

OpenClawRadar | Published: March 7, 2026

Model specifications and architecture

Sarvam 30B and Sarvam 105B are reasoning models trained from scratch on large-scale, high-quality datasets curated in-house for the pre-training, supervised fine-tuning, and reinforcement learning stages. Training was conducted entirely in India on compute provided under the IndiaAI Mission.

Both models use a Mixture-of-Experts (MoE) Transformer backbone. Sparse expert routing activates only a few experts per token, so the total parameter count can grow without a proportional increase in compute per token. The architecture supports long-context inputs through rotary positional embeddings, RMSNorm-based stabilization, and attention designs optimized for efficient KV-cache usage during inference.
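To make the parameter-versus-compute trade-off concrete, here is a rough Python sketch for a single MoE feedforward layer. Only the 128-expert count comes from the models described here. The hidden sizes, the three-matrix gated expert layout, and top_k = 8 are illustrative assumptions, not Sarvam's published configuration.

```python
def moe_ffn_params(d_model, d_expert, n_experts, top_k):
    """Return (total, active-per-token) weights for one MoE feedforward layer.

    Assumes each expert is a gated FFN with three weight matrices
    (up, gate, down), each holding d_model * d_expert weights.
    The router's own weights are ignored.
    """
    per_expert = 3 * d_model * d_expert
    total = n_experts * per_expert   # weights stored in memory
    active = top_k * per_expert      # weights actually used for one token
    return total, active


# Hypothetical sizes: only n_experts=128 is taken from the article.
total, active = moe_ffn_params(d_model=4096, d_expert=1024, n_experts=128, top_k=8)
print(f"total={total:,}  active={active:,}  ratio={total // active}x")
```

Under these assumed sizes the layer stores 16 times more expert weights than it uses for any one token, which is the sense in which sparse routing decouples model size from per-token compute.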

Sarvam 30B uses Grouped Query Attention (GQA) to reduce KV-cache memory while maintaining performance. Sarvam 105B extends the architecture with greater depth and Multi-head Latent Attention (MLA), a compressed attention formulation that reduces memory requirements for long-context inference. Both models use sparse expert feedforward layers with 128 experts but differ in expert capacity and routing configuration.
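The memory effect of these attention choices can be estimated with simple arithmetic. The Python sketch below compares the per-token KV-cache size for standard multi-head attention, GQA, and an MLA-style compressed cache. Every number in it (layer count, head counts, head size, latent and rotary dimensions, 2-byte elements) is a hypothetical placeholder, and the MLA formula assumes the cache holds one compressed latent vector plus a small rotary key per layer. None of these values are taken from the Sarvam models.

```python
def kv_bytes_gqa(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Per-token KV cache when keys and values are stored per KV head.

    Standard multi-head attention is the special case where
    n_kv_heads equals the number of query heads.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_bytes_mla(n_layers, latent_dim, rope_dim, bytes_per_elem=2):
    """Per-token cache for an MLA-style layout (assumed, see lead-in):
    one shared latent vector plus a rotary key per layer."""
    return n_layers * (latent_dim + rope_dim) * bytes_per_elem


# Hypothetical configuration, for illustration only.
mha = kv_bytes_gqa(n_layers=32, n_kv_heads=32, head_dim=128)
gqa = kv_bytes_gqa(n_layers=32, n_kv_heads=8, head_dim=128)
mla = kv_bytes_mla(n_layers=32, latent_dim=512, rope_dim=64)

for name, size in [("MHA", mha), ("GQA", gqa), ("MLA", mla)]:
    print(f"{name}: {size:,} bytes per token")
```

With these placeholder values, GQA with 8 KV heads caches a quarter of what 32-head multi-head attention would, and the compressed layout is smaller still. The actual savings depend on each model's real configuration.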

Training and data details

The 30B model was trained on 16T tokens, while the 105B model was trained on 12T tokens. Pre-training data spans code, general web data, specialized knowledge corpora, mathematics, and multilingual content, with a substantial allocation to the 10 most-spoken Indian languages.

Training used sigmoid-based routing scores rather than traditional softmax gating, which improves expert load balancing and reduces routing collapse. An expert-bias term stabilizes routing dynamics and encourages more uniform expert utilization across training steps.
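One common way to combine sigmoid scores with an expert-bias term is sketched below in plain Python. It is an assumption about the general technique, not a description of Sarvam's implementation: the bias is used only to decide which experts are selected, the gate weights come from the unbiased sigmoid scores renormalized over the chosen experts, and the bias is nudged after each step according to how loaded each expert was. The function names, the step size, and the example numbers are all hypothetical.

```python
import math


def route(logits, bias, top_k):
    """Select top_k experts for one token and return {expert_index: gate_weight}.

    Ranking uses sigmoid(logit) + bias. The gate weights use the plain
    sigmoid scores, renormalized so the chosen experts sum to 1.
    """
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(scores[i] for i in chosen)
    return {i: scores[i] / norm for i in chosen}


def update_bias(bias, load, step=0.01):
    """Lower the bias of experts above mean load, raise it for those below."""
    mean = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean:
            new_bias.append(b - step)
        elif n < mean:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


logits = [2.0, 1.0, 0.5, -1.0]
print(sorted(route(logits, bias=[0.0, 0.0, 0.0, 0.0], top_k=2)))   # [0, 1]
print(sorted(route(logits, bias=[-0.5, 0.0, 0.3, 0.0], top_k=2)))  # [1, 2]
print(update_bias([0.0, 0.0, 0.0, 0.0], load=[10, 2, 4, 4]))
```

Because each sigmoid score is computed independently, raising one expert's score does not push the others down the way a softmax would. In this sketch the bias then steers traffic away from a busy expert (expert 0 in the second call) without changing the weights applied to the experts that are chosen.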

Pre-training was conducted in three phases: long-horizon pre-training, mid-training, and a long-context extension phase. The 105B model pulled ahead of the 30B model on benchmarks early in training, which suggests efficient scaling behavior.

Performance and deployment

Sarvam 105B performs well on benchmarks covering reasoning, programming, and agentic tasks. Sarvam 30B is optimized for real-time deployment, with strong performance on real-world conversational use cases. Both models achieve state-of-the-art results on Indian language benchmarks, outperforming significantly larger models.

Sarvam 30B powers Samvaad, Sarvam's conversational agent platform. Sarvam 105B powers Indus, their AI assistant built for complex reasoning and agentic workflows.

Access and implementation

Weights can be downloaded from AI Kosh (30B, 105B) and Hugging Face (30B, 105B). For local inference with Transformers, vLLM, or SGLang, refer to the Hugging Face model pages for sample implementations. Both models are also available through Sarvam's API, via its API dashboard.

📖 Read the full source: HN LLM Tools
