Savant Commander 48B: A Custom Qwen 3 Mixture-of-Experts Model with 12 Distilled Models

Savant Commander 48B is a custom Mixture-of-Experts (MOE) model built on the Qwen 3 architecture. It combines 12 distilled models derived from several providers, including Claude, Gemini, OpenAI, and Deepseek. Hand-coded routing keeps each distill isolated while still allowing connections between them.
Key Features and Architecture
- Based on Qwen 3 with 256K context length
- 12x4B MOE structure (48B total parameters, matching the 12 distills and the "48B-A4B" in the repository names)
- Custom routing isolates each distilled model while maintaining inter-model connections
- Prompt-controlled activation: users can select which distilled model(s) to use
- Enables direct comparison between different distilled models using identical prompts
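As a rough illustration of what "isolating" experts while routing can look like, here is a minimal sketch of gated top-k routing in plain Python. It is not the author's hand-coded routing, which is not described in detail here. The function name `gated_route`, the `allowed` set, and the made-up scores are all assumptions for illustration only.

```python
import math

def gated_route(logits, allowed, top_k=1):
    """Toy gated router (hypothetical, not the model's actual routing).

    logits  : one router score per expert
    allowed : set of expert indices that may be activated
    top_k   : how many of the allowed experts to keep
    Returns {expert_index: weight}, with weights summing to 1.
    """
    # Experts outside the allowed set are never considered, which isolates them.
    candidates = [i for i in range(len(logits)) if i in allowed]
    if not candidates:
        raise ValueError("at least one valid expert index must be allowed")

    # Keep the top_k highest-scoring allowed experts.
    ranked = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]

    # Softmax over the survivors only (subtract the max for numerical stability).
    peak = max(logits[i] for i in ranked)
    exps = {i: math.exp(logits[i] - peak) for i in ranked}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}

# Made-up scores for 12 experts; only experts 2 and 5 are permitted to fire.
example_logits = [0.1 * i for i in range(12)]
example_weights = gated_route(example_logits, allowed={2, 5}, top_k=2)
```

With a single allowed expert, that expert receives all of the weight. With several, the permitted experts share it, which is one way isolation and inter-model connections could coexist.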
Model Variants and Availability
The project includes both regular and uncensored ("Heretic") versions. The uncensored version was created by applying the Heretic process to each model individually before adding it to the MOE structure, rather than applying it to the assembled MOE.
Available GGUF formats:
- Regular version: https://huggingface.co/DavidAU/Qwen3-48B-A4B-Savant-Commander-GATED-12x-Closed-Open-Source-Distill-GGUF
- Uncensored version: https://huggingface.co/DavidAU/Qwen3-48B-A4B-Savant-Commander-Distill-12X-Closed-Open-Heretic-Uncensored-GGUF
Source repositories:
- Regular: https://huggingface.co/DavidAU/Qwen3-48B-A4B-Savant-Commander-GATED-12x-Closed-Open-Source-Distill
- Uncensored: https://huggingface.co/DavidAU/Qwen3-48B-A4B-Savant-Commander-Distill-12X-Closed-Open-Heretic-Uncensored
Practical Applications
The model's prompt-controlled routing allows developers to test and compare outputs from different distilled models using the same prompts. Command and control functions are documented in the repository card with detailed instructions.
This approach to MOE architecture offers a practical way to use multiple specialized models within a single inference framework. It is particularly useful for comparing model behaviors or selecting specific model characteristics for different tasks.
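The comparison workflow described above could be scripted along these lines. This is a sketch under stated assumptions: the real control syntax is documented in the repository card and is not reproduced here, so the `[name]` prefix, the distill labels, and the `generate` callable (standing in for whatever inference backend you use) are all hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable

def compare_distills(
    prompt: str,
    distills: Iterable[str],
    generate: Callable[[str], str],
    make_control: Callable[[str], str] = lambda name: f"[{name}]",
) -> Dict[str, str]:
    """Send the same prompt once per distill and collect the outputs.

    make_control builds the activation prefix for a distill. The default
    "[name]" format is a placeholder, NOT the model's documented syntax.
    """
    results = {}
    for name in distills:
        # Only the control prefix changes; the prompt body stays identical.
        full_prompt = f"{make_control(name)} {prompt}"
        results[name] = generate(full_prompt)
    return results

def echo_backend(full_prompt: str) -> str:
    """Stub backend for demonstration; replace with a real inference call."""
    return f"echo: {full_prompt}"

demo_results = compare_distills(
    "Explain top-k routing in one sentence.",
    distills=["claude", "gemini"],  # hypothetical labels
    generate=echo_backend,
)
```

Passing the backend in as a callable keeps the harness independent of any particular runtime, and swapping `make_control` for a function that emits the syntax from the repository card should be the only change needed for real use.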
📖 Read the full source: r/LocalLLaMA