Qwen 3.5 122B MoE at 35 t/s on a Single 3090 with ik_llama.cpp MTP

A developer running a fully local inference stack on a single desktop reports 35 tokens/s decode on Qwen 3.5 122B MoE using only one RTX 3090. The key enabler is ik_llama.cpp, a fork of llama.cpp that makes MTP (Multi-Token Prediction) pay off when MoE experts are offloaded to system RAM.
Hardware Config
- AMD 9900X CPU
- 192GB DDR5-5200 RAM (called “the secret weapon”)
- Two 3090s (Ti + standard), no NVLink
Card 1 runs the worker: Qwen3.5-122B-A10B using Unsloth's IQ3_S MTP GGUF with a 204K context. 75% of the expert layers are offloaded to CPU RAM via targeted -ot (--override-tensor) flags. Card 2 runs the reasoner: Qwen3.6-35B-A3B Q4_K_XL with MTP at 135 t/s and a 262K context.
Additional CPU-only instances handle background processing: Dialectic (35B heretic Q8), Scribe-Logos (Gemma4 19B) and Moonshot (Gemma4 2B), together using about 19GB of RAM.
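The post does not show the exact -ot flags. As a rough sketch of how a layer-selective expert offload is usually expressed, the Python helper below builds an --override-tensor value that sends the routed-expert tensors of a range of layers to the CPU buffer. It assumes llama.cpp-style GGUF tensor names (blk.<i>.ffn_gate_exps, ffn_up_exps, ffn_down_exps). The 48-layer count and the choice to offload the last 75% of layers rather than the first are hypothetical, so check the model's GGUF metadata before using anything like this.

```python
import re


def offload_range(n_layers: int, fraction: float) -> tuple[int, int]:
    """Return (first, last) layer indices covering the last `fraction` of layers.

    Offloading the *last* layers is an arbitrary choice for this sketch.
    """
    n_off = round(n_layers * fraction)
    return n_layers - n_off, n_layers - 1


def expert_offload_pattern(first: int, last: int) -> str:
    """Regex matching routed-expert tensors of layers first..last only.

    The explicit alternation (e.g. "12|13|...|47") followed by a literal dot
    keeps blk.12 from also matching blk.120, and the "_exps." suffix skips
    attention, norm and other non-routed-expert tensors.
    """
    layers = "|".join(str(i) for i in range(first, last + 1))
    return rf"blk\.({layers})\.ffn_(gate|up|down)_exps\."


def ot_value(first: int, last: int, buffer: str = "CPU") -> str:
    """Format the pattern=buffer string passed to -ot / --override-tensor."""
    return f"{expert_offload_pattern(first, last)}={buffer}"


if __name__ == "__main__":
    N_LAYERS = 48  # hypothetical; read the real value from the GGUF metadata
    first, last = offload_range(N_LAYERS, 0.75)
    print(f'-ot "{ot_value(first, last)}"')
```

Running it prints a single -ot argument that keeps the first 12 layers' experts on the GPU and routes the remaining 36 layers' experts to system RAM. Spelling out each layer index is verbose but avoids the prefix-matching mistakes that hand-written ranges like 1[0-9] invite.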
The ik_llama.cpp Finding
Stock llama.cpp's MTP evaluates each speculated token's experts sequentially, reading them from DDR5 one token at a time. On reasoning content this actually regresses performance: the draft overhead outweighs the speedup from accepted tokens. The ik fork implements fused MoE ops that batch the expert reads for speculated tokens, turning MTP from a +4% gain into a +20% gain. The developer reports 35 t/s decode on the 122B model from a single 3090 using this fork.
If you’re offloading experts to RAM on any MoE model, try ik_llama.cpp before giving up on MTP.
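To see why batching the expert reads matters, here is a deliberately simplified cost model, not ik_llama.cpp's actual implementation. It assumes decode is bound by reading expert weights from DDR5, counts each expert block read as one unit of cost, ignores the small MTP draft head, and uses a hypothetical 8 experts per token with 50% overlap between the current token and the drafted one. Verifying both tokens with separate per-token reads doubles the expert traffic of a step, while a fused read touches the union of the two expert sets once.

```python
def tokens_per_step(accept_rate: float, draft_len: int = 1) -> float:
    """Expected tokens emitted per verify step: 1 + a + a^2 + ... + a^k.

    Assumes each drafted token is accepted independently with probability
    `accept_rate` and verification stops at the first rejection.
    """
    return sum(accept_rate ** i for i in range(draft_len + 1))


def relative_throughput(accept_rate: float, experts_a: set[int], experts_b: set[int],
                        fused: bool, shared_cost: float = 0.0) -> float:
    """Decode throughput with a one-token MTP draft, relative to plain decoding.

    A step costs `shared_cost` (work done once per step regardless of how many
    tokens are verified) plus one unit per expert block read from RAM.
    """
    baseline_cost = shared_cost + len(experts_a)  # plain decode of token A
    if fused:
        expert_reads = len(experts_a | experts_b)  # each distinct expert read once
    else:
        expert_reads = len(experts_a) + len(experts_b)  # per token; overlaps re-read
    verify_cost = shared_cost + expert_reads
    # (tokens per unit cost with MTP) / (tokens per unit cost without MTP)
    return tokens_per_step(accept_rate) * baseline_cost / verify_cost


if __name__ == "__main__":
    a_experts = set(range(0, 8))   # hypothetical experts for the current token
    b_experts = set(range(4, 12))  # hypothetical experts for the drafted token
    for fused in (False, True):
        r = relative_throughput(0.8, a_experts, b_experts, fused)
        print(f"fused={fused}: {r:.2f}x")
```

With these made-up inputs the unfused path prints 0.90x (a regression) and the fused path 1.20x. Raising shared_cost, which stands in for work done once per step such as GPU-resident layers, softens the unfused penalty. The figures are illustrative only and are not meant to reproduce the developer's +4% and +20% measurements.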
Total Build Cost
- ~$1600 for RAM
- ~$1600 for two 3090s
- ~$400 for everything else
- ~$3600 total
- Running cost: electricity only
Source: r/openclaw