Qwen 3.5 122B MoE at 35 t/s on a Single 3090 with ik_llama.cpp MTP

OpenClawRadar | Published: June 6, 2026

A developer running a fully local inference stack on a single desktop reports 35 tokens/s on Qwen 3.5 122B MoE from a single RTX 3090. The key enabler is ik_llama.cpp, a fork of llama.cpp that makes MTP (Multi-Token Prediction) pay off when experts are offloaded to system RAM.

Hardware Config

AMD 9900X CPU
192GB DDR5-5200 RAM (called “the secret weapon”)
Two 3090s (Ti + standard), no NVLink

Card 1 runs the worker: Qwen3.5-122B-A10B, using the Unsloth IQ3_S MTP GGUF with 204K context. 75% of the expert layers are offloaded to CPU via targeted -ot (override-tensor) flags. Card 2 runs the reasoner: Qwen3.6-35B-A3B Q4_K_XL with MTP, at 135 t/s and 262K context.
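The post does not include the exact -ot flags, so the following is only a minimal sketch of how such a flag could be generated. It assumes the usual llama.cpp GGUF naming for routed-expert tensors (blk.N.ffn_up_exps, ffn_gate_exps, ffn_down_exps) and the `regex=buffer` form of -ot. The layer count of 48, the choice to offload the last layers, and both helper names are hypothetical and chosen for illustration.

```python
import re


def pick_cpu_layers(n_layers, cpu_fraction):
    """Return the layer indices whose experts go to CPU.

    Assumption: the earliest layers stay on the GPU and the last
    `cpu_fraction` of layers are offloaded. The source does not say
    which layers the author actually chose.
    """
    n_cpu = round(n_layers * cpu_fraction)
    return list(range(n_layers - n_cpu, n_layers))


def expert_offload_pattern(cpu_layers):
    """Build one -ot argument of the form '<regex>=CPU'.

    The regex targets only the routed-expert FFN tensors of the listed
    layers, so attention and other tensors keep their default placement.
    """
    if not cpu_layers:
        raise ValueError("need at least one layer index")
    layers = "|".join(str(i) for i in sorted(set(cpu_layers)))
    return rf"blk\.({layers})\.ffn_.*_exps\.=CPU"


N_LAYERS = 48  # hypothetical layer count, not taken from the post
ot_arg = expert_offload_pattern(pick_cpu_layers(N_LAYERS, 0.75))
print("-ot", repr(ot_arg))
```

The printed string would be passed as the value of -ot on the server command line. Checking the regex against a few tensor names before launching is a cheap way to catch a pattern that silently matches nothing.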

Additional CPU-only instances handle background processing: Dialectic (35B heretic Q8), Scribe-Logos (Gemma4 19B), and Moonshot (Gemma4 2B), totalling ~19GB of RAM.

The ik_llama.cpp Finding

Stock llama.cpp’s MTP evaluates each speculated token’s experts sequentially, reading them from DDR5 one token at a time. On reasoning content this actually regresses performance, because the draft overhead outweighs the speedup from accepted tokens. The ik fork implements fused MoE ops that batch expert reads across speculated tokens, turning MTP from a +4% gain into a +20% gain. With this fork, the developer reports 35 t/s decode on a 122B model from a single 3090.
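The trade-off between draft overhead and accepted tokens can be illustrated with a toy model. This is a rough sketch and not a description of either implementation. It assumes one drafted token per decode step, a fixed acceptance rate, and a verification cost expressed as a fraction of a normal step. The 0.7 acceptance rate is a hypothetical placeholder, and treating the reported 35 t/s as already including the +20% is also an assumption.

```python
def mtp_speedup(accept_rate, verify_overhead):
    """Throughput ratio versus plain decoding, one drafted token per step.

    accept_rate: probability the drafted token is accepted (0..1).
    verify_overhead: extra time per step spent drafting and verifying,
        as a fraction of a normal decode step.
    """
    tokens_per_step = 1.0 + accept_rate
    cost_per_step = 1.0 + verify_overhead
    return tokens_per_step / cost_per_step


def implied_overhead(accept_rate, speedup):
    """Invert mtp_speedup: the overhead that would produce `speedup`."""
    return (1.0 + accept_rate) / speedup - 1.0


ACCEPT = 0.7  # hypothetical acceptance rate, not reported in the post

# Overhead each reported gain would imply under this toy model.
for label, gain in (("stock llama.cpp", 1.04), ("ik_llama.cpp", 1.20)):
    print(label, round(implied_overhead(ACCEPT, gain), 3))

# Baseline without MTP, if the reported 35 t/s already includes the +20%.
print(round(35 / 1.20, 1))
```

Under this model MTP turns into a regression as soon as the overhead exceeds the acceptance rate, which is the situation the post describes for sequential expert reads on reasoning content. Batching the reads lowers the overhead term while leaving the acceptance rate unchanged.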

If you’re offloading experts to RAM on any MoE model, try ik_llama.cpp before giving up on MTP.

Total Build Cost

~$1600 for RAM
~$1600 for two 3090s
~$400 for everything else
Total: ~$3600
Running cost: electricity only

Source: r/openclaw
