Qwen3.6-27B Fits on Single 24GB GPU, Beats Former 397B MoE on SWE-bench

By OpenClawRadar · Published: April 29, 2026

Qwen3.6-27B was released on April 22. It is a 27B dense model that fits on a single 24GB GPU at Q4_K_M (~16.8GB) and scores 77.2 on SWE-bench Verified, one point ahead of the previous 397B MoE model (76.2). For developers running local coding agents on consumer hardware, this lowers the hardware threshold for a capable agentic model to a single 24GB card.

Key specs and architecture

  • 262K context length
  • Apache 2.0 license
  • Gated DeltaNet linear attention (3 of 4 sublayers) with Gated Attention for the remainder
  • "Thinking Preservation" carries reasoning traces across turns, reducing redundant token generation and improving KV cache efficiency in long agent sessions

Hardware requirements

At Q4_K_M, the model uses ~16.8GB VRAM, fitting comfortably on a single 24GB card (e.g., RTX 3090/4090, A10G). In contrast, Qwen3-Coder-Next (80B MoE, 3B active) requires 45–80GB at the same quantization, limiting it to dual-GPU setups or Apple Silicon with 48GB+ unified memory.
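As a rough sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes 1 GB = 10^9 bytes and that the ~16.8GB figure covers the weights only. Whatever is left over on the card has to hold the KV cache, activations, and runtime overhead, so the real usable context depends on the runtime and its settings.

```python
# Figures quoted in the article.
WEIGHTS_GB = 16.8       # Q4_K_M footprint
CARD_GB = 24.0          # single 24GB GPU
PARAMS_BILLION = 27.0   # dense parameter count


def headroom_gb(card_gb: float, weights_gb: float) -> float:
    """Memory left for KV cache, activations, and runtime overhead."""
    return card_gb - weights_gb


def bits_per_weight(weights_gb: float, params_billion: float) -> float:
    """Average bits per parameter, assuming 1 GB = 10**9 bytes."""
    return weights_gb * 8 / params_billion


if __name__ == "__main__":
    print(f"headroom: {headroom_gb(CARD_GB, WEIGHTS_GB):.1f} GB")
    print(f"bits/weight: {bits_per_weight(WEIGHTS_GB, PARAMS_BILLION):.2f}")
```

With the article's figures this leaves about 7.2GB of headroom and works out to roughly 5 bits per weight on average.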

Caveats and gotchas

  • Do NOT use CUDA 13.2 — it produces garbage output. Stick to CUDA 13.1 or 12.x.
  • For users already running Coder-Next on 48GB+ hardware for agentic tasks, the switch isn't obviously beneficial.
  • For single-GPU users stuck on older or weaker local coding models, Qwen3.6-27B is currently the most capable option at the 24GB tier.
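The CUDA caveat above can be turned into a small pre-flight check. This is a minimal sketch, not part of any inference runtime: the helper names are hypothetical, and it assumes you pass in the text printed by `nvcc --version`, which contains a fragment like `release 12.4`. It only flags the version the article names as broken; a version missing from the list is not thereby confirmed good. Note also that `nvcc` reports the installed toolkit, which may differ from the CUDA version your inference binary was built against.

```python
import re

# Per the article: CUDA 13.2 produces garbage output; 13.1 and 12.x are fine.
KNOWN_BAD = {(13, 2)}


def parse_cuda_release(nvcc_output: str):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_known_bad_cuda(nvcc_output: str) -> bool:
    """True if the reported CUDA release is on the known-bad list."""
    version = parse_cuda_release(nvcc_output)
    if version is None:
        raise ValueError("no CUDA release found in the given text")
    return version in KNOWN_BAD


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 13.2, V13.2.0"  # illustrative text
    if is_known_bad_cuda(sample):
        print("CUDA 13.2 detected: switch to 13.1 or 12.x before running the model")
```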

📖 Read the full source: r/LocalLLaMA
