Qwen3.6-27B Fits on Single 24GB GPU, Beats Former 397B MoE on SWE-bench

✍️ OpenClawRadar📅 Published: April 29, 2026🔗 Source

Qwen3.6-27B dropped on April 22, bringing a 27B dense model that fits a single 24GB GPU at Q4_K_M (~16.8GB) and scores 77.2 on SWE-bench Verified — beating the previous 397B MoE model (76.2). For developers running local coding agents on consumer hardware, this changes the threshold for capable agentic models.

Key specs and architecture

262K context length
Apache 2.0 license
Gated DeltaNet linear attention (3 of 4 sublayers) with Gated Attention for the remainder
"Thinking Preservation" carries reasoning traces across turns, reducing redundant token generation and improving KV cache efficiency in long agent sessions

Hardware requirements

At Q4_K_M, the model uses ~16.8GB VRAM, fitting comfortably on a single 24GB card (e.g., RTX 3090/4090, A10G). In contrast, Qwen3-Coder-Next (80B MoE, 3B active) requires 45–80GB at the same quantization, limiting it to dual-GPU setups or Apple Silicon with 48GB+ unified memory.

Caveats and gotchas

Do NOT use CUDA 13.2 — it produces garbage output. Stick to CUDA 13.1 or 12.x.
For users already running Coder-Next on 48GB+ hardware for agentic tasks, the switch isn't obviously beneficial.
For single-GPU users stuck on older or weaker local coding models, Qwen3.6-27B is currently the most capable option at the 24GB tier.

📖 Read the full source: r/LocalLLaMA

👀 See Also

News

Testing OpenClaw on UmbrelOS: What to Know

OpenClaw's integration with UmbrelOS is being explored, potentially offering a new environment for AI-enhanced coding tools.

Apr 20, 2026, 05:38 PM UTC

OpenClawRadar

News

Anthropic's Natural Language Autoencoders Turn Claude's Activations into Readable English — Here's How

Anthropic releases Natural Language Autoencoders (NLAs) that convert Claude's internal activations into plain-text explanations, revealing model reasoning about rhymes, safety test awareness, and cheating detection.

May 7, 2026, 10:15 PM UTC

OpenClawRadar

News

AlphaEvolve: DeepMind's Gemini-powered agent optimizes algorithms across genomics, power grids, and TPC circuits

AlphaEvolve, a Gemini-powered coding agent by Google DeepMind, improved DeepConsensus variant detection errors by 30%, boosted AC Optimal Power Flow GNN feasibility from 14% to 88%, and reduced quantum circuit error by 10x.

May 7, 2026, 06:15 PM UTC

OpenClawRadar

News

Claude Code allegedly refuses requests or charges extra when commits mention 'OpenClaw'

A tweet by Theo claims Claude Code either refuses requests or charges extra if your git commits mention 'OpenClaw', sparking discussion on HN.

Apr 30, 2026, 04:16 PM UTC

OpenClawRadar