Qwen3.6-27B Fits on Single 24GB GPU, Beats Former 397B MoE on SWE-bench

Qwen3.6-27B dropped on April 22, bringing a 27B dense model that fits a single 24GB GPU at Q4_K_M (~16.8GB) and scores 77.2 on SWE-bench Verified, beating the previous 397B MoE model (76.2). For developers running local coding agents on consumer hardware, this lowers the hardware threshold for capable agentic models.

Key specs and architecture
- 262K context length
- Apache 2.0 license
- Gated DeltaNet linear attention (3 of 4 sublayers) with Gated Attention for the remainder
- "Thinking Preservation" carries reasoning traces across turns, reducing redundant token generation and improving KV cache efficiency in long agent sessions

Hardware requirements
At Q4_K_M, the model uses ~16.8GB VRAM, fitting comfortably on a single 24GB card (e.g., RTX 3090/4090, A10G). In contrast, Qwen3-Coder-Next (80B MoE, 3B active) requires 45–80GB at the same quantization, limiting it to dual-GPU setups or Apple Silicon with 48GB+ unified memory.
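The fit claim above comes down to simple arithmetic. The following is a minimal sketch, assuming 1GB = 10^9 bytes and exactly 27 × 10^9 parameters (both simplifications, not figures from the source). The function names are illustrative only, and the calculation ignores KV cache and runtime buffers, which also have to fit in whatever VRAM is left.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (not from the source): 1 GB = 1e9 bytes, "27B" = 27e9 parameters.

def implied_bits_per_weight(model_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter for a quantized model of this size."""
    return model_gb * 8 / params_billion


def headroom_gb(vram_gb: float, model_gb: float) -> float:
    """VRAM left after loading weights; KV cache and buffers must fit here."""
    return vram_gb - model_gb


if __name__ == "__main__":
    bpw = implied_bits_per_weight(16.8, 27)
    spare = headroom_gb(24, 16.8)
    print(f"implied bits/weight: {bpw:.2f}")
    print(f"headroom on a 24GB card: {spare:.1f} GB")
```

Under these assumptions the quoted ~16.8GB works out to roughly 5 bits per weight and leaves about 7.2GB on a 24GB card. The low end of the 45–80GB range quoted for Coder-Next already exceeds a single 24GB card.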

Caveats and gotchas
- Do NOT use CUDA 13.2 — it produces garbage output. Stick to CUDA 13.1 or 12.x.
- For users already running Coder-Next on 48GB+ hardware for agentic tasks, the switch isn't obviously beneficial.
- For single-GPU users stuck on older or weaker local coding models, Qwen3.6-27B is currently the most capable option at the 24GB tier.
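
The CUDA caveat above can be turned into a quick pre-flight check. The following is a minimal sketch, assuming the toolkit version is read from the `release X.Y` string that `nvcc --version` prints. The function names and the blocklist are hypothetical, and the 13.2 entry simply encodes the report above rather than an independently verified bug. Note that `nvcc` reports the installed toolkit, which may differ from the CUDA runtime a prebuilt inference binary was compiled against.

```python
import re
import subprocess

# Versions flagged in this article (an assumption taken from the caveat above).
FLAGGED_CUDA_VERSIONS = {(13, 2)}


def parse_cuda_release(nvcc_output: str):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_flagged(nvcc_output: str) -> bool:
    """True if the reported toolkit version is on the blocklist."""
    return parse_cuda_release(nvcc_output) in FLAGGED_CUDA_VERSIONS


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvcc not available; cannot determine CUDA toolkit version")
    else:
        if is_flagged(result.stdout):
            print("Flagged CUDA version detected; consider 13.1 or 12.x")
        else:
            print("CUDA toolkit version is not on the blocklist")
```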
📖 Read the full source: r/LocalLLaMA