Qwen 3.6 27B F16 Passes Pacman Test, 8-Bit Quants Fail

A developer on r/LocalLLaMA shared a practical coding benchmark: give the model a well-written prompt for a single-page Pacman clone, run three one-shot attempts, and keep the best result. Qwen 3.6 27B at F16 produced two nearly perfect games, making it the first local model to pass the test. Dropping to 8-bit quantization, however, made good results unreproducible even after five attempts, which supports the claim that 8-bit quantization is not lossless for complex generative tasks.
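The "several attempts, keep the best" protocol can be sketched in a few lines of Python. This is only an illustration, not the author's harness: `best_of_n`, `generate`, and `score` are hypothetical names, and the numeric `score` callable is an assumption, since the post does not describe an automated way of judging the games.

```python
from typing import Callable, Optional, Tuple


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str], float],
    prompt: str,
    attempts: int = 3,
) -> Tuple[Optional[str], float]:
    """Send the same prompt `attempts` times and keep the top-scoring output.

    `generate` stands in for a call to the model runner and `score` for
    whatever judgement is applied to each result (both hypothetical).
    """
    best_output: Optional[str] = None
    best_score = float("-inf")
    for _ in range(attempts):
        output = generate(prompt)      # one independent one-shot attempt
        current = score(output)
        if current > best_score:       # keep only the best attempt so far
            best_output, best_score = output, current
    return best_output, best_score


# Stub usage: three canned "generations", scored by length.
canned = iter(["a", "abc", "ab"])
best, value = best_of_n(lambda p: next(canned), len, "Write a Pacman clone")
print(best, value)
```

Comparing an F16 run against an 8-bit run under this scheme only means swapping the `generate` callable while keeping the prompt, attempt count, and scoring fixed.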

Key technical findings from the post:

Chat template is critical: The official Qwen chat template is tuned for vLLM and causes errors in llama.cpp and other runners. The author fixed the template bugs iteratively and reports that, once the template was tuned, the model felt like "a new level of intelligence."
MTP speculative decoding speeds vary by task: For deterministic tasks like coding, generation speed ranged from 8 to 18 tok/s (baseline without MTP: 6.6 tok/s). Creative tasks see less acceleration.
Harness choice affects speed more than code quality: Qwen CLI performed surprisingly well, matching Claude Code in output quality while running much faster, because Claude Code's extra prompts slow down local models. With a model as slow as Qwen 3.6 27B at ~6 tok/s, every extra prompt adds painful latency.
Don't interfere with context management: The model's native context caching and compaction work well. Plugins or tools that manipulate cache or context confuse the model and degrade performance.
Tool calls and subagents work flawlessly after proper chat template fixes. Context compaction, shell usage, and parallel subagents all function as expected.
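To put the throughput figures above in perspective, the short calculation below converts them into speedup factors and wall-clock time. The three rates (6.6, 8, and 18 tok/s) come from the post; the 500-token overhead is a purely hypothetical figure chosen for illustration, since the post does not say how much extra generation a heavier harness causes.

```python
BASELINE_TOK_S = 6.6            # reported baseline without MTP
MTP_RANGE_TOK_S = (8.0, 18.0)   # reported range with MTP on coding tasks


def speedup(rate: float, baseline: float = BASELINE_TOK_S) -> float:
    """Throughput ratio relative to the non-MTP baseline."""
    return rate / baseline


def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time needed to generate `tokens` at a given rate."""
    return tokens / tok_per_s


low, high = (speedup(rate) for rate in MTP_RANGE_TOK_S)
print(f"MTP speedup: {low:.2f}x to {high:.2f}x")  # MTP speedup: 1.21x to 2.73x

EXTRA_TOKENS = 500  # hypothetical harness overhead, not a figure from the post
for rate in (BASELINE_TOK_S, *MTP_RANGE_TOK_S):
    seconds = generation_seconds(EXTRA_TOKENS, rate)
    print(f"{EXTRA_TOKENS} extra tokens at {rate} tok/s: {seconds:.0f} s")
```

Under that assumption, the same overhead costs over a minute at the baseline rate and under half a minute at the top of the MTP range, which is consistent with the observation that heavy harnesses hurt most on slow local inference.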

The author warns that results depend heavily on runner configuration: use F16 weights and a corrected chat template, and avoid heavy harnesses unless inference is fast. The full playable Pacman result is available at guigand.com/pacman.

📖 Read the full source: r/LocalLLaMA

