Qwen 3.6 27B Quantization Benchmark: Q4_K_M Beats Q8_0 on Practical Tradeoffs

✍️ OpenClawRadar | 📅 Published: April 28, 2026

A Reddit user benchmarked Qwen 3.6 27B in three GGUF variants (the BF16 baseline plus the Q4_K_M and Q8_0 quantizations) using llama-cpp-python via the Neo AI Engineer framework. The evaluation covered 664 samples in total across three tasks: HumanEval (code generation, 164 samples), HellaSwag (commonsense reasoning, 100 samples), and BFCL (function calling, 400 samples).

Benchmark Results

BF16 (model size 53.8 GB, peak RAM 54 GB, throughput 15.5 tok/s): HumanEval 56.10% (92/164), HellaSwag 90.00% (90/100), BFCL 63.25% (253/400). Average accuracy: 69.78%.
Q4_K_M (model size 16.8 GB, peak RAM 28 GB, throughput 22.5 tok/s): HumanEval 50.61% (83/164), HellaSwag 86.00% (86/100), BFCL 63.00% (252/400). Average accuracy: 66.54%.
Q8_0 (model size 28.6 GB, peak RAM 42 GB, throughput 18.0 tok/s): HumanEval 52.44% (86/164), HellaSwag 83.00% (83/100), BFCL 63.00% (252/400). Average accuracy: 66.15%.
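
The percentages and averages above can be re-derived from the raw pass counts. The sketch below assumes the reported average is an unweighted mean of the three task accuracies; the source does not state the averaging scheme, but the figures match under that assumption. The dictionary layout and function names are illustrative only.

```python
# Raw pass counts as reported above: (correct, total) per task.
RESULTS = {
    "BF16":   {"HumanEval": (92, 164), "HellaSwag": (90, 100), "BFCL": (253, 400)},
    "Q4_K_M": {"HumanEval": (83, 164), "HellaSwag": (86, 100), "BFCL": (252, 400)},
    "Q8_0":   {"HumanEval": (86, 164), "HellaSwag": (83, 100), "BFCL": (252, 400)},
}


def task_accuracies(counts):
    """Return per-task accuracy in percent."""
    return {task: 100.0 * correct / total for task, (correct, total) in counts.items()}


def macro_average(counts):
    """Unweighted mean of the per-task accuracies (assumed averaging scheme)."""
    accs = task_accuracies(counts)
    return sum(accs.values()) / len(accs)


for variant, counts in RESULTS.items():
    accs = task_accuracies(counts)
    per_task = ", ".join(f"{task} {acc:.2f}%" for task, acc in accs.items())
    print(f"{variant}: {per_task}, average {macro_average(counts):.2f}%")
```

Because the mean is unweighted, HellaSwag's 100 samples count as much as BFCL's 400; a sample-weighted average over all 664 samples would give different figures.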

Key Takeaways

Q4_K_M is the standout practical variant. Against BF16 it essentially preserves BFCL accuracy (63.00% vs 63.25%), loses about 5.5 points on HumanEval, and trails by 4 points on HellaSwag. In exchange it is 1.45x faster, uses 48% less peak RAM, and is 68.8% smaller on disk. Q8_0 was underwhelming: it beat Q4_K_M on HumanEval by only about 1.8 points while using 42 GB of RAM instead of 28 GB, running slower (18.0 vs 22.5 tok/s), and scoring 3 points lower on HellaSwag.
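
The speed, RAM, and file-size ratios quoted above follow directly from the reported figures. A minimal sketch of that arithmetic, with hypothetical names (`VARIANTS`, `tradeoffs`) chosen for illustration:

```python
# Size, peak RAM, and throughput as reported in the benchmark results.
VARIANTS = {
    "BF16":   {"size_gb": 53.8, "ram_gb": 54.0, "tok_s": 15.5},
    "Q4_K_M": {"size_gb": 16.8, "ram_gb": 28.0, "tok_s": 22.5},
    "Q8_0":   {"size_gb": 28.6, "ram_gb": 42.0, "tok_s": 18.0},
}


def tradeoffs(variant, baseline="BF16"):
    """Compare a variant against a baseline: speedup and percentage reductions."""
    v, b = VARIANTS[variant], VARIANTS[baseline]
    return {
        "speedup": v["tok_s"] / b["tok_s"],
        "ram_reduction_pct": 100.0 * (1 - v["ram_gb"] / b["ram_gb"]),
        "size_reduction_pct": 100.0 * (1 - v["size_gb"] / b["size_gb"]),
    }


for name in ("Q4_K_M", "Q8_0"):
    t = tradeoffs(name)
    print(f"{name}: {t['speedup']:.2f}x speed, "
          f"{t['ram_reduction_pct']:.1f}% less RAM, "
          f"{t['size_reduction_pct']:.1f}% smaller file")
```

Running the same comparison for Q8_0 puts its smaller savings next to its accuracy results, which is the basis for calling it the weaker tradeoff.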

For local/CPU deployment, Q4_K_M is recommended unless the workload is heavily code-generation focused. For maximum quality, BF16 still wins.

Evaluation Setup

All three GGUF variants were run through llama-cpp-python with n_ctx set to 32768, using checkpointed evaluation. The Neo AI Engineer framework built the GGUF eval pipeline, handled the checkpointed runs, and consolidated the results. The complete case study, with code snippets, is linked in the comments of the original Reddit post.
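
The source does not show how the checkpointing was implemented. As a rough illustration of the general idea only, and not the Neo AI Engineer pipeline, the sketch below saves per-sample results to a JSON file after every sample so that an interrupted run can resume where it stopped. The function name, the checkpoint format, and the `evaluate` callable are all assumptions.

```python
import json
import os


def run_checkpointed(samples, evaluate, checkpoint_path):
    """Evaluate samples one by one, saving progress so a crashed run can resume.

    samples: list of (sample_id, payload) pairs.
    evaluate: callable returning True/False for a payload (hypothetical; in a
              real run this would wrap a llama-cpp-python model call).
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    for sample_id, payload in samples:
        key = str(sample_id)  # JSON object keys are always strings
        if key in done:
            continue  # already scored in a previous run
        done[key] = bool(evaluate(payload))
        # Write to a temp file, then rename, so an interrupted write
        # cannot leave a corrupted checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)

    passed = sum(done.values())
    return passed, len(done)
```

In a setup like the one described, `evaluate` would presumably call a model loaded with `llama_cpp.Llama(model_path=..., n_ctx=32768)` and score its output against the task's reference; the model path and the scoring logic are not given in the source.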

📖 Read the full source: r/LocalLLaMA
