Qwen 3.6 27B Q8_k_xl as a Local Daily Driver for VSCode

A developer on r/LocalLLaMA reports success using Qwen-3.6-27B (q8_k_xl quant from Unsloth) as a local daily driver in VSCode Insiders, served via LM Studio on an RTX 6000 Pro. After testing Gemma 4 and Qwen 3.6 variants, the Qwen-3.6-27B-q8_k_xl quant was the clear winner.
Setup & Performance
- VSCode Insiders edition with local model support enabled (setup described as 'super easy').
- Models served locally using LM Studio.
- Token generation is 'a tad bit slow' but compared to GitHub Copilot hosted models, the overall latency was similar — 'maybe a touch slower'.
Capabilities & Limitations
- With appropriate tool calling, the 27B dense model handles typical data mining and web scraping tasks without issue.
- It cannot work at the 'feature level' like Opus 4.6 — you cannot just say 'implement this feature' and expect a perfect result. Vibe coding without a solid grasp of systems architecture will likely fail.
- The developer had to steer it occasionally to improve code quality and approach, but functionally it 'was nailing it'.
- Recommended workflow: always do a 'Plan round' first to work out details, then the model implements without issues.
Bottom Line
For developers with decent systems architecture knowledge, this model hits 'good enough' status for local use. The developer spent a full day without using a single API token. The main drawback is compute contention — they note needing another RTX 6000 to avoid fighting with agents for GPU time.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Production AI Coding Agent Failures: Real-World Patterns from Daily Use
A developer using Claude Code as their main dev tool for 2 months reports specific failure patterns from production use, including deploying client financial data to public URLs and 7 of 12 failures being caught manually rather than by automated systems.

Picar robot car demonstrates autonomous video production with OpenClaw
A PiCar-X robot running OpenClaw with Claude Sonnet on Raspberry Pi 5 autonomously creates YouTube videos by writing scripts from memory logs, generating images with DALL-E 3, narrating with cloned ElevenLabs voice, and assembling with ffmpeg.

V100 Cluster vs. MoE: 12x SXM2 32GB Build with Claude Code Orchestration
A lawyer running 12x V100 32GB SXM2 on Threadripper Pro reports that MoE models are the only viable path on Volta, with Qwen3.5-122B-A10B decoding at ~50 tok/s on 4 boards. The full stack uses Claude Code to orchestrate 5 local models across 16 GPUs.

Running Gemma 4 as a Local Autonomous Agent with Claude Code on 16GB VRAM
A developer successfully configured Google's Gemma 4 31B model to function as a local autonomous coding agent through Claude Code CLI v2.1.92, overcoming VRAM limitations and parsing issues using llama.cpp b8672 and custom Python routing.