Running Qwen3.6-35B-A3B-UD-Q5_K_XL Locally with VS Code Copilot on AMD R9700

A Reddit user reports great results running the Qwen3.6-35B-A3B-UD-Q5_K_XL GGUF model locally using llama.cpp with Vulkan on a single AMD R9700 GPU. The setup served as a drop-in replacement for GitHub Copilot in VS Code, generating a complete test website and Playwright test suite with minimal intervention.
llama.cpp Startup Command
/app/llama-server -m /models/Qwen3.6-35B-A3B-UD-Q5_K_XL/Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf \
--ctx-size 262144 --threads 8 --threads-batch 8 \
--gpu-layers 99 --parallel 1 --flash-attn on \
--batch-size 2048 --ubatch-size 1024 \
--cache-type-k q8_0 --cache-type-v q8_0 \
--cache-ram 12000 --ctx-checkpoints 50 \
--mmap --no-mmproj --kv-unified \
--reasoning off --reasoning-budget 0 --jinja \
--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 \
--repeat-penalty 1.0 --presence-penalty 0.0
Key parameters: 256K context window, 99 GPU layers for full offload, flash attention enabled, and sampling config taken from the Qwen3.6-35B-A3B Hugging Face page under "precise coding".
VS Code Integration
The user configured a custom chat model in chatLanguageModels.json pointing to the local llama.cpp server:
{
"name": "Sean Llama.cpp",
"vendor": "customoai",
"apiKey": "${input:chat.lm.secret.3c0c0f21}",
"models": [
{
"id": "Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf",
"name": "Qwen3.6-35B",
"url": "https://llm.home.arpa/v1/chat/completions",
"toolCalling": true,
"vision": false,
"maxInputTokens": 180000,
"maxOutputTokens": 10000,
"family": "Qwen3",
"inputTokenCost": 0.0001,
"outputTokenCost": 0.0001,
"temperature": 0.6,
"top_p": 0.95,
"top_k": 20,
"repeat_penalty": 1,
"presence_penalty": 0,
"frequency_penalty": 0,
"systemMessage": "You are a precise coding assistant. Avoid repeating plans. Execute tasks directly. Do not restate intentions multiple times.",
"timeout": 600000,
"retry": { "enabled": true, "max_attempts": 2, "interval_ms": 1500 }
}
]
}
The model correctly responded to tool calling requests, allowing it to act as a Copilot replacement.
Real-World Test: Full Stack Generation
The user fed a detailed prompt (originally from ChatGPT) asking the model to build a "Bike Shop Service Tracker" — a local-first React + TypeScript app using localStorage. Requirements included a data model, seed data, filtering, sorting, and form validation. The model generated the entire website fully functional on the first run.
Next, they prompted it to generate a complete Playwright test suite. Only one test required a manual fix — otherwise the suite ran without errors. The user's conclusion: "I think I am done tweaking and testing models (until the next big release) and can get back to coding now."
Who It's For
Developers running local LLMs for coding assistance, especially those with AMD GPUs (Vulkan) who want a Copilot alternative with comparable quality.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Claudetop: Real-Time Cost Monitoring for Claude Code Sessions
Claudetop is an htop-like tool that shows real-time spending, cache efficiency, and model comparisons for Claude Code sessions. It provides slash commands like /claudetop:stats and smart alerts for cost milestones and efficiency issues.

MCP Server for TypeScript Projects Replaces Claude Code's Grep Pattern with Indexed Symbol Lookups
A developer built an MCP server that replaces Claude Code's grep-and-guess pattern with indexed symbol lookups for TypeScript projects. The tool maintains a live SQLite index of symbols, call sites, imports, and class hierarchy, reducing token usage by 63-79% in tests.

Open-source CLI uses Claude Haiku to automate Xero expense auditing
A developer has released an open-source Python CLI tool that uses Claude Haiku 4.5 to automate Xero expense auditing. The tool follows a 'deterministic code first, then AI to fill in the gaps' approach, keeping costs to a few cents per audit run.

Benchmark Results: Claude Agent Swarm with Memory System Shows 30-43% Token Cost Savings
A developer tested a 6-agent Claude swarm on a 40-point coding task with and without a custom memory system called Stompy. Results show Sonnet 4.6 with memory achieved perfect scores at $3.98 vs $7.04 without, while Haiku 4.5 failed completely without memory but scored 39/40 with it.