Qwen3.5-35B-A3B-UD-Q6_K_XL Tested in Production Development Workflows

A developer on r/LocalLLaMA shared detailed testing results of the Qwen3.5-35B-A3B-UD-Q6_K_XL model in production development scenarios. The user conducted both benchmark testing and practical application across real client projects.
Performance Benchmarks
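The model scored 1504 on pp2048 and 47.71 on tg256. In llama-bench-style notation these are tokens per second for processing a 2048-token prompt and for generating 256 tokens, respectively. Token generation speed was solid when spread across two GPUs, and increased to 80 tokens per second (tps) when running on a single GPU.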
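To put these throughput figures in wall-clock terms, the sketch below converts them to seconds. It assumes the scores are tokens per second and that the 47.71 figure comes from the two-GPU run; the function name and the example workload (a 2048-token prompt and a 256-token reply, mirroring the test labels) are illustrative and not taken from the original post.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Return the time needed to process `tokens` at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return tokens / tokens_per_second


# Figures reported in the post, assumed to be tokens per second.
PP_TPS = 1504.0        # prompt processing (pp2048)
TG_TPS_DUAL = 47.71    # token generation (tg256); assumed to be the two-GPU run
TG_TPS_SINGLE = 80.0   # token generation on a single GPU

prefill = seconds_for(2048, PP_TPS)
gen_dual = seconds_for(256, TG_TPS_DUAL)
gen_single = seconds_for(256, TG_TPS_SINGLE)

print(f"prefill 2048 tokens:      {prefill:.2f} s")
print(f"generate 256 (two GPUs):  {gen_dual:.2f} s")
print(f"generate 256 (one GPU):   {gen_single:.2f} s")
```

Real latency will vary with context length, batch size, and sampling settings, so treat this as a rough estimate only.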
Production Testing Methodology
The developer tested the model on five different projects, using Git worktrees to roll each one back to a known specification and feature set. Claude generated the specifications for these tests; the developer has been on a Max Pro plan for the past year.
- Tested across JavaScript, Go, and Rust projects
- Used Git worktrees for version control during testing
- Most "bugs" required only 5-minute tweaks or could be fixed with a second prompt
- Compared the experience to using Sonnet 4
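The post does not show the exact commands used, so the following is only a sketch of how a rollback with Git worktrees might be scripted. It relies on the standard `git worktree add --detach <path> <commit-ish>` and `git worktree remove --force <path>` subcommands; the helper names, paths, and the tag in the usage comment are hypothetical.

```python
import subprocess


def worktree_add_cmd(path: str, commit: str) -> list[str]:
    """Build the command that checks out `commit` into a new worktree at `path`."""
    # --detach avoids creating a branch; the worktree sits on the given commit.
    return ["git", "worktree", "add", "--detach", path, commit]


def worktree_remove_cmd(path: str) -> list[str]:
    """Build the command that deletes a worktree, discarding local changes."""
    return ["git", "worktree", "remove", "--force", path]


def create_worktree(repo: str, path: str, commit: str) -> None:
    """Create a throwaway worktree of `repo` at a known-good commit."""
    subprocess.run(worktree_add_cmd(path, commit), cwd=repo, check=True)


def discard_worktree(repo: str, path: str) -> None:
    """Remove the worktree once the model's output has been reviewed."""
    subprocess.run(worktree_remove_cmd(path), cwd=repo, check=True)


# Hypothetical usage: one fresh worktree per trial, all from the same commit.
# create_worktree(".", "../project-trial-1", "v1.2.0")
# ... let the model implement the spec inside ../project-trial-1 ...
# discard_worktree(".", "../project-trial-1")
```

Because each worktree is a separate directory sharing one repository, several trials can start from the same commit without disturbing the main checkout.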
Practical Results and Business Implications
The developer reported that Qwen3.5 "nailed them out of the park" for the work they do, particularly noting strong performance on Go and Rust projects. As a result, the developer is seriously considering a switch from API-based models to a hybrid approach: state-of-the-art (SOTA) models via API for specification generation and reviews, and local models for development work.
The testing has raised the question of hardware investment versus subscription costs. The developer has already spent $2,000 on Claude Pro Max since June 2025, and the total could reach $6,800 by 2027 if the subscription continues. The developer is therefore considering the purchase of an RTX 6000 Pro as a business investment.
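The trade-off can be framed as simple break-even arithmetic. The sketch below uses the two subscription figures from the post; the hardware price and monthly rate are hypothetical placeholders (the post states neither), and the calculation deliberately ignores electricity, resale value, and any API spend that would continue under a hybrid setup.

```python
import math


def remaining_subscription_spend(projected_total: float, already_spent: float) -> float:
    """Subscription money not yet spent under the projection."""
    return projected_total - already_spent


def months_to_break_even(hardware_cost: float, monthly_subscription: float) -> int:
    """Whole months of subscription fees needed to equal the hardware price."""
    if hardware_cost < 0 or monthly_subscription <= 0:
        raise ValueError("costs must be non-negative and the monthly fee positive")
    return math.ceil(hardware_cost / monthly_subscription)


# Figures from the post.
ALREADY_SPENT = 2_000.0
PROJECTED_TOTAL = 6_800.0

# HYPOTHETICAL placeholders: substitute real quotes before drawing conclusions.
HYPOTHETICAL_GPU_PRICE = 8_000.0
HYPOTHETICAL_MONTHLY_FEE = 200.0

remaining = remaining_subscription_spend(PROJECTED_TOTAL, ALREADY_SPENT)
months = months_to_break_even(HYPOTHETICAL_GPU_PRICE, HYPOTHETICAL_MONTHLY_FEE)

print(f"remaining projected subscription spend: ${remaining:,.0f}")
print(f"months of fees to match the GPU price:  {months}")
```

With different placeholder values the conclusion can flip, which is why both the price and the monthly fee are kept as explicit parameters.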
The developer had previously used Qwen Coder for tab completion, but found that Qwen3.5 takes local model capabilities to a new level for production use.
📖 Read the full source: r/LocalLLaMA