Developer Tests Qwen3.5 27B vs Larger Models for Local Coding Tasks

A developer tested several large language models for local coding tasks, comparing performance and hardware requirements. The testing focused on Qwen3.5 variants and Nemotron models, with comparisons to GPT-5.4 High.
Test Results and Findings
The developer tested these specific models:
- unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL
- unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
- unsloth/Qwen3.5-122B-A10B-GGUF
- unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL
- unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL
- unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS
- unsloth/gpt-oss-120b-GGUF:F16
Key findings from the testing:
- Nemotron-3-Super-120B performed "very, very good," on par with GPT-5.4 High
- Qwen3.5-27B performed well for development tasks
- GPT-OSS-120B and Qwen3.5-122B performed worse than the other two models
- Nemotron-3-Super-120B consistently responded in Spanish (the tester's native language) while others responded in English
Performance Metrics
The developer provided specific performance numbers:
- Nemotron-3-Super-120B: 80 tokens per second (tg/s), ~2000 prompt processing (pp), 100k context on vast.ai with 4x RTX 3090
- Qwen3.5-27B Q6: 803 pp, 25 tg/s, 256k context on vast.ai
Hardware Requirements
The developer noted hardware constraints:
- Qwen3.5-122B would require a new motherboard and 1-2 more RTX 3090 cards, making it too expensive
- Qwen3.5-27B runs on existing 2x RTX 3090 hardware without additional investment
- If they had the hardware for Nemotron-3-Super-120B, they would use it instead
Implementation Details
The developer plans to use Qwen3.5-27B-GGUF:UD-Q6_K_XL for real development tasks locally and provided the llama.cpp command used for testing:
./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999
The developer mentioned they'll continue using CODEX for complex tasks but can replace API subscriptions for daily tasks with the local setup.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Distilled Qwen 3.5 27B Model Shows Strong Performance with Cursor AI Coding Agent
A user reports that the opus 4.6 distilled version of Qwen 27B works effectively as the model driving Cursor, with performance comparable to Gemini 3 Flash. Setup took about 10 minutes using Cursor to configure ngrok tunnel and localllama.

Project Headroom: Netflix Engineer's Open Source Tool Slashes AI Token Costs by 90%
Netflix senior engineer Tejas Chopra created Project Headroom, an open source proxy that compresses AI context input by up to 90%, saving an estimated $700,000 across users since January 2026. It runs locally on port 8787 and wraps any LLM CLI.

Open-source structural hallucination checker for AI agent pipelines
A new open-source tool provides four suppressors to catch structural failures in AI agent pipelines, including grounding enforcement, prompt injection detection, JSON validation, and tool response verification. Available as both a REST API and MCP server with a free tier of 500 requests/month.

ClawMetry adds remote monitoring with E2E encryption for OpenClaw agents
ClawMetry v0.1.0 now includes cloud sync for remote monitoring of OpenClaw agents from any browser or Mac menu bar app, with end-to-end encryption that keeps data encrypted until it reaches your client.