Developer Tests Qwen3.5 27B vs Larger Models for Local Coding Tasks

✍️ OpenClawRadar📅 Published: March 28, 2026🔗 Source
Developer Tests Qwen3.5 27B vs Larger Models for Local Coding Tasks
Ad

A developer tested several large language models for local coding tasks, comparing performance and hardware requirements. The testing focused on Qwen3.5 variants and Nemotron models, with comparisons to GPT-5.4 High.

Test Results and Findings

The developer tested these specific models:

  • unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL
  • unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
  • unsloth/Qwen3.5-122B-A10B-GGUF
  • unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL
  • unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL
  • unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS
  • unsloth/gpt-oss-120b-GGUF:F16

Key findings from the testing:

  • Nemotron-3-Super-120B performed "very, very good," on par with GPT-5.4 High
  • Qwen3.5-27B performed well for development tasks
  • GPT-OSS-120B and Qwen3.5-122B performed worse than the other two models
  • Nemotron-3-Super-120B consistently responded in Spanish (the tester's native language) while others responded in English

Performance Metrics

The developer provided specific performance numbers:

  • Nemotron-3-Super-120B: 80 tokens per second (tg/s), ~2000 prompt processing (pp), 100k context on vast.ai with 4x RTX 3090
  • Qwen3.5-27B Q6: 803 pp, 25 tg/s, 256k context on vast.ai
Ad

Hardware Requirements

The developer noted hardware constraints:

  • Qwen3.5-122B would require a new motherboard and 1-2 more RTX 3090 cards, making it too expensive
  • Qwen3.5-27B runs on existing 2x RTX 3090 hardware without additional investment
  • If they had the hardware for Nemotron-3-Super-120B, they would use it instead

Implementation Details

The developer plans to use Qwen3.5-27B-GGUF:UD-Q6_K_XL for real development tasks locally and provided the llama.cpp command used for testing:

./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999

The developer mentioned they'll continue using CODEX for complex tasks but can replace API subscriptions for daily tasks with the local setup.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also