OmniCoder-9B fine-tune shows strong performance for agentic coding on 8GB VRAM systems

✍️ OpenClawRadar📅 Published: March 13, 2026🔗 Source
OmniCoder-9B fine-tune shows strong performance for agentic coding on 8GB VRAM systems
Ad

Performance results from testing OmniCoder-9B with OpenCode

A user on r/LocalLLaMA reported testing OmniCoder-9B, a fine-tune of Qwen3.5-9B trained on Opus traces, and found it performed well for agentic coding tasks on systems with limited VRAM. The model is available on Hugging Face at Tesslate/OmniCoder-9B.

Technical setup and configuration

The user ran the Q4_K_M GGUF quantization using ik_llama with the following command:

ik_llama.cpp\build\bin\Release\llama-server.exe -m models/Tesslate/OmniCoder-9B-GGUF/omnicoder-9b-q4_k_m.gguf -ngl 999 -fa 1 -b 2048 -ub 512 -t 8 -c 100000 -ctk f16 -ctv q4_0 --temp 0.4 --top-p 0.95 --top-k 20 --presence-penalty 0.0 --jinja --ctx-checkpoints 0

They achieved approximately 40 tokens per second with this configuration. The user noted that Q5_KS quantization with 64,000 context length provides similar speeds.

Ad

OpenCode configuration

The OpenCode configuration used for testing:

"local": { "models": { "/models/Tesslate/OmniCoder-9B-GGUF/omnicoder-9b-q4_k_m.gguf": { "interleaved": { "field": "reasoning_content" }, "limit": { "context": 100000, "output": 32000 }, "name": "omnicoder-9b-q4_k_m", "reasoning": true, "temperature": true, "tool_call": true } }, "npm": "@ai-sdk/openai-compatible", "options": { "baseURL": "http://localhost:8080/v1" } }

The user mentioned a potential bug causing full prompt reprocessing that they're investigating.

Context and comparison

The testing was motivated by concerns about quota restrictions and pricing changes in commercial AI coding tools. The user specifically mentioned having 8GB VRAM, which typically limits the ability to run capable open-source models at good speeds for agentic coding. They noted that while MOE models might offer better performance, their speeds are significantly slower.

📖 Read the full source: r/LocalLLaMA

Ad

👀 See Also