OmniCoder-9B fine-tune shows strong performance for agentic coding on 8GB VRAM systems

Performance results from testing OmniCoder-9B with OpenCode
A user on r/LocalLLaMA reported testing OmniCoder-9B, a fine-tune of Qwen3.5-9B trained on Opus traces, and found it performed well for agentic coding tasks on systems with limited VRAM. The model is available on Hugging Face at Tesslate/OmniCoder-9B.
Technical setup and configuration
The user ran the Q4_K_M GGUF quantization using ik_llama with the following command:
ik_llama.cpp\build\bin\Release\llama-server.exe -m models/Tesslate/OmniCoder-9B-GGUF/omnicoder-9b-q4_k_m.gguf -ngl 999 -fa 1 -b 2048 -ub 512 -t 8 -c 100000 -ctk f16 -ctv q4_0 --temp 0.4 --top-p 0.95 --top-k 20 --presence-penalty 0.0 --jinja --ctx-checkpoints 0
They achieved approximately 40 tokens per second with this configuration. The user noted that Q5_KS quantization with 64,000 context length provides similar speeds.
OpenCode configuration
The OpenCode configuration used for testing:
"local": { "models": { "/models/Tesslate/OmniCoder-9B-GGUF/omnicoder-9b-q4_k_m.gguf": { "interleaved": { "field": "reasoning_content" }, "limit": { "context": 100000, "output": 32000 }, "name": "omnicoder-9b-q4_k_m", "reasoning": true, "temperature": true, "tool_call": true } }, "npm": "@ai-sdk/openai-compatible", "options": { "baseURL": "http://localhost:8080/v1" } }The user mentioned a potential bug causing full prompt reprocessing that they're investigating.
Context and comparison
The testing was motivated by concerns about quota restrictions and pricing changes in commercial AI coding tools. The user specifically mentioned having 8GB VRAM, which typically limits the ability to run capable open-source models at good speeds for agentic coding. They noted that while MOE models might offer better performance, their speeds are significantly slower.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Agint: A Rust CLI tool that detects contradictions in AI agent instruction files
Agint is a free, open-source Rust CLI tool that scans instruction files like CLAUDE.md and AGENTS.md for contradictions, missing file references, and sync issues. It uses static analysis for structural problems and optionally calls Claude API for semantic contradiction detection.

Claude-File-Recovery: CLI tool extracts files from Claude Code session history
claude-file-recovery is a Python CLI tool and TUI that parses JSONL session transcripts from ~/.claude/projects/ to recover files created, modified, or read by Claude Code, including point-in-time recovery of earlier file versions.

RAG Learning Academy Built Inside Claude Code with 20 Specialist Agents
A developer created an interactive RAG learning academy inside Claude Code featuring 20 specialist agents, 17 slash commands, and a 9-module curriculum that assesses knowledge level and uses open-source tools by default.

PeaDB: Redis-Compatible Database Coded with AI Assistants in C++20
A developer created PeaDB, a Redis 7.2.5 drop-in replacement written in C++20 using Codex, Copilot, and Claude, implementing ~147 commands with persistence, replication, and cluster support. Benchmarks show performance close to Redis.