Omnicoder-9B Performance Review: Speed vs. Tool Calling Issues

Technical Overview
Omnicoder-9B is a coding-specific model developed by Tesslate. It is fine-tuned from Qwen3.5 9B on outputs from several other models, including Opus 4.6, GPT 5.4, GPT 5.3 Codex, and Gemini 3.1 Pro.
Performance Characteristics
The model runs well on mid-tier hardware. With 12GB of VRAM, users report steady generation at about 15 tokens/second even with the context size set to 100k, and prompt processing at approximately 265 tokens/second. The model ran without crashing the system or degrading its performance.
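To put the reported throughput in perspective, the arithmetic can be sketched as below. This is a rough estimate, not a benchmark: it assumes both rates stay constant at any context length and that nothing is served from a prompt cache, neither of which the source confirms. The function name and the 500-token reply length are hypothetical, chosen only for illustration.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens,
                             prompt_tps=265.0, gen_tps=15.0):
    """Return (prefill_seconds, generation_seconds) for a single request.

    Defaults are the rates quoted in the review; treating them as constant
    across context sizes is an assumption.
    """
    if prompt_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput must be positive")
    prefill = prompt_tokens / prompt_tps      # time to read the prompt
    generation = output_tokens / gen_tps      # time to write the reply
    return prefill, generation


# A completely full 100k context followed by a hypothetical 500-token reply.
prefill, generation = estimate_latency_seconds(100_000, 500)
print(f"prefill: {prefill:.0f} s, generation: {generation:.0f} s")
```

Under these assumptions, filling the whole 100k context from scratch would take a little over six minutes of prompt processing, so the 15 tokens/second figure describes reply speed, not end-to-end latency on very long prompts.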
Limitations and Issues
Despite the speed advantages, Omnicoder-9B shows several limitations in practical coding scenarios:
- Failed to generate a complete Super Mario clone as a standalone HTML file from a one-shot prompt
- Failed tool calls against MCP servers, producing MCP errors while fetching data
- Had trouble executing write tool calls from Claude Code, though this may be a compatibility issue rather than a fault in the model
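The source does not say why the tool calls failed. One way to start separating model faults from integration faults is to check whether each emitted call is well-formed before it reaches the tool. The sketch below assumes calls arrive as a tool name plus a JSON string of arguments (the shape used by OpenAI-style chat APIs); the `check_tool_call` helper, the `write_file` tool, and its required fields are hypothetical and are not taken from the review.

```python
import json


def check_tool_call(name, arguments_json, tools):
    """Return a list of problems with one tool call; an empty list means it looks well-formed.

    `tools` maps a tool name to the set of argument names it requires
    (a simplified, hypothetical stand-in for a real tool schema).
    """
    if name not in tools:
        return [f"unknown tool: {name!r}"]
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    missing = sorted(tools[name] - set(args))
    if missing:
        return [f"missing required arguments: {missing}"]
    return []


# Hypothetical tool definition used only for this example.
TOOLS = {"write_file": {"path", "content"}}

ok = check_tool_call("write_file", '{"path": "a.py", "content": "x = 1"}', TOOLS)
truncated = check_tool_call("write_file", '{"path": "a.py"', TOOLS)
incomplete = check_tool_call("write_file", '{"path": "a.py"}', TOOLS)
print(ok, truncated, incomplete, sep="\n")
```

If calls pass a check like this and still fail, the problem more likely sits in the client or server integration than in the model's output.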
IDE Integration Testing
Testing in development environments revealed mixed results:
- In LM Studio with Roo Code: the connection dropped once requests grew to about 4k tokens, though this appears to be an integration issue rather than a problem with the model
- Small scripts were written or updated successfully at request sizes of 2-3k tokens
- API requests above 4k tokens failed without returning any error message
- In Claude Code: token generation felt slower than in Roo Code, and the model failed to execute write tool calls after generating output
The user notes that, among Continue and the other extensions tested, Roo Code has been the most effective for local LLMs.
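Because the failures above 4k tokens came with no error message, the exact cutoff has to be found by probing. A minimal sketch of that search follows, assuming failures are monotone (once a size fails, every larger size fails too). The `send` callable is a placeholder: in practice it would issue one request of the given size to the local server and return False on a timeout, dropped connection, or empty reply. The source does not describe that setup, so a fake is used here.

```python
def largest_working_size(send, low, high):
    """Binary-search the largest n in [low, high] for which send(n) is True.

    Assumes monotone behaviour: sizes up to some threshold succeed and all
    larger sizes fail. Returns None if even `low` fails.
    """
    if not send(low):
        return None
    if send(high):
        return high
    # Invariant from here on: send(low) is True and send(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if send(mid):
            low = mid
        else:
            high = mid
    return low


def fake_send(n_tokens):
    """Stand-in for a real request: pretends anything above 4096 tokens fails."""
    return n_tokens <= 4096


limit = largest_working_size(fake_send, 256, 16384)
print(limit)  # prints 4096 for this fake
```

Running the same probe through Roo Code's endpoint and directly against the server would show whether the limit belongs to the integration or to the model's configuration.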
📖 Read the full source: r/LocalLLaMA