LM Studio parser bugs break Qwen3.5 tool calling and reasoning

LM Studio parser issues affecting reasoning models
LM Studio's server parser contains multiple bugs that interfere with tool calling and reasoning in models like Qwen3.5 and DeepSeek-R1. These issues can cause models to appear broken when the problem is actually in the parser.
The bugs
1. Parser scans inside <think> blocks for tool call patterns
When reasoning models think about tool calling syntax inside their <think> blocks, LM Studio's parser treats those prose mentions as actual tool call attempts. This creates a recursive trap where the model reasons about tool calls, the parser finds tool-call-shaped tokens in the thinking, the parse fails, the error is fed back to the model, and the cycle repeats.
The model cannot debug a tool calling issue, because describing the problem reproduces it. One model explicitly said "I'm getting caught in a loop where my thoughts about tool calling syntax are being interpreted as actual tool call markers", and that sentence itself triggered the parser.
This was first reported as issue #453 in February 2025 and remains open over a year later.
Workaround: Disable reasoning by adding {%- set enable_thinking = false %} to the model's prompt template. This instantly fixes the issue, allowing 20+ consecutive tool calls to succeed.
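For anyone post-processing raw model output themselves, the behavior the report expects can be sketched as follows: drop everything inside <think> before looking for tool calls. This is a minimal illustration, not LM Studio's code. The function name and regexes are hypothetical, and the <|tool_call_start|> / <|tool_call_end|> delimiters are taken from the lfm2 example later in this article; other models use different markers.

```python
import re

# Hypothetical pattern: a complete <think>...</think> block, or an
# unterminated <think> that runs to the end of the text.
THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|\Z)", re.DOTALL)

# Assumption: tool calls are wrapped in the delimiters quoted in this article.
TOOL_CALL = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def extract_tool_calls(raw_output: str) -> list[str]:
    """Return tool-call payloads found outside of reasoning blocks."""
    # Remove reasoning first, so prose that merely mentions tool-call
    # syntax is never scanned.
    visible = THINK_BLOCK.sub("", raw_output)
    return [payload.strip() for payload in TOOL_CALL.findall(visible)]
```

With this ordering, a tool call that the model only discusses inside <think> is ignored, while one emitted after </think> is still picked up.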
2. Registering a second MCP server breaks tool call parsing for the first
This bug is clean and deterministic. Testing with lfm2-24b-a2b at temperature=0.0 shows:
- Only KG server active: Model correctly calls search_nodes, parser recognizes <|tool_call_start|> tokens, tool executes, results returned. Works perfectly.
- Add webfetch server (don't even call it): Model emits <|tool_call_start|>[web_search(...)]<|tool_call_end|> as raw text in the chat. The special tokens are no longer recognized. The tool is never executed.
The mere registration of a second MCP server — without calling it — changes how the parser handles the first server's tool calls. Same model, same prompt, same target server. Single variable changed.
Workaround: Only register the MCP server you need for each task. This is impractical for agentic workflows.
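A client can at least detect this failure instead of passing the raw text along. The sketch below assumes an OpenAI-style assistant message dict with content and tool_calls keys. The helper name is hypothetical, and the marker strings are the ones quoted in the test above.

```python
# Marker strings quoted in the lfm2 test above; other models use different ones.
LEAK_MARKERS = ("<|tool_call_start|>", "<|tool_call_end|>")


def has_leaked_tool_call(message: dict) -> bool:
    """True when tool-call markers appear as plain text and nothing was parsed."""
    content = message.get("content") or ""
    parsed_calls = message.get("tool_calls") or []
    has_markers = any(marker in content for marker in LEAK_MARKERS)
    return has_markers and not parsed_calls
```

An agent loop could treat a True result as a parser failure and retry or abort, rather than showing the marker text to the user as if it were an answer.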
3. Server-side reasoning_content/content split produces empty responses that report success
This affects everyone using reasoning models via the API, whether using tool calling or not. When sending a simple prompt to Qwen3.5-35b-a3b via /v1/chat/completions asking it to list XML tags used for reasoning, the server returned:
{
"content": "",
"reasoning_content": "[3099 tokens of detailed deliberation]",
"finish_reason": "stop"
}
The model did extensive work — 3099 tokens of reasoning — but got caught in a deliberation loop inside <think> and never produced output in the content field. The server returned finish_reason: "stop" with empty content, reporting success.
This means:
- Every eval harness checking finish_reason == "stop" silently accepts empty responses
- Every agentic framework propagates empty strings downstream
- Every user sees a blank response and concludes the model is broken
- The actual reasoning is trapped in reasoning_content: the model did real work that nobody sees unless they explicitly check that field
This is server-side, not a UI bug, confirmed by inspecting the raw API response and LM Studio server log. The reasoning_content/content split happens before the response reaches any client.
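Until the server behavior changes, a harness can guard against this case on the client side. The sketch below assumes the usual OpenAI-compatible shape, where each entry in choices carries a message dict, with reasoning_content sitting next to content as in the response above. The function name and return labels are hypothetical.

```python
def classify_response(choice: dict) -> str:
    """Label a chat completion choice as 'ok', 'reasoning_only', or 'empty'."""
    message = choice.get("message") or {}
    content = (message.get("content") or "").strip()
    reasoning = (message.get("reasoning_content") or "").strip()

    # A visible answer or a parsed tool call counts as a real result.
    if content or message.get("tool_calls"):
        return "ok"
    # The model did work, but all of it stayed in the reasoning field.
    if reasoning:
        return "reasoning_only"
    return "empty"
```

Checking this label in addition to finish_reason lets an eval harness count reasoning-only responses as failures, or log reasoning_content for inspection, instead of silently accepting an empty string.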
Bug interaction
These aren't independent issues. They interact to create systemic problems with tool calling and reasoning in LM Studio.
📖 Read the full source: r/LocalLLaMA