Qwen 3.5 Tool Calling Fix: 4 Bugs and Client Workarounds

Tool Calling Bugs in Qwen 3.5 Agentic Setups

When running Qwen 3.5 models in agentic environments like coding agents or function calling loops, four specific bugs can cause tool calling to fail completely.

The Four Core Bugs

XML tool calls leak as plain text: Qwen 3.5 emits tool calls in an XML-style format (e.g., <function=bash><parameter=command>ls</parameter></function>). When the server fails to parse this, especially when text precedes the XML or thinking is enabled, the tool call arrives as raw text with finish_reason: stop, so your agent never executes it.
<think> tags leak into text and poison context: llama.cpp forces thinking=1 internally even when enable_thinking: false is set, so the tags accumulate across turns and break multi-turn sessions.
Wrong finish_reason: Servers send "stop" when tool calls are present, causing agents to treat it as a final answer.
Non-standard finish_reason: Some servers return "eos_token", "", or null, causing most frameworks to crash on the unknown value before checking if tool calls exist.
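
The two finish_reason bugs can be handled on the client before the agent loop inspects the response. The following is a minimal sketch, not the source's code: the function name, the set of accepted values, and the OpenAI-style choice dict it operates on are all assumptions made for illustration.

```python
# Hypothetical helper: the name and the dict shape are assumptions.
# Values most OpenAI-compatible clients accept without complaint.
KNOWN_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter"}


def normalize_finish_reason(choice):
    """Return a copy of an OpenAI-style choice dict with a usable finish_reason."""
    fixed = dict(choice)
    message = fixed.get("message") or {}
    reason = fixed.get("finish_reason")

    if message.get("tool_calls") and reason != "length":
        # Wrong finish_reason: tool calls are present, so this is not a final answer.
        # "length" is left alone because truncated arguments should not be executed.
        fixed["finish_reason"] = "tool_calls"
    elif reason not in KNOWN_FINISH_REASONS:
        # Non-standard finish_reason: map "eos_token", "", None, etc. to "stop".
        fixed["finish_reason"] = "stop"
    return fixed
```

Running this on every choice before handing it to the framework means the agent sees only standard values, and a response carrying tool calls is never mistaken for a final answer.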

Server Status (April 2026)

The source provides a detailed status table for major inference servers:

LM Studio 0.4.9: Best local option for XML parsing (fixed in v0.4.7), with improved think-leak handling and a usually correct finish_reason.
vLLM 0.19.0: Works with the --tool-call-parser qwen3_coder flag, though streaming bugs remain; think leak fixed, finish_reason usually correct.
Ollama 0.20.2: Improved since the fix for the unclosed </think> bug, but XML parsing is still flaky and finish_reason is sometimes wrong.
llama.cpp b8664: A parser exists but fails when thinking is enabled; think-leak handling is broken, and finish_reason is wrong whenever the parser fails.

Recommended Solutions

Use Unsloth GGUFs rather than builds that carry the stock Qwen 3.5 Jinja template, which has a known issue where the |items filter fails on tool arguments. The Unsloth builds ship with 21 template fixes.

Add a client-side safety net with three small functions that catch what servers miss. The source provides the first function:

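import json
import re
import uuid

# 1. Parse Qwen XML tool calls from text content
def parse_qwen_xml_tools(text):
    results = []
    # [\s\S]*? lazily matches any characters, including newlines
    for m in re.finditer(r'<function=([\w.-]+)>([\s\S]*?)</function>', text):
        args = {}
        for p in re.finditer(r'<parameter=([\w.-]+)>([\s\S]*?)</parameter>', m.group(2)):
            k, v = p.group(1).strip(), p.group(2).strip()
            try:
                v = json.loads(v)  # decode numbers, booleans, lists, objects
            except ValueError:
                pass  # not valid JSON: keep the raw string
            args[k] = v
        # The excerpt is cut off here. The OpenAI-style shape below is an
        # assumption inferred from the uuid import, not the source's code.
        results.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {"name": m.group(1), "arguments": json.dumps(args)},
        })
    return results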

This function extracts tool calls from text content when servers fail to parse the XML properly, providing a fallback mechanism for agentic workflows.
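
Only the first of the three functions appears in this excerpt. For the think-tag leak described earlier, a client-side cleanup could look roughly like the sketch below. It is not the source's implementation: the helper name is hypothetical, and the handling of unclosed or orphaned tags is an assumption about how leaks tend to show up in practice.

```python
import re

# Matches a complete <think>...</think> block plus any trailing whitespace.
_THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>\s*")


def strip_think(text):
    """Remove leaked reasoning from assistant text (hypothetical helper)."""
    # Case 1: well-formed blocks anywhere in the text.
    text = _THINK_BLOCK.sub("", text)
    # Case 2: an opener with no closer; drop everything from <think> to the end.
    text = re.sub(r"<think>[\s\S]*$", "", text)
    # Case 3: a closer with no opener; drop everything up to and including it.
    text = re.sub(r"^[\s\S]*?</think>\s*", "", text)
    return text.strip()
```

Applying this to assistant content before appending it to the conversation history keeps reasoning text from accumulating across turns.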

📖 Read the full source: r/LocalLLaMA