Qwen 3.5 Tool Calling Fixes for Agentic Use: Server Status and Client-Side Workarounds

Tool Calling Bugs in Qwen 3.5 Agentic Setups
When running Qwen 3.5 models in agentic environments like coding agents or function calling loops, four specific bugs can cause tool calling to fail completely.
The Four Core Bugs
- XML tool calls leak as plain text: Qwen 3.5 emits tool calls in an XML-style format (e.g., <function=bash><parameter=command>ls</parameter></function>). When a server fails to parse this, especially when text precedes the XML or thinking is enabled, the tool call arrives as raw text with finish_reason: stop, so your agent never executes it.
- <think> tags leak into text and poison context: llama.cpp forces thinking=1 internally regardless of enable_thinking: false, causing tags to accumulate across turns and destroy multi-turn sessions.
- Wrong finish_reason: Servers send "stop" when tool calls are present, causing agents to treat it as a final answer.
- Non-standard finish_reason: Some servers return "eos_token", "", or null, causing most frameworks to crash on the unknown value before checking if tool calls exist.
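The last three bugs can be patched on the client after each response. The sketch below is one possible approach, not code from the source: strip_think and normalize_finish_reason are hypothetical helper names, and the set of accepted finish_reason values is assumed from the OpenAI chat-completions schema.

```python
import re

# finish_reason values defined by the OpenAI chat-completions schema (assumed target format).
VALID_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter", "function_call"}

# A closed <think>...</think> block plus any whitespace that follows it.
_THINK_BLOCK = re.compile(r"<think>[\s\S]*?</think>\s*")


def strip_think(text):
    """Remove reasoning tags so they do not accumulate in the conversation history."""
    if not text:
        return ""
    text = _THINK_BLOCK.sub("", text)
    # Orphan closing tag: the opening tag never reached the output, so drop
    # everything up to and including the last </think>.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[-1]
    # Unclosed opening tag: drop from <think> to the end of the text.
    text = re.sub(r"<think>[\s\S]*$", "", text)
    return text.strip()


def normalize_finish_reason(reason, has_tool_calls):
    """Map wrong or non-standard finish_reason values to ones frameworks accept."""
    if has_tool_calls:
        return "tool_calls"  # a present tool call always wins over "stop"
    if reason in VALID_FINISH_REASONS:
        return reason
    return "stop"  # covers "eos_token", "", None, and other unknown values
```

Run strip_think on assistant content before appending it to the message history, and call normalize_finish_reason before the framework inspects the response, so unknown values never reach code that would crash on them.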
Server Status (April 2026)
The source provides a detailed status table for major inference servers:
- LM Studio 0.4.9: Best local option for XML parsing (fixed in v0.4.7), improved think leak handling, usually correct finish_reason.
- vLLM 0.19.0: Works with --tool-call-parser qwen3_coder flag, streaming bugs exist, think leak fixed, usually correct finish_reason.
- Ollama 0.20.2: Improved since fix for unclosed </think> bug, still flaky on XML parsing, sometimes wrong finish_reason.
- llama.cpp b8664: Parser exists but fails with thinking enabled, think leak broken, wrong finish_reason when parser fails.
Recommended Solutions
Use Unsloth GGUFs instead of the stock Qwen 3.5 Jinja templates, which have a known issue where the |items filter fails on tool arguments. The Unsloth builds ship with 21 template fixes.
Add a client-side safety net with three small functions that catch what servers miss. The source provides the first function:
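import re, json, uuid

# 1. Parse Qwen XML tool calls from text content
def parse_qwen_xml_tools(text):
    results = []
    # Non-greedy [\s\S]*? keeps each match inside a single <function> block.
    for m in re.finditer(r'<function=([\w.-]+)>([\s\S]*?)</function>', text):
        args = {}
        for p in re.finditer(r'<parameter=([\w.-]+)>([\s\S]*?)</parameter>', m.group(2)):
            k, v = p.group(1).strip(), p.group(2).strip()
            try:
                v = json.loads(v)  # numbers, booleans, and JSON objects become typed values
            except ValueError:
                pass  # not valid JSON: keep the raw string
            args[k] = v
        # The source excerpt ends at the line above. The rest is an assumed
        # completion that emits OpenAI-style tool-call dicts.
        results.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {"name": m.group(1), "arguments": json.dumps(args)},
        })
    return results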
This function extracts tool calls from text content when servers fail to parse the XML properly, providing a fallback mechanism for agentic workflows.
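To show where such a parser fits in an agent loop, here is a minimal sketch that repairs a single chat-completion choice. It is an illustration under stated assumptions rather than code from the source: apply_xml_fallback is a hypothetical name, the choice is assumed to be a plain dict in the OpenAI response shape, the parser argument is assumed to behave like parse_qwen_xml_tools and return a list of OpenAI-style tool-call dicts, and the optional <tool_call> wrapper tags are also an assumption.

```python
import re

# Matches one leaked call, optionally wrapped in <tool_call> tags (assumed wrapper).
_QWEN_CALL = re.compile(
    r"(?:<tool_call>\s*)?<function=[\w.-]+>[\s\S]*?</function>(?:\s*</tool_call>)?"
)


def apply_xml_fallback(choice, parser):
    """Promote XML tool calls that leaked into content to real tool_calls.

    Mutates and returns the choice dict. `parser` is any callable that takes
    the content string and returns a list of tool-call dicts.
    """
    message = choice.get("message") or {}
    if message.get("tool_calls"):
        return choice  # the server parsed the call itself; nothing to repair

    content = message.get("content") or ""
    calls = parser(content)
    if not calls:
        return choice  # ordinary text answer

    message["tool_calls"] = calls
    # Remove the raw XML so it is not replayed into later turns.
    message["content"] = _QWEN_CALL.sub("", content).strip() or None
    choice["message"] = message
    choice["finish_reason"] = "tool_calls"
    return choice
```

Passing the parser in as an argument keeps the repair step independent of any one XML dialect, so the same wrapper could be reused if a server leaks a different format.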
📖 Read the full source: r/LocalLLaMA