GLM-5-Turbo: 0.57% Tool Call Error Rate vs 3% for GLM-5

According to user testing shared on r/LocalLLaMA, the z-ai/glm-5-turbo model performs well in tool-calling applications.

Benchmark Results

Testing indicates that the model averages a tool call error rate of 0.57%. This is a significant improvement over the standard GLM-5 model, which shows an error rate of approximately 3%. In other words, GLM-5-Turbo makes roughly one fifth as many tool call errors (3 / 0.57 ≈ 5.3).
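The ratio between the two quoted rates is easy to check by hand. The sketch below is only an illustration of that arithmetic: the variable names and the `expected_failures` helper are made up for this example, and the 3% figure is the approximate value given by the source, so the result is a rough estimate rather than a measured one.

```python
# Error rates quoted in the text, in percent.
# The GLM-5 figure is approximate ("approximately 3%" in the source).
glm5_turbo_error_pct = 0.57
glm5_error_pct = 3.0

# How many times higher the GLM-5 error rate is than the GLM-5-Turbo rate.
ratio = glm5_error_pct / glm5_turbo_error_pct


def expected_failures(error_pct: float, calls: int) -> float:
    """Expected number of failed tool calls (hypothetical helper).

    Assumes every call fails independently at the quoted average rate.
    """
    return error_pct / 100 * calls


print(f"GLM-5 error rate is about {ratio:.1f}x the GLM-5-Turbo rate")
print(f"Per 1,000 calls: GLM-5 ~{expected_failures(glm5_error_pct, 1000):.0f} failures, "
      f"GLM-5-Turbo ~{expected_failures(glm5_turbo_error_pct, 1000):.1f} failures")
```

Put differently, at these rates an agent making 1,000 tool calls would be expected to hit about 30 failures on GLM-5 and about 6 on GLM-5-Turbo.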

When compared to other providers' models:

Anthropic models: 0.38% to 0.93% (0.67% average)
Amazon Bedrock models: 1.48% to 1.76% (1.63% average)
Google Vertex models: 0.99% to 2.62% (1.93% average)
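To see how these averages line up against the 0.57% reported for GLM-5-Turbo, a minimal sketch like the following sorts them from lowest to highest. The dictionary layout and names are assumptions made for this example; only the percentages come from the figures above.

```python
# Average tool call error rates in percent, as quoted in this article.
average_error_pct = {
    "GLM-5-Turbo": 0.57,
    "Anthropic": 0.67,
    "Amazon Bedrock": 1.63,
    "Google Vertex": 1.93,
}

# Sort from lowest to highest average error rate.
ranked = sorted(average_error_pct.items(), key=lambda item: item[1])

for name, pct in ranked:
    print(f"{name:<15} {pct:.2f}%")
```

Ranking by average hides the spread within each provider: the lowest Anthropic figure quoted (0.38%) is below the GLM-5-Turbo average.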

Practical Application

A user tested GLM-5-Turbo with a command-line tool for writing fantasy novels and reported substantial improvements over previous models. With the standard GLM-5, the user found the tool "a bit flaky" on non-English input, and the model would sometimes pick the wrong command for the user's request.

Using GLM-5-Turbo on the Max plan, the user wrote 97,000 words and reported no flakiness, no em-dashes, chapters that connected to one another, and tool calls that were almost always correct. According to the source, the model also works well with OpenClaw.

Usage Considerations

The source suggests GLM-5-Turbo may suit side projects that need coding assistance, but cautions that it does not feel like the right choice for production projects that need more stability. The user also mentioned considering NemoClaw, rather than OpenClaw, for running GLM-5-Turbo on a homelab setup.

Initial usage data on OpenRouter reportedly looks good for the first 100B tokens, though the source did not provide specific metrics.

📖 Read the full source: r/LocalLLaMA
