AI Agents Prefer Structured Queries Over Natural Language in Cala MCP Server Test

The team at Cala recently shipped an MCP server that provides three distinct ways for AI agents to access their knowledge graph: natural language queries, a structured query language, and direct entity/relationship traversal.
Unexpected Agent Behavior
Despite expectations that agents would default to natural language interfaces (the typical strength of LLMs), most agents abandoned natural language queries within minutes. Without any prompting or nudging, they autonomously switched to using structured queries and graph traversal methods.
Why This Makes Sense
The source explains this behavior by noting that LLMs are not explicitly trained to be "efficient"; RLHF instead rewards them for being correct. Efficiency then emerges as a side effect: agents learn to take the shortest reliable path to a solution. A natural language interface adds an interpretation layer that introduces uncertainty, while a structured query returns deterministic results.
When presented with three access methods, agents consistently chose the option that minimized uncertainty rather than the most "natural" interface.
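To make the contrast concrete, here is a minimal Python sketch of the three access styles over a toy in-memory graph. Nothing here reflects Cala's actual MCP tools or schema: the entities, the relation names, and the functions `traverse`, `structured_query`, and `natural_language_query` are all hypothetical, and the natural language path is a crude keyword matcher standing in for whatever interpretation layer a real server would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str


# Hypothetical toy graph; not taken from Cala's data.
EDGES = [
    Edge("Acme", "acquired", "Widgets Inc"),
    Edge("Acme", "headquartered_in", "Berlin"),
    Edge("Widgets Inc", "headquartered_in", "Oslo"),
]


def traverse(entity, relation):
    """Direct traversal: follow one named relation from one entity."""
    return [e.target for e in EDGES
            if e.source == entity and e.relation == relation]


def structured_query(source=None, relation=None, target=None):
    """Structured query: exact-match filter on any combination of fields."""
    return [e for e in EDGES
            if (source is None or e.source == source)
            and (relation is None or e.relation == relation)
            and (target is None or e.target == target)]


def natural_language_query(question):
    """Stand-in for a natural language interface.

    It has to guess the relation and the entity from free text, so the
    same intent can succeed or fail depending on the wording.
    """
    q = question.lower()
    keyword_to_relation = {
        "acquire": "acquired",
        "bought": "acquired",
        "based": "headquartered_in",
        "headquarter": "headquartered_in",
    }
    relation = next(
        (rel for kw, rel in keyword_to_relation.items() if kw in q), None)
    entity = next(
        (e.source for e in EDGES if e.source.lower() in q), None)
    if relation is None or entity is None:
        return []  # interpretation failed; the caller cannot tell why
    return traverse(entity, relation)
```

In this sketch, `traverse("Acme", "acquired")` always resolves the same way, whereas `natural_language_query("Who did Acme purchase?")` comes back empty simply because "purchase" is not a keyword the interpreter knows. That is the kind of uncertainty the source suggests agents learn to route around.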
Key Questions Raised
- Are we over-indexing on natural language interfaces for agent tooling?
- Should MCP servers prioritize structured/graph-based access patterns over natural language by default?
- If agents prefer deterministic paths, how should this influence tool design?
The Reddit discussion seeks input from others building agent tooling to see if they've observed similar patterns.
📖 Read the full source: r/LocalLLaMA