Gemini 3.1 Pro in Multi-Agent Systems: High Design Quality, 20% Tool-Call Failure Rate

Architecture and Testing Context
The team behind Bobr, an AI presentation generator, tested Gemini 3.1 Pro within a two-level agent system. The architecture consists of:
- Orchestrator Agent: Handles conversation, understands user intent, plans structure, and dispatches work via tool calls.
- Creative Agent (Gemini 3.1 Pro in this test): Receives slide descriptions, generates images, builds templates (1920x1080), and returns results via a submit_slide tool call.
The creative agent's tools include generate_image, search_images, and submit_slide. The submit_slide call is critical: it returns a 'submit' signal, terminates the agent loop, and extracts the slide data. Both agents run through the same loop with streaming, parallel tool execution, and iteration limits.
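The loop described above can be sketched roughly as follows. This is a minimal illustration, not Bobr's actual implementation: the names `ToolCall`, `ModelTurn`, `run_agent`, and `model_step` are hypothetical, tools run sequentially rather than in parallel, streaming is omitted, and it assumes that a text-only turn simply consumes one iteration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class ModelTurn:
    text: str = ""
    tool_calls: list = field(default_factory=list)


def run_agent(
    model_step: Callable[[list], ModelTurn],
    tools: dict,
    max_iterations: int = 8,
) -> Optional[dict]:
    """Run the loop until submit_slide is called or the iteration limit is hit.

    Returns the submit_slide arguments (the slide data), or None if the
    model never called submit_slide.
    """
    history: list = []
    for _ in range(max_iterations):
        turn = model_step(history)  # hypothetical model call
        history.append(turn)
        for call in turn.tool_calls:
            if call.name == "submit_slide":
                # Hard exit path: stop the loop and hand back the slide data.
                return call.args
            # Any other tool is executed and its result fed back to the model.
            result = tools[call.name](**call.args)
            history.append((call.name, result))
        # Assumption: a turn with no tool calls just uses up an iteration.
    return None  # iteration limit reached without a submission
```

In this sketch, a `None` return corresponds to the case where the orchestrator receives no slide data.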
Strengths: Design and Aesthetic Output
When Gemini 3.1 Pro works correctly, it produces superior design output compared to other models tested (Claude Sonnet 4.6 and GPT-5.2). Specific strengths include:
- Aesthetic intuition: Better color theory and visual hierarchy.
- Layout creativity: Experiments with asymmetric compositions, overlapping elements, and modern UI styles like dark-mode/glassmorphism.
- Vibe interpretation: Effectively handles vague prompts like "make it feel premium" or "tech startup vibes."
- Code quality: Generates modern, structural HTML/CSS.
Critical Problems in Production
The team encountered two major reliability issues with Gemini 3.1 Pro in their agentic pipeline:
1. ~20% Tool-Call Failure Rate
In approximately 20% of requests, Gemini 3.1 Pro fails to call the required submit_slide tool. Instead, it exhibits several failure patterns:
- Outputs raw HTML template as plain text, describing what it "would" create rather than triggering the tool.
- Generates images correctly but stops without submitting, hitting iteration limits.
- Calls image generation tools but writes natural language summaries ("Here is your beautiful slide...") instead of the final tool call.
- Enters loops refining design descriptions in text without committing to action.
Since submit_slide is the hard exit path, failures result in no data returned to the orchestrator and failed user generations.
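For logging purposes, the failure patterns listed above could be bucketed with a small heuristic like the one below. This is an illustrative sketch only: the function name `classify_run`, its inputs, the label strings, and the HTML-detection regex are all assumptions, not something the source describes the team using.

```python
import re

# Assumption: a handful of common tags is enough to spot a template dumped as text.
HTML_TAG = re.compile(r"<\s*(html|body|div|section|style)\b", re.IGNORECASE)


def classify_run(submitted: bool, final_text: str, image_calls: int) -> str:
    """Map one finished creative-agent run to a coarse outcome label."""
    if submitted:
        return "ok"
    if HTML_TAG.search(final_text):
        # Raw HTML template emitted as plain text instead of a tool call.
        return "template_as_text"
    if image_calls > 0:
        if final_text.strip():
            # Images were generated, then a prose summary replaced the submit.
            return "summary_instead_of_submit"
        # Images were generated, but the run ended without any submission.
        return "images_without_submit"
    # Only text was produced: design descriptions refined without acting.
    return "text_loop"
```

Every label other than "ok" means the orchestrator received no slide data for that run.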
2. Garbled/Corrupted Output
The model frequently returns corrupted text in responses—random character sequences, broken Unicode, half-encoded strings. This corruption sometimes bleeds into slide content (variable values, template markup), meaning even successful submissions might display gibberish text in presentations.
Comparison with Other Models
- Claude Sonnet 4.6: Near-zero failure rate on submit_slide calls in the same creative agent role, described as "boringly reliable" with no garbled output.
- GPT-5.2: Moderate tool reliability between Gemini and Claude, but doesn't suffer from encoding/gibberish issues.
Attempted Mitigations
The team tried several approaches without significant improvement:
- Adding aggressive explicit instructions in system prompts: "You MUST call submit_slide. Do not output the template as text."
- Injecting few-shot examples showing exact expected tool-call patterns.
- Reducing iteration limits to force faster convergence.
- Stripping down and simplifying tool schemas.
Despite these issues, Gemini 3.1 Pro remains live in their system due to its superior design capabilities when it functions correctly.
📖 Read the full source: r/LocalLLaMA