How Mendral Cut LLM Costs by Upgrading to Opus: Triager Pattern, SQL Access, and Sub-Agent Architecture

Mendral recently published details on how they upgraded to Opus 4.6 for CI failure analysis while reducing overall LLM costs compared to their previous setup with Sonnet 4.0. The key is an architecture that separates triage from investigation and uses cheap sub-agents for heavy lifting.
Architecture: Cheap triager, expensive planner
Out of ~4,000 CI failures analyzed, 3,187 were duplicates — a known flaky test, infrastructure hiccup, or network blip. Waking up an expensive model for those is wasteful. But deduplication isn't deterministic: the same job can fail for different reasons. Their solution is a triager pattern:
- A Haiku agent handles the narrow job: decide whether a failure is already tracked. It uses exact matching and semantic search (pgvector) against known error messages. Two different strings like "operator does not exist bigint character varying" and "migration type mismatch on installation_id" are the same root cause, and semantic search catches that.
- When in doubt, Haiku escalates to Opus 4.6. A false positive costs a little; a false negative misses a real bug.
- 4 out of 5 failures never reach Opus. A triager match costs ~25x less than a full investigation.
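The triage decision described above can be sketched in a few lines. This is a minimal illustration, not Mendral's implementation: in their setup a Haiku agent makes the call using pgvector, whereas this sketch swaps in a fixed cosine-similarity threshold over in-memory vectors. The names KnownFailure, triage, and match_threshold, and the 0.90 default, are assumptions made up for the example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence


@dataclass
class KnownFailure:
    """A tracked failure (hypothetical shape, not from the source)."""
    issue_id: str
    message: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triage(
    message: str,
    embedding: Sequence[float],
    known: Sequence[KnownFailure],
    match_threshold: float = 0.90,  # assumed value; tune on real data
) -> Optional[str]:
    """Return the issue_id of a tracked failure, or None to escalate."""
    # Step 1: exact string match against known error messages.
    for failure in known:
        if failure.message == message:
            return failure.issue_id
    # Step 2: nearest neighbour by cosine similarity (stand-in for pgvector).
    best = max(known, key=lambda f: cosine(embedding, f.embedding), default=None)
    if best is not None and cosine(embedding, best.embedding) >= match_threshold:
        return best.issue_id
    # Step 3: when in doubt, escalate to the expensive model.
    return None
```

Returning None is the "when in doubt, escalate" path: a higher threshold sends more failures to the expensive model, while a lower one risks folding a new bug into an existing issue.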
Let agents pull context, don't push it
Instead of stuffing 200K+ line logs into prompts, agents get a SQL interface to ClickHouse. There's a raw table (github_logs, one row per log line) and materialized views with pre-aggregated data: failure rates by workflow, job timings, outcome counts. Most investigations start with the views to narrow down, then drill into raw logs. If a query returns too many rows, the system truncates and suggests a more specific view. If logs aren't ingested yet, agents fall back to the GitHub CLI.
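The truncate-and-hint behaviour can be illustrated with a small query wrapper. This sketch uses Python's built-in sqlite3 as a stand-in for ClickHouse so it runs anywhere. Only the table name github_logs comes from the article; the column names, the MAX_ROWS cap, and the hint wording are assumptions.

```python
import sqlite3

MAX_ROWS = 100  # assumed cap; the article does not state the real limit


def make_demo_db(lines_per_run: int = 250) -> sqlite3.Connection:
    """Build an in-memory table with one row per log line (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE github_logs (run_id INTEGER, line_no INTEGER, message TEXT)")
    conn.executemany(
        "INSERT INTO github_logs VALUES (?, ?, ?)",
        [(1, i, f"log line {i}") for i in range(lines_per_run)],
    )
    return conn


def run_query(conn: sqlite3.Connection, sql: str, max_rows: int = MAX_ROWS) -> dict:
    """Run agent-supplied SQL, truncating large results and adding a hint."""
    cursor = conn.execute(sql)
    # Fetch one extra row so we can tell "exactly max_rows" from "more than max_rows".
    rows = cursor.fetchmany(max_rows + 1)
    truncated = len(rows) > max_rows
    result = {"rows": rows[:max_rows], "truncated": truncated}
    if truncated:
        result["hint"] = (
            f"Result truncated to {max_rows} rows. "
            "Try a pre-aggregated view or add a narrower filter."
        )
    return result
```

The hint goes back to the agent as part of the tool result, so an oversized query nudges the next query toward the aggregated views instead of flooding the context window.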
Expensive models plan, cheap models execute
Opus forms a hypothesis and spawns Haiku sub-agents capped at one level deep — no unbounded fan-out. Each sub-agent gets a prompt from Opus: exactly what to search and how. Example from a real case:
Three Storybook CI jobs failed on the same commit, crashing at pnpm install. Opus dispatched a sub-agent to fetch error messages from that step. ClickHouse didn't have the logs yet, so the sub-agent fell back to the GitHub CLI and returned: gyp ERR! not found: make. A native dependency couldn't compile because make wasn't on the runner. Opus then queried ClickHouse for the failure trend over 14 days, found the inflection point, and escalated.
Sub-agent prompts are explicit: "Fetch the CI logs for this run. Return the exact error messages from the pnpm install step, the full error output, especially the last 50-100 lines."
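The one-level depth cap can be enforced structurally rather than by prompt alone. The sketch below is an assumption about how such a cap might look, not Mendral's code: the Agent class, MAX_DEPTH, and DepthLimitError are hypothetical, and the model call is replaced by a plain callable so the example runs without any API.

```python
from typing import Callable

MAX_DEPTH = 1  # planner is depth 0; its sub-agents are depth 1 and cannot spawn


class DepthLimitError(RuntimeError):
    """Raised when an agent at the maximum depth tries to spawn a child."""


class Agent:
    def __init__(self, name: str, model: str, run: Callable[[str], str], depth: int = 0):
        self.name = name
        self.model = model  # label only, e.g. "opus" or "haiku"
        self.run = run      # stub standing in for a real model call
        self.depth = depth

    def spawn(self, name: str, model: str, run: Callable[[str], str]) -> "Agent":
        """Create a child agent one level deeper, refusing beyond MAX_DEPTH."""
        if self.depth >= MAX_DEPTH:
            raise DepthLimitError(f"{self.name} is at depth {self.depth} and cannot spawn")
        return Agent(name, model, run, depth=self.depth + 1)

    def ask(self, prompt: str) -> str:
        """Send an explicit, self-contained prompt to this agent."""
        return self.run(prompt)


# Usage: the planner writes the exact instruction; the sub-agent only executes it.
planner = Agent("planner", "opus", run=lambda prompt: "")
fetcher = planner.spawn("log-fetcher", "haiku", run=lambda prompt: "gyp ERR! not found: make")
finding = fetcher.ask(
    "Fetch the CI logs for this run. Return the exact error messages "
    "from the pnpm install step."
)
```

Because the cap lives in spawn, a sub-agent that tries to delegate further fails immediately instead of fanning out.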
Who this is for
Teams building LLM-powered agents for CI debugging or any task where context size and cost are concerns.
📖 Read the full source: HN LLM Tools