LLMs generate SQL queries to analyze terabytes of CI logs in seconds

Mendral's AI agent traced a flaky test to a dependency bump three weeks prior by writing its own SQL queries, scanning hundreds of millions of log lines across a dozen queries, and following a trail from job metadata to raw log output. The whole investigation took seconds.
Every week, about 1.5 billion CI log lines and 700K jobs flow through their system. All of it lands in ClickHouse, compressed at 35:1. All of it is queryable in milliseconds.
SQL interface for the agent
They expose a SQL interface to the agent, scoped to the organization it's investigating. The agent constructs its own queries based on the question. No predefined query library, no rigid tool API. A constrained tool API like get_failure_rate(workflow, days) would limit the agent to anticipated questions. A SQL interface lets it ask questions never thought of, which matters when debugging novel failures.
The agent queries two main targets:
- Job metadata: a materialized view with one row per CI job execution. The agent uses this 63% of the time for questions like "how often does this fail?", "what's the success rate?", "which jobs are slowest?", "when did this start failing?"
- Raw log lines: one row per log line. The agent uses this 37% of the time for questions like "show me the error output for this job", "when did this log pattern first appear?", "how often does this error message occur across runs?"
Query patterns and scale
They analyzed 8,534 agent sessions and 52,312 queries from their observability pipeline. The agent doesn't stop at one query. It investigates - starts broad, then drills in.
Total rows scanned across all queries to answer one question:
- Typical question: 335K rows across about 3 queries
- P75: 5.2 million rows
- P95: 940 million rows
- Heaviest raw-log sessions scan 4.3 billion rows
The agent averages 4.4 queries per session. A typical investigation begins with job metadata (cheap queries, median 47K rows) against a compact, pre-aggregated materialized view. When it finds something interesting, it drills into raw logs (expensive queries, median 1.1M rows).
Data architecture
For the agent to query this fast, the data needs to be structured for it. Up to 300 million log lines flow through on a busy day. They use ClickHouse.
Every log line carries 48 columns of metadata: the full context of the CI run it belongs to. Commit SHA, author, branch, PR title, workflow name, job name, step name, runner info, timestamps, and more.
In ClickHouse's columnar format, denormalizing 48 columns onto every single log line is essentially free. A column like commit_message has the same value for every log line in a CI run, and a single run can produce thousands of log lines. ClickHouse stores those thousands of identical values in sequence. The compression algorithm sees the repetition and compresses it to almost nothing.
Compression ratios:
commit_message: 301:1display_title: 160:1workflow_path: 79:1step_name: 52:1job_name: 48:1
Without denormalization, every query would require a join. With it, they're all simple filters.
📖 Read the full source: HN LLM Tools
👀 See Also

Testing Claude Sonnet with a Strategy Board Game: Rule Adherence Challenges
A developer tested Claude Sonnet by playing OFMOS® Essential, a patented strategy board game about product portfolio management, using a structured prompt system with rules, board representation, and turn management. The model understood rules and tracked scores but frequently made illegal moves due to lack of constrained move generation.

Exploring Success Stories for the Android Companion App via Tailscale
Discover how Android companion apps leverage Tailscale to enhance security and connectivity, based on discussions from the OpenClaw community.

Using Claude Code to Build a Drupal Site with Custom Twig Templates
A developer used Claude Code to create a Drupal website with custom Twig templates and raw HTML, bypassing traditional Drupal theming. They employed ddev for local development and specific commands to configure content types, views, and taxonomies.

OpenClaw user automates cross-platform content formatting with custom skill
A developer built an OpenClaw skill that automatically formats raw drafts for multiple platforms, eliminating manual markdown adjustments for each site's specific requirements.