LogClaw: Open-Source AI SRE for Auto-Ticketing from Logs

LogClaw is an open-source AI SRE platform that deploys in your VPC and automatically creates incident tickets from log anomalies. Built by Robel after frustration with vague alerts from tools like Datadog, it focuses on turning log noise into actionable tickets without manual intervention.

How It Works

The system ingests logs via OpenTelemetry and detects anomalies using signal-based composite scoring rather than simple threshold alerting. It extracts 8 failure-type signals: OOM, crashes, resource exhaustion, dependency failures, DB deadlocks, timeouts, connection errors, and auth failures. These are combined with statistical z-score analysis, blast radius, error velocity, and recurrence signals into a composite score.
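As a rough illustration of how signal-based composite scoring differs from a single threshold, the sketch below blends a failure-type signal with a z-score and the blast radius, velocity, and recurrence inputs named above. The weights, caps, and function names are assumptions made for this example; the source does not publish LogClaw's actual formula.

```python
from statistics import mean, pstdev

# Hypothetical per-signal weights; the real values are not given in the source.
SIGNAL_WEIGHTS = {
    "oom": 1.0, "crash": 1.0, "resource_exhaustion": 0.7,
    "dependency_failure": 0.6, "db_deadlock": 0.6, "timeout": 0.4,
    "connection_error": 0.4, "auth_failure": 0.3,
}
CRITICAL = {"oom", "crash"}


def z_score(current, history):
    """Standard score of the current error count against recent history."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # flat history: no statistical deviation to report
    return (current - mean(history)) / sigma


def composite_score(signals, history, current, blast_radius, velocity, recurrences):
    """Return a score in [0, 1]; every weight and cap here is illustrative."""
    if CRITICAL & set(signals):
        return 1.0  # critical failures short-circuit to immediate detection
    signal_part = max((SIGNAL_WEIGHTS.get(s, 0.0) for s in signals), default=0.0)
    z_part = min(max(z_score(current, history), 0.0) / 3.0, 1.0)  # cap at 3 sigma
    blast_part = min(blast_radius / 5.0, 1.0)      # affected services, capped at 5
    velocity_part = min(velocity / 10.0, 1.0)      # errors per second, capped at 10
    recurrence_part = min(recurrences / 3.0, 1.0)  # prior occurrences, capped at 3
    return round(
        0.35 * signal_part + 0.25 * z_part + 0.15 * blast_part
        + 0.15 * velocity_part + 0.10 * recurrence_part,
        3,
    )
```

A caller would compare the returned score against a tuned threshold. The point of the blend is that a moderate signal combined with a sharp statistical spike can outrank a single noisy indicator.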

Critical failures (OOM, panics) trigger immediate detection. Once an anomaly is confirmed, a 5-layer trace correlation engine groups logs by traceId, maps service dependencies, tracks error propagation cascades, and computes blast radius across affected services.
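The grouping and blast-radius steps can be sketched in a few lines. This is a minimal illustration, not the five-layer engine itself: the record fields `ts`, `service`, and `level` are assumed names, and blast radius is simplified here to the number of distinct services that logged an error within a trace.

```python
from collections import defaultdict


def correlate(logs):
    """Group log records by traceId and summarize each trace."""
    by_trace = defaultdict(list)
    for rec in logs:
        by_trace[rec["traceId"]].append(rec)

    summaries = {}
    for trace_id, recs in by_trace.items():
        timeline = sorted(recs, key=lambda r: r["ts"])
        errors = [r for r in timeline if r["level"] == "ERROR"]
        # Services in the order they first reported an error (the cascade).
        cascade = list(dict.fromkeys(r["service"] for r in errors))
        summaries[trace_id] = {
            "timeline": timeline,
            "cascade": cascade,
            "blast_radius": len(cascade),
        }
    return summaries


sample_logs = [
    {"traceId": "t1", "ts": 3, "service": "api", "level": "ERROR"},
    {"traceId": "t1", "ts": 1, "service": "db", "level": "ERROR"},
    {"traceId": "t1", "ts": 2, "service": "orders", "level": "ERROR"},
    {"traceId": "t2", "ts": 1, "service": "api", "level": "INFO"},
]
summary = correlate(sample_logs)
print(summary["t1"]["cascade"])  # ['db', 'orders', 'api']
```

Sorting by timestamp before extracting services means the cascade reads from the first failing service outward, which is the ordering a root cause analysis step would want.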

The Ticketing Agent then pulls the correlated timeline, sends it to an LLM for root cause analysis, and creates a deduplicated ticket on Jira, ServiceNow, PagerDuty, OpsGenie, Slack, or Zammad. The entire loop from log noise to filed ticket takes about 90 seconds.
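The source does not describe how deduplication is done. One common approach, sketched here as an assumption, is to fingerprint each incident from a few normalized fields and file a ticket only when the fingerprint is new. The `create_ticket` callable is a hypothetical stand-in for a Jira, ServiceNow, or similar client.

```python
import hashlib


def ticket_fingerprint(service, failure_type, root_cause):
    """Stable short hash over normalized incident fields (illustrative choice)."""
    normalized = "|".join(
        part.strip().lower() for part in (service, failure_type, root_cause)
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


class TicketDeduplicator:
    """Tracks open incidents so repeats reuse the existing ticket."""

    def __init__(self):
        self._open = {}  # fingerprint -> ticket id

    def file(self, service, failure_type, root_cause, create_ticket):
        """Return (ticket_id, created); create_ticket runs only for new incidents."""
        fp = ticket_fingerprint(service, failure_type, root_cause)
        if fp in self._open:
            return self._open[fp], False
        ticket_id = create_ticket(service, failure_type, root_cause)
        self._open[fp] = ticket_id
        return ticket_id, True
```

In practice the fingerprint fields matter more than the hash: including a free-text LLM summary verbatim would defeat deduplication, so a real system would likely key on structured fields such as service and failure type.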

Architecture

Data flows through the following pipeline: OTel Collector → Kafka (Strimzi, KRaft mode) → Bridge (Python, 4 concurrent threads: ETL, anomaly detection, OpenSearch indexing, trace correlation) → OpenSearch + Ticketing Agent.
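To make the four-thread Bridge concrete, here is a minimal sketch using the standard library's `threading` and `queue` modules. The topology is an assumption: the source names the four threads but not how they are wired, so this example has the ETL thread fan normalized records out to the other three. The consumers only collect records where real stages would score, index, or correlate them.

```python
import queue
import threading

STOP = object()  # sentinel that tells each thread to shut down


def run_bridge(raw_records):
    """Push records through an ETL thread that feeds three consumer threads."""
    raw_q = queue.Queue()
    fanout = {name: queue.Queue() for name in ("anomaly", "indexing", "correlation")}
    results = {name: [] for name in fanout}

    def etl():
        # Normalize each record, then hand a copy to every downstream stage.
        while (item := raw_q.get()) is not STOP:
            normalized = {**item, "message": item["message"].strip()}
            for q in fanout.values():
                q.put(normalized)
        for q in fanout.values():
            q.put(STOP)

    def consumer(name):
        # Placeholder for real work (scoring, indexing, correlating).
        while (item := fanout[name].get()) is not STOP:
            results[name].append(item)

    threads = [threading.Thread(target=etl, name="etl")]
    threads += [
        threading.Thread(target=consumer, args=(name,), name=name) for name in fanout
    ]
    for t in threads:
        t.start()
    for rec in raw_records:
        raw_q.put(rec)
    raw_q.put(STOP)
    for t in threads:
        t.join()
    return results
```

Queues decouple the stages, so a slow indexing call does not block anomaly detection. In the real system the raw records would come from a Kafka consumer rather than an in-memory list.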

The AI layer supports OpenAI, Claude, or, for fully air-gapped deployments, Ollama. Everything deploys with a single Helm chart per tenant, namespace-isolated with no shared data plane.

Current Limitations

Metrics and traces are not supported yet; the platform is logs-only. Metrics support is on the roadmap.
The anomaly detection is signal-based and statistical (composite scoring with z-score), not deep learning. It reportedly catches 99.8% of critical failures but won't yet detect subtle performance drift patterns.
The built-in dashboard is functional but basic. OpenSearch Dashboards handles the heavy lifting.

Deployment and Pricing

The platform is licensed under Apache 2.0. For teams that don't want to self-host, a managed cloud version is available at $0.30/GB ingested. According to the source, LogClaw can deliver 80-90% cost savings versus Splunk or Datadog. It cites an annual observability cost of $38K, versus $1.2M for Splunk, at 500GB/day.
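The quoted figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers above and assumes steady ingestion over 365 days. The source does not say whether the $38K figure refers to the self-hosted or the managed option, so the managed estimate is computed separately.

```python
PRICE_PER_GB = 0.30         # managed cloud price, USD per GB ingested
DAILY_GB = 500              # ingest volume used in the source's comparison
QUOTED_LOGCLAW = 38_000     # annual cost quoted for LogClaw, USD
QUOTED_SPLUNK = 1_200_000   # annual cost quoted for Splunk, USD

# Managed cloud cost at the same volume (assumes 365 days of steady ingestion).
managed_annual = round(PRICE_PER_GB * DAILY_GB * 365, 2)

# Savings implied by each LogClaw figure against the quoted Splunk cost.
quoted_savings = 1 - QUOTED_LOGCLAW / QUOTED_SPLUNK
managed_savings = 1 - managed_annual / QUOTED_SPLUNK

print(f"managed annual cost: ${managed_annual:,.0f}")  # $54,750
print(f"savings from quoted figures: {quoted_savings:.1%}")
print(f"savings at managed pricing: {managed_savings:.1%}")
```

Both results come out above the quoted 80-90% range. The range may reflect other volumes or cost components that the source does not break down.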

For local development, documentation is available at https://docs.logclaw.ai/local-development.

📖 Read the full source: HN AI Agents