LogClaw: Open-Source AI SRE for Auto-Ticketing from Logs

LogClaw is an open-source AI SRE platform that deploys in your VPC and automatically creates incident tickets from log anomalies. Built by Robel after frustration with vague alerts from tools like Datadog, it focuses on turning log noise into actionable tickets without manual intervention.
How It Works
The system ingests logs via OpenTelemetry and detects anomalies using signal-based composite scoring rather than simple threshold alerting. It extracts 8 failure-type signals: OOM, crashes, resource exhaustion, dependency failures, DB deadlocks, timeouts, connection errors, and auth failures. These are combined with statistical z-score analysis, blast radius, error velocity, and recurrence signals into a composite score.
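The scoring idea above can be sketched in a few lines. This is a minimal illustration, not LogClaw's actual implementation: the regex patterns, the weights, the z-score cap of 3, and the function names are all assumptions made for the example, and the blast radius, velocity, and recurrence inputs are assumed to be pre-normalized to the range 0 to 1.

```python
import re
from statistics import mean, pstdev

# Hypothetical patterns for the eight failure types named in the text.
SIGNAL_PATTERNS = {
    "oom": re.compile(r"out of memory|oomkilled", re.I),
    "crash": re.compile(r"panic|segfault|fatal", re.I),
    "resource_exhaustion": re.compile(r"too many open files|no space left", re.I),
    "dependency_failure": re.compile(r"503 service unavailable", re.I),
    "db_deadlock": re.compile(r"deadlock", re.I),
    "timeout": re.compile(r"timed out|timeout", re.I),
    "connection_error": re.compile(r"connection (refused|reset)", re.I),
    "auth_failure": re.compile(r"unauthorized|authentication failed", re.I),
}

# Assumed weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"signal": 0.4, "zscore": 0.3, "blast": 0.1, "velocity": 0.1, "recurrence": 0.1}


def extract_signals(message):
    """Return the set of failure-type names whose pattern matches the message."""
    return {name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(message)}


def z_score(current, history):
    """Standard score of the current error count against a historical window."""
    sigma = pstdev(history)
    return 0.0 if sigma == 0 else (current - mean(history)) / sigma


def composite_score(signals, z, blast_radius, velocity, recurrence):
    """Weighted sum of the components; non-signal inputs are assumed in [0, 1]."""
    signal_part = 1.0 if signals else 0.0
    z_part = min(max(z, 0.0) / 3.0, 1.0)  # clamp: z >= 3 counts as fully anomalous
    return (
        WEIGHTS["signal"] * signal_part
        + WEIGHTS["zscore"] * z_part
        + WEIGHTS["blast"] * blast_radius
        + WEIGHTS["velocity"] * velocity
        + WEIGHTS["recurrence"] * recurrence
    )
```

The point of combining components is that a single noisy input, such as a brief spike in error count with no recognizable failure signal, moves the score less than several inputs agreeing at once.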
Critical failures (OOM, panics) trigger immediate detection. Once an anomaly is confirmed, a 5-layer trace correlation engine groups logs by traceId, maps service dependencies, tracks error propagation cascades, and computes blast radius across affected services.
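As a rough sketch of the correlation step, the snippet below groups log records by traceId, orders each group by time, and derives the set of affected services. Only traceId comes from the description above; the other field names (service, level, ts) and the function names are assumptions for illustration, and the real engine's five layers are not reproduced here.

```python
from collections import defaultdict


def group_by_trace(logs):
    """Bucket records by traceId and sort each bucket by timestamp."""
    groups = defaultdict(list)
    for record in logs:
        groups[record["traceId"]].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r["ts"])
    return dict(groups)


def propagation_order(trace_logs):
    """Services in the order they first logged an error (input must be time-sorted)."""
    order = []
    for record in trace_logs:
        if record["level"] == "ERROR" and record["service"] not in order:
            order.append(record["service"])
    return order


def blast_radius(trace_logs):
    """Set of distinct services that logged at least one error in the trace."""
    return {r["service"] for r in trace_logs if r["level"] == "ERROR"}
```

Under these assumptions, the first entry of propagation_order is a candidate origin of the cascade, and the size of blast_radius is one plausible input to the composite score.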
The Ticketing Agent then pulls the correlated timeline, sends it to an LLM for root cause analysis, and creates a deduplicated ticket on Jira, ServiceNow, PagerDuty, OpsGenie, Slack, or Zammad. The entire loop from log noise to filed ticket takes about 90 seconds.
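The source does not say how deduplication is done. One common approach, shown here purely as an assumption, is to fingerprint each incident by service and failure signals and to update the existing ticket when the fingerprint repeats. The class and function names are hypothetical, and the in-memory dictionary stands in for whatever store a real agent would use.

```python
import hashlib


def fingerprint(service, signals):
    """Stable short hash of the service name and the sorted signal names."""
    key = service + "|" + ",".join(sorted(signals))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


class TicketDeduper:
    """Keeps one open ticket per fingerprint (in-memory stand-in for a real store)."""

    def __init__(self):
        self._open = {}

    def file_or_update(self, service, signals, summary):
        """Return (ticket, created); repeats bump a counter instead of filing anew."""
        fp = fingerprint(service, signals)
        if fp in self._open:
            self._open[fp]["occurrences"] += 1
            return self._open[fp], False
        ticket = {"fingerprint": fp, "service": service, "summary": summary, "occurrences": 1}
        self._open[fp] = ticket
        return ticket, True
```

Sorting the signal names before hashing makes the fingerprint independent of the order in which signals were detected.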
Architecture
LogClaw uses this architecture: OTel Collector → Kafka (Strimzi, KRaft mode) → Bridge (Python, 4 concurrent threads: ETL, anomaly detection, OpenSearch indexing, trace correlation) → OpenSearch + Ticketing Agent.
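To make the Bridge's threading model concrete, here is a small sketch with four threads: one ETL worker that normalizes records and fans them out to three stage workers. The source names the four roles but not how they are wired together, so the fan-out topology, the in-process queues standing in for Kafka, and all function names are assumptions.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker to shut down


def etl_worker(raw_q, out_queues):
    """Normalize each raw record and hand a copy of the reference to every stage."""
    while True:
        item = raw_q.get()
        if item is STOP:
            for q in out_queues:
                q.put(STOP)
            return
        record = {
            "service": item["service"],
            "message": item["message"].strip(),
            "traceId": item.get("traceId"),
        }
        for q in out_queues:
            q.put(record)


def stage_worker(in_q, handle):
    """Apply a handler to each record until the sentinel arrives."""
    while True:
        item = in_q.get()
        if item is STOP:
            return
        handle(item)


def run_bridge(raw_records):
    """Run ETL plus three downstream stages and return what each stage received."""
    raw_q = queue.Queue()
    stage_qs = {name: queue.Queue() for name in ("anomaly", "index", "correlate")}
    results = {name: [] for name in stage_qs}
    threads = [threading.Thread(target=etl_worker, args=(raw_q, list(stage_qs.values())))]
    for name, q in stage_qs.items():
        # Each stage just records its input here; a real stage would do work.
        threads.append(threading.Thread(target=stage_worker, args=(q, results[name].append)))
    for t in threads:
        t.start()
    for rec in raw_records:
        raw_q.put(rec)
    raw_q.put(STOP)
    for t in threads:
        t.join()
    return results
```

Giving each stage its own queue means a slow stage, such as indexing, does not block the others from consuming the same records.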
The AI layer supports OpenAI, Claude, or Ollama for fully air-gapped deployments. Everything deploys with a single Helm chart per tenant, namespace-isolated with no shared data plane.
Current Limitations
- Metrics and traces are not supported yet; LogClaw is logs-only. Metrics support is on the roadmap.
- Anomaly detection is signal-based and statistical (composite scoring with z-scores), not deep learning. It reportedly catches 99.8% of critical failures but does not yet detect subtle performance drift.
- The dashboard is functional but basic. OpenSearch Dashboards handles the heavy lifting.
Deployment and Pricing
The platform is licensed under Apache 2.0. A managed cloud version is available for $0.30/GB ingested if you don't want to self-host. According to the source, LogClaw can provide 80-90% cost savings versus Splunk/Datadog, with an annual observability cost of $38K versus $1.2M for Splunk at 500GB/day.
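The pricing figures can be checked with simple arithmetic. The sketch below computes the managed-cloud cost at the stated $0.30/GB for 500GB/day, assuming a 365-day year and no volume discounts, and the savings fraction implied by the quoted annual figures. The function names are made up for the example. The managed-cloud result differs from the quoted $38K, and the quoted pair implies savings of about 97% rather than 80-90%, so these numbers likely rest on different assumptions that the source does not spell out.

```python
def annual_managed_cost(gb_per_day, price_per_gb=0.30, days=365):
    """Yearly ingest cost in dollars at a flat per-GB price (assumed: no discounts)."""
    return gb_per_day * price_per_gb * days


def savings_fraction(cost, baseline):
    """Fraction of the baseline cost that is saved."""
    return 1 - cost / baseline


managed = annual_managed_cost(500)             # 500 * 0.30 * 365 = 54,750 dollars/year
implied = savings_fraction(38_000, 1_200_000)  # about 0.968, from the quoted figures
```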
For local development, documentation is available at https://docs.logclaw.ai/local-development.
📖 Read the full source: HN AI Agents