Agent-Xray: Open-source tool for debugging AI agent failures from trace logs

Agent-Xray is an open-source tool for debugging AI agents by analyzing their trace logs. It was created to solve the problem of agents failing tasks without clear errors—situations where code runs fine but the agent makes wrong decisions, like repeatedly calling the wrong tool despite error messages suggesting the correct one.
Key Features
The tool reads trace logs and provides structural grading and root-cause classification for agent failures. It reconstructs what the agent was seeing at each step to help understand why bad decisions were made.
Failure Categories
- spin
- tool_bug
- early_abort
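The source does not define these categories, so the following is only an illustrative sketch of how a "spin" pattern (the same tool call repeated back to back, as in the example in the introduction) might be detected in a trace. The trace layout (a list of dicts with `tool` and `args` keys), the function names, and the threshold of 3 are all assumptions made for illustration. This is not Agent-Xray's actual classifier or API.

```python
import json


def longest_repeat_run(steps):
    """Length of the longest run of consecutive identical tool calls.

    `steps` is a hypothetical trace format: a list of dicts with a
    'tool' name and an optional 'args' dict.
    """
    longest = 0
    current = 0
    previous = None
    for step in steps:
        # Serialize args with sorted keys so equal calls compare equal.
        key = (step["tool"], json.dumps(step.get("args", {}), sort_keys=True))
        current = current + 1 if key == previous else 1
        longest = max(longest, current)
        previous = key
    return longest


def looks_like_spin(steps, threshold=3):
    """Flag a trace when one identical call repeats `threshold` times in a row.

    The threshold is an arbitrary illustrative default.
    """
    return longest_repeat_run(steps) >= threshold
```

A real classifier would probably also look at the error messages returned between calls. This sketch only counts repetition.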
Enforcement Mode
According to the creator, the most significant feature is enforcement mode. After you fix an agent bug, this mode runs adversarial challenges against the fix to verify that it is legitimate. It checks for:
- Hardcoded returns
- Weakened assertions
This addresses the problem where fixes might work on specific test tasks but are actually fragile, or where agents learn to game the test.
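To make the "hardcoded returns" check concrete, here is a minimal sketch of one way such a check could work: parse a Python source string and flag functions whose body is nothing but `return <literal>`. It uses only the standard-library `ast` module. The function name `hardcoded_return_functions` is hypothetical, and the source does not describe how Agent-Xray implements its checks, so treat this as an illustration of the idea rather than the tool's method.

```python
import ast


def hardcoded_return_functions(source):
    """Return names of functions whose only statement is `return <literal>`.

    A leading docstring is ignored. This is a deliberately narrow
    heuristic, not a complete detector of gamed fixes.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        # Skip a docstring if it is the first statement.
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            body = body[1:]
        if (
            len(body) == 1
            and isinstance(body[0], ast.Return)
            and isinstance(body[0].value, ast.Constant)
        ):
            flagged.append(node.name)
    return flagged
```

For example, calling it on `"def f(x):\n    return 42\n"` flags `f`, while a function that returns `x + 1` is not flagged. A constant return is sometimes legitimate, so a check like this can only raise a candidate for review.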
Workflow Integration
The tool exposes its functionality as MCP (Model Context Protocol) tools, so Claude Code can call it directly. A typical workflow described in the source:
- You tell Claude Code to triage agent traces
- Claude Code finds the worst failure
- It replays what the agent saw
- It suggests a fix
- Enforcement mode verifies the fix is legitimate
The creator describes this as "agents debugging agents."
Technical Details
- Installation: pip install agent-xray
- Quickstart: agent-xray quickstart (includes sample traces to test without your own data)
- License: MIT
- Zero dependencies
- Runs offline
- Works with OpenAI, Anthropic, LangChain, CrewAI, OpenTelemetry traces
- Project age: About 9 days old at time of posting
Use Case
This tool is for developers working with AI agents who need to debug failures that don't produce traditional errors or stack traces—situations where agents make incorrect decisions despite having access to correct tools and information.
📖 Read the full source: r/ClaudeAI