ETH Zurich Study Questions Value of AGENTS.md Files for AI Coding Agents

Research Findings on AGENTS.md Files
A new paper from ETH Zurich researchers challenges the widespread industry practice of using AGENTS.md files with AI coding agents. The study, conducted by Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev, provides empirical evidence that these context files often hinder rather than help AI agents.
Methodology and Testing
The team built AGENTbench, a new dataset of 138 real-world Python tasks sourced from niche repositories. The choice of niche repositories is meant to avoid the bias of popular benchmarks such as SWE-bench, which AI models may have memorized. They tested four agents (Claude 3.5 Sonnet, Codex GPT-5.2, GPT-5.1 mini, and Qwen Code) across three scenarios:
- No context file
- LLM-generated AGENTS.md file
- Human-written AGENTS.md file
Performance was measured using three proxy indicators: task success rates (determined by repository unit tests), number of agent steps, and overall inference costs.
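To make the three indicators concrete, here is a minimal sketch of how per-scenario results could be aggregated. It is not the researchers' evaluation harness: the `Run` record, its field names, the scenario labels, and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One agent attempt at one task (all field names are hypothetical)."""
    condition: str   # "none", "llm_generated", or "human_written"
    passed: bool     # did the repository's unit tests pass after the patch?
    steps: int       # number of agent steps taken
    cost_usd: float  # inference cost of the attempt


def summarize(runs):
    """Group runs by condition and compute the three proxy indicators."""
    by_condition = {}
    for run in runs:
        by_condition.setdefault(run.condition, []).append(run)
    return {
        condition: {
            "success_rate": mean(1.0 if r.passed else 0.0 for r in group),
            "mean_steps": mean(r.steps for r in group),
            "mean_cost_usd": mean(r.cost_usd for r in group),
        }
        for condition, group in by_condition.items()
    }


# Toy records, invented purely to exercise the function (not study data).
runs = [
    Run("none", True, 10, 0.50),
    Run("none", False, 14, 0.70),
    Run("llm_generated", False, 16, 0.80),
    Run("llm_generated", True, 12, 0.60),
]
summary = summarize(runs)
print(summary["none"])
```

Comparing the per-condition dictionaries side by side is what lets a change in success rate be weighed against the extra steps and cost it required.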
Key Results
LLM-generated context files degraded performance, reducing task success rates by an average of 3% compared to providing no context file. These files consistently increased the number of steps agents took, driving up inference costs by over 20%.
Human-written files showed marginal gains with a 4% average increase in task success rate on AGENTbench, but this came with a parallel increase in steps, raising costs by up to 19%.
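One way to read these two results together is cost per solved task, which combines success rate and spend into a single number. The sketch below is a rough illustration only. The baseline success rate and cost are hypothetical placeholders, the reported success changes are treated as percentage points (an assumption, since the summary does not say whether they are absolute or relative), and the cost increases are taken at the quoted 20% and 19% marks.

```python
def cost_per_solved_task(success_rate, cost_per_attempt):
    """Expected spend per resolved task: attempt cost divided by success rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical baseline, NOT a figure from the study.
BASE_SUCCESS = 0.50
BASE_COST = 1.00  # arbitrary cost unit per attempt

scenarios = {
    "no context file": (BASE_SUCCESS, BASE_COST),
    # Reported: about 3% lower success, over 20% higher cost (20% used as a floor).
    "LLM-generated": (BASE_SUCCESS - 0.03, BASE_COST * 1.20),
    # Reported: about 4% higher success, up to 19% higher cost (upper bound used).
    "human-written": (BASE_SUCCESS + 0.04, BASE_COST * 1.19),
}

for name, (rate, cost) in scenarios.items():
    print(f"{name}: {cost_per_solved_task(rate, cost):.2f} per solved task")
```

With these assumed inputs, both kinds of context file come out more expensive per solved task than the baseline. The size of the gap depends on the baseline chosen, so the numbers illustrate the trade-off rather than restate a finding of the paper.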
Including architectural overviews or repository structure explanations in AGENTS.md files did not reduce the time models spent locating relevant files for tasks.
Behavior Analysis
Trace analysis revealed that agents generally followed instructions in AGENTS.md files, leading them to run more tests, read more files, execute more grep searches, and perform more code-quality checks. While thorough, this behavior was often unnecessary for resolving specific tasks, forcing reasoning models to "think" harder without yielding better final patches.
Practical Recommendations
The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to non-inferable details, such as highly specific tooling or custom build commands. They note that although 60,000 open-source repositories currently contain context files such as AGENTS.md, and many agent frameworks ship built-in commands to auto-generate them, the measured benefit of these files is marginal at best.
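As a rough illustration of that advice, the following sketch flags sections of an AGENTS.md file whose headings suggest content an agent could infer from the repository itself. The keyword list, the function name, and the sample file (including its `make bootstrap` command) are hypothetical and do not come from the paper. The heuristic looks only at Markdown heading titles and does not skip headings inside fenced code.

```python
import re

# Hypothetical keyword list: headings that usually describe things an agent
# could discover by reading the repository itself.
INFERABLE_HINTS = (
    "architecture",
    "overview",
    "repository structure",
    "project structure",
    "directory layout",
)

# Matches ATX-style Markdown headings such as "## Build".
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def flag_inferable_sections(markdown_text):
    """Return heading titles that contain one of the INFERABLE_HINTS."""
    flagged = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match and any(h in match.group(1).lower() for h in INFERABLE_HINTS):
            flagged.append(match.group(1))
    return flagged


# Invented sample file, used only to exercise the function.
sample = """# AGENTS.md
## Architecture overview
The service is split into api/ and core/.
## Build
Run `make bootstrap` before the test suite.
"""
print(flag_inferable_sections(sample))
```

In this sample, the architecture section is flagged as a candidate for removal, while the custom build command, the kind of non-inferable detail the researchers suggest keeping, is left alone.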
📖 Read the full source: HN AI Agents