AGENTS.md Files Hurt AI Agent Performance: ETH Zurich Study

Research Findings on AGENTS.md Files

A new paper from ETH Zurich researchers challenges the widespread industry practice of using AGENTS.md files with AI coding agents. The study, conducted by Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev, provides empirical evidence that these context files often hinder rather than help AI agents.

Methodology and Testing

The team built AGENTbench, a new dataset of 138 real-world Python tasks drawn from niche repositories. The aim was to avoid the bias of popular benchmarks such as SWE-bench, which AI models may have memorized. They tested four agents (Claude 3.5 Sonnet, Codex GPT-5.2, GPT-5.1 mini, and Qwen Code) across three scenarios:

No context file
LLM-generated AGENTS.md file
Human-written AGENTS.md file

Performance was measured using three proxy indicators: task success rate (determined by each repository's unit tests), number of agent steps, and overall inference cost.
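To make the three indicators concrete, here is a minimal sketch of how per-run results could be aggregated by scenario. The record layout (RunResult), the scenario labels, and the sample runs are assumptions made for illustration; they are not taken from the paper or its tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One agent attempt at one task (hypothetical record layout)."""
    agent: str
    scenario: str      # "none", "llm_generated", or "human_written"
    passed: bool       # True if the repository's unit tests pass
    steps: int         # number of agent steps taken
    cost_usd: float    # inference cost for the run


def summarize(results):
    """Group runs by scenario and compute the three indicators."""
    by_scenario = defaultdict(list)
    for r in results:
        by_scenario[r.scenario].append(r)
    return {
        scenario: {
            "success_rate": mean(1.0 if r.passed else 0.0 for r in runs),
            "mean_steps": mean(r.steps for r in runs),
            "mean_cost_usd": mean(r.cost_usd for r in runs),
        }
        for scenario, runs in by_scenario.items()
    }


# Made-up runs for illustration only; these are not values from the study.
example_runs = [
    RunResult("agent-a", "none", True, 10, 0.50),
    RunResult("agent-a", "none", False, 14, 0.70),
    RunResult("agent-a", "llm_generated", False, 18, 0.90),
    RunResult("agent-a", "llm_generated", True, 16, 0.80),
]
summary = summarize(example_runs)
```

Keeping the scenario as a plain field on each run makes it easy to compare the same agent and task across the three conditions, which is the comparison the study relies on.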

Key Results

LLM-generated context files degraded performance, reducing task success rates by an average of 3% compared to providing no context file. These files consistently increased the number of steps agents took, driving up inference costs by over 20%.

Human-written files showed marginal gains with a 4% average increase in task success rate on AGENTbench, but this came with a parallel increase in steps, raising costs by up to 19%.
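One way to weigh a change in success rate against a change in cost is to look at the expected spend per solved task. The sketch below is a worked illustration under stated assumptions: the baseline success rate and baseline cost are hypothetical (the article gives no absolute values), the 3% and 4% figures are read as percentage points, and the cost changes are fixed at 20% and 19% even though the article reports them as "over 20%" and "up to 19%".

```python
def cost_per_solved_task(success_rate, mean_cost_per_task):
    """Expected spend to obtain one passing patch."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return mean_cost_per_task / success_rate


# Hypothetical baseline; the article does not report these absolute values.
BASE_SUCCESS = 0.40
BASE_COST = 1.00

scenarios = {
    "no context file": (BASE_SUCCESS, BASE_COST),
    # Assumption: "3%" and "4%" are read as percentage points.
    "LLM-generated": (BASE_SUCCESS - 0.03, BASE_COST * 1.20),
    "human-written": (BASE_SUCCESS + 0.04, BASE_COST * 1.19),
}

for name, (rate, cost) in scenarios.items():
    print(f"{name}: {cost_per_solved_task(rate, cost):.2f} per solved task")
```

Under these assumed numbers, both kinds of context file raise the cost per solved task, because the cost increase outweighs the small change in success rate. A different baseline would shift the exact values, so the output should be read as an illustration of the trade-off rather than a result from the study.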

Including architectural overviews or repository structure explanations in AGENTS.md files did not reduce the time models spent locating relevant files for tasks.

Behavior Analysis

Trace analysis revealed that agents generally followed instructions in AGENTS.md files, leading them to run more tests, read more files, execute more grep searches, and perform more code-quality checks. While thorough, this behavior was often unnecessary for resolving specific tasks, forcing reasoning models to "think" harder without yielding better final patches.
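The kind of trace comparison described above can be sketched as a simple tally of actions per category. The trace format (one shell command per step), the category mapping, and both sample traces are hypothetical; the paper's actual analysis pipeline is not described in this article.

```python
from collections import Counter

# Hypothetical mapping from command names to behavior categories.
CATEGORIES = {
    "pytest": "run_tests",
    "grep": "search",
    "cat": "read_file",
    "ruff": "quality_check",
}


def tally_actions(trace):
    """Count actions per category; unknown commands go to 'other'."""
    counts = Counter()
    for command in trace:
        parts = command.split()
        tool = parts[0] if parts else ""
        counts[CATEGORIES.get(tool, "other")] += 1
    return counts


# Made-up traces for illustration only.
without_context = [
    "grep -rn parse src",
    "cat src/parser.py",
    "pytest tests/test_parser.py",
]
with_context = [
    "cat AGENTS.md",
    "grep -rn parse src",
    "cat src/parser.py",
    "ruff check src",
    "pytest",
    "pytest tests/test_parser.py",
]

# Counter subtraction keeps only the categories that increased.
extra = tally_actions(with_context) - tally_actions(without_context)
```

Here extra holds the additional actions taken when a context file is present, which is the quantity of interest when asking whether the extra work contributed to a better patch.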

Practical Recommendations

The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to details an agent cannot infer from the repository itself, such as highly specific tooling or custom build commands. They note that 60,000 open-source repositories currently contain context files like AGENTS.md, and that many agent frameworks feature built-in commands to auto-generate them, yet the measured effect of these files on task success is marginal.

📖 Read the full source: HN AI Agents