ETH Zurich Study Questions Value of AGENTS.md Files for AI Coding Agents

OpenClawRadar | Published: March 8, 2026

Research Findings on AGENTS.md Files

A new paper from ETH Zurich researchers challenges the widespread industry practice of using AGENTS.md files with AI coding agents. The study, conducted by Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev, provides empirical evidence that these context files often hinder rather than help AI agents.

Methodology and Testing

The team built AGENTbench, a new dataset of 138 real-world Python tasks drawn from niche repositories. The choice of lesser-known repositories is meant to avoid the bias of popular benchmarks such as SWE-bench, which AI models may have memorized. They tested four agents (Claude 3.5 Sonnet, Codex GPT-5.2, GPT-5.1 mini, and Qwen Code) across three scenarios:

  • No context file
  • LLM-generated AGENTS.md file
  • Human-written AGENTS.md file

Performance was measured using three proxy indicators: task success rates (determined by repository unit tests), number of agent steps, and overall inference costs.
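
As a rough illustration of how these three indicators could be aggregated per scenario, here is a minimal Python sketch. The record layout (RunRecord), the field names, the scenario labels, and the sample values are all hypothetical; none of them come from the paper or from AGENTbench.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One agent run on one task (hypothetical layout, not the paper's)."""
    scenario: str    # assumed labels: "none", "llm", "human"
    passed: bool     # did the repository's unit tests pass?
    steps: int       # number of agent steps taken
    cost_usd: float  # inference cost of the run


def summarize(runs):
    """Return {scenario: (success_rate, mean_steps, mean_cost)}."""
    by_scenario = defaultdict(list)
    for run in runs:
        by_scenario[run.scenario].append(run)
    return {
        name: (
            mean(1.0 if r.passed else 0.0 for r in group),
            mean(r.steps for r in group),
            mean(r.cost_usd for r in group),
        )
        for name, group in by_scenario.items()
    }


# Made-up sample data, purely for illustration.
runs = [
    RunRecord("none", True, 20, 0.50),
    RunRecord("none", False, 30, 0.70),
    RunRecord("llm", False, 40, 0.90),
    RunRecord("llm", True, 36, 0.80),
]
summary = summarize(runs)
```

Comparing the tuples for two scenarios gives the kind of deltas reported in the results: change in success rate, change in steps, and change in cost.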

Key Results

LLM-generated context files degraded performance, reducing task success rates by an average of 3% compared to providing no context file. These files consistently increased the number of steps agents took, driving up inference costs by over 20%.

Human-written files showed marginal gains with a 4% average increase in task success rate on AGENTbench, but this came with a parallel increase in steps, raising costs by up to 19%.
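
One way to read this trade-off is as cost per resolved task. The sketch below is only illustrative: the baseline success rate and baseline cost are hypothetical (the article gives neither), and it assumes the 4% gain means percentage points, which the article does not specify.

```python
def cost_per_resolved_task(cost_per_run, success_rate):
    """Expected spend to obtain one resolved task."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate


# Hypothetical baseline: 40% success at $1.00 per run (not from the paper).
baseline = cost_per_resolved_task(1.00, 0.40)

# Human-written file: +4 points of success and up to +19% cost,
# using the figures quoted in the article.
with_file = cost_per_resolved_task(1.00 * 1.19, 0.40 + 0.04)

# Fractional change in cost per resolved task relative to the baseline.
relative_change = with_file / baseline - 1
```

Under these made-up baseline numbers, the cost per resolved task still rises by roughly 8%, because the cost increase outweighs the gain in success rate. A different baseline would shift the result.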

Including architectural overviews or repository structure explanations in AGENTS.md files did not reduce the time models spent locating relevant files for tasks.

Behavior Analysis

Trace analysis revealed that agents generally followed instructions in AGENTS.md files, leading them to run more tests, read more files, execute more grep searches, and perform more code-quality checks. While thorough, this behavior was often unnecessary for resolving specific tasks, forcing reasoning models to "think" harder without yielding better final patches.

Practical Recommendations

The researchers recommend omitting LLM-generated context files entirely and limiting human-written instructions to details an agent cannot infer from the repository, such as highly specific tooling or custom build commands. They note that although 60,000 open-source repositories currently contain context files like AGENTS.md, and many agent frameworks feature built-in commands to auto-generate them, these files yield at best marginal gains in task success.

📖 Read the full source: HN AI Agents
