Practical Lessons from Using AI Agents on a 100k LOC Codebase

✍️ OpenClawRadar | 📅 Published: March 2, 2026

Six Concrete Techniques for Large-Scale AI-Assisted Development

A developer recently documented their experience using AI agents (Claude Code + Cursor) to build a pandas-compatible API layer on top of the chDB analytical engine. The project involved aligning 600+ methods across two systems and cost approximately $20k in tokens. Here are the specific, actionable lessons they shared.

Key Implementation Details

  • Maintain a CLAUDE.md rules file: Since AI has no cross-session memory, they committed a rules file to the repository containing every pattern the AI kept getting wrong, every shortcut they banned, and every architectural decision that was settled. This also served as the team collaboration interface. They caution against letting this file become a "500-line manifesto" that the AI will start ignoring.
  • Watch the reasoning, not just the output: In early stages, reading how the AI thinks is more valuable than what it ships. When its logic drifts from yours, ask: was my thinking wrong, or did I just not communicate it properly?
  • Periodically use a zero-context agent as a critic: They started using a fresh agent (claude.ai/code, not Claude Code CLI) with zero project memory to evaluate their work from a critical, rational outsider's perspective. Two keywords matter: critical (override AI's default accommodating mode) and rational (demand structured reasoning, not vibes).
  • Use the target system as the test oracle: Since their goal was to match an existing API, they found real code in the wild (GitHub/Kaggle notebooks), swapped one import line, and compared outputs instead of inventing test cases.
  • Rules over prompts: They observed how AI takes shortcuts and wrote explicit bans. For example: when tests failed due to row order mismatch, the AI's favorite move was adding .sort_values() to make the test pass. They banned this explicitly. Cases that genuinely can't be matched get marked XFAIL, never silently skipped.
  • Filesystem over conversation history for multi-agent pipelines: They orchestrate multi-agent workflows with Python scripts where the filesystem is the shared context layer. Each agent writes its work to a tracking directory, and the next reads what it needs. Key patterns that worked: role separation, structured decisions (APPROVE/REJECT/ESCALATE as JSON for deterministic control flow), and automatic git rollback on failure.
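The filesystem-as-shared-context pattern from the last bullet can be sketched roughly as follows. This is a minimal illustration, not the developer's actual scripts: the file naming scheme (`<task_id>.review.json`), the JSON fields, the function names, and the choice of `git reset --hard HEAD` as the rollback step are all assumptions made for the example. Only the three decision values come from the source.

```python
import json
import subprocess
from enum import Enum
from pathlib import Path


class Decision(str, Enum):
    # The three decision values named in the source.
    APPROVE = "APPROVE"
    REJECT = "REJECT"
    ESCALATE = "ESCALATE"


def write_review(tracking_dir: Path, task_id: str, decision: Decision, reason: str) -> Path:
    """Reviewer side: persist the verdict as JSON in the tracking directory.

    The file name and payload fields are hypothetical, chosen for illustration.
    """
    tracking_dir.mkdir(parents=True, exist_ok=True)
    path = tracking_dir / f"{task_id}.review.json"
    payload = {"task_id": task_id, "decision": decision.value, "reason": reason}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_review(tracking_dir: Path, task_id: str) -> Decision:
    """Orchestrator side: load the verdict; an unknown value raises ValueError."""
    path = tracking_dir / f"{task_id}.review.json"
    payload = json.loads(path.read_text(encoding="utf-8"))
    return Decision(payload["decision"])


def next_action(decision: Decision) -> str:
    """Map a structured decision to a deterministic next step (names are assumed)."""
    if decision is Decision.APPROVE:
        return "commit"
    if decision is Decision.REJECT:
        return "rollback"
    return "ask_human"


def rollback(repo: Path) -> None:
    """One possible rollback: discard uncommitted changes to tracked files."""
    subprocess.run(["git", "reset", "--hard", "HEAD"], cwd=repo, check=True)
```

Because the orchestrator branches on a parsed enum value and never on free-form model text, a malformed or unexpected verdict fails loudly at `Decision(...)` and cannot silently take the wrong branch.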

The developer notes that AI excels at work at scale (aligning hundreds of functions, generating thousands of tests, catching regressions), but judgment calls such as "Is this a bug or a feature?" and "Is the architecture right?" remain the human's responsibility.

📖 Read the full source: r/ClaudeAI
