Building a 350K-Line Codebase Solo with AI Agents: Lessons

Engineering Environment as Context

A solo developer's 52-day project (600 commits, ~965K lines of code throughput, 356K lines of production code) showed that agent output quality depends heavily on the engineering environment, not just the model. The codebase itself serves as the agent's context system, which removes the need for separate RAG or memory files.

Clear architectural boundaries proved essential. The codebase follows strict domain-driven design (DDD) layering: a domain layer for data structures, a service layer for business logic, and a handler layer for HTTP format conversion. Its 22 domain modules each have clear boundaries. This layering tells agents where each kind of change belongs.
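The layering described above can be sketched in a single Go file. This is a minimal illustration, not the project's actual code: the Loop type, LoopService, CreateLoopHandler, and the /loops route are all hypothetical names, and the in-memory map stands in for whatever storage the real service layer uses.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Domain layer: plain data structures, no business rules or HTTP concerns.
type Loop struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// Service layer: business logic only. It knows nothing about HTTP.
var ErrEmptyName = errors.New("loop name must not be empty")

type LoopService struct {
	loops map[string]Loop // in-memory stand-in for real storage (assumption)
}

func NewLoopService() *LoopService {
	return &LoopService{loops: make(map[string]Loop)}
}

func (s *LoopService) Create(name string) (Loop, error) {
	name = strings.TrimSpace(name)
	if name == "" {
		return Loop{}, ErrEmptyName
	}
	l := Loop{ID: fmt.Sprintf("loop-%d", len(s.loops)+1), Name: name}
	s.loops[l.ID] = l
	return l, nil
}

// Handler layer: converts between HTTP and service calls, nothing more.
func CreateLoopHandler(svc *LoopService) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Name string `json:"name"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		loop, err := svc.Create(req.Name)
		if err != nil {
			http.Error(w, err.Error(), http.StatusUnprocessableEntity)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		json.NewEncoder(w).Encode(loop)
	}
}

func main() {
	// Exercise the handler without opening a network port.
	svc := NewLoopService()
	req := httptest.NewRequest(http.MethodPost, "/loops", strings.NewReader(`{"name":"daily"}`))
	rec := httptest.NewRecorder()
	CreateLoopHandler(svc)(rec, req)
	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
}
```

With this split, a validation rule belongs in Create and a change to the response format belongs in CreateLoopHandler, which is the kind of placement decision the layering is meant to make obvious to an agent.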

The directory structure doubles as documentation because names align across the stack. For a feature like "Loop": backend/internal/domain/loop/ for data structures, backend/internal/service/loop/ for logic, web/src/components/loops/ for frontend. This direct mapping from product concept to code path means agents do not have to explore the entire codebase to find the right files.
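To make the mapping concrete, here is a small sketch that derives the three paths from a feature name. PathsFor and FeaturePaths are hypothetical helpers for illustration. The rule of appending "s" for the frontend directory is an assumption inferred from the single "Loop" example and may not hold for every module.

```go
package main

import "fmt"

// FeaturePaths lists where one product concept lives across the stack.
type FeaturePaths struct {
	Domain   string
	Service  string
	Frontend string
}

// PathsFor derives the paths from a lowercase feature name.
// The naive "+s" plural for the frontend directory is an assumption
// based on the "Loop" example only.
func PathsFor(feature string) FeaturePaths {
	return FeaturePaths{
		Domain:   "backend/internal/domain/" + feature + "/",
		Service:  "backend/internal/service/" + feature + "/",
		Frontend: "web/src/components/" + feature + "s/",
	}
}

func main() {
	p := PathsFor("loop")
	fmt.Println(p.Domain)
	fmt.Println(p.Service)
	fmt.Println(p.Frontend)
}
```

The point is that the lookup is a pure function of the feature name: no search or index is needed to find where a concept lives.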

Tech Debt Amplification

Tech debt spreads exponentially with AI agents. When developers make temporary compromises, such as bypassing the service layer to query the DB directly or hardcoding magic numbers, agents systematically reuse these patterns as legitimate approaches. Human engineers recognize bad code as a landmine to step around; agents treat every existing pattern as a valid precedent.

The practical takeaway: regular refactoring becomes essential, not for aesthetics but to keep the signal the codebase sends to agents clean. When good practices dominate, agents amplify good practices; when shortcuts dominate, agents amplify shortcuts. This is a maintenance cost specific to agent-collaborative development.
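A minimal sketch of the two precedents an agent might find side by side. All names here (store, listLoopsShortcut, LoopService, DefaultPageSize) and the page size of 20 are hypothetical. The map stands in for a database.

```go
package main

import "fmt"

// store stands in for the database: tenant ID -> loop names (hypothetical schema).
type store map[string][]string

// Shortcut: handler-level code reads the store directly and hardcodes a limit.
// An agent that finds this is likely to copy both habits into new handlers.
func listLoopsShortcut(db store, tenant string) []string {
	loops := db[tenant]
	if len(loops) > 20 { // magic number with no stated meaning
		loops = loops[:20]
	}
	return loops
}

// DefaultPageSize names the limit so its meaning is visible at every call site.
const DefaultPageSize = 20

// LoopService is the only component allowed to touch the store.
type LoopService struct{ db store }

func (s *LoopService) List(tenant string) []string {
	loops := s.db[tenant]
	if len(loops) > DefaultPageSize {
		loops = loops[:DefaultPageSize]
	}
	return loops
}

// listLoopsHandler is the precedent worth amplifying: go through the service.
func listLoopsHandler(svc *LoopService, tenant string) []string {
	return svc.List(tenant)
}

func main() {
	db := store{"acme": {"daily", "weekly"}}
	svc := &LoopService{db: db}
	fmt.Println(listLoopsShortcut(db, "acme"))
	fmt.Println(listLoopsHandler(svc, "acme"))
}
```

Both functions return the same result today, which is why the shortcut survives review. Refactoring it away removes the precedent before it is copied.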

Strong Typing as Quality Gate

Go + TypeScript + Proto catch errors at compile time, shifting agent mistakes from runtime to development time. An agent-generated function with a mismatched signature fails the build. TypeScript flags API format mismatches immediately. Code that depends on Proto-generated types won't compile if a message format changes and the backend is not updated to match. In dynamically typed languages, these errors would slip through to runtime.
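As a small illustration of the compile-time gate on the Go side, the sketch below uses distinct named types for identifiers. TenantID, LoopID, and FindLoop are hypothetical names chosen for this example, not taken from the project.

```go
package main

import "fmt"

// Distinct named types: the compiler will not let one stand in for the other.
type TenantID string
type LoopID string

type Loop struct {
	Tenant TenantID
	ID     LoopID
	Name   string
}

// FindLoop returns the loop with the given ID that belongs to the given tenant.
func FindLoop(loops []Loop, tenant TenantID, id LoopID) (Loop, bool) {
	for _, l := range loops {
		if l.Tenant == tenant && l.ID == id {
			return l, true
		}
	}
	return Loop{}, false
}

func main() {
	loops := []Loop{{Tenant: "acme", ID: "loop-1", Name: "daily"}}
	tenant, id := TenantID("acme"), LoopID("loop-1")

	l, ok := FindLoop(loops, tenant, id)
	fmt.Println(l.Name, ok)

	// Swapping the typed arguments is rejected at build time:
	//   FindLoop(loops, id, tenant)
	// The compiler reports that a LoopID value cannot be used as a TenantID.
}
```

If an agent generates a call with the arguments in the wrong order, the build fails before any test runs. With two plain string parameters, the same mistake would compile and surface only at runtime.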

Four-Layer Feedback System

Agents need four layers of feedback for efficient iteration:

Compilation — hot-reload, Go restarts within 1 second, TypeScript type errors flagged in real time. Eliminates syntax and type errors.
Unit tests — 700+ tests covering domain and service layers. Agents know within 5 minutes if they introduced regressions, especially for boundary conditions like multi-tenant isolation.
E2E tests — end-to-end validation of real functional paths. Catches integration issues unit tests can't reach.
CI pipeline — every PR runs full test suite, linting, type-checking, multi-platform build. The final safety net before merge.

Each successive layer has higher latency but broader coverage: layer one confirms single-line changes, while layer four validates cross-module refactoring.
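The multi-tenant isolation check mentioned under unit tests might look roughly like the sketch below. LoopService, Put, Get, and checkTenantIsolation are hypothetical names. In a real Go project, the check would normally live in a _test.go file and use the testing package; it is written here as a plain function so the file runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("loop not found")

// LoopService stores loops per tenant: tenant -> loop ID -> name.
type LoopService struct {
	byTenant map[string]map[string]string
}

func NewLoopService() *LoopService {
	return &LoopService{byTenant: map[string]map[string]string{}}
}

func (s *LoopService) Put(tenant, id, name string) {
	if s.byTenant[tenant] == nil {
		s.byTenant[tenant] = map[string]string{}
	}
	s.byTenant[tenant][id] = name
}

// Get only looks inside the caller's tenant, so other tenants' loops are invisible.
func (s *LoopService) Get(tenant, id string) (string, error) {
	name, ok := s.byTenant[tenant][id]
	if !ok {
		return "", ErrNotFound
	}
	return name, nil
}

// checkTenantIsolation is the body a unit test would run.
func checkTenantIsolation() error {
	svc := NewLoopService()
	svc.Put("tenant-a", "loop-1", "daily")

	if _, err := svc.Get("tenant-a", "loop-1"); err != nil {
		return fmt.Errorf("owner cannot read its own loop: %w", err)
	}
	if _, err := svc.Get("tenant-b", "loop-1"); !errors.Is(err, ErrNotFound) {
		return fmt.Errorf("tenant-b could read tenant-a's loop (err=%v)", err)
	}
	return nil
}

func main() {
	if err := checkTenantIsolation(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

A check like this is cheap to run on every change, which is what lets the unit-test layer catch a boundary regression minutes after an agent introduces it.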

📖 Read the full source: r/ClaudeAI