Components of a Coding Agent: How Tools, Memory, and Context Extend LLMs

Sebastian Raschka outlines the architecture of coding agents, which are systems that wrap LLMs in application layers to improve performance on coding tasks. He distinguishes between LLMs, reasoning models, and agents, explaining that much of the practical progress in LLM systems comes from the surrounding system components rather than just better models.
Key Components of Coding Agents
The article identifies six main building blocks that make coding agents effective:
- Repo context: Navigation and management of code repository information
- Tool design: Integration of external tools and functions
- Prompt-cache stability: Keeping the prompt prefix unchanged across turns so cached computation can be reused
- Memory: Retaining state across steps and sessions
- Long-session continuity: Keeping context usable over extended interactions
- Model choice: Selection of appropriate LLM or reasoning model
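As a rough illustration of the prompt-cache and memory items above, the sketch below keeps the unchanging parts of a prompt at the front and appends the growing history after them, so every turn shares an identical prefix. The `PromptBuilder` class and its fields are hypothetical names invented for this example; they are not taken from Raschka's article or from any particular harness.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBuilder:
    """Hypothetical helper: puts the unchanging part of the prompt first."""

    system_rules: str
    tool_descriptions: list[str]
    history: list[str] = field(default_factory=list)

    def stable_prefix(self) -> str:
        # Built only from fields that stay fixed during a session.
        # Sorting keeps the tool order deterministic between calls.
        tools = "\n".join(sorted(self.tool_descriptions))
        return f"{self.system_rules}\n\nTools:\n{tools}\n"

    def build(self, user_request: str) -> str:
        # Changing content (history, new request) goes after the prefix.
        turns = "\n".join(self.history)
        return (
            f"{self.stable_prefix()}\nHistory:\n{turns}\n\n"
            f"Request:\n{user_request}\n"
        )

    def record(self, entry: str) -> None:
        # Session memory here is just an append-only list of entries.
        self.history.append(entry)


builder = PromptBuilder(
    system_rules="You are a coding assistant.",
    tool_descriptions=["run_tests: run the test suite", "read_file: read a file"],
)
first = builder.build("Fix the failing test.")
builder.record("user: Fix the failing test.")
builder.record("tool: 1 test failed in test_math.py")
second = builder.build("Now rerun the tests.")

# Both prompts start with the same text, which is the property a
# provider-side prefix cache would need in order to be reused.
prefix = builder.stable_prefix()
shared = first.startswith(prefix) and second.startswith(prefix)
print(shared)
```

The design choice shown is ordering only: anything that changes per turn is placed after everything that does not, so the shared prefix is as long as possible.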
Architecture Layers
Raschka defines several key concepts in the agent ecosystem:
- LLM: The core next-token model
- Reasoning model: An LLM trained or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers
- Agent: A control loop around the model that decides what to inspect next, which tools to call, how to update its state, and when to stop
- Agent harness: The software scaffold around an agent that manages context, tool use, prompts, state, and control flow
- Coding harness: A special case of agent harness specifically for software engineering that manages code context, tools, execution, and iterative feedback
He notes that Claude Code and Codex CLI can be considered coding harnesses. The relationship is described as: the LLM is the engine, a reasoning model is a beefed-up engine, and an agent harness helps us use the model effectively.
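The agent definition above (decide what to inspect, call tools, update state, decide when to stop) can be sketched as a small loop. This is a minimal sketch under stated assumptions: `fake_model` is a scripted stand-in for a real LLM call, and the tool names, `STATE_KEYS`, and `run_agent` are hypothetical names chosen for illustration.

```python
def fake_model(state: dict) -> dict:
    """Scripted stand-in for an LLM call (assumption): picks the next action."""
    if "listing" not in state:
        return {"action": "tool", "name": "list_files", "arg": "."}
    if "test_result" not in state:
        return {"action": "tool", "name": "run_tests", "arg": ""}
    return {"action": "stop", "answer": f"Tests: {state['test_result']}"}


# Hypothetical tools that return canned results instead of touching disk.
TOOLS = {
    "list_files": lambda arg: ["main.py", "test_main.py"],
    "run_tests": lambda arg: "2 passed",
}

# Where each tool's result is stored in the agent state.
STATE_KEYS = {"list_files": "listing", "run_tests": "test_result"}


def run_agent(model, tools, max_steps: int = 10) -> str:
    """Control loop: ask the model, run the chosen tool, update state, repeat."""
    state: dict = {}
    for _ in range(max_steps):
        decision = model(state)
        if decision["action"] == "stop":
            return decision["answer"]
        result = tools[decision["name"]](decision["arg"])
        state[STATE_KEYS[decision["name"]]] = result
    # The harness, not the model, enforces the step limit.
    return "stopped: step limit reached"


answer = run_agent(fake_model, TOOLS)
print(answer)  # Tests: 2 passed
```

In this sketch the model only chooses actions; the loop around it owns the state, the tool dispatch, and the stopping rule, which is the division of labor the harness definition describes.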
Coding work involves more than next-token generation: it requires repo navigation, search, function lookup, diff application, test execution, error inspection, and context management. Coding harnesses combine three layers: the model family, an agent loop, and runtime supports.
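To make the runtime side concrete, here is a small sketch of three of the operations listed above (search, edit application, and test execution) over an in-memory repository. Everything in it is an assumption made for illustration: the `repo` dictionary, the file contents, and the `search`, `apply_edit`, and `run_check` helpers are hypothetical, and a real harness would work on files on disk and run an actual test runner.

```python
import re

# Hypothetical in-memory repository; add() contains a deliberate bug.
repo = {
    "mathutil.py": "def add(a, b):\n    return a - b\n",
    "README.md": "Utilities for arithmetic.\n",
}


def search(repo: dict, pattern: str) -> list:
    """Return (path, line number, line) for every line matching the pattern."""
    hits = []
    for path in sorted(repo):
        for lineno, line in enumerate(repo[path].splitlines(), start=1):
            if re.search(pattern, line):
                hits.append((path, lineno, line.strip()))
    return hits


def apply_edit(repo: dict, path: str, old: str, new: str) -> None:
    """Replace one exact occurrence; refuse ambiguous or missing targets."""
    if repo[path].count(old) != 1:
        raise ValueError("edit target must match exactly once")
    repo[path] = repo[path].replace(old, new)


def run_check(repo: dict) -> str:
    """Stand-in for a test run: load the module text and check add()."""
    namespace: dict = {}
    exec(repo["mathutil.py"], namespace)
    if namespace["add"](2, 3) == 5:
        return "pass"
    return "fail: add(2, 3) != 5"


hits = search(repo, r"def add")
before = run_check(repo)
apply_edit(repo, "mathutil.py", "return a - b", "return a + b")
after = run_check(repo)
print(hits, before, after)
```

The failing result from the first check is the kind of feedback an agent loop would read before choosing the edit, and the second check confirms the edit.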