Building a Coding Agent for 8k Context: Planner/Executor Split, Token Budgeting, and Parallel Execution

By OpenClawRadar. Published: April 28, 2026

Most AI coding tools assume models with 200k-token context windows, but if you're running local LLMs through Ollama or LM Studio, or relying on free-tier APIs like Groq or OpenRouter, you're stuck with ~8k tokens. That doesn't fit a whole project; it barely fits a single large file. One developer spent weeks building a CLI agent designed around this constraint and shared the practical lessons learned.

Core architecture: planner/executor split

The agent never shows the LLM the entire project. Instead, it splits work into three roles:

  • Planner: sees only a lightweight project map (Markdown summaries of each folder, ~300-500 tokens total) plus the user request, and outputs a task list.
  • Executor: sees exactly one file plus one task per call — never two files together.
  • Orchestrator: pure code (no LLM) that builds a dependency graph from the task list and decides which tasks can run in parallel vs sequentially.

This turns multi-file refactors from a context-window problem into a scheduling problem. The planner doesn't need to see code, and the executor only sees a bounded amount of code at once.
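To make the division of labor concrete, here is a minimal sketch of what each role might receive. The `Task` fields, function names, and prompt wording are all hypothetical, since the article does not show the real data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One unit of work from the planner (field names are hypothetical)."""
    id: str
    file: str                  # the single file this task touches
    instruction: str           # what the executor should do to that file
    depends_on: list[str] = field(default_factory=list)


def planner_prompt(project_map: str, request: str) -> str:
    # The planner sees folder summaries and the request, never source code.
    return f"Project map:\n{project_map}\n\nRequest:\n{request}\n\nReturn a task list."


def executor_prompt(task: Task, file_text: str) -> str:
    # The executor sees exactly one file and one task per call.
    return f"Task: {task.instruction}\n\nFile: {task.file}\n{file_text}"
```

The orchestrator would sit between these two functions as ordinary code, reading `depends_on` to decide ordering without making any LLM call.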

Token budgeting enforced in code

Every LLM call goes through a canFit() check that measures system prompt + reserved output tokens + memory + actual code. If code doesn't fit, the agent falls back to a per-file line index (generated once for files over ~150 lines) and pulls only the relevant section.
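The article does not describe the format of the line index, so the following is only one way the fallback could work: record where each top-level definition starts and ends, then hand the executor just that slice. The function names and the regex-based approach are assumptions for illustration, and the sketch only handles Python-style `def` and `class` lines.

```python
import re

INDEX_THRESHOLD = 150  # lines; the article says files over "~150" get an index


def build_line_index(source: str) -> dict[str, tuple[int, int]]:
    """Map each top-level def/class name to its (start, end) lines, 1-based inclusive."""
    lines = source.splitlines()
    # re.match anchors at column 0, so indented (nested) definitions are skipped.
    starts = [
        (m.group(2), i)
        for i, line in enumerate(lines, start=1)
        if (m := re.match(r"(def|class)\s+(\w+)", line))
    ]
    index = {}
    for pos, (name, start) in enumerate(starts):
        # A section runs until the line before the next top-level definition.
        end = starts[pos + 1][1] - 1 if pos + 1 < len(starts) else len(lines)
        index[name] = (start, end)
    return index


def extract_section(source: str, index: dict[str, tuple[int, int]], name: str) -> str:
    """Return only the lines belonging to the named definition."""
    start, end = index[name]
    return "\n".join(source.splitlines()[start - 1:end])
```

Because the index is built once per file, later calls only pay for the slice they request rather than the whole file.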

Budget math for 8192 tokens:

System prompt + instructions: ~1000
Reserved for response: ~2000
Short-term memory (4 entries): ~360
Available for actual code: ~4800 (about 140-190 lines)

When budget is tight, folder context is dropped first, then memory, before cutting actual code.
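The four budget lines add up to 8160 tokens, which leaves about 32 tokens of slack in an 8192-token window. The check and the drop order can be sketched as below. The 4-characters-per-token estimate, the function names, and dropping the oldest memory entry first are assumptions; the article only names `canFit()` and the order folder context, memory, code.

```python
CONTEXT_WINDOW = 8192
SYSTEM_TOKENS = 1000     # system prompt + instructions (article's estimate)
RESERVED_OUTPUT = 2000   # reserved for the response (article's estimate)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real tokenizer is more accurate.
    return (len(text) + 3) // 4


def can_fit(code: str, memory: list[str], folder_context: str = "") -> bool:
    """Return True if everything fits in the context window."""
    used = SYSTEM_TOKENS + RESERVED_OUTPUT
    used += estimate_tokens(folder_context)
    used += sum(estimate_tokens(entry) for entry in memory)
    used += estimate_tokens(code)
    return used <= CONTEXT_WINDOW


def fit_context(code: str, memory: list[str], folder_context: str):
    """Drop folder context first, then memory entries, and report whether the code fits."""
    if can_fit(code, memory, folder_context):
        return code, memory, folder_context, True
    folder_context = ""          # first thing to go
    memory = list(memory)        # copy so the caller's list is untouched
    while memory and not can_fit(code, memory, folder_context):
        memory.pop(0)            # assumption: oldest entry is dropped first
    return code, memory, folder_context, can_fit(code, memory, folder_context)
```

When the final flag is `False`, the code itself is too large, and that is the point where an agent like this would fall back to pulling a section of the file instead of the whole thing.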


Parallel execution as speed multiplier

Because each executor sees only one file, independent edits across files run simultaneously. A 5-file refactor with no dependencies between its tasks completes in roughly the time of the longest single edit. The dependency graph (built in code from the planner's task list) decides which tasks must wait for others.
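One simple way to turn a dependency graph into parallel work is to group tasks into "waves", where every task in a wave depends only on tasks from earlier waves. The sketch below is an illustration of that idea, not the project's actual scheduler; `schedule_waves`, `run_all`, and the `execute` callback are hypothetical names.

```python
import asyncio


def schedule_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group task ids into waves; each task's dependencies are all in earlier waves."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            # Nothing can run: a cycle, or a dependency on a task id that does not exist.
            raise ValueError(f"unresolvable dependencies among: {sorted(remaining)}")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


async def run_all(deps, execute):
    """Run each wave concurrently; `execute` is a hypothetical async per-task call."""
    results = {}
    for wave in schedule_waves(deps):
        outputs = await asyncio.gather(*(execute(task) for task in wave))
        results.update(zip(wave, outputs))
    return results
```

Waves are easy to reason about but slightly conservative: a task waits for its whole predecessor wave even if its own dependencies finished early. A scheduler that starts each task the moment its dependencies complete would recover that time at the cost of more bookkeeping.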

Pain points and fixes

  • Question-style requests overwriting files: asking "how many lines does X have?" caused the executor to write the answer into the file. Fixed by adding an action_type: "query" field to the planner's output, routed through a code path that never touches disk.
  • Stale project maps causing silent misroutes: if the user mentioned a renamed file not in the map, the planner would silently route to the closest match. Now the orchestrator validates that mentioned file paths exist on disk and throws a clear error if they don't.
  • Markdown fences in executor output: smaller models wrap code in triple backticks even when told not to. Fix: strip them in post-processing instead of fighting the prompt.
  • Memory token cost: persistent memory adds ~80-90 tokens per entry, so four entries cost up to ~360 tokens. That is why memory is the second thing dropped (after folder context) when the budget is tight, before any actual code gets cut.
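The first three fixes above are all small pieces of plain code around the LLM call. A minimal sketch follows, assuming a task is a dict with `action_type` and `file` keys. Only the `action_type: "query"` field comes from the article; every other name here is hypothetical, and treating all non-query tasks as edits is an assumption.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # built this way so the example itself contains no literal fence
FENCE_RE = re.compile(
    r"\A\s*" + FENCE + r"[^\n]*\n(.*?)\n?" + FENCE + r"\s*\Z", re.DOTALL
)


def strip_fences(output: str) -> str:
    """Remove one wrapping Markdown code fence if present; otherwise return the text as-is."""
    match = FENCE_RE.match(output)
    return match.group(1) if match else output


def validate_paths(root: Path, mentioned: list[str]) -> None:
    """Fail loudly when a mentioned file is missing instead of guessing a close match."""
    missing = [p for p in mentioned if not (root / p).is_file()]
    if missing:
        raise FileNotFoundError(f"not found on disk (project map may be stale): {missing}")


def apply_task(task: dict, root: Path, llm_output: str) -> str:
    """Return query answers directly; only non-query tasks ever write to disk."""
    if task.get("action_type") == "query":
        return llm_output
    (root / task["file"]).write_text(strip_fences(llm_output))
    return "written"
```

Keeping the query branch ahead of any write call means a question-style request cannot overwrite a file, regardless of what the model returns.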

Open questions

It is still unclear whether the planner/executor split scales to codebases over 50 files. The dependency graph stays manageable, but the project map starts costing real tokens. The current approach drops folder context first, which means deeper edits lose that context. The implementation is open source if you want to dig in.

📖 Read the full source: r/LocalLLaMA
