Treating Agent Runs as Review Packets: A Practical Pattern for Claude Code & Codex

A Reddit user experimenting with Codex/Claude-style agent workflows shares a pattern that improved their results: instead of treating agent runs as chat transcripts, they now produce a durable folder with multiple artifacts that another human or agent can inspect.
Key artifacts per run
research.md— sources and assumptions used by the agentdrafts.md— candidate outputs, including rejected onesevals.md— scoring rubric and reasoning for the chosen optionapproval-packet.md— checkpoint before the irreversible stepmetrics.json— numeric outcomes of the runmemory.md— reusable workflow lessons only
Two big lessons
Memory should be about how to work, not an unreviewed fact database. If a claim matters, it belongs in a reviewed artifact with a source.
“Fully autonomous” is less useful than “autonomous until the irreversible step.” For code that means commit/deploy. For content that means publish. For local workflows it means anything touching credentials or third-party accounts.
Why this helps
Failures become visible at specific stages: Was the research wrong? Was the draft bad? Was the eval rubric too vague? Did the approval packet miss a risk? Did memory store a lesson that actually helped next time? This makes iteration faster and more targeted than relying on chat transcripts.
The post is a discussion starter — the author is curious if others are using durable artifacts or trusting chat transcripts for Claude Code/Codex workflows.
📖 Read the full source: r/ClaudeAI
👀 See Also

How to Fix Claude Code's CSS Guesswork with a Design System
A developer found Claude Code repeatedly regenerated misaligned HTML/CSS because it designs blind without visual feedback. The solution: provide a complete design system with spacing, colors, and type variables, then separate HTML and CSS prompts.

Using AI to Generate Project Tickets Before Coding Reduces Scope Drift
A developer found that asking AI to generate detailed project tickets with tasks, sub-tasks, scope, and acceptance criteria before writing any code significantly reduced scope creep and large diffs. Each AI agent only receives its specific sub-task, not the entire plan.

Model Routing Cut API Costs by 85% vs Claude Max Subscription – A Developer's Analysis
A Claude Max subscriber tracked token usage and found only 15% of tasks needed Opus. Switching to API routing (Sonnet for routine tasks, Opus for hard reasoning) dropped monthly cost from $200 to ~$30 with identical output quality.
Claude + MCP Browser: User Reports Supercharged Web Access
A Claude user explains how hooking Claude to an external browser via MCP allowed it to navigate previously inaccessible sites, and wonders if Claude can use the browser's model tokens.