Claude Agent Swarm With Memory: 43% Token Cost Savings

Memory System Benchmark for Claude Agent Swarms

A developer has spent nine months building a memory system called Stompy, whose storage evolved from flat files to SQLite to PostgreSQL. The goal was to minimize token usage when running Claude agent swarms. To measure the effect, they ran a benchmark comparing swarm performance with and without the memory system.

Test Setup

The benchmark used a 40-point coding task requiring a full booking feature with backend, frontend, and tests. A 6-agent swarm was tested with three different Claude models as lead: Sonnet 4.6, Opus 4.6, and Haiku 4.5. All tests used the same codebase, same teammates, and same scoring system. Teammate agents always ran Opus regardless of the lead model.
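The setup amounts to a small test matrix: three lead models, each run with and without memory. A minimal sketch of that matrix follows; the field names and the `conditions` helper are illustrative assumptions, not taken from the benchmark repository.

```python
from itertools import product

# Values taken from the test setup described above.
LEAD_MODELS = ["Sonnet 4.6", "Opus 4.6", "Haiku 4.5"]
TEAMMATE_MODEL = "Opus"   # teammates ran Opus regardless of the lead
SWARM_SIZE = 6            # agents per swarm
MAX_SCORE = 40            # points available on the coding task


def conditions():
    """Enumerate every (lead model, memory on/off) condition in the benchmark."""
    return [
        {
            "lead": lead,
            "memory": memory,
            "teammates": TEAMMATE_MODEL,
            "swarm_size": SWARM_SIZE,
        }
        for lead, memory in product(LEAD_MODELS, [True, False])
    ]


if __name__ == "__main__":
    for condition in conditions():
        print(condition)
```

Enumerating the matrix this way makes it clear that the only variables are the lead model and memory access; everything else is held constant.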

Benchmark Results

Sonnet 4.6 + memory: 40/40, $3.98, 6.5min, 2 turns
Sonnet 4.6 no memory: 40/40, $7.04, 9.6min, 4 turns
Opus 4.6 + memory: 40/40, $4.34, 9.6min, 29 turns
Opus 4.6 no memory: 40/40, $7.65, 10.0min, 70 turns
Haiku 4.5 + memory: 39/40, $4.95, 7.5min, 2 turns
Haiku 4.5 no memory: 0/40, $3.97, 5.8min, 3 turns

Key Findings

With memory, the Opus and Sonnet leads each cost about 43% less than without it (Sonnet: $3.98 vs. $7.04; Opus: $4.34 vs. $7.65). The developer notes that these models are smart enough to complete the task without memory, but they burn tokens on codebase exploration that the memory system eliminates.
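The savings figure can be checked directly against the reported costs. A minimal sketch, with the dollar amounts copied from the benchmark results and the dictionary layout and function names chosen purely for illustration:

```python
# Costs in USD copied from the benchmark results; the layout is illustrative.
COSTS = {
    "Sonnet 4.6": {"memory": 3.98, "no_memory": 7.04},
    "Opus 4.6": {"memory": 4.34, "no_memory": 7.65},
    "Haiku 4.5": {"memory": 4.95, "no_memory": 3.97},
}


def cost_saving_pct(with_memory: float, without_memory: float) -> float:
    """Percentage saved by the memory run relative to the memoryless run."""
    return (without_memory - with_memory) / without_memory * 100


def savings_by_lead(costs=COSTS) -> dict:
    """Saving per lead model, rounded to one decimal place."""
    return {
        lead: round(cost_saving_pct(c["memory"], c["no_memory"]), 1)
        for lead, c in costs.items()
    }


if __name__ == "__main__":
    for lead, pct in savings_by_lead().items():
        print(f"{lead}: {pct:+.1f}%")
```

This gives 43.5% for Sonnet and 43.3% for Opus. Haiku comes out negative (the memory run cost more), but its memoryless run scored 0/40, so a cost comparison between those two runs says little.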

The Haiku result was unexpected: it scored 0/40 without memory but 39/40 with memory. The developer observed that Haiku couldn't coordinate the Opus teammate agents without understanding the project structure, but became a competent lead with memory access.

Sonnet with memory was the best overall configuration: it matched memoryless Opus on score (40/40) and beat it on cost, time, and turns, at roughly half the cost ($3.98 vs. $7.65). The developer's takeaway is that making project knowledge available to the model matters more than using expensive models.
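A metric-by-metric comparison of the two runs shows where Sonnet with memory actually comes out ahead: the score is tied at 40/40, while cost, time, and turns all favor Sonnet. The sketch below uses the reported numbers; the `Run` type and `compare` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    score: int      # points out of 40
    cost: float     # USD
    minutes: float  # run time
    turns: int


# Numbers copied from the benchmark results.
SONNET_MEMORY = Run(score=40, cost=3.98, minutes=6.5, turns=2)
OPUS_NO_MEMORY = Run(score=40, cost=7.65, minutes=10.0, turns=70)


def _verdict(x, y, higher_is_better: bool) -> str:
    """Classify x relative to y as 'better', 'tied', or 'worse'."""
    if x == y:
        return "tied"
    return "better" if (x > y) == higher_is_better else "worse"


def compare(a: Run, b: Run) -> dict:
    """Compare run a against run b on each reported metric."""
    return {
        "score": _verdict(a.score, b.score, higher_is_better=True),
        "cost": _verdict(a.cost, b.cost, higher_is_better=False),
        "minutes": _verdict(a.minutes, b.minutes, higher_is_better=False),
        "turns": _verdict(a.turns, b.turns, higher_is_better=False),
    }


if __name__ == "__main__":
    print(compare(SONNET_MEMORY, OPUS_NO_MEMORY))
    print(f"cost ratio: {SONNET_MEMORY.cost / OPUS_NO_MEMORY.cost:.2f}")
```

The cost ratio works out to about 0.52, which is the "roughly half" cited above.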

Technical Details

Stompy is MCP/API/CLI-based and works with Claude Code. The benchmark setup is available on GitHub for others to use or improve. The developer cautions that the results are n=1 per condition so far, with more runs planned.
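The n=1 caveat matters because a single run per condition gives no estimate of run-to-run variation. A minimal sketch of how repeated runs might be summarized once they exist; the `summarize` helper is an assumption for illustration and is not part of the published benchmark setup:

```python
import statistics


def summarize(costs: list[float]) -> dict:
    """Summarize per-run costs (USD) for one benchmark condition.

    The sample standard deviation needs at least two runs, so it is
    reported as None when only a single run is available.
    """
    return {
        "n": len(costs),
        "mean": statistics.mean(costs),
        "stdev": statistics.stdev(costs) if len(costs) >= 2 else None,
    }


if __name__ == "__main__":
    # The only data point reported so far for Sonnet 4.6 with memory.
    print(summarize([3.98]))
```

With only the one reported run, the spread comes back as `None`, so the 43% saving should be read as a single observation, not an average.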

📖 Read the full source: r/ClaudeAI
