Benchmark Results: Claude Agent Swarm with Memory System Shows 30-43% Token Cost Savings

✍️ OpenClawRadar | 📅 Published: March 8, 2026 | 🔗 Source

Memory System Benchmark for Claude Agent Swarms

A developer has spent nine months building a memory system called Stompy, moving its storage from files to SQLite to PostgreSQL along the way. The goal is to minimize token usage when running Claude agent swarms. To measure the effect, they benchmarked the same swarm with and without the memory system.

Test Setup

The benchmark used a 40-point coding task requiring a full booking feature with backend, frontend, and tests. A 6-agent swarm was tested with three different Claude models as lead: Sonnet 4.6, Opus 4.6, and Haiku 4.5. All tests used the same codebase, same teammates, and same scoring system. Teammate agents always ran Opus regardless of the lead model.

Benchmark Results

  • Sonnet 4.6 + memory: 40/40, $3.98, 6.5 min, 2 turns
  • Sonnet 4.6 no memory: 40/40, $7.04, 9.6 min, 4 turns
  • Opus 4.6 + memory: 40/40, $4.34, 9.6 min, 29 turns
  • Opus 4.6 no memory: 40/40, $7.65, 10.0 min, 70 turns
  • Haiku 4.5 + memory: 39/40, $4.95, 7.5 min, 2 turns
  • Haiku 4.5 no memory: 0/40, $3.97, 5.8 min, 3 turns

Key Findings

With memory, the Opus and Sonnet leads each cost about 43% less than the same lead without memory (Sonnet: $3.98 vs. $7.04; Opus: $4.34 vs. $7.65). The developer notes that these models are smart enough to complete the task without memory, but they burn tokens on codebase exploration that the memory system eliminates.
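The savings figure can be checked directly from the reported costs. The snippet below is a minimal sketch of that arithmetic, not part of the developer's benchmark harness; the dictionary layout and the `cost_saving_pct` helper are hypothetical names chosen for illustration, and the dollar amounts are copied from the results list.

```python
# Costs in USD, copied from the benchmark results list in this article.
# The structure and names here are illustrative, not the benchmark's own code.
costs = {
    "Sonnet 4.6": {"memory": 3.98, "no_memory": 7.04},
    "Opus 4.6": {"memory": 4.34, "no_memory": 7.65},
    "Haiku 4.5": {"memory": 4.95, "no_memory": 3.97},
}


def cost_saving_pct(with_memory: float, without_memory: float) -> float:
    """Percent saved by the memory run relative to the no-memory run.

    A negative value means the memory run cost more.
    """
    return (without_memory - with_memory) / without_memory * 100


savings = {
    lead: round(cost_saving_pct(c["memory"], c["no_memory"]), 1)
    for lead, c in costs.items()
}

for lead, pct in savings.items():
    print(f"{lead}: {pct:+.1f}%")
```

Sonnet and Opus both land near 43%. The Haiku figure comes out negative because its memory run cost more than its no-memory run, but that comparison is not like-for-like: the cheaper run scored 0/40.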

The Haiku result was unexpected: it scored 0/40 without memory but 39/40 with memory. The developer observed that Haiku couldn't coordinate the Opus teammate agents without understanding the project structure, but became a competent lead with memory access.

Sonnet with memory was the best overall configuration: it matched memoryless Opus on score (40/40) and beat it on time and turns at roughly half the cost ($3.98 vs. $7.65). The takeaway is that making project knowledge available to the model matters more than using expensive models.
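To make the Sonnet-with-memory versus memoryless-Opus comparison concrete, here is a small sketch that compares the two runs metric by metric. The `Run` dataclass and `compare` function are hypothetical helpers written for this illustration; only the numbers come from the results list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run, using the four metrics reported in the article."""
    score: int        # points out of 40, higher is better
    cost_usd: float   # lower is better
    minutes: float    # lower is better
    turns: int        # lower is better


sonnet_memory = Run(score=40, cost_usd=3.98, minutes=6.5, turns=2)
opus_no_memory = Run(score=40, cost_usd=7.65, minutes=10.0, turns=70)


def compare(a: Run, b: Run) -> dict:
    """Return a per-metric verdict for run `a` relative to run `b`."""
    def verdict(x, y, higher_is_better):
        if x == y:
            return "tie"
        a_wins = x > y if higher_is_better else x < y
        return "better" if a_wins else "worse"

    return {
        "score": verdict(a.score, b.score, higher_is_better=True),
        "cost_usd": verdict(a.cost_usd, b.cost_usd, higher_is_better=False),
        "minutes": verdict(a.minutes, b.minutes, higher_is_better=False),
        "turns": verdict(a.turns, b.turns, higher_is_better=False),
    }


verdicts = compare(sonnet_memory, opus_no_memory)
cost_ratio = sonnet_memory.cost_usd / opus_no_memory.cost_usd

print(verdicts)
print(f"cost ratio: {cost_ratio:.2f}")
```

The two runs tie on score, and Sonnet with memory comes out ahead on cost, time, and turns, with a cost ratio of about 0.52. Since each condition was run once, these are single-sample comparisons.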

Technical Details

Stompy is MCP-, API-, and CLI-based and works with Claude Code. The benchmark setup is available on GitHub for others to use or improve. The developer cautions that the results are n=1 per condition so far, with more runs planned.

📖 Read the full source: r/ClaudeAI
