agentcache: Python Library for Multi-Agent LLM Prefix Caching

agentcache is a Python library designed to optimize multi-agent LLM systems by implementing prefix caching as a core feature. The library addresses the common problem where frameworks like CrewAI, AutoGen, and open-multi-agent create fresh sessions for each worker, resulting in zero cache hits and duplicated prompt costs.
How It Works
The library operates on a fork-based approach instead of creating separate sessions:
- Start one session with a shared system prompt
- Make the first call - provider computes and caches the prefix
- When you need N workers, fork instead of creating N new sessions
- Parent session: [system, msg1, msg2, ...]
- Forked session: [system, msg1, msg2, ..., WORKER_TASK]
- Exact same prefix = cache hit
Key Features
- Cache-safe forks: Maintains identical prefixes across worker sessions
- Cache-break detection: Diffs snapshots and reports exactly what changed when cache hits drop
- Cache-safe compaction: For long-running sessions, scans old tool outputs before each call and replaces large results with deterministic placeholders to maintain smaller context while preserving cacheable prefixes
- Parameter freezing: Freezes cache-relevant parameters before forking (system prompt, model, tools, messages, reasoning config)
- Task DAG scheduling: Enables parallel workers from one cached session
Performance Results
In a head-to-head test with GPT-4o-mini (coordinator + 3 workers, same task):
- Text injection / separate sessions: 0% cache hits, 85.7 seconds
- Prefix forks: 75.8% cache hits, 37.4 seconds
- Per worker cache hit rates typically range from 80-99%
Installation and Usage
Install via pip:
pip install "git+https://github.com/masteragentcoder/agentcache.git@main"
The library is available on GitHub at github.com/masteragentcoder/agentcache.
📖 Read the full source: r/LocalLLaMA
👀 See Also

OpenClaw vs Hermes: Choose the Right Self-Hosted AI Agent After 100+ Deployments
After deploying 100+ AI agents for clients, a Reddit user shares hard-won lessons: OpenClaw (149K stars) is the reliable workhorse for single/small fleets; Hermes excels at multi-agent orchestration but has a smaller community.

Cloudflare's vinext: A Next.js-compatible framework built with AI on Vite
Cloudflare engineers rebuilt Next.js API surface on Vite using AI in one week, creating vinext - a drop-in replacement that builds 4x faster and produces 57% smaller bundles. It deploys to Cloudflare Workers with a single command.

OpenClaw Plugin Connects AI Agents to Meshtastic Radio Mesh for Off-Grid Operation
A new open-source plugin bridges the OpenClaw framework with Meshtastic's LoRa radio mesh network, enabling AI conversations, API queries, and device control without internet or cellular connectivity.

Leadership App with 90+ Lessons from 20+ Books Runs in Claude
A developer created a leadership app that runs inside Claude, featuring 90+ lessons extracted from 20+ books on leadership, habits, discipline, influence, team culture, and wealth mindset. The app provides daily lessons with specific actions, streak tracking, journaling, and search capabilities.