How Clawdbot Coordinates 6 AI Agents with a Production-Stable Work Queue

Clawdbot's team shared the work queue architecture that coordinates 6 AI agents running an AI-operated store. They found coordinating the agents harder than writing any individual agent's logic, and the system went through several iterations before reaching production stability.
Core System Features
The work queue implements several key mechanisms:
- Atomic task claiming: Prevents two agents from grabbing the same task
- State machine: Tasks move through pending → ready → in_progress → review → complete
- Retry logic: A task is retried with backoff and marked permanently failed after 3 failures, which prevents runaway retry loops
- Task chains: Parent completion auto-spawns children via a next_tasks field
- Heartbeat tracking: Stale claims (from agent crashes) auto-reset after timeout
- Daemon orchestrator: Polls every 60 seconds and spawns agents for ready tasks
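The mechanisms listed above can be sketched in a few functions. This is a minimal illustration, not the team's implementation: the SQLite storage, the table layout, every function name, the "failed" state, the backoff base, and the stale timeout are all assumptions made for the example. Only the state names, the 3-failure limit, and the next_tasks field come from the summary.

```python
import json
import sqlite3

MAX_FAILURES = 3      # from the article: permanent failure after 3 failures
BACKOFF_BASE = 30.0   # assumed base delay in seconds, doubled on each failure
STALE_AFTER = 300.0   # assumed heartbeat timeout in seconds

# Allowed moves. The "failed" state and the in_progress -> ready edge are
# assumptions added to model retries and stale-claim resets.
TRANSITIONS = {
    "pending": {"ready"},
    "ready": {"in_progress"},
    "in_progress": {"review", "ready", "failed"},
    "review": {"complete"},
}

def open_queue():
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE tasks (
        id INTEGER PRIMARY KEY, name TEXT, state TEXT,
        claimed_by TEXT, heartbeat REAL, failures INTEGER DEFAULT 0,
        not_before REAL DEFAULT 0, next_tasks TEXT DEFAULT '[]')""")
    return db

def add_task(db, name, state="ready", next_tasks=()):
    cur = db.execute(
        "INSERT INTO tasks (name, state, next_tasks) VALUES (?, ?, ?)",
        (name, state, json.dumps(list(next_tasks))))
    db.commit()
    return cur.lastrowid

def state_of(db, task_id):
    return db.execute("SELECT state FROM tasks WHERE id=?", (task_id,)).fetchone()[0]

def claim(db, task_id, agent, now):
    # Atomic claim: one conditional UPDATE, so only the first agent matches
    # state='ready'; a second agent updates zero rows and gets False.
    cur = db.execute(
        "UPDATE tasks SET state='in_progress', claimed_by=?, heartbeat=? "
        "WHERE id=? AND state='ready' AND not_before<=?",
        (agent, now, task_id, now))
    db.commit()
    return cur.rowcount == 1

def heartbeat(db, task_id, agent, now):
    db.execute("UPDATE tasks SET heartbeat=? WHERE id=? AND claimed_by=?",
               (now, task_id, agent))
    db.commit()

def transition(db, task_id, new_state):
    state = state_of(db, task_id)
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    db.execute("UPDATE tasks SET state=? WHERE id=?", (new_state, task_id))
    db.commit()

def record_failure(db, task_id, now):
    # Retry with exponential backoff until MAX_FAILURES, then fail for good.
    failures = db.execute(
        "SELECT failures FROM tasks WHERE id=?", (task_id,)).fetchone()[0] + 1
    if failures >= MAX_FAILURES:
        new_state, not_before = "failed", 0.0
    else:
        new_state, not_before = "ready", now + BACKOFF_BASE * 2 ** (failures - 1)
    db.execute(
        "UPDATE tasks SET state=?, failures=?, not_before=?, claimed_by=NULL "
        "WHERE id=?", (new_state, failures, not_before, task_id))
    db.commit()
    return new_state

def reset_stale(db, now):
    # Claims whose heartbeat is older than the timeout go back to 'ready'.
    cur = db.execute(
        "UPDATE tasks SET state='ready', claimed_by=NULL "
        "WHERE state='in_progress' AND heartbeat < ?", (now - STALE_AFTER,))
    db.commit()
    return cur.rowcount

def complete(db, task_id):
    # Task chain: finishing the parent spawns one child per next_tasks entry.
    transition(db, task_id, "complete")
    raw = db.execute(
        "SELECT next_tasks FROM tasks WHERE id=?", (task_id,)).fetchone()[0]
    return [add_task(db, name) for name in json.loads(raw)]
```

In this sketch a daemon would call reset_stale on each 60-second poll and then spawn an agent for every task that is ready and past its not_before time. Putting the claim in a single conditional UPDATE is one common way to get atomicity from a database; a real multi-process deployment would also need a shared database file or server rather than the in-memory one used here.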
Production Lessons
The team notes that the right failure mode handling wasn't obvious until real production incidents exposed the gaps. They've published a full architecture writeup with lessons from running the system in production.
The system coordinates design, code, marketing, and operations agents working concurrently. The team is open to discussing tradeoffs, particularly in the failure mode handling.
📖 Read the full source: r/clawdbot