Be My Butler: Multi-Agent Pipeline for AI Code Verification

✍️ OpenClawRadar | 📅 Published: March 14, 2026 | 🔗 Source

What Be My Butler Does

Be My Butler (BMB) is a multi-agent pipeline designed to solve a specific problem in AI-assisted coding: AI coding agents that incorrectly report their own code as working. The creator, a materials/mechanical engineer with no programming background, built it after seeing Claude Code agents write code that passed its tests but did not work in practice.

Core Concept

The system implements a peer review model for AI-generated code:

One model writes the code
A different model reviews it without knowing who wrote it (blind verification)
A cross-model council (Claude + GPT + Gemini) votes on whether it actually works
An analyst agent tracks patterns in what goes wrong
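
The review flow above can be sketched in a few lines of Python. This is a minimal illustration, not BMB's actual implementation: the reviewer functions are stubs standing in for real model calls, the names `council_vote`, `majority_approves`, and `Analyst` are hypothetical, and the strict-majority rule is an assumption, since the source only says the council "votes".

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical reviewer signature: takes code, returns True if it judges the code working.
Reviewer = Callable[[str], bool]


def council_vote(code: str, council: Dict[str, Reviewer]) -> Dict[str, bool]:
    """Collect one verdict per council member.

    Only the code is passed in, never the author's name, which keeps the review blind.
    """
    return {name: review(code) for name, review in council.items()}


def majority_approves(votes: Dict[str, bool]) -> bool:
    """Assumed decision rule: a strict majority of approvals passes."""
    return sum(votes.values()) * 2 > len(votes)


class Analyst:
    """Tallies rejections per reviewer, a stand-in for pattern tracking."""

    def __init__(self) -> None:
        self.rejections: Counter = Counter()

    def record(self, votes: Dict[str, bool]) -> None:
        for name, approved in votes.items():
            if not approved:
                self.rejections[name] += 1


# Stub reviewers replace real model calls so the sketch runs offline.
def rejects_todos(code: str) -> bool:
    return "TODO" not in code


def approves_everything(code: str) -> bool:
    return True


# Placeholder model names; a real council would wrap three different providers.
council = {
    "model_a": approves_everything,
    "model_b": rejects_todos,
    "model_c": rejects_todos,
}
analyst = Analyst()

candidate = "def login(user, password):\n    # TODO: check the password\n    return True\n"
votes = council_vote(candidate, council)
analyst.record(votes)
print(votes, majority_approves(votes))
```

In this sketch a single lenient reviewer cannot push unfinished code through, because the other two council members outvote it. That is the property the cross-model design relies on.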

Performance Metrics

According to the creator's own testing:

Single-agent self-review catches ~40% of real issues
Cross-model blind review catches ~85%
Cost overhead: 15-20% more tokens
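
To make the trade-off concrete, the short calculation below applies the reported rates to a made-up workload. The 20 real issues and the 100,000-token baseline are hypothetical inputs chosen for illustration; only the 40%, 85%, and 15-20% figures come from the creator's numbers above.

```python
def issues_caught(total_issues: int, catch_rate_pct: int) -> int:
    """Issues caught at a given catch rate, using integer percent arithmetic."""
    return total_issues * catch_rate_pct // 100


def tokens_with_overhead(baseline_tokens: int, overhead_pct: int) -> int:
    """Token count after adding a percentage overhead to the baseline."""
    return baseline_tokens * (100 + overhead_pct) // 100


TOTAL_ISSUES = 20          # hypothetical number of real issues in a project
BASELINE_TOKENS = 100_000  # hypothetical single-agent token usage

self_review = issues_caught(TOTAL_ISSUES, 40)   # reported single-agent rate
blind_review = issues_caught(TOTAL_ISSUES, 85)  # reported cross-model rate
low = tokens_with_overhead(BASELINE_TOKENS, 15)
high = tokens_with_overhead(BASELINE_TOKENS, 20)

print(f"self-review catches {self_review}, misses {TOTAL_ISSUES - self_review}")
print(f"blind review catches {blind_review}, misses {TOTAL_ISSUES - blind_review}")
print(f"token cost rises from {BASELINE_TOKENS} to between {low} and {high}")
```

Under these assumed inputs, missed issues drop from 12 to 3 while token usage grows by 15,000 to 20,000.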

v0.2 Features

Analytics dashboard to track token usage and costs
Analyst agent for automated code review patterns
Consultant agent for architecture decisions
Improved tmux-based orchestration

Installation and Usage

BMB is fully open source under the MIT license. To install and run it:

git clone https://github.com/project820/be-my-butler.git
cd be-my-butler && ./install.sh
bmb "build a REST API with auth"

The tool is particularly useful for "vibe coders": people without traditional coding experience who depend on AI to judge code quality. When you can't read code to spot issues yourself, having multiple models cross-check each other provides verification that single-agent systems lack.

📖 Read the full source: r/ClaudeAI
