Be My Butler: Multi-Agent Pipeline for AI Code Verification

What Be My Butler Does
Be My Butler (BMB) is a multi-agent pipeline that targets a specific failure in AI-assisted coding: agents that report their own code as working when it is not. The creator, a materials/mechanical engineer with no programming background, built it after seeing Claude Code agents write code that passed tests but did not work in practice.
Core Concept
The system implements a peer review model for AI-generated code:
- One model writes the code
- A different model reviews it without knowing who wrote it (blind verification)
- A cross-model council (Claude + GPT + Gemini) votes on whether it actually works
- An analyst agent tracks patterns in what goes wrong
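The blind review and council vote described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BMB's actual implementation: the names `Submission`, `blind_view`, `council_vote`, and `approved` are hypothetical, the reviewers are toy stand-ins for real model calls, and the strict-majority rule is assumed because the source does not say how votes are tallied.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a reviewer sees only code text and returns True if it
# believes the code works. A real reviewer would call a model API instead.
Reviewer = Callable[[str], bool]


@dataclass(frozen=True)
class Submission:
    code: str
    author: str  # which model wrote the code; never shown to reviewers


def blind_view(submission: Submission) -> str:
    """Return only the code, dropping the author so the review stays blind."""
    return submission.code


def council_vote(submission: Submission, council: Dict[str, Reviewer]) -> Dict[str, bool]:
    """Collect one vote per council member on the anonymized code."""
    code = blind_view(submission)
    return {name: review(code) for name, review in council.items()}


def approved(votes: Dict[str, bool]) -> bool:
    """Strict majority rule (an assumption; the source does not state the rule)."""
    return sum(votes.values()) * 2 > len(votes)


# Stand-in reviewers with toy heuristics, used only to make the sketch runnable.
council: Dict[str, Reviewer] = {
    "claude": lambda code: "TODO" not in code,
    "gpt": lambda code: "return" in code,
    "gemini": lambda code: "pass" not in code,
}

submission = Submission(code="def add(a, b):\n    return a + b\n", author="claude")
votes = council_vote(submission, council)
print(votes, approved(votes))
```

The key design point is that reviewers receive only the output of `blind_view`, so nothing about the author can influence the vote.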
Performance Metrics
From testing:
- Single-agent self-review catches ~40% of real issues
- Cross-model blind review catches ~85%
- Cost overhead: 15-20% more tokens
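To make these figures concrete, the sketch below applies them to a hypothetical project. The 20 real issues and the 100,000-token baseline are illustrative assumptions, not numbers from the source; only the 40%, 85%, and 15-20% rates come from the figures above, and the overhead is assumed to be measured against the single-agent baseline.

```python
def issues_caught(total_issues: int, catch_rate_pct: int) -> int:
    """Number of issues caught at a given catch rate (integer percent)."""
    return total_issues * catch_rate_pct // 100


def tokens_with_overhead(base_tokens: int, overhead_pct: int) -> int:
    """Token usage after adding a percentage overhead to a baseline."""
    return base_tokens * (100 + overhead_pct) // 100


TOTAL_ISSUES = 20      # hypothetical number of real issues in a project
BASE_TOKENS = 100_000  # hypothetical single-agent token usage

self_review = issues_caught(TOTAL_ISSUES, 40)       # single-agent self-review
cross_review = issues_caught(TOTAL_ISSUES, 85)      # cross-model blind review
low_cost = tokens_with_overhead(BASE_TOKENS, 15)    # lower bound of overhead
high_cost = tokens_with_overhead(BASE_TOKENS, 20)   # upper bound of overhead

print(f"self-review catches {self_review} of {TOTAL_ISSUES} issues")
print(f"cross-model review catches {cross_review} of {TOTAL_ISSUES} issues")
print(f"token usage rises from {BASE_TOKENS} to between {low_cost} and {high_cost}")
```

Under these assumptions, cross-model review catches 17 issues instead of 8 while token usage grows from 100,000 to between 115,000 and 120,000.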
v0.2 Features
- Analytics dashboard to track token usage and costs
- Analyst agent for automated code review patterns
- Consultant agent for architecture decisions
- Improved tmux-based orchestration
Installation and Usage
BMB is fully open source under the MIT license. To install and run it:
git clone https://github.com/project820/be-my-butler.git
cd be-my-butler && ./install.sh
bmb "build a REST API with auth"

The tool is particularly useful for "vibe coders": people without traditional coding experience who depend on AI for code quality assessment. When you can't read code to spot issues yourself, having multiple models cross-check each other provides verification that single-agent systems lack.
📖 Read the full source: r/ClaudeAI