Bernstein: A Kubernetes-like orchestrator for AI coding agents with verification and model policies

Bernstein is an orchestrator for AI coding agents that the creator describes as "Kubernetes for coding agents." Unlike simpler tools that spawn agents in parallel worktrees, Bernstein addresses what the developer calls "the other 95%" of the problem.
Key Features
The system includes several critical components:
- Verification: A "janitor" component independently verifies agent outputs after every task. It runs tests, checks diffs, and lints output because "agents lie": they may claim tests pass when they don't, or say they committed files when they didn't.
- Model Policy Engine: Provides allow/deny lists per provider, data residency constraints, preferred routing, and cost ceilings. The creator compares this to "K8s network policies but for LLM providers."
- Deterministic Scheduling: Uses pure Python for scheduling instead of LLMs, creating deterministic control flow with zero LLM tokens spent on coordination. An epsilon-greedy bandit learns routing over time.
- Agent-Agnostic Design: Includes 13 adapters for Claude Code, Codex, Gemini CLI, Cursor, Qwen, Aider, Amp, Roo Code, Goose, Kilo, Kiro, OpenCode, and generic agents. Claude Code has the deepest integration.
- Scale Features: At 500K+ lines and ~5000 tests, Bernstein includes circuit breakers, cost anomaly detection, loop detection, deadlock detection, PII scanning, HMAC-chained audit logs, progressive permissions, and quarantine for suspicious output.
- Self-Development: Can develop itself using bernstein --evolve.
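To make the Model Policy Engine item more concrete, here is a minimal sketch of how allow/deny lists, a data residency constraint, preferred routing, and a cost ceiling might be checked in plain Python. Every name in it (`ModelCandidate`, `ModelPolicy`, `is_allowed`, `choose`, the provider labels) is hypothetical and chosen for illustration; none of it is taken from Bernstein's codebase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelCandidate:
    """A model offered by some provider (all fields are illustrative)."""
    provider: str
    model: str
    region: str
    est_cost_usd: float


@dataclass
class ModelPolicy:
    """Hypothetical policy: deny list, optional allow list, regions, cost ceiling."""
    allow: set[str] = field(default_factory=set)    # empty means "any provider"
    deny: set[str] = field(default_factory=set)
    regions: set[str] = field(default_factory=set)  # empty means "any region"
    cost_ceiling_usd: float = float("inf")

    def is_allowed(self, c: ModelCandidate) -> bool:
        if c.provider in self.deny:
            return False
        if self.allow and c.provider not in self.allow:
            return False
        if self.regions and c.region not in self.regions:
            return False
        return c.est_cost_usd <= self.cost_ceiling_usd


def choose(policy, candidates, preferred=()):
    """Return the first permitted candidate, trying preferred providers first."""
    permitted = [c for c in candidates if policy.is_allowed(c)]

    def rank(c):
        # Preferred providers sort by their position; everything else comes after.
        return preferred.index(c.provider) if c.provider in preferred else len(preferred)

    permitted.sort(key=rank)  # stable sort keeps the original order within a rank
    return permitted[0] if permitted else None
```

Because the check is an ordinary function over plain data, the same inputs always produce the same routing decision, which fits the deterministic control flow the creator describes.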
Technical Details
The creator notes that spawning agents in worktrees is "the hello world of this space" and that most multi-agent frameworks use an LLM to schedule other LLMs, which is "slow, expensive, and non-deterministic." Bernstein's approach uses pure Python for deterministic control flow.
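The epsilon-greedy bandit mentioned under Deterministic Scheduling can be sketched in a few lines of Python. This is a generic version of the technique, assuming a scalar reward per finished task (for example 1.0 when verification passes, 0.0 otherwise). The class `EpsilonGreedyRouter` and its methods are hypothetical and do not reflect Bernstein's actual API.

```python
import random


class EpsilonGreedyRouter:
    """Pick an arm (e.g. an agent or model name) by epsilon-greedy selection."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded so runs are reproducible
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # max() returns the first maximal item, so ties go to the earliest arm.
        return max(self.arms, key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental running mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

A scheduler would call `select()` when assigning a task and `update()` once the result has been verified. No LLM call is involved in either step, so coordination itself costs zero tokens.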
The project has been tested at scale with 500K+ lines of code and approximately 5000 tests. The developer built features like circuit breakers and anomaly detection because "things broke and these were the fixes."
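For readers unfamiliar with the circuit breaker pattern named above, the following is a minimal generic sketch: after a run of failures it stops calling the failing function for a cooldown period, then lets one trial call through. `CircuitBreaker`, `CircuitOpenError`, the thresholds, and the injectable clock are assumptions made for illustration, not Bernstein's implementation.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cooldown in seconds
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cooldown elapsed: half-open, so a single failure reopens it.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success fully closes the circuit
        return result
```

Wrapping each provider or agent invocation in `breaker.call(...)` would keep a persistently failing backend from being retried in a tight loop.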
The creator is a solo developer from Israel who mentions "building under rockets (literally)" and says the project has outgrown them. They are now seeking contributors.
📖 Read the full source: r/ClaudeAI