Build a Multi-Agent Video Pipeline with Claude: Script Contract Architecture

A developer built a multi-agent AI pipeline that takes a topic (e.g., "Ada Lovelace") and a persona (channel identity, tone, visual style) and produces a complete chapter-structured educational YouTube video (15–20 min). The pipeline uses Claude as the core LLM for scripting and orchestrates specialized agents across script writing, asset generation, rendering (CUDA on Windows host), and YouTube upload.

Script Writing via Contract Architecture

To keep a 20-minute AI-written script narratively coherent across chapters written in separate LLM calls, the system uses a narrative contract — a validated JSON blueprint generated before any script text is written. The contract encodes four constraint types:

Threads — story arcs that must open in one chapter and close in another, with a declared payoff type (resolved, tragedy, etc.)
Entities — named people/places with a forced first-introduction chapter, preventing retroactive mentions
Facts Required — citations chained with dependencies (fact B can't appear until fact A is established)
Timeline Anchors — temporal reference points allowing non-linear structure (flashback, in-medias-res) while staying internally consistent
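
The constraint types above can be sketched as a small data model with a structural check. This is a minimal illustration, not the author's schema: the source says the real validator uses Pydantic, while this sketch uses standard-library dataclasses so it runs without dependencies. All class names, field names, and error messages are assumptions, and timeline anchors are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    opens_in: int   # chapter where the arc opens
    closes_in: int  # chapter where the arc pays off
    payoff: str     # declared payoff type, e.g. "resolved" or "tragedy"


@dataclass
class Entity:
    name: str
    introduced_in: int  # forced first-introduction chapter


@dataclass
class Fact:
    fact_id: str
    chapter: int
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Contract:
    chapters: int
    threads: list[Thread]
    entities: list[Entity]
    facts: list[Fact]


def structural_errors(contract: Contract) -> list[str]:
    """Return human-readable violations; an empty list means the contract passes."""
    errors: list[str] = []
    valid = range(1, contract.chapters + 1)

    for t in contract.threads:
        if t.opens_in not in valid or t.closes_in not in valid:
            errors.append(f"thread {t.name!r}: chapter out of range")
        elif t.closes_in <= t.opens_in:
            errors.append(f"thread {t.name!r}: must close in a later chapter than it opens")

    for e in contract.entities:
        if e.introduced_in not in valid:
            errors.append(f"entity {e.name!r}: introduction chapter out of range")

    by_id = {f.fact_id: f for f in contract.facts}
    for f in contract.facts:
        for dep in f.depends_on:
            if dep not in by_id:
                errors.append(f"fact {f.fact_id!r}: unknown dependency {dep!r}")
            elif by_id[dep].chapter > f.chapter:
                errors.append(f"fact {f.fact_id!r}: appears before its dependency {dep!r}")

    return errors
```

Returning a list of messages instead of raising on the first problem lets a generator model receive all violations as feedback in one pass.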

The contract is generated in an Opus → structural validate → Sonnet review loop (up to 3 rounds). Opus drafts the contract; the structural validator runs a Pydantic parse plus a temporal constraint check; Sonnet then reviews semantic coherence (no orphan entities, every thread actually closes). Downstream chapter writers are bound to the validated contract.
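
The generate, validate, review loop could look roughly like the sketch below. The three callables stand in for the Opus call, the structural validator, and the Sonnet review; their names and signatures are hypothetical, and no real model API is called. Skipping the semantic review when the structural check fails, and raising after the last round, are both assumptions rather than details from the source.

```python
from typing import Callable

MAX_ROUNDS = 3  # the source states "up to 3 rounds"


def build_contract(
    generate: Callable[[list[str]], dict],  # hypothetical: drafts a contract from prior feedback
    validate: Callable[[dict], list[str]],  # hypothetical: structural errors, empty if valid
    review: Callable[[dict], list[str]],    # hypothetical: semantic issues, empty if coherent
    max_rounds: int = MAX_ROUNDS,
) -> tuple[dict, int]:
    """Return (accepted contract, round number) or raise if no draft passes."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)

        # Cheap deterministic check first; assumed to short-circuit the LLM review.
        feedback = validate(draft)
        if feedback:
            continue

        feedback = review(draft)
        if not feedback:
            return draft, round_no

    raise RuntimeError(f"no contract passed after {max_rounds} rounds: {feedback}")
```

Running the deterministic validator before the reviewer keeps malformed drafts from consuming a model call.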

Research via Fanout

The research pipeline spins up N parallel OutlineAgent instances, each working from the same research package but on different thesis candidates. Each produces a three-level hierarchy: thesis → chapter arguments → scene beats. A grounding/revision loop runs independently on each branch:

Grounding reviewer (Sonnet) flags blocking issues vs. polish issues
Revision agent applies fixes without restructuring
Quality reviewer checks for structural failures (topical chapter lists, collapsed middles, summary endings)
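
A rough sketch of the fanout with a per-branch revision loop follows. The agent callables (`draft`, `ground`, `revise`, `quality`) are hypothetical stand-ins for the LLM-backed agents, and the choices that only blocking issues trigger a revision and that both reviewers run before each revision are assumptions. Threads are used because such calls would be I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Outline = dict
Issue = tuple[str, str]  # (severity, message); severity is "blocking" or "polish"


def refine_branch(
    outline: Outline,
    ground: Callable[[Outline], list[Issue]],
    revise: Callable[[Outline, list[str]], Outline],
    quality: Callable[[Outline], list[str]],
    max_rounds: int = 3,
) -> Outline:
    """Revise one outline until reviewers are satisfied or rounds run out."""
    for _ in range(max_rounds):
        blocking = [msg for severity, msg in ground(outline) if severity == "blocking"]
        structural = quality(outline)
        if not blocking and not structural:
            break  # polish-only feedback does not force another round (assumption)
        outline = revise(outline, blocking + structural)
    return outline


def fan_out(
    theses: list[str],
    draft: Callable[[str], Outline],
    ground: Callable[[Outline], list[Issue]],
    revise: Callable[[Outline, list[str]], Outline],
    quality: Callable[[Outline], list[str]],
) -> list[Outline]:
    """Draft and refine one outline per thesis candidate, in parallel."""
    def work(thesis: str) -> Outline:
        return refine_branch(draft(thesis), ground, revise, quality)

    with ThreadPoolExecutor(max_workers=max(1, len(theses))) as pool:
        # pool.map preserves input order, so results line up with theses.
        return list(pool.map(work, theses))
```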

Up to 3 revision rounds per branch, in parallel. Then a single judge agent scores each refined outline on four axes:

Axis	Weight	What it measures
Concept Hook	0.40	CTR potential; title falsifiability
Trap Closure	0.30	Narrative payoff completeness
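
As an illustration of how such weights might combine, here is a minimal weighted-sum sketch. It covers only the two axes listed in the table, so it computes a partial score; the key names, the 0 to 10 score scale, and the selection rule are assumptions.

```python
# Weights for the two axes listed in the table; key names are hypothetical.
WEIGHTS = {"concept_hook": 0.40, "trap_closure": 0.30}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum over the axes in `weights`; raises if a score is missing."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing axis scores: {sorted(missing)}")
    return sum(weights[axis] * scores[axis] for axis in weights)


def pick_best(outlines: dict[str, dict[str, float]]) -> str:
    """Return the name of the outline with the highest weighted score."""
    return max(outlines, key=lambda name: weighted_score(outlines[name]))
```

With these weights, an outline that wins on Concept Hook can outrank one with a stronger payoff, which matches the heavier weight the table gives to the hook.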

Pipeline Architecture

The pipeline is split across two environments: script and asset work runs in a Linux dev container (WSL), while rendering runs on the Windows host to access CUDA and video tooling. Agents communicate over HTTP with a lightweight orchestrator. The system is phase-based — every step (W2.1, W4.3, R3.1, etc.) is independently re-runnable. Each phase reads and writes typed artifact files (JSON manifests, audio files, image directories) so agents are loosely coupled.
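
One way to get independently re-runnable phases is to have each phase read named JSON artifacts and write exactly one output, as in the sketch below. The `run_phase` helper, the file names, the envelope format, and the skip-if-present behaviour are all assumptions; the phase IDs are borrowed from the text only as labels, without implying what those steps actually do.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_phase(
    phase_id: str,
    workdir: Path,
    inputs: list[str],
    output: str,
    fn: Callable[[dict], dict],
    force: bool = False,
) -> dict:
    """Run one phase: load input artifacts, compute, write the output artifact."""
    out_path = workdir / output
    if out_path.exists() and not force:
        # Reuse the existing artifact so earlier phases need not be repeated.
        return json.loads(out_path.read_text())

    loaded = {name: json.loads((workdir / name).read_text()) for name in inputs}
    artifact = {"phase": phase_id, "data": fn(loaded)}

    # Write to a temp file and rename, so a crash never leaves a half-written artifact.
    tmp = out_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(artifact, indent=2))
    tmp.replace(out_path)
    return artifact


def demo() -> dict:
    """Chain two hypothetical phases through files in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        run_phase("W2.1", work, [], "outline.json",
                  lambda _: {"chapters": ["Origins", "Engine"]})
        return run_phase(
            "W4.3", work, ["outline.json"], "script.json",
            lambda a: {"chapter_count": len(a["outline.json"]["data"]["chapters"])},
        )
```

Because phases only share files, a later phase can be re-run with `force=True` after a fix without touching anything upstream.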

Integrated tools: Live2D, Fish Audio, SadTalker, and others for asset generation and rendering.

📖 Read the full source: r/ClaudeAI