AutoBe: How Weak Local LLMs Fixed an AI Backend Generator's Architecture

What Happened
AutoBe is an open-source AI agent that generates complete backend applications using TypeScript, NestJS, and Prisma. Initially, it achieved 100% compilation success, but the generated code was unmaintainable: there was no code reuse, so every small change required regenerating everything. The team rebuilt the system around modular code generation, which immediately dropped the success rate to 40%.
The Debugging Breakthrough
When the new architecture introduced dependencies between modules, the team used intentionally weak local LLMs to find bugs they didn't know existed. The qwen3-30b-a3b-thinking model had about a 10% success rate and exposed AST schema ambiguities and malformed structures. The qwen3-next-80b-a3b-instruct model had about a 20% success rate and revealed type mismatches and edge cases in nested relationships.
That low success rate was valuable: each fix tightened the entire system. If a schema is precise enough that a 30B model cannot misinterpret it, stronger models will not get it wrong either. This approach also highlights the cost advantage of local LLMs: discovering edge cases takes hundreds of generation-compile-diagnose cycles, which would be prohibitively expensive at cloud API prices.
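The cost argument rests on running the same generate-compile-diagnose loop many times and looking for failures that recur. The sketch below shows that loop in minimal form. It assumes hypothetical `Generate` and `Compile` functions (stand-ins for a local model call and a compiler pass, not AutoBe APIs) and uses a toy generator in place of a real model.

```typescript
// Hypothetical diagnostic produced by a compile step.
interface Diagnostic {
  code: string; // failure category, e.g. "bad-type"
  message: string;
}

// Hypothetical stand-ins: one model call per attempt, one compiler pass per output.
type Generate = (attempt: number) => string;
type Compile = (source: string) => Diagnostic[];

// Run many generate-compile cycles and tally diagnostics by category, so that
// failures a weak model keeps hitting surface as the largest counts.
function diagnoseCycles(generate: Generate, compile: Compile, runs: number) {
  let successes = 0;
  const failuresByCode: Record<string, number> = {};
  for (let attempt = 0; attempt < runs; attempt++) {
    const diagnostics = compile(generate(attempt));
    if (diagnostics.length === 0) {
      successes++;
      continue;
    }
    for (const d of diagnostics) {
      failuresByCode[d.code] = (failuresByCode[d.code] ?? 0) + 1;
    }
  }
  return { successes, failuresByCode };
}

// Toy "weak model": succeeds on every 5th attempt, otherwise emits one of two
// malformed outputs. The toy compiler reports the output string as its category.
const fakeGenerate: Generate = (i) =>
  i % 5 === 0 ? "ok" : i % 2 === 0 ? "bad-type" : "bad-shape";
const fakeCompile: Compile = (src) =>
  src === "ok" ? [] : [{ code: src, message: `diagnostic for ${src}` }];

const report = diagnoseCycles(fakeGenerate, fakeCompile, 100);
console.log(report.successes, report.failuresByCode);
// successes: 20; bad-type: 40; bad-shape: 40
```

With a real model, the categories with the highest counts would point to the schema ambiguities worth fixing first. The toy output only demonstrates the tallying.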
Architectural Shift
The team moved from prompt engineering to schema design with validation feedback. They stripped system prompts to almost nothing, moved all constraints into function-calling schemas, and let validation feedback do the teaching. AutoBe uses three AST types that are particularly challenging for LLMs to generate: AutoBeDatabase (Prisma models, relations, indexes), AutoBeOpenApi (OpenAPI schemas, endpoints, DTOs), and AutoBeTest (30+ expression types).
These structures are difficult because they involve unbounded union types, unbounded nesting depth, and recursive references. For example, the compiler AST includes types such as IArrayLiteralExpression and IObjectLiteralExpression, which recursively reference IExpression[].
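The recursive structure is easier to see in code. The sketch below is heavily reduced. Only the names `IExpression`, `IArrayLiteralExpression`, and `IObjectLiteralExpression` come from the article. The literal node types, every member field, and the `validateExpression` and `depth` helpers are hypothetical. The sketch illustrates why such types are hard: a validator must recurse through an open-ended union to arbitrary depth, and its most useful output is an exact path to the bad node.

```typescript
// Hypothetical, heavily reduced expression AST. Only the three I*Expression
// names come from the article; the fields and literal types are assumptions.
type IExpression =
  | INumericLiteral
  | IStringLiteral
  | IArrayLiteralExpression
  | IObjectLiteralExpression;

interface INumericLiteral { type: "numericLiteral"; value: number; }
interface IStringLiteral { type: "stringLiteral"; value: string; }
interface IArrayLiteralExpression {
  type: "arrayLiteralExpression";
  elements: IExpression[]; // recursive reference
}
interface IObjectLiteralExpression {
  type: "objectLiteralExpression";
  properties: { key: string; value: IExpression }[]; // recursive reference
}

// Walk an untrusted value and collect the path of every node that does not
// match the union. The recursion mirrors the recursive type definition.
function validateExpression(node: any, path = "$input"): string[] {
  if (typeof node !== "object" || node === null) return [`${path}: expected IExpression`];
  switch (node.type) {
    case "numericLiteral":
      return typeof node.value === "number" ? [] : [`${path}.value: expected number`];
    case "stringLiteral":
      return typeof node.value === "string" ? [] : [`${path}.value: expected string`];
    case "arrayLiteralExpression": {
      if (!Array.isArray(node.elements)) return [`${path}.elements: expected IExpression[]`];
      const errors: string[] = [];
      node.elements.forEach((e: unknown, i: number) =>
        errors.push(...validateExpression(e, `${path}.elements[${i}]`)));
      return errors;
    }
    case "objectLiteralExpression": {
      if (!Array.isArray(node.properties)) return [`${path}.properties: expected array`];
      const errors: string[] = [];
      node.properties.forEach((p: any, i: number) => {
        const at = `${path}.properties[${i}]`;
        if (typeof p?.key !== "string") errors.push(`${at}.key: expected string`);
        errors.push(...validateExpression(p?.value, `${at}.value`));
      });
      return errors;
    }
    default:
      return [`${path}.type: unknown expression type ${JSON.stringify(node.type)}`];
  }
}

// Nesting depth of a well-typed expression (a leaf counts as 1).
function depth(expr: IExpression): number {
  switch (expr.type) {
    case "arrayLiteralExpression":
      return 1 + Math.max(0, ...expr.elements.map(depth));
    case "objectLiteralExpression":
      return 1 + Math.max(0, ...expr.properties.map((p) => depth(p.value)));
    default:
      return 1;
  }
}

// A model-produced value with one malformed leaf nested two levels deep.
const generated = {
  type: "objectLiteralExpression",
  properties: [{ key: "ids", value: { type: "arrayLiteralExpression", elements: [
    { type: "numericLiteral", value: 1 },
    { type: "numericLiteral", value: "2" },
  ] } }],
};
console.log(validateExpression(generated));
// one error: $input.properties[0].value.elements[1].value: expected number

const fixed: IExpression = {
  type: "objectLiteralExpression",
  properties: [{ key: "ids", value: { type: "arrayLiteralExpression", elements: [
    { type: "numericLiteral", value: 1 },
    { type: "numericLiteral", value: 2 },
  ] } }],
};
console.log(depth(fixed)); // 3
```

A path like `$input.properties[0].value.elements[1].value` is precise enough for a model to repair a single leaf without regenerating the whole tree.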
Results
Through validation feedback alone, the team went from a 6.75% raw function-calling success rate to 100%. They're now back to 100% success with GLM v5, and other local models are steadily improving.
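Validation feedback of this kind can be sketched as a retry loop. The constraints live in the function-calling types and the validator rather than in the prompt, and the validation errors go back to the model on the next attempt. This is an illustrative sketch, not AutoBe's implementation. The `ModelCall` shape, the PascalCase rule, the error format (`path`, `expected`, `value`), and the `toyModel` stand-in for an LLM are all assumptions.

```typescript
// Hypothetical shape of one "define a database model" function call.
// Field types are a made-up subset, not AutoBe's real AutoBeDatabase schema.
interface ModelCall {
  name: string;
  fields: { name: string; type: "string" | "int" | "boolean" | "datetime" }[];
}

// One validation error, small enough to send back to the model verbatim.
interface ValidationError {
  path: string;
  expected: string;
  value: unknown;
}

const FIELD_TYPES: string[] = ["string", "int", "boolean", "datetime"];

// Validate untrusted function-call arguments; the constraints live here and
// in the types above rather than in the system prompt.
function validateModelCall(input: unknown): ValidationError[] {
  if (typeof input !== "object" || input === null) {
    return [{ path: "$input", expected: "ModelCall", value: input }];
  }
  const obj = input as Record<string, unknown>;
  const errors: ValidationError[] = [];
  const name = obj.name;
  if (typeof name !== "string" || !/^[A-Z][A-Za-z0-9]*$/.test(name)) {
    errors.push({ path: "$input.name", expected: "PascalCase string", value: name });
  }
  const fields = obj.fields;
  if (!Array.isArray(fields)) {
    errors.push({ path: "$input.fields", expected: "Array<Field>", value: fields });
    return errors;
  }
  fields.forEach((f: any, i: number) => {
    if (typeof f?.name !== "string") {
      errors.push({ path: `$input.fields[${i}].name`, expected: "string", value: f?.name });
    }
    if (FIELD_TYPES.indexOf(f?.type) === -1) {
      errors.push({ path: `$input.fields[${i}].type`, expected: FIELD_TYPES.join(" | "), value: f?.type });
    }
  });
  return errors;
}

// Stand-in for an LLM function call: receives the previous errors (if any)
// and returns new arguments.
type FunctionCaller = (feedback: ValidationError[]) => unknown;

// Call, validate, and feed the errors back until the arguments pass.
function callWithFeedback(llm: FunctionCaller, maxAttempts: number) {
  let feedback: ValidationError[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const args = llm(feedback);
    feedback = validateModelCall(args);
    if (feedback.length === 0) {
      return { value: args as ModelCall, attempts: attempt };
    }
  }
  throw new Error(`still invalid after ${maxAttempts} attempts: ${JSON.stringify(feedback)}`);
}

// Toy model: its first answer breaks two constraints; once it receives any
// feedback it returns a corrected call (a real model would read the paths).
const toyModel: FunctionCaller = (feedback) =>
  feedback.length === 0
    ? { name: "user", fields: [{ name: "id", type: "uuid" }] }
    : { name: "User", fields: [{ name: "id", type: "string" }] };

const result = callWithFeedback(toyModel, 3);
console.log(result.attempts, result.value.name); // 2 User
```

Errors are returned as data with exact paths rather than as extra prompt instructions. This is what lets a minimal system prompt work: the model learns each constraint only when it violates one.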
📖 Read the full source: r/LocalLLaMA