Local Multi-Agent Setup with vLLM, Claude Code, and gpt-oss-120b on Linux

A developer shared their experience creating a fully local, parallel multi-agent coding setup on Linux after switching from Windows. The configuration uses vLLM for parallel inference, Claude Code for agent orchestration, and a large language model for coding tasks.
Setup Components
- vLLM Docker container: Used for easy deployment and parallel inference
- Claude Code: Handles vibecoding and Agent Teams orchestration, configured to point at vLLM localhost endpoint instead of cloud providers
- gpt-oss:120b: Serves as the coding agent
- RTX Pro 6000 Blackwell MaxQ: Primary GPU for the workload
- Dual-boot Ubuntu: Operating system setup
Performance and Workflow Improvements
The developer previously used Ollama and LM Studio but found they processed requests sequentially and experienced slowdowns after multiple message turns and tool calls. With vLLM, they achieved parallel processing that "turbocharged" their experience.
In testing, the setup handled 4 agents collaborating simultaneously as shown in a video demonstration, with the GPU capable of supporting 8 agents in parallel continuously. The only noted issue was throughput reduction, which varies depending on the agent.
Agent Team-scale tasks that previously took hours to complete sequentially can now be done in approximately 30 minutes, depending on project scope. The developer estimates that adding a second MaxQ GPU could potentially scale the system to handle tens of agents concurrently.
This parallel approach enables vibecoding multiple projects locally and concurrently, though it may introduce some increased latency in certain scenarios. The developer found this trade-off preferable to completing projects one agent at a time.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Using Claude as a Creative Director in a Sticker Generation Pipeline
A developer built a sticker app where Claude analyzes user-uploaded photos, generates nine sticker concepts, and writes detailed prompts for image models, resulting in personalized stickers rather than generic ones.

Claude Mobile Workflow: Brainstorm Features on Phone, Get Autonomous Implementation
A developer shares a workflow where they brainstorm features and bug fixes with Claude on their phone while mobile, then have a daemon script automatically implement well-defined tasks by creating Linear issues and spinning up Claude Code agents to handle implementation, testing, and deployment to staging.

OpenClaw Use Case: Building a Daily Personal News Digest with AI
A developer shares their OpenClaw setup for a daily news digest using a cronjob with a detailed prompt that specifies news sources, interest priorities, and output format. The system fetches RSS feeds from trusted Dutch publications and delivers 5 curated stories each morning.

VP of Engineering Builds Four Applications in One Week Using Claude AI
A VP of Engineering used Claude AI to build a VPN application, iOS native app with Go backend, Next.js landing website, and React admin dashboard in one week without writing code directly. The user previously attempted a Jira alternative with Claude a year ago but encountered limitations with complex applications.