Running NemoClaw with Local vLLM: Setup Notes and Agent Engineering Observations

Local NemoClaw Setup with vLLM
A developer shared their experience running NVIDIA's NemoClaw, a sandboxed AI agent platform, with a local Nemotron 9B v2 model using vLLM on WSL2. The setup is based on jieunl24's fork of NemoClaw.
Key Technical Details
Inference Routing: NemoClaw's inference routing follows a clean path: inference.local → gateway → vLLM. However, initial onboarding bugs required a 3-layer network hack that has since been fixed via PR #412.
Parser Compatibility: The built-in vLLM parsers (qwen3_coder, nemotron_v3) are incompatible with Nemotron v2 models. You need NVIDIA's official plugin parsers from the NeMo repository instead.
Agent Engineering Gap: OpenClaw as an agent platform provides solid infrastructure but ships with minimal prompt engineering. The gap between "model serves text" and "agent does useful work" is primarily about scaffolding rather than model capability limitations.
Resources
- Blog post covering architecture, vLLM parser setup, and agent engineering observations: https://github.com/soy-tuber/nemoclaw-local-inference-guide/blob/master/BLOG-openclaw-agent-engineering.md
- Setup guide (V2) with inference.local routing and no network hacks: https://github.com/soy-tuber/nemoclaw-local-inference-guide
- Original NemoClaw issue #315: https://github.com/NVIDIA/NemoClaw/issues/315
This setup demonstrates practical local deployment of AI agent platforms, highlighting both the technical implementation details and the ongoing challenges in agent engineering.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Structured Claude Skill for B2B SaaS Growth Workflows
A developer has open-sourced a Claude Skill that structures B2B SaaS growth knowledge into playbooks and case studies to improve Claude's output quality. The repository includes 5 SaaS case studies, a 4-stage growth flywheel, and 6 structured playbooks.

Persistent Side Panel for Claude Code with Autonomous Content Management
A developer built a TUI panel that sits in an iTerm2 split pane next to the terminal, featuring three fixed panels that Claude autonomously manages to show relevant content like code, diagrams, and status updates.

KANBAII: A Visual Kanban Board Built with Claude Code for AI-Assisted Development
A developer built KANBAII, a local kanban board tool entirely with Claude Code over two months. It provides visual task management, AI planning, and parallel execution modes for Claude Code workflows.

Multi-Agent Debate Approach Improves LLM Reasoning Quality
A developer experimented with a multi-agent debate approach using CyrcloAI, where different AI agents take on roles like analyst, critic, and synthesizer to critique each other's responses before producing a final answer, resulting in more structured and deliberate outputs.