Qwen3.5 35B-A3B MoE runs 27-step agentic workflow locally on mid-range hardware

Local agentic workflow demonstration
A developer on r/LocalLLaMA reported successfully running a complex agentic workflow locally using Qwen3.5 35B-A3B MoE. The model executed a 27-step video processing chain autonomously on mid-range hardware.
Workflow details
The task involved processing a video from a single natural language prompt:
- Upload a video
- Transcribe with Whisper
- Edit the subtitles
- Burn subtitles back into video with custom styling
The workflow consisted of 27 sequential tool calls, including extract_audio, transcribe, read_file, edit_file, and burn_subtitles, plus verification steps. The model planned, executed, and verified each step, and corrected itself when needed.
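The plan, execute, verify, and retry pattern described above can be sketched in a few lines. This is a minimal illustration, not the developer's implementation: the `Step` type, the `run_chain` function, and the retry limit are hypothetical, and the plan here is a fixed list, whereas in the reported workflow the model chose each tool call itself.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """One planned tool call plus a check on its result (hypothetical type)."""
    tool: str
    args: dict
    verify: Callable[[Any], bool]


def run_chain(steps, tools, max_retries=2):
    """Run steps in order; re-run any step whose verification fails.

    Returns a log of (tool, attempt, ok) tuples so the chain can be audited.
    """
    log = []
    for step in steps:
        for attempt in range(1, max_retries + 2):
            result = tools[step.tool](**step.args)
            ok = step.verify(result)
            log.append((step.tool, attempt, ok))
            if ok:
                break  # verified, move on to the next step
        else:
            # Every attempt failed verification: stop instead of continuing blindly.
            raise RuntimeError(f"{step.tool} failed verification {attempt} times")
    return log
```

Keeping verification as a separate callable per step means a bad result (an empty transcript, a missing output file) is caught before later steps build on it, which is what a long unattended chain depends on.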
Technical specifications
Hardware:
- Lenovo ThinkPad P53 mobile workstation
- Intel i7-9850H processor
- Quadro RTX 3000 (6GB VRAM)
- 48GB DDR4 2666MT/s RAM
Software stack:
- Full local implementation with llama.cpp + whisper.cpp
- No cloud APIs used
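As a rough illustration of what a fully local toolchain involves, the three media steps could map to command lines like the ones below. This is a sketch under assumptions: the post names only llama.cpp and whisper.cpp, so the use of ffmpeg, the function names, the file names, and the default style are all hypothetical. The `whisper-cli` binary name matches recent whisper.cpp builds (older builds call it `main`).

```python
def extract_audio_cmd(video, wav):
    # Convert to 16 kHz mono 16-bit PCM WAV, the input format used in
    # whisper.cpp's own examples.
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]


def transcribe_cmd(wav, model, out_prefix):
    # -osrt writes an .srt file; -of sets the output path without extension.
    return ["whisper-cli", "-m", model, "-f", wav, "-osrt", "-of", out_prefix]


def burn_subtitles_cmd(video, srt, out, style="FontSize=24"):
    # ffmpeg's subtitles filter renders the text into the video frames;
    # force_style overrides ASS style fields. Assumes a simple file name
    # that needs no filter-graph escaping.
    vf = f"subtitles={srt}:force_style='{style}'"
    return ["ffmpeg", "-y", "-i", video, "-vf", vf, "-c:a", "copy", out]
```

Each function returns an argument list suitable for `subprocess.run(cmd, check=True)`, so a tool wrapper can surface a non-zero exit code to the model as a failed step.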
Model configuration:
- Qwen3.5 35B-A3B MoE at Q4_K_M quantization
- MoE architecture with ~3B active parameters per token
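- Runs on a 6GB GPU with partial layer offloading; weights that do not fit in VRAM stay in system RAM
- All 35B parameters remain available to the model, even though only ~3B are used for any given token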
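A back-of-the-envelope estimate shows why this configuration needs system RAM alongside the GPU. The bits-per-weight figure is an assumption (Q4_K_M is commonly quoted at a little under 5 bits per weight on average, and real file sizes vary by model), and the calculation ignores the KV cache and activation buffers.

```python
def weights_size_gb(n_params, bits_per_weight):
    """Approximate size of quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


BPW_Q4_K_M = 4.85  # assumption: approximate average for Q4_K_M

total_gb = weights_size_gb(35e9, BPW_Q4_K_M)   # all experts
active_gb = weights_size_gb(3e9, BPW_Q4_K_M)   # weights touched per token

print(f"full model: ~{total_gb:.1f} GB, active per token: ~{active_gb:.1f} GB")
```

Under these assumptions the full weights come to roughly 21 GB, well over the 6GB of VRAM but comfortably inside 48GB of RAM, while the weights read for each token amount to under 2 GB. Because the active experts change from token to token, the full set still has to be resident somewhere; the low per-token footprint is what keeps generation speed tolerable when much of the model lives in system RAM.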
Performance results
The complete workflow ran in approximately 10 minutes, an average of roughly 22 seconds per tool call, with most of the time spent on inference. The developer reported zero errors and no human intervention during the 27-step chain. The MoE architecture made this feasible on mid-range hardware by keeping the active parameter count low while retaining the capacity of the full model.
This demonstrates that local agentic workflows are becoming practical on consumer-grade hardware, particularly with MoE models that balance active parameter count for speed against full parameter count for capability.
📖 Read the full source: r/LocalLLaMA