Qwen3.5 35B-A3B MoE Runs 27-Step Workflow on Mid-Range Hardware

Local agentic workflow demonstration

A developer on r/LocalLLaMA reported successfully running a complex agentic workflow locally using Qwen3.5 35B-A3B MoE. The model executed a 27-step video processing chain autonomously on mid-range hardware.

Workflow details

The task involved processing a video from a single natural language prompt:

Upload a video
Transcribe with Whisper
Edit the subtitles
Burn subtitles back into video with custom styling
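
The audio-extraction and subtitle-burning steps above can be sketched as command builders. The post does not say which tool handled the video work, so the use of ffmpeg here is an assumption, and the function names, file names, and default style are hypothetical. The functions only build argument lists and do not run anything.

```python
from typing import List


def extract_audio_cmd(video: str, wav: str) -> List[str]:
    """Build an ffmpeg command that extracts audio for transcription.

    whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input.
    """
    return [
        "ffmpeg", "-y", "-i", video,
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        wav,
    ]


def burn_subtitles_cmd(video: str, srt: str, out: str,
                       style: str = "FontSize=24") -> List[str]:
    """Build an ffmpeg command that renders an SRT file onto the video frames.

    force_style overrides individual subtitle style fields (hypothetical default).
    """
    video_filter = f"subtitles={srt}:force_style='{style}'"
    return ["ffmpeg", "-y", "-i", video, "-vf", video_filter, "-c:a", "copy", out]
```

Each list could be passed to subprocess.run once the input files exist.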

The workflow consisted of 27 sequential tool calls, including extract_audio, transcribe, read_file, edit_file, and burn_subtitles, plus verification steps. The model planned the chain, executed each call, verified the result, and self-corrected when needed.
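
The execute-and-verify pattern described above can be illustrated with a minimal sketch. It is not the developer's code: the tool bodies are stubs that only update a shared dictionary, the plan is a fixed list rather than something a model generates, and run_plan and the state keys are hypothetical names. Only the tool names come from the post.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def extract_audio(state: State) -> str:
    state["audio"] = "audio.wav"
    return "ok"


def transcribe(state: State) -> str:
    if "audio" not in state:
        return "error: no audio to transcribe"
    # Stub transcript containing a typo for the edit step to fix.
    state["srt"] = "1\n00:00:00,000 --> 00:00:02,000\nhelo world\n"
    return "ok"


def edit_file(state: State) -> str:
    if "srt" not in state:
        return "error: no subtitles to edit"
    state["srt"] = state["srt"].replace("helo", "hello")
    return "ok"


def burn_subtitles(state: State) -> str:
    if "srt" not in state:
        return "error: no subtitles to burn"
    state["output"] = "output.mp4"
    return "ok"


TOOLS: Dict[str, Callable[[State], str]] = {
    "extract_audio": extract_audio,
    "transcribe": transcribe,
    "edit_file": edit_file,
    "burn_subtitles": burn_subtitles,
}


def run_plan(plan: List[str]) -> Tuple[State, List[Tuple[str, str]]]:
    """Run each tool in order, checking the result before moving on."""
    state: State = {}
    log: List[Tuple[str, str]] = []
    for name in plan:
        result = TOOLS[name](state)
        log.append((name, result))
        if result != "ok":
            # A real agent would hand this error back to the model so it
            # can choose a corrective call; the sketch simply stops.
            raise RuntimeError(f"{name} failed: {result}")
    return state, log
```

Calling run_plan(["extract_audio", "transcribe", "edit_file", "burn_subtitles"]) runs the four stubs in order and returns the final state together with a per-step log. In an actual agent loop the model would choose each next call from the previous result instead of following a fixed list.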

Technical specifications

Hardware:

Lenovo ThinkPad P53 mobile workstation
Intel i7-9850H processor
Quadro RTX 3000 (6GB VRAM)
48GB DDR4 2666MT/s RAM

Software stack:

Full local implementation with llama.cpp + whisper.cpp
No cloud APIs used

Model configuration:

Qwen3.5 35B-A3B MoE at Q4_K_M quantization
MoE architecture with ~3B active parameters per token
Runs with only 6GB of VRAM by splitting layers between the GPU and system RAM
All 35B parameters stay loaded, so the model keeps its full capacity
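
A rough size estimate shows why the split between VRAM and system RAM matters. The figure of 4.8 bits per weight for Q4_K_M is an approximation assumed here, not a number from the post, and the calculation ignores the KV cache and runtime buffers.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (billions of bytes)."""
    return params_billion * bits_per_weight / 8


BITS_PER_WEIGHT = 4.8  # assumed average for Q4_K_M, not from the post

total_gb = quantized_size_gb(35, BITS_PER_WEIGHT)   # all experts
active_gb = quantized_size_gb(3, BITS_PER_WEIGHT)   # weights used per token

print(f"all weights: ~{total_gb:.1f} GB, active per token: ~{active_gb:.1f} GB")
```

Under that assumption the full weights come to about 21 GB, well beyond 6GB of VRAM but within 48GB of system RAM, while each token reads only about 1.8 GB of weights.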

Performance results

The complete workflow ran in approximately 10 minutes (roughly 22 seconds per step on average), with most of that time spent on inference. The developer reported no errors and no human intervention across the 27-step chain. The MoE architecture made this feasible on mid-range hardware: only about 3B parameters are active per token, while all 35B remain available.

This suggests that local agentic workflows are becoming practical on consumer-grade hardware, particularly with MoE models that pair a low active parameter count for speed with a large total parameter count for capability.

📖 Read the full source: r/LocalLLaMA