Local Qwen 3.6 vs Frontier Models on a Coding Primitive: Single-File HTML Canvas Driving Animation

A Reddit user ran a head-to-head comparison of local quantized models versus frontier web-based models on a specific coding primitive: generating a single HTML file with a full-page canvas animation of a side-view car driving with parallax scrolling, spinning wheels, and cinematic lighting.
The Prompt
The exact prompt asked for a single HTML file with no libraries, a full-page canvas, realistic side-view car animation, layered parallax scenery, spinning wheels, subtle body motion, smooth looping, and cohesive sky/lighting.
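To make the requirements concrete, here is a minimal TypeScript sketch of the two pieces of math the prompt implies: per-layer parallax offsets and wheel rotation tied to distance traveled. It does not come from any model's output in the thread. The names `Layer`, `layerOffset`, and `wheelAngle` are hypothetical, and the canvas drawing itself is left out so the logic stays self-contained.

```typescript
// Illustrative only: helper names and parameters are assumptions,
// not taken from the prompt or from any generated file.

interface Layer {
  speedFactor: number; // 0 = static sky, 1 = scrolls at full road speed
  tileWidth: number;   // width in pixels of one repeating scenery tile
}

// Horizontal draw offset for a layer, wrapped into (-tileWidth, 0].
// Drawing the tile at `offset` and again at `offset + tileWidth`
// hides the seam, so the scenery loops smoothly.
function layerOffset(distance: number, layer: Layer): number {
  const w = layer.tileWidth;
  const shift = (((distance * layer.speedFactor) % w) + w) % w;
  return shift === 0 ? 0 : -shift;
}

// Wheel rotation in radians for a wheel rolling without slipping:
// arc length traveled = radius * angle.
function wheelAngle(distance: number, wheelRadius: number): number {
  return (distance / wheelRadius) % (2 * Math.PI);
}

// Example frame state: the car has covered 250 px of road.
const hills: Layer = { speedFactor: 0.5, tileWidth: 100 };
const hillsOffset = layerOffset(250, hills);
const angle = wheelAngle(250, 20);
```

Distant layers get a smaller `speedFactor`, so they scroll more slowly than the road, which is what produces the depth effect. Deriving the wheel angle from the same `distance` value keeps the wheels in step with the ground.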
Models Tested
Frontier (web-based via Perplexity, tok/s not measured):
- Claude Sonnet 4.6 Thinking (used internet for reasoning)
- Gemini 3.1 Pro Thinking
- GPT 5.4 Thinking
- Kimi k2.6 Thinking
Local (Ryzen 5 5600, 24 GB DDR4-3200, RX 5700 XT 8 GB):
- Qwen3.5 9B Q4_K_M — ~50 tok/s
- Qwen3.6-27B (Claude-opus-reasoning-distilled) Q4_K_M — 2.65 tok/s
- Qwen3.6-27B Q4_K_M — 2.70 tok/s
- Qwen3.6-31B A3B Q4_K_M — 12.13 tok/s
- Gemma-4-31b-it — 1.91 tok/s
- Qwen3.5 4B Q8 — 60 tok/s (used internet for reasoning)
- Qwen3.5 4B Q4_K_M — 80 tok/s (used internet for reasoning)
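For a sense of what these decode rates mean in practice, the sketch below converts tok/s into wall-clock time for one generation. The 4,000-token output length is an assumption chosen for illustration, since the thread does not report token counts. The estimate also ignores prompt processing time, and the function name `generationSeconds` is hypothetical.

```typescript
// Seconds needed to emit `tokens` output tokens at a steady decode rate.
function generationSeconds(tokens: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return tokens / tokensPerSecond;
}

// Assumption: output length is NOT reported in the thread.
const ASSUMED_OUTPUT_TOKENS = 4000;

// Decode rates as listed above (the 9B figure is approximate).
const reportedRates: { model: string; tokPerSec: number }[] = [
  { model: "Qwen3.5 9B Q4_K_M", tokPerSec: 50 },
  { model: "Qwen3.6-27B Q4_K_M", tokPerSec: 2.7 },
  { model: "Qwen3.6-31B A3B Q4_K_M", tokPerSec: 12.13 },
  { model: "Gemma-4-31b-it", tokPerSec: 1.91 },
];

for (const { model, tokPerSec } of reportedRates) {
  const minutes = generationSeconds(ASSUMED_OUTPUT_TOKENS, tokPerSec) / 60;
  console.log(`${model}: about ${minutes.toFixed(1)} min`);
}
```

Under that assumed length, the 27B quant at 2.70 tok/s needs roughly 25 minutes per attempt, while the 9B model at about 50 tok/s finishes in around 80 seconds. Thinking tokens would push both figures higher.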
Results & Subjective Ranking
The poster's subjective top three for this specific task, in order:
- Kimi k2.6 Thinking — cleanest overall visual result
- Qwen3.6-27B Q4_K_M (local) — stronger than expected; good parallax and road feel
- Qwen3.6-27B Claude-opus-reasoning-distilled — close third
In the poster's judgment, the local 27B quant delivered more natural motion and layering than some of the frontier outputs for this specific visual primitive. The poster had expected the frontier models to beat the local quants by a wider margin.
The poster's only modification to each output was the HTML <title> tag, changed to track which model generated which file. The outputs are shared in the thread, along with screenshots and GIFs of the running animations.
📖 Read the full source: r/LocalLLaMA