Qwen 3.5 35B Running on 8GB VRAM with llama.cpp Configuration

Local Qwen 3.5 35B Setup on Limited VRAM
A developer on r/LocalLLaMA detailed their configuration for running the Qwen 3.5 35B model locally on hardware with 8GB of VRAM. They moved from using Antigravity (with a Google AI Pro plan) to local LLMs after hitting limits with the cloud service.
Hardware and Model Specifications
The setup uses a Lenovo Legion laptop with an i9-14900HX CPU (E-cores disabled in BIOS), 32GB of DDR5 RAM, and an RTX 4060m GPU with 8GB of VRAM. The specific model is Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF).
Performance and llama.cpp Configuration
The developer reports getting approximately 700 tokens per second for prompt processing and 42 tokens per second for token generation with this setup. They provided their llama.cpp command-line arguments after testing:
-ngl 99 ^
--n-cpu-moe 40 ^
-c 192000 ^
-t 12 ^
-tb 16 ^
-b 4096 ^
--ubatch-size 2048 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--mlock
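To make the flags above easier to read, here is a minimal Python sketch that lists each one with a short note and assembles them into a full command line. The post gives only the arguments, so the executable name (llama-server) and the model path (model.gguf) are assumptions used for illustration, and the per-flag comments are a reading of the llama.cpp options rather than the developer's own explanation.

```python
import shlex

# Flags exactly as reported in the post. The comments are an interpretation
# of the llama.cpp options, not the original author's notes.
REPORTED_FLAGS = [
    ("-ngl", "99"),              # offload up to 99 layers to the GPU (in practice, all of them)
    ("--n-cpu-moe", "40"),       # keep the MoE expert weights of the first 40 layers on the CPU
    ("-c", "192000"),            # context size in tokens
    ("-t", "12"),                # threads used for token generation
    ("-tb", "16"),               # threads used for batch and prompt processing
    ("-b", "4096"),              # logical batch size
    ("--ubatch-size", "2048"),   # physical (micro) batch size
    ("--flash-attn", "on"),      # enable flash attention
    ("--cache-type-k", "q8_0"),  # quantize the K half of the KV cache to q8_0
    ("--cache-type-v", "q8_0"),  # quantize the V half of the KV cache to q8_0
    ("--mlock", None),           # ask the OS to keep the model in RAM instead of swapping
]


def build_argv(executable, model_path, flags=REPORTED_FLAGS):
    """Return an argv list: executable, model path, then every flag in order."""
    argv = [executable, "-m", model_path]
    for name, value in flags:
        argv.append(name)
        if value is not None:
            argv.append(value)
    return argv


if __name__ == "__main__":
    # Both arguments below are hypothetical placeholders, not from the post.
    print(shlex.join(build_argv("llama-server", "model.gguf")))
```

The pairing of -ngl 99 with --n-cpu-moe 40 appears to be what makes the 8GB budget workable: every layer is nominally offloaded to the GPU, while the expert weights of 40 layers stay in system RAM.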
Workflow Integration
For their agentic workflow, they found Cline in VSCode to be the closest alternative to Antigravity. They use kat-coder-pro for Plan mode and qwen3.5 for Act mode within this setup. The developer is seeking feedback on whether this local configuration is better than sticking with Google Gemini 3 Flash in Antigravity, noting they prioritize smooth workflow over privacy concerns.
📖 Read the full source: r/LocalLLaMA