How to safely run llama.cpp native tools (exec_shell_command) with multi-sandboxing on Linux

The llama.cpp project recently added native tool support to llama-server, enabling the model to call functions such as get_datetime and the powerful but dangerous exec_shell_command. A Reddit user shared a detailed multi-sandboxing workflow for using exec_shell_command safely for tasks like web RAG (fetching live URLs) without risking the host system.
Key details from the source
- Model used: Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf with MTP speculative decoding
- Server flags: --jinja --tools get_datetime,exec_shell_command --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 1.5 --min-p 0.00 --chat-template-kwargs '{"preserve_thinking":true}' --spec-type draft-mtp --spec-draft-n-max 1
- Multi-sandboxing stack: Firejail + smolvm (Alpine Linux VM) + dedicated Linux user for tool execution
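The flags above can be collected into a small launcher. This is only a sketch: the model location in MODEL_PATH is a hypothetical path, and DRY_RUN is an illustrative switch of this script (not a llama.cpp option) that prints the command line instead of running it. The flags themselves are copied from the list above.

```shell
#!/bin/sh
# Sketch of a launcher for the flags listed above.
# MODEL_PATH is a hypothetical location; adjust it to where the GGUF file lives.
MODEL_PATH="${MODEL_PATH:-$HOME/models/Qwen3.6-35B-A3B_MTP-UD-Q8_K_XL.gguf}"

launch_llama_server() {
    # With DRY_RUN set, the command line is printed instead of executed.
    ${DRY_RUN:+echo} llama-server \
        -m "$MODEL_PATH" \
        --jinja \
        --tools get_datetime,exec_shell_command \
        --temp 0.6 --top-p 0.95 --top-k 20 \
        --presence-penalty 1.5 --min-p 0.00 \
        --chat-template-kwargs '{"preserve_thinking":true}' \
        --spec-type draft-mtp --spec-draft-n-max 1
}
```

Calling launch_llama_server starts the server. Setting DRY_RUN=1 first prints the assembled command, which is a convenient way to review the flags before enabling exec_shell_command.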
Step-by-step setup
- Enable tools in llama-server: start with --tools get_datetime,exec_shell_command (test with get_datetime first)
- Install Firejail (e.g., sudo pacman -S firejail on Arch)
- Create isolated user: sudo useradd -m vmagents; sudo passwd vmagents
- Switch to vmagents and install smolvm: curl -sSL https://smolmachines.com/install.sh | bash
- Create a minimal Alpine VM:
      smolvm machine create minivm --image alpine --net
      smolvm machine start --name minivm
- Create minivm-exec in ~vmagents/.local/bin/:
      #!/bin/sh
      smolvm machine start --name minivm >/dev/null
      firejail smolvm machine exec --name minivm -- $* 2>/dev/null
      smolvm machine stop --name minivm >/dev/null
  Make executable: chmod +x minivm-exec
- Create vm-exec in your own user's ~/.local/bin/:
      #!/bin/sh
      sudo su - vmagents -c "minivm-exec $*"
  Make executable.
- In llama-server web UI, prompt the model to use vm-exec as a wrapper, e.g.:
      Prepend any command to be executed with the sandboxing wrapper vm-exec. Use wget to fetch web content adding the option "-U Mozilla" as browser user agent string.
  Then ask it to retrieve a live URL and analyze the content.
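The minivm-exec script in the steps passes its arguments through an unquoted $*, which re-splits any argument containing spaces, and its exit status is that of the final stop command rather than of the sandboxed command. A possible stricter variant is sketched below. It is not the original author's script: it assumes the same smolvm subcommands shown in the steps, and VM_NAME, stop_vm, and run_in_vm are illustrative names.

```shell
#!/bin/sh
# Sketch of a stricter minivm-exec (not the original script).
# Assumes the smolvm subcommands used in the setup steps.
VM_NAME="${VM_NAME:-minivm}"

stop_vm() {
    smolvm machine stop --name "$VM_NAME" >/dev/null 2>&1
}

run_in_vm() {
    # Refuse to start the VM when no command was given.
    if [ "$#" -eq 0 ]; then
        echo "usage: minivm-exec COMMAND [ARGS...]" >&2
        return 2
    fi
    smolvm machine start --name "$VM_NAME" >/dev/null || return 1
    # "$@" keeps each argument intact, unlike an unquoted $*.
    rc=0
    firejail smolvm machine exec --name "$VM_NAME" -- "$@" 2>/dev/null || rc=$?
    # Stop the VM whether or not the command succeeded, then report its status.
    stop_vm
    return "$rc"
}

# Run only when arguments were supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    run_in_vm "$@"
fi
```

Note that the outer vm-exec wrapper still builds a single string for su -c, so arguments are re-parsed by the vmagents login shell. Preserving quoting end to end would need a change there as well.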
How the sandboxing works
Commands run inside an Alpine Linux VM (minivm) managed by smolvm, and the smolvm exec call is itself wrapped in a Firejail sandbox. These layers separate the command's filesystem and process space from the host, while the VM's own network access (enabled with --net at creation) is what makes web fetching possible. The vm-exec script on the host invokes the whole chain as the vmagents user, which keeps commands away from the host user's home directory and critical system files. The VM is stopped after each command, so nothing a command starts keeps running between calls.
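One way to check that the layers behave as described is a small smoke test run from the host account. The sketch below assumes vm-exec is on the PATH and that the guest is Alpine, whose /etc/os-release contains ID=alpine. WRAPPER and check_sandbox are illustrative names. The checks compare command output rather than exit codes, since the wrapper scripts do not pass the inner exit status through.

```shell
#!/bin/sh
# Sketch of a smoke test for the wrapper chain.
# WRAPPER defaults to the vm-exec script from the setup steps.
WRAPPER="${WRAPPER:-vm-exec}"

check_sandbox() {
    # Alpine identifies itself with ID=alpine in /etc/os-release.
    if ! "$WRAPPER" cat /etc/os-release 2>/dev/null | grep -q '^ID=alpine'; then
        echo "FAIL: wrapper did not report Alpine" >&2
        return 1
    fi

    # A temp file created on the host should not be visible through the wrapper.
    marker=$(mktemp) || return 1
    seen=$("$WRAPPER" ls "$marker" 2>/dev/null) || true
    rm -f "$marker"
    if [ -n "$seen" ]; then
        echo "FAIL: host file $marker is visible to sandboxed commands" >&2
        return 1
    fi

    echo "ok: commands run in Alpine and cannot see host temp files"
}
```

Running check_sandbox before giving the model access to exec_shell_command gives a quick signal that commands really land in the VM. It is a sanity check only, not proof of isolation.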
Who this is for
Developers running local LLM servers and wanting to safely allow code execution or web fetching via agentic tools without exposing the host OS.
📖 Read the full source: r/LocalLLaMA