Local LLM Setup Recommendations for OpenClaw

Setup Overview
A user on r/openclaw has shared their current configuration for integrating a local Large Language Model (LLM) with OpenClaw. They are using separate hardware: a GB10 device specifically for running the AI model and a Mac mini for the main OpenClaw installation.
Configuration Details
The setup process is described as mostly standard, with one key deviation: when prompted to choose an LLM, you must select the 'custom LLM' option. The user instructs to "put in ur ip" at this stage. They note that most setups will be using OpenAI-compatible endpoints via tools like vLLM, SGLang, or llama.cpp.
For the model selection, the user provides a specific warning and recommendation:
- Model Selection Advice: "don’t choose the biggest model that fit into your vram u need to find the balance between context token and model size."
- Current Model: They are using
unsloth/MiniMax-M2.5-GGUF:UD_Q2_K_XL + 24000. - Inference Server: They are using llama.cpp to run the model.
Server Endpoint
The local inference server is configured to run at localhost:8080/v1. This provides an OpenAI-compatible API endpoint that OpenClaw can connect to.
The user notes this is a work in progress, stating: "I am still testing openclaw though so I might change to another model if token isn’t enough." This highlights the practical, iterative nature of finding the right model for a specific workflow's context window requirements.
📖 Read the full source: r/openclaw
👀 See Also

OpenClaw 101: The Ultimate Setup Guide for New Users

Claude Code Skills vs. Custom Agents: A Mental Model Based on Task Consistency
A Reddit user clarifies the distinction between Claude Code skills and custom agents: skills execute the same steps every time, while custom agents require reasoning and adaptation. The post also covers parallel subagents, delegation, hooks, and building blocks.

How to safely run llama.cpp native tools (exec_shell_command) with multi-sandboxing on Linux
A practical guide to enabling llama.cpp native tools, especially exec_shell_command, and running them inside multiple sandboxes (Firejail + tiny Alpine VM) for safe web fetching and command execution via the llama-server web UI.

How to avoid unexpected OpenRouter costs in OpenClaw automation
A developer team accidentally spent $750 in 3 days on OpenRouter by defaulting to Claude Sonnet 4.6 ($3/M tokens) across all automation tasks. They reduced costs by 97% by changing default models, locking cron jobs and subagents to cheaper options, and reserving expensive models only for sensitive work.