vLLM Hosting on 2x Modded 2080 Ti: Real-World Setup Guide

A Reddit user on r/openclaw describes their setup for local AI hosting: two modded 22GB 2080 Ti GPUs purchased from Alibaba, connected via NVLink, running vLLM instead of Ollama to get tensor parallelism. They are targeting a 20-30B parameter model and ask the community for recommendations suited to light coding work, homelab maintenance, RAG, email triage, and document creation, with heavy coding tasks passed to a Codex OAuth service.

Key details from the post:

Hardware: 2x 22GB (modded) 2080 Ti from Alibaba, likely former mining cards. NVLink bridge interconnects them.
Software: vLLM chosen over Ollama explicitly to leverage tensor parallelism across both GPUs.
Goal: Run a local model in the 20-30B parameter range for OpenClaw, with tasks including light coding, homelab management, RAG, email triage, and document generation.
Sentiment: The user expresses buyer's remorse and seeks validation or practical model suggestions.
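
The choice of vLLM described above comes down to one setting, the tensor-parallel size. As a rough sketch (not the poster's actual configuration), a launch command for two GPUs could be assembled like this. It assumes the `vllm` CLI is installed; the model name is a placeholder, and the context length and memory fraction are illustrative values that do not appear in the post. `--dtype half` is included because the 2080 Ti is a Turing card without native bfloat16 support.

```python
import shlex


def build_vllm_command(
    model,
    num_gpus=2,                    # one shard per 2080 Ti
    dtype="half",                  # FP16; Turing has no native bfloat16
    gpu_memory_utilization=0.90,   # illustrative value, not from the post
    max_model_len=8192,            # illustrative value, not from the post
):
    """Return the argv list for a `vllm serve` launch using tensor parallelism."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(num_gpus),
        "--dtype", dtype,
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--max-model-len", str(max_model_len),
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a hypothetical placeholder, not a recommendation.
    cmd = build_vllm_command("your-org/your-model")
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell. Lowering `max_model_len` is usually the first knob to try if the server runs out of memory at startup.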

The community discussion (linked below) offers firsthand accounts of similar setups, model recommendations (e.g., CodeLlama, DeepSeek Coder, or general-purpose models like Mixtral 8x7B), and tips on memory optimization and prompt engineering for vLLM. Some commenters caution about the modded GPUs' reliability and suggest testing with smaller models first.
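
To see why memory optimization comes up for a 20-30B model on this hardware, a back-of-the-envelope estimate of weight memory helps. The sketch below is a simplification: it counts weights only, ignores the KV cache, activations, and runtime overhead, ignores the extra metadata that quantized formats carry, and treats each card's 22GB as 22 GiB.

```python
def weight_memory_gib(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


TOTAL_VRAM_GIB = 2 * 22  # two modded 2080 Ti cards, as described in the post

if __name__ == "__main__":
    for params in (20, 30):
        for bits in (16, 8, 4):
            need = weight_memory_gib(params, bits)
            verdict = "fits" if need < TOTAL_VRAM_GIB else "does not fit"
            print(f"{params}B @ {bits}-bit: {need:5.1f} GiB of weights, {verdict} in {TOTAL_VRAM_GIB} GiB")
```

Under these assumptions, a 30B model in FP16 needs more memory for weights than both cards provide, while 8-bit or 4-bit quantization leaves room for the KV cache. A 20B model in FP16 fits on paper but with little headroom.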

📖 Read the full source: r/openclaw

Local vLLM Hosting on 2x Modded 2080 Ti for OpenClaw: Real-World Experience
