civStation: A VLM System for Playing Civilization VI via Natural Language Commands

What civStation Does
civStation is a vision-language model (VLM) system that enables playing Civilization VI through natural language commands. Instead of direct mouse/keyboard control, users issue high-level strategic intents that the system translates into actual game actions.
Architecture and Functionality
The system employs a 3-layer architecture:
- Strategy Layer: Converts natural language commands into structured goals, maintains long-term direction, and performs task decomposition. Commands like "expand to the east," "focus on economy," or "aim for a science victory" are processed here.
- Action Layer: Uses a screen-based VLM to interpret game state and executes mouse/keyboard actions without accessing game APIs.
- HITL Layer: Enables real-time human intervention, override capabilities, and controllable autonomy.
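The three layers above can be sketched roughly as follows. This is a minimal illustration, not civStation's actual code: the `Goal` type, the keyword table, the subtask names, and every function name here are hypothetical, and a keyword lookup stands in for the model-based planner the post describes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Goal:
    """Structured goal produced by the (hypothetical) strategy layer."""
    intent: str
    subtasks: List[str] = field(default_factory=list)


# Hypothetical keyword table standing in for a model-based planner.
_INTENT_RULES = {
    "expand": ("expand", ["train_settler", "move_settler", "found_city"]),
    "economy": ("economy", ["build_market", "assign_trade_route"]),
    "science": ("science_victory", ["build_campus", "choose_research"]),
}


def strategy_layer(command: str) -> Goal:
    """Map a natural language command to a structured goal with subtasks."""
    text = command.lower()
    for keyword, (intent, subtasks) in _INTENT_RULES.items():
        if keyword in text:
            return Goal(intent=intent, subtasks=list(subtasks))
    return Goal(intent="unknown")


def action_layer(subtask: str) -> str:
    """Placeholder for screen capture, VLM interpretation, and input events."""
    return f"executed:{subtask}"


def hitl_layer(subtask: str, approve: Callable[[str], bool]) -> Optional[str]:
    """Let a human veto a subtask before it reaches the action layer."""
    return action_layer(subtask) if approve(subtask) else None


def run(command: str, approve: Callable[[str], bool] = lambda _s: True) -> List[str]:
    """Decompose a command and execute every subtask the human approves."""
    goal = strategy_layer(command)
    results = []
    for subtask in goal.subtasks:
        outcome = hitl_layer(subtask, approve)
        if outcome is not None:
            results.append(outcome)
    return results
```

Calling `run("focus on economy")` executes both economy subtasks, while passing an `approve` callback that rejects a subtask models a human override at the HITL layer.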
Technical Implementation Details
One strategic command generates multiple action sequences, requiring approximately 2–16 model calls per task. The system uses sub-agent based execution for bounded tasks such as city management and unit control.
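One way to picture bounded sub-agent execution is a dispatcher that routes each task to a specialized agent while counting model calls against a per-task cap. The sketch below is an assumption-laden illustration: `CountingModel`, `dispatch`, the step names, and the hard cap of 16 (taken from the upper end of the range quoted above) are all hypothetical, and the model is a plain callable rather than a real VLM client.

```python
from typing import Callable, Dict, List

MAX_CALLS_PER_TASK = 16  # assumed cap, upper end of the 2-16 range quoted above


class CallBudgetExceeded(RuntimeError):
    """Raised when a task tries to exceed its model-call budget."""


class CountingModel:
    """Wraps a model callable and counts invocations (stub for a VLM client)."""

    def __init__(self, fn: Callable[[str], str], limit: int = MAX_CALLS_PER_TASK):
        self.fn = fn
        self.limit = limit
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        if self.calls >= self.limit:
            raise CallBudgetExceeded(f"limit of {self.limit} calls reached")
        self.calls += 1
        return self.fn(prompt)


def city_agent(model: CountingModel, task: str) -> List[str]:
    # Hypothetical two-step city management routine.
    return [model(f"city:{step}") for step in ("read_panel", "pick_production")]


def unit_agent(model: CountingModel, task: str) -> List[str]:
    # Hypothetical three-step unit control routine.
    return [model(f"unit:{step}") for step in ("locate_unit", "choose_target", "issue_move")]


SUB_AGENTS: Dict[str, Callable[[CountingModel, str], List[str]]] = {
    "city": city_agent,
    "unit": unit_agent,
}


def dispatch(task: str, model: CountingModel) -> List[str]:
    """Route a task like 'unit:move_settler' to the matching sub-agent."""
    kind = task.split(":", 1)[0]
    return SUB_AGENTS[kind](model, task)
```

Keeping the budget inside the model wrapper means every sub-agent is bounded the same way, whatever its internal step count.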
civStation explores shifting the interface from action to intent, rather than relying on traditional reinforcement learning, imitation learning, or scripted approaches. This represents a move from direct manipulation to delegation and agent orchestration.
Key Challenges and Limitations
The system faces several technical challenges:
- VLM perception errors
- Execution drift
- Lack of reliable verification mechanisms
Multi-step execution adds latency and API cost, and the fallback strategies degrade performance. The system is not fully autonomous: it relies on human-in-the-loop support for real-time strategy correction and control.
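The trade-off described above can be illustrated with an act, verify, retry, escalate loop. The post does not say how civStation verifies actions or when it hands control to the human, so everything in this sketch (the function name, the retry count, the returned status strings) is an assumption chosen for illustration.

```python
from typing import Callable


def execute_with_verification(
    act: Callable[[], None],
    verify: Callable[[], bool],
    ask_human: Callable[[], bool],
    max_retries: int = 2,
) -> str:
    """Run an action, re-check the screen state, and escalate if it never verifies.

    act:       performs the UI action (hypothetical mouse/keyboard step)
    verify:    returns True if the expected state is observed afterwards
    ask_human: returns True if the human accepts the unverified result
    """
    for attempt in range(1 + max_retries):
        act()
        if verify():
            return "verified" if attempt == 0 else "verified_after_retry"
    # Every retry costs another action and another check, which is where
    # the latency and API cost come from; the human is the last resort.
    return "human_approved" if ask_human() else "aborted"
```

Raising `max_retries` lowers the chance of bothering the human but multiplies the cost of each failed step, which is the latency versus reliability tension the post points to.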
Broader Implications
This experimental system tackles agent control and verification in UI-only environments. The focus extends beyond gameplay to elevating the human-system interface to the strategy level, enabling users to operate at higher abstraction levels rather than managing individual actions.
📖 Read the full source: r/ClaudeAI