CLI Design Patterns for AI Agents: Misconceptions and Practical Approaches

CLI Interface Protocol Clarification
The biggest misconception from Part 1 was that "CLI" meant giving an LLM a Linux terminal. In fact, CLI here is an interface protocol: a text command goes in and a text result comes out. It can be implemented in two ways:
- As a binary or script on the shell's PATH: it becomes a CLI tool that runs in a real shell
- As a command parser inside your code: when the LLM outputs run(command="weather --city Tokyo"), you parse the string and execute it directly in your application code, with no shell involved
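A minimal sketch of the second approach, assuming a hypothetical `weather` command. The string passed to `run` is tokenized and dispatched to an ordinary Go function, so no shell process is ever started. `run`, `splitArgs`, and `weather` are illustrative names, not the author's code, and the tokenizer deliberately handles only double quotes.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"strings"
)

// splitArgs tokenizes a command line on whitespace, honoring double quotes.
// It is a small subset of shell quoting: no escapes, no single quotes.
func splitArgs(line string) ([]string, error) {
	var args []string
	var cur strings.Builder
	inQuote, hasToken := false, false
	for _, r := range line {
		switch {
		case r == '"':
			inQuote = !inQuote
			hasToken = true // "" still yields an (empty) argument
		case (r == ' ' || r == '\t') && !inQuote:
			if hasToken {
				args = append(args, cur.String())
				cur.Reset()
				hasToken = false
			}
		default:
			cur.WriteRune(r)
			hasToken = true
		}
	}
	if inQuote {
		return nil, errors.New("unterminated quote")
	}
	if hasToken {
		args = append(args, cur.String())
	}
	return args, nil
}

// weather is a hypothetical in-process command: it parses its own flags
// and returns text, just as a CLI tool would print it.
func weather(args []string) (string, error) {
	fs := flag.NewFlagSet("weather", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors through the return value instead
	city := fs.String("city", "", "city name")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *city == "" {
		return "", errors.New("weather: --city is required")
	}
	return fmt.Sprintf("weather for %s: (stub data)", *city), nil
}

// run is what the LLM's run(command=...) tool call maps to.
func run(command string) (string, error) {
	args, err := splitArgs(command)
	if err != nil {
		return "", err
	}
	if len(args) == 0 {
		return "", errors.New("empty command")
	}
	switch args[0] {
	case "weather":
		return weather(args[1:])
	default:
		return "", fmt.Errorf("unknown command: %s", args[0])
	}
}

func main() {
	out, err := run(`weather --city Tokyo`)
	fmt.Println(out, err)
}
```

Go's `flag` package accepts both `-city` and `--city`, which is why the LLM-style `--city Tokyo` parses without extra work.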
The key is making the LLM feel like it's using a CLI. In the author's system, most commands never touch the OS — they're Go functions dispatched by a command router. Only commands that genuinely need a real OS (running scripts, installing packages) go to an isolated micro-VM. The agent doesn't know and doesn't care which layer handles its command.
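The two-layer routing described above might look roughly like this. `Router`, `Handler`, and the `VMExecutor` interface are hypothetical; `fakeVM` only records commands so the example runs without a real micro-VM. Registered commands stay in-process, anything unregistered falls through to the VM, and the agent only ever sees the single `Run` entry point.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Handler is an in-process command implemented as an ordinary Go function.
type Handler func(args []string) (string, error)

// VMExecutor is a hypothetical stand-in for the isolated micro-VM that runs
// commands needing a real OS (scripts, package installs).
type VMExecutor interface {
	Exec(command string) (string, error)
}

// Router dispatches to an in-process handler when one is registered and
// forwards everything else to the VM.
type Router struct {
	handlers map[string]Handler
	vm       VMExecutor
}

func NewRouter(vm VMExecutor) *Router {
	return &Router{handlers: map[string]Handler{}, vm: vm}
}

func (r *Router) Register(name string, h Handler) { r.handlers[name] = h }

// Run is the only entry point the agent sees; it cannot tell which layer ran.
func (r *Router) Run(command string) (string, error) {
	fields := strings.Fields(command) // simple whitespace split, no quoting
	if len(fields) == 0 {
		return "", errors.New("empty command")
	}
	if h, ok := r.handlers[fields[0]]; ok {
		return h(fields[1:])
	}
	return r.vm.Exec(command)
}

// fakeVM records what it was asked to run instead of booting a real VM.
type fakeVM struct{ calls []string }

func (f *fakeVM) Exec(command string) (string, error) {
	f.calls = append(f.calls, command)
	return "[vm] ran: " + command, nil
}

func main() {
	vm := &fakeVM{}
	r := NewRouter(vm)
	r.Register("echo", func(args []string) (string, error) {
		return strings.Join(args, " "), nil
	})
	local, _ := r.Run("echo hello agent")     // handled in-process
	remote, _ := r.Run("pip install requests") // forwarded to the VM
	fmt.Println(local)
	fmt.Println(remote)
}
```

Falling through to the VM by default is one possible design choice; a stricter router could reject unknown commands and keep an explicit list of VM-bound ones instead.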
Agent-Friendly CLI Design Principles
Two Core Philosophies
Philosophy 1: Unix-Style Help Design
- tool --help → list of top-level commands
- tool <command> --help → specific parameters and usage for that subcommand
This allows the agent to discover capabilities on demand without stuffing all documentation into context upfront.
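A sketch of two-level help under assumed names: the `commands` table is hypothetical, and `main` only handles the help paths. Top-level help stays short (one line per command), so the detailed usage text enters the agent's context only when it asks about a specific subcommand.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// Command pairs a one-line summary (shown at the top level) with detailed
// usage (shown only on request).
type Command struct {
	Summary string
	Usage   string
}

// commands is a hypothetical command table for illustration.
var commands = map[string]Command{
	"weather": {Summary: "Show current weather", Usage: "weather --city <name>"},
	"dns":     {Summary: "Inspect or change DNS records", Usage: "dns update --zone <zone> --record <type> --value <ip> [--confirm]"},
}

// help returns the top-level list for no args, or one command's usage.
func help(args []string) string {
	if len(args) == 0 {
		names := make([]string, 0, len(commands))
		for name := range commands {
			names = append(names, name)
		}
		sort.Strings(names) // map order is random; keep output stable
		var b strings.Builder
		b.WriteString("commands:\n")
		for _, name := range names {
			fmt.Fprintf(&b, "  %s: %s\n", name, commands[name].Summary)
		}
		b.WriteString("run 'tool <command> --help' for details\n")
		return b.String()
	}
	cmd, ok := commands[args[0]]
	if !ok {
		return fmt.Sprintf("unknown command %q; run 'tool --help' to list commands\n", args[0])
	}
	return "usage: " + cmd.Usage + "\n"
}

func main() {
	args := os.Args[1:]
	switch {
	case len(args) == 0 || args[0] == "--help": // tool --help
		fmt.Print(help(nil))
	case len(args) >= 2 && args[1] == "--help": // tool <command> --help
		fmt.Print(help(args[:1]))
	default:
		// Real command dispatch is omitted in this sketch.
		fmt.Print(help(nil))
	}
}
```

Note that the unknown-command message also points the agent back to `tool --help`, so even a wrong guess ends with a next step.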
Philosophy 2: Tips Thinking
Every response — especially errors — should include guidance that reduces unnecessary exploration.
Bad example:
> cat photo.png
[error] binary file
Good example:
> cat photo.png
[error] cat: binary file detected (image/png, 182KB).
Use: see photo.png (view image)
Or: cat -b photo.png (base64 encode)
Why this matters: invalid exploration wastes tokens. In multi-turn conversations, this waste accumulates — every failed attempt stays in context, consuming attention and inference resources for every subsequent turn. A single helpful hint can save significant tokens across the rest of the conversation.
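A sketch of an error that carries its own next step. It assumes content is sniffed with Go's `net/http.DetectContentType`, which inspects at most the first 512 bytes; `catFile` and `sniffError` are hypothetical helpers, and the `see` and `cat -b` suggestions are the ones from the example above. Only image types get the `see` hint, since suggesting an image viewer for a zip archive would itself send the agent down a dead end.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// sniffError returns nil for text content, or an error whose message tells
// the agent what to run instead. size is the full file size in bytes.
func sniffError(path string, head []byte, size int64) error {
	ctype := http.DetectContentType(head) // examines at most 512 bytes
	if strings.HasPrefix(ctype, "text/") {
		return nil
	}
	tip := fmt.Sprintf("Use: cat -b %s (base64 encode)", path)
	if strings.HasPrefix(ctype, "image/") {
		tip = fmt.Sprintf("Use: see %s (view image) Or: cat -b %s (base64 encode)", path, path)
	}
	return fmt.Errorf("cat: binary file detected (%s, %dKB). %s", ctype, size/1024, tip)
}

// catFile is a hypothetical in-process cat that refuses to dump binary data.
func catFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		// The ls hint assumes such a command exists in the tool set.
		return "", fmt.Errorf("cat: %v. Use: ls to check the path", err)
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return "", fmt.Errorf("cat: %v", err)
	}
	head := make([]byte, 512)
	n, err := io.ReadFull(f, head)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return "", fmt.Errorf("cat: %v", err)
	}
	if err := sniffError(path, head[:n], info.Size()); err != nil {
		return "", err
	}
	rest, err := io.ReadAll(f)
	if err != nil {
		return "", fmt.Errorf("cat: %v", err)
	}
	return string(head[:n]) + string(rest), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: cat <file>")
		return
	}
	out, err := catFile(os.Args[1])
	if err != nil {
		fmt.Println("[error]", err)
		return
	}
	fmt.Print(out)
}
```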
Safe CLI Design
When CLI commands involve dangerous or irreversible operations, the tool itself should provide safety mechanisms.
Dry-Run / Change Preview — Preventing Mistakes
This covers operations that are within the agent's authority but have hard-to-reverse consequences. The goal is to let the agent (or a human) see what will happen before committing.
> dns update --zone example.com --record A --value 1.2.3.4
⚠ DRY RUN:
A record for example.com: 5.6.7.8 → 1.2.3.4
Propagation: ~300s. Not instantly reversible.
To execute: add --confirm
The preview should clearly show both the current state and what it will change to. To proceed, the agent re-runs the command with --confirm.
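A minimal sketch of the preview-then-confirm flow, assuming an in-memory `Zones` table in place of a real DNS provider (the type and its `Update` method are hypothetical). Without `--confirm` the command only reports the current and proposed values; the propagation note is copied from the example output rather than computed.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Zones is a hypothetical in-memory stand-in for a DNS provider:
// zone name -> record type -> value.
type Zones map[string]map[string]string

// Update previews the change by default and applies it only with --confirm.
func (z Zones) Update(args []string) (string, error) {
	fs := flag.NewFlagSet("dns update", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // surface parse errors via the return value
	zone := fs.String("zone", "", "zone name")
	record := fs.String("record", "", "record type, e.g. A")
	value := fs.String("value", "", "new record value")
	confirm := fs.Bool("confirm", false, "apply the change instead of previewing it")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *zone == "" || *record == "" || *value == "" {
		return "", fmt.Errorf("dns update: --zone, --record and --value are required")
	}
	records, ok := z[*zone]
	if !ok {
		return "", fmt.Errorf("dns update: unknown zone %q", *zone)
	}
	current, ok := records[*record]
	if !ok {
		current = "(none)"
	}
	if !*confirm {
		// Show current -> proposed state and how to proceed; change nothing.
		return fmt.Sprintf("⚠ DRY RUN:\n  %s record for %s: %s → %s\n  Propagation: ~300s. Not instantly reversible.\n  To execute: add --confirm\n",
			*record, *zone, current, *value), nil
	}
	records[*record] = *value
	return fmt.Sprintf("✓ %s record for %s updated: %s → %s\n", *record, *zone, current, *value), nil
}

func main() {
	z := Zones{"example.com": {"A": "5.6.7.8"}}
	args := []string{"--zone", "example.com", "--record", "A", "--value", "1.2.3.4"}
	preview, _ := z.Update(args)
	fmt.Print(preview)
	applied, _ := z.Update(append(args, "--confirm"))
	fmt.Print(applied)
}
```

Making preview the default (rather than adding a `--dry-run` flag) means an agent that forgets a flag gets the safe behavior.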
Human Authorization — Operations Beyond the Agent's Autonomy
This covers operations that require human judgment or approval: no matter how confident the agent is, it cannot complete them on its own.
Approach 1: Blocking Push Approval
> pay --amount 500 --to vendor --reason "office supplies for Q2"
⏳ Approval required. Notification sent to your device.
Waiting for response...
✓ Approved. Payment of $500 completed.
[exit:0 | 7.2s]
This works like Apple's device login verification: the CLI sends a push notification directly to the human's device with full context (amount, recipient, reason), then blocks until the human approves or rejects and returns the result to the agent.
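The blocking flow can be sketched with a channel and a timer. The `Approver` interface and `fakeApprover` are hypothetical stand-ins for a real push-notification service, and the timeout is an added assumption so that a human who never answers cannot hang the agent indefinitely.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Request carries the full context the human needs to decide.
type Request struct {
	Amount int // whole dollars, for simplicity
	To     string
	Reason string
}

// Approver is a hypothetical push-notification backend. RequestApproval sends
// the notification and returns a channel that later yields the decision.
type Approver interface {
	RequestApproval(req Request) <-chan bool
}

// pay blocks until the human approves, rejects, or the timeout expires.
func pay(a Approver, req Request, timeout time.Duration) (string, error) {
	decision := a.RequestApproval(req)
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case approved := <-decision:
		if !approved {
			return "", errors.New("✗ Rejected. Payment not sent.")
		}
		// A real implementation would execute the payment here.
		return fmt.Sprintf("✓ Approved. Payment of $%d completed.", req.Amount), nil
	case <-timer.C:
		return "", fmt.Errorf("no response within %v. Payment not sent.", timeout)
	}
}

// fakeApprover simulates a human who answers after a fixed delay.
type fakeApprover struct {
	delay   time.Duration
	approve bool
}

func (f fakeApprover) RequestApproval(req Request) <-chan bool {
	// Buffered so the goroutine can exit even if pay has already timed out.
	ch := make(chan bool, 1)
	go func() {
		time.Sleep(f.delay)
		ch <- f.approve
	}()
	return ch
}

func main() {
	req := Request{Amount: 500, To: "vendor", Reason: "office supplies for Q2"}
	fmt.Println("⏳ Approval required. Notification sent to your device. Waiting for response...")
	out, err := pay(fakeApprover{delay: 100 * time.Millisecond, approve: true}, req, 5*time.Minute)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out)
}
```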
Approach 2: Verification Code / 2FA
> transfer --from savings --to checking --amount 10000
⚠ This operation requires 2FA verification.
Reason: transferring $10,000 between accounts.
A code has been sent to your authenticator.
Re-run with: --otp <code>
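One way to check the `--otp` value is standard TOTP (RFC 6238, built on HOTP from RFC 4226), which authenticator apps implement. The original post does not say which scheme the author uses, so this is an assumption. With TOTP nothing is actually sent: the app derives the same code from a shared secret and the clock. The `transfer` wrapper, the secret, and the one-step clock-skew window are illustrative choices.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"crypto/subtle"
	"encoding/binary"
	"fmt"
	"time"
)

// hotp computes an RFC 4226 one-time code for the given counter (digits <= 9).
func hotp(secret []byte, counter uint64, digits int) string {
	var msg [8]byte
	binary.BigEndian.PutUint64(msg[:], counter)
	mac := hmac.New(sha1.New, secret)
	mac.Write(msg[:])
	sum := mac.Sum(nil)
	// Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
	off := int(sum[len(sum)-1] & 0x0f)
	code := binary.BigEndian.Uint32(sum[off:off+4]) & 0x7fffffff
	mod := uint32(1)
	for i := 0; i < digits; i++ {
		mod *= 10
	}
	return fmt.Sprintf("%0*d", digits, code%mod)
}

// totp uses 30-second time steps counted from the Unix epoch (RFC 6238).
func totp(secret []byte, unix int64) string {
	return hotp(secret, uint64(unix/30), 6)
}

// verifyOTP accepts the current step and one step either side for clock skew.
func verifyOTP(secret []byte, code string, unix int64) bool {
	for _, skew := range []int64{-30, 0, 30} {
		want := totp(secret, unix+skew)
		if subtle.ConstantTimeCompare([]byte(want), []byte(code)) == 1 {
			return true
		}
	}
	return false
}

// transfer refuses to run until a valid --otp value is supplied.
func transfer(amount int, otp string, secret []byte, unix int64) string {
	if otp == "" {
		return fmt.Sprintf("⚠ This operation requires 2FA verification.\n"+
			"Reason: transferring $%d between accounts.\n"+
			"Enter the current code from your authenticator.\n"+
			"Re-run with: --otp <code>", amount)
	}
	if !verifyOTP(secret, otp, unix) {
		return "[error] invalid or expired code. Re-run with: --otp <code> using the current code"
	}
	return fmt.Sprintf("✓ Transfer of $%d completed.", amount)
}

func main() {
	secret := []byte("hypothetical-shared-secret") // provisioned per user in practice
	now := time.Now().Unix()
	fmt.Println(transfer(10000, "", secret, now))
	// Simulate the human reading the code from their authenticator app.
	fmt.Println(transfer(10000, totp(secret, now), secret, now))
}
```

Unlike the push flow, this one does not block: the command returns immediately with instructions, and the agent re-runs it once the human supplies the code.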
📖 Read the full source: r/LocalLLaMA