NanoClaw's Security Model for AI Agents: Container Isolation and Minimal Code

The NanoClaw blog argues that AI agents should be treated as untrusted and potentially malicious, advocating for architectural containment rather than application-level permission checks. The system is built on the principle that agents will misbehave and focuses on limiting damage when they do.
Container Isolation as Core Security
NanoClaw runs each agent in its own container, using Docker or, on macOS, Apple Container. These containers are ephemeral: each is created fresh per invocation and destroyed afterward. Agents run as unprivileged users and can access only the directories explicitly mounted into the container. This contrasts with OpenClaw's default approach, where agents run directly on the host machine and the Docker sandbox mode is opt-in, so most users never enable it.
The container boundary is enforced by the OS rather than by application code, so the isolation is meant to hold regardless of how an agent is configured. Each agent gets its own container, filesystem, and Claude session history, which prevents information from leaking between agents that are supposed to access different data.
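The per-invocation container described above can be sketched as a function that assembles a `docker run` command. This is a minimal illustration, not NanoClaw's actual launcher: the image name, host paths, entrypoint arguments, and the `build_docker_command` helper are all hypothetical. Only the Docker flags themselves (`--rm`, `--user`, `--volume` with the `:ro` suffix) are standard.

```python
import shlex


def build_docker_command(image, mounts, agent_args, uid=1000, gid=1000):
    """Build a `docker run` argv for one short-lived agent invocation.

    `mounts` maps a host directory to a (container_path, read_only) pair.
    Nothing outside these mounts is visible to the agent.
    """
    cmd = [
        "docker", "run",
        "--rm",                    # delete the container when the process exits
        "--user", f"{uid}:{gid}",  # run as an unprivileged user, not root
    ]
    for host_dir, (container_path, read_only) in mounts.items():
        spec = f"{host_dir}:{container_path}"
        if read_only:
            spec += ":ro"          # agent can read but not modify this mount
        cmd += ["--volume", spec]
    cmd.append(image)
    cmd += list(agent_args)
    return cmd


cmd = build_docker_command(
    image="agent-image:latest",  # hypothetical image name
    mounts={
        "/srv/agents/family/workspace": ("/workspace", False),
        "/srv/agents/app": ("/app", True),
    },
    agent_args=["run-agent", "--group", "family"],  # hypothetical entrypoint
)
print(shlex.join(cmd))
```

Because a new command is built for every invocation, each agent sees only its own mounts, and `--rm` ensures no container state carries over to the next run.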
Mount Allowlist and Default Protections
A mount allowlist at ~/.config/nanoclaw/mount-allowlist.json acts as defense-in-depth, preventing users from accidentally mounting sensitive paths. Paths matching sensitive patterns such as .ssh, .gnupg, .aws, .env, private_key, and credentials are blocked by default. The allowlist lives outside the project directory, so a compromised agent cannot modify its own permissions.
Host application code is mounted read-only, so an agent cannot alter it, and anything an agent changes inside the container itself is discarded when the container is destroyed. Non-main groups are untrusted by default: they cannot message other groups, schedule tasks for them, or view their data, which protects against prompt injection from group members.
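A check of this kind might look like the sketch below. The source does not describe the schema of mount-allowlist.json or the matching rules, so the `is_mount_allowed` function, the `allowed_roots` parameter, and the substring matching are assumptions made for illustration. Only the blocked patterns come from the list above.

```python
from pathlib import Path

# Blocked name fragments taken from the defaults listed above.
BLOCKED_PATTERNS = (".ssh", ".gnupg", ".aws", ".env", "private_key", "credentials")


def is_mount_allowed(requested, allowed_roots, blocked=BLOCKED_PATTERNS):
    """Return True only if `requested` lies inside an allowed root and no
    component of the resolved path contains a blocked pattern.

    Assumption: `allowed_roots` is a list of directory paths; the real
    allowlist file format is not documented in the source.
    """
    # resolve() collapses ".." segments and follows symlinks, so a path
    # cannot sneak out of an allowed root by indirection.
    path = Path(requested).expanduser().resolve()
    if any(pattern in part for part in path.parts for pattern in blocked):
        return False
    for root in allowed_roots:
        root_path = Path(root).expanduser().resolve()
        if path == root_path or root_path in path.parents:
            return True
    return False


roots = ["/data/projects"]  # hypothetical allowed root
print(is_mount_allowed("/data/projects/notes", roots))
print(is_mount_allowed("/data/projects/.ssh/id_rsa", roots))
print(is_mount_allowed("/data/projects/../../etc", roots))
```

Substring matching deliberately errs on the side of blocking: a directory named `.environment` would also be rejected because it contains `.env`.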
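The group trust rule can be expressed as a single predicate. This is a sketch under assumptions: the source says only that non-main groups are blocked from cross-group actions, so the `"main"` identifier, the `can_act_on` function, and the idea that the main group may act on any group are illustrative guesses rather than NanoClaw's actual logic.

```python
MAIN_GROUP = "main"  # hypothetical identifier for the trusted group


def can_act_on(source_group, target_group, main_group=MAIN_GROUP):
    """Decide whether `source_group` may message, schedule tasks for,
    or view data belonging to `target_group`.

    Assumed rule: a group may always act on itself; only the main
    group may act on other groups.
    """
    return source_group == target_group or source_group == main_group


print(can_act_on("family", "family"))
print(can_act_on("family", "work"))
print(can_act_on("main", "work"))
```

Keeping the rule this small means a prompt injected by a member of one group has no code path for reaching another group's messages, tasks, or data.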
Minimal, Reviewable Codebase
NanoClaw maintains a deliberately minimal codebase of one process and a handful of files, contrasting with OpenClaw's approximately 400,000 lines of code, 53 config files, and over 70 dependencies. The system relies heavily on Anthropic's Agent SDK for session management, memory compaction, and other functionality instead of reinventing components.
This design allows a competent developer to review the entire codebase in an afternoon. Contribution guidelines accept only bug fixes, security fixes, and simplifications. New functionality comes through skills: instructions with full working reference implementations that coding agents merge into a user's codebase after review.
Each installation ends up as a few thousand lines of code tailored to the owner's specific needs, avoiding the complexity where vulnerabilities typically hide.
📖 Read the full source: HN LLM Tools