OpenClaw Setup: Docker, Chromium, noVNC for Human-in-the-Loop

A developer on r/openclaw documented their setup for enabling OpenClaw to handle human-in-the-loop tasks like CAPTCHA solving and approvals during automated browser sessions. The solution uses a Docker container with Chromium, noVNC, and related tools to allow remote intervention when needed.

How it works

The agent drives a Chromium instance via the Chrome DevTools Protocol (CDP). The browser has no physical screen but is not headless in the strict sense: it renders to an Xvfb virtual display, which is what makes it viewable over noVNC. When the agent encounters a CAPTCHA or needs human approval, it sends a Telegram notification. The user opens a noVNC URL on their phone or laptop to view and interact with the browser, then replies "done" to let the agent continue. The setup uses roughly 300MB of RAM and cold-starts in about 3 seconds.
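
The handoff described above can be sketched in Python as follows. This is a minimal illustration, not the developer's code: the function names, the timeout, and the injected send and poll_replies callables are all assumptions. In a real setup those two callables would wrap the Telegram Bot API's sendMessage and getUpdates methods.

```python
import time
from typing import Callable, Iterable


def build_handoff_message(reason: str, novnc_url: str) -> str:
    """Compose the notification text sent to the human."""
    return (
        f"Human input needed: {reason}\n"
        f'Open {novnc_url} and reply "done" when finished.'
    )


def is_resume_reply(text: str) -> bool:
    """Only an exact 'done' (ignoring case and surrounding spaces) resumes the agent."""
    return text.strip().lower() == "done"


def wait_for_human(
    reason: str,
    novnc_url: str,
    send: Callable[[str], None],              # hypothetical: posts a chat message
    poll_replies: Callable[[], Iterable[str]],  # hypothetical: returns new replies
    timeout_s: float = 600.0,                 # assumed timeout, not from the post
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Notify the human, then block until they reply 'done' or the timeout expires."""
    send(build_handoff_message(reason, novnc_url))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for reply in poll_replies():
            if is_resume_reply(reply):
                return True
        sleep(interval_s)
    return False
```

Injecting send and poll_replies keeps the loop independent of any particular messaging client, so it can be exercised without network access. The agent would call wait_for_human at the point where it detects a CAPTCHA and continue only if it returns True.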

Practical application

The developer tested this setup by having OpenClaw book a courier pickup. After providing photos of consignment notes and emails, the agent filled the online form, selected dates, and submitted it while the developer monitored via noVNC. They noted that Claude Opus 4.6's Chromium widget struggled with the same task, getting stuck in navigation loops while OpenClaw completed the booking.

Technical implementation

The Docker container runs:

Xvfb for virtual display
Chromium with Playwright
x11vnc and noVNC for remote viewing
supervisord for process management

The bot controls Chromium via CDP from inside the container, while users view the browser through noVNC from any device with a simple URL (no app required).
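
To attach to Chromium over CDP, a client first needs the browser's WebSocket endpoint, which Chromium publishes at the /json/version HTTP endpoint of the debugging port. A minimal Python sketch, assuming the port is 9222 as in the Dockerfile and that the client runs where 127.0.0.1:9222 is reachable (the function names are illustrative):

```python
import json
import urllib.request


def parse_ws_endpoint(version_json: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    info = json.loads(version_json)
    return info["webSocketDebuggerUrl"]


def fetch_ws_endpoint(host: str = "127.0.0.1", port: int = 9222,
                      timeout_s: float = 3.0) -> str:
    """Query the CDP HTTP endpoint and return the WebSocket URL to connect to."""
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return parse_ws_endpoint(resp.read().decode("utf-8"))
```

Playwright can also attach to an already running browser this way; its Python binding exposes chromium.connect_over_cdp for that purpose. Whether the developer's bot uses Playwright's CDP attach or a raw CDP client is not stated in the post.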

Security measures

noVNC accessible only via Tailscale (the client device must be part of the tailnet)
CDP port bound to localhost only
Container has no host filesystem access
Chromium runs unprivileged
Passwords and 2FA codes are entered by the human directly through the noVNC clipboard panel

Additional hardening

Docker healthcheck: polls CDP every 30s, 3 retries before unhealthy
Resource limits: 1GB RAM + 2 CPUs
Tab pruner: keeps max 5 tabs, closes blank tabs, runs every 5 minutes
Container remains isolated with no host mounts
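
The healthcheck and tab pruner items above could look roughly like this in Python. This is a sketch under several assumptions: the post does not show either script, Python is not installed by the Dockerfile as given, the set of URLs treated as "blank" is a guess, and the order returned by /json/list is treated here as most-recent-first. The /json/version, /json/list, and /json/close/{id} paths are Chromium's CDP HTTP endpoints, though accepted HTTP methods can vary between Chromium versions.

```python
import json
import urllib.request
from typing import Dict, List

CDP_BASE = "http://127.0.0.1:9222"  # assumption: CDP reachable on localhost
BLANK_URLS = {"about:blank", "chrome://newtab/"}  # assumption: what counts as blank


def probe_cdp(base: str = CDP_BASE, timeout_s: float = 3.0) -> int:
    """Healthcheck-style probe: 0 if CDP answers with valid JSON, else 1."""
    try:
        with urllib.request.urlopen(f"{base}/json/version", timeout=timeout_s) as resp:
            json.loads(resp.read().decode("utf-8"))
        return 0
    except (OSError, ValueError):  # connection errors, timeouts, invalid JSON
        return 1


def tabs_to_close(targets: List[Dict], max_tabs: int = 5) -> List[str]:
    """Pick tab ids to close: blank tabs first, then any tabs beyond max_tabs."""
    pages = [t for t in targets if t.get("type") == "page"]
    blank = [t for t in pages if t.get("url", "") in BLANK_URLS]
    kept = [t for t in pages if t.get("url", "") not in BLANK_URLS]
    if not kept and blank:
        blank = blank[1:]  # never close the last remaining tab
    excess = kept[max_tabs:]  # assumes earlier entries are the ones worth keeping
    return [t["id"] for t in blank + excess]


def prune_tabs(base: str = CDP_BASE, max_tabs: int = 5) -> int:
    """Fetch the tab list from CDP, close the selected tabs, return how many."""
    with urllib.request.urlopen(f"{base}/json/list", timeout=3) as resp:
        targets = json.loads(resp.read().decode("utf-8"))
    ids = tabs_to_close(targets, max_tabs)
    for tab_id in ids:
        urllib.request.urlopen(f"{base}/json/close/{tab_id}", timeout=3).close()
    return len(ids)
```

Docker's own healthcheck settings would supply the 30-second interval and the 3 retries, so the probe only has to report a single pass or fail through its exit code. The pruner would be run by a scheduler or a loop that sleeps for five minutes between calls to prune_tabs.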

Docker configuration

The Dockerfile builds on Ubuntu 24.04 and installs the display stack, Node.js 20, and Playwright's Chromium build:

FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
# Virtual display number and screen geometry (width x height x depth)
ENV DISPLAY=:99
ENV RESOLUTION=1920x1080x24
# Display stack: Xvfb, x11vnc, noVNC/websockify, plus supervisor to run them
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates xvfb x11vnc fonts-liberation \
    dbus-x11 supervisor curl gnupg websockify novnc \
    && rm -rf /var/lib/apt/lists/*
# Node.js 20 and Playwright's Chromium with its system dependencies
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && npx playwright install --with-deps chromium \
    && rm -rf /var/lib/apt/lists/*
# Unprivileged user for Chromium; copy the browser cache downloaded as root
RUN useradd -m -s /bin/bash browser \
    && mkdir -p /home/browser/.cache \
    && cp -r /root/.cache/ms-playwright /home/browser/.cache/ \
    && chown -R browser:browser /home/browser
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY start-chromium.sh /usr/local/bin/start-chromium.sh
RUN chmod +x /usr/local/bin/start-chromium.sh
# Serve the noVNC client page at the root URL
RUN ln -sf /usr/share/novnc/vnc.html /usr/share/novnc/index.html
# 6080: noVNC web client, 9222: CDP
EXPOSE 6080 9222
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]

The supervisord.conf manages four processes: Xvfb, Chromium, x11vnc, and noVNC/websockify.

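The start-chromium.sh script launches Chromium with flags including --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 for CDP access. Note that 0.0.0.0 asks Chromium to listen on all interfaces inside the container, so the "CDP port bound to localhost only" measure listed under security presumably comes from how port 9222 is published on the host (for example, mapped to 127.0.0.1 only). The post as summarized here does not show that part of the configuration.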