OpenClaw setup for human-in-the-loop browser automation with Docker, Chromium, and noVNC

A developer on r/openclaw documented a setup that lets OpenClaw hand off human-in-the-loop steps, such as CAPTCHA solving and approvals, during automated browser sessions. It runs Chromium, Xvfb, x11vnc, and noVNC in a single Docker container so the user can step in remotely when needed.
How it works
The agent drives Chromium through the Chrome DevTools Protocol (CDP). The browser runs on a virtual display rather than in true headless mode, which is what makes it viewable over noVNC. When the agent encounters a CAPTCHA or needs human approval, it sends a Telegram notification. The user opens a noVNC URL on their phone or laptop to view and interact with the browser, then replies "done" to let the agent continue. The setup uses roughly 300 MB of RAM and cold-starts in about 3 seconds.
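The notify-and-wait step can be sketched as follows. This is a minimal illustration, not the developer's code: it assumes the bot talks to the Telegram Bot API directly (`sendMessage` and `getUpdates` are real Bot API methods), and the names `telegram_call` and `request_human_help` are hypothetical. The API call is injected as a function so the loop can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"


def telegram_call(token, method, params):
    """Call one Telegram Bot API method and return the decoded JSON reply."""
    url = API.format(token=token, method=method)
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(url, data=data, timeout=60) as resp:
        return json.load(resp)


def request_human_help(chat_id, novnc_url, reason, call,
                       poll_seconds=2, max_polls=None):
    """Notify the user, then block until they reply "done" in the same chat.

    `call(method, params)` performs one Bot API request (hypothetical hook).
    Returns True on "done", False if max_polls is exhausted first.
    """
    text = f"{reason}\nOpen {novnc_url} and reply 'done' when finished."
    call("sendMessage", {"chat_id": chat_id, "text": text})
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        reply = call("getUpdates", {"offset": offset, "timeout": 30})
        for update in reply.get("result", []):
            offset = update["update_id"] + 1  # acknowledge this update
            message = update.get("message", {})
            same_chat = message.get("chat", {}).get("id") == chat_id
            if same_chat and message.get("text", "").strip().lower() == "done":
                return True
        time.sleep(poll_seconds)
    return False
```

In real use, `call` could be `lambda method, params: telegram_call(TOKEN, method, params)`, where `TOKEN` is the bot token. A production version would also want to skip messages that predate the notification, which this sketch does not do.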
Practical application
The developer tested this setup by having OpenClaw book a courier pickup. Given photos of consignment notes and the relevant emails, the agent filled in the online form, selected dates, and submitted it while the developer monitored via noVNC. They noted that Claude Opus 4.6's Chromium widget struggled with the same task, getting stuck in navigation loops, while OpenClaw completed the booking.
Technical implementation
The Docker container runs:
- Xvfb for virtual display
- Chromium with Playwright
- x11vnc and noVNC for remote viewing
- supervisord for process management
The bot controls Chromium via CDP from inside the container, while users view the browser through noVNC from any device with a simple URL (no app required).
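As a rough illustration of what talking to CDP looks like, the sketch below queries Chromium's DevTools HTTP endpoints (`/json/version` and `/json/list`) using only the standard library. It assumes CDP is reachable at `127.0.0.1:9222`; the helper names are hypothetical, and the post does not say which client library the bot actually uses.

```python
import json
import urllib.request

CDP_BASE = "http://127.0.0.1:9222"  # assumption: CDP reachable on localhost:9222


def cdp_get(path, base=CDP_BASE, timeout=5):
    """GET one of Chromium's DevTools HTTP endpoints and decode the JSON body."""
    with urllib.request.urlopen(base + path, timeout=timeout) as resp:
        return json.load(resp)


def page_targets(targets):
    """Keep only real tabs, dropping service workers and other target types."""
    return [t for t in targets if t.get("type") == "page"]


def describe(targets):
    """Return one human-readable line per open tab."""
    return [
        f"{t['id']}  {t.get('title', '')}  {t.get('url', '')}"
        for t in page_targets(targets)
    ]


def main():
    """Print the browser version, its WebSocket endpoint, and the open tabs."""
    version = cdp_get("/json/version")
    print(version["Browser"], version["webSocketDebuggerUrl"])
    for line in describe(cdp_get("/json/list")):
        print(line)
```

Calling `main()` inside the container prints the WebSocket URL that a full CDP client would connect to in order to drive the page.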
Security measures
- noVNC accessible only via Tailscale (client device must be part of tailnet)
- CDP port bound to localhost only
- Container has no host filesystem access
- Chromium runs unprivileged
- Passwords and 2FA codes are entered by the user directly through the noVNC clipboard panel
Additional hardening
- Docker healthcheck: polls CDP every 30s, 3 retries before unhealthy
- Resource limits: 1GB RAM + 2 CPUs
- Tab pruner: keeps max 5 tabs, closes blank tabs, runs every 5 minutes
- Container remains isolated with no host mounts
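The healthcheck probe and the tab pruner from the list above might look something like this. It is a sketch under stated assumptions, not the developer's implementation: the definition of a "blank" tab, the choice to keep the first five tabs in listing order, and the guard that never closes the last tab are all guesses, and every function name is hypothetical. `/json/version`, `/json/list`, and `/json/close/<id>` are Chromium's DevTools HTTP endpoints.

```python
import json
import urllib.error
import urllib.request

CDP_BASE = "http://127.0.0.1:9222"  # assumption: CDP port from the Dockerfile
MAX_TABS = 5
BLANK_URLS = {"about:blank", "chrome://newtab/"}  # assumption: what "blank" means


def cdp_is_healthy(base=CDP_BASE, timeout=3):
    """Return True if the DevTools HTTP endpoint answers with status 200."""
    try:
        with urllib.request.urlopen(base + "/json/version", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def tabs_to_close(targets, max_tabs=MAX_TABS):
    """Pick tab ids to close: all blank tabs, plus non-blank tabs past max_tabs.

    Tabs are kept in the order given; whether that order reflects recency
    is an assumption about the listing, not something the post states.
    """
    pages = [t for t in targets if t.get("type") == "page"]
    blank = [t for t in pages if t.get("url") in BLANK_URLS]
    kept = [t for t in pages if t.get("url") not in BLANK_URLS]
    doomed = blank + kept[max_tabs:]
    if pages and len(doomed) == len(pages):
        doomed = doomed[1:]  # never close the last remaining tab
    return [t["id"] for t in doomed]


def prune_once(base=CDP_BASE):
    """List open targets and close the ones selected by tabs_to_close()."""
    with urllib.request.urlopen(base + "/json/list", timeout=5) as resp:
        targets = json.load(resp)
    for target_id in tabs_to_close(targets):
        urllib.request.urlopen(f"{base}/json/close/{target_id}", timeout=5).close()
```

A Docker `HEALTHCHECK` could run a small script that exits 0 or 1 based on `cdp_is_healthy()`, with the 30-second interval and 3 retries set through Docker's `--interval` and `--retries` options. `prune_once()` would then be called every 5 minutes by whatever scheduler the container uses.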
Docker configuration
The Dockerfile builds on Ubuntu 24.04:
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
# Virtual display number and screen geometry (width x height x colour depth)
ENV DISPLAY=:99
ENV RESOLUTION=1920x1080x24
# Virtual display, VNC server, noVNC web client, and process supervisor
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates xvfb x11vnc fonts-liberation \
    dbus-x11 supervisor curl gnupg websockify novnc \
    && rm -rf /var/lib/apt/lists/*
# Node.js 20, then Playwright's Chromium build and its system dependencies
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && npx playwright install --with-deps chromium \
    && rm -rf /var/lib/apt/lists/*
# Unprivileged user for the browser; copy the Playwright browser cache from root
RUN useradd -m -s /bin/bash browser \
    && mkdir -p /home/browser/.cache \
    && cp -r /root/.cache/ms-playwright /home/browser/.cache/ \
    && chown -R browser:browser /home/browser
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY start-chromium.sh /usr/local/bin/start-chromium.sh
RUN chmod +x /usr/local/bin/start-chromium.sh
# Serve the noVNC client page as the default index
RUN ln -sf /usr/share/novnc/vnc.html /usr/share/novnc/index.html
# 9222 is the CDP port set in start-chromium.sh; 6080 is the conventional noVNC/websockify port
EXPOSE 6080 9222
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
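The supervisord.conf file manages four processes: Xvfb, Chromium, x11vnc, and noVNC/websockify.
The start-chromium.sh script launches Chromium with flags that include --remote-debugging-port=9222 and --remote-debugging-address=0.0.0.0 for CDP access. The 0.0.0.0 address applies inside the container only; the localhost-only CDP binding listed under security measures would have to come from how port 9222 is published on the host, which the post does not show.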
TODO items
The developer plans to add token authentication on noVNC and implement an auto-stop feature after idle timeout.
📖 Read the full source: r/openclaw