Understudy: A Teachable Desktop Agent That Learns Tasks by Demonstration

What Understudy Does
Understudy is a teachable desktop agent that operates your computer the way a human colleague would, handling GUI, browser, shell, file system, and messaging tools in one local runtime. Its core idea is teach-by-demonstration: you perform a task once, and the agent records screen video plus semantic events, extracts the intent (not just coordinates), and turns it into a reusable skill.
Current Implementation Status
The system is designed as five layers, each at a different stage of implementation:
- Layer 1 (Operate Software Natively): Implemented today on macOS. Operates any macOS desktop app using 13 tools + screenshot grounding + native input.
- Layer 2 (Learn from Demonstrations): Implemented and usable today. The user shows a task once; the agent extracts the intent, validates it, and learns it.
- Layer 3 (Crystallized Memory): Partially implemented. The agent accumulates experience from daily use and hardens successful paths.
- Layer 4 (Route Optimization): Partially implemented. The agent automatically discovers faster execution routes and upgrades to them.
- Layer 5 (Proactive Autonomy): Still the long-term direction. The agent would notice and act in its own workspace without disrupting the user.
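The idea of "hardening successful paths" in Layer 3 can be pictured as a small ledger of outcomes per task and route. The Python sketch below is purely illustrative: the `PathLedger` class, the threshold of three consecutive successes, and the task and route labels are assumptions made for this example, not details of Understudy's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PathLedger:
    """Hypothetical ledger tracking how reliably each route completes a task."""

    harden_after: int = 3  # assumed threshold, not taken from Understudy
    streaks: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, task: str, route: str, succeeded: bool) -> None:
        key = (task, route)
        # A success extends the streak; a failure resets it to zero.
        self.streaks[key] = self.streaks[key] + 1 if succeeded else 0

    def is_hardened(self, task: str, route: str) -> bool:
        # A path counts as hardened once its success streak reaches the threshold.
        return self.streaks[(task, route)] >= self.harden_after


ledger = PathLedger()
for outcome in (True, True, True):
    ledger.record("photo-cutout", "gui", outcome)
ledger.record("photo-cutout", "shell", True)
ledger.record("photo-cutout", "shell", False)

print(ledger.is_hardened("photo-cutout", "gui"))    # True
print(ledger.is_hardened("photo-cutout", "shell"))  # False
```

Resetting the streak on failure is one possible design choice; a real system might instead weigh recent outcomes or track success rates.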
Technical Capabilities
Understudy is a unified desktop runtime that mixes every execution route in one agent loop, one session, one policy pipeline:
- GUI: 13 tools + screenshot grounding + native input for any macOS desktop app
- Browser: Playwright managed + Chrome extension relay for any website with login sessions
- Shell: bash tool with full local access for CLI tools, scripts, file system
- Web: web_search + web_fetch for real-time information retrieval
- Memory: Semantic memory across sessions for persistent context and preferences
- Messaging: support for 8 channels
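To make "one agent loop, one policy pipeline" concrete, here is a minimal Python sketch in which every step passes through the same policy check regardless of its route. The handler functions, the `policy_allows` rule, and the route names are hypothetical stand-ins, not Understudy's actual tools or API.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical route handlers; real tools would drive the GUI or a shell.
def run_gui(action: str) -> str:
    return f"gui:{action}"


def run_shell(action: str) -> str:
    return f"shell:{action}"


ROUTES: Dict[str, Callable[[str], str]] = {"gui": run_gui, "shell": run_shell}


def policy_allows(route: str, action: str) -> bool:
    # Assumed example rule: refuse shell commands that start with "rm ".
    return not (route == "shell" and action.startswith("rm "))


def agent_loop(plan: List[Tuple[str, str]]) -> List[str]:
    """Run each (route, action) step through the same policy check."""
    log: List[str] = []
    for route, action in plan:
        if not policy_allows(route, action):
            log.append(f"blocked:{route}:{action}")
            continue
        log.append(ROUTES[route](action))
    return log


result = agent_loop([
    ("gui", "click Export"),
    ("shell", "rm -rf /tmp/x"),
    ("shell", "ls"),
])
print(result)
# ['gui:click Export', 'blocked:shell:rm -rf /tmp/x', 'shell:ls']
```

Keeping the policy check in the loop rather than inside each handler means a new route inherits the same rules without extra code.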
How It Works in Practice
In the demo video, the creator teaches Understudy a workflow: run a Google Image search, download a photo, remove the background in Pixelmator Pro, export the result, and send it via Telegram. The creator then asks it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps and route options, with GUI hints kept only as a fallback. It can prefer faster routes when available instead of repeating every GUI step.
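A rough way to picture a skill that stores intent steps, route options, and GUI hints as a fallback is sketched below in Python. The `IntentStep` structure, the `choose_route` helper, and the route and hint strings are all assumptions for illustration; the actual format is the one in the published SKILL.md.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class IntentStep:
    """Hypothetical skill step: what to achieve, not where to click."""

    intent: str
    routes: List[str]               # non-GUI routes, ordered fastest first
    gui_hint: Optional[str] = None  # used only when no listed route is available


def choose_route(step: IntentStep, available: Set[str]) -> str:
    # Prefer the first listed route that is currently available.
    for route in step.routes:
        if route in available:
            return route
    # Otherwise fall back to the GUI, guided by the recorded hint.
    if step.gui_hint is not None:
        return "gui"
    raise LookupError(f"no usable route for: {step.intent}")


skill = [
    IntentStep("find and download a photo of the person",
               ["web", "browser"], "Save Image in the browser context menu"),
    IntentStep("remove the background in Pixelmator Pro",
               [], "Remove Background button"),
    IntentStep("send the exported file via Telegram",
               ["messaging"], "attach button in the chat window"),
]

# Assume only the browser and messaging routes are available in this session.
chosen = [choose_route(step, {"browser", "messaging"}) for step in skill]
print(chosen)  # ['browser', 'gui', 'messaging']
```

Because each step names an intent rather than screen coordinates, the same skill can be replayed for a different person by changing only the search subject.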
Installation and Setup
Current platform: macOS only. Installation is via npm:
npm install -g @understudy-ai/understudy
understudy wizard
The published skill artifact from the showcase demo is available at examples/published-skills/taught-person-photo-cutout-bc88ec/SKILL.md for inspection.
Who It's For
Developers who work across multiple desktop applications and want to automate repetitive tasks without building custom integrations or workflow builders.
📖 Read the full source: HN AI Agents