Karpathy's autoresearch project: AI agents run overnight LLM training experiments

What Karpathy's autoresearch project does
Andrej Karpathy released a tiny repository called "autoresearch" that demonstrates an "AI researcher in a loop" concept. The system uses an AI agent to autonomously run LLM training experiments overnight on a single GPU.
How it works
The agent follows this workflow:
- Continuously edits the train.py file
- Runs 5-minute nanochat training experiments
- Checks whether the validation bits-per-byte (val_bpb) metric improved
- Repeats this cycle while you sleep
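The cycle above can be sketched as a greedy keep-if-better loop. This is a minimal illustration, not the repository's code: research_loop, toy_propose, and toy_run are hypothetical names, the toy scoring function is made up, and discarding a change that does not improve val_bpb is an assumption about how the agent behaves. In the real project the "edit" step is the agent rewriting train.py and the "run" step is a 5-minute training job.

```python
import math
import random
from typing import Callable, Tuple


def research_loop(
    propose_edit: Callable[[dict], dict],
    run_experiment: Callable[[dict], float],
    baseline: dict,
    n_experiments: int,
) -> Tuple[dict, float]:
    """Greedy loop: keep a change only if val_bpb goes down (lower is better)."""
    best_config = baseline
    best_bpb = run_experiment(baseline)
    for _ in range(n_experiments):
        # Stand-in for the agent editing train.py (hypothetical).
        candidate = propose_edit(best_config)
        # Stand-in for one fixed-budget training run that reports val_bpb.
        val_bpb = run_experiment(candidate)
        if val_bpb < best_bpb:
            # Improved: the candidate becomes the new starting point.
            best_config, best_bpb = candidate, val_bpb
        # Otherwise the candidate is dropped (an assumption, see lead-in).
    return best_config, best_bpb


_rng = random.Random(0)  # fixed seed so the toy example is repeatable


def toy_propose(config: dict) -> dict:
    """Toy 'edit': halve or double a single made-up learning rate."""
    new_config = dict(config)
    new_config["lr"] = config["lr"] * _rng.choice([0.5, 2.0])
    return new_config


def toy_run(config: dict) -> float:
    """Toy 'experiment': a made-up score that is lowest at lr = 0.01."""
    return 1.0 + abs(math.log10(config["lr"]) + 2.0)


if __name__ == "__main__":
    config, bpb = research_loop(toy_propose, toy_run, {"lr": 1.0}, 50)
    print(config, bpb)
```

Passing the edit and run steps in as functions keeps the loop itself independent of how experiments are actually launched, which is why the toy stand-ins can replace a real training run here.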
Setup and configuration
The setup is deliberately minimal:
- Hardware: One GPU
- Files: One main file
- Metrics: One primary metric (val_bpb)
The human writes the research organization prompt in program.md, and the agent handles the code iteration.
Experiment throughput
With a fixed 5-minute budget per experiment, the system can run approximately 12 experiments per hour (60 minutes divided by 5 minutes per run).
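The throughput figure is simple arithmetic, sketched below. The function name experiments_per_hour, the overhead_minutes parameter, and the 8-hour night are illustrative assumptions, not values from the project.

```python
def experiments_per_hour(budget_minutes: float, overhead_minutes: float = 0.0) -> float:
    """Runs per hour for a fixed per-run budget plus optional per-run overhead.

    overhead_minutes is a hypothetical allowance for time the agent spends
    between runs; the source only states the 5-minute training budget.
    """
    return 60.0 / (budget_minutes + overhead_minutes)


per_hour = experiments_per_hour(5)
print(per_hour)        # 12.0
print(per_hour * 8)    # 96.0 runs over an assumed 8-hour night
```

Any time spent between runs lowers the rate, so 12 per hour is best read as an upper bound.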
The project is a practical demonstration of automated research: an AI agent explores parameter spaces and training configurations on its own, which could shorten experimentation cycles for developers working with language models.
📖 Read the full source: r/LocalLLaMA