Claude Code Automates AI Research: 12-Hour Experiments, 100% Compliance

Automated AI Research with Claude Code

A developer documented using Claude Code to automate AI research experiments for 12 hours straight. The project focused on CLaaS, a real-time continual learning framework that moves context into weights using self-distillation.

Experimental Setup

The goal was to tune self-distillation training runs to maximize a model's compliance with different preference verifiers, such as concise responses and no emojis. Experiments ran locally on an RTX 5090 overnight.
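
To make the idea of a preference verifier concrete, here is a minimal sketch in Python. It is not taken from the CLaaS repository: the function names, the 50-word limit, and the emoji ranges are all illustrative assumptions. Compliance is taken here to be the fraction of responses that pass a verifier, which matches the 0.000 to 1.000 scale reported in the results.

```python
import re

# Rough emoji detector (assumption): covers the main pictograph blocks plus
# miscellaneous symbols and dingbats. It is not an exhaustive emoji definition.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def no_emoji(response: str) -> bool:
    """Pass if the response contains no character in the ranges above."""
    return EMOJI_PATTERN.search(response) is None


def concise(response: str, max_words: int = 50) -> bool:
    """Pass if the response has at most max_words whitespace-separated words.

    The 50-word default is a hypothetical threshold.
    """
    return len(response.split()) <= max_words


def compliance(responses, verifier) -> float:
    """Fraction of responses that pass the verifier (0.0 for an empty list)."""
    if not responses:
        return 0.0
    passed = sum(1 for r in responses if verifier(r))
    return passed / len(responses)
```

Calling `compliance(model_outputs, no_emoji)` on a batch of generated responses would then give a single score to track across training runs.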

System Architecture

The repository was built to be highly configurable:

Every tunable parameter exposed via CLI using Hydra config management
HTML dashboards for every training step and evaluation run
Metrics, inputs, and outputs made observable through dashboards
Claude Code could query dashboards via curl requests to check progress
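
The progress check described above can be approximated in Python with the standard library. This is a sketch under assumptions, not the project's code: the post only says dashboards were queried with curl, so the HTML layout assumed here (a metric name followed by its value, possibly in adjacent table cells) and both function names are hypothetical.

```python
import re
import urllib.request


def fetch_dashboard(url: str, timeout: float = 10.0) -> str:
    """Fetch the dashboard HTML, roughly what `curl <url>` would return."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_metric(html: str, name: str):
    """Return the first number that follows `name` in the page, or None.

    Assumes the metric is rendered like `<td>compliance</td><td>0.875</td>`
    or `compliance: 0.875`; a real dashboard may need a different pattern.
    """
    pattern = re.compile(
        re.escape(name)
        + r"\s*(?:<[^>]+>\s*)*[:=]?\s*"           # skip tags and an optional ':' or '='
        + r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"   # integer, decimal, or scientific notation
    )
    match = pattern.search(html)
    return float(match.group(1)) if match else None
```

With a running dashboard, `extract_metric(fetch_dashboard(url), "compliance")` would return the latest value as a float, where `url` is whatever address the dashboard is served on.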

Experiment Management

The workflow was controlled by a local EXPERIMENTS.md file with specific rules:

Each experiment could change at most one variable or make one code change
Between experiments, the model had to either accept or revert the previous change based on results
Any new code changes had to be exposed via config for later tuning
The model recorded all progress, hypotheses, and outcomes in the file as a running journal
The agent ran in a "Ralph Wiggum loop" (repeatedly re-invoked with the same prompt) with the goal of maximizing preference compliance
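
The rules above amount to a greedy search that changes one variable at a time. The sketch below shows that control flow in Python, assuming a hypothetical `evaluate` callable that trains and scores a configuration. In the actual workflow these decisions were made by Claude Code and written to EXPERIMENTS.md, so the `Journal` class and `run_experiments` function are illustrative stand-ins only.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """In-memory stand-in for the running journal kept in EXPERIMENTS.md."""
    entries: list = field(default_factory=list)

    def log(self, line: str) -> None:
        self.entries.append(line)


def run_experiments(config, proposals, evaluate, journal):
    """Try single-variable changes in order, keeping only those that help.

    config:    dict holding the starting hyperparameters
    proposals: iterable of (key, value) pairs, one change per experiment
    evaluate:  callable mapping a config dict to a score (higher is better)
    """
    best_config = dict(config)
    best_score = evaluate(best_config)
    journal.log(f"baseline: {best_config} -> {best_score:.3f}")

    for key, value in proposals:
        # Exactly one variable differs from the current best config.
        candidate = {**best_config, key: value}
        score = evaluate(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
            decision = "accept"
        else:
            decision = "revert"  # discard the change, keep the previous best
        journal.log(f"{key}={value}: {score:.3f} ({decision})")

    return best_config, best_score
```

Because a rejected change is never carried forward, each journal entry can be attributed to a single variable, which is the point of the one-change-per-experiment rule.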

Results

Over 12 hours, the system ran 9 experiments:

Found and fixed a model collapse bug on the first run
Tuned gradient steps per batch to 4
Tuned learning rate to 3e-5
Compliance improved from 0.000 to 1.000
Token usage was surprisingly low because most time was spent waiting for training runs between experiments

The same task was also run with Codex for 2 hours using a plain prompt, and it independently converged on the same hyperparameters.

Project repository: https://github.com/kfallah/CLaaS

📖 Read the full source: r/ClaudeAI