Two Research Projects Challenge Imitation Learning for Web Agents

Two Approaches to Web Agent Training
Two research projects challenge the standard practice of training AI agents solely by imitating expert demonstrations. Both focus on web form-filling tasks, in which a model must navigate real websites, fill in fields, click buttons, and submit forms.
Browser in the Loop: RL for Task Completion
The first project, "Browser in the Loop" (doi.org/10.13140/RG.2.2.24922.71360), uses an 8-billion-parameter model in a feedback loop with a real browser. Instead of only imitating expert demonstrations, the model generates action plans, executes them against live web forms, and learns from the outcome.
Reinforcement learning converts near-misses, where every field is correct but the submission fails, into actual successes. The gains come not from filling fields more accurately but from learning to complete the submission, something imitation alone never optimized for.
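The distinction between a near-miss and a success can be made concrete with a reward function. The following is a minimal sketch, not the project's actual reward: the `EpisodeResult` structure, the `field_weight` parameter, and the 50/50 weighting are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of running one action plan in a browser (hypothetical structure)."""
    fields_expected: dict[str, str]
    fields_filled: dict[str, str]
    submitted: bool  # True only if the form submission actually went through


def field_accuracy(result: EpisodeResult) -> float:
    """Fraction of expected fields that were filled with the expected value."""
    if not result.fields_expected:
        return 0.0
    correct = sum(
        1
        for name, value in result.fields_expected.items()
        if result.fields_filled.get(name) == value
    )
    return correct / len(result.fields_expected)


def completion_reward(result: EpisodeResult, field_weight: float = 0.5) -> float:
    """Blend partial credit for fields with a bonus for real task completion.

    The weighting is an assumption; the source does not describe the reward.
    """
    accuracy = field_accuracy(result)
    success = 1.0 if (result.submitted and accuracy == 1.0) else 0.0
    return field_weight * accuracy + (1.0 - field_weight) * success


expected = {"name": "Ada", "email": "ada@example.com"}
near_miss = EpisodeResult(expected, dict(expected), submitted=False)
success = EpisodeResult(expected, dict(expected), submitted=True)
```

Under this sketch, `near_miss` and `success` have identical field accuracy, so only the submission term separates them. A loss that scores field contents alone would treat the two episodes as equally good.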
Concentrate or Collapse: RL Challenges with Diffusion Models
The second project, "Concentrate or Collapse" (doi.org/10.13140/RG.2.2.11500.94088), explores what happens when models don't generate actions left to right at all. Diffusion language models refine entire action sequences in parallel, but applying the same RL that works for autoregressive models causes these diffusion models to collapse, with outputs degrading to incoherence.
Across 16 controlled comparisons, token-level RL improved results in only two. The fix required rethinking optimization at the sequence level, where one method (ESPO) finally broke through for pure diffusion architectures.
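To illustrate the general difference between token-level and sequence-level optimization, here is a toy sketch of how a policy ratio might be computed at each granularity. It is not ESPO, whose formulation the source does not give. The function names, the length-normalized sequence ratio, and the log-probability values are assumptions chosen only to show the contrast.

```python
import math


def token_level_ratios(new_logps: list[float], old_logps: list[float]) -> list[float]:
    """One importance ratio per token; each token is weighted on its own."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_level_ratio(new_logps: list[float], old_logps: list[float]) -> float:
    """A single ratio for the whole sequence (geometric mean of token ratios)."""
    mean_diff = sum(n - o for n, o in zip(new_logps, old_logps)) / len(new_logps)
    return math.exp(mean_diff)


def clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for a single ratio."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Toy values: three tokens unchanged, one token whose log-probability jumped.
old_logps = [-1.0, -1.0, -1.0, -3.0]
new_logps = [-1.0, -1.0, -1.0, -1.0]

per_token = token_level_ratios(new_logps, old_logps)
per_sequence = sequence_level_ratio(new_logps, old_logps)
```

In this toy case the single shifted token yields a per-token ratio above 7, while the sequence-level ratio stays near 1.65. Aggregating over the sequence damps the influence of individual noisy tokens, which is one plausible reason a sequence-level objective could be more stable.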
Key Implications
The research highlights that most web agent benchmarks still evaluate on text similarity to reference trajectories rather than actual task completion. These projects suggest that what looks correct on paper and what actually works in a browser are different problems, and optimizing for the wrong one leaves performance on the table.
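The gap between text similarity and task completion can be shown with a small example. This is a sketch under stated assumptions: the action strings, the `form_submitted` flag, and both helper functions are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity metric a given benchmark uses.

```python
from difflib import SequenceMatcher


def trajectory_similarity(predicted: list[str], reference: list[str]) -> float:
    """Similarity of two action sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, predicted, reference).ratio()


def task_completed(browser_state: dict) -> bool:
    """Whether the form was actually submitted (hypothetical state flag)."""
    return bool(browser_state.get("form_submitted", False))


reference = ["fill name", "fill email", "fill phone", "click submit"]
predicted = ["fill name", "fill email", "fill phone"]  # final click is missing

# Hypothetical state observed in the browser after running `predicted`.
browser_state = {"form_submitted": False}

similarity = trajectory_similarity(predicted, reference)
completed = task_completed(browser_state)
```

Here the predicted trajectory scores about 0.86 on similarity even though the form is never submitted, so a similarity-based benchmark would rate a failed run highly.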
All 12 trained models and their pipeline have been open-sourced: code at github.com/billy-enrizky/openbrowser-ai and models at huggingface.co/billyenrizky.
📖 Read the full source: r/LocalLLaMA