Coding Agent Session Logs Are Stored Locally, Could Enable Open Federated Training

When you use coding agents like Claude Code or Codex CLI in agent mode, they log comprehensive session data locally on your machine. These logs capture the full interaction loop: your initial task, the model's reasoning process, every tool call, every environment response, and every error and retry. Together these form complete (state → action → reward → next state) tuples, the data format reinforcement learning researchers need.
What's in the logs
The source author checked their own machines and found:
- Mac Mini: ~/.claude/projects/ containing 3.1GB across 1103 files from 574 agentic sessions
- MacBook: ~/.codex/sessions/ containing 2.4GB across 3530 files from 79 agentic sessions
- MacBook: ~/.claude/projects/ containing 652MB across 316 files from 99 agentic sessions
In total, they identified 775 sessions with real tool calls containing approximately 41 million tokens. Extrapolated across thousands of developers, this could represent hundreds of billions of tokens of real agentic trajectory data. No open dataset comparable to The Pile currently exists for this kind of data.
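As a rough check on that extrapolation, the arithmetic can be sketched in the shell. The session and token counts are the figures reported in the post; the developer count of 5000 is a purely hypothetical stand-in for "thousands of developers", and the calculation assumes every contributor holds as much data as the author, which is likely optimistic.

```shell
# Figures reported in the post
sessions=775
tokens=41000000
# Hypothetical number of contributing developers; not a figure from the post
developers=5000

# Integer arithmetic: average session size and the extrapolated total
tokens_per_session=$((tokens / sessions))
total_tokens=$((tokens * developers))

echo "Average tokens per session: $tokens_per_session"
echo "Extrapolated total: $((total_tokens / 1000000000)) billion tokens"
```

Under these assumptions the average session is about 53 thousand tokens and the total lands at roughly 205 billion tokens, which is consistent with the "hundreds of billions" estimate.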
Why this data matters
The environment provides clear feedback signals: exit code 0 or not, tests pass or not. This offers the missing training signal for causal reasoning, error recovery, and long-horizon planning, all areas where current models struggle. Big AI labs already collect this data internally to train their proprietary models, but there's no open equivalent because the data is fragmented across individual developer machines.
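To make the "exit code 0 or not" signal concrete, here is a minimal sketch of turning a command's exit status into a binary reward. The helper name `reward_for` and the 1/0 mapping are illustrative assumptions, not something the post specifies.

```shell
# Hypothetical helper: run a command and print 1 if it exits with code 0, else 0.
reward_for() {
  if "$@" >/dev/null 2>&1; then
    echo 1
  else
    echo 0
  fi
}

reward_for true    # prints 1
reward_for false   # prints 0
```

In practice the command would be something like a test runner or a build step, whose exit status already summarizes whether the agent's action succeeded.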
The proposal: Federated learning
The post proposes using federated learning where your data never leaves your machine. You would train a small LoRA adapter locally, share only the weights with differential privacy noise added, and receive an improved global model in return. Everyone contributes compute and signal without exposing their raw data. Alternatively, the community could anonymize the data to create a dataset for fine-tuning models.
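The post does not say how the differential privacy noise would be added. One common approach, shown here only as an assumption, is to clip the update to a fixed L2 norm and then add Gaussian noise scaled to that bound. The sketch below covers that single step for a vector of numbers read from standard input; it does not train a LoRA adapter, aggregate updates, or compute a privacy budget. The function name `dp_privatize` and its parameters (clip bound, noise multiplier, seed) are hypothetical.

```shell
# Hypothetical sketch: clip an update vector to an L2 norm bound, then add Gaussian noise.
# Usage: dp_privatize CLIP SIGMA [SEED] < one-number-per-line
dp_privatize() {
  clip="$1"; sigma="$2"; seed="${3:-1}"
  awk -v clip="$clip" -v sigma="$sigma" -v seed="$seed" '
    BEGIN { srand(seed) }
    { w[NR] = $1; sumsq += $1 * $1 }
    END {
      norm = sqrt(sumsq)
      # Scale down only if the vector exceeds the clipping bound
      scale = (norm > clip) ? clip / norm : 1
      pi = atan2(0, -1)
      for (i = 1; i <= NR; i++) {
        # Box-Muller transform: two uniform samples -> one standard normal sample
        u1 = 1 - rand(); u2 = rand()
        z = sqrt(-2 * log(u1)) * cos(2 * pi * u2)
        # Noise standard deviation is sigma * clip
        printf "%.6f\n", w[i] * scale + sigma * clip * z
      }
    }'
}

# Illustrative call: a two-element update, clip bound 1.0, noise multiplier 0.1
printf '3\n4\n' | dp_privatize 1.0 0.1
```

The clip bound and noise multiplier here are illustrative values; choosing them to meet a real privacy guarantee would need a proper accounting method.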
Practical steps
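To preserve your logs (Claude Code deletes them after 30 days by default), raise the cleanup period. Note that this command overwrites any existing ~/.claude/settings.json, so if you already have settings there, add the cleanupPeriodDays key to the file by hand instead: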
echo '{"cleanupPeriodDays": 36500}' > ~/.claude/settings.json
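Because the `>` redirect replaces the whole file, a more cautious variant might look like the sketch below. It assumes `settings.json` holds a single JSON object and uses `jq` to merge the key when the file already exists; the function name `set_cleanup_period` is hypothetical, and `jq` is an optional dependency that may not be installed.

```shell
# Hypothetical helper: set cleanupPeriodDays without clobbering other settings.
# Usage: set_cleanup_period PATH_TO_SETTINGS DAYS
set_cleanup_period() {
  settings="$1"; days="$2"
  mkdir -p "$(dirname "$settings")"
  if [ ! -s "$settings" ]; then
    # No existing settings (missing or empty file): safe to create it from scratch.
    printf '{"cleanupPeriodDays": %s}\n' "$days" > "$settings"
  elif command -v jq >/dev/null 2>&1; then
    # Existing settings: keep a backup, merge the key, then swap the file in.
    cp "$settings" "$settings.bak"
    jq --argjson d "$days" '.cleanupPeriodDays = $d' "$settings.bak" > "$settings.tmp" \
      && mv "$settings.tmp" "$settings"
  else
    # Without jq, leave the file untouched rather than risk losing settings.
    echo "jq not found; add \"cleanupPeriodDays\": $days to $settings by hand" >&2
    return 1
  fi
}
```

Calling `set_cleanup_period ~/.claude/settings.json 36500` would then have the same effect as the one-liner on a fresh machine, while leaving a `.bak` copy when settings already exist.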
To check what's on your own machines:
du -sh ~/.codex/sessions/ 2>/dev/null
du -sh ~/.claude/projects/ 2>/dev/null
find ~/.codex/sessions/ -name "*.jsonl" 2>/dev/null | wc -l
find ~/.claude/projects/ -name "*.jsonl" 2>/dev/null | wc -l
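The same checks can be wrapped into a small helper that prints one summary line per directory, which may be easier to paste into a comment. This is only a sketch: the function name `log_summary` is hypothetical, and the two paths are the ones named in the post.

```shell
# Hypothetical helper: print size and .jsonl file count for one log directory.
log_summary() {
  dir="$1"
  if [ -d "$dir" ]; then
    # du prints "SIZE<TAB>PATH"; keep only the size column
    size=$(du -sh "$dir" 2>/dev/null | cut -f1)
    files=$(find "$dir" -type f -name '*.jsonl' 2>/dev/null | wc -l | tr -d ' ')
    printf '%s\t%s\t%s files\n' "$dir" "$size" "$files"
  else
    printf '%s\tnot found\n' "$dir"
  fi
}

for d in "$HOME/.codex/sessions" "$HOME/.claude/projects"; do
  log_summary "$d"
done
```

Keep in mind that the file count is not a session count: in the author's numbers, 1103 files corresponded to 574 agentic sessions.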
The Reddit post encourages developers to share their numbers in the comments to gauge the actual scale of unused data across the community, with the goal of building an open equivalent if there's enough interest.
📖 Read the full source: r/LocalLLaMA