Talkie: A 13B LLM Trained Exclusively on Pre-1931 Text, Using Claude as a Judge in RL Training

A team of researchers including Alec Radford (GPT, CLIP, Whisper), Nick Levine, and David Duvenaud has released Talkie, a 13-billion-parameter language model trained exclusively on text published before 1931. The model's knowledge cutoff is December 31, 1930: no Internet, no Wikipedia, no World War II content.
Why It Matters
Current LLMs (GPT, Claude, Gemini, Llama) all draw on overlapping training data from the modern Web, which makes it hard to tell memorization apart from genuine reasoning. Talkie breaks that lineage: because its training distribution is fundamentally different, researchers can test whether a given capability arises from memorization or from generalization. As the team notes: “It's an important question how much LM capabilities arise from memorization vs generalization. Vintage LMs enable unique generalization tests.”
Claude's Role in Training
Claude Sonnet 4.6 served as the judge in Talkie's reinforcement learning pipeline, which uses online DPO (Direct Preference Optimization). Claude Opus 4.4 also generated synthetic multi-turn conversations used in the final fine-tuning stage. The team acknowledges the irony: a modern model shaping a “vintage” one risks leaking post-1930 knowledge into Talkie. They flag this contamination risk as something they are working to eliminate in future versions.
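The article does not describe Talkie's training code. As a rough illustration of where a judge fits in online DPO, here is a minimal sketch of the standard DPO loss for a single comparison. It assumes the current policy has sampled two responses to a prompt, the judge has picked the better one, and the summed sequence log-probabilities of both responses under the policy and under a frozen reference model are already computed. The names `Pair`, `dpo_loss`, and the default `beta=0.1` are hypothetical and not taken from Talkie.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """One online-DPO comparison (hypothetical structure, not Talkie's code)."""
    policy_logp_a: float   # log p_policy(response A | prompt)
    policy_logp_b: float   # log p_policy(response B | prompt)
    ref_logp_a: float      # log p_reference(response A | prompt)
    ref_logp_b: float      # log p_reference(response B | prompt)
    judge_prefers_a: bool  # the judge model's verdict (assumed boolean)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pair: Pair, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (beta value is an assumption)."""
    # Order the pair by the judge's verdict: w = preferred, l = rejected.
    if pair.judge_prefers_a:
        pw, pl = pair.policy_logp_a, pair.policy_logp_b
        rw, rl = pair.ref_logp_a, pair.ref_logp_b
    else:
        pw, pl = pair.policy_logp_b, pair.policy_logp_a
        rw, rl = pair.ref_logp_b, pair.ref_logp_a
    # Implicit reward margin: how much more the policy, relative to the
    # frozen reference, favors the judge's preferred response.
    margin = beta * ((pw - rw) - (pl - rl))
    # Loss shrinks as the margin grows; equals log(2) when margin is 0.
    return -log_sigmoid(margin)


if __name__ == "__main__":
    pair = Pair(-10.0, -12.0, -11.0, -11.0, judge_prefers_a=True)
    print(dpo_loss(pair))
```

In a real pipeline the log-probabilities would come from a deep learning framework and the loss would be averaged over a batch and backpropagated; "online" means the pairs are freshly sampled from the current policy and judged at each step rather than drawn from a fixed dataset. `beta` controls how far the policy may drift from the reference model.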
Key Capabilities
- Talkie can learn to write Python code from just a few in-context examples, despite having no modern code in its training data. Since it cannot be retrieving code it never saw, the skill must come from generalizing what it did read, such as 19th-century mathematics texts.
- Designed for long-range forecasting: how well can a model "predict" the future from its frozen 1930 perspective?
- Can be used to study “invention”: whether it can develop ideas that postdate its knowledge cutoff.
- Helps isolate which capabilities are architecture-driven vs. absorbed from Web data.
Access & Licensing
Talkie and its variant are both open-weight on Hugging Face under the Apache 2.0 license, and a live chat demo is available. The team plans to release a GPT-3-scale vintage model later this year.
What It's Being Used to Study
- Long-range forecasting: predict future developments from a historical vantage point.
- Invention: test whether it can generate ideas that postdate its training cutoff.
- LLM identity: what makes a model itself, by isolating the effects of architecture from those of the data distribution.
📖 Read the full source: r/ClaudeAI