Talkie 13B LLM: No Modern Data, RL Training with Claude as Judge

A team of researchers including Alec Radford (GPT, CLIP, Whisper), Nick Levine, and David Duvenaud just released Talkie, a 13 billion parameter language model trained exclusively on text published before 1931. The model's knowledge cutoff is December 31, 1930 — no Internet, no Wikipedia, no World War II content.
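As a rough illustration of what a hard date cutoff means for corpus construction, the sketch below keeps only documents dated on or before December 31, 1930. It is a minimal sketch, not the team's pipeline: the `(title, publication_date)` record format and the function names are assumptions made for the example, and a real pipeline would also have to deal with uncertain or wrong publication dates.

```python
from datetime import date

# Cutoff stated in the article: nothing published after this day is allowed.
CUTOFF = date(1930, 12, 31)

def is_vintage(published: date) -> bool:
    """Return True if a publication date falls on or before the cutoff."""
    return published <= CUTOFF

def filter_corpus(docs):
    """Keep only documents dated on or before the cutoff.

    `docs` is a hypothetical list of (title, publication_date) pairs.
    """
    return [(title, d) for title, d in docs if is_vintage(d)]

# Placeholder records, not real corpus entries.
sample = [
    ("doc_a", date(1899, 5, 1)),
    ("doc_b", date(1930, 12, 31)),
    ("doc_c", date(1931, 1, 1)),
]
kept = filter_corpus(sample)
print([title for title, _ in kept])
```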

Why It Matters

Current LLMs (GPT, Claude, Gemini, Llama) all share training data from the modern Web, making it hard to separate memorization from genuine reasoning. Talkie breaks that lineage: its training distribution is fundamentally different, allowing researchers to test whether capabilities arise from memorization or generalization. As the team notes: “It's an important question how much LM capabilities arise from memorization vs generalization. Vintage LMs enable unique generalization tests.”

Claude's Role in Training

Claude Sonnet 4.6 served as the judge in Talkie's reinforcement learning pipeline, which uses online direct preference optimization (DPO). Additionally, Claude Opus 4.6 generated synthetic multi-turn conversations that were used in the final fine-tuning stage. The team acknowledges the irony and the contamination risk, since a modern model in the training loop can leak post-1930 knowledge and style, and flags it as something they are working to eliminate in future versions.
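To make the judge's role concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. In online DPO, two responses are sampled from the current policy, a judge picks the preferred one, and the loss pushes the policy toward that choice relative to a frozen reference model. Everything here is illustrative: `toy_judge` is a hypothetical stand-in for the Claude judge, the log-probabilities are made-up numbers, and `beta=0.1` is an assumed value, not one reported by the team.

```python
import math

def toy_judge(prompt, a, b):
    """Hypothetical placeholder judge: True if response `a` is preferred.

    It simply prefers the shorter response. Per the article, the real
    judge was a Claude model, not a heuristic like this.
    """
    return len(a) <= len(b)

def label_pair(prompt, a, b, judge=toy_judge):
    """Return (chosen, rejected) according to the judge."""
    return (a, b) if judge(prompt, a, b) else (b, a)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where margin is the policy's
    log-ratio advantage of chosen over rejected relative to the reference.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

chosen, rejected = label_pair("Describe the telegraph.", "A brief reply.", "A much longer, rambling reply.")
# Made-up log-probabilities for illustration only.
print(chosen, dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

The loss is lower when the policy already assigns relatively more probability to the chosen response than the reference does, which is what the optimizer is rewarded for.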

Key Capabilities

Talkie can learn to write Python code from just a few in-context examples, despite having zero modern code in its training data. The implication is that it is reasoning from what it learned in 19th-century mathematics texts, not retrieving memorized code.
It is designed for long-range forecasting: how well can a model "predict" the future from its frozen 1930 perspective?
It can be used to study “invention”: whether a model can develop ideas that postdate its knowledge cutoff.
It helps isolate which capabilities are architecture-driven and which are absorbed from Web data.
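The few-shot Python claim above can be pictured as a prompt that shows a handful of solved tasks and leaves the last one open for the model to complete. The sketch below only builds such a prompt. The `Task:`/`Program:` layout and the example tasks are assumptions for illustration, not the format the team used, and no model is called.

```python
def build_few_shot_prompt(examples, task):
    """Format (description, code) pairs followed by one unsolved task."""
    parts = []
    for description, code in examples:
        parts.append(f"Task: {description}\nProgram:\n{code}\n")
    # The final task is left open so the model continues with a program.
    parts.append(f"Task: {task}\nProgram:\n")
    return "\n".join(parts)

# Hypothetical in-context examples.
examples = [
    ("Add two numbers.", "def add(a, b):\n    return a + b"),
    ("Square a number.", "def square(x):\n    return x * x"),
]
prompt = build_few_shot_prompt(examples, "Double a number.")
print(prompt)
```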

Access & Licensing

Both Talkie and its variant are Apache 2.0 licensed and open-weight on Hugging Face. You can chat with it live at the provided link. The team plans a GPT-3-scale vintage model later this year.

What It's Being Used to Study

Long-range forecasting: predict future developments from a historical vantage point.
Invention: generate ideas that postdate its training cutoff.
LLM identity: what makes a model what it is, isolating the effects of architecture from those of the data distribution.

📖 Read the full source: r/ClaudeAI
