Talkie: A 13B LLM Trained Exclusively on Pre-1931 Text, Using Claude as a Judge in RL Training

OpenClawRadar | Published: April 28, 2026

A team of researchers including Alec Radford (GPT, CLIP, Whisper), Nick Levine, and David Duvenaud just released Talkie, a 13 billion parameter language model trained exclusively on text published before 1931. The model's knowledge cutoff is December 31, 1930 — no Internet, no Wikipedia, no World War II content.

Why It Matters

Current LLMs (GPT, Claude, Gemini, Llama) all share training data from the modern Web, making it hard to separate memorization from genuine reasoning. Talkie breaks that lineage: its training distribution is fundamentally different, allowing researchers to test whether capabilities arise from memorization or generalization. As the team notes: “It's an important question how much LM capabilities arise from memorization vs generalization. Vintage LMs enable unique generalization tests.”

Claude's Role in Training

Claude Sonnet 4.6 served as the judge in Talkie's reinforcement learning pipeline (online DPO). Additionally, Claude Opus 4.6 generated synthetic multi-turn conversations that were used in the final fine-tuning stage. The team acknowledges the irony and contamination risk, flagging it as something they're working to eliminate in future versions.
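
For readers unfamiliar with online DPO, the general shape of a judge-in-the-loop step can be sketched as follows. This is a minimal illustration under stated assumptions, not the Talkie team's code: `toy_judge` is a hypothetical stand-in for the LLM judge, the log-probabilities are hard-coded numbers rather than model outputs, and `beta` is an assumed hyperparameter.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin)).

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def order_by_judge(prompt, response_a, response_b, judge):
    """Ask the judge which response is better; return (chosen, rejected)."""
    verdict = judge(prompt, response_a, response_b)
    if verdict == "a":
        return response_a, response_b
    return response_b, response_a


def toy_judge(prompt, response_a, response_b):
    """Hypothetical placeholder for an LLM judge: prefers the shorter reply."""
    return "a" if len(response_a) <= len(response_b) else "b"


if __name__ == "__main__":
    # In the online setting, both responses are sampled from the current policy.
    chosen, rejected = order_by_judge(
        "Describe the telephone.", "A long and rambling reply.", "A brief reply.",
        toy_judge,
    )
    # Made-up log-probabilities, for illustration only.
    loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
    print(chosen, "| loss:", round(loss, 4))
```

The "online" part is that the preference pairs are sampled from the model currently being trained and labeled by the judge on the fly, rather than drawn from a fixed, pre-collected preference dataset.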

Key Capabilities

  • Talkie can learn to write Python code from just a few in-context examples, despite having no modern code in its training data. The suggestion is that it reasons from sources such as 19th-century mathematics texts rather than retrieving memorized code.
  • Designed for long-range forecasting: how well can a model "predict" the future from its frozen 1930 perspective?
  • Can be used to study “invention” — whether it can develop ideas that postdate its knowledge cutoff.
  • Helps isolate which capabilities are architecture-driven vs. absorbed from Web data.
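
The in-context coding test mentioned in the list above can be pictured as a few-shot prompt: a handful of worked task/program pairs followed by a new task for the model to complete. The sketch below is an assumption about what such a prompt might look like. The `Task:`/`Program:` layout and the helper name `build_few_shot_prompt` are hypothetical, since the source does not describe the actual prompt format.

```python
def build_few_shot_prompt(examples, new_task):
    """Assemble a few-shot prompt from (description, code) pairs.

    The 'Task:' / 'Program:' layout is a hypothetical format chosen for
    illustration; it is not taken from the Talkie release.
    """
    parts = []
    for description, code in examples:
        parts.append(f"Task: {description}\nProgram:\n{code}\n")
    # The final entry leaves the program blank for the model to fill in.
    parts.append(f"Task: {new_task}\nProgram:\n")
    return "\n".join(parts)


if __name__ == "__main__":
    examples = [
        ("Add two numbers.", "def add(a, b):\n    return a + b"),
        ("Square a number.", "def square(x):\n    return x * x"),
    ]
    print(build_few_shot_prompt(examples, "Multiply two numbers."))
```

Because the model has never seen Python, everything it knows about the syntax has to come from the examples in the prompt, which is what makes this a test of generalization rather than recall.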

Access & Licensing

Both Talkie and its variant are Apache 2.0 licensed and open-weight on Hugging Face. You can chat with it live at the provided link. The team plans a GPT-3-scale vintage model later this year.

What It's Being Used to Study

  • Long-range forecasting: predict future developments from a historical vantage point.
  • Invention: generate ideas that postdate its training cutoff.
  • LLM identity: what makes a model itself — isolating architecture vs. data distribution effects.

📖 Read the full source: r/ClaudeAI
