Internet Archive Blocking Threatens Web History Preservation

What's Happening with Internet Archive Access
The New York Times has begun blocking the Internet Archive from crawling its website using technical measures that go beyond traditional robots.txt rules. Other newspapers including The Guardian appear to be following this approach. This blocking risks cutting off access to historical web records that journalists, researchers, and courts have relied on for decades.
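For readers unfamiliar with the traditional mechanism mentioned above, here is a minimal Python sketch of how a robots.txt rule can single out one crawler. The robots.txt content is hypothetical, and the user-agent string archive.org_bot is an assumption used for illustration. It does not reproduce any publisher's actual configuration, and the blocking described in this article goes beyond rules of this kind.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# The user-agent name "archive.org_bot" is an assumption, not a quoted rule.
SAMPLE_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/2024/some-article"
    print(is_allowed(SAMPLE_ROBOTS_TXT, "archive.org_bot", page))  # False
    print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", page))     # True
```

A robots.txt file is advisory: it only works when the crawler chooses to honor it, which is one reason a site might turn to other technical measures.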
Why This Matters for Historical Preservation
The Internet Archive operates the Wayback Machine, which contains more than one trillion archived web pages. For nearly thirty years, it has preserved news sites as they originally appeared online. When articles are edited or removed, the Archive is often the only place to see the original versions. When major publishers block its crawlers, that historical record starts to disappear.
The AI Connection and Legal Context
Publishers cite concerns about AI companies scraping news content as their motivation for blocking the Archive. The New York Times and others are suing AI companies in cases that turn on whether training models on copyrighted material violates the law. However, the Internet Archive is not building commercial AI systems; it is preserving historical records. The source article argues that blocking nonprofit archivists is the wrong response to AI training concerns.
From a legal perspective, courts have treated making material searchable as fair use, recognizing that building a searchable index often requires copying the underlying material. When Google copied entire books to create a searchable database, courts held this to be fair use because it served the transformative purpose of enabling discovery and research. The article argues that the same principles apply to web archiving.
Practical Impact on Research and Journalism
Wikipedia alone links to more than 2.6 million news articles preserved at the Internet Archive, spanning 249 languages. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online. If major publishers continue blocking access, future researchers may find that significant portions of web history have vanished.
📖 Read the full source: HN AI Agents