AgentCrawl Update Adds Critical Crawler Features and Enhancements

AgentCrawl is a TypeScript scraper/crawler for developers building AI agents. Its latest update focuses on production readiness, adding crawler correctness and politeness, caching, resumable crawls, and richer data extraction.
Key Details
- Removed Tool Adapters: The update drops the built-in tool adapters for the agents SDK and the Vercel AI SDK, so users now define their own tools around the crawler.
- Updated Libraries: The package now includes the latest version of Zod for better data validation.
- Crawler Correctness: Robots.txt compliance is now opt-in and supports the Disallow/Allow and Crawl-delay directives. Opt-in sitemap seeding from /sitemap.xml is also available.
- URL Normalization: URL normalization now strips tracking parameters more thoroughly and supports canonical normalization.
- Throttling Options: The crawler supports per-host throttling through the configurable perHostConcurrency and minDelayMs settings.
- Caching: An opt-in disk HTTP cache for static fetches supports ETag and Last-Modified revalidation. It caches the ScrapedPage after cleaning and Markdown conversion, and when the server answers 304 Not Modified it serves the cached body.
- Resumable Crawls: Opt-in crawlState persistence saves the crawl frontier (the queue, visited and queued URLs, errors, and max depth), so an interrupted crawl can resume without re-visiting pages.
- Data Extraction Improvements: The scraper now extracts structured metadata (canonical URL, OpenGraph, Twitter cards, and JSON-LD) into metadata.structured.
- Chunking for Agents: Opt-in chunking returns page.chunks[], where each chunk carries an approximate token count, its heading path, and a citation anchor, which is useful for RAG and tool loops.
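The robots.txt rules can be illustrated with a small evaluator. This is a minimal sketch, not AgentCrawl's code: parseRobots and isAllowed are hypothetical names, matching is plain prefix matching (no "*" or "$" wildcards), the group naming the crawler's user agent is preferred over the "*" group, and conflicts follow the longest-match rule from RFC 9309, with Allow winning ties.

```typescript
// Minimal robots.txt evaluator (sketch). Assumptions: prefix matching only,
// exact user-agent group names, and "*" as the fallback group.
interface RobotsRules {
  allow: string[];
  disallow: string[];
  crawlDelaySec?: number;
}

function parseRobots(text: string, userAgent: string): RobotsRules {
  const groups = new Map<string, RobotsRules>();
  let currentAgents: string[] = [];
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();

    if (key === "user-agent") {
      // Consecutive User-agent lines share one rule group.
      if (!lastWasAgent) currentAgents = [];
      const agent = value.toLowerCase();
      currentAgents.push(agent);
      if (!groups.has(agent)) groups.set(agent, { allow: [], disallow: [] });
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    for (const agent of currentAgents) {
      const group = groups.get(agent)!;
      if (key === "allow" && value) group.allow.push(value);
      else if (key === "disallow" && value) group.disallow.push(value); // empty Disallow = allow all
      else if (key === "crawl-delay") {
        const seconds = Number(value);
        if (Number.isFinite(seconds)) group.crawlDelaySec = seconds;
      }
    }
  }
  return groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? { allow: [], disallow: [] };
}

function isAllowed(rules: RobotsRules, path: string): boolean {
  // Length of the longest rule that prefixes the path, or -1 if none match.
  const longest = (prefixes: string[]) =>
    prefixes.filter((p) => path.startsWith(p)).reduce((max, p) => Math.max(max, p.length), -1);
  const allowLen = longest(rules.allow);
  const disallowLen = longest(rules.disallow);
  // The most specific rule wins; Allow wins a tie.
  return disallowLen === -1 || allowLen >= disallowLen;
}
```

A polite crawler would also wait crawlDelaySec seconds between requests to the host whenever that field is set.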
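Sitemap seeding amounts to reading the loc entries from /sitemap.xml and adding them to the crawl frontier. The sketch below assumes a simple urlset document and uses a regex instead of an XML parser; extractSitemapUrls is a hypothetical helper, and sitemap index files are not followed.

```typescript
// Sketch: pull <loc> URLs out of a sitemap.xml body to seed a crawl frontier.
// Assumption: a regex is adequate for simple, well-formed sitemaps; a real
// crawler would use an XML parser and recurse into <sitemapindex> entries.
function extractSitemapUrls(xml: string, origin: string): string[] {
  const urls: string[] = [];
  const locRe = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  for (const match of xml.matchAll(locRe)) {
    const raw = match[1].replace(/&amp;/g, "&"); // sitemaps must escape "&" as &amp;
    try {
      const url = new URL(raw);
      // Only seed URLs on the same origin as the sitemap.
      if (url.origin === origin) urls.push(url.href);
    } catch {
      // Skip malformed entries instead of aborting the whole seeding step.
    }
  }
  return urls;
}
```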
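To show why tracking-parameter stripping matters for de-duplication, here is a sketch of a normalizer built on the standard URL class. The parameter list (utm_*, gclid, fbclid, msclkid, mc_eid) and the trailing-slash rule are assumptions for illustration; the release notes do not say which parameters AgentCrawl removes, and normalizeUrl is a hypothetical name.

```typescript
// Sketch of URL normalization for crawl de-duplication. The tracking-parameter
// list is an illustrative assumption, not AgentCrawl's actual list.
const TRACKING_PARAMS = new Set(["gclid", "fbclid", "msclkid", "mc_eid"]);

function normalizeUrl(input: string, base?: string): string {
  const url = new URL(input, base); // resolves relative links, lowercases scheme/host, drops default port
  url.hash = ""; // fragments do not change the fetched resource

  // Drop utm_* and other known tracking parameters.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.has(lower)) url.searchParams.delete(key);
  }
  url.searchParams.sort(); // stable order, so ?b=2&a=1 and ?a=1&b=2 collapse
  if ([...url.searchParams].length === 0) url.search = ""; // no dangling "?"

  // Treat "/path/" and "/path" as one page (assumption; not safe on every site).
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.href;
}
```

Canonical normalization would go one step further and replace the fetched URL with the page's declared canonical link once the HTML is available.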
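Per-host throttling with perHostConcurrency and minDelayMs can be sketched as a small scheduler that caps in-flight requests per host and spaces out request starts. Only the two option names come from the release notes; HostThrottle and its run method are hypothetical and do not describe AgentCrawl's internals.

```typescript
// Sketch of per-host throttling. HostThrottle is a hypothetical illustration.
interface ThrottleOptions {
  perHostConcurrency: number; // max in-flight requests per host
  minDelayMs: number; // minimum gap between request starts on one host
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class HostThrottle {
  private active = new Map<string, number>(); // in-flight count per host
  private nextStart = new Map<string, number>(); // earliest allowed start per host
  private waiters = new Map<string, Array<() => void>>();

  constructor(private readonly opts: ThrottleOptions) {}

  async run<T>(url: string, task: () => Promise<T>): Promise<T> {
    const host = new URL(url).host;
    await this.acquire(host);
    try {
      return await task();
    } finally {
      this.release(host);
    }
  }

  private async acquire(host: string): Promise<void> {
    // Wait until this host has a free concurrency slot.
    while ((this.active.get(host) ?? 0) >= this.opts.perHostConcurrency) {
      await new Promise<void>((resolve) => {
        const queue = this.waiters.get(host) ?? [];
        queue.push(resolve);
        this.waiters.set(host, queue);
      });
    }
    this.active.set(host, (this.active.get(host) ?? 0) + 1);
    // Reserve a start time synchronously so concurrent callers are spaced apart.
    const now = Date.now();
    const start = Math.max(now, this.nextStart.get(host) ?? 0);
    this.nextStart.set(host, start + this.opts.minDelayMs);
    if (start > now) await sleep(start - now);
  }

  private release(host: string): void {
    this.active.set(host, (this.active.get(host) ?? 1) - 1);
    this.waiters.get(host)?.shift()?.(); // wake one waiter to re-check for a slot
  }
}
```

Reserving the next start time before sleeping is the key design choice: without it, two callers admitted together would read the same timestamp and start at the same moment.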
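ETag/Last-Modified caching works by sending the stored validators back as If-None-Match and If-Modified-Since and reusing the stored body when the server replies 304. In this sketch a Map stands in for the disk store, the fetcher is injected so the logic runs without a network, and a raw string replaces the processed ScrapedPage that AgentCrawl reportedly caches; ConditionalCache and Fetcher are hypothetical names.

```typescript
// Sketch of conditional revalidation with ETag / Last-Modified.
// Assumption: an in-memory Map stands in for the on-disk cache.
interface CachedEntry {
  body: string;
  etag?: string;
  lastModified?: string;
}

interface FetchResult {
  status: number;
  body: string;
  headers: Record<string, string | undefined>; // lower-case header names
}

type Fetcher = (url: string, headers: Record<string, string>) => Promise<FetchResult>;

class ConditionalCache {
  private store = new Map<string, CachedEntry>();

  constructor(private readonly fetcher: Fetcher) {}

  async get(url: string): Promise<{ body: string; fromCache: boolean }> {
    const cached = this.store.get(url);
    // Send stored validators so the server can answer 304 Not Modified.
    const headers: Record<string, string> = {};
    if (cached?.etag) headers["if-none-match"] = cached.etag;
    if (cached?.lastModified) headers["if-modified-since"] = cached.lastModified;

    const res = await this.fetcher(url, headers);
    if (res.status === 304 && cached) {
      // Not modified: reuse the stored body without re-processing it.
      return { body: cached.body, fromCache: true };
    }
    if (res.status === 200) {
      this.store.set(url, {
        body: res.body,
        etag: res.headers["etag"],
        lastModified: res.headers["last-modified"],
      });
    }
    return { body: res.body, fromCache: false };
  }
}
```

On Node 18 or later, a real Fetcher could wrap the global fetch, pass the headers through, and read the validators with res.headers.get("etag") and res.headers.get("last-modified").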
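Resumable crawling depends on the frontier being serializable. The sketch below models the fields the release notes list (queue, visited and queued URLs, errors, max depth) as plain data and round-trips them through JSON; the CrawlState shape and the helper names are hypothetical, not AgentCrawl's crawlState format.

```typescript
// Sketch of crawl-frontier persistence for resumable crawls (hypothetical shape).
interface QueueItem {
  url: string;
  depth: number;
}

interface CrawlState {
  queue: QueueItem[];
  visited: Set<string>;
  queued: Set<string>; // everything ever enqueued, to avoid duplicate enqueues
  errors: Array<{ url: string; message: string }>;
  maxDepth: number;
}

// Sets are not JSON-serializable, so they are stored as arrays.
function serializeState(s: CrawlState): string {
  return JSON.stringify({ ...s, visited: [...s.visited], queued: [...s.queued] });
}

function restoreState(json: string): CrawlState {
  const p = JSON.parse(json);
  return {
    queue: p.queue,
    visited: new Set<string>(p.visited),
    queued: new Set<string>(p.queued),
    errors: p.errors,
    maxDepth: p.maxDepth,
  };
}

function enqueue(s: CrawlState, url: string, depth: number): void {
  if (depth > s.maxDepth || s.queued.has(url) || s.visited.has(url)) return;
  s.queued.add(url);
  s.queue.push({ url, depth });
}

// Pops the next unvisited URL and marks it visited; returns undefined when done.
function next(s: CrawlState): QueueItem | undefined {
  while (s.queue.length > 0) {
    const item = s.queue.shift()!;
    if (!s.visited.has(item.url)) {
      s.visited.add(item.url);
      return item;
    }
  }
  return undefined;
}
```

Because this sketch marks a URL visited when it is dequeued, the state is best saved only after a page finishes; a snapshot taken at that point still holds the in-flight URL in its queue, so a crash mid-fetch does not skip it on resume.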
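The structured-metadata fields can be pulled from raw HTML roughly as follows. This is a dependency-free sketch using regexes, which a production scraper would replace with an HTML parser; extractStructured and its output shape are hypothetical and need not match metadata.structured.

```typescript
// Sketch of canonical / OpenGraph / Twitter card / JSON-LD extraction.
interface StructuredMetadata {
  canonical?: string;
  openGraph: Record<string, string>;
  twitter: Record<string, string>;
  jsonLd: unknown[];
}

// Reads one attribute from a tag string, accepting single or double quotes.
function attr(tag: string, name: string): string | undefined {
  const m = tag.match(new RegExp(`\\b${name}\\s*=\\s*("([^"]*)"|'([^']*)')`, "i"));
  return m ? m[2] ?? m[3] : undefined;
}

function extractStructured(html: string): StructuredMetadata {
  const out: StructuredMetadata = { openGraph: {}, twitter: {}, jsonLd: [] };

  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    if (attr(tag, "rel")?.toLowerCase() === "canonical") out.canonical = attr(tag, "href");
  }

  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    // OpenGraph uses property="og:*"; Twitter cards usually use name="twitter:*".
    const key = attr(tag, "property") ?? attr(tag, "name");
    const content = attr(tag, "content");
    if (!key || content === undefined) continue;
    if (key.startsWith("og:")) out.openGraph[key.slice(3)] = content;
    else if (key.startsWith("twitter:")) out.twitter[key.slice(8)] = content;
  }

  const ldRe = /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const m of html.matchAll(ldRe)) {
    try {
      out.jsonLd.push(JSON.parse(m[1]));
    } catch {
      // Ignore malformed JSON-LD blocks rather than failing the whole page.
    }
  }
  return out;
}
```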
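Heading-aware chunking of the kind described for page.chunks[] can be sketched over Markdown: each section starts a new chunk, paragraphs are packed up to a token budget, and every chunk records its heading path and an anchor. The roughly-4-characters-per-token estimate and the url#slug-index anchor format are assumptions, chunkMarkdown is a hypothetical name, and fenced code blocks are not special-cased.

```typescript
// Sketch of heading-aware Markdown chunking for RAG (hypothetical format).
interface Chunk {
  text: string;
  approxTokens: number;
  headingPath: string[];
  anchor: string;
}

const estimateTokens = (s: string) => Math.ceil(s.length / 4); // rough heuristic
const slugify = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

function chunkMarkdown(url: string, markdown: string, maxTokens = 200): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: Array<{ level: number; title: string }> = []; // open headings
  let paragraphs: string[] = []; // paragraphs waiting to become a chunk
  let lines: string[] = []; // lines of the paragraph being read

  const flushChunk = () => {
    const text = paragraphs.join("\n\n");
    paragraphs = [];
    if (!text) return;
    const headingPath = stack.map((h) => h.title);
    const slug = slugify(headingPath[headingPath.length - 1] ?? "") || "top";
    chunks.push({ text, approxTokens: estimateTokens(text), headingPath, anchor: `${url}#${slug}-${chunks.length}` });
  };

  const endParagraph = () => {
    const p = lines.join("\n").trim();
    lines = [];
    if (!p) return;
    // Start a new chunk if this paragraph would exceed the budget
    // (a single oversized paragraph still becomes one chunk).
    const current = estimateTokens(paragraphs.join("\n\n"));
    if (paragraphs.length > 0 && current + estimateTokens(p) > maxTokens) flushChunk();
    paragraphs.push(p);
  };

  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^(#{1,6})\s+(.*\S)\s*$/);
    if (heading) {
      endParagraph();
      flushChunk(); // each section starts a fresh chunk with its own heading path
      const level = heading[1].length;
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, title: heading[2] });
    } else if (line.trim() === "") {
      endParagraph();
    } else {
      lines.push(line);
    }
  }
  endParagraph();
  flushChunk();
  return chunks;
}
```

Keeping the full heading path on each chunk lets an agent cite "Guide > Install" instead of a bare paragraph, which is the main reason to chunk by section rather than by fixed length.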
Who It's For
This update is aimed at developers whose AI agents need efficient, structured web scraping.
📖 Read the full source: r/LocalLLaMA