AgentCrawl Update Adds Critical Crawler Features and Enhancements

✍️ OpenClawRadar · 📅 Published: February 13, 2026

The latest AgentCrawl update extends the TypeScript scraper/crawler with several features aimed at developers building AI agents. The release focuses on production readiness: crawler correctness and politeness, caching, resumable crawls, and richer data extraction.

Key Details

  • Removed Tool Adapters: The update eliminates the tool adapters for the agents SDK and Vercel AI SDK, allowing users to define their tools independently.
  • Updated Libraries: The package now includes the latest version of Zod for better data validation.
  • Crawler Correctness: Robots.txt compliance is now opt-in and supports Disallow/Allow and Crawl-delay directives. Opt-in sitemap seeding from /sitemap.xml is also available.
  • URL Normalization: URL normalization now strips tracking parameters more thoroughly and can also normalize to the canonical URL.
  • Throttling Options: The crawler supports per-host throttling with configurable perHostConcurrency and minDelayMs.
  • Caching: An opt-in disk HTTP cache for static fetches supports ETag and Last-Modified validation. The cache stores the ScrapedPage after cleaning and markdown conversion, and a 304 response from the server is answered with the cached body.
  • Resumable Crawls: A new opt-in crawlState persistence saves the crawl's frontier, including the queue, visited pages, queued items, errors, and max depth, which allows for resumable crawls without re-visiting pages.
  • Data Extraction Improvements: The scraper now supports structured metadata extraction, including Canonical URL, OpenGraph, Twitter cards, and JSON-LD, kept in metadata.structured.
  • Chunking for Agents: Opt-in chunking functionality returns page.chunks[] with an approximate token size, heading path, and citation anchor, which is beneficial for RAG/tool loops.
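
To make the robots.txt bullet concrete, here is a minimal sketch of Allow/Disallow matching with a Crawl-delay value. It is not AgentCrawl's implementation: the names `parseRobots`, `isAllowed`, and `RobotsRules` are hypothetical, only the `*` user-agent group is read, and wildcard patterns (`*`, `$`) are not handled. The longest matching rule wins, with Allow winning ties.

```typescript
// Hypothetical shape; AgentCrawl's internal types are not described in the source.
interface RobotsRules {
  allow: string[];
  disallow: string[];
  crawlDelaySec?: number;
}

// Collects rules from every group that names the "*" user agent.
function parseRobots(text: string): RobotsRules {
  const rules: RobotsRules = { allow: [], disallow: [] };
  let agents: string[] = [];
  let inRules = false; // true once the current group has started listing rules

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A User-agent line after rules starts a new group.
      if (inRules) {
        agents = [];
        inRules = false;
      }
      agents.push(value.toLowerCase());
      continue;
    }
    if (field !== "allow" && field !== "disallow" && field !== "crawl-delay") continue;
    inRules = true;
    if (agents.indexOf("*") < 0) continue; // rule belongs to another agent

    if (field === "allow" && value !== "") rules.allow.push(value);
    else if (field === "disallow" && value !== "") rules.disallow.push(value);
    else if (field === "crawl-delay" && value !== "") {
      const seconds = Number(value);
      if (Number.isFinite(seconds) && seconds >= 0) rules.crawlDelaySec = seconds;
    }
  }
  return rules;
}

// Longest matching prefix decides; an Allow of equal length beats a Disallow.
function isAllowed(rules: RobotsRules, path: string): boolean {
  const longest = (prefixes: string[]): number =>
    prefixes.reduce((max, p) => (path.startsWith(p) ? Math.max(max, p.length) : max), -1);
  const allowLen = longest(rules.allow);
  const disallowLen = longest(rules.disallow);
  return disallowLen < 0 || allowLen >= disallowLen;
}
```

A crawler would typically fetch `/robots.txt` once per host, call `parseRobots` on the body, check `isAllowed` before queueing each path, and treat `crawlDelaySec` as a lower bound on the delay between requests to that host.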
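
The URL normalization bullet can be illustrated with a short sketch built on the standard `URL` class. The tracking-parameter list and the function name `normalizeUrl` are assumptions for illustration; the source does not say which parameters AgentCrawl strips, and canonical normalization from a page's `<link rel="canonical">` is not covered here.

```typescript
// Hypothetical list; the parameters AgentCrawl actually strips are not given in the source.
const TRACKING_PARAMS = ["fbclid", "gclid", "msclkid", "mc_cid", "mc_eid"];

function isTrackingParam(name: string): boolean {
  const lower = name.toLowerCase();
  return lower.startsWith("utm_") || TRACKING_PARAMS.indexOf(lower) >= 0;
}

function normalizeUrl(raw: string): string {
  // The URL parser lowercases scheme and host and drops default ports.
  const url = new URL(raw);
  url.hash = ""; // fragments are never sent to the server

  // Keep only non-tracking parameters.
  const kept: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => {
    if (!isTrackingParam(key)) kept.push([key, value]);
  });

  // Sort by key so that parameter order does not create duplicate URLs.
  kept.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));

  const params = new URLSearchParams();
  for (const pair of kept) params.append(pair[0], pair[1]);
  url.search = params.toString();
  return url.toString();
}
```

Using the normalized string as the key of the visited set means `https://example.com/a?b=2&a=1&utm_source=x` and `https://example.com/a?a=1&b=2` count as the same page.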
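
Per-host throttling can be sketched as a small gate that a crawler consults before each request. Only the option names `perHostConcurrency` and `minDelayMs` come from the release notes; the `HostThrottle` class and its methods are hypothetical and say nothing about how AgentCrawl schedules requests internally.

```typescript
interface ThrottleOptions {
  perHostConcurrency: number; // max simultaneous requests to one host
  minDelayMs: number; // minimum gap between request starts on one host
}

interface HostState {
  inFlight: number;
  lastStart: number;
}

// Hypothetical helper class, not part of AgentCrawl's documented API.
class HostThrottle {
  private hosts = new Map<string, HostState>();
  private opts: ThrottleOptions;

  constructor(opts: ThrottleOptions) {
    this.opts = opts;
  }

  // Returns true and records the start if a request to `url` may begin at `now`.
  tryAcquire(url: string, now: number = Date.now()): boolean {
    const host = new URL(url).host;
    const state = this.hosts.get(host) ?? { inFlight: 0, lastStart: -Infinity };
    if (state.inFlight >= this.opts.perHostConcurrency) return false;
    if (now - state.lastStart < this.opts.minDelayMs) return false;
    state.inFlight += 1;
    state.lastStart = now;
    this.hosts.set(host, state);
    return true;
  }

  // Must be called when the request finishes, whether it succeeded or failed.
  release(url: string): void {
    const state = this.hosts.get(new URL(url).host);
    if (state && state.inFlight > 0) state.inFlight -= 1;
  }
}
```

Passing `now` explicitly keeps the gate deterministic in tests. A real scheduler would requeue or sleep when `tryAcquire` returns false rather than drop the URL.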
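
The ETag and Last-Modified handling described in the caching bullet follows the usual HTTP revalidation pattern. The sketch below uses an in-memory `Map` instead of a disk cache to stay short, and the names `CacheEntry`, `conditionalHeaders`, and `applyResponse` are assumptions, not AgentCrawl identifiers.

```typescript
interface CacheEntry {
  body: string;
  etag?: string;
  lastModified?: string;
}

// Simplified response shape for the sketch; header names are assumed lowercase.
interface FetchedResponse {
  status: number;
  body: string;
  headers: Record<string, string>;
}

// Builds the validators to send with a request when a cached copy exists.
function conditionalHeaders(entry: CacheEntry | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry?.etag) headers["If-None-Match"] = entry.etag;
  if (entry?.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Returns the body to use and refreshes the cache on a 200 that carries validators.
function applyResponse(
  cache: Map<string, CacheEntry>,
  url: string,
  res: FetchedResponse
): string {
  const cached = cache.get(url);
  if (res.status === 304 && cached) {
    return cached.body; // server confirmed the cached copy is still current
  }
  if (res.status === 200) {
    const etag = res.headers["etag"];
    const lastModified = res.headers["last-modified"];
    if (etag || lastModified) {
      cache.set(url, { body: res.body, etag, lastModified });
    }
  }
  return res.body;
}
```

In use, the crawler would merge `conditionalHeaders(cache.get(url))` into the outgoing request and pass the server's reply through `applyResponse`, so an unchanged page costs a 304 with no body instead of a full download.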
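
For the chunking bullet, the following sketch splits markdown at headings and attaches a heading path, a rough token estimate, and an anchor to each chunk. The field names, the four-characters-per-token heuristic, and the slug format are all assumptions; the release notes only say that `page.chunks[]` carries an approximate token size, heading path, and citation anchor.

```typescript
// Hypothetical chunk shape; AgentCrawl's actual field names are not given in the source.
interface Chunk {
  text: string;
  headingPath: string[];
  approxTokens: number;
  anchor: string;
}

function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: Array<{ level: number; title: string }> = []; // open headings, outermost first
  let buffer: string[] = [];

  // Emits the buffered body text as one chunk under the current heading path.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    buffer = [];
    if (text.length === 0) return;
    const titles = stack.map((h) => h.title);
    const last = titles.length > 0 ? titles[titles.length - 1] : "";
    chunks.push({
      text,
      headingPath: titles,
      approxTokens: Math.ceil(text.length / 4), // rough heuristic, not a real tokenizer
      anchor: last ? "#" + slugify(last) : "",
    });
  };

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      const level = match[1].length;
      // Close headings at the same or a deeper level before opening this one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, title: match[2].trim() });
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

This version treats any line starting with `#` as a heading, so comment lines inside fenced code would be misread, and it never splits an oversized section. Both would need handling before feeding chunks to a RAG index.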

Who It's For

This update is aimed at developers whose AI agents need efficient, structured web scraping.

📖 Read the full source: r/LocalLLaMA
