WebClaw: Open-Source MCP Server for Web Extraction with Claude

By OpenClawRadar. Published: March 23, 2026

WebClaw is an MCP server built in Rust that adds web extraction capabilities to Claude Desktop and Claude Code. It addresses the problem where Claude's built-in web_fetch gets blocked on most real websites, returning 403 Forbidden errors, Cloudflare challenges, or empty responses.

Technical Solution

The server's HTTP client impersonates Chrome's TLS fingerprint, so websites see what looks like a real Chrome browser instead of a bot. In the author's test against 10 popular sites, Claude's built-in web_fetch failed on all 10, while WebClaw extracted content from 9 of them.

Features

  • scrape: Extract clean content from any URL
  • crawl: Recursive site crawling
  • extract: Structured data extraction using JSON schema or natural language prompts
  • summarize: Page summaries
  • brand: Extract colors, fonts, logos from any site
  • diff: Track content changes
  • map, batch, search, research tools

Claude Code Development

The extraction pipeline was implemented with Claude Code, including:

  • Scoring algorithm based on text density, semantic tags, and link ratio penalties
  • Noise filter that strips navigation, ads, and cookie banners without false positives on Tailwind classes
  • Multiple rounds of refinement for edge cases
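
The scoring idea in the list above can be illustrated with a small sketch. This is not WebClaw's actual code (which is written in Rust and not shown in the source); the Block type, the tag weights, and the multipliers below are all assumptions chosen only to show how text density, semantic tags, and a link ratio penalty might combine into one score.

```python
from dataclasses import dataclass

# Hypothetical tag weights; WebClaw's real values are not given in the source.
SEMANTIC_BONUS = {"article": 25.0, "main": 20.0, "section": 5.0}
SEMANTIC_PENALTY = {"nav": -25.0, "aside": -15.0, "footer": -20.0, "header": -10.0}


@dataclass
class Block:
    tag: str           # element name, e.g. "article" or "div"
    text_chars: int    # visible text characters inside the block
    markup_chars: int  # total characters including tags
    link_chars: int    # text characters that sit inside <a> elements


def text_density(block: Block) -> float:
    """Share of the block's characters that are visible text."""
    if block.markup_chars == 0:
        return 0.0
    return block.text_chars / block.markup_chars


def link_ratio(block: Block) -> float:
    """Share of the visible text that is link text."""
    if block.text_chars == 0:
        return 0.0
    return block.link_chars / block.text_chars


def score(block: Block) -> float:
    """Combine the three signals; the multipliers are assumed, not sourced."""
    base = text_density(block) * 100.0
    tag_adjust = SEMANTIC_BONUS.get(block.tag, 0.0) + SEMANTIC_PENALTY.get(block.tag, 0.0)
    penalty = link_ratio(block) * 50.0
    return base + tag_adjust - penalty


def pick_main(blocks: list[Block]) -> Block:
    """Return the highest-scoring block as the likely main content."""
    return max(blocks, key=score)


if __name__ == "__main__":
    article = Block("article", text_chars=800, markup_chars=1000, link_chars=40)
    nav = Block("nav", text_chars=200, markup_chars=1000, link_chars=190)
    print(pick_main([nav, article]).tag)
```

In this sketch a dense article body with few links scores well above a navigation bar that is mostly link text, which is the behavior the three signals are meant to produce. A real pipeline would also need the edge-case handling the list mentions.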

Setup and Usage

Setup requires one command:

npx create-webclaw

The tool detects Claude Desktop and Claude Code automatically and writes the configuration. No API key is needed for 8 of the 10 tools, and everything runs locally.

Performance Benefits

The output is optimized for Claude's context window. A typical news article goes from 4,820 tokens as raw HTML to 1,590 tokens in WebClaw's LLM format, a 67% reduction that preserves the same content.
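
As a quick check of the arithmetic, the reduction can be computed from the two token counts quoted above. The function name is illustrative only and is not part of WebClaw.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


raw_html_tokens = 4820    # figure quoted in the article
llm_format_tokens = 1590  # figure quoted in the article

print(f"{reduction_percent(raw_html_tokens, llm_format_tokens):.1f}%")  # prints "67.0%"
```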

WebClaw is free and open source under the MIT license, available at https://github.com/0xMassi/webclaw.

📖 Read the full source: r/ClaudeAI
