Kreuzberg v4.7.0 adds code intelligence for 248 languages and improved markdown extraction

✍️ OpenClawRadar · 📅 Published: April 14, 2026

Kreuzberg v4.7.0 is now available. This is a Rust-core document intelligence library that works with Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, R, C, and WASM.

Code Intelligence and Extraction

The main highlight is code intelligence and extraction. Kreuzberg now supports 248 languages through the tree-sitter-language-pack library. Agents can use this parsing either by embedding Kreuzberg directly as a library or through MCP, which lets them work with code repositories, review pull requests, index codebases, and analyze source files.

Kreuzberg extracts at the AST level:

  • Functions
  • Classes
  • Imports
  • Exports
  • Symbols
  • Docstrings

Code chunking also respects scope boundaries.
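
To make "AST-level extraction" and "scope-aware chunking" concrete, here is a minimal sketch that uses only Python's standard `ast` module. It is not Kreuzberg's API and does not use tree-sitter; the function names `extract_structure` and `chunk_by_scope` and the sample `SOURCE` are hypothetical, chosen only to illustrate the idea on Python source.

```python
import ast

# Hypothetical sample input, used only for illustration.
SOURCE = '''
import os
from pathlib import Path

class Loader:
    """Load files from disk."""

    def read(self, name):
        """Return the text of a file."""
        return Path(name).read_text()

def main():
    print(os.getcwd())
'''


def extract_structure(source: str) -> dict:
    """Collect imports, classes, functions, and docstrings from Python source."""
    tree = ast.parse(source)
    result = {"imports": [], "classes": [], "functions": [], "docstrings": {}}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            result["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            result["imports"].append(node.module or "")
        elif isinstance(node, ast.ClassDef):
            result["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result["functions"].append(node.name)
        # Classes and functions may carry a docstring as their first statement.
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                result["docstrings"][node.name] = doc
    return result


def chunk_by_scope(source: str) -> list[str]:
    """Return one chunk per top-level statement, so no chunk splits a scope."""
    tree = ast.parse(source)
    lines = source.splitlines()
    # lineno and end_lineno are 1-based and inclusive.
    return ["\n".join(lines[node.lineno - 1 : node.end_lineno]) for node in tree.body]
```

Calling `chunk_by_scope(SOURCE)` yields four chunks (two imports, the whole `Loader` class, and `main`), in contrast to fixed-size chunking, which could cut the class in the middle of a method.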

Markdown Quality Improvements

Poor document extraction causes problems further down the pipeline. The team built a benchmark harness that scores Structural F1 (SF1) and Text F1 across more than 350 documents and 23 formats, then optimized against it.

Specific improvements:

  • LaTeX SF1: from 0% to 100%
  • XLSX SF1: from 30% to 100%
  • PDF table SF1: from 15.5% to 53.7%

All 23 formats now score above 80% SF1. The output that downstream pipelines receive is now structurally correct by default.
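
The article does not say how the benchmark defines Structural F1 or Text F1. As a rough sketch of how such scores could be computed, the example below assumes that structure is compared as a multiset of coarse block labels and text as a multiset of whitespace-separated tokens. The labeling scheme in `block_labels` and both function names are hypothetical, not the project's harness.

```python
from collections import Counter


def f1_score(expected: list[str], extracted: list[str]) -> float:
    """F1 over two multisets of items (structural labels or text tokens)."""
    if not expected and not extracted:
        return 1.0
    # Multiset intersection counts each item at most as often as it appears in both.
    overlap = sum((Counter(expected) & Counter(extracted)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(extracted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)


def block_labels(markdown: str) -> list[str]:
    """Map each non-empty line to a coarse structural label (assumed scheme)."""
    labels = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            labels.append(f"h{level}")
        elif stripped.startswith(("- ", "* ")):
            labels.append("li")
        elif stripped.startswith("|"):
            labels.append("table_row")
        else:
            labels.append("p")
    return labels


# Ground truth has a heading; the extraction lost it and emitted a paragraph.
expected = "# Title\n\nIntro text.\n\n- one\n- two\n"
extracted = "Title\n\nIntro text.\n\n- one\n- two\n"

structural_f1 = f1_score(block_labels(expected), block_labels(extracted))
text_f1 = f1_score(expected.replace("#", "").split(), extracted.split())
```

In this toy case the text is fully recovered (`text_f1` is 1.0) while the lost heading lowers `structural_f1` to 0.75, which shows why the two scores are tracked separately.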


Other Key Features

  • New markdown rendering layer and new HTML output support
  • OpenWebUI integration as a document extraction backend
  • Options for docling-serve compatibility or direct connection
  • Unified architecture where every extractor creates a standard typed document representation
  • TOON wire format: a compact document encoding that reduces LLM prompt token usage by 30 to 50%
  • Semantic chunk labeling
  • JSON output
  • Strict configuration validation
  • Improved security
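
The article gives no details of the TOON encoding itself. To illustrate why a compact wire format can shrink prompts, the sketch below compares JSON with a simplified tabular encoding that states field names once in a header instead of repeating them per record. It is neither the TOON specification nor Kreuzberg's API; `encode_compact` and the sample `chunks` data are hypothetical, and character count is only a rough proxy for token count.

```python
import json


def encode_compact(name: str, rows: list[dict]) -> str:
    """Encode uniform dicts as one header line plus one comma-separated line per row.

    Simplification: assumes every row has the same keys and that values
    contain no commas or newlines (no escaping is performed).
    """
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


# Hypothetical chunk metadata, used only for the size comparison.
chunks = [
    {"id": 1, "label": "heading", "page": 1},
    {"id": 2, "label": "paragraph", "page": 1},
    {"id": 3, "label": "table", "page": 2},
]

as_json = json.dumps({"chunks": chunks})
as_compact = encode_compact("chunks", chunks)

# Fraction of characters saved relative to JSON.
saving = 1 - len(as_compact) / len(as_json)
print(as_compact)
print(f"JSON: {len(as_json)} chars, compact: {len(as_compact)} chars")
```

The saving comes from writing the keys `id`, `label`, and `page` once rather than in every record, so it grows with the number of uniform rows.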

Availability

Kreuzberg is available on GitHub: https://github.com/kreuzberg-dev/kreuzberg

Kreuzberg Cloud, a hosted version for teams that want the same extraction quality without managing infrastructure, is coming soon. More information: https://kreuzberg.dev

Contributions are welcome.

📖 Read the full source: r/LocalLLaMA
