Kreuzberg v4.7.0 adds code intelligence for 248 languages and improved markdown extraction

Kreuzberg v4.7.0 is now available. Kreuzberg is a document intelligence library with a Rust core and bindings for Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, R, C, and WASM.
Code Intelligence and Extraction
The main highlight is code intelligence and extraction. Kreuzberg now parses source code in 248 languages through the tree-sitter-language-pack library. Agents can use it either directly as a library or via MCP to work with code repositories, review pull requests, index codebases, and analyze source files.
Kreuzberg extracts at the AST level:
- Functions
- Classes
- Imports
- Exports
- Symbols
- Docstrings
Code chunking respects scope boundaries, so a chunk does not cut through the middle of a function or class.
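
To make AST-level extraction and scope-aware chunking concrete, here is a minimal sketch that uses Python's standard `ast` module on a Python snippet. It is not Kreuzberg's API and does not use tree-sitter; the function names `extract_structure` and `chunk_by_scope` and the sample `SOURCE` are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical sample input, used only for this illustration.
SOURCE = '''
import os
from pathlib import Path

class Loader:
    """Loads files."""

    def read(self, path):
        """Read a file."""
        return Path(path).read_text()

def main():
    print(os.getcwd())
'''


def extract_structure(source: str) -> dict:
    """Collect imports, function names, class names, and docstrings from the AST."""
    tree = ast.parse(source)
    info = {"imports": [], "functions": [], "classes": [], "docstrings": {}}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            info["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            info["imports"].append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            key = "classes" if isinstance(node, ast.ClassDef) else "functions"
            info[key].append(node.name)
            doc = ast.get_docstring(node)
            if doc:
                info["docstrings"][node.name] = doc
    return info


def chunk_by_scope(source: str) -> list:
    """Return one chunk per top-level function or class, never splitting a scope."""
    tree = ast.parse(source)
    scopes = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, scopes)
    ]


info = extract_structure(SOURCE)
chunks = chunk_by_scope(SOURCE)
print(info["classes"], sorted(info["functions"]), len(chunks))
```

Each chunk here is the full text of one top-level definition, which is the property "respects scope boundaries" describes: a retrieval index built from these chunks never stores half a function.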
Markdown Quality Improvements
Poor document extraction causes problems further down the pipeline. The team built a benchmark harness that scores output with Structural F1 (SF1) and Text F1 across more than 350 documents and 23 formats, then optimized the extractors against those scores.
Specific improvements:
- LaTeX: SF1 improved from 0% to 100%
- XLSX: SF1 improved from 30% to 100%
- PDF tables: SF1 improved from 15.5% to 53.7%
All 23 formats now score above 80% SF1, so the output that downstream pipelines receive is structurally correct by default.
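
The source does not define how Structural F1 and Text F1 are computed, so the following is only a sketch under stated assumptions: structure is taken to be the multiset of Markdown block types (headings, list items, table rows), text is taken to be a bag of lowercase word tokens, and both are scored with the standard F1 formula. The helper names `f1`, `structural_elements`, and `text_tokens` are hypothetical.

```python
import re
from collections import Counter


def f1(predicted: Counter, reference: Counter) -> float:
    """Standard F1 over two multisets: harmonic mean of precision and recall."""
    overlap = sum((predicted & reference).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted.values())
    recall = overlap / sum(reference.values())
    return 2 * precision * recall / (precision + recall)


def structural_elements(markdown: str) -> Counter:
    """Assumed notion of structure: count headings, list items, and table rows."""
    elements = Counter()
    for line in markdown.splitlines():
        stripped = line.strip()
        heading = re.match(r"(#{1,6})\s", stripped)
        if heading:
            elements["h%d" % len(heading.group(1))] += 1
        elif re.match(r"([-*+]|\d+\.)\s", stripped):
            elements["list_item"] += 1
        elif stripped.startswith("|"):
            elements["table_row"] += 1
    return elements


def text_tokens(markdown: str) -> Counter:
    """Assumed notion of text: a bag of lowercase word tokens."""
    return Counter(re.findall(r"\w+", markdown.lower()))


# Hypothetical ground truth and an extraction that lost the list markup.
reference = "# Title\n\n- a\n- b\n\n| x | y |\n"
extracted = "# Title\n\na\nb\n\n| x | y |\n"

sf1 = f1(structural_elements(extracted), structural_elements(reference))
tf1 = f1(text_tokens(extracted), text_tokens(reference))
print(round(sf1, 3), tf1)
```

In this example every word survives, so Text F1 is perfect, while the lost list items pull the structural score down. That gap is the reason to track the two metrics separately.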
Other Key Features
- New markdown rendering layer and new HTML output support
- OpenWebUI integration as a document extraction backend, with a choice of docling-serve compatibility or a direct connection
- Unified architecture where every extractor creates a standard typed document representation
- TOON wire format - a compact document encoding that reduces LLM prompt token usage by 30 to 50%
- Semantic chunk labeling
- JSON output
- Strict configuration validation
- Improved security
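
The source does not describe the TOON layout, so the sketch below is not the TOON specification. It only illustrates the general idea behind compact encodings of uniform records: declare the field names once instead of repeating them as JSON keys in every object. The function `encode_tabular`, its output layout, and the sample `chunks` data are all hypothetical, and the comparison counts characters rather than tokens.

```python
import json


def encode_tabular(name: str, rows: list) -> str:
    """Hypothetical compact encoding: one header with field names, then one line per row.

    Assumes every row has the same keys and no value contains a comma or newline.
    """
    fields = list(rows[0].keys())
    header = "%s[%d]{%s}:" % (name, len(rows), ",".join(fields))
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(str(row[field]) for field in fields))
    return "\n".join(lines)


# Hypothetical labeled chunks, standing in for extracted document elements.
chunks = [
    {"id": 1, "label": "heading", "text": "Introduction"},
    {"id": 2, "label": "paragraph", "text": "Kreuzberg extracts documents"},
    {"id": 3, "label": "table", "text": "Benchmark results"},
]

as_json = json.dumps({"chunks": chunks})
as_tabular = encode_tabular("chunks", chunks)

print(as_tabular)
print(len(as_json), len(as_tabular))
```

The saving grows with the number of rows, because the per-row cost of repeated keys, quotes, and braces disappears. Actual token savings depend on the tokenizer and the data.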
Availability
Kreuzberg is available on GitHub: https://github.com/kreuzberg-dev/kreuzberg
Kreuzberg Cloud, a hosted version for teams that want the same extraction quality without managing infrastructure, will be out soon. More information: https://kreuzberg.dev
Contributions are welcome.
📖 Read the full source: r/LocalLLaMA