LiteParse: Fast Open-Source Document Parser for AI Agents

LiteParse is an open-source document parser focused on fast, local parsing with spatial text extraction and bounding boxes. It runs entirely locally without cloud dependencies or GPU requirements, processing hundreds of pages in seconds.
Key Features
- Apache 2.0 licensed open-source tool
- Spatial text parsing with bounding boxes for precise text positioning
- No dependency on local or frontier VLMs (Vision Language Models)
- Runs on any machine without GPU requirements
- Supports multiple file formats: PDFs, Office documents, images
- Reported by the project to be more accurate than similar tools such as PyPDF, PyMuPDF, and MarkItDown
- One-line installation as a skill for 40+ AI agents including Claude Code, Cursor, OpenClaw, Windsurf
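To illustrate why bounding boxes matter for spatial text parsing, here is a minimal sketch of how positioned text items could be regrouped into reading-order lines. The `TextItem` shape, the `groupIntoLines` helper, and the tolerance value are hypothetical and chosen for illustration; they are not LiteParse's actual output schema or algorithm.

```typescript
// Hypothetical item shape for illustration only (not LiteParse's schema).
interface TextItem {
  text: string;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Group items into lines: an item joins the current line when its vertical
// center lies within `tolerance` units of the center of that line's first item.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: { center: number; items: TextItem[] }[] = [];

  for (const item of sorted) {
    const center = item.y + item.height / 2;
    const last = lines[lines.length - 1];
    if (last !== undefined && Math.abs(last.center - center) <= tolerance) {
      last.items.push(item);
    } else {
      lines.push({ center, items: [item] });
    }
  }

  // Within each line, order items left to right and join them with spaces.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((i) => i.text)
      .join(" ")
  );
}

const sample: TextItem[] = [
  { text: "World", x: 60, y: 10, width: 40, height: 12 },
  { text: "Hello", x: 10, y: 10.5, width: 40, height: 12 },
  { text: "Second line", x: 10, y: 30, width: 80, height: 12 },
];

console.log(groupIntoLines(sample)); // [ 'Hello World', 'Second line' ]
```

Without coordinates, "World" and "Hello" would come out in extraction order; with them, the slightly misaligned items can still be placed on one line in the right sequence.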
Installation Options
CLI Tool Installation:
npm i -g @llamaindex/liteparse
Then use:
lit parse document.pdf
lit screenshot document.pdf
For macOS and Linux via Homebrew:
brew tap run-llama/liteparse
brew install llamaindex-liteparse
Agent Skill Installation:
npx skills add run-llama/llamaparse-agent-skills --skill liteparse
Usage Examples
Basic parsing:
lit parse document.pdf
lit parse document.pdf --format json -o output.json
lit parse document.pdf --target-pages "1-5,10,15-20"
lit parse document.pdf --no-ocr
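The `--target-pages` value above mixes single pages and ranges. As a rough sketch of how such a specification could be expanded, the hypothetical helper below assumes 1-based, inclusive ranges separated by commas. This is an illustration of the format, not LiteParse's own parsing code, and the real CLI may treat edge cases differently.

```typescript
// Hypothetical helper: expands a spec such as "1-5,10,15-20" into a sorted
// list of unique page numbers. Assumes 1-based, inclusive ranges.
function expandPageSpec(spec: string): number[] {
  const pages = new Set<number>();

  for (const part of spec.split(",")) {
    // Matches "N" or "N-M", allowing surrounding whitespace.
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (match === null) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }

  return Array.from(pages).sort((a, b) => a - b);
}

console.log(expandPageSpec("1-5,10,15-20"));
// [1, 2, 3, 4, 5, 10, 15, 16, 17, 18, 19, 20]
```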
Batch parsing:
lit batch-parse ./input-directory ./output-directory
Screenshot generation (useful for LLM agents):
lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots
lit screenshot document.pdf --dpi 300 -o ./screenshots
lit screenshot document.pdf --target-pages "1-10" -o ./screenshots
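When picking a `--dpi` value, it can help to estimate the resulting image size. PDF page dimensions are expressed in points (1 point = 1/72 inch), so a renderer that scales by `dpi / 72` produces the sizes below. That LiteParse scales exactly this way is an assumption, and `pixelSize` is a hypothetical helper, not part of the tool.

```typescript
// Hypothetical helper: estimates rendered pixel dimensions for a page whose
// size is given in PDF points (1 pt = 1/72 inch), assuming dpi / 72 scaling.
function pixelSize(
  widthPt: number,
  heightPt: number,
  dpi: number
): { width: number; height: number } {
  const scale = dpi / 72;
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 points.
console.log(pixelSize(612, 792, 150)); // { width: 1275, height: 1650 }
console.log(pixelSize(612, 792, 300)); // { width: 2550, height: 3300 }
```

Doubling the DPI quadruples the pixel count, which is worth keeping in mind when screenshots are passed to an LLM agent with image size limits.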
Library Usage
Install as a dependency:
npm install @llamaindex/liteparse
# or
pnpm add @llamaindex/liteparse
Basic usage:
import { LiteParse } from '@llamaindex/liteparse';
const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text);
Buffer/Uint8Array input (the parser itself needs no disk I/O, so the bytes can come from anywhere):
import { LiteParse } from '@llamaindex/liteparse';
import { readFile } from 'fs/promises';
const parser = new LiteParse();
const pdfBytes = await readFile('document.pdf');
const result = await parser.parse(pdfBytes);
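Building on the library calls above, the following sketch parses several inputs in sequence and records failures instead of aborting. It relies only on `parse(...)` returning an object with a `text` field, as shown in the examples. The `ParserLike` interface, `parseAll`, and the stub parser are hypothetical names introduced here so the snippet runs without the package installed.

```typescript
// Minimal structural type mirroring the usage shown above. Assumption: only
// `parse` and `result.text` are relied on; the real class may expose more.
interface ParserLike {
  parse(input: string | Uint8Array): Promise<{ text: string }>;
}

interface ParseOutcome {
  file: string;
  text?: string;
  error?: string;
}

// Parse files one after another so a single bad file does not stop the run.
async function parseAll(
  parser: ParserLike,
  files: string[]
): Promise<ParseOutcome[]> {
  const outcomes: ParseOutcome[] = [];
  for (const file of files) {
    try {
      const result = await parser.parse(file);
      outcomes.push({ file, text: result.text });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ file, error: message });
    }
  }
  return outcomes;
}

// Stub standing in for a real parser instance, for demonstration only.
const fakeParser: ParserLike = {
  async parse(input) {
    if (typeof input === "string" && input.endsWith(".bad")) {
      throw new Error("unreadable");
    }
    return { text: `parsed ${typeof input === "string" ? input : "bytes"}` };
  },
};

parseAll(fakeParser, ["a.pdf", "b.bad"]).then((outcomes) => {
  console.log(outcomes);
});
```

In real use, an instance created with `new LiteParse()` would presumably be passed in place of `fakeParser`, provided its `parse` method matches the shape assumed here.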
Technical Details
- Flexible OCR system with built-in Tesseract.js (zero setup)
- Supports HTTP servers for OCR (EasyOCR, PaddleOCR, custom)
- Standard OCR API specification
- Multiple output formats: JSON and Text
- Standalone binary with no cloud dependencies
- Multi-platform support: Linux, macOS (Intel/ARM), Windows
For complex documents with dense tables, multi-column layouts, charts, handwritten text, or scanned PDFs, the creators recommend LlamaParse, their cloud-based document parser built for production document pipelines.
📖 Read the full source: HN AI Agents