DocMason: Local Agent Knowledge Base for Complex Office Files

✍️ OpenClawRadar | 📅 Published: April 15, 2026

What DocMason Does

DocMason is a local, file-based knowledge base system designed for deep research over private work documents. The core concept is "The repo is the app. Codex is the runtime." It compiles office files into structured evidence bundles that AI agents can reason over while maintaining strict provenance tracking.

Key Features from Source

  • Handles multiple office document types: PPTX, DOCX, XLSX, PDF, and even .EML email files
  • Extracts multimodal information including IT architecture diagrams and Excel sheet data
  • Maintains document structure and visual semantics (slide layouts, presenter notes, spreadsheet references, formatting signals)
  • Runs locally with no cloud ingestion or hidden backends
  • Provides incremental knowledge base syncing when files are added or revised
  • Enforces strict data contracts and provenance boundaries
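
Incremental syncing of this kind is often built on content hashing. The sketch below illustrates that general technique only; it is not taken from DocMason, and the names `snapshot` and `plan_sync` are hypothetical. It compares a manifest of file digests from the previous build with the files currently on disk, so that only added or revised files need to be recompiled.

```python
import hashlib
from pathlib import Path


def snapshot(source_dir: Path) -> dict[str, str]:
    """Map each file under source_dir (relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(source_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(source_dir.rglob("*"))
        if p.is_file()
    }


def plan_sync(current: dict[str, str], previous: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, changed, or removed relative to the previous manifest."""
    return {
        "added": sorted(k for k in current if k not in previous),
        "changed": sorted(k for k in current if k in previous and previous[k] != current[k]),
        "removed": sorted(k for k in previous if k not in current),
    }
```

A caller would persist the result of `snapshot` after each build and pass it as `previous` on the next run; files that appear in none of the three lists are unchanged and can be skipped.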

How It Works

DocMason is presented as a production-grade runtime that forces AI agents to respect the original document structure. Instead of flattening complex files into unstructured text blobs, it produces deterministic, file-based evidence and runs retrieval offline on your machine.
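
To make "deterministic file-based evidence" concrete, here is a minimal sketch of what one evidence record with provenance might look like. The `EvidenceUnit` fields and the `serialize_bundle` helper are assumptions made for illustration; the source does not describe DocMason's actual bundle format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceUnit:
    """One extracted piece of content plus where it came from (hypothetical schema)."""
    source_file: str  # path relative to the staging folder
    locator: str      # e.g. "slide 3" or "Summary!B2"
    kind: str         # e.g. "slide_text", "presenter_notes", "cell"
    text: str


def serialize_bundle(units: list[EvidenceUnit]) -> str:
    """Serialize units so that the same input set always yields the same bytes."""
    ordered = sorted(units, key=lambda u: (u.source_file, u.locator, u.kind))
    return json.dumps(
        [asdict(u) for u in ordered], sort_keys=True, indent=2, ensure_ascii=False
    )
```

Sorting the records and the JSON keys means two builds over the same files produce identical output, which makes diffs meaningful and lets every answer cite a `source_file` and `locator`.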

Getting Started

Two setup paths are described in the source:

Path A (Start Small):

  • Drop work files into the DocMason/original_doc/ folder
  • Open the DocMason folder in Codex
  • Ask questions naturally; DocMason guides you through environment setup
  • Approve prompts when building the knowledge base

Path B (Stage Entire Folders):

  • Drop department-level folders into DocMason/original_doc/
  • Open in Codex and tell it: "Please prepare the DocMason environment."
  • Then: "Please build the knowledge base."
  • Once complete, ask complex research questions against the entire corpus

The system is designed so that you don't need to memorize internal commands: just speak naturally to your AI agent within a valid workspace.
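
Before staging a large folder, it can help to see which files fall under the document types listed above. The following pre-flight check is a hypothetical helper, not part of DocMason; the extension set is taken from the feature list, and the default path is the staging folder named in the setup steps.

```python
from collections import defaultdict
from pathlib import Path

# Extensions named in the article's feature list (assumed to be the full set).
SUPPORTED = {".pptx", ".docx", ".xlsx", ".pdf", ".eml"}


def classify(names: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Group file names by supported extension; return (grouped, skipped)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    skipped: list[str] = []
    for name in sorted(names):
        suffix = Path(name).suffix.lower()
        if suffix in SUPPORTED:
            grouped[suffix].append(name)
        else:
            skipped.append(name)
    return dict(grouped), skipped


def preflight(root: str = "DocMason/original_doc") -> tuple[dict[str, list[str]], list[str]]:
    """Walk the staging folder and classify every file found there."""
    base = Path(root)
    names = [p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()]
    return classify(names)
```

Calling `preflight()` from the directory that contains the DocMason folder returns the grouped files along with anything that would likely be ignored.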

Technical Details

DocMason addresses specific limitations of existing document AI tools:

  • Preserves visual layout, presenter notes, and chart-text relationships in slide decks
  • Maintains multi-sheet references and nested tables in spreadsheets
  • Retains formatting semantics like red text for "Risk" or indentation for hierarchies
  • Enables cross-document reasoning for multi-part proposals
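
As an illustration of retaining formatting semantics, the sketch below renders text runs so that indentation and red emphasis survive as explicit markers instead of being lost in flattening. The `Run` type, the color threshold, and the `[emphasis: red]` tag are all assumptions made for this example and do not reflect DocMason's real output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """A line of extracted text with two formatting signals (hypothetical model)."""
    text: str
    indent: int = 0               # nesting level in the original outline
    color: Optional[str] = None   # six-digit hex RGB such as "FF0000", if set


def is_red(color: Optional[str]) -> bool:
    """Treat a color as red when the red channel dominates (illustrative threshold)."""
    if color is None:
        return False
    r, g, b = (int(color[i:i + 2], 16) for i in (0, 2, 4))
    return r >= 200 and g <= 80 and b <= 80


def render(runs: list[Run]) -> str:
    """Render runs as an indented outline, tagging red text explicitly."""
    lines = []
    for run in runs:
        tag = " [emphasis: red]" if is_red(run.color) else ""
        lines.append("  " * run.indent + "- " + run.text + tag)
    return "\n".join(lines)
```

With this representation, an agent reading the output can still tell that a "Risk" item was highlighted and which parent bullet it belonged to.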

The repository structure includes adapters, knowledge_base, runtime, skills, and sample_corpus directories, with configuration managed through docmason.yaml and pyproject.toml files.

📖 Read the full source: HN AI Agents
