Caliby: Open-Source Embedded Vector Database for AI Agents with Hybrid Text+Vector Storage

Caliby, an embedded, in-process vector database designed for AI agent and RAG workloads, is now open source. Developed by a team that includes a PhD from MIT’s DB Group (Michael Stonebraker’s team) and Sea-Land AI, it ships as a single C++ library with Python bindings.
Why Another Vector DB?
The team found existing solutions lacking for agent/LLM use cases:
- FAISS: Pure in-memory, no persistence — restart clears the index.
- pgvector: Performance ceiling due to PostgreSQL dependency.
- Chroma / Qdrant / Milvus: Require separate services, too heavy for embedded scenarios.
- LanceDB: Embedded, but lacks advanced indexes such as DiskANN and has performance bottlenecks.
Caliby aims to be a lightweight, embeddable data engine like DuckDB, but for vector + text storage.
Architecture: Hybrid Text + Vector Storage
Caliby unifies text and vector data in a single system. Instead of juggling a vector DB and a relational DB, you store embeddings, raw text, and metadata in one library. The architecture uses a page-organized buffer pool for persistence.
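To make the "one library instead of two databases" idea concrete, here is a minimal conceptual sketch in pure Python. It is not Caliby's API: the `Record` and `HybridStore` names, the `where` filter, and the in-memory dictionary are all assumptions made for illustration, and the sketch ignores persistence and the buffer pool entirely. It only shows a single record carrying the embedding, raw text, and metadata together, so that one call can filter on metadata and rank by vector distance.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record: text, embedding, and metadata kept together."""
    doc_id: int
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


class HybridStore:
    """Toy in-process store for illustration only (not Caliby's API)."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.doc_id] = record

    def search(self, query, k, where=None):
        """Return the k nearest records by squared L2 distance,
        keeping only records whose metadata matches every key in `where`."""
        where = where or {}
        candidates = [
            r for r in self._records.values()
            if all(r.metadata.get(key) == val for key, val in where.items())
        ]
        candidates.sort(
            key=lambda r: sum((x - y) ** 2 for x, y in zip(r.embedding, query))
        )
        return candidates[:k]


store = HybridStore()
store.add(Record(1, "reset password steps", [0.9, 0.1], {"source": "faq"}))
store.add(Record(2, "quarterly revenue report", [0.1, 0.9], {"source": "finance"}))
store.add(Record(3, "change account email", [0.8, 0.3], {"source": "faq"}))

# One call: filter on metadata, rank by vector distance, get the raw text back.
hits = store.search([1.0, 0.0], k=2, where={"source": "faq"})
texts = [r.text for r in hits]
```

Because the text travels with the embedding, the caller gets the passages back directly instead of looking up IDs in a second database.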
Supported Indexes
- HNSW: General high-performance retrieval, CPU-optimized.
- DiskANN (Vamana Graph): Designed for disk-based scenarios; the team reports it outperforms FAISS when the index lives on disk.
- IVF+PQ: Inverted file with product quantization for compact indexes.
Caliby also supports brute-force search with SIMD (AVX-512, AVX2, SSE) distance functions (L2, InnerProduct, Cosine).
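As a rough illustration of what brute-force search over these three metrics computes, here is a plain-Python sketch. It is not Caliby's implementation, which the post describes as SIMD-accelerated C++; the function names and the `metric` strings are assumptions chosen for this example. Each metric is turned into a score where smaller means closer, and the k best indexes are returned.

```python
import heapq
import math


def l2_distance(a, b):
    """Squared Euclidean distance; smaller means closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; zero-length vectors are treated as maximally distant."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0.0:
        return 1.0
    return 1.0 - inner_product(a, b) / norm


def brute_force_search(vectors, query, k, metric="l2"):
    """Scan every vector and return the indexes of the k best matches."""
    if metric == "l2":
        score = lambda v: l2_distance(v, query)
    elif metric == "ip":
        # Negate so that the largest inner product gets the smallest score.
        score = lambda v: -inner_product(v, query)
    elif metric == "cosine":
        score = lambda v: cosine_distance(v, query)
    else:
        raise ValueError(f"unknown metric: {metric}")
    scored = ((score(v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
query = [1.0, 0.0]
nearest_l2 = brute_force_search(vectors, query, k=2, metric="l2")
nearest_ip = brute_force_search(vectors, query, k=2, metric="ip")
nearest_cos = brute_force_search(vectors, query, k=2, metric="cosine")
```

The three metrics can rank the same data differently: the long vector `[3.0, 0.0]` wins under inner product, ties with `[1.0, 0.0]` under cosine, and comes last under L2. A full scan like this costs time linear in the number of vectors, which is why the graph and quantization indexes listed above exist.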
Performance Claims
According to the team, Caliby outperforms pgvector by 4x and significantly surpasses FAISS in disk-storage scenarios. It is said to handle millions to tens of millions of vectors on disk without requiring a separate service.
Getting Started
Simply install the package:
pip install caliby
The Python API exposes HnswIndex, DiskANN, and IVFPQIndex classes via pybind11. No dependencies, no server setup, no DevOps.
Who It's For
AI Agent developers and RAG pipeline builders who want an embeddable, zero-infrastructure vector database with hybrid text+vector capabilities and production-grade performance.
📖 Read the full source: r/LocalLLaMA