Pali v0.1: Open Source Memory Infrastructure for LLMs

What Pali Is

Pali is open source, infrastructure-first memory for LLMs. It is written in Go and ships as a single binary that works out of the box, with config-driven, plug-and-play integrations for Qdrant, Neo4j, Ollama, and OpenRouter. The project is MIT licensed and fully self-hostable.

Key Features

Multi-tenant memory APIs with tenant-scoped isolation
Hybrid retrieval combining lexical and dense search, fusion, reranking, and optional multi-hop expansion
MCP server with memory-first tools and tenant-aware resolution
REST API, with Python and JavaScript packages already published
Dashboard for operators inspecting tenants, memories, and system state
Plug-and-play extension points for vector stores, embedders, entity-fact backends, and scoring/routing
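
To make the fusion step of hybrid retrieval concrete, here is a minimal sketch of reciprocal rank fusion (RRF), a common way to merge a lexical ranking with a dense ranking. The source does not say which fusion method Pali uses, so RRF is an assumption here, and the function name, the constant k=60, and the memory IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into a single ranking.

    Each list contributes 1 / (k + rank) per item, so items that rank
    well in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID to keep output deterministic.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


# Hypothetical results from two retrievers for the same query.
lexical = ["m1", "m2", "m3"]
dense = ["m3", "m1", "m4"]
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)
```

Memories that appear in both lists (m1 and m3) end up ahead of those found by only one retriever, which is the behavior a fusion stage is meant to provide before any reranking.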

Benchmark Approach

To address common reproducibility problems with memory stack benchmarks, the creator follows these rules:

Every run stores the exact config files used (profile + rendered)
Hardware is fully disclosed (CPU, GPU, RAM, model versions)
Paired comparisons only: the same fixture, eval, and top_k across all profiles
Speed lanes and retrieval quality lanes are kept separate
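
The rules above can be sketched as a small pre-flight check. This is not Pali's benchmark harness: the run records, their field names, and the fixture name are hypothetical, and the sketch only shows one way to fingerprint a stored config and to refuse comparisons that are not paired.

```python
import hashlib
import json

# Fields that must match across profiles for a comparison to count as paired.
PAIRED_KEYS = ("fixture", "eval", "top_k")


def config_fingerprint(config):
    """Return a stable SHA-256 hex digest of a config dict.

    Keys are sorted so that two dicts with the same content always
    produce the same fingerprint.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_paired(runs):
    """Raise ValueError if any run differs from the first on a paired key."""
    if not runs:
        return
    reference = {key: runs[0][key] for key in PAIRED_KEYS}
    for run in runs[1:]:
        for key in PAIRED_KEYS:
            if run[key] != reference[key]:
                raise ValueError(
                    f"profile {run['profile']!r} differs on {key!r}: "
                    f"{run[key]!r} != {reference[key]!r}"
                )


# Hypothetical run records; only the profile changes between them.
runs = [
    {"profile": "sqlite-lexical", "fixture": "fixture-a", "eval": "retrieval", "top_k": 5},
    {"profile": "qdrant-ollama", "fixture": "fixture-a", "eval": "retrieval", "top_k": 5},
]
check_paired(runs)
fingerprints = {run["profile"]: config_fingerprint(run) for run in runs}
```

Storing the fingerprint next to each result makes it easy to confirm later that a published number came from the exact config on disk.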

Performance Numbers

Benchmarks from testing on a Ryzen 9 7950X + RTX 5070:

sqlite + lexical: 208 store ops/s, Top1=0.32, Recall@5=0.54
qdrant + ollama (all-minilm): 98 store ops/s, Top1=0.34, Recall@5=0.52
parser+graph (structured memory stress lane): 2.4 store ops/s. Slow because of structured extraction cost, but averages ~30 on LoCoMo, with temporal highs of ~40
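
For readers unfamiliar with the two quality metrics, here is a minimal sketch of how Top1 and Recall@5 are commonly computed. The source does not give Pali's exact definitions, so this assumes the usual ones: Top1 is the fraction of queries whose first result is relevant, and Recall@k is the share of a query's relevant memories found in the top k, averaged over queries. The query and memory IDs are made up.

```python
def top1_accuracy(results, relevant):
    """Fraction of queries whose top-ranked result is relevant."""
    hits = sum(
        1
        for query, ranked in results.items()
        if ranked and ranked[0] in relevant[query]
    )
    return hits / len(results)


def recall_at_k(results, relevant, k=5):
    """Mean over queries of (relevant items in the top k) / (all relevant items)."""
    total = 0.0
    for query, ranked in results.items():
        found = len(set(ranked[:k]) & relevant[query])
        total += found / len(relevant[query])
    return total / len(results)


# Hypothetical retrieval output and ground truth for two queries.
results = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
relevant = {"q1": {"a", "c"}, "q2": {"y", "w"}}

top1 = top1_accuracy(results, relevant)
recall5 = recall_at_k(results, relevant, k=5)
print(top1, recall5)
```

Under these definitions the two metrics can move in opposite directions, which matches the table: the qdrant + ollama profile has the higher Top1 while the sqlite + lexical profile has the higher Recall@5.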

Important Clarification

Pali is not LLM memory in the SaaS sense. It returns raw retrieval results that you optimize for your own workflow, with no black-box scoring and no locked-in provider decisions. You can swap vector backends, embedders, and scorers through config without changing your app contract.
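
The idea of swapping backends through config while the app contract stays fixed can be illustrated with a toy example. Nothing here is Pali's real API or config schema: the MemoryStore interface, both in-memory backends, and the "backend" config key are hypothetical and only demonstrate the pattern.

```python
from typing import Protocol


class MemoryStore(Protocol):
    """The contract the application codes against (hypothetical)."""

    def store(self, tenant_id: str, text: str) -> None: ...
    def search(self, tenant_id: str, query: str, top_k: int) -> list[str]: ...


class InMemoryLexicalStore:
    """Toy backend: ranks a tenant's memories by shared lowercase tokens."""

    def __init__(self) -> None:
        self._by_tenant: dict[str, list[str]] = {}

    def store(self, tenant_id: str, text: str) -> None:
        self._by_tenant.setdefault(tenant_id, []).append(text)

    def search(self, tenant_id: str, query: str, top_k: int) -> list[str]:
        query_tokens = set(query.lower().split())
        scored = [
            (len(query_tokens & set(text.lower().split())), text)
            for text in self._by_tenant.get(tenant_id, [])
        ]
        # Keep only memories with at least one shared token, best match first.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in ranked[:top_k]]


class InMemoryRecencyStore(InMemoryLexicalStore):
    """Toy backend: ignores the query and returns the newest memories first."""

    def search(self, tenant_id: str, query: str, top_k: int) -> list[str]:
        return list(reversed(self._by_tenant.get(tenant_id, [])))[:top_k]


BACKENDS = {"lexical": InMemoryLexicalStore, "recency": InMemoryRecencyStore}


def build_store(config: dict) -> MemoryStore:
    """Pick a backend from config; the caller never names a concrete class."""
    return BACKENDS[config["backend"]]()


def remember_and_recall(store: MemoryStore, tenant_id: str, notes: list[str], query: str) -> list[str]:
    """Application code: identical regardless of which backend is configured."""
    for note in notes:
        store.store(tenant_id, note)
    return store.search(tenant_id, query, top_k=2)


notes = ["likes green tea", "works in Berlin"]
lexical_hits = remember_and_recall(build_store({"backend": "lexical"}), "tenant-a", notes, "tea")
recency_hits = remember_and_recall(build_store({"backend": "recency"}), "tenant-a", notes, "tea")
```

Only the config value changes between the two calls. The application function stays the same, and each backend keeps memories scoped to the tenant ID it was given.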