Hybrid search with RRF improves AI memory system over pure vector search

A developer has built an open-source memory system for AI assistants on PostgreSQL with pgvector, designed to run local-first and self-hosted. It stores information that an assistant should remember across sessions and makes that information searchable.
Why pure vector search wasn't enough
The developer started with pure vector search: embed the query, rank stored chunks by cosine similarity, and return the top-k results. This worked for vague questions but consistently failed on exact matches. For example, searching for "RRF merging" would return months-old chunks about "combining ranked lists" instead of the document that literally says "RRF merging."
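For reference, the pure-vector baseline amounts to the ranking below. This is a minimal in-memory sketch, not the project's code: it assumes the query and chunks are already embedded, and `vector_top_k`, the chunk IDs, and the toy vectors are hypothetical. In the actual system, pgvector's HNSW index performs an approximate version of this ranking inside PostgreSQL.

```python
import math
from collections.abc import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_top_k(
    query_vec: Sequence[float],
    chunks: list[tuple[str, Sequence[float]]],
    k: int = 5,
) -> list[tuple[str, float]]:
    """Score every (chunk_id, vector) pair against the query and keep the best k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Nothing in this ranking rewards a literal term match, which is why an exact-phrase query can lose to chunks that are merely semantically close.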
Hybrid search solution
The fix was to add a second search arm: full-text search using PostgreSQL's tsvector type with a GIN index. Keyword matching catches the exact-term hits that vector search misses, but it also produces a second ranked list, so the two lists have to be merged.
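A minimal sketch of the keyword arm, assuming psycopg 3's async API and a hypothetical `chunks` table whose `tsv` column holds `to_tsvector('english', content)` and carries a GIN index. The table and column names, the `english` text search configuration, and `keyword_search` itself are illustrative, not taken from the project. `plainto_tsquery` turns the raw query text into a tsquery that ANDs its words, `@@` filters to matching rows (the operation the GIN index accelerates), and `ts_rank` orders them.

```python
# Keyword arm of the hybrid search (sketch). Assumes a hypothetical table
#   chunks(id bigint, content text, tsv tsvector)  with a GIN index on tsv.
KEYWORD_SQL = """
SELECT id, ts_rank(tsv, query) AS score
FROM chunks, plainto_tsquery('english', %(q)s) AS query
WHERE tsv @@ query
ORDER BY score DESC
LIMIT %(k)s
"""


async def keyword_search(conn, query_text: str, k: int = 20) -> list[tuple[int, float]]:
    """Return (chunk_id, ts_rank score) rows, best match first.

    `conn` is expected to be a psycopg.AsyncConnection (psycopg 3).
    """
    async with conn.cursor() as cur:
        # Named placeholders are passed as parameters, never string-formatted.
        await cur.execute(KEYWORD_SQL, {"q": query_text, "k": k})
        return await cur.fetchall()
```

Only the row order matters downstream, since the merge step uses rank positions rather than the ts_rank values themselves.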
Reciprocal Rank Fusion (RRF)
Reciprocal Rank Fusion turned out to be the answer for merging the two lists. The formula is simple: each result gets score = 1 / (k + rank) from every list it appears in, with k = 60 (the standard value), so a result that appears in both lists gets both scores added. Because RRF uses only rank positions, it needs no weight tuning and no normalization between cosine similarity and ts_rank scores, which are on different scales.
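The fusion step itself is only a few lines. This sketch assumes each arm returns chunk IDs ordered best-first and that ranks start at 1, as in the usual statement of RRF; `rrf_merge` is a hypothetical name, not the project's function.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable, Sequence


def rrf_merge(
    ranked_lists: Iterable[Sequence[Hashable]], k: int = 60
) -> list[tuple[Hashable, float]]:
    """Fuse best-first ranked lists of IDs with Reciprocal Rank Fusion.

    An ID's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with 1-based ranks. Raw similarity scores are never used.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a vector list `["a", "b", "c"]` and a keyword list `["c", "d"]`, `c` scores 1/63 + 1/61 and moves to the top, ahead of the vector arm's best hit `a` (1/61). A larger k flattens the difference between top and lower ranks.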
Query enrichment technique
Before searching, the system runs the query through the embedding model's WordPiece tokenizer to extract key terms: words that split into multiple subword tokens, which are likely technical or domain terms. From these it generates up to 3 query variations, embeds all of them, and searches with each in parallel, catching results that a single phrasing might miss.
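Below is a self-contained sketch of the enrichment step. To avoid downloading the model, it reimplements greedy longest-match-first WordPiece over a tiny made-up vocabulary; in practice you would call the model's own tokenizer (for example, the `tokenize` method of a Hugging Face `transformers` tokenizer). The rule for building the variations (original query, all key terms, first key term alone) and the `search_one` callback are assumptions for illustration; the source does not specify them.

```python
import asyncio
from collections.abc import Awaitable, Callable


def wordpiece(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first WordPiece, as in BERT-style tokenizers."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no valid segmentation: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def extract_key_terms(query: str, vocab: set[str]) -> list[str]:
    """Words that split into 2+ subword tokens, treated as likely technical terms."""
    terms: list[str] = []
    # Whitespace split only; real tokenizers also split off punctuation.
    for word in query.lower().split():
        if len(wordpiece(word, vocab)) > 1 and word not in terms:
            terms.append(word)
    return terms


def query_variations(query: str, vocab: set[str], limit: int = 3) -> list[str]:
    """Up to `limit` distinct variations (hypothetical scheme, see lead-in)."""
    terms = extract_key_terms(query, vocab)
    candidates = [query] + ([" ".join(terms), terms[0]] if terms else [])
    unique: list[str] = []
    for candidate in candidates:
        if candidate not in unique:
            unique.append(candidate)
    return unique[:limit]


async def search_all(
    variations: list[str],
    search_one: Callable[[str], Awaitable[list[int]]],
) -> list[list[int]]:
    """Run one (hypothetical) embed-and-search per variation concurrently."""
    return await asyncio.gather(*(search_one(v) for v in variations))
```

With the toy vocabulary `{"hybrid", "search", "with", "pg", "##vector", "rr", "##f", "merging"}`, `query_variations("hybrid search with pgvector", vocab)` yields `["hybrid search with pgvector", "pgvector"]`, because only "pgvector" splits into more than one piece (`pg`, `##vector`). Each resulting ranked list can then be fed into the same rank-based fusion as the two search arms.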
Technical stack
- PostgreSQL 16 + pgvector (HNSW index for vectors, GIN index for full-text)
- all-MiniLM-L6-v2 for embeddings (384 dimensions, runs on CPU)
- Python with async psycopg 3
- 3 ingestion adapters: markdown, plaintext, and Claude conversation JSON
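The stack above maps onto a schema along these lines. This is a sketch assuming PostgreSQL 12 or later (for generated columns) and pgvector 0.5.0 or later (for HNSW); the `chunks` table, its columns, the index names, and `init_schema` are hypothetical. The `vector(384)` column matches all-MiniLM-L6-v2's output size, `vector_cosine_ops` makes the HNSW index serve cosine-distance (`<=>`) queries, and the GIN index covers the full-text arm.

```python
# Hypothetical schema for the stack above: one row per ingested chunk.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS chunks (
        id        bigserial PRIMARY KEY,
        source    text NOT NULL,          -- which adapter produced the chunk
        content   text NOT NULL,
        embedding vector(384) NOT NULL,   -- all-MiniLM-L6-v2 output size
        tsv       tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
    )
    """,
    # HNSW index for approximate cosine-distance search.
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops)",
    # GIN index so `tsv @@ query` full-text matches avoid a sequential scan.
    "CREATE INDEX IF NOT EXISTS chunks_tsv_gin ON chunks USING gin (tsv)",
]


async def init_schema(conn) -> None:
    """Create the extension, table, and both indexes, then commit.

    `conn` is expected to be a psycopg.AsyncConnection (psycopg 3).
    """
    async with conn.cursor() as cur:
        for statement in SCHEMA_STATEMENTS:
            await cur.execute(statement)
    await conn.commit()
```

Generating `tsv` in the database keeps the keyword index in sync with `content` without extra work in the ingestion adapters.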
The entire system runs locally with no API calls for embeddings and no cloud dependencies. The code was recently shipped, and the developer has written a detailed blog post about the full approach.
📖 Read the full source: r/LocalLLaMA