Code retrieval for AI agents: Why vector embeddings fail and per-file LLM graphs win

A year-long experiment building a code indexing system for AI coding tools yielded clear results: vector embeddings on code chunks and Tree-sitter AST parsing both have critical flaws, while per-file LLM analysis stored in a Neo4j graph with semantic fulltext search works best. The findings echo recent papers like RepoGraph (ICLR 2025) and Code-Craft.
Approaches tested
- Vector embeddings on code chunks – discarded entirely. A function named process() in a payments service and one in an image pipeline embed to similar vectors despite having nothing to do with each other. Vectors flatten call graphs, inheritance, and imports, that is, all structural relationships. Retrieval precision was unacceptable.
- Tree-sitter AST parsing – precise and fast, but structural-only. It can tell you that a function exists and what it calls, but not that “this function handles webhook retries for failed Stripe payments.” It falls short when developers phrase questions in business language.
- Per-file LLM analysis → graph – works. Every file gets an LLM call generating purpose, summary, and businessContext, stored as nodes in Neo4j with edges to classes, functions, keywords, and imports. Retrieval uses fulltext search across those semantic fields instead of vector similarity. SHA-256 diffing limits reindexing to changed files, making the upfront cost manageable.
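To make the third approach concrete, here is a minimal in-memory sketch of retrieval over the semantic fields. It is not the project's implementation: the FileNode class, the tokenize and search helpers, and the sample records are all hypothetical, and plain term overlap stands in for a real fulltext index. Only the field names (purpose, summary, businessContext) come from the description above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """One analysed file. Field names follow the article; the class is hypothetical."""
    path: str
    purpose: str
    summary: str
    business_context: str
    imports: list[str] = field(default_factory=list)  # stand-in for graph edges


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(nodes: list[FileNode], query: str) -> list[tuple[int, str]]:
    """Rank files by how many query terms appear in their semantic fields."""
    terms = tokenize(query)
    hits = []
    for node in nodes:
        text = " ".join([node.purpose, node.summary, node.business_context])
        score = len(terms & tokenize(text))
        if score:
            hits.append((score, node.path))
    # Highest score first, path as a deterministic tie-breaker.
    return sorted(hits, key=lambda h: (-h[0], h[1]))


# Two files that both define process(), as in the example above (invented data).
nodes = [
    FileNode(
        path="payments/process.py",
        purpose="Retry failed Stripe payments",
        summary="process() re-queues webhook events after a failed charge",
        business_context="Billing reliability for subscription renewals",
        imports=["stripe"],
    ),
    FileNode(
        path="images/process.py",
        purpose="Resize uploaded images",
        summary="process() generates gallery thumbnails",
        business_context="Media pipeline handling user uploads",
        imports=["PIL"],
    ),
]

results = search(nodes, "webhook retries for failed Stripe payments")
```

Because the match runs over the generated descriptions rather than over the code, the shared function name does not pull in the image pipeline file. In Neo4j itself, the equivalent would presumably be a fulltext index queried through the db.index.fulltext.queryNodes procedure; the source does not give the project's actual index layout.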
Benchmarks from literature
RepoGraph (ICLR 2025) reported a 32.8% improvement on SWE-bench with graph-based approaches. Code-Craft reported an 82% improvement in top-1 retrieval precision using bottom-up LLM summaries built from code graphs.
Comparison to existing tools
The team published a side-by-side in comparison.md. Key differences:
- Bytebell: per-file LLM → purpose + summary + businessContext + entities; Neo4j + MongoDB storage; SHA-256 diff-aware reindex.
- PageIndex: TOC reasoning tree for long PDFs/docs; no code-specific semantics.
- GitNexus: Tree-sitter AST + community detection; optional per-symbol semantics; uses LadybugDB.
- GraphRAG: per-chunk LLM entities + community clustering for general text, not code.
- Sourcegraph/Cody: LSIF/SCIP search index; no per-node semantics; deployment is self-hosted or SaaS.
- Augment: proprietary semantic index with embeddings; SaaS-only; managed continuous indexing.
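The SHA-256 diff-aware reindexing listed for Bytebell can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: sha256_of, plan_reindex, and the sample file contents are hypothetical names and data, and the stored hashes are held in a plain dict rather than a database.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Hex digest used as the file's fingerprint."""
    return hashlib.sha256(content).hexdigest()


def plan_reindex(
    files: dict[str, bytes], stored: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (paths needing a fresh LLM analysis, paths to drop from the index)."""
    # A file is re-analysed when it is new or its hash differs from the stored one.
    changed = sorted(p for p, c in files.items() if stored.get(p) != sha256_of(c))
    # A file is dropped when it was indexed before but no longer exists.
    removed = sorted(p for p in stored if p not in files)
    return changed, removed


# Hashes recorded during the previous indexing run (invented data).
stored_hashes = {
    "a.py": sha256_of(b"print('a')\n"),
    "b.py": sha256_of(b"print('b')\n"),
    "old.py": sha256_of(b"pass\n"),
}

# Current state of the repository (invented data).
current_files = {
    "a.py": b"print('a')\n",     # unchanged, skipped
    "b.py": b"print('b v2')\n",  # edited, re-analysed
    "c.py": b"print('c')\n",     # new, analysed for the first time
}

changed, removed = plan_reindex(current_files, stored_hashes)
```

Only the paths in changed would trigger a new LLM call, so the per-run cost scales with the size of the change rather than the size of the repository.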
Open source
The system is open source at github.com/ByteBell/bytebell-oss.
📖 Read the full source: r/LocalLLaMA