Reduce AI Coding Session Costs by 90% with Graph-Based Code Indexing

A Reddit user reports spending $2-6 per query on Claude Code because the model re-reads dozens of files every session. Prompt caching helps only partly: about 70% of tokens come from cache at a 90% discount, but the cache resets with each new session. The fix is a local server that indexes the codebase into a graph database, which Claude queries through the Model Context Protocol (MCP) instead of reading raw files.
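To see why caching alone still leaves a noticeable bill, the figures quoted above can be turned into a quick calculation. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes a flat discount on cached input tokens while ignoring any surcharge for writing to the cache.

```python
def effective_cost_fraction(cached_share: float, cache_discount: float) -> float:
    """Fraction of the full (uncached) input price still paid.

    cached_share   -- share of input tokens served from cache (0.70 in the post)
    cache_discount -- discount applied to cached tokens (0.90 in the post)
    """
    uncached_part = 1.0 - cached_share                  # billed at full price
    cached_part = cached_share * (1.0 - cache_discount)  # billed at reduced price
    return uncached_part + cached_part


fraction = effective_cost_fraction(cached_share=0.70, cache_discount=0.90)
print(f"{fraction:.2f}")  # 0.37
```

Under these assumptions a warm cache still costs about 37% of the uncached price, and a fresh session pays the full price again until the cache is rebuilt.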
How It Works
- Instead of AST parsing or vector embeddings, the tool uses an LLM to generate a purpose, summary, and business context for each file, plus links to its functions, classes, and imports.
- The graph is exposed through an MCP server; Claude queries the graph for targeted lookups (2-4 nodes per question) instead of dumping the entire repo into context.
- The author reports session costs dropping from dollars to cents, and says the approach works equally well with open-source models such as DeepSeek-V4 and Kimi-2.6 because retrieval, not model size, does the heavy lifting.
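The idea of a graph of per-file analyses answering targeted lookups can be sketched in a few lines of Python. This is not the project's actual code: the `FileNode` and `CodeGraph` names, the field layout, and the naive keyword scoring are all assumptions made for illustration, and an in-memory dict stands in for the graph database. In the real tool the descriptive fields would be written by an LLM.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """One file's analysis (hypothetical schema, hand-written here)."""
    path: str
    purpose: str
    summary: str
    business_context: str
    functions: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    imports: list[str] = field(default_factory=list)  # paths of imported files

    def tokens(self) -> set[str]:
        # Lowercased words from every descriptive field of the node.
        text = " ".join([self.purpose, self.summary, self.business_context,
                         *self.functions, *self.classes])
        return set(re.findall(r"[a-z0-9_]+", text.lower()))


class CodeGraph:
    """In-memory stand-in for the graph database."""

    def __init__(self) -> None:
        self.nodes: dict[str, FileNode] = {}

    def add(self, node: FileNode) -> None:
        self.nodes[node.path] = node

    def lookup(self, question: str, limit: int = 3) -> list[FileNode]:
        # Rank nodes by how many question words appear in their analysis
        # and return only the top few, not the whole repository.
        words = set(re.findall(r"[a-z0-9_]+", question.lower()))
        scored = []
        for node in self.nodes.values():
            score = len(words & node.tokens())
            if score > 0:
                scored.append((score, node.path, node))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [node for _, _, node in scored[:limit]]


graph = CodeGraph()
graph.add(FileNode("billing/invoice.py", "Build customer invoices",
                   "Computes invoice totals with tax", "Monthly billing",
                   functions=["compute_total"], classes=["Invoice"],
                   imports=["billing/tax.py"]))
graph.add(FileNode("billing/tax.py", "Tax rate lookup",
                   "Returns tax rate per region", "Compliance",
                   functions=["rate_for"]))
graph.add(FileNode("auth/login.py", "User login",
                   "Validates credentials and issues session tokens",
                   "Account security", functions=["login"]))

hits = graph.lookup("where is the invoice tax computed")
print([node.path for node in hits])  # ['billing/invoice.py', 'billing/tax.py']
```

Only the two matching nodes would be handed to the model, which is the point of the approach: the context carries a couple of short summaries rather than the contents of every file.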
Setup Details
Everything runs locally and single-tenant, with no cloud dependency. The project is open-sourced on GitHub: github.com/ByteBell/bytebell-oss. The author stresses that no AST parsing or vector embeddings are involved; the graph consists entirely of LLM-generated file analyses.
Who This Is For
Developers using Claude Code (or any AI agent billed per token) on large codebases who want to cut costs by keeping structural context available across sessions.
📖 Read the full source: r/ClaudeAI