SourceBridge: Open-source Codebase Analysis with Local LLMs

SourceBridge is an open-source project that uses local LLMs to build structured understanding of codebases. You point it at a Git repository and it indexes the codebase into a symbol graph containing files, functions, classes, and dependencies. The tool then uses your LLM to build a hierarchical understanding tree starting from individual code segments and rolling up through files, packages, and the full repository.

What it generates

Cliff notes: multi-level summaries grounded in actual code
Code tours: architecturally-ordered walkthroughs with specific file/function references
Learning paths: pedagogically structured onboarding material
Workflow stories: data flow traces through the system
Semantic search against the repository graph

Local model support

Local model support was a priority from day one. Currently supported backends include:

Ollama — primary local backend, what the developer tests against daily
llama.cpp — direct llama-server support, slightly faster than Ollama in testing
vLLM — for GPU servers
LM Studio — including speculative decoding
SGLang — for multi-GPU setups

All backends work via the OpenAI-compatible API, so anything that speaks that protocol works. Cloud providers (Anthropic, OpenAI, Gemini, OpenRouter) are also supported for when you want higher quality on specific tasks.

Model performance

The developer has been running it primarily on Qwen 3.5 35B-A3B (MoE, only 3B active parameters) via llama.cpp on a Mac Studio. At Q4_K_XL quantization it runs at approximately 50 tokens/second and produces solid cliff notes and code tours. For larger repositories, Qwen 3.5 122B-A10B via Ollama has been tested — it shows better instruction following but needs about 76GB RAM.

For comprehension tasks (summarizing code, building the understanding tree), 32B-class models do a reasonable job. The quality gap between local and cloud models is noticeable but not a dealbreaker for most use cases. Cloud models still clearly win in report-style generation where you need the LLM to follow complex formatting instructions without looping.

Thinking mode in Qwen 3.5 models is disabled by default — it wastes tokens on reasoning chains that don't improve comprehension output. This is configurable via environment variable if you want to experiment.

Architecture

Go API server (indexing, auth, job queue, graph store)
Python gRPC worker (LLM calls, comprehension pipeline, artifact generation)
Next.js web UI (real-time progress, markdown viewer)
SurrealDB (graph data, knowledge artifacts, job state)
All three components are Dockerized, runs with docker compose up

The worker handles queuing, retries, backoff, and cancellation — so if your local model is slow or crashes mid-generation, the system recovers gracefully instead of losing the work.

Getting started

git clone https://github.com/sourcebridge-ai/sourcebridge.git
cd sourcebridge
# Edit config.toml — point llm.provider at your Ollama/llama.cpp instance
docker compose up

Your code never leaves your machine. The LLM inference stays local. There's opt-out anonymous telemetry (install count only, disable with DO_NOT_TRACK=1).

The developer is looking for feedback from people running local models on what works and what doesn't, especially regarding which models produce the best comprehension output, whether MoE models are worth the RAM tradeoff vs dense models, and any issues with specific backends.

📖 Read the full source: r/LocalLLaMA