One Search Surface: Teaching voitta-rag to Speak Architecture

Back in February, I wrote that llm-tldr and voitta-rag were complementary. One builds a map of a codebase through static analysis. The other retrieves the actual code you need. My conclusion then was basically: great, wire both into the agent and let it choose.

That works, but it still leaves the agent doing tool-routing. It has to know that one question wants architecture and another wants source. It has to bounce between surfaces. So we collapsed the distinction.

voitta-rag can now index llm-tldr’s static-analysis output as companion documents alongside the raw code chunks it already stores for Git sources. Turn on the new gh_llm_tldr flag for a repo, sync it, and the same search surface returns two different kinds of context:

  • raw code chunks for the implementation itself, and
  • structural analysis chunks describing callers, callees, imports, signatures, and relationships.

One query. One index. No “which tool should I call?” moment.

The old split was clean, but inconvenient

The original split between the two tools made conceptual sense.

llm-tldr is good at questions like:

  • What calls this function?
  • What depends on this module?
  • Where does this piece of data flow?
  • What parts of the codebase are structurally central?

voitta-rag is good at questions like:

  • Show me the implementation of token verification.
  • Find the code that handles OAuth callbacks.
  • Search across this repo, that wiki, and those tickets.
  • Give me the actual file I need to edit.

That’s a nice division of labor for a human. It is less nice for an agent, because agents do not merely need information; they need the right shape of information without extra orchestration. The more routing logic you make them do, the more failure modes you introduce.

The latest voitta-rag implementation removes that choice entirely. Static analysis stops being a separate destination and becomes part of retrieval.

What actually shipped

When a Git source has gh_llm_tldr enabled, sync now runs llm-tldr over each supported source file and stores the results in the same Qdrant collection as the ordinary code chunks.

Those analysis chunks are tagged as source_type="llm-tldr-analysis" and linked back to their origin file with related_file. That sounds like plumbing, and it is, but it matters: the search layer now knows that an analysis chunk about verify_token() belongs to a specific source file rather than floating around as a free-standing summary.
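To make the linkage concrete, here is a minimal sketch of how a client might regroup mixed search hits by origin file. The source_type value and the related_file field come from the description above; the hit shape, the file_path field on code chunks, and the "git" tag are assumptions made for illustration, not voitta-rag's actual response format.

```python
from collections import defaultdict

ANALYSIS_TYPE = "llm-tldr-analysis"  # tag value quoted in the post


def group_hits_by_file(hits):
    """Group hits so each file's code and analysis chunks sit together."""
    grouped = defaultdict(lambda: {"code": [], "analysis": []})
    for hit in hits:
        if hit.get("source_type") == ANALYSIS_TYPE:
            # Analysis chunks point back at their origin via related_file.
            grouped[hit["related_file"]]["analysis"].append(hit)
        else:
            # Assumption: ordinary code chunks carry their path in file_path.
            grouped[hit["file_path"]]["code"].append(hit)
    return dict(grouped)


# Hypothetical hits; the "git" tag and the text values are made up.
hits = [
    {"source_type": "git", "file_path": "auth/tokens.py",
     "text": "def verify_token(token): ..."},
    {"source_type": ANALYSIS_TYPE, "related_file": "auth/tokens.py",
     "text": "verify_token: called by three handlers"},
    {"source_type": "git", "file_path": "auth/oauth.py",
     "text": "def handle_callback(request): ..."},
]

grouped = group_hits_by_file(hits)
for path, parts in grouped.items():
    print(path, len(parts["code"]), "code /", len(parts["analysis"]), "analysis")
```

With this grouping, an agent that lands on an analysis chunk can see straight away which implementation it describes.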

The first proof of concept indexed file-level summaries. The more interesting version goes further: it now stores one overview chunk per file plus one chunk per top-level function and class method. Each function-level chunk can carry structured payload fields such as:

  • function name
  • class name
  • callers
  • callees
  • caller count
  • callee count
  • imports

That means this is not just “RAG, but with bigger summaries.” The call graph is queryable metadata now. You can filter for things like “functions with more than five callers” or “functions importing module X” without standing up a separate graph database just to answer what are, in practice, glorified indexing questions.
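As a rough illustration, "functions with more than five callers" could be expressed as a Qdrant-style payload filter. The sketch below builds the filter as a plain dictionary in the shape of Qdrant's JSON filter syntax, then evaluates it locally against sample payloads so it runs without a server. The payload keys (function_name, caller_count) are assumed spellings of the fields listed above, not confirmed names.

```python
def more_than_n_callers_filter(n):
    """Qdrant-style filter: analysis chunks whose caller_count exceeds n."""
    return {
        "must": [
            {"key": "source_type", "match": {"value": "llm-tldr-analysis"}},
            # Assumption: the caller count is stored under "caller_count".
            {"key": "caller_count", "range": {"gt": n}},
        ]
    }


def matches(payload, flt):
    """Evaluate locally the small subset of conditions used above."""
    for cond in flt["must"]:
        value = payload.get(cond["key"])
        if "match" in cond and value != cond["match"]["value"]:
            return False
        if "range" in cond:
            if value is None or not value > cond["range"]["gt"]:
                return False
    return True


# Hypothetical payloads standing in for stored chunks.
payloads = [
    {"source_type": "llm-tldr-analysis", "function_name": "verify_token",
     "caller_count": 7},
    {"source_type": "llm-tldr-analysis", "function_name": "_decode",
     "caller_count": 2},
    {"source_type": "git", "function_name": None},
]

flt = more_than_n_callers_filter(5)
hot = [p["function_name"] for p in payloads if matches(p, flt)]
print(hot)
```

In a real deployment the same dictionary would be sent to Qdrant alongside the query vector, so the structural constraint and the semantic search resolve in one request.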

GitNexus

GitNexus is interesting, but it is licensed under PolyForm Noncommercial. That’s a non-starter for a lot of consulting and commercial work. By contrast, both llm-tldr and voitta-rag are AGPL v3.

Why function-level chunks beat file-level blobs

The biggest design improvement was moving from file-level rendered analysis to function-level structural chunks.

When voitta-rag indexed its own codebase, that produced 647 stored analysis chunks: 70 file-overview chunks and 577 function chunks. That sounds like more pieces, but it is actually a better unit of retrieval. Agents rarely need a whole philosophical treatise about a file. They need to know that foo() is called from three handlers, imports sqlalchemy.orm, and sits on the hot path for authentication. Function-level chunks make that retrievable directly.
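For a feel of what a function-level chunk contains, here is a toy sketch that derives one record per top-level function and class method from a Python source string. It uses Python's built-in ast module as a stand-in for llm-tldr's analysis, and it matches calls by bare name only, so it is far cruder than a real call graph. The record keys are assumptions.

```python
import ast
from collections import defaultdict

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def function_chunks(source):
    """Return one dict per top-level function and class method."""
    tree = ast.parse(source)
    funcs = {}  # function name -> (class name or None, AST node)
    for node in tree.body:
        if isinstance(node, FUNC_TYPES):
            funcs[node.name] = (None, node)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, FUNC_TYPES):
                    funcs[item.name] = (node.name, item)

    callees = {}
    for name, (_, node) in funcs.items():
        called = set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call):
                if isinstance(sub.func, ast.Name):         # foo(...)
                    called.add(sub.func.id)
                elif isinstance(sub.func, ast.Attribute):  # obj.foo(...)
                    called.add(sub.func.attr)
        # Keep only calls to functions defined in this file.
        callees[name] = sorted(c for c in called if c in funcs)

    callers = defaultdict(list)
    for name, targets in callees.items():
        for target in targets:
            callers[target].append(name)

    return [
        {
            "function_name": name,
            "class_name": cls,
            "callers": sorted(callers[name]),
            "callees": callees[name],
            "caller_count": len(callers[name]),
            "callee_count": len(callees[name]),
        }
        for name, (cls, _) in funcs.items()
    ]


SAMPLE = '''
def decode(token):
    return token.split(".")

def verify_token(token):
    return len(decode(token)) == 3

class Handler:
    def get(self, token):
        return verify_token(token)

    def post(self, token):
        return verify_token(token) and self.get(token)
'''

chunks = function_chunks(SAMPLE)
for chunk in chunks:
    print(chunk["function_name"], chunk["callers"], chunk["callees"])
```

Each record is small enough to embed and retrieve on its own, which is the point of the function-level unit.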

It is also a cheaper way to approximate code intelligence than hauling in a dedicated graph stack. You keep the retrieval surface the agent already understands, but enrich the payload enough to answer the structural questions that retrieval alone cannot.

Related reading: llm-tldr vs voitta-rag: Two Ways to Feed a Codebase to an LLM

llm-tldr vs voitta-rag: Two Ways to Feed a Codebase to an LLM

Every LLM-assisted coding tool faces the same fundamental tension: codebases are too large to fit in a context window. Two recent tools attack this from opposite directions, and understanding the difference clarifies something important about how we’ll work with code-aware AI going forward.

The Shared Problem

llm-tldr is a compression tool. It parses source code through five layers of static analysis — AST, call graph, control flow, data flow, and program dependence — and produces structural summaries that are 90–99% smaller than raw source. The LLM receives a map of the codebase rather than the code itself.

voitta-rag is a retrieval tool. It indexes codebases into searchable chunks and serves actual source code on demand via hybrid semantic + keyword search. The LLM receives real code, but only the relevant fragments.

Compression vs. retrieval. A map vs. the territory.

At a Glance

           | llm-tldr                                | voitta-rag
Approach   | Static analysis → structural summaries  | Hybrid search → actual code chunks
Foundation | Tree-sitter parsers (17 languages)      | Server-side indexing (language-agnostic)
Interface  | CLI + MCP server                        | MCP server
Compute    | Local (embeddings, tree-sitter)         | Server-side

What Each Does Better

llm-tldr wins when you need to understand how code fits together:

  • Call graphs and dependency tracing across files
  • “What affects line 42?” via program slicing and data flow
  • Dead code detection and architectural layer inference
  • Semantic search by behavior — “validate JWT tokens” finds verify_access_token()

voitta-rag wins when you need the actual code:

  • Retrieving exact implementations for review or modification
  • Searching across many repositories indexed server-side
  • Tunable search precision (pure keyword ↔ pure semantic via sparse_weight)
  • Progressive context loading via chunk ranges — start narrow, expand as needed
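One way to picture the sparse_weight knob is as a linear blend of a keyword (sparse) score and a semantic (dense) score. The sketch below assumes that form, and assumes that 1.0 means pure keyword and 0.0 means pure semantic. voitta-rag's actual scoring may differ, and the candidate scores are invented.

```python
def hybrid_score(dense, sparse, sparse_weight):
    """Blend a semantic (dense) and keyword (sparse) score linearly."""
    if not 0.0 <= sparse_weight <= 1.0:
        raise ValueError("sparse_weight must be between 0 and 1")
    return sparse_weight * sparse + (1.0 - sparse_weight) * dense


def rank(candidates, sparse_weight):
    """Order candidates from best to worst under the blended score."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["dense"], c["sparse"], sparse_weight),
        reverse=True,
    )


# Hypothetical candidates: one conceptually close, one an exact keyword hit.
candidates = [
    {"id": "conceptual_match", "dense": 0.9, "sparse": 0.1},
    {"id": "exact_identifier", "dense": 0.2, "sparse": 0.8},
]

semantic_first = [c["id"] for c in rank(candidates, sparse_weight=0.0)]
keyword_first = [c["id"] for c in rank(candidates, sparse_weight=1.0)]
print(semantic_first[0], keyword_first[0])
```

Under this assumption, sliding the weight toward 1.0 favors exact identifier matches, while sliding it toward 0.0 favors chunks that are only conceptually related to the query.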

The Interesting Part

These tools don’t compete — they occupy different layers of the same workflow. Use llm-tldr to figure out where to look and why, then voitta-rag to pull the code you need. Static analysis for navigation, RAG for retrieval.

This mirrors how experienced developers actually work: first you build a mental model of the architecture (“what calls what, where does data flow”), then you dive into specific files. One tool builds the mental model; the other hands you the files.

The fact that both expose themselves as MCP servers makes combining them straightforward — plug both into your editor or agent and let the LLM decide which to call based on the question.

Fxforce5: code rewriter for DI with Uber FX

I am working on a project (closed source, sorry) that has suffered from a proliferation of constructor parameters. For this and other reasons, it looks like it would benefit from a DI approach such as Uber FX.

The code base is pretty large at this point, and manually adapting it is slow, tedious, and frustrating. But laziness, impatience, and hubris say: why not use features like static analysis using DST and reflection to automate this process?

So here is an attempt at doing that: https://github.com/debedb/fxforce5.

P.S. For a related approach used to automatically generate Swagger docs, see Swagger as you Go.