A follow-up to our February 13 comparison of llm-tldr and voitta-rag.
Part I: The Search Toggle — Context Management for the Multi-Project Developer
One of the quieter problems with RAG-assisted development is context pollution. You index everything — your client project, your internal tools, that side experiment from last month — and then your AI assistant cheerfully retrieves code snippets from all of them, muddying every answer.
voitta-rag now has a clean answer to this: a per-folder search toggle in the file browser.
Each indexed folder has a Search checkbox. Green means its content shows up in search results (and thus in MCP responses to Claude Code or any other connected assistant). Grey means the folder stays indexed — nothing is deleted or re-processed — but it’s invisible to search. Toggle it back on, and it’s instantly available again.
Why this matters
If you consult for multiple clients, or simply juggle several loosely related projects, your voitta-rag instance might hold:
- Project A’s monorepo, Jira board, and Confluence space
- Project B’s microservices and SharePoint docs
- An internal project — say, a lead generation pipeline
- A few open-source repos you reference occasionally
Without scoping, a search for “authentication flow” returns results from all of them. Your AI assistant synthesizes an answer that blends Project A’s OAuth implementation with Project B’s API key scheme and a random auth.py from your internal tool. Not wrong, exactly, but not useful either.
With the search toggle, you flip Project B and the internal project off when you’re heads-down on Project A. Searches — including MCP tool calls from Claude Code — only return Project A’s content. When you context-switch, you flip the toggles. It takes one click per folder.
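The mechanics can be sketched in a few lines. This is a hypothetical illustration of folder-level scoping, not voitta-rag's actual internals — the class, folder names, and substring search are all assumptions:

```python
# Hypothetical sketch of per-folder search scoping. Toggling a folder off
# hides it from search without deleting or re-processing anything.

class ScopedIndex:
    def __init__(self):
        self.folders = {}        # folder -> list of (doc_id, text)
        self.searchable = set()  # folders currently toggled on

    def add(self, folder, doc_id, text):
        self.folders.setdefault(folder, []).append((doc_id, text))
        self.searchable.add(folder)  # new folders default to "on"

    def toggle(self, folder, on):
        # Flip visibility only; the indexed content stays put.
        (self.searchable.add if on else self.searchable.discard)(folder)

    def search(self, term):
        # Only folders toggled on contribute results -- this is the view
        # an MCP client like Claude Code would get.
        return [doc_id
                for folder in self.searchable
                for doc_id, text in self.folders[folder]
                if term in text]

idx = ScopedIndex()
idx.add("project-a", "a/auth.py", "OAuth authentication flow")
idx.add("project-b", "b/auth.py", "API key authentication")
idx.toggle("project-b", False)       # heads-down on Project A
print(idx.search("authentication"))  # only Project A's document
```

The point of the sketch: the toggle is a search-time filter, so flipping it back on makes content available instantly, with no re-indexing.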
Projects: grouping toggle states
If toggling folders one by one sounds tedious for a large index, voitta-rag also supports projects — named groups of toggle states. Create a “Project A” project and a “Project B” project, each with its own set of active folders. Switching projects flips all the toggles at once.
The active project persists across sessions and is respected by the MCP server, so your AI assistant automatically searches the right scope when you resume work.
Per-user scoping
The toggle is per-user. On a shared instance, each developer can have their own search scope without stepping on each other. Your teammate can be searching across everything while you’ve scoped down to one client — same voitta-rag deployment, different views.
The takeaway
This is a small feature with disproportionate impact. The whole point of a RAG knowledge base is to give your AI assistant relevant context. If you can’t control what “relevant” means, you’re outsourcing that judgment to vector similarity scores — which don’t know that Project A and Project B are different engagements. The search toggle puts that judgment back in your hands.
Part II: What Else Shipped — Glue Data Catalog, UI Polish, and More
Since our last deep-dive, voitta-rag has been shipping new features at a steady clip. Here’s what landed in the latest batch.
AWS Glue Data Catalog as a Data Source
This is the headline addition. voitta-rag can now sync schema metadata from AWS Glue Data Catalog — databases, tables, columns, partition keys — and index it for RAG search.
The connector (PR #11) renders Glue metadata as markdown: each database becomes a document with a summary table and a per-table breakdown of columns, types, and partition keys. This gets chunked and embedded like any other content.
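To make the rendering idea concrete, here is a sketch of the transformation. The input dict mirrors the shape of the Glue `GetTables` API response (`Name`, `StorageDescriptor.Columns`, `PartitionKeys`), but the function itself is illustrative, not the connector's actual code:

```python
# Sketch: render Glue-style table metadata as a markdown document
# ready for chunking and embedding.

def render_database(db_name, tables):
    lines = [f"# Database: {db_name}", "",
             "| Table | Columns | Partition keys |",
             "| --- | --- | --- |"]
    # Summary table first...
    for t in tables:
        cols = t["StorageDescriptor"]["Columns"]
        parts = t.get("PartitionKeys", [])
        lines.append(f"| {t['Name']} | {len(cols)} | "
                     f"{', '.join(p['Name'] for p in parts) or '-'} |")
    # ...then a per-table breakdown of columns and partition keys.
    for t in tables:
        lines += ["", f"## {t['Name']}"]
        for c in t["StorageDescriptor"]["Columns"]:
            lines.append(f"- {c['Name']}: {c['Type']}")
        for p in t.get("PartitionKeys", []):
            lines.append(f"- {p['Name']}: {p['Type']} (partition key)")
    return "\n".join(lines)

events = {"Name": "events",
          "StorageDescriptor": {"Columns": [
              {"Name": "user_id", "Type": "string"},
              {"Name": "email", "Type": "string"}]},
          "PartitionKeys": [{"Name": "dt", "Type": "string"}]}
doc = render_database("analytics", [events])
```

Once rendered this way, the catalog is just another markdown document: it gets chunked, embedded, and searched like everything else.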
Why would you want your data catalog in a RAG knowledge base? Because schema questions are exactly the kind of thing developers ask AI assistants all the time:
- “Which table has the customer email field?”
- “What are the partition keys on the events table?”
- “Show me all tables in the analytics database”
Without Glue indexing, the assistant either hallucinates a schema or asks you to go look it up. With it, the answer comes back from your actual catalog metadata — correct, current, and grounded.
The UI offers a region dropdown, an auth method toggle (AWS profile or access keys), and optional catalog ID and database filters. You can index everything or cherry-pick specific databases.
SharePoint Global Sync and Timestamp Visibility
The SharePoint connector got a global sync implementation — configure once, index everything in the site. Additionally, source timestamps are now exposed in MCP search results, so an AI assistant can see when a document was created or last modified, not just its content. This matters for questions like “what changed recently?” or “is this documentation current?”
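Freshness checks become trivial once timestamps ride along with results. A hypothetical sketch — the field names and result shape are assumptions, not the actual MCP payload:

```python
# Illustrative sketch: filtering search results by source timestamp.
from datetime import datetime, timezone

results = [
    {"path": "docs/setup.md", "modified": "2024-01-05T10:00:00+00:00"},
    {"path": "docs/api.md",   "modified": "2026-02-10T09:30:00+00:00"},
]

def changed_since(results, cutoff):
    # An assistant can answer "what changed recently?" by comparing
    # the exposed modified timestamp against a cutoff.
    return [r["path"] for r in results
            if datetime.fromisoformat(r["modified"]) >= cutoff]

recent = changed_since(results, datetime(2025, 1, 1, tzinfo=timezone.utc))
```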
Multi-Select Dropdowns for Jira and Confluence
Previously, you typed Jira project keys and Confluence space names into a text field — error-prone and tedious if you have dozens. Now there are multi-select dropdown widgets (PR #10) that fetch available projects and spaces from your instance and let you pick. Select “ALL” to dynamically sync everything, including projects or spaces created in the future.
A small but satisfying fix: JQL project keys are now quoted to handle reserved words like IS that would otherwise break queries. It’s the kind of bug you only hit when a real user names a project something unfortunate.
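The fix amounts to one line. A minimal sketch (the function name is illustrative; only the quoting behavior reflects the change described above):

```python
# Quote project keys so JQL reserved words like IS don't break the query.

def jql_for_projects(keys):
    quoted = ", ".join(f'"{k}"' for k in keys)
    return f"project in ({quoted}) ORDER BY updated DESC"

jql_for_projects(["IS", "PLAT"])
# -> project in ("IS", "PLAT") ORDER BY updated DESC
```

Unquoted, `project = IS` is a syntax error in Jira because IS is a JQL keyword; quoted, it's just a key.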
File Manager UI Overhaul
The file browser got a visual refresh: independent scroll within the file list (headers and sidebar stay fixed), full-width layout, a file count status bar, styled scrollbars, and file extensions preserved when names are truncated. Mostly quality-of-life, but it makes a noticeable difference when you’re browsing a large index.
MCP Improvements
The get_file tool now includes guidance to prefer get_chunk_range for large files — a pragmatic touch. When an AI assistant tries to fetch a 10,000-line file, it’s better to get a targeted range of chunks than to blow up the context window.
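The idea behind the guidance can be sketched as follows. This is not the MCP tools' actual implementation — the line-based chunking and function names are assumptions for illustration:

```python
# Sketch: fetch a bounded range of chunks instead of a whole large file.

CHUNK_LINES = 50

def chunk_file(text):
    # Split a file into fixed-size line chunks (a stand-in for
    # whatever chunking the index actually uses).
    lines = text.splitlines()
    return ["\n".join(lines[i:i + CHUNK_LINES])
            for i in range(0, len(lines), CHUNK_LINES)]

def get_chunk_range(chunks, start, end):
    # Return a targeted slice, keeping the assistant's context window small.
    return "\n".join(chunks[start:end])

big_file = "\n".join(f"line {i}" for i in range(10_000))
chunks = chunk_file(big_file)
snippet = get_chunk_range(chunks, 3, 5)  # two chunks instead of 10,000 lines
```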
SharePoint ACL Sync — Permission-Aware Search
This is the most architecturally significant addition in this batch. voitta-rag now syncs SharePoint Online permissions (ACLs) alongside document content, so search results respect who’s allowed to see what.
SharePoint’s permission model is deceptively complex: permissions flow down from site → library → folder → file through an inheritance chain, but any object in the chain can break inheritance (e.g., when someone shares a file with a colleague who doesn’t have parent-level access). Effective permissions for a given file might come from the file itself, a parent folder three levels up, or the site root.
The new ACL sync walks this hierarchy via the Microsoft Graph API, resolves effective permissions per file, and stores them in the vector index alongside the document chunks. At search time, results are filtered by the requesting user’s identity — you only see content you’d be allowed to see in SharePoint itself.
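The resolution logic reduces to walking up the chain until something defines its own ACL. A hypothetical sketch with an illustrative in-memory data model — the real sync resolves this via the Microsoft Graph API:

```python
# Sketch: effective-permission resolution over a site -> library ->
# folder -> file chain, where any node can break inheritance.

nodes = {
    "/site":            {"parent": None,        "acl": {"alice", "bob"}},
    "/site/lib":        {"parent": "/site",     "acl": None},  # inherits
    "/site/lib/folder": {"parent": "/site/lib", "acl": None},  # inherits
    "/site/lib/folder/report.docx": {
        # Inheritance broken: shared directly with carol only.
        "parent": "/site/lib/folder", "acl": {"carol"},
    },
}

def effective_acl(path):
    # Walk up until a node defines its own ACL (acl=None means "inherit").
    node = nodes[path]
    while node["acl"] is None:
        node = nodes[node["parent"]]
    return node["acl"]

def filter_results(paths, user):
    # Permission-aware search: drop anything the requesting user
    # couldn't see in SharePoint itself.
    return [p for p in paths if user in effective_acl(p)]
```

Note the asymmetry the example captures: alice can see the folder but not the file inside it, while carol sees only the file — exactly the kind of case that breaks naive "inherit from parent" logic.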
The implementation includes an acl-probe diagnostic endpoint that lets you inspect permissions on a sample of files without triggering a full sync — useful for debugging “why can’t user X see document Y?” scenarios.
An 800-line research document covers the SharePoint permission model, Graph API capabilities and limitations, and design decisions. Worth reading if you’re building anything that needs to reason about SharePoint access control.
Microsoft OAuth Login
voitta-rag now supports Microsoft OAuth as a login provider, alongside the existing authentication methods. For organizations already on Microsoft 365, this means users can sign in with their work accounts — and those identities can be matched against SharePoint ACLs for permission-aware search. A .env.sample file documents all the configuration options.
Landing Page Rebrand
A small but notable change: the landing page now reads “Voitta RAG” instead of the previous branding. The project has a clear identity now.
Wrapping Up
The search toggle and project system solve a real workflow problem — context management when you’re juggling multiple codebases. The Glue Data Catalog connector extends voitta-rag’s reach beyond code and documents into infrastructure metadata. The SharePoint ACL sync adds enterprise-grade access control to RAG search — which matters a lot once you’re indexing sensitive documents across an organization. And the UI, connector, and auth improvements continue to sand down the rough edges.
All of it still runs on your infrastructure. Nothing phones home. If you’re building with MCP-connected AI assistants and want a self-hosted knowledge layer, voitta-rag is worth a look.