Recursive self-improvement, you said?

Spawning a fleet of coding agents is a solved problem. You write a for loop, you call the Agent tool N times, you go get coffee. The unsolved problem is everything wrapped around the spawn: deciding what can actually run in parallel, stopping the agent that wrote the code from also grading (or eating) its own homework, and — the part nobody ships — recording how the run went so the next one isn’t the same run with the same mistakes.

I wish I didn’t remember this, but in the sect I once belonged to, this was called a “retrospective.”

I’ve been dogfooding a small orchestration skill (agent-team-orchestration, open in voitta-ai/skillz) that treats those as the actual work. Three runs in. This is the first write-up, warts very much included — the warts are the only part with information in them.

The shape, and the one non-negotiable rule

Start with a conversation, not a spawn. Before any developer agent exists, an architect reads the open issues (gh issue list, then actually gh issue view each one) and the repo, and produces the one deliverable that’s genuinely hard: the parallel set. Independent work (different modules, no shared schema, PRs that won’t collide on merge) fans out; everything else serializes (shared files, a migration that has to land first, B’s acceptance depends on A). Get that wrong and you don’t get parallelism, you get merge conflicts with extra steps.
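To make the parallel-set call concrete, here is a minimal sketch of one way to turn "no shared files, dependencies first" into waves. It is not the skill's implementation: the Issue fields, the plan_waves name, and the greedy strategy are all illustrative assumptions, and the file sets are only as good as the architect's guess about what each fix will touch.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    number: int
    files: set[str]  # files the fix is expected to touch (an estimate)
    depends_on: set[int] = field(default_factory=set)  # issues that must land first


def plan_waves(issues: list[Issue]) -> list[list[int]]:
    """Greedy planner: an issue joins the current wave only if all of its
    dependencies landed in an earlier wave and it shares no file with an
    issue already in the wave."""
    remaining = {issue.number: issue for issue in issues}
    done: set[int] = set()
    waves: list[list[int]] = []
    while remaining:
        wave: list[int] = []
        touched: set[str] = set()
        for number in sorted(remaining):
            issue = remaining[number]
            if issue.depends_on <= done and not (issue.files & touched):
                wave.append(number)
                touched |= issue.files
        if not wave:
            raise ValueError(f"cycle or unknown dependency among {sorted(remaining)}")
        for number in wave:
            del remaining[number]
        done.update(wave)
        waves.append(wave)
    return waves
```

Each inner list is a wave that can fan out; the waves themselves run in order. A real planner would also need the shared-schema and merge-order judgment that no file list captures.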

Then each issue in the wave gets a squad, roles deliberately split so no agent both writes and blesses the same diff:

  • developer — its own git worktree, opens the PR;
  • adversarial reviewer — a different agent, briefed to break the diff, not rubber-stamp it;
  • SDET — drives the change like a user;
  • productivity engineer — a meta-role that watches the process: every stall, every human approval, every bit of rework, written down.

The dev/reviewer split is load-bearing. The instant the context that wrote the code also reviews it, the review is theater.

And the telemetry is free, which is the best price. Every Claude Code session is a complete JSONL transcript at ~/.claude/projects/<slug>/<uuid>.jsonl — every tool call, every AskUserQuestion, every answer you gave. (We’ll gate the privacy policy to not log every breath you take).
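A rough sketch of mining one of those transcripts for tool usage. The record shape is an assumption on my part (a "message" object whose "content" is a list of blocks, with tool calls marked "type": "tool_use"), so the parser skips anything that does not match instead of failing.

```python
import json
from collections import Counter
from pathlib import Path


def tool_calls(lines):
    """Count tool_use blocks by tool name across JSONL transcript lines.

    ASSUMPTION: records look like
    {"message": {"content": [{"type": "tool_use", "name": ...}, ...]}}.
    Lines that are not JSON, or do not have that shape, are ignored.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        message = record.get("message") if isinstance(record, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "?")] += 1
    return counts


def summarize(path):
    """Convenience wrapper: count tool calls in one transcript file."""
    with Path(path).expanduser().open(encoding="utf-8") as handle:
        return tool_calls(handle)
```

Under that assumption, the count for AskUserQuestion is a first approximation of how often a run stopped to ask a human.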

TFW that retrospective is not wishful thinking; it’s actionable.

Three runs, in ascending order of interesting

Run 1 — shipped clean, screwed up in a way I didn’t catch until I read the log. Two bug fixes on a production Next.js + Prisma app (two-branch staging/prod). Both merged, deployed, SDET-verified green. Then I read the transcript: the two bugs already had open PRs from a prior run. The architect never looked. We’d built and squash-merged duplicates, closed the issues, and orphaned two perfectly good PRs.

That’s not an agent being dumb. It’s a hole in the recipe. “Choose the parallel set” reasoned about file overlap and ordering and never asked the first question a human lead asks — is anyone already on this? — which is one gh pr list away. Second tell, same run: asked “where’s the evidence the reviewer approved these?”, the answer was nowhere. The verdicts lived in the agents’ context and never touched the PR. An approval that leaves no durable artifact didn’t happen. (Worse, squash-merge later buried even the merge-commit note, but I’m getting ahead of myself.)
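The "is anyone already on this?" check is small enough to sketch. This assumes the GitHub CLI is installed and authenticated; the matching heuristic (an issue reference in the title or body, or the issue number in the branch name) and the function names are mine, not the skill's.

```python
import json
import re
import subprocess


def open_prs(repo_dir="."):
    """List open PRs through the GitHub CLI (needs `gh` installed and logged in)."""
    out = subprocess.run(
        ["gh", "pr", "list", "--state", "open", "--limit", "200",
         "--json", "number,title,body,headRefName"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def prs_mentioning(issue_number, prs):
    """Return numbers of PRs that appear to target the given issue.

    Heuristic only: '#<n>' in the title or body, or <n> as a standalone
    number in the branch name.
    """
    ref = re.compile(rf"#{issue_number}(?!\d)")
    branch = re.compile(rf"(?<!\d){issue_number}(?!\d)")
    hits = []
    for pr in prs:
        text = f"{pr.get('title') or ''}\n{pr.get('body') or ''}"
        if ref.search(text) or branch.search(pr.get("headRefName") or ""):
            hits.append(pr["number"])
    return hits
```

Running prs_mentioning(n, open_prs()) for every issue in the wave before spawning anything is the whole pre-flight; any non-empty result is a question for the human lead, not something to resolve automatically.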

Run 2 — the loop closed, and I have receipts. New work — a homepage redesign across seven sub-issues — same skill. At startup the agent did something I didn’t tell it to: it ran gh issue view 122 on the prior run’s recorded retro and read the engagement log. Then it did exactly the things Run 1 botched. It pre-flighted existing PRs. Every merge carried an adversarial verdict with specifics; the reviewer caught a dead query param (?q= where the target route reads ?search=) and sent it back with REQUEST_CHANGES.

Then it got interesting. A staging route started returning 500. The team traced it to schema drift, and went to fix the deploy pipeline by adding prisma db push. The safe version (no --accept-data-loss) did the right thing and aborted:

⚠️ There might be data loss when applying the changes:
• drop column `negotiableTerms` on `Property` (1 non-null value)
Error: Use the --accept-data-loss flag to ignore the data loss warnings

It refused to drop a column with live data, surfaced it for a human call, took a one-time --accept-data-loss against staging only, reconciled, and reverted — production never saw the flag. The redesign isn’t the headline. The headline is that the run improved because it had read how the last run went. Best current read: that’s the flywheel, showing up unprompted.
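The gate the team followed can be written down as a small policy function. This is a sketch of the decision, not what the agents ran: it assumes the warning text looks like the excerpt above, and the environment names and return values are hypothetical.

```python
def data_loss_warnings(output: str) -> list[str]:
    """Collect the bullet lines that follow Prisma's data-loss banner.

    ASSUMPTION: the banner wording and the bullet character match the log
    excerpt shown above.
    """
    warnings = []
    in_block = False
    for raw in output.splitlines():
        line = raw.strip()
        if "There might be data loss" in line:
            in_block = True
        elif in_block and line.startswith("•"):
            warnings.append(line.lstrip("• ").strip())
        elif in_block and line:
            in_block = False  # first non-bullet line ends the warning block
    return warnings


def decide(output: str, environment: str, human_approved: bool) -> str:
    """Never force on production; force elsewhere only after a human says yes."""
    if not data_loss_warnings(output):
        return "proceed"
    if environment == "production":
        return "refuse"
    return "retry-with-accept-data-loss" if human_approved else "ask-human"
```

The point of encoding it is that the destructive path becomes a default with a named owner instead of a judgment call the executor re-makes every run.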

Run 3 — we pointed it at itself, which is geekily elegant and scientifically noble. I scraped every point across Runs 1–2 where an agent stopped to ask a human to approve something — fifteen gates — dumped them into one issue, and ran the skill on that issue. The architect grouped the fifteen by type, correctly separated the gates worth keeping (destructive DB ops — yes, always ask) from the avoidable friction (re-asking a runtime question it already answered two turns ago), and — the good part — ran two of the fixes on its own execution before they were written into the skill. It pre-flighted with gh pr list and caught two pre-existing issues that overlapped the work, exactly the Run-1 bug, fixed live by the thing being fixed.

What’s actually carrying the weight

  • The parallel-set call is real architecture. Run 3 ran two repos in parallel but serialized five edits that all touched one SKILL.md into a single PR — instead of four agents racing to conflict on the same file.
  • Build/attack/verify pays rent. The reviewer caught a bug the developer was happy with. Once is enough to justify the second agent.
  • Worktree-per-issue keeps the squads from knifing each other.
  • The flight recorder is the product. Every stall is a candidate fix — a default, a permission, a pre-flight, a sharper brief.

Where it falls down (best current read)

  • The headline feature has never once fired. The skill leads with “every agent is a watchable terminal tab you can steer mid-run.” That needs the root session launched through the cmux claude-teams wrapper, which prepends a tmux shim to PATH (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 alone is a red herring — diagnose with which tmux + echo $TMUX). Three runs, three fallbacks to background agents, because the session wasn’t started that one specific way. A feature nobody reaches isn’t a feature, it’s a positioning bug.
  • The same two process bugs recur every run until baked in: a setup question asked at spawn time instead of as a step-0 precondition, and re-asking a decision already made. Prose doesn’t self-correct — the executor re-litigates your opinions until you encode them as defaults.
  • N=3 and confounded. Run 2’s wins rode on memory carried from Run 1, so I can’t yet split skill-value from memory-value. The compounding loop is a strong signal, not a proof. The honest next experiment is one run on a clean, never-seen repo, launched under cmux, with no carried memory, measured by a typed telemetry schema — which doesn’t exist yet, so I’m building that before I build anything else.

The actual thesis

Spawning is commodity; the moat is the operating doctrine plus the telemetry loop — the thing that makes human-interventions-per-issue trend down run over run. Build the instrument first, defer the spaceship. YAGNI applies to strategy, too.
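Since the typed telemetry schema does not exist yet, the following is only a guess at its smallest useful form. The three event kinds come from what the productivity engineer already writes down (stalls, human approvals, rework); every name here is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HUMAN_APPROVAL = "human_approval"
    STALL = "stall"
    REWORK = "rework"


@dataclass(frozen=True)
class Event:
    run: int    # which orchestration run
    issue: int  # issue the squad was working on
    kind: Kind
    note: str = ""


def interventions_per_issue(events):
    """Human approvals divided by distinct issues seen, per run.

    Caveat: an issue that produced no events at all is invisible here, so a
    real schema would also record the issue list for each run.
    """
    approvals = Counter()
    issues = {}
    for event in events:
        issues.setdefault(event.run, set()).add(event.issue)
        if event.kind is Kind.HUMAN_APPROVAL:
            approvals[event.run] += 1
    return {run: approvals[run] / len(seen) for run, seen in sorted(issues.items())}
```

If the thesis holds, the values in that dictionary should fall from one run to the next; if they don't, the loop isn't compounding.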

Skill’s open in voitta-ai/skillz. Run it on your backlog and tell me where it stalls. The stalls are the entire point.

cmux setup

My terminal of choice is now cmux. Here are my setup notes – how I run agent teams, and how I get my Claude/Codex sessions to come back after a reboot.

Agent Teams

My model: a workspace per team, either Claude Teams or OMX.

Here’s my cmux config (~/.config/cmux/cmux.json):

{
  "$schema": "https://raw.githubusercontent.com/manaflow-ai/cmux/main/web/data/cmux.schema.json",
  "schemaVersion": 1,

  "terminal": {
    "autoResumeAgentSessions": true
  },

  "actions": {
    "agents.openOMX": {
      "type": "workspaceCommand",
      "title": "Open OMX",
      "subtitle": "Start Oh My Codex in its own workspace",
      "commandName": "OMX"
    },

    "agents.openClaudeTeams": {
      "type": "workspaceCommand",
      "title": "Open Claude Teams",
      "subtitle": "Start Claude Teams in its own workspace",
      "commandName": "Claude Teams"
    }
  },

  "commands": [
    {
      "name": "OMX",
      "description": "Start interactive Oh My Codex in its own cmux workspace",
      "keywords": ["omx", "codex", "agents"],
      "restart": "ignore",
      "workspace": {
        "name": "OMX",
        "cwd": ".",
        "layout": {
          "pane": {
            "surfaces": [
              {
                "type": "terminal",
                "name": "OMX",
                "command": "bash -lc 'export PATH=\"/opt/homebrew/bin:/usr/local/bin:$PATH\"; exec cmux omx'",
                "focus": true
              }
            ]
          }
        }
      }
    },

    {
      "name": "Claude Teams",
      "description": "Start Claude Code Teams in its own cmux workspace",
      "keywords": ["claude", "teams", "agents"],
      "restart": "ignore",
      "workspace": {
        "name": "Claude Teams",
        "cwd": ".",
        "layout": {
          "pane": {
            "surfaces": [
              {
                "type": "terminal",
                "name": "Claude Teams",
                "command": "bash -lc 'export PATH=\"/opt/homebrew/bin:/usr/local/bin:$PATH\"; exec cmux claude-teams'",
                "focus": true
              }
            ]
          }
        }
      }
    }
  ]
}

After saving, run (right inside cmux):

cmux reload-config

Then open the Command Palette with Cmd+Shift+P and you can launch OMX or Claude Teams.
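If a palette entry does nothing, one thing worth ruling out is a typo between an action and the command it points at. The sketch below assumes, based only on the config above, that an action's commandName must equal the name of an entry in commands; I have not checked this against the cmux schema.

```python
import json
from pathlib import Path


def dangling_actions(config: dict) -> list[str]:
    """Return ids of workspaceCommand actions whose commandName matches no command.

    ASSUMPTION: actions reference commands by their "name" field.
    """
    names = {command.get("name") for command in config.get("commands", [])}
    return sorted(
        action_id
        for action_id, action in config.get("actions", {}).items()
        if action.get("type") == "workspaceCommand"
        and action.get("commandName") not in names
    )


def check_file(path="~/.config/cmux/cmux.json"):
    """Load the config from disk and report dangling actions."""
    return dangling_actions(json.loads(Path(path).expanduser().read_text()))
```

An empty list from check_file() means every action resolves to a command.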

Surviving a reboot

The goal: after a restart, the agent panes come back on their existing conversations, not as fresh sessions.

This is the setup that currently does that for me (cmux 0.64.15, macOS 15, Apple Silicon):

  1. autoResumeAgentSessions – already true in the config above. It tells cmux to re-run each pane’s saved resume command when cmux reopens. (It is not a boot daemon; it only acts once cmux is running again.)
  2. Run cmux hooks setup once.
  3. Add cmux as a Login Item so it relaunches at login: System Settings > General > Login Items & Extensions > Open at Login > + > /Applications/cmux.app.
  4. Keep launches argument-free. cmux only restores on a no-argument launch (a Login Item / Spotlight / Dock launch all qualify). Don’t set CMUX_DISABLE_SESSION_RESTORE=1 – and watch out if you sync env vars into launchctl.
  5. Be on cmux 0.64.15 or newer – that’s the version where reboot resume started working for me.

You do not need macOS’s “Reopen windows when logging back in” – mine was off (the box was unchecked) during the reboot where everything resumed.

With all of that, after a reboot my agent panes came back resumed – 15 of 15 in my last test, each on its real prior conversation.

An honest caveat. I have not fully isolated which single piece is load-bearing, and there is an oddity: cmux records a per-pane wasAgentRunning flag that was false for every pane in the snapshot, yet the sessions still resumed. Best current read: the resume comes from cmux’s own restore in 0.64.15 plus the Login Item relaunch – not from any macOS window-reopen feature (that was off). I have since confirmed this on a later reboot: with that option off, no ordinary app restored its windows – only my auto-launch items (cmux/Slack/Texty) came back – yet 14 of 14 agent panes resumed. Full trail in the appendix.

Fallback: reopen everything yourself

A forced/hard reboot skips macOS state restoration, and you might leave the box unchecked. For those cases the conversation ids are still saved in cmux’s session snapshot, so you can re-open them yourself. This script reads the snapshot live (no hard-coded ids, so it is safe to share) and opens each saved agent in its own workspace:

#!/usr/bin/env bash
# resume-cmux-agents.sh [list|all|N]
set -u
CMUX="${CMUX_BUNDLED_CLI_PATH:-/Applications/cmux.app/Contents/Resources/bin/cmux}"
mapfile_cmds() {
  python3 - <<'PY'
import json, os
d=json.load(open(os.path.expanduser('~/Library/Application Support/cmux/session-com.cmuxterm.app.json')))
for w in d.get('windows',[]):
  for ws in w.get('tabManager',{}).get('workspaces',[]):
    for pn in ws.get('panels',[]):
      rb=(pn.get('terminal') or {}).get('resumeBinding')
      if rb and rb.get('command'):
        print((pn.get('customTitle') or 'agent') + '\t' + (rb.get('cwd') or '.') + '\t' + rb['command'])
PY
}
case "${1:-list}" in
  list) mapfile_cmds | cut -f1 | nl ;;
  all)
    win=""; [ -z "${CMUX_WORKSPACE_ID:-}" ] && win="--window $("$CMUX" current-window 2>/dev/null | head -1)"
    mapfile_cmds | while IFS=$'\t' read -r name cwd cmd; do
      "$CMUX" new-workspace --name "$name" --cwd "$cwd" --command "$cmd" $win --focus true >/dev/null 2>&1 \
        || echo "failed: $name" >&2
    done ;;
  *) mapfile_cmds | sed -n "${1}p" | cut -f3 | bash ;;
esac

To make it launchable from Spotlight, wrap it in a tiny .app bundle (Spotlight indexes .apps, not bare .sh files):

APP="$HOME/Applications/Resume cmux agents.app"
mkdir -p "$APP/Contents/MacOS"
cat > "$APP/Contents/Info.plist" <<'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
  <key>CFBundleName</key><string>Resume cmux agents</string>
  <key>CFBundleIdentifier</key><string>local.resume-cmux-agents</string>
  <key>CFBundleExecutable</key><string>resume</string>
  <key>CFBundlePackageType</key><string>APPL</string>
  <key>CFBundleVersion</key><string>1.0</string>
  <key>LSUIElement</key><true/>
</dict></plist>
PLIST
cat > "$APP/Contents/MacOS/resume" <<'SH'
#!/bin/bash
exec /bin/bash "$HOME/.config/cmux/resume-cmux-agents.sh" all
SH
chmod +x "$APP/Contents/MacOS/resume"
/System/Library/Frameworks/CoreServices.framework/Frameworks/LaunchServices.framework/Support/lsregister -f "$APP"
mdimport "$APP"

Locally-built bundles carry no quarantine, so they launch without Gatekeeper friction. Type “Resume cmux agents” in Spotlight to fire it.

If the in-app updater refuses to run

If you update cmux via its built-in (Sparkle) updater and hit SUSparkleErrorDomain 4005 “remote port connection was invalidated” with an underlying “Failed to create installation cache directory”, the usual cause is a stale com.apple.quarantine attribute on the app bundle (it was downloaded with a browser). The cache dir and code signing are red herrings. Fix:

xattr -dr com.apple.quarantine /Applications/cmux.app
rm -rf "$HOME/Library/Caches/com.cmuxterm.app/org.sparkle-project.Sparkle/PersistentDownloads/"* \
       "$HOME/Library/Caches/com.cmuxterm.app/org.sparkle-project.Sparkle/Installation/"*

Then quit cmux (Cmd-Q) and relaunch it from /Applications – then Check for Updates. The relaunch is the part people miss: if the running cmux was launched while still quarantined, stripping the attribute and clearing caches is not enough on its own – the installer keeps failing 4005 on same-process Retry until the un-quarantined bundle is relaunched. (A graceful Cmd-Q then reopen resumes your panes; only the reboot path needs the update.)

If a pane fails with “No such file or directory”

cmux saves each pane’s resume command with the absolute path to the agent binary as it was when the pane was created (e.g. ~/.nvm/versions/node/v24.2.0/bin/claude). If that binary later moves or is removed, resume execs a dead path and the pane shows No such file or directory – even though claude still works on your PATH (cmux injects a CLI shim).

The common trigger right now: Claude Code migrating from the npm/nvm global install to the native installer. The new claude lives at ~/.local/bin/claude (-> ~/.local/share/claude/versions/<v>) and the old nvm copy is deleted, so every pre-migration binding is stale. A node version upgrade/removal does the same.

Fix – reinstall so the agent is back on PATH, then relaunch each affected pane’s agent once (cmux re-records the binding with the current path; until then a stale pane fails again on the next reboot):

curl -fsSL https://claude.ai/install.sh | bash    # Claude Code (native installer)

Find the stale ones:

python3 - <<'PY'
import json, os, re
d=json.load(open(os.path.expanduser('~/Library/Application Support/cmux/session-com.cmuxterm.app.json')))
for w in d.get('windows',[]):
  for ws in w.get('tabManager',{}).get('workspaces',[]):
    for pn in ws.get('panels',[]):
      rb=(pn.get('terminal') or {}).get('resumeBinding')
      if rb:
        m=re.search(r'(/\S+/(?:claude|codex))', rb.get('command') or '')
        if m and not os.path.exists(m.group(1)):
          print('STALE:', pn.get('customTitle'), '->', m.group(1))
PY

Fixed upstream in manaflow-ai/cmux#6582 (it canonicalizes PATH-managed absolute claude/codex paths back to the bare name at restore, repairing existing stale snapshots) – merged to main, ships in the first cmux release after 0.64.16. On a build with that fix the stale bindings self-repair and the steps above are unnecessary.
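To illustrate what "canonicalize back to the bare name" means, here is a rough approximation of the idea. It is not the upstream code: the real fix is described as limited to PATH-managed locations, while this sketch rewrites any absolute path that ends in claude or codex.

```python
import re

# An absolute path (starting at a word boundary) whose last component is an agent binary.
_AGENT_PATH = re.compile(r"(?<!\S)/\S*/(claude|codex)(?=\s|$)")


def canonicalize(command: str) -> str:
    """Replace an absolute claude/codex path with the bare command name,
    so that resolution happens through PATH at resume time."""
    return _AGENT_PATH.sub(r"\1", command)
```

After the rewrite, a binding recorded against a deleted nvm install resolves to whatever claude is on PATH today.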

P.S.

remote-control is your friend!

Appendix: how I arrived at the reboot setup

This is the investigation trail behind the Surviving a reboot section – kept separate because it is the “why”, not the “do this”.

Symptom. With autoResumeAgentSessions: true, agents came back fine after a normal quit-and-reopen, but after a macOS reboot the panes were fresh – new sessions, lost conversations.

Two distinct problems.

  1. cmux was not relaunching at all. The setting only runs when cmux reopens; it is not a boot daemon. Adding cmux as a Login Item fixed the relaunch. A trap while diagnosing this: don’t compare a process’s start time to kern.boottime – the machine can sit at the login screen for hours, and Login Items fire at login, not at kernel boot. Compare to the login time instead:

    for p in loginwindow Finder Dock; do pid=$(pgrep -x "$p"|head -1); \
      [ -n "$pid" ] && echo "$p: $(ps -o lstart= -p $pid)"; done
    ps -o lstart= -p "$(pgrep -x cmux | head -1)"
  2. Even once it relaunched, the cold-start restore came back fresh. cmux’s own session restore is gated by a per-pane wasAgentRunning flag. An interactive agent sitting idle at its prompt is recorded as wasAgentRunning=false, and the restore then skips auto-resume and starts it fresh. (Upstream: manaflow-ai/cmux#4269 added that gate to avoid resuming agents you had explicitly exited; it also catches idle-but-alive agents. I reported the reboot impact in #5802.) The fingerprint of a “came back fresh” boot – every pane wasAgentRunning=false, no process on --resume:

    ps -axo command | grep -E '/(claude|codex)' | grep -v grep \
      | grep -oE -- '--resume [0-9a-f-]+|--session-id [0-9a-f-]+' | sort | uniq -c
    
    python3 - <<'PY'
    import json, os
    d=json.load(open(os.path.expanduser('~/Library/Application Support/cmux/session-com.cmuxterm.app.json')))
    tot=run=0
    for w in d.get('windows',[]):
      for ws in w.get('tabManager',{}).get('workspaces',[]):
        for pn in ws.get('panels',[]):
          t=pn.get('terminal') or {}
          if t.get('resumeBinding'):
            tot+=1; run+= 1 if t.get('wasAgentRunning') else 0
    print(f"agent panes={tot} wasAgentRunning_true={run}")
    PY

    On cmux 0.64.10 I saw this on both a graceful reboot (0 of 12 resumed) and a forced reboot (0 of 15) – wasAgentRunning was 0 every time.

What changed. I updated cmux 0.64.10 -> 0.64.15 and kept the Login Item (and, as it turns out, “Reopen windows when logging back in” was off). After the next reboot, all 15 panes resumed:

python3 - <<'PY'
import json, os, subprocess, re
d=json.load(open(os.path.expanduser('~/Library/Application Support/cmux/session-com.cmuxterm.app.json')))
saved=set()
for w in d.get('windows',[]):
  for ws in w.get('tabManager',{}).get('workspaces',[]):
    for pn in ws.get('panels',[]):
      rb=(pn.get('terminal') or {}).get('resumeBinding')
      if rb:
        m=re.search(r'(?:--resume|resume)\s+([0-9a-f-]{36})', rb.get('command') or '')
        if m: saved.add(m.group(1))
out=subprocess.run("ps -axo command", shell=True, capture_output=True, text=True).stdout
running=set(re.findall(r'--session-id ([0-9a-f-]{36})', out)) | set(re.findall(r'(?:--resume|resume) ([0-9a-f-]{36})', out))
print(f"resumed {len(saved & running)} / {len(saved)}")
PY

reported resumed 15 / 15.

The open question. In that same snapshot wasAgentRunning was still 0 for every pane – so the resume did not come through the gate I expected; something else brought them back. I first guessed macOS “Reopen windows when logging back in” – but the next time I hit the restart dialog that box was unchecked, and I had never touched it (macOS remembers the last state), so it was almost certainly off during the successful reboot too. That effectively rules it out as the mechanism. Best current explanation: cmux 0.64.15’s own session restore, plus the Login Item relaunch, bring the agents back. I later ran that reboot: with “Reopen windows” off, only auto-launch items came back yet 14/14 panes resumed – so macOS window-reopen is not involved. On 0.64.15 the per-pane wasAgentRunning flag is null rather than false, which the #4269 gate treats as resumable – the likely reason the version bump fixed it. Upstream thread: #5802.
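My reading of the gate, written as a tri-state check. This is inferred from observed behaviour and the upstream discussion, not from reading cmux's source, and the function name is hypothetical.

```python
def should_auto_resume(terminal: dict) -> bool:
    """Model of the restore gate as I understand it.

    Only an explicit False (agent recorded as not running) blocks the resume;
    a missing or null flag is treated as resumable.
    """
    if not terminal.get("resumeBinding"):
        return False  # nothing saved to resume
    return terminal.get("wasAgentRunning") is not False
```

Under this model, the 0.64.10 snapshots (flag false) come back fresh and the 0.64.15 snapshots (flag null) resume, which matches both reboots.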

Upstream: https://github.com/manaflow-ai/cmux/issues/5802

voitta-bookmarklet: A Local AI Sidecar for Arbitrary Web Pages

voitta-bookmarklet is a lightweight browser-side entry point for a larger local AI tool runtime. The user saves the bookmarklet URL as a browser bookmark, clicks it on any HTTPS page, and gets a right-side chat pane injected into the current document. That pane is backed by a local FastAPI service running on https://127.0.0.1:12358, which serves both the frontend widget and the tool-using chat backend.
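For readers who have not built one, a bookmarklet is just a javascript: URL that injects a script tag. The sketch below generates such a URL; the /widget.js path is a placeholder I made up, and the project's actual loader may look quite different.

```python
import json
from urllib.parse import quote


def bookmarklet_url(base="https://127.0.0.1:12358", script_path="/widget.js"):
    """Build a javascript: URL that loads a script from the local service.

    ASSUMPTION: script_path is hypothetical; check the repository for the
    real widget URL.
    """
    src = base.rstrip("/") + script_path
    js = (
        "(function(){"
        "var s=document.createElement('script');"
        f"s.src={json.dumps(src)};"  # json.dumps yields a safe JS string literal
        "document.body.appendChild(s);"
        "})();"
    )
    # Percent-encode spaces and quotes so the result pastes cleanly into a bookmark.
    return "javascript:" + quote(js, safe="(){};=.'/:")
```

Two caveats apply to any loader of this kind: the browser has to trust the local TLS certificate, and a page with a strict Content-Security-Policy can still refuse to load the script.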

The interesting part is not the bookmarklet itself; it’s the architecture behind it. The frontend is built as a single bundled widget and mounted via Shadow DOM, which keeps the UI isolated from page styles. The backend exposes a multi-provider chat runtime supporting Anthropic, OpenAI, and Gemini, plus an in-memory tool bridge for orchestrating tool calls, session state, and provider-specific actions.

The repository is structured around extensibility. There is a clear separation between provider-agnostic tools, provider-specific integrations, browser/page-context tooling, and retrieval components. External data providers live under their own packages, with Google Drive implemented first via OAuth and read-only access. The project also includes RAG indexing for its own documentation, so the agent can use repo-specific reference material as part of its runtime behavior.

From an engineering perspective, this is a practical approach to embedding an assistant into real browser workflows without requiring a full browser extension as the primary product surface. It treats the browser page as the host environment, the local backend as the secure execution boundary, and the model as one component inside a broader tool system.

The implementation also surfaces the real constraints of this design: local TLS setup, CSP restrictions on script injection, OAuth plumbing, and the need to separate user-facing widget code from backend orchestration logic. Those details are exactly what turn a generic “AI chat overlay” into an actual usable system.

In short, voitta-bookmarklet is interesting because it is not just a UI experiment. It is a compact architecture for attaching model-driven, tool-using assistance to arbitrary web pages while keeping execution local and leaving room for more integrations over time.

Repo so you can give a star: https://github.com/voitta-ai/voitta-bookmarklet

Beyond voitta-rag: A Quick Tour of Voitta AI’s Other Public Projects

Most of our coverage of Voitta AI’s GitHub organization has focused on llm-tldr and voitta-rag. Fair enough: those are central projects, and easy to explain. But the org has turned into a broader workshop for agent tooling, developer workflows, and MCP-adjacent experiments.

So here is a quick tour of the other public repos worth a look — including two that we already mentioned elsewhere but are too useful not to repeat here.

Claude Code workflow tools

voitta-yolt

voitta-yolt is a Claude Code safety hook that statically analyzes commands before execution, auto-allows clearly read-only invocations, and flags mutating ones for review. The interesting bit is that it closes practical gaps in Claude Code’s built-in allowlist behavior, especially around compound shell commands and interpreter wrappers.

GitHub: voitta-ai/voitta-yolt. It would be nice to give a star.

omemepo

omemepo (omnia mea mecum porto, “all that is mine, I carry with me”) is a portability and sharing layer for Claude Code. It can pack up your ~/.claude/ setup, move it to another machine, and act as a marketplace layer for plugins and shared Claude Code artifacts.

Right now the implemented surface includes pack, unpack, publish, and an mcp command with subcommands like list, export, import, enable, disable, profile, and prompts. That makes it feel less like a vague portability pitch and more like a concrete attempt to make Claude Code environments reproducible and shareable.

GitHub: voitta-ai/omemepo. It would be nice to give a star.

Tools for working with other software through MCP

voitta-freecad-mcp

voitta-freecad-mcp gives an LLM control over FreeCAD: create geometry, manipulate documents, inspect assemblies, and capture screenshots. The architecture is pragmatic: an MCP server talks to a bridge running inside FreeCAD so operations execute on the app’s main thread.

GitHub: voitta-ai/voitta-freecad-mcp. It would be nice to give a star.

fusion-360-mcp

fusion-360-mcp appears to be the same general idea for Autodesk Fusion 360: an MCP server paired with an in-app HTTP add-in, with documentation for geometry inspection, screenshots, measurements, and design-tree operations. If voitta-freecad-mcp is the open-source-CAD path, this looks like the commercial-CAD sibling.

GitHub: voitta-ai/fusion-360-mcp. It would be nice to give a star.

voitta-pptx

voitta-pptx is smaller but very practical: upload a PowerPoint file, render slides as PNGs through OnlyOffice, and hand the results back to the model. In other words, make decks visible to systems that reason better over images than over zipped XML internals.

GitHub: voitta-ai/voitta-pptx. It would be nice to give a star.

Glue for agent workflows

voitta-auth

voitta-auth is a macOS menu bar app that authenticates against Microsoft, Google, and Okta, then exposes a unified FastMCP proxy with credentials injected for downstream tools. That is not a flashy demo; it is infrastructure for making agent tooling actually usable in enterprise environments.

GitHub: voitta-ai/voitta-auth. It would be nice to give a star.

voitta-bookmarklet

voitta-bookmarklet injects a chat pane into arbitrary web pages via bookmarklet, backed by a local FastAPI service and pluggable model providers. It is a nice reminder that “agent interface” does not have to mean “yet another standalone app.” Sometimes the right UI is: put the assistant next to the page you are already looking at.

GitHub: voitta-ai/voitta-bookmarklet. It would be nice to give a star.

voitta-gannt

voitta-gannt is an interactive Gantt editor backed by Mermaid markdown, with both browser UI and MCP access. That is an oddly specific but smart pattern: keep the source of truth plain text, keep the interface visual, and let agents edit the same artifact humans do.

GitHub: voitta-ai/voitta-gannt. It would be nice to give a star.

Earlier platform pieces

voitta

voitta predates a lot of the current MCP craze and reads like the underlying orchestration layer: a Python framework for routing and automating LLM tool calls across APIs and handlers.

GitHub: voitta-ai/voitta. It would be nice to give a star.

voitta-example

voitta-example is, as the name suggests, a working example app using the library.

GitHub: voitta-ai/voitta-example. It would be nice to give a star.

mcp-voitta-gateway

mcp-voitta-gateway exposes the older Voitta framework through MCP. Together with voitta-example it shows a through-line: Voitta was thinking about tool routing before MCP became the default wrapper for the conversation.

GitHub: voitta-ai/mcp-voitta-gateway. It would be nice to give a star.

IDE and developer-environment experiments

mcp-server-plugin

mcp-server-plugin provides JetBrains-side MCP server plumbing.

GitHub: voitta-ai/mcp-server-plugin. It would be nice to give a star.

jetbrains-voitta

jetbrains-voitta extends that world with AST analysis and debugging tools. That is an important theme across the org: not just calling tools, but embedding them where developers already work.

GitHub: voitta-ai/jetbrains-voitta. It would be nice to give a star.

truffaldino

truffaldino is a configuration manager for AI-development setups — effectively dotfiles for MCP servers and prompts across Claude Code, Cursor, Cline, IntelliJ, and friends. Less glamorous than a model demo, but probably more useful over time.

GitHub: voitta-ai/truffaldino. It would be nice to give a star.

Odds and ends, but not random ones

claude-svg

claude-svg turns Claude Code into a diagram generator for architecture visuals, banners, and other polished SVG outputs. It is easy to dismiss as a side project until you remember how often engineering work needs presentable graphics fast.

GitHub: voitta-ai/claude-svg. It would be nice to give a star.

a2amcp

a2amcp is an example dispatcher agent built around Google’s A2A ideas. Small repo, but it points toward multi-agent routing rather than single-assistant tooling.

GitHub: voitta-ai/a2amcp. It would be nice to give a star.

shoelace

shoelace is the oddball in the org right now because it is really OpenClaw under an older or alternate banner. Still, it reflects the same interest in practical assistant infrastructure across devices and channels.

GitHub: voitta-ai/shoelace. It would be nice to give a star.

The pattern

The org looks less like one product with a few helpers and more like a workshop around agent ergonomics.

Some repos are about retrieval. Some are about auth. Some are about getting LLMs into CAD, IDEs, or decks. Some are about making workflows inspectable, configurable, portable, or just less annoying. Not every repo is equally mature, but taken together they show a consistent instinct: build the missing connective tissue between models and real work.

That, more than any one repository, is what seems interesting about Voitta AI.

voitta-yolt: The Missing Safety Layer for Claude Code

Voitta AI just released voitta-yolt, and it’s aimed at a very real problem: how do you let an agent move fast in the shell without giving it a blank check?

YOLO — You Only Live Once — is the vibe-coder’s operating principle: ship now, deal with consequences later.

YOLT — You Only Live Twice — is the correction.

No, it’s not a replacement for the auto mode; it’s a more fine-grained discernment: it gives Claude Code a second look before a Bash command (or the commands it invokes, which can include actual code, e.g., Python or SQL) actually runs.

The problem it solves

Claude Code’s built-in permission system has an awkward gap.

Some commands are obviously safe, but still annoying to approve over and over. Others are wrapped in ways that make broad allowlisting dangerous.

Two cases matter most:

  • Arbitrary-execution wrappers. python3, bash, node, gh api, curl, kubectl, and friends are too powerful to wildcard-allow safely.
  • Compound shell commands. Loops, subshells, command substitutions, and bash -c '...' forms hide the actual inner commands from the simple outer matcher.

That means you either:

1. approve too much and weaken the safety model, or
2. approve everything manually and hate your life.

YOLT exists to get out of that false choice.

What YOLT actually does

YOLT installs as a Claude Code PreToolUse hook on the Bash tool.

When Claude is about to run a shell command, YOLT parses the invocation, walks the structure of the command, and classifies what it finds:

  • safe → auto-allow
  • unsafe → ask for review, with a reason
  • unknown → fall back to Claude Code’s default prompt

The interesting part is that it no longer treats the shell as a flat string.

The current release parses Bash with tree-sitter-bash, reconstructs argv from the AST, and then classifies each command node against rules in rules/shell.json. If the shell invocation contains inline Python, it delegates that body to a Python AST analyzer.
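To make the three-way verdict concrete, here is a minimal sketch of argv-level classification. It is not YOLT's code: it uses the standard library's shlex instead of tree-sitter-bash, so it only understands a single simple command, and the RULES table and function names are hypothetical stand-ins for whatever rules/shell.json actually contains.

```python
import shlex

# Hypothetical rules: (program, subcommand prefix) -> verdict.
# An empty prefix matches any invocation of that program.
RULES = {
    ("ls", ""): "safe",
    ("cat", ""): "safe",
    ("rm", ""): "unsafe",
    ("aws", "describe-"): "safe",
    ("aws", "list-"): "safe",
    ("aws", "terminate-"): "unsafe",
}


def classify_argv(argv):
    """Return 'safe', 'unsafe', or 'unknown' for one command's argv."""
    if not argv:
        return "unknown"
    program, rest = argv[0], argv[1:]
    # Ignore flags; keep positional words such as 'ec2' and 'describe-instances'.
    words = [arg for arg in rest if not arg.startswith("-")]
    verdicts = {
        verdict
        for (rule_program, prefix), verdict in RULES.items()
        if rule_program == program
        and (prefix == "" or any(word.startswith(prefix) for word in words))
    }
    # Fail closed: one unsafe match outweighs any number of safe ones.
    if "unsafe" in verdicts:
        return "unsafe"
    if "safe" in verdicts:
        return "safe"
    return "unknown"


def classify_command(command):
    """Split a simple shell command into argv and classify it."""
    return classify_argv(shlex.split(command))
```

Anything the table does not recognize comes back as "unknown", which in the scheme above means falling through to the normal prompt rather than guessing.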

And it now covers a genuinely useful extra case: common SQL CLIs. sqlite3, psql, mysql, mariadb, and duckdb get their query text inspected so read-only commands like SELECT, SHOW, and .tables can pass quietly while mutating statements like INSERT, DELETE, DROP, .import, or .load get surfaced for review.
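A rough sketch of that read-only versus mutating split, assuming a naive split on semicolons and looking only at the first word of each statement. The two keyword sets contain just the examples named above; the function name is hypothetical, and YOLT's real inspection is presumably more careful about quoting and comments.

```python
# Keywords taken from the examples in the text; a real rule set would be longer.
READ_ONLY = {"select", "show", ".tables"}
MUTATING = {"insert", "delete", "drop", ".import", ".load"}


def classify_sql(query_text):
    """Return 'safe', 'unsafe', or 'unknown' for the SQL passed to a CLI."""
    verdicts = []
    # Naive statement split: breaks on semicolons inside string literals.
    for statement in query_text.split(";"):
        words = statement.split()
        if not words:
            continue
        first = words[0].lower()
        if first in MUTATING:
            verdicts.append("unsafe")
        elif first in READ_ONLY:
            verdicts.append("safe")
        else:
            verdicts.append("unknown")
    if not verdicts:
        return "unknown"
    if "unsafe" in verdicts:
        return "unsafe"
    if all(verdict == "safe" for verdict in verdicts):
        return "safe"
    return "unknown"
```

The ordering matters: a single mutating statement taints the whole invocation, and a batch is only waved through when every statement in it is recognized as read-only.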

So this is not just “grep for scary words.” It’s structured analysis.

Why that matters

This is the real improvement over naive allowlists.

A normal matcher sees the wrapper:

  • bash -c "..."
  • for ...; do ...; done
  • $(...)
  • <(...)

YOLT walks inside those forms.

That means a loop full of read-only AWS inspection commands can be auto-approved, while a destructive operation buried inside a process substitution still gets surfaced for review.
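Here is a small sketch of what "walking inside" means for the easiest case, bash -c '...'. It is an illustration only: it tokenizes with the standard library's shlex (punctuation_chars plus whitespace_split, which needs Python 3.8 or later) rather than a real grammar, and it does not handle loops, $(...), <(...), or heredocs. Those are exactly the cases that push a tool like this toward tree-sitter. The function names are hypothetical.

```python
import shlex

SHELLS = {"bash", "sh"}
SEPARATORS = {";", "&&", "||", "|"}


def tokenize(command):
    """Split a command line into words and shell operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def inner_commands(command):
    """Yield the argv of every simple command, recursing into `bash -c '...'`."""
    argv = []
    # The trailing ';' flushes the last command through the same code path.
    for token in tokenize(command) + [";"]:
        if token not in SEPARATORS:
            argv.append(token)
            continue
        if len(argv) >= 3 and argv[0] in SHELLS and argv[1] == "-c":
            # The wrapper itself is uninteresting; classify what it runs.
            yield from inner_commands(argv[2])
        elif argv:
            yield argv
        argv = []
```

Each argv that comes out can then be classified on its own, so a destructive rm hiding behind a wrapper is seen as rm and not as "some bash invocation".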

That’s the right shape of safety tooling for agentic coding: less theater, more actual inspection.

The architectural shift

The sharpest detail in the release is that YOLT has already outgrown its first framing.

What began as a Python-script safety hook is now a more general shell-execution analyzer with language-specific followers.

The current structure is roughly:

  • hooks/grammar_classifier.py — Bash AST walker
  • hooks/rule_classifier.py — argv-level command classification
  • hooks/yolt_analyzer.py — Python AST analysis when Python appears inline

That’s a better architecture than a pile of string heuristics, and the repo history shows exactly why the rewrite happened: quote-state edge cases, heredocs, substitutions, continuations, and shell grammar weirdness are not bugs you “finish.” They are why parsers exist.

Using a real grammar here is the grown-up move.

Practical wins

A few details make this more than a neat demo:

  • It supports both plugin install and manual hook install.
  • It explicitly warns that broad static allow rules like Bash(python3:*) or Bash(aws:*) can bypass the hook entirely.
  • It can use the user’s existing permissions.allow patterns as a secondary upgrade pass for otherwise-unknown inner commands.
  • The new SQL CLI handling is exactly the sort of practical expansion I like: not theoretical safety, but fewer prompts for read-only database inspection without waving through destructive schema/data changes.
  • It now defaults logging to ~/.claude/yolt.log, which makes dogfooding and debugging much easier.

And most importantly, the dogfood loop appears real. One recent pass through transcript history reportedly cut the classifier’s unknown rate from 60.2% to 11.7% by fixing a handful of recurring gaps. That’s the number I care about most, because it shows the project is being tuned against actual usage rather than imagined usage.

Why I think this matters

The broader point is not “Claude Code needs more hooks.”

It’s that agent safety gets much better when you stop treating the shell as an indivisible permission blob.

What you really want is a front-line gate for command execution: let the obviously safe paths go through quietly, and save human interruption for the suspicious stuff. That won’t replace every approval surface in an agent stack, but it can take a huge bite out of routine approval fatigue.

There is a big difference between:

  • aws ec2 describe-instances
  • aws ec2 terminate-instances ...
  • for svc in $(aws ecs list-services ...); do aws ecs describe-services ...; done
  • bash -c 'curl ... | sh'

A permission system that collapses all of those into “it’s Bash” is too coarse to be pleasant and too coarse to be trustworthy.

YOLT narrows that gap.

And the cleaner operational pattern is to pair that with direct API usage wherever possible. If a service already gives you a token to create a draft, update a post, or mutate a record, that is usually a better path than driving a browser through the same workflow just to satisfy the UI.

The real thesis

What’s new here is not just another safety wrapper.

What’s new is the move from tool-level permissions to structure-aware command understanding.

That is where a lot of agent tooling is headed, because the old model breaks down as soon as agents start composing commands instead of issuing one-liners.

If you want agents to operate with less friction without quietly turning root access into a vibes-based exercise, this is the kind of infrastructure you need.

Try it

YOLT is open source under AGPL v3 and available here:

https://github.com/voitta-ai/voitta-yolt

Plugin install is straightforward:

/plugin marketplace add voitta-ai/voitta-yolt
/plugin install yolt@voitta-yolt

And if you already installed it manually, the repo documents how to migrate cleanly to the plugin model.

That part matters too. Safety tooling people won’t keep updated is safety tooling that quietly dies.


Related: earlier we wrote about llm-tldr vs voitta-rag. YOLT sits in a different layer of the stack, but it comes out of the same practical question: if you are going to work with agents seriously, where do you put the guardrails so they help instead of getting in the way?

New voitta-rag features

A follow-up to our earlier looks at voitta-rag vs llm-tldr, the February updates, and the search-scope release.

voitta-rag has kept moving since then. The recent work is less about flashy new connectors and more about something arguably more important: usability. Because — dogfooding is real.

Login got more practical

voitta-rag now supports Microsoft OAuth and Google token validation. That matters because a self-hosted knowledge layer gets much more useful once people can sign in with the accounts they already use for work, instead of maintaining a parallel identity system just for search.

In Microsoft-heavy shops (yeah, ok, shut up) this also tightens the loop with SharePoint permissions: the same work identity can be used both for login and for permission-aware retrieval.

GDrive specific: URLs can now resolve back to indexed content

One of the more quietly useful additions is source URL resolution. If content came from Google Docs, Sheets, or Slides, voitta-rag can now store the source URL in chunk metadata and resolve that URL back to the indexed material through MCP.

That sounds small until you think about actual workflow. Someone drops a Google Docs link into chat, ticket comments, or an LLM prompt. Instead of treating the link as an opaque pointer and making the assistant start from scratch, voitta-rag can connect it to content it already knows.

This also works well with GDrive-based pointers that appear on your disk as *.gdoc files.

Docker mode looks much more usable

Docker mode now auto-discovers mapped folders, distinguishes managed mounts from ordinary folders, etc. Local filesystem sources also got a real first-class flow instead of feeling bolted on.

This works really well if, for example, you use the GDrive desktop app because your admin does not allow voitta-rag to read GDrive directly. It can read the locally synced GDrive folder (but see the section above on resolving *.gdoc pointers) and, well, it’s supported nicely.

Claude Code integration got real

There is now a Claude Code plugin setup flow, plus tooling to import Claude Code session history into voitta-rag memory. That is a meaningful step beyond “here is an MCP server” toward “here is a workflow.”

The interesting part is not just convenience. It hints at voitta-rag becoming a memory layer around actual agent work: not only your repos and docs, but also the history of what the assistant did, why, and in what context.

Bulk repo handling improved

Bulk repository import/export got better documentation and a round-trip workflow, and Git sync learned a practical trick: when token auth is in play, SSH repository URLs can be converted automatically to HTTPS.
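The SSH-to-HTTPS rewrite is easy to picture. A minimal sketch, assuming the common git@host:org/repo.git and ssh://git@host/org/repo.git forms; the function name is hypothetical, explicit ports are not handled, and this is not necessarily how voitta-rag implements it.

```python
import re

# Matches git@host:path and ssh://git@host/path (no explicit port support).
_SSH_URL = re.compile(r"^(?:ssh://)?git@(?P<host>[^:/]+)[:/](?P<path>.+)$")


def ssh_to_https(url):
    """Rewrite an SSH-style Git URL to HTTPS; leave anything else untouched."""
    match = _SSH_URL.match(url)
    if not match:
        return url
    return f"https://{match['host']}/{match['path']}"
```

With token auth the HTTPS form is the one that can carry the credential, which is why the conversion only makes sense in that mode.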

That is exactly the kind of fix mature tools accumulate. It does not make for a dramatic screenshot, but it removes friction from the real environments where people actually deploy this stuff.

The direction is getting clearer

At first glance voitta-rag looks like “RAG for code and documents.” That is still true, but increasingly incomplete.

What is emerging is a self-hosted knowledge substrate for AI work: identity-aware, connector-rich, MCP-accessible, and increasingly conscious of workflow instead of just indexing. The recent changes are part polish, part plumbing, but together they make the system feel much closer to something a team could rely on every day.

Well… Almost… There’ll be more.

voitta-rag: Scoping Your AI’s Knowledge, and a few new features

A follow-up to our February 13 comparison of llm-tldr and voitta-rag.


Part I: The Search Toggle — Context Management for the Multi-Project Developer

One of the quieter problems with RAG-assisted development is context pollution. You index everything — your client project, your internal tools, that side experiment from last month — and then your AI assistant cheerfully retrieves code snippets from all of them, muddying every answer.

voitta-rag now has a clean answer to this: a per-folder search toggle in the file browser.

[Figure: voitta-rag search toggle]

Each indexed folder has a Search checkbox. Green means its content shows up in search results (and thus in MCP responses to Claude Code or any other connected assistant). Grey means the folder stays indexed — nothing is deleted or re-processed — but it’s invisible to search. Toggle it back on, and it’s instantly available again.

Why this matters

If you consult for multiple clients, or are just working on multiple not very related projects, your voitta-rag instance might hold:

  • Project A’s monorepo, Jira board, and Confluence space
  • Project B’s microservices and SharePoint docs
  • An internal project — say, a lead generation pipeline
  • A few open-source repos you reference occasionally

Without scoping, a search for “authentication flow” returns results from all of them. Your AI assistant synthesizes an answer that blends Project A’s OAuth implementation with Project B’s API key scheme and a random auth.py from your internal tool. Not wrong, exactly, but not useful either.

With the search toggle, you flip Project B and the internal project off when you’re heads-down on Project A. Searches — including MCP tool calls from Claude Code — only return Project A’s content. When you context-switch, you flip the toggles. It takes one click per folder.

Projects: grouping toggle states

If toggling folders one by one sounds tedious for a large index, voitta-rag also supports projects — named groups of toggle states. Create a “Project A” project and a “Project B” project, each with its own set of active folders. Switching projects flips all the toggles at once.

The active project persists across sessions and is respected by the MCP server, so your AI assistant automatically searches the right scope when you resume work.

Per-user scoping

The toggle is per-user. On a shared instance, each developer can have their own search scope without stepping on each other. Your teammate can be searching across everything while you’ve scoped down to one client — same voitta-rag deployment, different views.

The takeaway

This is a small feature with disproportionate impact. The whole point of a RAG knowledge base is to give your AI assistant relevant context. If you can’t control what “relevant” means, you’re outsourcing that judgment to vector similarity scores — which don’t know that Project A and Project B are different engagements. The search toggle puts that judgment back in your hands.


Part II: What Else Shipped — Glue Data Catalog, UI Polish, and More

Since our last deep-dive, voitta-rag has been on a steady clip of new features. Here’s what landed in the latest batch.

AWS Glue Data Catalog as a Data Source

This is the headline addition. voitta-rag can now sync schema metadata from AWS Glue Data Catalog — databases, tables, columns, partition keys — and index it for RAG search.

The connector (PR #11) renders Glue metadata as markdown: each database becomes a document with a summary table and a per-table breakdown of columns, types, and partition keys. This gets chunked and embedded like any other content.
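As a sketch of what "rendered as markdown" could look like for one table: the input below follows the shape of a table entry in the AWS Glue GetTables response (Name, StorageDescriptor.Columns, PartitionKeys), but the output layout and the function name are my guesses, not the connector's actual format.

```python
def render_table(table):
    """Render one Glue table definition as a small markdown section."""
    lines = [
        f"### {table['Name']}",
        "",
        "| Column | Type | Partition key |",
        "| --- | --- | --- |",
    ]
    # Regular columns live under StorageDescriptor; partition keys are separate.
    for column in table.get("StorageDescriptor", {}).get("Columns", []):
        lines.append(f"| {column['Name']} | {column['Type']} | no |")
    for column in table.get("PartitionKeys", []):
        lines.append(f"| {column['Name']} | {column['Type']} | yes |")
    return "\n".join(lines)


example = {
    "Name": "events",
    "StorageDescriptor": {"Columns": [{"Name": "user_id", "Type": "bigint"}]},
    "PartitionKeys": [{"Name": "dt", "Type": "string"}],
}
print(render_table(example))
```

Once the schema is plain text like this, it chunks and embeds the same way a README would, which is the whole trick.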

Why would you want your data catalog in a RAG knowledge base? Because schema questions are exactly the kind of thing developers ask AI assistants all the time:

  • “Which table has the customer email field?”
  • “What are the partition keys on the events table?”
  • “Show me all tables in the analytics database”

Without Glue indexing, the assistant either hallucinates a schema or asks you to go look it up. With it, the answer comes back from your actual catalog metadata — correct, current, and grounded.

The UI offers a region dropdown, an auth method toggle (AWS profile or access keys), and optional catalog ID and database filters. You can index everything or cherry-pick specific databases.

SharePoint Global Sync and Timestamp Visibility

The SharePoint connector got a global sync implementation — configure once, index everything in the site. Additionally, source timestamps are now exposed in MCP search results, so an AI assistant can see when a document was created or last modified, not just its content. This matters for questions like “what changed recently?” or “is this documentation current?”

Multi-Select Dropdowns for Jira and Confluence

Previously, you typed Jira project keys and Confluence space names into a text field — error-prone and tedious if you have dozens. Now there are multi-select dropdown widgets (PR #10) that fetch available projects and spaces from your instance and let you pick. Select “ALL” to dynamically sync everything, including projects or spaces created in the future.

A small but satisfying fix: JQL project keys are now quoted to handle reserved words like IS that would otherwise break queries. The kind of bug you only hit when a real user has a project named something unfortunate.
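The fix amounts to something like the following sketch (the helper name is hypothetical): wrap every key in double quotes before building the JQL clause, so a key that collides with a reserved word is read as a string.

```python
def jql_for_projects(project_keys):
    """Build a JQL clause with every project key quoted."""
    quoted = ", ".join(
        '"' + key.replace('"', '\\"') + '"' for key in project_keys
    )
    return f"project in ({quoted})"


print(jql_for_projects(["IS", "OPS"]))
# project in ("IS", "OPS")
```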

File Manager UI Overhaul

The file browser got a visual refresh: independent scroll within the file list (headers and sidebar stay fixed), full-width layout, a file count status bar, styled scrollbars, and file extensions preserved when names are truncated. Mostly quality-of-life, but it makes a noticeable difference when you’re browsing a large index.

MCP Improvements

The get_file tool now includes guidance to prefer get_chunk_range for large files — a pragmatic touch. When an AI assistant tries to fetch a 10,000-line file, it’s better to get a targeted range of chunks than to blow up the context window.

SharePoint ACL Sync — Permission-Aware Search

This is the most architecturally significant addition in this batch. voitta-rag now syncs SharePoint Online permissions (ACLs) alongside document content, so search results respect who’s allowed to see what.

SharePoint’s permission model is deceptively complex: permissions flow down from site → library → folder → file through an inheritance chain, but any object in the chain can break inheritance (e.g., when someone shares a file with a colleague who doesn’t have parent-level access). Effective permissions for a given file might come from the file itself, a parent folder three levels up, or the site root.

The new ACL sync walks this hierarchy via the Microsoft Graph API, resolves effective permissions per file, and stores them in the vector index alongside the document chunks. At search time, results are filtered by the requesting user’s identity — you only see content you’d be allowed to see in SharePoint itself.
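The inheritance logic can be sketched without touching the Graph API at all. In this toy model (all names hypothetical), each node either carries its own ACL, meaning inheritance was broken there, or defers to its parent; the effective ACL is the first explicit one found walking upward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # None means "inherits from parent"; a set means inheritance is broken here.
    unique_acl: Optional[frozenset] = None


def effective_acl(node):
    """Walk up the chain until a node with its own permissions is found."""
    current = node
    while current is not None:
        if current.unique_acl is not None:
            return current.unique_acl
        current = current.parent
    return frozenset()  # nothing explicit anywhere: nobody can see it


def visible_to(user, nodes):
    """Filter search hits down to the ones this user may see."""
    return [node.name for node in nodes if user in effective_acl(node)]


site = Node("site", unique_acl=frozenset({"alice", "bob"}))
folder = Node("folder", parent=Node("library", parent=site))
shared = Node("shared.docx", parent=folder, unique_acl=frozenset({"alice", "carol"}))
plain = Node("plain.docx", parent=folder)
```

Here carol sees only shared.docx, bob sees only plain.docx, and alice sees both, even though all three searched the same index.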

The implementation includes an acl-probe diagnostic endpoint that lets you inspect permissions on a sample of files without triggering a full sync — useful for debugging “why can’t user X see document Y?” scenarios.

An 800-line research document covers the SharePoint permission model, Graph API capabilities and limitations, and design decisions. Worth reading if you’re building anything that needs to reason about SharePoint access control.

Microsoft OAuth Login

voitta-rag now supports Microsoft OAuth as a login provider, alongside the existing authentication methods. For organizations already on Microsoft 365, this means users can sign in with their work accounts — and those identities can be matched against SharePoint ACLs for permission-aware search. A .env.sample file documents all the configuration options.

Landing Page Rebrand

A small but notable change: the landing page now reads “Voitta RAG” instead of the previous branding. The project has a clear identity now.


Wrapping Up

The search toggle and project system solve a real workflow problem — context management when you’re juggling multiple codebases. The Glue Data Catalog connector extends voitta-rag’s reach beyond code and documents into infrastructure metadata. The SharePoint ACL sync adds enterprise-grade access control to RAG search — which matters a lot once you’re indexing sensitive documents across an organization. And the UI, connector, and auth improvements continue to sand down the rough edges.

All of it still runs on your infrastructure. Nothing phones home. If you’re building with MCP-connected AI assistants and want a self-hosted knowledge layer, voitta-rag is worth a look.

voitta-rag Grows Up, voitta-yolt Is Born: February Updates from Voitta AI

A follow-up to our February 13 comparison of llm-tldr and voitta-rag.

Part I: voitta-rag — From Code Search to Knowledge Platform

When we last looked at voitta-rag, it was a solid hybrid search engine for codebases — index your repos, search via MCP, get actual code chunks back. Twelve days and 11 commits later, it’s become something broader: a self-hosted knowledge platform that indexes not just code but your entire work graph.

Here’s what landed since February 13.

Enterprise Connectors: Jira, Confluence, SharePoint

The biggest expansion is connector coverage. voitta-rag now syncs from Jira, Confluence, and SharePoint alongside the existing Git, Google Drive, Azure DevOps, and Box integrations.

Jira and Confluence support both Cloud (API token with Basic auth) and Server/Data Center (PAT with Bearer auth), selectable via dropdown in the UI — a detail that matters because plenty of enterprises still run on-prem Atlassian. Cloud uses the v3 search endpoint (v2 is deprecated), and Confluence Cloud correctly routes through /wiki/rest/api.

SharePoint got a full global sync implementation. And on the UI side, both Jira projects and Confluence spaces now use multi-select dropdown widgets — you can cherry-pick specific projects or select “ALL” to dynamically sync everything, including future additions. Practical touch: JQL project keys are now quoted to handle reserved words like IS that would otherwise break queries.

Time-Aware Search

Search results are no longer timeless. voitta-rag now tracks source timestamps (created_at and modified_at), propagated from every remote connector through a .voitta_timestamps.json sidecar file into the indexing pipeline and vector store.

This enables time range filtering on the MCP search tool via date_start/date_end parameters. “What changed in the last week?” is now a first-class query. For an AI assistant trying to understand recent activity across repos, Jira boards, and Confluence spaces simultaneously, this is a significant upgrade.
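Conceptually the filter is simple. A sketch under assumptions: chunks are plain dicts with an ISO 8601 modified_at string, and filtering is done on modified_at (whether the real tool filters on created_at, modified_at, or both is not something I have checked). Only the date_start and date_end parameter names come from the MCP tool.

```python
from datetime import datetime


def filter_by_time(chunks, date_start=None, date_end=None):
    """Keep chunks whose modified_at falls inside the optional range."""
    start = datetime.fromisoformat(date_start) if date_start else None
    end = datetime.fromisoformat(date_end) if date_end else None
    kept = []
    for chunk in chunks:
        modified = datetime.fromisoformat(chunk["modified_at"])
        if start is not None and modified < start:
            continue
        if end is not None and modified > end:
            continue
        kept.append(chunk)
    return kept


chunks = [
    {"id": "old", "modified_at": "2026-01-01T00:00:00+00:00"},
    {"id": "new", "modified_at": "2026-02-20T12:00:00+00:00"},
]
```

Leaving both bounds unset returns everything, so the time filter stays opt-in.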

Anamnesis: Persistent Memory for AI Assistants

The most architecturally interesting addition. Anamnesis (Greek for “recollection”) gives AI assistants a persistent memory layer backed by voitta-rag’s vector store.

Six new MCP tools let an assistant create, retrieve, update, delete, like, and dislike memories. The like/dislike mechanism adjusts relevance scoring — memories the assistant finds useful surface more readily over time, while unhelpful ones fade. It’s essentially a learning loop: the AI assistant builds up a knowledge base of its own observations and decisions, searchable alongside the actual indexed content.

This turns voitta-rag from a read-only knowledge base into a read-write one — the assistant doesn’t just consume context, it contributes to it.

Per-User Search Visibility

A multi-tenancy feature: users can now enable or disable folders for their own search scope without affecting other users. If you’ve indexed 50 repos but only care about 5 for your current task, you toggle the rest off. The MCP server respects these per-user visibility settings, so AI assistants scoped to different users see different slices of the same knowledge base.

More File Types

The indexing pipeline now handles AZW3 (Amazon Kindle) files, joining the existing support for DOCX, PPTX, XLSX, ODT, ODP, and ODS. Not the most common format in a work context, but it signals that voitta-rag is thinking beyond code and office docs toward general document ingestion.

The Bigger Picture

Two weeks ago, voitta-rag was a code search tool. Now it indexes your Git repos, Google Drive, SharePoint, Jira, Confluence, Box, and Azure DevOps — with time-aware search, per-user scoping, and persistent AI memory. The trajectory is clear: it wants to be the single search layer across everything your team produces, exposed to AI assistants via MCP.

The self-hosted angle remains the key differentiator. Nothing leaves your network. For teams where that matters (and increasingly, it does), this is starting to look like a serious alternative to cloud-hosted RAG services.


Part II: voitta-yolt — You Only Live Twice

Brand new from Voitta AI today: voitta-yolt (You Only Live Twice) — a safety analyzer for Claude Code that statically analyzes Python scripts before execution.

The Problem

Claude Code can write and run Python scripts. That’s powerful and dangerous in equal measure. By default, you either pre-approve all Python execution (fast but risky) or manually approve each script (safe but maddening). Neither is great.

How YOLT Works

YOLT registers as a Claude Code PreToolUse hook on the Bash tool. When Claude Code runs python3 script.py, YOLT intercepts the command, parses the Python AST, and walks every function call against a configurable rule set:

  • Safe scripts (pure computation, data parsing, read-only operations) get auto-approved — no permission prompt.
  • Destructive scripts (file writes, AWS mutations, subprocess calls, network POSTs, database connections) get flagged for human review with specifics about what was detected, including the source line content.

Zero external dependencies — it’s pure stdlib (ast, json, fnmatch, shlex). AST parsing is near-instant, so there’s no perceptible delay.

The Rule System

The default rules are sensible and well-structured:

  • AWS boto3: describe/list/get/head → safe. delete/put/create/terminate → destructive. Rules scope via trigger_imports, so cache.delete_item() in a non-AWS script won’t false-positive.
  • File I/O: open() in write modes, os.remove, shutil.rmtree → destructive. Read-only access is fine.
  • Subprocess: Always flagged. subprocess.run, os.system, the lot.
  • Network: requests.get → safe. requests.post/put/delete → destructive.
  • Database: Connection creation → flagged for review.

A curated list of safe imports (json, csv, re, datetime, pathlib, hashlib, and ~50 others) means scripts that only use standard library data-processing modules sail through without interruption.

Custom rules go in ~/.claude/yolt/rules.json and merge with defaults — you can add safe methods, define new categories with their own trigger_imports, and use glob patterns (fetch_*, drop_*).
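For a feel of how little code the core idea needs, here is a stdlib-only sketch: parse the script, walk every call, and match dotted names against glob-style patterns. The pattern list and function names are hypothetical, not YOLT's defaults, and this version ignores import aliasing, open() modes, and trigger_imports scoping.

```python
import ast
import fnmatch

# Hypothetical patterns, loosely modelled on the categories described above.
DESTRUCTIVE_CALLS = [
    "os.remove",
    "os.system",
    "shutil.rmtree",
    "subprocess.*",
    "requests.post",
    "requests.put",
    "requests.delete",
]


def dotted_name(node):
    """Turn an ast.Name/ast.Attribute chain into 'module.attr', else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_destructive_calls(source):
    """Return (line number, call name, source line) for each flagged call."""
    lines = source.splitlines()
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name and any(fnmatch.fnmatchcase(name, p) for p in DESTRUCTIVE_CALLS):
            findings.append((node.lineno, name, lines[node.lineno - 1].strip()))
    return sorted(findings)
```

An empty result means nothing matched, which is the case that can be auto-approved; anything else carries the line number and source text for the human to look at.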

One Important Gotcha

If you have Bash(python3:*) in your Claude Code settings.local.json allow list, YOLT’s hook never fires — static allow rules take precedence over PreToolUse hooks. YOLT replaces the need for that allow rule entirely: safe scripts get auto-approved by the hook itself.

Why This Matters

The design philosophy — “false positives OK, false negatives not” — is the right one for a safety tool. It’s the security principle of fail-closed applied to AI code execution.

YOLT is small (527 lines across 6 files in the initial commit), focused, and immediately useful. If you’re letting Claude Code run Python, this is the kind of guardrail that should exist by default.


Wrapping Up

voitta-rag is evolving from a code search tool into a self-hosted knowledge platform with enterprise connectors and AI memory. voitta-yolt tackles a different but equally practical problem: making AI code execution safer without making it slower.

Both projects are open source (AGPL v3) and available on Voitta AI’s GitHub.


Gregory Golberg is co-founder of Method & Apparatus, a fractional CTO consultancy. Previously: llm-tldr vs voitta-rag: Two Ways to Feed a Codebase to an LLM.

llm-tldr vs voitta-rag: Two Ways to Feed a Codebase to an LLM

Every LLM-assisted coding tool faces the same fundamental tension: codebases are too large to fit in a context window. Two recent tools attack this from opposite directions, and understanding the difference clarifies something important about how we’ll work with code-aware AI going forward.

The Shared Problem

llm-tldr is a compression tool. It parses source code through five layers of static analysis — AST, call graph, control flow, data flow, and program dependence — and produces structural summaries that are 90–99% smaller than raw source. The LLM receives a map of the codebase rather than the code itself.

voitta-rag is a retrieval tool. It indexes codebases into searchable chunks and serves actual source code on demand via hybrid semantic + keyword search. The LLM receives real code, but only the relevant fragments.

Compression vs. retrieval. A map vs. the territory.

At a Glance

            | llm-tldr                               | voitta-rag
Approach    | Static analysis → structural summaries | Hybrid search → actual code chunks
Foundation  | Tree-sitter parsers (17 languages)     | Server-side indexing (language-agnostic)
Interface   | CLI + MCP server                       | MCP server
Compute     | Local (embeddings, tree-sitter)        | Server-side

What Each Does Better

llm-tldr wins when you need to understand how code fits together:

  • Call graphs and dependency tracing across files
  • “What affects line 42?” via program slicing and data flow
  • Dead code detection and architectural layer inference
  • Semantic search by behavior — “validate JWT tokens” finds verify_access_token()

voitta-rag wins when you need the actual code:

  • Retrieving exact implementations for review or modification
  • Searching across many repositories indexed server-side
  • Tunable search precision (pure keyword ↔ pure semantic via sparse_weight)
  • Progressive context loading via chunk ranges — start narrow, expand as needed
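The sparse_weight knob can be pictured as a linear blend of two scores. This is only a mental model: the formula, the function name, and the direction of the weight (1.0 meaning pure keyword here) are assumptions, not voitta-rag's documented behavior.

```python
def hybrid_score(keyword_score, semantic_score, sparse_weight=0.5):
    """Blend a keyword (sparse) score with a semantic (dense) score."""
    if not 0.0 <= sparse_weight <= 1.0:
        raise ValueError("sparse_weight must be between 0 and 1")
    return sparse_weight * keyword_score + (1.0 - sparse_weight) * semantic_score
```

At one end of the range an exact identifier match dominates; at the other, a chunk that merely talks about the same idea can outrank it.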

The Interesting Part

These tools don’t compete — they occupy different layers of the same workflow. Use llm-tldr to figure out where to look and why, then voitta-rag to pull the code you need. Static analysis for navigation, RAG for retrieval.

This mirrors how experienced developers actually work: first you build a mental model of the architecture (“what calls what, where does data flow”), then you dive into specific files. One tool builds the mental model; the other hands you the files.

The fact that both expose themselves as MCP servers makes combining them straightforward — plug both into your editor or agent and let the LLM decide which to call based on the question.

Reverse-engineering and keratinous biomass reduction in bos grunniens

Not that we needed all that for the trip, but once you get locked into a serious drug collection, the tendency is to push it as far as you can.

Hunter S. Thompson

Reverse-engineering is kinda fun. More fun when we can shave the yak by adding more tools to our LLM/MCP toolbox, amirite?

So I accidentally came across this LinkedIn post about an SVG diagramming tool for Claude. I was just working on some diagrams as part of reverse engineering, and had been making agents create those with Mermaid, but I thought I’d give it a try.

Well, that was a flock of wild geese chasing a red herring down a rabbit hole to borrow a shear…

First, I thought the idea was clever, but I wanted more cowbell (because we don’t have enough animals in this post), so I forked that and vibe-coded an MCP server on top of that.

Then I tried to use it to create a few architecture diagrams but I found it actually somewhat lacking. When the client (Claude Desktop) was using it, I didn’t love the editing capability. When the client was not using it, it created nicer-looking diagrams somehow (in SVG, yes) and with legends and stuff. But of course the graph layout still sucked. So I’d need to manually edit it.

Well, screw that, said I. I’ll use AWS MCP server, said I.

Screw that, said I next.

Then I modified the prompt to ask not for SVG but for DOT format of GraphViz. Much better, I said. And then, uh… It could have gone better, right? But at this point I’m not sure how to improve the prompt.

But I know what to do when I don’t know something, right?

Yes. I give the DOT file to the LLM and ask it to tweak it to have a certain thing. Then I ask why. Then I, of course, ask it to fix the original prompt. And it’s turtles (yes, we’re in a zoo and you’re reading it on a Safari) all the way down.

And what do we learn, Palmer? Well, never mind, let us draw the curtain of charity over the rest of this scene.

(Well, not quite true: using DOT is better here than hand-feeding explicit layout instructions like “30px”.)


NOTE: multiple individuals of bos grunniens species have undergone keratinous biomass reduction, which also included:

The moral of the story is absent.

Coding assistants musing

I love me my Cline, Claude Code and company. But there’s one major thing I found missing from them: I want my assistant to be able to step with me through a debugger, and be able to examine variables and the call stack. Somehow this doesn’t exist. This is helpful for figuring out the flow of an unfamiliar program, for example.

Now, JetBrains MCP Server Plugin gets some of the way there, but… It can set breakpoints but because of the way it analyzes code text it often gets confused. For example, when asked to set a breakpoint on the first line of the method it would do it at a method signature or annotation.

And it doesn’t do anything in terms of examining the code state at a breakpoint.

So I decided to build on top of it, see JetBrains-Voitta plugin (based on a Demo Plugin). It:

  • Uses IntelliJ PSI API to provide more meaningful code structure to the LLM (as AST)
    • This helps with properly setting breakpoints from verbal instructions
    • Hopefully this should also prevent some hallucinations about methods that do not exist (educated guess).
  • Adds more debugging capability, such as inspecting the call stack and variables at a given breakpoint.

    Here are a couple of example debug sessions:

Much better.

And completely vibe-coded.

Maybe do something with Cline next?

MCP protocol of choice: stdin/stdout? WTF, man?

Let’s talk about MCP. More specifically, let’s talk about using stdin/stdout as a protocol transport layer in the year of our Lord 2025.

Yes, yes—it’s universal. It’s composable. It works “everywhere.” It’s the spiritual successor to Unix pipes, which were cool at the time. The time when my Dad was hitting on my Mom. As an actual transport layer, stdin/stdout is a disaster.

Debugging Is Basically a Crime

Let’s say I want to create an MCP server in Python. Reasonable. Now let’s say I want to debug it. Set a breakpoint. Inspect variables. Use threads. Maybe spin up the LLM in the same process for context. You know, software engineering.

The moment you try to do this, you’re writing a debug driver. Congratulations. You are now:

  • Building a fake client to simulate a streaming LLM
  • Implementing bidirectional IO while praying the LLM doesn’t send surprise newline characters
  • Wrapping things in threads and/or asyncio or multiprocessing or whatnot other total fucking bullshit.

Been there. Twice:

  • Voitta’s Brokkoly: Thought I could run the LLM and the driver in one process. Spent 3 hours implementing queues, got it half-working, and realized I was debugging my own debug tool.
  • Samtyzukki: Round two. Same problem. Ended up with more abstraction layers than a Kafka conference.

Eventually, I just gave up and decided to use SSE (Server-Sent Events). Because you know what’s great about SSE? You can log things. You can see the messages. You can debug. It’s like rediscovering civilization after weeks of wilderness survival with only printf() and trauma.

stdout Is Sacred, Until It Isn’t

Here’s the other problem. stdout is a shared space. You can’t count on it. Libraries will write to it. Dependencies will write to it. Your logger will write to it. Some genius upstream will write:

print("INFO: falling back to CPU because the GPU is feeling shy today.")

Congratulations. You just corrupted your transport. Your parser reads that as malformed JSON or a broken packet or an existential and spiritual crisis.
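Here is the failure in miniature, plus the usual band-aid. This sketch simulates the stdio transport with an in-memory buffer and newline-delimited JSON; the function names are made up, and the "guard" is simply redirecting stdout to stderr while third-party code runs.

```python
import contextlib
import io
import json
import sys


def parse_stream(raw):
    """Parse newline-delimited JSON; return (messages, corrupt_lines)."""
    messages, corrupt = [], []
    for line in raw.splitlines():
        if not line.strip():
            continue
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            corrupt.append(line)
    return messages, corrupt


def chatty_library_call():
    # Stands in for any dependency that prints to stdout uninvited.
    print("INFO: falling back to CPU because the GPU is feeling shy today.")


def serve_once(guard_stdout):
    """Handle one fake request and return what the client would parse."""
    pipe = io.StringIO()  # stands in for the process's real stdout
    with contextlib.redirect_stdout(pipe):
        if guard_stdout:
            # Send the library's chatter to stderr, away from the transport.
            with contextlib.redirect_stdout(sys.stderr):
                chatty_library_call()
        else:
            chatty_library_call()
        print(json.dumps({"jsonrpc": "2.0", "id": 1, "result": "ok"}))
    return parse_stream(pipe.getvalue())
```

Unguarded, the client gets one valid message and one line of garbage; guarded, it gets just the message. The guard works, but note that it is something every server author has to remember to do, forever.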

It’s not a bug. It’s a design decision—and not a good one.

This is the part where I invoke Rob Pike. Sorry. Not sorry.

In Go, to format a date, one doesn’t simply use YYYY-MM-DD. You do Mon Jan 2 15:04:05 MST 2006.

Because, I get it, we all need to get high once in a while. But srsly.