voitta-rag Grows Up, voitta-yolt Is Born: February Updates from Voitta AI

A follow-up to our February 13 comparison of llm-tldr and voitta-rag.

Part I: voitta-rag — From Code Search to Knowledge Platform

When we last looked at voitta-rag, it was a solid hybrid search engine for codebases — index your repos, search via MCP, get actual code chunks back. Twelve days and 11 commits later, it’s become something broader: a self-hosted knowledge platform that indexes not just code but your entire work graph.

Here’s what landed since February 13.

Enterprise Connectors: Jira, Confluence, SharePoint

The biggest expansion is connector coverage. voitta-rag now syncs from Jira, Confluence, and SharePoint alongside the existing Git, Google Drive, Azure DevOps, and Box integrations.

Jira and Confluence support both Cloud (API token with Basic auth) and Server/Data Center (PAT with Bearer auth), selectable via dropdown in the UI — a detail that matters because plenty of enterprises still run on-prem Atlassian. Cloud uses the v3 search endpoint (v2 is deprecated), and Confluence Cloud correctly routes through /wiki/rest/api.
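For the curious, here is roughly what those two auth schemes look like at the HTTP-header level. This is a generic sketch of Atlassian's documented conventions, not voitta-rag's actual connector code; the function names are made up.

```python
import base64

def cloud_headers(email: str, api_token: str) -> dict:
    # Jira/Confluence Cloud: API token sent as Basic auth,
    # i.e. base64("email:token").
    creds = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    return {"Authorization": f"Basic {creds}"}

def server_headers(pat: str) -> dict:
    # Jira/Confluence Server or Data Center: personal access token
    # sent as Bearer auth.
    return {"Authorization": f"Bearer {pat}"}

print(cloud_headers("me@example.com", "tok"))
print(server_headers("tok"))
```

The dropdown in the UI presumably just switches which of these two header shapes the connector builds.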

SharePoint got a full global sync implementation. And on the UI side, both Jira projects and Confluence spaces now use multi-select dropdown widgets — you can cherry-pick specific projects or select “ALL” to dynamically sync everything, including future additions. Practical touch: JQL project keys are now quoted to handle reserved words like IS that would otherwise break queries.
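The JQL quoting fix is worth a concrete illustration. A minimal sketch (the helper name is ours, not voitta-rag's):

```python
# JQL treats bare words like IS, AND, or ORDER as reserved, so an unquoted
# project key such as IS breaks the query. Quoting every key avoids this.
def build_jql(project_keys):
    quoted = ", ".join(f'"{key}"' for key in project_keys)
    return f"project IN ({quoted}) ORDER BY updated DESC"

print(build_jql(["IS", "PLATFORM"]))
# project IN ("IS", "PLATFORM") ORDER BY updated DESC
```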

Time-Aware Search

Search results are no longer timeless. voitta-rag now tracks source timestamps, created_at and modified_at, propagated from every remote connector through a .voitta_timestamps.json sidecar file into the indexing pipeline and vector store.

This enables time range filtering on the MCP search tool via date_start/date_end parameters. “What changed in the last week?” is now a first-class query. For an AI assistant trying to understand recent activity across repos, Jira boards, and Confluence spaces simultaneously, this is a significant upgrade.
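To make the mechanism concrete, here is a minimal sketch of how sidecar timestamps could drive date_start/date_end filtering. The file name and the two field names come from the release notes; the schema and the filter function are our assumptions.

```python
import json
from datetime import datetime

# Hypothetical .voitta_timestamps.json contents (the schema is an assumption).
sidecar = json.loads("""
{
  "docs/design.md": {"created_at": "2026-02-10T09:00:00Z", "modified_at": "2026-02-20T14:30:00Z"},
  "src/main.py":   {"created_at": "2025-11-01T08:00:00Z", "modified_at": "2025-12-05T16:45:00Z"}
}
""")

def parse(ts):
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def filter_by_range(entries, date_start=None, date_end=None):
    """Keep entries whose modified_at falls inside [date_start, date_end]."""
    out = []
    for path, meta in entries.items():
        modified = parse(meta["modified_at"])
        if date_start and modified < parse(date_start):
            continue
        if date_end and modified > parse(date_end):
            continue
        out.append(path)
    return out

# "What changed since February 14?" as a concrete range query:
print(filter_by_range(sidecar, date_start="2026-02-14T00:00:00Z"))
# ['docs/design.md']
```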

Anamnesis: Persistent Memory for AI Assistants

The most architecturally interesting addition. Anamnesis (Greek for “recollection”) gives AI assistants a persistent memory layer backed by voitta-rag’s vector store.

Six new MCP tools let an assistant create, retrieve, update, delete, like, and dislike memories. The like/dislike mechanism adjusts relevance scoring — memories the assistant finds useful surface more readily over time, while unhelpful ones fade. It’s essentially a learning loop: the AI assistant builds up a knowledge base of its own observations and decisions, searchable alongside the actual indexed content.
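The like/dislike loop is easy to picture with a toy model. Everything below (the class, the scoring formula, the term-overlap stand-in for vector similarity) is our illustration of the idea, not Anamnesis's implementation:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    likes: int = 0
    dislikes: int = 0

    @property
    def feedback_weight(self):
        # Useful memories surface more readily; unhelpful ones fade.
        return 1.0 + 0.1 * (self.likes - self.dislikes)

class MemoryStore:
    def __init__(self):
        self._memories = {}
        self._next_id = 0

    def create(self, text):
        self._next_id += 1
        self._memories[self._next_id] = Memory(text)
        return self._next_id

    def like(self, mem_id):
        self._memories[mem_id].likes += 1

    def dislike(self, mem_id):
        self._memories[mem_id].dislikes += 1

    def search(self, query):
        # Term overlap stands in for vector similarity; feedback reweights it.
        words = set(query.lower().split())
        def score(m):
            return len(words & set(m.text.lower().split())) * m.feedback_weight
        return [m.text for m in sorted(self._memories.values(), key=score, reverse=True)]

store = MemoryStore()
a = store.create("deploys happen from the release branch")
b = store.create("deploys are manual on Fridays")
store.like(b); store.like(b); store.dislike(a)
print(store.search("deploys")[0])
# deploys are manual on Fridays
```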

This turns voitta-rag from a read-only knowledge base into a read-write one — the assistant doesn’t just consume context, it contributes to it.

Per-User Search Visibility

A multi-tenancy feature: users can now enable or disable folders for their own search scope without affecting other users. If you’ve indexed 50 repos but only care about 5 for your current task, you toggle the rest off. The MCP server respects these per-user visibility settings, so AI assistants scoped to different users see different slices of the same knowledge base.
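A toy model of how per-user scoping could work, with a made-up visibility table and result shape (voitta-rag's real storage will differ):

```python
# Each user has their own folder toggles; nobody else's scope is affected.
visibility = {
    "alice": {"repo-a": True, "repo-b": False},   # alice toggled repo-b off
    "bob":   {"repo-a": True, "repo-b": True},
}

results = [
    {"folder": "repo-a", "chunk": "def handler(): ..."},
    {"folder": "repo-b", "chunk": "CREATE TABLE users (...)"},
]

def scope_results(user, results):
    """Filter search results down to the folders this user has enabled."""
    allowed = visibility.get(user, {})
    return [r for r in results if allowed.get(r["folder"], False)]

print([r["folder"] for r in scope_results("alice", results)])   # ['repo-a']
print([r["folder"] for r in scope_results("bob", results)])     # ['repo-a', 'repo-b']
```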

More File Types

The indexing pipeline now handles AZW3 (Amazon Kindle) files, joining the existing support for DOCX, PPTX, XLSX, ODT, ODP, and ODS. Not the most common format in a work context, but it signals that voitta-rag is thinking beyond code and office docs toward general document ingestion.

The Bigger Picture

Two weeks ago, voitta-rag was a code search tool. Now it indexes your Git repos, Google Drive, SharePoint, Jira, Confluence, Box, and Azure DevOps — with time-aware search, per-user scoping, and persistent AI memory. The trajectory is clear: it wants to be the single search layer across everything your team produces, exposed to AI assistants via MCP.

The self-hosted angle remains the key differentiator. Nothing leaves your network. For teams where that matters (and increasingly, it does), this is starting to look like a serious alternative to cloud-hosted RAG services.


Part II: voitta-yolt — You Only Live Twice

Brand new from Voitta AI today: voitta-yolt (You Only Live Twice) — a safety analyzer for Claude Code that statically analyzes Python scripts before execution.

The Problem

Claude Code can write and run Python scripts. That’s powerful and dangerous in equal measure. By default, you either pre-approve all Python execution (fast but risky) or manually approve each script (safe but maddening). Neither is great.

How YOLT Works

YOLT registers as a Claude Code PreToolUse hook on the Bash tool. When Claude Code runs python3 script.py, YOLT intercepts the command, parses the Python AST, and walks every function call against a configurable rule set:

  • Safe scripts (pure computation, data parsing, read-only operations) get auto-approved — no permission prompt.
  • Destructive scripts (file writes, AWS mutations, subprocess calls, network POSTs, database connections) get flagged for human review with specifics about what was detected, including the source line content.

Zero external dependencies — it’s pure stdlib (ast, json, fnmatch, shlex). AST parsing is near-instant, so there’s no perceptible delay.
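The core mechanic (walk the AST, match call names against rules) fits in a page. This is a stripped-down sketch of the approach, not YOLT's actual code; the rule set here is a tiny subset:

```python
import ast

DESTRUCTIVE_CALLS = {
    "os.remove", "os.system", "shutil.rmtree",
    "subprocess.run", "subprocess.Popen",
    "requests.post", "requests.put", "requests.delete",
}

def dotted_name(node):
    """Render a call target like subprocess.run as a dotted string."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))

def analyze(source):
    """Return (line, call) pairs for every destructive call in the script."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DESTRUCTIVE_CALLS:
                findings.append((node.lineno, name))
            # open() in a write mode is destructive; read mode is fine.
            elif name == "open" and len(node.args) > 1:
                mode = node.args[1]
                if isinstance(mode, ast.Constant) and "w" in str(mode.value):
                    findings.append((node.lineno, "open(..., 'w')"))
    return findings

script = "import shutil\nshutil.rmtree('/tmp/scratch')\ndata = open('in.txt').read()\n"
print(analyze(script))
# [(2, 'shutil.rmtree')]
```

A script with no findings gets auto-approved; anything in the findings list goes to the human, line number attached.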

The Rule System

The default rules are sensible and well-structured:

  • AWS boto3: describe/list/get/head → safe. delete/put/create/terminate → destructive. Rules are scoped via trigger_imports, so cache.delete_item() in a non-AWS script won’t trigger a false positive.
  • File I/O: open() in write modes, os.remove, shutil.rmtree → destructive. Read-only access is fine.
  • Subprocess: Always flagged. subprocess.run, os.system, the lot.
  • Network: requests.get → safe. requests.post/put/delete → destructive.
  • Database: Connection creation → flagged for review.

A curated list of safe imports (json, csv, re, datetime, pathlib, hashlib, and ~50 others) means scripts that only use standard library data-processing modules sail through without interruption.

Custom rules go in ~/.claude/yolt/rules.json and merge with defaults — you can add safe methods, define new categories with their own trigger_imports, and use glob patterns (fetch_*, drop_*).
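Here is a sketch of how a defaults-plus-custom merge with glob matching could work. The trigger_imports key, the glob patterns, and the file location come from the description above; the default rules shown and the merge logic are illustrative assumptions.

```python
import fnmatch

DEFAULT_RULES = {
    "aws": {
        "trigger_imports": ["boto3"],
        "safe": ["describe_*", "list_*", "get_*", "head_*"],
        "destructive": ["delete_*", "put_*", "create_*", "terminate_*"],
    },
}

# What a user's ~/.claude/yolt/rules.json might contain (hypothetical).
custom_rules = {
    "aws": {"safe": ["fetch_*"]},              # extend an existing category
    "redis": {"trigger_imports": ["redis"],    # define a brand-new one
              "destructive": ["flush*", "drop_*"]},
}

def merge(defaults, custom):
    """Custom rules extend the defaults rather than replacing them."""
    merged = {k: {kk: list(vv) for kk, vv in v.items()} for k, v in defaults.items()}
    for category, rule in custom.items():
        target = merged.setdefault(category, {})
        for key, patterns in rule.items():
            target[key] = target.get(key, []) + list(patterns)
    return merged

def classify(method, category, rules):
    rule = rules[category]
    if any(fnmatch.fnmatch(method, p) for p in rule.get("destructive", [])):
        return "destructive"
    if any(fnmatch.fnmatch(method, p) for p in rule.get("safe", [])):
        return "safe"
    return "review"

rules = merge(DEFAULT_RULES, custom_rules)
print(classify("fetch_report", "aws", rules))   # safe
print(classify("flushall", "redis", rules))     # destructive
```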

One Important Gotcha

If you have Bash(python3:*) in your Claude Code settings.local.json allow list, YOLT’s hook never fires — static allow rules take precedence over PreToolUse hooks. YOLT replaces the need for that allow rule entirely: safe scripts get auto-approved by the hook itself.

Why This Matters

The design philosophy — “false positives OK, false negatives not” — is the right one for a safety tool. It’s the security principle of fail-closed applied to AI code execution.

YOLT is small (527 lines across 6 files in the initial commit), focused, and immediately useful. If you’re letting Claude Code run Python, this is the kind of guardrail that should exist by default.


Wrapping Up

voitta-rag is evolving from a code search tool into a self-hosted knowledge platform with enterprise connectors and AI memory. voitta-yolt tackles a different but equally practical problem: making AI code execution safer without making it slower.

 

Anatomy of a Fork Explosion, Part II: The Full Dissection

Two days ago we published a quick look at OpenClaw’s fork explosion — 34,600 forks, sampled from the bookends of GitHub’s API, with a 33,000-fork black hole in the middle. We were upfront about it: “This was a 30-minute investigation, not a thesis.”

This is the thesis.

We went back and scraped all 36,915 forks (the number grew while we were counting). Every single one. Plus 9,423 pull requests. Three graphs, no black holes, no excuses.

Graph 1: The hockey stick that wasn’t quite a hockey stick

Forks per day

36,915 total forks. Peak: 3,402 on January 27. Average: 499/day.

The first fork appeared November 26, 2025. For nearly two months: nothing. A handful of early adopters per day, the kind of people who read Hacker News at 2am and clone things “to look at later.”

Then something happened around January 20.

Daily forks went from ~50 to over 1,000 in three days. By January 27, it hit 3,402 in a single day. That’s one fork every 25 seconds, sustained for 24 hours.

But here’s what the full data shows that the sample didn’t: it’s already declining. The peak was January 27. By mid-February, we’re down to about 1,000/day — still enormous, but the exponential phase lasted exactly one week. What we’re in now is the long tail. The viral moment came, the viral moment is going.

The cumulative curve tells the same story: a flat line, a vertical cliff, and then an inflection into deceleration. Classic viral adoption. The question isn’t whether it will keep growing — it will. The question is whether it levels off at 40,000 or 400,000.

Graph 2: Who actually builds anything?

Forks with commits

7,591 of 36,915 forks (20.6%) have new commits. Threshold: code pushed more than 1 hour after forking.
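Concretely, the threshold test looks something like this. The field names pushed_at and created_at are GitHub's; the fork records below are made up:

```python
from datetime import datetime, timedelta

def has_real_commits(fork, threshold=timedelta(hours=1)):
    """True when the fork was pushed to more than an hour after creation,
    filtering out the initial sync GitHub performs on every new fork."""
    created = datetime.fromisoformat(fork["created_at"].replace("Z", "+00:00"))
    pushed = datetime.fromisoformat(fork["pushed_at"].replace("Z", "+00:00"))
    return pushed - created > threshold

bookmark = {"created_at": "2026-01-27T10:00:00Z", "pushed_at": "2026-01-27T10:00:05Z"}
builder  = {"created_at": "2026-01-27T10:00:00Z", "pushed_at": "2026-02-03T18:22:00Z"}
print(has_real_commits(bookmark), has_real_commits(builder))
# False True
```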

This is the graph that matters.

In the early days — November, December — the commit rate was absurd. 60-90% of forks showed real work. These were people who forked because they intended to build. Small community, high signal.

Then came January’s tidal wave, and the ratio cratered. At peak volume, only about 10-20% of forks have any commits at all. The rest are what they’ve always been: GitHub bookmarks. One click, zero intention.

But zoom out from percentages and look at absolute numbers: even at 10%, that’s 300-500 people per day writing actual code on top of OpenClaw. The most recent week shows roughly 1,200 committed forks out of about 5,500 new ones. That’s a healthy project by any measure. It’s just a healthy project buried under 80% noise.

The trend line tells you something about open-source psychology: the harder a project is to use, the higher its commit rate. When OpenClaw was obscure, only competent developers found it. Now that it’s famous, everybody forks it and almost nobody builds anything. Same pattern as every framework that hits the front page of Reddit.

Graph 3: Who gives back?

PRs from forks

9,009 fork PRs from 3,674 unique authors. 9.95% of forks ever sent a PR upstream.

One in ten. That’s actually remarkable for open source.

For context: most popular GitHub projects see PR rates of 1-2% of their fork base. React, with its 10:1 star-to-fork ratio, gets far fewer contributors relative to its fork count. OpenClaw’s 10% is unusually high — partly because the project is young and actively soliciting contributions, partly because the architecture (plugins, extensions, MCPs) makes it easy to contribute without touching core code.

The daily PR count has been climbing steadily: from single digits in December, to 50/day in mid-January, to a sustained 300-500/day now. Cumulative unique contributors crossed 3,500 and show no signs of flattening. Whatever is happening to the fork rate, the contribution rate is still accelerating.

That divergence — declining forks, accelerating PRs — is the best signal in this entire dataset. It means the project is transitioning from “thing people try” to “thing people commit to.”

What we got wrong in Part 1

Our original sample of the 100 newest forks found 19% activity. The full dataset says 20.6%. We were within a rounding error, which is either a testament to sampling theory or dumb luck. Probably both.

What the sample couldn’t show was the shape of the curve — the early period of 60-90% engagement that collapsed as volume exploded. The 20% number is real, but it’s an average across two very different populations: serious developers who forked early, and a much larger wave of tourists who forked because it was trending.

We also estimated “~2,400 forks/day” based on a snapshot. The real peak was 3,402. And by now it’s fallen to about 1,000. The snapshot caught a number that was already past its peak but hadn’t decayed enough to notice.

The numbers that matter

Forget 36,915 forks. Here’s what actually counts:

  • 7,591 forks with real commits — people building things
  • 3,674 unique PR authors — people giving back
  • ~500 PRs/day at current pace — and growing

That’s not a fork explosion. That’s a contributor ecosystem forming in real time. The other 29,324 forks are scenery.

We’ll explain shoelace eventually. Promise.


Full dataset: 36,915 forks and 9,423 PRs scraped from the GitHub REST API v3 on February 17, 2026. All forks paginated (no sampling). Commit activity measured by comparing pushed_at to created_at with a 1-hour threshold to filter initial fork sync. PR data from GitHub’s search API.

Part 1: Anatomy of a Fork Explosion


OpenClaw has 34,600 forks.

Yesterday, its creator joined OpenAI.

These two facts are related in ways that are worth pulling apart.

What 34,600 forks actually looks like

A GitHub fork costs nothing — one click, two seconds. It’s a bookmark with delusions of contribution. So I pulled the data from GitHub’s API to see what’s actually going on underneath the vanity number.

GitHub’s API for listing forks returns a maximum of 400 results per request. You can sort by oldest or newest, so you get the first 400 forks ever created and the 400 most recent ones. The ~33,000 forks in between? Invisible. GitHub literally won’t show them to you. You’d need to scrape each fork individually or use their BigQuery dataset to see the full picture. I didn’t — so this analysis covers the bookends with a black hole in the middle. I’m not going to dress it up.

The growth curve

The first fork appeared November 26, 2025 — two days after the repo went public. For the next month: a trickle. One, two, three forks per day. Early adopters kicking the tires.

Then Christmas happened.

December 25: 10 forks. A 10x jump. People unwrapped laptops and had free time. The holiday week held steady at 5-10 per day.

January 1: 23 forks. Another 3x. By January 6, it peaked at 51 forks/day in the sample. New Year’s resolution energy: “this is the year I set up my own AI agent.”

And right now? ~100 forks per hour. 345 forks appeared in a 4.3-hour window. That’s a ~2,400/day pace.

The trajectory: 1/day → 10/day → 50/day → 100/hour.

Bar chart showing OpenClaw fork growth from 1-3/day in November 2025 to ~2,400/day in February 2026

Somewhere between people opening Christmas presents and Valentine’s Day, OpenClaw went from “interesting open-source tool” to “phenomenon.” Which is a convenient time for the phenomenon’s creator to get hired by the company that didn’t make it.

The 81% question

Here’s the part nobody talks about.

Of the 100 most recent forks — all created within the last hour of my sample — how many show any commit activity after forking?

19%.

The other 81% are untouched clones. Fork and forget. GitHub stars with extra steps.

Donut chart showing 19% of forks have commits after forking, 81% are untouched clones

But before you dismiss it: 19% of 100 forks per hour is still ~20 people per hour actually building something. That’s ~480 developers per day doing real work on top of OpenClaw. Not nothing. Especially for a project that, until yesterday, was one developer’s playground.

The ones who renamed their fork (and are apparently walking away from Omelas)

The most interesting signal isn’t volume — it’s intent. When someone renames their fork, they’re not cloning; they’re starting something new.

Highlights:

  • cl-core-mit-snapshot — someone freezing the codebase under MIT. Defensive forking. Just in case.
  • openclaw-x402-router — x402 payment protocol integration. Somebody’s building monetized agent infrastructure before the foundation even has bylaws.
  • reallyopenopenclaw — a philosophical statement in repo form. Already preemptively arguing with the future.
  • ladysclaw — rebranding energy.
  • clawguard — presumably security hardening.
  • shoelace — no explanation. Just vibes.

These are the 2% who forked with purpose. Watch them.

People aren’t just watching

OpenClaw’s stars-to-forks ratio is 5.7:1 (197K stars to 34.6K forks). For context:

  • React: ~10:1
  • Next.js: ~16:1

A low ratio means people are grabbing the code, not just bookmarking it. OpenClaw’s is unusually low. Whether that’s because the tool rewards customization, because the ecosystem hasn’t consolidated around plugins yet, or because people want to run it privately and not tell anyone — probably all three.

And now that the creator is inside OpenAI and the project is headed for a foundation? That cl-core-mit-snapshot fork starts looking less paranoid and more prescient.

The timing

Peter Steinberger announced yesterday that he’s joining OpenAI. Sam Altman said on X that OpenClaw will “live in a foundation as an open source project that OpenAI will continue to support.”

So let me get this straight: a developer built a personal agent, originally called it ClawdBot (no points for guessing which model it was built for), made it go viral, got hired by OpenAI, and the project will now live in an “independent foundation” that OpenAI “supports.” Meanwhile, 34,600 people have already forked the code, 81% of whom will never touch it again. This is like a Ford engineer building the best car on the market using Toyota engines, then getting hired by GM to “drive the next generation of personal vehicles.”

The claw is the law, apparently. Just not any particular company’s law.

What I couldn’t measure

Two of my three original questions remain unanswered:

  1. ✅ Fork creation over time — covered, with the API gap caveat
  2. ❌ Forks with independent commits — sampled 100, can’t do all 34,600 without days of API scraping
  3. ❌ Forks that sent PRs back to main — same problem, worse

A more rigorous analysis would use GitHub’s BigQuery dataset. This was a 30-minute investigation, not a thesis. But the 30 minutes told a story.

The real question

34,600 forks sounds massive. It is massive. But the real number is somewhere between 6,500 (19% active) and 700 (2% with intent). Still impressive, and still accelerating.

The open-source AI agent space is in its “everybody forks, nobody contributes back” phase. That’s fine — it’s how platforms grow. The interesting question isn’t how many forks exist today. It’s how many of them will still have commits six months from now, when the foundation has governance, when OpenAI’s priorities inevitably diverge from the community’s, and when the next shiny thing comes along.

History suggests: about 2%. But those 2% will be the ones that matter.


Data pulled from the GitHub REST API v3 on February 15–16, 2026. Fork listing capped at 400 per sort direction; findings are based on sampled bookends, not the full dataset.

IDRing

A major feature of the Romana Project is topology-aware IPAM, and an integral part of it is the ability to assign consecutive IP addresses in a block (and to reuse freed-up addresses, starting with the lowest).

Since IPv4 addresses are essentially 32-bit unsigned integers, the problem is basically that of maintaining a sequence of uints while allowing reuse.

To that end, a data structure called IDRing was developed; I’ll describe it here. It is not yet factored out into a separate project, but since Romana is under the Apache 2.0 license, it can still be reused.

  • The IDRing structure is constructed with the NewIDRing() method, which takes the lower and upper bounds (inclusive) of the interval from which to give out IDs. For example, specifying 5 and 10 allows the ring to generate the IDs 5, 6, 7, 8, 9, 10, and then return errors because the interval is exhausted. 

    Optionally, a locker can be provided to ensure that all operations on the structure are synchronized. If nil is specified, then synchronization is the responsibility of the user.

  • To get a new ID, call the GetID() method. The new ID is guaranteed to be the smallest available.
  • When an ID is no longer needed (in our use case, when an IP is deallocated), call ReclaimID().
  • A useful method is Invert(): it returns an IDRing whose available IDs are the ones allocated in the current one, and whose allocated IDs are the ones available in the current one. In other words, the inverse of an IDRing with min 1, max 10, and IDs 4 through 8 taken is an IDRing with the available ranges [1,3] and [9,10].
  • You can see examples of the usage in the test code and in actual IP allocation logic.
  • Persisting it is as easy as using the locker correctly and simply encoding the structure to, and decoding it from, JSON.
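Romana’s IDRing is written in Go; to make the semantics concrete, here is a Python sketch of the same idea, with reclaimed IDs kept in a min-heap so the smallest available ID always comes back first. The method names mirror the description above; the internals are illustrative, not Romana’s.

```python
import heapq

class IDRing:
    """Hand out the smallest available ID in [lo, hi], reusing reclaimed IDs first."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.next_fresh = lo      # smallest never-yet-allocated ID
        self.reclaimed = []       # min-heap of freed IDs
        self.allocated = set()

    def get_id(self):
        # Reuse the smallest freed ID before minting a fresh one.
        while self.reclaimed:
            candidate = heapq.heappop(self.reclaimed)
            if candidate not in self.allocated:   # guards against double-reclaims
                self.allocated.add(candidate)
                return candidate
        if self.next_fresh > self.hi:
            raise RuntimeError("interval exhausted")
        val = self.next_fresh
        self.next_fresh += 1
        self.allocated.add(val)
        return val

    def reclaim_id(self, val):
        self.allocated.discard(val)
        heapq.heappush(self.reclaimed, val)

ring = IDRing(5, 10)
ids = [ring.get_id() for _ in range(3)]   # 5, 6, 7
ring.reclaim_id(5)
reused = ring.get_id()                    # 5: the smallest freed ID is reused
print(ids, reused)
```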