Reverse-engineering and keratinous biomass reduction in Bos grunniens

Not that we needed all that for the trip, but once you get locked into a serious drug collection, the tendency is to push it as far as you can.

Hunter S. Thompson

Reverse-engineering is kinda fun. More fun when we can shave the yak by adding more tools to our LLM/MCP toolbox, amirite?

So I accidentally came across this LinkedIn post about an SVG diagramming tool for Claude. I was just working on some diagrams as part of reverse engineering and had been making agents create those with Mermaid, but I thought I’d give it a try.

Well, that was a flock of wild geese chasing a red herring down a rabbit hole to borrow a shear…

First, I thought the idea was clever, but I wanted more cowbell (because we don’t have enough animals in this post), so I forked that and vibe-coded an MCP server on top of that.

Then I tried to use it to create a few architecture diagrams but I found it actually somewhat lacking. When the client (Claude Desktop) was using it, I didn’t love the editing capability. When the client was not using it, it created nicer-looking diagrams somehow (in SVG, yes) and with legends and stuff. But of course the graph layout still sucked. So I’d need to manually edit it.

Well, screw that, said I. I’ll use AWS MCP server, said I.

Screw that, said I next.

Then I modified the prompt to ask not for SVG but for Graphviz’s DOT format. Much better, I said. And then, uh… It could have gone better, right? But at this point I’m not sure how to improve the prompt.

But I know what to do when I don’t know something, right?

Yes. I feed the DOT file to the LLM and ask it to tweak it in a certain way. Then I ask why. Then, of course, I ask it to fix the original prompt. And it’s turtles (yes, we’re in a zoo and you’re reading it on a Safari) all the way down.

And what do we learn, Palmer? Well, never mind, let us draw the curtain of charity over the rest of this scene.

(Well, not quite true: asking for DOT works better here than spelling out explicit instructions like “30px”.)
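To make the DOT-versus-pixels point concrete, here is a minimal sketch of what "asking for DOT" buys you: the description only says what connects to what, and Graphviz decides where things go. The `to_dot` helper and the three-box architecture are hypothetical, made up for illustration; they are not what the agent or the MCP server actually produced.

```python
def to_dot(name, edges, rankdir="LR"):
    """Render (source, target, label) triples as Graphviz DOT text."""
    lines = [f"digraph {name} {{", f"  rankdir={rankdir};", "  node [shape=box];"]
    for source, target, label in edges:
        # Only the relationship is stated; no coordinates or sizes anywhere.
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical architecture, purely for illustration.
EDGES = [
    ("Client", "API Gateway", "HTTPS"),
    ("API Gateway", "Service", "invoke"),
    ("Service", "Database", "SQL"),
]

dot_text = to_dot("architecture", EDGES)
print(dot_text)
```

Save the output as `architecture.dot` and run `dot -Tsvg architecture.dot -o architecture.svg` to let Graphviz do the layout. There is not a single "30px" in the input.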


NOTE: multiple individuals of bos grunniens species have undergone keratinous biomass reduction, which also included:

The moral of the story is absent.

Coding assistants musing

I love me my Cline, Claude Code and company. But there’s one major thing I found missing from them: I want my assistant to be able to step through a debugger with me, and to examine variables and the call stack. Somehow this doesn’t exist. It would be helpful for figuring out the flow of an unfamiliar program, for example.

Now, the JetBrains MCP Server Plugin gets some of the way there, but… It can set breakpoints, but because of the way it analyzes code text, it often gets confused. For example, when asked to set a breakpoint on the first line of a method, it would set it at the method signature or an annotation.

And it doesn’t do anything in terms of examining the code state at a breakpoint.

So I decided to build on top of it, see JetBrains-Voitta plugin (based on a Demo Plugin). It:

  • Uses IntelliJ PSI API to provide more meaningful code structure to the LLM (as AST)
    • This helps with properly setting breakpoints from verbal instructions
    • Hopefully this should also prevent some hallucinations about methods that do not exist (educated guess).
  • Adds more debugging capability, such as inspecting the call stack and variables at a given breakpoint.
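Why does a real syntax tree help with "break on the first line of the method"? Here is a rough analogy using Python's built-in `ast` module as a stand-in for IntelliJ PSI. This is not the plugin's code (that one lives in the JetBrains world), and the `fib` sample and `first_body_line` helper are made up for illustration. The point is that the tree knows where the body starts, while text matching tends to land on the decorator or the signature.

```python
import ast

SOURCE = '''\
import functools

@functools.lru_cache(maxsize=None)
def fib(
    n,
):
    """Return the n-th Fibonacci number."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
'''


def first_body_line(source, function_name):
    """Return the line number of the first executable statement in a function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == function_name:
            body = node.body
            # Skip a leading docstring: an expression statement holding a string constant.
            if (len(body) > 1 and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                body = body[1:]
            return body[0].lineno
    raise LookupError(f"no function named {function_name!r}")


breakpoint_line = first_body_line(SOURCE, "fib")
print(breakpoint_line)
```

In this sample the decorator is on line 3 and the `def` on line 4, but the helper reports the `if n < 2:` statement, which is where you would actually want the debugger to stop.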

    Here are a couple of example debug sessions:

Much better.

And completely vibe-coded.

Maybe do something with Cline next?

MCP protocol of choice: stdin/stdout? WTF, man?

Let’s talk about MCP. More specifically, let’s talk about using stdin/stdout as a protocol transport layer in the year of our Lord 2025.

Yes, yes—it’s universal. It’s composable. It works “everywhere.” It’s the spiritual successor to Unix pipes, which were cool at the time. The time when my Dad was hitting on my Mom. As an actual transport layer, stdin/stdout is a disaster.

Debugging Is Basically a Crime

Let’s say I want to create an MCP server in Python. Reasonable. Now let’s say I want to debug it. Set a breakpoint. Inspect variables. Use threads. Maybe spin up the LLM in the same process for context. You know, software engineering.

The moment you try to do this, you’re writing a debug driver. Congratulations. You are now:

  • Building a fake client to simulate a streaming LLM
  • Implementing bidirectional IO while praying the LLM doesn’t send surprise newline characters
  • Wrapping things in threads and/or asyncio or multiprocessing or whatnot other total fucking bullshit.
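For the record, this is roughly what the smallest possible version of such a debug driver looks like. It is a sketch under assumptions: the "server" is a hypothetical echo script that stands in for a real MCP server, and the framing is one JSON-RPC message per line. Even this toy needs a subprocess, two pipes and manual flushing, and it still does not let you put a breakpoint inside the server from the same debugger session.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for an MCP server: answers every request with an echo result.
SERVER_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"jsonrpc": "2.0", "id": request["id"], "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
'''


def call(proc, request_id, method):
    """Send one newline-delimited JSON-RPC request and block for one response line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


def run_session():
    """Spawn the server as a child process and play the part of the client."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SERVER_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        return [call(proc, 1, "initialize"), call(proc, 2, "tools/list")]
    finally:
        proc.stdin.close()   # EOF on stdin ends the server's read loop
        proc.stdout.close()
        proc.wait(timeout=5)


replies = run_session()
print(replies)
```

Note that `readline()` blocks forever if the server never answers, which is exactly where the threads, queues and asyncio start creeping in.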

Been there. Twice:

  • Voitta’s Brokkoly: Thought I could run the LLM and the driver in one process. Spent 3 hours implementing queues, got it half-working, and realized I was debugging my own debug tool.
  • Samtyzukki: Round two. Same problem. Ended up with more abstraction layers than a Kafka conference.

Eventually, I just gave up and decided to use SSE (Server-Sent Events). Because you know what’s great about SSE? You can log things. You can see the messages. You can debug. It’s like rediscovering civilization after weeks of wilderness survival with only printf() and trauma.

stdout Is Sacred, Until It Isn’t

Here’s the other problem. stdout is a shared space. You can’t count on it. Libraries will write to it. Dependencies will write to it. Your logger will write to it. Some genius upstream will write:

print("INFO: falling back to CPU because the GPU is feeling shy today.")

Congratulations. You just corrupted your transport. Your parser reads that as malformed JSON or a broken packet or an existential and spiritual crisis.
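One common band-aid, sketched here under assumptions: keep a private handle for protocol output and divert everything else that writes to `sys.stdout` over to stderr while your handlers run. The `noisy_dependency` and `handle_request` names are hypothetical. This only catches Python-level writes; a C extension writing straight to file descriptor 1 will still ruin your day.

```python
import contextlib
import io
import json
import sys


def noisy_dependency():
    # Hypothetical library call that logs to stdout behind your back.
    print("INFO: falling back to CPU because the GPU is feeling shy today.")
    return 42


def handle_request(request, protocol_out):
    """Run the handler with stray prints diverted to stderr, then emit exactly one JSON line."""
    with contextlib.redirect_stdout(sys.stderr):
        value = noisy_dependency()
    response = {"jsonrpc": "2.0", "id": request["id"], "result": value}
    protocol_out.write(json.dumps(response) + "\n")
    protocol_out.flush()


# In a real server protocol_out would be the original sys.stdout, saved at startup.
# A StringIO is used here so the result can be inspected.
out = io.StringIO()
handle_request({"id": 7}, out)
print(out.getvalue(), end="")
```

The protocol stream now carries one clean JSON line, and the shy GPU message ends up on stderr where it belongs.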

It’s not a bug. It’s a design decision—and not a good one.

This is the part where I invoke Rob Pike. Sorry. Not sorry.

In Go, to format a date, one doesn’t simply use YYYY-MM-DD. You write the layout in terms of the reference time, Mon Jan 2 15:04:05 MST 2006, so YYYY-MM-DD becomes 2006-01-02.

Because, I get it, we all need to get high once in a while. But srsly.

Athena Federated Queries: Azure Data Lake Storage, part II

In our previous installment, we learned that Athena does not support ADLS directly (without Synapse). I decided to try to rectify the situation. Initial draft here: https://github.com/debedb/athena-azure-adls

It totally sucks because it’s too slow to be useful performance-wise. But at least it’s got a connection…

But then again Dremio seems to be real good about it. It appears to work well with blob storage (ADLS on Azure, GCS on GCP, S3 on AWS). Even, in some cases, better than Athena with all the blobs in S3.

I may add benchmarks if I can.

To be continued…

Credit where it’s due

Microsoft got a fix out for an issue (mentioned in a previous post) quite quickly.

As for the reason behind the magic number to backtrack from, though: I had posited a different one, and I was wrong… And overall it now reminded me of:

The appearance of our visitor was a surprise to me, since I had expected a typical country practitioner. He was a very tall, thin man, with a long nose like a beak, which jutted out between two keen, grey eyes, set closely together and sparkling brightly from behind a pair of gold-rimmed glasses. He was clad in a professional but rather slovenly fashion, for his frock-coat was dingy and his trousers frayed. Though young, his long back was already bowed, and he walked with a forward thrust of his head and a general air of peering benevolence. As he entered his eyes fell upon the stick in Holmes’s hand, and he ran towards it with an exclamation of joy. “I am so very glad,” said he. “I was not sure whether I had left it here or in the Shipping Office. I would not lose that stick for the world.”

“A presentation, I see,” said Holmes.

“Yes, sir.”

“From Charing Cross Hospital?”

“From one or two friends there on the occasion of my marriage.”

“Dear, dear, that’s bad!” said Holmes, shaking his head.

Dr. Mortimer blinked through his glasses in mild astonishment. “Why was it bad?”

“Only that you have disarranged our little deductions. Your marriage, you say?”

Which in turn reminded me of

And why the hell is “Shipping Office” translated as “пароходство” (a steamship company)?

Athena Federated Queries: Azure Data Lake Storage

Well, this one is super broken, which one finds out after shaving a number of yaks.

We want to query Parquet files that sit in Azure Data Lake Storage with Athena. AWS has what seems to be nice documentation on how to do it… Except:

  1. Searching for it in the Serverless Application Repository with the terms “azure” or “adls” yields nothing.
    • Additionally there seems to be a bug there, per AWS support:

      Issue:
      – The search functionality appears to be unresponsive when using the traditional “Enter” key method
      – This seems to be a technical bug in the console
      Workaround:
      – Enter your search term in the search bar
      – Instead of pressing Enter, click anywhere on the screen
      – This should trigger the search functionality and display the results

    • Searching for something like “gen2” actually yields something… It’s an AthenaDataLakeGen2Connector, which is the same thing as below, so read on.
  2. Trying to add the Data Source from Athena, selecting “Microsoft Azure Data Lake Storage (ADLS) Gen2” connector… It is based on athena-datalakegen2 code which is borken because the underlying mssql JDBC driver is borken.
  3. After patching the mssql driver and the connector, we realize that it is trying to connect via JDBC to ADLS, but that is not supported. And yet AWS claims “the documentation is correct”.

Srsly now, AWS and Microsoft, you even tested anything?

It’s already 2025, and still