Reverse-engineering and keratinous biomass reduction in Bos grunniens

Not that we needed all that for the trip, but once you get locked into a serious drug collection, the tendency is to push it as far as you can.

Hunter S. Thompson

Reverse-engineering is kinda fun. More fun when we can shave the yak by adding more tools to our LLM/MCP toolbox, amirite?

So I accidentally came across this LinkedIn post about an SVG diagramming tool for Claude. I was just working on some diagrams as part of reverse engineering, and had been making agents create them with Mermaid, but I thought I’d give it a try.

Well, that was a flock of wild geese chasing a red herring down a rabbit hole to borrow a shear…

First, I thought the idea was clever, but I wanted more cowbell (because we don’t have enough animals in this post), so I forked it and vibe-coded an MCP server on top.

Then I tried to use it to create a few architecture diagrams, but found it somewhat lacking. When the client (Claude Desktop) used the tool, I didn’t love the editing capability. When it skipped the tool, it somehow created nicer-looking diagrams (in SVG, yes), legends and all. But of course the graph layout still sucked, so I’d need to edit it manually.

Well, screw that, said I. I’ll use AWS MCP server, said I.

Screw that, said I next.

Then I modified the prompt to ask not for SVG but for GraphViz’s DOT format. Much better, I said. And then, uh… It could have gone better, right? But at this point I’m not sure how to improve the prompt.

But I know what to do when I don’t know something, right?

Yes. I feed the DOT file to the LLM and ask it to tweak it to have a certain thing. Then I ask why. Then, of course, I ask it to fix the original prompt. And it’s turtles (yes, we’re in a zoo and you’re reading it on a Safari) all the way down.

And what do we learn, Palmer? Well, never mind, let us draw the curtain of charity over the rest of this scene.

(Well, not quite: using DOT here is still the better thing to do than spelling out explicit “30px”-style instructions.)


NOTE: multiple individuals of the species Bos grunniens underwent keratinous biomass reduction in the making of this post.

The moral of the story is absent.

Coding assistant musings

I love me my Cline, Claude Code and company. But there’s a major thing I found missing from them: I want my assistant to be able to step with me through a debugger, examining variables and the call stack. Somehow this doesn’t exist. It would be helpful for figuring out the flow of an unfamiliar program, for example.

Now, the JetBrains MCP Server Plugin gets part of the way there, but… It can set breakpoints, but because of the way it analyzes code as text it often gets confused. For example, when asked to set a breakpoint on the first line of a method, it would set it at the method signature or an annotation instead.

And it doesn’t do anything in terms of examining the code state at a breakpoint.

So I decided to build on top of it; see the JetBrains-Voitta plugin (based on a Demo Plugin). It:

  • Uses the IntelliJ PSI API to provide more meaningful code structure to the LLM (as an AST)
    • This helps with properly setting breakpoints from verbal instructions
    • Hopefully this should also prevent some hallucinations about methods that do not exist (educated guess).
  • Adds more debugging capability, such as inspecting the call stack and variables at a given breakpoint.

    Here are a couple of example debug sessions:

Much better.

And completely vibe-coded.

Maybe do something with Cline next?

MCP protocol of choice: stdin/stdout? WTF, man?

Let’s talk about MCP. More specifically, let’s talk about using stdin/stdout as a protocol transport layer in the year of our Lord 2025.

Yes, yes—it’s universal. It’s composable. It works “everywhere.” It’s the spiritual successor to Unix pipes, which were cool at the time. The time when my Dad was hitting on my Mom. As an actual transport layer, stdin/stdout is a disaster.

Debugging Is Basically a Crime

Let’s say I want to create an MCP server in Python. Reasonable. Now let’s say I want to debug it. Set a breakpoint. Inspect variables. Use threads. Maybe spin up the LLM in the same process for context. You know, software engineering.

The moment you try to do this, you’re writing a debug driver. Congratulations. You are now:

  • Building a fake client to simulate a streaming LLM
  • Implementing bidirectional IO while praying the LLM doesn’t send surprise newline characters
  • Wrapping things in threads and/or asyncio or multiprocessing or whatever other total fucking bullshit.

Been there. Twice:

  • Voitta’s Brokkoly: Thought I could run the LLM and the driver in one process. Spent 3 hours implementing queues, got it half-working, and realized I was debugging my own debug tool.
  • Samtyzukki: Round two. Same problem. Ended up with more abstraction layers than a Kafka conference.

Eventually, I just gave up and decided to use SSE (Server-Sent Events). Because you know what’s great about SSE? You can log things. You can see the messages. You can debug. It’s like rediscovering civilization after weeks of wilderness survival with only printf() and trauma.

stdout Is Sacred, Until It Isn’t

Here’s the other problem. stdout is a shared space. You can’t count on it. Libraries will write to it. Dependencies will write to it. Your logger will write to it. Some genius upstream will write:

print("INFO: falling back to CPU because the GPU is feeling shy today.")

Congratulations. You just corrupted your transport. Your parser reads that as malformed JSON or a broken packet or an existential and spiritual crisis.

It’s not a bug. It’s a design decision—and not a good one.

This is the part where I invoke Rob Pike. Sorry. Not sorry.

In Go, to format a date, one doesn’t simply use YYYY-MM-DD. You do Mon Jan 2 15:04:05 MST 2006.

Because, I get it, we all need to get high once in a while. But srsly.

Athena Federated Queries: Azure Data Lake Storage

Well, this one is super broken, which one finds out after shaving a number of yaks.

We want to use Athena to query Parquet files that sit in Azure Data Lake Storage. AWS has what seems to be nice documentation on how to do it… Except:

  1. Searching for it in the Serverless Application Repository with “azure” or “adsl” terms yields nothing.
    • Additionally there seems to be a bug there, per AWS support:

      Issue:
      – The search functionality appears to be unresponsive when using the traditional “Enter” key method
      – This seems to be a technical bug in the console
      Workaround:
      – Enter your search term in the search bar
      – Instead of pressing Enter, click anywhere on the screen
      – This should trigger the search functionality and display the results

    • Searching for something like “gen2” actually yields something… It’s an AthenaDataLakeGen2Connector — which is the same thing as below, so read on.
  2. Trying to add the Data Source from Athena, selecting the “Microsoft Azure Data Lake Storage (ADLS) Gen2” connector… It is based on the athena-datalakegen2 code, which is borken because the underlying mssql JDBC driver is borken.
  3. After patching the mssql driver and the connector, we realize that it is trying to connect via JDBC to ADLS, but that is not supported. And yet AWS claims “the documentation is correct”.

Srsly now, AWS and Microsoft: did you even test any of this?

It’s already 2025, and still

This is a special kind of rant, so I’m starting a new tag in addition to it. It’ll be updated next year, I’m sure.

The state of yak shaving in today’s computing world is insane.

Here we go.

It’s 2024, and…

  • …and I can’t get the CloudWatch agent to work for memory monitoring (also, why is this extra step needed? Why can’t memory monitoring be part of the default metrics? Nobody cares about memory?). Screwing around with IAM roles and access keys keeps giving me:

    ****** processing amazon-cloudwatch-agent ******
    2024/04/05 20:51:36 E! Please make sure the credentials and region set correctly on your hosts.

    At which point I give up and just do this:

    #!/bin/bash

    inst_id=$(ec2metadata --instance-id)
    while :
    do
        used_megs=$(free --mega | awk 'NR!=1 {print $3}' | head -1)
        aws cloudwatch put-metric-data \
            --namespace gg \
            --metric-name mem4 \
            --dimensions InstanceId=${inst_id} \
            --unit "Megabytes" \
            --value $used_megs
        sleep 60
    done


    Finally it works. Add it to my custom dashboard. Nice.

    Wait, what’s that? Saving metrics to my custom dashboard from the EC2 instance overrides what I just added. I have to manually edit the JSON source for the dashboard.

    It’s 2024.
  • …and there is still no HTTP standard for determining a user’s time zone. Per our overlords, we are reduced to a ridiculous set of workarounds and explanations for this total fucking bullshit, like “ask the user” — yet we do have the Accept-Language header, and we’ve had it since RFC 1945 (that’s 1996, longer than some of the people claiming “ask the user” is an answer have been sentient).

    It’s 2024.
  • …and we have a shit-ton of Javascript frameworks, and yet some very popular ones couldn’t give a shit about a basic thing like environment variables (yeah, yeah, I know how that particular sausage is made — screw your sausage, you put wood shavings in it anyway).


  • …and because I’m starting this rant rubric: we are still fighting over tabs-vs-spaces (perkeleen vittupää!) and CR-vs-LF-vs-CRLF. WTF, people. This is why I am not getting anything smart, be it a car, a refrigerator, or whatever. I know that sausage. It’s reverse Polish sausage.

    It’s 2024.

Adventures with Golang dependency injection

Just some notes as I am learning this… There aren’t good answers here, mostly questions (am I doing it right?). All these examples are part of a (not very well organized) GitHub repo here.

Structure injection

Having once hated the magic of Spring’s DI, I’ve grown cautiously accustomed to the whole @Autowired business. When it comes to Go, I’ve come across Uber’s Fx framework, which looks great, but I haven’t been able to figure out just how to automagically inject fields whose values are being Provided into other structs.

An attempt to ask our overlords yielded something not very clear.

I finally broke down and asked a stupid question. Then I found the answer: do not use constructors, just use fx.In in combination with fx.Populate(). Finally, this works. But it doesn’t seem ideal in all cases…

Avoiding boilerplate duplication

This is all well and good, but not always. For example, consider this, in addition to the above:

package dependencies

import "go.uber.org/fx"

type Foo string

type Bar string

type Baz string

type DependenciesType struct {
	fx.In

	Foo Foo
	Bar Bar
	Baz Baz
}

func NewFoo() Foo {
	return "foo"
}

func NewBar() Bar {
	return "bar"
}

func NewBaz() Baz {
	return "baz"
}

var Dependencies DependenciesType

var DependenciesModule = fx.Options(
	fx.Provide(NewFoo),
	fx.Provide(NewBar),
	fx.Provide(NewBaz),
)

If I try to use it as dependencies.Dependencies, it’s ok (as above). But suppose I want to get rid of this var and use constructors instead. I don’t like the proliferation of parameters in constructors. I can use parameter objects, but I’d like to avoid the boilerplate of copying fields from the parameter object into the struct being returned, so I’d like to use reflection, like so (generics are nice):

package utils

import "reflect"

// Construct allocates a *T and copies the identically named,
// identically typed fields of params into it.
func Construct[P any, T any, PT interface{ *T }](params P) PT {
	p := PT(new(T))
	construct0(params, p)
	return p
}

func construct0(params interface{}, retval interface{}) {
	// Check if retval is a pointer
	rv := reflect.ValueOf(retval)
	if rv.Kind() != reflect.Ptr {
		panic("retval is not a pointer")
	}

	// Dereference the pointer to get the underlying value
	rv = rv.Elem()

	// Check if the dereferenced value is a struct
	if rv.Kind() != reflect.Struct {
		panic("retval is not a pointer to a struct")
	}

	// Now, get the value of params
	rp := reflect.ValueOf(params)
	if rp.Kind() != reflect.Struct {
		panic("params is not a struct")
	}

	// Iterate over the fields of params and copy to retval
	for i := 0; i < rp.NumField(); i++ {
		name := rp.Type().Field(i).Name
		field, ok := rv.Type().FieldByName(name)
		if ok && field.Type == rp.Field(i).Type() {
			rv.FieldByName(name).Set(rp.Field(i))
		}
	}

}

So then I can use it as follows:

package dependencies

import (
	"example.com/fxtest/utils"
	"go.uber.org/fx"
)

type Foo *string

type Bar *string

type Baz *string

type DependenciesType struct {
	Foo Foo
	Bar Bar
	Baz Baz
}

type DependenciesParams struct {
	fx.In
	Foo Foo
	Bar Bar
	Baz Baz
}

func NewFoo() Foo {
	s := "foo"
	return &s
}

func NewBar() Bar {
	s := "bar"
	return &s
}

func NewBaz() Baz {
	s := "baz"
	return &s
}

func NewDependencies(params DependenciesParams) *DependenciesType {
	retval := utils.Construct[DependenciesParams, DependenciesType](params)
	return retval
}

var DependenciesModule = fx.Module("dependencies",
	fx.Provide(NewFoo),
	fx.Provide(NewBar),
	fx.Provide(NewBaz),

	fx.Provide(NewDependencies),
)

But while this takes care of the proliferation of constructor parameters as well as the boilerplate copying step, I still cannot avoid duplicating the fields between DependenciesType and DependenciesParams without running into various problems.

Looks like this is still TBD on the library side; I’ll see if I can get further.

Conditional Provide

When using constructors, I would have a construct such as:

type X struct {
	field *FieldType
}

func NewX() *X {
	x := &X{}
	if os.Getenv("FOO") == "BAR" {
		x.field = NewFieldType(...)
	}
	return x
}

In other words, I wanted field to only be initialized if some environment variable is set. In transitioning from using constructors to fx.Provide(), I wanted to keep the same functionality, so I came up with this:

type XType struct {
	fx.In

	Field *FieldType `optional:"true"` // fx.In fields must be exported
}

var X XType

var XModule = fx.Module("x",
	func() fx.Option {
		if os.Getenv("FOO") == "BAR" {
			return fx.Options(
				fx.Provide(NewFieldType),
			)
		}
		return fx.Options()
	}(),
	fx.Populate(&X),
)

Works fine. But is it the right way?


Today’s yak shaving

So I wanted to catch up on Go. I like my IDEs, so I got my Go 1.19.2 and my GoClipse 0.16.1, and off I went to remember how to ride this bicycle again. Or rather to ride this yak, because…

I like IDEs mostly for two things: interactive, graphical debugging (for that, see the P.S. section) and content assist. And we have a problem: content assist doesn’t work out of the box.
The solution seems to be to install a fork of gocode. Ok, I got content assist now. On to the next sine qua non of an IDE, the interactive debugger. Wait… There’s no GDB on M1? Yikes.

For the debugger, sure, I could use LLDB, as suggested, or, even better, Delve, but I want it in the IDE (how many times do I have to say that, in 2022?). A few GUI frontends exist, but I am partial to the Eclipse platform (I can be as picky with my tools as any tradesman).

Is it time to dust off Kabuta, an abandoned attempt to adapt Delve API to GDB/MI interface that GoClipse speaks? Ok why not, let’s go get that sucker… Done.

But how do I get it into GoClipse? A StackOverflow post suggests it is done via the “New Project wizard”, but not so fast, Mr. Yak Jockey.
Ok, let’s hack around it, but first let’s file an issue on GoClipse… Wait, say it ain’t so… GoClipse is no longer maintained? My world collapses. To Bruno’s point that:

I’ve seen the issues Eclipse has, both external (functionality, UI design, bugs), and internal (code debt, shortage of manpower). I was holding on to the hope it would recover, but now I don’t see much of a future for Eclipse other than a legacy existence, and I realize exiting was long overdue.

Just a few months ago I got fed up with how much of a torture it is to step through code in IDEA (and no waving of dead chickens seemed to help), and went back to Eclipse for Java.
Do I fork GoClipse?
P.S. The workaround for the import is to use the default ~/go as GOPATH. In other words, using /Users/gregory/src/github.com/debedb/kabuta worked fine.