WebMCP: A Solution In Search of the Problem It Created

Or: How Google and Microsoft Walked Into a Bar and Reinvented the Web, Worse


Google and Microsoft just co-authored a web spec together. Let that sink in.

The last time these two agreed on anything technical, IE6 was busy eating Netscape alive and “web standards” was an oxymoron. Now they’re back — holding hands under a W3C community group banner, gazing into each other’s eyes across a conference table, and delivering unto us WebMCP — a “proposed web standard” that lets websites expose “structured tools” to AI agents.

I have some thoughts.

What WebMCP Actually Is

WebMCP adds a new browser API — navigator.modelContext — that lets a web page register “tools” for AI agents to call. Each tool has a name, a description, a JSON Schema for inputs, and a handler function. Instead of AI agents scraping your DOM and squinting at screenshots like a drunk trying to read a menu, your website just… tells them what’s available.

Two flavors:

  • Declarative: You annotate HTML forms so agents can submit them directly.
  • Imperative: You write JavaScript handlers that agents invoke with structured inputs.
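To make the imperative flavor concrete, here is a hedged sketch. The spec only says a tool carries a name, a description, a JSON Schema for its inputs, and a handler; the exact property names and registration shape below are my assumptions, not the final API.

```javascript
// Hypothetical sketch of an imperative WebMCP tool. Property names
// (inputSchema, execute) are assumptions based on the spec's description.
const addToCartTool = {
  name: "add-to-cart",
  description: "Add a product to the shopping cart",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string" },
      quantity: { type: "integer", minimum: 1 },
    },
    required: ["productId"],
  },
  // The agent calls this with structured input instead of simulating clicks.
  async execute({ productId, quantity = 1 }) {
    // A real page would update its cart state here; we just echo the result.
    return { added: productId, quantity };
  },
};

// Registration is guarded: the API only exists behind a flag in Chrome 146.
if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool(addToCartTool);
}
```

The point to notice: the page keeps full control of the implementation; the agent only ever sees the declared contract.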

The Chrome team is very excited. They’ve published a blog post, opened an early preview program, and shipped it behind a flag in Chrome 146. VentureBeat wrote it up. Everyone is talking about the agentic web. The hype cycle spins.

The Problem WebMCP Solves

AI agents interact with websites by scraping the DOM, interpreting screenshots, and simulating clicks. This is fragile. It breaks when the UI changes. It’s slow and token-expensive (2,000+ tokens per screenshot vs. 20-100 tokens for a structured call). Every CSS class rename is a potential catastrophe.

This is a real problem. I’m not going to pretend it isn’t.

But here’s the thing: it’s a problem the industry created by ignoring the architecture that already solved it.

The Architecture That Already Solved It (You Didn’t Read It Either)

In the year 2000, Roy Fielding published his PhD dissertation describing the architecture of the World Wide Web. He called it REST — Representational State Transfer. You’ve heard of it. You’ve put it on your resume. You almost certainly haven’t read it.

(Don’t feel bad. Nobody has. That’s the whole problem.)

REST has one crucial, defining idea: HATEOAS — Hypermedia As The Engine Of Application State. Terrible acronym. Sounds like a sneeze. But the idea is simple and beautiful: the server’s response tells you everything you need to know about what you can do next. The links are in the response. The forms are in the response. The available actions are self-describing.

An HTML page already IS a “tool contract.” A <form> already IS a structured tool with defined inputs. An <a href> already IS a discoverable action. The entire web was designed from the ground up so that a client — any client, human or machine — could interact with a server without prior knowledge of its API, simply by following the hypermedia controls in the response.

As the htmx folks put it:

“The HTML response is entirely self-describing. A proper hypermedia client that receives this response does not know what a bank account is, what a balance is, etc. It simply knows how to render a hypermedia, HTML.”

The web already had machine-readable, self-describing, discoverable interactions. It’s called… the web. Somewhere, Roy Fielding is thinking murderous thoughts.
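The htmx point can be made concrete with a toy hypermedia client: given nothing but the response, it can enumerate every available next action. (The regex “parsing” is strictly for illustration; a real client would use a DOM parser.)

```javascript
// Toy client: discover the available actions from the response itself.
// It knows nothing about accounts, balances, or any other domain concept.
function discoverActions(html) {
  const actions = [];
  for (const m of html.matchAll(/<a\s+href="([^"]+)"/g)) {
    actions.push({ kind: "link", target: m[1] });
  }
  for (const m of html.matchAll(/<form\s+action="([^"]+)"/g)) {
    actions.push({ kind: "form", target: m[1] });
  }
  return actions;
}

const response = `
  <h1>Account: 12345</h1>
  <a href="/accounts/12345/deposits">deposits</a>
  <form action="/accounts/12345/transfer" method="post">...</form>`;

discoverActions(response);
// → [{ kind: "link", target: "/accounts/12345/deposits" },
//    { kind: "form", target: "/accounts/12345/transfer" }]
```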

So What Happened?

The industry collectively decided that REST meant “JSON over HTTP with nice-looking URLs.” Which is approximately as accurate as saying democracy means “everyone gets a vote on what to have for lunch.”

Fielding himself, in a now-famous 2008 blog post, tried to set the record straight with the restraint of a man watching his house burn down:

“I am getting frustrated by the number of people calling any HTTP-based interface a REST API… That is RPC. It screams RPC. There is so much coupling on display that it should be given an X rating.”

Reader, the industry did not listen. What followed was a twenty-year sprint in the wrong direction. We abandoned hypermedia for JSON blobs. We replaced self-describing responses with Swagger docs and API versioning. We built increasingly elaborate tooling — API gateways, SDK generators, GraphQL, tRPC — to paper over the problems caused by ignoring the one constraint that made the whole thing work.

And now, in 2026, having thoroughly ignored the architecture of the web while building on the web, we’ve arrived at the logical endpoint: a new browser API so that AI agents can interact with websites in the structured way that websites were already designed to support.

Roy Fielding is no longer thinking murderous thoughts. He’s past that. He’s watching the final scene of Chinatown. “Forget it, Roy. It’s the agentic web.”

The Declarative API Is Just Forms

This is the part where I need you to really focus. From the WebMCP spec:

“Declarative API: Perform standard actions that can be defined directly in HTML forms.”

They. Reinvented. Forms.

Google and Microsoft engineers got together — presumably with catering, perhaps even a whiteboard budget — and produced a specification to make HTML forms work for AI agents. HTML forms. The things that have been telling machines “here is an action, here are the inputs, here is where to send it” since 1993.

The <form> element is literally a structured tool declaration with a name (action), a method (GET/POST), and typed inputs (<input type="text" name="destination" required>). It has been machine-readable for thirty-three years. It is older than some of the engineers who wrote this spec.
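To see how little there is to reinvent, here is the transliteration from a form to a WebMCP-style tool contract. The HTML sits in a comment for comparison; the field names and the JSON property names are illustrative assumptions, not anything from the spec.

```javascript
// Given a form like (illustrative):
//
//   <form action="/search" method="get">
//     <input type="text" name="destination" required>
//     <input type="number" name="travelers" min="1">
//   </form>
//
// the equivalent "tool" declaration is a mechanical transliteration:
const searchTool = {
  name: "/search",               // the form's action
  method: "GET",                 // the form's method
  inputSchema: {
    type: "object",
    properties: {
      destination: { type: "string" },            // <input type="text">
      travelers: { type: "integer", minimum: 1 }, // <input type="number" min="1">
    },
    required: ["destination"],   // the `required` attribute
  },
};
```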

But sure. Let’s add an attribute. Innovation.

The Imperative API Is Just RPC (Again)

The other half of WebMCP is the “imperative API,” where you register JavaScript handler functions that agents call with JSON inputs.

This is RPC. Specifically, it’s RPC mediated by the browser, authenticated by the user’s session, and invoked by an AI agent instead of a human. Which is a perfectly fine idea! RPC is useful. It has always been useful. SOAP did this in 1999. CORBA did it before that. Every SPA with a JavaScript API layer does it today.

The new part is navigator.modelContext.registerTool() instead of window.myApp.doThing(). The innovation is… a namespace. Alert the press.

The Security Section Reads Like a Horror Novel

WebMCP’s own specification describes something it calls the “lethal trifecta”: an agent reads your email (private data), encounters a phishing message (untrusted content), and calls a tool to forward that data somewhere (external communication). Each step is legitimate individually. Together, they’re an exfiltration chain.

The spec’s own analysis of this scenario? “Mitigations exist. They reduce risk. They don’t eliminate it. Nobody has a complete answer here yet.”

Nobody has a complete answer yet. They shipped it behind a flag in Chrome 146 anyway. This is the “we’ll add seat belts in v2” school of automotive engineering.

The destructiveHint annotation — the mechanism for flagging “this tool can delete your data” — is marked as advisory, not enforced. The spec literally says the browser or agent can ignore it. It’s a polite suggestion. A Post-it note on the nuclear button that says “maybe don’t?”

And there’s no tool discovery without visiting the page. Agents can’t know what tools Gmail offers without opening Gmail first. The spec proposes future work on a .well-known/webmcp manifest. You mean like robots.txt? Or /.well-known/openid-configuration? Or the dozens of other discovery mechanisms the web already has? Groundbreaking.

The Real Game

Now let’s talk about what this actually is, under the hood.

Google and Microsoft don’t control the API layer. They can’t dictate how backends expose services. But they do control the browser. WebMCP puts the browser — Chrome and Edge, i.e., Chromium with two different logos — at the center of every agent-to-website interaction.

Every AI agent that wants to use WebMCP must go through the browser. The browser mediates authentication, permissions, consent. The browser becomes the gatekeeper. If you control the browser, you control the chokepoint.

This is the same play Google made with AMP: take a real problem (slow mobile pages), create a solution that requires routing through Google’s infrastructure, W3C-wash it, and call it open. WebMCP takes a real problem (agents can’t interact with websites reliably) and creates a solution that routes through Chromium.

MCP (Anthropic’s protocol) connects agents to backend services directly — no browser needed. WebMCP says: no no, come through our browser. That’s not interoperability. That’s a tollbooth with a standards document.

What Should Have Happened

If we actually wanted AI agents to interact with websites reliably, we could:

  1. Build better hypermedia clients. Teach AI agents to understand HTML — forms, links, semantic structure. The web is already machine-readable. We just need clients that aren’t illiterate.
  2. Use existing standards. Schema.org, Microdata, RDFa, JSON-LD — mature standards for machine-readable web content. Google built an entire search empire on them. They work today.
  3. Write APIs. If you want structured machine-to-machine interaction, build an API. REST (actual REST), GraphQL, gRPC — pick your poison. No new browser API required.
  4. Use MCP where appropriate. For backend service integration, MCP does the job without inserting a browser into the loop.
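As a concrete instance of point 2: Schema.org’s potentialAction markup already lets a page declare an invokable action in machine-readable JSON-LD — the same pattern Google uses for its sitelinks search box. (The example.com URLs are placeholders.)

```json
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://example.com/",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://example.com/search?q={search_term}",
    "query-input": "required name=search_term"
  }
}
```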

None of these require a new browser API. None of them route through Chromium. None of them require Google and Microsoft to co-author anything.

The Cycle

This is the software industry’s most reliable pattern:

  1. A good architecture is proposed (REST, 2000)
  2. The industry ignores the hard parts (HATEOAS, hypermedia)
  3. The easy parts get cargo-culted (“REST means JSON + HTTP verbs”)
  4. Problems emerge from ignoring the architecture
  5. A new spec is proposed to solve those problems
  6. The new spec doesn’t mention the old architecture
  7. Go to 1

WebMCP is step 5. The Chrome blog post doesn’t mention REST. Doesn’t mention HATEOAS. Doesn’t mention hypermedia. It talks about “the agentic web” as if machine-readable web interactions are a bold new idea that needed inventing in 2026.

Roy Fielding wrote the answer to this problem in his dissertation. In 2000. It’s free to read. It’s shorter than the WebMCP spec. And unlike WebMCP, it doesn’t require Chrome 146.


But sure. Let’s add navigator.modelContext. What’s one more API between friends?

Credit where it’s due

Microsoft delivered a fix for an issue quite quickly (mentioned in a previous post).

As for the reason behind the magic number to backtrack from, though: I had posited a different explanation, and I was wrong… And the whole episode reminded me of:

The appearance of our visitor was a surprise to me, since I had expected a typical country practitioner. He was a very tall, thin man, with a long nose like a beak, which jutted out between two keen, grey eyes, set closely together and sparkling brightly from behind a pair of gold-rimmed glasses. He was clad in a professional but rather slovenly fashion, for his frock-coat was dingy and his trousers frayed. Though young, his long back was already bowed, and he walked with a forward thrust of his head and a general air of peering benevolence. As he entered his eyes fell upon the stick in Holmes’s hand, and he ran towards it with an exclamation of joy. “I am so very glad,” said he. “I was not sure whether I had left it here or in the Shipping Office. I would not lose that stick for the world.”

“A presentation, I see,” said Holmes.

“Yes, sir.”

“From Charing Cross Hospital?”

“From one or two friends there on the occasion of my marriage.”

“Dear, dear, that’s bad!” said Holmes, shaking his head.

Dr. Mortimer blinked through his glasses in mild astonishment. “Why was it bad?”

“Only that you have disarranged our little deductions. Your marriage, you say?”

Which in turn reminded me of

And why the hell does “Shipping Office” get translated as “пароходство” [steamship company]?

Athena Federated Queries: Azure Data Lake Storage

Well, this one is super broken, which one finds out after shaving a number of yaks.

We want to query Parquet files that sit in Azure Data Lake Storage with Athena. AWS has what seems to be nice documentation on how to do it… Except:

  1. Searching for it in Serverless Application Repository with “azure” or “adls” terms is not yielding anything.
    • Additionally there seems to be a bug there, per AWS support:

      Issue:
      – The search functionality appears to be unresponsive when using the traditional “Enter” key method
      – This seems to be a technical bug in the console
      Workaround:
      – Enter your search term in the search bar
      – Instead of pressing Enter, click anywhere on the screen
      – This should trigger the search functionality and display the results

    • Search for something like “gen2” actually yields something… It’s an AthenaDataLakeGen2Connector — which is the same thing as below, so read on.
  2. Trying to add the Data Source from Athena, selecting “Microsoft Azure Data Lake Storage (ADLS) Gen2” connector… It is based on athena-datalakegen2 code which is borken because the underlying mssql JDBC driver is borken.
  3. After patching the mssql driver and the connector, we realize that it is trying to connect via JDBC to ADLS, but that is not supported. And yet AWS claims “the documentation is correct”.

Srsly now, AWS and Microsoft: did you even test anything?

It’s already 2025, and still

Content assist

It looks like you are researching razors. I think you are about to go off on a yak-shaving endeavor, and I cannot let you do that, Dave.

What I would really like my DWIM agent to do. That, and to stop calling me Dave.

Being lazy and impatient, I like the idea of an IDE. Things like autocompletion, refactoring, code search, and graphical debugging with evaluation are, for lack of a better word, good.

I like Eclipse in particular — force of habit/finger memory; after all, neurons that pray together stay together. Just like all happy families are alike, all emacs users remember the key sequence to GTFO vi (:q!) and all vi users remember the same thing for emacs (C-x C-c n) – so they can get into their favorite editor and not have to “remember”.

So, recently I thought that it would be good for a particular DSL I am using to have an auto-completion feature (because why should I remember). So I thought, great, I’ll maybe write an Eclipse plugin for that… Because, hey, I’ve made one before, how bad could it be?

Well, obviously I would only be solving the problem for Eclipse users of the DSL in question. And I have a suspicion I am pretty much the only one in that group. Moreover, even I would like to use some other text editor occasionally, and get the same benefit.

It seems obvious that there should be a separation of concerns, so to speak:

  • Provider-side: A language/platform may expose a service for context-based auto-completion, and
  • Consumer-side: An editor or shell may have a plugin system exposed to take advantage of this.

Then a little gluing is all that is required. (OK, I don’t like the “provider/consumer” terminology, but I cannot come up with anything better — I almost named them “supply-side” and “demand-side”, but that evokes so much association with AdTech that it’s even worse.)

And indeed, there are already examples of this.

  • MelnormeEclipse focuses on an IDE paradigm of using external programs for building, code completion, and other sorts of language-semantic functionality. Most of MelnormeEclipse’s infrastructure is UI infrastructure; the core of a concrete IDE’s engine functionality is usually driven by language-specific external programs. (This is not a requirement though — using internal tools is easily supported as well.)
  • Atom defines its own API.

And so I thought – wouldn’t it be good to standardize on some sort of interaction between the two in a more generic way?

And just as I thought this, I learned that the effort already exists: the Language Server Protocol by Microsoft.
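LSP is exactly the provider/consumer split sketched above: editors and language servers speak JSON-RPC, so one server written for a language serves every editor with an LSP client. A completion request, for instance, looks roughly like this (the file URI is made up):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "textDocument/completion",
  "params": {
    "textDocument": { "uri": "file:///project/main.dsl" },
    "position": { "line": 3, "character": 10 }
  }
}
```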

I actually like it when an idea is validated and someone else is doing the hard work of making an OSS project out of it…