A wrong reconstruction is worse than an honest stack
I was importing a real admin screen into framesmith — a dense Rails + Tailwind table, the kind with avatars in the first column, status chips in the middle, a select control on the right. The colors came back right. The fonts came back right. Every icon matched.
Then I tried to edit a column, and there were no columns. My four-column table had imported as one tall vertical stack of forty divs. It looked correct in the screenshot and fell apart the moment I touched it.
That bug is the whole story of framesmith v1.5 — and the lesson underneath it is bigger than any one importer.
Why “looks right” isn’t “is right”
framesmith is an open-source MCP server that gives an AI agent a visual design canvas: it sketches a UI as a scene graph, framesmith renders it to real HTML/CSS and screenshots, and you agree on the picture before any framework code gets written. One of its tools imports a shipped UI — a live URL or some HTML — back onto the canvas so you can redesign from what really exists.
The earlier release nailed the easy half: tokens. Colors, fonts, icons, form controls — all pulled accurately out of the computed DOM. The hard half is structure, and structure is where it quietly lied:
- A four-column
<table>flattened into a vertical stack. - CSS-grid layouts lost their columns.
margin: auto/max-widthcentering — which never survives as a computed value — just disappeared.
Here’s the part that matters for anyone building tools an agent consumes: a lossy import isn’t a cosmetic problem, it’s a trust problem. A human glances at the screenshot, sees it’s roughly right, and moves on. An agent takes the scene graph at face value and builds on it. If the importer confidently hands back a flattened table with no signal that it flattened anything, every downstream step inherits the lie — and the agent has no way to know.
The fix isn’t “import harder”
The obvious instinct is to make the reconstruction smarter until it’s always right. That’s a trap. You can chase fidelity forever and still hit a layout your heuristics misread — and a confidently wrong reconstruction is the most expensive output you can produce, because it looks trustworthy.
So v1.5 does two things at once: reconstruct what it can prove, and report its confidence on everything.
The reconstruction side covers the common cases honestly:
<table>becomes real columns — rows of cells with proportional percentage widths derived from the computed boxes. That User Management table now imports as 40/20/25/15%, with the avatars, chips, and selects landing inside their actual columns.- CSS grid becomes rows of columns, driven by the computed
grid-template-columnsresolved all the way to pixels — evenfrtracks. - Centered content stays centered — auto-margin and
max-widthare detected and translated instead of dropped. - Everything else clusters from raw geometry — bounding boxes grouped into rows, but only when the grouping is consistent.
And here’s the rule that ties it together, the one I kept coming back to while building it:
A wrong reconstruction is worse than an honest stack.
If a layout looks multi-column but won’t cluster consistently, framesmith doesn’t guess. It imports a plain vertical stack, flags it as a fallback, and tells you. Genuinely vertical content is left untouched and silent. The importer would rather under-claim than over-claim.
The honesty contract
The feature I’m proudest of isn’t a layout — it’s a field. Every import now records report.layout: exactly how each container was reconstructed, as one of five honest labels.
table → reconstructed from <table> semantics
grid → reconstructed from computed grid-template-columns
centered → auto-margin / max-width centering detected and preserved
geometry → inferred from bounding boxes, clustered into rows
stack-fallback → couldn't prove a layout, gave you an honest vertical stack
That stack-fallback line is the important one. It’s the importer admitting “I wasn’t sure here.” The agent reading the report knows precisely which parts of the scene graph are load-bearing and which were approximated — so it can re-import, ask, or flag instead of building on sand. Confidence you can read beats accuracy you have to assume.
It’s the same discipline I wrote about with prompt caching: when a system can fail silently, you build the signal that exposes the failure before you ship the feature. There, it was cache_creation_input_tokens. Here, it’s report.layout. Same move, different surface.
What I’d tell past-me
If you’re building anything an agent consumes — an MCP tool, an importer, a parser, a retrieval layer — your output is only half the contract. The other half is telling the consumer how much to trust it.
Three things I’d bake in from day one:
- Make under-claiming the default. When in doubt, return the boring, obviously-honest answer (a stack, an empty result, a null) rather than a confident guess. The guess is the one that costs you downstream.
- Emit a confidence signal alongside every output, and make it machine-readable — not a log line a human might skim, but a field the agent can branch on.
report.layoutexists so the agent never has to assume. - Treat silent-wrong as the worst failure mode, worse than loud-wrong and far worse than honestly-incomplete. Optimize your error budget around never being confidently wrong.
The table that flattened wasn’t really a bug in my clustering heuristics. It was a bug in the contract: the importer knew less than it let on, and didn’t say so. v1.5 fixed the heuristics — but the durable fix was making the tool admit what it doesn’t know.
npm i -g framesmith@1.5.2
No breaking changes — re-import a screen you tried before and compare. And if it flattens something it shouldn’t, open an issue with the URL or HTML. The honesty contract only works if the warnings stay honest.