Essay
Don't make the agent do the geometry
I asked an agent to turn a handful of stickies into a mind map with connectors. Thirty-eight seconds later it had built one: a hub in the middle, five branches around it, an arrow from the hub to each branch. The five branches sat on a perfect ring, evenly spaced, the first one parked dead at the top. What I want to talk about is the part that did not happen. The agent did not compute a single coordinate to get that ring.
That distinction is the whole job. When you build a tool an agent operates, the temptation is to make the agent smarter: a longer prompt, more examples of good layouts, a few rules about spacing. That is the wrong lever. The lever is a deterministic primitive the agent can call, so the structure comes out exact and reproducible instead of approximated. The agent supplies intent. The tool supplies precision. Your work is connecting the two and then getting out of the way.
What it looks like when you let the model do the math
Give a language model a blank canvas and a request for a ring of five boxes, and it will happily emit five pairs of x and y. They will look plausible. They will also be wrong in the specific way that eyeballed coordinates are always wrong: the spacing drifts, the radius wanders, two of the five end up a little too close, and the same prompt next week produces a different almost-ring. A model is good at deciding that the boxes belong on a circle. It is bad at the trigonometry that places them there, because it is not doing trigonometry; it is predicting numbers that read like trigonometry.
You can paper over this with more tokens. Ask it to reason step by step, give it the formula, tell it the center and radius. Now you are paying for the model to run a sine and cosine in prose, slowly, with a non-zero error rate, every single time. The output is still not reproducible, because the next request re-derives the same arithmetic from scratch and rounds differently. You have spent your cleverness budget teaching a probabilistic system to imitate a calculator.
The primitive does the part the model should never touch
The alternative is to hand the agent one tool: arrange these element ids into a circle. The tool, plain code, takes the ids, computes the centers on a ring with real math, and writes the exact positions. The agent never sees an angle. It names the elements and names the shape it wants them in. Grid, row, column, circle. The geometry is settled by a function that returns the same answer every time.
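To make that concrete, here is a minimal sketch of what such a circle primitive might look like. The function name arrange_circle, its parameters, and the default radius are my own placeholders, not the API of any particular canvas tool, and I am assuming screen coordinates where y grows downward.

```python
import math


def arrange_circle(ids, center=(0.0, 0.0), radius=200.0, start_deg=-90.0):
    """Return {element_id: (x, y)} with centers evenly spaced on a ring.

    Hypothetical primitive for illustration. Assumes screen coordinates
    (y grows downward), so a start angle of -90 degrees is twelve o'clock.
    """
    cx, cy = center
    count = len(ids)
    positions = {}
    for index, element_id in enumerate(ids):
        # Each element gets an equal share of the full turn.
        angle = math.radians(start_deg + 360.0 * index / count)
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        # Round away floating-point dust so repeated runs compare equal.
        positions[element_id] = (round(x, 6), round(y, 6))
    return positions


branches = ["b1", "b2", "b3", "b4", "b5"]
ring = arrange_circle(branches)
```

The agent's side of this exchange is only the list of ids and the word "circle". The first id lands at the top because of the default start angle, and calling the function again with the same ids returns the same positions.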
Here is how I know the agent took the tool and not the shortcut. The circle layout places its first element at the top by default, because its starting angle is minus ninety degrees, which on a canvas whose y axis points down is twelve o'clock. In the mind map, the branch the agent happened to add first landed exactly at top center. If the model had been guessing coordinates, the first branch would have landed wherever a plausible number put it, which is almost never the precise top. The top placement is a fingerprint. It is the deterministic default of the primitive showing through, and it is the proof that the structure was computed by code, not narrated by the model.
The connectors make the same point from the other side. The agent drew an arrow from the hub to each branch by naming the two endpoints, not by drawing a line between two coordinates. The arrow binds to the elements. When the ring later moves, the arrows re-route on their own, because they were never about positions. They were about relationships, and relationships are exactly the thing the agent should be expressing while the tool handles where the pixels go.
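A small sketch of that idea, under my own assumptions: the Connector class and the route function are hypothetical names, and a real tool would route around obstacles and attach to element edges instead of centers. The point it illustrates is only that the connector stores ids, and the line is derived from whatever the positions are at the moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    # Hypothetical shape: the connector stores relationships, never coordinates.
    source_id: str
    target_id: str


def route(connector, positions):
    """Resolve a connector to a (start, end) segment from current positions."""
    return positions[connector.source_id], positions[connector.target_id]


positions = {"hub": (0.0, 0.0), "branch-1": (0.0, -200.0)}
arrow = Connector("hub", "branch-1")

before = route(arrow, positions)
positions["branch-1"] = (150.0, -50.0)  # the branch moves; the arrow is untouched
after = route(arrow, positions)
```

Nothing about the arrow changed between the two calls. Only the positions did, and the segment followed.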
The general shape
This is not really about canvases. It is about where to draw the line between the agent and the tool in anything an agent drives. Walk through the operations your agent performs and sort them. Which ones are judgment, and which ones are arithmetic wearing the costume of judgment? Placement on a ring is arithmetic. So is aligning a column of boxes to a shared edge, distributing gaps evenly, snapping to a grid, routing a line between two anchors. Every one of those has a single correct answer that a function can compute and a model can only approximate.
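Three of those operations, sketched as plain functions to show how little there is to them once they are code. The names, the (x, y, width, height) box tuple, and the grid step of 8 are assumptions of mine for illustration, not taken from any specific tool.

```python
import math


def align_left(boxes):
    """boxes: {id: (x, y, w, h)}. Move every box to the smallest x."""
    if not boxes:
        return {}
    left = min(x for x, _, _, _ in boxes.values())
    return {key: (left, y, w, h) for key, (_, y, w, h) in boxes.items()}


def distribute_horizontally(boxes):
    """Keep the outermost boxes fixed and equalize the gaps between neighbors."""
    order = sorted(boxes, key=lambda key: boxes[key][0])
    if len(order) < 3:
        return dict(boxes)  # nothing to equalize with fewer than three boxes
    first, last = boxes[order[0]], boxes[order[-1]]
    span = (last[0] + last[2]) - first[0]
    total_width = sum(boxes[key][2] for key in order)
    gap = (span - total_width) / (len(order) - 1)
    result, cursor = {}, first[0]
    for key in order:
        _, y, w, h = boxes[key]
        result[key] = (cursor, y, w, h)
        cursor += w + gap
    return result


def snap_to_grid(value, step=8):
    """Snap to the nearest multiple of step, rounding halves up."""
    return math.floor(value / step + 0.5) * step


row = {"a": (0, 0, 10, 10), "b": (13, 0, 10, 10), "c": (50, 0, 10, 10)}
evened = distribute_horizontally(row)
```

In this example the three boxes span 60 units and occupy 30, so each of the two gaps comes out to 15 and the middle box moves from x = 13 to x = 25. There is no judgment anywhere in that.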
Push each of those down into a deterministic primitive and the agent gets shorter, cheaper, and more reliable in the same move. Its prompt stops carrying spacing rules. Its output stops drifting between runs. Its job shrinks to the part it is genuinely good at: reading the situation and choosing the intent. Cluster these by theme. Make this a two-by-two matrix. Lay these out as a flow. The agent decides which composition the moment calls for, and the primitives make that composition exact.
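One way to picture the division of labor is a dispatcher that accepts the agent's intent as a small structured call and hands it to the matching primitive. This is a sketch under assumptions: the call shape ({"layout": ..., "ids": ..., "options": ...}), the function names, and the 120-unit gap are all hypothetical.

```python
def layout_row(ids, gap=120.0):
    return {element_id: (i * gap, 0.0) for i, element_id in enumerate(ids)}


def layout_column(ids, gap=120.0):
    return {element_id: (0.0, i * gap) for i, element_id in enumerate(ids)}


def layout_grid(ids, columns=2, gap=120.0):
    # Fill left to right, then wrap to the next row.
    return {
        element_id: ((i % columns) * gap, (i // columns) * gap)
        for i, element_id in enumerate(ids)
    }


# Hypothetical registry: the only vocabulary the agent needs to know.
LAYOUTS = {"row": layout_row, "column": layout_column, "grid": layout_grid}


def apply_intent(call):
    """Turn an agent's intent into exact positions via a registered primitive."""
    name = call["layout"]
    if name not in LAYOUTS:
        raise ValueError(f"unknown layout {name!r}; expected one of {sorted(LAYOUTS)}")
    return LAYOUTS[name](call["ids"], **call.get("options", {}))


matrix = apply_intent(
    {"layout": "grid", "ids": ["a", "b", "c", "d"], "options": {"columns": 2}}
)
```

The agent's entire contribution to the two-by-two matrix is the word "grid", four ids, and a column count. An unknown layout name fails loudly instead of being approximated.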
So the question I would ask of any tool you are building for an agent to use is the unglamorous one. Which of these operations is my agent currently doing by hand, in tokens, that it should be calling a function for? Each one you find is a place the agent was doing geometry it should never have been asked to do. Take the geometry away from it. Give it a compass instead, and let it point.