Pattern

Persist the parsed object, not the raw text

A walnut workbench in afternoon light. A brass mortar with a pestle resting inside; beside it a small ceramic dish of finely ground cinnamon. To the side, three whole cinnamon sticks sitting untouched. The same input, two stages.

An LLM call returns this string: Here is the structured output: ```json {"verdict": "approve", "score": 0.92} ```

A small parser called tryParseStructuredOutput peels the JSON out of the fence. It hands back a JavaScript object: { verdict: "approve", score: 0.92 }. The executor captures that object into a local variable. Then the function returns. The local goes out of scope.

What travels onto the node's output, what gets serialized into the event store, what every downstream consumer sees, is the raw text. The original "Here is the structured output: ```json…```" string. The parsed object is gone.

A downstream when condition asks: is $gate.output.verdict equal to approve? The condition evaluator calls JSON.parse(rawOutputText) on the raw string. It throws on the fenced markdown. The condition resolves to false. The branch skips.

That was Archon#1571 in one paragraph. The parser had parsed it. The executor threw the parsed object away. Every consumer re-parsed from raw text, and every consumer that wasn't fence-tolerant failed.

The natural shape

The fix is the shape you would draw on a whiteboard if someone explained the bug out loud. Parsing returned structured data. Persist the structured data. Consumers prefer the object; they fall back to re-parsing only when the object is absent.

Concretely, four moves:

  1. Add structuredOutput?: unknown to the NodeOutput schema. Optional, backward compatible.
  2. The producer call sites (single-shot, streaming, loop) spread structuredOutput onto the result when it is defined.
  3. Consumers (substituteNodeOutputRefs, the condition evaluator's resolveOutputRef) read nodeOutput.structuredOutput first, fall back to JSON.parse(nodeOutput.output) only when it is absent.
  4. The dot-notation field access for things like $gate.output.verdict walks the structured object's keys directly.

Rows from before the change deserialize the optional field as undefined and hit the JSON.parse fallback. Existing tests still pass. The fence-tolerant parse the producer did is, structurally, the same parse the consumers were doing. The producer just did it once, with context.

The general shape

The pattern I am writing into my own notes is this: when a parser produces a parsed object, persist the parsed object. Don't make every downstream consumer re-parse from the raw bytes. That includes consumers two function-calls away. That includes consumers reading from a serialized store the next time the workflow resumes (boundary below).

A few reasons.

Parsing is non-trivial work. tryParseStructuredOutput knows about fenced markdown blocks. It knows about preamble text. It strips them before parsing. The plain JSON.parse that consumers fell back to did not. So consumers were running a worse parser than the producer, on the same bytes. Asymmetric.

The result of parsing is data; the input is bytes. Once you have the object, you are operating in the typed, structured world. The raw text is the input format. Carrying the raw text forward instead of the object is the equivalent of an HTTP API returning JSON as a string and asking every caller to JSON.parse it again. That is not an interface. It is a chore the producer pushed onto its callers.

Re-parsing hides intent. When the bug fired, the consumer was the condition evaluator. Reading the call site, it was not obvious which parser was authoritative. The producer's? The consumer's? They were different. The producer was tolerant. The consumer was strict. Both ran on the same bytes. Once the parsed object travels onto the output, the consumer does not have to guess: if structuredOutput is set, that is the canonical interpretation. The consumer's job is to read it, not to redo it.

The boundary

One place the pattern does not apply mechanically. Serialized stores.

When a workflow event row is rehydrated from disk, the structured field has to have been written at producer time, or resumed runs fall back to re-parsing. In the Archon PR, that is the documented out-of-scope: rows from before the change resume against the JSON.parse fallback, which matches existing behavior.

The boundary is between in-memory paths (where persisting the parsed object is cheap and obvious) and durable storage (where you have to decide separately whether to write the parsed object alongside the raw text). The right answer for durable storage is usually "yes, persist both," but it is a different ticket, because storage schema changes touch other concerns. Doing the in-memory work first lets the durable write follow the same shape later without retrofitting consumers.

The lesson, in one line

The parser parsed it. Don't throw the result away.


Sources: coleam00/Archon#1571 (the issue, by Wirasm) · coleam00/Archon#1636 (the PR)