Structured output doesn't just constrain LLMs; it steers them. I've historically treated output purely as an interface concern -- shape it for downstream systems and move on. That was a mistake.
In my work with People Systems, you're often working with messy, incomplete data -- fragments for one employee, rafts of unstructured documents for another. In this environment, prompting and input presentation matter immensely. But I've found the output schema is also an important lever.
A hiring debrief agent that returns {"recommendation": "advance", "concerns": ["limited backend experience"], "confidence": 0.7} is more auditable than three paragraphs of equivocation. It also guides the model to commit rather than hedge in natural language[1].
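A schema like that can be made to force commitment. As a minimal sketch (the enum values and `additionalProperties` constraint are my assumptions, not the actual production schema): an enum on `recommendation` leaves the model no room to equivocate in prose, and a bounded `confidence` makes the hedging explicit and machine-readable.

```python
# Hypothetical JSON Schema for the debrief output above. The enum forces a
# discrete commitment; bounds on confidence keep hedging quantified.
DEBRIEF_SCHEMA = {
    "type": "object",
    "properties": {
        "recommendation": {"type": "string", "enum": ["advance", "hold", "reject"]},
        "concerns": {"type": "array", "items": {"type": "string"}},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    "required": ["recommendation", "concerns", "confidence"],
    "additionalProperties": False,  # no side channel for free-text equivocation
}

print(DEBRIEF_SCHEMA["properties"]["recommendation"]["enum"])
```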
We had one process where the output needed to avoid a direct rating: it ran prior to a formal rating cycle, and any "jumping the gun" worked against what the process was trying to achieve. We built guardrails into the prompts, but something "rating-like" kept emerging in the output[2].
The open-loop, heterogeneous nature of the data was pulling hard against us[3]. If you don’t give a behavior a place to go, it leaks into the rest of the output.
The fix: I moved the rating to its own structured field, then discarded it. The model kept trying to rate. Fighting that was expensive and fragile. Giving it a place to put the rating, then throwing it away, was cheaper, more reliable, and kept the rest of the output clean[4].
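The mechanics of the fix are simple. A minimal sketch, with hypothetical field names (the real schema and response are assumptions): the schema includes a sink field for the rating, and the parsing layer drops it before anything downstream sees it.

```python
import json

# Hypothetical model response, constrained to a schema that includes a
# "rating" sink field we never actually use.
raw = '{"summary": "Strong collaboration signals.", "themes": ["communication"], "rating": 4}'

result = json.loads(raw)
# Give the behavior a place to go, then throw it away: the rating never
# reaches downstream consumers, and it stops leaking into other fields.
result.pop("rating", None)
print(result)
```

The design choice here is to stop fighting the model's tendency and route it instead: validation stays simple, and the remaining fields stay clean.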
This cuts both ways, and your evals need to do some heavy lifting here. Research has shown that naive format constraints can degrade reasoning -- the model is forced into answering before it's done thinking[5]. Well-designed schemas scaffold reasoning; poorly designed ones short-circuit it. The difference is whether the structure gives the model room to think before it commits.
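One concrete way to give the model that room, sketched under the assumption that the structured-output decoder emits fields in the order the schema lists them (field names here are illustrative): put a free-text reasoning field before the committed answer, so the thinking happens in-band and upstream of the commitment.

```python
# Generation is autoregressive, so field order is the lever: "reasoning"
# comes first (room to think), "recommendation" comes last (the commitment).
# A schema with the order reversed would force the answer before the thinking.
SCAFFOLDED_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},       # generated first
        "recommendation": {"type": "string"},  # generated last
    },
    "required": ["reasoning", "recommendation"],
}

print(list(SCAFFOLDED_SCHEMA["properties"]))
```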