Structured outputs guide: JSON Schema, OpenAI, Claude, Gemini

Structured outputs constrain an LLM response to a defined schema, usually JSON Schema or a provider-specific equivalent. This guide compares how structured outputs work across OpenAI, Claude, Gemini, Azure OpenAI, vLLM, and Ollama, including each provider's schema mechanism, tool-calling pattern, JSON Schema limits, and production validation requirements.
TLDR:
Structured outputs enforce response shape. They do not guarantee semantic correctness.
OpenAI, Gemini, Azure OpenAI, vLLM, and Ollama expose schema-based structured output controls, but each supports a different subset of JSON Schema.
Claude uses tool definitions with input_schema and tool choice patterns rather than the same native response_format strict mode as OpenAI.
Logic infers schemas from specs, validates responses, and handles provider differences for you.
What structured outputs are
Structured outputs are model responses constrained to a schema. The schema defines fields, types, required values, enums, nested objects, and sometimes additional constraints. The model returns output your application can parse deterministically.
Without structured outputs, your code has to handle responses like this:
"The order seems urgent. Category: priority, confidence maybe 90%."
With structured outputs, your code receives a predictable object:
```json
{
  "category": "priority",
  "confidence": 0.9,
  "requires_human_review": false
}
```
That difference matters whenever an LLM feeds a database, queue, API, workflow engine, or user-facing action.
Provider quick reference
| Provider | Main mechanism | JSON Schema support | Best fit | Production gotcha |
|---|---|---|---|---|
| OpenAI | Structured outputs with JSON Schema and typed SDK parsing | Strong support, with documented schema limits | Direct schema-constrained responses and typed parsing | Older chat.completions.parse examples may not match current Responses API patterns |
| Anthropic Claude | Tool use with input_schema and forced tool choice | Tool input schemas use JSON Schema | Tool-driven workflows and structured extraction through tool arguments | No identical native response_format strict mode |
| Google Gemini | response_mime_type and response_json_schema | Supports a subset of JSON Schema | JSON output from generation config and Vertex AI workflows | Test advanced schema features before relying on them in production |
| Azure OpenAI | OpenAI-compatible structured outputs and strict function calling | Depends on model and deployment support | Teams already running OpenAI models through Azure | Feature availability can vary by region, model, and deployment |
| vLLM | structured_outputs request field | Supports JSON Schema through structured output backends | Local or self-hosted inference with OpenAI-compatible serving | Deprecated guided_json-style fields still appear in older examples |
| Ollama | format field (JSON mode or a JSON Schema) | Schema support varies by model and setup | Local inference and lightweight development workflows | Always validate after parsing because local model compliance varies |
If your application routes across providers, normalize the response contract in your own code or use a platform that handles provider-specific schema behavior, validation, logging, and routing decisions for you.
What structured outputs fix
Structured outputs fix structural failures:
Missing required fields
Wrong data types
Invalid enum values
Extra prose around a JSON object
Markdown code fences around output
Inconsistent casing in fields or categories
Tool arguments that do not match a function signature
These failures are common when teams rely on prompting alone. "Return valid JSON only" works until the model adds an explanation, omits a field, or invents a value your downstream system cannot handle.
Schema enforcement moves format compliance out of the prompt and into the generation or parsing layer.
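To make the failure list concrete, here is a minimal sketch of the parser that prompt-only pipelines tend to end up hand-writing: it rejects code fences and surrounding prose, then checks required fields, types, extra keys, and enum values. The field names follow the ticket example above; the CATEGORIES set and the parse_label helper are hypothetical.

```python
import json

# Hypothetical contract matching the ticket example object.
REQUIRED = {"category": str, "confidence": float, "requires_human_review": bool}
CATEGORIES = {"priority", "standard", "spam"}  # assumed enum values


class StructuralError(ValueError):
    """Raised when a response does not match the expected shape."""


def parse_label(raw: str) -> dict:
    text = raw.strip()
    # Reject markdown fences and surrounding prose instead of guessing where the JSON is.
    if not (text.startswith("{") and text.endswith("}")):
        raise StructuralError("response is not a bare JSON object")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise StructuralError(f"invalid JSON: {exc}") from exc
    for field, expected in REQUIRED.items():
        if field not in data:
            raise StructuralError(f"missing required field: {field}")
        value = data[field]
        if expected is float:
            # bool is a subclass of int in Python, so exclude it from numeric fields.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            raise StructuralError(f"wrong type for {field}: {type(value).__name__}")
    extra = set(data) - set(REQUIRED)
    if extra:
        raise StructuralError(f"unexpected fields: {sorted(extra)}")
    # Exact, case-sensitive enum check catches "Priority" vs "priority".
    if data["category"] not in CATEGORIES:
        raise StructuralError(f"invalid category: {data['category']!r}")
    return data
```

Each check maps to one failure in the list. Schema enforcement moves most of them into generation, but keeping a cheap parser like this after the model call still guards against provider or configuration drift.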
What structured outputs do not fix
Structured outputs do not prove the answer is correct.
This response is structurally valid:
```json
{
  "invoice_total": 4200.00,
  "currency": "USD",
  "confidence": 0.98
}
```
It can still be wrong if the invoice total was actually 2400.00. Schema enforcement checks shape. It does not verify the model read the document correctly.
Production systems need two validation layers:
Structural validation: Does the response match the schema?
Semantic validation: Is the response true, useful, and safe for the workflow?
The first layer is deterministic. The second layer requires business rules, golden datasets, secondary review, or human escalation.
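A minimal sketch of the two layers using the invoice example above. The structural layer checks shape; the semantic layer applies business rules. It assumes you have an independent figure to compare against (line_item_sum, for example from a system of record), and the ALLOWED_CURRENCIES rule is hypothetical.

```python
from dataclasses import dataclass

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # hypothetical business rule


@dataclass
class Invoice:
    invoice_total: float
    currency: str
    confidence: float


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def structural_check(data: dict) -> Invoice:
    """Layer 1: deterministic shape check, the part schema enforcement covers."""
    missing = [key for key in ("invoice_total", "currency", "confidence") if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not (_is_number(data["invoice_total"]) and _is_number(data["confidence"])):
        raise ValueError("numeric fields have the wrong type")
    if not isinstance(data["currency"], str):
        raise ValueError("currency must be a string")
    return Invoice(float(data["invoice_total"]), data["currency"], float(data["confidence"]))


def semantic_check(invoice: Invoice, line_item_sum: float) -> list[str]:
    """Layer 2: business rules. Returns reasons to escalate; an empty list means accept."""
    problems = []
    if invoice.currency not in ALLOWED_CURRENCIES:
        problems.append(f"unsupported currency {invoice.currency}")
    # Cross-check against an independent source instead of trusting the model's reading.
    if abs(invoice.invoice_total - line_item_sum) > 0.01:
        problems.append(f"total {invoice.invoice_total:.2f} != line items {line_item_sum:.2f}")
    return problems
```

With the example response, structural_check passes, and semantic_check against a line-item sum of 2400.00 returns one problem. The model's self-reported 0.98 confidence does nothing to catch the error.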
How structured outputs work
There are three common implementation patterns.
Prompt-only formatting
Prompt-only formatting asks the model to return JSON. It is simple, but it is probabilistic. Nothing prevents the model from returning prose, missing a required key, or changing a field name.
Use this only for prototypes or low-risk tasks.
Tool or function calling
Tool calling defines a schema for tool arguments. The model returns an object that can be passed into a function. This is the main pattern for Claude and a common pattern across providers.
Tool calling fits cases where the model decides what action to take. It can also work as a structured extraction mechanism when you force a specific tool call.
Constrained decoding or native schema enforcement
Constrained decoding prevents invalid tokens during generation. If the schema says a field can only be "approved" or "rejected", the model cannot generate "maybe" in that field.
This gives you the most direct structural control, but each provider supports different schema features and limits.
Grammar-based enforcement is the common mechanism behind many strict structured output implementations. The system tracks which tokens are valid at each point in the response and masks tokens that would break the schema. That is why strict schema enforcement can guarantee shape in a way prompt-only formatting cannot.
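A toy illustration of token masking, not any provider's implementation. The "vocabulary" is single lowercase characters, the "model" is a hypothetical fixed score table that would rather spell "maybe", and the mask only allows characters that keep the output a prefix of an allowed enum value.

```python
ALLOWED = ["approved", "rejected"]


def model_scores(prefix: str) -> dict[str, float]:
    """Stand-in for a language model that strongly prefers spelling 'maybe'.

    Hypothetical scores; a real model produces logits over its tokenizer vocabulary.
    """
    preferred = "maybe"
    scores = {ch: 0.0 for ch in "abcdefghijklmnopqrstuvwxyz"}
    if len(prefix) < len(preferred):
        scores[preferred[len(prefix)]] = 5.0
    scores["r"] += 1.0  # mild secondary preference, so the masked choice is visible
    return scores


def constrained_decode(allowed: list[str]) -> str:
    out = ""
    while out not in allowed:
        # Mask: keep only characters that extend a prefix of some allowed value.
        valid = {v[len(out)] for v in allowed if v.startswith(out) and len(v) > len(out)}
        scores = model_scores(out)
        # Greedy pick among unmasked characters only.
        out += max(valid, key=lambda ch: scores[ch])
    return out
```

Unconstrained, the top choice at the first step is "m". With the mask, only "a" and "r" survive, and decoding ends on "rejected". Real systems compile the JSON Schema into a grammar or automaton over the tokenizer's vocabulary, which is where the per-token overhead comes from.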
The tradeoff is overhead. Large schemas, deeply nested objects, long descriptions, and large enum sets can add latency. For high-volume workflows, schema design becomes a performance concern as well as a correctness concern.
OpenAI structured outputs
OpenAI supports structured outputs through JSON Schema and typed SDK helpers. The core idea is that you give the model a schema, and the model response must match that shape.
For new OpenAI implementations, evaluate the Responses API path where possible. Current SDK examples use typed parsing patterns so your application can work with parsed objects rather than raw text.
Example:
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()


class TicketLabel(BaseModel):
    category: str
    urgency: str
    confidence: float


# The SDK derives a JSON Schema from the Pydantic model and parses the reply into it.
response = client.responses.parse(
    model="gpt-5.4",
    input=[
        {"role": "system", "content": "Classify this support ticket."},
        {"role": "user", "content": "Customer says checkout failed three times."},
    ],
    text_format=TicketLabel,
)

label = response.output_parsed  # parsed TicketLabel instance
```
Older chat.completions.parse examples may still work in some contexts, but the Responses API is the direction to evaluate for new agentic workflows.
Watch for schema limits. Strict structured output systems often support a practical subset of JSON Schema, so advanced constructs, deep recursion, and complex regular expressions need explicit testing.
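OpenAI's strict mode documents requirements such as additionalProperties: false on every object and every property listed in required, with optional fields emulated as a union with null. A minimal sketch of a helper that rewrites a hand-written schema to meet those rules. to_strict is a hypothetical name, and it does not check the other documented limits (nesting depth, property counts, unsupported keywords), so still test the result against the real API.

```python
import copy


def _is_object(node: dict) -> bool:
    node_type = node.get("type")
    return node_type == "object" or (isinstance(node_type, list) and "object" in node_type)


def to_strict(schema: dict) -> dict:
    """Return a copy where every object forbids extra keys and requires all properties.

    Properties that were optional become nullable so the model can still signal "no value".
    """
    schema = copy.deepcopy(schema)

    def visit(node) -> None:
        if isinstance(node, dict):
            if _is_object(node) and "properties" in node:
                originally_required = set(node.get("required", []))
                for name, prop in node["properties"].items():
                    if name not in originally_required and isinstance(prop.get("type"), str):
                        prop["type"] = [prop["type"], "null"]
                node["required"] = list(node["properties"])
                node["additionalProperties"] = False
            # Recurse into nested schemas (properties, items, and so on).
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(schema)
    return schema
```

Converting optional fields to required-but-nullable keeps the contract stable, which matches the schema-design advice later in this guide.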
Claude structured outputs
Claude handles structured outputs through tool use and output consistency patterns. You define a tool with an input_schema, then force or strongly guide the model to call that tool.
Example:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[
        {
            "name": "classify_ticket",
            "description": "Classify a support ticket.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"},
                    "urgency": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["category", "urgency", "confidence"],
            },
        }
    ],
    # Force a call to this tool so the reply arrives as a structured tool_use block.
    tool_choice={"type": "tool", "name": "classify_ticket"},
    messages=[
        {"role": "user", "content": "Customer says checkout failed three times."}
    ],
)
```
This pattern is reliable when you force the tool call and validate the returned arguments, but it is not the same API shape as OpenAI's native structured output mode. If your app routes across providers, build an abstraction that normalizes these differences and records which model produced each output.
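With forced tool choice, the structured object comes back as the input of a tool_use content block, not as message text. A minimal sketch of the extraction and normalization step; extract_tool_input and normalize are hypothetical helpers, and the envelope shape is an assumption you would replace with your own application contract.

```python
from typing import Any


def extract_tool_input(response: Any, tool_name: str) -> dict:
    """Return the arguments Claude passed to tool_name, or raise if it did not call it."""
    for block in response.content:
        # Text blocks have no .name, so check the block type first.
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return dict(block.input)
    raise ValueError(f"no {tool_name} tool_use block in response")


def normalize(provider: str, model: str, payload: dict) -> dict:
    """Wrap a provider-specific result in one application-level envelope."""
    return {"provider": provider, "model": model, "output": payload}
```

Usage with the request above might look like normalize("anthropic", response.model, extract_tool_input(response, "classify_ticket")), after which the same validator runs regardless of provider.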
Gemini structured outputs
Gemini supports structured JSON through response_mime_type and response_json_schema.
Example:
```python
import json

from google import genai

client = genai.Client()

schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "urgency": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["category", "urgency", "confidence"],
}

response = client.models.generate_content(
    model="gemini-3.1-pro",
    contents="Customer says checkout failed three times.",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": schema,
    },
)

label = json.loads(response.text)  # JSON text expected to match schema; still validate it
```
Gemini's structured output mode supports a subset of JSON Schema. That is enough for many extraction, classification, and routing workloads, but teams should test nested objects, enum-heavy schemas, nullable fields, and advanced constraints before relying on them in production.
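To decide what to test, it helps to list which JSON Schema features a schema actually uses. A minimal sketch; the ADVANCED keyword set is an illustrative grouping, not Gemini's documented supported subset, so compare the output against the current Gemini documentation for the model you deploy.

```python
from collections import Counter

# Illustrative keywords worth testing per provider (assumption, not a documented list).
ADVANCED = {"pattern", "format", "minimum", "maximum", "minItems", "maxItems",
            "anyOf", "oneOf", "allOf", "$ref", "$defs", "const"}


def audit_schema(schema: dict) -> Counter:
    """Count features in a JSON Schema that deserve provider-specific testing."""
    found = Counter()

    def visit(node, depth: int) -> None:
        if isinstance(node, dict):
            if node.get("type") == "object" and depth > 0:
                found["nested_object"] += 1
            if "enum" in node:
                found["enum_values"] += len(node["enum"])
            types = node.get("type")
            if isinstance(types, list) and "null" in types:
                found["nullable"] += 1
            for key in node.keys() & ADVANCED:
                found[key] += 1
            for key, value in node.items():
                # Only descending into properties/items increases nesting depth.
                child_depth = depth + 1 if key in ("properties", "items") else depth
                visit(value, child_depth)
        elif isinstance(node, list):
            for item in node:
                visit(item, depth)

    visit(schema, 0)
    return found
```

A non-empty count for nested_object, nullable, enum_values, or any advanced keyword is a prompt to add a targeted test case before the schema becomes a production contract.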
Azure OpenAI structured outputs
Azure OpenAI follows OpenAI's API surface for structured outputs and strict function calling where supported. It is a good fit for teams that need Azure deployment, enterprise controls, or existing Microsoft procurement paths.
The key operational issue is availability. Structured output behavior depends on the deployed model, API version, and region. Before standardizing on Azure OpenAI structured outputs, test the exact deployment your production system will use.
vLLM structured outputs
vLLM supports structured output generation for self-hosted inference. Current integrations should use the structured_outputs request field rather than older guided_json style fields.
Example shape:
```json
{
  "model": "your-model",
  "messages": [
    {"role": "user", "content": "Classify this ticket."}
  ],
  "structured_outputs": {
    "json": {
      "type": "object",
      "properties": {
        "category": {"type": "string"},
        "urgency": {"type": "string"}
      },
      "required": ["category", "urgency"]
    }
  }
}
```
vLLM is useful when you control the serving stack and need local inference, but implementation quality depends on the model, tokenizer, backend, and schema complexity. Validate outputs after parsing, even when constrained decoding is enabled.
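Because vLLM serves an OpenAI-compatible endpoint, the request shape above can be sent with any HTTP client. A minimal stdlib sketch that builds the request and re-validates the returned content. The base URL assumes a local server on vLLM's default port 8000, and build_request and parse_choice are hypothetical helpers; nothing is sent until you pass the request to urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local vLLM server


def build_request(model: str, prompt: str, schema: dict) -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "structured_outputs": {"json": schema},  # field shape from the example above
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_choice(response_body: bytes, required: list[str]) -> dict:
    """Parse the first choice and re-check required keys, even with constrained decoding on."""
    content = json.loads(response_body)["choices"][0]["message"]["content"]
    data = json.loads(content)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Against a running server, the call would be along the lines of: with urllib.request.urlopen(req) as resp: parse_choice(resp.read(), ["category", "urgency"]).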
Ollama structured outputs
Ollama supports structured output patterns through the format field, including JSON mode and schema-based outputs.
Example shape:
```json
{
  "model": "llama3.1",
  "prompt": "Classify this ticket.",
  "format": {
    "type": "object",
    "properties": {
      "category": {"type": "string"},
      "urgency": {"type": "string"}
    },
    "required": ["category", "urgency"]
  }
}
```
Ollama is a practical option for local development and lightweight local inference. For production use, treat schema output as one layer of control, then run deterministic validation and retries around it.
OpenAI vs Claude vs Gemini structured outputs
The three major frontier providers can all return structured data, but their implementation models differ.
Provider | Best fit | Main difference to plan for |
|---|---|---|
OpenAI | Direct schema-constrained responses and typed SDK parsing | Strong native structured output support through current API patterns |
Claude | Tool-driven workflows and structured tool arguments | Use tool schemas and forced tool choice when you need predictable objects |
Gemini | JSON output through generation config | Test the JSON Schema subset and schema complexity limits before production |
If you support multiple providers, the hard part is not only getting valid JSON from each one. The hard part is maintaining one application contract while providers expose different APIs, schema subsets, model behavior, error modes, and logging semantics.
Schema design: getting it right
Schema enforcement gives you syntactic correctness, but schema design still affects output quality.
Field names are part of the prompt. A field called f1 gives the model almost no guidance. A field called confidence_score_0_to_1 is self-documenting. Providers often pass schema descriptions into the model context, which can improve output quality on ambiguous fields.
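For fields that should only take specific values, define an enum. When the provider enforces the schema with grammar-based decoding, those values are locked at the token level, so the model cannot produce an unexpected category. Where enforcement is weaker, the enum still gives your validator a fixed set to check against.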
Keep schemas flat when possible. Deeply nested schemas are slower to enforce and harder to debug. One or two levels of nesting are usually enough, and flatter structures are easier for the model to follow consistently.
Prefer required fields with explicit null values over optional fields when the downstream system expects a stable contract. Every optional field is a field the model might skip or hallucinate, and required fields tend to be easier to validate consistently.
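A sketch of a schema that follows these guidelines: descriptive field names, an enum for the category, a flat structure, and a required-but-nullable field instead of an optional one. The field names and enum values are hypothetical. The small check shows why required-with-null is easier to validate: a missing key and "no value" stay distinguishable.

```python
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "bug", "account", "other"],
            "description": "Primary issue type.",
        },
        "confidence_score_0_to_1": {
            "type": "number",
            "description": "Model confidence between 0 and 1.",
        },
        "order_id": {
            "type": ["string", "null"],
            "description": "Order ID if the ticket mentions one, otherwise null.",
        },
    },
    "required": ["category", "confidence_score_0_to_1", "order_id"],
    "additionalProperties": False,
}


def contract_violations(data: dict, schema: dict = TICKET_SCHEMA) -> list[str]:
    """List required keys that are absent. An explicit null still satisfies the contract."""
    return [key for key in schema["required"] if key not in data]
```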
Testing and validation strategy
Valid JSON is the floor. A schema-compliant response can still return the wrong category, hallucinate a confidence score, or extract the wrong value from a document.
Run two layers of tests:
Structural tests: Does the response parse as valid JSON? Are required fields present? Are types correct? Do enum values match the schema?
Semantic tests: Is the answer correct for the input? Does it follow business rules? Would the downstream workflow behave correctly?
For semantic validation, start with 50 to 100 real examples. Include common cases, edge cases, ambiguous inputs, and examples where mistakes have business consequences. Track exact-match accuracy where possible and rubric-based scoring where judgment is required.
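A minimal sketch of exact-match scoring over a golden set, reported per field so you can see which field drives the errors. The golden examples and the classify callable are hypothetical stand-ins for your labeled data and model call; rubric-based scoring for judgment fields would sit alongside this.

```python
from collections import defaultdict
from typing import Callable


def exact_match_by_field(
    golden: list[tuple[str, dict]], classify: Callable[[str], dict]
) -> dict[str, float]:
    """Fraction of examples where each expected field matches the model output exactly."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for text, expected in golden:
        predicted = classify(text)
        for field, value in expected.items():
            totals[field] += 1
            hits[field] += int(predicted.get(field) == value)
    return {field: hits[field] / totals[field] for field in sorted(totals)}
```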
When validation fails, retry once with the same schema. If it fails again, retry with a simplified schema that captures only critical fields. If that still fails, route to a human review queue or a safe fallback path.
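That escalation path, sketched as one function. call_model, validate, and send_to_review are hypothetical hooks you would wire to your provider client, validator, and review queue; downstream code must accept the partial object the critical-fields schema produces.

```python
from typing import Callable, Optional


def run_with_fallbacks(
    call_model: Callable[[dict], dict],
    validate: Callable[[dict, dict], bool],
    full_schema: dict,
    critical_schema: dict,
    send_to_review: Callable[[Optional[dict]], None],
) -> Optional[dict]:
    """Try the full schema twice, then a critical-fields-only schema, then escalate."""
    last_output: Optional[dict] = None
    for schema in (full_schema, full_schema, critical_schema):
        last_output = call_model(schema)
        if validate(last_output, schema):
            return last_output
    # Every automated attempt failed: hand off with context instead of guessing.
    send_to_review(last_output)
    return None
```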
Structured outputs vs. function calling
Function calling is a structured output applied to a specific problem: controlling the arguments the model passes to a tool.
When a model calls:
flag_for_human_review(sku="ABC123", urgency="high")
the same schema enforcement that prevents confidence: "pretty high" can also prevent urgency: "really urgent".
The practical difference is intent. Structured outputs govern what the model returns to your application. Function calling governs what the model does within a workflow. In agentic systems, both patterns often run together.
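On the application side, the same contract check can sit between the model's tool call and the function it names. A minimal sketch using the flag_for_human_review example; the TOOLS registry, the allowed urgency values, and the function body are hypothetical.

```python
from typing import Any, Callable

URGENCY = {"low", "medium", "high"}  # assumed enum


def flag_for_human_review(sku: str, urgency: str) -> str:
    return f"queued {sku} at {urgency} urgency"


# Each tool maps to its function and one validator per argument.
TOOLS: dict[str, tuple[Callable[..., Any], dict[str, Callable[[Any], bool]]]] = {
    "flag_for_human_review": (
        flag_for_human_review,
        {
            "sku": lambda v: isinstance(v, str) and v != "",
            "urgency": lambda v: isinstance(v, str) and v in URGENCY,
        },
    ),
}


def dispatch(name: str, arguments: dict) -> Any:
    func, checks = TOOLS[name]
    if set(arguments) != set(checks):
        raise ValueError(f"{name}: expected arguments {sorted(checks)}, got {sorted(arguments)}")
    bad = [arg for arg, ok in checks.items() if not ok(arguments[arg])]
    if bad:
        raise ValueError(f"{name}: invalid values for {bad}")
    return func(**arguments)
```

With a strict-mode provider, urgency: "really urgent" should be blocked during generation; keeping the check at dispatch covers providers and models where it is not.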
Production challenges and failure modes
Strict schema enforcement solves the syntactic problem. Production systems still need to handle latency, semantic errors, provider differences, schema migrations, and downstream failures.
Grammar-based enforcement adds overhead that scales with schema complexity. A deeply nested schema with optional fields or large enum sets can add meaningful latency. Flatten your schemas and remove optional fields that are not needed.
Provider schema support also varies. OpenAI, Gemini, Azure OpenAI, vLLM, and Ollama do not support the exact same API surface or schema subset. Claude's tool-use pattern creates a different integration shape. Test the actual provider, model, API version, and schema before making the contract production-critical.
Schema-valid output can still be wrong. A model might return a valid invoice_total field with the wrong number. Catching that requires business rules, golden datasets, secondary review, or human escalation.
Log the full execution context when validation fails:
Exact prompt
Model and model version
Schema version
Raw response before parsing
Parsed response
Failed field
Retry path
Final disposition
Without that context, validation spikes are hard to debug.
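A minimal sketch of a failure record that carries those fields as one JSON log line. The ValidationFailure class and its field names are hypothetical; adapt them to your logging pipeline.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ValidationFailure:
    prompt: str
    model: str
    model_version: str
    schema_version: str
    raw_response: str
    parsed_response: Optional[dict]  # None when the raw text did not parse
    failed_field: Optional[str]
    retry_path: list[str]
    final_disposition: str  # e.g. "accepted_on_retry" or "human_review"

    def to_log_line(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```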
Cost and latency optimization
Schema enforcement adds token overhead because the schema definition consumes input tokens. Constrained decoding can also add processing time because the system tracks valid tokens during generation.
Measure on your actual schemas. A five-field classification schema behaves differently from a deeply nested document extraction schema with dozens of fields.
The highest-leverage optimizations are:
Route simple tasks to faster models.
Keep schemas flat.
Use concise field descriptions.
Remove unused optional fields.
Cache exact-match responses for deterministic workloads.
Split large extraction tasks into smaller schemas when the workflow allows it.
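For the caching item, a minimal in-memory sketch. The key hashes everything that should change the answer (model, schema version, prompt), so a schema change never serves stale output. ExactMatchCache is hypothetical; a production cache would typically add a TTL, a shared store, and a rule to store only outputs that passed validation.

```python
import hashlib
import json
from typing import Callable, Optional


class ExactMatchCache:
    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    @staticmethod
    def key(model: str, schema_version: str, prompt: str) -> str:
        # JSON-encode the tuple so field boundaries cannot collide.
        payload = json.dumps([model, schema_version, prompt])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, schema_version: str, prompt: str,
                    call: Callable[[], dict]) -> dict:
        k = self.key(model, schema_version, prompt)
        cached: Optional[dict] = self._store.get(k)
        if cached is None:
            cached = call()
            self._store[k] = cached
        return cached
```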
For simple classification or extraction, fast models like Gemini 3 Flash, GPT-5.4-mini, or Claude Haiku 4.5 may be enough. For ambiguous inputs, deep reasoning, or high-consequence work, frontier models like GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro may justify the extra cost.
Frameworks and tooling
Provider SDKs handle the basics for single-provider builds. OpenAI's SDK gives typed parsing helpers. Anthropic's SDK supports tool definitions with input_schema. Google's genai SDK exposes structured output configuration for Gemini.
Instructor wraps provider SDKs with Pydantic validation and retry behavior. Outlines focuses on grammar-based generation and is most useful when you control local inference. vLLM and Ollama are better fits when you want local or self-hosted structured output control.
The tooling decision comes down to whether you need one provider or multiple. A single-provider app can use the provider SDK directly. A multi-provider production system needs a normalized contract, consistent validation, model routing, retries, logging, and rollback.
Structured outputs in agent workflows
One malformed response mid-loop can break tool dispatch, corrupt downstream state, or silently skip a step. Reliability also compounds: if each step in a 10-step agent loop has 95% structural reliability and failures are independent, end-to-end success is 0.95^10, or about 60%.
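A quick way to see how per-step structural reliability compounds over a 10-step loop, assuming each step fails independently:

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return per_step ** steps


for rate in (0.95, 0.99, 0.999):
    print(f"{rate}: {end_to_end_success(rate, 10):.3f}")
```

Moving from 95% to 99% per step takes a 10-step loop from roughly 60% to roughly 90% end to end, which is why structural guarantees matter more as loops get longer.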
Tool calls need typed parameters. Routing logic needs valid enum values. Database writes need predictable field names and types. When any of those fail, the agent either crashes or proceeds in a bad state.
Structured outputs reduce that failure class. They do not replace testing, observability, or rollback.
How Logic handles structured outputs in production
Logic turns a natural language spec into a production agent with typed REST APIs, schema inference, automated tests, versioning, rollback, and observability. You describe the agent's inputs, outputs, and behavior. Logic handles the schema and execution layer around the model call.
For teams routing across OpenAI, Anthropic, and Google, this removes provider-specific glue code. Logic handles structured outputs consistently across supported models: it normalizes every response to the agent's output contract, repairs off-contract structured outputs when needed, revalidates the repaired result against the schema, and logs the original response, repair path, model version, schema version, latency, and errors.
That routing layer matters because schema reliability varies by task, model, and provider. Logic can auto-route each run, pin a model when determinism matters, tune reasoning level for supported models, reuse exact-match responses through caching, and keep malformed requests from reaching the model through input schema enforcement.
For a broader look at the infrastructure around structured outputs, see LLM structured outputs: the infrastructure behind reliable AI. This guide focuses on provider implementation and JSON Schema behavior; that article focuses on the production system around structured outputs.
Final thoughts on structured outputs in production
Structured outputs remove a major source of production failures: malformed model responses. They make the response shape predictable, enforce required fields, constrain enum values, and reduce parsing surprises.
The remaining work is operational. You still need semantic validation, provider-specific integration, retries, logging, schema versioning, and rollback. A response can be valid JSON and still be wrong.
If you are building agents that need typed outputs, multi-provider routing, tests, traces, and rollback without building that infrastructure yourself, book a short call.
Frequently asked questions
How do I enforce structured outputs with OpenAI's API?
Use OpenAI structured outputs with a JSON Schema or typed SDK model, then parse the response into an object your application can validate. For new builds, evaluate the Responses API and typed parsing patterns. Test your exact schema because strict structured output systems support practical schema subsets.
Does Claude support structured outputs?
Claude supports structured output patterns through tool use. You define a tool with an input_schema, then force or guide the model to call that tool. The result is a structured tool input object. This differs from OpenAI's native response_format style API, so multi-provider apps need an abstraction layer.
Does Gemini support JSON Schema for structured outputs?
Yes. Gemini supports structured JSON through response_mime_type and response_json_schema, with a subset of JSON Schema. Test nested objects, enum-heavy schemas, nullable fields, and advanced constraints before depending on them in production.
What is the difference between structured outputs and function calling?
Structured outputs govern what the model returns to your application. Function calling governs the arguments the model passes to a tool. Function calling is one structured output pattern, but structured outputs can also be used for extraction, classification, routing, and database-ready responses.
Can structured outputs prevent incorrect answers?
No. Structured outputs guarantee shape, not truth. A model can return the right fields with the wrong values. Production systems still need semantic validation through business rules, test datasets, secondary review, or human escalation.
Why do structured output requests have higher latency?
Structured output requests can add latency because the schema consumes input tokens and constrained decoding tracks valid tokens during generation. The overhead grows with schema complexity, especially deeply nested objects, long field descriptions, and large enum sets.
Should I use vLLM or Ollama for structured outputs?
Use vLLM or Ollama when you need local or self-hosted inference. vLLM is stronger for production-style serving. Ollama is useful for local development and lightweight workflows. In both cases, validate outputs after parsing because compliance depends on the model and serving configuration.
Related resources
LLM Structured Outputs: The Infrastructure Behind Reliable AI
Provider-level structured outputs solve JSON syntax. Production systems still need prompt management, testing, versioning, and model routing. Here's the full stack.
AI automation for hospitals guide (April 2026)
Complete guide to AI automation for hospitals in April 2026. Learn how to scale clinical workflows, cut admin costs, and move from pilot to production.
Context engineering guide for AI teams 2026
Learn context engineering for AI agents in April 2026. Manage retrieval, memory, and tool outputs to prevent production failures and control token costs.
Agent vs workflow guide | April 2026
Complete guide to agents vs workflows in April 2026. Learn when to use AI agents versus workflows, key differences, and how to choose the right approach.
AI model benchmarks 2026: GPT, Claude, Gemini compared
Compare GPT, Claude, and Gemini across SWE-bench Pro, GDPval, GPQA, MMLU-Pro, coding, reasoning, tool use, and production evaluation benchmarks.
HIPAA AI automation tools guide April 2026
Find HIPAA compliant AI automation tools with enforced model restrictions and BAAs. Updated guide for April 2026 covers certification, security, and production use.