Structured outputs guide: JSON Schema, OpenAI, Claude, Gemini

Marcus Fields
Published April 10, 2026. Updated May 20, 2026.

Structured outputs constrain an LLM response to a defined schema, usually JSON Schema or a provider-specific equivalent. This guide compares how structured outputs work across OpenAI, Claude, Gemini, Azure OpenAI, vLLM, and Ollama, including each provider's schema mechanism, tool-calling pattern, JSON Schema limits, and production validation requirements.

TLDR:

  • Structured outputs enforce response shape. They do not guarantee semantic correctness.

  • OpenAI, Gemini, Azure OpenAI, vLLM, and Ollama expose schema-based structured output controls, but each supports a different subset of JSON Schema.

  • Claude uses tool definitions with input_schema and tool choice patterns rather than the same native response_format strict mode as OpenAI.

  • Logic infers schemas from specs, validates responses, and handles provider differences for you.

What structured outputs are

Structured outputs are model responses constrained to a schema. The schema defines fields, types, required values, enums, nested objects, and sometimes additional constraints. The model returns output your application can parse deterministically.

Without structured outputs, your code has to handle responses like this:

"The order seems urgent. Category: priority, confidence maybe 90%."

With structured outputs, your code receives a predictable object:

{
  "category": "priority",
  "confidence": 0.9,
  "requires_human_review": false
}

That difference matters whenever an LLM feeds a database, queue, API, workflow engine, or user-facing action.

Provider quick reference

| Provider | Main mechanism | JSON Schema support | Best fit | Production gotcha |
| --- | --- | --- | --- | --- |
| OpenAI | Structured outputs with JSON Schema and typed SDK parsing | Strong support, with documented schema limits | Direct schema-constrained responses and typed parsing | Older chat.completions.parse examples still exist, but new agentic builds should evaluate the Responses API path |
| Anthropic Claude | Tool use with input_schema and forced or guided tool choice | Tool input schemas use JSON Schema | Tool-driven workflows and structured extraction through tool arguments | No identical native response_format strict mode, so provider abstraction differs from OpenAI |
| Google Gemini | response_mime_type: application/json with response_json_schema | Supports a subset of JSON Schema | JSON output from generation config and Vertex AI workflows | Test advanced schema features before relying on them in production |
| Azure OpenAI | OpenAI-compatible structured outputs and strict function calling | Depends on model and deployment support | Teams already running OpenAI models through Azure | Feature availability can vary by region, model, and deployment |
| vLLM | structured_outputs request field | Supports JSON Schema through structured output backends | Local or self-hosted inference with OpenAI-compatible serving | Deprecated guided_json fields were removed in v0.12.0, so current integrations should use structured_outputs |
| Ollama | format field with JSON mode or schema | Schema support varies by model and setup | Local inference and lightweight development workflows | Always validate after parsing because local model compliance varies |

If your application routes across providers, normalize the response contract in your own code or use a platform that handles provider-specific schema behavior, validation, logging, and routing decisions for you.

What structured outputs fix

Structured outputs fix structural failures:

  • Missing required fields

  • Wrong data types

  • Invalid enum values

  • Extra prose around a JSON object

  • Markdown code fences around output

  • Inconsistent casing in fields or categories

  • Tool arguments that do not match a function signature

These failures are common when teams rely on prompting alone. "Return valid JSON only" works until the model adds an explanation, omits a field, or invents a value your downstream system cannot handle.

Schema enforcement moves format compliance out of the prompt and into the generation or parsing layer.

What structured outputs do not fix

Structured outputs do not prove the answer is correct.

This response is structurally valid:

{
  "invoice_total": 4200.00,
  "currency": "USD",
  "confidence": 0.98
}

It can still be wrong if the invoice total was actually 2400.00. Schema enforcement checks shape. It does not verify the model read the document correctly.

Production systems need two validation layers:

  • Structural validation: Does the response match the schema?

  • Semantic validation: Is the response true, useful, and safe for the workflow?

The first layer is deterministic. The second layer requires business rules, golden datasets, secondary review, or human escalation.
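The two layers can be sketched in a few lines of plain Python. This is a minimal illustration built on the invoice example above, not a production validator: the function names are hypothetical, and the business rule (comparing the total against an independently computed sum of line items) is an assumed example of a semantic check.

```python
from typing import Any

NUMBER = (int, float)
# Assumed contract for the invoice example: field name -> accepted Python types.
REQUIRED_FIELDS = {"invoice_total": NUMBER, "currency": str, "confidence": NUMBER}


def structural_errors(payload: dict[str, Any]) -> list[str]:
    """Layer 1: deterministic shape check (fields present, types correct)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif isinstance(payload[name], bool) or not isinstance(payload[name], expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for {name}")
    return errors


def semantic_errors(payload: dict[str, Any], line_item_sum: float) -> list[str]:
    """Layer 2: business rules. Call only after the structural check passes."""
    errors = []
    if abs(payload["invoice_total"] - line_item_sum) > 0.01:
        errors.append("invoice_total does not match sum of line items")
    if not 0.0 <= payload["confidence"] <= 1.0:
        errors.append("confidence outside [0, 1]")
    return errors
```

With the payload above and a line-item sum of 2400.00, the structural check returns no errors while the semantic check flags the total, which is the gap schema enforcement alone cannot close.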

How structured outputs work

There are three common implementation patterns.

Prompt-only formatting

Prompt-only formatting asks the model to return JSON. It is simple, but it is probabilistic. Nothing prevents the model from returning prose, missing a required key, or changing a field name.

Use this only for prototypes or low-risk tasks.
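To make the fragility concrete, here is the kind of best-effort recovery code that prompt-only formatting tends to require. It is a sketch with a hypothetical helper name, and it only covers the common cases of surrounding prose and Markdown code fences; it cannot recover a missing key or a renamed field.

```python
import json
import re
from typing import Any, Optional

FENCE_MARK = "`" * 3  # Markdown code fence, built here to keep the literal out of the source
FENCE = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def best_effort_json(text: str) -> Optional[dict[str, Any]]:
    """Try to recover a JSON object from free-form model text; None on failure."""
    candidates = [text]
    fenced = FENCE.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    # Fall back to the span between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```

Every branch here is a guess about how the model might have wrapped its answer, which is why this approach belongs in prototypes rather than in a production contract.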

Tool or function calling

Tool calling defines a schema for tool arguments. The model returns an object that can be passed into a function. This is the main pattern for Claude and a common pattern across providers.

Tool calling fits cases where the model decides what action to take. It can also work as a structured extraction mechanism when you force a specific tool call.

Constrained decoding or native schema enforcement

Constrained decoding prevents invalid tokens during generation. If the schema says a field can only be "approved" or "rejected", the model cannot generate "maybe" in that field.

This gives you the most direct structural control, but each provider supports different schema features and limits.

Grammar-based enforcement is the common mechanism behind many strict structured output implementations. The system tracks which tokens are valid at each point in the response and masks tokens that would break the schema. That is why strict schema enforcement can guarantee shape in a way prompt-only formatting cannot.

The tradeoff is overhead. Large schemas, deeply nested objects, long descriptions, and large enum sets can add latency. For high-volume workflows, schema design becomes a performance concern as well as a correctness concern.
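The masking idea can be shown with a toy decoder for a single enum field. This is a simplified sketch, not how any provider implements it: real systems operate on token IDs and logits and compile the schema into a grammar or automaton, whereas the string tokens, scores, and function names below are invented for illustration.

```python
def allowed_tokens(prefix: str, vocab: list[str], enum_values: list[str]) -> list[str]:
    """Tokens that keep `prefix + token` a prefix of at least one enum value."""
    return [
        token
        for token in vocab
        if token and any(value.startswith(prefix + token) for value in enum_values)
    ]


def generate_enum(step_scores: list[dict[str, float]], enum_values: list[str]) -> str:
    """Greedy decode: at each step, mask invalid tokens, then take the best one."""
    output = ""
    for scores in step_scores:
        if output in enum_values:
            break  # a complete enum value has been produced
        valid = allowed_tokens(output, list(scores), enum_values)
        if not valid:
            raise ValueError(f"no valid continuation after {output!r}")
        output += max(valid, key=lambda token: scores[token])
    return output


# The unconstrained favorite at step one is "may" (as in "maybe"), but it is masked.
steps = [
    {"may": 0.90, "app": 0.06, "rej": 0.04},
    {"be": 0.80, "roved": 0.15, "ected": 0.05},
]
result = generate_enum(steps, ["approved", "rejected"])  # "approved"
```

The per-step filtering is also where the overhead comes from: the larger the set of valid continuations the system has to track, the more work each decoding step involves.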

OpenAI structured outputs

OpenAI supports structured outputs through JSON Schema and typed SDK helpers. The core idea is that you give the model a schema, and the model response must match that shape.

For new OpenAI implementations, evaluate the Responses API path where possible. Current SDK examples use typed parsing patterns so your application can work with parsed objects rather than raw text.

Example:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class TicketLabel(BaseModel):
    category: str
    urgency: str
    confidence: float

response = client.responses.parse(
    model="gpt-5.4",
    input=[
        {"role": "system", "content": "Classify this support ticket."},
        {"role": "user", "content": "Customer says checkout failed three times."}
    ],
    text_format=TicketLabel,
)

# Parsed TicketLabel instance; may be None (for example, if the model refuses).
label = response.output_parsed

Older chat.completions.parse examples may still work in some contexts, but the Responses API is the direction to evaluate for new agentic workflows.

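Watch for schema limits. OpenAI's strict mode accepts a subset of JSON Schema: every property must be listed in required, each object must set additionalProperties to false, and optional values are expressed as a union with null. Advanced constructs, deep recursion, and complex regular expressions need explicit testing.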

Claude structured outputs

Claude handles structured outputs through tool use and output consistency patterns. You define a tool with an input_schema, then force or strongly guide the model to call that tool.

Example:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[
        {
            "name": "classify_ticket",
            "description": "Classify a support ticket.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"},
                    "urgency": {"type": "string"},
                    "confidence": {"type": "number"}
                },
                "required": ["category", "urgency", "confidence"]
            }
        }
    ],
    tool_choice={"type": "tool", "name": "classify_ticket"},
    messages=[
        {"role": "user", "content": "Customer says checkout failed three times."}
    ],
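)

# The structured result is the input of the forced tool_use content block.
tool_call = next(block for block in response.content if block.type == "tool_use")
label = tool_call.input  # dict with category, urgency, confidence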

Forcing the tool call makes Claude respond with that tool's arguments, but this is not the same API shape as OpenAI's native structured output mode, and you should still validate the returned input against the schema in your own code. If your app routes across providers, build an abstraction that normalizes these differences and records which model produced each output.

Gemini structured outputs

Gemini supports structured JSON through response_mime_type and response_json_schema.

Example:

from google import genai

client = genai.Client()

schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "urgency": {"type": "string"},
        "confidence": {"type": "number"}
    },
    "required": ["category", "urgency", "confidence"]
}

response = client.models.generate_content(
    model="gemini-3.1-pro",
    contents="Customer says checkout failed three times.",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": schema
    },
)

Gemini's structured output mode supports a subset of JSON Schema. That is enough for many extraction, classification, and routing workloads, but teams should test nested objects, enum-heavy schemas, nullable fields, and advanced constraints before relying on them in production.

Azure OpenAI structured outputs

Azure OpenAI follows OpenAI's API surface for structured outputs and strict function calling where supported. It is a good fit for teams that need Azure deployment, enterprise controls, or existing Microsoft procurement paths.

The key operational issue is availability. Structured output behavior depends on the deployed model, API version, and region. Before standardizing on Azure OpenAI structured outputs, test the exact deployment your production system will use.

vLLM structured outputs

vLLM supports structured output generation for self-hosted inference. Current integrations should use the structured_outputs request field rather than older guided_json style fields.

Example shape:

{
  "model": "your-model",
  "messages": [
    {"role": "user", "content": "Classify this ticket."}
  ],
  "structured_outputs": {
    "json": {
      "type": "object",
      "properties": {
        "category": {"type": "string"},
        "urgency": {"type": "string"}
      },
      "required": ["category", "urgency"]
    }
  }
}

vLLM is useful when you control the serving stack and need local inference, but implementation quality depends on the model, tokenizer, backend, and schema complexity. Validate outputs after parsing, even when constrained decoding is enabled.

Ollama structured outputs

Ollama supports structured output patterns through the format field, including JSON mode and schema-based outputs.

Example shape:

{
  "model": "llama3.1",
  "prompt": "Classify this ticket.",
  "format": {
    "type": "object",
    "properties": {
      "category": {"type": "string"},
      "urgency": {"type": "string"}
    },
    "required": ["category", "urgency"]
  }
}

Ollama is a practical option for local development and lightweight local inference. For production use, treat schema output as one layer of control, then run deterministic validation and retries around it.

OpenAI vs Claude vs Gemini structured outputs

The three major frontier providers can all return structured data, but their implementation models differ.

| Provider | Best fit | Main difference to plan for |
| --- | --- | --- |
| OpenAI | Direct schema-constrained responses and typed SDK parsing | Strong native structured output support through current API patterns |
| Claude | Tool-driven workflows and structured tool arguments | Use tool schemas and forced tool choice when you need predictable objects |
| Gemini | JSON output through generation config | Test the JSON Schema subset and schema complexity limits before production |

If you support multiple providers, the hard part is not only getting valid JSON from each one. The hard part is maintaining one application contract while providers expose different APIs, schema subsets, model behavior, error modes, and logging semantics.
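One way to keep a single application contract is a thin normalization layer. The sketch below is an assumption-heavy illustration: the `raw` dictionaries are simplified stand-ins for provider responses (JSON text for OpenAI and Gemini, a tool_use block for Claude), not the actual SDK objects, and `normalize` is a hypothetical helper.

```python
import json
from typing import Any


def normalize(provider: str, model: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Map a simplified provider payload onto one application contract."""
    if provider == "openai":
        # Stand-in: structured output arrives as JSON text.
        data = json.loads(raw["output_text"])
    elif provider == "claude":
        # Stand-in: structured output is the input of the tool_use block.
        block = next(b for b in raw["content"] if b["type"] == "tool_use")
        data = block["input"]
    elif provider == "gemini":
        # Stand-in: structured output arrives as JSON text.
        data = json.loads(raw["text"])
    else:
        raise ValueError(f"unsupported provider: {provider}")
    # Record which provider and model produced the output alongside the data.
    return {"provider": provider, "model": model, "data": data}
```

Downstream code then validates and logs one shape regardless of which provider handled the request.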

Schema design: getting it right

Schema enforcement gives you syntactic correctness, but schema design still affects output quality.

Field names are part of the prompt. A field called f1 gives the model almost no guidance. A field called confidence_score_0_to_1 is self-documenting. Providers often pass schema descriptions into the model context, which can improve output quality on ambiguous fields.

For fields that should only take specific values, define an enum. Grammar-based enforcement locks those values at the token level, so the model cannot produce an unexpected category.

Keep schemas flat when possible. Deeply nested schemas are slower to enforce and harder to debug. One or two levels of nesting are usually enough, and flatter structures are easier for the model to follow consistently.

Prefer required fields with explicit null values over optional fields when the downstream system expects a stable contract. Every optional field is a field the model might skip or hallucinate, and required fields tend to be easier to validate consistently.
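These guidelines can be combined in one small schema. The example below is a sketch: the field names and enum values are invented for illustration, and support for the nullable type union varies by provider, so test it against the one you use. The two helper functions are hypothetical lint checks for optional fields and nesting depth.

```python
# Flat schema with descriptive names, an enum, and required-but-nullable fields.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "checkout", "account", "other"],
            "description": "Best-matching support category.",
        },
        "confidence_score_0_to_1": {
            "type": "number",
            "description": "Confidence in the category, from 0 to 1.",
        },
        "order_id": {
            "type": ["string", "null"],
            "description": "Order ID if the ticket mentions one, otherwise null.",
        },
    },
    "required": ["category", "confidence_score_0_to_1", "order_id"],
    "additionalProperties": False,
}


def optional_fields(schema: dict) -> list[str]:
    """Properties that are not listed as required."""
    return sorted(set(schema.get("properties", {})) - set(schema.get("required", [])))


def nesting_depth(schema: dict) -> int:
    """Levels of nested objects (a flat object is 1). Arrays are not traversed."""
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((nesting_depth(child) for child in children), default=0)
```

Running checks like these in CI is one way to notice when a schema change adds optional fields or extra nesting.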

Testing and validation strategy

Valid JSON is the floor. A schema-compliant response can still return the wrong category, hallucinate a confidence score, or extract the wrong value from a document.

Run two layers of tests:

  • Structural tests: Does the response parse as valid JSON? Are required fields present? Are types correct? Do enum values match the schema?

  • Semantic tests: Is the answer correct for the input? Does it follow business rules? Would the downstream workflow behave correctly?

For semantic validation, start with 50 to 100 real examples. Include common cases, edge cases, ambiguous inputs, and examples where mistakes have business consequences. Track exact-match accuracy where possible and rubric-based scoring where judgment is required.

When validation fails, retry once with the same schema. If it fails again, retry with a simplified schema that captures only critical fields. If that still fails, route to a human review queue or a safe fallback path.
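The retry ladder described above might look like the following. This is a sketch under stated assumptions: `call_model` and `is_valid` are hypothetical callables you would supply, and the schema names "full" and "critical_only" are placeholders.

```python
from typing import Any, Callable, Optional

ModelCall = Callable[[str], Optional[dict[str, Any]]]   # schema name -> parsed output or None
Validator = Callable[[str, dict[str, Any]], bool]       # (schema name, output) -> passes?


def run_with_fallback(
    call_model: ModelCall,
    is_valid: Validator,
    full_schema: str = "full",
    simplified_schema: str = "critical_only",
) -> dict[str, Any]:
    """First attempt, one retry on the same schema, one simplified attempt, then escalate."""
    for schema_name in (full_schema, full_schema, simplified_schema):
        result = call_model(schema_name)
        if result is not None and is_valid(schema_name, result):
            return {"status": "ok", "schema": schema_name, "data": result}
    # Nothing validated: hand off to a human review queue or safe fallback.
    return {"status": "human_review", "schema": None, "data": None}
```

Returning the schema name that succeeded lets downstream code know whether it received the full contract or only the critical fields.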

Structured outputs vs. function calling

Function calling is structured output applied to a specific problem: controlling the arguments the model passes to a tool.

When a model calls:

flag_for_human_review(sku="ABC123", urgency="high")

the same schema enforcement that prevents confidence: "pretty high" can also prevent urgency: "really urgent".

The practical difference is intent. Structured outputs govern what the model returns to your application. Function calling governs what the model does within a workflow. In agentic systems, both patterns often run together.
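On the application side, the same idea shows up as argument validation before dispatch. The sketch below reuses the flag_for_human_review example; the allowed urgency values, the tool registry, and the `dispatch` helper are assumptions made for illustration.

```python
from typing import Any, Callable

URGENCY_LEVELS = {"low", "medium", "high"}  # hypothetical enum for this example


def flag_for_human_review(sku: str, urgency: str) -> str:
    """Stand-in tool: a real one would enqueue the item for review."""
    return f"queued {sku} at {urgency} urgency"


TOOLS: dict[str, Callable[..., str]] = {"flag_for_human_review": flag_for_human_review}


def dispatch(name: str, arguments: dict[str, Any]) -> str:
    """Validate model-supplied arguments before calling the tool."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    if set(arguments) != {"sku", "urgency"}:
        raise ValueError("arguments must be exactly: sku, urgency")
    if not isinstance(arguments["sku"], str) or not isinstance(arguments["urgency"], str):
        raise ValueError("sku and urgency must be strings")
    if arguments["urgency"] not in URGENCY_LEVELS:
        raise ValueError(f"invalid urgency: {arguments['urgency']!r}")
    return TOOLS[name](**arguments)
```

Even when the provider enforces the tool schema, a check like this keeps a value such as "really urgent" from reaching the tool if enforcement is unavailable or misconfigured.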

Production challenges and failure modes

Strict schema enforcement solves the syntactic problem. Production systems still need to handle latency, semantic errors, provider differences, schema migrations, and downstream failures.

Grammar-based enforcement adds overhead that scales with schema complexity. A deeply nested schema with optional fields or large enum sets can add meaningful latency. Flatten your schemas and remove optional fields that are not needed.

Provider schema support also varies. OpenAI, Gemini, Azure OpenAI, vLLM, and Ollama do not support the exact same API surface or schema subset. Claude's tool-use pattern creates a different integration shape. Test the actual provider, model, API version, and schema before making the contract production-critical.

Schema-valid output can still be wrong. A model might return a valid invoice_total field with the wrong number. Catching that requires business rules, golden datasets, secondary review, or human escalation.

Log the full execution context when validation fails:

  • Exact prompt

  • Model and model version

  • Schema version

  • Raw response before parsing

  • Parsed response

  • Failed field

  • Retry path

  • Final disposition

Without that context, validation spikes are hard to debug.
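A single record type is one way to make sure every item in the list above is captured. This is a sketch: the class and field names are hypothetical, and where the log line goes is left to your logging setup.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ValidationFailure:
    """Execution context captured when a response fails validation."""
    prompt: str
    model: str
    model_version: str
    schema_version: str
    raw_response: str                          # before parsing
    parsed_response: Optional[dict[str, Any]]  # None if parsing itself failed
    failed_field: Optional[str]
    retry_path: list[str]
    final_disposition: str

    def to_log_line(self) -> str:
        """Serialize as one JSON line so failures can be searched and aggregated."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the raw response next to the parsed one makes it possible to tell a parsing bug apart from a model error.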

Cost and latency optimization

Schema enforcement adds token overhead because the schema definition consumes input tokens. Constrained decoding can also add processing time because the system tracks valid tokens during generation.

Measure on your actual schemas. A five-field classification schema behaves differently from a deeply nested document extraction schema with dozens of fields.

The highest-leverage optimizations are:

  • Route simple tasks to faster models.

  • Keep schemas flat.

  • Use concise field descriptions.

  • Remove unused optional fields.

  • Cache exact-match responses for deterministic workloads.

  • Split large extraction tasks into smaller schemas when the workflow allows it.
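Exact-match caching is the easiest of these to sketch. The in-memory version below is an illustration only: the class and function names are hypothetical, and a production cache would need eviction, expiry, and shared storage. Including the model and schema version in the key is an assumed design choice so that a model or schema change never returns a stale shape.

```python
import hashlib
import json
from typing import Any, Callable


def cache_key(model: str, schema_version: str, prompt: str) -> str:
    """Stable key over everything that determines the response contract."""
    payload = json.dumps(
        {"model": model, "schema_version": schema_version, "prompt": prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExactMatchCache:
    """In-memory cache keyed by model, schema version, and exact prompt text."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, Any]] = {}

    def get_or_call(
        self,
        model: str,
        schema_version: str,
        prompt: str,
        call: Callable[[str], dict[str, Any]],
    ) -> dict[str, Any]:
        key = cache_key(model, schema_version, prompt)
        if key not in self._store:
            self._store[key] = call(prompt)  # only invoked on a cache miss
        return self._store[key]
```

This only helps when identical inputs recur and the workload tolerates returning the same answer each time.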

For simple classification or extraction, fast models like Gemini 3 Flash, GPT-5.4-mini, or Claude Haiku 4.5 may be enough. For ambiguous inputs, deep reasoning, or high-consequence work, frontier models like GPT-5.4, Claude Opus 4.6, or Gemini 3.1 Pro may justify the extra cost.

Frameworks and tooling

Provider SDKs handle the basics for single-provider builds. OpenAI's SDK gives typed parsing helpers. Anthropic's SDK supports tool definitions with input_schema. Google's genai SDK exposes structured output configuration for Gemini.

Instructor wraps provider SDKs with Pydantic validation and retry behavior. Outlines focuses on grammar-based generation and is most useful when you control local inference. vLLM and Ollama are better fits when you want local or self-hosted structured output control.

The tooling decision comes down to whether you need one provider or multiple. A single-provider app can use the provider SDK directly. A multi-provider production system needs a normalized contract, consistent validation, model routing, retries, logging, and rollback.

Structured outputs in agent workflows

One malformed response mid-loop can break tool dispatch, corrupt downstream state, or silently skip a step. If each step in a 10-step agent loop has 95% structural reliability and failures are independent, end-to-end success is roughly 60% (0.95^10 ≈ 0.60).
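The compounding effect is simple to compute. The function below assumes each step fails independently, which is a simplification; correlated failures would change the numbers.

```python
def end_to_end_success(per_step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_reliability ** steps


ten_steps_at_95 = end_to_end_success(0.95, 10)  # about 0.599
ten_steps_at_99 = end_to_end_success(0.99, 10)  # about 0.904
```

Raising per-step structural reliability from 95% to 99% moves a 10-step loop from roughly 60% to roughly 90% end-to-end success.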

Tool calls need typed parameters. Routing logic needs valid enum values. Database writes need predictable field names and types. When any of those fail, the agent either crashes or proceeds in a bad state.

Structured outputs reduce that failure class. They do not replace testing, observability, or rollback.

How Logic handles structured outputs in production

Logic turns a natural language spec into a production agent with typed REST APIs, schema inference, automated tests, versioning, rollback, and observability. You describe the agent's inputs, outputs, and behavior. Logic handles the schema and execution layer around the model call.

For teams routing across OpenAI, Anthropic, and Google, this removes provider-specific glue code. Logic handles structured outputs consistently across supported models: it normalizes every response to the agent's output contract, repairs off-contract structured outputs when needed, revalidates the repaired result against the schema, and logs the original response, repair path, model version, schema version, latency, and errors.

That routing layer matters because schema reliability varies by task, model, and provider. Logic can auto-route each run, pin a model when determinism matters, tune reasoning level for supported models, reuse exact-match responses through caching, and keep malformed requests from reaching the model through input schema enforcement.

For a broader look at the infrastructure around structured outputs, see LLM structured outputs: the infrastructure behind reliable AI. This guide focuses on provider implementation and JSON Schema behavior; that article focuses on the production system around structured outputs.

Final thoughts on structured outputs in production

Structured outputs remove a major source of production failures: malformed model responses. They make the response shape predictable, enforce required fields, constrain enum values, and reduce parsing surprises.

The remaining work is operational. You still need semantic validation, provider-specific integration, retries, logging, schema versioning, and rollback. A response can be valid JSON and still be wrong.

If you are building agents that need typed outputs, multi-provider routing, tests, traces, and rollback without building that infrastructure yourself, book a short call.

Frequently asked questions

How do I enforce structured outputs with OpenAI's API?

Use OpenAI structured outputs with a JSON Schema or typed SDK model, then parse the response into an object your application can validate. For new builds, evaluate the Responses API and typed parsing patterns. Test your exact schema because strict structured output systems support practical schema subsets.

Does Claude support structured outputs?

Claude supports structured output patterns through tool use. You define a tool with an input_schema, then force or guide the model to call that tool. The result is a structured tool input object. This differs from OpenAI's native response_format style API, so multi-provider apps need an abstraction layer.

Does Gemini support JSON Schema for structured outputs?

Yes. Gemini supports structured JSON through response_mime_type and response_json_schema, with a subset of JSON Schema. Test nested objects, enum-heavy schemas, nullable fields, and advanced constraints before depending on them in production.

What is the difference between structured outputs and function calling?

Structured outputs govern what the model returns to your application. Function calling governs the arguments the model passes to a tool. Function calling is one structured output pattern, but structured outputs can also be used for extraction, classification, routing, and database-ready responses.

Can structured outputs prevent incorrect answers?

No. Structured outputs guarantee shape, not truth. A model can return the right fields with the wrong values. Production systems still need semantic validation through business rules, test datasets, secondary review, or human escalation.

Why do structured output requests have higher latency?

Structured output requests can add latency because the schema consumes input tokens and constrained decoding tracks valid tokens during generation. The overhead grows with schema complexity, especially deeply nested objects, long field descriptions, and large enum sets.

Should I use vLLM or Ollama for structured outputs?

Use vLLM or Ollama when you need local or self-hosted inference. vLLM is stronger for production-style serving. Ollama is useful for local development and lightweight workflows. In both cases, validate outputs after parsing because compliance depends on the model and serving configuration.
