Structured Outputs and Tool Use: Making LLMs Reliable

The gap between a demo that impresses and a system that ships is usually filled with one thing: fragile parsing. You get a response back from the model, grep for a JSON block somewhere in the prose, hope the field names match what you expected, and wonder why it worked fine in testing but explodes at 2am on a Tuesday.

This guide covers the two main tools that close that gap: structured output via schema-guided JSON generation, and tool use (function calling), which has the model return typed arguments your code can consume directly. Both techniques move you from “the model probably gave me the right thing” to “the shape of this response has been checked.”

Why Unstructured Output Fails in Production

Large language models are trained to be helpful and verbose. Left to their own devices, they wrap everything in explanation. Ask for a JSON object and you might get:

Sure! Here is the JSON you requested:

```json
{ "status": "ok", "count": 42 }
```

Let me know if you need anything else!

Every extraction strategy you write for that format is technical debt. The model might use a different preamble next time, omit the code fence, add a trailing comment inside the block, or use single quotes instead of double. Any of those breaks a naive parser.
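
To make the failure concrete, here is a minimal sketch of the kind of fence-scraping parser described above, run against a few of those variations. The function name naive_extract and the sample replies are invented for illustration; they are not taken from a real transcript.

```python
import json
import re

FENCE = "`" * 3  # a literal triple backtick, built this way to keep the example readable

# Hypothetical brittle parser: expects a fenced json block containing strict JSON.
PATTERN = re.compile(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def naive_extract(reply: str):
    """Return the parsed object, or None when the reply deviates from the expected shape."""
    match = PATTERN.search(reply)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


replies = {
    "expected": f'Sure!\n{FENCE}json\n{{ "status": "ok", "count": 42 }}\n{FENCE}\nAnything else?',
    "no_fence": 'Here you go: { "status": "ok", "count": 42 }',
    "single_quotes": f"{FENCE}json\n{{ 'status': 'ok', 'count': 42 }}\n{FENCE}",
    "trailing_comment": f'{FENCE}json\n{{ "status": "ok", "count": 42 }} // done\n{FENCE}',
}

results = {name: naive_extract(text) for name, text in replies.items()}
for name, parsed in results.items():
    print(f"{name:>16}: {parsed}")
```

Only the first reply parses. The other three carry the same data and still come back as None, which is exactly the kind of silent failure that surfaces in production rather than in testing.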

The fix is not better regex. The fix is to never let the model produce free-form text when you need structured data.

Approach One: JSON Mode with a System Prompt

The simplest technique is prompting the model to return only raw JSON and nothing else. This is not foolproof on its own, but it works well when combined with explicit schema instructions.

The pattern:

Tell the model in the system prompt that it must respond with valid JSON only, no prose, no code fences.
Describe the exact schema you expect, either inline or as a JSON Schema object.
Validate the response on the client before you use it.

import anthropic
import json
from jsonschema import validate

client = anthropic.Anthropic()

# Single source of truth: the same schema is shown to the model and enforced locally.
SCHEMA = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "revenue_usd": {"type": "number"},
        "employees": {"type": "integer"},
    },
    "required": ["company", "revenue_usd", "employees"],
}

SYSTEM = (
    "You are a data extraction assistant. Respond only with a valid JSON object.\n"
    "Do not include explanations, markdown, or code fences. The JSON must match\n"
    "this schema exactly:\n" + json.dumps(SCHEMA, indent=2)
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=SYSTEM,
    messages=[
        {"role": "user", "content": "Extract company data from: Acme Corp reported $4.2M revenue with 87 staff."}
    ]
)

raw = message.content[0].text
data = json.loads(raw)  # raises json.JSONDecodeError if the reply is not valid JSON
validate(instance=data, schema=SCHEMA)  # raises jsonschema.ValidationError on a schema mismatch

The schema in the system prompt pulls double duty: it guides what the model generates, and it documents what your code expects. Keep the prompt and the validator in sync, ideally by generating both from a single definition.

Approach Two: Tool Use for Guaranteed Typed Calls

Tool use is the more robust approach for structured output. Instead of asking the model to format data as JSON, you define a tool with a typed schema and tell the model to call it. The API returns a structured tool_use block rather than raw text, and the arguments arrive as an already-parsed JSON object shaped by your schema.

This works because the model is not free-writing text that happens to look like JSON. It is filling in the arguments of a function call against a schema supplied with the request, and the API hands you those arguments as data rather than prose.

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "record_company_data",
        "description": "Record structured company information extracted from text.",
        "input_schema": {
            "type": "object",
            "properties": {
                "company": {
                    "type": "string",
                    "description": "Company name"
                },
                "revenue_usd": {
                    "type": "number",
                    "description": "Annual revenue in US dollars"
                },
                "employees": {
                    "type": "integer",
                    "description": "Number of employees"
                }
            },
            "required": ["company", "revenue_usd", "employees"]
        }
    }
]

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "record_company_data"},
    messages=[
        {"role": "user", "content": "Extract company data from: Acme Corp reported $4.2M revenue with 87 staff."}
    ]
)

# The model will always call this tool because we forced it
tool_block = next(b for b in message.content if b.type == "tool_use")
data = tool_block.input  # Already a dict shaped by the schema; validate it before relying on it
print(data["company"], data["revenue_usd"], data["employees"])

The key line is tool_choice={"type": "tool", "name": "record_company_data"}. Setting tool_choice to a specific tool forces the model to call it rather than responding in text. Without this, the model chooses whether to call a tool or respond normally. For extraction workflows where you always want structured output, force the call.
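
Even with a forced call, it helps to be explicit about what happens when the expected block is missing or the response was cut off by max_tokens. The sketch below assumes response objects that expose content and stop_reason, with blocks carrying type, name, and input as in the example above. get_tool_input is a hypothetical helper, and SimpleNamespace stands in for a real SDK response so the snippet runs without an API key.

```python
from types import SimpleNamespace


def get_tool_input(message, tool_name: str) -> dict:
    """Return the arguments of the named tool call, or raise with a clear reason."""
    if message.stop_reason == "max_tokens":
        # A truncated response can carry incomplete arguments; do not trust it.
        raise RuntimeError("response hit max_tokens; tool arguments may be incomplete")
    for block in message.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise LookupError(f"model did not call {tool_name!r}")


# Stand-ins for SDK responses (hypothetical data, shaped like the example above).
forced = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(
            type="tool_use",
            name="record_company_data",
            input={"company": "Acme Corp", "revenue_usd": 4200000, "employees": 87},
        )
    ],
)
text_only = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(type="text", text="Acme Corp has 87 staff.")],
)

data = get_tool_input(forced, "record_company_data")
print(data["company"], data["employees"])

try:
    get_tool_input(text_only, "record_company_data")
except LookupError as exc:
    print("no tool call:", exc)
```

Raising a named error here keeps the failure visible at the call site, instead of letting a bare StopIteration or a missing key surface somewhere downstream.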

Tool Use in TypeScript

The same pattern in Node using the official SDK:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 1024,
  tools: [
    {
      name: "record_company_data",
      description: "Record structured company information.",
      input_schema: {
        type: "object",
        properties: {
          company: { type: "string" },
          revenue_usd: { type: "number" },
          employees: { type: "integer" }
        },
        required: ["company", "revenue_usd", "employees"]
      }
    }
  ],
  tool_choice: { type: "tool", name: "record_company_data" },
  messages: [
    { role: "user", content: "Acme Corp: $4.2M revenue, 87 employees." }
  ]
});

const toolBlock = response.content.find((b) => b.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  const data = toolBlock.input as {
    company: string;
    revenue_usd: number;
    employees: number;
  };
  console.log(data.company, data.revenue_usd, data.employees);
}

For high-volume extraction tasks, claude-haiku-4-5 is worth considering. It has a 200K context window and is the fastest option in the current model lineup. For tasks that benefit from deeper reasoning over complex documents, claude-sonnet-4-6 or claude-opus-4-8 give you a 1M-token context window alongside greater capability.

When to Use Adaptive Thinking with Structured Output

On Claude 4.6 and later models, extended reasoning is available via adaptive thinking. When you are extracting structured data from ambiguous, messy, or conflicting source material, enabling it lets the model weigh the conflicting evidence first, which can improve the accuracy of the final structured response.

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=8000,
    thinking={"type": "adaptive"},
    tools=tools,
    tool_choice={"type": "tool", "name": "record_company_data"},
    messages=[{"role": "user", "content": complex_document}]
)

Note that the older fixed budget_tokens thinking parameter is removed on Claude 4.6 and later. Use {"type": "adaptive"} and the model manages the reasoning budget itself. The structured tool call still arrives in the response content the same way, but the model reasons through ambiguity before committing to values.

Do not enable adaptive thinking for every structured output call. For straightforward extraction where the source data is clean, it adds latency without benefit. Reserve it for cases where the data is genuinely complex or ambiguous.

Connecting Structured Outputs to External Systems with MCP

Tool use pairs naturally with the Model Context Protocol. MCP is an open standard that lets you expose external data sources and actions as tools the model can call. When you define MCP tools with proper input schemas, calls from the model are far more likely to be well-formed before they hit your database, API, or service.

The discipline is the same: define tight schemas, validate inputs on the server side even if the model called them correctly, and never treat the model’s tool arguments as pre-sanitized for security purposes. MCP handles the transport and tool registration; your schema design determines whether the integration is reliable.
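
As a rough illustration of that discipline, the handler below re-checks its arguments and uses a parameterized query rather than trusting the caller. It deliberately avoids any MCP SDK calls: handle_lookup_company, the companies table, and the argument names are all hypothetical, and a real server would register a function like this through whatever MCP library it uses.

```python
import sqlite3

MAX_NAME_LENGTH = 200  # assumed limit; pick one that fits your data


def handle_lookup_company(conn: sqlite3.Connection, arguments: dict) -> list:
    """Validate tool arguments server-side, then query with bound parameters."""
    name = arguments.get("company")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("'company' must be a non-empty string")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("'company' is too long")
    # Bound parameter: the value is never spliced into the SQL text.
    rows = conn.execute(
        "SELECT company, employees FROM companies WHERE company = ?", (name,)
    ).fetchall()
    return rows


# In-memory database with one hypothetical row, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (company TEXT, employees INTEGER)")
conn.execute("INSERT INTO companies VALUES (?, ?)", ("Acme Corp", 87))

found = handle_lookup_company(conn, {"company": "Acme Corp"})
# A hostile-looking argument is treated as plain data and simply matches nothing.
hostile = handle_lookup_company(conn, {"company": "x'; DROP TABLE companies; --"})
print(found, hostile)
```

The type and length checks catch malformed calls, while the bound parameter is what keeps a well-formed but malicious string from becoming SQL.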

Schema Design Principles That Actually Help

A well-designed schema does most of the reliability work before the model even runs.

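Use required fields aggressively. If your downstream code depends on a field, mark it required. Be aware that the model fills in required fields even when the source data is sparse, so give it an honest way to say a value is absent, such as allowing null, rather than leaving it to guess.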
Prefer specific types over strings. A number field cannot come back as “four point two million.” A string field can.
Use enums for categorical values. If a status field can only be active, inactive, or pending, say so. The model is then far less likely to invent a fourth option, and your validator can reject it if it does.
Add descriptions to every property. The description is instructions to the model, not documentation for developers. Tell it what to extract, not just what type to return.
Keep schemas as flat as practical. Deeply nested objects increase the chance of structural errors. If you need nesting, test it carefully.
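
Put together, those principles might produce a schema like the one below. It is a sketch: the status field, the nullable employees field, and all of the descriptions are invented for illustration. The two checks at the end are one cheap way to catch schema drift before any request is sent.

```python
import json

# Hypothetical tool input schema applying the principles above.
COMPANY_SCHEMA = {
    "type": "object",
    "properties": {
        "company": {
            "type": "string",
            "description": "Legal or trading name exactly as written in the source text.",
        },
        "revenue_usd": {
            "type": "number",
            "description": "Annual revenue in US dollars as a plain number, e.g. 4200000 for $4.2M.",
        },
        "employees": {
            "type": ["integer", "null"],
            "description": "Headcount if stated in the text; null if the text does not say.",
        },
        "status": {
            "type": "string",
            "enum": ["active", "inactive", "pending"],
            "description": "Operating status of the company.",
        },
    },
    "required": ["company", "revenue_usd", "employees", "status"],
    "additionalProperties": False,
}

# Every property should carry instructions for the model.
missing_descriptions = [
    name for name, spec in COMPANY_SCHEMA["properties"].items() if "description" not in spec
]
# Every required name should actually be declared as a property.
undeclared_required = [
    name for name in COMPANY_SCHEMA["required"] if name not in COMPANY_SCHEMA["properties"]
]
print(json.dumps({
    "missing_descriptions": missing_descriptions,
    "undeclared_required": undeclared_required,
}))
```

The schema stays flat, every field is typed as narrowly as the data allows, and each description tells the model what to extract rather than restating the type.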

Client-Side Validation Is Not Optional

Even with forced tool calls, validate on the client. A forced call gives you parsed JSON shaped by your schema, but unless the API is strictly enforcing that schema the model can still occasionally omit a field or return the wrong type. Your application logic may also have constraints the schema cannot express: a revenue figure must be positive, a date must be in the past, a count cannot exceed a known maximum.

Treat the model’s output the way you treat any external input: parse it, validate it, reject it loudly if it fails. A validation error in a structured output pipeline is a signal to improve your schema or your prompt, not a reason to add a try/catch that swallows the problem.
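
The kinds of rules mentioned above are simple to encode. Here is a minimal sketch, assuming a record with a hypothetical reported_on date field alongside the fields used earlier; MAX_EMPLOYEES is an arbitrary sanity ceiling chosen for the example.

```python
from datetime import date

MAX_EMPLOYEES = 5_000_000  # assumed sanity ceiling, not a real-world constant


def check_business_rules(data: dict, today: date) -> list:
    """Return a list of rule violations; an empty list means the record is acceptable."""
    problems = []
    if data["revenue_usd"] <= 0:
        problems.append("revenue_usd must be positive")
    if not 0 < data["employees"] <= MAX_EMPLOYEES:
        problems.append("employees out of range")
    if date.fromisoformat(data["reported_on"]) >= today:
        problems.append("reported_on must be in the past")
    return problems


def accept(data: dict, today: date) -> dict:
    """Reject loudly: raise with every violation instead of silently patching the record."""
    problems = check_business_rules(data, today)
    if problems:
        raise ValueError("; ".join(problems))
    return data


today = date(2025, 1, 15)
good = {"revenue_usd": 4200000, "employees": 87, "reported_on": "2024-12-31"}
bad = {"revenue_usd": -1, "employees": 87, "reported_on": "2030-01-01"}

print(check_business_rules(good, today))
print(check_business_rules(bad, today))
```

Collecting every violation before raising makes the error message useful as feedback for the schema or the prompt, not just as a stop sign.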

Takeaway

Structured output and tool use are not advanced features. They are the baseline for any AI integration that needs to be production-worthy. Start with tool use and forced tool choice for any workflow where you need guaranteed structure. Use adaptive thinking when the source material demands it, and apply it only to the model versions that support it. Design schemas with the model as the consumer, validate outputs as external input, and you will spend your time building features rather than debugging parsers.