JSON, Schemas, and Structured Output Prompts
Master techniques for getting AI to produce clean, machine-readable JSON output reliably. Learn how to write prompts with explicit schemas, use few-shot examples effectively, handle missing data gracefully, implement error recovery patterns, and build validation awareness into your prompts for production-ready structured output.
4/29/2024 · 3 min read


Language models are conversational by nature—they want to chat, explain, and elaborate. But real applications need structured data: JSON objects that feed into databases, API responses that downstream systems can parse, and formatted output that machines can reliably process. Getting AI to consistently produce clean, parseable structured output is one of the most critical skills in production prompt engineering.
Why Structured Output Matters
Imagine building an email categorization system. You need the AI to analyze emails and return structured classifications: priority level, category, sentiment, and required action. A chatty response like "This seems like a medium priority customer inquiry about billing, and the customer appears frustrated, so you should probably respond within 24 hours" is useless to your automation pipeline.
You need exactly this:
```
{
  "priority": "medium",
  "category": "billing_inquiry",
  "sentiment": "frustrated",
  "action_required": "respond_24h"
}
```
Clean, parseable, predictable. This is what structured output prompting achieves. The nested structure example later in this article shows exactly how to organize hierarchical data.
Error Recovery Patterns
Even with perfect prompts, models occasionally produce malformed JSON. Build recovery into your system:
```
import json
import re


def parse_llm_json(response):
    """Parse JSON from a model response, tolerating code fences and stray prose."""
    candidates = [response]
    # Try to extract JSON from a markdown code block, with or without a "json" tag
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", response, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    # Try anything between the first { and the last }
    braced = re.search(r"\{.*\}", response, re.DOTALL)
    if braced:
        candidates.append(braced.group(0))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # fall through to the next, looser candidate
    raise ValueError("Could not extract valid JSON")
```
Validation-Aware Prompting
Tell the model to self-validate before responding:
```
Before outputting your JSON:
1. Verify all required fields are present
2. Check that number fields contain valid numbers
3. Ensure enum fields match allowed values exactly
4. Confirm the JSON is syntactically valid
Only output the JSON if all validations pass.
```
This internal check catches many errors before they reach your parser.
From Chaos to Structure
Structured output prompting transforms unpredictable language model responses into reliable data pipelines. Through explicit schemas, clear examples, thoughtful error handling, and validation awareness, you create AI systems that integrate seamlessly with the rest of your technology stack. Conversation is great for humans; clean JSON is what your code needs.
The JSON-Only Instruction
The foundation of structured output prompting is a crystal-clear instruction that the model should emit nothing but valid JSON. Ambiguity here is your enemy.
Weak approach:
"Respond in JSON format."
This invites the model to add preambles like "Here's the JSON you requested:" or explanations after the JSON. Your parser breaks immediately.
Strong approach:
```
Respond with ONLY valid JSON. No preamble, no explanation, no markdown code blocks.
Your entire response must be a single valid JSON object that begins with { and ends with }.
```
Even better, explicitly show the start:
```
Output format: Your response must be valid JSON starting immediately with {
Example of correct format:
{"field": "value", "another_field": 123}
Now process the following data:
[your input here]
```
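On the receiving side, the same rule can be checked mechanically before any other processing runs. The sketch below is one minimal way to do that in Python; the function name `is_bare_json_object` is illustrative and not part of any library.

```python
import json


def is_bare_json_object(response: str) -> bool:
    """Return True only if the whole response is a single JSON object."""
    text = response.strip()
    # The prompt demands a response that begins with { and ends with }
    if not (text.startswith("{") and text.endswith("}")):
        return False
    try:
        # json.loads rejects trailing content, so two objects in a row fail here
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


print(is_bare_json_object('{"field": "value", "another_field": 123}'))
print(is_bare_json_object('Here is the JSON you requested: {"field": "value"}'))
```

Logging how often this check fails is a cheap way to compare the weak and strong instructions against your own inputs.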
Schema Definition: Teaching Structure
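The model needs to understand exactly what fields to include and what types they should be. A schema definition embedded in your prompt provides this blueprint. The annotated template below is not formal JSON Schema, but it states field names, types, and constraints in one place.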
```
Respond with a JSON object matching this schema:
{
  "customer_name": string (required),
  "order_total": number (required, positive),
  "items": array of strings (required, minimum 1 item),
  "priority": string (required, must be "low" | "medium" | "high"),
  "notes": string or null (optional)
}
```
This explicit schema makes the model far less likely to invent fields, use wrong types, or omit required data. It's essentially a contract between your prompt and the output, though one your code should still verify.
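Because a model can still drift from the contract, it helps to check each parsed response in code. The following is a minimal sketch of such a check for the order schema above. The names `validate_order` and `is_number` are hypothetical, and rejecting unexpected fields is an assumption about how strict you want to be.

```python
ALLOWED_PRIORITIES = {"low", "medium", "high"}
EXPECTED_FIELDS = {"customer_name", "order_total", "items", "priority", "notes"}


def is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_order(data) -> list:
    """Return a list of schema violations; an empty list means the object passes."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for extra in sorted(set(data) - EXPECTED_FIELDS):
        problems.append(f"unexpected field: {extra}")
    if not isinstance(data.get("customer_name"), str):
        problems.append("customer_name must be a string")
    total = data.get("order_total")
    if not is_number(total) or total <= 0:
        problems.append("order_total must be a positive number")
    items = data.get("items")
    if not (isinstance(items, list) and items
            and all(isinstance(item, str) for item in items)):
        problems.append("items must be a non-empty array of strings")
    priority = data.get("priority")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        problems.append('priority must be "low", "medium", or "high"')
    notes = data.get("notes")  # optional: absent and null are both accepted
    if notes is not None and not isinstance(notes, str):
        problems.append("notes must be a string or null")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easy to log every violation, or to send the full list back to the model in a retry prompt.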
Inline Examples: The Power of Few-Shot
Schema definitions tell the model what to do; examples show it how. Combining both creates robust structured output.
```
Extract customer information as JSON following this schema:
{
  "name": string,
  "email": string,
  "issue_type": string (one of: technical, billing, general),
  "urgency": number (1-5 scale)
}
Examples:
Input: "Hi, I'm John Smith (john@example.com) and I'm having trouble logging in. This is urgent!"
Output: {"name": "John Smith", "email": "john@example.com", "issue_type": "technical", "urgency": 5}
Input: "Sarah Johnson here, sarah.j@test.com, just a general question about your pricing, no rush"
Output: {"name": "Sarah Johnson", "email": "sarah.j@test.com", "issue_type": "general", "urgency": 2}
Now process this input:
[actual input]
```
The examples demonstrate the judgment calls a schema alone cannot convey: how to extract names from conversational text, how to infer urgency from language cues, and how to map various phrasings to standardized categories.
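If the few-shot examples live in code rather than in a hand-edited string, the example outputs can be serialized with `json.dumps`, which guarantees they are valid JSON. The sketch below assembles the prompt shown above that way; `build_few_shot_prompt`, `SCHEMA_TEXT`, and `EXAMPLES` are illustrative names, not an established API.

```python
import json

SCHEMA_TEXT = """{
  "name": string,
  "email": string,
  "issue_type": string (one of: technical, billing, general),
  "urgency": number (1-5 scale)
}"""

# Each example pairs a raw input with the exact object we expect back
EXAMPLES = [
    ("Hi, I'm John Smith (john@example.com) and I'm having trouble logging in. This is urgent!",
     {"name": "John Smith", "email": "john@example.com", "issue_type": "technical", "urgency": 5}),
    ("Sarah Johnson here, sarah.j@test.com, just a general question about your pricing, no rush",
     {"name": "Sarah Johnson", "email": "sarah.j@test.com", "issue_type": "general", "urgency": 2}),
]


def build_few_shot_prompt(user_input: str) -> str:
    """Assemble the schema, the worked examples, and the new input into one prompt."""
    lines = ["Extract customer information as JSON following this schema:", SCHEMA_TEXT, "Examples:"]
    for text, expected in EXAMPLES:
        lines.append(f'Input: "{text}"')
        # json.dumps ensures the example output is syntactically valid JSON
        lines.append(f"Output: {json.dumps(expected)}")
    lines.append("Now process this input:")
    lines.append(user_input)
    return "\n".join(lines)


print(build_few_shot_prompt("Hello, this is Lee (lee@example.org), my invoice looks wrong."))
```

Keeping the examples as data also lets you run them through your own validator, so a typo in an example never teaches the model the wrong format.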
Handling Missing or Ambiguous Data
Real-world inputs are messy. Your prompt must define how to handle uncertainty.
```
For missing or unclear information:
- Use null for truly missing data (e.g., if email not mentioned: "email": null)
- Use empty strings "" only for explicitly empty text fields
- Use "unknown" as a string value for categorical fields when unclear
- If you cannot reasonably extract a required field, use null and explain in an "errors" array
Schema:
{
  "extracted_data": {
    "name": string or null,
    "amount": number or null
  },
  "errors": array of strings (empty if no errors)
}
```
This pattern keeps your JSON structure intact even when input quality varies, while providing error information your code can handle programmatically.
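On the consuming side, handling this programmatically might look like the sketch below. It assumes the response has already been reduced to a bare JSON string in the shape above; the function name `triage` and the status labels `ok` and `needs_review` are hypothetical choices for illustration.

```python
import json


def triage(raw: str) -> dict:
    """Decide whether an extraction result can be used as-is or needs review."""
    payload = json.loads(raw)
    data = payload["extracted_data"]
    errors = payload["errors"]
    # Null marks data the model could not find in the input
    missing = [field for field, value in data.items() if value is None]
    status = "needs_review" if errors or missing else "ok"
    return {"status": status, "missing": missing, "errors": errors, "data": data}


complete = '{"extracted_data": {"name": "Ana", "amount": 12.5}, "errors": []}'
partial = '{"extracted_data": {"name": "Ana", "amount": null}, "errors": ["amount not stated"]}'
print(triage(complete)["status"])
print(triage(partial)["status"], triage(partial)["missing"])
```

Because the structure is the same for clean and messy inputs, the calling code needs only one branch on `status` rather than special cases for each kind of failure.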
Nested Structures and Arrays
Complex data requires nested JSON. Show the model the complete structure:
```
Extract all mentioned products as a JSON array of objects:
[
  {
    "product_name": string,
    "quantity": number,
    "price": number,
    "options": {
      "color": string or null,
      "size": string or null
    }
  }
]
Example:
Input: "I want 2 blue shirts in large for $29.99 each and 1 red hat for $15"
Output: [
  {
    "product_name": "shirt",
    "quantity": 2,
    "price": 29.99,
    "options": {"color": "blue", "size": "large"}
  },
  {
    "product_name": "hat",
    "quantity": 1,
    "price": 15.00,
    "options": {"color": "red", "size": null}
  }
]
```

