Structured Outputs with Prompts
Define JSON schemas to structure AI responses when using prompts for data extraction
Structured Outputs ensure that AI-generated responses adhere to a JSON schema you define. Because the format is enforced, you do not need to validate or retry incorrectly formatted responses, which makes the feature well suited to extracting structured data from web pages.
Benefits
- Reliable type-safety: No need to validate or retry incorrectly formatted responses
- Consistent formatting: The AI output will always match your defined structure
- Simpler implementation: Define your schema once and get predictable results every time
How It Works
When you provide a prompt, the /v2/scrape endpoint returns a JSON object in structured_data instead of markdown or HTML. Add an optional response_schema to enforce a strict JSON schema for the response. The schema follows the JSON Schema format used by OpenAI Structured Outputs.
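As a minimal sketch of the request body described above, the following Python builds the JSON payload for /v2/scrape. The helper name build_scrape_payload is hypothetical and not part of any SDK. Raising an error when a schema is passed without a prompt is a local design choice, made because the API ignores a schema that arrives without a prompt.

```python
import json
from typing import Any, Optional


def build_scrape_payload(
    url: str,
    prompt: Optional[str] = None,
    response_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the JSON body for a POST to /v2/scrape (hypothetical helper)."""
    if response_schema is not None and prompt is None:
        # A schema without a prompt is ignored by the API, so fail early instead.
        raise ValueError("response_schema requires a prompt")
    payload: dict[str, Any] = {"url": url}
    if prompt is not None:
        payload["prompt"] = prompt
    if response_schema is not None:
        payload["response_schema"] = response_schema
    return payload


payload = build_scrape_payload(
    "https://example.com/product/widget",
    prompt="Extract product details from this page",
    response_schema={
        "type": "object",
        "properties": {"product_name": {"type": "string"}},
        "required": ["product_name"],
        "additionalProperties": False,
    },
)
# Serialize exactly what would be sent as the request body.
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent with any HTTP client, using the Authorization and Content-Type headers from the curl example.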
Basic Example
Extract product information with a guaranteed structure:
curl --request POST \
--url https://api.webcrawlerapi.com/v2/scrape \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com/product/widget",
"prompt": "Extract product details from this page",
"response_schema": {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "number"},
"in_stock": {"type": "boolean"},
"description": {"type": "string"}
},
"required": ["product_name", "price", "in_stock"],
"additionalProperties": false
}
}'
Response:
{
"success": true,
"status": "done",
"page_status_code": 200,
"page_title": "Premium Widget",
"structured_data": {
"product_name": "Premium Widget",
"price": 29.99,
"in_stock": true,
"description": "A high-quality widget for all your needs"
}
}
Schema Format
Your response_schema must be a valid JSON Schema object. OpenAI structured outputs are strict, so we recommend following these conventions to avoid schema validation errors:
Recommended Fields
- type: Use "object" at the root level
- properties: Define the structure of your data
- required: Include required property names for predictable output
- additionalProperties: Set to false to keep the output strict
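The conventions above can be checked locally before a request is sent. The sketch below is a hypothetical lint helper (check_conventions is not part of the API or any SDK) and only mirrors the four recommendations listed here; it does not reproduce the validation the service itself performs.

```python
from typing import Any


def check_conventions(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return convention warnings for a schema (local lint only)."""
    problems: list[str] = []
    if path == "$" and schema.get("type") != "object":
        problems.append(f"{path}: root type should be 'object'")
    if schema.get("type") == "object":
        props = schema.get("properties")
        if not isinstance(props, dict):
            problems.append(f"{path}: object has no 'properties'")
            props = {}
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' should be false")
        if "required" not in schema:
            problems.append(f"{path}: no 'required' list")
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}: required name '{name}' is not in 'properties'")
        # Recurse into nested property schemas.
        for name, sub in props.items():
            if isinstance(sub, dict):
                problems.extend(check_conventions(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(check_conventions(schema["items"], f"{path}[]"))
    return problems


good = {
    "type": "object",
    "properties": {"product_name": {"type": "string"}},
    "required": ["product_name"],
    "additionalProperties": False,
}
bad = {
    "type": "object",
    "properties": {
        "address": {"type": "object", "properties": {"city": {"type": "string"}}}
    },
    "required": ["address", "zip"],
    "additionalProperties": False,
}
for warning in check_conventions(bad):
    print(warning)
```

For the bad schema, the helper reports the unknown required name zip and the two conventions missing from the nested address object.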
Supported Types
- string: Text data
- number: Numeric values (integers or decimals)
- boolean: True/false values
- object: Nested objects
- array: Lists of items
- enum: Predefined set of values
Advanced Examples
Nested Objects
Extract business information with address details:
{
"type": "object",
"properties": {
"business_name": {"type": "string"},
"phone": {"type": "string"},
"address": {
"type": "object",
"properties": {
"street": {"type": "string"},
"city": {"type": "string"},
"state": {"type": "string"},
"postal_code": {"type": "string"}
},
"required": ["street", "city"],
"additionalProperties": false
}
},
"required": ["business_name", "address"],
"additionalProperties": false
}
Arrays of Objects
Extract multiple products from a listing page:
{
"type": "object",
"properties": {
"products": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"},
"rating": {"type": "number"}
},
"required": ["name", "price"],
"additionalProperties": false
}
}
},
"required": ["products"],
"additionalProperties": false
}
Enum Constraints
Restrict values to predefined options:
{
"type": "object",
"properties": {
"product_name": {"type": "string"},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books", "home"]
},
"condition": {
"type": "string",
"enum": ["new", "used", "refurbished"]
}
},
"required": ["product_name", "category", "condition"],
"additionalProperties": false
}
Optional Fields
Use null union types for optional fields:
{
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": ["string", "null"]},
"phone": {"type": ["string", "null"]}
},
"required": ["name", "email", "phone"],
"additionalProperties": false
}
Even though all fields are in the required array, email and phone can be null if the information isn't available.
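One way to keep the null-union pattern consistent is to generate it with small helpers. This is a sketch under assumptions: nullable and object_schema are hypothetical names introduced for illustration, and the sample data values are made up to show how a client might drop fields that came back as null.

```python
from typing import Any


def nullable(json_type: str) -> dict[str, Any]:
    """Property schema that accepts either json_type or null."""
    return {"type": [json_type, "null"]}


def object_schema(properties: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Strict object schema that lists every property as required."""
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }


contact_schema = object_schema({
    "name": {"type": "string"},
    "email": nullable("string"),
    "phone": nullable("string"),
})

# Example structured_data shaped like the schema (made-up values).
data = {"name": "Ada", "email": None, "phone": "555-0100"}
# Keep only the fields the page actually contained.
present = {key: value for key, value in data.items() if value is not None}
print(present)
```

Here contact_schema is identical to the schema shown above, and present contains only name and phone.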
Schema Constraints
To ensure performance and reliability, structured outputs have these limitations:
- Maximum properties: 5,000 object properties total
- Nesting depth: Maximum 10 levels of nested objects
- Enum values: Maximum 1,000 enum values across all enum properties
- String length: Total string length of all property names, enum values, and const values cannot exceed 120,000 characters
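For large or generated schemas, the limits listed above can be estimated locally. The sketch below is a rough pre-check, not the service's own accounting: the helper names are hypothetical, and it assumes that the root object counts as nesting level 1 and that objects inside array items add a level.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Limits quoted in the list above.
MAX_PROPERTIES = 5_000
MAX_DEPTH = 10
MAX_ENUM_VALUES = 1_000
MAX_STRING_LENGTH = 120_000


@dataclass
class SchemaStats:
    properties: int = 0
    depth: int = 0
    enum_values: int = 0
    string_length: int = 0


def _is_object(schema: dict[str, Any]) -> bool:
    declared = schema.get("type")
    return declared == "object" or (isinstance(declared, list) and "object" in declared)


def measure(schema: dict[str, Any], stats: Optional[SchemaStats] = None, level: int = 0) -> SchemaStats:
    """Walk a schema and accumulate the quantities that the limits refer to."""
    if stats is None:
        stats = SchemaStats()
    if _is_object(schema):
        level += 1  # assumption: the root object is level 1
        stats.depth = max(stats.depth, level)
    for name, sub in schema.get("properties", {}).items():
        stats.properties += 1
        stats.string_length += len(name)
        measure(sub, stats, level)
    if isinstance(schema.get("items"), dict):
        measure(schema["items"], stats, level)
    for value in schema.get("enum", []):
        stats.enum_values += 1
        if isinstance(value, str):
            stats.string_length += len(value)
    if isinstance(schema.get("const"), str):
        stats.string_length += len(schema["const"])
    return stats


def limit_violations(stats: SchemaStats) -> list[str]:
    checks = [
        ("properties", stats.properties, MAX_PROPERTIES),
        ("nesting depth", stats.depth, MAX_DEPTH),
        ("enum values", stats.enum_values, MAX_ENUM_VALUES),
        ("string length", stats.string_length, MAX_STRING_LENGTH),
    ]
    return [f"{label}: {value} exceeds {limit}" for label, value, limit in checks if value > limit]


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "condition": {"type": "string", "enum": ["new", "used"]},
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}
stats = measure(schema)
print(stats, limit_violations(stats))
```

An empty list from limit_violations means the schema is within the limits as counted by this estimate.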
Error Handling
Invalid Schema
If your schema is invalid, you'll receive an error from the AI model:
{
"success": false,
"error_code": "invalid_schema",
"error_message": "Invalid response schema format"
}
No Prompt Provided
The response_schema parameter only works when a prompt is also provided. If you include a schema without a prompt, it will be ignored.
Prompt Without a Schema
If you send a prompt without response_schema, the API still returns structured_data, but uses JSON-object mode instead of strict schema validation.
LLM Refusal
In rare cases, the AI may refuse to process content for safety reasons. You'll receive a refusal message explaining why.
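The cases in this section can be handled in one place on the client. The following is a minimal sketch that works on an already-parsed response body: ScrapeError, structured_data_or_raise, and the local code missing_structured_data are hypothetical names, and only the success, error_code, error_message, and structured_data fields shown on this page are assumed. The shape of a refusal response is not documented here, so the sketch does not treat it specially.

```python
from typing import Any


class ScrapeError(Exception):
    """Raised when a scrape response does not carry usable structured data."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def structured_data_or_raise(body: dict[str, Any]) -> Any:
    """Return structured_data from a parsed response body or raise ScrapeError."""
    if not body.get("success"):
        raise ScrapeError(
            body.get("error_code", "unknown"),
            body.get("error_message", ""),
        )
    if "structured_data" not in body:
        # Local code, not an API error code: e.g. the request had no prompt.
        raise ScrapeError("missing_structured_data", "response has no structured_data field")
    return body["structured_data"]


ok_body = {"success": True, "status": "done", "structured_data": {"price": 29.99}}
error_body = {
    "success": False,
    "error_code": "invalid_schema",
    "error_message": "Invalid response schema format",
}

print(structured_data_or_raise(ok_body))
try:
    structured_data_or_raise(error_body)
except ScrapeError as exc:
    print(exc.code, exc.message)
```

When a prompt is sent without response_schema, the returned object is not checked against a schema, so callers may want to validate its fields themselves after this step.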
Pricing
Structured outputs cost the same as regular prompts: $0.002 per request that includes a prompt, in addition to the base crawling cost.
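As a quick arithmetic sketch, the prompt surcharge for a batch of requests can be estimated as follows. The function name prompt_surcharge is hypothetical, and the base crawling cost is left out because its amount is not stated here.

```python
from decimal import Decimal

# Per-request prompt price quoted above; base crawling cost is not included.
PROMPT_COST_PER_REQUEST = Decimal("0.002")


def prompt_surcharge(requests: int) -> Decimal:
    """Prompt cost in USD for a number of requests that include a prompt."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    return PROMPT_COST_PER_REQUEST * requests


print(prompt_surcharge(1_000))  # 2.000
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.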
SDK Support
JavaScript/TypeScript
import WebcrawlerAPI from 'webcrawlerapi';
const client = new WebcrawlerAPI({ apiKey: 'YOUR_API_KEY' });
const response = await client.scrapeUrl({
url: 'https://example.com/product',
prompt: 'Extract product details',
response_schema: {
type: 'object',
properties: {
name: { type: 'string' },
price: { type: 'number' },
in_stock: { type: 'boolean' }
},
required: ['name', 'price', 'in_stock'],
additionalProperties: false
}
});
console.log(response.structured_data);
Python
from webcrawlerapi import WebcrawlerAPI
client = WebcrawlerAPI(api_key='YOUR_API_KEY')
response = client.scrape_url(
url='https://example.com/product',
prompt='Extract product details',
response_schema={
'type': 'object',
'properties': {
'name': {'type': 'string'},
'price': {'type': 'number'},
'in_stock': {'type': 'boolean'}
},
'required': ['name', 'price', 'in_stock'],
'additionalProperties': False
}
)
print(response['structured_data'])
Best Practices
- Clear property names: Use descriptive, self-documenting property names
- Specific prompts: Combine schemas with clear, specific prompts for best results
- Start simple: Begin with basic schemas and add complexity as needed
- Test iteratively: Test your schemas with sample pages to refine the structure
- Handle nulls: Use null unions for optional data that may not always be present
Related Documentation
- Async Requests - Process multiple pages with structured output
- Crawling Types - Different output formats available
- Rate Limits - API usage limits and best practices