Define JSON schemas to structure AI responses when using prompts for data extraction

Structured Outputs with Prompts

Structured Outputs ensure that AI-generated responses adhere to a JSON schema you define. This removes the need to validate or retry malformed responses, which makes the feature well suited to extracting structured data from web pages.

Benefits

Reliable type-safety: No need to validate or retry incorrectly formatted responses
Consistent formatting: The AI output will always match your defined structure
Simpler implementation: Define your schema once and get predictable results every time

How It Works

When you provide a prompt, the /v2/scrape endpoint returns a JSON object in structured_data instead of markdown or HTML. Add an optional response_schema to enforce a strict JSON schema for the response. The schema follows the JSON Schema format used by OpenAI Structured Outputs.
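As a minimal sketch of the request body described above, the helper below assembles the JSON payload and includes response_schema only when one is supplied. The field names (url, prompt, response_schema) come from this page; the function name build_scrape_payload is hypothetical and not part of any SDK.

```python
import json
from typing import Any, Optional


def build_scrape_payload(
    url: str,
    prompt: str,
    response_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /v2/scrape (hypothetical helper)."""
    payload: dict[str, Any] = {"url": url, "prompt": prompt}
    # response_schema is optional, so the key is omitted when not provided.
    if response_schema is not None:
        payload["response_schema"] = response_schema
    return payload


payload = build_scrape_payload(
    "https://example.com/product/widget",
    "Extract product details from this page",
    {
        "type": "object",
        "properties": {"product_name": {"type": "string"}},
        "required": ["product_name"],
        "additionalProperties": False,
    },
)
# Serialize exactly as it would be sent in the request body.
print(json.dumps(payload, indent=2))
```

Sending the payload is left out on purpose; any HTTP client that can POST JSON with the Authorization header should work.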

Basic Example

Extract product information with a guaranteed structure:

curl --request POST \
  --url https://api.webcrawlerapi.com/v2/scrape \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com/product/widget",
    "prompt": "Extract product details from this page",
    "response_schema": {
      "type": "object",
      "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "description": {"type": "string"}
      },
      "required": ["product_name", "price", "in_stock"],
      "additionalProperties": false
    }
  }'

Response:

{
  "success": true,
  "status": "done",
  "page_status_code": 200,
  "page_title": "Premium Widget",
  "structured_data": {
    "product_name": "Premium Widget",
    "price": 29.99,
    "in_stock": true,
    "description": "A high-quality widget for all your needs"
  }
}
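One way to consume this response, sketched under the assumption that it has already been parsed into a Python dict, is to map structured_data onto a dataclass. The Product class is hypothetical; description gets a default because the schema above does not list it as required.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    """Mirror of the response_schema from the basic example (illustrative)."""
    product_name: str
    price: float
    in_stock: bool
    description: Optional[str] = None  # not in "required", so it may be absent


# The response shown above, as a parsed dict.
sample_response = {
    "success": True,
    "status": "done",
    "page_status_code": 200,
    "page_title": "Premium Widget",
    "structured_data": {
        "product_name": "Premium Widget",
        "price": 29.99,
        "in_stock": True,
        "description": "A high-quality widget for all your needs",
    },
}

# additionalProperties is false in the schema, so no unexpected keys
# should reach the constructor.
product = Product(**sample_response["structured_data"])
print(product.product_name, product.price, product.in_stock)
```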

Schema Format

Your response_schema must be a valid JSON Schema object. OpenAI structured outputs are strict, so we recommend following these conventions to avoid schema validation errors:

Recommended Fields

type: Use "object" at the root level
properties: Define the structure of your data
required: Include required property names for predictable output
additionalProperties: Set to false to keep the output strict
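The conventions above can be checked locally before sending a request. The following is a rough sketch, not an official validator: check_conventions is a hypothetical helper that only looks at the four recommended fields and walks into nested objects and array items.

```python
from typing import Any


def check_conventions(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of convention problems found in a schema (empty if none)."""
    problems: list[str] = []
    schema_type = schema.get("type")
    if path == "$" and schema_type != "object":
        problems.append('$: root "type" should be "object"')
    if schema_type == "object":
        properties = schema.get("properties")
        if not isinstance(properties, dict):
            problems.append(f'{path}: missing "properties"')
            properties = {}
        if "required" not in schema:
            problems.append(f'{path}: missing "required"')
        if schema.get("additionalProperties") is not False:
            problems.append(f'{path}: "additionalProperties" should be false')
        # Recurse into each property so nested objects are checked too.
        for name, sub in properties.items():
            problems.extend(check_conventions(sub, f"{path}.{name}"))
    elif schema_type == "array" and isinstance(schema.get("items"), dict):
        problems.extend(check_conventions(schema["items"], f"{path}[]"))
    return problems


# A deliberately loose schema: neither object sets "required" or
# "additionalProperties".
loose_schema = {
    "type": "object",
    "properties": {"a": {"type": "object", "properties": {}}},
}
for problem in check_conventions(loose_schema):
    print(problem)
```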

Supported Types

string - Text data
number - Numeric values (integers or decimals)
boolean - True/false values
object - Nested objects
array - Lists of items
enum - Predefined set of values (a keyword used alongside a type such as string, not a type of its own)

Advanced Examples

Nested Objects

Extract business information with address details:

{
  "type": "object",
  "properties": {
    "business_name": {"type": "string"},
    "phone": {"type": "string"},
    "address": {
      "type": "object",
      "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
        "postal_code": {"type": "string"}
      },
      "required": ["street", "city"],
      "additionalProperties": false
    }
  },
  "required": ["business_name", "address"],
  "additionalProperties": false
}
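When building nested schemas like this in code, a small helper can cut the repetition of required and additionalProperties at each level. This is only a sketch; strict_object and string are hypothetical names, not part of the API or SDK.

```python
from typing import Any


def string() -> dict[str, Any]:
    """Return a fresh string property definition."""
    return {"type": "string"}


def strict_object(properties: dict[str, Any], required: list[str]) -> dict[str, Any]:
    """Build an object schema with additionalProperties set to false."""
    unknown = set(required) - set(properties)
    if unknown:
        raise ValueError(f"required names not in properties: {sorted(unknown)}")
    return {
        "type": "object",
        "properties": properties,
        "required": list(required),
        "additionalProperties": False,
    }


# Rebuild the business schema shown above from the inside out.
address = strict_object(
    {"street": string(), "city": string(), "state": string(), "postal_code": string()},
    required=["street", "city"],
)
business = strict_object(
    {"business_name": string(), "phone": string(), "address": address},
    required=["business_name", "address"],
)
```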

Arrays of Objects

Extract multiple products from a listing page:

{
  "type": "object",
  "properties": {
    "products": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "price": {"type": "number"},
          "rating": {"type": "number"}
        },
        "required": ["name", "price"],
        "additionalProperties": false
      }
    }
  },
  "required": ["products"],
  "additionalProperties": false
}
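A result shaped by this schema could be processed as sketched below. The sample values are made up for illustration. Because rating is not listed in required, the code reads it with get rather than assuming it is present.

```python
from typing import Any

# Made-up structured_data matching the schema above.
structured_data = {
    "products": [
        {"name": "Widget", "price": 29.99, "rating": 4.5},
        {"name": "Gadget", "price": 9.5},  # no rating on the page
    ]
}


def summarize(products: list[dict[str, Any]]) -> list[str]:
    """Format one line per product, tolerating a missing rating."""
    lines = []
    for item in products:
        rating = item.get("rating")
        rating_text = f"{rating:.1f}" if rating is not None else "n/a"
        lines.append(f"{item['name']}: {item['price']:.2f} (rating {rating_text})")
    return lines


for line in summarize(structured_data["products"]):
    print(line)
```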

Enum Constraints

Restrict values to predefined options:

{
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "category": {
      "type": "string",
      "enum": ["electronics", "clothing", "books", "home"]
    },
    "condition": {
      "type": "string",
      "enum": ["new", "used", "refurbished"]
    }
  },
  "required": ["product_name", "category", "condition"],
  "additionalProperties": false
}
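If the allowed values already live in application code, the enum list can be derived from a Python Enum so the schema and the code cannot drift apart. This is a sketch; Condition and enum_property are hypothetical names.

```python
from enum import Enum
from typing import Any


class Condition(str, Enum):
    NEW = "new"
    USED = "used"
    REFURBISHED = "refurbished"


def enum_property(enum_cls: type[Enum]) -> dict[str, Any]:
    """Build a string property restricted to the values of an Enum class."""
    return {"type": "string", "enum": [member.value for member in enum_cls]}


condition_schema = enum_property(Condition)

# Converting a returned value back to the Enum raises ValueError
# if the value is not one of the declared members.
parsed = Condition("used")
print(condition_schema, parsed)
```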

Optional Fields

Use null union types for optional fields:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "email": {"type": ["string", "null"]},
    "phone": {"type": ["string", "null"]}
  },
  "required": ["name", "email", "phone"],
  "additionalProperties": false
}

Even though all fields are in the required array, email and phone can be null if the information isn't available.
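The pattern above (every property required, optional ones nullable) can be generated rather than written by hand. The helper below is a hypothetical sketch that takes a mapping of property names to JSON types plus the set of names that may be null.

```python
from typing import Any


def schema_with_optionals(types: dict[str, str], optional: set[str]) -> dict[str, Any]:
    """Build an object schema where optional properties become null unions."""
    properties = {
        name: {"type": [json_type, "null"]} if name in optional else {"type": json_type}
        for name, json_type in types.items()
    }
    return {
        "type": "object",
        "properties": properties,
        "required": list(types),  # every property is listed, optional or not
        "additionalProperties": False,
    }


contact_schema = schema_with_optionals(
    {"name": "string", "email": "string", "phone": "string"},
    optional={"email", "phone"},
)

# Made-up result: keys are always present, but optional values may be None.
contact = {"name": "Ada", "email": None, "phone": "555-0100"}
email = contact["email"] if contact["email"] is not None else "(not listed)"
print(email)
```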

Schema Constraints

To ensure performance and reliability, structured outputs have these limitations:

Maximum properties: 5,000 object properties total
Nesting depth: Maximum 10 levels of nested objects
Enum values: Maximum 1,000 enum values across all enum properties
String length: Total string length of all property names, enum values, and const values cannot exceed 120,000 characters
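A rough pre-flight check against these limits might look like the sketch below. The counting rules are an approximation and an assumption on our part (for example, each object with properties counts as one nesting level), so treat the result as an early warning rather than the service's exact accounting.

```python
from typing import Any

# Limits as listed above.
LIMITS = {"properties": 5000, "depth": 10, "enum_values": 1000, "string_chars": 120_000}


def measure(schema: dict[str, Any], depth: int = 0) -> dict[str, int]:
    """Collect approximate size statistics for a schema."""
    if "properties" in schema:
        depth += 1  # each object adds one nesting level
    stats = {"properties": 0, "depth": depth, "enum_values": 0, "string_chars": 0}

    enum_values = schema.get("enum", [])
    stats["enum_values"] += len(enum_values)
    stats["string_chars"] += sum(len(v) for v in enum_values if isinstance(v, str))
    if isinstance(schema.get("const"), str):
        stats["string_chars"] += len(schema["const"])

    properties = schema.get("properties", {})
    stats["properties"] += len(properties)
    stats["string_chars"] += sum(len(name) for name in properties)

    children = list(properties.values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    for child in children:
        child_stats = measure(child, depth)
        for key in ("properties", "enum_values", "string_chars"):
            stats[key] += child_stats[key]
        stats["depth"] = max(stats["depth"], child_stats["depth"])
    return stats


def over_limits(schema: dict[str, Any]) -> list[str]:
    """Return the names of any limits the schema appears to exceed."""
    stats = measure(schema)
    return [name for name, limit in LIMITS.items() if stats[name] > limit]
```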

Error Handling

Invalid Schema

If your schema is invalid, you'll receive an error from the AI model:

{
  "success": false,
  "error_code": "invalid_schema",
  "error_message": "Invalid response schema format"
}
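Client code can turn an error body like this into an exception. The sketch below assumes the response has been parsed into a dict and relies only on the success, error_code, and error_message fields shown above. ScrapeError and raise_for_error are hypothetical names.

```python
from typing import Any


class ScrapeError(Exception):
    """Raised when a scrape response reports success: false."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(response: dict[str, Any]) -> dict[str, Any]:
    """Return the response unchanged, or raise ScrapeError on failure."""
    if response.get("success") is False:
        raise ScrapeError(
            response.get("error_code", "unknown"),
            response.get("error_message", ""),
        )
    return response


error_response = {
    "success": False,
    "error_code": "invalid_schema",
    "error_message": "Invalid response schema format",
}

try:
    raise_for_error(error_response)
except ScrapeError as exc:
    # An invalid schema will not succeed on retry; fix the schema instead.
    print(f"scrape failed ({exc.code}): {exc.message}")
```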

SDK Support

JavaScript/TypeScript

import WebcrawlerAPI from 'webcrawlerapi';

const client = new WebcrawlerAPI({ apiKey: 'YOUR_API_KEY' });

const response = await client.scrapeUrl({
  url: 'https://example.com/product',
  prompt: 'Extract product details',
  response_schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      price: { type: 'number' },
      in_stock: { type: 'boolean' }
    },
    required: ['name', 'price', 'in_stock'],
    additionalProperties: false
  }
});

console.log(response.structured_data);

Python

from webcrawlerapi import WebcrawlerAPI

client = WebcrawlerAPI(api_key='YOUR_API_KEY')

response = client.scrape_url(
    url='https://example.com/product',
    prompt='Extract product details',
    response_schema={
        'type': 'object',
        'properties': {
            'name': {'type': 'string'},
            'price': {'type': 'number'},
            'in_stock': {'type': 'boolean'}
        },
        'required': ['name', 'price', 'in_stock'],
        'additionalProperties': False
    }
)

print(response['structured_data'])

Best Practices

Clear property names: Use descriptive, self-documenting property names
Specific prompts: Combine schemas with clear, specific prompts for best results
Start simple: Begin with basic schemas and add complexity as needed
Test iteratively: Test your schemas with sample pages to refine the structure
Handle nulls: Use null unions for optional data that may not always be present

