WebCrawlerAPI

POST /v1/agent

Run an AI agent that crawls URLs and extracts structured data

Run an AI agent that browses provided URLs, follows relevant links, and extracts data based on your prompt.

https://api.webcrawlerapi.com/v1/agent

Format: JSON
Method: POST

Request

  • prompt - (required) Natural-language instruction describing what data to extract or what task to perform.
  • max_spend_usd - (required) Maximum budget in USD the agent may spend on this run. Must be greater than 0.
  • urls - (optional) Array of seed URLs the agent starts crawling from.
  • seed_urls_only - (optional) When true, the agent processes only the provided seed URLs and does not follow links. Defaults to false.
  • output_schema - (optional) JSON Schema object describing the expected structure of the extracted data.
  • model - (optional) LLM to use. See Available Models below.
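As a client-side convenience, the request rules above can be checked before anything is sent. The sketch below is a minimal Python example, assuming Python 3.9+; `build_agent_payload` and `AVAILABLE_MODELS` are hypothetical names, not part of any official SDK, and rejecting unknown model names locally is an assumption that may be too strict if the service adds models later.

```python
from typing import Any, Optional

# Model identifiers copied from the Available Models list on this page (may go stale).
AVAILABLE_MODELS = {
    "google/gemini-3.1-flash-lite-preview",
    "google/gemini-3-flash-preview",
    "google/gemini-3.1-pro-preview",
    "openai/gpt-5.4-mini",
    "openai/gpt-5.4",
    "openai/gpt-5.5",
    "anthropic/claude-sonnet-4.6",
}


def build_agent_payload(
    prompt: str,
    max_spend_usd: float,
    urls: Optional[list[str]] = None,
    seed_urls_only: Optional[bool] = None,
    output_schema: Optional[dict[str, Any]] = None,
    model: Optional[str] = None,
) -> dict[str, Any]:
    """Build a POST /v1/agent body (hypothetical helper), checking documented constraints."""
    if not prompt:
        raise ValueError("prompt is required")
    # max_spend_usd is required and must be strictly greater than 0.
    if max_spend_usd <= 0:
        raise ValueError("max_spend_usd must be > 0")
    # Assumption: only models from the documented list are accepted.
    if model is not None and model not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model: {model}")

    payload: dict[str, Any] = {"prompt": prompt, "max_spend_usd": max_spend_usd}
    # Optional fields are sent only when set, so the server-side defaults apply otherwise.
    if urls is not None:
        payload["urls"] = list(urls)
    if seed_urls_only is not None:
        payload["seed_urls_only"] = seed_urls_only
    if output_schema is not None:
        payload["output_schema"] = output_schema
    if model is not None:
        payload["model"] = model
    return payload
```

Leaving unset optional fields out of the body, rather than sending nulls, keeps the documented defaults (for example seed_urls_only defaulting to false) in the server's hands.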

Example:

{
    "prompt": "Extract product names, prices, and descriptions",
    "urls": ["https://example.com/products"],
    "max_spend_usd": 0.5,
    "model": "google/gemini-3.1-flash-lite-preview",
    "output_schema": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": { "type": "string" },
                "price": { "type": "number" },
                "description": { "type": "string" }
            }
        }
    }
}

curl example

curl --request POST \
  --url https://api.webcrawlerapi.com/v1/agent \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Extract product names, prices, and descriptions",
    "urls": ["https://example.com/products"],
    "max_spend_usd": 0.5
  }'
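For clients that do not shell out to curl, the same request can be built with Python's standard library. This is a sketch assuming Python 3.9+; `make_agent_request` and `start_agent_run` are hypothetical helper names, and the 30-second timeout is an arbitrary choice rather than a documented limit.

```python
import json
import urllib.request

AGENT_URL = "https://api.webcrawlerapi.com/v1/agent"


def make_agent_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        AGENT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_agent_run(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded AgentRun JSON object."""
    req = make_agent_request(api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `start_agent_run("<YOUR_API_KEY>", {"prompt": "Extract product names, prices, and descriptions", "urls": ["https://example.com/products"], "max_spend_usd": 0.5})` should return an AgentRun object whose status is queued. Keeping request construction separate from sending lets the request be inspected without network access.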

Available Models

  • google/gemini-3.1-flash-lite-preview
  • google/gemini-3-flash-preview
  • google/gemini-3.1-pro-preview
  • openai/gpt-5.4-mini
  • openai/gpt-5.4
  • openai/gpt-5.5
  • anthropic/claude-sonnet-4.6

Response

Returns an AgentRun object for the queued run.

{
    "id": "ar_abc123",
    "status": "queued",
    "prompt": "Extract product names, prices, and descriptions",
    "model": "google/gemini-3.1-flash-lite-preview",
    "urls": ["https://example.com/products"],
    "max_spend_usd": 0.5,
    "balance_used_usd": 0.0,
    "success": false,
    "created_at": "2025-01-01T00:00:00Z",
    "updated_at": "2025-01-01T00:00:00Z"
}

Response fields

  • id - unique identifier of the agent run
  • status - one of queued, processing, done, or failed
  • prompt - the prompt from the request
  • model - the LLM used for the run
  • urls - the seed URLs provided in the request
  • max_spend_usd - spending cap set for this run
  • balance_used_usd - amount actually spent
  • data - extracted result data (present when status is done)
  • success - true when the run completed with non-empty data
  • error - error message if the run failed
  • error_reason - machine-readable error reason
  • trace - agent reasoning trace (if available)
  • llm_requests - list of the individual LLM calls made during the run
  • created_at - creation time, as an ISO 8601 timestamp
  • updated_at - last update time, as an ISO 8601 timestamp
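The response fields can be mapped onto a typed object on the client. This is a minimal Python sketch, assuming the field names listed above and Python 3.9+; the `AgentRun` dataclass, `parse_agent_run`, and the `finished` and `remaining_budget_usd` helpers are hypothetical, and trace and llm_requests are deliberately left out because their shapes are not documented here.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentRun:
    id: str
    status: str  # queued, processing, done, or failed
    prompt: str
    model: str
    max_spend_usd: float
    balance_used_usd: float
    success: bool
    created_at: str  # ISO 8601, kept as a string here
    updated_at: str
    urls: list[str] = field(default_factory=list)
    data: Any = None  # present when status is done
    error: Optional[str] = None
    error_reason: Optional[str] = None

    @property
    def finished(self) -> bool:
        """True once the run has reached a terminal status."""
        return self.status in ("done", "failed")

    @property
    def remaining_budget_usd(self) -> float:
        """Spending cap minus the amount already used."""
        return self.max_spend_usd - self.balance_used_usd


def parse_agent_run(raw: str) -> AgentRun:
    """Parse an AgentRun JSON document; fields absent from the response get defaults."""
    obj = json.loads(raw)
    return AgentRun(
        id=obj["id"],
        status=obj["status"],
        prompt=obj["prompt"],
        model=obj["model"],
        max_spend_usd=float(obj["max_spend_usd"]),
        balance_used_usd=float(obj["balance_used_usd"]),
        success=bool(obj["success"]),
        created_at=obj["created_at"],
        updated_at=obj["updated_at"],
        urls=obj.get("urls") or [],
        data=obj.get("data"),
        error=obj.get("error"),
        error_reason=obj.get("error_reason"),
    )
```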

Agent runs are asynchronous: POST /v1/agent returns the run while it is still queued. Poll GET /v1/agent/job/{id} until status is done or failed.
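A polling loop for the job endpoint might look like the sketch below. It assumes GET /v1/agent/job/{id} accepts the same Bearer token as POST /v1/agent and returns the same AgentRun fields; `fetch_agent_run`, `wait_for_agent_run`, the 5-second interval, and the 600-second limit are hypothetical choices, not documented values. The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

JOB_URL = "https://api.webcrawlerapi.com/v1/agent/job/{id}"
TERMINAL_STATUSES = {"done", "failed"}


def fetch_agent_run(api_key: str, run_id: str, timeout: float = 30.0) -> dict:
    """GET the run by id; assumes the same Bearer auth as POST /v1/agent."""
    req = urllib.request.Request(
        JOB_URL.format(id=run_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_agent_run(
    fetch: Callable[[], dict],
    interval_s: float = 5.0,
    max_wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is done or failed; raise TimeoutError after max_wait_s."""
    waited = 0.0
    while True:
        run = fetch()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if waited >= max_wait_s:
            raise TimeoutError(f"agent run still {run.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Usage would be along the lines of `wait_for_agent_run(lambda: fetch_agent_run("<YOUR_API_KEY>", "ar_abc123"))`. A fixed interval keeps the sketch simple; a client polling many runs might prefer exponential backoff instead.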

Error Responses

  • 400 Bad Request - Missing or invalid parameters (e.g. prompt is missing, or max_spend_usd is missing or not greater than 0)
  • 401 Unauthorized - Invalid or missing API key
  • 402 Payment Required - Insufficient account balance
  • 500 Internal Server Error - Server-side error
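One way to surface these status codes in a client is to map each to its own exception type. This Python sketch is an assumption throughout: the exception classes, `error_for_status`, and `send_json` are hypothetical; the error body format is not documented here, so it is passed through as plain text; and treating only 5xx responses as retryable is a client-side policy choice, not documented behavior.

```python
import json
import urllib.error
import urllib.request


class AgentApiError(Exception):
    """Base error for non-2xx responses (hypothetical client-side type)."""

    retryable = False

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


class BadRequestError(AgentApiError): ...          # 400: fix the parameters
class AuthError(AgentApiError): ...                # 401: check the API key
class InsufficientBalanceError(AgentApiError): ... # 402: top up the account


class ServerError(AgentApiError):
    retryable = True  # assumption: server-side errors may succeed on retry


_ERRORS = {400: BadRequestError, 401: AuthError, 402: InsufficientBalanceError}


def error_for_status(status: int, body: str = "") -> AgentApiError:
    """Map an HTTP status code to the matching exception instance."""
    if status >= 500:
        return ServerError(status, body or "server-side error")
    cls = _ERRORS.get(status, AgentApiError)
    return cls(status, body or "request failed")


def send_json(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request; raise a typed AgentApiError on HTTP error responses."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        body = exc.read().decode("utf-8", errors="replace")
        raise error_for_status(exc.code, body) from exc
```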