POST /scrape

Endpoint to scrape a single webpage.

https://api.webcrawlerapi.com/v2/scrape

Format: JSON
Method: POST

Request

Available request params:

url - (required) The URL of the webpage to scrape.
prompt - (optional) A prompt to run on the scraped content, for example to extract specific information or to format the output. Each prompt costs an extra $0.002.
output_format - (optional) The format of the output. Can be markdown, cleaned or html. Default is markdown.
clean_selectors - (optional) CSS selectors to clean from the output. Read more about advanced cleaning in clean selectors.
respect_robots_txt - (optional) If set to true, the scraper respects the website's robots.txt file and returns an error if the URL is disallowed. Default is false.

Example:

{
    "url": "https://www.example.com",
    "output_format": "markdown",
    "clean_selectors": ".advertisement,.footer",
    "respect_robots_txt": true
}
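
The request above could be sent from Python roughly as follows. This is a minimal sketch using only the standard library. The helper names build_scrape_request and scrape are hypothetical, and the Authorization: Bearer header is an assumption: this page only says that an API key is required, so check the authentication docs for the exact header.

```python
import json
import urllib.request

SCRAPE_URL = "https://api.webcrawlerapi.com/v2/scrape"


def build_scrape_request(api_key, url, output_format="markdown",
                         clean_selectors=None, respect_robots_txt=False):
    """Build (but do not send) a POST request for the scrape endpoint."""
    payload = {
        "url": url,
        "output_format": output_format,
        "respect_robots_txt": respect_robots_txt,
    }
    # clean_selectors is optional, so only include it when provided.
    if clean_selectors:
        payload["clean_selectors"] = clean_selectors
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token authentication (not stated on this page).
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def scrape(api_key, url, **options):
    """Send the request and return the decoded JSON response body."""
    request = build_scrape_request(api_key, url, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling scrape("YOUR_API_KEY", "https://www.example.com", clean_selectors=".advertisement,.footer", respect_robots_txt=True) would then send the same body as the example. Building the request separately from sending it keeps the payload easy to inspect without making a network call.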

Response

The response will contain a status and the output in the requested format.

{
    "status": "done",
    "markdown": "## Example Product\n\nThis is an example product page. It has a title, a price, and a description.",
    "page_status_code": 200,
    "page_title": "Example Product"
}
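
One possible way to read such a response in Python is sketched below. ScrapeResult and parse_scrape_result are hypothetical names. The sketch also assumes that the output field is named after the requested output_format; the example above only confirms this for markdown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    status: str
    content: Optional[str]
    page_status_code: Optional[int]
    page_title: Optional[str]


def parse_scrape_result(body: dict, output_format: str = "markdown") -> ScrapeResult:
    """Pick the documented fields out of a decoded response body."""
    return ScrapeResult(
        status=body.get("status", ""),
        # Assumption: the output lives under a key equal to output_format.
        # Only the "markdown" key appears in the documented example.
        content=body.get(output_format),
        page_status_code=body.get("page_status_code"),
        page_title=body.get("page_title"),
    )
```

Note that page_status_code is the status returned by the scraped page, which is separate from the HTTP status of the API call itself.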

Scrape errors

If the scrape itself fails, the response still has HTTP status code 200, but success is set to false, and error_code and error_message describe the failure.

For example:

{
    "success": false,
    "error_code": "name_not_resolved",
    "error_message": "Unable to resolve domain name"
}
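
Because a failed scrape still comes back with HTTP 200, the client has to inspect the body rather than rely on the status code. A minimal sketch of that check, where ScrapeError and raise_for_scrape_error are hypothetical names:

```python
class ScrapeError(Exception):
    """Raised when the response body reports a failed scrape."""

    def __init__(self, error_code, error_message):
        super().__init__(f"{error_code}: {error_message}")
        self.error_code = error_code
        self.error_message = error_message


def raise_for_scrape_error(body: dict) -> dict:
    """Return the body unchanged, or raise ScrapeError if success is false."""
    # Only an explicit false counts as a failure: the successful example
    # in this document carries "status" and no "success" field at all.
    if body.get("success") is False:
        raise ScrapeError(
            body.get("error_code", "unknown"),
            body.get("error_message", ""),
        )
    return body
```

Running every decoded response through raise_for_scrape_error before using it keeps failures from being mistaken for empty output.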

Read more about error codes in the Error section.

Error Responses

400 Bad Request - Invalid parameters or missing required fields
401 Unauthorized - Invalid or missing API key
402 Payment Required - Insufficient account balance
500 Internal Server Error - Server-side error
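
These HTTP-level errors can be handled separately from scrape errors. The sketch below maps the status codes listed above to a suggested client action. The describe_http_error helper is a hypothetical name, and the retry hint for 500 is a common convention rather than something this page specifies.

```python
# Status codes and meanings as listed in this section.
HTTP_ERRORS = {
    400: "Bad Request: invalid parameters or missing required fields",
    401: "Unauthorized: invalid or missing API key",
    402: "Payment Required: insufficient account balance",
    500: "Internal Server Error: server-side error",
}

# Assumption: only server-side errors are worth retrying automatically.
RETRYABLE = {500}


def describe_http_error(status_code: int) -> tuple:
    """Return (description, should_retry) for an HTTP status code."""
    description = HTTP_ERRORS.get(status_code, f"Unexpected status {status_code}")
    return description, status_code in RETRYABLE
```

With urllib, the status code is available as the code attribute of urllib.error.HTTPError, so describe_http_error(exc.code) can be called from the except block.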

Refer to Async Requests for more information about handling asynchronous scraping jobs.
