POST /scrape

Endpoint to scrape a single webpage.

https://api.webcrawlerapi.com/v2/scrape

Format: JSON
Method: POST

Request

Available request params:

  • url - (required) The URL of the webpage to scrape.
  • prompt - (optional) A prompt to run on the scraped content. This can be used to extract specific information or to format the output (costs an extra $0.002 per prompt).
  • output_format - (optional) The format of the output. Can be markdown, cleaned or html. Default is markdown.
  • main_content_only - (optional) Extract only the main content of an article or blog post. When set to true, the scraper extracts the primary article content and filters out navigation, sidebars, ads, and other non-essential elements. Default is false.
  • clean_selectors - (optional) CSS selectors to clean from the output. Read more about advanced cleaning in clean selectors.
  • respect_robots_txt - (optional) If set to true, the scraper respects the website's robots.txt file and returns an error if the URL is disallowed. Default is false.

Example:

{
    "url": "https://www.example.com",
    "output_format": "markdown",
    "main_content_only": true,
    "clean_selectors": ".advertisement,.footer",
    "respect_robots_txt": true
}
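
As a rough illustration of how the parameters above fit together, the following Python sketch builds the POST request with the standard library only. The function name build_scrape_request is hypothetical, and passing the API key as a Bearer token in the Authorization header is an assumption, since this page does not describe authentication.

```python
import json
import urllib.request

SCRAPE_URL = "https://api.webcrawlerapi.com/v2/scrape"
ALLOWED_FORMATS = {"markdown", "cleaned", "html"}


def build_scrape_request(api_key, url, output_format="markdown",
                         main_content_only=False, clean_selectors=None,
                         respect_robots_txt=False, prompt=None):
    """Build (but do not send) a POST request for the scrape endpoint."""
    if not url:
        raise ValueError("url is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")

    payload = {
        "url": url,
        "output_format": output_format,
        "main_content_only": main_content_only,
        "respect_robots_txt": respect_robots_txt,
    }
    # Optional fields are only included when the caller provides them.
    if clean_selectors is not None:
        payload["clean_selectors"] = clean_selectors
    if prompt is not None:
        payload["prompt"] = prompt

    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The returned object can be sent with urllib.request.urlopen(request); building it separately keeps the payload easy to inspect before any network call is made.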

Response

The response will contain a status and the output in the requested format.

{
    "status": "done",
    "markdown": "## Example Product\n\nThis is an example product page. It has a title, a price, and a description.",
    "page_status_code": 200,
    "page_title": "Example Product"
}
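
A minimal sketch of reading such a response in Python might look like the following. The helper name extract_output is hypothetical, and it assumes the output is stored under a key named after the requested output_format (the example above only confirms this for markdown).

```python
def extract_output(body, output_format="markdown"):
    """Return the scraped content from a decoded JSON body, or None."""
    # Only a finished scrape is expected to carry output.
    if body.get("status") != "done":
        return None
    # Assumption: the output key matches the requested output_format.
    return body.get(output_format)
```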

Scrape errors

If the scrape fails, the response still has HTTP status code 200, but success is false and error_code and error_message are set.

For example:

{
    "success": false,
    "error_code": "name_not_resolved",
    "error_message": "Unable to resolve domain name"
}

Read more about error codes in the Error section.
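
Because a failed scrape still arrives with a 200 status code, the client has to inspect the body rather than rely on the HTTP status. One way to sketch that check in Python is shown below; ScrapeError and raise_for_scrape_error are hypothetical names, not part of the API.

```python
class ScrapeError(Exception):
    """Raised when the response body reports a failed scrape."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_scrape_error(body):
    """Raise ScrapeError if the decoded body has success set to false."""
    # An explicit `is False` check avoids treating a missing key as failure.
    if body.get("success") is False:
        raise ScrapeError(body.get("error_code"), body.get("error_message"))
    return body
```

Calling this on every decoded body before reading the output keeps scrape-level failures separate from transport-level ones.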

Error Responses

  • 400 Bad Request - Invalid parameters or missing required fields
  • 401 Unauthorized - Invalid or missing API key
  • 402 Payment Required - Insufficient account balance
  • 500 Internal Server Error - Server-side error
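
The status codes above can be handled separately from scrape errors. The sketch below uses Python's standard urllib; the names describe_http_error and send are hypothetical, and the messages simply mirror the list above.

```python
import json
import urllib.error
import urllib.request

HTTP_ERRORS = {
    400: "Invalid parameters or missing required fields",
    401: "Invalid or missing API key",
    402: "Insufficient account balance",
    500: "Server-side error",
}


def describe_http_error(status_code):
    """Map a documented HTTP status code to a readable message."""
    return HTTP_ERRORS.get(status_code, f"Unexpected HTTP status {status_code}")


def send(request):
    """Send a prepared request and return the decoded JSON body."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # urllib raises HTTPError for 4xx and 5xx responses.
        raise RuntimeError(describe_http_error(exc.code)) from exc
```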

Refer to Async Requests for more information about handling asynchronous scraping jobs.