API
POST /scrape
Endpoint to scrape a single webpage.
https://api.webcrawlerapi.com/v2/scrape
Format: JSON
Method: POST
Request
Available request params:
url - (required) The URL of the webpage to scrape.
prompt - (optional) A prompt to run on the scraped content. This can be used to extract specific information or to format the output (extra $0.002 per prompt).
output_format - (optional) The format of the output. Can be markdown, cleaned, or html. Default is markdown.
clean_selectors - (optional) CSS selectors to clean from the output. Read more about advanced cleaning in clean selectors.
respect_robots_txt - (optional) If set to true, the scraper will respect the website's robots.txt file and return an error if the URL is disallowed. Default is false.
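As a minimal sketch, the parameters listed above could be assembled into a request body like this. The helper name build_scrape_payload and its validation rules are illustrative assumptions, not part of the API; only the field names, allowed formats, and defaults come from the list above.

```python
import json

# Output formats named in the parameter list above.
ALLOWED_FORMATS = {"markdown", "cleaned", "html"}


def build_scrape_payload(url, prompt=None, output_format="markdown",
                         clean_selectors=None, respect_robots_txt=False):
    """Assemble the JSON body for POST /scrape (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {
        "url": url,
        "output_format": output_format,
        "respect_robots_txt": respect_robots_txt,
    }
    # Optional fields are only included when the caller supplies them.
    if prompt is not None:
        payload["prompt"] = prompt
    if clean_selectors is not None:
        payload["clean_selectors"] = clean_selectors
    return payload


# Serialize the payload to the JSON string sent as the request body.
body = json.dumps(build_scrape_payload(
    "https://www.example.com",
    clean_selectors=".advertisement,.footer",
    respect_robots_txt=True,
))
```

Validating output_format on the client side avoids spending a round trip on a request that would presumably be rejected with 400 Bad Request.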
Example:
{
"url": "https://www.example.com",
"output_format": "markdown",
"clean_selectors": ".advertisement,.footer",
"respect_robots_txt": true
}
Response
The response will contain a status and the output in the requested format.
{
"status": "done",
"markdown": "## Example Product\n\nThis is an example product page. It has a title, a price, and a description.",
"page_status_code": 200,
"page_title": "Example Product"
}
Scrape errors
If the scrape fails, the response still has a 200 status code, but success is set to false and error_code and error_message describe the failure.
For example:
{
"success": false,
"error_code": "name_not_resolved",
"error_message": "Unable to resolve domain name"
}
Read more about error codes in the Error section.
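Because a failed scrape still arrives with HTTP 200, a client has to inspect the body rather than rely on the status code. The sketch below shows one way to do that. ScrapeError and extract_output are hypothetical names, and it assumes the output is stored under a key named after the requested format, which the example response confirms only for markdown.

```python
import json


class ScrapeError(Exception):
    """Raised when the response body reports a failed scrape (hypothetical type)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def extract_output(body, output_format="markdown"):
    """Return the scraped content, or raise ScrapeError if the scrape failed."""
    data = json.loads(body)
    # A failed scrape is flagged in the body, not by the HTTP status.
    if data.get("success") is False:
        raise ScrapeError(data.get("error_code"), data.get("error_message"))
    # Assumption: the content key matches the requested output format.
    return data.get(output_format)
```

A caller can then catch ScrapeError and branch on its code attribute, for example to skip URLs whose domain cannot be resolved.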
Error Responses
400 Bad Request - Invalid parameters or missing required fields
401 Unauthorized - Invalid or missing API key
402 Payment Required - Insufficient account balance
500 Internal Server Error - Server-side error
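The status codes above can be mapped to readable messages when sending the request. This is a rough sketch using only the Python standard library. The function names are illustrative, and the Authorization: Bearer header is an assumption, since this page does not say how the API key is passed; check the authentication documentation before relying on it.

```python
import json
import urllib.error
import urllib.request

SCRAPE_URL = "https://api.webcrawlerapi.com/v2/scrape"

# Messages taken from the error list above.
HTTP_ERRORS = {
    400: "Invalid parameters or missing required fields",
    401: "Invalid or missing API key",
    402: "Insufficient account balance",
    500: "Server-side error",
}


def describe_http_error(status):
    """Translate an HTTP status code into the documented description."""
    return HTTP_ERRORS.get(status, f"Unexpected HTTP status {status}")


def make_request(api_key, payload):
    """Build (but do not send) the POST request for the scrape endpoint."""
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication, not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def scrape(api_key, payload, timeout=60):
    """Send the request and return the decoded JSON body."""
    try:
        with urllib.request.urlopen(make_request(api_key, payload), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Non-2xx responses surface here; map them to the documented meaning.
        raise RuntimeError(describe_http_error(exc.code)) from exc
```

Note that a 200 response can still describe a failed scrape, so the returned body should be checked for success set to false before the output is used.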
Refer to Async Requests for more information about handling asynchronous scraping jobs.