API
POST /scrape
Endpoint to scrape a single webpage.
https://api.webcrawlerapi.com/v2/scrape
Format: JSON
Method: POST
Request
Available request parameters:
url - (required) The URL of the webpage to scrape.
prompt - (optional) A prompt to run on the scraped content. It can be used to extract specific information or to format the output (extra $0.002 per prompt).
output_format - (optional) The format of the output. Can be markdown, cleaned, or html. Default is markdown.
main_content_only - (optional) Extract only the main content of an article or blog post. When set to true, the scraper focuses on the primary article content while filtering out navigation, sidebars, ads, and other non-essential elements. Default is false.
clean_selectors - (optional) CSS selectors to clean from the output. Read more about advanced cleaning in Clean Selectors.
respect_robots_txt - (optional) If set to true, the scraper respects the website's robots.txt file and returns an error if the URL is disallowed. Default is false.
Example:
{
  "url": "https://www.example.com",
  "output_format": "markdown",
  "main_content_only": true,
  "clean_selectors": ".advertisement,.footer",
  "respect_robots_txt": true
}
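For illustration, here is a minimal Python sketch of this request using the requests library. The Bearer authorization header and the placeholder API key are assumptions for the example; check the authentication docs for the actual scheme.

import requests

API_KEY = "your-api-key"  # placeholder; use your own key
ENDPOINT = "https://api.webcrawlerapi.com/v2/scrape"

payload = {
    "url": "https://www.example.com",
    "output_format": "markdown",
    "main_content_only": True,
    "clean_selectors": ".advertisement,.footer",
    "respect_robots_txt": True,
}

# The Authorization header below is an assumption for this sketch;
# the service's actual authentication scheme may differ.
response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)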
Response
The response will contain a status and the output in the requested format.
{
  "status": "done",
  "markdown": "## Example Product\n\nThis is an example product page. It has a title, a price, and a description.",
  "page_status_code": 200,
  "page_title": "Example Product"
}
Scrape errors
If the scrape fails, the response still has a 200 status code, but success is false and the error_code and error_message fields are set.
For example:
{
  "success": false,
  "error_code": "name_not_resolved",
  "error_message": "Unable to resolve domain name"
}
Read more about error codes in the Error section.
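Continuing the Python sketch above, here is one hedged way to branch on the two outcomes. Field names are taken from the example responses shown; since success appears only in the error response here, this sketch treats its absence as a successful scrape.

data = response.json()

if data.get("success") is False:
    # Scrape-level failure: HTTP status is 200, but the scrape itself failed.
    print(f"Scrape failed: {data['error_code']}: {data['error_message']}")
else:
    # Successful scrape: fields match the example response above.
    print(data["page_title"])        # "Example Product"
    print(data["page_status_code"])  # 200
    print(data["markdown"])          # content in the requested output format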
Error Responses
400 Bad Request - Invalid parameters or missing required fields
401 Unauthorized - Invalid or missing API key
402 Payment Required - Insufficient account balance
500 Internal Server Error - Server-side error
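These HTTP-level errors are distinct from scrape errors, which arrive with a 200 status. A minimal sketch of checking for them, continuing the example above; the body shape of these error responses is not documented here, so only the status code is inspected.

# Map the documented HTTP error codes to a short hint.
HTTP_ERRORS = {
    400: "Invalid parameters or missing required fields",
    401: "Invalid or missing API key",
    402: "Insufficient account balance",
    500: "Server-side error",
}

if response.status_code != 200:
    hint = HTTP_ERRORS.get(response.status_code, "Unexpected status")
    raise RuntimeError(f"HTTP {response.status_code}: {hint}")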
Refer to Async Requests for more information about handling asynchronous scraping jobs.