POST /crawl
Basic API endpoint to start crawling a website.
https://api.webcrawlerapi.com/v1/crawl
Format: JSON
Method: POST
Request
Available request params
url - (required) The seed URL where the crawler starts. Can be any valid URL.
scrape_type - (default: html) The type of scraping you want to perform. Can be html, cleaned, or markdown.
items_limit - (default: 10) The crawler stops when it reaches this limit of pages for this job.
webhook_url - (optional) The URL where the server will send a POST request once the task is completed (read more about webhooks and async requests).
allow_subdomains - (default: false) If true, the crawler will also crawl subdomains (for example, blog.example.com if the seed URL is example.com).
whitelist_regexp - (optional) A regular expression to whitelist URLs. Only URLs that match the pattern will be crawled.
blacklist_regexp - (optional) A regular expression to blacklist URLs. URLs that match the pattern will be skipped.
Example:
{
"url": "https://stripe.com/",
"webhook_url": "https://yourserver.com/webhook",
"items_limit": 10,
"scrape_type": "cleaned",
"allow_subdomains": false
}
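The request above can be sketched in Python using only the standard library. The authentication header is an assumption (check your account settings for the exact scheme); the endpoint URL and body fields come from this page.

```python
import json
import urllib.request

API_URL = "https://api.webcrawlerapi.com/v1/crawl"

def build_crawl_request(api_key: str, **params) -> urllib.request.Request:
    """Build the POST /crawl request. The Bearer auth header is an assumption."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )

req = build_crawl_request(
    "YOUR_API_KEY",
    url="https://stripe.com/",
    scrape_type="cleaned",
    items_limit=10,
    allow_subdomains=False,
)
# urllib.request.urlopen(req) would send the request and return the JSON response
```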
Response
Example:
{
"id": "23b81e21-c672-4402-a886-303f18de9555"
}
The crawl request is processed asynchronously: the response contains a task id, which you can use to check the status of the scraping task (read more about Async Requests).
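If you do not use a webhook, you can poll for the result instead. A minimal polling sketch follows; the status endpoint path, the `status` field, and its terminal values are assumptions here, so check the Async Requests docs for the exact details. The wait loop takes a `fetch` callable so the HTTP layer can be swapped out.

```python
import json
import time
import urllib.request

# Assumed status endpoint path; see the Async Requests docs for the real one.
STATUS_URL = "https://api.webcrawlerapi.com/v1/job/{id}"

def fetch_job(job_id: str, api_key: str) -> dict:
    """Fetch the current job state once (Bearer auth is an assumption)."""
    req = urllib.request.Request(
        STATUS_URL.format(id=job_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(job_id: str, fetch, interval: float = 5.0,
                 max_attempts: int = 60) -> dict:
    """Poll fetch(job_id) until the job reaches a terminal status.

    The status values "done" and "error" are assumptions.
    """
    for _ in range(max_attempts):
        job = fetch(job_id)
        if job.get("status") in ("done", "error"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_attempts} polls")
```

Usage: `wait_for_job(task_id, lambda jid: fetch_job(jid, api_key))` blocks until the crawl finishes or the attempt budget runs out.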