API
POST /crawl
Basic API endpoint to start crawling a website.
https://api.webcrawlerapi.com/v1/crawl
Format: JSON
Method: POST
Request
Available request params
url - (required) the seed URL where the crawler starts. Can be any valid URL.
scrape_type - (default: markdown) the type of scraping you want to perform. Can be html, cleaned, markdown.
items_limit - (required) the crawler stops when it reaches this limit of pages for this job.
webhook_url - (optional) the URL where the server will send a POST request once the task is completed (read more about webhooks and async requests).
main_content_only - (optional) extract only the main content of an article or blog post. When set to true, the scraper focuses on the primary article content while filtering out navigation, sidebars, ads, and other non-essential elements. Default is false.
allow_subdomains - (default: false) if true, the crawler will also crawl subdomains (for example, blog.example.com if the seed URL is example.com).
whitelist_regexp - (optional) a regular expression to whitelist URLs. Only URLs that match the pattern will be crawled.
blacklist_regexp - (optional) a regular expression to blacklist URLs. URLs that match the pattern will be skipped.
respect_robots_txt - (optional) if set to true, the crawler respects the website's robots.txt file and skips pages that are disallowed by it. Default is false.
max_depth - (optional) maximum depth of crawling from the starting URL. A value of 0 means only the starting page, 1 means the starting page plus pages directly linked from it, 2 adds one more level of depth, and so on. By default, there is no depth limit.
Example:
{
"url": "https://stripe.com/",
"webhook_url": "https://yourserver.com/webhook",
"items_limit": 10,
"scrape_type": "cleaned",
"main_content_only": true,
"allow_subdomains": false,
"respect_robots_txt": true,
"max_depth": 2
}
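A request like the one above can be sent with any HTTP client. The sketch below uses Python's requests library; the Authorization header and the API key placeholder are assumptions for illustration and are not specified on this page, so check your dashboard for the exact authentication scheme.

import requests

API_KEY = "YOUR_API_KEY"  # assumption: replace with your WebCrawlerAPI key

payload = {
    "url": "https://stripe.com/",
    "webhook_url": "https://yourserver.com/webhook",
    "items_limit": 10,
    "scrape_type": "cleaned",
    "main_content_only": True,
    "allow_subdomains": False,
    "respect_robots_txt": True,
    "max_depth": 2,
}

response = requests.post(
    "https://api.webcrawlerapi.com/v1/crawl",
    json=payload,
    # Assumption: bearer-token auth; this page does not describe the auth scheme.
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
task_id = response.json()["id"]
print("Crawl task started:", task_id)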
Response
Example:
{
"id": "23b81e21-c672-4402-a886-303f18de9555"
}
The crawling request is handled asynchronously: you will receive a response with a task id. You can use this task id to check the status of the scraping task (read more about Async Requests).
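A minimal polling sketch is shown below. The status endpoint path and the response fields used here are assumptions for illustration; the actual contract is described in the Async Requests documentation.

import time
import requests

API_KEY = "YOUR_API_KEY"  # assumption: your WebCrawlerAPI key
task_id = "23b81e21-c672-4402-a886-303f18de9555"  # id returned by POST /crawl

# Assumption: a GET endpoint that returns the job status by id; the exact
# path and response shape are documented under Async Requests.
status_url = f"https://api.webcrawlerapi.com/v1/job/{task_id}"

while True:
    resp = requests.get(
        status_url,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    job = resp.json()
    # Assumption: the job object exposes a "status" field such as
    # "in_progress" / "done" / "error".
    if job.get("status") not in ("new", "in_progress"):
        break
    time.sleep(5)  # poll every few seconds instead of hammering the API

print("Job finished with status:", job.get("status"))

If you provide a webhook_url in the request, you can rely on the webhook notification instead of polling.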