
Getting started with Webcrawler API

Webcrawler API helps you extract data from websites, including websites that do not provide an API of their own. Read more about it here: Webcrawler API

Prerequisites

To use Webcrawler API, you first need to obtain an API key:

  1. Register on Webcrawler API Dashboard
  2. Navigate to the API key section
  3. Copy your API key
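Rather than pasting the key into every command, you may prefer to keep it in an environment variable. The following is a minimal Python sketch using only the standard library. It assumes a variable named `WEBCRAWLER_API_KEY`, which is a hypothetical name chosen for illustration and not something the API requires. It builds the same `Authorization: Bearer` header that the curl commands in this guide send.

```python
import os

# Hypothetical variable name; the API does not prescribe one.
API_KEY_ENV_VAR = "WEBCRAWLER_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


def auth_header(api_key: str) -> dict:
    """Build the Bearer authorization header used by the curl examples."""
    return {"Authorization": f"Bearer {api_key}"}
```

Keeping the key out of source files also keeps it out of version control.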

First request

To make your first request, use the following curl command:

curl --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer <PASTE YOUR API KEY HERE>' \
  --header 'Content-Type: application/json' \
  --data '{
    "items_limit": 5,
    "url": "https://stripe.com/",
    "clean_content": true
  }'

This command starts a new crawl job that extracts data from the Stripe website. The items_limit parameter sets the maximum number of items to extract, and the clean_content parameter specifies whether HTML tags should be stripped from the extracted content. See more in the POST /v1/crawl API reference.

Result:

{
  "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b"
}

The id field is the <CRAWL_JOB_ID> you need in the next step.
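For reference, the same request can be sent from Python with only the standard library. This is a sketch, not an official client: the function names `build_crawl_request` and `start_crawl` are hypothetical, and the code assumes the response body is JSON with the job `id` at the top level, as in the example result.

```python
import json
import urllib.request

# Endpoint taken from the curl example; everything else here is illustrative.
CRAWL_ENDPOINT = "https://api.webcrawlerapi.com/v1/crawl"


def build_crawl_request(api_key: str, url: str, items_limit: int = 5,
                        clean_content: bool = True) -> urllib.request.Request:
    """Build the same POST /v1/crawl request as the curl command."""
    body = {
        "url": url,
        "items_limit": items_limit,      # maximum number of items to extract
        "clean_content": clean_content,  # strip HTML tags from extracted content
    }
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_crawl(api_key: str, url: str, **options) -> str:
    """Send the request and return the crawl job id (the <CRAWL_JOB_ID>)."""
    request = build_crawl_request(api_key, url, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumes a JSON body like {"id": "..."} as in the example result.
        return json.load(response)["id"]
```

Usage: `job_id = start_crawl(api_key, "https://stripe.com/", items_limit=5)`. Building the request separately from sending it makes it easy to inspect the payload before any network call is made.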

Get crawling result

To get the crawl result, use the following curl command:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/crawl/<CRAWL_JOB_ID> \
  --header 'Authorization: Bearer <PASTE YOUR API KEY HERE>'
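If you fetch the job right after starting it, the crawl may not be finished yet. A simple approach is to poll the endpoint until the `status` field reads `"done"`, as it does in the example result. The sketch below is hypothetical: `fetch_job` and `wait_until_done` are illustrative names, the 5-second interval and 60-attempt limit are arbitrary, and it assumes any status other than `"done"` means the job is still running. Error statuses, if the API has them, are not handled.

```python
import json
import time
import urllib.request

CRAWL_ENDPOINT = "https://api.webcrawlerapi.com/v1/crawl"


def fetch_job(api_key: str, job_id: str) -> dict:
    """GET /v1/crawl/<job_id> and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{CRAWL_ENDPOINT}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_until_done(fetch, job_id: str, interval: float = 5.0,
                    max_attempts: int = 60, sleep=time.sleep) -> dict:
    """Poll `fetch(job_id)` until the job's status is "done".

    Assumption: any other status means the job is still in progress.
    """
    for attempt in range(max_attempts):
        job = fetch(job_id)
        if job.get("status") == "done":
            return job
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"crawl job {job_id} not done after {max_attempts} attempts")
```

Usage: `job = wait_until_done(lambda job_id: fetch_job(api_key, job_id), job_id)`. Passing the fetch function in as a parameter keeps the polling loop independent of the HTTP call, so it can be tested with a stub.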

Result:

{
  "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
  "url": "https://stripe.com/",
  ...
  "status": "done",
  "job_items": [
    {
      "id": 423,
      "job_id": "be0c2ae2-8545-4c4a-8728-5dd122878098",
      "original_url": "https://stripe.com",
      "page_status_code": 200,
      "raw_content_url": "https://data.webcrawlerapi.com/raw/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
      "clean_content_url": "https://data.webcrawlerapi.com/clean/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
      ...
    },
    ...
  ]
}

See more in the GET /v1/crawl/:id API reference.
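Each entry in `job_items` points to the stored content through `raw_content_url` and `clean_content_url`. As a hedged sketch, the helper below (a hypothetical name, `content_urls`) collects those links for pages that returned status 200. Skipping non-200 pages is a choice made here, not documented API behavior.

```python
def content_urls(job: dict, clean: bool = True) -> list[tuple[str, str]]:
    """Return (original_url, content_url) pairs from a finished crawl job.

    Uses clean_content_url when clean=True, raw_content_url otherwise.
    Items with a non-200 page_status_code or without the chosen URL are
    skipped (an assumption about how to treat failed pages).
    """
    key = "clean_content_url" if clean else "raw_content_url"
    pairs = []
    for item in job.get("job_items", []):
        if item.get("page_status_code") != 200:
            continue
        url = item.get(key)
        if url:
            pairs.append((item.get("original_url", ""), url))
    return pairs
```

You could then download each URL, for example with `urllib.request.urlopen`. This guide does not say whether those data URLs require the `Authorization` header, so treat that as something to verify.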