Getting started with Webcrawler API

Webcrawler API extracts data from websites, including sites that do not provide an API of their own. Read more about it here: Webcrawler API

Prerequisites

To use Webcrawler API, you first need an API key:

Register on the Webcrawler API Dashboard
Navigate to the API key section
Copy your API key

Request

To start using Webcrawler API, make an HTTP POST request to the API endpoint:

https://api.webcrawlerapi.com/v1/crawl

with a JSON body that contains the request parameters.

Note: Every request must be authenticated with your API key, sent as a Bearer token in the Authorization header.

First request

To make your first request you can use the following curl command:

curl --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer <PASTE YOUR API KEY HERE>' \
  --header 'Content-Type: application/json' \
  --data '{
    "items_limit": 5,
    "url": "https://stripe.com/",
    "scrape_type": "markdown"
}'

This command starts a new crawl job that extracts data from the Stripe website. The items_limit parameter specifies how many items to extract. The scrape_type parameter specifies that the data should be returned in Markdown format (read more about Crawling Types).
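The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the function names build_crawl_request and start_crawl are hypothetical, and the Content-Type header is an assumption based on the body being JSON. The endpoint, the parameters, and the "id" field in the response are taken from this page.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://api.webcrawlerapi.com/v1/crawl"


def build_crawl_request(api_key: str, url: str, items_limit: int = 5,
                        scrape_type: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a crawl job."""
    body = json.dumps({
        "url": url,
        "items_limit": items_limit,
        "scrape_type": scrape_type,
    }).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects JSON bodies to be declared as such.
            "Content-Type": "application/json",
        },
    )


def start_crawl(api_key: str, url: str, **params) -> str:
    """Send the request and return the job id from the JSON response."""
    request = build_crawl_request(api_key, url, **params)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["id"]
```

Calling start_crawl("<PASTE YOUR API KEY HERE>", "https://stripe.com/") would send the request and return the job id. Building the request separately from sending it makes it easy to inspect the headers and body before any network call happens.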

Result (the id value is your <CRAWL_JOB_ID>):

{
    "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b"
}

Crawl requests are processed asynchronously: the response contains only a job id, which you can use to check the status of the crawl job (read more about Async Requests).
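Because the job runs asynchronously, a client typically polls until the job is finished. The sketch below shows one way to do that in Python, under several assumptions: the job endpoint is the one documented in the next section, "done" is the only status value this page shows, so any other value is treated as "not finished yet", and the 5-second interval and 60-attempt cap are arbitrary choices. The names fetch_job and wait_for_job are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

JOB_ENDPOINT = "https://api.webcrawlerapi.com/v1/job/"


def fetch_job(api_key: str, job_id: str) -> dict:
    """GET the job document for the given job id."""
    request = urllib.request.Request(
        JOB_ENDPOINT + job_id,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(fetch: Callable[[], dict], interval: float = 5.0,
                 max_attempts: int = 60) -> dict:
    """Call fetch() until the job reports status "done" or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch()
        # Assumption: any status other than "done" means the job is still running.
        if job.get("status") == "done":
            return job
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"job not done after {max_attempts} attempts")
```

A call might look like wait_for_job(lambda: fetch_job(api_key, job_id)). Passing the fetch step in as a function keeps the polling loop independent of the HTTP code, so it can be exercised without network access.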

Get crawling result

To get the crawling result you can use the following curl command:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/<CRAWL_JOB_ID> \
  --header 'Authorization: Bearer <PASTE YOUR API KEY HERE>'

Result:

{
    "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
    "url": "https://stripe.com/",
    ...
    "status": "done",
    "job_items": [
        {
            "id": "be0c2ae2-8545-4c4a-8728-5dd122878098",
            "job_id": "be0c2ae2-8545-4c4a-8728-5dd122878098",
            "original_url": "https://stripe.com",
            "page_status_code": 200,
            "raw_content_url": "https://data.webcrawlerapi.com/raw/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
            "clean_content_url": "https://data.webcrawlerapi.com/clean/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
            ...
        }
        ...
    ]
}
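Once the job document is parsed, the per-page results live in job_items. The following Python sketch pulls out the content URL for each page. It assumes that clean_content_url points at the processed content (the field name suggests this, but this page does not say so) and that only pages with page_status_code 200 are of interest. The function name content_urls is hypothetical.

```python
from typing import Iterator, Tuple


def content_urls(job: dict) -> Iterator[Tuple[str, str]]:
    """Yield (original_url, clean_content_url) for pages fetched with status 200."""
    for item in job.get("job_items", []):
        # Assumption: skip pages that failed or have no cleaned content yet.
        if item.get("page_status_code") == 200 and item.get("clean_content_url"):
            yield item["original_url"], item["clean_content_url"]


if __name__ == "__main__":
    # Sample data trimmed from the response shown above.
    sample_job = {
        "status": "done",
        "job_items": [
            {
                "original_url": "https://stripe.com",
                "page_status_code": 200,
                "clean_content_url": "https://data.webcrawlerapi.com/clean/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
            }
        ],
    }
    for original, clean in content_urls(sample_job):
        print(original, "->", clean)
```

Each returned URL can then be downloaded with an ordinary HTTP GET. Whether those downloads need the Authorization header is not stated on this page.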

API access key