Async and Webhooks

All requests in WebcrawlerAPI are asynchronous by default.

What is an asynchronous request?

Asynchronous communication means that the client (you) does not have to wait for the server to finish processing a request. Instead, there are two ways to get the result:

The server notifies the client via a webhook once the task is completed.
The client checks the status of the task with an API call.

Why use asynchronous requests?

Crawling and scraping jobs can take a long time to complete. With asynchronous requests, the client can continue with other work instead of waiting for the server's response.

Using webhooks

Webhooks let WebcrawlerAPI deliver the results of a job to your URL as a POST request body. To use a webhook, add a webhook_url parameter to the request. Its value is the URL where the server will send a POST request once the job is completed.

Request example:

{
    "url": "https://stripe.com/",
    "webhook_url": "https://yourserver.com/webhook"
}
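The request body above is plain JSON with two fields, url and webhook_url. The Python sketch below builds (but does not send) such a request using only the standard library. It makes two assumptions: the job-creation endpoint is not shown on this page, so the caller passes it in as the hypothetical endpoint parameter, and the Bearer token header is borrowed from the job-status example further down.

```python
import json
import urllib.request


def build_crawl_request(endpoint: str, api_token: str, url: str,
                        webhook_url: str) -> urllib.request.Request:
    """Build a POST request that starts a crawl job with a webhook.

    Assumptions: `endpoint` is the job-creation URL (not given on this page),
    and the API accepts the same Bearer token header as the job-status call.
    """
    body = {"url": url, "webhook_url": webhook_url}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),  # JSON request body
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the result to urllib.request.urlopen(). Building the request separately keeps the payload easy to inspect and test before any network call is made.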

Once the job is completed, the server will send a POST request to the provided URL with the payload:

{
    "id": "b1b1b1b1-b1b1-b1b1-b1b1-b1b1b1b1b1b1",
    ...
}
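On your side, the webhook URL needs an HTTP endpoint that accepts this POST. The following is a minimal sketch of such a receiver built on Python's standard http.server module. It only relies on the id field shown in the payload above; the other fields are elided on this page, so the handler does not read them. The port (8000) and the in-memory received_job_ids list are illustrative choices, not part of WebcrawlerAPI.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_job_ids = []  # IDs of completed jobs, appended by the handler


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts the POST sent when a job completes."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # The payload carries the job "id"; store it for later processing.
        received_job_ids.append(payload.get("id"))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep console output quiet


def run(port: int = 8000) -> None:
    """Serve until interrupted; expose this port at your webhook_url."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call run() to start listening. The handler acknowledges quickly and defers real work; in production you would typically hand the job ID to a queue rather than a global list.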

Using API calls

To check the status of a crawling job, use the following API call:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/b1b1b1b1-b1b1-b1b1-b1b1-b1b1b1b1b1b1 \
  --header 'Authorization: Bearer <YOUR API TOKEN HERE>'
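The same call can be made from Python with the standard library. This sketch mirrors the curl command above: a GET to /v1/job/<id> with a Bearer token in the Authorization header. Splitting request construction from sending is a design choice so the request can be checked without network access.

```python
import json
import urllib.request

API_BASE = "https://api.webcrawlerapi.com/v1"


def job_status_request(job_id: str, api_token: str) -> urllib.request.Request:
    """Build the same GET request as the curl example."""
    return urllib.request.Request(
        f"{API_BASE}/job/{job_id}",
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )


def get_job(job_id: str, api_token: str) -> dict:
    """Send the request and decode the JSON job info."""
    with urllib.request.urlopen(job_status_request(job_id, api_token)) as resp:
        return json.load(resp)
```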

The response contains the job info, including the job status:

{
    "id": "b1b1b1b1-b1b1-b1b1-b1b1-b1b1b1b1b1b1",
    "status": "done",
    ...
}
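Without a webhook, the client polls this endpoint until the job finishes. Below is a minimal polling sketch. fetch_job is any zero-argument callable that performs the job-status request shown above and returns the decoded JSON. Only the "done" status appears on this page, so the loop treats every other value as "still running" and gives up after a timeout; that is an assumption, and real code should also stop on failure statuses once you know their names.

```python
import time
from typing import Callable


def wait_for_job(fetch_job: Callable[[], dict],
                 interval_s: float = 5.0,
                 timeout_s: float = 600.0) -> dict:
    """Poll until the job reports status "done", then return its info.

    Assumption: any status other than "done" means the job is still running.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") == "done":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.get('id')} not done after {timeout_s}s")
        time.sleep(interval_s)  # wait before asking again
```

A fixed interval keeps the example simple; for long crawls, a growing interval (exponential backoff) reduces the number of status requests.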
