Getting Started
A guide to getting started with Webcrawler API
Webcrawler API helps you extract data from websites, including websites that do not provide an API of their own. Read more about it here: Webcrawler API
Prerequisites
To use Webcrawler API, you first need an API key:
- Register on the Webcrawler API Dashboard
- Navigate to the API key section
- Copy your API key
Request
To start using Webcrawler API, send an HTTP POST request to the API endpoint:
https://api.webcrawlerapi.com/v1/crawl
with a JSON body that contains the crawl parameters.
Note: Every request must be authenticated with your API key, sent in the Authorization header as a Bearer token.
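If you call the API from code rather than curl, a common pattern is to read the key from an environment variable instead of pasting it into source files. The following is a minimal Python sketch under that assumption: the variable name WEBCRAWLERAPI_API_KEY and the helper auth_headers are hypothetical choices for illustration, while the header format (Authorization: Bearer <key>) is the one used in the curl examples on this page.

```python
import os


def auth_headers(env_var: str = "WEBCRAWLERAPI_API_KEY") -> dict[str, str]:
    """Build request headers that carry the API key as a Bearer token.

    Assumption: the key lives in an environment variable. The default
    variable name is a hypothetical convention, not one defined by the API.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Authorization": f"Bearer {api_key}",
        # Request bodies on this page are JSON, so declare the content type explicitly.
        "Content-Type": "application/json",
    }
```

Keeping the key out of source code is also what the coding-agent prompt at the end of this page asks for ("Store the key in environment variables only").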
First request
To make your first request, run the following curl command:
curl --request POST \
--url https://api.webcrawlerapi.com/v1/crawl \
--header 'Authorization: Bearer <PASTE YOUR API KEY HERE>' \
--data '{
"items_limit": 5,
"url": "https://stripe.com/",
"output_formats": ["markdown"]
}'
This command starts a new crawl job that extracts data from the Stripe website. The url parameter sets the URL where the crawl starts, the items_limit parameter sets how many items you want to extract, and the output_formats parameter requests Markdown-formatted data (read more about Crawling Types).
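The same request can be sent from Python with only the standard library. This is a minimal sketch, not an official SDK: the function names build_crawl_request and start_crawl are hypothetical, and it assumes the endpoint accepts a JSON body with an explicit Content-Type: application/json header. The body mirrors the curl example above, and the returned value is the job id from the response.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://api.webcrawlerapi.com/v1/crawl"


def build_crawl_request(
    api_key: str,
    url: str,
    items_limit: int = 5,
    output_formats: tuple[str, ...] = ("markdown",),
) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = {
        "url": url,
        "items_limit": items_limit,
        "output_formats": list(output_formats),
    }
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_crawl(api_key: str, url: str, items_limit: int = 5, timeout: float = 30.0) -> str:
    """Send the crawl request and return the job id from the JSON response."""
    request = build_crawl_request(api_key, url, items_limit)
    # urlopen raises urllib.error.HTTPError for non-2xx responses (e.g. a bad API key).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The API answers right away with the id of an asynchronous crawl job.
    return payload["id"]
```

For example, job_id = start_crawl(api_key, "https://stripe.com/") starts the same five-item crawl as the curl command. Splitting request construction from sending keeps the payload easy to inspect or test without network access.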
Result:
{
"id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b"
}
The id in this response is the <CRAWL_JOB_ID> used in the next step. Crawling requests are asynchronous: instead of the crawled data, you receive a job id, which you use to check the status of the crawl job and fetch its results (read more about Async Requests).
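Because the job runs asynchronously, client code typically polls the job endpoint until it finishes. The following is a minimal Python sketch of that loop, not the official client. Two assumptions to note: only the "done" status appears on this page, so the sketch treats every other status as "still running" (check Async Requests for failure states before relying on this), and the names fetch_job and wait_for_job plus the interval and max_wait defaults are hypothetical. The GET request mirrors the curl command in "Get crawling result".

```python
import json
import time
import urllib.request
from typing import Any, Callable

JOB_ENDPOINT = "https://api.webcrawlerapi.com/v1/job/"


def fetch_job(api_key: str, job_id: str, timeout: float = 30.0) -> dict[str, Any]:
    """GET the current state of a crawl job (same request as the curl example)."""
    request = urllib.request.Request(
        JOB_ENDPOINT + job_id,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_for_job(
    get_job: Callable[[], dict[str, Any]],
    interval: float = 5.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reports status "done" or max_wait seconds have passed.

    Assumption: any status other than "done" means the job is still running.
    get_job and sleep are injectable so the loop can be tested offline.
    """
    waited = 0.0
    while True:
        job = get_job()
        if job.get("status") == "done":
            return job
        if waited >= max_wait:
            raise TimeoutError(f"job {job.get('id')} not done after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A typical call is job = wait_for_job(lambda: fetch_job(api_key, job_id)). The finished job contains the job_items list shown in the next section, where each item's clean_content_url points to the extracted content.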
Get crawling result
To get the crawl result, send a GET request for the job, replacing <CRAWL_JOB_ID> with the id returned when you started the crawl:
curl --request GET \
--url https://api.webcrawlerapi.com/v1/job/<CRAWL_JOB_ID> \
--header 'Authorization: Bearer <PASTE YOUR API KEY HERE>'
Result:
{
"id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
"url": "https://stripe.com/",
...
"status": "done",
"job_items": [
{
"id": "be0c2ae2-8545-4c4a-8728-5dd122878098",
"job_id": "be0c2ae2-8545-4c4a-8728-5dd122878098",
"original_url": "https://stripe.com",
"page_status_code": 200,
"raw_content_url": "https://data.webcrawlerapi.com/raw/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
"clean_content_url": "https://data.webcrawlerapi.com/clean/clrgcx48g0001ozloz9ficivc/be0c2ae2-8545-4c4a-8728-5dd122878098/https:__stripe_com",
...
}
...
}
Integrate with your Coding Agent
Copy and paste the following prompt into your coding agent:
//Copy-paste this to your coding agent.
You are integrating WebCrawlerAPI into an existing project.
Before writing code, fetch and follow the full guide at:
https://webcrawlerapi.com/docs/agent-integration
Your job:
1. Inspect the codebase and identify the project's language, framework, and HTTP/client patterns.
2. Use an official WebCrawlerAPI SDK if one matches the project language; otherwise use the raw HTTP API.
3. Ask the user for a WebCrawlerAPI API key if one is not already configured.
4. Store the key in environment variables only. Never hardcode secrets.
5. Implement the smallest clean integration that fits the existing architecture and conventions.
Integration rules:
- For a single page, prefer POST https://api.webcrawlerapi.com/v2/scrape
- For multi-page website crawling, prefer POST https://api.webcrawlerapi.com/v1/crawl
- Default to output_formats: ["markdown"] unless the user needs something else
- Use main_content_only when the goal is clean LLM-ready content
- Handle errors, timeouts, and async polling correctly
- Reuse existing logging, config, and error-handling patterns in the repo
- Add concise usage notes or examples if the codebase already includes integration docs/tests
Expected outcome:
- Working WebCrawlerAPI integration
- Environment variable wiring
- Minimal example usage in the project's existing style
- Clear notes on what was added and how to use it