GET /job/:id
Get job status and crawling results by job ID
Basic API endpoint to check the status and retrieve the results of a crawling job.
https://api.webcrawlerapi.com/v1/job/:id
Method: GET
Request
Available request params
- id - (required) the unique identifier of the job.
Example:
https://api.webcrawlerapi.com/v1/job/6c391693-e566-4b99-97ca-5fa00032e281
Response
Job contains:
- id - the unique identifier of the job.
- org_id - your organization identifier.
- url - the seed URL where the crawler started.
- status - the status of the job. Can be new, in_progress, done, or error.
- scrape_type - the type of scraping you want to perform (html, cleaned, or markdown).
- whitelist_regexp - a regular expression to whitelist URLs.
- blacklist_regexp - a regular expression to blacklist URLs.
- allow_subdomains - whether the crawler will also crawl subdomains.
- items_limit - the limit of pages for this job.
- created_at - the date when the job was created.
- finished_at - the date when the job was finished.
- webhook_url - the URL where the server will send a POST request once the task is completed.
- webhook_status - the status of the webhook request.
- webhook_error - the error message if the webhook request failed.
- job_items - an array of items that were extracted from the pages.

Job Item:
- id - the unique identifier of the item.
- status - the status of the item. Can be new, in_progress, done, or error.
- job_id - the job identifier.
- original_url - the URL of the page.
- page_status_code - the status code of the page request.
- raw_content_url - the URL to the raw content of the page.
- cleaned_content_url - the URL to the cleaned content of the page (if scrape_type is cleaned; check Crawling Types).
- markdown_content_url - the URL to the markdown content of the page (if scrape_type is markdown; check Crawling Types).
- title - the title of the page (<title> tag content).
- created_at - the date when the item was created.
- cost - the cost of the item in $.
- referred_url - the URL where the page was referred from.
- last_error - the last error message if the item failed.
 
Example:
{
	"id": "abb39f29-087e-4714-aa05-15537be12f90",
	"org_id": "cm48ww9kw00019rv7bsyfko1d",
	"url": "https://books.toscrape.com/",
	"scrape_type": "markdown",
	"whitelist_regexp": ".*category.*",
	"blacklist_regexp": "",
	"allow_subdomains": false,
	"items_limit": 10,
	"created_at": "2024-12-15T10:26:13.893Z",
	"finished_at": "2024-12-15T10:26:37.118Z",
	"updated_at": "2024-12-15T10:26:37.118Z",
	"webhook_url": "",
	"status": "done",
	"job_items": [
		{
			"id": "a46f3117-f97a-4ca2-a434-6cfdcd022b72",
			"job_id": "abb39f29-087e-4714-aa05-15537be12f90",
			"original_url": "https://books.toscrape.com/catalogue/category/books/travel_2/index.html",
			"page_status_code": 200,
			"markdown_content_url": "https://data.webcrawlerapi.com/markdown/books.toscrape.com/https___books_toscrape_com_catalogue_category_books_travel_2_index_html",
			"status": "done",
			"title": "All products | Books to Scrape - Sandbox",
			"last_error": "",
			"created_at": "2024-12-15T10:26:17.941Z",
			"updated_at": "2024-12-15T10:26:23.915Z",
			"cost": 2000,
			"referred_url": "https://books.toscrape.com/"
		}
    ]
}
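Given a response like the one above, a minimal sketch of pulling the finished items and their content URLs out of the job payload. Field names are taken from the example response; mapping an html scrape_type to raw_content_url is an assumption based on the field descriptions above.

```python
import json

# Sample job payload, abbreviated from the example response above.
JOB_JSON = """
{
    "id": "abb39f29-087e-4714-aa05-15537be12f90",
    "status": "done",
    "scrape_type": "markdown",
    "job_items": [
        {
            "id": "a46f3117-f97a-4ca2-a434-6cfdcd022b72",
            "status": "done",
            "original_url": "https://books.toscrape.com/catalogue/category/books/travel_2/index.html",
            "markdown_content_url": "https://data.webcrawlerapi.com/markdown/books.toscrape.com/https___books_toscrape_com_catalogue_category_books_travel_2_index_html",
            "title": "All products | Books to Scrape - Sandbox"
        }
    ]
}
"""

def finished_content_urls(job: dict) -> list[tuple[str, str]]:
    """Return (original_url, content_url) pairs for items that finished crawling.

    Picks the content URL field matching the job's scrape_type; for html
    jobs this assumes raw_content_url holds the content (an assumption).
    """
    if job["scrape_type"] == "html":
        url_field = "raw_content_url"
    else:
        url_field = f"{job['scrape_type']}_content_url"
    return [
        (item["original_url"], item[url_field])
        for item in job.get("job_items", [])
        if item["status"] == "done" and item.get(url_field)
    ]

job = json.loads(JOB_JSON)
for original, content in finished_content_urls(job):
    print(original, "->", content)
```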
Refer to Job overview for more information about the response fields.
Crawling requests are handled asynchronously: the initial crawl request returns a task id, which you can then use with this endpoint to check the status of the scraping task (read more about Async Requests).
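Since the job completes asynchronously, a common pattern is to poll GET /job/:id until status reaches a terminal value. A minimal sketch, assuming a Bearer-token Authorization header and a 5-second polling interval (both assumptions; check the authentication docs for the exact scheme):

```python
import json
import time
import urllib.request

API_BASE = "https://api.webcrawlerapi.com/v1"
# Per the field list above, a job will not change once it is done or error.
TERMINAL_STATUSES = {"done", "error"}

def is_terminal(status: str) -> bool:
    """True once the job has reached a final state and polling can stop."""
    return status in TERMINAL_STATUSES

def get_job(job_id: str, api_key: str) -> dict:
    """Fetch the job by id.

    The Authorization header shape is an assumption; consult the
    WebCrawlerAPI authentication docs for the exact scheme.
    """
    req = urllib.request.Request(
        f"{API_BASE}/job/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(job_id: str, api_key: str, interval: float = 5.0) -> dict:
    """Poll GET /job/:id until the job reaches done or error, then return it."""
    while True:
        job = get_job(job_id, api_key)
        if is_terminal(job["status"]):
            return job
        time.sleep(interval)
```

For production use you would typically add a timeout or a maximum number of attempts, or rely on webhook_url instead of polling.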