GET /job/:id/urls

Returns all URLs discovered during a job's execution. The response gives a structured view of those URLs, organized into path-based clusters and a flat list.

Method: GET

Request example

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/46c7b8ff-eb5e-4ebb-96f1-2685334c07d7/urls \
  --header 'Authorization: Bearer <YOUR TOKEN>'
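
The same request can be issued from Python. The following is a minimal sketch that uses only the standard library. The helper names `build_urls_request` and `fetch_job_urls` are illustrative assumptions and are not part of any official SDK; the base URL and header format are taken from the curl example above.

```python
import json
import urllib.request

# Base URL as it appears in the curl example.
API_BASE = "https://api.webcrawlerapi.com/v1"


def build_urls_request(job_id: str, token: str) -> urllib.request.Request:
    """Build the GET request for /job/:id/urls (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{API_BASE}/job/{job_id}/urls",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_job_urls(job_id: str, token: str) -> dict:
    """Send the request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(build_urls_request(job_id, token)) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the first function free of network access, so it can be inspected or unit-tested without a valid token.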

Response Structure

The response contains two main sections:

Clusters

The clusters array contains path-based groupings of URLs, showing how URLs are distributed across different sections of the website. Each cluster object has:

path: The URL path segment (e.g., "/docs", "/blog")
size: The number of URLs found under this path

This clustering helps visualize the website's structure and identify the most significant sections.
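
To make the idea of path-based clusters concrete, here is a rough sketch of how such counts could be derived from a list of URLs. This is an assumption for illustration only: the service's actual clustering rules (depth limit, minimum size, ordering) are not documented here, and `count_path_prefixes` and `max_depth` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlparse


def count_path_prefixes(urls: list[str], max_depth: int = 2) -> list[dict]:
    """Count how many URLs fall under each path prefix, up to max_depth segments.

    Illustrative only; not necessarily how the API builds its clusters.
    """
    counts = Counter()
    for url in urls:
        # Split the path into non-empty segments, e.g. ["docs", "API", "cancel"].
        segments = [s for s in urlparse(url).path.split("/") if s]
        # A URL counts toward every prefix it sits under: /docs and /docs/API.
        for depth in range(1, min(len(segments), max_depth) + 1):
            counts["/" + "/".join(segments[:depth])] += 1
    # Largest groups first, in the same shape as the clusters array.
    return [{"path": path, "size": size} for path, size in counts.most_common()]
```

Under this scheme a URL is counted in each prefix above it, which would explain why a parent path such as "/scrapers" can have a larger size than a nested path such as "/scrapers/webcrawler".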

URLs

The urls array lists every discovered URL in its full form, including the domain. These are the URLs that were actually crawled during the job execution.

Example Response

{
	"clusters": [
		{
			"path": "/scrapers",
			"size": 36
		},
		{
			"path": "/scrapers/webcrawler",
			"size": 29
		},
		{
			"path": "/blog",
			"size": 21
		},
		{
			"path": "/docs",
			"size": 20
		},
		{
			"path": "/docs/API",
			"size": 7
		}
		// ... more clusters
	],
	"urls": [
		"https://webcrawlerapi.com/privacy",
		"https://webcrawlerapi.com/changelog",
		"https://webcrawlerapi.com/docs/API/cancel",
		"https://webcrawlerapi.com/scrapers/webcrawler/google-search-result/api",
		"https://webcrawlerapi.com/docs/sdk/python"
		// ... more URLs
	]
}
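
Once the response is decoded, the two sections can be used together. The sketch below works on an abbreviated copy of the example response (the `// ...` elision comments are dropped because they are not valid JSON). The function names `largest_clusters` and `urls_in_cluster` are hypothetical helpers, not part of the API.

```python
from urllib.parse import urlparse

# Abbreviated copy of the example response, already decoded into a dict.
sample = {
    "clusters": [
        {"path": "/scrapers", "size": 36},
        {"path": "/blog", "size": 21},
        {"path": "/docs", "size": 20},
    ],
    "urls": [
        "https://webcrawlerapi.com/privacy",
        "https://webcrawlerapi.com/docs/API/cancel",
        "https://webcrawlerapi.com/docs/sdk/python",
    ],
}


def largest_clusters(response: dict, n: int = 3) -> list[dict]:
    """Return the n clusters with the highest size, largest first."""
    return sorted(response["clusters"], key=lambda c: c["size"], reverse=True)[:n]


def urls_in_cluster(response: dict, cluster_path: str) -> list[str]:
    """Return the URLs whose path equals cluster_path or lies beneath it."""
    prefix = cluster_path.rstrip("/")
    matches = []
    for url in response["urls"]:
        path = urlparse(url).path
        # Require a segment boundary so "/docs" does not match "/docs-old".
        if path == prefix or path.startswith(prefix + "/"):
            matches.append(url)
    return matches
```

For example, `urls_in_cluster(sample, "/docs")` selects the two documentation URLs from the flat list and leaves out the privacy page.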

Error Responses

401 Unauthorized - Invalid or missing API key
404 Not Found - Job not found or not completed
500 Internal Server Error - Server-side error
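
A client may want to turn these status codes into readable errors. The following is a minimal sketch using the standard library; `describe_http_error` and `open_or_explain` are hypothetical helper names, and the messages simply restate the list above.

```python
import urllib.error
import urllib.request

# Status codes and meanings as listed in this section.
ERROR_MEANINGS = {
    401: "Unauthorized: invalid or missing API key",
    404: "Not Found: job not found or not completed",
    500: "Internal Server Error: server-side error",
}


def describe_http_error(status: int) -> str:
    """Map a documented status code to its meaning, with a generic fallback."""
    return ERROR_MEANINGS.get(status, f"Unexpected HTTP status {status}")


def open_or_explain(request: urllib.request.Request):
    """Open the request, re-raising HTTP errors with a documented message."""
    try:
        return urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_http_error(err.code)) from err
```

Because a 404 can also mean the job has not completed yet, a caller may choose to treat it as a signal to check the job status rather than as a permanent failure.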

Notes

The endpoint is only available after the job has completed successfully
Clusters are automatically generated based on URL path segments
The size in clusters represents the number of URLs under that path
The urls array contains the complete list of discovered URLs; pages filtered out by whitelist_regexp or blacklist_regexp are not included
URLs are returned in their full form, including the domain
