Get Job URLs

GET /job/:id/urls

Returns all URLs discovered during a job's execution. The response presents them in two ways: grouped into path-based clusters and as a flat list.

Method: GET

Request example

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/46c7b8ff-eb5e-4ebb-96f1-2685334c07d7/urls \
  --header 'Authorization: Bearer <YOUR TOKEN>'
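
The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names build_urls_request and fetch_job_urls are illustrative assumptions and not part of any official SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.webcrawlerapi.com/v1"


def build_urls_request(job_id: str, token: str) -> urllib.request.Request:
    """Build the GET /job/:id/urls request without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/job/{job_id}/urls",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_job_urls(job_id: str, token: str) -> dict:
    """Send the request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(build_urls_request(job_id, token)) as resp:
        return json.load(resp)
```

Calling fetch_job_urls("46c7b8ff-eb5e-4ebb-96f1-2685334c07d7", "<YOUR TOKEN>") with a real token would return the decoded response as a dictionary.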

Response Structure

The response contains two main sections:

Clusters

The clusters array contains path-based groupings of URLs, showing how URLs are distributed across different sections of the website. Each cluster object has:

  • path: The URL path segment (e.g., "/docs", "/blog")
  • size: The number of URLs found under this path

This clustering helps visualize the website's structure and identify the most significant sections.
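
To make the meaning of path and size concrete, the sketch below counts how many URLs fall under each path prefix. It is an assumption about how such clusters could be derived, not the service's actual algorithm; cluster_by_path and its max_depth parameter are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def cluster_by_path(urls: list[str], max_depth: int = 2) -> list[dict]:
    """Count URLs under each path prefix of up to max_depth segments."""
    counts = Counter()
    for url in urls:
        # Split "/docs/API/cancel" into ["docs", "API", "cancel"]
        segments = [s for s in urlparse(url).path.split("/") if s]
        for depth in range(1, min(len(segments), max_depth) + 1):
            counts["/" + "/".join(segments[:depth])] += 1
    # Sort largest clusters first for readability
    return [{"path": p, "size": n} for p, n in counts.most_common()]
```

Because a URL is counted under every prefix of its path, nested clusters overlap: a URL under "/docs/API" also contributes to the size of "/docs".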

URLs

The urls array lists every discovered URL in its full form. These are the URLs that were crawled during the job execution.

Example Response

{
	"clusters": [
		{
			"path": "/scrapers",
			"size": 36
		},
		{
			"path": "/scrapers/webcrawler",
			"size": 29
		},
		{
			"path": "/blog",
			"size": 21
		},
		{
			"path": "/docs",
			"size": 20
		},
		{
			"path": "/docs/API",
			"size": 7
		}
		// ... more clusters
	],
	"urls": [
		"https://webcrawlerapi.com/privacy",
		"https://webcrawlerapi.com/changelog",
		"https://webcrawlerapi.com/docs/API/cancel",
		"https://webcrawlerapi.com/scrapers/webcrawler/google-search-result/api",
		"https://webcrawlerapi.com/docs/sdk/python"
		// ... more URLs
	]
}
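
Once decoded, the response is a plain dictionary. The following sketch, which uses a trimmed copy of the example response, picks the largest cluster and lists the URLs under a given cluster path. The urls_under helper is hypothetical and only illustrates one way to combine the two sections.

```python
from urllib.parse import urlparse

# A trimmed copy of the example response
response = {
    "clusters": [
        {"path": "/scrapers", "size": 36},
        {"path": "/docs", "size": 20},
        {"path": "/docs/API", "size": 7},
    ],
    "urls": [
        "https://webcrawlerapi.com/privacy",
        "https://webcrawlerapi.com/docs/API/cancel",
        "https://webcrawlerapi.com/docs/sdk/python",
    ],
}


def urls_under(data: dict, path: str) -> list[str]:
    """Return the URLs whose path equals `path` or lies below it."""
    prefix = path.rstrip("/") + "/"
    return [
        u for u in data["urls"]
        if urlparse(u).path == path or urlparse(u).path.startswith(prefix)
    ]


# Find the cluster with the most URLs
largest = max(response["clusters"], key=lambda c: c["size"])
print(largest["path"])
print(urls_under(response, "/docs/API"))
```

Matching on the prefix followed by "/" avoids counting a path such as "/docs-archive" as part of "/docs".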

Error Responses

  • 401 Unauthorized - Invalid or missing API key
  • 404 Not Found - Job not found or not completed
  • 500 Internal Server Error - Server-side error
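
A client can map these status codes to readable messages. The sketch below is one possible approach; the describe_error and should_retry helpers are hypothetical, and the retry policy is an assumption rather than documented behavior.

```python
# Status codes and meanings as listed above
ERROR_MEANINGS = {
    401: "Invalid or missing API key",
    404: "Job not found or not completed",
    500: "Server-side error",
}


def describe_error(status: int) -> str:
    """Translate a documented status code into a readable message."""
    return ERROR_MEANINGS.get(status, f"Unexpected status {status}")


def should_retry(status: int) -> bool:
    """Decide whether a later retry might succeed.

    Assumption: a 404 may mean the job has not completed yet and a 500
    may be transient, so both are retried; a 401 needs a new token.
    """
    return status in (404, 500)
```

With urllib, the status code of a failed request is available as the code attribute of the raised urllib.error.HTTPError, so describe_error(exc.code) yields the matching message.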

Notes

  • The endpoint is only available after the job has completed successfully
  • Clusters are automatically generated based on URL path segments
  • The size in clusters represents the number of URLs under that path
  • The urls array contains the complete list of discovered URLs; pages excluded by whitelist_regexp or blacklist_regexp are not included
  • URLs are returned in their full form, including the domain