GET /job/:id/urls
Returns all URLs of a job. The response is a structured view of every URL discovered during the job execution, organized into path-based clusters and a flat list.
Method: GET
Request example
curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/46c7b8ff-eb5e-4ebb-96f1-2685334c07d7/urls \
  --header 'Authorization: Bearer <YOUR TOKEN>'
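The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names build_urls_request and fetch_job_urls are illustrative and not part of any official SDK, and the sketch assumes the response body is JSON.

```python
import json
import urllib.request

API_BASE = "https://api.webcrawlerapi.com/v1"


def build_urls_request(job_id: str, token: str) -> urllib.request.Request:
    """Build the GET /job/:id/urls request (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{API_BASE}/job/{job_id}/urls",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_job_urls(job_id: str, token: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (hypothetical helper)."""
    request = build_urls_request(job_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the URL and header logic easy to inspect without making a network call. A call would look like fetch_job_urls("<job id>", "<YOUR TOKEN>").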
Response Structure
The response contains two main sections:
Clusters
The clusters array contains path-based groupings of URLs, showing how URLs are distributed across different sections of the website. Each cluster object has:
- path: The URL path segment (e.g., "/docs", "/blog")
- size: The number of URLs found under this path
This clustering helps visualize the website's structure and identify the most significant sections.
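As an illustration of reading the clusters array, the sketch below picks out the top-level sections and ranks them by size. The sample data is copied from the example response in this document, and the helper names are hypothetical. It assumes that a nested path such as /scrapers/webcrawler is also counted under its parent /scrapers, which the sample sizes suggest but the documentation does not state, so sizes are not summed across levels.

```python
# Sample clusters copied from the example response.
clusters = [
    {"path": "/scrapers", "size": 36},
    {"path": "/scrapers/webcrawler", "size": 29},
    {"path": "/blog", "size": 21},
    {"path": "/docs", "size": 20},
    {"path": "/docs/API", "size": 7},
]


def path_depth(path: str) -> int:
    """Count non-empty path segments, so "/docs/API" has depth 2."""
    return len([segment for segment in path.split("/") if segment])


def top_level_clusters(clusters: list[dict]) -> list[dict]:
    """Keep single-segment clusters, largest first (hypothetical helper)."""
    top = [c for c in clusters if path_depth(c["path"]) == 1]
    return sorted(top, key=lambda c: c["size"], reverse=True)


for cluster in top_level_clusters(clusters):
    print(f'{cluster["path"]}: {cluster["size"]} URLs')
```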
URLs
The urls array contains a complete list of all discovered URLs in their full form. These are the actual URLs that were crawled during the job execution.
Example Response
{
  "clusters": [
    {
      "path": "/scrapers",
      "size": 36
    },
    {
      "path": "/scrapers/webcrawler",
      "size": 29
    },
    {
      "path": "/blog",
      "size": 21
    },
    {
      "path": "/docs",
      "size": 20
    },
    {
      "path": "/docs/API",
      "size": 7
    }
    // ... more clusters
  ],
  "urls": [
    "https://webcrawlerapi.com/privacy",
    "https://webcrawlerapi.com/changelog",
    "https://webcrawlerapi.com/docs/API/cancel",
    "https://webcrawlerapi.com/scrapers/webcrawler/google-search-result/api",
    "https://webcrawlerapi.com/docs/sdk/python"
    // ... more URLs
  ]
}
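To connect the two sections of the response, one might list the URLs that fall under a given cluster path. This is a minimal sketch with a hypothetical helper, urls_under; the matching rule it uses (the URL path equals the cluster path or continues it after a "/") is an assumption about how clusters relate to URLs, not documented API behavior. The URLs are taken from the example response.

```python
from urllib.parse import urlsplit

# Sample URLs copied from the example response.
urls = [
    "https://webcrawlerapi.com/privacy",
    "https://webcrawlerapi.com/changelog",
    "https://webcrawlerapi.com/docs/API/cancel",
    "https://webcrawlerapi.com/scrapers/webcrawler/google-search-result/api",
    "https://webcrawlerapi.com/docs/sdk/python",
]


def urls_under(urls: list[str], cluster_path: str) -> list[str]:
    """Return the URLs whose path lies under cluster_path (hypothetical helper)."""
    prefix = cluster_path.rstrip("/")
    matches = []
    for url in urls:
        path = urlsplit(url).path
        # Match on whole segments so "/doc" does not match "/docs/...".
        if path == prefix or path.startswith(prefix + "/"):
            matches.append(url)
    return matches


print(urls_under(urls, "/docs"))
```

Comparing on whole path segments rather than a raw string prefix avoids counting a URL such as /docs-archive under /docs.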
Error Responses
- 401 Unauthorized: Invalid or missing API key
- 404 Not Found: Job not found or not completed
- 500 Internal Server Error: Server-side error
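A client may want to turn these status codes into readable messages. The sketch below maps the documented codes to their descriptions; describe_error is a hypothetical helper, and the fallback text for any other status code is an assumption.

```python
# Status codes and descriptions as listed in this section.
ERROR_DESCRIPTIONS = {
    401: "Invalid or missing API key",
    404: "Job not found or not completed",
    500: "Server-side error",
}


def describe_error(status: int) -> str:
    """Return a readable message for an HTTP status code (hypothetical helper)."""
    # The fallback text is an assumption, not documented API behavior.
    reason = ERROR_DESCRIPTIONS.get(status, "Unexpected response")
    return f"HTTP {status}: {reason}"
```

When the request is sent with Python's urllib, a non-2xx response raises urllib.error.HTTPError, and its code attribute can be passed to describe_error. Since a 404 covers both a missing job and a job that has not completed, a client cannot tell the two apart from the status code alone.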
Notes
- The endpoint is only available after the job has completed successfully
- Clusters are automatically generated based on URL path segments
- The size in clusters represents the number of URLs under that path
- The urls array contains the complete list of discovered URLs (pages filtered out by whitelist_regexp or blacklist_regexp are not shown here)
- URLs are returned in their full form, including the domain