Errors
Complete guide to error codes in WebcrawlerAPI with job level and job item level errors
There are 2 levels of errors: job level and job item level.
Job level error codes:
- insufficient_balance - Insufficient balance
- invalid_request - Invalid request
- internal_error - Internal server error
Job item level error codes:
- host_returned_error - Unsuccessful HTTP response from the host
- website_access_denied - Website access denied
- blocked_by_robots_txt - URL blocked by robots.txt
- name_not_resolved - Name resolution error
- internal_error - Internal server error
- timeout_error - Website timeout
- llm_max_context_length_error - AI request error: maximum context length 128k tokens exceeded
Job Level Errors
A job level error means that the job failed to run, for example because the balance is insufficient or the service hit an internal error.
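As a rough illustration, a client could map the three job level error codes to a next step. The sketch below assumes the error body carries the `error_code` and `error_message` fields used in the response examples in this section. The `describe_job_error` helper and the suggested actions are hypothetical and not part of WebcrawlerAPI.

```python
import json

# Hypothetical mapping from the job level error codes to a suggested next step.
JOB_ERROR_ACTIONS = {
    "insufficient_balance": "top up the balance in the dashboard",
    "invalid_request": "fix the request parameters and resubmit",
    "internal_error": "contact support",
}


def describe_job_error(body: str) -> str:
    """Turn a job level error response body into a short human-readable hint."""
    payload = json.loads(body)
    code = payload.get("error_code", "unknown")
    message = payload.get("error_message", "")
    # Unknown codes fall back to a generic hint instead of raising.
    action = JOB_ERROR_ACTIONS.get(code, "inspect the response manually")
    return f"{code}: {message} -> {action}"
```

Keeping the mapping in a plain dictionary makes it easy to extend if more job level codes appear.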
Insufficient Balance
This error occurs when the balance is not enough to run the job. Go to the dashboard to top up your balance.
API error response example:
{
"error_code": "insufficient_balance",
"error_message": "Your balance is not enough to run this job"
}
Invalid request
This error occurs when the request is invalid. For example, the URL is invalid or the parameters are invalid.
API error response example:
{
"error_code": "invalid_request",
"error_message": "whitelist_regexp is invalid"
}
Internal error
This error means that something went wrong on our side. Please contact us at [email protected] if you encounter this error.
API error response example:
{
"error_code": "internal_error",
"error_message": "Internal server error"
}
Job Item Level Errors
A job item level error means that an individual job item failed with a specific error.
Job item level errors are returned in the job_items array. Each error code is described below.
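Because item errors live in the job_items array, the job's own status is not enough to spot them: the response examples in this section show a job with "status": "done" whose item has "status": "error". A minimal sketch of collecting failed items follows. It assumes the job response has already been parsed into a dictionary; the `failed_items_by_code` helper is hypothetical.

```python
from collections import defaultdict
from typing import Any


def failed_items_by_code(job: dict[str, Any]) -> dict[str, list[str]]:
    """Group the last_error messages of failed job items by their error_code."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in job.get("job_items", []):
        # Only items marked with status "error" carry an error_code.
        if item.get("status") == "error":
            code = item.get("error_code", "unknown")
            grouped[code].append(item.get("last_error", ""))
    return dict(grouped)
```

Grouping by code makes it straightforward to treat, say, timeout_error items differently from host_returned_error items.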
Host returned error
This is the most common error. It means that the HTTP status code of the response is outside the 200-299 range.
The exception is status code 403, which has its own error code: website_access_denied.
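The status code rule can be written down as a small function. This is only a client-side mirror of the rule as stated in the text, not the service's actual implementation; `error_code_for_status` is a hypothetical name.

```python
from typing import Optional


def error_code_for_status(status: int) -> Optional[str]:
    """Return the job item error code for an HTTP status, or None for success."""
    if 200 <= status <= 299:
        return None  # successful response, no error code
    if status == 403:
        return "website_access_denied"  # the one special case
    return "host_returned_error"
```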
API error response example:
{
"id": "60b7c4a5-aca7-4183-87db-017418218641",
//...
"status": "done",
"job_items": [
{
//...
"error_code": "host_returned_error",
"status": "error",
"last_error": "Webpage returned error status code: 404"
}
]
}
Website access denied
This is a special case of the host_returned_error error. It means that the website returned a 403 status code.
API error response example:
{
//...
"status": "done",
"job_items": [
{
//...
"error_code": "website_access_denied",
"status": "error",
"last_error": "Webpage returned access denied status code: 403"
}
]
}
Blocked by robots.txt
This error occurs when the respect_robots_txt parameter is set to true and the website's robots.txt file disallows access to the specific URL for crawlers. The robots.txt file is a standard used by websites to communicate with web crawlers about which parts of the site should not be crawled.
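To see in advance whether a URL is likely to be blocked, the robots.txt rules can be checked locally. The sketch below uses Python's standard `urllib.robotparser` module. The user agent that WebcrawlerAPI matches against is not stated here, so the `"*"` default is an assumption, and the service's own matching may differ from the standard library parser. The `allowed_by_robots` helper is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a URL under `/private/` is reported as not allowed, while other paths are allowed.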
API error response example:
{
"error_code": "blocked_by_robots_txt",
"error_message": "URL is blocked by robots.txt. The website's robots.txt file disallows access to this URL for crawlers. Respect robots.txt can be disabled in the request."
}
Name resolution error
This error means that there was a problem resolving the website's host name. Most likely the website does not exist or there is a typo in the URL.
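A quick local check can catch typos before a URL is submitted. The sketch below is an assumption about how a client might pre-validate URLs, not something the API requires. It relies on the standard `socket.getaddrinfo` call, and the `host_resolves` helper and its injectable `resolver` parameter are hypothetical.

```python
import socket
from urllib.parse import urlparse


def host_resolves(url: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the URL's host name resolves; resolver is injectable for tests."""
    host = urlparse(url).hostname
    if not host:
        return False  # no host could be parsed, e.g. the scheme is missing
    try:
        resolver(host, None)
        return True
    except socket.gaierror:
        return False  # the name could not be resolved
```

A name that resolves from your machine may still fail to resolve elsewhere, so this only filters out obvious mistakes.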
API error response example:
{
//...
"job_items": [
{
//...
"error_code": "name_not_resolved",
"status": "error",
"last_error": "Connection refused"
}
]
}
Website timeout
This error occurs when we tried to reach the webpage several times with different proxies, but the website did not respond within a reasonable time.
Troubleshooting steps:
- Check website accessibility - First, verify that the website and webpage are accessible by visiting the webpage manually in your browser
- If it loads slowly in your browser - The issue is likely on the website's side (slow server, high traffic, or downtime)
- If it loads instantly in your browser but still times out in the API - This indicates the website has sophisticated anti-bot protection that we cannot bypass
Common causes:
- The website is slow to respond or experiencing high traffic
- The website is temporarily down or experiencing server issues
- The website has advanced anti-bot protection systems
- The website has captcha or other interactive elements that weren't solved in time
We recommend retrying the request. If the problem persists and the website loads normally in your browser, please contact us at [email protected].
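The retry advice above can be sketched as a loop with a growing delay. Everything here is an assumption for illustration: `crawl` stands for whatever function in your own code submits a URL and returns the resulting job item as a dictionary, and the attempt count and delays are arbitrary defaults rather than recommended values.

```python
import time
from typing import Any, Callable


def crawl_with_retry(
    crawl: Callable[[str], dict[str, Any]],
    url: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call crawl(url) again while it reports timeout_error, doubling the delay each time."""
    item = crawl(url)
    for attempt in range(1, attempts):
        if item.get("error_code") != "timeout_error":
            break  # success or a different error: retrying would not help
        sleep(base_delay * 2 ** (attempt - 1))
        item = crawl(url)
    return item
```

Only timeout_error is retried, because the other item level errors describe conditions that a repeat request is unlikely to change.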
API error response example:
{
//...
"job_items": [
{
//...
"error_code": "timeout_error",
"status": "error",
"last_error": "Website timeout. Please try again later or contact support at [email protected]"
}
]
}
LLM Max Context Length Error
This error occurs when the webpage content is too large and doesn't fit within the AI model's context window. The AI processing requires the entire webpage content to fit within its maximum context length limit. When a webpage has too much text, images, or other content, it exceeds this limit and cannot be processed.
A possible solution is to use the clean_selectors parameter which allows you to exclude unneeded content (like navigation, ads, footers) before sending it to the LLM. See the cleaning documentation for more details on how to use clean selectors.
API error response example:
{
//...
"job_items": [
{
//...
"error_code": "llm_max_context_length_error",
"status": "error",
"last_error": "AI request error: maximum context length 128k tokens exceeded for this page"
}
]
}
Internal error
This error means that something went wrong on our side. Please contact us at [email protected] if you encounter this error.
API error response example:
{
//...
"job_items": [
{
//...
"error_code": "internal_error",
"status": "error",
"last_error": "Internal server error"
}
]
}