Caching
How to use caching to improve performance and reduce redundant API calls
WebCrawlerAPI provides built-in caching to avoid re-fetching the same content within a configurable time window. This improves response times and reduces redundant processing.
Default Cache Duration: 7 days (604800 seconds)
By default, all scrape and crawl requests are cached for 7 days. If you need fresh content, set `max_age` to `0` to disable caching.
How Caching Works
When you make a scrape or crawl request, WebCrawlerAPI checks if a matching result already exists in the cache. If a valid cached result is found (within the specified time window), it is returned immediately without making a new request to the target website.
Cache matching is based on:
- URL (normalized)
- Output format (`markdown`, `html`, `cleaned`)
- `main_content_only` setting
- `prompt` parameter (if specified)
- `clean_selectors` parameter (if specified)
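The lookup described above can be modeled roughly as "derive a key from the matching fields, then reuse the stored result only if it is still inside the time window". This is a conceptual sketch, not the service's implementation. The key derivation and URL normalization are not documented here, so `normalize_url`, `cache_key`, and `is_fresh` are hypothetical names. The normalization steps (lowercasing scheme and host, dropping the `#fragment`) are assumptions, and the field names follow the scrape request.

```python
import hashlib
import json
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    # Assumed normalization: lowercase scheme and host, drop the #fragment.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))


def cache_key(request: dict) -> str:
    # Only the matching fields listed above go into the key; max_age is not one of them.
    fields = {
        "url": normalize_url(request["url"]),
        "output_format": request.get("output_format"),
        "main_content_only": request.get("main_content_only"),
        "prompt": request.get("prompt"),
        "clean_selectors": request.get("clean_selectors"),
    }
    payload = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_fresh(age_seconds: int, max_age: int) -> bool:
    # max_age = 0 never reuses a cached result; otherwise the entry must be younger than the window.
    return max_age > 0 and age_seconds < max_age
```

In this model, changing the `prompt` or the output format produces a different key and therefore a cache miss. Changing only `max_age` affects whether an existing entry is reused.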
Cache Parameter
Both the scraping and crawling APIs support the max_age parameter to control caching behavior.
Parameter: max_age
Type: Integer (seconds)
Default: 604800 (7 days)
Values
| Value | Behavior |
|---|---|
| `604800` (default) | Cache results for 7 days |
| `0` | Disable cache, always fetch fresh content |
| Custom value | Cache for specified number of seconds |
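The table does not say how negative or non-integer values are handled. A client can still catch obvious mistakes before sending a request. The hypothetical helpers below accept only non-negative integers and map each value to the behavior in the table. Rejecting negative numbers, floats, and booleans is a client-side assumption, not documented API behavior.

```python
DEFAULT_MAX_AGE = 604800  # 7 days, the documented default


def validate_max_age(value: int) -> int:
    """Return value unchanged if it is a usable max_age, otherwise raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"max_age must be an integer number of seconds, got {value!r}")
    if value < 0:
        raise ValueError(f"max_age must be >= 0, got {value}")
    return value


def describe_max_age(value: int) -> str:
    # Mirrors the Values table above.
    value = validate_max_age(value)
    if value == 0:
        return "cache disabled, always fetch fresh content"
    if value == DEFAULT_MAX_AGE:
        return "default: cache results for 7 days"
    return f"cache for {value} seconds"
```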
Scraping API
Endpoint: POST https://api.webcrawlerapi.com/v2/scrape
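The following is a minimal Python sketch of calling this endpoint with only the standard library. Authentication is not covered on this page. The `Authorization: Bearer` header and the assumption that the response body is JSON are labeled in the code. `build_scrape_request` and `send` are hypothetical helper names.

```python
import http.client
import json
import urllib.request
from typing import Optional, Tuple

SCRAPE_URL = "https://api.webcrawlerapi.com/v2/scrape"


def build_scrape_request(url: str, api_key: str, output_format: str = "markdown",
                         max_age: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the scrape endpoint."""
    body = {"url": url, "output_format": output_format}
    # Omitting max_age leaves the server default (7 days); 0 disables the cache.
    if max_age is not None:
        body["max_age"] = max_age
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication (not covered on this page).
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> Tuple[dict, http.client.HTTPMessage]:
    """Send the request; return the decoded JSON body and the response headers."""
    # Assumption: the response body is JSON.
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read()), resp.headers
```

To force a refresh, call `data, headers = send(build_scrape_request("https://example.com", api_key, max_age=0))` and then inspect `headers.get("X-Cache")`. The returned headers object looks up names case-insensitively.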
Request Example
```json
{
  "url": "https://example.com",
  "output_format": "markdown",
  "max_age": 604800
}
```
Request with Cache Disabled
```json
{
  "url": "https://example.com",
  "output_format": "markdown",
  "max_age": 0
}
```
Crawling API
Endpoint: POST https://api.webcrawlerapi.com/v1/crawl
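The crawl request differs from the scrape request mainly in its fields. It takes `items_limit` and names the format `scrape_type` rather than `output_format`. The hypothetical builder below keeps the two apart. The `"markdown"` default is only the helper's default, and the check that `items_limit` is at least 1 is a client-side assumption. POST the resulting JSON to the endpoint above with `Content-Type: application/json`.

```python
import json
from typing import Optional


def build_crawl_body(url: str, items_limit: int, scrape_type: str = "markdown",
                     max_age: Optional[int] = None) -> str:
    """Return the JSON body for a crawl request (helper defaults, not API defaults)."""
    # Assumption: a crawl must request at least one item.
    if items_limit < 1:
        raise ValueError("items_limit must be at least 1")
    # Note the field name: crawl uses scrape_type where scrape uses output_format.
    body = {"url": url, "items_limit": items_limit, "scrape_type": scrape_type}
    # Leave max_age out to use the default 7-day cache; 0 disables the cache.
    if max_age is not None:
        body["max_age"] = max_age
    return json.dumps(body)
```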
Request Example
```json
{
  "url": "https://example.com",
  "items_limit": 10,
  "scrape_type": "markdown",
  "max_age": 604800
}
```
Request with Cache Disabled
```json
{
  "url": "https://example.com",
  "items_limit": 10,
  "scrape_type": "markdown",
  "max_age": 0
}
```
Response Headers
When caching is enabled, the API returns headers indicating whether the result was served from cache.
Cache Hit
When a cached result is returned:
| Header | Description |
|---|---|
| `X-Cache` | `HIT`: result was served from cache |
| `Age` | Seconds since the content was originally fetched |
| `Cache-Control` | `max-age={remaining_seconds}`: time until the cached entry expires |
| `X-Cache-Created-At` | ISO 8601 timestamp of when the content was originally fetched |
Cache Miss
When fresh content is fetched:
| Header | Description |
|---|---|
| `X-Cache` | `MISS`: fresh content was fetched |
| `Cache-Control` | `max-age={max_age}`: cache duration for this content |
| `X-Cache-Created-At` | ISO 8601 timestamp of when the content was fetched |
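One helper can read the headers from both tables. The sketch below is hypothetical (`CacheInfo` and `parse_cache_headers` are illustrative names). It assumes that `X-Cache-Created-At` uses an ISO 8601 form that Python's `datetime.fromisoformat` accepts, with a trailing `Z` rewritten to `+00:00` first, and that `Cache-Control` carries a plain `max-age=<seconds>` directive.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass
class CacheInfo:
    hit: bool
    created_at: Optional[datetime]  # from X-Cache-Created-At
    age: Optional[int]              # from Age (listed only for cache hits)
    max_age: Optional[int]          # Cache-Control: remaining seconds on a hit, full duration on a miss


def _max_age_from_cache_control(value: str) -> Optional[int]:
    # Scan comma-separated directives for max-age=<digits>.
    for directive in value.split(","):
        name, _, arg = directive.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return int(arg)
    return None


def parse_cache_headers(headers: Mapping[str, str]) -> CacheInfo:
    # HTTP header names are case-insensitive; lowercase them so plain dicts work too.
    h = {k.lower(): v for k, v in headers.items()}
    created = h.get("x-cache-created-at")
    age = h.get("age")
    return CacheInfo(
        hit=h.get("x-cache", "").upper() == "HIT",
        # Assumption: timestamp is ISO 8601, possibly ending in "Z" for UTC.
        created_at=datetime.fromisoformat(created.replace("Z", "+00:00")) if created else None,
        age=int(age) if age is not None and age.isdigit() else None,
        max_age=_max_age_from_cache_control(h.get("cache-control", "")),
    )
```

The function accepts a plain `dict` or the `headers` object of an `http.client.HTTPResponse`, since both provide `items()`.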
Common Cache Durations
| Duration | Seconds | Use Case |
|---|---|---|
| 1 hour | 3600 | Frequently updated content |
| 1 day | 86400 | Daily updates |
| 1 week | 604800 | Stable content (default) |
| 1 month | 2592000 | Rarely changing content |
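The values in the table are simple multiples of 60. The "1 month" row corresponds to 30 days. Deriving them from `datetime.timedelta` avoids arithmetic slips when you choose a custom `max_age`. The constant names and `max_age_for` are illustrative.

```python
from datetime import timedelta

ONE_HOUR = int(timedelta(hours=1).total_seconds())    # 3600
ONE_DAY = int(timedelta(days=1).total_seconds())      # 86400
ONE_WEEK = int(timedelta(weeks=1).total_seconds())    # 604800, the default
ONE_MONTH = int(timedelta(days=30).total_seconds())   # 2592000 (a 30-day month)


def max_age_for(**kwargs: float) -> int:
    """Convert e.g. max_age_for(hours=6) into whole seconds for the max_age parameter."""
    return int(timedelta(**kwargs).total_seconds())
```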
Best Practices
- Use the default caching for most use cases: 7 days works well for typical content.
- Disable the cache (`max_age: 0`) when you need the latest version of frequently updated pages.
- Use shorter cache times for news sites, social media, or real-time data.
- Use longer cache times for documentation, archived content, or static pages.
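A client can encode these guidelines as a lookup from content category to `max_age`. The category names and the values assigned to them below are assumptions for illustration only. The values come from the Common Cache Durations table, but the API does not recommend them per category.

```python
from typing import Optional

# Hypothetical mapping from content category to max_age, in seconds.
MAX_AGE_BY_CATEGORY = {
    "live": 0,            # must be the latest version: disable the cache
    "news": 3600,         # news sites, social media: 1 hour
    "daily": 86400,       # pages updated daily
    "default": 604800,    # typical content: the 7-day default
    "static": 2592000,    # documentation, archives, static pages: 30 days
}


def choose_max_age(category: Optional[str]) -> int:
    # Unknown or missing categories fall back to the documented 7-day default.
    return MAX_AGE_BY_CATEGORY.get(category or "default", MAX_AGE_BY_CATEGORY["default"])
```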