WebCrawlerAPI

Caching

How to use caching to improve performance and reduce redundant API calls

WebCrawlerAPI provides built-in caching to avoid re-fetching the same content within a configurable time window. This improves response times and reduces redundant processing.

Default Cache Duration: 7 days (604800 seconds)

By default, all scrape and crawl requests are cached for 7 days. If you need fresh content, set max_age to 0 to disable caching.

How Caching Works

When you make a scrape or crawl request, WebCrawlerAPI checks whether a matching result already exists in the cache. If a valid cached result is found (one fetched within the time window set by max_age), it is returned immediately, and no new request is made to the target website.
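The freshness rule described above can be modeled as a simple age check. This is a minimal sketch under stated assumptions, not WebCrawlerAPI's server code. It assumes a cached entry is reused only while its age in seconds is strictly below the request's max_age, and that a max_age of 0 always forces a fresh fetch. The function name is_cache_usable is hypothetical.

```python
from datetime import datetime, timezone


def is_cache_usable(created_at: datetime, max_age: int, now: datetime) -> bool:
    """Hypothetical model: may an entry created at `created_at` be served at `now`?"""
    if max_age <= 0:
        # max_age of 0 disables the cache, so a fresh fetch is always made
        return False
    age_seconds = (now - created_at).total_seconds()
    # Assumption: the entry stays valid while it is younger than max_age
    return age_seconds < max_age


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 3, tzinfo=timezone.utc)  # two days (172800 s) later
print(is_cache_usable(created, 604800, now))  # True: 2 days is within 7 days
print(is_cache_usable(created, 86400, now))   # False: 2 days exceeds 1 day
print(is_cache_usable(created, 0, now))       # False: cache disabled
```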

Cache matching is based on:

  • URL (normalized)
  • Output format (markdown, html, cleaned)
  • main_content_only setting
  • prompt parameter (if specified)
  • clean_selectors parameter (if specified)
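To see why two requests may or may not share a cache entry, it can help to picture the matching fields listed above being combined into one key. The sketch below is illustrative only. The service's real URL normalization and key format are not documented here, so normalize_url (which lowercases the scheme and host, defaults an empty path to "/", and drops the fragment) and cache_key are hypothetical helpers. The types and defaults of main_content_only and clean_selectors are also assumptions.

```python
import hashlib
import json
from typing import Any, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme/host, default the path, drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def cache_key(
    url: str,
    output_format: str,
    main_content_only: bool = False,
    prompt: Optional[str] = None,
    clean_selectors: Any = None,
) -> str:
    """Hash the documented matching fields into one deterministic key (illustrative)."""
    fields = {
        "url": normalize_url(url),
        "output_format": output_format,
        "main_content_only": main_content_only,
        "prompt": prompt,
        "clean_selectors": clean_selectors,
    }
    # sort_keys makes the serialized form independent of dict insertion order
    payload = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = cache_key("https://Example.com", "markdown")
print(a == cache_key("https://example.com/#top", "markdown"))  # True: same normalized URL
print(a == cache_key("https://example.com", "html"))           # False: different format
```

Under this model, changing any one matching field (format, main_content_only, prompt, or clean_selectors) produces a different key. That is why such a request would be a cache miss even for a URL that was recently fetched.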

Cache Parameter

Both the scraping and crawling APIs support the max_age parameter to control caching behavior.

Parameter: max_age
Type: Integer (seconds)
Default: 604800 (7 days)

Values

Value              Behavior
604800 (default)   Cache results for 7 days
0                  Disable the cache; always fetch fresh content
Custom value       Cache for the specified number of seconds

Scraping API

Endpoint: POST https://api.webcrawlerapi.com/v2/scrape

Request Example

{
    "url": "https://example.com",
    "output_format": "markdown",
    "max_age": 604800
}

Request with Cache Disabled

{
    "url": "https://example.com",
    "output_format": "markdown",
    "max_age": 0
}
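The two scrape examples above differ only in max_age. The following minimal Python sketch uses only the standard library to build such a request without sending it. Two things in it are assumptions not covered on this page: the helper name build_scrape_request is hypothetical, and the Bearer-token Authorization header is a guess at the auth scheme, so check your account's API documentation. Passing max_age=None leaves the field out, so the documented 7-day default applies.

```python
import json
import urllib.request
from typing import Optional

SCRAPE_URL = "https://api.webcrawlerapi.com/v2/scrape"


def build_scrape_request(
    url: str,
    api_key: str,
    output_format: str = "markdown",
    max_age: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the scrape endpoint.

    max_age=None omits the field (server default of 604800 s applies);
    max_age=0 disables the cache and forces a fresh fetch.
    """
    body = {"url": url, "output_format": output_format}
    if max_age is not None:
        body["max_age"] = max_age
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token authentication
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


fresh_request = build_scrape_request("https://example.com", "YOUR_API_KEY", max_age=0)
```

To actually send the request, pass it to urllib.request.urlopen(fresh_request). The X-Cache response header then shows whether the result came from the cache.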

Crawling API

Endpoint: POST https://api.webcrawlerapi.com/v1/crawl

Request Example

{
    "url": "https://example.com",
    "items_limit": 10,
    "scrape_type": "markdown",
    "max_age": 604800
}

Request with Cache Disabled

{
    "url": "https://example.com",
    "items_limit": 10,
    "scrape_type": "markdown",
    "max_age": 0
}
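For crawls, the same idea applies with the crawl-specific fields shown above. This sketch always sends max_age explicitly and rejects values that are not non-negative integers before the request is built. That guard is a client-side assumption, not documented API validation. The helper name build_crawl_request and the Bearer-token header are also assumptions.

```python
import json
import urllib.request

CRAWL_URL = "https://api.webcrawlerapi.com/v1/crawl"
DEFAULT_MAX_AGE = 604800  # 7 days, the documented default


def build_crawl_request(
    url: str,
    api_key: str,
    items_limit: int = 10,
    scrape_type: str = "markdown",
    max_age: int = DEFAULT_MAX_AGE,
) -> urllib.request.Request:
    """Build (but do not send) a crawl request with an explicit max_age."""
    # Client-side guard (assumption): max_age must be a non-negative integer of seconds
    if isinstance(max_age, bool) or not isinstance(max_age, int) or max_age < 0:
        raise ValueError("max_age must be a non-negative integer number of seconds")
    body = {
        "url": url,
        "items_limit": items_limit,
        "scrape_type": scrape_type,
        "max_age": max_age,
    }
    return urllib.request.Request(
        CRAWL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


cached_crawl = build_crawl_request("https://example.com", "YOUR_API_KEY")
fresh_crawl = build_crawl_request("https://example.com", "YOUR_API_KEY", max_age=0)
```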

Response Headers

When caching is enabled, the API returns headers indicating whether the result was served from cache.

Cache Hit

When a cached result is returned:

Header               Description
X-Cache              HIT - result was served from cache
Age                  Seconds since the content was originally fetched
Cache-Control        max-age={remaining_seconds} - time until the cache entry expires
X-Cache-Created-At   ISO 8601 timestamp of when the content was originally fetched

Cache Miss

When fresh content is fetched:

Header               Description
X-Cache              MISS - fresh content was fetched
Cache-Control        max-age={max_age} - cache duration for this content
X-Cache-Created-At   ISO 8601 timestamp of when the content was fetched
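A client can read the headers listed above to log cache behavior or to decide whether to retry with max_age set to 0. The sketch below parses them case-insensitively into a small record. The names CacheInfo and parse_cache_headers are hypothetical. The code also assumes the timestamp is either a form datetime.fromisoformat accepts or uses a trailing "Z" for UTC, which it converts before parsing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass
class CacheInfo:
    hit: bool
    age_seconds: Optional[int]      # Age header, documented only for cache hits
    ttl_seconds: Optional[int]      # max-age directive from Cache-Control
    created_at: Optional[datetime]  # X-Cache-Created-At


def parse_cache_headers(headers: Mapping[str, str]) -> CacheInfo:
    # HTTP header names are case-insensitive, so normalize keys first
    h = {k.lower(): v for k, v in headers.items()}
    hit = h.get("x-cache", "").upper() == "HIT"
    age = int(h["age"]) if "age" in h else None
    ttl = None
    for directive in h.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            ttl = int(value)
    created = h.get("x-cache-created-at")
    # Older Python versions reject a trailing "Z", so rewrite it as +00:00
    created_at = datetime.fromisoformat(created.replace("Z", "+00:00")) if created else None
    return CacheInfo(hit=hit, age_seconds=age, ttl_seconds=ttl, created_at=created_at)
```

The function takes any mapping-like object with an items() method, for example the headers of a response returned by urllib.request.urlopen.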

Common Cache Durations

Duration   Seconds    Use case
1 hour     3600       Frequently updated content
1 day      86400      Daily updates
1 week     604800     Stable content (default)
1 month    2592000    Rarely changing content
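Rather than hard-coding the second counts, you can derive them from datetime.timedelta, which makes the intended duration obvious in code. The table's 1-month value, 2592000 seconds, corresponds to 30 days. The helper name to_max_age is a hypothetical convenience.

```python
from datetime import timedelta


def to_max_age(delta: timedelta) -> int:
    """Convert a timedelta into whole seconds for the max_age parameter."""
    return int(delta.total_seconds())


# Values from the table above
ONE_HOUR = to_max_age(timedelta(hours=1))    # 3600
ONE_DAY = to_max_age(timedelta(days=1))      # 86400
ONE_WEEK = to_max_age(timedelta(weeks=1))    # 604800 (default)
ONE_MONTH = to_max_age(timedelta(days=30))   # 2592000
```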

Best Practices

  1. Use the default cache duration for most use cases: 7 days works well for typical content.
  2. Disable the cache (max_age: 0) when you need the latest version of frequently updated pages.
  3. Use shorter cache durations for news sites, social media, or real-time data.
  4. Use longer cache durations for documentation, archived content, or static pages.
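One way to apply these practices consistently is to keep a single mapping from content type to max_age. The category names below are hypothetical labels chosen for illustration. The values come from the durations table on this page, and anything unrecognized falls back to the 7-day default.

```python
# Hypothetical content categories mapped to the durations suggested on this page
SUGGESTED_MAX_AGE = {
    "realtime": 0,      # always fetch the latest version
    "news": 3600,       # frequently updated content: 1 hour
    "daily": 86400,     # daily updates: 1 day
    "docs": 2592000,    # documentation, archived or static pages: 1 month
}
DEFAULT_MAX_AGE = 604800  # 7 days, the API default


def suggest_max_age(category: str) -> int:
    """Return a max_age for a content category, falling back to the default."""
    return SUGGESTED_MAX_AGE.get(category, DEFAULT_MAX_AGE)
```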