WebCrawlerAPI

Turn any website into a feed

Monitor websites for changes and get updates via RSS, JSON, or webhooks

What is a feed?

A feed monitors a website for changes and delivers updates automatically. The system crawls your target website periodically and notifies you when content changes.

Creating a feed

Create a feed by sending a POST request to /v1/feeds:

curl --request POST \
  --url https://api.webcrawlerapi.com/v1/feeds \
  --header 'Authorization: Bearer <YOUR API KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "name": "Example Blog",
    "scrape_type": "markdown",
    "max_depth": 1,
    "webhook_url": "https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1"
  }'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "https://example.com",
  "status": "active",
  "next_run_at": "2024-01-15T14:30:00Z",
  "webhook_url": "https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1"
}
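The same request can be issued from Python using only the standard library. This is a minimal sketch that builds the POST shown above; the endpoint URL and body fields come from the curl example, and `API_KEY` is a placeholder you must replace:

```python
import json
import urllib.request

API_KEY = "<YOUR API KEY>"  # replace with your real key

def create_feed_request(url, name=None, scrape_type="markdown",
                        max_depth=1, webhook_url=None):
    """Build the POST /v1/feeds request shown in the curl example."""
    payload = {"url": url, "scrape_type": scrape_type, "max_depth": max_depth}
    if name:
        payload["name"] = name
    if webhook_url:
        payload["webhook_url"] = webhook_url
    return urllib.request.Request(
        "https://api.webcrawlerapi.com/v1/feeds",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it:
# feed = json.load(urllib.request.urlopen(
#     create_feed_request("https://example.com", name="Example Blog")))
```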

Parameters

Feed parameters largely mirror those of a crawl Job:

  • url (required) - The website URL to monitor
  • name (optional) - A friendly name for your feed
  • scrape_type (optional) - Output format: markdown, cleaned, or html (default: markdown)
  • max_depth (optional) - Maximum crawl depth (0-10)
  • items_limit (optional) - Maximum pages to crawl (default: 10)
  • webhook_url (optional) - URL to receive change notifications
  • whitelist_regexp (optional) - Regex pattern to include URLs
  • blacklist_regexp (optional) - Regex pattern to exclude URLs
  • allow_subdomains (optional) - Include subdomains in crawl
  • respect_robots_txt (optional) - Follow robots.txt rules
  • main_content_only (optional) - Extract main content only

See the API Reference for complete parameter details.
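To illustrate how `whitelist_regexp` and `blacklist_regexp` can be expected to interact, here is a sketch of the matching logic. This is an assumption for illustration only; the server-side rules may differ in detail, so consult the API Reference:

```python
import re

def url_allowed(url, whitelist_regexp=None, blacklist_regexp=None):
    """Illustrative sketch: a URL is crawled if it matches the whitelist
    (when one is set) and does not match the blacklist."""
    if whitelist_regexp and not re.search(whitelist_regexp, url):
        return False
    if blacklist_regexp and re.search(blacklist_regexp, url):
        return False
    return True
```

For example, `whitelist_regexp="/blog/"` with `blacklist_regexp="/tag/"` would keep article pages while skipping tag-listing pages.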

Getting feed updates

There are three ways to receive feed updates:

1. RSS/Atom Feed

Get updates in standard Atom 1.0 format for use with feed readers:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/feeds/{feed_id}/rss \
  --header 'Authorization: Bearer <YOUR API KEY>'

Subscribe to this URL in any RSS/Atom reader.

See the API Reference for details.
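If you consume the Atom endpoint programmatically rather than through a reader, the standard library's XML parser is enough. A minimal sketch, shown against a hypothetical sample document (the real feed will contain more fields; see the API Reference):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def atom_entries(xml_text):
    """Extract (title, href) pairs from an Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        title = entry.findtext(f"{ATOM_NS}title", default="")
        link = entry.find(f"{ATOM_NS}link")
        href = link.get("href") if link is not None else None
        entries.append((title, href))
    return entries

# Hypothetical sample of what the endpoint returns:
sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry>
    <title>Article Title</title>
    <link href="https://example.com/article-1"/>
  </entry>
</feed>"""
```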

2. JSON Feed

Get updates in JSON Feed format:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/feeds/{feed_id}/json \
  --header 'Authorization: Bearer <YOUR API KEY>'

Response:

{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example Blog",
  "home_page_url": "https://example.com",
  "feed_url": "https://api.webcrawlerapi.com/v1/feeds/{feed_id}/json",
  "items": [
    {
      "id": "https://example.com/article-1",
      "url": "https://example.com/article-1",
      "title": "Article Title",
      "content_text": "Article content...",
      "date_published": "2024-01-15T14:30:00Z"
    }
  ]
}

See the API Reference for details.
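When polling the JSON endpoint, you typically want only items you have not processed before. A small sketch of that bookkeeping, using a hypothetical `new_items` helper and a sample mirroring the response above (JSON Feed item `id`s are stable, so they work as deduplication keys):

```python
def new_items(feed, seen_ids):
    """Return items whose id has not been seen yet, and record them."""
    fresh = [item for item in feed.get("items", [])
             if item["id"] not in seen_ids]
    seen_ids.update(item["id"] for item in fresh)
    return fresh

# Sample shaped like the JSON Feed response above:
sample_feed = {
    "version": "https://jsonfeed.org/version/1",
    "title": "Example Blog",
    "items": [
        {"id": "https://example.com/article-1",
         "url": "https://example.com/article-1",
         "title": "Article Title"},
    ],
}
```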

3. Webhooks

Receive a POST request when changes are detected. Add webhook_url when creating your feed:

{
  "url": "https://example.com",
  "webhook_url": "https://yourserver.com/webhook"
}

When changes are detected, you'll receive a POST request with the feed run details, including change information. Each changed or new page includes a content_url field pointing to the page content in the format specified by your feed's scrape_type setting (markdown, cleaned, or html).
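On the receiving side, your handler would extract the pages of interest and then fetch each content_url. The exact payload shape is documented in the API Reference; this sketch assumes a `pages` list with `status`, `url`, and `content_url` fields, which is an illustrative guess:

```python
def changed_pages(event):
    """Hypothetical helper: pull (url, content_url) pairs for pages the
    webhook reports as changed or new. Assumes a 'pages' list in the
    payload; check the API Reference for the actual field names."""
    return [(p["url"], p["content_url"])
            for p in event.get("pages", [])
            if p.get("status") in ("changed", "new")]
```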

Feed status

Get information about a feed and its recent runs:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/feeds/{feed_id} \
  --header 'Authorization: Bearer <YOUR API KEY>'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "https://example.com",
  "name": "Example Blog",
  "scrape_type": "markdown",
  "items_limit": 10,
  "status": "active",
  "next_run_at": "2024-01-16T14:30:00Z",
  "last_run_at": "2024-01-15T14:30:00Z",
  "created_at": "2024-01-01T10:00:00Z",
  "recent_runs": [
    {
      "id": "run-123",
      "status": "completed",
      "pages_crawled": 10,
      "pages_changed": 2,
      "pages_new": 1,
      "pages_unavailable": 0,
      "pages_errors": 0,
      "cost_usd": 0.002,
      "started_at": "2024-01-15T14:30:00Z",
      "finished_at": "2024-01-15T14:32:00Z"
    }
  ]
}

Response fields

  • status - Feed status: active, paused, or canceled
  • next_run_at - When the next crawl will run
  • last_run_at - When the last crawl completed
  • recent_runs - Array of recent feed runs
    • pages_crawled - Total pages processed
    • pages_changed - Pages with content changes
    • pages_new - Newly discovered pages
    • pages_unavailable - Pages that returned 404 or similar
    • pages_errors - Pages that failed to load
    • cost_usd - Cost in USD for this run

See the API Reference for all available fields.
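The recent_runs array is convenient for dashboards. A small sketch that aggregates the counters and cost across runs, using the field names from the response above (`run_summary` is a hypothetical helper, not part of the API):

```python
def run_summary(feed):
    """Aggregate counters from the recent_runs array of
    GET /v1/feeds/{feed_id}."""
    runs = feed.get("recent_runs", [])
    return {
        "runs": len(runs),
        "pages_crawled": sum(r.get("pages_crawled", 0) for r in runs),
        "pages_changed": sum(r.get("pages_changed", 0) for r in runs),
        "pages_new": sum(r.get("pages_new", 0) for r in runs),
        "cost_usd": round(sum(r.get("cost_usd", 0.0) for r in runs), 6),
    }
```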

Error handling

Feeds are automatically paused after 3 consecutive errors. This prevents unnecessary charges when a website becomes unreachable. You can resume the feed manually once the issue is resolved.
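Since a paused feed stops delivering updates until you resume it, it is worth checking for this state when you poll feed status. A sketch of such a check; the run `status` value `"error"` is an assumption here (the response example above only shows `"completed"`):

```python
def needs_attention(feed, max_consecutive_errors=3):
    """Flag a feed that appears to have been auto-paused after
    consecutive failed runs, so it can be resumed manually once the
    target site is reachable again. The 'error' run status is assumed."""
    if feed.get("status") != "paused":
        return False
    recent = feed.get("recent_runs", [])[:max_consecutive_errors]
    # No run history: still flag the paused feed for review.
    return all(r.get("status") == "error" for r in recent) if recent else True
```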

Managing feeds