WebCrawlerAPI

Turn any website into a feed

Monitor websites for changes and get updates via RSS, JSON, or webhooks

What is a feed?

A feed monitors a website for changes and delivers updates automatically. The system crawls your target website periodically and notifies you when content changes.

Creating a feed

Create a feed by sending a POST request to /v1/feeds:

curl --request POST \
  --url https://api.webcrawlerapi.com/v1/feeds \
  --header 'Authorization: Bearer <YOUR API KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "name": "Example Blog",
    "scrape_type": "markdown",
    "max_depth": 1,
    "webhook_url": "https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1"
  }'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "https://example.com",
  "status": "active",
  "next_run_at": "2024-01-15T14:30:00Z",
  "webhook_url": "https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1"
}
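The same request can be issued from Python using only the standard library. This is a minimal sketch that builds the POST shown above; the endpoint URL and body fields come from the curl example, and `API_KEY` is a placeholder you must replace:

```python
import json
import urllib.request

API_KEY = "<YOUR API KEY>"  # replace with your real key

def create_feed_request(url, name=None, scrape_type="markdown",
                        max_depth=1, webhook_url=None):
    """Build the POST /v1/feeds request shown in the curl example."""
    payload = {"url": url, "scrape_type": scrape_type, "max_depth": max_depth}
    if name:
        payload["name"] = name
    if webhook_url:
        payload["webhook_url"] = webhook_url
    return urllib.request.Request(
        "https://api.webcrawlerapi.com/v1/feeds",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it:
# feed = json.load(urllib.request.urlopen(
#     create_feed_request("https://example.com", name="Example Blog")))
```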

Parameters

Feed parameters largely mirror those of a crawl Job:

  • url (required) - The website URL to monitor
  • name (optional) - A friendly name for your feed
  • scrape_type (optional) - Output format: markdown, cleaned, or html (default: markdown)
  • max_depth (optional) - Maximum crawl depth (0-10)
  • items_limit (optional) - Maximum pages to crawl (default: 10)
  • webhook_url (optional) - URL to receive change notifications
  • whitelist_regexp (optional) - Regex pattern to include URLs
  • blacklist_regexp (optional) - Regex pattern to exclude URLs
  • allow_subdomains (optional) - Include subdomains in crawl
  • respect_robots_txt (optional) - Follow robots.txt rules
  • main_content_only (optional) - Extract main content only

See the API Reference for complete parameter details.
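To illustrate how `whitelist_regexp` and `blacklist_regexp` can be expected to interact, here is a sketch of the matching logic. This is an assumption for illustration only; the server-side rules may differ in detail, so consult the API Reference:

```python
import re

def url_allowed(url, whitelist_regexp=None, blacklist_regexp=None):
    """Illustrative sketch: a URL is crawled if it matches the whitelist
    (when one is set) and does not match the blacklist."""
    if whitelist_regexp and not re.search(whitelist_regexp, url):
        return False
    if blacklist_regexp and re.search(blacklist_regexp, url):
        return False
    return True
```

For example, `whitelist_regexp="/blog/"` with `blacklist_regexp="/tag/"` would keep article pages while skipping tag-listing pages.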

Getting feed updates

There are three ways to receive feed updates:

1. RSS/Atom Feed

Get updates in standard Atom 1.0 format for use with feed readers:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/feeds/{feed_id}/rss \
  --header 'Authorization: Bearer <YOUR API KEY>'

Subscribe to this URL in any RSS/Atom reader.

See the API Reference for details.
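If you consume the Atom endpoint programmatically rather than through a reader, the standard library's XML parser is enough. A minimal sketch, shown against a hypothetical sample document (the real feed will contain more fields; see the API Reference):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def atom_entries(xml_text):
    """Extract (title, href) pairs from an Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        title = entry.findtext(f"{ATOM_NS}title", default="")
        link = entry.find(f"{ATOM_NS}link")
        href = link.get("href") if link is not None else None
        entries.append((title, href))
    return entries

# Hypothetical sample of what the endpoint returns:
sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry>
    <title>Article Title</title>
    <link href="https://example.com/article-1"/>
  </entry>
</feed>"""
```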

2. JSON Feed

Get updates in JSON Feed format:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/feeds/{feed_id}/json \
  --header 'Authorization: Bearer <YOUR API KEY>'

Response:

{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example Blog",
  "home_page_url": "https://example.com",
  "feed_url": "https://api.webcrawlerapi.com/v1/feeds/{feed_id}/json",
  "items": [
    {
      "id": "https://example.com/article-1",
      "url": "https://example.com/article-1",
      "title": "Article Title",
      "content_text": "Article content...",
      "date_published": "2024-01-15T14:30:00Z"
    }
  ]
}

See the API Reference for details.
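When polling the JSON endpoint, you typically want only items you have not processed before. A small sketch of that bookkeeping, using a hypothetical `new_items` helper and a sample mirroring the response above (JSON Feed item `id`s are stable, so they work as deduplication keys):

```python
def new_items(feed, seen_ids):
    """Return items whose id has not been seen yet, and record them."""
    fresh = [item for item in feed.get("items", [])
             if item["id"] not in seen_ids]
    seen_ids.update(item["id"] for item in fresh)
    return fresh

# Sample shaped like the JSON Feed response above:
sample_feed = {
    "version": "https://jsonfeed.org/version/1",
    "title": "Example Blog",
    "items": [
        {"id": "https://example.com/article-1",
         "url": "https://example.com/article-1",
         "title": "Article Title"},
    ],
}
```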

3. Webhooks

Receive a POST request when changes are detected. Add webhook_url when creating your feed:

{
  "url": "https://example.com",
  "webhook_url": "https://yourserver.com/webhook"
}

When changes are detected, you'll receive a POST request with the feed run details, including change information. Each changed or new page includes a content_url field pointing to the page content in the format specified by your feed's scrape_type setting (markdown, cleaned, or html).
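On the receiving side, your handler would extract the pages of interest and then fetch each content_url. The exact payload shape is documented in the API Reference; this sketch assumes a `pages` list with `status`, `url`, and `content_url` fields, which is an illustrative guess:

```python
def changed_pages(event):
    """Hypothetical helper: pull (url, content_url) pairs for pages the
    webhook reports as changed or new. Assumes a 'pages' list in the
    payload; check the API Reference for the actual field names."""
    return [(p["url"], p["content_url"])
            for p in event.get("pages", [])
            if p.get("status") in ("changed", "new")]
```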

Feed status

Get information about a feed and its recent runs:

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/feeds/{feed_id} \
  --header 'Authorization: Bearer <YOUR API KEY>'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "https://example.com",
  "name": "Example Blog",
  "scrape_type": "markdown",
  "items_limit": 10,
  "status": "active",
  "next_run_at": "2024-01-16T14:30:00Z",
  "last_run_at": "2024-01-15T14:30:00Z",
  "created_at": "2024-01-01T10:00:00Z",
  "recent_runs": [
    {
      "id": "run-123",
      "status": "completed",
      "pages_crawled": 10,
      "pages_changed": 2,
      "pages_new": 1,
      "pages_unavailable": 0,
      "pages_errors": 0,
      "cost_usd": 0.002,
      "started_at": "2024-01-15T14:30:00Z",
      "finished_at": "2024-01-15T14:32:00Z"
    }
  ]
}

Response fields

  • status - Feed status: active, paused, or canceled
  • next_run_at - When the next crawl will run
  • last_run_at - When the last crawl completed
  • recent_runs - Array of recent feed runs
    • pages_crawled - Total pages processed
    • pages_changed - Pages with content changes
    • pages_new - Newly discovered pages
    • pages_unavailable - Pages that returned 404 or similar
    • pages_errors - Pages that failed to load
    • cost_usd - Cost in USD for this run

See the API Reference for all available fields.
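The recent_runs array is convenient for dashboards. A small sketch that aggregates the counters and cost across runs, using the field names from the response above (`run_summary` is a hypothetical helper, not part of the API):

```python
def run_summary(feed):
    """Aggregate counters from the recent_runs array of
    GET /v1/feeds/{feed_id}."""
    runs = feed.get("recent_runs", [])
    return {
        "runs": len(runs),
        "pages_crawled": sum(r.get("pages_crawled", 0) for r in runs),
        "pages_changed": sum(r.get("pages_changed", 0) for r in runs),
        "pages_new": sum(r.get("pages_new", 0) for r in runs),
        "cost_usd": round(sum(r.get("cost_usd", 0.0) for r in runs), 6),
    }
```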

Error handling

Feeds are automatically paused after 3 consecutive errors. This prevents unnecessary charges when a website becomes unreachable. You can resume the feed manually once the issue is resolved.
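Since a paused feed stops delivering updates until you resume it, it is worth checking for this state when you poll feed status. A sketch of such a check; the run `status` value `"error"` is an assumption here (the response example above only shows `"completed"`):

```python
def needs_attention(feed, max_consecutive_errors=3):
    """Flag a feed that appears to have been auto-paused after
    consecutive failed runs, so it can be resumed manually once the
    target site is reachable again. The 'error' run status is assumed."""
    if feed.get("status") != "paused":
        return False
    recent = feed.get("recent_runs", [])[:max_consecutive_errors]
    # No run history: still flag the paused feed for review.
    return all(r.get("status") == "error" for r in recent) if recent else True
```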

Managing feeds