Any website to feed
Monitor websites for changes and get updates via RSS, JSON, or webhooks
What is a feed?
A feed monitors a website for changes and delivers updates automatically. The system crawls your target website periodically and notifies you when content changes.
Creating a feed
Create a feed by sending a POST request to /v1/feeds:
curl --request POST \
--url https://api.webcrawlerapi.com/v1/feeds \
--header 'Authorization: Bearer <YOUR API KEY>' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com",
"name": "Example Blog",
"scrape_type": "markdown",
"max_depth": 1,
"webhook_url": "https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1"
}'
Response:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"url": "https://example.com",
"status": "active",
"next_run_at": "2024-01-15T14:30:00Z",
"webhook_url": "https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1"
}
Parameters
The parameters are similar to those for a Job:
- url (required) - The website URL to monitor
- name (optional) - A friendly name for your feed
- scrape_type (optional) - Output format: markdown, cleaned, or html (default: markdown)
- max_depth (optional) - Maximum crawl depth (0-10)
- items_limit (optional) - Maximum pages to crawl (default: 10)
- webhook_url (optional) - URL to receive change notifications
- whitelist_regexp (optional) - Regex pattern for URLs to include
- blacklist_regexp (optional) - Regex pattern for URLs to exclude
- allow_subdomains (optional) - Include subdomains in the crawl
- respect_robots_txt (optional) - Follow robots.txt rules
- main_content_only (optional) - Extract only the main page content
See the API Reference for complete parameter details.
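As a sketch of how a client might assemble the request body above, the helper below builds and validates a feed-creation payload before POSTing it to /v1/feeds. The parameter names and ranges come from the list above; the function itself is a hypothetical convenience, not part of the API.

```python
# Build a JSON body for POST /v1/feeds, validating the documented constraints:
# scrape_type must be markdown/cleaned/html, max_depth must be 0-10.

ALLOWED_SCRAPE_TYPES = {"markdown", "cleaned", "html"}

def build_feed_payload(url, name=None, scrape_type="markdown",
                       max_depth=None, items_limit=None, webhook_url=None):
    """Return a dict suitable for the JSON body of POST /v1/feeds."""
    if scrape_type not in ALLOWED_SCRAPE_TYPES:
        raise ValueError(f"scrape_type must be one of {sorted(ALLOWED_SCRAPE_TYPES)}")
    if max_depth is not None and not 0 <= max_depth <= 10:
        raise ValueError("max_depth must be between 0 and 10")

    payload = {"url": url, "scrape_type": scrape_type}
    # Only include optional keys that were actually set.
    if name is not None:
        payload["name"] = name
    if max_depth is not None:
        payload["max_depth"] = max_depth
    if items_limit is not None:
        payload["items_limit"] = items_limit
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload

# Mirrors the curl example above.
payload = build_feed_payload(
    "https://example.com",
    name="Example Blog",
    max_depth=1,
    webhook_url="https://webhook.site/287700f1-c94e-4ccb-839f-a7dc4b0992b1",
)
```

Send the resulting dict as the JSON body with your HTTP client of choice, along with the `Authorization: Bearer <YOUR API KEY>` header.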
Getting feed updates
There are three ways to receive feed updates:
1. RSS/Atom Feed
Get updates in standard Atom 1.0 format for use with feed readers:
curl --request GET \
--url https://api.webcrawlerapi.com/v1/feeds/{feed_id}/rss \
--header 'Authorization: Bearer <YOUR API KEY>'
Subscribe to this URL in any RSS/Atom reader.
See the API Reference for details.
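If you want to consume the Atom endpoint programmatically rather than through a reader, the standard library is enough. The XML below is a hypothetical example of what the /rss endpoint might return; only the Atom 1.0 structure itself is standard.

```python
# Parse entry titles and links out of an Atom 1.0 document.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def parse_atom_entries(xml_text):
    """Return (title, link href) pairs for each <entry> in an Atom document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        title = entry.findtext(f"{ATOM_NS}title")
        link_el = entry.find(f"{ATOM_NS}link")
        href = link_el.get("href") if link_el is not None else None
        entries.append((title, href))
    return entries

# Hypothetical response body for illustration.
sample = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry>
    <title>Article Title</title>
    <link href="https://example.com/article-1"/>
  </entry>
</feed>"""

entries = parse_atom_entries(sample)
```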
2. JSON Feed
Get updates in JSON Feed format:
curl --request GET \
--url https://api.webcrawlerapi.com/v1/feeds/{feed_id}/json \
--header 'Authorization: Bearer <YOUR API KEY>'
Response:
{
"version": "https://jsonfeed.org/version/1",
"title": "Example Blog",
"home_page_url": "https://example.com",
"feed_url": "https://api.webcrawlerapi.com/v1/feeds/{feed_id}/json",
"items": [
{
"id": "https://example.com/article-1",
"url": "https://example.com/article-1",
"title": "Article Title",
"content_text": "Article content...",
"date_published": "2024-01-15T14:30:00Z"
}
]
}
See the API Reference for details.
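A common pattern with the JSON endpoint is polling it and keeping only items published since your last check. This sketch uses the item fields shown in the response above (url, date_published); the filtering helper is a hypothetical client-side convenience.

```python
# Filter a JSON Feed response to items published strictly after `since`.
import json
from datetime import datetime, timezone

def items_since(feed_json, since):
    """Return URLs of items whose date_published is after `since`."""
    new_urls = []
    for item in feed_json.get("items", []):
        # JSON Feed uses RFC 3339 timestamps; normalize the trailing Z.
        published = datetime.fromisoformat(
            item["date_published"].replace("Z", "+00:00")
        )
        if published > since:
            new_urls.append(item["url"])
    return new_urls

# Trimmed version of the sample response above.
feed = json.loads("""{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example Blog",
  "items": [
    {"id": "https://example.com/article-1",
     "url": "https://example.com/article-1",
     "title": "Article Title",
     "date_published": "2024-01-15T14:30:00Z"}
  ]
}""")

last_checked = datetime(2024, 1, 14, tzinfo=timezone.utc)
new = items_since(feed, last_checked)
```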
3. Webhooks
Receive a POST request when changes are detected. Add webhook_url when creating your feed:
{
"url": "https://example.com",
"webhook_url": "https://yourserver.com/webhook"
}
When changes are detected, you'll receive a POST request with the feed run details, including change information. Each changed or new page includes a content_url field pointing to the page content in the format specified by your feed's scrape_type setting (markdown, cleaned, or html).
Feed status
Get information about a feed and its recent runs:
curl --request GET \
--url https://api.webcrawlerapi.com/v1/feeds/{feed_id} \
--header 'Authorization: Bearer <YOUR API KEY>'
Response:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"url": "https://example.com",
"name": "Example Blog",
"scrape_type": "markdown",
"items_limit": 10,
"status": "active",
"next_run_at": "2024-01-16T14:30:00Z",
"last_run_at": "2024-01-15T14:30:00Z",
"created_at": "2024-01-01T10:00:00Z",
"recent_runs": [
{
"id": "run-123",
"status": "completed",
"pages_crawled": 10,
"pages_changed": 2,
"pages_new": 1,
"pages_unavailable": 0,
"pages_errors": 0,
"cost_usd": 0.002,
"started_at": "2024-01-15T14:30:00Z",
"finished_at": "2024-01-15T14:32:00Z"
}
]
}
Response fields
- status - Feed status: active, paused, or canceled
- next_run_at - When the next crawl will run
- last_run_at - When the last crawl completed
- recent_runs - Array of recent feed runs
- pages_crawled - Total pages processed
- pages_changed - Pages with content changes
- pages_new - Newly discovered pages
- pages_unavailable - Pages that returned 404 or similar
- pages_errors - Pages that failed to load
- cost_usd - Cost in USD for this run
See the API Reference for all available fields.
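The recent_runs array is handy for lightweight monitoring, for example totaling cost and change counts across runs. This helper only uses fields documented above and is a hypothetical client-side sketch.

```python
# Summarize a feed-status recent_runs array: total cost and change counts.

def summarize_runs(recent_runs):
    """Aggregate cost_usd, pages_changed, and pages_new across runs."""
    total_cost = sum(r.get("cost_usd", 0) for r in recent_runs)
    total_changed = sum(r.get("pages_changed", 0) for r in recent_runs)
    total_new = sum(r.get("pages_new", 0) for r in recent_runs)
    return {
        "cost_usd": round(total_cost, 6),  # avoid float noise in totals
        "pages_changed": total_changed,
        "pages_new": total_new,
    }

# The run from the sample response above.
runs = [{"id": "run-123", "status": "completed", "pages_crawled": 10,
         "pages_changed": 2, "pages_new": 1, "cost_usd": 0.002}]
summary = summarize_runs(runs)
```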
Error handling
Feeds are automatically paused after 3 consecutive errors. This prevents unnecessary charges when a website becomes unreachable. You can resume the feed manually once the issue is resolved.
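The auto-pause rule can also be mirrored client-side, for example to alert an operator before the service pauses a feed. The "error" run-status value is an assumption here (the docs above only show "completed"); the threshold of 3 consecutive errors comes from the paragraph above.

```python
# Mirror the documented auto-pause rule: pause after 3 consecutive failed runs.

MAX_CONSECUTIVE_ERRORS = 3

def should_pause(run_statuses):
    """True if the last 3 runs (oldest first, most recent last) all failed.

    "error" as a run-status value is an assumption for this sketch.
    """
    tail = run_statuses[-MAX_CONSECUTIVE_ERRORS:]
    return len(tail) == MAX_CONSECUTIVE_ERRORS and all(s == "error" for s in tail)
```

For example, `should_pause(["completed", "error", "error", "error"])` is true, while a successful run anywhere in the last three resets the condition.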
Managing feeds
- List all feeds: GET /v1/feeds
- Pause a feed: POST /v1/feeds/{id}/pause
- Resume a feed: POST /v1/feeds/{id}/resume
- Delete a feed: DELETE /v1/feeds/{id}
- Force run: POST /v1/feeds/{id}/run
See the API Reference for each endpoint.
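The management endpoints above map cleanly onto a small routing helper. The base URL and paths come from this document; the helper function itself is a hypothetical convenience for building requests.

```python
# Map a feed-management action to its (HTTP method, full URL).

BASE = "https://api.webcrawlerapi.com"

def feed_endpoint(action, feed_id=None):
    """Return the (method, url) pair for a feed-management action."""
    routes = {
        "list":   ("GET",    "/v1/feeds"),
        "get":    ("GET",    f"/v1/feeds/{feed_id}"),
        "pause":  ("POST",   f"/v1/feeds/{feed_id}/pause"),
        "resume": ("POST",   f"/v1/feeds/{feed_id}/resume"),
        "delete": ("DELETE", f"/v1/feeds/{feed_id}"),
        "run":    ("POST",   f"/v1/feeds/{feed_id}/run"),
    }
    method, path = routes[action]
    return method, BASE + path
```

For example, `feed_endpoint("pause", "abc")` yields `("POST", "https://api.webcrawlerapi.com/v1/feeds/abc/pause")`; pair the result with your `Authorization: Bearer` header in any HTTP client.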