# JavaScript and TypeScript (Node.js) WebCrawler API SDK

## Installation

```bash
npm i webcrawlerapi-js
```

## Usage

### Synchronous Crawling

The synchronous method waits for the crawl to complete and returns all data at once.
```javascript
import webcrawlerapi from "webcrawlerapi-js";

const client = new webcrawlerapi.WebcrawlerClient("YOUR_API_KEY");

// Synchronous crawling: resolves once the whole job has finished
const result = await client.crawl({
    "url": "https://stripe.com/",
    "scrape_type": "markdown",
    "items_limit": 10
});
console.log(result);

// Each crawled page is a job item; see GetContent below
for (const item of result.job_items) {
    const content = await item.getContent();
    console.log(content.slice(0, 100));
}
```
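Because `getContent()` returns a promise, item contents can also be fetched concurrently instead of one at a time. A minimal sketch, reusing the `result` object from the example above:

```javascript
// Fetch every item's content in parallel (sketch; assumes the
// `result` object returned by client.crawl above)
const contents = await Promise.all(
    result.job_items.map((item) => item.getContent())
);
contents.forEach((content) => console.log(content.slice(0, 100)));
```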
### Asynchronous Crawling

The asynchronous method returns a job ID immediately and allows you to check the status later.

```javascript
import webcrawlerapi from "webcrawlerapi-js";

const client = new webcrawlerapi.WebcrawlerClient("YOUR_API_KEY");

// Start the async crawl job
const job = await client.crawlAsync({
    "url": "https://stripe.com/",
    "scrape_type": "markdown",
    "items_limit": 10
});

// Get the job ID
const jobId = job.id;

// Check job status
let jobStatus = await client.getJob(jobId);
console.log(jobStatus);

// Poll the job status until it is complete, waiting the
// server-recommended delay between requests
while (jobStatus.status === 'in_progress') {
    await new Promise(resolve => setTimeout(resolve, jobStatus.recommended_pull_delay_ms));
    jobStatus = await client.getJob(jobId);
}
console.log('Final result:', jobStatus);
```
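For reuse, the polling logic above can be wrapped in a small helper. This is a sketch, not part of the SDK: it relies only on the fields shown above (`status` and `recommended_pull_delay_ms`) and treats any status other than `in_progress` as terminal.

```javascript
// Hypothetical helper, not part of the SDK: poll until the job
// leaves the 'in_progress' state, then return the final job object.
async function waitForJob(client, jobId) {
    let job = await client.getJob(jobId);
    while (job.status === 'in_progress') {
        // Respect the server-recommended delay between polls
        await new Promise(resolve =>
            setTimeout(resolve, job.recommended_pull_delay_ms)
        );
        job = await client.getJob(jobId);
    }
    return job;
}

const finalJob = await waitForJob(client, jobId);
console.log('Final result:', finalJob);
```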
## Options

Both methods support the following options:

- `url`: The target URL to crawl
- `scrape_type`: Type of content to extract (`markdown`, `html`, etc.)
- `items_limit`: Maximum number of pages to crawl
- `allow_subdomains`: Whether to crawl subdomains (default: `false`)
- `whitelist_regexp`: Regular expression for allowed URLs
- `blacklist_regexp`: Regular expression for blocked URLs
- `webhook_url`: URL to receive notifications when the job completes
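A crawl call combining several of these options might look like the following sketch. The values are illustrative only, and the webhook endpoint is a made-up placeholder:

```javascript
const job = await client.crawlAsync({
    "url": "https://stripe.com/",
    "scrape_type": "markdown",
    "items_limit": 50,
    "allow_subdomains": false,
    // Only follow documentation URLs...
    "whitelist_regexp": "https://stripe\\.com/docs.*",
    // ...and skip blog pages
    "blacklist_regexp": ".*/blog/.*",
    // Hypothetical endpoint that will be notified on completion
    "webhook_url": "https://example.com/crawl-webhook"
});
```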
## GetContent

Each job item contains a link to its content. For convenience, the `getContent()` method fetches that content for you. Here's an example:

```javascript
const result = await client.crawl({
    "url": "https://stripe.com/",
    "scrape_type": "markdown",
    "items_limit": 10
});

for (const item of result.job_items) {
    const content = await item.getContent();
    console.log(content.slice(0, 100));
}
```

This method retrieves the full content associated with each job item, which is useful for processing or displaying the crawled data.
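As a usage example, the fetched content can be written straight to disk with Node's built-in `fs` module. This is a sketch under the assumption that `getContent()` resolves to a string (as the slicing above suggests); the file names are made up for illustration:

```javascript
import { writeFile } from "node:fs/promises";

// Save each item's markdown to a numbered file (sketch)
for (const [index, item] of result.job_items.entries()) {
    const content = await item.getContent();
    await writeFile(`page-${index}.md`, content);
}
```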