Website crawler (Webcrawler) PHP integration
To try it without writing any code, go to the Dashboard.
Check out the Docs to learn more about the Webcrawler API.
Install the dependency
composer require webcrawlerapi/sdk
Requirements
- PHP 8.0 or higher
- Composer
- ext-json PHP extension
- Guzzle HTTP Client 7.0 or higher
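If you are unsure whether a machine meets the PHP-level requirements above, you can run a quick check before installing. This is a minimal sketch, not part of the SDK. The function name `checkRequirements` is hypothetical, and Guzzle is left to Composer on the assumption that the SDK package declares it as a dependency.

```php
<?php
// Hypothetical helper (not part of the SDK): returns a list of unmet
// PHP-level requirements; an empty array means everything checked is fine.
// Guzzle is not checked here because Composer is assumed to install it
// as a dependency of webcrawlerapi/sdk.
function checkRequirements(): array
{
    $problems = [];
    if (version_compare(PHP_VERSION, '8.0.0', '<')) {
        $problems[] = 'PHP 8.0 or higher is required, found ' . PHP_VERSION;
    }
    // ext-json is always available from PHP 8.0 on, so this check only
    // matters on older runtimes.
    if (!extension_loaded('json')) {
        $problems[] = 'the ext-json extension is not loaded';
    }
    return $problems;
}

$problems = checkRequirements();
if ($problems === []) {
    echo "All checked requirements are met\n";
} else {
    foreach ($problems as $problem) {
        echo "Missing requirement: {$problem}\n";
    }
}
```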
How do I get an access key?
Read the Access Key section of the Docs to obtain a key.
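Rather than pasting the key into source code, you can load it from the environment. This is a minimal sketch that assumes a hypothetical environment variable named `WEBCRAWLERAPI_API_KEY` and a hypothetical helper `loadAccessKey`; neither is defined by the SDK.

```php
<?php
// Hypothetical helper (not part of the SDK): reads the access key from an
// environment variable so it never has to be committed with the code.
// WEBCRAWLERAPI_API_KEY is an assumed variable name; pick your own.
function loadAccessKey(string $envVar = 'WEBCRAWLERAPI_API_KEY'): string
{
    $key = getenv($envVar);
    if ($key === false || trim($key) === '') {
        throw new RuntimeException("Environment variable {$envVar} is not set or empty");
    }
    return trim($key);
}
```

The returned string can then be passed to the client constructor used in the Usage examples, for instance `new WebCrawlerAPI(loadAccessKey())`.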
Usage
Synchronously, blocking until all items are complete:
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
use WebCrawlerAPI\WebCrawlerAPI;
$crawler = new WebCrawlerAPI('YOUR API ACCESS KEY HERE');
// Synchronous crawling (blocks until the job completes)
$job = $crawler->crawl(
    url: 'https://books.toscrape.com/',
    scrapeType: 'markdown',
    itemsLimit: 3
);
// Access job items and the URLs of their Markdown content
foreach ($job->jobItems as $item) {
    echo "Original URL: {$item->originalUrl}\n";
    echo "Content URL: {$item->markdownContentUrl}\n";
}
Read the Job section of the Docs to learn more about jobs.
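Each entry in `jobItems` describes one crawled page. If you need a lookup table from page URL to content URL, a small helper over the properties used in these examples (`originalUrl`, `status`, `markdownContentUrl`) is enough. This is a sketch: `markdownUrlsByPage` is a hypothetical name, and it skips items whose status is not `'done'`, mirroring the check in the async example.

```php
<?php
// Hypothetical helper (not part of the SDK): maps each finished page's
// original URL to the URL of its Markdown content. Items whose status is
// not 'done', or that carry no content URL, are skipped.
function markdownUrlsByPage(iterable $jobItems): array
{
    $map = [];
    foreach ($jobItems as $item) {
        $contentUrl = $item->markdownContentUrl ?? null;
        if ($item->status === 'done' && $contentUrl !== null) {
            $map[$item->originalUrl] = $contentUrl;
        }
    }
    return $map;
}
```

With the synchronous job from the example above, this would be called as `$urls = markdownUrlsByPage($job->jobItems);`.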
Asynchronously, starting the job and polling until it finishes:
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
use WebCrawlerAPI\WebCrawlerAPI;
$crawler = new WebCrawlerAPI('YOUR API ACCESS KEY HERE');
// Start async crawling; this returns right away with the job id
$response = $crawler->crawlAsync(
    url: 'https://books.toscrape.com/',
    scrapeType: 'markdown',
    itemsLimit: 20
);
$jobId = $response->id;
echo "Job id: {$jobId}\n";
echo "Job Dashboard link: https://dash.webcrawlerapi.com/jobs/job/{$jobId}\n";
// Poll for completion: up to 100 attempts, 2 seconds apart.
// Note: the check below only succeeds once itemsLimit items are 'done'. If the
// site yields fewer pages, or an item ends in another status, the loop keeps
// polling until the attempt limit is reached.
for ($i = 0; $i < 100; $i++) {
    $job = $crawler->getJob($jobId);
    $doneItemsCount = count(array_filter($job->jobItems, fn($item) => $item->status === 'done'));
    $limitItemsCount = $job->itemsLimit;
    if ($doneItemsCount === $limitItemsCount) {
        echo "All items are done\n";
        foreach ($job->jobItems as $item) {
            echo "{$item->originalUrl}\n";
            echo "\t{$item->markdownContentUrl}\n";
        }
        break;
    }
    echo "Crawled {$doneItemsCount} out of {$limitItemsCount} items\n";
    sleep(2);
}
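The polling loop above can be factored into a reusable helper so the attempt limit, delay and completion rule live in one place. This is a minimal sketch: `pollUntil` is a hypothetical function, not part of the SDK. It only wraps whatever fetch call you pass in (such as `$crawler->getJob($jobId)`), and the sleep function is injectable so the helper can be tested without real delays.

```php
<?php
// Hypothetical helper (not part of the SDK). $fetch returns the latest state,
// $isComplete decides whether that state is final, and $sleep defaults to
// PHP's sleep() but can be replaced (e.g. with a no-op in tests).
function pollUntil(
    callable $fetch,
    callable $isComplete,
    int $maxAttempts = 100,
    int $delaySeconds = 2,
    ?callable $sleep = null
): mixed {
    $sleep ??= 'sleep';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $state = $fetch();
        if ($isComplete($state)) {
            return $state;
        }
        // Do not wait after the final attempt
        if ($attempt < $maxAttempts) {
            $sleep($delaySeconds);
        }
    }
    throw new RuntimeException("Gave up after {$maxAttempts} attempts");
}

// Possible usage with the async example above:
// $job = pollUntil(
//     fn() => $crawler->getJob($jobId),
//     fn($job) => count(array_filter($job->jobItems, fn($item) => $item->status === 'done')) === $job->itemsLimit
// );
```

Throwing on timeout, rather than silently leaving the loop as the inline version does, makes an unfinished job visible to the caller.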