When starting a job via the API, add a few extra parameters such as access_key_id and secret_access_key. Crawled data will be placed under the specified path, and your keys are deleted automatically after the job ends. Read the Upload to S3 docs for detailed information.
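A job-start request with S3 delivery might look like the sketch below. The access_key_id and secret_access_key names come from the text above; the "url" and "s3_path" fields and the overall payload shape are illustrative assumptions, so check the Upload to S3 docs for the real schema.

```python
import json

# Sketch of a crawl request that delivers results to S3.
payload = {
    "url": "https://example.com",
    "access_key_id": "AKIA-EXAMPLE",        # temporary S3 credentials;
    "secret_access_key": "EXAMPLE-SECRET",  # deleted after the job ends
    "s3_path": "s3://my-bucket/crawls/",    # crawled data lands under this path (field name assumed)
}

body = json.dumps(payload)  # sent as the JSON body of the job-start request
```

Because the keys are removed once the job finishes, short-lived credentials scoped to the destination bucket are a sensible choice here.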
New: Organizations Support with Role-Based Access
Organizations support has been added to WebcrawlerAPI. This feature lets multiple team members use the same API account with different access levels.
What's new:
Organizations are now automatically created for all accounts
All existing users have been assigned the "OWNER" role
New "DEVELOPER" role with limited access:
Can use API and see usage statistics
Cannot access billing information
Cannot add or manage team members
How it works:
To add team members, go to your dashboard and click the "Invite member" button. You can assign roles based on what each person needs to do, so developers can use the API without access to billing details or team management.
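The access split described above can be summarized in a small permission table. The OWNER and DEVELOPER role names come from this announcement; the permission strings below are assumptions used only to illustrate the model, not the actual API surface.

```python
# Minimal sketch of the role model: OWNER gets everything, DEVELOPER
# gets API usage and statistics but no billing or member management.
PERMISSIONS = {
    "OWNER": {"use_api", "view_usage", "view_billing", "manage_members"},
    "DEVELOPER": {"use_api", "view_usage"},
}

def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```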
We're thrilled to announce the release of our official LangChain integration! The new webcrawlerapi-langchain package makes it seamless to incorporate WebcrawlerAPI's powerful web crawling capabilities into your LangChain document processing pipelines.
Key Features:
Simple integration with LangChain's document loaders
Check out our LangChain SDK documentation for detailed usage instructions and examples. Start building powerful AI applications with web data today!
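A loader integration like this ultimately yields LangChain Document objects. The sketch below shows that mapping using a stand-in Document class so it runs without any packages installed; the real loader's class and constructor names live in the webcrawlerapi-langchain SDK docs, and the helper function here is purely illustrative.

```python
from dataclasses import dataclass, field

# Stand-in for langchain_core.documents.Document, so this sketch is
# self-contained; the real integration returns the LangChain class.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def pages_to_documents(pages: list) -> list:
    """Map crawled pages (url + content) onto LangChain-style Documents."""
    return [
        Document(page_content=p["content"], metadata={"source": p["url"]})
        for p in pages
    ]

docs = pages_to_documents(
    [{"url": "https://example.com/", "content": "# Example\n\nHello"}]
)
```

Once pages arrive as Documents, they drop straight into the usual LangChain steps such as text splitting and vector-store indexing.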
New: $10 Trial Balance for WebcrawlerAPI
We're excited to announce that all new WebcrawlerAPI accounts now receive a $10 evaluation balance for a 7-day trial period! This initiative allows new users to thoroughly test our API capabilities without any upfront commitment.
What's included:
$10 trial funds automatically added to new accounts
Complete API access during 7-day evaluation period
Start immediately with no credit card required
Full access to all standard API features
The new trial balance makes it easier than ever to evaluate WebcrawlerAPI and test its capabilities for your projects.
Additional dashboard improvements
Pagination for jobs and job items
Download button now shows progress and file size
Graphs are now more interactive
Major Dashboard Improvements
Enhanced login with email form:
Implemented rate limiting for magic link emails
Improved user experience and security
Dashboard page enhancements:
Added time period toggles (24h, 7d, 15d, 30d)
Implemented total counter for each period
Enhanced graphs for funds spent and crawled pages
New dedicated billing page:
Comprehensive payment history
Detailed payment usage tracking for all time
Integrated Proxy Management System
Major Update
Integrated proxy management system:
All proxies are now handled internally
Included in the standard pricing
Significantly improved success rates
Enhanced protection against anti-bot measures
No additional setup required from users
LLMStxt Generator Tool Launch
Launched a free llms.txt Generator Tool that helps create standardized llms.txt files, which make your site's documentation easy for LLMs to consume. You can learn more about the llms.txt standard in our detailed guide.
Comprehensive Error Handling System
Major WebcrawlerAPI update: we have implemented a comprehensive error handling system.
Each error now includes detailed error messages and specific error codes for better debugging
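With errors carrying both a message and a code, clients can branch on the code rather than parsing prose. The sketch below shows that pattern; the response shape ({"error": {"code": ..., "message": ...}}) and the example code value are assumptions for illustration, not the documented schema.

```python
# Sketch of client-side handling for structured API errors.
def raise_for_error(response_json: dict) -> None:
    """Raise if the API response carries an error object."""
    error = response_json.get("error")
    if error:
        raise RuntimeError(f"{error['code']}: {error['message']}")

ok = {"status": "done"}
failed = {"error": {"code": "page_timeout", "message": "Page load timed out"}}
```

Branching on a stable code (for example, retrying only on timeout-style codes) is more robust than matching on message text, which may change between releases.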
Headless Browser Improvements
Major improvements to our headless browser implementation for enhanced web scraping capabilities:
Improved anti-bot protection bypass mechanisms
Enhanced blocking of non-essential content:
Advertisement content filtering
Cookie consent banner removal
Other non-page-content elements blocking
These updates result in cleaner data extraction and improved scraping reliability
Monitoring Server Incident Resolution
The issue lasted for 9 hours but was not related to crawling. The root cause was a network issue affecting the monitoring server. Because the monitoring server was unavailable to the main job manager, each job report had to wait several minutes for a timeout response from the monitoring server.
As a result, the processing time for each job increased, and the job queue grew to several thousand jobs.
The incident has now been resolved. We are continuously working on improving our monitoring system to prevent similar issues in the future.
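The failure mode above is the classic one of a blocking call to a dead dependency: each job waited minutes for a timeout, so throughput collapsed and the queue grew. The sketch below shows the general fail-fast pattern, with a short timeout and best-effort semantics; it is not the team's actual fix, and the host, port, and timeout values are illustrative.

```python
import socket

# Fail-fast, best-effort report to a monitoring server: if the server
# is unreachable, give up in a fraction of a second instead of letting
# the stall back up the job queue.
def report_job_status(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True on success, False quickly if monitoring is down."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # monitoring unavailable: skip the report, keep the job moving
```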
Status Page Link Added
A status page link has been added to the website footer. The current status of WebCrawlerAPI services can now be checked at status.webcrawlerapi.com.
Changelog Page Added
A changelog page has been added to the website. This page tracks all the changes, improvements, and fixes to WebCrawlerAPI.
Webpage to Markdown Tool Launch
A new tool Webpage to Markdown has been added. This tool converts any documentation or website into a beautiful Markdown file. It is free and does not require an API key. It can crawl up to 100 pages.
PDF Content Rendering Implementation
PDF content rendering has been implemented. Text content can now be extracted from PDF files. When a website contains a PDF file, its content will be extracted and returned in the response as page content.