Scraping is now easier and more straightforward. The new version lets you run a prompt on the page, and you can get results in markdown, cleaned text, or HTML. Scraping now runs in synchronous mode with a single API call.
The new endpoint is at https://api.webcrawlerapi.com/v2/scrape. See the API Reference for details.
What's new?
Scraping is now sync, with a single API call.
You can remove parts of the page using CSS selectors.
You can get results in markdown, cleaned text, or HTML.
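Here is a minimal sketch of a synchronous scrape request in Python. The endpoint URL comes from this changelog; the authentication header and the payload field names (url, prompt, output_format, remove_selectors) are illustrative assumptions, so check the API Reference for the exact parameters.

```python
import requests

API_KEY = "your-api-key"  # from your WebcrawlerAPI dashboard

# Hypothetical request body: these field names are assumptions, see the API Reference.
payload = {
    "url": "https://example.com/pricing",
    "prompt": "Extract the plan names and monthly prices",  # run a prompt on the page
    "output_format": "markdown",                    # markdown, cleaned text, or HTML
    "remove_selectors": ["nav", ".cookie-banner"],  # remove parts of the page via CSS selectors
}

# Synchronous call: the scraped content comes back in this single response.
response = requests.post(
    "https://api.webcrawlerapi.com/v2/scrape",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```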
You can now upload crawl results directly to S3. When starting a job via the API, add a few extra parameters such as access_key_id and secret_access_key. Crawled data will be placed under the specified path, and your keys are deleted after the job ends. Read the Upload to S3 docs for detailed information.
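As an example, a job that uploads its results to S3 might be started like the sketch below. The access_key_id and secret_access_key parameter names come from this entry; the endpoint path and the remaining field names are assumptions, so follow the Upload to S3 docs for the exact request shape.

```python
import requests

API_KEY = "your-api-key"

# Only access_key_id and secret_access_key are named above;
# the other fields are illustrative assumptions.
job = {
    "url": "https://example.com/docs",
    "access_key_id": "YOUR_AWS_ACCESS_KEY_ID",          # deleted after the job ends
    "secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY",  # deleted after the job ends
    "s3_path": "s3://my-bucket/crawls/docs/",           # crawled data lands under this path
}

response = requests.post(
    "https://api.webcrawlerapi.com/v1/crawl",  # assumed endpoint; see the Upload to S3 docs
    json=job,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
print(response.json())  # typically returns a job id you can poll for status
```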
Organizations have been added to WebcrawlerAPI. This feature lets multiple team members use the same API account with different access levels.
What's new:
Organizations are now automatically created for all accounts
All existing users have been assigned the "OWNER" role
New "DEVELOPER" role with limited access:
Can use API and see usage statistics
Cannot access billing information
Cannot add or manage team members
How it works:
To add team members, go to your dashboard and click the "Invite member" button. You can assign roles based on what each person needs to do. This lets developers use the API without seeing billing details or managing team members.
We're thrilled to announce the release of our official LangChain integration! The new webcrawlerapi-langchain package makes it seamless to incorporate WebcrawlerAPI's powerful web crawling capabilities into your LangChain document processing pipelines.
Key Features:
Simple integration with LangChain's document loaders
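A minimal sketch of pulling crawled pages into a LangChain pipeline, assuming the package exposes a document loader with the standard load() interface. The class name and constructor arguments below are assumptions (the package README documents the real interface); only the package name webcrawlerapi-langchain comes from this announcement.

```python
# pip install webcrawlerapi-langchain langchain-text-splitters
from webcrawlerapi_langchain import WebCrawlerAPILoader  # assumed class name
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical constructor arguments: api_key, url, and scrape_type are assumptions.
loader = WebCrawlerAPILoader(
    api_key="your-api-key",
    url="https://example.com/docs",
    scrape_type="markdown",
)

documents = loader.load()  # standard LangChain loaders return a list of Document objects

# From here the documents flow into any LangChain pipeline, e.g. chunking for a vector store.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
print(f"Loaded {len(documents)} documents, split into {len(chunks)} chunks")
```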
We're excited to announce that all new WebcrawlerAPI accounts now receive a $10 evaluation balance for a 7-day trial period! This initiative allows new users to thoroughly test our API capabilities without any upfront commitment.
What's included:
$10 trial funds automatically added to new accounts
Complete API access during 7-day evaluation period
Start immediately with no credit card required
Full access to all standard API features
The new trial balance makes it easier than ever to evaluate WebcrawlerAPI and test its capabilities for your projects.
Launched the free llmstxt Generator Tool, which helps create standardized llms.txt files that describe your project's content in a format AI models can easily consume. You can learn more about the llms.txt standard in our detailed guide.
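For context, an llms.txt file is a plain Markdown file served from your site's root. The snippet below is an illustrative sketch of the structure described by the llms.txt proposal, not output of the generator; all names and links are placeholders.

```markdown
# Example Project

> One-paragraph summary of what the project is and what the linked docs cover.

## Docs

- [Quickstart](https://example.com/docs/quickstart): How to make your first request
- [API Reference](https://example.com/docs/api): Endpoints, parameters, and responses

## Optional

- [Changelog](https://example.com/changelog): Release notes and incident reports
```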
The issue lasted for 9 hours and was not related to crawling itself. The root cause was a network issue affecting the monitoring server: because it was unreachable from the main job manager, each job report had to wait several minutes for a timeout.
As a result, the processing time for each job increased, and the job queue grew to several thousand jobs.
The incident has now been resolved. We are continuously working on improving our monitoring system to prevent similar issues in the future.
A new tool, Webpage to Markdown, has been added. It converts any documentation site or website into a beautiful Markdown file. It is free, does not require an API key, and can crawl up to 100 pages.
PDF content rendering has been implemented. Text content can now be extracted from PDF files. When a website contains a PDF file, its content will be extracted and returned in the response as page content.