WebcrawlerAPI product updates

Keep track of updates and improvements to our platform

🦜🔗 Introducing WebcrawlerAPI LangChain Integration 🤖

We're thrilled to announce the release of our official LangChain integration! The new webcrawlerapi-langchain package makes it seamless to incorporate WebcrawlerAPI's powerful web crawling capabilities into your LangChain document processing pipelines.

Key Features:

  • 🚀 Simple integration with LangChain's document loaders
  • 📄 Multiple content formats (markdown, cleaned text, HTML)
  • ⚡️ Async and lazy loading support
  • 🔄 Built-in retry mechanisms, proxies and error handling
  • 🎯 Configurable URL filtering with regex patterns

Quick Start:

Install the package:

pip install webcrawlerapi-langchain

Then load pages as LangChain documents in a few lines of Python:

from webcrawlerapi_langchain import WebCrawlerAPILoader

loader = WebCrawlerAPILoader(
    url="https://example.com",
    api_key="your-api-key",
    scrape_type="markdown"
)
documents = loader.load()
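
The loader also supports lazy and async loading. Here is a minimal sketch, assuming WebCrawlerAPILoader follows LangChain's standard BaseLoader interface (load, lazy_load, alazy_load); the URL and API key are placeholders:

import asyncio

from webcrawlerapi_langchain import WebCrawlerAPILoader

loader = WebCrawlerAPILoader(
    url="https://example.com",
    api_key="your-api-key",
    scrape_type="markdown"
)

# Lazy loading: documents are yielded one at a time instead of
# building the full list in memory first.
for document in loader.lazy_load():
    print(document.metadata.get("source"), len(document.page_content))

# Async loading: iterate the async generator inside an event loop.
async def collect():
    return [doc async for doc in loader.alazy_load()]

documents = asyncio.run(collect())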

Perfect for:

  • Building AI-powered knowledge bases
  • Creating document QA systems
  • Training custom language models
  • Processing web content for LLM applications

Need an integration example? Check our WebcrawlerAPI examples.

Check out our LangChain SDK documentation for detailed usage instructions and examples. Start building powerful AI applications with web data today!

✨ New: $10 Trial Balance for WebcrawlerAPI 💫

We're excited to announce that all new WebcrawlerAPI accounts now receive a $10 evaluation balance for a 7-day trial period! This initiative allows new users to thoroughly test our API capabilities without any upfront commitment.

What's included:

  • $10 trial funds automatically added to new accounts
  • Complete API access during 7-day evaluation period
  • Start immediately with no credit card required
  • Full access to all standard API features

The new trial balance makes it easier than ever to evaluate WebcrawlerAPI and test its capabilities for your projects.

Additional dashboard improvements

  • Pagination for jobs and job items
  • Download button now shows download progress and file size
  • Graphs are now more interactive

Major Dashboard Improvements

  • Enhanced login with email form:
    • Implemented rate limiting for magic link emails
    • Improved user experience and security
  • Dashboard page enhancements:
    • Added time period toggles (24h, 7d, 15d, 30d)
    • Implemented total counter for each period
    • Enhanced graphs for funds spent and crawled pages
  • New dedicated billing page:
    • Comprehensive payment history
    • Detailed payment usage tracking for all time

Integrated Proxy Management System

Major Update 🚀

  • Integrated proxy management system:
    • All proxies are now handled internally
    • Included in the standard pricing
    • Significantly improved success rates
    • Enhanced protection against anti-bot measures
    • No additional setup required from users

LLMStxt Generator Tool Launch

Launched a free llms.txt Generator Tool that creates standardized llms.txt files, making your website's content easier for LLMs to discover and use. You can learn more about the llms.txt standard in our detailed guide.

Comprehensive Error Handling System

Major WebcrawlerAPI update: Comprehensive error handling system implementation

  • Added two-level error handling system: job level and job item level errors
  • New job level error codes:
    • insufficient_balance for balance-related issues
    • invalid_request for malformed requests
    • internal_error for system-level issues
  • New job item level error codes:
    • host_returned_error for non-200 HTTP responses
    • website_access_denied for 403 responses
    • name_not_resolved for DNS resolution failures
    • internal_error for system-level issues
  • Each error now includes detailed error messages and specific error codes for better debugging
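
As a rough illustration of how client code might branch on these codes, here is a minimal sketch. Only the error code values come from this changelog; the response field names (error_code, error_message, job_items, original_url) and the job dict shape are illustrative assumptions, not the documented API schema:

# Minimal sketch: branch on job-level vs. job-item-level error codes.
# Field names below are illustrative assumptions; only the code values
# (insufficient_balance, host_returned_error, ...) come from this changelog.
JOB_ERROR_CODES = {"insufficient_balance", "invalid_request", "internal_error"}
ITEM_ERROR_CODES = {"host_returned_error", "website_access_denied",
                    "name_not_resolved", "internal_error"}

def report_job_errors(job: dict) -> None:
    job_code = job.get("error_code")
    if job_code in JOB_ERROR_CODES:
        # Job-level failure: the whole job failed, surface the message and stop.
        print(f"Job failed: {job_code} - {job.get('error_message', '')}")
        return
    for item in job.get("job_items", []):
        item_code = item.get("error_code")
        if item_code in ITEM_ERROR_CODES:
            # Item-level failure: only this URL failed; the rest of the job is usable.
            print(f"{item.get('original_url', 'unknown url')}: {item_code}")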

Headless Browser Improvements

Major improvements to our headless browser implementation for enhanced web scraping capabilities:

  • Improved anti-bot protection bypass mechanisms
  • Enhanced blocking of non-essential content:
    • Advertisement content filtering
    • Cookie consent banner removal
    • Other non-page-content elements blocking
  • These updates result in cleaner data extraction and improved scraping reliability

Monitoring Server Incident Resolution

The issue lasted for 9 hours and was not caused by crawling itself. The root cause was a network issue affecting the monitoring server: because the monitoring server was unreachable from the main job manager, each job report had to wait several minutes for a timeout response.

As a result, the processing time for each job increased, and the job queue grew to several thousand jobs.

The incident has now been resolved. We are continuously working on improving our monitoring system to prevent similar issues in the future.

Status Page Link Added

A status page link has been added to the website footer. The current status of WebCrawlerAPI services can now be checked at status.webcrawlerapi.com.

Changelog Page Added

A changelog page has been added to the website. This page tracks all the changes, improvements, and fixes to WebCrawlerAPI.

Webpage to Markdown Tool Launch

A new tool Webpage to Markdown has been added. This tool converts any documentation or website into a beautiful Markdown file. It is free and does not require an API key. It can crawl up to 100 pages.

PDF Content Rendering Implementation

PDF content rendering has been implemented. Text content can now be extracted from PDF files. When a website contains a PDF file, its content will be extracted and returned in the response as page content.