Web Scraping & API Glossary

Comprehensive glossary of web scraping, crawling, and API terms. Learn the essential concepts and terminology used in web data extraction.

Web crawling focuses on discovering and retrieving pages, while web scraping extracts specific data from those pa...

Match crawl frequency to how often content changes and how quickly you need updates. High‑change sites may need m...
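One way to tie recrawl frequency to observed change is an adaptive interval. The sketch below is only an illustration: the function name `next_interval`, the halve/double rule, and the one-hour and one-week bounds are all assumptions, not something the glossary prescribes.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=168.0):
    """Return the next recrawl interval in hours for one page.

    Assumed heuristic: revisit sooner if the page changed since the last
    fetch, back off if it did not, and clamp the result to sane bounds.
    """
    proposed = current_hours / 2 if changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))


# A page on a daily schedule that changed is revisited after 12 hours;
# one that did not change is pushed out to 48 hours.
after_change = next_interval(24, changed=True)
after_no_change = next_interval(24, changed=False)
```

How "changed" is detected (a content hash, an ETag, a Last-Modified header) is left open here and depends on the site.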

To avoid getting blocked, crawl politely and predictably. Respect robots.txt, use reasonable rate limits, and ide...
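A reasonable rate limit is often enforced per host. The following is a minimal sketch of that idea; the class name `HostRateLimiter` and the injectable `clock` and `sleep` parameters are hypothetical, added so the behaviour can be checked without real waiting.

```python
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until it is polite to request `url`; return seconds waited."""
        host = urlsplit(url).netloc
        waited = 0.0
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the time at which the request is actually allowed to go out.
        self._last_request[host] = self._clock()
        return waited
```

Calling `limiter.wait(url)` immediately before each fetch keeps requests to one host at least `min_delay_seconds` apart, while requests to different hosts are not delayed by each other.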

To crawl JavaScript‑heavy sites, use a headless browser to render pages before extracting content. Wait for criti...
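As a rough sketch of rendering before extraction, the function below uses Playwright's synchronous Python API. It assumes Playwright and a Chromium build are installed; the function name `render_html`, the selector argument, and the 10-second timeout are illustrative choices, not part of the glossary.

```python
def render_html(url, wait_for_selector, timeout_ms=10_000):
    """Load `url` in headless Chromium and return the rendered HTML.

    `wait_for_selector` is a CSS selector for an element that only exists
    once client-side rendering has finished (an assumption about the page).
    """
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # Block until the chosen element is present, or raise on timeout.
            page.wait_for_selector(wait_for_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned HTML can then be handed to whatever parser is used for static pages.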

Web crawling legality depends on the website, the data you collect, and the laws in your jurisdiction. Many sites...

Common web crawling tools include Scrapy, Apache Nutch, Playwright, Puppeteer, and managed crawler platforms. Scr...

Common crawler data includes URLs, status codes, headers, page content, metadata, links, and timestamps. Many sys...
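The fields listed above map naturally onto a single record per fetched page. The sketch below is one possible shape; the class name `CrawlRecord`, the field names, and the JSON-lines serialization are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class CrawlRecord:
    """One fetched page, holding the fields a crawler commonly stores."""
    url: str
    status_code: int
    headers: dict
    content: str
    metadata: dict = field(default_factory=dict)
    links: list = field(default_factory=list)
    # ISO 8601 timestamp in UTC, set when the record is created.
    fetched_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


record = CrawlRecord(
    url="https://example.com/",
    status_code=200,
    headers={"content-type": "text/html"},
    content="<html><title>Example</title></html>",
    metadata={"title": "Example"},
    links=["https://example.com/about"],
)

# One JSON object per line is a convenient append-only storage format.
json_line = json.dumps(asdict(record))
```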

Crawl budget is the number of pages a crawler can fetch within time and resource constraints. It is limited by yo...
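A back-of-the-envelope estimate can make the definition concrete. The function below is a simplified sketch: it assumes a fixed average time per page and identical parallel workers, and it ignores retries, per-host rate limits, and storage limits. The name `estimate_crawl_budget` and its parameters are hypothetical.

```python
def estimate_crawl_budget(window_seconds, avg_seconds_per_page, concurrency=1):
    """Estimate how many pages fit into a crawl window.

    Assumes every worker fetches pages back to back at the same average pace.
    """
    if avg_seconds_per_page <= 0 or concurrency < 1:
        raise ValueError("need a positive page time and at least one worker")
    pages_per_worker = int(window_seconds // avg_seconds_per_page)
    return pages_per_worker * concurrency


# One hour, 2 seconds per page, 4 workers: 1800 pages per worker.
budget = estimate_crawl_budget(3600, 2, concurrency=4)
```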

robots.txt is a file at a site root that tells crawlers which paths they may or may not access. It uses a simple ...
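Python's standard library can evaluate robots.txt rules directly. The sketch below parses an inline example file instead of fetching one; the rules, the `example-bot` user agent, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, as it might appear at https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
blocked = parser.can_fetch("example-bot", "https://example.com/private/report")
allowed = parser.can_fetch("example-bot", "https://example.com/blog/post")
delay = parser.crawl_delay("example-bot")
```

Here `blocked` is `False` because the path falls under `Disallow: /private/`, `allowed` is `True`, and `delay` is `5`. In a live crawler, `set_url()` followed by `read()` would fetch the file from the site instead.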

Web crawling is the automated process of discovering and fetching web pages by following links so you can build a...
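The discover-and-fetch loop can be sketched as a breadth-first traversal. To keep the example runnable offline, pages come from an in-memory dictionary (`FAKE_SITE`) rather than the network; that dictionary, the `fetch` callback, and the `max_pages` cap are assumptions of this sketch.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # treat a missing page as a failed fetch
            continue
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


FAKE_SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://example.com/b": '<a href="/missing">gone</a>',
}

pages = crawl("https://example.com/", FAKE_SITE.get)
```

The `seen` set is what stops the crawler from looping when pages link back to each other. A real crawler would replace `FAKE_SITE.get` with an HTTP fetch and would usually restrict the queue to allowed hosts.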