
Web Scraping & API Glossary

Comprehensive glossary of web scraping, crawling, and API terms. Learn the essential concepts and terminology used in web data extraction.

P

Playwright

(50)

How to fix "Execution context was destroyed, most likely because of a navigation"?

Playwright

The message "execution context was destroyed, most likely because of a navigation" means Playwright started evaluating J...

How to fix "Frame was detached" errors in Playwright?

Playwright

"Frame was detached" means the iframe you were interacting with was removed from the DOM or replaced while your script h...

How to fix net::ERR_ABORTED during page.goto in Playwright?

Playwright

page.goto: net::ERR_ABORTED means the browser started a navigation but the request was cancelled before a response was re...

How to fix net::ERR_CONNECTION_REFUSED in Playwright navigation?

Playwright

net::ERR_CONNECTION_REFUSED means the browser attempted a TCP connection to the target host and port, and the operating sy...

How to fix net::ERR_INTERNET_DISCONNECTED or net::ERR_FAILED in Playwright?

Playwright

net::ERR_INTERNET_DISCONNECTED means the browser has no active network path to reach the target: either the machine itsel...

How to fix net::ERR_INVALID_AUTH_CREDENTIALS in Playwright?

Playwright

net::ERR_INVALID_AUTH_CREDENTIALS is thrown when a server challenges the browser with HTTP Basic Authentication and the cr...

How to fix Playwright APIRequestContext cookie mismatch between API and UI tests?

Playwright

When Playwright's request fixture (or APIRequestContext) is used alongside page in the same test, their cookies and sess...

How to fix Playwright auth state issues when login passes but tests start logged out?

Playwright

This happens when the authentication setup step saves session state to a file, but the test project that runs your tests...

How to fix brittle exact-text assertions in Playwright tests?

Playwright

Exact text assertions break on whitespace normalization, added punctuation, translated strings, or copy edits that don't...

How to fix Playwright assertions that ignore disabled/loading button states?

Playwright

Clicking a button that is disabled or in a loading state produces no action — the click goes through Playwright's action...

How to fix common Playwright test mistake: `Cannot read properties of undefined`?

Playwright

Cannot read properties of undefined in a Playwright test is a JavaScript runtime error, not a browser or network error. ...

How to fix Playwright CI failures from missing OS dependencies and fonts?

Playwright

Playwright browser launches fail in CI when the container image is missing the Linux shared libraries that Chromium requ...

How to fix Playwright "locator.click: Timeout ... element is not visible"?

Playwright

This timeout fires when the element exists in the DOM but Playwright cannot confirm it is visible, enabled, and not obsc...

How to fix Playwright tests broken by CSS-class selectors after refactors?

Playwright

CSS class selectors in Playwright tests couple test code to implementation details of the component's styling. When a de...

How to fix Playwright Test error: "did not expect test() to be called here"?

Playwright

This error means a test() call was executed outside of a spec file — most commonly because a helper module or config fil...

How to fix Playwright download tests that hang waiting for files?

Playwright

Download tests hang when the event listener is attached after the download has already started. The browser fires the do...

How to fix Playwright "Element is not attached to the DOM"?

Playwright

"Element is not attached to the DOM" surfaces when a Playwright action targets an element that was in the DOM when queri...

How to fix Playwright `net::ERR_NAME_NOT_RESOLVED` in `page.goto`?

Playwright

net::ERR_NAME_NOT_RESOLVED means the browser's DNS resolver could not find an IP address for the hostname in the URL. In P...

How to fix Playwright "Executable doesn't exist" after install?

Playwright

The "Executable doesn't exist" error means the @playwright/test package is installed but the corresponding browser binar...

How to fix Playwright "Execution context was destroyed" errors?

Playwright

"Execution context was destroyed" appears when Playwright has a JavaScript handle or evaluation in flight and the page n...

How to fix Playwright file upload failures with `setInputFiles`?

Playwright

File upload failures in Playwright almost always come down to three root causes: targeting the wrong element (a styled <...

How to fix Playwright geolocation/permissions not applying in tests?

Playwright

Geolocation and browser permissions fail to apply in Playwright tests when they are set at the wrong stage of the browse...

How to fix Playwright tests that fail only in headless mode?

Playwright

Headless-only failures are among the most confusing Playwright issues because the test is syntactically correct and pass...

How to fix Playwright locator failures caused by hidden accessibility names?

Playwright

Playwright's role-based locators (getByRole, getByLabel) match elements using the computed accessible name, which may co...

How to fix flaky Playwright assertions caused by immediate `isVisible()` checks?

Playwright

locator.isVisible() is a point-in-time query that returns the current visibility state immediately without any retrying....

How to fix Playwright failures from leaked state between tests?

Playwright

Leaked state between tests happens when one test leaves side effects that affect subsequent tests in the same worker: co...

How to fix false positives when using `locator.count()` without assertions?

Playwright

locator.count() returns a number immediately — it does not assert, does not retry, and does not fail if the count is wro...

How to fix Playwright mobile emulation issues when desktop layout still appears?

Playwright

Mobile emulation in Playwright fails to trigger the mobile layout when viewport and device settings are applied after co...

How to fix flaky assertions caused by `networkidle` misuse in Playwright?

Playwright

networkidle as a waitUntil condition tells Playwright to wait until there are no network connections for at least 500ms....

How to fix Playwright "No frame for selector" and iframe locator issues?

Playwright

"No frame for selector" and similar iframe-related errors happen when test code tries to locate elements inside an ifram...

How to fix Playwright clicks on wrong element when using `.nth()` indexes?

Playwright

Index-based locators with .nth() are fragile because they rely on the DOM rendering elements in a specific, stable order...

How to fix Playwright strict mode errors from overbroad `getByText()` locators?

Playwright

getByText() matches any element in the document that contains the specified text, including headings, list items, table ...

How to fix Playwright tests that overuse `waitForTimeout` and still flake?

Playwright

page.waitForTimeout() (a hard sleep) is the most common anti-pattern in Playwright test suites. It introduces fixed dela...
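
The following is a minimal sketch of replacing a fixed sleep with a condition wait. It uses a hypothetical plain-JavaScript helper, waitForCondition. The helper returns as soon as the check passes and fails loudly on timeout, so it never waits longer than necessary. In Playwright tests, web-first assertions such as expect(locator).toBeVisible() and expect.poll() already implement this retry loop, so prefer them where they fit.

```javascript
// Hypothetical helper: poll a condition until it holds or a deadline passes,
// instead of sleeping for a fixed duration and hoping the state is ready.
async function waitForCondition(check, { timeout = 5000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const value = await check();
    if (value) return value; // condition met: stop waiting immediately
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeout}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Usage: the state flips after ~120ms, and the wait ends shortly after that
// rather than after an arbitrary fixed sleep.
let ready = false;
setTimeout(() => { ready = true; }, 120);
waitForCondition(() => ready, { timeout: 1000, interval: 20 })
  .then(() => console.log("ready"));
```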

How to fix Playwright DB data collisions in parallel test workers?

Playwright

Parallel Playwright workers interact with a shared database simultaneously, causing data collisions when tests assume ex...
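
One way to avoid such collisions, sketched here under stated assumptions, is to give every record a key built from three parts: a run ID, the worker's index, and a per-worker counter. In Playwright Test, testInfo.workerIndex is one source for the index. makeUniqueDataFactory, the run ID and the example.test addresses are all hypothetical.

```javascript
// Hypothetical helper: derive collision-free test data from a run ID, the
// worker index (e.g. Playwright's testInfo.workerIndex) and a per-worker
// counter, so two workers never insert the same unique key.
function makeUniqueDataFactory(workerIndex, runId) {
  let counter = 0;
  return function uniqueUser(prefix = "user") {
    counter += 1;
    const key = `${prefix}-${runId}-w${workerIndex}-${counter}`;
    return { username: key, email: `${key}@example.test` };
  };
}

const forWorker0 = makeUniqueDataFactory(0, "run42");
const forWorker1 = makeUniqueDataFactory(1, "run42");
console.log(forWorker0().email); // user-run42-w0-1@example.test
console.log(forWorker1().email); // user-run42-w1-1@example.test
```

Because each worker owns its slice of the key space, tests can also clean up only the rows they created, rather than truncating shared tables.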

How to fix flaky Playwright popup/new-tab tests (`page.waitForEvent('popup')`)?

Playwright

Popup and new-tab tests become flaky when page.waitForEvent('popup') is called after the action that opens the popup. Ju...

How to fix Playwright "Protocol error (...): invalid argument"?

Playwright

A "Protocol error: invalid argument" originates from the Chrome DevTools Protocol (CDP) layer and means Playwright sent ...

How to fix flaky Playwright crashes and random browser exits in CI?

Playwright

Random browser crashes in CI — where tests pass locally but fail unpredictably in CI runners — are almost always resourc...

How to fix hidden flakes when retries make failures "pass" in Playwright?

Playwright

Playwright retries are designed to improve stability for genuinely intermittent conditions — network latency, slow CI ma...
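
The sketch below makes retry-hidden flakes visible by treating any test that failed before passing as flaky rather than green. Playwright's own reports already use a "flaky" outcome for this case. The classify and flakyTests helpers and the attempt data are hypothetical; they only make the rule explicit so it can, for example, fail a CI step.

```javascript
// Classify a test from the ordered statuses of its attempts.
function classify(attempts) {
  const passedEventually = attempts[attempts.length - 1] === "passed";
  const failedSomewhere = attempts.some((status) => status !== "passed");
  if (passedEventually && failedSomewhere) return "flaky";
  return passedEventually ? "passed" : "failed";
}

// Return the titles of tests that only passed after at least one retry.
function flakyTests(results) {
  return Object.entries(results)
    .filter(([, attempts]) => classify(attempts) === "flaky")
    .map(([title]) => title);
}

console.log(flakyTests({
  "checkout works": ["failed", "passed"],
  "login works": ["passed"],
  "search works": ["failed", "failed"],
})); // [ 'checkout works' ]
```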

How to fix Playwright route mocking that does not intercept requests?

Playwright

Route mocking fails silently when the URL pattern doesn't match what the browser actually requests, or when page.route()...
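
One way to get a more forgiving match: page.route() accepts a predicate that receives a URL object, so the match can compare only the pathname and ignore query strings the app appends. The sketch below assumes /api/users as a hypothetical endpoint and runs without Playwright. The route registration itself is shown only as a comment.

```javascript
// Match on the parsed pathname instead of the full URL string.
const isUsersApi = (url) =>
  url.pathname === "/api/users" || url.pathname === "/api/users/";

// A naive exact-string comparison misses requests that carry a query string.
const requested = new URL("https://app.example.test/api/users?page=2");
console.log(requested.href === "https://app.example.test/api/users"); // false
console.log(isUsersApi(requested)); // true

// In a Playwright test (not run here), register the route before navigating:
// await page.route(isUsersApi, (route) => route.fulfill({ json: [] }));
```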

How to fix Playwright selectors that fail with Shadow DOM components?

Playwright

Shadow DOM components encapsulate their internal DOM in a shadow root that is intentionally hidden from the main documen...

How to fix Playwright shard imbalance and long-tail CI jobs?

Playwright

Playwright's --shard=N/M splits test files across M runners, but the distribution is based on file count — not test coun...
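
The sketch below balances shards by duration instead of count. It assumes you have per-file duration estimates, for example from a previous run's report; the numbers here are made up. It uses the greedy longest-processing-time-first heuristic, which is not Playwright's built-in algorithm. Your own CI tooling would have to pass each resulting file list to its job.

```javascript
// Greedy LPT partitioning: longest files first, each to the lightest shard.
function balanceShards(durations, shardCount) {
  const shards = Array.from({ length: shardCount }, () => ({ files: [], total: 0 }));
  const files = Object.entries(durations).sort((a, b) => b[1] - a[1]);
  for (const [file, seconds] of files) {
    let lightest = shards[0];
    for (const shard of shards) if (shard.total < lightest.total) lightest = shard;
    lightest.files.push(file);
    lightest.total += seconds;
  }
  return shards;
}

// Hypothetical estimates in seconds: one long file dominates.
const estimates = {
  "checkout.spec.ts": 300, "search.spec.ts": 120, "login.spec.ts": 100,
  "profile.spec.ts": 90, "cart.spec.ts": 80,
};
for (const [i, s] of balanceShards(estimates, 2).entries()) {
  console.log(`shard ${i + 1}: ${s.total}s`, s.files);
}
```

With these numbers the shards total 380s and 310s. A plain file-count split could put the 300s file together with two others and leave one job far longer than the rest.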

How to fix Playwright "strict mode violation" locator errors?

Playwright

A strict mode violation in Playwright fires when a locator resolves to more than one element at action time. The error m...

How to fix Playwright "Target page, context or browser has been closed" after crashes?

Playwright

When this error appears alongside a crash — rather than an orderly context.close() call — it means the browser process e...

How to fix Playwright TimeoutError when an action exceeds timeout?

Playwright

TimeoutError is thrown when Playwright cannot complete an action or find an actionable element within the configured tim...

How to fix Playwright race conditions when `waitForResponse` misses requests?

Playwright

waitForResponse misses requests for exactly the same reason waitForEvent('download') and waitForEvent('popup') do — the ...
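
The race can be reproduced without a browser. In this sketch, a plain EventTarget stands in for the page, and waitForEvent and clickDownload are hypothetical helpers. Subscribing after the action misses the event, while starting the wait first and awaiting it together with the action catches it. In Playwright the same ordering means creating the page.waitForResponse() (or waitForEvent) promise before the click and awaiting it afterwards.

```javascript
// Resolves with the next `type` event, or rejects after `timeout` ms.
function waitForEvent(target, type, timeout = 200) {
  return new Promise((resolve, reject) => {
    const onEvent = (event) => {
      clearTimeout(timer);
      resolve(event);
    };
    const timer = setTimeout(() => {
      target.removeEventListener(type, onEvent);
      reject(new Error(`Timed out waiting for ${type}`));
    }, timeout);
    target.addEventListener(type, onEvent, { once: true });
  });
}

// The "action" fires the event synchronously, like a very fast response.
function clickDownload(target) {
  target.dispatchEvent(new Event("download"));
}

async function demo() {
  const page = new EventTarget();

  // Wrong order: act first, then start waiting. The event is already gone.
  clickDownload(page);
  const late = await waitForEvent(page, "download").then(
    () => "caught",
    () => "missed",
  );

  // Right order: start waiting first, then act, and await both together.
  const [event] = await Promise.all([waitForEvent(page, "download"), clickDownload(page)]);
  return { late, early: event.type };
}

demo().then((result) => console.log(result)); // { late: 'missed', early: 'download' }
```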

How to fix Playwright tests that pass despite wrong page due to weak URL checks?

Playwright

Weak URL assertions allow tests to pass on the wrong page, producing false positives that hide navigation bugs. expect(p...
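
The following minimal illustration, using hypothetical URLs, shows why substring checks are weak. The stricter check parses the URL and compares its origin and pathname exactly. In a Playwright test, an anchored regular expression passed to expect(page).toHaveURL() serves the same purpose.

```javascript
// Weak check: any URL that merely contains the path passes.
const weakCheck = (href) => href.includes("/orders/confirmation");

// Stricter check: compare origin and pathname exactly.
function isExactPage(href, expectedOrigin, expectedPath) {
  const url = new URL(href);
  return url.origin === expectedOrigin && url.pathname === expectedPath;
}

const shopOrigin = "https://shop.example.test";
const wrongPage = `${shopOrigin}/login?next=/orders/confirmation`;
console.log(weakCheck(wrongPage)); // true (false positive: we are on the login page)
console.log(isExactPage(wrongPage, shopOrigin, "/orders/confirmation")); // false
```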

How to fix Playwright websocket or SSE-dependent tests in unstable CI networks?

Playwright

Tests that depend on WebSocket connections or Server-Sent Events (SSE) are inherently more sensitive to CI network insta...

How to fix Playwright worker crashes from memory pressure in parallel runs?

Playwright

Memory-related worker crashes in Playwright parallel runs manifest as browser processes being killed mid-test, producing...

How to fix Playwright "strict mode violation" for locators?

Playwright

A Playwright strict mode violation means the locator you used resolved to more than one element, but the operation requi...

How to fix "Target page, context or browser has been closed" in Playwright?

Playwright

The error "Target page, context or browser has been closed" surfaces when Playwright tries to interact with a page, fram...

P

Puppeteer

(28)

How can DevTools windows be treated as a page in Puppeteer?

Puppeteer

DevTools windows can be treated as regular pages by enabling the handleDevToolsAsPage option when launching or connectin...

How can I expose BackendNodeId in the a11y snapshot?

Puppeteer

BackendNodeId is exposed in the a11y snapshot. Each node in the snapshot includes a backendNodeId that lets you map acce...

How can I get detailed initiator data from CDP in Puppeteer?

Puppeteer

Retrieve detailed initiator data from CDP when it is available, and filter out goog: data from events by...

How can I open a page in a tab or a window using Puppeteer?

Puppeteer

Puppeteer can open a new page either in a tab or in a window: newPage() can now be called with window options to choose where...

How can landmarks improve accessibility testing in Puppeteer?

Puppeteer

Landmarks such as header, nav, main, aside, and footer provide semantic regions that assist screen readers and ...

How do I improve Chrome binary detection on Windows for Puppeteer?

Puppeteer

To fix Puppeteer not finding the Chrome binary on Windows, make sure the detector checks the common install locat...

How to configure CDP message ID generator in Puppeteer?

Puppeteer

The CDP message ID generator can be configured by passing a custom idGenerator to the Connection constructor. This enabl...

How to disable xdg-open popup in Puppeteer?

Puppeteer

To stop the xdg-open popup in Puppeteer, configure a Chrome policy URLAllowlist and use a Chrome binary that reads that ...

How to expose the Connection from CdpBrowserContext in Puppeteer?

Puppeteer

This change adds a public getter to CdpBrowserContext to expose the internal Connection object. It returns the p...

How to expose the url property for links in Puppeteer?

Puppeteer

If you need the full URL of a link in Puppeteer, use the url property that was ...

How to fix duplicate response headers in Puppeteer?

Puppeteer

Duplicate header values should normally be merged into a single header value separated by a comma and a space. T...
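
Here is that merging rule as a plain function. mergeHeaders is hypothetical and is not Puppeteer's internal code. Names are lower-cased before grouping because HTTP header names are case-insensitive. Set-Cookie is kept as a list because HTTP treats it as a special case that cannot be combined into one comma-separated value.

```javascript
// Merge duplicate [name, value] pairs into one value per lower-cased name.
function mergeHeaders(entries) {
  const merged = {};
  for (const [rawName, value] of entries) {
    const name = rawName.toLowerCase();
    if (name === "set-cookie") {
      // Cookie values may contain commas, so keep each one separately.
      (merged[name] ??= []).push(value);
    } else {
      merged[name] = name in merged ? `${merged[name]}, ${value}` : value;
    }
  }
  return merged;
}

const combined = mergeHeaders([
  ["Cache-Control", "no-cache"],
  ["cache-control", "no-store"],
  ["Set-Cookie", "a=1"],
  ["Set-Cookie", "b=2"],
]);
console.log(combined["cache-control"]); // no-cache, no-store
console.log(combined["set-cookie"]); // [ 'a=1', 'b=2' ]
```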

How to fix the "Fetch.enable wasn't found" error for workers in Puppeteer?

Puppeteer

The error "Fetch.enable wasn't found" is raised when trying to enable the Fetch domain for a worker. The fix is to ignore this error...

How to fix Puppeteer ExtensionTransport tasks and session management?

Puppeteer

Puppeteer now dispatches each CDP message in its own JavaScript task by scheduling dispatch with setTimeout. This ensure...
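
A small sketch shows the effect of task-per-message dispatch. TaskPerMessageDispatcher is a hypothetical class, not Puppeteer's. Each message is handled in its own setTimeout callback, and the event loop drains microtasks between tasks (as in browsers and Node.js 11+). As a result, promise callbacks queued while handling one message run before the next message is handled. Dispatching both messages synchronously in a loop would instead log handle A, handle B, after A, after B.

```javascript
// Hypothetical dispatcher: every message is handled in its own macrotask.
class TaskPerMessageDispatcher {
  constructor(handler) {
    this.handler = handler;
  }

  onMessage(message) {
    // Defer handling to a fresh task instead of calling the handler inline.
    setTimeout(() => this.handler(message), 0);
  }
}

const log = [];
const dispatcher = new TaskPerMessageDispatcher((message) => {
  log.push(`handle ${message}`);
  // Follow-up work queued as a microtask while handling this message.
  Promise.resolve().then(() => log.push(`after ${message}`));
});

dispatcher.onMessage("A");
dispatcher.onMessage("B");
setTimeout(() => console.log(log.join(" | ")), 10);
// handle A | after A | handle B | after B
```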

How to open DevTools for a Page in Puppeteer?

Puppeteer

To open DevTools for a page in Puppeteer, use the new Page.openDevTools() method. It calls the DevTools interface for th...

How can I reload a Puppeteer page while ignoring the cache?

Puppeteer

Use the ignoreCache option with Page.reload to reload while ignoring the browser cache: await page.reload({ ignoreC...

How to use Puppeteer to override the user agent with Emulation.setUserAgentOverride instead of network interception?

Puppeteer

Use the Emulation.setUserAgentOverride command via a CDP session to override the user agent instead of relying ...

What does it mean that HTTPRequest.postData is deprecated in Puppeteer?

Puppeteer

HTTPRequest.postData is deprecated in Puppeteer, which means you should avoid...

What fixes Puppeteer not waiting for all targets when connecting?

Puppeteer

The fix for Puppeteer not waiting for all targets when connecting is to await child targets only for tab targets. When connect...

What is the correct event to create a Response in Puppeteer WebDriver?

Puppeteer

To align with the protocol behavior, create the Response when the responseStarted event fires, rather than after the res...

What is the correct type for the pageerror event in Puppeteer?

Puppeteer

The pageerror event may emit not only Error objects but also values of unknown type. Treat the payload as unknow...
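
The sketch below handles the payload as unknown by normalizing whatever value arrives before logging it. describePageError is a hypothetical helper. The Puppeteer listener is shown only as a comment.

```javascript
// Turn any thrown value (Error, string, number, object, undefined) into text.
function describePageError(value) {
  if (value instanceof Error) return `${value.name}: ${value.message}`;
  if (typeof value === "string") return value;
  try {
    // JSON.stringify returns undefined for undefined, functions and symbols.
    return JSON.stringify(value) ?? String(value);
  } catch {
    return String(value); // e.g. BigInt or circular objects
  }
}

// With Puppeteer (not run here):
// page.on("pageerror", (err) => console.error(describePageError(err)));
console.log(describePageError(new TypeError("x is undefined"))); // TypeError: x is undefined
console.log(describePageError({ code: 42 })); // {"code":42}
```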

What is the difference between browser.close and browser.disconnect?

Puppeteer

browser.close() and browser.disconnect() both end your current control flow, but they affect the browser lifecycle diffe...

What is the reason for removing the test server from release-please in Puppeteer?

Puppeteer

The test server was removed from the release-please workflow to simplify the release process and remove an unnecessary e...

What target should I set for TypeScript to build Puppeteer types with tsc?

Puppeteer

If you run into TS18028 private identifiers errors when compiling Puppeteer types with TypeScript, set the TypeScript ta...

What TypeScript target should I use to compile Puppeteer-core types?

Puppeteer

To fix the TS18028 error, set the TypeScript target to ES2015 or higher. The error occurs because private identifiers (#...

Why are Firefox headful runs on Ubuntu flaky in Puppeteer?

Puppeteer

The startup hang in headful mode on Ubuntu was caused by the Firefox Backup Service during startup. A practical ...

Why are headers not updated when using Puppeteer page.setRequestInterception(true) on Firefox?

Puppeteer

Firefox currently mutates the headers object returned by request.headers() in a way that does not reflect in the respons...

Why does HttpRequest.headers() not allow mutating data in Puppeteer?

Puppeteer

This was fixed to prevent accidental mutations of the underlying headers. HttpRequest.headers() no longer allows mutatin...
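
Treat the returned headers as read-only. The alternative sketched here is to build a new object and pass it to request.continue({ headers }). withHeader is a hypothetical helper, and lower-casing names is an assumption chosen for consistent lookups. A frozen object stands in for the value returned by request.headers().

```javascript
// Return a new headers object with one header set; the input is untouched.
function withHeader(headers, name, value) {
  return { ...headers, [name.toLowerCase()]: value };
}

const original = Object.freeze({ accept: "text/html" }); // stand-in for request.headers()
const updated = withHeader(original, "X-Test-Run", "1");
console.log(original); // { accept: 'text/html' }
console.log(updated); // { accept: 'text/html', 'x-test-run': '1' }

// With Puppeteer request interception enabled (not run here):
// page.on("request", (request) =>
//   request.continue({ headers: withHeader(request.headers(), "X-Test-Run", "1") }));
```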

Why is page.goto never awaited for Firefox addons pages in Puppeteer?

Puppeteer

Firefox addon pages navigated via moz-extension:// are treated as webextension contexts. Puppeteer currently doe...

S

Scraping

(10)

How do you avoid getting blocked when scraping?

Scraping

Avoid blocks by scraping politely and limiting request rates. Respect robots.txt, identify your user agent, and s...
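
The sketch below shows one polite retry policy: back off exponentially, with random jitter, after 429 or 503 responses. The base delay, cap, status list and retry budget are assumptions. If the server sends a Retry-After header, honoring it is the more polite choice. The random source is injectable so the demo is deterministic.

```javascript
// "Full jitter" backoff: a random wait between 0 and min(cap, base * 2^attempt).
function backoffDelay(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry only on "slow down" style statuses, within a fixed attempt budget.
function shouldRetry(status, attempt, maxAttempts = 5) {
  return (status === 429 || status === 503) && attempt < maxAttempts;
}

// Deterministic demo: with random() fixed at 0.5, the wait doubles until the cap.
for (let attempt = 0; attempt < 4; attempt++) {
  console.log(attempt, backoffDelay(attempt, { random: () => 0.5 }));
}
// 0 500, 1 1000, 2 2000, 3 4000
```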

How do you clean and validate scraped data?

Scraping

Clean scraped data by trimming whitespace, normalizing formats, and removing duplicates. Validate fields with sch...
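
The following sketches that cleaning and validation pass. It assumes a hypothetical record shape of title, price and url, with prices written using a dot as the decimal separator. Rejected records are kept together with their reasons, so they can be inspected instead of silently dropped.

```javascript
// Normalize, validate and de-duplicate raw scraped items.
function cleanRecords(raw) {
  const seen = new Set();
  const valid = [];
  const rejected = [];
  for (const item of raw) {
    const record = {
      title: (item.title ?? "").replace(/\s+/g, " ").trim(), // collapse whitespace
      price: Number(String(item.price ?? "").replace(/[^0-9.]/g, "")), // "$1,299.00" -> 1299
      url: (item.url ?? "").trim(),
    };
    const errors = [];
    if (!record.title) errors.push("missing title");
    if (!Number.isFinite(record.price) || record.price <= 0) errors.push("bad price");
    if (!/^https?:\/\//.test(record.url)) errors.push("bad url");
    if (errors.length) {
      rejected.push({ item, errors });
      continue;
    }
    if (seen.has(record.url)) continue; // duplicate of an earlier record
    seen.add(record.url);
    valid.push(record);
  }
  return { valid, rejected };
}

const { valid, rejected } = cleanRecords([
  { title: "  Blue   Mug ", price: "$12.50", url: "https://shop.example.test/mug" },
  { title: "Blue Mug", price: "12.50", url: "https://shop.example.test/mug" },
  { title: "", price: "9", url: "https://shop.example.test/x" },
]);
console.log(valid.map((r) => r.title), rejected.length); // [ 'Blue Mug' ] 1
```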

How do you handle pagination when scraping?

Scraping

Handle pagination by identifying the next page link, page parameter, or API cursor. Start from the first page and...
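
The sketch below is a cursor-following loop. It assumes a hypothetical fetchPage(cursor) that returns { data, nextCursor }, with an in-memory fake standing in for the HTTP call. The maxPages guard is an added safety net against sites that keep returning the same cursor.

```javascript
// Follow cursors until none is returned, collecting every page's items.
async function collectAll(fetchPage, { maxPages = 100 } = {}) {
  const items = [];
  let cursor = null;
  for (let page = 0; page < maxPages; page++) {
    const { data, nextCursor } = await fetchPage(cursor);
    items.push(...data);
    if (!nextCursor) return items; // last page reached
    cursor = nextCursor;
  }
  throw new Error(`Stopped after ${maxPages} pages; possible pagination loop`);
}

// In-memory fake API with three pages.
const pages = {
  start: { data: [1, 2], nextCursor: "p2" },
  p2: { data: [3, 4], nextCursor: "p3" },
  p3: { data: [5], nextCursor: null },
};
const fakeFetch = async (cursor) => pages[cursor ?? "start"];

collectAll(fakeFetch).then((all) => console.log(all)); // [ 1, 2, 3, 4, 5 ]
```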

How do you scrape JavaScript-heavy sites?

Scraping

Use a headless browser to render the page before extracting data. Wait for key selectors to appear or for network...

How is web scraping different from web crawling?

Scraping

Web crawling is about discovering and fetching pages, while web scraping is about extracting data from those page...

Is web scraping legal?

Scraping

Web scraping legality depends on the site terms, the data collected, and local laws. Public data may be allowed, ...

What are common web scraping tools?

Scraping

Common tools include Beautiful Soup, Scrapy, Playwright, Puppeteer, and Selenium. Lightweight parsers are great f...

What are ethical web scraping practices?

Scraping

Ethical scraping means minimizing harm and respecting site owners and users. Follow robots.txt, terms of service,...

What is the best data format for scraped data?

Scraping

The best format depends on how you plan to use the data. CSV is simple and works well for tabular data and quick ...
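
For tabular output, a minimal CSV writer must quote fields that contain commas, quotes or line breaks, and double any embedded quotes. The sketch below follows the quoting convention of RFC 4180, including CRLF line endings. toCsv and the sample rows are hypothetical.

```javascript
// Serialize an array of objects to CSV using the given column order.
function toCsv(rows, columns) {
  const escape = (value) => {
    const s = value == null ? "" : String(value);
    // Quote only when needed, doubling any embedded quotes.
    return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [columns.map(escape).join(",")];
  for (const row of rows) lines.push(columns.map((c) => escape(row[c])).join(","));
  return lines.join("\r\n");
}

const rows = [{ name: 'Mug, "Blue"', price: 12.5 }, { name: "Plate", price: 8 }];
console.log(toCsv(rows, ["name", "price"]));
// name,price
// "Mug, ""Blue""",12.5
// Plate,8
```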

What is web scraping?

Scraping

Web scraping is the process of extracting specific data from web pages and converting it into structured formats....

W

Webcrawling

(10)

How is web crawling different from web scraping?

Webcrawling

Web crawling focuses on discovering and retrieving pages, while web scraping extracts specific data from those pa...

How often should you crawl a site?

Webcrawling

Match crawl frequency to how often content changes and how quickly you need updates. High‑change sites may need m...

How do you avoid getting blocked when crawling?

Webcrawling

To avoid getting blocked, crawl politely and predictably. Respect robots.txt, use reasonable rate limits, and ide...

How do you crawl JavaScript-heavy sites?

Webcrawling

To crawl JavaScript‑heavy sites, use a headless browser to render pages before extracting content. Wait for criti...

Is web crawling legal?

Webcrawling

Web crawling legality depends on the website, the data you collect, and the laws in your jurisdiction. Many sites...

What are common web crawling tools?

Webcrawling

Common web crawling tools include Scrapy, Apache Nutch, Playwright, Puppeteer, and managed crawler platforms. Scr...

What data does a web crawler collect?

Webcrawling

Common crawler data includes URLs, status codes, headers, page content, metadata, links, and timestamps. Many sys...

What is crawl budget?

Webcrawling

Crawl budget is the number of pages a crawler can fetch within time and resource constraints. It is limited by yo...
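
The sketch below shows how a budget caps a crawl. It runs a breadth-first traversal that stops once a fixed number of pages has been fetched, then reports what was left in the frontier. An in-memory site map stands in for real fetching and link extraction. Counting one budget unit per page is a simplifying assumption.

```javascript
// Breadth-first crawl that stops when the page budget is spent.
function crawlWithBudget(startUrl, links, budget) {
  const queue = [startUrl];
  const seen = new Set([startUrl]);
  const fetched = [];
  while (queue.length > 0 && fetched.length < budget) {
    const url = queue.shift();
    fetched.push(url); // one unit of budget per fetched page
    for (const next of links[url] ?? []) {
      if (!seen.has(next)) {
        seen.add(next); // never enqueue the same URL twice
        queue.push(next);
      }
    }
  }
  return { fetched, unvisited: queue };
}

// Hypothetical site: each key maps a page to the links found on it.
const site = { "/": ["/a", "/b"], "/a": ["/a/1", "/b"], "/b": ["/b/1"], "/a/1": [], "/b/1": [] };
console.log(crawlWithBudget("/", site, 3).fetched); // [ '/', '/a', '/b' ]
```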

What is robots.txt?

Webcrawling

robots.txt is a file at a site root that tells crawlers which paths they may or may not access. It uses a simple ...
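
The following simplified sketch shows how a crawler might apply those rules, assuming only the * group matters. It follows the longest-match rule from RFC 9309, where Allow wins a tie. It deliberately ignores wildcards, agent-specific groups and caching, so treat it as illustrative rather than complete.

```javascript
// Collect Allow/Disallow rules that belong to the "*" user-agent group.
function parseRobots(text) {
  const rules = [];
  let inStarGroup = false;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive user-agent lines share one group.
      inStarGroup = (lastWasAgent && inStarGroup) || value === "*";
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (inStarGroup && (field === "allow" || field === "disallow") && value) {
      rules.push({ allow: field === "allow", path: value });
    }
  }
  return rules;
}

// Longest matching prefix decides; Allow wins a tie; no match means allowed.
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (!best || rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)) best = rule;
  }
  return best ? best.allow : true;
}

const rules = parseRobots("User-agent: *\nDisallow: /private/\nAllow: /private/press/\n");
console.log(isAllowed(rules, "/private/report.pdf")); // false
console.log(isAllowed(rules, "/private/press/kit.zip")); // true
console.log(isAllowed(rules, "/blog/")); // true
```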

What is web crawling?

Webcrawling

Web crawling is the automated process of discovering and fetching web pages by following links so you can build a...