What is Shadow DOM?
Shadow DOM is a browser feature that lets a web component keep its internal HTML and CSS “private” from the rest of the page. You can think of it like a mini DOM tree attached to an element (the host) that renders its own content.
This is used a lot in design systems and modern UI widgets (dropdowns, date pickers, chat widgets, cookie banners) because it avoids CSS conflicts and makes components easier to reuse. Styles inside a shadow root do not automatically leak out, and page styles do not automatically leak in.
There are two common terms you will see:
- Light DOM: the regular DOM of the page itself, the tree under document.documentElement that comes from the page HTML.
- Shadow DOM: the hidden/encapsulated DOM inside a component, accessible through the element’s shadowRoot (only for “open” shadow roots).
Here is a small example with an open shadow root so you can see the idea:
// Create a host element
const host = document.createElement("div");
host.id = "my-widget";
document.body.appendChild(host);
// Attach a shadow root (open = accessible via host.shadowRoot)
const root = host.attachShadow({ mode: "open" });
root.innerHTML = `
  <style>
    .title { color: rebeccapurple; font-weight: 600; }
  </style>
  <div class="title">Hello from Shadow DOM</div>
`;
// This works because the root is open
console.log(host.shadowRoot.querySelector(".title").textContent);
Why it is difficult to scrape Shadow DOM
Shadow DOM is difficult to scrape because most web scrapers start from the page HTML string. When you do a simple HTTP request and parse the response, you usually only get the light DOM HTML that came from the server. But shadow roots are often created at runtime by JavaScript, and their content may not exist in the raw HTML at all.
Even if the browser has rendered the page, the element you want might be inside a shadow root, so a normal selector like document.querySelector(".price") will return null. You must first find the host element, then “enter” its shadow root and query inside it.
There is also an extra limitation:
- Open shadow root: you can access it with element.shadowRoot.
- Closed shadow root: element.shadowRoot is null by design, even though the UI is visible.
Closed shadow roots are intentionally harder to access from scripts. In practice, scraping closed Shadow DOM often requires a different strategy (for example, using the component’s public attributes, listening to network responses, reading accessible text, or automating user-visible interactions).
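To make the open/closed difference concrete, here is a small defensive accessor (a sketch; the name readShadowText is invented for this example). Thanks to optional chaining, it returns trimmed text from inside an open shadow root and null when the root is closed or missing, instead of throwing:

```javascript
// Read trimmed text from inside a host's shadow root, if it is accessible.
// For a closed shadow root (or no shadow root), host.shadowRoot is null,
// so this safely returns null instead of throwing.
function readShadowText(host, selector) {
  const el = host?.shadowRoot?.querySelector(selector);
  return el?.textContent?.trim() ?? null;
}
```

In a browser console you would call it as readShadowText(document.querySelector("product-card"), ".price").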
How to scrape Shadow DOM in a local browser
If you are scraping on your own machine (Chrome/Edge/Firefox), the fastest way is to open DevTools and run JavaScript directly in the page's console.
1) Manually access the shadow root
If you know the host element, you can query it and then query inside its shadow root:
// Example: <product-card> is the host custom element
const host = document.querySelector("product-card");
const price = host?.shadowRoot?.querySelector(".price")?.textContent?.trim();
console.log({ price });
2) Use a helper that searches through nested Shadow DOM
Real pages often have shadow roots inside other shadow roots. This helper walks the DOM and any open shadow roots to find the first match:
function deepQuerySelector(selector, root = document) {
  const lightDomMatch = root.querySelector(selector);
  if (lightDomMatch) return lightDomMatch;
  const treeWalker = document.createTreeWalker(
    root instanceof Document ? root.documentElement : root,
    NodeFilter.SHOW_ELEMENT
  );
  for (let node = treeWalker.currentNode; node; node = treeWalker.nextNode()) {
    const el = /** @type {Element} */ (node);
    const shadowRoot = /** @type {any} */ (el).shadowRoot;
    if (!shadowRoot) continue; // closed shadow root (or no shadow root)
    const matchInShadow = shadowRoot.querySelector(selector);
    if (matchInShadow) return matchInShadow;
    const matchDeeper = deepQuerySelector(selector, shadowRoot);
    if (matchDeeper) return matchDeeper;
  }
  return null;
}
const titleEl = deepQuerySelector("h1");
console.log(titleEl?.textContent?.trim());
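If you need every match rather than the first, a recursive variant is easy to sketch. This version (deepQuerySelectorAll is a name invented here, not a standard API) deliberately uses only querySelectorAll, children, and shadowRoot, so it behaves the same whether you start from the document, an element, or a shadow root:

```javascript
// Collect every element matching `selector` in the light DOM of `root`
// and in all nested open shadow roots. Closed shadow roots are skipped
// automatically, because their host's shadowRoot property is null.
function deepQuerySelectorAll(selector, root = document) {
  const results = [...root.querySelectorAll(selector)];
  const walk = (node) => {
    if (node.shadowRoot) {
      // Matches inside this open shadow root...
      results.push(...node.shadowRoot.querySelectorAll(selector));
      // ...then keep descending to find hosts nested inside it.
      for (const child of node.shadowRoot.children) walk(child);
    }
    for (const child of node.children) walk(child);
  };
  walk(root);
  return results;
}
```

In a real page: deepQuerySelectorAll(".price").map((el) => el.textContent.trim()).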
If DevTools cannot “see” the element easily, enable Shadow DOM inspection:
- Chrome/Edge DevTools → Settings → Preferences → Elements → enable “Show user agent shadow DOM” (wording can vary).
How to scrape Shadow DOM via browser extension
A browser extension can scrape Shadow DOM by injecting a content script into the page. The content script runs in an isolated JavaScript world but shares the page's live DOM, so it can access open shadow roots just like code you run in DevTools.
Below is a minimal Chrome/Edge Manifest V3 extension example that extracts text from a Shadow DOM selector and sends it back to the extension.
manifest.json
{
  "manifest_version": 3,
  "name": "Shadow DOM Scraper (Example)",
  "version": "1.0.0",
  "permissions": ["activeTab", "scripting"],
  "host_permissions": ["<all_urls>"],
  "action": { "default_title": "Scrape Shadow DOM" },
  "background": { "service_worker": "background.js" }
}
background.js
chrome.action.onClicked.addListener(async (tab) => {
  if (!tab.id) return;
  const [{ result }] = await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    func: () => {
      function deepQuerySelector(selector, root = document) {
        const lightDomMatch = root.querySelector(selector);
        if (lightDomMatch) return lightDomMatch;
        const treeWalker = document.createTreeWalker(
          root instanceof Document ? root.documentElement : root,
          NodeFilter.SHOW_ELEMENT
        );
        for (
          let node = treeWalker.currentNode;
          node;
          node = treeWalker.nextNode()
        ) {
          const el = node;
          const shadowRoot = el.shadowRoot;
          if (!shadowRoot) continue; // closed shadow root (or no shadow root)
          const matchInShadow = shadowRoot.querySelector(selector);
          if (matchInShadow) return matchInShadow;
          const matchDeeper = deepQuerySelector(selector, shadowRoot);
          if (matchDeeper) return matchDeeper;
        }
        return null;
      }
      // Replace with your selector:
      const el = deepQuerySelector(".price");
      return el ? el.textContent.trim() : null;
    },
  });
  console.log("Scraped value:", result);
});
This approach works well for “open” shadow roots. If the site uses “closed” shadow roots, you cannot access them through shadowRoot, even in an extension. In that case, a practical workaround is to scrape what the user can see (rendered text), intercept network calls (if your extension is allowed to), or use the site’s public data attributes and APIs.
If you are building a web scraper that must handle Shadow DOM reliably at scale, you usually want a real browser automation tool (Playwright/Puppeteer) instead of pure HTML parsing, because it can execute JavaScript and interact with the live DOM.
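As a sketch of that approach (assuming Playwright is installed; the URL argument and the .price selector are placeholders, and scrapeShadowPrice is a name invented here), note that Playwright's CSS selector engine pierces open shadow roots automatically, so no custom traversal helper is needed:

```javascript
// Sketch: scraping a value out of Shadow DOM with Playwright.
// Playwright CSS selectors pierce open shadow roots by default;
// closed shadow roots remain inaccessible, as in the browser.
async function scrapeShadowPrice(url) {
  const { chromium } = await import("playwright"); // assumes playwright is installed
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  // This locator matches .price even when it lives inside an open shadow root.
  const price = await page.locator(".price").first().textContent();
  await browser.close();
  return price?.trim() ?? null;
}
```

Because the selector engine handles the traversal, the scraping code looks identical whether the target element is in the light DOM or nested several open shadow roots deep.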
