Advanced Data Cleaning

You can add extra cleaning options to a crawling job with the clean_selectors parameter.

Cleaning Selectors

Cleaning selectors strip unwanted elements from crawled pages. They are applied after a page has been crawled, and every element that matches one of the selectors is removed from the output.
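As a rough local illustration of this behavior, the Python sketch below drops a few elements from a sample page. It assumes that "cleaning" an element means removing it together with its contents. It is not the crawler's actual implementation, and it handles only plain tag-name selectors (no classes, IDs, or combinators). The names `TagStripper` and `strip_tags` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not open a "skip" region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagStripper(HTMLParser):
    """Rebuilds HTML while skipping the given tags and everything inside them."""

    def __init__(self, tags):
        super().__init__(convert_charrefs=True)
        self.tags = set(tags)
        self.depth = 0    # greater than 0 while inside a removed element
        self.parts = []   # pieces of the cleaned output

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            if tag not in VOID_TAGS:
                self.depth += 1
            return
        if self.depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: keep it unless it is being removed.
        if tag not in self.tags and self.depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.tags:
            if tag not in VOID_TAGS and self.depth > 0:
                self.depth -= 1
            return
        if self.depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def strip_tags(html, tags):
    """Return html without the listed tags (comments and doctypes are dropped too)."""
    parser = TagStripper(tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


page = ('<body><nav>menu</nav><p>Hello <img src="a.png">world</p>'
        '<script>var x = 1;</script><footer>links</footer></body>')
cleaned = strip_tags(page, ["nav", "script", "img", "footer"])
print(cleaned)  # <body><p>Hello world</p></body>
```

Only the paragraph text survives, which is the kind of result you would expect before the page is converted to a format such as markdown.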

The default value of clean_selectors is:

script, style, noscript, iframe, img, footer, header, nav, head

The format is a comma-separated list of CSS selectors.
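If your selectors live in a list, a small helper can produce the comma-separated string. This is a minimal sketch: `build_clean_selectors` is a hypothetical name, not part of any client library, and it assumes that no individual selector contains a comma of its own.

```python
def build_clean_selectors(selectors):
    """Join CSS selectors into one comma-separated string.

    Surrounding whitespace is trimmed; blank entries and duplicates are skipped.
    """
    unique = []
    for selector in selectors:
        selector = selector.strip()
        if selector and selector not in unique:
            unique.append(selector)
    return ", ".join(unique)


print(build_clean_selectors([".card", "#main-header", " .card "]))
# .card, #main-header
```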

API Example

{
  "url": "https://books.toscrape.com/",
  "scrape_type": "markdown",
  "items_limit": 10,
  "clean_selectors": ".card, #main-header"
}
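The same request body can be assembled from Python using only the standard library. In this sketch the endpoint `https://api.example.com/crawl` is a placeholder, not the real API URL, and authentication is left out because this guide does not describe it. `build_crawl_request` is a hypothetical helper name. The request is prepared but not sent.

```python
import json
import urllib.request

# Placeholder endpoint: an assumption for illustration, replace with the real one.
API_URL = "https://api.example.com/crawl"


def build_crawl_request(url, clean_selectors, scrape_type="markdown", items_limit=10):
    """Prepare a POST request carrying the crawl job as a JSON body."""
    payload = {
        "url": url,
        "scrape_type": scrape_type,
        "items_limit": items_limit,
        "clean_selectors": clean_selectors,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_crawl_request("https://books.toscrape.com/", ".card, #main-header")
print(request.data.decode("utf-8"))
# To actually submit the job: urllib.request.urlopen(request)
```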