Advanced Data Cleaning
You can add extra cleaning options to a crawling job with a special parameter called clean_selectors.
Cleaning Selectors
Cleaning selectors clean unwanted elements out of the crawled pages. They are applied after a page has been crawled, and every element that matches one of the selectors is cleaned.
The default value is:
script, style, noscript, iframe, img, footer, header, nav, head
The format is a comma-separated list of CSS selectors.
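If you keep your selectors in a list, the comma-separated string can be assembled in code. The following is a minimal Python sketch of that format only; the helper name build_clean_selectors is hypothetical and not part of the API.

```python
# Default value as listed in this documentation.
DEFAULT_CLEAN_SELECTORS = "script, style, noscript, iframe, img, footer, header, nav, head"


def build_clean_selectors(selectors):
    """Join CSS selectors into a comma-separated string (hypothetical helper)."""
    # Trim surrounding whitespace and skip empty entries.
    cleaned = [s.strip() for s in selectors if s and s.strip()]
    return ", ".join(cleaned)


custom = build_clean_selectors([".card", " #main-header ", ""])
print(custom)
```

The result can be passed as the clean_selectors value of a crawling job.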
API Example
{
  "url": "https://books.toscrape.com/",
  "scrape_type": "markdown",
  "items_limit": 10,
  "clean_selectors": ".card, #main-header"
}
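A job like this could be prepared from Python roughly as follows. This is a sketch under stated assumptions: the endpoint URL is a placeholder, the use of a JSON POST is assumed, and authentication is left out because none of these details are given here. Check your API reference for the real values.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): replace with the real crawl endpoint.
API_ENDPOINT = "https://api.example.com/crawl"

# Same job as in the example: custom cleaning selectors as a comma-separated string.
payload = {
    "url": "https://books.toscrape.com/",
    "scrape_type": "markdown",
    "items_limit": 10,
    "clean_selectors": ".card, #main-header",
}


def build_request(endpoint, body):
    """Prepare, but do not send, a JSON POST request carrying the crawl job."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed HTTP method
    )


request = build_request(API_ENDPOINT, payload)
# To submit the job, pass the request to urllib.request.urlopen(request)
# after adding whatever authentication header your account requires.
```

Building the request separately from sending it makes it easy to inspect the body before any network call is made.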