What is robots.txt?
Webcrawling

Answer
robots.txt is a plain-text file served at a site's root (e.g. https://example.com/robots.txt) that tells crawlers which URL paths they may or may not fetch. It uses a simple rule format: groups of directives keyed by User-agent, with Disallow and Allow lines that match URL path prefixes. Responsible crawlers fetch and parse it before requesting other pages. It is advisory rather than a security mechanism: a non-compliant crawler can ignore it, and disallowed paths remain publicly reachable. Many sites also list sitemap locations in robots.txt via the Sitemap directive to aid discovery. Honoring it reduces unnecessary server load and keeps your crawler out of areas the site owner wants left alone.
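For illustration, here is a minimal robots.txt showing the rule format described above (the paths and bot name are hypothetical):

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/

# One specific crawler blocked entirely
User-agent: ExampleBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

If you are writing a crawler in Python, the standard library's urllib.robotparser can do the check before each request. A minimal sketch, assuming the target site is https://example.com and your crawler identifies itself as "MyBot" (both placeholders):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (the network request happens in read()).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # hypothetical target site
rp.read()

# Ask whether our user agent may fetch a given URL before requesting it.
url = "https://example.com/admin/reports"
if rp.can_fetch("MyBot", url):  # "MyBot" is a placeholder user-agent token
    print("allowed to crawl:", url)
else:
    print("robots.txt disallows:", url)

# Python 3.8+: sitemap URLs declared in robots.txt, or None if there are none.
print("sitemaps:", rp.site_maps())
```

Checking can_fetch once per URL is cheap since the rules are parsed in memory; just re-read robots.txt periodically on long crawls, as sites do change it.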