Site Crawler Skill

Respectfully crawl documentation sites and web content for RAG ingestion.

Overview

Documentation sites, blogs, and knowledge bases contain valuable structured content. This skill covers:

- Respectful crawling (robots.txt, rate limiting)
- Structure-preserving extraction
- Incremental updates (only fetch changed pages)
- Sitemap-based discovery

Prerequisites

Crawling Principles

1. Be Respectful

- Always check robots.txt
- Rate limit requests (1-2 seconds between)
- Identify yourself with a User-Agent
- Don't overload servers

2. Be Efficient

- Use sitemaps when available
- Tr…