Created agent_crawler.py:
AgentWebCrawler: an AI-powered crawler that:
1. Analyzes site structure using an LLM
2. Decides what to crawl based on purpose
3. Scores relevance dynamically
4. Adapts as it learns more
5. Knows when it has enough
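A rough sketch of how steps 2-5 could fit together, not the actual code in agent_crawler.py. Everything here is an assumption for illustration: the in-memory `site` dict stands in for real HTTP fetching, `keyword_score` stands in for the LLM relevance call, and the names `adaptive_crawl`, `min_score`, `enough`, and `max_pages` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
Site = Dict[str, Tuple[str, List[str]]]


def keyword_score(text: str, keywords: List[str]) -> float:
    """Stand-in for an LLM relevance judgment: fraction of keywords present."""
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords) if keywords else 0.0


def adaptive_crawl(
    site: Site,
    start: str,
    keywords: List[str],
    min_score: float = 0.5,
    enough: int = 2,
    max_pages: int = 10,
) -> List[str]:
    """Best-first crawl that stops once `enough` relevant pages are kept."""
    frontier = [(-1.0, start)]  # heapq is a min-heap, so priorities are negated
    seen = {start}
    kept: List[str] = []
    visited = 0
    while frontier and visited < max_pages and len(kept) < enough:
        _, url = heapq.heappop(frontier)
        if url not in site:
            continue  # dead link in this toy model
        visited += 1
        text, links = site[url]
        score = keyword_score(text, keywords)
        if score >= min_score:
            kept.append(url)
        # Adapt: links found on a relevant page are explored first.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return kept
```

The `enough` threshold is the simplest possible version of "knows when it has enough"; a real agent would presumably ask the LLM whether coverage is sufficient.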
Purpose types:
- DOCUMENTATION - Technical docs, guides
- TRAINING - Learning materials
- KNOWLEDGE - General knowledge base
- RESEARCH - Research papers
- REFERENCE - Reference material
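One way the purpose types could be modelled is as an enum. This is a sketch only: the class name `CrawlPurpose`, the string values, and the `parse_purpose` helper are assumptions, not names confirmed from agent_crawler.py.

```python
from enum import Enum


class CrawlPurpose(Enum):
    """Hypothetical enum mirroring the five purpose types listed above."""
    DOCUMENTATION = "documentation"  # technical docs, guides
    TRAINING = "training"            # learning materials
    KNOWLEDGE = "knowledge"          # general knowledge base
    RESEARCH = "research"            # research papers
    REFERENCE = "reference"          # reference material


def parse_purpose(raw: str) -> CrawlPurpose:
    """Map user input such as ' Training ' to a CrawlPurpose member."""
    return CrawlPurpose(raw.strip().lower())
```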
Usage:
Features:
- Content extraction (readable text, not a raw HTML dump)
- Relevance scoring
- Rate limiting
- Configurable depth/pages
- Integration with multi-source ingest
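For reference, two of these features (content extraction and rate limiting) can be sketched with the standard library alone. This is an illustrative sketch under assumptions, not the implementation in agent_crawler.py: `extract_text`, `TextExtractor`, and `RateLimiter` are hypothetical names, and the injectable `clock`/`sleep` arguments exist only to make the limiter easy to test.

```python
import time
from html.parser import HTMLParser
from typing import Callable, List, Optional


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> contents."""
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the readable text of an HTML page instead of the raw markup."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


class RateLimiter:
    """Enforces a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if needed before the next request; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call `limiter.wait()` before each fetch and pass the response body through `extract_text` before scoring it.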