- Created utils/retry.py with:
- RetryHandler with exponential backoff
- CircuitBreaker pattern
- Config for max attempts, delays
- Graceful degradation
- Updated LLM client to use retry logic
- API failures now retry with backoff
- Circuit breaker prevents cascade failures
- Graceful degradation on prolonged failures
This addresses the reliability gap identified in code review.
Created agent_crawler.py:
AgentWebCrawler - AI-powered crawling that:
1. Analyzes site structure (LLM)
2. Decides what to crawl based on purpose
3. Scores relevance dynamically
4. Adapts as it learns more
5. Knows when it has enough
Purpose types:
- DOCUMENTATION - Technical docs, guides
- TRAINING - Learning materials
- KNOWLEDGE - General knowledge base
- RESEARCH - Research papers
- REFERENCE - Reference material
Usage:
Features:
- Content extraction (not HTML dump)
- Relevance scoring
- Rate limiting
- Configurable depth/pages
- Integration with multi-source ingest
Team 3: Infrastructure & Config
Fixed:
- #4: GitHub Ingestor now works without token for public repos
- Token is now optional
- Uses unauthenticated requests (with rate limit warning) when no token
- Private repos still require token
- #14: Added startup API key validation
- get_config() now validates API keys at startup
- Raises clear error if neither OPENAI_API_KEY nor MINIMAX_API_KEY is set
- Fail-fast instead of silent failures
- #10: Added CostConfig for rate limiting and budget controls
- max_tokens_per_run: limit tokens per generation
- max_cost_usd: budget cap in dollars
- track_usage: enable/disable usage tracking
- price_per_million_tokens: pricing by model
NEW frameworks:
- Diátaxis Tutorial - Learn by doing a project
- Diátaxis How-To - Accomplish a specific task
- Diátaxis Explanation - Clarify and deepen understanding
- Diátaxis Reference - Complete information lookup
- Technical Manual - From foundations to mastery
- Codebase Tour - Document code systematically
- API Documentation - Complete API reference
NonfictionGenerator class to use these frameworks.
CLI integration with --framework flag.
Example:
opus generate --framework codebase-tour --concept 'Linux Kernel'
- LocalIngestor: Include ALL files by default (source code, configs, etc.)
- GitHubIngestor: Include ALL files by default
- AI witnesses everything and transforms it into documentation
- Filter only build artifacts (.pyc, .so, dist, build)
Philosophy: Don't filter what the AI can see - let it decide
what's relevant. The AI can document code directly!
Research Tools:
- SearchTool: Multiple backends (Tavily, Serper, Brave, DuckDuckGo)
- WikipediaTool: Wikipedia lookup
- AcademicSearchTool: CrossRef, Semantic Scholar
- ResearchOrchestrator: Comprehensive multi-source research
ResearchAgent:
- NOT just fact-checking - actively discovers NEW information
- Identifies trends beyond training data cutoff
- Generates innovations from cross-referencing sources
- Deep research with subtopics
VerifiedFactChecker:
- Live claim verification against web sources
- Confidence scoring
- Citation needed detection
Dependencies added: tavily, wikipedia, arxiv, duckduckgo-search
- Merge llm.py + llm_sync.py into single unified client
- Remove llm_sync.py (now just llm.py with both sync/async)
- Add requests to dependencies
- Add Dockerfile for containerized deployment
- Add .dockerignore
All issues resolved!
- LocalIngestor class for files and directories
- CLI: opus ingest-local PATH
- Generate from local: opus generate --local ./my-notes/
- Support for extensions, recursive scanning, summarize
- Pattern-based exclusion (.git, __pycache__, etc.)
CLI Commands:
- generate: Full manuscript generation with full GitHub content
- serve: Start FastAPI server with OpenAPI docs
- ingest: Standalone GitHub ingestion
- frameworks: List all story frameworks
- config: Show configuration
- docs: Show comprehensive docs (terminal/markdown/html)
- api: Export OpenAPI spec
Server:
- FastAPI with /docs, /redoc interactive docs
- /generate, /ingest, /frameworks, /health endpoints
- OpenAPI 3.0 specification
Documentation:
- Terminal, markdown, and HTML formats
- Full API reference
- Framework documentation
- Environment variables guide
- Project structure
Fix: Use full GitHub content as seed (not just 5000 chars)
- GitHubIngestor class to fetch repo contents
- Support for .md, .txt, .notes, .draft files
- Method to ingest from GitHub directly into orchestrator
- Export GitHubIngestor in __init__.py
Usage:
orch = OpusOrchestrator(book_type='fiction', genre='memoir')
content = await orch.ingest_from_github('mrhavens/my-notes')
await orch.run()
- Add .env to .gitignore (API keys stay local)
- Add LLM client with MiniMax and OpenAI support
- Update config to load from environment variables
- Wire up Architect agent to actually call the LLM
- Add MiniMax API key to local .env file