LLM Supervisor

Handles rate limits and model fallbacks gracefully.

Behavior

On Rate Limit / Overload Errors

When I encounter rate limits or overload errors from cloud providers (Anthropic, OpenAI):

1. Tell the user immediately: don't silently fail or retry endlessly.
2. Offer local fallback: ask if they want to switch to Ollama.
3. Wait for confirmation: never auto-switch for code generation tasks.

Confirmation Required

Before using local models for code generation, ask:

"Cloud is rate-limited. Switch to local Ollama ( )? Reply 'yes' to confirm."

For simple queries (chat, summaries), can s…
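The three-step behavior above (tell the user, offer the fallback, wait for confirmation) could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `RateLimitError`, `Result`, `supervise`, and every callback parameter are hypothetical names, and `local_model` stands in for whatever model name the confirmation prompt is meant to show. Because the rule for simple queries is cut off in the text, the sketch conservatively asks for confirmation on every fallback.

```python
from dataclasses import dataclass
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for a cloud provider's rate-limit or overload error."""


@dataclass
class Result:
    source: str  # "cloud", "local", or "none" (user declined the fallback)
    text: str


def supervise(
    prompt: str,
    call_cloud: Callable[[str], str],   # hypothetical: sends the prompt to the cloud model
    call_local: Callable[[str], str],   # hypothetical: sends the prompt to a local Ollama model
    notify: Callable[[str], None],      # hypothetical: shows a message to the user
    confirm: Callable[[str], str],      # hypothetical: asks a question, returns the user's reply
    local_model: str,                   # assumed model name, shown in the prompt
) -> Result:
    try:
        return Result("cloud", call_cloud(prompt))
    except RateLimitError as err:
        # Step 1: tell the user immediately instead of retrying silently.
        notify(f"Cloud provider is rate-limited or overloaded: {err}")

        # Step 2: offer the local fallback.
        question = (
            f"Cloud is rate-limited. Switch to local Ollama ({local_model})? "
            "Reply 'yes' to confirm."
        )

        # Step 3: wait for explicit confirmation; never switch automatically.
        if confirm(question).strip().lower() != "yes":
            return Result("none", "")
        return Result("local", call_local(prompt))
```

In an interactive session, `notify` could simply be `print` and `confirm` could be `input`; both are passed in as callbacks so the decision logic can be exercised without a real provider or terminal.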