LLM Gateway

A unified API gateway that routes LLM requests across providers and self-hosted models, with rate limiting, cost tracking, caching, and failover.

When to Use This Skill

Use this skill when:

- Running multiple LLM backends (OpenAI, Anthropic, vLLM, Ollama) behind a single endpoint
- Enforcing per-team or per-user rate limits and spend budgets
- Implementing automatic fallback when a provider is down
- Adding semantic caching to reduce API costs by 20–50%
- Centralizing API key management instead of distributing keys to every app

Prerequisites

- Docker and Docker Compose
- A Postgr…