Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.

When to Activate

- Building applications that call LLM APIs (Claude, GPT, etc.)
- Processing batches of items with varying complexity
- Need to stay within a budget for API spend
- Optimizing cost without sacrificing quality on complex tasks

Core Concepts

1. Model Routing by Task Complexity

Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.

2. Immutable Cost…