# Prompt Caching: 5min & 1h TTL

Prompt caching reuses already-processed prompt prefixes across requests. Cache reads cost 10% of the fresh-input price. For apps with large repeated context, this is the single biggest lever on your bill.

## When to Use

- Large system prompts reused across many requests
- Long documents/codebases with many follow-up questions
- Multi-turn conversations with growing history
- Tool/function definitions shared across sessions
- Any call where at least 1024 tokens would be repeated (2048 for Haiku)

## Two TTLs

| TTL | Use case | Cost of cache write |
|---|---|---|
| 5 min (ephemeral) | Conversation, active sessio…