anth-performance-tuning

Anthropic Performance Tuning Overview Optimize Claude API latency and throughput via prompt caching, model selection, streaming, and request optimization. The biggest wins come from prompt caching (90% input cost reduction) and model selection (Haiku is 4x faster than Sonnet). Prompt Caching (Biggest Win) Cache requirements: Minimum 1,024 tokens for Sonnet/Opus, 2,048 for Haiku. Cache lives for 5 minutes (refreshed on each hit). Model Selection for Speed | Model | Speed | Cost (per MTok in/out) | Best For | |-------|-------|----------------------|----------| | Claude Haiku | Fastest | $0.80 /…