# Anthropic Performance Tuning

## Overview

Claude latency has two components: time to first token (TTFT) and tokens per second (TPS). Different strategies target each.

## Latency Benchmarks (approximate)

| Model            | TTFT (p50) | TTFT (p95) | Output TPS |
|------------------|------------|------------|------------|
| Claude Haiku 4.5 | 200ms      | 600ms      | 150        |
| Claude Sonnet 4  | 400ms      | 1.2s       | 90         |
| Claude Opus 4    | 800ms      | 2.5s       | 40         |

## Optimization Strategies

## Instructions

### Step 1: Always Stream

### Step 2: Prompt Caching for Faster TTFT

### Step 3: Use Haiku for Speed-Critical Paths

### Step 4: Reuse Client Instance

### Step 5: Parallel…