LangChain Incident Runbook

Overview

3:07am. PagerDuty: "LangChain p95 latency 10s for 5 minutes." You open LangSmith, filter by over the last 15 minutes, and the first trace is 43 seconds long: an agent is on step 24 of 25 iterations, bouncing between the same two tools on a vague user prompt ("help me with my account"). The cost dashboard shows $400 spent in the last 10 minutes, up from a $6/hour baseline. This is P10: defaults to with no cost cap; vague prompts never converge; the spend hits before surfaces. The first move is not to push a code fix; it is to flip via config reload and add a…
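
The failure mode in this scenario (an agent alternating between the same two tools until it reaches its iteration limit, with nothing capping spend) can be illustrated with a small per-run guard. What follows is a minimal sketch in plain Python, not LangChain's API and not necessarily the mitigation this runbook goes on to prescribe: `RunBudget`, `BudgetExceeded`, `record_step`, and every threshold are hypothetical names and values chosen for illustration, and the per-step cost is assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import List


class BudgetExceeded(RuntimeError):
    """Raised when a single agent run exceeds one of its limits (hypothetical)."""


@dataclass
class RunBudget:
    """Per-run guard: step cap, cost cap, and a two-tool ping-pong detector.

    All names and defaults are illustrative assumptions, not library settings.
    """
    max_steps: int = 8            # hard cap on tool-calling steps per run
    max_cost_usd: float = 0.50    # hard cap on estimated spend per run
    max_repeats: int = 3          # A,B,A,B,... repeated this many times aborts
    steps: int = 0
    cost_usd: float = 0.0
    _recent: List[str] = field(default_factory=list, repr=False)

    def record_step(self, tool_name: str, step_cost_usd: float) -> None:
        """Call once per agent step; raises BudgetExceeded when a limit trips."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        self._recent.append(tool_name)

        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step cap exceeded: {self.steps} > {self.max_steps}")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"cost cap exceeded: ${self.cost_usd:.2f} > ${self.max_cost_usd:.2f}"
            )
        if self._is_ping_pong():
            raise BudgetExceeded(
                f"loop detected: alternating between {sorted(set(self._recent[-2:]))}"
            )

    def _is_ping_pong(self) -> bool:
        """True when the last 2 * max_repeats calls strictly alternate between two tools."""
        window = self._recent[-2 * self.max_repeats:]
        if len(window) < 2 * self.max_repeats:
            return False
        alternates = all(a != b for a, b in zip(window, window[1:]))
        return alternates and len(set(window)) == 2
```

In use, the code driving the agent loop would create one `RunBudget` per request, call `record_step` after each tool invocation, and convert `BudgetExceeded` into a graceful response such as a clarifying question back to the user. The loop check is deliberately narrow: it only catches strict alternation between exactly two tools, which is the pattern seen in the trace described here.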