Conversational RAG

The Core Problem

A retriever only sees the current query. In multi-turn chat that query is often ambiguous without the prior context. Retrieving on "How did that compare to last year?" matches nothing useful, because neither "that" nor "last year" is resolved. The retriever must see a reformulated, self-contained query.

Pipeline

Two LLM calls per turn: one cheap (Haiku) for condensation, one standard for synthesis.

Query Condensation (Haiku)

Keep only the last 8 turns in the condenser prompt. Older turns rarely affect coreference and only inflate cost.

Follow-up vs New Query Routing

Not every turn needs retrieval. Classify first, retrieve only wh…
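
As a rough illustration of the condensation step described above, the sketch below trims the history to the last 8 turns and builds a prompt that asks a model for a standalone query. The names `build_condenser_prompt`, `condense`, and the `llm` callable are hypothetical, and the prompt wording is invented for this example. It also assumes one message counts as one turn, which the text does not specify. The model call is injected as a plain function so the sketch stays independent of any particular client library.

```python
from typing import Callable, List, Tuple

# Assumption: a "turn" is a single (role, text) message.
Turn = Tuple[str, str]

MAX_TURNS = 8  # from the text: keep only the last 8 turns


def build_condenser_prompt(
    history: List[Turn], query: str, max_turns: int = MAX_TURNS
) -> str:
    """Build a prompt asking for a self-contained rewrite of `query`."""
    # Guard against max_turns <= 0, where history[-0:] would keep everything.
    recent = history[-max_turns:] if max_turns > 0 else []
    transcript = "\n".join(f"{role}: {text}" for role, text in recent)
    # Hypothetical prompt wording, not taken from the source.
    return (
        "Rewrite the final user message as a self-contained search query.\n"
        "Resolve pronouns and references using the conversation.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Final user message: {query}\n"
        "Standalone query:"
    )


def condense(history: List[Turn], query: str, llm: Callable[[str], str]) -> str:
    """Return a standalone query; skip the model call when there is no history."""
    if not history:
        return query
    return llm(build_condenser_prompt(history, query)).strip()
```

In use, `llm` would wrap the cheap model call, and the string returned by `condense` is what gets passed to the retriever in place of the raw user message.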