Scaling for Query Latency

The latency of a single query is determined by the slowest component in the query execution path. Latency is sometimes correlated with throughput, but not always: tuning for throughput and tuning for latency pull in opposite directions. Low-latency optimization aims to saturate the available resources for a single query, while throughput optimization aims to minimize per-query resource usage so that more queries can run in parallel.

Performance Tuning for Lower Latency

- Increase segment count to match CPU cores (see: Minimizing latency)
- Keep quantized vectors and HNSW in RAM
- Reduce…