# ColBERT Retrieval

## Late Interaction in One Picture

Bi-encoders compress a document into one vector. Cross-encoders score each (query, doc) pair from scratch. ColBERT sits in between: it stores one vector per document token and scores at query time with MaxSim. Each query token finds its best-matching document token, and those per-token maxima are summed into the document score. You keep bi-encoder-style offline indexing, because documents are encoded once without seeing the query, and get cross-encoder-like quality at query time.

## When ColBERT Wins

| Scenario | Dense + rerank | ColBERT |
|---|---|---|
| Out-of-domain corpus (legal, medical, code) | Often needs rerank to recover recall | Strong out of the box |
| Long documents ( 512…