late-chunking — Skillopedia

Late Chunking

The Problem with Naive Chunking

Traditional pipeline: split the document into chunks first, then embed each chunk independently. Each chunk embedding sees only its own 200-400 tokens, so anything that depends on earlier sections (pronouns and other anaphora, terminology introduced elsewhere in the document) becomes invisible. A query like "Acme Corp revenue growth drivers" may match Chunk 1 but not Chunk 2, even though Chunk 2 is the one answering the question: if Chunk 2 refers to Acme Corp only as "the company" or "it", its embedding carries no signal for the entity named in the query.

The Late Chunking Fix

1. Run the full document through a long-context encoder (for example, one with an 8k, 32k, or 128k token window) in a single forward pass, so every token attends to every other token.
2. Record toke…
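The contrast between the two pipelines can be sketched in plain Python. This is a toy under stated assumptions, not a real embedding model: `toy_encoder` is a hypothetical stand-in that blends each one-hot token vector with the mean of all token vectors in its input (a single layer of uniform attention), and the late-chunked vectors are obtained by mean-pooling the contextual token vectors inside each chunk's token span, which is how late chunking is commonly described. The document, query, `mix` weight, and helper names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def one_hot(token: str, vocab: Dict[str, int]) -> Vector:
    """Toy static token embedding: a one-hot vector over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    vec[vocab[token]] = 1.0
    return vec


def toy_encoder(tokens: Sequence[str], vocab: Dict[str, int], mix: float = 0.5) -> List[Vector]:
    """Hypothetical stand-in for a long-context encoder (not a real model).

    Each output vector blends the token's own embedding with the mean of every
    token embedding in the input: one layer of uniform attention in which every
    token attends to every other token in the same forward pass.
    """
    static = [one_hot(t, vocab) for t in tokens]
    dim = len(vocab)
    context = [sum(v[i] for v in static) / len(static) for i in range(dim)]
    return [[(1 - mix) * v[i] + mix * context[i] for i in range(dim)] for v in static]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_chunk_embeddings(chunks: List[List[str]], vocab: Dict[str, int]) -> List[Vector]:
    # Split first, embed second: each chunk is encoded alone, with no cross-chunk context.
    return [mean_pool(toy_encoder(chunk, vocab)) for chunk in chunks]


def late_chunk_embeddings(chunks: List[List[str]], vocab: Dict[str, int]) -> List[Vector]:
    # Embed first, split second: one forward pass over the whole document,
    # then mean-pool the contextual token vectors inside each chunk's span.
    tokens = [t for chunk in chunks for t in chunk]
    token_vecs = toy_encoder(tokens, vocab)
    embeddings, start = [], 0
    for chunk in chunks:
        end = start + len(chunk)  # token span [start, end) belongs to this chunk
        embeddings.append(mean_pool(token_vecs[start:end]))
        start = end
    return embeddings


# Hypothetical pre-tokenized document: Chunk 2 refers to Acme Corp only as "its".
chunks = [
    ["acme", "corp", "reported", "results"],
    ["its", "revenue", "grew", "on", "cloud", "sales"],
]
query = ["acme", "corp", "revenue", "growth", "drivers"]
vocab = {tok: i for i, tok in enumerate(sorted({t for seq in chunks + [query] for t in seq}))}

q_vec = mean_pool(toy_encoder(query, vocab))
naive = naive_chunk_embeddings(chunks, vocab)
late = late_chunk_embeddings(chunks, vocab)
print("naive chunk 2 vs query:", round(cosine(naive[1], q_vec), 3))
print("late  chunk 2 vs query:", round(cosine(late[1], q_vec), 3))
```

Chunk 2's late-chunked vector scores higher against the query than its naively chunked vector, because the single forward pass lets the "acme" and "corp" tokens from Chunk 1 flow into Chunk 2's token vectors before pooling. A real long-context encoder learns which context each token should attend to rather than averaging it uniformly; the uniform mix here only illustrates the data flow.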