## Contextual Retrieval

### The Core Idea

A chunk lifted from a long document loses context. "The company reported 3% revenue growth" does not say which company or which period. Anthropic's Contextual Retrieval prepends a short LLM-generated context string to each chunk before embedding and BM25 indexing.

From Anthropic's 2024 research (Pro Research team):

| Technique | Retrieval failure rate | Reduction vs. baseline |
|---|---|---|
| Embeddings only (baseline) | 5.7% | — |
| + BM25 (hybrid) | 4.7% | 17.5% |
| + Contextual embeddings | 3.7% | 35% |
| + Contextual BM25 | 2.9% | 49% |
| + Reranking | 1.9% | 67% |

…
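The prepend step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `contextualize_chunks`, `stub_context`, and `relative_reduction` are hypothetical names, and `stub_context` returns a fixed, made-up string where a real pipeline would call an LLM with the full document and the chunk. The last helper only recomputes the reduction column from the failure rates in the table.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Return each chunk with its generated context string prepended."""
    contextualized = []
    for chunk in chunks:
        # generate_context stands in for the LLM call (an assumption): it sees
        # the whole document plus the chunk and returns a short situating text.
        context = generate_context(document, chunk).strip()
        # Fall back to the bare chunk if no context was produced.
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def stub_context(document: str, chunk: str) -> str:
    """Hypothetical placeholder for an LLM call; the returned text is made up."""
    return "From ExampleCo's Q2 report (made-up example)."


def relative_reduction(baseline: float, rate: float) -> float:
    """Fractional drop in failure rate relative to the baseline rate."""
    return (baseline - rate) / baseline


if __name__ == "__main__":
    chunks = ["The company reported 3% revenue growth."]
    for text in contextualize_chunks("<full document text>", chunks, stub_context):
        # This combined text is what would be embedded and BM25-indexed.
        print(text)

    # Recompute the reduction column from the table's failure rates.
    for rate in (4.7, 3.7, 2.9, 1.9):
        print(rate, f"{relative_reduction(5.7, rate):.1%}")
```

Passing the context generator in as a callable keeps the sketch independent of any particular LLM client. The same contextualized string would feed both the embedding model and the BM25 index, so the two retrievers see identical text.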