## Hard Negative Mining (Deep Dive)

### Negative Type Taxonomy

| Type | Source | Difficulty | Risk |
|---|---|---|---|
| Random | Uniform sample from corpus | Trivial | Wasted capacity |
| In-batch | Other positives in the same batch | Easy-medium | Fine as baseline |
| BM25-mined | Lexical matches minus true positives | Medium | Misses semantic lookalikes |
| Dense-mined | Nearest neighbors under a weak embedder | Hard | High false-negative rate |
| Cross-encoder-scored | Top-k reranked by a teacher | Very hard | Needs teacher model |
| Self-mined (ANCE) | Nearest neighbors under the model being tr…