vector-quantization — Skillopedia

# Vector Quantization

## Why Quantize

A float32 1024-dim embedding is 4 KB, so 100M vectors take about 400 GB. Quantization cuts this by 4-128x at a small recall cost, especially when paired with rescoring.

| Scheme | Bytes/vec (d=1024) | Reduction | Typical recall impact |
|---|---|---|---|
| Full float32 | 4096 | 1x | 0 |
| float16 / bfloat16 | 2048 | 2x | 0 to -0.5% |
| int8 scalar | 1024 | 4x | -1 to -2% |
| int4 scalar | 512 | 8x | -2 to -5% |
| Binary (1 bit) | 128 | 32x | -5 to -15% (recover with rescore) |
| PQ (64 subvectors x 8 bits) | 64 | 64x | -2 to -10% |
| Binary + MRL (d=256) | 32 | 128x | -3 to -8…
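
The byte counts and reduction factors in the table follow directly from the number of bits stored per dimension (or per PQ subvector). The sketch below reproduces that arithmetic. The function names and scheme keys are illustrative and do not come from any library, and the PQ figure assumes codebook storage is ignored because it is shared across all vectors.

```python
# Storage footprint per scheme for one embedding.
# All names here are hypothetical helpers for illustration.

def scalar_bytes(dim: int, bits_per_dim: int) -> int:
    """Bytes for one vector stored with a fixed number of bits per dimension."""
    return dim * bits_per_dim // 8


def pq_bytes(num_subvectors: int, bits_per_code: int) -> int:
    """Bytes for one PQ code: one codebook index per subvector.

    Assumption: the codebooks themselves are excluded from the count.
    """
    return num_subvectors * bits_per_code // 8


def footprint_table(dim: int = 1024) -> dict[str, int]:
    """Bytes per vector for each scheme in the table."""
    return {
        "float32": scalar_bytes(dim, 32),
        "float16": scalar_bytes(dim, 16),
        "int8": scalar_bytes(dim, 8),
        "int4": scalar_bytes(dim, 4),
        "binary": scalar_bytes(dim, 1),
        "pq_64x8": pq_bytes(64, 8),
        # MRL truncates the embedding to 256 dims before binarizing.
        "binary_mrl_256": scalar_bytes(256, 1),
    }


if __name__ == "__main__":
    sizes = footprint_table()
    baseline = sizes["float32"]
    n_vectors = 100_000_000
    for name, nbytes in sizes.items():
        total_gb = nbytes * n_vectors / 1e9
        reduction = baseline // nbytes
        print(f"{name:16s} {nbytes:5d} B/vec {reduction:4d}x {total_gb:8.1f} GB")
```

Running this with a different `dim` or `n_vectors` gives a quick estimate of index size before committing to a scheme. The recall column cannot be derived this way and has to be measured on your own data.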