Multilingual Embeddings Model Landscape

| Model | Languages | Dims | Strengths | Weaknesses |
|---|---|---|---|---|
| multilingual-e5-large | 94 | 1024 | Strong retrieval, OSS, 512 tokens | Old tokenizer, short context |
| multilingual-e5-large-instruct | 94 | 1024 | Instruction-tuned, better zero-shot | Same 512-token cap |
| LaBSE | 109 | 768 | Sentence alignment, translation mining | Weak at short-query retrieval |
| BGE-M3 | 100+ | 1024 | Dense + sparse + ColBERT, 8192 tokens | Large, slower |
| Cohere embed-multilingual-v3.0 | 100+ | 1024 | Managed API, query/doc modes | 512 tokens, closed |

…