# CLIP - Contrastive Language-Image Pre-Training

OpenAI's model that understands images through natural language.

## When to use CLIP

**Use when:**
- Zero-shot image classification (no training data needed)
- Image-text similarity/matching
- Semantic image search
- Content moderation (detecting NSFW or violent content)
- Visual question answering
- Cross-modal retrieval (image→text, text→image)

**Metrics:**
- 25,300+ GitHub stars
- Trained on 400M image-text pairs
- Matches the accuracy of a supervised ResNet-50 on ImageNet, zero-shot
- MIT License

**Use alternatives instead:**
- **BLIP-2**: Better captioning
- **LLaVA**: Vision-language chat
- Segment An…
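Zero-shot classification, the first use case listed under "When to use CLIP", works by comparing one image embedding against one text embedding per class prompt (for example "a photo of a dog"). The sketch below shows only that scoring step. It makes several assumptions. The embeddings are hypothetical 4-d toy vectors standing in for real image and text encoder outputs. The helper names `l2_normalize`, `softmax`, and `zero_shot_probs` are illustrative and are not part of any CLIP package. The logit scale is assumed to be 100, the value used in the official repository's zero-shot example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Hypothetical helper: class probabilities for one image, one text embedding per class."""
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        txt = l2_normalize(t)
        cos = sum(a * b for a, b in zip(img, txt))  # cosine similarity in [-1, 1]
        logits.append(logit_scale * cos)            # assumed temperature of 100
    return softmax(logits)

# Hypothetical toy embeddings standing in for real encoder outputs
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embs = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.2],
]
image_emb = [0.8, 0.2, 0.1, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best], [round(p, 3) for p in probs])
```

Changing the class set only means encoding a new list of prompts. No retraining is involved, which is why this counts as "no training data needed".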
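Semantic image search and text→image retrieval, also listed under "When to use CLIP", reduce to ranking stored image embeddings by their similarity to a query's text embedding. The following is a minimal sketch under several assumptions. Image embeddings are computed once ahead of time and kept in a plain list. The vectors and file names are hypothetical toy data. The `cosine` and `search` helpers are illustrative names, not a library API. A production system would typically use a vector index instead of a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_emb, index, top_k=2):
    """Hypothetical helper: rank (image_id, embedding) pairs by similarity to the query."""
    scored = [(image_id, cosine(query_emb, emb)) for image_id, emb in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]

# Hypothetical precomputed image embeddings (stand-ins for image encoder outputs)
index = [
    ("beach.jpg", [0.1, 0.9, 0.2]),
    ("mountain.jpg", [0.8, 0.1, 0.3]),
    ("sunset.jpg", [0.2, 0.7, 0.6]),
]
# Hypothetical text embedding for a query such as "a sunny beach"
query = [0.1, 0.95, 0.3]

results = search(query, index)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

Swapping the roles of the two encoders gives image→text retrieval: embed one image and rank a list of candidate captions in the same way.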