# MLX-VLM: Vision Language Models on Apple Silicon

## Overview

mlx-vlm runs vision-language models natively on Apple Silicon using the MLX framework. It supports inference and fine-tuning with unified memory, so no GPU server is needed.

Repo:

Requirements: macOS 14+, Apple Silicon (M1/M2/M3/M4), Python 3.10+

## Installation

For development:

## Supported Models

| Model | HuggingFace ID | Best For |
|-------|---------------|----------|
| Pixtral | | General vision, multi-image |
| Qwen2-VL | | OCR, document understanding |
| Phi-3-Vision | | Lightweight, fast inference |
| LLaVA-1.6 | | Conversation about ima…