gguf-quantization — Skillopedia

GGUF - Quantization Format for llama.cpp

GGUF (GPT-Generated Unified Format) is the standard file format for llama.cpp, enabling efficient inference on CPUs, Apple Silicon, and GPUs with flexible quantization options.

When to use GGUF

Use GGUF when:
- Deploying on consumer hardware (laptops, desktops)
- Running on Apple Silicon (M1/M2/M3) with Metal acceleration
- Needing CPU inference without a GPU
- Wanting flexible quantization (Q2_K to Q8_0)
- Using local AI tools (LM Studio, Ollama, text-generation-webui)

Key advantages:
- Universal hardware: CPU, Apple Silicon, NVIDIA, AMD sup…