llama.cpp + GGUF

Use this skill for local GGUF inference, quant selection, or Hugging Face repo discovery for llama.cpp.

When to use

- Run local models on CPU, Apple Silicon, CUDA, ROCm, or Intel GPUs
- Find the right GGUF for a specific Hugging Face repo
- Build a or command from the Hub
- Search the Hub for models that already support llama.cpp
- Enumerate available files and sizes for a repo
- Decide between Q4/Q5/Q6/IQ variants for the user's RAM or VRAM

Model Discovery workflow

Prefer URL workflows before asking for , Python, or custom scripts.

1. Search for candidate repos on the Hub: -…