# Knowledge Distillation: Compressing LLMs

## When to Use This Skill

Use Knowledge Distillation when you need to:

- Compress models from 70B → 7B while retaining 90%+ performance
- Transfer capabilities from proprietary models (GPT-4) to open-source (LLaMA, Mistral)
- Reduce inference costs by deploying smaller student models
- Create specialized models by distilling domain-specific knowledge
- Improve small models using synthetic data from large teachers

**Key Techniques**: Temperature scaling, soft targets, reverse KLD (MiniLLM), logit distillation, response distillation

**Papers**: Hinton et al. 2015…