Replicate

 - Installation
 - Run a Model
 - Run Language Models
 - Async Predictions
 - Webhooks
 - Fine-Tuning
 - Deploy Custom Models with Cog
 - Node.js Client

Key Concepts

 - Version pinning: Models are versioned by SHA; pin a specific version for reproducibility
 - Cold starts: The first request to a model may take 10-60s while it boots; subsequent calls are faster
 - Streaming: Use streaming for real-time token output from language models
 - Cog: Open-source tool that packages ML models as Docker containers for Replicate
 - Webhooks: Avoid polling by receiving HTTP callbacks when predictions complete
 - Pricing: Pay per second of compute; GPU…
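
The Webhooks concept above can be sketched as a small receiver. This is a minimal illustration using only the Python standard library, not Replicate's own client. It assumes the callback body is a JSON prediction object with `id`, `status`, `output`, and `error` fields; the helper name `summarize_prediction`, the handler class, and port 8000 are hypothetical choices made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed terminal values of a prediction's "status" field.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def summarize_prediction(payload: dict) -> str:
    """Return a one-line summary of a prediction payload (hypothetical helper)."""
    prediction_id = payload.get("id", "<unknown>")
    status = payload.get("status", "<unknown>")
    if status == "succeeded":
        return f"{prediction_id}: succeeded with output {payload.get('output')!r}"
    if status in TERMINAL_STATUSES:
        return f"{prediction_id}: {status} ({payload.get('error')!r})"
    return f"{prediction_id}: still {status}"


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept POSTed prediction events instead of polling for results."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(summarize_prediction(payload))
        # Acknowledge receipt so the sender does not need to retry.
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

The summary logic is kept in a plain function so it can be tested without starting a server. A real deployment would also need a publicly reachable URL and should verify that each request genuinely comes from the expected sender before trusting the payload.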