LLM Serving Patterns

When to Use This Skill

Use this skill when:

- Designing LLM inference infrastructure
- Choosing between serving frameworks (vLLM, TGI, TensorRT-LLM)
- Implementing quantization for production deployment
- Optimizing batching and throughput
- Building streaming response systems
- Scaling LLM deployments cost-effectively

Keywords: LLM serving, inference, vLLM, TGI, TensorRT-LLM, quantization, INT8, INT4, FP16, batching, continuous batching, streaming, SSE, WebSocket, KV cache, PagedAttention, speculative decoding

LLM Serving Architecture Overview

Serving Framework Compariso…