LLM Engine Selection SOP

State-of-the-art warning. This skill is dated May 2026. The inference-engine landscape moves in 3–6 month cycles (TGI moved from maintenance into deprecation in late 2025; SGLang's RadixAttention eroded vLLM's lead in 2024; TensorRT-LLM dropped its proprietary-engine requirement in 2025; Ollama added concurrent-request support mid-2025). Re-verify before betting a quarter of eng budget on any choice below.

---

1. When to activate

Activate this skill any time a coder-agent must:

- Pick a serving stack for a new project (production hosting / batch / edge / dev la…