club-3090-llm-serving — Skillopedia

club-3090 LLM Serving Skill by ara.so — Daily 2026 Skills collection.

Community recipes for serving modern LLMs on RTX 3090 (24 GB) hardware. Supports vLLM, llama.cpp, and SGLang engines with validated Docker Compose configs exposing an OpenAI-compatible API. Currently ships Qwen3.6-27B configs for 1× and 2× cards.

---

## Engine Decision Matrix

| Need | Engine | Why |
|---|---|---|
| Max throughput (code/chat) | vLLM dual | 89–127 TPS, MTP n=3, vision, tools |
| Full 262K context, no crashes | llama.cpp single | No prefill cliffs, stable tool-use |
| 4 concurrent streams @ 262K | vLLM dual t…