flash-moe-inference — Skillopedia

Flash-MoE Inference Engine Skill by ara.so, part of the Daily 2026 Skills collection.

Flash-MoE is a pure C/Objective-C/Metal inference engine. It runs Qwen3.5-397B-A17B, a Mixture-of-Experts model with 397B total parameters of which about 17B are active per token, on a MacBook Pro with 48GB of RAM at 4.4+ tokens/second. It does this by streaming 209GB of expert weights from the NVMe SSD on demand. The inference path uses no Python and no ML frameworks, just C, Objective-C, and hand-tuned Metal shaders.

Requirements
- Hardware: Apple Silicon Mac (M3 Max or similar), 48GB+ unified memory, 1TB+ SSD with 210GB free
- OS: macOS 26+ (Darwin 25+)
- Tools: Xcode Command Line Tools, Python 3.x (for weight extract…