TileKernels GPU Kernel Library

Skill by ara.so, Daily 2026 Skills collection.

TileKernels is a high-performance GPU kernel library for LLM operations (MoE routing, FP8/FP4 quantization, transpose, engram gating, Manifold HyperConnection) written in TileLang, a Python DSL for expressing GPU kernels with automatic optimization. Kernels target NVIDIA SM90/SM100 (Hopper/Blackwell) architectures and approach hardware performance limits.

Requirements

- Python 3.10+
- PyTorch 2.10+
- TileLang 0.1.9+
- NVIDIA SM90 or SM100 GPU (H100/H200/B100/B200)
- CUDA Toolkit 13.1+

Installation

Project Structur…