# XPU Triton Kernels for Intel GPUs

This skill provides patterns and guidance for developing optimized Triton kernels targeting Intel XPU GPUs (Battlemage/Arc Pro B50). It integrates the Xe-Forge optimization framework, an LLM-driven loop that transforms PyTorch code into fast Triton kernels.

## Quick Start

### Optimize a Kernel (Xe-Forge Workflow)

The full optimization workflow analyzes a PyTorch baseline, generates Triton kernel variants in a branching trial tree, benchmarks each on XPU hardware, and finalizes the best result.

## Supported Hardware

| GPU | Architecture | XVEs | Mem BW | Key Feature |…