Skill: Use PyTorch FSDP2 ( ) correctly in a training script

This skill teaches a coding agent how to add PyTorch FSDP2 to a training loop with correct initialization, sharding, mixed precision/offload configuration, and checkpointing.

FSDP2 in PyTorch is exposed primarily via and the methods it adds in-place to modules. See: , .

---

When to use this skill

Use FSDP2 when:

- Your model doesn’t fit on one GPU (parameters + gradients + optimizer state).
- You want an eager-mode sharding approach that is DTensor-based per-parameter sharding (more inspectable, simpler sharded state dicts) than FSDP…
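The "doesn't fit on one GPU" criterion is mostly arithmetic. The following minimal sketch estimates per-GPU memory for parameters, gradients, and optimizer state. It compares full replication with an even split across ranks. It makes several assumptions: fp32 parameters and gradients, and an Adam-style optimizer that keeps two fp32 states per parameter (16 bytes per parameter in total). The function names and the 7B-parameter example are hypothetical. The estimate ignores activations, shard padding, and the parameters that FSDP temporarily unshards during forward and backward.

```python
"""Rough per-GPU memory for parameters, gradients, and optimizer state.

Illustrative assumptions (not measurements): fp32 parameters and gradients,
plus an Adam-style optimizer holding two fp32 states per parameter.
Activations, padding, and temporarily unsharded parameters are ignored.
"""

BYTES_FP32 = 4


def bytes_per_param(optimizer_states: int = 2) -> int:
    # One parameter copy + one gradient + `optimizer_states` extra tensors, all fp32.
    return BYTES_FP32 * (1 + 1 + optimizer_states)


def per_gpu_gib(num_params: int, world_size: int = 1, optimizer_states: int = 2) -> float:
    """GiB per GPU when params, grads, and optimizer state are split evenly.

    world_size=1 means every GPU holds a full replica (single GPU or plain DDP).
    """
    total_bytes = num_params * bytes_per_param(optimizer_states)
    return total_bytes / world_size / 2**30


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    for ws in (1, 8):
        print(f"world_size={ws}: {per_gpu_gib(n, ws):.1f} GiB per GPU")
```

Under these assumptions, a 7B-parameter model needs about 104 GiB per GPU when fully replicated. Sharded across 8 ranks, it needs about 13 GiB per GPU before activations are counted. That reduction is the reason to shard parameters, gradients, and optimizer state instead of replicating them.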