Image Classification Best Practice

Architecture selection:
- Small scale (CIFAR-10/100): ResNet-18/34, WideResNet, Simple ViT
- Medium scale: ResNet-50, EfficientNet-B0/B1, DeiT-Small
- Large scale: ViT-B/16, ConvNeXt, Swin Transformer

Training recipe:
- Optimizer: AdamW (lr=1e-3 to 3e-4) or SGD (lr=0.1 with cosine decay)
- Weight decay: 0.01-0.1 for AdamW, 5e-4 for SGD
- Data augmentation: RandomCrop, RandomHorizontalFlip, Cutout/CutMix
- Warmup: 5-10 epochs linear warmup for transformers
- Batch size: 128-256 for CNNs, 512-1024 for ViTs (if memory allows)

Standard benchmarks:
- CIFAR-10: 96…