coreweave-performance-tuning

# CoreWeave Performance Tuning

## GPU Selection by Workload

| Workload | Recommended GPU | Why |
|----------|----------------|-----|
| LLM inference (7-13B) | A100 80GB | Good balance of memory and cost |
| LLM inference (70B+) | 8xH100 | NVLink for tensor parallelism |
| Image generation | L40 | Good for diffusion models |
| Training (large models) | 8xH100 SXM5 | Fastest interconnect |
| Batch processing | A100 40GB | Cost-effective |

## Inference Optimization

## Autoscaling Tuning

## Performance Benchmarks

| Metric | A100-80GB | H100-80GB |
|--------|-----------|-----------|
| Llama-8B tokens/sec | 2,00…