spark-principal-engineer

Spark Mastery (Senior → Principal)

Operate
- Start from data volume, compute economics, shuffle behavior, and correctness requirements.
- Treat Spark as a distributed execution system with real storage, network, and scheduling tradeoffs.
- Prefer explicit workload design over vague “big data” assumptions.
- Optimize for predictable cost, reliability, and debuggable pipelines.

Default Standards
- Data layout and partitioning must match workload reality.
- Shuffle-heavy patterns require scrutiny.
- Memory and executor tuning should follow evidence.
- Streaming and batch semantics must be separa…
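
One way to make "start from data volume" and "tuning should follow evidence" concrete is to derive a shuffle partition count from a measured shuffle size rather than guessing. The sketch below is plain Python and is only an illustration: the helper name `suggest_shuffle_partitions`, the 128 MiB per-partition target, and the rounding to whole task waves are assumptions, not rules from this document or from Spark itself. The result would be a candidate value for `spark.sql.shuffle.partitions`, to be checked against the stage metrics of a real run.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def suggest_shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * MIB, total_cores=1):
    """Suggest a shuffle partition count from an observed shuffle size.

    Hypothetical helper for illustration only. The 128 MiB default is a
    common rule of thumb and should be replaced by a measured value.
    """
    if shuffle_bytes < 0:
        raise ValueError("shuffle_bytes must be non-negative")
    if target_partition_bytes <= 0 or total_cores <= 0:
        raise ValueError("target_partition_bytes and total_cores must be positive")

    # Minimum number of partitions that keeps each one at or under the target size.
    by_size = math.ceil(shuffle_bytes / target_partition_bytes)

    # Round up to a whole number of task waves so the last wave does not
    # leave most cores idle. At least one wave is always scheduled.
    waves = max(1, math.ceil(by_size / total_cores))
    return waves * total_cores


# Example: 500 GiB of shuffle data on a cluster with 64 executor cores.
print(suggest_shuffle_partitions(500 * GIB, total_cores=64))  # 4032
```

In the example, 500 GiB at 128 MiB per partition needs 4000 partitions, which rounds up to 63 waves of 64 tasks, or 4032. The inputs should come from an actual run (for instance, the shuffle write size reported for the stage), since the point is to tune from evidence rather than from defaults.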