HPC Student Cluster Competition
Cluster Configuration
Per Node
Team Poster
Competition Photos
Team Photos
Shipping Cluster
Cluster Build
H100 Power Performance Results
Over the summer leading up to the competition, I ran the HPL benchmark on a single H100 GPU, using both the NVIDIA optimized benchmarks container and my own HPL parameter set, tuned in advance, to mirror the environment we planned to use in competition.
Running the GPU at 100% of TDP proved to be the least power-efficient setting. I expected efficiency to peak around 75–80% of TDP, but was surprised to find that it continued to increase linearly all the way down to 50% of TDP.
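For readers who want to try a similar sweep, here is a minimal Python sketch of the bookkeeping involved. It is not the script I used. The helper names, the 400 W TDP value, the 5% step size, and the choice of GFLOPS per watt as the efficiency metric are all assumptions for illustration. The only external tool referenced is `nvidia-smi`, whose `-i` (GPU index) and `-pl` (power limit in watts) options typically require administrator privileges. The sketch only builds the commands and does not run them.

```python
def power_limit_watts(tdp_watts, percent):
    """Convert a percentage of TDP into a power limit in whole watts."""
    return round(tdp_watts * percent / 100)


def nvidia_smi_cap_command(gpu_index, limit_watts):
    """Build (but do not execute) the nvidia-smi command that sets a power limit."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(limit_watts)]


def gflops_per_watt(gflops, avg_watts):
    """Efficiency metric assumed here: sustained HPL GFLOPS divided by average power draw."""
    return gflops / avg_watts


if __name__ == "__main__":
    TDP_WATTS = 400  # assumption: substitute the rated limit of your own card
    # Sweep from 100% down to 50% of TDP in 5% steps (step size is hypothetical).
    for percent in range(100, 45, -5):
        limit = power_limit_watts(TDP_WATTS, percent)
        command = nvidia_smi_cap_command(0, limit)
        print(f"{percent:3d}% of TDP -> {limit} W: {' '.join(command)}")
```

After each cap is applied, the HPL score and the average power draw from that run would be passed to `gflops_per_watt` to compare settings.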
A rough estimate placed our cluster at 350 W × 2 sockets × 3 nodes = 2,100 W for the CPUs and 400 W × 4 GPUs × 3 nodes = 4,800 W for the GPUs, or about 6,900 W total, ignoring memory and other components. Based on the applications for this year's competition, we had room to dynamically allocate resources per workload while staying under the power cap. We chose a configuration with iDRAC controllers, which let us power nodes on and off remotely without physically touching the cluster, a competition requirement after the benchmarking phase.
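The rough estimate above can be reproduced with a few lines of Python. This is a sketch: the function name and the `gpu_cap_fraction` parameter are hypothetical, and it assumes each component draws exactly its rated figure, or exactly its capped figure when a GPU power limit is applied. Memory, fans, and networking are ignored, as in the estimate.

```python
def cluster_power_watts(cpu_watts=350, sockets_per_node=2,
                        gpu_watts=400, gpus_per_node=4,
                        nodes=3, gpu_cap_fraction=1.0):
    """Return (cpu_total, gpu_total, total) estimated peak draw in watts.

    gpu_cap_fraction scales the per-GPU figure, e.g. 0.5 for a 50% power cap.
    """
    cpu_total = cpu_watts * sockets_per_node * nodes
    gpu_total = gpu_watts * gpu_cap_fraction * gpus_per_node * nodes
    return cpu_total, gpu_total, cpu_total + gpu_total


if __name__ == "__main__":
    for fraction in (1.0, 0.75, 0.5):
        cpu, gpu, total = cluster_power_watts(gpu_cap_fraction=fraction)
        print(f"GPU cap {fraction:.0%}: CPU {cpu:.0f} W + GPU {gpu:.0f} W = {total:.0f} W")
```

Lowering `gpu_cap_fraction` for a CPU-heavy workload, or powering down a node, shows how much budget a given allocation frees up relative to whatever cap applies.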