Ask HN: How is GPU power draw measured at scale?

How do people measure GPU power usage on large self-hosted setups (around 32 GPUs) or small multi-rack deployments? I've seen PDUs that collect and transmit power data, but I'm unsure what the usual process is, and whether or how people do this on small builds.

Currently I poll NVML's nvmlDeviceGetPowerUsage every 100 ms during inference and record the peak and mean power per request, which gives data like this:

model                      mean-power range (W)   spread (W)   stdev
qwen3-8b                   114.3-121.9            7.6          1.17
llama-3.1-8b-instruct      104.7-122.1            17.4         4.29
qwen2.5-1.5b-instruct      53.7-73.0              19.3         5.23
mistral-7b-instruct-v0.3   96.2-120.0             23.8         6.01
qwen2.5-7b-instruct        88.7-124.5             35.8         7.73
gemma-3-1b-it              49.4-56.7              7.3          2.13

This is per-GPU, single-card data. I don't know whether anything like per-request attribution survives at rack scale, or whether monitoring there happens entirely at the PDU/BMC level instead.
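
For reference, the polling described above (nvmlDeviceGetPowerUsage at a 100 ms interval, peak and mean per request) can be sketched roughly like this. It is a minimal sketch, not my exact code: it assumes the pynvml bindings (the nvidia-ml-py package) are installed, and PowerSampler, nvml_power_reader and run_inference are hypothetical names used only for illustration.

```python
import threading
from statistics import mean
from typing import Callable, Dict, List


def nvml_power_reader(gpu_index: int = 0) -> Callable[[], float]:
    """Return a function that reads one GPU's power draw in watts via NVML."""
    import pynvml  # assumption: nvidia-ml-py is installed

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    # nvmlDeviceGetPowerUsage reports milliwatts, so convert to watts.
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0


class PowerSampler:
    """Polls read_watts() in a background thread for the life of a with-block."""

    def __init__(self, read_watts: Callable[[], float], interval_s: float = 0.1):
        self.read_watts = read_watts
        self.interval_s = interval_s
        self.samples: List[float] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Take one sample immediately, then one per interval until stopped.
        while True:
            self.samples.append(self.read_watts())
            if self._stop.wait(self.interval_s):
                break

    def __enter__(self) -> "PowerSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()

    def summary(self) -> Dict[str, float]:
        return {
            "mean_w": mean(self.samples),
            "peak_w": max(self.samples),
            "n": len(self.samples),
        }


# Usage around a single request (run_inference is hypothetical):
#
#     with PowerSampler(nvml_power_reader(0)) as sampler:
#         run_inference(request)
#     print(sampler.summary())
```

The reader is passed in as a plain callable so the sampling logic can be exercised without a GPU by substituting a fake reader.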

4 points | by anax32 8 hours ago

1 comment

  • lemonademan 5 hours ago
    I personally believe that once you get beyond a handful of GPUs, people probably end up using both levels of telemetry, because they answer different questions. NVML is nice for per-request attribution and understanding model behavior, but I believe PDU/BMC measurements are better suited to measuring actual power draw, since they capture everything (CPUs, networking, PSU losses, fans, etc.).

    For instance, people running 32+ GPU setups probably correlate timestamps rather than trying to preserve strict per-request attribution at the rack level. That would let them sample rack/PDU power about once per second and line it up against request activity by time.
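
    To make the timestamp idea concrete, here is a rough sketch of what I mean, not something I have seen published. It assumes you already have PDU samples as (timestamp, watts) pairs at roughly 1 Hz and a start/end time for each request; mean_power_in_window and the sample values are made up for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, watts)


def mean_power_in_window(
    samples: List[Sample], start: float, end: float
) -> Optional[float]:
    """Mean of the samples whose timestamps fall in [start, end], or None."""
    in_window = [watts for ts, watts in samples if start <= ts <= end]
    if not in_window:
        return None  # the window fell between two samples
    return sum(in_window) / len(in_window)


# Hypothetical 1 Hz rack-level readings.
pdu_samples = [(0.0, 900.0), (1.0, 1000.0), (2.0, 1100.0), (3.0, 950.0)]

# A 2-second window covers the samples at t=1 and t=2.
print(mean_power_in_window(pdu_samples, 0.5, 2.5))

# A sub-second request can fall between samples entirely.
print(mean_power_in_window(pdu_samples, 1.2, 1.8))
```

    The second call returns None, which is the reason I suspect strict per-request attribution does not carry over to 1-second PDU data: short requests may not overlap any sample at all.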

    Either way, I haven't seen many people publish how they instrument this in practice, so take what I wrote with a grain of salt. I simply wanted to share a little of what I understand, and I hope it helps.