NVIDIA smashes graph record with H100 cluster

NVIDIA just flexed on everyone by hitting 410 trillion traversed edges per second (TEPS) on the Graph500 benchmark using 8,192 H100 GPUs in a CoreWeave datacenter in Dallas. That is more than double what competing entries managed, and it used far fewer nodes than other top entries, making it three times more cost-efficient. The system chewed through a graph with 2.2 trillion vertices and 35 trillion edges. At that rate, it could search every friendship connection on Earth in about three milliseconds.
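For anyone who wants to sanity-check the three-millisecond claim, here is a rough back-of-envelope sketch. The 410 trillion rate and the 35 trillion edge count come from the post. The population and friends-per-person figures are my own assumptions, picked only to get a ballpark number.

```python
# Back-of-envelope check of the figures quoted above.
TEPS = 410e12          # traversed edges per second (from the post)
GRAPH_EDGES = 35e12    # edges in the benchmark graph (from the post)

# Assumptions, NOT from the post: about 8 billion people, about 150 friends each.
PEOPLE = 8e9
FRIENDS_PER_PERSON = 150

# Total friendship edges under those assumptions.
friend_edges = PEOPLE * FRIENDS_PER_PERSON

# Time = edges / rate, converted to milliseconds.
friend_ms = friend_edges / TEPS * 1e3
benchmark_ms = GRAPH_EDGES / TEPS * 1e3

print(f"friendship edges (assumed): {friend_edges:.2e}")
print(f"time to traverse them: {friend_ms:.1f} ms")
print(f"time for one pass over the benchmark graph: {benchmark_ms:.0f} ms")
```

Under those assumptions the friendship graph lands just under three milliseconds, which matches the figure above. Change the two assumed constants and the estimate scales linearly.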

The secret sauce was a custom GPU-only setup that lets H100s talk to each other directly over InfiniBand without involving the CPU, so hundreds of thousands of GPU threads can send active messages at once instead of the few hundred threads a CPU manages. This matters because graph workloads are sparse and unpredictable, unlike the dense, structured math that AI training handles. Moving all that data around creates bottlenecks, which is why CPUs have been the default for this kind of work for years.
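To make "active messages" a bit more concrete, here is a toy single-process sketch of a breadth-first search where vertices are split across simulated ranks and every edge traversal becomes a small "visit this vertex" message sent to the rank that owns the target. This is only an illustration of the general idea, not NVIDIA's implementation. The function name `bfs_active_messages`, the modulo ownership rule, and the inbox lists are all hypothetical choices made for the example.

```python
from collections import defaultdict


def bfs_active_messages(edges, root, num_ranks=4):
    """Toy level-synchronous BFS; each simulated rank owns vertices v % num_ranks."""
    # Build an undirected adjacency list from the edge list.
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = {root: root}   # BFS tree: vertex -> parent
    frontier = [root]
    messages_sent = 0

    while frontier:
        # One inbox per rank, holding (target, source) visit messages.
        inboxes = [[] for _ in range(num_ranks)]
        for u in frontier:
            for v in adj[u]:
                inboxes[v % num_ranks].append((v, u))
                messages_sent += 1

        # Each owner runs the handler: claim the vertex if it is unvisited.
        next_frontier = []
        for inbox in inboxes:
            for v, u in inbox:
                if v not in parent:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier

    return parent, messages_sent


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    tree, sent = bfs_active_messages(toy_edges, root=0)
    print("parents:", tree)
    print("messages sent:", sent)
```

Each message is tiny and its destination depends on the graph's shape, which is why the traffic pattern is so irregular. On a real cluster the inboxes would be network buffers, and the point of the GPU-only design described above is that far more of these messages can be in flight at once.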

The win shows that huge sparse computing jobs in fields like weather forecasting and cybersecurity can run on commercially available GPU clusters instead of needing specialized supercomputers at national labs.
 
