The green team just nuked its competition in the latest AI inference benchmarks. Signal65 published an analysis showing that NVIDIA's Blackwell GB200 NVL72 leaves AMD's Instinct MI355X in the dust on Mixture of Experts (MoE) workloads, the architecture that has become the go-to way of making large models run efficiently. The NVL72 follows a rack-scale co-design philosophy, pairing 72 chips with 30 TB of unified memory to ease the data transfer bottlenecks that usually slow down these expert sub-networks. MoE architectures demand intense communication across nodes, yet the Blackwell servers still managed to pump out 75 tokens per second per GPU. That is a 28x throughput advantage over what the rival cluster achieved in the same environment.
The financial side of this slaughter looks even worse for the competition once total cost of ownership enters the picture. Based on Oracle Cloud pricing, the NVL72 racks deliver a relative cost per token that is 15 times lower than the alternative. This efficiency gap explains why hyperscalers keep throwing money at the same vendor: they get far more interactivity for every dollar spent. The architecture thrives because it was built from the ground up to handle the specific pressures of expert parallelism, whereas platforms built around dense models struggle with the latency that comes with scaling MoE models.
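To see how a 28x throughput lead and a 15x cost-per-token lead can both be true, here is a rough sketch of the arithmetic in Python. It assumes cost per token is simply the hourly price of a GPU divided by the tokens it produces in an hour, which is a simplification and not necessarily how Signal65 modeled it. The function names are made up for illustration, and no actual Oracle Cloud prices are used.

```python
def cost_per_million_tokens(price_per_gpu_hour: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """Cost of one million tokens, assuming cost = hourly price / hourly output."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return price_per_gpu_hour / tokens_per_hour * 1_000_000


def implied_price_ratio(throughput_ratio: float, cost_advantage: float) -> float:
    """How much more the faster GPU can cost per hour for the given ratios to hold.

    If the faster part has `throughput_ratio` times the throughput and
    `cost_advantage` times lower cost per token, its hourly price must be
    throughput_ratio / cost_advantage times the baseline price.
    """
    return throughput_ratio / cost_advantage


# Ratios quoted in the article: 28x throughput, 15x lower cost per token.
ratio = implied_price_ratio(28, 15)
print(f"Implied hourly price ratio: {ratio:.2f}x")

# Sanity check with placeholder units (NOT real prices or throughputs):
baseline = cost_per_million_tokens(1.0, 1.0)
faster = cost_per_million_tokens(1.0 * ratio, 28.0)
print(f"Cost-per-token advantage: {baseline / faster:.1f}x")
```

Under that simple model, the two headline figures together imply that the faster hardware costs a bit under twice as much per GPU-hour, and the throughput lead more than makes up for the premium.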
The red team is trying to stay relevant with high memory capacity in its latest silicon, but it currently lacks a rack-scale solution that can keep up with this level of optimization. The gap might shift once newer generations like Vera Rubin or Helios hit the market, but the current landscape shows one company dominating every phase of the workload, from prefill to decode. An annual release cycle keeps the market leader ahead of any attempt to close the performance gap.