GB300 NVL72 beats GB200 by up to 1.5x in latency benchmarks

Queen · Feb 22, 2026

Latency dropped noticeably as NVIDIA's GB300 NVL72 outperformed the older GB200 NVL72 in long-context AI tests.

Blackwell Ultra performance jump

NVIDIA's GB300 NVL72 was stress-tested on DeepSeek's open models.
LMSYS measured long-context inference across the full rack-scale system.
Results show roughly 1.4x to 1.5x gains over GB200 NVL72.
Latency-sensitive jobs saw about a 1.58x improvement.
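
To put these multipliers in perspective, the short Python sketch below converts a speedup factor into a percentage cut in latency. It assumes each factor is a latency speedup (new latency = old latency / factor); the article does not say whether the 1.4x to 1.5x range refers to latency or throughput, so treat the low and high end figures as one possible reading.

```python
def latency_reduction(speedup: float) -> float:
    """Percent reduction in latency implied by a speedup factor.

    Assumption: the factor is a latency speedup, i.e. new = old / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


for label, factor in [("low end", 1.4), ("high end", 1.5), ("latency-sensitive", 1.58)]:
    # Prints roughly 28.6%, 33.3% and 36.7% respectively.
    print(f"{label}: {factor}x -> {latency_reduction(factor):.1f}% lower latency")
```

Under this reading, a 1.58x speedup cuts response time by a little over a third.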

Throughput and user speed gains

Peak output reached 226.2 tokens per second per GPU.
Multi-Token Prediction (MTP) raised per-user generation speed by 1.87x.
Average gains consistently stayed ahead of the prior generation.
Blackwell Ultra aims squarely at agent-style workloads.
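
The article does not give LMSYS's MTP configuration. In serving stacks, Multi-Token Prediction is typically used as a form of speculative decoding: extra tokens are drafted, then verified in a single forward pass, so one pass can emit several tokens. Under the simplifying assumption that each drafted token is accepted independently with probability p and verification stops at the first rejection, the expected output per pass with k drafts is 1 + p + p^2 + ... + p^k. The function name and example values below are hypothetical and are not meant to reproduce the 1.87x figure.

```python
def expected_tokens_per_pass(accept_prob: float, num_draft: int) -> float:
    """Expected tokens emitted per verification pass (simplified speculative decoding).

    Assumption: each drafted token is accepted independently with probability
    accept_prob, and verification stops at the first rejection. The i = 0 term
    is the token the verify pass always produces itself.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if num_draft < 0:
        raise ValueError("num_draft must be non-negative")
    return sum(accept_prob ** i for i in range(num_draft + 1))


# Hypothetical settings: 3 drafted tokens, 80% acceptance per token.
print(f"{expected_tokens_per_pass(0.8, 3):.3f}")  # 2.952
```

Real user-level speedups land below this token count, because drafting and verification add their own compute cost.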

Infrastructure level optimizations

LMSYS applied prefill-decode (PD) disaggregation during testing.
This separates prompt processing (prefill) from token generation (decode).
Dynamic chunking tuned performance for long context windows.
KV capacity translation also tightened memory handling.
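
LMSYS's serving code is not part of the article, so the toy Python below only sketches the general idea: prefill and decode run as separate stages, and the prompt is prefilled in chunks whose size shrinks as the cached context grows. The `Request` class, the `dynamic_chunk_size` rule, and its `base`/`floor` values are hypothetical, not LMSYS's actual policy, and integers stand in for real KV tensors and model outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt: list[int]
    max_new_tokens: int
    kv_cache: list[int] = field(default_factory=list)  # toy stand-in for KV tensors
    output: list[int] = field(default_factory=list)


def dynamic_chunk_size(prefix_len: int, base: int = 4096, floor: int = 512) -> int:
    """Hypothetical rule: shrink chunks as the cached prefix grows, since each
    chunk's attention cost scales with the tokens already in the KV cache."""
    return max(floor, base // (1 + prefix_len // base))


def prefill(req: Request) -> list[int]:
    """Prefill stage: process the prompt chunk by chunk, filling the KV cache."""
    chunk_sizes = []
    pos = 0
    while pos < len(req.prompt):
        chunk = req.prompt[pos:pos + dynamic_chunk_size(pos)]
        req.kv_cache.extend(chunk)  # toy: one cache entry per prompt token
        chunk_sizes.append(len(chunk))
        pos += len(chunk)
    return chunk_sizes


def decode(req: Request) -> None:
    """Decode stage: takes over the finished KV cache and emits one token per step.

    In a real disaggregated system the KV cache is transferred between GPU
    pools; here the same Request object is simply handed along.
    """
    for _ in range(req.max_new_tokens):
        next_token = len(req.kv_cache) % 50_000  # placeholder for a model forward pass
        req.output.append(next_token)
        req.kv_cache.append(next_token)


req = Request(prompt=list(range(10_000)), max_new_tokens=4)
sizes = prefill(req)  # [4096, 2048, 2048, 1365, 443]
decode(req)           # req.output == [10000, 10001, 10002, 10003]
```

Keeping the two stages apart lets long prompts be prefilled without stalling token generation for requests already in the decode phase.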

Cost and deployment questions

NVIDIA has not detailed the total cost of ownership yet.
Deployment costs have reportedly risen with GB300.
Hyperscalers and neoclouds are eyeing it for agent systems.
Its long-context design suits VRAM-heavy workloads.
