CoreWeave has demonstrated that NVIDIA's latest Blackwell AI superchip, the GB300, significantly outperforms the previous-generation H100 GPU, as shown by a benchmark CoreWeave ran on the DeepSeek R1 reasoning model.
NVIDIA’s new Blackwell-powered platform offers substantial improvements over the H100. In the test, running the complex DeepSeek R1 model required only 4 NVIDIA GB300 GPUs on the NVL72 infrastructure, compared to 16 NVIDIA H100 GPUs. Despite using just a quarter of the GPUs, the GB300-based system delivered 6x higher raw throughput per GPU.
The GB300’s larger memory and higher bandwidth reduce parallelism overhead, boosting performance: the benchmark chart shows 4x GB300 GPUs delivering roughly 6.5x the per-GPU throughput (tokens/s) of 16x H100 GPUs.
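The per-GPU comparison is simple arithmetic. A minimal sketch, using hypothetical total-throughput numbers chosen only to reproduce the roughly 6.5x per-GPU ratio reported (the article gives the ratio, not the absolute figures):

```python
# Per-GPU throughput comparison: 4x GB300 vs 16x H100.
# The total tokens/s values below are hypothetical illustrations,
# NOT measured benchmark numbers.

def per_gpu_throughput(total_tokens_per_s: float, num_gpus: int) -> float:
    """Raw tokens/s delivered by each GPU in the cluster."""
    return total_tokens_per_s / num_gpus

gb300 = per_gpu_throughput(total_tokens_per_s=6500.0, num_gpus=4)   # hypothetical
h100 = per_gpu_throughput(total_tokens_per_s=4000.0, num_gpus=16)   # hypothetical

print(f"GB300 per-GPU: {gb300:.0f} tok/s")   # 1625 tok/s
print(f"H100 per-GPU:  {h100:.0f} tok/s")    # 250 tok/s
print(f"Speedup: {gb300 / h100:.1f}x")       # 6.5x
```

The key point is the division by GPU count: the smaller cluster wins per GPU even before factoring in cost and power.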
Because the model is split across fewer GPUs, inter-GPU communication overhead shrinks, while the higher memory capacity and bandwidth are what make the large performance uplift possible. High-bandwidth NVLink and NVSwitch interconnects let the GB300 exchange data between GPUs quickly.
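Why fewer splits help can be seen in a standard cost model for the ring all-reduce that tensor parallelism performs every layer: the number of communication steps grows with the number of participants, so a smaller tensor-parallel group pays less latency. A rough sketch, where the message size, link bandwidth, and latency are all illustrative assumptions:

```python
# Rough cost model for one ring all-reduce in tensor parallelism.
# All numeric parameters are illustrative assumptions, not measured values.

def ring_allreduce_time(msg_bytes: float, num_gpus: int,
                        link_bw: float, step_latency: float) -> float:
    """Classic ring all-reduce: 2*(p-1) steps, each moving msg_bytes/p per link."""
    p = num_gpus
    steps = 2 * (p - 1)
    return steps * (msg_bytes / p / link_bw + step_latency)

msg = 64e6      # 64 MB reduced per layer (assumption)
bw = 900e9      # 900 GB/s per link (assumption)
lat = 5e-6      # 5 us fixed cost per step (assumption)

t16 = ring_allreduce_time(msg, 16, bw, lat)  # 16-way split (H100 setup)
t4 = ring_allreduce_time(msg, 4, bw, lat)    # 4-way split (GB300 setup)
print(f"TP=16: {t16 * 1e6:.0f} us per all-reduce")
print(f"TP=4:  {t4 * 1e6:.0f} us per all-reduce")
```

With these assumptions the 16-way group spends 30 latency-bound steps per all-reduce versus 6 for the 4-way group, which is the communication saving the article alludes to.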
For customers, this means faster token generation, lower latency, and more efficient scaling of enterprise AI workloads. The NVIDIA GB300 NVL72 rack-scale system offers 37 TB of memory capacity (expandable up to 40 TB) for running large, complex AI models, with NVLink interconnects delivering 130 TB/s of aggregate bandwidth.
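A back-of-envelope check shows how much headroom that capacity leaves. DeepSeek R1 is publicly reported to have roughly 671B total parameters; the serving precision below is an assumption for illustration:

```python
# Back-of-envelope: model weights vs. rack memory capacity.
# Parameter count is DeepSeek R1's published total; FP8 serving
# precision is an assumption for this sketch.

PARAMS = 671e9        # ~671B total parameters (DeepSeek R1)
BYTES_PER_PARAM = 1   # FP8 weights (assumed serving precision)
RACK_MEM_TB = 37      # GB300 NVL72 capacity cited in the article

weights_tb = PARAMS * BYTES_PER_PARAM / 1e12
print(f"Weights: ~{weights_tb:.2f} TB of {RACK_MEM_TB} TB rack capacity")
# The rest of the capacity is headroom for KV cache, activations,
# and large-batch serving, which is where long-context reasoning
# models like R1 consume memory quickly.
```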
In summary, the GB300 excels not only in raw TFLOPS but also in efficiency. By minimizing inter-GPU communication through reduced tensor parallelism, it lets enterprises achieve higher throughput with fewer GPUs, cutting costs and enabling efficient scaling.