AI Infrastructure

Nvidia's Latest Chips Show Improvement in AI Training Efficiency

By Neelima N M
2025-06-05
Nvidia's new Blackwell chips double AI training speed, reducing chip count and transforming large model development efficiency.

Nvidia's newest chips have significantly reduced the number of chips required to train large artificial intelligence systems, according to new data released by MLCommons, a nonprofit group focused on AI benchmarking.

The data reveals that Nvidia’s new Blackwell chips have made substantial strides in AI training, outperforming their predecessors and increasing efficiency in training large language models (LLMs).

Key Findings from MLCommons' Benchmarking

The benchmarking results showed that Blackwell chips are more than twice as fast as the previous generation of Nvidia's Hopper chips, making a major leap in AI training performance.

Specifically, in the fastest result from the new Blackwell chips, 2,496 chips completed a training test in just 27 minutes, a dramatic improvement over the previous generation. Achieving a faster time on the same task would have required more than three times as many of Nvidia's previous-generation chips.
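The chip-count arithmetic can be made concrete by treating total chip-minutes as a rough cost proxy. A minimal sketch, using the reported Blackwell figures; the previous-generation run time below is an illustrative assumption, since the article states only that a faster time would have needed more than three times as many chips:

```python
# Reported Blackwell result: 2,496 chips finishing the training test in 27 minutes.
blackwell_chips, blackwell_minutes = 2496, 27
blackwell_chip_minutes = blackwell_chips * blackwell_minutes

# Hypothetical previous-generation (Hopper) run: 3x the chips and a slightly
# faster time. Both numbers are assumptions for illustration only.
hopper_chips, hopper_minutes = 3 * blackwell_chips, 25
hopper_chip_minutes = hopper_chips * hopper_minutes

print(f"Blackwell: {blackwell_chip_minutes:,} chip-minutes")
print(f"Previous gen (assumed): {hopper_chip_minutes:,} chip-minutes")
print(f"Rough efficiency ratio: {hopper_chip_minutes / blackwell_chip_minutes:.2f}x")
```

Even under these conservative assumptions, the newer configuration consumes well under half the chip-minutes, which is the efficiency gain the benchmark results point to.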

These results reflect an ongoing trend in the AI industry, where more efficient chip configurations are critical for speeding up training processes that can involve trillions of parameters. This efficiency is crucial for training advanced models such as Llama 3.1 405B, an open-source AI model released by Meta Platforms.

Industry Trends: Smaller Groups of Chips for AI Training

A key insight from the results is that AI companies are increasingly turning to smaller groups of chips rather than massive clusters of chips for training AI models.

According to Chetan Kapoor, chief product officer for CoreWeave, which worked with Nvidia on some of the results, this new methodology allows AI companies to accelerate training and achieve faster results without relying on huge, homogeneous chip clusters.

This signals a shift in how AI systems are built, toward greater efficiency and scaled-up AI capability without massive chip deployments. The ability to string smaller chip groups together into subsystems for specific AI training tasks is helping the industry handle multi-trillion-parameter models more efficiently.

Competitive Landscape and AI Training

According to Reuters, the chip performance results come amid growing competition in the AI market. Companies like China’s DeepSeek are claiming to create competitive chatbots with fewer chips than US rivals, which highlights the importance of performance efficiency in the race for AI supremacy.

The results also emphasize the critical role that chips play in training AI models, a process that remains a key competitive concern even as the market shifts toward AI inference, where trained models interact with users.
