Google Unveils Ironwood as Next-Gen TPU for AI Inference

By Neelima N M
2025-04-10
Google introduces the Ironwood TPU at Cloud Next 2025: next-generation hardware built for inference workloads and scaling to 42.5 exaflops of computing power per pod.

At Google Cloud Next 2025, Google introduced its latest innovation in AI hardware, the Ironwood Tensor Processing Unit (TPU). This seventh-generation TPU represents Google’s most advanced and scalable AI accelerator yet, engineered explicitly for inference workloads. With Ironwood, Google aims to meet the growing demands of AI models that not only process data but also proactively deliver insights.

Ironwood builds on more than a decade of TPU development, a lineage that powers Google’s own services and supports Google Cloud customers’ complex AI needs. This next-generation TPU is crafted for what Google calls the "age of inference," where AI systems actively generate insights rather than simply respond to queries.

Supporting Proactive AI Models at Massive Scale

Designed for high-demand generative AI applications, Ironwood addresses the computational intensity and communication requirements of modern "thinking models," such as large language models (LLMs) and Mixture-of-Experts (MoE) architectures. These systems require powerful parallel computing and efficient data access.

Ironwood’s advanced architecture minimizes data transfer and latency, thanks to its low-latency, high-bandwidth Inter-Chip Interconnect (ICI) system, which allows thousands of TPUs to work in perfect synchrony.

Configurations scale from 256 chips up to a 9,216-chip pod, which delivers 42.5 exaflops of computing power, more than 24 times that of El Capitan, currently the world's most powerful supercomputer.

Each Ironwood chip delivers 4,614 teraflops of peak compute, a major step forward in per-chip AI performance. The memory and interconnect architecture is designed to keep data readily available to the compute units, so efficiency holds up even at this massive scale.
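
For readers checking the math, the pod-level figure follows directly from the per-chip number:

$$ 9{,}216\ \text{chips} \times 4{,}614\ \text{TFLOP/s per chip} = 42{,}522{,}624\ \text{TFLOP/s} \approx 42.5\ \text{exaflops} $$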

Enhanced Features for Diverse AI Workloads

Ironwood incorporates an upgraded SparseCore accelerator for processing the ultra-large embeddings at the heart of recommendation and ranking engines, and extends that acceleration to workloads beyond typical AI use cases, such as finance and scientific research.
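
To make the workload concrete: an embedding lookup is essentially a sparse gather of a few rows from an enormous table, so the cost is dominated by memory access rather than arithmetic. The JAX sketch below is a toy illustration with made-up sizes, far smaller than production tables; it is not SparseCore-specific code.

```python
import jax.numpy as jnp

# Toy embedding table: 100,000 rows of 128-dim vectors (illustrative
# sizes; production recommendation tables can be orders of magnitude larger).
vocab_size, dim = 100_000, 128
table = jnp.zeros((vocab_size, dim))

# Sparse feature IDs for one example: only a handful of rows are touched.
ids = jnp.array([3, 17, 99_999])

# The gather itself is trivial compute; the hard part at scale is the
# random access into a huge table, which is what SparseCore-style
# hardware is built to accelerate.
vectors = table[ids]                # shape (3, 128)
print(vectors.shape)
```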

Google’s Pathways ML runtime, developed by Google DeepMind, handles distributing computation across thousands of TPU chips. It lets developers compose hundreds of thousands of Ironwood chips into a single system, pushing forward what is possible in generative AI.
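
As a rough illustration of that programming model, here is a minimal JAX sketch that shards one computation across whatever accelerator devices are available. The array sizes, mesh axis name, and toy layer are illustrative assumptions, not Pathways or Ironwood specifics.

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible devices (TPU chips, or CPU when testing) into a 1-D mesh.
devices = jax.devices()
mesh = Mesh(devices, axis_names=("data",))

# Shard a large activation matrix row-wise across the mesh
# (assumes the row count divides evenly by the device count).
x = jax.device_put(
    jnp.ones((8192, 4096)),
    NamedSharding(mesh, P("data", None)),
)

# jit compiles one logical program; the runtime executes it across
# every device holding a shard of x.
@jax.jit
def layer(x):
    return jax.nn.relu(x @ jnp.ones((4096, 4096)))

y = layer(x)
print(y.sharding)   # the result stays sharded across the mesh
```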

Breakthroughs in Performance and Efficiency

Ironwood offers major improvements over Trillium, Google’s previous-generation TPU: twice the performance per watt, advanced liquid cooling for sustained high performance, and 192 GB of High Bandwidth Memory (HBM) per chip, six times Trillium’s capacity.

With 7.2 terabytes per second of HBM bandwidth per chip (4.5 times Trillium’s) and 1.2 terabits per second of bidirectional ICI bandwidth (1.5 times Trillium’s), Ironwood enables faster data access and smoother chip-to-chip communication for demanding AI tasks.
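
For context, those multipliers imply Trillium’s baselines for HBM bandwidth, HBM capacity, and ICI bandwidth respectively; these figures are derived here from the stated ratios rather than quoted from Google:

$$ 7.2\ \text{TB/s} \div 4.5 = 1.6\ \text{TB/s}, \qquad 192\ \text{GB} \div 6 = 32\ \text{GB}, \qquad 1.2\ \text{Tb/s} \div 1.5 = 0.8\ \text{Tb/s} $$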

Google Cloud has poured its experience building AI infrastructure for services such as Gmail and Google Search into Ironwood’s design. The result is a high-performance, energy-efficient AI accelerator ready to power the next frontier of AI development.

Related Topics

AI Computing Resources, AI Infrastructure, AI Data Centers, Google
