Google Cloud announces eighth-generation TPUs, boasting AI training and inference leaps
Alongside chip improvements, Google Cloud has overhauled its data center fabric and file systems to support enterprise-scale AI deployment
Google Cloud has announced a generational overhaul of its AI infrastructure, spearheaded by two new tensor processing units (TPUs) designed to meet demands for frontier model training and inference.
TPU 8t and TPU 8i represent a significant leap in performance and memory optimization over preceding hardware, underpinned by larger high-bandwidth memory, faster storage access, and a redesigned machine learning architecture.
The training-focused TPU 8t is approximately 2.8x more powerful than the seventh-generation TPU, Ironwood, announced at last year’s event, and is capable of 121 exaflops of FP4 AI compute.
Google Cloud said that TPU 8t will dramatically shorten the development lifecycle for frontier models, allowing new models to be trained in weeks rather than months.
Each 8t pod can be configured with up to 9,600 individual chips, each capable of 19.2 Tbit/sec of scale-up bandwidth and 400 Gbit/sec of scale-out bandwidth.
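Taken at face value, those per-chip figures imply enormous aggregate bandwidth at full pod scale. A back-of-envelope sketch, assuming the quoted numbers are per-chip aggregates (Google Cloud has not published link-level details):

```python
# Rough aggregate bandwidth for a fully populated 8t pod, derived only
# from the per-chip figures quoted above - not official pod totals
chips = 9_600
scale_up_tbps_per_chip = 19.2    # Tbit/sec scale-up, per chip
scale_out_gbps_per_chip = 400    # Gbit/sec scale-out, per chip

pod_scale_up_pbps = chips * scale_up_tbps_per_chip / 1_000        # Tbit -> Pbit
pod_scale_out_pbps = chips * scale_out_gbps_per_chip / 1_000_000  # Gbit -> Pbit

print(f"Scale-up:  {pod_scale_up_pbps:.2f} Pbit/sec")   # 184.32 Pbit/sec
print(f"Scale-out: {pod_scale_out_pbps:.2f} Pbit/sec")  # 3.84 Pbit/sec
```

The two-orders-of-magnitude gap between scale-up and scale-out bandwidth is typical of pod-based designs: most traffic is meant to stay inside the pod.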
Google Cloud eyes agent orchestration, inference gains
Compute for inference has fast become a bottleneck, as AI agents need consistent data center capacity to proactively complete tasks and run in enterprise environments around the clock.
To meet this demand, TPU 8i has been built from the ground up to run AI agent ‘swarms’ with minimal latency and no chips sitting idle. Each 8i TPU comes with 384MB of onboard SRAM, allowing an agent’s short-term memory to be retained entirely on-chip.
Google Cloud added that each 8i server contains double the CPU hosts, in the form of its in-house Axion CPUs based on Arm designs. These orchestrate tasks such as data input/output within the AI stack, with Google Cloud claiming they improve overall inference performance.
TPU 8i also benefits from Boardfly, a brand-new network topology that Google Cloud announced today at Google Cloud Next 2026. It replaces the ‘3D torus’ topology used in previous generations and in 8t, which links thousands of chips to maximize throughput for training workloads.
Boardfly instead connects each chip through a greater number of ports, lowering the overall network diameter – the longest shortest-path distance between any two nodes in the network. After implementing Boardfly for TPU 8i, Google Cloud recorded up to a 50% latency improvement.
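Network diameter is easy to see on small examples. The sketch below is purely illustrative – it does not model Boardfly’s actual structure, which Google Cloud has not detailed – but it shows how giving each node more ports shrinks the worst-case hop count:

```python
from collections import deque

def diameter(adj):
    # network diameter: the longest shortest path over all node pairs
    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(bfs(node) for node in adj)

def ring(n):
    # 2 ports per node, like a single torus dimension
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def torus2d(w, h):
    # 4 ports per node: a wrap-around 2D grid
    return {(x, y): [((x - 1) % w, y), ((x + 1) % w, y),
                     (x, (y - 1) % h), (x, (y + 1) % h)]
            for x in range(w) for y in range(h)}

print(diameter(ring(16)))       # 8 hops worst case on 16 nodes
print(diameter(torus2d(4, 4)))  # 4 hops on the same 16 nodes
```

Doubling the ports per node here halves the diameter for the same node count; higher-radix topologies push the same trade further, which is why they favor latency over raw per-link throughput.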
“One of the things that we've had to realize is that the way that we were connecting the chips together, the network topology… our default way of connecting them together didn't support latency, it supported throughput, it supported bandwidth,” explained Amin Vahdat, SVP and chief technologist, AI and Infrastructure at Google.
“It was really good at getting large amounts of data through. But what you really care about in the age of agents is the latency, the minimum time it takes to get the data.”
Vahdat added that Google has been working on TPU 8t and 8i internally for two years, before AI agents were even widely adopted or understood, based on internal discussions with DeepMind about the direction AI was headed and where future bottlenecks could lie.
The move to split its TPU line-up puts Google in a similar position to the likes of AWS, which also segments its application-specific integrated circuits (ASICs) into Trainium and Inferentia.
“For us, it’s a natural evolution,” said Google Cloud CEO Thomas Kurian in a briefing for the assembled press.
“We've been working on these chips and systems for multiple years now. And so when AI, and particularly generative AI, became, you know, widespread, we felt that people would want systems that were more optimized for training, and separately, systems that were more optimized for inference.”
AI hypercomputer innovations
In addition to its first-party hardware, Google Cloud relies on its extensive in-house infrastructure and cloud architecture to serve customers with enterprise AI capabilities.
It calls this the ‘AI hypercomputer’, which encompasses Google Cloud’s specialized compute, storage, networking, and software.
Last year, Google Cloud announced it would give customers access to its machine learning runtime Pathways. This is a Google DeepMind system which connects TPU pods and splits inference tasks across them in real time to efficiently act as a unified compute cluster for multimodal AI.
This year, Google Cloud has unveiled Virgo Network, its new AI data center fabric optimized for both high-capacity data access and massive, low-latency throughput across TPU pods.
Within a single data center, Virgo Network allows up to 134,000 TPU 8t chips to work as a single fabric, a feat that is notable for organizations with data residency requirements for training models on sensitive data.
Combining Virgo Network, Pathways and Google DeepMind’s open source Python library JAX, Google Cloud is now able to connect more than one million TPU 8t chips across multiple data centers in a single training cluster for frontier model training.
For TPU 8i, Google Cloud said Virgo Network can lower latency on an otherwise idle network by 40% compared to Ironwood.
On top of this, Virgo Network was designed to give network engineers constant observability over compute clusters, enabled by sub-millisecond telemetry and automated identification of nodes that are having a detrimental impact on AI training and inference jobs.
Energy optimization and reliability
Alongside their performance improvements, TPU 8t and 8i are Google Cloud’s most efficient and reliable chips to date.
Google Cloud stated that through chip-level innovations and new approaches such as its fourth-generation liquid cooling, it achieves double the performance per watt with the new TPU family compared to Ironwood.
This could be a key metric as customers look to contain the costs of running a growing number of AI agents in their cloud environments, as it means more tasks can be completed for the same energy cost.
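As a rough illustration of what doubling performance per watt means for an agent fleet (all figures below are hypothetical, not Google Cloud numbers):

```python
# Hypothetical inference fleet: the same daily agent workload
# on baseline silicon versus silicon with 2x performance per watt
daily_agent_tasks = 10_000_000
baseline_tasks_per_kwh = 1_000                        # assumed baseline efficiency
improved_tasks_per_kwh = baseline_tasks_per_kwh * 2   # "double perf per watt"

baseline_energy_kwh = daily_agent_tasks / baseline_tasks_per_kwh
improved_energy_kwh = daily_agent_tasks / improved_tasks_per_kwh

# Same work for half the energy - or twice the tasks for the same bill
print(baseline_energy_kwh, improved_energy_kwh)  # 10000.0 5000.0
```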
Microsoft has predicted that 1.3 billion agents will be in operation by 2028, which will require not just a jump in infrastructure for inference, but also successive jumps in energy efficiency to make these workloads economical to run.
TPU energy improvements aren’t just good for AI workloads, either. Vahdat said that Citadel Securities, one of the world’s largest market making firms, has used TPUs to reduce the costs of its trading systems by 30%.
Kurian stated that Google Cloud has dedicated years to optimizing its TPUs, including early focuses on areas that weren’t seen as a market priority until recently.
“For example, we designed these systems to be super efficient in terms of how much power they use, because we felt that power efficiency would become a constraint as people continue to scale both training and inference.
“And the response from customers to both 8t and 8i has been very rewarding to see, and it validates that people increasingly are specializing how they’re deploying AI infrastructure, whether it’s for training or inference.”

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.