Why networking is just as important as compute in AI data centers
Although compute accounts for the majority of data center investment, networking is becoming ever more critical as focus shifts from training to inference
The AI boom is fueling high demand for compute power, leaving enterprises with no choice but to spend billions of dollars on AI infrastructure. Blackstone has estimated that data centers in the US alone will need $1trn in investment by the end of the decade just to keep pace with demand.
“Bigger and better AI requires compute – a lot of it,” Mike Bushong, vice president of data center at Nokia, tells ITPro. “And, for every megawatt of data center capacity deployed today, networking is the second-largest budget item behind the AI systems themselves.”
There’s a good reason for this. Generative AI models rely on thousands of graphics processing units (GPUs) in a single data center, which need to share information across nodes and racks in real time. If GPUs are brain cells, then the network is the nervous system connecting them all by carrying signals between them.
Unlike traditional workloads, AI workloads depend on high-bandwidth, low-latency fabrics – unified networks of switches – to move data along this nervous system in real time. Performance and reliability have always been important, points out Bushong, but “they’ve now become the determining factor in the return on AI investments”.
Yoram Novick, CEO of storage-as-a-service provider Zadara, warns that “simply adding more GPUs without ensuring adequate interconnect bandwidth can lead to diminishing returns”. Underinvesting in networking, he adds, means “communication bottlenecks can idle expensive compute resources”.
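As a rough illustration of Novick’s point, the short Python sketch below uses entirely hypothetical figures for per-step compute time, gradient volume, and link speed, and assumes communication cannot overlap with compute at all. It is a back-of-envelope model, not a benchmark.

```python
# Back-of-envelope sketch of how interconnect bandwidth limits GPU utilization.
# All figures below are illustrative assumptions, not measurements.

def step_utilization(compute_time_s: float, grad_bytes: float, link_gbps: float) -> float:
    """Fraction of a training step spent computing rather than waiting on the network,
    assuming gradient exchange does not overlap with compute (worst case)."""
    comm_time_s = (grad_bytes * 8) / (link_gbps * 1e9)  # bytes -> bits, divided by bits per second
    return compute_time_s / (compute_time_s + comm_time_s)

compute_time_s = 0.25   # hypothetical compute time per training step, per GPU
grad_bytes = 20e9       # hypothetical gradient payload exchanged each step (20 GB)

for link_gbps in (100, 400, 800):
    util = step_utilization(compute_time_s, grad_bytes, link_gbps)
    print(f"{link_gbps} Gb/s link -> ~{util:.0%} of the step spent on useful compute")
```

On the slowest link in this toy model, the GPUs sit idle for most of every step – exactly the kind of stranded investment Novick is describing.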
Knowing the networking technologies
At the center of AI networking are a handful of key technologies to know about, namely Ethernet, InfiniBand, NVLink, and Ultra Accelerator Link (UALink).
Ethernet and InfiniBand are technologies that connect multiple servers. The latter has been the go-to choice, especially for large-scale AI training, because of its ultra-low latency and higher bandwidth. According to research by Dell’Oro Group, the market was dominated by InfiniBand in 2023, with an 80% share, but it’s expected to be overtaken by Ethernet in the near future.
Dell’Oro Group expects the majority of switch ports deployed in AI back-end networks will be 800 gigabits per second (Gbps) this year, 1.6 terabits per second (Tbps) in 2027, and 3.2 Tbps in 2029.
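To get a feel for what those port speeds mean, here is a hedged sketch that converts each generation into the ideal time to move a hypothetical 100GB payload at line rate, ignoring protocol overhead and congestion.

```python
# Illustrative only: ideal transfer time for a fixed payload at the port speeds
# Dell'Oro expects in AI back-end networks. The payload size is a made-up example.
payload_gb = 100  # hypothetical payload, in gigabytes

for label, gbps in (("800 Gbps, this year", 800),
                    ("1.6 Tbps, 2027", 1_600),
                    ("3.2 Tbps, 2029", 3_200)):
    seconds = payload_gb * 8 / gbps  # gigabytes -> gigabits, divided by gigabits per second
    print(f"{label}: ~{seconds:.2f} s to move {payload_gb} GB at line rate")
```

Each generation roughly halves the time a port spends shifting the same amount of data.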
Ethernet has been positioning itself to challenge InfiniBand’s dominance. The first version of 800-gigabit Ethernet was released in February 2024, while the Ultra Ethernet Consortium (UEC) has been set up to enable Ethernet to support more complex and intensive AI workloads. The UEC’s founding members include AMD, Broadcom, Cisco, Intel, Meta and Microsoft. The 1.6-terabit Ethernet standard is expected to be finalized next year.
Indeed, a number of widely used, purpose-built networking technologies are based on Ethernet. These include HPE Slingshot, the firm’s networking fabric designed for high-performance computing (HPC) and AI workloads, which is used in some of the world’s fastest supercomputers, such as El Capitan, Frontier, and Aurora. New HPE supercomputers such as Discovery will also use Slingshot.
To put it simply, Ethernet and InfiniBand are primarily for scaling out – connecting large numbers of servers across racks and data centers. NVLink and UALink, on the other hand, are designed for scaling up – linking accelerators within a server or rack so they can work together as one larger system.
NVLink is Nvidia’s flagship interconnect technology. Introduced in 2014, it enables communication between GPUs in a single server, allowing them to share memory and compute. For more demanding AI workloads, Nvidia also offers NVSwitch, which ties multiple NVLink connections together so larger numbers of GPUs can communicate directly.
At the end of last year, several members of the UEC helped to establish the UALink Consortium. The aim is to challenge NVLink’s dominance with a new standard for low-latency, high-speed communication between accelerators and switches.
The inaugural UALink specification was released in the first half of this year and can support up to 1,024 accelerators connected through UALink switches. For comparison, the NVL72, a server rack that uses the latest NVLink technology, can connect 72 Blackwell GPUs, and up to 576 GPUs can be supported when a maximum of eight NVL72 racks are interconnected.
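For anyone who wants to sanity-check those figures, the arithmetic is simple; the snippet below just reproduces the numbers quoted above.

```python
# Scale-up domain sizes quoted above (figures from the article; arithmetic only).
gpus_per_nvl72_rack = 72
max_interconnected_racks = 8

nvlink_domain = gpus_per_nvl72_rack * max_interconnected_racks  # 576 GPUs
ualink_domain = 1_024  # ceiling in the inaugural UALink specification

print(f"NVLink, eight NVL72 racks: {nvlink_domain} GPUs")
print(f"Inaugural UALink spec:     {ualink_domain} accelerators")
```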
Enabling the shift from training to inference
Both Bushong and Novick agree that enterprises building and scaling data centers – be that up or out – need to strike a balance between raw compute power and data transfer capabilities. This balance is becoming ever more important as AI workloads transition from training to inference.
Inference is the term used to describe the processing of prompts by trained AI models to produce outputs, predictions, or other desired outcomes. In short, it’s a blanket term for all real-world generative AI use cases, in which models actually process data, as opposed to the training of new models.
Much has been written about the intense compute demands of AI training, and largely with good reason: training clusters process tens of trillions of tokens, including publicly available and synthetic data, to form the backbone of frontier LLMs.
While inference clusters may not crunch data on the same scale as training, there’s more potential for critical business processes to hinge on them, as they power much of the real-world use of cloud AI. They need exceptionally low latency in order to handle workloads without impacting user experience, as well as a high degree of reliability even when huge numbers of users are sending prompts to be processed.
Right now, training accounts for around 60% of data center expenses, according to Bloomberg Intelligence analysts. But this is expected to shrink to as little as 20% by 2032 as more resources are shifted towards inference. As demand for inference grows, workloads won’t just be run across clusters in a single data center but across multiple data centers.
“Inferencing systems need to be tightly integrated with networking architectures that can handle high volumes of concurrent requests and deliver results with minimal latency,” says Novick.
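One rough way to picture the relationship Novick describes is Little’s Law – the number of requests in flight is roughly the arrival rate multiplied by the response time. The sketch below uses made-up traffic numbers purely for illustration.

```python
# Hedged illustration of concurrency vs latency using Little's Law:
# requests in flight ~= arrival rate x average response time.
# The traffic figures are hypothetical.
requests_per_second = 5_000  # hypothetical prompt arrival rate across a service

for latency_s in (0.5, 1.0, 2.0):
    in_flight = requests_per_second * latency_s
    print(f"{latency_s:.1f} s average response time -> ~{in_flight:,.0f} requests in flight")
```

Every extra second of latency multiplies the number of prompts the inference cluster must hold open at once, which is why tightly integrated, low-latency networking matters as much as raw GPU throughput.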
Bushong adds that the new age of AI infrastructure means that efficiency is defined by networking performance and not compute speed alone. “Enterprises can maximize their investments and gain real differentiation through networks built explicitly for AI,” he concludes.
Rich is a freelance journalist writing about business and technology for national, B2B and trade publications. While his specialist areas are digital transformation and leadership and workplace issues, he’s also covered everything from how AI can be used to manage inventory levels during stock shortages to how digital twins can transform healthcare. You can follow Rich on LinkedIn.