Nvidia and Microsoft team up to build 'most powerful' AI supercomputer

published 17 November 2022

Nvidia has announced that it will collaborate with Microsoft on a multi-year project to build an AI supercomputer that could rank among the world’s most powerful, with the aim of helping organisations train and deploy AI at scale.

The deal will see Nvidia contribute tens of thousands of GPUs, networking technology, and its full stack of AI software to Microsoft Azure’s supercomputing infrastructure, which already includes ND- and NC-series virtual machines optimised for artificial intelligence (AI) and deep learning workloads.

Nvidia will use Azure’s virtual machine (VM) instances to train generative AI models, a field which includes language models such as OpenAI’s GPT-3 and Megatron-Turing NLG, which Nvidia developed with Microsoft.

The company’s full AI stack, containing workflows and development kits certified for use on Azure, will be made available to Azure enterprise customers.

The collaboration will also bring improvements to DeepSpeed, Microsoft’s deep learning optimisation software suite used to accelerate model training. Going forward, DeepSpeed will use the Transformer Engine in Nvidia’s H100 GPUs to accelerate large models, including generative AI, at up to twice the speed previously possible.

“AI is fueling the next wave of automation across enterprises and industrial computing, enabling organisations to do more with less as they navigate economic uncertainties,” said Scott Guthrie, executive vice president of the Cloud + AI Group at Microsoft.

“Our collaboration with Nvidia unlocks the world’s most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure.”

Azure VM instances currently use Nvidia A100 GPUs connected by Quantum 200Gbit/sec InfiniBand networking. The addition of H100 GPUs will double this networking speed through Quantum-2 400Gbit/sec InfiniBand, allowing the platform to handle larger AI training sets and workloads.

Natural language processing (NLP) models like GPT-3 and its upcoming successor GPT-4 have attracted controversy over their deployment in the past. At one point, models such as these were thought to be too dangerous for general release, due to their ability to craft convincing fake news pieces and their propensity for spouting hate speech.

Researchers have expressed hope that, through intensive training on supercomputers, NLP models could prove invaluable for business use in text and speech comprehension, in the future of virtual assistants, and in the partial or full automation of tasks such as computer programming.

This is not the first supercomputer Nvidia has worked on, nor the first AI supercomputer. In 2021, the chip giant switched on the UK’s fastest supercomputer, aimed at using AI for intensive health research, and in January 2022 it was announced that Meta would build the “world’s fastest” AI supercomputer in collaboration with Nvidia.

More recently still, Hewlett Packard Enterprise (HPE) lifted the lid on ‘Champollion’, an AI supercomputer designed in collaboration with Nvidia that it plans to make available to scientists and engineers.

Nvidia has invested in a wide range of AI and supercomputing projects, with the company’s own ‘Selene’ supercomputer having consistently ranked in the top ten most powerful computers in the world since its 2020 creation.

Nvidia declined to give further details on the timeline of its agreement with Microsoft.

“We are not elaborating on the terms of the deal beyond saying that it is a multi-year agreement,” Paresh Kharya, senior director of product management at Nvidia, told IT Pro.

“The goal is to accelerate the most important AI applications. Working with Microsoft Azure, we gain speed and the ability to do things at scale using less energy. We can then take AI applications such as speech recognition and large language models to advance generative AI.”

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.