HPE partners with NVIDIA to accelerate AI model training

published 14 November 2023

Hewlett Packard Enterprise (HPE) has announced the launch of a new supercomputing solution designed specifically for training and fine-tuning generative AI models.

The new offering from HPE will provide large enterprises, research institutions, and government organizations with an out-of-the-box and scalable solution for the computationally intensive workloads generative AI requires.

Designed in partnership with NVIDIA, the platform will comprise liquid-cooled supercomputers, accelerated compute, networking, storage, services, and a software suite for training and fine-tuning AI models, all aimed at enabling organizations to leverage the power of AI.

The software suite will contain three tools to facilitate accelerated development and deployment of AI models for organizations.

HPE’s Machine Learning Development Environment will simplify data preparation and help customers integrate their models with popular machine learning (ML) frameworks.

The HPE Cray Programming Environment will provide stability and sustainability with a set of tools for developing, porting, debugging, and refining code.

NVIDIA’s AI platform, NVIDIA AI Enterprise, offers customers over 50 frameworks, pretrained models, and development tools to both accelerate and simplify AI development and deployment.

Based on the HPE Cray EX2500 exascale-class system, the solution will be the first to feature a quad configuration of NVIDIA’s GH200 Grace Hopper Superchips, each of which boasts 72 Arm cores, 4 PFLOPS of AI performance, and 624GB of high-speed memory.

This will enable the solution to scale up to thousands of GPUs, with the ability to allocate the full capacity of nodes to a single AI workload for faster time-to-value, and to speed up training by 2-3x.

HPE’s Slingshot Interconnect networking solution will help organizations meet the scalability demands of AI model deployment, with an Ethernet-based, high-performance network made to support exascale-class workloads.

Finally, HPE will bundle in its Complete Care Services package to ensure customers receive expert advice on setup, installation, and ongoing support.

Justin Hotard, executive vice president and general manager, HPC, AI & Labs at HPE, said AI-tailored solutions are required to support the widespread deployment of the technology.

“To support generative AI, organizations need to leverage solutions that are sustainable and deliver the dedicated performance and scale of a supercomputer to support AI model training. We are thrilled to expand our collaboration with NVIDIA to offer a turnkey AI-native solution that will help our customers significantly accelerate AI model training and outcomes,” he said.

Sustainability a key to the future of supercomputing

According to estimates from Schneider Electric’s Energy Management Research Center, AI workloads will have grown to require 20GW of power within data centers by 2028.

This means organizations looking to implement AI while maintaining their ESG goals will need to add compute capacity incrementally and in a carbon-neutral way.

HPE said it has integrated a direct liquid cooling (DLC) solution to address these concerns.

The firm already delivers the majority of the world’s top 10 most efficient supercomputers, using its DLC capabilities to lower the energy consumption of compute-intensive applications.

It estimates that its DLC capabilities will be able to drive up to a 20% performance improvement per kilowatt over air-cooled solutions, while consuming 15% less power.
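To put those two figures side by side, the short Python sketch below works through the arithmetic. It assumes, purely for illustration, that both "up to" figures apply to the same system at the same time, which HPE's estimate does not state; the function name and the baseline of 1.0 are hypothetical and not taken from HPE.

```python
def relative_throughput(perf_per_kw_gain: float, power_reduction: float) -> float:
    """Total throughput relative to an air-cooled baseline of 1.0.

    Assumption (not from HPE): both figures hold simultaneously
    for the same system and workload.
    """
    relative_efficiency = 1.0 + perf_per_kw_gain  # performance per kilowatt
    relative_power = 1.0 - power_reduction        # kilowatts drawn
    # throughput = (performance per kW) x (kW drawn)
    return relative_efficiency * relative_power


ratio = relative_throughput(0.20, 0.15)
print(f"{ratio:.2f}x the air-cooled throughput at 15% less power")
```

Under that assumption, a liquid-cooled system would deliver roughly the same total performance as its air-cooled equivalent (about 1.02x) while drawing 15% less power.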

HPE’s supercomputing solution will be generally available in December 2023 in more than 30 countries.

Solomon Klappholz is a former staff writer for ITPro and ChannelPro. He has experience writing about the technologies that facilitate industrial manufacturing, which led to him developing a particular interest in cybersecurity, IT regulation, industrial infrastructure applications, and machine learning.