HPE's new platform lets customers build machine learning models quickly and at scale

HPE has launched a new system to help organisations easily build and train machine learning models at scale.

The HPE Machine Learning Development System integrates a machine learning software platform with compute, accelerators, and networking to develop and train more accurate AI models faster and at scale.

Also included is a framework to help businesses share information from their AI models with each other without compromising their data.

The company revealed it has built the new system, now available worldwide, using technology from its acquisition of Determined AI last June. It has combined the startup's machine learning platform with HPE’s AI and high performance computing (HPC) offerings. HPE said that with the new system, users can cut the time it takes to produce results from building and training machine learning models from weeks or months to days.

The company hopes its new system will help businesses bypass the high complexity associated with adopting AI infrastructure. It also helps improve accuracy in models faster with distributed training, automated hyperparameter optimisation, and neural architecture search.
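To give a sense of what automated hyperparameter optimisation involves, here is a minimal sketch of random search, one of the simplest such techniques. It is not HPE's or Determined AI's implementation: the `validation_loss` function is a hypothetical stand-in for a full training run, and the search ranges are assumptions chosen for illustration.

```python
import math
import random


def validation_loss(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for training a model and measuring validation loss.

    A toy objective whose minimum sits at learning_rate=0.01, batch_size=64.
    """
    return (math.log10(learning_rate) + 2) ** 2 + ((batch_size - 64) / 64) ** 2


def random_search(trials: int, seed: int = 0):
    """Sample random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        loss = validation_loss(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


if __name__ == "__main__":
    best_loss, best_config = random_search(trials=50)
    print(best_loss, best_config)
```

In a real system each trial would be a training job, which is why running many trials in parallel across GPUs shortens the search.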

The system scales from a small configuration of 32 Nvidia GPUs up to a larger configuration of 256 Nvidia GPUs. On the smaller configuration, the system delivers approximately 90% scaling efficiency for workloads like natural language processing and computer vision.
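Scaling efficiency is commonly defined as measured cluster throughput divided by the ideal throughput of a single GPU multiplied by the number of GPUs. The sketch below works through the arithmetic under that assumption; the throughput figures in the usage example are hypothetical and are not HPE benchmark numbers.

```python
def scaling_efficiency(single_gpu_throughput: float,
                       cluster_throughput: float,
                       num_gpus: int) -> float:
    """Measured throughput as a fraction of perfect linear scaling."""
    ideal_throughput = single_gpu_throughput * num_gpus
    return cluster_throughput / ideal_throughput


def effective_gpus(num_gpus: int, efficiency: float) -> float:
    """Number of 'ideal' GPUs the cluster is equivalent to at a given efficiency."""
    return num_gpus * efficiency


if __name__ == "__main__":
    # Hypothetical figures: 100 samples/s on one GPU, 2,880 samples/s on 32 GPUs.
    print(scaling_efficiency(100.0, 2880.0, 32))
    # At 90% efficiency, 32 GPUs do the work of roughly 28.8 ideal GPUs.
    print(effective_gpus(32, 0.9))
```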

The new system will be offered in one integrated package that provides preconfigured, fully installed AI infrastructure. As part of this, HPE Pointnext Services will provide onsite installation and software setup.

HPE’s system is offered as a small building block with options to scale up. The base configuration comes with AI infrastructure using the HPE Apollo 6500 Gen10 Plus system with eight Nvidia A100 80GB GPUs. It also has HPE ProLiant DL32 servers and a 1Gb Ethernet Aruba CX 6300 switch to control and manage system components. Lastly, it uses the Nvidia Quantum InfiniBand networking platform to ensure performance of compute and storage communications.


The company has also launched HPE Swarm Learning, which it claims is the industry’s first privacy-preserving, decentralised machine learning framework for the edge or distributed sites. Through this, a range of organisations, like those in healthcare, banking and financial services, and manufacturing, can share learnings from their AI models with other organisations to improve insights, without sharing the actual data.

HPE Swarm Learning provides customers with containers that are easily integrated with AI models using the HPE Swarm API. Users can then immediately share AI model learnings within their organisations and with industry peers outside them to improve training, without sharing the underlying data.

Currently, the majority of AI model training occurs at a central location, which relies on centralised merged datasets. HPE said this approach can be inefficient and costly due to having to move large volumes of data back to the same source. It can also be constrained by data privacy and data ownership rules and regulations that limit data sharing and movement, potentially leading to inaccurate and biased models.

The company pointed out that by sharing learnings from one organisation to another at the data source, various industries across the world can unite and further improve intelligence, which can lead to better business and societal outcomes.

Sharing data externally, however, can be a challenge for organisations that must meet data governance, regulatory, or compliance requirements mandating that data stay at its location. HPE Swarm Learning helps companies use distributed data at its source, which increases the dataset size for training while preserving data governance and privacy.

HPE said Swarm Learning uses blockchain technology to securely onboard members, dynamically elect a leader, and merge model parameters. This is intended to provide resilience and security to the network and to ensure that only the AI model learnings are shared, and not the data.
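The idea of sharing model parameters rather than data can be illustrated with a minimal sketch. It does not use the HPE Swarm API and does not model the blockchain layer: the round-robin `elect_leader` function and the sample-weighted averaging in `merge_parameters` are assumptions for illustration, as the announcement does not specify how leaders are chosen or how parameters are merged. The node names are hypothetical.

```python
from typing import Dict, List

# A model's parameters, keyed by layer name (hypothetical flat representation).
Params = Dict[str, List[float]]


def elect_leader(node_ids: List[str], round_number: int) -> str:
    """Hypothetical round-robin election so the leader changes every round."""
    ordered = sorted(node_ids)
    return ordered[round_number % len(ordered)]


def merge_parameters(updates: Dict[str, Params],
                     sample_counts: Dict[str, int]) -> Params:
    """Average each parameter across nodes, weighted by local dataset size.

    Only parameters are exchanged; the training data never leaves its node.
    """
    total = sum(sample_counts[node] for node in updates)
    merged: Params = {}
    for node, params in updates.items():
        weight = sample_counts[node] / total
        for name, values in params.items():
            acc = merged.setdefault(name, [0.0] * len(values))
            for i, value in enumerate(values):
                acc[i] += weight * value
    return merged


if __name__ == "__main__":
    updates = {
        "hospital_a": {"w": [1.0, 2.0]},
        "hospital_b": {"w": [3.0, 4.0]},
    }
    counts = {"hospital_a": 100, "hospital_b": 300}
    leader = elect_leader(list(updates), round_number=0)
    print(leader, merge_parameters(updates, counts))
```

Weighting by sample count gives nodes with more local data more influence on the merged model, which is one common design choice in federated-style training.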

Zach Marzouk