AI Data Lake 2.0: how one-stop data-ready infrastructure accelerates high-quality data supply

Huawei presenting at IDI Forum 2026 (Image credit: Huawei)

By ITPro

The age of agentic AI is upon us, transforming industries and the way we work. But for businesses to realise its full potential, they must first get their data ready. That means building a data foundation capable of powering agents at scale.

It’s a revolution with the storage industry at its heart. Over the last few years, data storage has evolved from a way of merely housing data into a holistic platform for data processing and AI inference.

"AI is unlocking new opportunities for the IT industry", said Yuan Yuan, VP of Huawei and president of Huawei's data storage product line.

"The next chapter of AI is data. Committed to technological innovation in data storage, Huawei will accumulate the experience of industrial AI adoption, and work closely with the entire industry to help customers accelerate their journey into the intelligent era."

As Yuan described it at the Huawei Innovative Data Infrastructure (IDI) Forum 2026 (Paris), this is about "Data Awakening, Infra Evolving".

Yuan also pointed out that, to accelerate AI adoption, enterprises must evolve their existing IT architecture into an “AI DC data infrastructure”. This infrastructure must be systematically planned and built around specific pillars: data lakes, AI data platforms, compute power, models, agent frameworks, and data resilience. Of these, the data lake is the first step and the key to bringing AI into production.

What is an AI data lake, and how can it transform your business?

Storage is key to powering your operations with AI, but its role is fundamentally different from simply housing data.

“A data lake can help you consolidate all the data across your organisation and help you to generate a corpus and all the training to serve your inference and AI procedures,” Yuan said at the IDI event.

An AI data lake provides more flexibility: it keeps structured and unstructured data in storage without necessarily assigning it to a specific task. This removes barriers to retrieval and reuse, which makes the data better suited to machine learning and agents.

Huawei’s data lake solution, which combines data lake storage and data management, is founded on three core capabilities: store data well, govern it well, and use it well. Converging scattered, heterogeneous, and massive data, it provides a high-quality AI corpus that speeds up model training and inference, which ultimately empowers enterprises to embrace AI.

A report presented at the World Artificial Intelligence Conference concluded that the quality of an AI model will depend 20% on algorithms and 80% on data quality. Data infrastructure therefore needs to offer high-performance, strongly consistent, resilient, and reliable data access services so that high-quality data can be used effectively in AI computing. As a result, data infrastructure with future-ready storage capabilities, closely matched to AI computing, has become key to supporting the era of large AI models.

Huawei’s AI Data Lake solution uses OceanStor Pacific storage, a high-density, low-power system that can deliver 11PB of capacity in 2U of rack space, which helps keep total cost of ownership (TCO) low. Being both economical and eco-friendly supports AI applications as well as sustainable development.

A high-quality dataset is not just about effective storage and management. It should also be easy to find, quick to access, and accurate to use. For data management, DME Omni-Dataverse, Huawei’s unified data space solution, provides retrieval across hundreds of billions of files within seconds, while supporting real-time scalar and vector search at that scale. It also meets the high-performance data recall requirements of scenarios such as large-model training, retrieval-augmented generation (RAG), and AI agents, unlocking the value of data as a key asset.
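To make “scalar and vector search” concrete, here is a minimal brute-force sketch in Python. It is not DME Omni-Dataverse’s API, which the article does not describe; the `Record` fields, the three-dimensional embeddings and the `hybrid_search` function are all hypothetical. A scalar filter first narrows the candidates by metadata, then the survivors are ranked by cosine similarity to a query vector. A system working across hundreds of billions of items would rely on approximate nearest-neighbour indexes rather than a linear scan like this one.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    file_id: str
    modality: str           # scalar attribute, e.g. "image", "video", "text"
    site: str               # scalar attribute: which data centre holds the file (hypothetical)
    embedding: list[float]  # vector attribute from some embedding model (assumed)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(records, query_vec, k=2, **filters):
    """Hypothetical hybrid search: scalar filter first, then vector ranking."""
    # 1. Scalar stage: keep only records whose attributes match every filter.
    candidates = [r for r in records
                  if all(getattr(r, key) == value for key, value in filters.items())]
    # 2. Vector stage: rank the survivors by similarity to the query, best first.
    ranked = sorted(candidates, key=lambda r: cosine(r.embedding, query_vec), reverse=True)
    return [r.file_id for r in ranked[:k]]


records = [
    Record("img-001", "image", "paris", [0.9, 0.1, 0.0]),
    Record("img-002", "image", "munich", [0.8, 0.2, 0.1]),
    Record("txt-003", "text", "paris", [0.9, 0.1, 0.0]),
    Record("img-004", "image", "paris", [0.1, 0.9, 0.2]),
]

print(hybrid_search(records, [1.0, 0.0, 0.0], k=2, modality="image"))  # → ['img-001', 'img-002']
```

Applying the scalar filter before the vector ranking limits the expensive similarity step to records that can actually qualify. The same pattern would cover cross-modal matching (image to text, video to image) only if all modalities are embedded into a shared vector space, which is an assumption about the embedding model rather than something the article specifies.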

“Ultra-fast semantic search is kind of high-dimensional data research, meaning you can match from image to image, video to image, image to text, these kinds of searches,” Yuan said.

Data-ready infrastructure is of vital importance

As the AI boom continues, enterprises are now headed for an era focused on inference. This, in turn, creates an urgent need to overcome critical challenges that hold back real-world adoption and deployment across industries.

Take, for example, a car company with a bold pledge to have no steering wheels in its models by 2030. To put AI into practice, the company said it needed a huge amount of data: over 1,000 PB from radars, sensors and the environment. This, the company believed, would lead to successful ‘level 5 autonomous driving’ – the highest level of car automation. As Huawei explained, the business would also need to manage that volume of data at an affordable TCO, which means training and managing data across different data centers with global visibility. Here, ultra-fast semantic search is crucial.
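As a rough illustration of what that volume means physically, the Python sketch below combines two figures from this article: over 1,000 PB of data and the 11PB-per-2U density quoted for OceanStor Pacific. It is a back-of-envelope calculation, not a Huawei sizing tool. The function names, the 42U rack and the `usable_fraction` parameter (a stand-in for data-protection overhead such as erasure coding) are assumptions, and the car company’s actual storage platform is not specified.

```python
import math


def chassis_needed(dataset_pb: float, pb_per_chassis: float = 11.0,
                   usable_fraction: float = 1.0) -> int:
    """Hypothetical helper: chassis needed to hold dataset_pb.

    usable_fraction is the assumed share of raw capacity left after
    data-protection overhead; 1.0 means raw capacity only.
    """
    usable_pb = pb_per_chassis * usable_fraction
    return math.ceil(dataset_pb / usable_pb)


def racks_needed(chassis: int, units_per_chassis: int = 2, units_per_rack: int = 42) -> int:
    """Hypothetical helper: racks needed if every rack unit held storage (optimistic)."""
    return math.ceil(chassis * units_per_chassis / units_per_rack)


raw = chassis_needed(1000)            # 1,000 PB at 11PB per 2U chassis
print(raw, raw * 2, racks_needed(raw))  # → 91 182 5
```

At raw capacity, 1,000 PB needs 91 chassis, or 182U, which rounds up to five fully populated racks. Reserving a hypothetical 20% for data protection (`usable_fraction=0.8`) raises that to 114 chassis and six racks, before allowing for networking or growth.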

“The data decided model qualities,” Yuan explained. “Most of the time, they need to find the right data, especially in extreme situations. For example, red lights, a running dog, or a rainy day, many pictures will be presented in this kind of scenario.

“You need to find out in the training platforms. That means 100 billion files in a matter of seconds. That’s the function of the data lake: massive data capacity, global visibility, and fast semantic data retrieval.”

Developing AI, however, is a slow process: from start to finish, development and deployment are lengthy and time-consuming. Huawei’s one-stop AI toolchain, ModelEngine, is designed to streamline development and speed up the deployment of large AI models, helping users turn data into AI applications faster. ModelEngine sits in the development-enablement layer, where Huawei provides end-to-end AI toolchains that support multimodal data processing and automated cataloging.

Alongside the toolchain sits DME Omni-Dataverse, Huawei’s unified data space solution. It supports multimodal, cross-site, and real-time data imports, with visibility over, and management of, global data. That includes retrieval across hundreds of billions of 1,000-dimensional vectors in just seconds. These capabilities are needed to achieve high-quality data aggregation and supply.
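Some simple arithmetic shows why retrieval “in seconds” at this scale depends on indexing. The Python sketch below assumes each 1,000-dimensional vector is stored as 32-bit floats (4 bytes per component) and takes 100 billion vectors as the low end of “hundreds of billions”. The 100 GB/s read rate is a hypothetical figure, not a Huawei specification, and the function names are illustrative only.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_component: int = 4) -> int:
    """Raw size of the vectors alone; 4 bytes per component assumes float32."""
    return num_vectors * dims * bytes_per_component


def full_scan_seconds(total_bytes: int, read_bytes_per_second: float) -> float:
    """Time to read every vector once at a given (assumed) aggregate read rate."""
    return total_bytes / read_bytes_per_second


total = raw_vector_bytes(100_000_000_000, 1_000)  # 100 billion x 1,000 dims
print(total / 1e12)                      # → 400.0  (terabytes)
print(full_scan_seconds(total, 100e9))   # → 4000.0 (seconds, at a hypothetical 100 GB/s)
```

Even before any index structures, the raw vectors occupy about 400 TB, and reading all of them once at the assumed 100 GB/s would take more than an hour. Answering in seconds therefore implies an index, typically some form of approximate nearest-neighbour search, that touches only a small fraction of the data; the article does not say which indexing technique Huawei uses.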

AI is rapidly changing data infrastructure, and the shift is about more than models and computational power. Deeper collaboration between storage and compute is the way forward. As Huawei Data Storage shows, intelligence starts with data.


ITPro is a global business technology website providing the latest news, analysis, and business insight for IT decision-makers. Whether it's cyber security, cloud computing, IT infrastructure, or business strategy, we aim to equip leaders with the data they need to make informed IT investments.
