Dell Technologies and data analytics platform Starburst have announced a new partnership, with the intention of building an advanced data lakehouse solution for better oversight and control of enterprise data.
The initiative, announced at Big Data London, will use Dell’s storage expertise and Starburst’s engine to allow on-demand access to decentralized data.
Customers will then be able to federate and activate data around this lakehouse from a single point of access, which the firms hope will enable more detailed data analysis and give customers more oversight of training for artificial intelligence (AI) and machine learning (ML) systems.
The data lakehouse is a model that has arisen in the past few years, combining the structured and unstructured information stored in data warehouses and data lakes. Lakehouses are particularly useful for performing responsive searches on raw data.
“Dell Technologies is on a journey to a data lakehouse architecture,” said Joe Steiner, CTO of unstructured data solutions at Dell Technologies.
“We have big plans, and step one on our journey is a common query engine, and that's what we're doing with Starburst.
“For far too long our customers, like you, have been bound by the limitations of proprietary databases, data lakes, and data warehouses. My personal feeling is that's going to come to an end. An open ecosystem is emerging, and my customers want these open ecosystem capabilities.
“We're co-engineering solutions, and we're going to deliver some incredible capabilities very soon.”
Rick DeMare, global business development leader at Starburst, said that his firm’s engine will “sit on top” of Dell’s data lakehouse, with the aim of giving customers warehouse-like speed over all the forms of data contained within. This will also allow customers to federate and activate their data across the lakehouse from a single point of access.
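As a rough illustration of what querying federated data "from a single point of access" can look like, the sketch below joins two separately stored datasets through one connection. It is a minimal stand-in only: it uses Python's built-in SQLite module rather than Starburst's engine or any Dell product, and the `warehouse` and `lake` stores, table names, and sample rows are all hypothetical.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Open one connection that can see two separate (hypothetical) data stores."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent in-memory database; they stand in
    # for a structured warehouse and a raw data lake held in different places.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("ATTACH DATABASE ':memory:' AS lake")
    conn.execute("CREATE TABLE warehouse.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE lake.events (customer_id INTEGER, payload TEXT)")
    # Illustrative sample data, not taken from the article.
    conn.executemany(
        "INSERT INTO warehouse.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO lake.events VALUES (?, ?)",
        [(1, "login"), (1, "upload"), (2, "login")],
    )
    conn.commit()
    return conn


def events_per_customer(conn: sqlite3.Connection) -> list:
    """Run a single query that spans both stores without copying data between them."""
    return conn.execute(
        """
        SELECT c.name, COUNT(e.payload)
        FROM warehouse.customers AS c
        LEFT JOIN lake.events AS e ON e.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(events_per_customer(build_federated_connection()))
```

The point of the sketch is that the caller issues one query against one connection, while the data itself stays in its original stores.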
Starburst says that, on average, this approach can help customer systems run 90% faster and reduce the cost of ownership by 53%. In spite of its new partnership with Dell, Starburst will maintain its vendor-agnostic approach and to this end is committed to upholding open file and table formats across its systems.
DeMare also rejected the phrase ‘single source of truth’ used by competitors, describing it as a “single source of lies”.
“It's a mess, it's never been more of a mess,” he said.
DeMare cited a report by S&P Global Market Intelligence which found that on average, firms now maintain 5.4 copies of data between their cloud environments and on-premise data.
He criticized solutions that bill themselves as new technologies but are in effect just data silos, including data lakes, and argued that existing ‘single source of truth’ architecture produces monolithic, closed systems that are expensive to scale.
Easier access to data lakehouses could work to address CIO concerns over cloud complexity. A recent study by Dynatrace found that 47% of CIOs were in favor of more lakehouse structures, to enable greater use of automation.
Prepping for AI, and reducing CIO strain
Steiner and DeMare made the announcement on the keynote stage at Big Data London 2023, an event this year dominated by strategies and solutions aimed at organizing business data for use in AI and ML applications.
The explosion of interest in generative AI, in particular, has put new demands on data teams. Large language models (LLMs) require vast swathes of curated data to function optimally, so firms need a good grip on both structured and unstructured data, as well as oversight of which data is being used for AI systems to ensure privacy, security, and safety are upheld.
At Dell Technologies World 2023, Dell Technologies global CTO John Roese told ITPro that curation of data was the most important factor for making any LLM work correctly.
Dell’s own effort to remove non-inclusive language such as ‘whitelist’ and ‘blacklist’ from its content repository, for example, would allow for an AI to be trained on the firm’s internal code without fears of unwanted biases appearing in output.
Roese also pointed to the fact that neural networks can make faster connections between data that is unlabelled, as human labels may be seen as unnecessary or arbitrary. In this regard, Dell and Starburst’s data lakehouse could have an advantage over competitors in that it allows firms to quickly draw together data in a variety of forms.
“If everybody can access the data the same way, then they can have the fuel that they need to start working on their generative AI products,” said DeMare.
Recent Gartner research presented evidence that IT teams at many organizations are concerned about the risks of passing their data through public AI systems run by hyperscalers such as Azure OpenAI, and that many firms are weighing the safety of running their own on-premise AI models against the far lower costs of public AI.
DeMare claimed Starburst and Dell's project can help companies to find and manage sensitive data to ensure they have controls over what is and is not given over to public AI firms.
"Maybe you don't want to share all your data with a hyperscaler, which is a general requirement of those generative AI tools."
Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at firstname.lastname@example.org or on LinkedIn.