Databricks injects array of AI tools into Lakehouse

Databricks has launched a raft of new features for its flagship Lakehouse data lake platform, including artificial intelligence (AI) and natural language improvements as well as data governance features.

With Lakehouse AI, customers can develop generative AI applications, including large language models (LLMs), within the platform itself, with tools spanning the breadth of the AI lifecycle. This includes data collection, model development, and monitoring.

“This gives you control over your intellectual property,” said Databricks CEO Ali Ghodsi during his keynote address at Databricks Data + AI Summit 2023. He added every company in the next five years “will be an AI and data company” and would need the tools to create their own LLMs to be competitive, rather than rely on third-party services.

What is LakehouseIQ?

LakehouseIQ is the firm’s ‘knowledge engine’ that’s designed to make natural language interfaces, like chat bots, more accurate and relevant to the needs of a particular organization.

Sitting as a fabric between data and a front-end interface, the tool uses generative AI to understand a business’ unique culture, jargon, and data usage patterns, among other factors.

It ingests a vast quantity of internal material, including organizational charts, notebooks, and more, in order to understand how best to answer questions users ask via any chat interface.

It’s designed to be specific to each business, so natural language interactions are far more accurate and context-sensitive than classic chat bots. While not a chat bot or virtual assistant itself, it’s designed to improve the reliability and performance of these tools.

Databricks has made LakehouseIQ available as an API, and the firm plans to integrate it into open source libraries.

New AI features in Lakehouse

Vector Search gives developers a way to improve the accuracy of generative AI responses. The feature creates vector embeddings based on files found in Unity Catalog, the central hub that gives customers access control, auditing, lineage and data discovery capabilities across their workspaces.

These embeddings are kept up to date automatically through integrations with Databricks Model Serving. Developers can also add query filters to narrow results and improve outcomes.
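To illustrate the general idea of embedding search combined with a query filter, here is a minimal sketch in plain Python. It does not use the Databricks Vector Search API; the document names, the hand-written three-dimensional vectors, and the "team" metadata field are all hypothetical stand-ins for real embeddings and catalog metadata.

```python
import math

# Toy "embeddings": hand-written 3-d vectors standing in for model output.
# Document ids, vectors, and the "team" field are hypothetical.
DOCS = [
    {"id": "sales_faq", "team": "sales", "vec": [0.9, 0.1, 0.0]},
    {"id": "sales_playbook", "team": "sales", "vec": [0.7, 0.3, 0.1]},
    {"id": "hr_handbook", "team": "hr", "vec": [0.8, 0.2, 0.0]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, k=2, filters=None):
    """Return the ids of the k most similar documents that match every filter."""
    filters = filters or {}
    # Apply the metadata filter first, then rank what is left by similarity.
    candidates = [
        d for d in DOCS
        if all(d.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    print(search(query))                             # ranked over all documents
    print(search(query, filters={"team": "sales"}))  # ranked within one team only
```

Without a filter, the closest matches can come from any team; with the filter, only documents tagged for the chosen team are ranked, which is one way a filter can keep irrelevant context out of a generated answer.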

Improvements to AutoML also add low-code functionality for fine-tuning LLMs. Any tweaks customers make to their LLMs will be owned by the enterprise, with data kept in-house and not shared with any third parties. The resulting model can also be shared within an organization through Model Serving, Unity Catalog and MLflow integrations.

Finally, Databricks has launched several open source models through its marketplace, including MosaicML’s MPT-7B and the Technology Innovation Institute’s Falcon-7B, as well as Stable Diffusion.

On the LLMOps front, MLflow AI Gateway lets customers centrally manage credentials for software as a service (SaaS) models or APIs, and provide access-controlled querying. It also supports prediction caching, so repeat prompts can be served without another model call, and rate limiting to keep costs down.
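The gateway pattern described here, with one component holding the credential, caching repeat prompts, and enforcing a rate limit, can be sketched in a few lines of plain Python. This is not the MLflow AI Gateway API: the `Gateway` class, the `fake_model` function, and every parameter name below are hypothetical and exist only to show the pattern.

```python
import time


class Gateway:
    """Hypothetical wrapper: one place for the credential, a cache, and a rate limit."""

    def __init__(self, model_fn, api_key, max_calls, per_seconds, clock=time.monotonic):
        self._model_fn = model_fn
        self._api_key = api_key          # held centrally; callers never see it
        self._max_calls = max_calls
        self._per_seconds = per_seconds
        self._clock = clock
        self._cache = {}                 # prompt -> cached response
        self._call_times = []            # timestamps of recent upstream calls
        self.upstream_calls = 0

    def query(self, prompt):
        # Repeat prompts are answered from the cache and cost nothing upstream.
        if prompt in self._cache:
            return self._cache[prompt]

        # Sliding-window rate limit on calls that actually reach the model.
        now = self._clock()
        self._call_times = [t for t in self._call_times if now - t < self._per_seconds]
        if len(self._call_times) >= self._max_calls:
            raise RuntimeError("rate limit exceeded")

        self._call_times.append(now)
        self.upstream_calls += 1
        result = self._model_fn(prompt, self._api_key)
        self._cache[prompt] = result
        return result


def fake_model(prompt, api_key):
    """Stand-in for a paid SaaS model call; simply upper-cases the prompt."""
    return prompt.upper()


if __name__ == "__main__":
    gateway = Gateway(fake_model, api_key="not-a-real-key", max_calls=2, per_seconds=60)
    print(gateway.query("hello"))
    print(gateway.query("hello"))       # served from the cache
    print(gateway.upstream_calls)       # only one call reached the model
```

In this sketch, cached responses do not count against the rate limit, so only prompts that genuinely reach the model consume quota.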

MLflow Prompt Tools, meanwhile, is a no-code feature that allows users to compare different outputs from various models based on prompts that are automatically tracked within MLflow.

New governance features in Lakehouse

Also new in Lakehouse is Lakehouse Federation, a set of capabilities designed to break down silos between data systems. As a result, businesses are now able to discover, query and govern data across all their systems without the need to copy or move data.

Query federation comprises catalog and querying functionality to consolidate and map out data assets from platforms beyond the confines of Databricks, including MySQL, PostgreSQL, Redshift, Snowflake, Azure SQL Database, Azure Synapse, BigQuery, and others. Users can secure, audit and access data from one interface rather than having to move the data between different platforms.

Additional governance tools in Unity Catalog also let customers apply access policies to any registered data asset, including tables, rows, columns and tags. This comes ahead of future plans to define data access policies and push them out to other data warehouses.
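As a rough illustration of what row-level and column-level access policies mean in practice, the sketch below applies a per-role row filter and column mask to a small in-memory table. It is not Unity Catalog syntax; the table contents, role names, and policy structure are all hypothetical.

```python
# Hypothetical table contents.
ROWS = [
    {"region": "EU", "customer": "Acme", "email": "a@acme.example"},
    {"region": "US", "customer": "Globex", "email": "g@globex.example"},
]

# Hypothetical policies keyed by role: which rows are visible, which columns are masked.
POLICIES = {
    "eu_analyst": {
        "row_filter": lambda row: row["region"] == "EU",
        "masked_columns": {"email"},
    },
    "admin": {
        "row_filter": lambda row: True,
        "masked_columns": set(),
    },
}


def read_table(role):
    """Return the rows a role may see, with masked columns replaced by '***'."""
    policy = POLICIES[role]
    visible = []
    for row in ROWS:
        if policy["row_filter"](row):
            visible.append({
                column: ("***" if column in policy["masked_columns"] else value)
                for column, value in row.items()
            })
    return visible


if __name__ == "__main__":
    print(read_table("eu_analyst"))  # EU rows only, email masked
    print(read_table("admin"))       # every row, nothing masked
```

The point of the sketch is that the policy lives with the data rather than in each application, so every reader goes through the same check.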

The Lakehouse Federation's capabilities also include giving users a common approach to discovering and exploring both structured and unstructured data with a single query. Domain-owned data sources for analytics and AI use cases are also rapidly exposed, offering faster access to data. Finally, the changes make it possible for users to apply a single permission model for the data estate.

Keumars Afifi-Sabet
Contributor

Keumars Afifi-Sabet is a writer and editor who specialises in public sector, cyber security, and cloud computing. He first joined ITPro as a staff writer in April 2018 and eventually became its Features Editor. Although a regular contributor to other tech sites in the past, these days you will find Keumars on LiveScience, where he runs its Technology section.