Databricks announces major contributions to flagship open source projects

Databricks has announced several new contributions to popular data and AI open source projects, including Delta Lake, MLflow, and Apache Spark.

At its Data + AI Summit, the data and AI specialist said it will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation, as well as make all Delta Lake APIs open source as part of its Delta Lake 2.0 release.

That means the open source community will benefit from the full functionality and enhanced performance of the Delta Lake 2.0 ecosystem, enabling the building of high-performance data lakehouses on open standards. The Delta Lake 2.0 Release Candidate is now available, with a full release expected later this year.

The firm also announced MLflow 2.0, the next iteration of the open source machine learning project, which introduces MLflow Pipelines to the platform. The addition aims to substantially decrease time to production and improve execution at scale through standardisation.

MLflow Pipelines offers data scientists pre-defined, production-ready templates based on the model type they’re building, allowing them to bootstrap and accelerate model development without requiring intervention from production engineers.


Additionally, Databricks revealed its new Spark Connect, which will enable the use of the unified data analytics engine Spark on virtually any device, as well as Project Lightspeed, a next-gen Spark Structured Streaming engine for data streaming on the lakehouse platform.

“From the beginning, Databricks has been committed to open standards and the open source community,” commented Ali Ghodsi, Co-Founder and CEO of Databricks. “We have created, contributed to, fostered the growth of, and donated some of the most impactful innovations in modern open source technology.

“Open data lakehouses are quickly becoming the standard for how the most innovative companies handle their data and AI. Delta Lake, MLflow and Spark are all core to this architectural transformation, and we’re proud to do our part in accelerating their innovation and adoption.”

New Lakehouse innovations

Databricks also unveiled several innovations for its Lakehouse Platform at the Data + AI Summit. New capabilities include improved data warehousing performance and functionality, expanded data governance, and new data sharing features, among them Databricks Marketplace and Data Cleanrooms for secure data collaboration.

There’s also automatic cost optimisation for ETL operations, as well as machine learning lifecycle improvements to “radically simplify” MLOps at production scale.

“Today’s announcements are a significant step forward in advancing our Lakehouse vision, as we are making it faster and easier than ever to maximise the value of data, both within and across companies,” Ghodsi said.

Daniel Todd
