What Databricks’ AI acquisition spree tells us about the future of the data industry

Databricks CEO Ali Ghodsi
(Image credit: Databricks)

Databricks’ deal to acquire MosaicML, a large language model (LLM) startup, for $1.3 billion completes a series of major moves in recent weeks, following the Okera and Rubicon acquisitions in May and June respectively.

There’s no doubt the artificial intelligence (AI) sector is white hot, which we can glean from MosaicML’s staggering price tag – six times more than its latest investor-round valuation of $222 million.

This is a significant acquisition in isolation, but the three deals, seen collectively, show Databricks is betting big on its own vision of generative AI: industry-specific and custom-built generative AI tools, rather than the mass-market variants embodied by the likes of OpenAI’s GPT-4 and DALL·E 2.

Such a move is not unexpected, given the company’s aggressive expansion into the AI space, including the launch of Dolly – an open source LLM – in March. Its move to acquire MosaicML aligns with a broader strategy to ‘democratize enterprise AI’, with the OpenAI competitor soon offering customers the tools to build their own low-cost LLMs using proprietary data.

With such a move, Databricks is betting big on enterprises becoming jaded with the current class of generative AI systems, in light of issues such as IP usage and hallucinations, and instead turning to tools that let them create their own.

Why did Databricks acquire MosaicML?

MosaicML – co-founded three years ago by two former Intel execs, Naveen Rao and Hanlin Tang, alongside academics Michael Carbin and Jonathan Frankle – specializes in training neural networks quickly and cost-effectively.

The company offers services that allow customers to train and deploy their own custom LLMs and generative AI systems based on data they’ve collected first-hand. The firm’s purpose-built, full-stack managed platform handles systems and hardware complexities, so customers lacking machine learning resources and expertise can still tap into the technology.

“In the last six or seven months, every large enterprise that we've talked to – every meeting I have had with a customer – always ends up going towards generative AI,” says Databricks CEO Ali Ghodsi. “And the number-one thing everyone's saying is, we want to own our own intellectual property (IP). We want to build our own machine learning models, and we want to be able to compete in the market that [we're] in.”

The trouble with many of the leading models, such as ChatGPT or GPT-4, is that enterprises don’t own the IP and the models aren’t trained on data that is specific and relevant to the business. This is where a company like MosaicML can step in.

“We asked them: ‘Who out there do you think is doing this well?’ And the name that we kept hearing was MosaicML,” Ghodsi adds. “It became pretty clear to us that we started talking about doing a close partnership, and things intensified, and we realized that if we wanted to make it seamless; help customers build their own IP; keep their data private; and push down the cost – and make it really cost-effective to do this – the best way was to join forces.”

Why is Databricks betting so big on its AI democratization vision?

“Databricks moved swiftly to offer an open source LLM, Dolly, at a time when the overarching narrative was that only specialists such as OpenAI could offer useful LLMs,” says Forrester VP and principal analyst Mike Gualtieri.

“That narrative has been broken, in part by Databricks and by HuggingFace as more people learn about both. With their announced acquisition of MosaicML, is Databricks a data lake company with AI or an AI company with a data lake?”

Databricks CEO Ali Ghodsi

(Image credit: ITPro/Keumars Afifi-Sabet)

The move is a clear play by Databricks to position itself as a viable provider in the AI space for enterprises with niche, industry-specific needs that want to avoid the hysteria associated with the most popular tools. Systems trained on widely available online data, such as GPT-4, are incredibly impressive, but can’t satisfy every business demand. Organizations, too, may seek to own unique generative AI tools to give themselves a competitive edge.

“This acquisition will propel Databricks as a leading ‘AI operating system’ contender,” Gualtieri continues. “However, to attract more enterprise customers, the firm must build more productivity tooling around their AI technology.

“According to Forrester, every enterprise needs its own AI. That means that, in addition to using foundational models from OpenAI, Cohere, AWS, and others, these enterprises must also build some differentiated models from scratch. Open source LLM technology will dominate bespoke model creation but technology vendors will compete on the tooling to make using open source faster.”

How do the Okera and Rubicon acquisitions fit into everything?

The MosaicML deal follows agreements to acquire Okera, the AI-centric regulatory and compliance firm, and Rubicon, which develops AI storage and serving systems, completing a trio of AI-related acquisitions in recent weeks.

Ghodsi tells ITPro that Okera is important for Unity Catalog – “the most important strategic bet for us internally” – because integrating the firm’s technology can let businesses set privacy and security policies across the breadth of their apps and services. “Okera can help you set security policies here, that you can then push to other systems,” he adds. “So you can set up the policy here but push it so it’s implemented in, for instance, Snowflake.”

Rubicon, meanwhile, will augment Delta Lake, the firm’s data lake platform, by providing the technology to house the data that enterprises can then train machine learning models with. This includes data like images, audio, and video files. “It’s a team that have built AI storage for many, many years,” Ghodsi continues. “Akhil [Gupta] built it for Google a long time ago, then the scalable storage at Dropbox, then he started Rubicon.”

What next for Databricks?

“We’re looking at everything,” Ghodsi tells ITPro, when asked about future acquisition plans. “We keep an open mind. So we’re interested in anything in the data space.”

His answer suggests Databricks isn’t done with its acquisition plans – as it looks to make further inroads in AI and reinforce its products. But that doesn’t mean it’ll acquire every promising company it encounters, as Ghodsi adds: “We’re a picky company”.

“They [at MosaicML] have a similar DNA as Databricks, I would say. We have a very high bar for hiring. We’re very, very, very selective in our hiring processes. We always have been; in fact, it’s a cultural principle at Databricks – the culture is called ‘raise the bar in hiring, don’t settle’.

“So it makes us incompatible with a lot of companies that we otherwise would acquire. So, most companies we’ll look to acquire, we ended up not doing it because we’re not aligned on the cultural principle of our technical bar and the kind of people that we want to hire.”

Keumars Afifi-Sabet
Contributor

Keumars Afifi-Sabet is a writer and editor who specialises in public sector, cyber security, and cloud computing. He first joined ITPro as a staff writer in April 2018 and eventually became its Features Editor. Although a regular contributor to other tech sites in the past, these days you will find Keumars on LiveScience, where he runs its Technology section.