How Google will use Gemini and multimodal AI to “flex its AI native muscles” in 2024 – and why it could seize Microsoft’s market lead


Google, like its industry counterparts, has been playing catch-up to Microsoft in the generative AI race over the last year after being blindsided by the launch of ChatGPT in late 2022. But that's all about to change, analysts believe.

Last week, Google initiated a Gemini rebrand, bringing its portfolio of generative AI tools, such as its chatbot Bard and productivity assistant Duet AI, under the Gemini banner.

The company unveiled Gemini, its range of large language models (LLMs), in late 2023, positioning it as a leading challenger to the dominance of OpenAI's GPT-4. But Chirag Dekate, VP analyst at Gartner, tells ITPro the rebrand was more than just a marketing ploy. The move represents a step change in how Google plans to compete with Microsoft and others in the generative AI space, specifically by highlighting the advantages of Gemini's multimodality.

Multimodal AI allows users to input data as text, images, or audio to inform outputs in any or all of these modalities. A multimodal system given both text and images, for example, can then produce outputs in either text or image form.

Dekate says Google Gemini is currently the only "native multimodal model" on the market, and that this will be a key differentiator for the firm in the year ahead. Google is leaning into Gemini's status as a unique offering for enterprises and consumers alike, he says, while Microsoft and Amazon have arrived late to the party on this front and are instead combining a range of unimodal models, an approach with significant downsides.

"So Gemini, being the only native multimodal model that learns from text, video, audio, images, and code, and converts it into multimodal outputs, is quite unique," says Dekate. "So with alternative approaches, where you're basically slathering on a bunch of unimodal models and trying to emulate multimodal capability, you have problems with this.

"You're engaging in additive cost structures and the feature set will likely be substandard, or at least the performance levels will be relatively poorer than having a native multimodal experience."

Google's long journey to AI supremacy

Differentiating itself with its AI approach has been a long-running focus for Google, Dekate says, and one that stretches back nearly a decade. 

In 2017, CEO Sundar Pichai told delegates at the I/O developer conference that Google was an "AI-first company". What this meant, Dekate explains, is that Google has been building an "AI-native infrastructure" that's ripe for exploitation, and generative AI has offered the perfect opportunity to capitalize on it.

Beyond this, the company has been drawing upon the expertise of Google DeepMind and Google Research to drive innovation at both the product and infrastructure level. This represents a vital "innovation engine" for Google, Dekate says, and one that separates it from competitors in the space.

“With the Gemini branding, what it's trying to articulate on the market is Google has something that many of their peers lack,” he says. “That’s the innovation engine, if you will, of Google DeepMind and Google Research, that delivers leading-edge capabilities and products and enables enterprises to harness this innovation faster. 


“If you compare its cloud peers, you’ll notice that they don't have the kind of innovation engine that Google has. Even some of the startups might be grabbing headlines but from a capability standpoint, now you see Google flexing its AI native muscles.”

Beyond its in-house innovation, Google also has a number of delivery mechanisms in Google Cloud, Workspace, Android, Search, and Chrome that will enable it to roll out generative AI products to consumers and enterprises alike.

Microsoft, he notes, stands on almost equal footing here, having woven generative AI features throughout its core product offerings. But the company's great advantage so far in the generative AI race, OpenAI, could become a long-term hindrance in a battle with Google.

“If you look at competitors in OpenAI, Microsoft, Amazon, and others, they have similar capabilities, but episodically, not necessarily so holistically.”

OpenAI might not be the golden goose Microsoft thinks

The reason behind Microsoft’s resounding success so far in the generative AI space has been OpenAI; of that, there is no doubt, Dekate says. 

Through its partnership with the Sam Altman-led OpenAI, Microsoft has been able to allocate resources highly efficiently and draw upon the expertise and innovation of the startup. But while that's been an advantage thus far, it could create issues further down the line.

“The reason Microsoft was as efficient as it is, and it continues to be hyper-efficient, is that it is largely relying on OpenAI to deliver exclusive innovation through a Microsoft delivery vehicle,” he says. 

"So it has to rely on OpenAI for innovation in some sense. Its model innovation vector comes from OpenAI and its infrastructure innovation vector largely comes from Nvidia.

“What this does is frees up Microsoft to focus much more on the go-to-market experiences because it didn't have to innovate at the model level. It does not have to innovate too deeply at the infrastructure level.”

What this means, though, is that Microsoft has "basically integrated innovation from elsewhere". OpenAI drives the model innovation, and Microsoft drives the execution strategy. It's been a highly lucrative strategy so far, but with Google's focus on Gemini and its ability to offer something genuinely distinct from competitors in the space, Dekate expects Google to begin gaining ground and even take the market lead in the year ahead.

With Gemini, Dekate says Google is demonstrating its industry-leading position on multimodal AI. 

“It’s not just that, they're showcasing what a truly AI-first enterprise looks like. Rather than playing catch up and trying to compete with Microsoft, they are now infusing every one of their products with the best model they have on hand. So in some sense, from a model innovation standpoint, others are now playing catch up with Google.”

Microsoft might struggle with how to compete with native multimodality, according to Dekate, because it’s reliant on OpenAI to innovate on this front. The tech giant is beholden to what OpenAI produces, while Google is working in-house.

Can Google truly capitalize on Gemini?

While pointing out Google’s advantageous position, Dekate admits that the “jury is out” on whether Google will successfully translate generative AI product innovation into impactful revenue gains. 

He says this is because, in the immediate future, Google's AI success will rest on its ability to be "hyper-efficient in its enterprise execution, to get these technologies in the hands of the enterprise".

"It has to outcompete Microsoft in an enterprise-oriented motion, which is really hard to do," he says. "Microsoft is executing to perfection, and Google needs to out-innovate and out-compete them."

If Google approaches this as it has in the past, believing that "the best products sell themselves", it could stumble or fail altogether, Dekate adds. "You can't have these amazing innovations and ask enterprises to just go figure it out, that's unlikely to work," he says.

“Google will need to engage in extensive marketing efforts, and more importantly, go-to-market efforts by partnering with enterprises to enable them to create solutions out of this.”

Nonetheless, Dekate believes that, for the first time since the advent of generative AI, Google is firmly "in the driver's seat".

“Google is setting the pace, if you will, in certain portions of the generative AI market.”

Ross Kelly
News and Analysis Editor

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.

He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.

For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.