Google Gemini shows tech giant is still in the generative AI race as model outperforms GPT-4
Google Gemini Ultra outperforms humans on benchmarks like MMLU
The launch of Google Gemini, the tech giant’s latest AI model, gives OpenAI’s GPT-4 its first true competitor and could herald a future battle between the two firms, analysts have said.
Google unveiled Gemini on 6 December, hailing the powerful new AI model’s “sophisticated multimodal reasoning capabilities” as a potential game-changing moment in the generative AI race.
Daryl Plummer, distinguished VP analyst & Gartner Fellow, told ITPro the introduction of Google Gemini could shift attention away from OpenAI and has set a “high bar” for competitors in the space.
“2023 has seen Google go from being ‘counted out’ after the introduction of ChatGPT to leapfrogging innovations on models with the introduction of Gemini,” he said.
“Large language models and foundation models are at the center of GenAI excitement, and customers keep asking which models will be the most beneficial to them.
“While Google has many models, much of the industry attention has been on GPT variants. Google needed to set the bar high for how these models will evolve.”
Google Gemini: Everything you need to know
Google confirmed the Gemini model will be integrated with Bard, the firm’s flagship chatbot.
This will provide users with advanced capabilities, including heightened reasoning and natural language abilities. Gemini will be available in three sizes – Ultra, Pro, and Nano.
The model will also run across a range of products: Bard will be powered by Gemini Pro, while mobile device users will gain new features through Gemini’s Nano range.
Gemini Ultra, the most powerful of the three available classes, will be rolled out next year, Google confirmed.
Critically, Gemini Ultra outperformed OpenAI’s GPT-4 model across the majority of benchmarks.
In an announcement, Demis Hassabis, CEO and co-founder of Google DeepMind, described Gemini as the result of “rigorous testing”, adding that the model will supercharge performance on a “wide variety of tasks” for users.
“From natural image, audio, and video understanding, to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of 32 widely-used academic benchmarks used in large language model research and development,” he wrote.
Hassabis added that, with a score of 90%, Gemini Ultra is the “first language model to outperform human experts” on massive multitask language understanding (MMLU).
All told, this means Gemini is capable of considering difficult questions more carefully before providing an answer and delivers significant improvements for users compared to other industry models.
Plummer said Gemini could herald a step change in the use of large language models due to its performance, enabling more intuitive, nuanced capabilities to assist developers in coding activities, for example.
“Sophisticated reasoning in Gemini allows the model to help pull relevant and correlated information from multiple complex documents and data,” he said.
“And Gemini is trained to support more varied and nuanced coding to continue the assistive support to developers. It can solve nearly twice as many types of coding problems as previous versions.”
Chirag Dekate, VP analyst at Gartner, echoed Plummer’s comments, adding that the model “sets the new benchmark in a fast-evolving, game-changing generative AI landscape”.
Google Gemini may force OpenAI’s hand
Plummer said the launch of Gemini places OpenAI and Microsoft in a precarious position and may force the duo to respond rapidly. Furthermore, Gemini showcases Google’s ability to respond to ongoing developments in the generative AI space.
While much of the focus over the last year has been on OpenAI, Google has been quietly innovating out of the limelight. The question now is whether users will see tangible business value in the model.
“Gemini now must be responded to by OpenAI and Microsoft,” he said. “Google is seriously in this game even though they don’t have as many people claiming they lead it yet. Gemini is leapfrogging where others have been and the question will be – do customers see the value?”
“The question of whether there are diminishing returns on large model sizes has not yet been answered. Google has not released data on the quantity of parameters on which Gemini is trained. However, the use of general language models will continue to rise as their facility and usefulness grows.
“Gemini represents a new bar in that effort. Google must make a stronger connection to enterprise problems but, for now, no one should be taking Google’s AI efforts for granted anymore.”
How does Gemini compare to other models?
Google has claimed that Gemini Ultra is among the most sophisticated AI models ever built. To back this up, it has released results that show it edging out GPT-4 across a range of academic benchmarks for AI.
Gemini Ultra scored 90% in the Massive Multitask Language Understanding (MMLU) benchmark, a text reasoning test in which AI models must complete 14,000 multiple-choice questions that cover information outside the scope of training data. This puts it ahead of GPT-4’s score of 86.4% and the 78.3% achieved by Google’s other high-profile LLM, PaLM 2.
The creators of MMLU estimated that human experts would score 89.8%, using a combination of guesswork and statistics from high-percentile exam results. Google has therefore claimed that it has achieved a world first: an AI model that can outperform human experts in certain conditions.
Google has claimed that Gemini Ultra outperforms GPT-4 in 30 of 32 widely-used benchmarks for AI models. The results show that in most cases the competition is close. On MATH, a benchmark that tests models on difficult math problems such as geometry, Gemini Ultra achieved 53.2% to GPT-4’s 52.9%.
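To put those margins side by side, the short Python sketch below computes Gemini Ultra’s lead in percentage points from the scores quoted in this article. The variable and function names are illustrative assumptions, not part of any Google or OpenAI tooling, and the figures are the vendor-reported results cited above rather than independent measurements.

```python
# Scores (percent) as quoted in this article; all names here are illustrative.
SCORES = {
    "MMLU": {"Gemini Ultra": 90.0, "GPT-4": 86.4},
    "MATH": {"Gemini Ultra": 53.2, "GPT-4": 52.9},
}

# Human-expert MMLU estimate attributed to the benchmark's creators.
HUMAN_EXPERT_MMLU = 89.8


def margin(benchmark: str) -> float:
    """Return Gemini Ultra's lead over GPT-4 in percentage points."""
    row = SCORES[benchmark]
    return round(row["Gemini Ultra"] - row["GPT-4"], 1)


def lead_over_human_experts() -> float:
    """Return Gemini Ultra's MMLU lead over the human-expert estimate."""
    return round(SCORES["MMLU"]["Gemini Ultra"] - HUMAN_EXPERT_MMLU, 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: Gemini Ultra leads GPT-4 by {margin(name)} points")
    print(f"MMLU lead over human experts: {lead_over_human_experts()} points")
```

On these figures the lead is 3.6 points on MMLU and 0.3 points on MATH, while the margin over the human-expert estimate is 0.2 points.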
But Google has also stressed how far its model can exceed competitors when it comes to multimodal benchmarking such as Infographic VQA, which pits models against infographics and data visualizations to test their abilities to unpick information from images and derive reason from graphical layouts.
Gemini Ultra’s unique strength lies in its ability to process images at the same time as text and other inputs such as audio, without the need to run them through optical character recognition (OCR) models or run natural language processing (NLP) on transcripts produced through a separate speech-to-text model. This allows it to work more accurately and efficiently.
When Gemini Ultra is released in 2024, independent testers will be able to put its results to the test, and enterprises will be able to make a more informed decision on applying the model within their environments.
Until then, Gemini Pro is already powering Google Bard. This lighter iteration of Gemini is less powerful but in testing still largely outperformed GPT-3.5, which powers the free tier of ChatGPT, as well as Meta’s Llama 2. Gemini Pro scored 79.1% on MMLU, ahead of GPT-3.5’s 70% as well as the 78.5% achieved by Anthropic’s Claude 2.
As it stands, Google has fired a powerful shot at OpenAI with this new flagship model, setting a new bar for AI performance. With a meaningful rival to GPT-4 now announced, competition in the space will only intensify.
Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.
- Rory Bathgate, Features and Multimedia Editor