Google blows away competition with powerful new Gemini 3 model


Google has officially unveiled Gemini 3 Pro, its new state-of-the-art large language model (LLM), which posts record-breaking scores across almost every AI benchmark.

The new model is intended to improve every Google service that uses Gemini, including its dedicated app, coding tools, and AI in search.

Google stated that Gemini 3 Pro is much better at understanding the context and intent behind requests, and at providing useful answers without resorting to flattery.

The model debuted at number one on LMArena’s text, WebDev, and vision leaderboards, and Google has claimed Gemini 3 Pro is the “best model in the world for complex multimodal understanding”.

On multimodal reasoning benchmarks, Gemini 3 Pro consistently outperformed competitors such as GPT-5.1 and Claude Sonnet 4.5.

On MMMU-Pro, for example, the model scored 81%, versus 76% for GPT-5.1 and 68% for Claude Sonnet 4.5.

ARC-AGI-2 is a rigorous benchmark that tests the reasoning of AI models through a series of abstract visual puzzles. The puzzles are easy for humans but difficult for today’s LLMs, so the benchmark is considered a true challenge for frontier models.

Gemini 3 Pro scored 31.1%, far ahead of GPT-5.1’s 17.6% and Claude Sonnet 4.5’s 13.6%.

Gemini 3 is a game changer for Google shops

In other areas, the performance gap between Google’s model and the competition is even more stark.

Gemini 3 Pro set a new record of 23.4% on MathArena Apex, compared with just 1% for GPT-5.1 and 1.6% for Claude Sonnet 4.5. The benchmark tests LLMs on their ability to solve mathematical problems they weren’t trained on.

Although the model doesn’t beat the latest version of Claude on the most common coding benchmarks, such as SWE-bench Verified, it is a major improvement on Gemini 2.5 Pro.

Google has also emphasized that the model’s reasoning and visual understanding help it get more out of its coding capabilities, so that common developer tasks are resolved more quickly overall.

In a demo, Google showed how Gemini 3 Pro could turn an image of a chessboard into an interactive game, and a back-of-the-napkin sketch of a website into a working page.

Gemini 3 Deep Think, the multi-step reasoning variant of the new model, improves on Gemini 3 Pro’s benchmark scores at the cost of much slower responses.

On Humanity’s Last Exam, a demanding benchmark that tests LLMs with 2,500 difficult questions across a wide range of subject areas, Gemini 3 Deep Think scored 41%, compared with 37.5% for Gemini 3 Pro and 30.7% for GPT-5 Pro.

Gemini 3 Deep Think is still undergoing safety tests but will become available to Google AI Ultra subscribers within the next few weeks.

Gemini 3 Pro was trained using Google’s tensor processing units (TPUs), dedicated hardware for AI training and inference that Google cites as key to its efforts to develop AI sustainably.

Google Antigravity

Alongside the much-anticipated launch of Gemini 3 today, Google also revealed Google Antigravity.

The product is described as a new ‘agent-first’ IDE, designed to elevate developers into managers of AI agents.

Using Gemini 3’s agentic coding features, as well as its reasoning and tool use, Antigravity is intended to let developers assign agents complex, multi-step coding tasks and receive evidence that the work has been completed to a high standard.

In a demo, Google showed how a developer could use the tool to build a flight lookup web app that returns flight data for a flight number entered by the user.

The tool then creates an implementation plan, writes the code, and tests the app by opening it in the Chrome browser. Finally, it provides the user with screenshots of its tests, which the user can critique to have the UI of the finished app updated immediately.

These features are powered by a combination of Gemini 3, Gemini 2.5 Flash Image (better known as Nano Banana), and Gemini 2.5 Computer Use.

In another demo, Google showed how enterprise developers can delegate work to multiple ‘background agents’ within Antigravity, without having to stay in the same chat window while the tasks are completed.

Antigravity is now available in a free public preview. In addition to Google’s own models, it supports third-party models including Anthropic’s Claude Sonnet 4.5 and OpenAI’s GPT-OSS.

Enterprise customers using Vertex AI and Gemini Enterprise have access to Gemini 3 from today. The model is also available in the Gemini app, through the Gemini API in AI Studio, in Antigravity, and in Gemini CLI.

The model is priced at $2 per million input tokens and $12 per million output tokens for prompts under 200,000 tokens.

Prompts over this limit will cost $4 per million input tokens and $18 per million output tokens.
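To make the two pricing tiers concrete, here is a rough cost estimate for a single request. This is a hypothetical helper, not an official Google tool, and it rests on stated assumptions: the prompt’s length alone decides the tier, the chosen tier’s rates then apply to every input and output token in that request, and a prompt of exactly 200,000 tokens is billed at the lower rate.

```python
from dataclasses import dataclass

# Assumption: prompts of up to and including this many tokens use the lower tier.
TIER_THRESHOLD = 200_000


@dataclass(frozen=True)
class Rates:
    """Prices in US dollars per million tokens, as quoted in the article."""
    input_per_m: float
    output_per_m: float


STANDARD = Rates(input_per_m=2.0, output_per_m=12.0)      # shorter prompts
LONG_PROMPT = Rates(input_per_m=4.0, output_per_m=18.0)   # prompts over the limit


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one Gemini 3 Pro request (hypothetical helper).

    Assumes the prompt (input) length selects the tier, and that the selected
    tier's rates apply to all input and output tokens of the request.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    rates = STANDARD if input_tokens <= TIER_THRESHOLD else LONG_PROMPT
    total = input_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # 100k-token prompt, 5k-token answer: 0.1 * $2 + 0.005 * $12 = $0.26
    print(f"${estimate_cost(100_000, 5_000):.2f}")
    # 300k-token prompt, 5k-token answer: 0.3 * $4 + 0.005 * $18 = $1.29
    print(f"${estimate_cost(300_000, 5_000):.2f}")
```

Under these assumptions, tripling the prompt from 100,000 to 300,000 tokens raises the cost of the same-length answer roughly fivefold, because crossing the threshold moves every token in the request onto the higher rate.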


Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.

In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.