Google blows away competition with powerful new Gemini 3 model
Gemini 3 is the hyperscaler’s most powerful model yet and state of the art on almost every AI benchmark going
Google has officially unveiled Gemini 3 Pro, its new state-of-the-art LLM with record-breaking scores across almost every AI benchmark.
The new model is intended to improve every Google service that uses Gemini, including its dedicated app, coding tools, and AI in search.
Google stated that Gemini 3 Pro is much better at handling requests in their intended context and providing useful answers that don’t resort to flattery.
The model debuted at number one on the LMArena leaderboards for text, WebDev, and vision, and Google claims Gemini 3 Pro is the “best model in the world for complex multimodal understanding”.
Across multimodal reasoning benchmarks, Gemini 3 Pro consistently outperformed rivals such as GPT-5.1 and Claude Sonnet 4.5.
In MMMU-Pro, for example, the model scored 81% versus GPT-5.1’s 76% and Claude Sonnet 4.5’s 68%.
ARC-AGI-2 is a rigorous benchmark that tests AI models' reasoning through a series of abstract visual puzzles. The puzzles are easy for humans to complete but difficult for today’s LLMs, so the benchmark is considered a true challenge for frontier models.
Gemini 3 Pro scored 31.1% in tests, far in excess of GPT-5.1’s 17.6% and Claude Sonnet 4.5’s 13.6%.
In other areas, the performance gap between Google’s model and the competition is even more stark.
Gemini 3 Pro set a new record of 23.4% on MathArena Apex, a benchmark that tests LLMs on their ability to solve mathematical problems they weren’t trained on. By comparison, GPT-5.1 scored just 1% and Claude Sonnet 4.5 scored 1.6%.
Although the model doesn’t beat the latest version of Claude at the most common coding benchmarks, it greatly improves upon Gemini 2.5 Pro.
Google has also emphasized how the model's reasoning and visual understanding help Gemini 3 Pro do more with its coding capabilities, resolving common developer tasks more quickly overall.
In a demo, Google showed how Gemini 3 Pro could turn an image of a chessboard into an interactive game, as well as a back-of-the-napkin sketch of a website into a functioning page.
Gemini 3 Deep Think, the multi-step reasoning variant of the new model, beats Gemini 3 Pro's benchmark scores at the cost of far slower responses.
In Humanity’s Last Exam, an intense benchmark designed to press LLMs with 2,500 difficult questions across a wide range of subject areas, Gemini 3 Deep Think scores 41%, compared to Gemini 3 Pro’s 37.5% and GPT-5 Pro’s 30.7%.
Gemini 3 Deep Think is still undergoing safety tests but will become available to Google AI Ultra subscribers within the next few weeks.
Gemini 3 Pro was trained using Google’s tensor processing units (TPUs), dedicated hardware for AI training and inference that Google cites as key to its efforts to develop AI sustainably.
Google Antigravity
Alongside the much-anticipated launch of Gemini 3 today, Google also revealed Google Antigravity.
The product is described as a new ‘agent-first’ IDE, with a focus on elevating developers into managers of AI agents.
Using Gemini 3’s agentic coding features, as well as its reasoning and tool use, Antigravity is intended to allow developers to set agents multi-step, complex coding tasks and receive evidence proving the job has been completed to a high standard.
In a demo, Google showed how a developer could use the tool to build a flight lookup web app, returning flight data based on a flight number provided by the user.
The tool is then capable of creating an implementation plan, writing the code, and then testing the app by opening the Chrome browser. Finally, it provides the user with screenshots of its tests, which the user can critique in order to immediately update the UI of the finalized app.
These features are powered by a combination of Gemini 3, Gemini 2.5 Flash Image (better known as Nano Banana), and Gemini 2.5 Computer Use.
In another demo, Google showed how enterprise developers can delegate to multiple ‘background agents’ within Antigravity, without needing to stay in the same chat window while tasks are being completed.
Antigravity is now available for free in public preview. In addition to Google's own models, it supports third-party models including Anthropic's Claude Sonnet 4.5 and OpenAI’s GPT-OSS.
Enterprises can access Gemini 3 from today through Vertex AI and Gemini Enterprise. The model is also available in the Gemini app, the Gemini API within AI Studio, Antigravity, and Gemini CLI.
The model will be priced at $2 per million input tokens and $12 per million output tokens, for prompts under 200,000 tokens.
Prompts over this limit will cost $4 per million input tokens and $18 per million output tokens.
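The two-tier pricing comes down to simple arithmetic. The sketch below shows how a per-request cost might be estimated from the quoted rates. `estimate_cost_usd` is a hypothetical helper, not part of any Google SDK. It assumes that prompt length alone selects the tier, that the selected tier applies to both input and output tokens, and that a prompt of exactly 200,000 tokens is billed at the lower rate. Check Google's official pricing page before relying on any of these assumptions.

```python
# Hypothetical helper: estimates the USD cost of one Gemini 3 Pro request
# from the per-million-token prices quoted in the article.
# Assumptions (not confirmed by the article): the tier is chosen by prompt
# (input) length alone, that tier's rates apply to both input and output
# tokens, and a prompt of exactly 200,000 tokens counts as the lower tier.

LONG_PROMPT_THRESHOLD = 200_000  # tokens

# (input $ per 1M tokens, output $ per 1M tokens)
STANDARD_RATES = (2.00, 12.00)
LONG_PROMPT_RATES = (4.00, 18.00)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens <= LONG_PROMPT_THRESHOLD:
        in_rate, out_rate = STANDARD_RATES
    else:
        in_rate, out_rate = LONG_PROMPT_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # 100k-token prompt, 5k-token answer: 0.1 * $2 + 0.005 * $12 = $0.26
    print(f"${estimate_cost_usd(100_000, 5_000):.2f}")
    # 300k-token prompt, 5k-token answer: 0.3 * $4 + 0.005 * $18 = $1.29
    print(f"${estimate_cost_usd(300_000, 5_000):.2f}")
```

Under these assumptions, crossing the 200,000-token threshold roughly doubles the input rate and raises the output rate by half. Long-context workloads can therefore cost noticeably more per request even when the answers are short.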

Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.