Microsoft is building a new AI model to rival some of the biggest in the industry

Logo and branding of Microsoft, developer of the MAI-1 AI model, pictured at night in New York City.
(Image credit: Getty Images)

Microsoft has confirmed it is working on MAI-1, a new large language model (LLM) that could be big enough to rival the largest models currently available, including Google Gemini and GPT-4.

According to a report by The Information, the new 500-billion-parameter model is being overseen by Microsoft AI CEO Mustafa Suleyman.

Suleyman, who was only recently hired by Microsoft to lead its consumer AI division, co-founded AI company Inflection and was one of the founders of UK AI pioneer DeepMind.

The move marks a step change for Microsoft, which until now has relied largely on models developed by OpenAI to fuel its charge in the generative AI race against major competitors such as Google and AWS.

How does MAI-1 compare to its rivals? 

If MAI-1 is being built with 500 billion parameters, that would make it one of the largest models currently known. 

For example, OpenAI’s GPT-4 is thought to have around 1 trillion parameters, while Grok, from Elon Musk’s xAI, has 314 billion parameters.

Other big AI players such as Google and Anthropic have kept the number of parameters in their LLMs under wraps.

It’s not entirely clear why Microsoft would need to build another LLM, given it has already made a major $10 billion investment in OpenAI, whose GPT models have dominated the generative AI landscape up to now.

Why is Microsoft building MAI-1?

Microsoft CTO Kevin Scott sought to downplay the story in a post on LinkedIn, while also apparently confirming that the company does have a model called MAI.

“Just to summarize the obvious: we build big supercomputers to train AI models; our partner OpenAI uses these supercomputers to train frontier-defining models; and then we both make these models available in products and services so that lots of people can benefit from them. We rather like this arrangement,” he said.

Scott said that each supercomputer built for OpenAI is a lot bigger than the one that preceded it, and each frontier model they train is a lot more powerful than its predecessors.

“We will continue to be on this path, building increasingly powerful supercomputers for OpenAI to train the models that will set the pace for the whole field, well into the future. There's no end in sight to the increasing impact that our work together will have,” he said.

Scott said that Microsoft has also built its own AI models “for years and years and years,” adding that AI models are used in almost every one of the company's products, services, and operating processes.

“The teams making and operating things on occasion need to do their own custom work, whether that's training a model from scratch, or fine tuning a model that someone else has built. There will be more of this in the future too. Some of these models have names like Turing, and MAI. Some, like Phi for instance, we even open source,” he said.

Are LLMs the only game in town?

Not all AI models have to have a gigantic parameter count. Microsoft recently introduced Phi-3, a small language model that it says can outperform models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks, and could be a more practical choice for customers looking to build generative AI applications.

Building an LLM with a vast number of parameters is only part of the story; AI companies are also racing to secure the best sources of data to train their generative AI tools.

Just this week OpenAI announced a deal with Stack Overflow to use the millions of questions and answers posted by developers on the knowledge site to enhance the responses from ChatGPT.

Steve Ranger

Steve Ranger is an award-winning reporter and editor who writes about technology and business. Previously he was the editorial director at ZDNET and the editor of silicon.com.