Microsoft is doubling down on multilingual large language models – and Europe stands to benefit the most
The tech giant wants to ramp up development of LLMs for a range of European languages


Microsoft has announced plans to expand the development and adoption of multilingual LLMs as part of a new partnership drive across Europe.
Europe’s 24 official languages and 250 indigenous languages are currently underrepresented in the web content on which the industry's large language models (LLMs) were trained.
The result is that LLMs are currently unable to process languages such as Swedish or Romanian to the same standard as English. To bridge this language gap, Microsoft will make multilingual data from GitHub accessible to the European community in collaboration with Hugging Face.
On 1st September, Microsoft will also issue a call for applications for grants to build out content in 10 underrepresented European languages.
“We have learnt that, basically, one needs to record several hundred hours of people, speaking a particular language in order to support the multi-modal capability of AI,” explained Brad Smith, vice chair and president at Microsoft.
“So for example, to be able to handle text to speech and speech to text, and we can do that by employing people to go record more audio in more languages.”
Microsoft will also create new jobs at its innovation centers in Strasbourg, the Microsoft Open Innovation Center (MOIC) and the AI for Good Lab; partner with the ICube Laboratory at the University of Strasbourg, which is already working on this problem; and fund two post-doctoral researchers.
The work will involve digitizing existing content such as books, as well as creating audio content in these languages to improve multimodal training data. To back these efforts, the company said it will provide groups with Azure cloud credits, grants, and engineering support.
Microsoft has stressed that this data will be in the public domain and will be made freely available to European citizens.
“It's important to underscore that all of this work is designed to donate more data so that others can use it,” Smith told assembled media.
“Our goal is to make it available to the European public and to open source developers. And at the same time, if there are particular partners that submit a proposal that work within a certain approach in terms of terms and the like, we want to be open to honoring their terms,” he added.
“But across the board, I want to be clear Microsoft is not going to have a proprietary interest in any of this new content that is made available.”
English dominates AI training
Microsoft research has found that 46% of web content used to train large language models (LLMs) is English.
“When you crawl the whole open web, what you see is predominantly the number one language on the web is English,” explained Juan Lavista, adding that German, Spanish, and French come second but still make up less than 6% of the total.
This is massively disproportionate, with the 379.7 million native English speakers worldwide outnumbered by the 485.1 million native Spanish speakers, for example.
Lavista added that this has been a problem since the beginning of the internet, as it was established in English and didn’t universally support special characters such as those required in French until 2003.
In a presentation, he showed how Meta’s Llama 3.1 drops 10 points in performance benchmarks when used in Swedish compared to English. In Latvian or Estonian, the gap is even more stark, with the model 25 points down.
Microsoft and European governments have identified these limitations as a clear barrier to unlocking productivity through AI in the coming years.
The MOIC and AI for Good Lab will also publish an open blueprint for training LLMs and creating regional language datasets, targeting organizations such as the Basque Center for Language Technology, Barcelona Supercomputing Center, and University of Santiago de Compostela, which are working on Azure-based AI models in Basque, Catalan, and Galician.
In addition to supporting the new languages via improved datasets and hands-on support, Microsoft announced new collaborations with IE University School of Science & Technology in Madrid and the University of Strasbourg, to support other ongoing research projects.
Rory Bathgate is Features and Multimedia Editor at ITPro, overseeing all in-depth content and case studies. He can also be found co-hosting the ITPro Podcast with Jane McCallion, swapping a keyboard for a microphone to discuss the latest learnings with thought leaders from across the tech sector.
In his free time, Rory enjoys photography, video editing, and good science fiction. After graduating from the University of Kent with a BA in English and American Literature, Rory undertook an MA in Eighteenth-Century Studies at King’s College London. He joined ITPro in 2022 as a graduate, following four years in student journalism. You can contact Rory at rory.bathgate@futurenet.com or on LinkedIn.