‘There is no law of computer science that says that AI must remain expensive and must remain large’: IBM CEO Arvind Krishna bangs the drum for smaller AI models
Lightweight, domain-specific models will be the go-to option for enterprises moving forward


IBM CEO Arvind Krishna believes smaller, domain-specific AI models represent a prime opportunity for enterprises to truly capitalize on the technology.
Speaking at IBM’s 2025 Think conference in Boston, Krishna reflected on the course of the generative AI boom so far, noting that the industry has matured significantly compared to last year - as have enterprises and their expectations of the technology.
“As I think about last year to this year, there’s one really big difference that I’m feeling, which is that AI has moved from experimentation to a focus on unlocking business value,” he told attendees.
“People are worrying about what is the use-case? How do I get my business to scale leveraging AI? And I think that's a real big difference, because that means, if I sort of think about hype cycles, that means the hype cycle’s kind of fading, and we are now thinking about adoption, we're thinking about ROI, we're thinking about business value.”
The initial focus on large, cumbersome AI models has waned, Krishna claimed, and is instead shifting towards more finely curated options aimed at tackling specific business challenges.
A key factor is that businesses adopting AI solutions are dead set on supercharging productivity. Larger models have helped here, but they often amount to a one-size-fits-all approach.
In contrast, smaller models are now proving vital by enabling enterprises to harness internal data in a more efficient and strategic manner, Krishna noted.
“I think AI is the source of productivity for this era,” he said. “But not all AI is built the same and not all AI is built for the enterprise.
“Why do I make that claim? 99% of all enterprise data has been untouched by AI. So if you need to unlock the value from that 99%, you need to take an approach to AI that is tailored for the enterprise.
“If you think about the massive general purpose models, those are very useful, but they are not going to help you unlock the value from all of the data inside the enterprise.”
Smaller models have a key advantage in that they inherently allow enterprises to derive greater value from enterprise data on a case-by-case basis. Ultimately, this not only helps businesses drive productivity, but also allows them to balance costs and target specific areas where the technology has maximum impact.
“To win, you are going to need to build special purpose models, much smaller, that are tailored for a particular use-case that can ingest the enterprise data and then work,” he explained.
“So you'll think about it, and then you’ll say, ‘well what about accuracy?’ - go look at leaderboards. Smaller models are now more accurate than larger models. What about intellectual property? That is where the open nature of some of the smaller models comes in.”
When assessing specific domains in which to deploy AI solutions, Krishna added that it’s “a lot easier to build a smaller model that can go after that”.
“So as we begin to go forward in that, you can think about these models that are three, eight, 13, 20 billion parameters, as opposed to 300 or 500 and then you begin to think about, well what advantage do I get? They're incredibly accurate. They are much, much faster,” he said.
“They're much more cost effective to run, and you can choose to run them where you want.”
IBM can find its niche with small AI models
It’s in this approach that Krishna believes IBM has a major advantage. The company’s watsonx platform, for example, allows enterprises to build applications designed to tackle specific challenges and areas of the business.
Similarly, Krishna pointed to IBM’s Granite model range, which he said is “a lot cheaper than some of the other alternatives”. These products have become core focus areas for IBM, and it intends to continue investing heavily in both moving forward.
As the industry matures and progresses, Krishna believes this focus on smaller model ranges will have a profound impact, opening up opportunities for enterprises in a trickle-down-style effect.
“As technology comes down the cost curve, it opens up many, many more opportunities,” he told attendees.
“It opens up a huge amount of aperture in terms of the problems you can afford to solve. That’s kind of been the curve it is on. We sometimes capture it in the phrase of, well, that's how we democratize technology, or rather, make it accessible to all, because the cost has come down.
“That is what we are very, very focused on. There is no law of computer science that says that AI must remain expensive and must remain large.”
Small language models are all the rage

Krishna isn’t the first to suggest that small language models will make all the difference for enterprises in the coming year. As organizations weigh the largest models against the lowest-cost models that meet their needs, looking to speed up returns on their AI investments, developers are pivoting to lighter models that perform well at the edge.
The likes of Google’s Gemma 3 and Meta’s Llama 4 Scout can be run on a single Nvidia H100 GPU, while in the public cloud OpenAI’s GPT-4.1 nano is its most cost-competitive model for text-only tasks.
By leaning into this burgeoning field with its lightweight Granite model family, IBM could finally carve out a meaningful place for itself in the AI market.
Ensuring that each of its SLMs has a clear, competitive, domain-specific use case will be the key here. While there’s little chance of taking on the AI leaders at their own game – namely general purpose models – there’s clear demand for models that simply do one kind of task very well.
Mixture of experts (MoE) models complicate the picture of how ‘small’ SLMs actually are. In this architecture, an AI model is made up of several smaller sub-models, each an ‘expert’ at specific tasks, which are only activated when needed.
The MoE approach allows models which, on paper, have many billions more parameters than the smallest models available on the market to run at a similar latency and cost per token. Take Llama 4 Scout, for example. Meta’s smallest frontier model has a total of 109 billion parameters, but only 17 billion active parameters at any given time, giving it a more lightweight computational footprint.
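The total-versus-active distinction can be sketched with a toy routing layer. This is a minimal illustration of top-k expert gating, not Meta's implementation: the 16 experts and top-1 routing loosely mirror Llama 4 Scout's reported setup, but the dimensions are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64          # toy hidden dimension
N_EXPERTS = 16  # total experts in the layer
TOP_K = 1       # experts activated per token

# Each expert is a simple feed-forward weight matrix; the router scores experts per token.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x):
    """Route a token vector to its top-k experts; only those weights are used."""
    scores = x @ router                        # one score per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the chosen experts
    w = np.exp(scores[top])
    w /= w.sum()                               # softmax over the chosen experts only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

total_params = sum(e.size for e in experts)    # parameters stored in memory
active_params = TOP_K * D * D                  # parameters used per token
print(f"total expert params:  {total_params}")
print(f"active expert params: {active_params}")
```

The layer stores all 16 experts' weights but multiplies each token through only one of them, which is why a model can carry a large total parameter count while its per-token compute resembles a much smaller model's.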
As these grow in popularity, SLM developers will have to demonstrate concrete benefits to sticking with their approach or adapt to offer hybrid models. The jury is still out on which approach delivers the best quality or lowest latency for AI output, so there is all to play for here.
For example, OpenAI boasts that GPT-4.1 nano takes just five seconds to generate its first output tokens when given an input of up to 128,000 tokens. If IBM could demonstrate a comparable or faster response time through its open source Granite models, enterprises would be able to rely more heavily on AI at the edge.
The more domain-specific IBM gets, the more potential for wins in very niche markets.

Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.
He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.
For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.