Could rising token costs boost interest in on-premises hardware?

“Tokenomics” has been one of the key talking points at Dell Technologies World 2026. Indeed, executives have been claiming that, when it comes to AI agent deployment, it could be more economical to invest in on-premises hardware than to use public cloud services.

To coin a phrase, there’s a strong element of “well they would say that, wouldn’t they?” Dell Technologies is a vendor of data center hardware and, since the launch of AI Factory with Nvidia two years ago, the company has been positioning itself as the answer to enterprises’ AI needs.

At the company’s annual conference, the message was clear: agents are brilliant, but they are token-hungry and high token usage means high costs. Jon Seigal, SVP of client solutions group and online marketing, said that Dell Technologies has direct experience of this, with one ‘super user’ developer running up a bill of $3,400 in just 24 hours thanks to high token usage by the AI agents they were running.

High-performance computers that can handle demanding AI workloads are, in the words of Dell Technologies COO Jeff Clarke, “a free, unmetered token generator”.

What is tokenomics?

If you search for “tokenomics” on the web, you’ll be presented with answers relating to cryptocurrency. While there is crossover between the worlds of crypto and generative AI, this isn’t an example of it.

In the world of generative AI and AI agents, tokenomics refers to the cost of using AI tokens versus the return they give.

While the cost of a token is predictable, how many tokens any given query or prompt will use is inherently unpredictable.

“Tokens are not created or used equally,” William Fellows, research director, cloud native at 451 Research from S&P Global, explains to ITPro. “Do the same thing a few times – even on the same model – and each prompt will consume a different amount of tokens.”

Varun Chhabra, VP of infrastructure (ISG) and telecom marketing at Dell Technologies, agrees. “It's very hard to predict, even now, when you ask an LLM something, how many tokens it’s going to use,” he tells ITPro.

Additionally, although efficiency is improving and the cost of generating each token is falling, overall spending on tokens is going up.

“It’s the Jevons Paradox,” says Chhabra. “The cost of creating tokens and consuming them is declining substantially, but the adoption within an organization is ramping up so fast that costs are going up for everybody.”

He argues that this is a sign of things working well. “It’s a sign that people are finding value with it,” he says. “They're using it, becoming more productive.”

With AI agents, this issue of token usage is compounded.

“As more employees use AI tools continuously, and as AI systems begin interacting with other AI systems, those costs can grow quickly and become difficult to predict in public cloud environments,” explains Don Gentile, data platforms and resiliency analyst at Hyperframe Research.

This was already an expensive prospect, with reports that superusers (and perhaps tokenmaxxers) are using up token allowances that were supposed to last a year within just one quarter. As reported by The Information, Uber’s CTO Praveen Neppalli Naga revealed the company had blown through its AI budget for 2026 by mid-April – an admission that came just one month after he posted on LinkedIn about how much code was being written by the company’s background coding agent.

While this is already an expensive problem, changes to how major LLM service providers bill their customers could make it worse still.

Per-token billing and the agentic apocalypse?

As reported by The State of Brand, on 14 May Anthropic announced that from 15 June it would be splitting its Claude subscription into two. If a human is interacting with Claude, they will stay on the existing flat-fee subscription. Agentic subscriptions, however, will be moved to a fixed monthly API credit.

The tiers are as follows:

Pro: $20 monthly credit
Max: $100 monthly credit
Max 20x: $200 monthly credit

According to a blog from 4sAPI, that $20 of credit equates to about six to seven million input tokens, or one million output tokens. When you consider that the fabled $3,400 superuser’s bill was the result of a team of 10 agents consuming one billion tokens in just one day, even at the Max 20x tier, the token allowance will disappear fairly quickly. Once the credit runs out, the standard API rate will be charged up to any limit set by the subscriber business. Unused credit will not be rolled over from one month to the next.
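As a rough check on how quickly that would happen, the sketch below derives a blended per-token price from the figures quoted above ($3,400 for one billion tokens in a day) and estimates how long each monthly credit would last at that rate of use. It assumes, purely for illustration, that the super user's workload would be billed at the same blended price under the new scheme and that spending is spread evenly across 24 hours; neither assumption comes from Dell or Anthropic, and the function names are made up for this example.

```python
# Figures quoted in the article.
BILL_USD = 3_400.0              # one-day bill for the 'super user'
TOKENS_PER_DAY = 1_000_000_000  # tokens consumed that day by the team of agents

# Monthly API credit per tier, in US dollars, as listed above.
TIER_CREDIT_USD = {"Pro": 20.0, "Max": 100.0, "Max 20x": 200.0}


def blended_price_per_million(bill_usd: float, tokens: int) -> float:
    """Average price in USD per million tokens implied by a bill."""
    return bill_usd / (tokens / 1_000_000)


def tokens_covered(credit_usd: float, price_per_million: float) -> float:
    """Number of tokens a credit buys at a given blended price."""
    return credit_usd / price_per_million * 1_000_000


def hours_until_exhausted(credit_usd: float, bill_usd_per_day: float) -> float:
    """Hours a credit lasts, ASSUMING spend is spread evenly over 24 hours."""
    return credit_usd / (bill_usd_per_day / 24)


if __name__ == "__main__":
    price = blended_price_per_million(BILL_USD, TOKENS_PER_DAY)
    print(f"Implied blended price: ${price:.2f} per million tokens")
    for tier, credit in TIER_CREDIT_USD.items():
        millions = tokens_covered(credit, price) / 1_000_000
        minutes = hours_until_exhausted(credit, BILL_USD) * 60
        print(f"{tier}: about {millions:.1f}M tokens, gone in about {minutes:.0f} minutes")
```

Under those assumptions, even the $200 Max 20x credit would be used up in under an hour and a half of that developer's day.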

The reason for this shift, in the words of Anthropic’s head of Claude Code Boris Cherny, is fairly straightforward business economics.

“Our subscriptions weren't built for the usage patterns of these third-party tools,” he said on X. Which is to say, tools like OpenClaw.

The reckoning for both public cloud AI providers and their customers is, it seems, approaching fast, or perhaps already here.

Determining the value of agentic AI

“I think the discipline of FinOps for cloud spend management has to be re-invented for token economics, especially inferencing which is where the use is,” says 451 Research’s Fellows. “I think the centre of gravity here will ultimately be around determining what value is returned from token consumption as a more useful measure than the dollar amount.”

“Companies such as Pay-I are trying to get at this. I don't think anyone has nailed it yet,” he says.

Nobody seems to be suggesting public cloud services have no value, though.

“Cloud services still offer speed and flexibility with lower upfront costs, which makes them attractive for experimentation and fast deployment,” Gentile says. “But for companies running large, always-on AI environments, owning the infrastructure can sometimes provide more stable and predictable long-term costs.”

For Dell’s part, Chhabra says: “There's going to be this balance of a hybrid world where, we believe, a lot of organizations will decide ‘I've reached that inflection point where it is going to be more advantageous for me to actually have my own token generator’.”

“Of course, some portion of things [will be] in the cloud or directly hitting the APIs,” he adds. “But trying to get a grapple on those economics and getting maximum utilization [and] driving down your token generation costs – per token costs – is actually going to really involve having your own infrastructure and being able to utilize that in an effective manner.”
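The “inflection point” Chhabra describes can be thought of as a simple break-even calculation: the upfront cost of the hardware divided by what it saves each month compared with paying per token. The sketch below is one way to frame it. Every figure in the usage example is a hypothetical placeholder, not a number from Dell or any analyst quoted here, and the model ignores real-world factors such as utilization, staffing, depreciation and changes in token prices.

```python
from typing import Optional


def break_even_months(
    hardware_cost_usd: float,
    on_prem_monthly_usd: float,
    cloud_monthly_usd: float,
) -> Optional[float]:
    """Months until on-premises hardware pays for itself.

    Returns None when running on-premises costs at least as much per month
    as the cloud bill, in which case the hardware never breaks even.
    """
    monthly_saving = cloud_monthly_usd - on_prem_monthly_usd
    if monthly_saving <= 0:
        return None
    return hardware_cost_usd / monthly_saving


if __name__ == "__main__":
    # HYPOTHETICAL inputs, chosen only to show the arithmetic.
    months = break_even_months(
        hardware_cost_usd=500_000,   # upfront purchase
        on_prem_monthly_usd=10_000,  # power, cooling, support
        cloud_monthly_usd=60_000,    # current per-token spend
    )
    print(f"Break-even after {months:.1f} months")
```

With these placeholder inputs the hardware would pay for itself in 10 months. A lower or more variable cloud bill pushes that point further out, which fits Gentile's point that cloud remains attractive for experimentation.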

Jane McCallion is Managing Editor of ITPro and ChannelPro, specializing in data centers, enterprise IT infrastructure, and cybersecurity. Before becoming Managing Editor, she held the role of Deputy Editor and, prior to that, Features Editor, managing a pool of freelance and internal writers while continuing to specialize in enterprise IT infrastructure and business strategy.

Prior to joining ITPro, Jane was a freelance business journalist writing as both Jane McCallion and Jane Bordenave for titles such as European CEO, World Finance, and Business Excellence Magazine.