Is latency always important?
Time pressures on data processing in the age of AI can be severe – but not all workloads need to be run in real time


Autonomous vehicle navigation, debit transactions, and automated responses to security footage are all examples of where latency – the delay between an instruction to transfer data and the data actually being transferred – can wreak havoc.
Companies like WEKA have called latency “the performance killer in AI workloads”, and Amazon Web Services (AWS) has identified low latency as being just as important as the input data for generative AI models. So, when is low latency vital and when is it merely a nice-to-have?
To begin with, it’s not as simple as an either/or. Elliot Marx, co-founder of data platform Chalk, says that there are two types of latency on which IT leaders should be focused.
“One is just how fast the servers can give you back an answer so that you can make a decision,” he explains. “That’s really important in a credit card authorization loop, like when you swipe your card at Costco and you want to say yes or no to a purchase, that needs to be a really low amount of time.”
However, Marx sees a second form of data latency that can also have a significant impact.
“How old is your data? So you could serve data that you found out about a long time ago really fast, and you might call that low latency, but it still could be high latency in terms of, well, this is really stale data.”
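Marx’s two dimensions can be sketched in a few lines of code. The example below is purely illustrative rather than any vendor’s actual API – the feature store, key name, and timings are invented – but it shows how a lookup can return almost instantly while the value it returns is already an hour old:

```python
import time

# Hypothetical pre-computed feature store: each entry holds a value and the
# time at which it was computed. Names and numbers are invented for illustration.
feature_store = {
    "user_42_fraud_score": {"value": 0.12, "computed_at": time.time() - 3600},  # computed an hour ago
}

def lookup(key: str):
    start = time.perf_counter()
    record = feature_store[key]
    serving_latency_ms = (time.perf_counter() - start) * 1000  # how fast the answer came back
    staleness_s = time.time() - record["computed_at"]          # how old the underlying data is
    return record["value"], serving_latency_ms, staleness_s

value, latency_ms, staleness = lookup("user_42_fraud_score")
print(f"value={value}, served in {latency_ms:.4f} ms, data is {staleness:.0f} s old")
```

In other words, the serving latency looks excellent even though the data itself is badly out of date.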
While conversations about AI often fixate on the latest and greatest, Marx says it was FinTech that had to figure out the latency conundrum first, ten to fifteen years ago, because of the ever-present threat of fraud. Now, he’s seeing that same urge for latency reduction spilling out onto social media and e-commerce platforms, places where the quicker they can learn your behaviour, the quicker the company can shape it. He gives TikTok as an example of a social media giant that has optimized latency in order to further shape their industry-leading algorithm.
However, low latency isn’t just important in the digital world. Its reduction is also key where physical security is involved, such as when a video feed is fed into an automated process so that safety threats – whether they’re security issues like thieves or health and safety problems like tripping hazards – can be identified and addressed. Val Cook, chief software architect at Blaize, says that as the industry demands fewer off-the-shelf solutions and more bespoke choices, the importance of low latency will only continue to grow.
“I would say 60 to 70 percent of the requests we get even for POCs [proof of concepts] with customers are just not simple cookbook solutions,” Cook tells ITPro. “They want to know if someone ducked down and they appear to have slid under a car, and they're cutting off the catalytic converter…So, we see that over and over and over again. Could you do this?”
On the customer or user side, Cook also thinks low latency solutions are in higher demand because it’s human nature to want a product – whether it’s a large language model (LLM) or a security system interface – to react quickly and match our pace of work.
“Just the way humans work, the response time is going to [need to] be much more sensitive to being able to provide visual feedback and visual input.”
This is a key consideration when it comes to siting data centers. For example, a company looking to process its data in regions with plentiful renewable energy and therefore green data centers, such as Iceland, will also have to contend with increased latency.
The exact amount will depend on where one’s data was originally stored and processed. In the case of Shearwater Geoservices, which embraced Icelandic data centers in 2024 for reduced operating costs, the firm saw its latency increase from 3-4 milliseconds in the UK to 20-30 milliseconds in Iceland. This was ultimately deemed acceptable for real-time operations.
Workers on the move, such as those in field service management, also have to factor the latency of 5G networks or satellite broadband into their operations. These connections generally compare unfavorably to full-fiber broadband, so their latency has to be weighed against what counts as ‘acceptable’ for the task at hand.
For example, in the UK the average latency over Wi-Fi is 19 milliseconds, per Uswitch data, compared to 33 milliseconds over 5G and 42 milliseconds over 4G. A recent Ookla report on Starlink satellite broadband found UK latency was 41 milliseconds on average.
How do organizations reduce latency?
Tech giants are aware of the need to reduce latency. OpenAI, for example, provides an entire guide to reducing latency in AI implementations, one which includes a paradoxical piece of advice: don’t automatically assume that an LLM is your best option.
The firm, along with all major AI model developers, offers a range of models designed to meet the latency needs of users. For example, its lightweight GPT-5 nano model is less sophisticated but far faster at producing outputs than GPT-5, while GPT-4o mini Realtime has such low latency that it can be used for real-time audio inputs.
Other models, such as Gemini 2.5 Flash, Mistral Small, and Nvidia’s Llama 3.1 Nemotron 70B also boast very low latency.
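As a rough sketch of how a team might weigh that trade-off, the snippet below times a single request against two models using the official OpenAI Python SDK. It assumes an API key is set in the environment, and the model identifiers are simply the ones referenced above – exact names and availability in the live API may differ:

```python
import time
from openai import OpenAI  # official OpenAI Python SDK; requires OPENAI_API_KEY in the environment

client = OpenAI()

def timed_completion(model: str, prompt: str) -> float:
    """Return wall-clock seconds for a single chat completion against the given model."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

prompt = "Summarise this transaction note in one sentence: card declined twice, then approved."
for model in ("gpt-5", "gpt-5-nano"):  # identifiers as referenced above; check the live API for exact names
    print(model, f"{timed_completion(model, prompt):.2f} s")
```

Timing a handful of representative prompts this way is often enough to show whether a smaller model meets both the quality bar and the latency budget for a given task.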
For Marx, his team has found that dropping down to lower-level systems languages has helped to drastically reduce latency.
“What we found is that you have to write really low level systems code. And so we end up writing a lot of Rust and a lot of C++,” he tells ITPro.
“Those aren't really the languages of data science and of machine learning engineering, though…What we do is we treat Python and SQL kind of like a domain-specific language, and we reinterpret it and run it as much as we can in lower level systems languages that are much, much, much faster, like orders of magnitude faster.”
Between rethinking the languages being used and avoiding reverse extract, transform, load (ETL) in favour of computing on the fly, Marx says his team is able to bring complex functions down from a ten-to-fifteen-second wait time to just 15 milliseconds.
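This isn’t a description of Chalk’s own compiler, but the principle of pushing work out of interpreted Python and into compiled code can be illustrated with a simple NumPy comparison – the same arithmetic, expressed in Python but executed by compiled routines, runs dramatically faster:

```python
import time
import numpy as np  # NumPy's array operations execute in compiled C rather than interpreted Python

values = list(range(1_000_000))

# Pure Python: the loop is interpreted statement by statement.
start = time.perf_counter()
py_total = sum(v * v for v in values)
py_time = time.perf_counter() - start

# The same arithmetic pushed down into compiled code.
arr = np.array(values, dtype=np.int64)
start = time.perf_counter()
np_total = int((arr * arr).sum())
np_time = time.perf_counter() - start

assert py_total == np_total
print(f"pure Python: {py_time * 1000:.1f} ms, compiled path: {np_time * 1000:.1f} ms")
```

The exact speedup will vary by machine and workload, but the gap between interpreted and compiled execution is the kind of gain Marx is describing when he talks about orders of magnitude.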
For Cook, the key – instead of using a traditional batch method – is to build a system where multiple tasks can happen dynamically, meaning an entire sequence doesn’t have to happen all at once. This asynchronous approach reduces the amount that latency compounds.
“When I have to declare my graph, run a detector, run a face recognizer, run an action recognizer, and so forth, when I have to declare those all in a pipeline and then run them and then see what you get at the end, you get two challenges,” he says.
“One is you increase your latency dramatically. And number two, to your point, you waste compute.”
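A minimal sketch of that idea – not Blaize’s software, just the general pattern – is shown below using Python’s asyncio: frames are processed concurrently rather than queued behind one another, and the heavier stages only run when an earlier stage says they’re needed, which addresses both the latency and the wasted compute Cook describes:

```python
import asyncio
import random

async def detect_person(frame: str) -> bool:
    await asyncio.sleep(0.05)  # stand-in for model inference time
    return random.random() > 0.5

async def recognise_action(frame: str) -> str:
    await asyncio.sleep(0.10)  # heavier stage, only run when needed
    return f"{frame}: crouching near vehicle"

async def process_frame(frame: str):
    # The heavier recognizer only runs if a person was actually detected,
    # so compute isn't wasted and latency doesn't compound across every stage.
    if not await detect_person(frame):
        return None
    return await recognise_action(frame)

async def main():
    frames = [f"frame_{i}" for i in range(8)]
    # Frames are handled concurrently rather than queued behind one another.
    results = await asyncio.gather(*(process_frame(f) for f in frames))
    print(results)

asyncio.run(main())
```

Whether the gains come from concurrency, from skipping unnecessary work, or from both depends on the workload, but the shape of the pipeline is what stops per-frame latency from stacking up.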
Still, there are examples where low latency isn’t as high a priority. Marx points to scenarios where business leaders need to see a dashboard that assesses the health of their company – data that’s unlikely to be needed up to the millisecond.
“They could still be complex pipelines but you might not care so much about them being really up to date. I think one of the trends we're seeing in the industry though, is that people want to have one source of truth of their data and not be implementing that same sort of complex logic in a bunch of different places, because inevitably you end up writing it a little bit differently in each spot.”
This is why he sees computing on the fly as the way forward, saying that reverse ETL “is going to die” as people prioritize creating systems that don’t require moving around and duplicating data as much.
In the name of efficiency, whether it’s building better pipelines and processes or writing in simpler languages, the industries using tools like generative AI will continue to prioritize low latency. It lessens compute cost, increases productivity and marketability, and creates opportunities for systems to run faster. In a world where the money is in the milliseconds, latency will stay front of mind as this portion of the industry evolves.

John Loeppky is a British-Canadian disabled freelance writer based in Regina, Saskatchewan. He has more than a decade of experience as a professional writer with a focus on societal and cultural impact, particularly when it comes to inclusion in its various forms.
In addition to his work for ITPro, he regularly works with outlets such as CBC, Healthline, VeryWell, Defector, and a host of others. He also serves as a member of the National Center on Disability and Journalism's advisory board. John's goal in life is to have an entertaining obituary to read.