Ending the age of data centre ‘five nines’

High 99.999% availability pledges play well with cloud and data centre customers, but some downtime is always necessary

Service level agreements (SLAs) and marketing material that emphasise extremely high 'five nines' availability – 99.999% uptime – can sometimes have the opposite of the intended effect.

In fact, almost always-on availability can nudge operators away from taking systems or infrastructure down for maintenance, according to Simon Brady, EMEA services channel business development head at Vertiv. 

"Everyone talks about 'five nines'; 99.999% availability," Brady says. "Ask any data centre when was the last time they had a three-minute failure; a one-second failure equals a whole day. ‘Five nines’ is a misnomer, a marketing gimmick."

Always-on data centres are unrealistic

The emphasis on data centre 'five nines' might have inadvertently contributed to the data centre cooling issues reported during the July 2022 heatwave, when outdoor temperatures peaked at 40.3°C in Coningsby, Lincolnshire. Rather than carving out the downtime needed to upgrade equipment or maintain systems to cope with extreme temperatures, operators paid the price later, when cooling systems failed.

While hot weather can correlate with failures, well-maintained equipment doesn't typically falter. Instead of demanding constant uptime, stated metrics and service levels should focus more on contingency plans in the case of failure, resilience strategies, or redundancies built in at the software level, for example, Brady says.

Some data centres, meanwhile, typically operate beyond their SLAs and contracts, putting their own organisation at risk – legally if not technically. The promise of 99.999% availability could therefore be de-emphasised in some cases, Brady suggests.

"No matter what you buy, whether it's a Bugatti Veyron or a Ford Focus, if it can break, it will break at some point," he says. "So you have to plan for that."

Sascha Giese, head geek at SolarWinds, agrees that 99.999% uptime often falls "somewhere between a marketing statement and a tick-box exercise for business requirements", with organisations sometimes seeing 'five nines' as standard even if they don't need it.

"'Five nines’ allows just over five minutes of downtime in a year. Even if nothing happens, that's not easy to achieve, as OS and security updates will take longer than that. This means data needs to be moved around between redundant systems," Giese points out. 

Smaller or regional companies might be "perfectly fine" with two nines, considering that maintenance windows can be scheduled outside business hours. Yet "we live in a world where humans have lost their patience". Either way, "unexpected details" should be avoided in the fine print, he cautions.
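
To put those figures in context, the downtime each availability target permits can be worked out directly. The following is a minimal Python sketch, not anything supplied by the interviewees; the function name and the use of an average 365.25-day year are assumptions made for illustration.

```python
# Downtime budget implied by an availability target.
# Assumption: an average year of 365.25 days (allowing for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target permits."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Hypothetical list of targets, from two nines to five nines.
TARGETS = [
    ("two nines", 99.0),
    ("three nines", 99.9),
    ("four nines", 99.99),
    ("five nines", 99.999),
]

for label, target in TARGETS:
    minutes = downtime_minutes_per_year(target)
    print(f"{label} ({target}%): {minutes:.2f} minutes per year")
```

On these assumptions, 'five nines' works out at roughly 5.26 minutes a year, while two nines permits around 87.7 hours, or a little over three and a half days.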

When disaster strikes, ‘five nines’ mean nothing 

Giese agrees that more tolerance for some downtime may be needed, suggesting that a staging site with "proper messaging" could be prepared, "ready for the inevitable" disruption. Otherwise, customers should perhaps simply be prepared to pay higher prices if they want that level of availability.

Neil Clark, director of cloud services at provider QuoStar, notes that typically the need for downtime should be – and is – outlined in the fine print. The onus is at least partly on customers, who need to make sure they know exactly what they're being sold in the first instance.

"Downtime from a maintenance perspective shouldn't ever be included in your 'five nines'," Clark says. "If they can't have an application that ever goes down, even for maintenance, you then build the solution right for that application. But, from our perspective, we're going to have maintenance, our platform will have maintenance, the provider will have maintenance."

That might suggest providers could make service levels and maintenance requirements clearer to customers, perhaps by explaining them in the main body of an agreement rather than in the fine print, and emphasise the importance of engaging with all the terms and conditions.
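
How maintenance is treated makes a material difference to the headline number. As a rough sketch only, the Python below compares availability for the same month calculated with planned maintenance excluded from the agreed service time and with it counted as downtime. The function, its parameters and the two-hour maintenance window are hypothetical; real SLAs define these terms in their own ways.

```python
def availability_percent(
    period_minutes: float,
    unplanned_minutes: float,
    planned_minutes: float = 0.0,
    exclude_planned: bool = True,
) -> float:
    """Availability over a period, as a percentage.

    If exclude_planned is True, planned maintenance is removed from the
    agreed service time and does not count as downtime. Otherwise it is
    counted as downtime like any other outage.
    """
    if exclude_planned:
        service_minutes = period_minutes - planned_minutes
        downtime = unplanned_minutes
    else:
        service_minutes = period_minutes
        downtime = unplanned_minutes + planned_minutes
    if service_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    return 100 * (service_minutes - downtime) / service_minutes


# Illustrative figures only: a 30-day month with two hours of planned
# maintenance and no unplanned outages.
MONTH_MINUTES = 30 * 24 * 60

excluded = availability_percent(MONTH_MINUTES, 0, 120, exclude_planned=True)
included = availability_percent(MONTH_MINUTES, 0, 120, exclude_planned=False)

print(f"Maintenance excluded: {excluded:.3f}%")
print(f"Maintenance counted as downtime: {included:.3f}%")
```

With two hours of planned maintenance in a 30-day month and no unplanned outages, the first calculation gives 100% and the second roughly 99.72%, which is below even three nines.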

"Obviously, honesty is the best policy," says Clark. "And looked at another way, 'five nines' means absolutely nothing if there is a natural disaster – like a flood. I mean, how much diesel have you got to run your data centre if something like that happens?"

Reforming the ‘five nines’ mindset

Nick Archer, senior consultant at datacentre authority Uptime Institute, says 'five nines' can still be useful as a metric, partly depending on whether customers are working with colocation or cloud, and what services are being hosted or supported.

He warns that if critical business functions are in a third-party facility or the cloud, even five minutes of downtime a year may no longer be considered acceptable. If purchasing a single supply to a rack, however, with single-corded devices, it's probably "the best you're going to get".

If there's "a concurrently maintainable or fault-tolerant infrastructure", downtime can, of course, happen for maintenance. 

"I think 'five nines' remains a useful metric but it is probably a little bit outdated," Archer says. "And Sod's Law states that that five minutes [downtime] is going to be happening in whatever is critical to your business, whether that be key trading hours or key business hours in the middle of a key application."

Customers aren't necessarily realistic when estimating their own tolerance levels. Ask different stakeholders within an organisation and they'll all have different views. Providers should first ensure stakeholders understand and agree on the potential impacts of downtime, then work on reducing risk, says Archer.

What's key is understanding the importance of the application or service that's being supported and placed elsewhere, relative to the organisation's tolerance for potential risks, outages or downtime at particular times, he says.
