How SMBs can prepare for the cloud going down

There is nothing more frustrating than being let down by things outside your control. But in business, as in life, being able to manage things you can’t control is essential. 

The goal of 99.9% uptime is far from a given: cloud providers are not infallible and outages occur. Common causes include unforeseen power availability issues, software flaws, and ransomware attacks.
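
To put that figure in perspective, an uptime percentage translates directly into a yearly downtime budget. The short Python sketch below does the arithmetic, assuming a 365-day year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year that an uptime target permits."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows about {hours:.2f} hours of downtime a year")
```

Even a provider that meets a 99.9% target can therefore be unavailable for close to nine hours over the course of a year.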

Any business must acknowledge the reality of cloud outages. These might be just a few hours in length, or they could last for a few days. No business wants to simply “get through” an outage no matter its duration. Instead, leaders aim to minimize damage and downtime as far as possible and to spare their clients and customers any inconvenience as a result.

For that to happen, especially among small and medium-sized businesses (SMBs) with limited resources, planning and preparation are key.

Plan for cloud outages

The first step towards putting mitigations in place is understanding the problem. In terms of cloud, this means knowing the answer to key questions concerning your on-premises and cloud infrastructure.

Thomas King, CTO at internet exchange operator DE-CIX, tells ITPro that businesses need to account for all eventualities.

“What outages could occur? How critical is the data or the workload involved? How negative will the impact of each of these types of outages be? What countermeasures should be taken?” asks King.

It is important to categorize potential outages by the systems they will affect, from those internal and customer-facing to regional and international systems that may span an organization’s entire environment. With the number of remote-working employees today, the potential for communication outages between teams or entire offices should also be considered. Mapping all of this might be complex, but without the map, you can’t come up with meaningful mitigations.
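
One way to start that map is a simple risk register that records, for each outage scenario, the answers to King's questions. The Python sketch below is illustrative only: the scenarios, the 1 to 5 scoring scale, and the field names are assumptions, not something King or DE-CIX prescribes.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    name: str            # what outage could occur
    scope: str           # e.g. "internal", "customer-facing", "regional"
    criticality: int     # 1 (minor) to 5 (business-critical); assumed scale
    impact: int          # 1 (low) to 5 (severe); assumed scale
    countermeasure: str  # planned mitigation

    @property
    def priority(self) -> int:
        # Assumed scoring rule: criticality multiplied by impact
        return self.criticality * self.impact


# Hypothetical entries for illustration
register = [
    OutageScenario("Team chat unavailable", "internal", 2, 2,
                   "Fall back to phone tree"),
    OutageScenario("Webshop front-end down", "customer-facing", 4, 4,
                   "Start standby on second provider"),
    OutageScenario("ERP database unreachable", "regional", 5, 5,
                   "Active-active across providers"),
]

# Highest-priority scenarios first
ranked = sorted(register, key=lambda s: s.priority, reverse=True)
for s in ranked:
    print(f"{s.priority:>2}  {s.name} ({s.scope}): {s.countermeasure}")
```

A spreadsheet would serve just as well; the point is that every scenario carries a scope, a severity, and a named countermeasure.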

It is also important to understand which parts of your work are business-critical, as these will need to be kept functional as a priority. Can you put in place on-premises workarounds, or do you need to have expensive but potentially business-saving redundancy arrangements that will allow a secondary system to kick in if your cloud fails?

“The most critical resources, such as enterprise resource planning (ERP) and cloud databases, should be deployed in an active-active configuration across different cloud service providers to ensure the greatest possible availability,” says King.

“Further resources such as the webshop front-end can be set up in an active-passive deployment so that they can be spawned on a separate cloud service provider when needed.”
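
The difference between the two configurations King describes can be sketched in a few lines of Python. This is a simplified illustration, not a production routing layer: the Provider class, the provider names, and both routing functions are hypothetical, and the standby is treated as already available rather than spawned on demand.

```python
class Provider:
    """Stand-in for a deployment on one cloud provider (hypothetical)."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy


def route_active_active(providers, request_id: int) -> Provider:
    """Active-active: spread requests across every healthy provider."""
    healthy = [p for p in providers if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy provider available")
    return healthy[request_id % len(healthy)]


def route_active_passive(primary: Provider, standby: Provider) -> Provider:
    """Active-passive: use the primary; fall back to the standby only on failure."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy provider available")


a, b = Provider("provider-a"), Provider("provider-b")
print([route_active_active([a, b], i).name for i in range(4)])
print(route_active_passive(a, b).name)

a.healthy = False  # simulate an outage at provider A
print([route_active_active([a, b], i).name for i in range(4)])
print(route_active_passive(a, b).name)
```

In the active-active case both providers carry traffic all the time, so losing one only reduces capacity. In the active-passive case the second provider is idle until the first fails, which is cheaper but means the switchover itself has to be tested.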

Taking services offline

For some activities, it will be sufficient to put in place manual processes rather than maintain a separate online system. Business continuity planning can go into granular detail over which processes could go offline and provide clarity over which of these are mission-critical. This information is imperative if you get into an ‘all hands to the pump’ situation. 

Jaco Vermeulen, CTO at BML Group, reminded ITPro that when offline processes are put in place, it is vitally important to “include recovery operations to ensure those offline/manual functions are efficiently and accurately updated in cloud systems.” These cloud updates will need to be scheduled promptly. Once the cloud is back running at full efficiency, it will not help the business if people can’t find recent records because they are still in the manual system.
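
As a rough illustration of such a recovery operation, the Python sketch below replays entries captured manually during an outage into a stand-in for the cloud system. Everything here is hypothetical: the record IDs, fields, and timestamps are made up, and the "newest entry wins" rule is an assumption rather than anything Vermeulen recommends.

```python
from datetime import datetime

# Stand-in for the cloud system's records, keyed by record ID (hypothetical data)
cloud_records = {
    "ORD-1001": {"status": "paid", "updated": datetime(2024, 1, 10, 9, 0)},
}

# Entries captured on paper or in a spreadsheet during the outage (hypothetical data)
manual_log = [
    {"id": "ORD-1002", "status": "paid", "updated": datetime(2024, 1, 10, 15, 0)},
    {"id": "ORD-1001", "status": "shipped", "updated": datetime(2024, 1, 10, 14, 30)},
]


def reconcile(cloud: dict, manual: list) -> list:
    """Apply manual entries to the cloud records, oldest first.

    Returns the IDs that were created or updated. A manual entry only
    overwrites a cloud record if it is newer than what the cloud holds.
    """
    changed = []
    for entry in sorted(manual, key=lambda e: e["updated"]):
        existing = cloud.get(entry["id"])
        if existing is None or entry["updated"] > existing["updated"]:
            cloud[entry["id"]] = {
                "status": entry["status"],
                "updated": entry["updated"],
            }
            changed.append(entry["id"])
    return changed


changed_ids = reconcile(cloud_records, manual_log)
print("Synced after outage:", changed_ids)
```

Because entries that are not newer than the cloud copy are skipped, running the reconciliation a second time changes nothing, which makes it safe to repeat if the first attempt is interrupted.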

Bring your workforce with you

It is one thing to know which key processes will need to change during a cloud outage and to prioritize these in order of importance as part of a contingency plan. It is quite another thing to know who will be doing the hands-on work to deliver the processes and for those people to feel competent and confident. 

If you will need to redeploy people into entirely new contingency roles, make sure that these people are aware of their responsibilities and that they have the required skills. If they don't, upskilling will be necessary. The last thing you want is to be running around in the first few hours of a cloud outage getting people in the right place and checking that they can do the tasks asked of them.

Manual workflows and record-keeping can make a solid backup for the cloud in the event of an outage, with hard copies still useful despite moves by many to a paperless office. If you are going to use paper systems to replace some cloud ones, it’s important to have the right forms and other materials to hand so there’s no mad rush for the printer when an outage occurs.

Work out what might be needed for a couple of hours of outage, and be sure teams can self-serve to replenish stocks. When using people to replace cloud processes, the key is to plan, prepare, and run drills. Getting all this in place as part of a business continuity planning exercise might feel cumbersome and time-consuming. Some might see it as diverting energy away from day-to-day tasks. But without a worst-case scenario plan, leaders are simply biding their time until their ordinary processes are upended.

A final step is to remember that both technical and paper-based contingencies are easily undermined by changes that happen as part of everyday business evolution. Digital transformation and the replacement of legacy technology with new systems are beneficial changes that can nonetheless introduce new variables to a long-standing cloud outage contingency plan.

“All executive management and leaders are responsible for business continuity and plans need to be reviewed, whenever there is an operational change or change in systems/IT landscape,” says Vermeulen.

“Any such change can become a punch in the face to plans if not properly checked.”

Sandra Vogel
Freelance journalist

Sandra Vogel is a freelance journalist with decades of experience in long-form and explainer content, research papers, case studies, white papers, blogs, books, and hardware reviews. She has contributed to ZDNet, national newspapers, and many of the best-known technology websites.

At ITPro, Sandra has contributed articles on artificial intelligence (AI), measures that can be taken to cope with inflation, the telecoms industry, risk management, and C-suite strategies. In the past, Sandra also contributed handset reviews for ITPro and has written for the brand for more than 13 years in total.