Cloud failures: Horror stories and how to avoid becoming one


Cloud Pro often talks about the many benefits of cloud computing but, let's be honest, problems do crop up from time to time.

Outages knock infrastructure offline, and the wrong cloud application can all but sink a company that has invested time and money in it.

Here are a few cloud horror stories to scare you. But, as in every good horror movie, there is always a survivor who lives to tell the tale...

The recalcitrant cloud provider

One of the big fears around the cloud is that you run the risk of losing your data if the relationship with your cloud services provider turns sour.

Ian Masters, regional director at Vision Solutions, recalls a case in which a service provider his firm worked with was approached for help by a prospective customer.

“The customer's approach was due to the fact that they wanted to stop using their current cloud provider and move over to the new one,” Masters said.

“However, their current supplier did not want to provide any assistance in this. This recalcitrant approach did not endear them to the customer, so they had to think more carefully about the migration.”

The incumbent service provider refused to offer any migration assistance or to help solve any problems. This included denying access to the datacentre where the customer’s machine images were hosted, and withholding any other support.

The new service provider could not simply rebuild everything from scratch, as that would have meant a long period of downtime and driven up costs.

This locked-room mystery meant the customer had to replicate its systems to the new cloud platform without any physical access to the servers or storage.

With time running out, the new service provider decided to use a software tool called Double-Take Move.

“This was installed in the customer’s virtual machine images to copy the VMs over to the DataStore 365 cloud platform in the background, but without having to get physical access,” Masters added.

The images were then switched over without any downtime or data loss, and DataStore 365 proved it could help customers in adverse circumstances. The episode also helped rebuild some trust in the cloud in general, turning a potential horror story into a happy ending.

“What could have been a black mark against cloud turned into a validation of this way of delivering services,” Masters said.

When the cloud provider thinks you haven’t paid

Roger Goodwin, managing director of Digital Trading, an Oxfordshire-based web development house, warns that if you have an account with a cloud provider, you should manage it carefully.

“We recently had an issue with a Microsoft Azure account, where our account got cancelled due to failed payments,” he said.

“The payments failed due to security checks by the bank. No notification of the issue was received from either the bank or Microsoft. 28 days after the failed payment, the account was suspended.”

The suspension took all the servers and sites offline. Microsoft responded promptly and dealt with the issue, but by then the suspension had already removed all the IP addresses and DNS entries assigned to those servers.

“Once the servers and sites came back online we then had to reset all the DNS records to point to the now new servers. In total this took about 24 hours during which time all sites were offline,” Goodwin added.

Disaster recovery? We’ve heard of it!

Never let it be said that a disaster recovery plan is a waste of time and money. That goes for customers and providers alike. Earlier this year, source code hosting provider Code Spaces experienced the ultimate horror when hackers gained access to its Amazon EC2 control panel.

The firm had already received an extortion threat from criminals demanding a “large fee” to stop a DDoS attack from sending the service offline.

That attack was just the beginning. As well as amassing a considerable botnet to take the website offline, the hackers managed to gain control of Code Spaces’ EC2 control panel.

Code Spaces eventually regained control of the panel, but to no avail. The attacker had already removed all EBS snapshots, S3 buckets and AMIs, as well as some EBS instances and several machine instances. Most of the firm’s data, machine configurations and offsite backups had been partially or completely deleted, leaving Code Spaces with no business to run and no service to provide to customers.

If there is a lesson to be learned, it is to make sure that access to administration dashboards is properly secured, and that backups are held with a different cloud provider, using different authentication credentials.

"Get your data out of our cloud quick, we’re going out of business"

Probably the biggest nightmare of them all is what happened to customers of cloud storage provider Nirvanix.

Last year, the firm shut down and gave customers just weeks to transfer their data to another provider. For firms with terabytes, if not petabytes, of data stored in the cloud, getting that data out of Nirvanix’s infrastructure and somewhere (anywhere) else was always going to be a monumental task given the short window for action.

Add to this the fact that very few providers have pipes big enough to transport so much data, and firms were looking at a near-impossible situation.

In the end, help came largely from Aorta Cloud, working with what was left of Nirvanix’s engineering staff to pull out data for every customer that got in touch. It did so using high-speed links in datacentres it shared with Nirvanix, providing a lifeline to customers with large volumes of data that would otherwise have been lost.

The lesson here is to always keep an emergency backup of your data that is fully under your control.

So there you have it. Nothing is perfect, including the cloud, but you can certainly enjoy the successes if you bear in mind what can go wrong and do your best to avoid it.

Rene Millman

Rene Millman is a freelance writer and broadcaster who covers cybersecurity, AI, IoT, and the cloud. He also works as a contributing analyst at GigaOm and has previously worked as an analyst for Gartner covering the infrastructure market. He has made numerous television appearances to give his views and expertise on technology trends and companies that affect and shape our lives. You can follow Rene Millman on Twitter.