Microsoft Azure crash blamed on operational error

Storm clouds

Microsoft has blamed operational error for Azure’s widespread outage earlier this week.

The issue with Redmond’s cloud service pulled customers’ sites offline and left Office 365 users without access to apps and data on Wednesday.

Microsoft apologised for the 11-hour crash yesterday, saying a configuration change made to the Azure Storage Front End triggered a bug in Blob (Binary Large Object – a large file) front-ends.

This bug caused the blob front-ends to run in an infinite loop, preventing them from taking traffic.

The configuration change was part of a wider update to Azure Storage, and Microsoft decided to roll it out to the entire production service.

The Azure team’s CVP, Jason Zander, said in a blog post yesterday: “Unfortunately the issue was widespread, since the update was made across most regions in a short period of time due to operational error, instead of following the standard protocol of applying production changes in incremental batches.

“Once the issue was detected, the configuration change was reverted promptly. However, the Blob Front-Ends had entered into an infinite loop triggered by the update, and couldn’t refresh the configuration without a restart. This caused the recovery to take longer.”
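The "incremental batches" protocol Zander refers to can be pictured with a short sketch. This is an illustration only, not Microsoft's deployment tooling: the function `staged_rollout`, its callbacks `apply_change`, `is_healthy` and `revert_change`, and the batch size are all hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    regions: Iterable[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    revert_change: Callable[[str], None],
    batch_size: int = 2,
) -> List[str]:
    """Apply a change one batch of regions at a time.

    Stops at the first batch containing an unhealthy region and reverts
    every region touched so far. Returns the regions left running the
    change (an empty list if the rollout was rolled back).
    """
    pending = list(regions)
    updated: List[str] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for region in batch:
            apply_change(region)
            updated.append(region)
        # Health gate: later batches are never touched if this one fails.
        if not all(is_healthy(region) for region in batch):
            for region in reversed(updated):
                revert_change(region)
            return []
    return updated
```

With a gate like this, a faulty change would reach only the batches applied before the failure was noticed, not most regions at once. As the outage showed, the revert step may need to do more than restore the old configuration: here the front-ends also had to be restarted.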

The outage affected a total of 20 applications, including Office 365 online apps, virtual machines, backup and machine learning tools, as well as Microsoft’s cloud-based Hadoop distribution.

It follows multiple Azure outages that occurred in August, when users worldwide were denied access to services such as Visual Studio and the Azure management portal.

Last October, a fault with Microsoft’s cloud service prevented users from uploading files over FTP.