Fastly blames software bug for major outage

The CDN service says the bug was triggered by a valid customer configuration change

The global outage of cloud computing services provider Fastly was caused by an undiscovered software bug, the company has admitted.

The bug, which was introduced to the company's system last month, surfaced on Tuesday "when it was triggered by a valid customer configuration change", according to Fastly's Engineering and Infrastructure SVP Nick Rockwell.

The hour-long "global CDN disruption", which took place on 8 June, affected the online services of numerous governmental portals, news outlets, and IT and code-hosting websites such as GitHub and Stack Overflow. Amazon and the UK government's online portal were also among the many other websites taken offline.

In a statement on the company's website, Rockwell said that "a customer pushed a valid configuration change that included the specific circumstances that triggered the bug", causing 85% of Fastly's network to "return errors".

"We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal. This outage was broad and severe, and we're truly sorry for the impact to our customers and everyone who relies on them," he added.

Rockwell added that the company has taken measures to prevent future issues, including deploying the bug fix across Fastly's network and conducting a complete post-mortem of the processes and practices followed during the incident.

The company will also determine why the bug hadn't been detected during quality assurance and testing processes and will evaluate ways to improve its remediation time.

However, Fastly didn't provide further details on the nature of the bug. IT Pro has reached out to the company and will update this article when more information becomes available.

Commenting on the events, Angelique Medina, director of Product Marketing at network intelligence company Cisco ThousandEyes, said that the outage is an example of how interconnected the web is.

"Most of the sites we visit, the apps we use and the media we consume are delivered by Content Delivery Network (CDN) providers. By caching web content close to users for maximum performance and availability, they're critical to how we access digital services," she told IT Pro.

"Together with major public cloud providers who host or provide services to the most-used sites and applications online, the delivery mechanism that is the Internet is largely powered by a few providers and whenever something goes wrong with one of them, it can have a massive impact on web users globally."
