Cloudflare outage explained: What happened, who was impacted, and how was it resolved?
The six-hour outage affected customers using Cloudflare's Bring Your Own IP (BYOIP) services
Cloudflare experienced a major outage on Friday that led to issues for a number of websites using its services.
According to Cloudflare, the issue affected customers using its Bring Your Own IP (BYOIP) service, who saw their routes to the internet withdrawn via Border Gateway Protocol (BGP).
"For some BYOIP customers, this resulted in their services and applications being unreachable from the Internet, causing timeouts and failures to connect across their Cloudflare deployments that used BYOIP," the company said in a blog post.
The impacted BYOIP customers first experienced a behavior called “BGP Path Hunting”, in which end-user connections traverse networks trying to find a route to the destination IP, continuing until the connection times out and fails.
Meanwhile, visitors to the website for Cloudflare’s recursive DNS resolver (one.one.one.one) were met with HTTP 403 errors and an “Edge IP Restricted” error message.
DNS resolution over the 1.1.1.1 Public Resolver, including DNS over HTTPS, was not affected.
In total, the incident lasted six hours and seven minutes.
Cloudflare outage caused by configuration changes
Cloudflare confirmed the issue was caused by a change the company had made to the way its network manages IP addresses onboarded through the BYOIP pipeline. This change caused the network to unintentionally withdraw customer prefixes.
"The specific piece of configuration that broke was a modification attempting to automate the customer action of removing prefixes from Cloudflare’s BYOIP service, a regular customer request that is done manually today. Removing this manual process was part of our Code Orange: Fail Small work to push all changes toward safe, automated, health-mediated deployment," the firm explained.
"Since the list of related objects of BYOIP prefixes can be large, this was implemented as part of a regularly running sub-task that checks for BYOIP prefixes that should be removed, and then removes them. Unfortunately, this regular cleanup sub-task queried the API with a bug."
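Cloudflare's statement does not say what the bug in the query was. As a purely hypothetical illustration of how a sub-task that "checks for BYOIP prefixes that should be removed, and then removes them" can overreach, the Python sketch below assumes an API filter named `pending_delete` whose empty value is silently ignored by the handler. Every name and prefix here is invented (the addresses come from documentation-reserved ranges), and none of it is Cloudflare's code.

```python
from dataclasses import dataclass


@dataclass
class Prefix:
    cidr: str
    pending_delete: bool


# Hypothetical in-memory stand-in for an addressing API's prefix store.
PREFIXES = [
    Prefix("192.0.2.0/24", pending_delete=True),
    Prefix("198.51.100.0/24", pending_delete=False),
    Prefix("203.0.113.0/24", pending_delete=False),
]


def list_prefixes_buggy(pending_delete: str = "") -> list[Prefix]:
    """Hypothetical handler: an empty filter value is silently ignored,
    so the caller receives every prefix, not just those marked for removal."""
    if pending_delete == "":
        return list(PREFIXES)
    wanted = pending_delete.lower() == "true"
    return [p for p in PREFIXES if p.pending_delete == wanted]


def list_prefixes_strict(pending_delete: str) -> list[Prefix]:
    """Stricter variant: reject a missing or malformed filter
    instead of quietly widening the result set."""
    if pending_delete.lower() not in ("true", "false"):
        raise ValueError(f"invalid pending_delete filter: {pending_delete!r}")
    wanted = pending_delete.lower() == "true"
    return [p for p in PREFIXES if p.pending_delete == wanted]


def cleanup(lister, filter_value: str) -> list[str]:
    """Hypothetical cleanup sub-task: withdraws whatever the API returns."""
    return [p.cidr for p in lister(filter_value)]
```

With the lenient handler, `cleanup(list_prefixes_buggy, "")` returns all three prefixes for withdrawal, while `cleanup(list_prefixes_strict, "true")` returns only `192.0.2.0/24` and an empty filter raises `ValueError` instead. Failing loudly on an ambiguous query is one way to keep a mistake small.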
Although Cloudflare moved quickly to reverse the change, around 1,100 prefixes, roughly a quarter of all BYOIP prefixes, were withdrawn before it could do so. Some customers were able to restore their own service by using the Cloudflare dashboard to re-advertise their IP addresses.
The outage affected a number of websites, including Uber, Workday, Minecraft, Wikipedia, and Microsoft Outlook.
Betting site Bet365 posted on X, "Hi, we’re aware of an issue with our Website/App, and our Technical Team are working to resolve it as soon as possible. We apologise for the inconvenience."
Cloudflare looks to prevent future issues
Cloudflare said it is already working on improving the Addressing API's configuration change support through staged test mediation and better correctness checks.
"Following our Code Orange: Fail Small promise to require controlled rollouts of any change into Production, our engineering teams have been reaching deep into all layers of our stack to identify and fix all problematic findings," it said.
"While this outage wasn't itself global, the blast radius and impact were unacceptably large, further reinforcing Code Orange: Fail Small as a priority until we have re-established confidence in all changes to our network being as gradual as possible."
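Cloudflare's stated remedy is controlled, gradual rollout of changes. A minimal sketch, assuming a hypothetical `staged_withdraw` helper with an invented 1% blast-radius cap and batches of ten, shows how an automated change can be refused outright when it would touch too many prefixes, and otherwise applied in health-gated stages. The total of 4,400 prefixes used below is back-calculated from the article's figure that around 1,100 prefixes were roughly a quarter of the total, so it is approximate.

```python
import math
from typing import Callable, Sequence


class BlastRadiusExceeded(RuntimeError):
    """Raised when a change would affect more prefixes than the policy allows."""


def staged_withdraw(
    candidates: Sequence[str],
    total_prefixes: int,
    withdraw: Callable[[str], None],
    healthy: Callable[[], bool] = lambda: True,
    max_fraction: float = 0.01,  # hypothetical cap: at most 1% of all prefixes per change
    batch_size: int = 10,        # hypothetical stage size
) -> list[str]:
    # Refuse the whole change up front if it exceeds the allowed blast radius.
    limit = math.floor(total_prefixes * max_fraction)
    if len(candidates) > limit:
        raise BlastRadiusExceeded(
            f"{len(candidates)} prefixes requested, limit is {limit}"
        )

    withdrawn: list[str] = []
    for start in range(0, len(candidates), batch_size):
        for cidr in candidates[start : start + batch_size]:
            withdraw(cidr)
            withdrawn.append(cidr)
        # Health gate between stages: stop before the next batch if checks fail.
        if not healthy():
            break
    return withdrawn
```

Called with 1,100 candidates against 4,400 prefixes, this guard raises `BlastRadiusExceeded` before withdrawing anything, because a 1% cap allows only 44. A small, legitimate request proceeds in batches, and a failing health check halts it after the first stage.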
Emma Woollacott is a freelance journalist writing for publications including the BBC, Private Eye, Forbes, Raconteur and specialist technology titles.
