

Facebook: outage caused by faulty routing configuration changes

Facebook says yesterday’s global outage, which took down all of its services, was caused by faulty configuration changes to its main routers.

“Our engineering teams have learned that configuration changes on the main routers that coordinate network traffic between our data centers have caused issues that interrupted this communication,” said Santosh Janardhan, vice president of engineering and infrastructure at Facebook.

“This disruption in network traffic has had a cascading effect on the way our data centers communicate, causing our services to stop.”

The configuration problem also affected the company’s internal systems and tools, making it harder to bring services back online and slowing the recovery.

“The underlying cause of this failure also impacted many internal tools and systems that we use in our day-to-day operations, complicating our attempts to diagnose and resolve the problem quickly,” added Janardhan.

He also said there was no evidence that Facebook user data was compromised during the outage, and reiterated that the root cause was a faulty configuration change.

What happened?

Yesterday, Facebook, Instagram, and WhatsApp began coming back online after engineers fixed a BGP routing issue that had caused more than six hours of downtime.

At around 11:50 a.m. ET, all three services suddenly became inaccessible, with browsers and apps showing DNS errors when trying to connect.

Facebook did not initially provide any details, and the massive outage at first appeared to be DNS-related. It later emerged that the problem was far more serious and much harder to fix.

As Giorgio Bonfiglio, a senior technical account manager at Amazon AWS, explained, several of Facebook’s routing prefixes suddenly disappeared from the Internet’s BGP routing tables, immediately making it impossible to connect to services hosted on those IP addresses.

BGP (Border Gateway Protocol) is the routing protocol that makes the Internet work: networks use it to tell each other which blocks of IP addresses (prefixes) they can reach, which lets traffic from a device on one side of the world find a path to a device on the other.
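To make the mechanism concrete, here is a minimal Python sketch of a router's lookup table. It is a toy model, not Facebook's or any vendor's implementation: the prefixes come from the RFC 5737 documentation ranges and the next-hop names ("peer-A", "peer-B", "peer-C") are hypothetical. It shows longest-prefix matching and what a withdrawal does: once the only covering prefix is gone, the lookup finds no route at all.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class RoutingTable:
    """Toy model of BGP-learned routes: prefix -> next hop (illustrative only)."""

    def __init__(self) -> None:
        self.routes: Dict[Network, str] = {}

    def announce(self, prefix: str, next_hop: str) -> None:
        # An announcement installs (or replaces) the route for this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        # A withdrawal removes the route; unknown prefixes are ignored.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        # Longest-prefix match: among all prefixes containing the address,
        # the most specific one (largest prefix length) wins.
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None  # no route: traffic to this address cannot be delivered
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical prefixes from the RFC 5737 documentation ranges.
table = RoutingTable()
table.announce("203.0.113.0/24", "peer-A")     # stands in for the withdrawn network
table.announce("198.51.100.0/24", "peer-B")    # an unrelated network
table.announce("198.51.100.128/25", "peer-C")  # a more specific route inside it

print(table.lookup("203.0.113.10"))    # peer-A
print(table.lookup("198.51.100.200"))  # peer-C (the /25 beats the /24)
table.withdraw("203.0.113.0/24")
print(table.lookup("203.0.113.10"))    # None
print(table.lookup("198.51.100.7"))    # peer-B
```

Real routers keep far more state than this (AS paths, preferences, several candidate routes per prefix), but the core effect is the same: when a prefix is withdrawn and nothing else covers it, its addresses become unreachable.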

Because Facebook’s domain registrar and DNS servers sit within the company’s own routing prefixes, once those BGP prefixes were removed from the routing tables, no one could connect to those IP addresses or to the services running on them.

“BGP routes pointing traffic to Facebook’s IP address space have been removed. The Internet no longer knows where to find Facebook’s IP addresses. One symptom is that DNS queries are failing,” said Johannes B. Ullrich, Ph.D., Dean of Research at the SANS Technology Institute.

“But this is only the result of Facebook hosting its DNS servers in its own network. Even with a working DNS (e.g., if you still have cached results), IP addresses are currently not accessible.”
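Ullrich's point, that the DNS failures were a symptom of the missing routes rather than the cause, can be illustrated with another toy Python model. All names and addresses here are hypothetical (RFC 5737 documentation ranges and example.com), and record TTLs are ignored. The authoritative name server sits inside the same prefix as the web server, so withdrawing that prefix breaks fresh lookups, while a client that still holds a cached answer gets an IP address it cannot reach.

```python
import ipaddress
from typing import Dict, Set

IPv4 = ipaddress.IPv4Address

# Hypothetical setup: one prefix hosts both the name server and the web server.
SITE_PREFIX = ipaddress.ip_network("203.0.113.0/24")
AUTH_NS = ipaddress.ip_address("203.0.113.53")  # authoritative DNS server
ZONE: Dict[str, IPv4] = {"www.example.com": ipaddress.ip_address("203.0.113.80")}


def reachable(addr: IPv4, announced: Set[ipaddress.IPv4Network]) -> bool:
    # An address is reachable only if some announced prefix covers it.
    return any(addr in net for net in announced)


def resolve(name: str, announced: Set[ipaddress.IPv4Network], cache: Dict[str, IPv4]) -> IPv4:
    if name in cache:
        return cache[name]  # cached answer: no query goes out (TTL expiry ignored)
    if not reachable(AUTH_NS, announced):
        raise LookupError(f"DNS query for {name} failed: no route to the name server")
    cache[name] = ZONE[name]
    return cache[name]


def connect(name: str, announced: Set[ipaddress.IPv4Network], cache: Dict[str, IPv4]) -> str:
    addr = resolve(name, announced, cache)
    if not reachable(addr, announced):
        return f"{name} resolved to {addr}, but there is no route to it"
    return f"connected to {name} at {addr}"


announced = {SITE_PREFIX}
warm_cache: Dict[str, IPv4] = {}
print(connect("www.example.com", announced, warm_cache))  # succeeds and caches the answer

announced.clear()  # the BGP withdrawal: the prefix vanishes
try:
    connect("www.example.com", announced, {})  # empty cache: must ask the name server
except LookupError as err:
    print(err)
print(connect("www.example.com", announced, warm_cache))  # cached IP, still unreachable
```

The three printed lines correspond to the three situations Ullrich describes: normal operation, failing DNS queries, and a working (cached) DNS answer that still leads nowhere because the route is gone.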

“To everyone who has been affected by the outages on our platforms today: we are sorry,” a Facebook spokesperson told TechToSee.

“We know that billions of people and businesses around the world depend on our products and services to stay connected. We appreciate your patience as we come back online.”
