06 October 2021

Facebook admits ‘cock-up’ during routine maintenance work was to blame for five-hour outage of social network

The Facebook outage which took the social network, as well as Instagram and WhatsApp, offline for more than five hours was caused by an error during a routine maintenance job, the company has said.

The fault left billions of the platforms’ users unable to get online on Monday. The company described it as “an outage caused not by malicious activity, but an error of our own making”.

Santosh Janardhan, Facebook’s vice president of infrastructure, said that during what was “routine maintenance work” on the firm’s backbone network “a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centres globally”.

Writing in a blog post he said: “Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command.

“This change caused a complete disconnection of our server connections between our data centres and the internet. And that total loss of connection caused a second issue that made things worse.”

Mr Janardhan said the outage also took time to fix because Facebook’s servers are designed with physical security in mind.

“They’re hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them,” he said.

He confirmed that Facebook then had to bring the servers back online slowly, to avoid any further issues.

“We knew that flipping our services back on all at once could potentially cause a new round of crashes due to a surge in traffic,” he said.

“Every failure like this is an opportunity to learn and get better, and there’s plenty for us to learn from this one.

“After every issue, small and large, we do an extensive review process to understand how we can make our systems more resilient. That process is already under way.”

As well as sparking debate about the public’s use of social media, the outage prompted EU competition commissioner Margrethe Vestager to repeat her calls for greater competition in the tech sector, saying the incident highlighted the negative impact of big tech firms controlling large swathes of the online world.

“We need alternatives and choices in the tech market, and must not rely on a few big players, whoever they are,” she wrote on Twitter.
