Global internet outages explained

Is the internet down? What do global internet outages reveal about how the internet works and its core vulnerabilities? Craig Barber CITP MCIIS MBCS explains.

It’s comforting to imagine that the internet is robust and resilient. And, to a degree, it most certainly is. But in recent years we’ve seen several high-profile global internet outages. They have all been huge inconveniences for users but, searching for the positives, they have been valuable learning opportunities too.

Ten years ago, a Facebook outage might have been little more than a minor inconvenience - the media might not have even picked it up. Yet in October 2021, news of the site going down was everywhere and the fallout was far wider than initially expected.

Not only was Facebook itself interrupted, but additional services run by the parent company, including WhatsApp, Instagram and Workplace, were also down. Furthermore, the service known as Login with Facebook became unavailable, so many unrelated websites ran into problems when their users tried to log in. The incident lasted almost six hours, making it the longest stretch of downtime for the company since 2008.

The incident highlighted how reliant we have become on Facebook, not just personally but, in many cases, from a business perspective too. You can easily imagine people around the world asking: ‘Is the internet down?’

The blackout also caused problems in the developing world. For many African communities, WhatsApp and Facebook are among the few platforms available for businesses and charities to keep in touch with the public. Indeed, the company provides lite versions of its apps for use on older devices, or in areas with spotty connectivity.

In Zimbabwe, WhatsApp once accounted for almost half of the entire country’s internet traffic and local pharmacies are believed to take prescription orders over the platform. In South Africa, a minister stated that the government needs to do more to support the development of a local social media platform instead of relying on the West.

Understanding a global internet outage

So, what caused the outage that left people asking ‘Is the internet down?’, and what can we learn from it? Let’s talk about two common protocols that are often the cause of internet woes and how experts tried to ascertain what was happening at Facebook.

Global internet outages and domain name system (DNS)

Initially, the fault was believed to relate to a protocol known as the domain name system, or DNS for short. This system is like the phonebook of the internet. Say you have a friend’s name and phone number stored in your contacts: rather than remember their number, you can simply tap their name and your phone will call them, mapping the name to the relevant number. DNS works in much the same way. It provides a translation between a website name (such as google.com) and an IP address (such as 216.58.212.238).
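To make the phonebook idea concrete, here is a minimal sketch in Python. The `PHONEBOOK` table and the `lookup` helper are made up for illustration and simply reuse the example name and address above. `resolve_live` shows roughly how a program would ask the operating system's resolver for real, assuming network access is available.

```python
import socket
from typing import List, Optional

# Toy "phonebook": the name and address are the examples from the text above.
PHONEBOOK = {"google.com": "216.58.212.238"}


def lookup(name: str) -> Optional[str]:
    """Return the stored address for a name, or None if it is not listed."""
    # DNS names are case-insensitive, and a trailing dot is permitted.
    key = name.strip().rstrip(".").lower()
    return PHONEBOOK.get(key)


def resolve_live(name: str) -> List[str]:
    """Ask the operating system's configured resolver (needs network access)."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(lookup("Google.com"))        # 216.58.212.238
print(lookup("example.invalid"))   # None
```

A live lookup will often return different addresses from the one printed here, since large sites use many addresses and change them over time.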

Without DNS, you’d be forced to remember the IP addresses of your favourite websites. That would certainly make using the internet more challenging, not least because IP addresses now change regularly: your home broadband address, for example, or the addresses of services your organisation deploys in the cloud.

When browsing the internet at home, your ISP (internet service provider) normally determines which DNS servers you use and they often run their own. However, you can sometimes re-configure your router to use other services. Google and Cloudflare both offer free DNS services that anyone can use. For resilience, you might opt to use a mixture of several providers. If connected to a work network, your company might have deployed their own DNS servers - something that can help speed up internet browsing and improve security.
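The idea of mixing several providers for resilience can be sketched as a simple fallback loop. This is an illustration only: the two `provider_*` functions are hypothetical stand-ins that send no network traffic, and a real device's resolver handles this for you once several DNS servers are configured.

```python
from typing import Callable, Optional, Sequence

# A resolver is modelled as any function that maps a name to an IP address,
# or raises OSError when the provider cannot be reached.
Resolver = Callable[[str], str]


def resolve_with_fallback(name: str, resolvers: Sequence[Resolver]) -> Optional[str]:
    """Try each provider in order and return the first answer received."""
    for resolver in resolvers:
        try:
            return resolver(name)
        except OSError:
            continue  # this provider is down; move on to the next one
    return None  # every provider failed


# Hypothetical stand-ins for two providers (no real network traffic is sent).
def provider_down(name: str) -> str:
    raise OSError("resolver unreachable")


def provider_up(name: str) -> str:
    # 192.0.2.10 is taken from an address range reserved for documentation.
    return {"example.com": "192.0.2.10"}[name]


print(resolve_with_fallback("example.com", [provider_down, provider_up]))  # 192.0.2.10
```

With only one provider in the list, an outage at that provider leaves you with no answer at all, which is the situation the paragraph above suggests avoiding.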

There is a running joke in the IT world that when there is a network outage ‘it’s always DNS’. Sometimes it’s true! The root cause of several global internet outages over the past few years has been linked to DNS. A good example of this is when content delivery network provider Akamai encountered a problem with its DNS service. This resulted in Lloyds, HSBC, Barclays and other banking services going offline, as well as the Steam and PlayStation gaming platforms.

When it came to the Facebook outage, experts initially jumped to the conclusion that the cause was DNS, because the company’s DNS servers appeared to be offline. However, upon further investigation, the problem appeared to relate to internet routing.

Border gateway protocol (BGP)

Think of the internet like the complex road network of a major inner city. A vehicle is transporting your data from point A to point B. Which route should the driver take? Some roads might be more congested than others; others might be smaller roads but used as a so-called ‘rat run’ to get quickly to the destination. Sometimes, the quicker route isn’t the shortest route, like driving around a bypass instead of going straight through a city centre. Whilst on this journey, the vehicle might be directed at junctions based on what the road signs say, or perhaps a sat nav.

When it comes to the internet, a road junction is equivalent to a router, connecting lots of different networks together. Within that router is something known as a ‘routing table’, which is a bit like a sat nav, making decisions on which direction the traffic should head. This is where the border gateway protocol steps in. BGP not only helps your internet traffic take a sensible route, but also means that if a fault occurs, your traffic can be sent a different way and still reach the destination network.

‘Best path’ decisions like this are often configured by the owners of each network, determining how traffic should generally be routed. This is like your sat nav recalculating and deciding to take a different route to reach the destination. What’s more, when data is sent back to you from the destination, an entirely different route may be taken depending on network conditions at the time.
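A heavily simplified sketch of a ‘best path’ decision follows. Real BGP works through a longer list of tie-breakers. This version assumes just two of them: the operator-set local preference, then the number of networks (autonomous systems) on the path. The `Route` class, the peer names and the AS numbers are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    next_hop: str               # neighbour the traffic would be handed to
    as_path: Tuple[int, ...]    # networks (AS numbers) the route passes through
    local_pref: int = 100       # operator-set preference; higher wins


def best_path(candidates: List[Route]) -> Route:
    """Prefer the highest local preference, then the shortest AS path."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


# AS numbers 64496-64511 are reserved for documentation.
short = Route("peer-a", (64500, 64510))
long_but_preferred = Route("peer-b", (64501, 64502, 64510), local_pref=200)
long_default = Route("peer-c", (64503, 64504, 64510))

print(best_path([short, long_but_preferred]).next_hop)  # peer-b
print(best_path([short, long_default]).next_hop)        # peer-a
```

The first result shows the network owner's policy overriding the shorter path, much like a driver who has been told to avoid the city centre.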

So, why was BGP believed to be responsible for Facebook going offline? BGP advertises IP address ranges to routers around the world, providing details on exactly where these networks are located. This then feeds into how BGP determines the best path for your traffic, as those details are stored in the routing table. Without this information, each router is unable to determine how to reach a particular destination. It’s the equivalent of someone coming up to you in the street and asking for directions to somewhere you’ve never heard of.
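The effect of a missing advertisement can be sketched with a toy routing table. The `RoutingTable` class is an assumption made for illustration (IPv4 only, one next hop per prefix), and the prefix comes from a range reserved for documentation. Once the route is withdrawn, the router has no idea where to send the traffic.

```python
import ipaddress
from typing import Dict, Optional


class RoutingTable:
    """Toy IPv4 routing table: maps advertised prefixes to a next hop."""

    def __init__(self) -> None:
        self._routes: Dict[ipaddress.IPv4Network, str] = {}

    def advertise(self, prefix: str, next_hop: str) -> None:
        self._routes[ipaddress.IPv4Network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        self._routes.pop(ipaddress.IPv4Network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        ip = ipaddress.IPv4Address(address)
        matches = [net for net in self._routes if ip in net]
        if not matches:
            return None  # no directions: the destination is unreachable
        # When several prefixes match, the most specific (longest) one wins.
        return self._routes[max(matches, key=lambda net: net.prefixlen)]


table = RoutingTable()
table.advertise("192.0.2.0/24", "peer-a")
print(table.lookup("192.0.2.55"))  # peer-a

table.withdraw("192.0.2.0/24")
print(table.lookup("192.0.2.55"))  # None
```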

Following the outage, Facebook made a statement that ‘configuration changes on [their] backbone routers’ were the underlying cause. It was believed this referred to BGP, as many experts noticed that for the entire six-hour downtime, the routing (or directions) to Facebook’s network disappeared from the internet entirely.

Can cyber attacks cause global internet outages?

Many experts suggested that a cyber attack targeting BGP might have been responsible for the Facebook outage. Whilst plausible, this appeared less likely as time went on, and Facebook eventually released the public statement above. However, BGP itself has been around since the early days of the internet and it is simply not secure. The protocol dates back to 1989 and was reputedly sketched out on paper napkins! It is also based on the principle of trust, leaving it vulnerable not only to attack but also to misconfiguration.

The Internet Engineering Task Force (IETF) has been working to improve the security of BGP, but the complexity and growth of the internet have made this very difficult, particularly as changes often require every single operator to reconfigure their routers or upgrade their hardware.

As a result of these insecurities, several specific attack types and misconfigurations can occur, causing disruption. One of the most common attacks is known as BGP hijacking. This sounds like something that is almost always malicious, when in fact, it can also occur by accident.

Referring back to the car analogy, BGP hijacking is the equivalent of road signs being changed or a stranger giving you incorrect directions, resulting in vehicles taking the wrong route to a destination. When this occurs, the traffic can be intercepted, blackholed (effectively sent to a dead end), or otherwise tampered with. The traffic could be redirected to the legitimate destination or potentially to a fake version of the website you are trying to reach.

Accidental BGP hijackings have occurred when network administrators have misconfigured routers, resulting in incorrect routing information spreading across the internet. Malicious BGP hijackings include nation-states redirecting traffic to attempt to intercept data and attackers stealing cryptocurrency. If you want to learn more about the insecurities of BGP, Cloudflare has created a website to help drive innovation and awareness. More secure solutions such as BGPsec are also being investigated.
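One common way a hijack takes effect is through a false announcement that is more specific than the genuine one, because routers prefer the most specific matching prefix. The sketch below assumes that mechanism. The `forward_to` helper and the network names are hypothetical, and the addresses come from a range reserved for documentation.

```python
import ipaddress
from typing import Dict, Optional


def forward_to(routes: Dict[str, str], address: str) -> Optional[str]:
    """Return who receives traffic for address: the most specific prefix wins."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, origin in routes.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, origin)
    return best[1] if best else None


routes = {"203.0.113.0/24": "legitimate-network"}
print(forward_to(routes, "203.0.113.200"))  # legitimate-network

# A false, more specific announcement covering the upper half of the range.
routes["203.0.113.128/25"] = "hijacker"
print(forward_to(routes, "203.0.113.200"))  # hijacker
print(forward_to(routes, "203.0.113.20"))   # legitimate-network
```

Nothing in this toy table checks whether the announcer actually owns the addresses, which mirrors the trust-based design described earlier.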

Of course, BGP-specific attacks are not the only concern. Many large organisations like Facebook use automation or custom tools to manage their networks. When it comes to automation, human error can still creep in - after all, the tools are developed by humans! Automation can massively speed up configuration changes and updates in a complex environment, but oversights do happen, resulting in innocent mistakes.

It appears this played a part in the Facebook outage. With many of the company’s employees working remotely due to COVID, they seem to have lost access to the Facebook network themselves, making remote resolution of the fault impossible.

Boosting resilience

So, how can you reduce the impact of global internet outages on you and your organisation? Whilst organisations such as Google, Cloudflare and the IETF continue to actively work on making the internet a more secure and resilient platform, your hands are likely tied when it comes to fixing the underlying issues. However, there are ways to reduce the likelihood and impact of disruption.

Much of this boils down to the proverb of not having all your eggs in one basket. Business continuity (BC) and disaster recovery (DR) planning are at the heart of ensuring your organisation can continue to operate during an incident - including when a global internet outage occurs. As part of this, using multiple ISPs and cloud providers, deploying resilient hardware and regularly testing BC/DR runbooks will all help your organisation overcome many unforeseen events.

For example, if your business uses the public cloud, consider spreading services across multiple providers. The main providers do a great job of ensuring their services are highly available and resilient, but as we’ve seen with Facebook, nothing is foolproof.

Using multiple DNS providers, as referred to earlier, is an option too - and don’t forget user education and awareness when it comes to security. When malicious actors carried out a BGP hijack against a cryptocurrency site, many users reported seeing warning messages from their browser, because they were being redirected to a fake version of the website. However, the users ignored the browser warnings and logged in, allowing the attacker to capture their credentials and log in to the real site on their behalf.
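The same lesson applies to software that connects to websites on your behalf: it should refuse to continue when the certificate check fails, rather than clicking through the warning as those users did. Below is a minimal sketch using Python's standard `ssl` module. The `strict_tls_context` helper name is made up for illustration.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that rejects untrusted or mismatched certificates."""
    context = ssl.create_default_context()
    # These are already the defaults; they are set explicitly to make the point
    # that verification should never be switched off to "get past" an error.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


context = strict_tls_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)  # True True
```

A connection wrapped with this context fails with a certificate verification error if the server cannot present a certificate that is valid for the requested name. A fake site set up by a hijacker would normally be unable to do so.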

Single sign-on services like Login with Facebook are not only good for security, they can also boost productivity. However, as seen with the Facebook outage, these services can also go offline, resulting in unexpected problems.

Ensure that you have contingency plans in place and consider different eventualities. If, for example, you are running an e-commerce site that allows your customers to log in with Facebook, consider adding a second option.
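As a rough sketch of that contingency, a site could always offer its own sign-in and add external providers only while they are reachable. The provider names and the `available_login_options` helper are hypothetical. How you determine whether a provider is healthy is left open here.

```python
from typing import Dict, List


def available_login_options(provider_healthy: Dict[str, bool]) -> List[str]:
    """Always offer local sign-in; add each external provider that is healthy."""
    options = ["email_and_password"]  # the fallback that does not depend on a third party
    options += [name for name, healthy in provider_healthy.items() if healthy]
    return options


print(available_login_options({"facebook": True}))   # ['email_and_password', 'facebook']
print(available_login_options({"facebook": False}))  # ['email_and_password']
```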

As for the internet backbone and protocols such as DNS and BGP, these will likely be revised as time goes by. Both have inherent security flaws due to their age, and progress is often slow since, in many parts of the world, new technology is adopted very gradually. Complex collaboration amongst internet providers is also required to address these issues, and this only gets harder as the internet continues to grow.

DNS spoofing

Whilst not linked to the Facebook outage, another type of attack worth being aware of is DNS spoofing or poisoning. This can allow an attacker to divert traffic from legitimate websites to malicious ones. There are several ways this can occur; however, the two most common methods are as follows:

An attacker performing a man-in-the-middle attack. This is where they sit between you and the DNS server on the network, changing the responses that come back to you. For example, if your device asks for the IP address for google.com and the DNS server responds with 142.250.180.14, the attacker might change this response to 192.168.200.201 and send you to their own fake version of Google. This type of attack is possible because DNS is a very old protocol and the data is transmitted in cleartext, with nothing to prove that a response is genuine. To help combat this, several protections are being quietly rolled out: DNSSEC, which digitally signs DNS records so that tampering can be detected, and the encrypted transports DNS over TLS (DoT) and DNS over HTTPS (DoH). If you want to learn more, Cloudflare has written a great blog on this.
An attacker gaining access to your device and poisoning the DNS cache. Most devices store a local copy of DNS lookups to make internet browsing more efficient. Malware can inject false records into this cache, which in turn means your device attempts to connect to a malicious site rather than the legitimate one. Preventative measures include ensuring you regularly install updates and run anti-malware software. You may also notice certificate warnings in your web browser if you are being maliciously redirected. As most of the internet now uses HTTPS, certificates help your browser to ensure the website is legitimate.
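The second method can be sketched with a toy caching resolver. The `CachingResolver` class is an assumption made for illustration and is far simpler than a real operating system cache. The addresses are the examples used above. Because the cache is consulted first, a false record planted there is returned without the genuine server ever being asked.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy resolver: answers from a local cache before asking upstream."""

    def __init__(self, upstream: Dict[str, str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.upstream = upstream    # stands in for the genuine DNS server
        self.ttl = ttl              # seconds a cached answer stays valid
        self.clock = clock
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def resolve(self, name: str) -> str:
        entry = self.cache.get(name)
        if entry and entry[1] > self.clock():
            return entry[0]         # cache hit: upstream is never consulted
        address = self.upstream[name]
        self.cache[name] = (address, self.clock() + self.ttl)
        return address

    def flush(self) -> None:
        self.cache.clear()


resolver = CachingResolver({"google.com": "142.250.180.14"})
print(resolver.resolve("google.com"))  # 142.250.180.14

# Malware writes a false record straight into the local cache.
resolver.cache["google.com"] = ("192.168.200.201", resolver.clock() + 3600)
print(resolver.resolve("google.com"))  # 192.168.200.201

resolver.flush()
print(resolver.resolve("google.com"))  # 142.250.180.14
```

Flushing the cache restores the correct answer in this sketch, but on a real device the malware would need removing first, otherwise it could simply poison the cache again.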

The main risks associated with DNS spoofing include data theft, malware infection and censorship.