Categories
Uncategorized

the July 19 Internet outage (not CrowdStrike)

There was a huge Internet outage on July 19, 2024. It affected millions of people, interfering with their ability to travel and interact with friends and family, and with the ability of businesses to communicate with their customers and place orders. It cost the global economy millions of dollars.

And it had nothing to do with CrowdStrike.

I’m talking about the 5-day near-total shutdown of the Internet in Bangladesh, from 2024-07-18t15:00Z (9pm July 18 local time in Bangladesh) until about 2024-07-23t13:00Z (7pm July 23 local time). For most of that period, pretty much all Bangladeshi networks were down. People could not communicate with each other. Here are the start, middle, and recovery pictures from our blog entry:

These figures show Bangladesh, with circles whose size indicates the number of networks that are out in each part of the country. Circle color indicates the percentage of networks that are out: red means nearly 100% of networks are unreachable. My research group measures Internet outages, and you can look at what happened on our website. Red basically never happens for big countries, at least since the 2011 Egyptian revolution.
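The UTC and local times quoted above can be checked with a few lines of Python. This is only a sketch: it assumes Bangladesh local time is a fixed UTC+6 offset with no daylight saving, and the names `DHAKA` and `to_local` are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Bangladesh local time is a fixed UTC+6 offset (no daylight saving).
DHAKA = timezone(timedelta(hours=6), "UTC+6")

def to_local(utc_stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDtHH:MMZ' stamp, as written in this post, and shift it to local time."""
    utc = datetime.strptime(utc_stamp.upper(), "%Y-%m-%dT%H:%MZ")
    return utc.replace(tzinfo=timezone.utc).astimezone(DHAKA)

start = to_local("2024-07-18t15:00Z")
end = to_local("2024-07-23t13:00Z")
duration = end - start

print(start.strftime("%Y-%m-%d %H:%M"))  # 2024-07-18 21:00
print(end.strftime("%Y-%m-%d %H:%M"))    # 2024-07-23 19:00
print(duration)                          # 4 days, 22:00:00
```

So the shutdown lasted about 118 hours, just short of five full days.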

Bangladesh had civil unrest, protests, and riots due to an unpopular employment law (as reported by many organizations, including the New York Times). The government chose to shut down their Internet (as reported by AP, and others). They restored services on July 24, but I am told they are still blocking several social media services.

What does this have to do with CrowdStrike?

Well, nothing. But you may have heard that CrowdStrike had a software update that went wrong, also on July 19. It also interfered with millions of people’s ability to travel and interact with friends and family, and with the ability of businesses to communicate with their customers and place orders, as it crashed millions of computers running Microsoft Windows and left them difficult to recover.

But the CrowdStrike software glitch was not an Internet outage.

Yes, millions of computers failed. But the Internet was never affected by the failure of computers running CrowdStrike. Anyone could use the Internet just fine last week, provided they were using services that did not depend on Microsoft Windows. And lots of the computers that failed (like flight status kiosks in airports) were not on the public Internet.

CrowdStrike was a massive software failure, but not an Internet outage.

I mention this because I heard multiple media sources discuss the CrowdStrike-caused Internet outage. Most prominent was this article by Barath Raghavan and Bruce Schneier on Lawfare (and then reposted on Schneier’s blog), which starts “Friday’s massive internet outage, caused by a mid-sized tech company called CrowdStrike, disrupted major airlines, hospitals, and banks.” They point to “brittleness of infrastructure” as a risk. The article is true, except for the word “Internet”. The New York Times called it a “tech outage”, and those of us in the field should be as careful about our terms.

By analogy, when two Boeing 737 MAX airliners crashed in 2018 and 2019, we did not call out the “massive air traffic control crash”. We correctly pointed at aircraft failures, and eventually at software and design problems in that specific aircraft.

We should not call all computer failures an Internet outage, when the problem is not about network communication. To improve our computing world, we must identify problems correctly.

Because when a nation of 170 million people goes offline, that’s a big deal, too. And that’s not fixable by rebooting.


new technical report “Reasoning about Internet Connectivity”

We have released a new technical report: “Reasoning about Internet Connectivity”, available at https://arxiv.org/abs/2407.14427.

From the abstract:

Figure 1 from [Baltra24b], showing the connected core (A, B and C) with B and C peninsulas, D and E islands, and X an outage.

Innovation in the Internet requires a global Internet core to enable communication between users in ISPs and services in the cloud. Today, this Internet core is challenged by partial reachability: political pressure threatens fragmentation by nationality, architectural changes such as carrier-grade NAT make connectivity conditional, and operational problems and commercial disputes make reachability incomplete for months. We assert that partial reachability is a fundamental part of the Internet core. While some systems paper over partial reachability, this paper is the first to provide a conceptual definition of the Internet core so we can reason about reachability from first principles. Following the Internet design, our definition is guided by reachability, not authority. Its corollaries are peninsulas: persistent regions of partial connectivity; and islands: when networks are partitioned from the Internet core. We show that the concept of peninsulas and islands can improve existing measurement systems. In one example, they show that RIPE’s DNSmon suffers misconfiguration and persistent network problems that are important, but risk obscuring operationally important connectivity changes because they are 5x to 9.7x larger. Our evaluation also informs policy questions, showing no single country or organization can unilaterally control the Internet core.

This technical report is joint work of Guillermo Baltra, Tarang Saluja, Yuri Pradkin, and John Heidemann, done at USC/ISI. This work was supported by the NSF via the EIEIO and InternetMap projects.
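To make the peninsula and island vocabulary from the abstract concrete, here is a toy sketch. It is not the algorithm from the technical report: it assumes a set of observers that are themselves well connected, each reporting whether it can reach a target network, and the names `Status`, `classify_round`, `is_persistent_peninsula`, and the `min_rounds` threshold are all hypothetical. Under this simplification, observers outside a network cannot tell an island from an outage, so both fall into one category.

```python
from enum import Enum

class Status(Enum):
    CONNECTED = "reachable from every observer"
    PARTIAL = "reachable from some observers but not others"
    UNREACHABLE = "reachable from no observer (island or outage)"

def classify_round(reached_by: dict[str, bool]) -> Status:
    """Classify one target network from one round of per-observer results."""
    if not reached_by:
        raise ValueError("need at least one observer")
    results = list(reached_by.values())
    if all(results):
        return Status.CONNECTED
    if any(results):
        return Status.PARTIAL
    return Status.UNREACHABLE

def is_persistent_peninsula(rounds: list[dict[str, bool]], min_rounds: int = 3) -> bool:
    """Treat a target as a peninsula only if partial reachability persists.

    min_rounds is an illustrative threshold, not a value from the report.
    """
    return len(rounds) >= min_rounds and all(
        classify_round(r) is Status.PARTIAL for r in rounds
    )

# Made-up observations: observer "w" reaches the target, observer "e" never does.
rounds = [{"w": True, "e": False}] * 3
print(classify_round(rounds[0]).name)    # PARTIAL
print(is_persistent_peninsula(rounds))   # True
```

Requiring several consecutive rounds reflects the abstract's wording that peninsulas are persistent, as opposed to a brief routing transient.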


major Internet outage in Bangladesh

Since around 2024-07-18t15:00Z (July 18, 21:00 local time), Bangladesh has had a major, country-wide Internet outage. As of 17:30Z some regions see 97% unreachability. This country-wide outage seems to be in response to civil unrest and protests.

Here’s the view from Trinocular outage detection as of 17:30Z:

We wish the best for the people of Bangladesh during this unrest.

Update July 19 morning: A day after Bangladesh’s Internet connectivity first went down, it remains nearly completely stopped. Here is our view of Bangladeshi connectivity at 2024-07-19t14:40Z (20:40 local time there):

Update July 19 afternoon: USC/ISI posted an article about the Bangladeshi Internet outage and our work as ISI news, and there is a new NYT article about the protests.

The AP reports “A statement from the country’s Telecommunication Regulatory Commission said they were unable to ensure service after their data center was attacked Thursday by demonstrators, who set fire to some equipment. The Associated Press was not able to independently verify this.” However, the near-complete outage observed by Trinocular (as seen in the figures above) seems inconsistent with problems at a single datacenter.

Update July 19, 22:28Z: ISOC Pulse has a post about this outage, and reports that “In a press event on 18 July, Bangladesh minister for posts, telecommunications, and information technology, Zunaid Ahmed Palak confirmed that the government had ordered the shutdown.”

Update July 20: The country-wide outage continues.

Update July 21, 17:00Z: Although recent news reports suggest some government response to protests, the near-complete country-wide Internet outage continues.

Update July 22, 23:00Z: Another day with no externally visible change–all of Bangladesh remains inaccessible from outside.

Update July 23, 18:00Z: Beginning around 13:00Z (which is 19:00 in Bangladesh), we see the first signs of Bangladeshi networks coming back on-line! The figure below is as of 16:26Z and shows about half of the national networks reachable from outside the country.

On the root cause: the Deccan Herald published an article from Reuters quoting Zunaid Ahmed Palak, junior information technology minister, as saying to reporters on July 18: “Mobile internet has been temporarily suspended due to various rumors and the unstable situation created…. on social media”. Today, Reuters quoted Palak as saying that “broadband internet would be restored by Tuesday night but [he] did not comment on mobile internet”. This statement is consistent with the country-wide outage we observed, and the prior statement suggests the outage was at the request of the government.

Update July 24, 13:00Z (19:00 in Bangladesh): It looks like nearly all Bangladeshi networks are now back online.

Update July 25: The July 25 episode of The Briefing, an Australian news podcast, discussed the Bangladeshi outage and its impact, interviewing us about what we saw.


Hurricane Beryl, as seen through Internet Outages

Hurricane Beryl made landfall in Texas around 2024-07-08 at 3:17am local time (CDT) (8:17 UTC). We see a fair number of Internet outages in the Houston area, presumably as people lost power due to flooding.

Compared to our view of Hurricane Harvey in 2017 on our blog and website, Beryl looks much less severe: we see fewer areas where most Internet access is out (as shown by red circles).

Our most recent data, about 10 hours after landfall (1:33pm local time, or 2024-07-08t18:33Z):

Just before landfall, at 3:17am local time (2024-07-08t08:17Z):

We wish the best for Texas, and for the residents of the Caribbean who experienced Beryl last week.

For current status, please see our near-real-time outage site. Data about this outage will be released at the end of the quarter.