Starting shortly before 2025-08-19t17:05Z (10:05pm August 19 local time in Pakistan), a very large Internet outage occurred across all of Pakistan. Although it does not affect all networks, in many areas 50% or more of the networks are down, as shown in the following map:
We saw the first outages at 16:30Z, and they quickly ramped up to about half of the networks in most of the country. Since the network outages closely follow the country’s borders, it seems unlikely that this is a weather-related event. As of the time of this post (t20:00Z), the outage appears to be ongoing. We’ll post an update here when we learn more.
Some reports suggest it’s a backbone outage caused by flooding.
Recent PhD graduate ASM Rizvi was featured in an ISI News story, sharing his thoughts about joining USC and ISI and what he plans to do after graduation.
Today around 3:30pm local time (around 2025-02-25t18:30Z), Chile suffered a major power outage. News reports suggest 8 million or more are without power.
We can see the effects of this power outage on Internet access as measured by Trinocular, our Internet outage detection system. Outages start around 18:30Z and increase steadily to 20:30Z, the most recent data we have.
On a recent episode of the ISI/nsiders podcast, host Adam Russell interviewed ANT lab member Wes Hardaker about the creation of the DNS at ISI, his history in making software of the Internet interoperate better, research usability, and his general desire to promote lifelong learning.
We have released a new technical report: “Auditing for Bias in Ad Delivery Using Inferred Demographic Attributes”, available at https://arxiv.org/abs/2410.23394.
[Imana23c, figure 3]: Detecting racial skew with BISG-based inference is less sensitive (shown by the lower test statistic Z) than either knowing true-race, or using our improved version that reflects potential inference error. More samples and larger underlying skew make the range of confusion smaller, but do not eliminate it.
From the abstract:
Auditing social-media algorithms has become a focus of public-interest research and policymaking to ensure their fairness across demographic groups such as race, age, and gender in consequential domains such as the presentation of employment opportunities. However, such demographic attributes are often unavailable to auditors and platforms. When demographics data is unavailable, auditors commonly infer them from other available information. In this work, we study the effects of inference error on auditing for bias in one prominent application: black-box audit of ad delivery using paired ads. We show that inference error, if not accounted for, causes auditing to falsely miss skew that exists. We then propose a way to mitigate the inference error when evaluating skew in ad delivery algorithms. Our method works by adjusting for expected error due to demographic inference, and it makes skew detection more sensitive when attributes must be inferred. Because inference is increasingly used for auditing, our results provide an important addition to the auditing toolbox to promote correct audits of ad delivery algorithms for bias. While the impact of attribute inference on accuracy has been studied in other domains, our work is the first to consider it for black-box evaluation of ad delivery bias, when only aggregate data is available to the auditor.
This technical report is joint work of Basileal Imana and Aleksandra Korolova (both of Princeton) and John Heidemann (USC/ISI). This work was supported by the NSF via CNS-1956435, CNS-2344925, and CNS-2319409 (the InternetMap project).
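To give a feel for why inference error can hide skew, here is a minimal sketch. It is not the method from the report: it uses a generic two-proportion z-test and assumes a simple symmetric misclassification rate, and every number in it (audience sizes, group fractions, the 20% error rate) is hypothetical. It only illustrates that noisy labels pull the two observed fractions toward each other, which lowers the test statistic Z.

```python
import math


def two_proportion_z(x1: float, n1: int, x2: float, n2: int) -> float:
    """Standard pooled two-proportion z statistic for x1/n1 versus x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def observed_fraction(true_fraction: float, error_rate: float) -> float:
    """Expected fraction labeled as the group when each label is flipped
    with probability error_rate (an assumed, symmetric error model)."""
    return true_fraction * (1 - error_rate) + (1 - true_fraction) * error_rate


# Hypothetical paired ads, each delivered to n people.
n = 10_000
true_a, true_b = 0.55, 0.50   # true group fraction in each ad's audience
error_rate = 0.20             # assumed inference error

z_true = two_proportion_z(true_a * n, n, true_b * n, n)

noisy_a = observed_fraction(true_a, error_rate)
noisy_b = observed_fraction(true_b, error_rate)
z_noisy = two_proportion_z(noisy_a * n, n, noisy_b * n, n)

print(f"Z with true attributes:     {z_true:.2f}")
print(f"Z with inferred attributes: {z_noisy:.2f}")
```

Under this toy error model the gap between the two audiences shrinks by a factor of (1 - 2 × error rate), so a skew that is clear with true attributes can fall below a detection threshold once attributes are inferred.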
Hurricane Helene made landfall in the U.S. at 11:10pm EDT Sept. 26 (2024-09-27t03:10Z) near Tallahassee, Florida, and we’ve been watching it in the Trinocular Internet Outage system.
Florida Internet infrastructure appears to have done quite well, with relatively few Internet outages. Here is the view 4.5 hours after landfall, at 3:40am EDT Sept. 27 (2024-09-27t07:40Z), when the eye was already over southern Georgia:
However, storm damage resulted in many outages across Georgia at daybreak. Here is the view about 7 hours after landfall, at 6am EDT Sept. 27 (2024-09-27t10:00Z):
Fortunately the Internet infrastructure in Georgia was quick to recover, suggesting most Internet outages were caused by power loss. We wish the best for those in Kentucky, and for those with physical storm damage and coping with flooding.
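For readers who want to line up the local times in this post with the UTC timestamps on our outage maps, here is a small sketch. It assumes a fixed UTC-4 offset for EDT, and the helper names (`to_utc_label`, `hours_since`) are ours for illustration, not part of Trinocular.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Eastern Daylight Time (assumption: daylight time in effect).
EDT = timezone(timedelta(hours=-4), "EDT")


def to_utc_label(local: datetime) -> str:
    """Format a timezone-aware time in the post's UTC style, e.g. 2024-09-27t03:10Z."""
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dt%H:%MZ")


def hours_since(event: datetime, snapshot: datetime) -> float:
    """Elapsed hours from event to snapshot (both timezone-aware)."""
    return (snapshot - event).total_seconds() / 3600.0


landfall = datetime(2024, 9, 26, 23, 10, tzinfo=EDT)
snapshots = [
    datetime(2024, 9, 27, 3, 40, tzinfo=EDT),  # eye over southern Georgia
    datetime(2024, 9, 27, 6, 0, tzinfo=EDT),   # daybreak view
]

print("landfall:", to_utc_label(landfall))
for snap in snapshots:
    print(to_utc_label(snap), f"{hours_since(landfall, snap):.1f} hours after landfall")
```

Using timezone-aware datetimes keeps the subtraction correct even though the snapshots fall on a different calendar day than landfall in local time.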
There was a huge Internet outage on July 19, 2024. It affected millions of people, interfering with their ability to travel, interact with friends and family, and with businesses to communicate with their customers and place orders. It cost the global economy millions of dollars.
And it had nothing to do with CrowdStrike.
I’m talking about the 5-day near-total shutdown of the Internet in Bangladesh, from 2024-07-18t15:00Z (9pm July 18 local time in Bangladesh) until about 2024-07-23t13:00Z (7pm July 23 local time). For most of that period, pretty much all Bangladeshi networks were down. People could not communicate with each other. Here are the start, middle, and recovery pictures from our blog entry:
These figures show Bangladesh, with circles whose size indicates the number of networks that are out in each part of the country. Circle color indicates the percentage of networks that are out: red means nearly 100% of networks are unreachable. My research group measures Internet outages, and you can look at what happened on our website. Red basically never happens for big countries, at least since the 2011 Egyptian revolution.
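As a rough illustration of how circles like these can be derived, here is a minimal sketch that groups per-network reachability by map region and computes the two quantities the figures encode. The input format, the region names, and the `Circle` type are hypothetical; this is not the code behind our website.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Circle(NamedTuple):
    networks_out: int    # would drive circle size
    fraction_out: float  # would drive circle color (near 1.0 would be red)


def summarize(observations: Iterable[Tuple[str, bool]]) -> Dict[str, Circle]:
    """Aggregate (region, is_reachable) observations, one per network."""
    total: Dict[str, int] = defaultdict(int)
    out: Dict[str, int] = defaultdict(int)
    for region, reachable in observations:
        total[region] += 1
        if not reachable:
            out[region] += 1
    return {r: Circle(out[r], out[r] / total[r]) for r in total}


# Made-up observations for two regions.
sample = [
    ("region-a", False), ("region-a", False), ("region-a", True),
    ("region-b", True),
]
for region, circle in summarize(sample).items():
    print(region, circle.networks_out, f"{circle.fraction_out:.0%}")
```

Keeping the count and the fraction separate matters: a region with few networks can be 100% out and still draw a small circle, while a dense region can draw a large circle at a modest percentage.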
Bangladesh had civil unrest, protests, and riots due to an unpopular employment law (as reported by many organizations, including the New York Times). The government chose to shut down their Internet (as reported by AP, and others). They restored services on July 23, but I am told they are still blocking several social media services.
What does this have to do with CrowdStrike?
Well, nothing. But you may have heard that CrowdStrike had a software update that went wrong, also on July 19. It also interfered with millions of people’s ability to travel, interact with friends and family, and with businesses to communicate with their customers and place orders, as it crashed millions of computers running Microsoft Windows and left them difficult to recover.
But the CrowdStrike software glitch was not an Internet outage.
Yes, millions of computers failed. But the Internet was never affected by the failure of CrowdStrike computers. Anyone could use the Internet just fine last week, provided they were using services that did not depend on Microsoft Windows. And lots of the computers that failed (like flight status kiosks in airports) were not on the public Internet.
CrowdStrike was a massive software failure, but not an Internet outage.
I mention this because I heard multiple media sources discuss the CrowdStrike-caused Internet outage. Most prominent was this article by Barath Raghavan and Bruce Schneier on Lawfare (and then reposted on Schneier’s blog), that starts “Friday’s massive internet outage, caused by a mid-sized tech company called CrowdStrike, disrupted major airlines, hospitals, and banks.” They point to “brittleness of infrastructure” as a risk. The article is true, except for the word “Internet”. The New York Times called it a “tech outage”, and those of us in the field should be as careful about our terms.
By analogy, when two Boeing 737 MAX airliners crashed in 2018 and 2019, we did not call out the “massive air traffic control crash”; we correctly pointed at aircraft failures, and eventually at software and design problems in that specific aircraft.
We should not call all computer failures an Internet outage, when the problem is not about network communication. To improve our computing world, we must identify problems correctly.
Because when a nation of 170 million people goes offline, that’s a big deal, too. And that’s not fixable by rebooting.
We have released a new technical report: “Reasoning about Internet Connectivity”, available at https://arxiv.org/abs/2407.14427.
From the abstract:
Innovation in the Internet requires a global Internet core to enable communication between users in ISPs and services in the cloud. Today, this Internet core is challenged by partial reachability: political pressure threatens fragmentation by nationality, architectural changes such as carrier-grade NAT make connectivity conditional, and operational problems and commercial disputes make reachability incomplete for months. We assert that partial reachability is a fundamental part of the Internet core. While some systems paper over partial reachability, this paper is the first to provide a conceptual definition of the Internet core so we can reason about reachability from first principles. Following the Internet design, our definition is guided by reachability, not authority. Its corollaries are peninsulas: persistent regions of partial connectivity; and islands: when networks are partitioned from the Internet core. We show that the concept of peninsulas and islands can improve existing measurement systems. In one example, they show that RIPE’s DNSmon suffers misconfiguration and persistent network problems that are important, but risk obscuring operationally important connectivity changes because they are 5x to 9.7x larger. Our evaluation also informs policy questions, showing no single country or organization can unilaterally control the Internet core.
This technical report is joint work of Guillermo Baltra, Tarang Saluja, Yuri Pradkin, and John Heidemann, done at USC/ISI. This work was supported by the NSF via the EIEIO and InternetMap projects.