Tag: outage detection

new conference paper “Quantifying Differences Between Batch and Streaming Detection of Internet Outages” in TMA 2025

Post author By elstutz
Post date 2025-05-26

The paper “Quantifying Differences Between Batch and Streaming Detection of Internet Outages” will appear at the 2025 Conference on Network Traffic Measurement and Analysis (TMA), June 10-13, 2025, in Copenhagen, Denmark. The batch and streaming datasets are available for download.

Visual representation of outages from 2021-03-01T22:00Z to 2021-03-03T20:00Z from batch and streaming datasets (Figure 3 from [Stutz23a])

From the paper’s abstract:

A number of different systems today detect outages in the IPv4 Internet, often using active probing and algorithms based on Trinocular’s Bayesian inference. Outage detection methods have evolved, both to provide results in near-real-time and to add algorithms that account for important but less common cases that might otherwise be misinterpreted. We compare two implementations of active outage detection to see how choices to optimize for near-real-time results with streaming compare to designs that use long-term information to maximize accuracy using batch processing. Examining 8 days of data, starting on 2021-02-26, we show that the two similar systems agree most of the time, more than 84%. We show that only 0.2% of the time the algorithms disagree, and 15% of the time only one reports. We show these differences occur due to streaming’s requirement for rapid decisions, precluding algorithms that consider long-term data (days or weeks). These results are important to understand the trade-offs that occur when balancing timely results with accuracy. Beyond the two systems we compare, our results suggest the role that algorithmic differences can have in similar but different systems, such as the several implementations of Trinocular-like active probing today.
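As a rough illustration of the three categories in the abstract (agree, disagree, only one reports), the tally can be sketched as below. This is a minimal sketch, not the analysis code from the paper: it assumes both systems' outputs have already been aligned to a common grid of (block, time-bin) cells, each holding "up", "down", or None when that system made no determination. The function name `compare_reports` and the category names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical per-(block, time-bin) status: "up", "down", or None when a
# system made no determination for that block in that bin.
Status = Optional[str]


def compare_reports(batch: Sequence[Status], streaming: Sequence[Status]) -> dict[str, float]:
    """Return the fraction of aligned (block, bin) cells in each category."""
    if len(batch) != len(streaming):
        raise ValueError("both systems must cover the same (block, bin) grid")
    counts: Counter = Counter()
    for b, s in zip(batch, streaming):
        if b is None and s is None:
            counts["neither"] += 1   # no report from either system
        elif b is None or s is None:
            counts["only_one"] += 1  # exactly one system reports
        elif b == s:
            counts["agree"] += 1     # both report the same state
        else:
            counts["disagree"] += 1  # both report, with different states
    categories = ("agree", "disagree", "only_one", "neither")
    total = len(batch)
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: counts[c] / total for c in categories}


if __name__ == "__main__":
    batch = ["up", "up", "down", None, "up"]
    streaming = ["up", "down", "down", "up", None]
    print(compare_reports(batch, streaming))
    # {'agree': 0.4, 'disagree': 0.2, 'only_one': 0.4, 'neither': 0.0}
```

Keeping "only one reports" separate from "disagree" matters here: a streaming system that has not yet decided is not contradicting the batch result, it is simply silent.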

Live data from Trinocular streams into our outage website 24×7. The specific data used in this paper is available from our website.

This work is partially supported by the project “CNS Core: Small: Event Identification and Evaluation of Internet Outages (EIEIO)” (CNS-2007106) through the U.S. National Science Foundation, and by an REU supplement to that project. Erica Stutz began this work at Swarthmore College, working remotely for the University of Southern California; her current affiliation is Yale University.

Tags ant, eieio, internetmap, isi, measurement systems, outage detection, papers, pimawat, reu, TMA, Trinocular, usc

Uncategorized

Adam Russell Interviews John Heidemann about Network Research

Post author By johnh
Post date 2025-01-06

As part of the ISI/nsiders podcast, Adam Russell, an anthropologist and director of ISI’s AI division, is interviewing a number of researchers at ISI.

He recently interviewed John Heidemann about John’s networking research on measuring the Internet.

See https://www.isi.edu/isi-insiders-podcast/ for the series, and https://rss.com/podcasts/isi-nsiders/1804707/ for Season 1, Episode 3 (about 20 minutes), his interview of John Heidemann.

Tags census, internet measurement, interview, isi, outage detection, podcast, usc

Uncategorized

brief Internet outage in Bangladesh

Post author By johnh
Post date 2024-08-05

This morning, from about 2024-08-05t04:50Z (10:50am local time) to t07:40Z, Bangladesh had another very large Internet outage. Fortunately, unlike the outage that began on 2024-07-18, this one cleared up after about three hours. I presume this outage corresponds to the resignation of the prime minister.

We hope for calm for the people of Bangladesh.

Tags Bangladesh, events, Internet outage, isi, outage, outage detection, Trinocular, usc

Uncategorized

new technical report “Reasoning about Internet Connectivity”

Post author By johnh
Post date 2024-07-26

We have released a new technical report: “Reasoning about Internet Connectivity”, available at https://arxiv.org/abs/2407.14427.

From the abstract:

Innovation in the Internet requires a global Internet core to enable communication between users in ISPs and services in the cloud. Today, this Internet core is challenged by partial reachability: political pressure threatens fragmentation by nationality, architectural changes such as carrier-grade NAT make connectivity conditional, and operational problems and commercial disputes make reachability incomplete for months. We assert that partial reachability is a fundamental part of the Internet core. While some systems paper over partial reachability, this paper is the first to provide a conceptual definition of the Internet core so we can reason about reachability from first principles. Following the Internet design, our definition is guided by reachability, not authority. Its corollaries are peninsulas: persistent regions of partial connectivity; and islands: when networks are partitioned from the Internet core. We show that the concept of peninsulas and islands can improve existing measurement systems. In one example, they show that RIPE’s DNSmon suffers misconfiguration and persistent network problems that are important, but risk obscuring operationally important connectivity changes because they are 5x to 9.7x larger. Our evaluation also informs policy questions, showing no single country or organization can unilaterally control the Internet core.
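To make the peninsula and island vocabulary concrete, here is a deliberately simplified sketch. It is not the report's definition of the Internet core: it assumes that, for one target network and one probing round, we have a yes/no reachability result from each of several vantage points. The vantage-point names, the function names, and the `min_rounds` persistence threshold are all hypothetical. Partial reachability that lasts across rounds is treated as a peninsula candidate; a target that no outside vantage point can reach looks the same whether it is down or an island, because telling those apart needs a view from inside the target network.

```python
from enum import Enum
from typing import Mapping, Sequence


class Reachability(Enum):
    FULL = "full"        # every vantage point reached the target
    PARTIAL = "partial"  # some, but not all, vantage points reached it
    NONE = "none"        # no vantage point reached it (outage or island)


def classify(observations: Mapping[str, bool]) -> Reachability:
    """Classify one target from one round; maps hypothetical vantage-point name -> reached?"""
    if not observations:
        raise ValueError("need at least one vantage point")
    reached = sum(observations.values())
    if reached == len(observations):
        return Reachability.FULL
    if reached == 0:
        return Reachability.NONE
    return Reachability.PARTIAL


def is_persistent_peninsula(rounds: Sequence[Mapping[str, bool]], min_rounds: int) -> bool:
    """True if partial reachability lasts for at least `min_rounds` consecutive rounds."""
    run = 0
    for obs in rounds:
        run = run + 1 if classify(obs) is Reachability.PARTIAL else 0
        if run >= min_rounds:
            return True
    return False
```

Requiring several consecutive partial rounds is one simple way to separate a persistent peninsula from a brief routing hiccup; the right threshold would depend on the probing interval.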

This technical report is joint work of Guillermo Baltra, Tarang Saluja, Yuri Pradkin, and John Heidemann, done at USC/ISI. This work was supported by the NSF via the EIEIO and InternetMap projects.

Tags Internet topology, isi, islands, measurement systems, outage detection, papers, partial connectivity, peninsulas, technical report, usc

Uncategorized

major Internet outage in Bangladesh

Post author By johnh
Post date 2024-07-18

Since around 2024-07-18t15:00Z (July 18, 21:00 local time), Bangladesh has had a major, country-wide Internet outage. As of t17:30Z, some regions see 97% unreachability. This country-wide outage seems to be in response to civil unrest and protests.

Here’s the view from Trinocular outage detection as of 17:30Z:

We wish the best for the people of Bangladesh during this unrest.

Update July 19 morning: A day after Bangladesh’s Internet connectivity first went down, it remains nearly completely stopped. Here is our view of Bangladeshi connectivity at 2024-07-19t14:40Z (20:40 local time there):

Update July 19 afternoon: USC/ISI posted an article about the Bangladeshi Internet outage and our work as ISI news, and the NYT has a new article about the protests.

The AP reports “A statement from the country’s Telecommunication Regulatory Commission said they were unable to ensure service after their data center was attacked Thursday by demonstrators, who set fire to some equipment. The Associated Press was not able to independently verify this.” However, the near-complete outage observed by Trinocular (as seen in the figures above) seems inconsistent with problems at a single datacenter.

Update July 19, 22:28Z: ISOC Pulse has a post about this outage, and reports that “In a press event on 18 July, Bangladesh minister for posts, telecommunications, and information technology, Zunaid Ahmed Palak confirmed that the government had ordered the shutdown.”

Update July 20: The country-wide outage continues.

Update July 21, 17:00Z: Although recent news reports suggest some government response to protests, the near-complete country-wide Internet outage continues.

Update July 22, 23:00Z: Another day with no externally visible change–all of Bangladesh remains inaccessible from outside.

Update July 23, 18:00Z: Beginning around 13:00Z (19:00 in Bangladesh), we see the first signs of Bangladeshi networks coming back online! The figure below is as of 16:26Z and shows about half of the national networks reachable from outside the country.

Regarding the root cause, the Deccan Herald published an article from Reuters quoting Zunaid Ahmed Palak, junior information technology minister, as saying to reporters on July 18: “Mobile internet has been temporarily suspended due to various rumors and the unstable situation created…. on social media”. Today, Reuters quoted Palak as saying that “broadband internet would be restored by Tuesday night but [he] did not comment on mobile internet”. This statement is consistent with the country-wide outage we observed, and the earlier statement suggests the government requested the outage.

Update July 24, 13:00Z (19:00 in Bangladesh): It looks like nearly all Bangladeshi networks are now back online.

Update July 25: The July 25 episode of The Briefing, an Australian news podcast, discussed the Bangladeshi outage and its impact, interviewing us about what we saw.

Tags Bangladesh, events, Internet outage, isi, outage, outage detection, Trinocular, usc

Uncategorized

Hurricane Beryl, as seen through Internet Outages

Post author By johnh
Post date 2024-07-08

Hurricane Beryl made landfall in Texas around 2024-07-08 at 3:17am local time (CDT, 08:17 UTC). We see a fair number of Internet outages in the Houston area, presumably as people lost power due to flooding.

Compared to our view of Hurricane Harvey in 2017 on our blog and website, Beryl looks much less severe: we see fewer areas where most Internet access is out (shown as red circles).

Our most recent data, about 10 hours after landfall (1:33pm local time, or 2024-07-08t18:33Z):

Just before landfall, at 3:17am local time (2024-07-08t08:17Z):

We wish the best for Texas, and for the residents of the Caribbean who experienced Beryl last week.

For current status, please see our near-real-time outage site. Data about this outage will be released at the end of the quarter.

Tags events, isi, outage detection, outages, Trinocular, usc

Uncategorized

large Internet outage in the country Georgia

Post author By johnh
Post date 2024-04-25

Starting on April 21, 2024, we observed a large Internet outage in the country Georgia. More than half the IP blocks in large parts of the country have become unreachable from the U.S., with the problem persisting for several days so far.

The timing of this outage is consistent with a recent resurgence of protests over the Georgian “Law on Transparency of Foreign Influence”.

Tags events, Georgia, Internet outage, isi, outage, outage detection, Trinocular, usc

Uncategorized

large Internet outage in West Africa

Post author By johnh
Post date 2024-03-14

On March 14, 2024, we observed a large outage in several West African countries. In Ivory Coast and Liberia, the outage was quite severe, affecting 93% of the active network blocks:

Serious Internet outages in Ivory Coast, beginning 2024-03-14t09:00Z.

Fortunately, some locations were able to partially recover from the problems, presumably by routing through different paths:

Lagos, Nigeria, showed outages starting at 2024-03-14t08:00Z, with a partial recovery around t15:00Z.

The root cause for these outages is likely problems in multiple undersea telecommunication cables, as has been reported in the Washington Post and the Guardian, among other places.

Tags events, isi, outage detection, Trinocular, usc

Uncategorized

new conference paper: Ebb and Flow: Implications of ISP Address Dynamics

Post author By johnh
Post date 2024-01-29

Our new paper “Ebb and Flow: Implications of ISP Address Dynamics” will appear at the 2024 Conference on Passive and Active Measurements (PAM 2024).

From the abstract:

Address dynamics are changes in IP address occupation as users come and go and ISPs renumber them for privacy or for routing maintenance. Address dynamics affect address reputation services, IP geolocation, network measurement, and outage detection, with implications for Internet governance, e-commerce, and science. While prior work has identified diurnal trends in address use, we show the effectiveness of Multi-Seasonal-Trend using Loess decomposition to identify both daily and weekly trends. We use ISP-wide dynamics to develop IAS, a new algorithm that is the first to automatically detect ISP maintenance events that move users in the address space. We show that 20% of such events result in /24 IPv4 address blocks that become unused for days or more, and that accounting for them corrects nearly 41k false outages per quarter. Our analysis provides a new understanding of ISP address use: while only about 2.8% of ASes (1,730) are diurnal, some diurnal ASes show more than 20% changes each day. It also shows greater fragmentation in IPv4 address use compared to IPv6.
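The abstract relies on Multi-Seasonal-Trend using Loess (MSTL) decomposition to separate daily and weekly patterns in address use. As a dependency-free illustration of the idea only, the sketch below swaps Loess for a much simpler method: it assumes an hourly count of active addresses with no long-term trend, subtracts the mean level, then estimates the daily (24-hour) and weekly (168-hour) components as per-phase averages, shortest period first. A real analysis would use an MSTL implementation such as `statsmodels.tsa.seasonal.MSTL`. The function names here, and the `daily_swing` ratio used as a rough diurnal indicator, are hypothetical and not taken from the paper.

```python
from statistics import fmean
from typing import Sequence


def period_means(values: Sequence[float], period: int) -> list[float]:
    """Mean of `values` at each phase 0..period-1 (e.g. each hour of the day)."""
    buckets: list[list[float]] = [[] for _ in range(period)]
    for i, v in enumerate(values):
        buckets[i % period].append(v)
    return [fmean(b) for b in buckets]


def decompose(counts: Sequence[float], periods: Sequence[int] = (24, 168)) -> dict[str, list[float]]:
    """Split hourly counts into level, one seasonal part per period, and residual.

    Simplified stand-in for MSTL: seasonal parts are per-phase means rather
    than Loess fits, and the trend is assumed flat (a single mean level).
    """
    if not periods or len(counts) < 2 * max(periods):
        raise ValueError("need at least two full cycles of the longest period")
    n = len(counts)
    level = fmean(counts)
    remainder = [c - level for c in counts]
    parts: dict[str, list[float]] = {"level": [level] * n}
    for p in sorted(periods):  # remove the shortest cycle first
        means = period_means(remainder, p)
        seasonal = [means[i % p] for i in range(n)]
        remainder = [r - s for r, s in zip(remainder, seasonal)]
        parts[f"seasonal_{p}"] = seasonal
    parts["residual"] = remainder
    return parts


def daily_swing(parts: dict[str, list[float]]) -> float:
    """Hypothetical diurnal indicator: daily peak-to-trough swing relative to the level."""
    daily = parts["seasonal_24"]
    return (max(daily) - min(daily)) / parts["level"][0]
```

Removing the daily component before estimating the weekly one keeps the 24-hour pattern from being absorbed into the weekly component, since 168 hours is a multiple of 24. For example, two weeks of hourly counts with a level of 100 addresses and a ±10 day/night swing give a `daily_swing` of 0.2, a 20% change each day.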

This paper is joint work of Guillermo Baltra, Xiao Song, and John Heidemann. Datasets from this paper can be found at https://ant.isi.edu/datasets/outage. This work was supported by NSF (MINCEQ, NSF 2028279; EIEIO, CNS-2007106).

Tags address, ant, eieio, Internet address usage, isi, measurement systems, minceq, outage detection, papers, usc

Uncategorized

congratulations to Guillermo Baltra for his PhD

Post author By johnh
Post date 2023-11-03

I would like to congratulate Dr. Guillermo Baltra for defending his PhD at the University of Southern California in August 2023 and completing his doctoral dissertation “Improving network reliability using a formal definition of the Internet core”.

Guillermo Baltra (right) and his thesis advisor.

From the abstract:

After 50 years, the Internet is still defined as “a collection of interconnected networks”. Yet seamless, universal connectivity is challenged in several ways. Political pressure threatens fragmentation due to de-peering; architectural changes such as carrier-grade NAT and the cloud make connectivity indirect; firewalls impede connectivity; and operational problems and commercial disputes all challenge the idea of a single set of “interconnected networks”. We propose that a new, conceptual definition of the Internet core helps disambiguate questions in analysis of network reliability and address space usage.

We prove this statement through three studies. First, we improve coverage of outage detection by dealing with sparse sections of the Internet, increasing coverage from a nominal 67% of responsive /24 blocks to 96% of the responsive Internet. Second, we provide a new definition of the Internet core, and use it to resolve partial reachability ambiguities. We show that the Internet today has peninsulas of persistent, partial connectivity, and that some outages cause islands where the Internet at the site is up, but partitioned from the main Internet. Finally, we use our definition to identify ISP trends, with applications to policy and improving outage detection accuracy. We show how these studies together thoroughly prove our thesis statement. We provide a new conceptual definition of “the Internet core” in our second study about partial reachability. We use our definition in our first and second studies to disambiguate questions about network reliability, and in our third study to analyze ISP address space usage dynamics.

Guillermo’s PhD work was supported by NSF grants CNS-1806785, CNS-2007106, and NSF-2028279, DHS S&T Cyber Security Division contract 70RSAT18CB0000014, and a DHS contract administered by AFRL as contract FA8750-18-2-0280 to USC Viterbi, and by the Armada de Chile and the Agencia Nacional de Investigación y Desarrollo de Chile (ANID).

Please see his individual publications for what data is available from his research; his results are also in use in ongoing Trinocular outage detection datasets.

Tags datasets, dhs, dissertation, divoice, duoi, eieio, iiovadr, isi, lacanic, measurement systems, nsf, outage detection, phd, usc