Tag: measurement systems

new conference paper “Quantifying Differences Between Batch and Streaming Detection of Internet Outages” in TMA 2025

Post author By elstutz
Post date 2025-05-26

The paper “Quantifying Differences Between Batch and Streaming Detection of Internet Outages” will appear in the 2025 Conference on Network Traffic Measurement and Analysis (TMA) June 10-13, 2025 in Copenhagen, Denmark. The batch and streaming datasets are available for download.

Visual representation of outages from 2021-03-01T22:00Z to 2021-03-03T20:00Z from batch and streaming datasets (Figure 3 from [Stutz23a])

From the paper’s abstract:

A number of different systems today detect outages
in the IPv4 Internet, often using active probing and algorithms
based on Trinocular’s Bayesian inference. Outage detection
methods have evolved, both to provide results in near-real-time,
and adding algorithms to account for important but less common
cases that might otherwise be misinterpreted. We compare two
implementations of active outage detection to see how choices
to optimize for near-real-time results with streaming compare
to designs that use long-term information to maximize accuracy
using batch processing. Examining 8 days of data, starting on
2021-02-26, we show that the two similar systems agree most of
the time, more than 84%. We show that only 0.2% of the time the
algorithms disagree, and 15% of the time only one reports. We
show these differences occur due to streaming’s requirement for
rapid decisions, precluding algorithms that consider long-term
data (days or weeks). These results are important to understand
the trade-offs that occur when balancing timely results with
accuracy. Beyond the two systems we compare, our results
suggest the role that algorithmic differences can have in similar
but different systems, such as the several implementations of
Trinocular-like active probing today.

Live data from Trinocular streams in to our outage website 24×7. The specific data used in this paper is available from our website.

This work is partially supported by the project “CNS Core: Small: Event Identification and Evaluation of Internet Outages (EIEIO)” (CNS-2007106) through the U.S. National Science Foundation, and by an REU supplement to that project. Erica Stutz began this work at Swarthmore College, working remotely for the University of Southern California; her current affiliation is Yale University.

Tags ant, eieio, internetmap, isi, measurement systems, outage detection, papers, pimawat, reu, TMA, Trinocular, usc

Uncategorized

new technical report “Reasoning about Internet Connectivity”

Post author By johnh
Post date 2024-07-26

We have released a new technical report: “Reasoning about Internet Connectivity”, available at https://arxiv.org/abs/2407.14427.

From the abstract:

Innovation in the Internet requires a global Internet core to enable
communication between users in ISPs and services in the cloud. Today, this Internet core is challenged by partial reachability: political pressure
threatens fragmentation by nationality, architectural changes such as
carrier-grade NAT make connectivity conditional, and operational problems and commercial disputes make reachability incomplete for months. We assert that partial reachability is a fundamental part of the Internet core. While some systems paper over partial reachability, this paper is the first to provide a conceptual definition of the Internet core
so we can reason about reachability from first principles. Following
the Internet design, our definition is guided by reachability, not
authority. Its corollaries are peninsulas: persistent regions of
partial connectivity; and islands: when networks are partitioned
from the Internet core. We show that the concept of peninsulas and islands can improve existing measurement systems. In one example,
they show that RIPE’s DNSmon suffers misconfiguration and persistent
network problems that are important, but risk obscuring operationally
important connectivity changes because they are 5x to 9.7x larger. Our evaluation also informs policy questions, showing no single
country or organization can unilaterally control the Internet core.

This technical report is joint work of Guillermo Baltra, Tarang Saluja, Yuri Pradkin, John Heidemann done at USC/ISI. This work was supported by the NSF via the EIEIO and InternetMap projects.

Tags Internet topology, isi, islands, measurement systems, outage detection, papers, partial connectivity, peninsulas, technical report, usc

Uncategorized

congratulations to ASM Rizvi for his PhD

Post author By johnh
Post date 2024-06-20

I would like to congratulate Dr. ASM Rizvi for defending his PhD at the University of Southern California in June 2024 and completing his doctoral dissertation “Mitigating Attacks that Disrupt Online Services Without Changing Existing Protocols”.

From the dissertation abstract:

ASM Rizvi and John Heidemann, after Rizvi's PhD defense.

Service disruption is undesirable in today’s Internet connectivity due to its impacts on enterprise profits, reputation, and user satisfaction. We describe service disruption as any targeted interruptions caused by malicious parties in the regular user-to-service interactions and functionalities that affect service performance and user experience. In this thesis, we propose new methods that tackle service disruptive attacks using measurement without changing existing Internet protocols. Although our methods do not guarantee defense against all the attack types, our example defense systems prove that our methods generally work to handle diverse attacks. To validate our thesis, we demonstrate defense systems against three disruptive attack types. First, we mitigate Distributed Denial-of-Service (DDoS) attacks that target an online service. Second, we handle brute-force password attacks that target the users of a service. Third, we detect malicious routing detours to secure the path from the users to the server. We provide the first public description of DDoS defenses based on anycast and filtering for the network operators. Then, we show the first moving target defense utilizing IPv6 to defeat password attacks. We also demonstrate how regular observation of latency helps cellular users, carriers, and national agencies to find malicious routing detours. As a supplemental outcome, we show the effectiveness of measurements in finding performance issues and ways to improve using existing protocols. These examples show that our idea applies to different network parts, even if we may not mitigate all the attack types.

Rizvi’s PhD work was supported by the U.S. Department of Homeland Security’s HSARPA Cyber Security Division (HSHQDC-17-R-B0004-TTA.02-0006-I, PAADDOS) in a joint project with the Netherlands Organisation for scientific research (4019020199), the U.S. National Science Foundation (grant NSF OAC-1739034, DDIDD; CNS-2319409, PIMAWAT; CRI-8115780, CLASSNET; CNS-1925737, DIINER ) and U.S. DARPA (HR001120C0157, SABRES), and Akamai.

Most data from his papers is available at no cost from ANT; please see specific publications for details.

Tags 5g, akamai, classnet, darpa, datasets, ddidd, ddos, detour detection, dhs, diiner, dissertation, isi, measurement systems, moving target, nsf, paaddos, phd, sabres, usc

Uncategorized

new conference paper: Ebb and Flow: Implications of ISP Address Dynamics

Post author By johnh
Post date 2024-01-29

Our new paper “Ebb and Flow: Implications of ISP Address Dynamics” will appear at the 2024 Conference on Passive and Active Measurements (PAM 2024).

From the abstract:

Address dynamics are changes in IP address occupation as users come and go, ISPs renumber them for privacy or for routing maintenance. Address dynamics affect address reputation services, IP geolocation, network measurement, and outage detection, with implications of Internet governance, e-commerce, and science. While prior work has identified diurnal trends in address use, we show the effectiveness of Multi-Seasonal-Trend using Loess decomposition to identify both daily and weekly trends. We use ISP-wide dynamics to develop IAS, a new algorithm that is the first to automatically detect ISP maintenance events that move users in the address space. We show that 20% of such events result in /24 IPv4 address blocks that become unused for days or more, and correcting nearly 41k false outages per quarter. Our analysis provides a new understanding about ISP address use: while only about 2.8% of ASes (1,730) are diurnal, some diurnal ASes show more than 20% changes each day. It also shows greater fragmentation in IPv4 address use compared to IPv6.

This paper is a joint work of Guillermo Baltra, Xiao Song, and John Heidemann. Datasets from this paper can be found at https://ant.isi.edu/datasets/outage. This work was supported by NSF (MINCEQ, NSF 2028279; EIEIO CNS-2007106.

Tags address, ant, eieio, Internet address usage, isi, measurement systems, minceq, outage detection, papers, usc

Uncategorized

new journal paper: “Deep Dive into NTP Pool’s Popularity and Mapping”

Post author By johnh
Post date 2024-01-16

Our new paper “Deep Dive into NTP Pool’s Popularity and Mapping” will appear in the SIGMETRICS 2024 conference and concurrently in the ACM Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 8, no. 1, March 2024.

From the abstract:

Time synchronization is of paramount importance on the Internet, with the Network Time Protocol (NTP) serving as the primary synchronization protocol. The NTP Pool, a volunteer-driven initiative launched two decades ago, facilitates connections between clients and NTP servers. Our analysis of root DNS queries reveals that the NTP Pool has consistently been the most popular time service. We further investigate the DNS component (GeoDNS) of the NTP Pool, which is responsible for mapping clients to servers. Our findings indicate that the current algorithm is heavily skewed, leading to the emergence of time monopolies for entire countries. For instance, clients in the US are served by 551 NTP servers, while clients in Cameroon and Nigeria are served by only one and two servers, respectively, out of the 4k+ servers available in the NTP Pool. We examine the underlying assumption behind GeoDNS for these mappings and discover that time servers located far away can still provide accurate clock time information to clients. We have shared our findings with the NTP Pool operators, who acknowledge them and plan to revise their algorithm to enhance security.

This paper is a joint work of

Giovane C. M. Moura^1,2, Marco Davids¹, Caspar Schutijser¹, Christian Hesselman^1,3, John Heidemann^4,5, and Georgios Smaragdakis² with 1: SIDN Labs, 2 Technical University, Delft, 3: the University of Twente, 4: the University of Southern California/Information Sciences Institute, 5: USC/Computer Science Dept. This work was supported by the RIPE NCC (via Atlas), the Root Operators and DNS-OARC (for DITL), SIDN Labs time.nl project, the Twente University Centre for Cyber Security Resarch, NSF projects CNS-2212480, CNS-2319409, the European Research Council ResolutioNet (679158), Duth 6G Future Network Services project, the EU programme Horizon Europe grants SEPTON (101094901), MLSysOps (101092912), and TANGO (101070052).

Tags acm, acm pomacs, ant, inttrafmap, isi, measurement systems, ntp, papers, pimawat, RIPE Atlas, SIDN Labs, sigmetrics, tudelft, usc, utwente

Uncategorized

congratulations to Guillermo Baltra for his PhD

Post author By johnh
Post date 2023-11-03

I would like to congratulate Dr. Guillermo Baltra for defending his PhD at the University of Southern California in August 2023 and completing his doctoral dissertation “Improving network reliability using a formal definition of the Internet core”.

Guillermo Baltra (right) and his thesis advisor.

From the abstract:

After 50 years, the Internet is still defined as “a collection of interconnected networks”. Yet seamless, universal connectivity is challenged in several ways. Political pressure threatens fragmentation due to de-peering; architectural changes such as carrier-grade NAT, the cloud makes connectivity indirect; firewalls impede connectivity; and operational problems and commercial disputes all challenge the idea of a single set of “interconnected networks”. We propose that a new, conceptual definition of the Internet core helps disambiguate questions in analysis of network reliability and address space usage.

We prove this statement through three studies. First, we improve coverage of outage detection by dealing with sparse sections of the Internet, increasing from a nominal 67% responsive /24 blocks coverage to 96% of the responsive Internet. Second, we provide a new definition of the Internet core, and use it to resolve partial reachability ambiguities. We show that the Internet today has peninsulas of persistent, partial connectivity, and that some outages cause islands where the Internet at the site is up, but partitioned from the main Internet. Finally, we use our definition to identify ISP trends, with applications to policy and improving outage detection accuracy. We show how these studies together thoroughly prove our thesis statement. We provide a new conceptual definition of “the Internet core” in our second study about partial reachability. We use our definition in our first and second studies to disambiguate questions about network reliability and in our third study, to ISP address space usage dynamics.

Guillermo’s PhD work was supported by NSF grants CNS-1806785, CNS-2007106 and NSF-2028279 and DH S&T Cyber Security Division contract 70RSAT18CB0000014 and a DHS contract administred by AFRL as contract FA8750-18-2-0280, to USC Viterbi, the Armada de Chile, and the Agencia Nacional de Investigación y Desarrollo de Chile (ANID).

Please see his individual publications for what data is available from his research; his results are also in use in ongoing Trinocular outage detection datasets.

Tags datasets, dhs, dissertation, divoice, duoi, eieio, iiovadr, isi, lacanic, measurement systems, nsf, outage detection, phd, usc

Uncategorized

Large Italian Internet Outage

Post author By johnh
Post date 2023-02-06

Recent news reports (for example, Reuters) state that Telecom Italia had a major outage on Sunday, February 5, 2023.

We see evidence for this outage in our Internet outage detection system.

It looks like there were two relatively brief outages, one at 2023-02-05t10:49Z (11:49 local time in Italy) and a smaller one at 11:33Z (12:33 local time). Our monitoring rounds time to about 11 minutes, so the actual events may have been at slightly different times.

These outages were nation-wide, apparently affecting most of Italy. However, it looks like they “only” affected 20-30% of networks, and not all Italian ISPs. We’re happy they were able to recover so quickly.

An outage at Telecom Italia on 2022-02-05 at 10:49 UTC in our outage detection system.

This event shows the importance of global network monitoring.

Tags isi, measurement systems, outage, outage detection, usc

Uncategorized

new paper “Differences in Monitoring the DNS Root Over IPv4 and IPv6” to appear at the IEEE National Symposium for NSF REU Research in Data Science, Systems, and Security

Post author By johnh
Post date 2022-11-16

On December 15, 2022, Tarang Saluja will present the paper “Differences in Monitoring the DNS Root Over IPv4 and IPv6” (by Tarang Saluja, John Heidemann, and Yuri Pradkin) at the IEEE National Symposium for NSF REU Research in Data Science, Systems, and Security.

From the abstract:

The Domain Name System (DNS) is an essential service for the Internet which maps host names to IP addresses. The DNS Root Sever System operates the top of this namespace. RIPE Atlas observes DNS from more than 11k vantage points (VPs) around the world, reporting the reliability of the DNS Root Server System in DNSmon. DNSmon shows that loss rates for queries to the DNS Root are nearly 10% for IPv6, much higher than the approximately 2% loss seen for IPv4. Although IPv6 is “new,” as an operational protocol available to a third of Internet users, it ought to be just as reliable as IPv4. We examine this difference at a finer granularity by investigating loss at individual VPs. We confirm that specific VPs are the source of this difference and identify two root causes: VP islands with routing problems at the edge which leave them unable to access IPv6 outside their LAN, and VP peninsulas which indicate routing problems in the core of the network. These problems account for most of the loss and nearly all of the difference between IPv4 and IPv6 query loss rates. Islands account for most of the loss (half of IPv4 failures and 5/6ths of IPv6 failures), and we suggest these measurement devices should be filtered out to get a more accurate picture of loss rates. Peninsulas account for the main differences between root identifiers, suggesting routing disagreements root operators need to address. We believe that filtering out both of these known problems provides a better measure of underlying network anomalies and loss and will result in more actionable alerts.

Original data from this paper is available from RIPE Atlas (measurement ids are in the paper). We are publishing new results daily on our website (from the RIPE data).

This work was done while Tarang was on his Summer 2022 undergraduate research internship at USC/ISI, with support from NSF grant 2051101 (PI: Jelena Mirkovich). John Heidemann and Yuri Pradkin’s work is supported by NSF through the EIEIO project (CNS-2007106). We thank Guillermo Baltra for his work on islands and peninsulas, as seen in his arXiv report.

Tags b-root, datasets, DNS, isi, islands, measurement systems, papers, partial connectivity, pensinulas, RIPE Atlas, root, swarthmore, undergraduate, usc

Outages Presentations Publications Uncategorized

new poster “Internet Outage Detection Using Passive Analysis” at ACM IMC 2022

Post author By asmaenayet
Post date 2022-10-12

Asma Enayet will present her poster “Internet Outage Detection Using Passive Analysis” by Asma Enayet and John Heidemann at ACM Internet Measurement Conference, Nice, France from October 25-27th, 2022.

We expect the ACM poster abstract (without the poster) to appear at https://doi.org/10.1145/3517745.3563032 in October 2022.

We are making a report available now with the poster abstract and poster at https://doi.org/10.48550/arXiv.2209.13767 as a pre-print.

From the abstract:

Outages from natural disasters, political events, software or hardware issues, and human error place a huge cost on e-commerce ($66k per minute at Amazon). While several existing systems detect Internet outages, these systems are often too inflexible, with fixed parameters across the whole internet with CUSUM-like change detection. We instead propose a system using passive data, to cover both IPv4 and IPv6, customizing parameters for each block to optimize the performance of our Bayesian inference model. Our poster describes our three contributions: First, we show how customizing parameters allows us often to detect outages that are at both fine timescales (5 minutes) and fine spatial resolutions (/24 IPv4 and /48 IPv6 blocks). Our second contribution is to show that, by tuning parameters differently for different blocks, we can scale back temporal precision to cover more challenging blocks. Finally, we show our approach extends to IPv6 and provides the first reports of IPv6 outages.

**IPv6 Coverage**: our source of passive data (B-Root) is incomplete, but it provides similar coverage in both IPv4 and IPv6.

**IPv6 Outages**: Outage rate for IPv6 (12%) is greater than for IPv4 (5.5%) —IPv6 reliability can improve.

This work was supported by NSF grant CNS-2007106 (EIEIO).

Tags b-root, DNS, eieio, imc, isi, measurement systems, outage detection, passive data, usc

Papers Publications

new symposium paper “Visualizing Internet Measurements of Covid-19 Work-from-Home” at IEEE Symposium on REU Research in Data Science, Systems, and Security

Post author By elstutz
Post date 2021-12-06

We published a new paper “Visualizing Internet Measurements of Covid-19 Work-from-Home” by Erica Stutz (Swarthmore College), Yuri Pradkin, Xiao Song, and John Heidemann (USC/ISI) at the Symposium for REU Research in Data Science, Systems, and Security, co-located with IEEE BigData 2021.

From the abstract:

The Covid-19 pandemic disrupted the world as businesses and schools shifted to work-from-home (WFH), and comprehensive maps have helped visualize how those policies changed over time and in different places. We recently developed algorithms that infer the onset of WFH based on changes in observed Internet usage. Measurements of WFH are important to evaluate how effectively policies are implemented and followed, or to confirm policies in countries with less transparent journalism.This paper describes a web-based visualization system for measurements of Covid-19-induced WFH. We build on a web-based world map, showing a geographic grid of observations about WFH. We extend typical map interaction (zoom and pan, plus animation over time) with two new forms of pop-up information that allow users to drill-down to investigate our underlying data.We use sparklines to show changes over the first 6 months of 2020 for a given location, supporting identification and navigation to hot spots. Alternatively, users can report particular networks (Internet Service Providers) that show WFH on a given day.We show that these tools help us relate our observations to news reports of Covid-19-induced changes and, in some cases, lockdowns due to other causes. Our visualization is publicly available at https://covid.ant.isi.edu, as is our underlying data.

Datasets from this work will be available from our website and can be seen now at https://covid.ant.isi.edu. We thank NSF grants 2028279 and CNS-2007106 for supporting this work.

Tags COVID-19, datasets, drill-down, isi, measurement systems, papers, SQL, swarthmore, Trinocular, usc, visualization, work-from-home