new paper “Improving Coverage of Internet Outage Detection in Sparse Blocks”

We will publish a new paper “Improving Coverage of Internet Outage Detection in Sparse Blocks” by Guillermo Baltra and John Heidemann in the Passive and Active Measurement Conference (PAM 2020) in Eugene, Oregon, USA, on March 30, 2020.

From the abstract:

There is a growing interest in carefully observing the reliability of the Internet’s edge. Outage information can inform our understanding of Internet reliability and planning, and it can help guide operations. Active outage detection methods provide results for more than 3M blocks, and passive methods more than 2M, but both are challenged by sparse blocks where few addresses respond or send traffic. We propose a new Full Block Scanning (FBS) algorithm to improve coverage for active scanning by providing reliable results for sparse blocks by gathering more information before making a decision. FBS identifies sparse blocks and takes additional time before making decisions about their outages, thereby addressing previous concerns about false outages while preserving strict limits on probe rates. We show that FBS can improve coverage by correcting 1.2M blocks that would otherwise be too sparse to correctly report, and potentially adding 1.7M additional blocks. FBS can be applied retroactively to existing datasets to improve prior coverage and accuracy.

This paper defines two algorithms: Full Block Scanning (FBS), to address false outages seen in active measurements of sparse blocks, and Lone Address Block Recovery (LABR), to handle blocks with one or two responsive addresses. We show that these algorithms increase coverage, from a nominal 67% (and as low as 53% after filtering) of responsive blocks before to 5.7M blocks, 96% of responsive blocks.
Papers Publications

new conference paper “Detecting ICMP Rate Limiting in the Internet” in PAM 2018

We have published a new conference “Detecting ICMP Rate Limiting in the Internet” in PAM 2018 (the Passive and Active Measurement Conference) in Berlin, Germany.

Figure 4 from [Guo18a] Confirming a block is rate limited with additional probing
Figure 4 from [Guo18a] confirming a bock is rate limited, comparing experimental results with models of rate-limited and non-rate-limited behavior.
From the abstract of our conference paper:

Comparing model and experimental effects of rate limiting (Figure 4 from [Guo18a] )
ICMP active probing is the center of many network measurements. Rate limiting to ICMP traffic, if undetected, could distort measurements and create false conclusions. To settle this concern, we look systematically for ICMP rate limiting in the Internet. We create FADER, a new algorithm that can identify rate limiting from user-side traces with minimal new measurement traffic. We validate the accuracy of FADER with many different network configurations in testbed experiments and show that it almost always detects rate limiting. With this confidence, we apply our algorithm to a random sample of the whole Internet, showing that rate limiting exists but that for slow probing rates, rate-limiting is very rare. For our random sample of 40,493 /24 blocks (about 2% of the responsive space), we confirm 6 blocks (0.02%!) see rate limiting at 0.39 packets/s per block. We look at higher rates in public datasets and suggest that fall-off in responses as rates approach 1 packet/s per /24 block is consistent with rate limiting. We also show that even very slow probing (0.0001 packet/s) can encounter rate limiting of NACKs that are concentrated at a single router near the prober.

Datasets we used in this paper are all public. ISI Internet Census and Survey data (including it71w, it70w, it56j, it57j and it58j census and survey) are available at ZMap 50-second experiments data are from their WOOT 14 paper and can be obtained from ZMap authors upon request.

This conference report is joint work of Hang Guo and  John Heidemann from USC/ISI.

Papers Publications

new conference paper “The Need for End-to-End Evaluation of Cloud Availability” in PAM 2014

The paper “The Need for End-to-End Evaluation of Cloud Availability” was published by PAM 2014 in Marina del Rey, CA (available at

From the abstract:cloud_availability_blog

People’s computing lives are moving into the cloud, making understanding cloud availability increasingly critical. Prior studies of Internet outages have used ICMP-based pings and traceroutes. While these studies can detect network availability, we show that they can be inaccurate at estimating cloud availability. Without care, ICMP probes can underestimate availability because ICMP is not as robust as application-level measurements such as HTTP. They can overestimate availability if they measure reachability of the cloud’s edge, missing failures in the cloud’s back-end. We develop methodologies sensitive to five “nines” of reliability, and then we compare ICMP and end-to-end measurements for both cloud VM and storage services. We show case studies where one fails and the other succeeds, and our results highlight the importance of application-level retries to reach high precision. When possible, we recommend end-to-end measurement with application-level protocols to evaluate the availability of cloud services.

Citation: Zi Hu, Liang Zhu, Calvin Ardi, Ethan Katz-Bassett, Harsha Madhyastha, John Heidemann, Minlan Yu. The Need for End-to-End Evaluation of Cloud Availability. Passive and Active Measurements Conference (PAM). Los Angeles, CA, USA, March 2014.