Categories
Papers Publications

new conference paper “Quantifying Differences Between Batch and Streaming Detection of Internet Outages” in TMA 2025

The paper “Quantifying Differences Between Batch and Streaming Detection of Internet Outages” will appear in the 2025 Conference on Network Traffic Measurement and Analysis (TMA) June 10-13, 2025 in Copenhagen, Denmark. The batch and streaming datasets are available for download.

Visual representation of outages from 2021-03-01T22:00Z to 2021-03-03T20:00Z from batch and streaming datasets (Figure 3 from [Stutz23a])

From the paper’s abstract:

A number of different systems today detect outages
in the IPv4 Internet, often using active probing and algorithms
based on Trinocular’s Bayesian inference. Outage detection
methods have evolved, both to provide results in near-real-time,
and adding algorithms to account for important but less common
cases that might otherwise be misinterpreted. We compare two
implementations of active outage detection to see how choices
to optimize for near-real-time results with streaming compare
to designs that use long-term information to maximize accuracy
using batch processing. Examining 8 days of data, starting on
2021-02-26, we show that the two similar systems agree most of
the time, more than 84%. We show that only 0.2% of the time the
algorithms disagree, and 15% of the time only one reports. We
show these differences occur due to streaming’s requirement for
rapid decisions, precluding algorithms that consider long-term
data (days or weeks). These results are important to understand
the trade-offs that occur when balancing timely results with
accuracy. Beyond the two systems we compare, our results
suggest the role that algorithmic differences can have in similar
but different systems, such as the several implementations of
Trinocular-like active probing today.

Live data from Trinocular streams in to our outage website 24×7. The specific data used in this paper is available from our website.

This work is partially supported by the project “CNS Core: Small: Event Identification and Evaluation of Internet Outages (EIEIO)” (CNS-2007106) through the U.S. National Science Foundation, and by an REU supplement to that project. Erica Stutz began this work at Swarthmore College, working remotely for the University of Southern California; her current affiliation is Yale University.

Categories
Papers Publications

New paper “Bidirectional Anycast/Unicast Probing (BAUP): Optimizing CDN Anycast” at IFIP TMA 2020

We published a new paper “Bidirectional Anycast/Unicast Probing (BAUP): Optimizing CDN Anycast” by Lan Wei (University of Southern California/ ISI), Marcel Flores (Verizon Digital Media Services), Harkeerat Bedi (Verizon Digital Media Services), John Heidemann (University of Southern California/ ISI) at Network Traffic Measurement and Analysis Conference 2020.

From the abstract:

IP anycast is widely used today in Content Delivery Networks (CDNs) and for Domain Name System (DNS) to provide efficient service to clients from multiple physical points-of-presence (PoPs). Anycast depends on BGP routing to map users to PoPs, so anycast efficiency depends on both the CDN operator and the routing policies of other ISPs. Detecting and diagnosing
inefficiency is challenging in this distributed environment. We propose Bidirectional Anycast/Unicast Probing (BAUP), a new approach that detects anycast routing problems by comparing anycast and unicast latencies. BAUP measures latency to help us identify problems experienced by clients, triggering traceroutes to localize the cause and suggest opportunities for improvement. Evaluating BAUP on a large, commercial CDN, we show that problems happens to 1.59% of observers, and we find multiple opportunities to improve service. Prompted by our work, the CDN changed peering policy and was able to significantly reduce latency, cutting median latency in half (40 ms to 16 ms) for regions with more than 100k users.

The data from this paper is publicly available from RIPE Atlas, please see paper reference for measurement IDs.

Categories
Papers Publications

new conference paper “Does Anycast Hang up on You?” in TMA 2017

The paper “Does Anycast hang up on you?” will appear in the 2017 Conference on Network Traffic Measurement and Analysis (TMA) July 21-23, 2017 in Dublin, Ireland.

In each anycast-based DNS root service, there are about 1% VPs see a route flip happens every one or two observation during a week with an observation interval as 4 minutes. (Figure 2 from [Wei17b]).
From the abstract:

Anycast-based services today are widely used commercially, with several major providers serving thousands of important websites. However, to our knowledge, there has been only limited study of how often anycast fails because routing changes interrupt connections between users and their current anycast site. While the commercial success of anycast CDNs means anycast usually work well, do some users end up shut out of anycast? In this paper we examine data from more than 9000 geographically distributed vantage points (VPs) to 11 anycast services to evaluate this question. Our contribution is the analysis of this data to provide the first quantification of this problem, and to explore where and why it occurs. We see that about 1\% of VPs are anycast unstable, reaching a different anycast site frequently (sometimes every query). Flips back and forth between two sites in 10 seconds are observed in selected experiments for given service and VPs. Moreover, we show that anycast instability is persistent for some VPs—a few VPs never see a stable connections to certain anycast services during a week or even longer. The vast majority of VPs only saw unstable routing towards one or two services instead of instability with all services, suggesting the cause of the instability lies somewhere in the path to the anycast sites. Finally, we point out that for highly-unstable VPs, their probability to hit a given site is constant, which means the flipping are happening at a fine granularity—per packet level, suggesting load balancing might be the cause to anycast routing flipping. Our findings confirm the common wisdom that anycast almost always works well, but provide evidence that a small number of locations in the Internet where specific anycast services are never stable.

This paper is joint work of Lan Wei and John Heidemann.  A pre-print of paper is at http://ant.isi.edu/~johnh/PAPERS/Wei17b.pdf, and the datasets from the paper are at https://ant.isi.edu/datasets/anycast/index.html#stability.

Categories
Papers Publications

new conference paper “Do You See Me Now? Sparsity in Passive Observations of Address Liveness” in TMA 2017

The paper “Do You See Me Now? Sparsity in Passive Observations of Address Liveness” will appear in the 2017 Conference on Network Traffic Measurement and Analyais (TMA) July 21-23, 2017 in Dublin, Ireland.   The datasets from the paper that we can make public will be at https://ant.isi.edu/datasets/sparsity/.

Visibility of addresses and blocks from possible /24 virtual monitors (Figure 2 from [Mirkovic17a])
From the abstract of the paper:

Accurate information about address and block usage in the Internet has many applications in planning address allocation, topology studies, and simulations. Prior studies used active probing, sometimes augmented with passive observation, to study macroscopic phenomena, such as the overall usage of the IPv4 address space. This paper instead studies the completeness of passive sources: how well they can observe microscopic phenomena such as address usage within a given network. We define sparsity as the limitation of a given monitor to see a target, and we quantify the effects of interest, temporal, and coverage sparsity. To study sparsity, we introduce inverted analysis, a novel approach that uses complete passive observations of a few end networks (three campus networks in our case) to infer what of these networks would be seen by millions of virtual monitors near their traffic’s destinations. Unsurprisingly, we find that monitors near popular content see many more targets and that visibility is strongly influenced by bipartite traffic between clients and servers. We are the first to quantify these effects and show their implications for the study of Internet liveness from passive observations. We find that visibility is heavy-tailed, with only 0.5% monitors seeing more than 10\% of our targets’ addresses, and is most affected by interest sparsity over temporal and coverage sparsity. Visibility is also strongly bipartite. Monitors of a different class than a target (e.g., a server monitor observing a client target) outperform monitors of the same class as a target in 82-99% of cases in our datasets. Finally, we find that adding active probing to passive observations greatly improves visibility of both server and client target addresses, but is not critical for visibility of target blocks. Our findings are valuable to understand limitations of existing measurement studies, and to develop methods to maximize microscopic completeness in future studies.

Categories
Papers Publications

new workshop paper “Assessing Affinity Between Users and CDN Sites” in TMA 2015

The paper “Assessing Affinity Between Users and CDN Sites” (available at http://www.isi.edu/~xunfan/research/Fan15a.pdf) will appear at the Traffic Monitoring and Analysis Workshop in April 2015 in Barcelona, Spain.

From the abstract:

count_cid_per_clientLarge web services employ CDNs to improve user performance. CDNs improve performance by serving users from nearby FrontEnd (FE) Clusters. They also spread users across FE Clusters when one is overloaded or unavailable and others have unused capacity. Our paper is the first to study the dynamics of the user-to-FE Cluster mapping for Google and Akamai from a large range of client prefixes. We measure how 32,000 prefixes associate with FE Clusters in their CDNs every 15 minutes for more than a month. We study geographic and latency effects of mapping changes, showing that 50–70% of prefixes switch between FE Clusters that are very distant from each other (more than 1,000 km), and that these shifts sometimes (28–40% of the time) result in large latency shifts (100 ms or more). Most prefixes see large latencies only briefly, but a few (2–5%) see high latency much of the time. We also find that many prefixes are directed to several countries over the course of a month, complicating questions of jurisdiction.

Citation: Xun Fan, Ethan Katz-Bassett and John Heidemann.Assessing Affinity Between Users and CDN Sites. To appear in Traffic Monitoring and Analysis Workshop. Barcelona, Spain. April, 2015.

All data in this paper is available to researchers at no cost on request. Please see our CDN affinity dataset webpage.

This research is partially sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, HSARPA, Cyber Security Division, BAA 11-01-RIKA and Air Force Re-search Laboratory, Information Directorate under agreement number FA8750-12-2-0344, NSF CNS-1351100, and via SPAWAR Systems Center Pacific under Contract No. N66001-13-C-3001. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwith-standing any copyright notation thereon. The views contained herein are those of the authors and
do not necessarily represent those of DHS or the U.S. Government.