Categories

new conference paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” at ACM IMC 2018

We have published a new paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” in the ACM Internet Measurements Conference (IMC 2018) in Boston, Mass., USA.

From the abstract:

The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes—some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is relatively simple, the \emph{system} has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS \emph{user experience}, and server-side traffic. We find that, for about 30\% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages that last less than cache lifetimes, Caching and retries together allow up to half of the clients to tolerate DDoS attacks longer than cache lifetimes, with 90\% query loss, and almost all clients to tolerate attacks resulting in 50\% packet loss. While clients may get service during an attack, tail-latency increases for clients. For servers, retries during DDoS attacks increase normal traffic up to $8\times$. Our findings about caching and retries help explain why users see service outages from some real-world DDoS events, but minimal visible effects from others.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/dns/#Moura18b_data.

Categories

new technical report “LDplayer: DNS Experimentation at Scale”

We released a new technical report “LDplayer: DNS Experimentation at Scale”, ISI-TR-722, available at https://www.isi.edu/publications/trpublic/pdfs/ISI-TR-722.pdf.

From the abstract:

DNS has evolved over the last 20 years, improving in security and privacy and broadening the kinds of applications it supports. However, this evolution has been slowed by the large installed base with a wide range of implementations that are slow to change. Changes need to be carefully planned, and their impact is difficult to model due to DNS optimizations, caching, and distributed operation. We suggest that experimentation at scale is needed to evaluate changes and speed DNS evolution. This paper presents LDplayer, a configurable, general-purpose DNS testbed that enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. LDplayer provides high fidelity experiments while meeting these requirements through its distributed DNS query replay system, methods to rebuild the relevant DNS hierarchy from traces, and efficient emulation of this hierarchy of limited hardware. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We validate that our system can replay a DNS root traffic with tiny error (+/- 8ms quartiles in query timing and +/- 0.1% difference in query rate). We show that our system can replay queries at 87k queries/s, more than twice of a normal DNS Root traffic rate, maxing out one CPU core used by our customized DNS traffic generator. LDplayer’s trace replay has the unique ability to evaluate important design questions with confidence that we capture the interplay of caching, timeouts, and resource constraints. As an example, we can demonstrate the memory requirements of a DNS root server with all traffic running over TCP, and we identified performance discontinuities in latency as a function of client RTT.

Software developed in this paper is available at https://ant.isi.edu/software/ldplayer/.

Categories

new conference paper “Broad and Load-aware Anycast Mapping with Verfploeter” in IMC 2017

The paper “Broad and Load-aware Anycast Mapping with Verfploeter” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic failover and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as anycast sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M passive VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast deployment for B-Root, and we also report its use of a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

The work in this paper was joint work by Wouter B. de Vries, Ricardo de O. Schmidt (Univ. of Twente), Wes Hardaker, John Heidemann (USC/ISI), Pieter-Tjerk de Boer and Aiko Pras (Univ. of Twente). The datasets used in the paper are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter.

Categories

Evaluation of Hurricane Harvey’s Effects on the Internet’s Edge

On August 25, 2017 Hurricane Harvey made landfall in south Texas, causing widespread property damage, displacing more than 30,000 people, and costing more than 45 lives (as of 2017-09-01).

We sympathize with those were hurt by this disaster, and hope for swift recovery for the region.

We recently examined the effects of Hurricane Harvey on the area using Trinocular, our internet outage detection system.  Two key results:

We see that landfall was followed by widespread Internet outages in the Corpus Christi area, with 40% or more home networks dropping off the Internet.

We see that over the following days, network outages grew in the Houston area, with many networks dropping off the Internet. However, the fraction of networks lost in Houston was much smaller than in the Corpus Christi area.

More details are on our Hurricane Harvey web page.  We will update that page as we get more data in.

The dataset including Hurricane Harvey will be internet_outage_adaptive_a29all-20170702 and will be released in October 2017. Until the full data is released, we have a preliminary dataset through August 2017 available on request.

Categories

new technical report “LDplayer: DNS Experimentation at Scale (abstract with poster)”

We released a new technical report “LDplayer: DNS Experimentation at Scale (abstract with poster)”, ISI-TR-721, available at https://www.isi.edu/publications/trpublic/pdfs/ISI-TR-721.pdf.

The poster abstract and poster (included as part of the technical report) appeared at the poster session at the SIGCOMM 2017 in August 2017 in Los Angeles, CA, USA.

From the abstract:

In the last 20 years the core of the Domain Name System (DNS) has improved in security and privacy, and DNS use broadened from name-to-address mapping to a critical roles in service discovery and anti-spam. However, protocol evolution and expansion of use has been slow because advances must consider a huge and diverse installed base. We suggest that experimentation at scale can fill this gap. To meet the need for experimentation at scale, this paper presents LDplayer, a configurable, general-purpose DNS testbed. LDplayer enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. To meet these requirements while providing high fidelity experiments, LDplayer includes a distributed DNS query replay system and methods to rebuild the relevant DNS hierarchy from traces. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We show the importance of our system to evaluate pressing DNS design questions, using it to evaluate changes in DNSSEC key size.

Categories

We have released a new technical report “Verfploeter: Broad and Load-Aware Anycast Mapping”,by Wouter B. de Vries, Ricardo de O. Schmidt, Wes Haraker, John Heidemann, Pieter-Tjerk de Boer, and Aiko Pras as an ISI technical report ISI-TR-717.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic fail-over and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M virtual VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe.  We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast for B-Root, and we also report its use of a 9-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter .

Categories

new technical report “Detecting ICMP Rate Limiting in the Internet”

We have released a new technical report “Detecting ICMP Rate Limiting in the Internet” as an ISI technical report ISI-TR-717.

From the abstract of our technical report:

Active probing with ICMP is the center of many network measurements, with tools like ping, traceroute, and their derivatives used to map topologies and as a precursor for security scanning. However, rate limiting of ICMP traffic has long been a concern, since undetected rate limiting to ICMP could distort measurements, silently creating false conclusions. To settle this concern, we look systematically for ICMP rate limiting in the Internet. We develop a model for how rate limiting affects probing, validate it through controlled testbed experiments, and create FADER, a new algorithm that can identify rate limiting from user-side traces with minimal requirements for new measurement traffic. We validate the accuracy of FADER with many different network configurations in testbed experiments and show that it almost always detects rate limiting. Accuracy is perfect when measurement probing ranges from 0 to 60 times the rate limit, and almost perfect (95%) with up to 20% packet loss. The worst case for detection is when probing is very fast and blocks are very sparse, but even there accuracy remains good (measurements 60 times the rate limit of a 10% responsive block is correct 65% of the time). With this confidence, we apply our algorithm to a random sample of whole Internet, showing that rate limiting exists
but that for slow probing rates, rate-limiting is very, very rare. For our random sample of 40,493 /24 blocks (about 2\% of the responsive space), we confirm 6 blocks (0.02%!) see rate limiting
at 0.39 packets/s per block. We look at higher rates in public datasets
and suggest that fall-off in responses as rates approach 1 packet/s per /24 block (14M packets/s from the prober to the whole Internet),
is consistent with rate limiting. We also show that even very slow probing (0.0001 packet/s) can encounter rate limiting of NACKs that are concentrated at a single router near the prober.

Datasets we used in this paper are all public. ISI Internet Census and Survey data (including it71w, it70w, it56j, it57j and it58j census and survey) are available at https://ant.isi.edu/datasets/index.html. ZMap 50-second experiments data are from their WOOT 14 paper and can be obtained from ZMap authors upon request.

This technical report is joint work of Hang Guo and  John Heidemann from USC/ISI.

Categories

new conference paper “Does Anycast Hang up on You?” in TMA 2017

The paper “Does Anycast hang up on you?” will appear in the 2017 Conference on Network Traffic Measurement and Analysis (TMA) July 21-23, 2017 in Dublin, Ireland.

From the abstract:

Anycast-based services today are widely used commercially, with several major providers serving thousands of important websites. However, to our knowledge, there has been only limited study of how often anycast fails because routing changes interrupt connections between users and their current anycast site. While the commercial success of anycast CDNs means anycast usually work well, do some users end up shut out of anycast? In this paper we examine data from more than 9000 geographically distributed vantage points (VPs) to 11 anycast services to evaluate this question. Our contribution is the analysis of this data to provide the first quantification of this problem, and to explore where and why it occurs. We see that about 1\% of VPs are anycast unstable, reaching a different anycast site frequently (sometimes every query). Flips back and forth between two sites in 10 seconds are observed in selected experiments for given service and VPs. Moreover, we show that anycast instability is persistent for some VPs—a few VPs never see a stable connections to certain anycast services during a week or even longer. The vast majority of VPs only saw unstable routing towards one or two services instead of instability with all services, suggesting the cause of the instability lies somewhere in the path to the anycast sites. Finally, we point out that for highly-unstable VPs, their probability to hit a given site is constant, which means the flipping are happening at a fine granularity—per packet level, suggesting load balancing might be the cause to anycast routing flipping. Our findings confirm the common wisdom that anycast almost always works well, but provide evidence that a small number of locations in the Internet where specific anycast services are never stable.

This paper is joint work of Lan Wei and John Heidemann.  A pre-print of paper is at http://ant.isi.edu/~johnh/PAPERS/Wei17b.pdf, and the datasets from the paper are at https://ant.isi.edu/datasets/anycast/index.html#stability.

Categories

best paper award at PAM 2017

Congratulations to Ricardo de Oliveira Schmidt (U. Twente), John Heidemann (USC/ISI), and Jan Harm Kuipers (U. Twente) for the award of  best paper at the Conference on Passive and Active Measurement (PAM) 2017 to their paper “Anycast Latency: How Many Sites Are Enough?”.

See our prior blog post for more information about the paper and its data, and the U. Twente blog post about the paper and the SIDN Labs blog post about the paper.

Categories

new talk “Collecting and Visualizing Outages Over the Long Haul” at the AIMS Workshop 2017

John Heidemann gave the talk “Collecting and Visualizing Outages Over the Long Haul” at CAIDA’s Active Internet Measurement (AIMS) Workshop in San Diego, California, USA on March 2, 2017.  Slides are available at http://www.isi.edu/~johnh/PAPERS/Heidemann17b.pdf.
From the abstract:

We have been collecting data about outages in the Internet since Oct. 2014. Our outage detection system, Trinocular, uses active probing from four sites to study about 4 million /24 IPv4 address blocks. Long-duration measurements bring challenges that don’t occur in short observations. Most importantly, our target (“the Internet”) changes as we measure it, as new blocks come on-line, old blocks are reused in different ways, and ISPs observe and sometimes block our traffic. Our measurement platform also sees occasional hardware failures. Visualization can assist detection of these problems, allowing human perception to detect changes in data collection that have not previously been anticipated. This talk will discuss the challenges of long-term outage measurement and describe our new algorithm that scales to support clustering of 4M blocks and 3 months of observations for visualization.
Our visualization is joint work with Yuri Pradkin, and analysis of our long-term outages includes work with Abdulla Alwabel.

This talk draws on work from [Alwabel15a].  Data from this talk is available at https://ant.isi.edu/datasets/outage/, and visualizations can be found at https://ant.isi.edu/outage/browse/.