Categories
Papers Publications

new conference paper “A Look at Router Geolocation in Public and Commercial Databases” in IMC 2017

The paper “A Look at Router Geolocation in Public and Commercial Databases” has appeared in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

Regional breakdown of the geolocation error for the geolocation databases vs. ground truth data.

Internet measurement research frequently needs to map infrastructure components, such as routers, to their physical locations. Although public and commercial geolocation services are often used for this purpose, their accuracy when applied to network infrastructure has not been sufficiently assessed. Prior work focused on evaluating the overall accuracy of geolocation databases, which is dominated by their performance on end-user IP addresses. In this work, we evaluate the reliability of router geolocation in databases. We use a dataset of about 1.64M router interface IP addresses extracted from the CAIDA Ark dataset to examine the country- and city-level coverage and consistency of popular public and commercial geolocation databases. We also create and provide a ground-truth dataset of 16,586 router interface IP addresses and their city-level locations, and use it to evaluate the databases’ accuracy with a regional breakdown analysis. Our results show that the databases are not reliable for geolocating routers and that there is room to improve their country- and city-level accuracy. Based on our results, we present a set of recommendations to researchers concerning the use of geolocation databases to geolocate routers.

The work in this paper was joint work by Manaf Gharaibeh, Anant Shah, Han Zhang, Christos Papadopoulos (Colorado State University), Brad Huffaker (CAIDA / UC San Diego), and Roya Ensafi (University of Michigan). The findings of this work are highlighted in an APNIC blog post “Should we trust the geolocation databases to geolocate routers?”. The ground truth datasets used in the paper are available via IMPACT.

Categories
DNS Papers Publications

new journal paper “Detecting Malicious Activity With DNS Backscatter Over Time” in IEEE/ACM ToN Oct, 2017

The paper “Detecting Malicious Activity With DNS Backscatter Over Time ” appears in EEE/ACM  Transactions on Networking ( Volume: 25, Issue: 5, Oct. 2017 ).

From the abstract:

Network-wide activity is when one computer (the originator) touches many others (the targets). Motives for activity may be benign (mailing lists, CDNs, and research scanning), malicious (spammers and scanners for security vulnerabilities), or perhaps indeterminate (ad trackers). Knowledge of malicious activity may help anticipate attacks, and understanding benign activity may set a baseline or characterize growth. This paper identifies DNS backscatter as a new source of information about network-wide activity. Backscatter is the reverse DNS queries caused when targets or middleboxes automatically look up the domain name of the originator. Queries are visible to the authoritative DNS servers that handle reverse DNS. While the fraction of backscatter they see depends on the server’s location in the DNS hierarchy, we show that activity that touches many targets appear even in sampled observations. We use information about the queriers to classify originator activity using machine learning. Our algorithm has reasonable accuracy and precision (70–80%) as shown by data from three different organizations operating DNS servers at the root or country-level. Using this technique we examine nine months of activity from one authority to identify trends in scanning, identifying bursts corresponding to Heartbleed and broad and continuous scanning of ssh.

This paper furthers our understanding of evolution of malicious network activities from an earlier work that:
(1) Why our machine-learning based classifier (that relies on manually collected labeled data) does not port across physical sites and over time.
(2) Secondly paper recommends how to sustain good learning score over time and provides expected life-time of labeled data.

An excerpt from section III-E (Training Over Time):

Classification (§ III-D) is based on training, yet training accuracy is affected by the evolution of activity—specific examples come and go, and the behavior in each class evolves. Change happens for all classes, but the problem is particularly acute for malicious classes (such as spam) where the adversarial nature of the action forces rapid evolution (see § V).

 

Some datasets used in this paper can be found here:

Categories
Papers Publications

new conference paper “Recursives in the Wild: Engineering Authoritative DNS Servers” in IMC 2017

The paper “Recursives in the Wild: Engineering Authoritative DNS Servers” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

Recursive DNS server selection of authoritatives, per continent. (Figure 4 from [Mueller17b].)
From the abstract:

In In Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to operators how to engineer their name servers to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend to deploy IP anycast at every single authoritative.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/dns/index.html#recursives .

Categories
Papers Publications

new conference paper “Broad and Load-aware Anycast Mapping with Verfploeter” in IMC 2017

The paper “Broad and Load-aware Anycast Mapping with Verfploeter” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic failover and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as anycast sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M passive VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast deployment for B-Root, and we also report its use of a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

Distribution of load across two anycast sites of B-root using Verfploeter.

The work in this paper was joint work by Wouter B. de Vries, Ricardo de O. Schmidt (Univ. of Twente), Wes Hardaker, John Heidemann (USC/ISI), Pieter-Tjerk de Boer and Aiko Pras (Univ. of Twente). The datasets used in the paper are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter.

Categories
Publications Technical Report

new technical report “LDplayer: DNS Experimentation at Scale (abstract with poster)”

We released a new technical report “LDplayer: DNS Experimentation at Scale (abstract with poster)”, ISI-TR-721, available at https://www.isi.edu/publications/trpublic/pdfs/ISI-TR-721.pdf.

The poster abstract and poster (included as part of the technical report) appeared at the poster session at the SIGCOMM 2017 in August 2017 in Los Angeles, CA, USA.

From the abstract:

In the last 20 years the core of the Domain Name System (DNS) has improved in security and privacy, and DNS use broadened from name-to-address mapping to a critical roles in service discovery and anti-spam. However, protocol evolution and expansion of use has been slow because advances must consider a huge and diverse installed base. We suggest that experimentation at scale can fill this gap. To meet the need for experimentation at scale, this paper presents LDplayer, a configurable, general-purpose DNS testbed. LDplayer enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. To meet these requirements while providing high fidelity experiments, LDplayer includes a distributed DNS query replay system and methods to rebuild the relevant DNS hierarchy from traces. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We show the importance of our system to evaluate pressing DNS design questions, using it to evaluate changes in DNSSEC key size.

Categories
Publications Technical Report

new technical report “Recursives in the Wild: Engineering Authoritative DNS Servers”

We have released a new technical report “Recursives in the Wild: Engineering Authoritative DNS Servers”, by Moritz Müller and Giovane C. M. Moura and
Ricardo de O. Schmidt and John Heidemann as an ISI technical report ISI-TR-720.

Recursive DNS server selection of authoritatives, per continent. (Figure 8 from [Mueller17a].)
From the abstract:

In Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to name server operators to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend to deploy IP anycast at every single authoritative.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/dns/index.html#recursives .

Categories
Publications Technical Report

new technical report “Verfploeter: Broad and Load-Aware Anycast Mapping”

We have released a new technical report “Verfploeter: Broad and Load-Aware Anycast Mapping”,by Wouter B. de Vries, Ricardo de O. Schmidt, Wes Haraker, John Heidemann, Pieter-Tjerk de Boer, and Aiko Pras as an ISI technical report ISI-TR-717.

Verfploeter coverage of B-Root. Circle radiuses are how many /24 blocks in each 2×2 degree region go to B-Root, and colored slices indicate which go to LAX and which to MIA. (Figure 2b from [Vries17a], dataset: SBV-5-15).
From the abstract:

IP anycast provides DNS operators and CDNs with automatic fail-over and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M virtual VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe.  We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast for B-Root, and we also report its use of a 9-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter .

 

Categories
Publications Technical Report

new technical report “Detecting ICMP Rate Limiting in the Internet”

We have released a new technical report “Detecting ICMP Rate Limiting in the Internet” as an ISI technical report ISI-TR-717.

From the abstract of our technical report:

Comparing model and experimental effects of rate limiting (Figure 2.a from [Guo17a] )

Active probing with ICMP is the center of many network measurements, with tools like ping, traceroute, and their derivatives used to map topologies and as a precursor for security scanning. However, rate limiting of ICMP traffic has long been a concern, since undetected rate limiting to ICMP could distort measurements, silently creating false conclusions. To settle this concern, we look systematically for ICMP rate limiting in the Internet. We develop a model for how rate limiting affects probing, validate it through controlled testbed experiments, and create FADER, a new algorithm that can identify rate limiting from user-side traces with minimal requirements for new measurement traffic. We validate the accuracy of FADER with many different network configurations in testbed experiments and show that it almost always detects rate limiting. Accuracy is perfect when measurement probing ranges from 0 to 60 times the rate limit, and almost perfect (95%) with up to 20% packet loss. The worst case for detection is when probing is very fast and blocks are very sparse, but even there accuracy remains good (measurements 60 times the rate limit of a 10% responsive block is correct 65% of the time). With this confidence, we apply our algorithm to a random sample of whole Internet, showing that rate limiting exists
but that for slow probing rates, rate-limiting is very, very rare. For our random sample of 40,493 /24 blocks (about 2\% of the responsive space), we confirm 6 blocks (0.02%!) see rate limiting
at 0.39 packets/s per block. We look at higher rates in public datasets
and suggest that fall-off in responses as rates approach 1 packet/s per /24 block (14M packets/s from the prober to the whole Internet),
is consistent with rate limiting. We also show that even very slow probing (0.0001 packet/s) can encounter rate limiting of NACKs that are concentrated at a single router near the prober.

Datasets we used in this paper are all public. ISI Internet Census and Survey data (including it71w, it70w, it56j, it57j and it58j census and survey) are available at https://ant.isi.edu/datasets/index.html. ZMap 50-second experiments data are from their WOOT 14 paper and can be obtained from ZMap authors upon request.

This technical report is joint work of Hang Guo and  John Heidemann from USC/ISI.

Categories
Papers Publications

new conference paper “Does Anycast Hang up on You?” in TMA 2017

The paper “Does Anycast hang up on you?” will appear in the 2017 Conference on Network Traffic Measurement and Analysis (TMA) July 21-23, 2017 in Dublin, Ireland.

In each anycast-based DNS root service, there are about 1% VPs see a route flip happens every one or two observation during a week with an observation interval as 4 minutes. (Figure 2 from [Wei17b]).
From the abstract:

Anycast-based services today are widely used commercially, with several major providers serving thousands of important websites. However, to our knowledge, there has been only limited study of how often anycast fails because routing changes interrupt connections between users and their current anycast site. While the commercial success of anycast CDNs means anycast usually work well, do some users end up shut out of anycast? In this paper we examine data from more than 9000 geographically distributed vantage points (VPs) to 11 anycast services to evaluate this question. Our contribution is the analysis of this data to provide the first quantification of this problem, and to explore where and why it occurs. We see that about 1\% of VPs are anycast unstable, reaching a different anycast site frequently (sometimes every query). Flips back and forth between two sites in 10 seconds are observed in selected experiments for given service and VPs. Moreover, we show that anycast instability is persistent for some VPs—a few VPs never see a stable connections to certain anycast services during a week or even longer. The vast majority of VPs only saw unstable routing towards one or two services instead of instability with all services, suggesting the cause of the instability lies somewhere in the path to the anycast sites. Finally, we point out that for highly-unstable VPs, their probability to hit a given site is constant, which means the flipping are happening at a fine granularity—per packet level, suggesting load balancing might be the cause to anycast routing flipping. Our findings confirm the common wisdom that anycast almost always works well, but provide evidence that a small number of locations in the Internet where specific anycast services are never stable.

This paper is joint work of Lan Wei and John Heidemann.  A pre-print of paper is at http://ant.isi.edu/~johnh/PAPERS/Wei17b.pdf, and the datasets from the paper are at https://ant.isi.edu/datasets/anycast/index.html#stability.

Categories
Papers Publications

new conference paper “Do You See Me Now? Sparsity in Passive Observations of Address Liveness” in TMA 2017

The paper “Do You See Me Now? Sparsity in Passive Observations of Address Liveness” will appear in the 2017 Conference on Network Traffic Measurement and Analyais (TMA) July 21-23, 2017 in Dublin, Ireland.   The datasets from the paper that we can make public will be at https://ant.isi.edu/datasets/sparsity/.

Visibility of addresses and blocks from possible /24 virtual monitors (Figure 2 from [Mirkovic17a])
From the abstract of the paper:

Accurate information about address and block usage in the Internet has many applications in planning address allocation, topology studies, and simulations. Prior studies used active probing, sometimes augmented with passive observation, to study macroscopic phenomena, such as the overall usage of the IPv4 address space. This paper instead studies the completeness of passive sources: how well they can observe microscopic phenomena such as address usage within a given network. We define sparsity as the limitation of a given monitor to see a target, and we quantify the effects of interest, temporal, and coverage sparsity. To study sparsity, we introduce inverted analysis, a novel approach that uses complete passive observations of a few end networks (three campus networks in our case) to infer what of these networks would be seen by millions of virtual monitors near their traffic’s destinations. Unsurprisingly, we find that monitors near popular content see many more targets and that visibility is strongly influenced by bipartite traffic between clients and servers. We are the first to quantify these effects and show their implications for the study of Internet liveness from passive observations. We find that visibility is heavy-tailed, with only 0.5% monitors seeing more than 10\% of our targets’ addresses, and is most affected by interest sparsity over temporal and coverage sparsity. Visibility is also strongly bipartite. Monitors of a different class than a target (e.g., a server monitor observing a client target) outperform monitors of the same class as a target in 82-99% of cases in our datasets. Finally, we find that adding active probing to passive observations greatly improves visibility of both server and client target addresses, but is not critical for visibility of target blocks. Our findings are valuable to understand limitations of existing measurement studies, and to develop methods to maximize microscopic completeness in future studies.