new technical report “LDplayer: DNS Experimentation at Scale”

We released a new technical report “LDplayer: DNS Experimentation at Scale”, ISI-TR-722, available at https://www.isi.edu/publications/trpublic/pdfs/ISI-TR-722.pdf.

From the abstract:

DNS has evolved over the last 20 years, improving in security and privacy and broadening the kinds of applications it supports. However, this evolution has been slowed by the large installed base with a wide range of implementations that are slow to change. Changes need to be carefully planned, and their impact is difficult to model due to DNS optimizations, caching, and distributed operation. We suggest that experimentation at scale is needed to evaluate changes and speed DNS evolution. This paper presents LDplayer, a configurable, general-purpose DNS testbed that enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. LDplayer provides high fidelity experiments while meeting these requirements through its distributed DNS query replay system, methods to rebuild the relevant DNS hierarchy from traces, and efficient emulation of this hierarchy on limited hardware. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We validate that our system can replay DNS root traffic with tiny error (+/- 8ms quartiles in query timing and +/- 0.1% difference in query rate). We show that our system can replay queries at 87k queries/s, more than twice the normal DNS root traffic rate, maxing out one CPU core used by our customized DNS traffic generator. LDplayer’s trace replay has the unique ability to evaluate important design questions with confidence that we capture the interplay of caching, timeouts, and resource constraints. As an example, we can demonstrate the memory requirements of a DNS root server with all traffic running over TCP, and we identified performance discontinuities in latency as a function of client RTT.

Software developed in this paper is available at https://ant.isi.edu/software/ldplayer/.
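
To make the replay-fidelity numbers in the abstract concrete (quartiles of query timing error and relative difference in query rate), here is a minimal sketch of how such metrics could be computed from two lists of query timestamps. This is not LDplayer code: the function names, the millisecond timestamps, and the sample values are all assumptions made for illustration.

```python
import statistics


def timing_error_quartiles(original_ms, replayed_ms):
    """Return (q1, median, q3) of per-query timing error, in milliseconds.

    Both arguments are hypothetical lists of query send times, one entry
    per query, in the same order.
    """
    if len(original_ms) != len(replayed_ms):
        raise ValueError("traces must contain the same number of queries")
    errors = [r - o for o, r in zip(original_ms, replayed_ms)]
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return q1, median, q3


def rate_difference_percent(original_ms, replayed_ms):
    """Relative difference in average query rate, as a percentage."""

    def rate(times):
        duration = times[-1] - times[0]
        if duration <= 0:
            raise ValueError("trace must span a positive duration")
        return (len(times) - 1) / duration

    base = rate(original_ms)
    return 100.0 * (rate(replayed_ms) - base) / base


if __name__ == "__main__":
    # Made-up timestamps (ms); real traces would hold millions of queries.
    original = [100, 110, 120, 130, 140]
    replayed = [98, 109, 120, 131, 142]
    print(timing_error_quartiles(original, replayed))
    print(rate_difference_percent(original, replayed))
```

A replay would match the figures quoted above if the first and third quartiles stay within 8 ms of zero and the rate difference stays within 0.1%.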

 

 

Posted in Uncategorized | Tagged , , , , , , , , , , , , , | Leave a comment

new conference paper “A Look at Router Geolocation in Public and Commercial Databases” in IMC 2017

The paper “A Look at Router Geolocation in Public and Commercial Databases” has appeared in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

Regional breakdown of the geolocation error for the geolocation databases vs. ground truth data.

Internet measurement research frequently needs to map infrastructure components, such as routers, to their physical locations. Although public and commercial geolocation services are often used for this purpose, their accuracy when applied to network infrastructure has not been sufficiently assessed. Prior work focused on evaluating the overall accuracy of geolocation databases, which is dominated by their performance on end-user IP addresses. In this work, we evaluate the reliability of router geolocation in databases. We use a dataset of about 1.64M router interface IP addresses extracted from the CAIDA Ark dataset to examine the country- and city-level coverage and consistency of popular public and commercial geolocation databases. We also create and provide a ground-truth dataset of 16,586 router interface IP addresses and their city-level locations, and use it to evaluate the databases’ accuracy with a regional breakdown analysis. Our results show that the databases are not reliable for geolocating routers and that there is room to improve their country- and city-level accuracy. Based on our results, we present a set of recommendations to researchers concerning the use of geolocation databases to geolocate routers.

The work in this paper was joint work by Manaf Gharaibeh, Anant Shah, Han Zhang, Christos Papadopoulos (Colorado State University), Brad Huffaker (CAIDA / UC San Diego), and Roya Ensafi (University of Michigan). The findings of this work are highlighted in an APNIC blog post “Should we trust the geolocation databases to geolocate routers?”. The ground truth datasets used in the paper are available via IMPACT.


new talk “LocalRoot: Serve Yourself”

Wes Hardaker gave a talk on his LocalRoot project, which allows recursive resolver operators to keep an up-to-date cached copy of the root zone data available at all times. The talk was held in Abu Dhabi on November 1, 2017 at the ICANN annual general meeting during the DNSSEC Workshop. Slides and recorded video are available on the ICANN event page.


new talk “Verfploeter: Broad and Load-Aware Anycast Mapping”

Wes Hardaker gave the talk “Verfploeter: Broad and Load-Aware Anycast Mapping” at DNS-OARC in San Jose, California, USA on September 29, 2017. Slides are available on the event page.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic fail-over and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M virtual VPs, 430 times the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast for B-Root, and we also report its use of a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

A video of the talk is available on YouTube.


new journal paper “Detecting Malicious Activity With DNS Backscatter Over Time” in IEEE/ACM ToN Oct, 2017

The paper “Detecting Malicious Activity With DNS Backscatter Over Time” appears in IEEE/ACM Transactions on Networking (Volume: 25, Issue: 5, Oct. 2017).

From the abstract:

Network-wide activity is when one computer (the originator) touches many others (the targets). Motives for activity may be benign (mailing lists, CDNs, and research scanning), malicious (spammers and scanners for security vulnerabilities), or perhaps indeterminate (ad trackers). Knowledge of malicious activity may help anticipate attacks, and understanding benign activity may set a baseline or characterize growth. This paper identifies DNS backscatter as a new source of information about network-wide activity. Backscatter is the reverse DNS queries caused when targets or middleboxes automatically look up the domain name of the originator. Queries are visible to the authoritative DNS servers that handle reverse DNS. While the fraction of backscatter they see depends on the server’s location in the DNS hierarchy, we show that activity that touches many targets appear even in sampled observations. We use information about the queriers to classify originator activity using machine learning. Our algorithm has reasonable accuracy and precision (70–80%) as shown by data from three different organizations operating DNS servers at the root or country-level. Using this technique we examine nine months of activity from one authority to identify trends in scanning, identifying bursts corresponding to Heartbleed and broad and continuous scanning of ssh.
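
As an illustration of the backscatter signal described in the abstract, the sketch below builds the reverse-DNS (PTR) query name for an originator address and counts how many distinct queriers looked up each originator. It is not the paper's classifier: the log format (pairs of querier address and query name) and the function names are assumptions, and the addresses are documentation-range placeholders.

```python
import ipaddress
from collections import defaultdict


def reverse_name(originator_ip):
    """Return the PTR query name a target would look up for this originator."""
    return ipaddress.ip_address(originator_ip).reverse_pointer


def queriers_per_originator(ptr_queries):
    """Count distinct queriers per reverse name.

    ptr_queries is a hypothetical iterable of (querier_ip, qname) pairs,
    as might be extracted from an authoritative server's query log.
    """
    queriers = defaultdict(set)
    for querier_ip, qname in ptr_queries:
        # Normalize the name: drop the trailing root dot, ignore case.
        queriers[qname.rstrip(".").lower()].add(querier_ip)
    return {name: len(ips) for name, ips in queriers.items()}


if __name__ == "__main__":
    log = [
        ("198.51.100.7", "1.2.0.192.in-addr.arpa."),
        ("198.51.100.8", "1.2.0.192.in-addr.arpa."),
        ("198.51.100.7", "1.2.0.192.in-addr.arpa."),
        ("203.0.113.5", "9.2.0.192.in-addr.arpa."),
    ]
    print(reverse_name("192.0.2.1"))
    print(queriers_per_originator(log))
```

An originator whose reverse name is requested by many distinct queriers is a candidate for network-wide activity; deciding whether that activity is benign or malicious is the classification step the paper addresses.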

This paper extends our earlier work on the evolution of malicious network activity in two ways:
(1) It explains why our machine-learning-based classifier (which relies on manually collected labeled data) does not port across physical sites or over time.
(2) It recommends how to sustain a good learning score over time and gives the expected lifetime of labeled data.

An excerpt from section III-E (Training Over Time):

Classification (§ III-D) is based on training, yet training accuracy is affected by the evolution of activity—specific examples come and go, and the behavior in each class evolves. Change happens for all classes, but the problem is particularly acute for malicious classes (such as spam) where the adversarial nature of the action forces rapid evolution (see § V).

 

Some datasets used in this paper can be found here:


new conference paper “Recursives in the Wild: Engineering Authoritative DNS Servers” in IMC 2017

The paper “Recursives in the Wild: Engineering Authoritative DNS Servers” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

Recursive DNS server selection of authoritatives, per continent. (Figure 4 from [Mueller17b].)

From the abstract:

In the Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to operators on how to engineer their name servers to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend to deploy IP anycast at every single authoritative.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/dns/index.html#recursives .


new conference paper “Broad and Load-aware Anycast Mapping with Verfploeter” in IMC 2017

The paper “Broad and Load-aware Anycast Mapping with Verfploeter” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic failover and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as anycast sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M passive VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast deployment for B-Root, and we also report its use of a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

Distribution of load across two anycast sites of B-root using Verfploeter.

The work in this paper was joint work by Wouter B. de Vries, Ricardo de O. Schmidt (Univ. of Twente), Wes Hardaker, John Heidemann (USC/ISI), Pieter-Tjerk de Boer and Aiko Pras (Univ. of Twente). The datasets used in the paper are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter.
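
The step of adding load information to a catchment map, as described in the abstract, can be sketched as a simple weighted aggregation. This is only an illustration under stated assumptions, not Verfploeter's implementation: the function name, the per-block dictionaries, the "unknown" bucket, and the site labels are all hypothetical.

```python
from collections import defaultdict


def load_fraction_per_site(catchment, load):
    """Estimate the fraction of query load each anycast site would receive.

    catchment: hypothetical map of network block -> anycast site, as
               produced by a catchment measurement.
    load:      hypothetical map of network block -> queries observed in
               prior service logs.
    Blocks with load but no catchment entry are counted as "unknown".
    """
    totals = defaultdict(float)
    for block, queries in load.items():
        site = catchment.get(block, "unknown")
        totals[site] += queries
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {site: queries / grand_total for site, queries in totals.items()}


if __name__ == "__main__":
    catchment = {
        "192.0.2.0/24": "site-A",
        "198.51.100.0/24": "site-B",
        "203.0.113.0/24": "site-A",
    }
    load = {
        "192.0.2.0/24": 600,
        "198.51.100.0/24": 300,
        "203.0.113.0/24": 100,
    }
    print(load_fraction_per_site(catchment, load))
```

Counting blocks alone would split this example two to one between the sites; weighting by observed queries shows why load data matters when the busiest blocks are unevenly distributed.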


Evaluation of Hurricane Harvey’s Effects on the Internet’s Edge

On August 25, 2017 Hurricane Harvey made landfall in south Texas, causing widespread property damage, displacing more than 30,000 people, and costing more than 45 lives (as of 2017-09-01).

We sympathize with those who were hurt by this disaster, and hope for a swift recovery for the region.

We recently examined the effects of Hurricane Harvey on the area using Trinocular, our internet outage detection system.  Two key results:

Trinocular report on outages in Texas after Hurricane Harvey (on 2017-08-28T03:32Z)

We see that landfall was followed by widespread Internet outages in the Corpus Christi area, with 40% or more home networks dropping off the Internet.

We see that over the following days, network outages grew in the Houston area, with many networks dropping off the Internet. However, the fraction of networks lost in Houston was much smaller than in the Corpus Christi area.

More details are on our Hurricane Harvey web page.  We will update that page as we get more data in.

The dataset including Hurricane Harvey will be internet_outage_adaptive_a29all-20170702 and will be released in October 2017. Until the full data is released, we have a preliminary dataset through August 2017 available on request.


new technical report “LDplayer: DNS Experimentation at Scale (abstract with poster)”

We released a new technical report “LDplayer: DNS Experimentation at Scale (abstract with poster)”, ISI-TR-721, available at https://www.isi.edu/publications/trpublic/pdfs/ISI-TR-721.pdf.

The poster abstract and poster (included as part of the technical report) appeared in the poster session of SIGCOMM 2017 in August 2017 in Los Angeles, CA, USA.

From the abstract:

In the last 20 years the core of the Domain Name System (DNS) has improved in security and privacy, and DNS use broadened from name-to-address mapping to critical roles in service discovery and anti-spam. However, protocol evolution and expansion of use has been slow because advances must consider a huge and diverse installed base. We suggest that experimentation at scale can fill this gap. To meet the need for experimentation at scale, this paper presents LDplayer, a configurable, general-purpose DNS testbed. LDplayer enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. To meet these requirements while providing high fidelity experiments, LDplayer includes a distributed DNS query replay system and methods to rebuild the relevant DNS hierarchy from traces. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We show the importance of our system to evaluate pressing DNS design questions, using it to evaluate changes in DNSSEC key size.


new talk “Digging in to Ground Truth in Network Measurements” at the TMA PhD School 2017

John Heidemann gave the talk “Digging in to Ground Truth in Network Measurements” at the TMA PhD School 2017 in Dublin, Ireland on June 19, 2017. Slides are available at https://www.isi.edu/~johnh/PAPERS/Heidemann17c.pdf.

From the abstract:

New network measurements are great–you can learn about the whole world! But new network measurements are horrible–are you sure you learn about the world, and not about bugs in your code or approach? New scientific approaches must be tested and ultimately calibrated against ground truth. Yet ground truth about the Internet can be quite difficult—often network operators themselves do not know all the details of their network. This talk will explore the role of ground truth in network measurement: getting it when you can, alternatives when it’s imperfect, and what we learn when none is available.

 

This talk builds on research over the last decade with many people, and the slides include some discussion from the TMA PhD school audience.

Travel to the TMA PhD school was supported by ACM, ISI, and the DHS Retro-Future Bridge and Outages project.

Update 2017-07-05: The TMA folks have posted video of this “Ground Truth” talk to YouTube if you want to relive the glory of a warm afternoon in Dublin.
