Categories
Uncategorized

new journal paper: “Deep Dive into NTP Pool’s Popularity and Mapping”

Our new paper “Deep Dive into NTP Pool’s Popularity and Mapping” will appear in the SIGMETRICS 2024 conference and concurrently in the ACM Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 8, no. 1, March 2024.

From the abstract:

Number of ASes that are time providers per country (Figure 8 from [Moura24a]).

Time synchronization is of paramount importance on the Internet, with the Network Time Protocol (NTP) serving as the primary synchronization protocol. The NTP Pool, a volunteer-driven initiative launched two decades ago, facilitates connections between clients and NTP servers. Our analysis of root DNS queries reveals that the NTP Pool has consistently been the most popular time service. We further investigate the DNS component (GeoDNS) of the NTP Pool, which is responsible for mapping clients to servers. Our findings indicate that the current algorithm is heavily skewed, leading to the emergence of time monopolies for entire countries. For instance, clients in the US are served by 551 NTP servers, while clients in Cameroon and Nigeria are served by only one and two servers, respectively, out of the 4k+ servers available in the NTP Pool. We examine the underlying assumption behind GeoDNS for these mappings and discover that time servers located far away can still provide accurate clock time information to clients. We have shared our findings with the NTP Pool operators, who acknowledge them and plan to revise their algorithm to enhance security.

This paper is a joint work of

Giovane C. M. Moura1,2, Marco Davids1, Caspar Schutijser1, Christian Hesselman1,3, John Heidemann4,5, and Georgios Smaragdakis2 with 1: SIDN Labs, 2 Technical University, Delft, 3: the University of Twente, 4: the University of Southern California/Information Sciences Institute, 5: USC/Computer Science Dept. This work was supported by the RIPE NCC (via Atlas), the Root Operators and DNS-OARC (for DITL), SIDN Labs time.nl project, the Twente University Centre for Cyber Security Resarch, NSF projects CNS-2212480, CNS-2319409, the European Research Council ResolutioNet (679158), Duth 6G Future Network Services project, the EU programme Horizon Europe grants SEPTON (101094901), MLSysOps (101092912), and TANGO (101070052).

Categories
Uncategorized

new paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” Awarded Best Paper at the Passive and Active Measurement Conference

On March 29, 2022 the paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” by Giovane C. M. Moura, John Heidemann, Wes Hardaker, Pithayuth Charnsethikul, Jeroen Bulten, João M. Ceron, and Cristian Hesselman appeared that the 2022 Passive and Active Measurement Conference. We’re happy that it was awarded Best Paper for this year’s conference!

From the abstract:

Google latency for .nl before (left red area) and after (middle green area) DNS polarization was corrected. Polarization was detected with ENTRADA using the work from this paper.

DNS latency is a concern for many service operators: CDNs exist to reduce service latency to end-users but must rely on global DNS for reachability and load-balancing. Today, DNS latency is monitored by active probing from distributed platforms like RIPE Atlas, with Verfploeter, or with commercial services. While Atlas coverage is wide, its 10k sites see only a fraction of the Internet. In this paper we show that passive observation of TCP handshakes can measure live DNS latency, continuously, providing good coverage of current clients of the service. Estimating RTT from TCP is an old idea, but its application to DNS has not previously been studied carefully. We show that there is sufficient TCP DNS traffic today to provide good operational coverage (particularly of IPv6), and very good temporal coverage (better than existing approaches), enabling near-real time evaluation of DNS latency from real clients. We also show that DNS servers can optionally solicit TCP to broaden coverage. We quantify coverage and show that estimates of DNS latency from TCP is consistent with UDP latency. Our approach finds previously unknown, real problems: DNS polarization is a new problem where a hypergiant sends global traffic to one anycast site rather than taking advantage of the global anycast deployment. Correcting polarization in Google DNS cut its latency from 100ms to 10ms; and from Microsoft Azure cut latency from 90ms to 20ms. We also show other instances of routing problems that add 100-200ms latency. Finally, real-time use of our approach for a European country-level domain has helped detect and correct a BGP routing misconfiguration that detoured European traffic to Australia. We have integrated our approach into several open source tools: Entrada, our open source data warehouse for DNS, a monitoring tool (ANTS), which has been operational for the last 2 years on a country-level top-level domain, and a DNS anonymization tool in use at a root server since March 2021.

The tools we developed in this paper are freely available, including patches to Knot, improvements to dnsanon, improvements to ENTRADA, and the new tool Anteater. Unfortunately data from the paper was from operational DNS systems and so cannot be shared due to privacy concerns.

This paper was made in part through DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDOS) and by NWO, NSF CNS-1925737 (DIINER), and the Conconrdia Project, an European Union’s Horizon 2020 Research and Innovation program under Grant Agreement No 830927.

Categories
Uncategorized

the tsuNAME vulnerability in DNS

On 2020-05-06, researchers at SIDN Labs, (the .nl registry), InternetNZ (the .nz registry) , and at the Information Science Institute at the University of Southern California publicly disclosed tsuNAME, a vulnerability in some DNS resolver software that can be weaponized to carry out DDoS attacks against authoritative DNS servers.

TsuNAME is a problem that results from cyclic dependencies in DNS records, where two NS records point at each other. We found that some recursive resolvers would follow this cycle, greatly amplifying an initial queries and stresses the authoritative servers providing those records.

Our technical report describes a tsuNAME related event observed in 2020 at the .nz authoritative servers, when two domains were misconfigured with cyclic dependencies. It caused the total traffic to growth by 50%. In the report, we show how an EU-based ccTLD experienced a 10x traffic growth due to cyclic dependent misconfigurations.

We refer DNS operators and developers to our security advisory that provides recommendations for how to mitigate or detect tsuNAME.

We have also created a tool, CycleHunter, for detecting cyclic dependencies in DNS zones. Following responsible disclosure practices, we provided operators and software vendors time to address the problem first. We are happy that Google public DNS and Cisco OpenDNS both took steps to protect their public resolvers, and that PowerDNS and NLnet have confirmed their current software is not affected.

Categories
Papers Publications

new conference paper “Cache Me If You Can: Effects of DNS Time-to-Live” at ACM IMC 2019

We will publish a new paper “Cache Me If You Can: Effects of DNS Time-to-Live” by Giovane C. M. Moura, John Heidemann, Ricardo de O. Schmidt, and Wes Hardaker, in the ACM Internet Measurements Conference (IMC 2019) in Amsterdam, the Netherlands.

From the abstract:

Figure 10a from [Moura19b], showing the distribution of latency with small TTLs before (right in blue) and with larger TTLs after (left in red) the .uy domain reviewed our work and lengthened their domain’s cache lifetimes to reduce latency to their customers.

DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand due to interactions across the distributed DNS service, where resolvers receive TTLs in different ways (answers and hints), TTLs are specified in multiple places (zones and their parent’s glue), and while DNS resolution must be security-aware. This paper provides the first careful evaluation of how these multiple, interacting factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations in TTL choice for different situations, and for where they must be configured. We show that longer TTLs have significant promise in reducing latency, reducing it from 183ms to 28.7ms for one country-code TLD.

We have also reported on this work at the RIPE and APNIC blogs.

Categories
DNS Papers Publications

new conference paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” at ACM IMC 2018

We have published a new paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” in the ACM Internet Measurements Conference (IMC 2018) in Boston, Mass., USA.

From the abstract:

Caching and retries protect half of clients even with 90% loss and an attack twice the cache duration. (Figure 7c from [Moura18b].)

The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes—some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is relatively simple, the \emph{system} has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS \emph{user experience}, and server-side traffic. We find that, for about 30\% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages that last less than cache lifetimes, Caching and retries together allow up to half of the clients to tolerate DDoS attacks longer than cache lifetimes, with 90\% query loss, and almost all clients to tolerate attacks resulting in 50\% packet loss. While clients may get service during an attack, tail-latency increases for clients. For servers, retries during DDoS attacks increase normal traffic up to $8\times$. Our findings about caching and retries help explain why users see service outages from some real-world DDoS events, but minimal visible effects from others.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/dns/#Moura18b_data.

 

Categories
Publications Technical Report

new technical report “When the Dike Breaks: Dissecting DNS Defenses During DDoS (extended)”

We released a new technical report “When the Dike Breaks: Dissecting DNS Defenses During DDoS (extended)”, ISI-TR-725, available at https://www.isi.edu/~johnh/PAPERS/Moura18a.pdf.

Moura18a Figure 6a, Answers received during a DDoS attack causing 100% packet loss with pre-loaded caches.

From the abstract:

The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes—some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is a relatively simple, the system has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS user experience, and server-side traffic. We find that, for about about 30% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages, and caching and retries allow up to half of the clients to tolerate DDoS attacks that result in 90% query loss, and almost all clients to tolerate attacks resulting in 50% packet loss. The cost of such attacks to clients are greater median latency. For servers, retries during DDoS attacks increase normal traffic up to 8x. Our findings about caching and retries can explain why some real-world DDoS cause service outages for users while other large attacks have minimal visible effects.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/dns/#Moura18a_data.

 

Categories
Papers Publications

new conference paper “Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event” in IMC 2016

The paper “Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event” will appear at ACM Internet Measurement Conference in November 2016 in Santa Monica, California, USA. (available at http://www.isi.edu/~weilan/PAPER/IMC2016camera.pdf)

From the abstract:

RIPE Atlas VPs going to different anycast sites when under stress. Colors indicate different sites, with black showing unsuccessful queries. [Moura16b, figure 11b]

Distributed Denial-of-Service (DDoS) attacks continue to be a major threat in the Internet today. DDoS attacks overwhelm target services with requests or other traffic, causing requests from legitimate users to be shut out. A common defense against DDoS is to replicate the service in multiple physical locations or sites. If all sites announce a common IP address, BGP will associate users around the Internet with a nearby site,defining the catchment of that site. Anycast addresses DDoS both by increasing capacity to the aggregate of many sites, and allowing each catchment to contain attack traffic leaving other sites unaffected. IP anycast is widely used for commercial CDNs and essential infrastructure such as DNS, but there is little evaluation of anycast under stress. This paper provides the first evaluation of several anycast services under stress with public data. Our subject is the Internet’s Root Domain Name Service, made up of 13 independently designed services (“letters”, 11 with IP anycast) running at more than 500 sites. Many of these services were stressed by sustained traffic at 100 times normal load on Nov.30 and Dec.1, 2015. We use public data for most of our analysis to examine how different services respond to the these events. We see how different anycast deployments respond to stress, and identify two policies: sites may absorb attack traffic, containing the damage but reducing service to some users, or they may withdraw routes to shift both good and bad traffic to other sites. We study how these deployments policies result in different levels of service to different users. We also show evidence of collateral damage on other services located near the attacks.

This IMC paper is joint work of  Giovane C. M. Moura, Moritz Müller, Cristian Hesselman (SIDN Labs), Ricardo de O. Schmidt, Wouter B. de Vries (U. Twente), John Heidemann, Lan Wei (USC/ISI). Datasets in this paper are derived from RIPE Atlas and are available at http://traces.simpleweb.org/ and at https://ant.isi.edu/datasets/anycast/.

Categories
Publications Technical Report

new technical report “Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event”

We have released a new technical report “Anycast vs. DDoS: Evaluating the November 2015 Root DNS Event”, ISI-TR-2016-709, available at http://www.isi.edu/~johnh/PAPERS/Moura16a.pdf

From the abstract:

[Moura16a] Figure 3
[Moura16a] Figure 3: reachability at several root letters (anycast instances) during two events with very heavy traffic.

Distributed Denial-of-Service (DDoS) attacks continue to be a major threat in the Internet today. DDoS attacks overwhelm target services with requests or other traffic, causing requests from legitimate users to be shut out. A common defense against DDoS is to replicate the service in multiple physical locations or sites. If all sites announce a common IP address, BGP will associate users around the Internet with a nearby site,defining the catchment of that site. Anycast addresses DDoS both by increasing capacity to the aggregate of many sites, and allowing each catchment to contain attack traffic leaving other sites unaffected. IP anycast is widely used for commercial CDNs and essential infrastructure such as DNS, but there is little evaluation of anycast under stress. This paper provides the first evaluation of several anycast services under stress with public data. Our subject is the Internet’s Root Domain Name Service, made up of 13 independently designed services (“letters”, 11 with IP anycast) running at more than 500 sites. Many of these services were stressed by sustained traffic at 100 times normal load on Nov.30 and Dec.1, 2015. We use public data for most of our analysis to examine how different services respond to the these events. We see how different anycast deployments respond to stress, and identify two policies: sites may absorb attack traffic, containing the damage but reducing service to some users, or they may withdraw routes to shift both good and bad traffic to other sites. We study how these deployments policies result in different levels of service to different users. We also show evidence of collateral damage on other services located near the attacks.

This technical report is joint work of  Giovane C. M. Moura, Moritz Müller, Cristian Hesselman(SIDN Labs), Ricardo de O. Schmidt, Wouter B. de Vries (U. Twente), John Heidemann, Lan Wei (USC/ISI). Datasets in this paper are derived from RIPE Atlas and are available at http://traces.simpleweb.org/ and at https://ant.isi.edu/datasets/.