We will publish a new paper titled “Anycast Agility: Network Playbooks to Fight DDoS” by A S M Rizvi (USC/ISI), Leandro Bertholdo (University of Twente), João Ceron (SIDN Labs), and John Heidemann (USC/ISI) at the 31st USENIX Security Symposium in Aug. 2022.
A sample anycast playbook for a 3-site anycast deployment. Different routing configurations provide different traffic mixes. From [Rizvi22a, Table 5].
From the abstract:
IP anycast is used for services such as DNS and Content Delivery Networks (CDN) to provide the capacity to handle Distributed Denial-of-Service (DDoS) attacks. During a DDoS attack, service operators redistribute traffic between anycast sites to take advantage of sites with unused or greater capacity. Depending on site traffic and attack size, operators may instead concentrate attackers in a few sites to preserve operation in others. Operators use these actions during attacks, but how to do so has not been described systematically or publicly. This paper describes several methods to use BGP to shift traffic when under DDoS, and shows that a response playbook can provide a menu of responses that are options during an attack. To choose an appropriate response from this playbook, we also describe a new method to estimate true attack size, even though the operator’s view during the attack is incomplete. Finally, operator choices are constrained by distributed routing policies, and not all are helpful. We explore how specific anycast deployments can constrain options in this playbook, and are the first to measure how generally applicable they are across multiple anycast networks.
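To make the playbook idea concrete, the selection step can be sketched in a few lines of code: given an estimated attack size and per-site capacities, pick the routing configuration whose expected traffic split keeps every site within capacity, or, failing that, the one that sacrifices the fewest sites. This is only an illustrative sketch; the configuration names, traffic fractions, and capacities below are made up and are not values from the paper, and the BGP mechanics themselves (path prepending, community-based steering) are not shown.

```python
# Hypothetical sketch of choosing a response from an anycast playbook.
# Configuration names, traffic fractions, and capacities are made up for
# illustration; they are not values from the paper.

PLAYBOOK = {
    # routing configuration -> expected fraction of traffic at each site
    "baseline":            {"LAX": 0.50, "MIA": 0.30, "AMS": 0.20},
    "prepend-1x-at-LAX":   {"LAX": 0.30, "MIA": 0.45, "AMS": 0.25},
    "prepend-2x-at-LAX":   {"LAX": 0.15, "MIA": 0.55, "AMS": 0.30},
    "concentrate-at-MIA":  {"LAX": 0.05, "MIA": 0.90, "AMS": 0.05},
}

CAPACITY_GBPS = {"LAX": 40, "MIA": 100, "AMS": 30}   # hypothetical


def choose_response(attack_gbps: float) -> str:
    """Return the first playbook entry that keeps every site under capacity.

    If no configuration can absorb the attack everywhere, fall back to the
    one that overloads the fewest sites (sacrificing them to save the rest).
    """
    best_name, fewest_overloaded = None, len(CAPACITY_GBPS) + 1
    for name, split in PLAYBOOK.items():
        overloaded = [
            site for site, frac in split.items()
            if attack_gbps * frac > CAPACITY_GBPS[site]
        ]
        if not overloaded:
            return name
        if len(overloaded) < fewest_overloaded:
            best_name, fewest_overloaded = name, len(overloaded)
    return best_name


if __name__ == "__main__":
    print(choose_response(60))    # small attack: baseline may suffice
    print(choose_response(300))   # large attack: concentrate and sacrifice
```

In practice the traffic fractions would come from measured catchments for each routing configuration (as in the playbook of Table 5), and the attack size from the estimation method the paper describes.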
Acknowledgments: A S M Rizvi and John Heidemann’s work on this paper is supported, in part, by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I. Joao Ceron and Leandro Bertholdo’s work on this paper is supported by the Netherlands Organisation for Scientific Research (4019020199) and the European Union’s Horizon 2020 research and innovation program (830927). We would like to thank our anonymous reviewers for their valuable feedback. We are also grateful to the PEERING and Tangled admins who allowed us to run measurements. We thank the Dutch National Scrubbing Center for sharing DDoS data with us. We also thank Yuri Pradkin for his help in releasing our datasets.
On November 14 we had a group lunch near ISI to celebrate the completion of Joao Ceron’s visit from the University of Twente as a visiting scholar, to welcome Asma Enayet to the group as a new PhD student, and to welcome Hang Guo’s son into the world. (Hang was understandably not able to make the lunch.) Happy Thanksgiving to all!
A group lunch in honor of Asma (left) and Joao (sixth from left).
Figure 10a from [Moura19b], showing the distribution of latency for the .uy domain with small TTLs before (right, in blue) and with larger TTLs after (left, in red) they reviewed our work and lengthened their domain’s cache lifetimes to reduce latency to their customers.
DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand due to interactions across the distributed DNS service, where resolvers receive TTLs in different ways (answers and hints), TTLs are specified in multiple places (zones and their parent’s glue), all while DNS resolution must be security-aware. This paper provides the first careful evaluation of how these multiple, interacting factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations for TTL choice in different situations, and for where they must be configured. We show that longer TTLs have significant promise in reducing latency, reducing it from 183ms to 28.7ms for one country-code TLD.
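For operators who want to see what this means for their own zones, the TTL a resolver actually returns for a record is easy to inspect programmatically. The sketch below is a minimal example using the third-party dnspython library (an assumption on our part, not a tool from the paper); it only shows the value returned through a recursive resolver, and comparing the child zone’s TTL against the parent’s delegation (glue) TTL would require querying the parent’s name servers directly.

```python
# Minimal sketch: inspect the TTLs that come back for a name's NS and A
# records.  Requires the third-party dnspython package (pip install dnspython);
# the zone name below is only an example.
import dns.resolver


def show_ttls(name: str) -> None:
    for rrtype in ("NS", "A"):
        try:
            answer = dns.resolver.resolve(name, rrtype)
        except dns.resolver.NoAnswer:
            continue
        # answer.rrset.ttl is how long the resolver will cache this record
        print(f"{name} {rrtype}: TTL {answer.rrset.ttl}s")


if __name__ == "__main__":
    show_ttls("example.com")
```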
We have also reported on this work at the RIPE and APNIC blogs.
The PAADDoS project’s goal is to defend against large-scale DDoS attacks by making anycast-based capacity more effective than it is today.
We will work toward this goal by (1) developing tools to map anycast catchments and baseline load, (2) developing methods to plan changes and predict their effects on catchments, (3) developing tools to estimate attack load and assist anycast reconfiguration during an attack, and (4) evaluating and integrating these tools with traditional DoS defenses.
We expect these innovations to improve service resilience in the face of DDoS attacks. Our tools will improve anycast agility during an attack, allowing capacity to be used effectively.
Caching and retries protect half of clients even with 90% loss and an attack lasting twice the cache duration. (Figure 7c from [Moura18b].)
The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes: some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is relatively simple, the system has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS user experience and server-side traffic. We find that, for about 30% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages that last less than cache lifetimes. Caching and retries together allow up to half of the clients to tolerate DDoS attacks longer than cache lifetimes, with 90% query loss, and almost all clients to tolerate attacks resulting in 50% packet loss. While clients may get service during an attack, tail latency increases for clients. For servers, retries during DDoS attacks increase normal traffic up to 8×. Our findings about caching and retries help explain why users see service outages from some real-world DDoS events, but minimal visible effects from others.
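A back-of-envelope calculation helps explain the retry result: if a fraction of queries is dropped during an attack, the chance that at least one of several attempts gets through grows quickly with the number of attempts. The retry counts below are illustrative only, not the resolver behavior measured in the paper.

```python
# Back-of-envelope: probability that at least one of n attempts gets through
# when a fraction `loss` of queries to the server are dropped.  The retry
# counts here are illustrative, not measurements from the paper.
def p_success(loss: float, attempts: int) -> float:
    return 1 - loss ** attempts


for attempts in (1, 2, 4, 8):
    print(f"{attempts} attempts at 90% loss: "
          f"{p_success(0.90, attempts):.0%} chance of an answer")
# 1 attempt ~10%, 2 ~19%, 4 ~34%, 8 ~57%: persistent retries (and caches that
# only need an occasional refresh) are why many clients still get service.
```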
Recursive DNS server selection of authoritatives, per continent. (Figure 4 from [Mueller17b].)
From the abstract:
In the Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to operators on how to engineer their name servers to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend deploying IP anycast at every single authoritative.
IP anycast provides DNS operators and CDNs with automatic failover and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as anycast sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M passive VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast deployment for B-Root, and we also report on its use on a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.
Distribution of load across two anycast sites of B-Root using Verfploeter.
The work in this paper was joint work by Wouter B. de Vries, Ricardo de O. Schmidt (Univ. of Twente), Wes Hardaker, John Heidemann (USC/ISI), Pieter-Tjerk de Boer and Aiko Pras (Univ. of Twente). The datasets used in the paper are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter.
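At a high level, Verfploeter sends probes (ICMP echo requests) addressed from the anycast prefix to a hitlist of target networks; whichever anycast site receives a target’s reply reveals that target’s catchment. The sketch below shows only the aggregation and load-calibration step, assuming per-site reply logs already exist; the file names, format, and load table are hypothetical, and the probing itself (which requires operating the anycast prefix) is not shown.

```python
# Sketch of Verfploeter-style catchment aggregation, assuming each anycast
# site has logged which probed /24 blocks replied to it.  File names, formats,
# and the query-load table are hypothetical.
import csv
from collections import Counter


def load_catchments(site_reply_files: dict[str, str]) -> dict[str, str]:
    """Map each /24 block to the anycast site that captured its reply."""
    catchment = {}
    for site, path in site_reply_files.items():
        with open(path) as f:
            for row in csv.reader(f):      # one /24 per line, e.g. "192.0.2.0/24"
                catchment[row[0]] = site
    return catchment


def predicted_load(catchment: dict[str, str],
                   queries_per_block: dict[str, int]) -> Counter:
    """Weight each block by its historical query load to predict per-site load."""
    load = Counter()
    for block, site in catchment.items():
        load[site] += queries_per_block.get(block, 0)
    return load
```

Weighting each /24 block by its historical query load is what turns a raw catchment map into the calibrated load prediction described in the abstract.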
Recursive DNS server selection of authoritatives, per continent. (Figure 8 from [Mueller17a].)
From the abstract:
In the Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to name server operators to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend deploying IP anycast at every single authoritative.
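The selection strategies the paper distinguishes are easy to caricature in code; the sketch below contrasts latency-preferring, random, and round-robin choices using made-up name server names and RTT values rather than measured resolver behavior.

```python
# Caricatures of the NS-selection strategies recursive resolvers use.
# Names and RTT values are made up; real resolvers mix these behaviors
# and adapt their latency estimates over time.
import itertools
import random

NS_RTT_MS = {"ns1.example": 12, "ns2.example": 45, "ns3.example": 160}  # hypothetical


def pick_lowest_latency(rtts: dict[str, float]) -> str:
    return min(rtts, key=rtts.get)


def pick_random(rtts: dict[str, float]) -> str:
    return random.choice(list(rtts))


round_robin = itertools.cycle(NS_RTT_MS)          # ns1, ns2, ns3, ns1, ...

print(pick_lowest_latency(NS_RTT_MS))             # always the fastest NS
print(pick_random(NS_RTT_MS))                     # load spread, latency ignored
print(next(round_robin), next(round_robin))       # strict alternation
```

If a meaningful share of resolvers choose randomly or round-robin, one slow NS drags their latency down, which is the intuition behind the recommendation that every NS be equally strong.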
We have released a new technical report “Verfploeter: Broad and Load-Aware Anycast Mapping”, by Wouter B. de Vries, Ricardo de O. Schmidt, Wes Hardaker, John Heidemann, Pieter-Tjerk de Boer, and Aiko Pras, as an ISI technical report, ISI-TR-717.
Verfploeter coverage of B-Root. Circle radii show how many /24 blocks in each 2×2 degree region go to B-Root, and colored slices indicate which go to LAX and which to MIA. (Figure 2b from [Vries17a], dataset: SBV-5-15.)
From the abstract:
IP anycast provides DNS operators and CDNs with automatic fail-over and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M virtual VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast deployment for B-Root, and we also report on its use on a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.