Categories
Uncategorized

new conference paper “Anycast in Context: A Tale of Two Systems” at SIGCOMM 2021

We published a new paper “Anycast in Context: A Tale of Two Systems” by Thomas Koch, Ke Li, Calvin Ardi*, Ethan Katz-Bassett, Matt Calder**, and John Heidemann* (of Columbia, where not otherwise indicated, *USC/ISI, and **Microsoft and Columbia) at ACM SIGCOMM 2021.

From the abstract:

Anycast is used to serve content including web pages and DNS, and anycast deployments are growing. However, prior work examining root DNS suggests anycast deployments incur significant inflation, with users often routed to suboptimal sites. We reassess anycast performance, first extending prior analysis on inflation in the root DNS. We show that inflation is very common in root DNS, affecting more than 95% of users. However, we then show root DNS latency hardly matters to users because caching is so effective. These findings lead us to question: is inflation inherent to anycast, or can inflation be limited when it matters? To answer this question, we consider Microsoft’s anycast CDN serving latency-sensitive content. Here, latency matters orders of magnitude more than for root DNS. Perhaps because of this need, only 35% of CDN users experience any inflation, and the amount they experience is smaller than for root DNS. We show that CDN anycast latency has little inflation due to extensive peering and engineering. These results suggest prior claims of anycast inefficiency reflect experiments on a single application rather than anycast’s technical potential, and they demonstrate the importance of context when measuring system performance.

Categories
Publications Students

congratulations to Lan Wei for her new PhD

I would like to congratulate Dr. Lan Wei for defending her PhD in September 2020 and completing her doctoral dissertation “Anycast Stability, Security and Latency in The Domain Name System (DNS) and Content Deliver Networks (CDNs)” in December 2020.

From the abstract:

Clients’ performance is important for both Content-Delivery Networks (CDNs) and the Domain Name System (DNS). Operators would like the service to meet expectations of their users. CDNs providing stable connections will prevent users from experiencing downloading pause from connection breaks. Users expect DNS traffic to be secure without being intercepted or injected. Both CDN and DNS operators care about a short network latency, since users can become frustrated by slow replies.


Many CDNs and DNS services (such as the DNS root) use IP anycast to bring content closer to users. Anycast-based services announce the same IP address(es) from globally distributed sites. In an anycast infrastructure, Internet routing protocols will direct users to a nearby site naturally. The path between a user and an anycast site is formed on a hop-to-hop basis—at each hop} (a network device such as a router), routing protocols like Border Gateway Protocol (BGP) makes the decision about which next hop to go to. ISPs at each hop will impose their routing policies to influence BGP’s decisions. Without globally knowing (also unable to modify) the distributed information of BGP routing table of every ISP on the path, anycast infrastructure operators are unable to predict and control in real-time which specific site a user will visit and what the routing path will look like. Also, any change in routing policy along the path may change both the path and the site visited by a user. We refer to such minimal control over routing towards an anycast service, the uncertainty of anycast routing. Using anycast spares extra traffic management to map users to sites, but can operators provide a good anycast-based service without precise control over the routing?


This routing uncertainty raises three concerns: routing can change, breaking connections; uncertainty about global routing means spoofing can go undetected, and lack of knowledge of global routing can lead to suboptimal latency. In this thesis, we show how we confirm the stability, how we confirm the security, and how we improve the latency of anycast to answer these three concerns. First, routing changes can cause users to switch sites, and therefore break a stateful connection such as a TCP connection immediately. We study routing stability and demonstrate that connections in anycast infrastructure are rarely broken by routing instability. Of all vantage points (VPs), fewer than 0.15% VP’s TCP connections frequently break due to timeout in 5s during all 17 hours we observed. We only observe such frequent TCP connection break in 1 service out of all 12 anycast services studied. A second problem is DNS spoofing, where a third-party can intercept the DNS query and return a false answer. We examine DNS spoofing to study two aspects of security–integrity and privacy, and we design an algorithm to detect spoofing and distinguish different mechanisms to spoof anycast-based DNS. We show that DNS spoofing is uncommon, happening to only 1.7% of all VPs, although increasing over the years. Among all three ways to spoof DNS–injections, proxies, and third-party anycast site (prefix hijack), we show that third-party anycast site is the least popular one. Last, diagnosing poor latency and improving the latency can be difficult for CDNs. We develop a new approach, BAUP (bidirectional anycast unicast probing), which detects inefficient routing with better routing replacement provided. We use BAUP to study anycast latency. By applying BAUP and changing peering policies, a commercial CDN is able to significantly reduce latency, cutting median latency in half from 40ms to 16ms for regional users.

Lan defended her PhD when USC was on work-from-home due to COVID-19; she is the third ANT student with a fully on-line PhD defense.

Categories
Papers Publications

New paper “Bidirectional Anycast/Unicast Probing (BAUP): Optimizing CDN Anycast” at IFIP TMA 2020

We published a new paper “Bidirectional Anycast/Unicast Probing (BAUP): Optimizing CDN Anycast” by Lan Wei (University of Southern California/ ISI), Marcel Flores (Verizon Digital Media Services), Harkeerat Bedi (Verizon Digital Media Services), John Heidemann (University of Southern California/ ISI) at Network Traffic Measurement and Analysis Conference 2020.

From the abstract:

IP anycast is widely used today in Content Delivery Networks (CDNs) and for Domain Name System (DNS) to provide efficient service to clients from multiple physical points-of-presence (PoPs). Anycast depends on BGP routing to map users to PoPs, so anycast efficiency depends on both the CDN operator and the routing policies of other ISPs. Detecting and diagnosing
inefficiency is challenging in this distributed environment. We propose Bidirectional Anycast/Unicast Probing (BAUP), a new approach that detects anycast routing problems by comparing anycast and unicast latencies. BAUP measures latency to help us identify problems experienced by clients, triggering traceroutes to localize the cause and suggest opportunities for improvement. Evaluating BAUP on a large, commercial CDN, we show that problems happens to 1.59% of observers, and we find multiple opportunities to improve service. Prompted by our work, the CDN changed peering policy and was able to significantly reduce latency, cutting median latency in half (40 ms to 16 ms) for regions with more than 100k users.

The data from this paper is publicly available from RIPE Atlas, please see paper reference for measurement IDs.

Categories
DNS Internet

B-root’s new sites reduce latency

B-Root, one of the 13 root DNS servers, deployed three new sites in January 2020, doubling its footprint and adding its first sites in Asia and Europe. How did this growth lower latency to users? We looked at B-Root deployment with Verfploter to answer this question. The end result was that new sites in Asia and Europe allowed users there to resolve DNS names with B-Root with lower latency (see the catchment map below). For more details please review our anycast catchment page.

B-root added 3 new sites in Singapore, Washington, DC, and Amsterdam to their three existing 3 sites in Los Angeles, Chile, and Miami. The graph below shows anycast catchments after these sites were deployed (each color in the pie charts shows traffic to a different site).

Categories
Announcements Projects

new project “Plannning for Anycast as Anti-DDoS” (PAADDoS)

We are happy to announce a new project Plannning for Anycast as Anti-DDoS (PAADDoS).

The PAADDoS project’s goal is to defend against large-scale DDoS attacks by making anycast-based capacity more effective than it is today.

We will work toward this goal by (1) developing tools to map anycast catchments and baseline load, (2) develop methods to plan changes and their effects on catchments, (3) develop tools to estimate attack load and assist anycast reconfiguration during an attack. and (4) evaluate and integration of these tools with traditional DoS defenses.

We expect these innovations to improve service resilience in the face of DDoS attacks. Our tools will improve anycast agility during an attack, allowing capacity to be used effectively.

PAADDoS is a joint effort of the ANT Lab involving USC/ISI (PI: John Heidemann) and the Design and Analysis of Communication Systems group at the University of Twente (PI: Aiko Pras).

PAADDoS is supported by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I, and by NWO.

Categories
DNS Papers Publications

new conference paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” at ACM IMC 2018

We have published a new paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” in the ACM Internet Measurements Conference (IMC 2018) in Boston, Mass., USA.

From the abstract:

Caching and retries protect half of clients even with 90% loss and an attack twice the cache duration. (Figure 7c from [Moura18b].)

The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes—some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is relatively simple, the \emph{system} has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS \emph{user experience}, and server-side traffic. We find that, for about 30\% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages that last less than cache lifetimes, Caching and retries together allow up to half of the clients to tolerate DDoS attacks longer than cache lifetimes, with 90\% query loss, and almost all clients to tolerate attacks resulting in 50\% packet loss. While clients may get service during an attack, tail-latency increases for clients. For servers, retries during DDoS attacks increase normal traffic up to $8\times$. Our findings about caching and retries help explain why users see service outages from some real-world DDoS events, but minimal visible effects from others.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/dns/#Moura18b_data.

 

Categories
Publications Technical Report

new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”

We released a new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”, ISI-TR-724, available at https://www.isi.edu/~johnh/PAPERS/Heidemann18b.pdf.

From the abstract:

Clustering (from our event clustering algorithm) of 2014q3 outages from 172/8, showing 7 weeks including the 2014-08-27 Time Warner outage.

Internet reliability has many potential weaknesses: fiber rights-of-way at the physical layer, exchange-point congestion from DDOS at the network layer, settlement disputes between organizations at the financial layer, and government intervention the political layer. This paper shows that we can discover common points-of-failure at any of these layers by observing correlated failures. We use end-to-end observations from data-plane-level connectivity of edge hosts in the Internet. We identify correlations in connectivity: networks that usually fail and recover at the same time suggest common point-of-failure. We define two new algorithms to meet these goals. First, we define a computationally-efficient algorithm to create a linear ordering of blocks to make correlated failures apparent to a human analyst. Second, we develop an event-based clustering algorithm that directly networks with correlated failures, suggesting common points-of-failure. Our algorithms scale to real-world datasets of millions of networks and observations: linear ordering is O(n log n) time and event-based clustering parallelizes with Map/Reduce. We demonstrate them on three months of outages for 4 million /24 network prefixes, showing high recall (0.83 to 0.98) and precision (0.72 to 1.0) for blocks that respond. We also show that our algorithms generalize to identify correlations in anycast catchments and routing.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/outage/, and we expect to release the software for this paper in the coming months (contact us if you are interested).

Categories
Presentations

new talk “Verfploeter: Broad and Load-Aware Anycast Mapping”

Wes Hardaker gave the talk “Verfploeter: Broad and Load-Aware Anycast Mapping” at DNS-OARC in San Jose, California, USA on September 29, 2017.  Slides are available at on the event page.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic fail-over and reduced latency by breaking the Internet into catchments,each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs),but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M virtual VPs, 430 times the 9k physical VPs in RIPE Atlas,providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast for B-Root, and we also report its use of a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

 

A video of the talk is available On YouTube.

Categories
Papers Publications

new conference paper “Recursives in the Wild: Engineering Authoritative DNS Servers” in IMC 2017

The paper “Recursives in the Wild: Engineering Authoritative DNS Servers” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

Recursive DNS server selection of authoritatives, per continent. (Figure 4 from [Mueller17b].)
From the abstract:

In In Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to operators how to engineer their name servers to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend to deploy IP anycast at every single authoritative.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/dns/index.html#recursives .

Categories
Publications Technical Report

new technical report “Recursives in the Wild: Engineering Authoritative DNS Servers”

We have released a new technical report “Recursives in the Wild: Engineering Authoritative DNS Servers”, by Moritz Müller and Giovane C. M. Moura and
Ricardo de O. Schmidt and John Heidemann as an ISI technical report ISI-TR-720.

Recursive DNS server selection of authoritatives, per continent. (Figure 8 from [Mueller17a].)
From the abstract:

In Internet Domain Name System (DNS), services operate authoritative name servers that individuals query through recursive resolvers. Operators strive to provide reliability by operating multiple name servers (NS), each on a separate IP address, and by using IP anycast to allow NSes to provide service from many physical locations. To meet their goals of minimizing latency and balancing load across NSes and anycast, operators need to know how recursive resolvers select an NS, and how that interacts with their NS deployments. Prior work has shown some recursives search for low latency, while others pick an NS at random or round robin, but did not examine how prevalent each choice was. This paper provides the first analysis of how recursives select between name servers in the wild, and from that we provide guidance to name server operators to reach their goals. We conclude that all NSes need to be equally strong and therefore we recommend to deploy IP anycast at every single authoritative.

All datasets used in this paper (but one) are available at https://ant.isi.edu/datasets/dns/index.html#recursives .