Categories

## new conference paper “Anycast in Context: A Tale of Two Systems” at SIGCOMM 2021

We published a new paper “Anycast in Context: A Tale of Two Systems” by Thomas Koch, Ke Li, Calvin Ardi*, Ethan Katz-Bassett, Matt Calder**, and John Heidemann* (of Columbia, where not otherwise indicated, *USC/ISI, and **Microsoft and Columbia) at ACM SIGCOMM 2021.

From the abstract:

Anycast is used to serve content including web pages and DNS, and anycast deployments are growing. However, prior work examining root DNS suggests anycast deployments incur significant inflation, with users often routed to suboptimal sites. We reassess anycast performance, first extending prior analysis on inflation in the root DNS. We show that inflation is very common in root DNS, affecting more than 95% of users. However, we then show root DNS latency hardly matters to users because caching is so effective. These findings lead us to question: is inflation inherent to anycast, or can inflation be limited when it matters? To answer this question, we consider Microsoft’s anycast CDN serving latency-sensitive content. Here, latency matters orders of magnitude more than for root DNS. Perhaps because of this need, only 35% of CDN users experience any inflation, and the amount they experience is smaller than for root DNS. We show that CDN anycast latency has little inflation due to extensive peering and engineering. These results suggest prior claims of anycast inefficiency reflect experiments on a single application rather than anycast’s technical potential, and they demonstrate the importance of context when measuring system performance.

Categories

## new workshop report “Overcoming Measurement Barriers to Internet Research” (WOMBIR 2021) in ACM CCR

WOMBIR 2021 was the NSF-sponsored Workshop on Overcoming Measurement Barriers to Internet Research. This workshop was hold in two sessions over several days in January and April 2021, chaired by k.c. claffy, David Clark, Fabian Bustamente, John Heidemann, and Mattijs Monjker. The final report includes contributions from Aaron Schulman and Ellen Zegura as well as all the workshop participants.

From the abstract:

In January and April 2021 we held the Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR) with the goal of understanding challenges in network and security data set collection and sharing. Most workshop attendees provided white papers describing their perspectives, and many participated in short-talks and discussion in two virtual workshops over five days. That discussion produced consensus around several points. First, many aspects of the Internet are characterized by decreasing visibility of important network properties, which is in tension with the Internet’s role as critical infrastructure. We discussed three specific research areas that illustrate this tension: security, Internet access; and mobile networking. We discussed visibility challenges at all layers of the networking stack, and the challenge of gathering data and validating inferences. Important data sets require longitudinal (long-term, ongoing) data collection and sharing, support for which is more challenging for Internet research than other fields. We discussed why a combination of technical and policy methods are necessary to safeguard privacy when using or sharing measurement data. Workshop participant proposed several opportunities to accelerate progress, some of which require coordination across government, industry, and academia.

Categories

## new talk “Observing the Global IPv4 Internet: What IP Addresses Show” as an SKC Science and Technology Webinar

John Heidemann gave the talk “Observing the Global IPv4 Internet: What IP Addresses Show” at the SKC Science and Technology Webinar, hosted by Deepankar Medhi (U. Missouri-Kansas City and NSF) on June 18, 2021.  A video of the talk is on YouTube at https://www.youtube.com/watch?v=4A_gFXi2WeY. Slides are available at https://www.isi.edu/~johnh/PAPERS/Heidemann21a.pdf.

From the abstract:

Since 2014 the ANT lab at USC has been observing the visible IPv4 Internet (currently 5 million networks measured every 11 minutes) to detect network outages. This talk explores how we use this large-scale, active measurement to estimate Internet reliability and understand the effects of real-world events such as hurricanes. We have recently developed new algorithms to identify Covid-19-related Work-from-Home and other Internet shutdowns in this data. Our Internet outage work is joint work of John Heidemann, Lin Quan, Yuri Pradkin, Guillermo Baltra, Xiao Song, and Asma Enayet with contributions from Ryan Bogutz, Dominik Staros, Abdulla Alwabel, and Aqib Nisar.

This project is joint work of a number of people listed in the abstract above, and is supported by NSF 2028279 (MINCEQ) and CNS-2007106 (EIEIO). All data from this paper is available at no cost to researchers.

Categories

## the tsuNAME vulnerability in DNS

On 2020-05-06, researchers at SIDN Labs, (the .nl registry), InternetNZ (the .nz registry) , and at the Information Science Institute at the University of Southern California publicly disclosed tsuNAME, a vulnerability in some DNS resolver software that can be weaponized to carry out DDoS attacks against authoritative DNS servers.

TsuNAME is a problem that results from cyclic dependencies in DNS records, where two NS records point at each other. We found that some recursive resolvers would follow this cycle, greatly amplifying an initial queries and stresses the authoritative servers providing those records.

Our technical report describes a tsuNAME related event observed in 2020 at the .nz authoritative servers, when two domains were misconfigured with cyclic dependencies. It caused the total traffic to growth by 50%. In the report, we show how an EU-based ccTLD experienced a 10x traffic growth due to cyclic dependent misconfigurations.

We refer DNS operators and developers to our security advisory that provides recommendations for how to mitigate or detect tsuNAME.

We have also created a tool, CycleHunter, for detecting cyclic dependencies in DNS zones. Following responsible disclosure practices, we provided operators and software vendors time to address the problem first. We are happy that Google public DNS and Cisco OpenDNS both took steps to protect their public resolvers, and that PowerDNS and NLnet have confirmed their current software is not affected.

Categories

## congratulations to Xaio Song for receiving a 2021 USC Viterbi award for MS Student Research

Congratulations to Xiao Song for receiving a 2021 USC Viterbi School of Engineering award for Masters Student Research in the Computer Science Department. This award was on the basis of her work observing work-from-home due to Covid-19, as reported in her poster at the NSF PREPARE-VO Workshop and our arXive technical report.

The award was presented at the May 2021 Viterbi Masters Student Awards Ceremony.

Categories

## congratulations to Abdul Qadeer for his PhD

I would like to congratulate Dr. Abdul Qadeer for defending his PhD at the University of Southern California in March 2021 and completing his doctoral dissertation “Efficient Processing of Streaming Data in Multi-User and Multi-Abstraction Workflows”.

From the abstract:

Ever-increasing data and evolving processing needs force enterprises to scale-out expensive computational resources to prioritize processing for timely results. Teams process their organization’s data either independently or using ad hoc sharing mechanisms. Often different users start with the same data and the same initial stages (decrypt, decompress, clean, anonymize). As their workflows evolve, later stages often diverge, and different stages may work best with different abstractions. The result is workflows with some overlap, some variations, and multiple transitions where data handling changes between continuous, windowed, and per-block. The system processing this diverse, multi-user, multi-abstraction workflow should be efficient and safe, but also must cope with fault recovery.

Analytics from multiple users can cause redundant processing and data, or encounter performance anomalies due to skew. Skew arises due to static or dynamic imbalance in the workflow stages. Both redundancy and skew waste compute resources and add latency to results. When users bridge between multiple abstractions, such as from per-block processing to windowed processing, they often employ custom code. These transitions can be error prone due to corner cases, can easily add latency as an inefficiency, and custom code is often a source of errors and maintenance difficulty. We need new solutions to manage the above challenges and to expose opportunities for data sharing explicitly. Our thesis is: new methods enable efficient processing of multi-user and multi-abstraction workflows of streaming data. We present two new methods for efficient stream processing—optimizations for multi-user workflows, and multiple abstractions for application coverage and efficient bridging.

These algorithms use a pipeline-graph to detect duplication of code and data across multiple users and cleanly delineate workflow stages for skew management. The pipeline-graph is our job description language that allows developers to specify their need easily and enables our system to automatically detect duplication and manage skew. The pipeline-graph acts as a shared canvas for collaboration amongst users to extend each other’s work. To efficiently implement our deduplication and skew management algorithms, we present streaming data to processing stages as fixed-sized but large blocks. Large-blocks have low meta-data overhead per user, provide good parallelism, and help with fault recovery.

Our second method enables applications to use a different abstraction on a different workflow stage. We provide three key abstractions and show that they cover many classes of analytics and our framework can bridge them efficiently. We provide Block-Streaming, Windowed-Streaming, and Stateful-Streaming abstractions. Block-Streaming is suitable for single-pass applications that care about temporal or spatial locality. Windowed-Streaming allows applications to process accumulated data (time-aligned blocks to sync with external information) and reductions like summation, averages, or other MapReduce-style analytics. We believe our three abstractions allow many classes of analytics and enable processing of one block, many blocks, or infinite stream. Plumb allows multiple abstractions in different parts of the workflow and provides efficient bridging between them so that users could make complex analytics from individual stages without worrying about data movement.

Our methods aim for good throughput, low latency, and clean and easy-to-use support for more applications to achieve better efficiency than our prior hand-tuned but often brittle system. The Plumb framework is the implementation of our solutions and a testbed to validate them. We use real-world workloads from the B-Root DNS domain to demonstrate effectiveness of our solutions. Our processing deduplication increases throughput up to $6\times$, reduces storage by 75%, as compared to their pre-Plumb counterparts. Plumb reduces CPU wastage due to structural skew up to half and reduces latency due to computational skew by 50%. Plumb has cut per-block latency by 74% and latency of daily statistics by 97%, while reducing code size by 58% and lowering manual intervention to handle problems by 73% as compared to pre-Plumb system.

The operational use of Plumb for the B-Root service provides a multi-year validation of our design choices under many traffic conditions. Over the last three years, Plumb has processed more than 12PB of DNS packet data and daily statistics. We show that our abstractions apply to many applications in the domain of networking big-data and beyond.

Categories

## congratulations to Manaf Gharaibeh for his PhD

I would like to congratulate Dr. Manaf Gharaibeh for defending his PhD at Colorado State University in February 2020 and completing his doctoral dissertation “Characterizing the Visible Address Space to Enable Efficient, Continuous IP Geolocation” in March 2020.

From the abstract:

Internet Protocol (IP) geolocation is vital for location-dependent applications and many network research problems. The benefits to applications include enabling content customization, proximal server selection, and management of digital rights based on the location of users, to name a few. The benefits to networking research include providing geographic context useful for several purposes, such as to study the geographic deployment of Internet resources, bind cloud data to a location, and to study censorship and monitoring, among others.
The measurement-based IP geolocation is widely considered as the state-of-the-art client-independent approach to estimate the location of an IP address. However, full measurement-based geolocation is prohibitive when applied continuously to the entire Internet to maintain up-to-date IP-to-location mappings. Furthermore, many IP address blocks rarely move, making it unnecessary to perform such full geolocation.
The thesis of this dissertation states that \emph{we can enable efficient, continuous IP geolocation by identifying clusters of co-located IP addresses and their location stability from latency observations.} In this statement, a cluster indicates a group of an arbitrary number of adjacent co-located IP addresses (a few up to a /16). Location stability indicates a measure of how often an IP block changes location. We gain efficiency by allowing IP geolocation systems to geolocate IP addresses as units, and by detecting when a geolocation update is required, optimizations not explored in prior work. We present several studies to support this thesis statement.
We first present a study to evaluate the reliability of router geolocation in popular geolocation services, complementing prior work that evaluates end-hosts geolocation in such services. The results show the limitations of these services and the need for better solutions, motivating our work to enable more accurate approaches. Second, we present a method to identify clusters of \emph{co-located} IP addresses by the similarity in their latency. Identifying such clusters allows us to geolocate them efficiently as units without compromising accuracy. Third, we present an efficient delay-based method to identify IP blocks that move over time, allowing us to recognize when geolocation updates are needed and avoid frequent geolocation of the entire Internet to maintain up-to-date geolocation. In our final study, we present a method to identify cellular blocks by their distinctive variation in latency compared to WiFi and wired blocks. Our method to identify cellular blocks allows a better interpretation of their latency estimates and to study their geographic properties without the need for proprietary data from operators or users.

Categories

## new poster “Measuring the Internet during Covid-19 to Evaluate Work-from-Home” at the NSF PREPARE-VO Workshop

Xiao Song presented the poster “Measuring the Internet during Covid-19 to Evaluate Work-from-Home (poster)” at the NSF PREPARE-VO Workshop on 2020-12-15. Xiao describes the poster in our video.

There was no formal abstract, but this poster presents early results from examining Internet address changes to identify work-from-home resulting from Covid-19.

This work is part of the MINCEQ project, supported as an NSF CISE RAPID, NSF-2028279.

Categories

## congratulations to Lan Wei for her new PhD

I would like to congratulate Dr. Lan Wei for defending her PhD in September 2020 and completing her doctoral dissertation “Anycast Stability, Security and Latency in The Domain Name System (DNS) and Content Deliver Networks (CDNs)” in December 2020.

From the abstract:

Clients’ performance is important for both Content-Delivery Networks (CDNs) and the Domain Name System (DNS). Operators would like the service to meet expectations of their users. CDNs providing stable connections will prevent users from experiencing downloading pause from connection breaks. Users expect DNS traffic to be secure without being intercepted or injected. Both CDN and DNS operators care about a short network latency, since users can become frustrated by slow replies.

Many CDNs and DNS services (such as the DNS root) use IP anycast to bring content closer to users. Anycast-based services announce the same IP address(es) from globally distributed sites. In an anycast infrastructure, Internet routing protocols will direct users to a nearby site naturally. The path between a user and an anycast site is formed on a hop-to-hop basis—at each hop} (a network device such as a router), routing protocols like Border Gateway Protocol (BGP) makes the decision about which next hop to go to. ISPs at each hop will impose their routing policies to influence BGP’s decisions. Without globally knowing (also unable to modify) the distributed information of BGP routing table of every ISP on the path, anycast infrastructure operators are unable to predict and control in real-time which specific site a user will visit and what the routing path will look like. Also, any change in routing policy along the path may change both the path and the site visited by a user. We refer to such minimal control over routing towards an anycast service, the uncertainty of anycast routing. Using anycast spares extra traffic management to map users to sites, but can operators provide a good anycast-based service without precise control over the routing?

This routing uncertainty raises three concerns: routing can change, breaking connections; uncertainty about global routing means spoofing can go undetected, and lack of knowledge of global routing can lead to suboptimal latency. In this thesis, we show how we confirm the stability, how we confirm the security, and how we improve the latency of anycast to answer these three concerns. First, routing changes can cause users to switch sites, and therefore break a stateful connection such as a TCP connection immediately. We study routing stability and demonstrate that connections in anycast infrastructure are rarely broken by routing instability. Of all vantage points (VPs), fewer than 0.15% VP’s TCP connections frequently break due to timeout in 5s during all 17 hours we observed. We only observe such frequent TCP connection break in 1 service out of all 12 anycast services studied. A second problem is DNS spoofing, where a third-party can intercept the DNS query and return a false answer. We examine DNS spoofing to study two aspects of security–integrity and privacy, and we design an algorithm to detect spoofing and distinguish different mechanisms to spoof anycast-based DNS. We show that DNS spoofing is uncommon, happening to only 1.7% of all VPs, although increasing over the years. Among all three ways to spoof DNS–injections, proxies, and third-party anycast site (prefix hijack), we show that third-party anycast site is the least popular one. Last, diagnosing poor latency and improving the latency can be difficult for CDNs. We develop a new approach, BAUP (bidirectional anycast unicast probing), which detects inefficient routing with better routing replacement provided. We use BAUP to study anycast latency. By applying BAUP and changing peering policies, a commercial CDN is able to significantly reduce latency, cutting median latency in half from 40ms to 16ms for regional users.

Lan defended her PhD when USC was on work-from-home due to COVID-19; she is the third ANT student with a fully on-line PhD defense.

Categories

## Observing the CenturyLink outage on 2020-08-30

CenturyLink / Level3 was reported to have a major outage on Sunday, 2020-08-30 (as reported on CNN and discussed on slashdot).

This outage was very clear in our Trinocular near-real-time outage detection system. We have summarized the details with images, before, during, and after, and an animation of the nearly 7-hour event or see the event on our near-real-time outage website.

This outage is one of the largest U.S. nation-wide events since the 2014-08-27 Time Warner outage.