congratulations to Tarang Saluja for his summer undergraduate research internship

Tarang Saluja completed his summer undergraduate research internship at ISI this summer, working with John Heidemann and Yuri Pradkin on his project “Differences in Monitoring the DNS Root Over IPv4 and IPv6″.

In his project, Tarang examined RIPE Atlas’s DNSmon, a measurement system that monitors the Root Server System. DNSmon examines both IPv4 and IPv6, and its IPv6 reports show query loss rates that are consistently higher than IPv4, often 4-6% IPv6 loss vs. no or 2% IPv4 loss. Prior results by researchers at RIPE suggested these differences were due to problems at specific Atlas Vantage Points (VPs, also called Atlas Probes).

Tarang Saluja describing his research to an ISI researcher, at the ISI REU Poster Session on 2022-08-01.

Building on the Guillero Baltra’s studies of partial connectivity in the Internet, Tarang classified Atlas VPs with problems as islands and peninsulas. Islands think they are on IPv6, but cannot reach any of the 13 Root DNS “letters” over IPv6, indicating that the VP has a local network configuration problem. Peninsulas can reach some letters, but not others, indicating a routing problem somewhere in the core of the Internet.

Tarang’s work is important because these observations allow lead to potential solutions. Islands suggest VPs that do not support IPv6 and so should not be used for monitoring. Peninsulas point to IPv6 routing problems that need to be addressed by ISPs. Setting VPs with these problems aside provides a more accurate view of what IPv6 should be, and allows us to use DNSmon to detect more subtle problems. Together, his work points the way to improving IPv6 for everyone and improving Root DNS access over IPv6.

Tarang’s work was part of the ISI Research Experiences for Undergraduates program at USC/ISI. We thank Jelena Mirkovic (PI) for coordinating another year of this great program, and NSF for support through award #2051101.


new paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” Awarded Best Paper at the Passive and Active Measurement Conference

On March 29, 2022 the paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” by Giovane C. M. Moura, John Heidemann, Wes Hardaker, Pithayuth Charnsethikul, Jeroen Bulten, João M. Ceron, and Cristian Hesselman appeared that the 2022 Passive and Active Measurement Conference. We’re happy that it was awarded Best Paper for this year’s conference!

From the abstract:

Google latency for .nl before (left red area) and after (middle green area) DNS polarization was corrected. Polarization was detected with ENTRADA using the work from this paper.

DNS latency is a concern for many service operators: CDNs exist to reduce service latency to end-users but must rely on global DNS for reachability and load-balancing. Today, DNS latency is monitored by active probing from distributed platforms like RIPE Atlas, with Verfploeter, or with commercial services. While Atlas coverage is wide, its 10k sites see only a fraction of the Internet. In this paper we show that passive observation of TCP handshakes can measure live DNS latency, continuously, providing good coverage of current clients of the service. Estimating RTT from TCP is an old idea, but its application to DNS has not previously been studied carefully. We show that there is sufficient TCP DNS traffic today to provide good operational coverage (particularly of IPv6), and very good temporal coverage (better than existing approaches), enabling near-real time evaluation of DNS latency from real clients. We also show that DNS servers can optionally solicit TCP to broaden coverage. We quantify coverage and show that estimates of DNS latency from TCP is consistent with UDP latency. Our approach finds previously unknown, real problems: DNS polarization is a new problem where a hypergiant sends global traffic to one anycast site rather than taking advantage of the global anycast deployment. Correcting polarization in Google DNS cut its latency from 100ms to 10ms; and from Microsoft Azure cut latency from 90ms to 20ms. We also show other instances of routing problems that add 100-200ms latency. Finally, real-time use of our approach for a European country-level domain has helped detect and correct a BGP routing misconfiguration that detoured European traffic to Australia. We have integrated our approach into several open source tools: Entrada, our open source data warehouse for DNS, a monitoring tool (ANTS), which has been operational for the last 2 years on a country-level top-level domain, and a DNS anonymization tool in use at a root server since March 2021.

The tools we developed in this paper are freely available, including patches to Knot, improvements to dnsanon, improvements to ENTRADA, and the new tool Anteater. Unfortunately data from the paper was from operational DNS systems and so cannot be shared due to privacy concerns.

This paper was made in part through DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDOS) and by NWO, NSF CNS-1925737 (DIINER), and the Conconrdia Project, an European Union’s Horizon 2020 Research and Innovation program under Grant Agreement No 830927.

Internet Papers Publications Software releases

new paper “Chhoyhopper: A Moving Target Defense with IPv6” at NDSS MADWeb Workshop 2022

On April 24, 2022 we will publish a new paper titled “Chhoyhopper: A Moving Target Defense with IPv6” by A S M Rizvi and John Heidemann at the 4th Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb 2022), co-located with NDSS. We provide Chhoyhopper as an open-source tool for SSH and HTTPS—try it out!

From the abstract:

Services on the public Internet are frequently scanned, then subject to brute-force password attempts and Denial-of-Service (DoS) attacks. We would like to run such services stealthily, where they are available to friends but hidden from adversaries. In this work, we propose a discovery-resistant moving target defense named “Chhoyhopper” that utilizes the vast IPv6 address space to conceal publicly available services. The client meets the server at an IPv6 address that changes in a pattern based on a shared, pre-distributed secret and the time of day. By hopping over a /64 prefix, services cannot be found by active scanners, and passively observed information is useless after two minutes. We demonstrate our system with the two important applications—SSH and HTTPS, and make our system publicly available.

Client and server interaction in Chhoyhopper. A Client with the right secret key can only get access into the system.

Thanks: A S M Rizvi and John Heidemann’s work on this paper is supported, in part, by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDoS), and by DARPA under Contract No. HR001120C0157 (SABRES). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DARPA. We thank Rayner Pais who prototyped an early version of Chhoyhopper and version in IPv4 hopping over ports.

Anycast BGP Internet

new paper “Anycast Agility: Network Playbooks to Fight DDoS” at USENIX Security Symposium 2022

We will publish a new paper titled “Anycast Agility: Network Playbooks to Fight DDoS” by A S M Rizvi (USC/ISI), Leandro Bertholdo (University of Twente), João Ceron (SIDN Labs), and John Heidemann (USC/ISI) at the 31st USENIX Security Symposium in Aug. 2022.

A sample anycast playbook for a 3-site anycast deployment. Different routing configurations provide different traffic mixes. From [Rizvi22a, Table 5].

From the abstract:

IP anycast is used for services such as DNS and Content Delivery Networks (CDN) to provide the capacity to handle Distributed Denial-of-Service (DDoS) attacks. During a DDoS attack service operators redistribute traffic between anycast sites to take advantage of sites with unused or greater capacity. Depending on site traffic and attack size, operators may instead concentrate attackers in a few sites to preserve operation in others. Operators use these actions during attacks, but how to do so has not been described systematically or publicly. This paper describes several methods to use BGP to shift traffic when under DDoS, and shows that a response playbook can provide a menu of responses that are options during an attack. To choose an appropriate response from this playbook, we also describe a new method to estimate true attack size, even though the operator’s view during the attack is incomplete. Finally, operator choices are constrained by distributed routing policies, and not all are helpful. We explore how specific anycast deployment can constrain options in this playbook, and are the first to measure how generally applicable they are across multiple anycast networks.

Dataset used in this paper are listed at, and the software used in our work is at They are provided as part of Call for Artifacts.

Acknowledgments: A S M Rizvi and John Heidemann’s work on this paper is supported, in part, by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I. Joao Ceron and Leandro Bertholdo’s work on this paper is supported by Netherlands Organisation for scientific research (4019020199), and European Union’s Horizon 2020 research and innovation program (830927). We would like to thank our anonymous reviewers for their valuable feedback. We are also grateful to the Peering and Tangled admins who allowed us to run measurements. We thank Dutch National Scrubbing Center for sharing DDoS data with us. We also thank Yuri Pradkin for his help to release our datasets.

DNS Papers Publications

New paper and talk “Institutional Privacy Risks in Sharing DNS Data” at Applied Networking Research Workshop 2021

Basileal Imana presented the paper “Institutional Privacy Risks in Sharing DNS Data” by Basileal Imana, Aleksandra Korolova and John Heidemann at Applied Networking Research Workshop held virtually from July 26-28th, 2021.

From the abstract:

We document institutional privacy as a new risk
posed by DNS data collected at authoritative servers, even
after caching and aggregation by DNS recursives. We are the
first to demonstrate this risk by looking at leaks of e-mail
exchanges which show communications patterns, and leaks
from accessing sensitive websites, both of which can harm an
institution’s public image. We define a methodology to identify queries from institutions and identify leaks. We show the
current practices of prefix-preserving anonymization of IP
addresses and aggregation above the recursive are not sufficient to protect institutional privacy, suggesting the need for
novel approaches.

Number of MX and DNSBL queries in a week-long root DNS data that can potentially leak email-related activity

The data from this paper is available upon request, please see our project page.


new conference paper “Anycast in Context: A Tale of Two Systems” at SIGCOMM 2021

We published a new paper “Anycast in Context: A Tale of Two Systems” by Thomas Koch, Ke Li, Calvin Ardi*, Ethan Katz-Bassett, Matt Calder**, and John Heidemann* (of Columbia, where not otherwise indicated, *USC/ISI, and **Microsoft and Columbia) at ACM SIGCOMM 2021.

From the abstract:

Anycast is used to serve content including web pages and DNS, and anycast deployments are growing. However, prior work examining root DNS suggests anycast deployments incur significant inflation, with users often routed to suboptimal sites. We reassess anycast performance, first extending prior analysis on inflation in the root DNS. We show that inflation is very common in root DNS, affecting more than 95% of users. However, we then show root DNS latency hardly matters to users because caching is so effective. These findings lead us to question: is inflation inherent to anycast, or can inflation be limited when it matters? To answer this question, we consider Microsoft’s anycast CDN serving latency-sensitive content. Here, latency matters orders of magnitude more than for root DNS. Perhaps because of this need, only 35% of CDN users experience any inflation, and the amount they experience is smaller than for root DNS. We show that CDN anycast latency has little inflation due to extensive peering and engineering. These results suggest prior claims of anycast inefficiency reflect experiments on a single application rather than anycast’s technical potential, and they demonstrate the importance of context when measuring system performance.

Tom also blogged about this work at APNIC.


the tsuNAME vulnerability in DNS

On 2020-05-06, researchers at SIDN Labs, (the .nl registry), InternetNZ (the .nz registry) , and at the Information Science Institute at the University of Southern California publicly disclosed tsuNAME, a vulnerability in some DNS resolver software that can be weaponized to carry out DDoS attacks against authoritative DNS servers.

TsuNAME is a problem that results from cyclic dependencies in DNS records, where two NS records point at each other. We found that some recursive resolvers would follow this cycle, greatly amplifying an initial queries and stresses the authoritative servers providing those records.

Our technical report describes a tsuNAME related event observed in 2020 at the .nz authoritative servers, when two domains were misconfigured with cyclic dependencies. It caused the total traffic to growth by 50%. In the report, we show how an EU-based ccTLD experienced a 10x traffic growth due to cyclic dependent misconfigurations.

We refer DNS operators and developers to our security advisory that provides recommendations for how to mitigate or detect tsuNAME.

We have also created a tool, CycleHunter, for detecting cyclic dependencies in DNS zones. Following responsible disclosure practices, we provided operators and software vendors time to address the problem first. We are happy that Google public DNS and Cisco OpenDNS both took steps to protect their public resolvers, and that PowerDNS and NLnet have confirmed their current software is not affected.


congratulations to Abdul Qadeer for his PhD

I would like to congratulate Dr. Abdul Qadeer for defending his PhD at the University of Southern California in March 2021 and completing his doctoral dissertation “Efficient Processing of Streaming Data in Multi-User and Multi-Abstraction Workflows”.

From the abstract:

Abdul Qadeer after his defense.

Ever-increasing data and evolving processing needs force enterprises to scale-out expensive computational resources to prioritize processing for timely results. Teams process their organization’s data either independently or using ad hoc sharing mechanisms. Often different users start with the same data and the same initial stages (decrypt, decompress, clean, anonymize). As their workflows evolve, later stages often diverge, and different stages may work best with different abstractions. The result is workflows with some overlap, some variations, and multiple transitions where data handling changes between continuous, windowed, and per-block. The system processing this diverse, multi-user, multi-abstraction workflow should be efficient and safe, but also must cope with fault recovery.

Analytics from multiple users can cause redundant processing and data, or encounter performance anomalies due to skew. Skew arises due to static or dynamic imbalance in the workflow stages. Both redundancy and skew waste compute resources and add latency to results. When users bridge between multiple abstractions, such as from per-block processing to windowed processing, they often employ custom code. These transitions can be error prone due to corner cases, can easily add latency as an inefficiency, and custom code is often a source of errors and maintenance difficulty. We need new solutions to manage the above challenges and to expose opportunities for data sharing explicitly. Our thesis is: new methods enable efficient processing of multi-user and multi-abstraction workflows of streaming data. We present two new methods for efficient stream processing—optimizations for multi-user workflows, and multiple abstractions for application coverage and efficient bridging.

These algorithms use a pipeline-graph to detect duplication of code and data across multiple users and cleanly delineate workflow stages for skew management. The pipeline-graph is our job description language that allows developers to specify their need easily and enables our system to automatically detect duplication and manage skew. The pipeline-graph acts as a shared canvas for collaboration amongst users to extend each other’s work. To efficiently implement our deduplication and skew management algorithms, we present streaming data to processing stages as fixed-sized but large blocks. Large-blocks have low meta-data overhead per user, provide good parallelism, and help with fault recovery.

Our second method enables applications to use a different abstraction on a different workflow stage. We provide three key abstractions and show that they cover many classes of analytics and our framework can bridge them efficiently. We provide Block-Streaming, Windowed-Streaming, and Stateful-Streaming abstractions. Block-Streaming is suitable for single-pass applications that care about temporal or spatial locality. Windowed-Streaming allows applications to process accumulated data (time-aligned blocks to sync with external information) and reductions like summation, averages, or other MapReduce-style analytics. We believe our three abstractions allow many classes of analytics and enable processing of one block, many blocks, or infinite stream. Plumb allows multiple abstractions in different parts of the workflow and provides efficient bridging between them so that users could make complex analytics from individual stages without worrying about data movement.

Our methods aim for good throughput, low latency, and clean and easy-to-use support for more applications to achieve better efficiency than our prior hand-tuned but often brittle system. The Plumb framework is the implementation of our solutions and a testbed to validate them. We use real-world workloads from the B-Root DNS domain to demonstrate effectiveness of our solutions. Our processing deduplication increases throughput up to $6\times$, reduces storage by 75%, as compared to their pre-Plumb counterparts. Plumb reduces CPU wastage due to structural skew up to half and reduces latency due to computational skew by 50%. Plumb has cut per-block latency by 74% and latency of daily statistics by 97%, while reducing code size by 58% and lowering manual intervention to handle problems by 73% as compared to pre-Plumb system.

The operational use of Plumb for the B-Root service provides a multi-year validation of our design choices under many traffic conditions. Over the last three years, Plumb has processed more than 12PB of DNS packet data and daily statistics. We show that our abstractions apply to many applications in the domain of networking big-data and beyond.

Publications Students

congratulations to Lan Wei for her new PhD

I would like to congratulate Dr. Lan Wei for defending her PhD in September 2020 and completing her doctoral dissertation “Anycast Stability, Security and Latency in The Domain Name System (DNS) and Content Deliver Networks (CDNs)” in December 2020.

From the abstract:

Clients’ performance is important for both Content-Delivery Networks (CDNs) and the Domain Name System (DNS). Operators would like the service to meet expectations of their users. CDNs providing stable connections will prevent users from experiencing downloading pause from connection breaks. Users expect DNS traffic to be secure without being intercepted or injected. Both CDN and DNS operators care about a short network latency, since users can become frustrated by slow replies.

Many CDNs and DNS services (such as the DNS root) use IP anycast to bring content closer to users. Anycast-based services announce the same IP address(es) from globally distributed sites. In an anycast infrastructure, Internet routing protocols will direct users to a nearby site naturally. The path between a user and an anycast site is formed on a hop-to-hop basis—at each hop} (a network device such as a router), routing protocols like Border Gateway Protocol (BGP) makes the decision about which next hop to go to. ISPs at each hop will impose their routing policies to influence BGP’s decisions. Without globally knowing (also unable to modify) the distributed information of BGP routing table of every ISP on the path, anycast infrastructure operators are unable to predict and control in real-time which specific site a user will visit and what the routing path will look like. Also, any change in routing policy along the path may change both the path and the site visited by a user. We refer to such minimal control over routing towards an anycast service, the uncertainty of anycast routing. Using anycast spares extra traffic management to map users to sites, but can operators provide a good anycast-based service without precise control over the routing?

This routing uncertainty raises three concerns: routing can change, breaking connections; uncertainty about global routing means spoofing can go undetected, and lack of knowledge of global routing can lead to suboptimal latency. In this thesis, we show how we confirm the stability, how we confirm the security, and how we improve the latency of anycast to answer these three concerns. First, routing changes can cause users to switch sites, and therefore break a stateful connection such as a TCP connection immediately. We study routing stability and demonstrate that connections in anycast infrastructure are rarely broken by routing instability. Of all vantage points (VPs), fewer than 0.15% VP’s TCP connections frequently break due to timeout in 5s during all 17 hours we observed. We only observe such frequent TCP connection break in 1 service out of all 12 anycast services studied. A second problem is DNS spoofing, where a third-party can intercept the DNS query and return a false answer. We examine DNS spoofing to study two aspects of security–integrity and privacy, and we design an algorithm to detect spoofing and distinguish different mechanisms to spoof anycast-based DNS. We show that DNS spoofing is uncommon, happening to only 1.7% of all VPs, although increasing over the years. Among all three ways to spoof DNS–injections, proxies, and third-party anycast site (prefix hijack), we show that third-party anycast site is the least popular one. Last, diagnosing poor latency and improving the latency can be difficult for CDNs. We develop a new approach, BAUP (bidirectional anycast unicast probing), which detects inefficient routing with better routing replacement provided. We use BAUP to study anycast latency. By applying BAUP and changing peering policies, a commercial CDN is able to significantly reduce latency, cutting median latency in half from 40ms to 16ms for regional users.

Lan defended her PhD when USC was on work-from-home due to COVID-19; she is the third ANT student with a fully on-line PhD defense.


Deep Dive into DNS at IETF108

The Domain Name System (DNS) is responsible for handling the initial steps of almost all connections on the Internet. USC/ISI’s Wes Hardaker, along with Geoff Houston and Joao Damas from APNIC, gave a “Deep Dive” presentation on how the DNS works at the 108th IETF conference. The recording is available on YouTube for those that missed it.