Categories
Uncategorized

new paper “Differences in Monitoring the DNS Root Over IPv4 and IPv6” to appear at the IEEE National Symposium for NSF REU Research in Data Science, Systems, and Security

On December 15, 2022, Tarang Saluja will present the paper “Differences in Monitoring the DNS Root Over IPv4 and IPv6” (by Tarang Saluja, John Heidemann, and Yuri Pradkin) at the IEEE National Symposium for NSF REU Research in Data Science, Systems, and Security.

From the abstract:

Figure 9 from [Saluja22a], showing fraction of query failures in RIPE Atlas after we remove observers that are islands (unable to reach any of the 13 DNS root identifiers). Blue is IPv4, red is IPv6, with data for each of the 13 DNS root identifiers. We believe this data is a better representation of what people expect to see than Atlas results that include these “broken” observers.

The Domain Name System (DNS) is an essential service for the Internet which maps host names to IP addresses. The DNS Root Sever System operates the top of this namespace. RIPE Atlas observes DNS from more than 11k vantage points (VPs) around the world, reporting the reliability of the DNS Root Server System in DNSmon. DNSmon shows that loss rates for queries to the DNS Root are nearly 10% for IPv6, much higher than the approximately 2% loss seen for IPv4. Although IPv6 is “new,” as an operational protocol available to a third of Internet users, it ought to be just as reliable as IPv4. We examine this difference at a finer granularity by investigating loss at individual VPs. We confirm that specific VPs are the source of this difference and identify two root causes: VP islands with routing problems at the edge which leave them unable to access IPv6 outside their LAN, and VP peninsulas which indicate routing problems in the core of the network. These problems account for most of the loss and nearly all of the difference between IPv4 and IPv6 query loss rates. Islands account for most of the loss (half of IPv4 failures and 5/6ths of IPv6 failures), and we suggest these measurement devices should be filtered out to get a more accurate picture of loss rates. Peninsulas account for the main differences between root identifiers, suggesting routing disagreements root operators need to address. We believe that filtering out both of these known problems provides a better measure of underlying network anomalies and loss and will result in more actionable alerts.

Original data from this paper is available from RIPE Atlas (measurement ids are in the paper). We are publishing new results daily on our website (from the RIPE data).

This work was done while Tarang was on his Summer 2022 undergraduate research internship at USC/ISI, with support from NSF grant 2051101 (PI: Jelena Mirkovich). John Heidemann and Yuri Pradkin’s work is supported by NSF through the EIEIO project (CNS-2007106). We thank Guillermo Baltra for his work on islands and peninsulas, as seen in his arXiv report.

Categories
Uncategorized

new paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” Awarded Best Paper at the Passive and Active Measurement Conference

On March 29, 2022 the paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” by Giovane C. M. Moura, John Heidemann, Wes Hardaker, Pithayuth Charnsethikul, Jeroen Bulten, João M. Ceron, and Cristian Hesselman appeared that the 2022 Passive and Active Measurement Conference. We’re happy that it was awarded Best Paper for this year’s conference!

From the abstract:

Google latency for .nl before (left red area) and after (middle green area) DNS polarization was corrected. Polarization was detected with ENTRADA using the work from this paper.

DNS latency is a concern for many service operators: CDNs exist to reduce service latency to end-users but must rely on global DNS for reachability and load-balancing. Today, DNS latency is monitored by active probing from distributed platforms like RIPE Atlas, with Verfploeter, or with commercial services. While Atlas coverage is wide, its 10k sites see only a fraction of the Internet. In this paper we show that passive observation of TCP handshakes can measure live DNS latency, continuously, providing good coverage of current clients of the service. Estimating RTT from TCP is an old idea, but its application to DNS has not previously been studied carefully. We show that there is sufficient TCP DNS traffic today to provide good operational coverage (particularly of IPv6), and very good temporal coverage (better than existing approaches), enabling near-real time evaluation of DNS latency from real clients. We also show that DNS servers can optionally solicit TCP to broaden coverage. We quantify coverage and show that estimates of DNS latency from TCP is consistent with UDP latency. Our approach finds previously unknown, real problems: DNS polarization is a new problem where a hypergiant sends global traffic to one anycast site rather than taking advantage of the global anycast deployment. Correcting polarization in Google DNS cut its latency from 100ms to 10ms; and from Microsoft Azure cut latency from 90ms to 20ms. We also show other instances of routing problems that add 100-200ms latency. Finally, real-time use of our approach for a European country-level domain has helped detect and correct a BGP routing misconfiguration that detoured European traffic to Australia. We have integrated our approach into several open source tools: Entrada, our open source data warehouse for DNS, a monitoring tool (ANTS), which has been operational for the last 2 years on a country-level top-level domain, and a DNS anonymization tool in use at a root server since March 2021.

The tools we developed in this paper are freely available, including patches to Knot, improvements to dnsanon, improvements to ENTRADA, and the new tool Anteater. Unfortunately data from the paper was from operational DNS systems and so cannot be shared due to privacy concerns.

This paper was made in part through DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDOS) and by NWO, NSF CNS-1925737 (DIINER), and the Conconrdia Project, an European Union’s Horizon 2020 Research and Innovation program under Grant Agreement No 830927.

Categories
Internet Papers Publications Software releases

new paper “Chhoyhopper: A Moving Target Defense with IPv6” at NDSS MADWeb Workshop 2022

On April 24, 2022 we will publish a new paper titled “Chhoyhopper: A Moving Target Defense with IPv6” by A S M Rizvi and John Heidemann at the 4th Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb 2022), co-located with NDSS. We provide Chhoyhopper as an open-source tool for SSH and HTTPS—try it out!

From the abstract:

Services on the public Internet are frequently scanned, then subject to brute-force password attempts and Denial-of-Service (DoS) attacks. We would like to run such services stealthily, where they are available to friends but hidden from adversaries. In this work, we propose a discovery-resistant moving target defense named “Chhoyhopper” that utilizes the vast IPv6 address space to conceal publicly available services. The client meets the server at an IPv6 address that changes in a pattern based on a shared, pre-distributed secret and the time of day. By hopping over a /64 prefix, services cannot be found by active scanners, and passively observed information is useless after two minutes. We demonstrate our system with the two important applications—SSH and HTTPS, and make our system publicly available.

Client and server interaction in Chhoyhopper. A Client with the right secret key can only get access into the system.

Thanks: A S M Rizvi and John Heidemann’s work on this paper is supported, in part, by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDoS), and by DARPA under Contract No. HR001120C0157 (SABRES). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DARPA. We thank Rayner Pais who prototyped an early version of Chhoyhopper and version in IPv4 hopping over ports.

Categories
Anycast BGP Internet

new paper “Anycast Agility: Network Playbooks to Fight DDoS” at USENIX Security Symposium 2022

We will publish a new paper titled “Anycast Agility: Network Playbooks to Fight DDoS” by A S M Rizvi (USC/ISI), Leandro Bertholdo (University of Twente), João Ceron (SIDN Labs), and John Heidemann (USC/ISI) at the 31st USENIX Security Symposium in Aug. 2022.

A sample anycast playbook for a 3-site anycast deployment. Different routing configurations provide different traffic mixes. From [Rizvi22a, Table 5].

From the abstract:

IP anycast is used for services such as DNS and Content Delivery Networks (CDN) to provide the capacity to handle Distributed Denial-of-Service (DDoS) attacks. During a DDoS attack service operators redistribute traffic between anycast sites to take advantage of sites with unused or greater capacity. Depending on site traffic and attack size, operators may instead concentrate attackers in a few sites to preserve operation in others. Operators use these actions during attacks, but how to do so has not been described systematically or publicly. This paper describes several methods to use BGP to shift traffic when under DDoS, and shows that a response playbook can provide a menu of responses that are options during an attack. To choose an appropriate response from this playbook, we also describe a new method to estimate true attack size, even though the operator’s view during the attack is incomplete. Finally, operator choices are constrained by distributed routing policies, and not all are helpful. We explore how specific anycast deployment can constrain options in this playbook, and are the first to measure how generally applicable they are across multiple anycast networks.

Dataset used in this paper are listed at https://ant.isi.edu/datasets/anycast/anycast_against_ddos/index.html, and the software used in our work is at https://ant.isi.edu/software/anygility. They are provided as part of Call for Artifacts.

Acknowledgments: A S M Rizvi and John Heidemann’s work on this paper is supported, in part, by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I. Joao Ceron and Leandro Bertholdo’s work on this paper is supported by Netherlands Organisation for scientific research (4019020199), and European Union’s Horizon 2020 research and innovation program (830927). We would like to thank our anonymous reviewers for their valuable feedback. We are also grateful to the Peering and Tangled admins who allowed us to run measurements. We thank Dutch National Scrubbing Center for sharing DDoS data with us. We also thank Yuri Pradkin for his help to release our datasets.

Categories
Papers Publications

new symposium paper “Visualizing Internet Measurements of Covid-19 Work-from-Home” at IEEE Symposium on REU Research in Data Science, Systems, and Security

We published a new paper “Visualizing Internet Measurements of Covid-19 Work-from-Home” by Erica Stutz (Swarthmore College), Yuri Pradkin, Xiao Song, and John Heidemann (USC/ISI) at the Symposium for REU Research in Data Science, Systems, and Security, co-located with IEEE BigData 2021.

A screenshot from our Covid-WFH website showing an event in Malaysia on 2020-04-02.
A change in Internet use seen in Malaysia on 2020-04-02, present in our Covid-WFH data but discovered through our website.

From the abstract:

The Covid-19 pandemic disrupted the world as businesses and schools shifted to work-from-home (WFH), and comprehensive maps have helped visualize how those policies changed over time and in different places. We recently developed algorithms that infer the onset of WFH based on changes in observed Internet usage. Measurements of WFH are important to evaluate how effectively policies are implemented and followed, or to confirm policies in countries with less transparent journalism.This paper describes a web-based visualization system for measurements of Covid-19-induced WFH. We build on a web-based world map, showing a geographic grid of observations about WFH. We extend typical map interaction (zoom and pan, plus animation over time) with two new forms of pop-up information that allow users to drill-down to investigate our underlying data.We use sparklines to show changes over the first 6 months of 2020 for a given location, supporting identification and navigation to hot spots. Alternatively, users can report particular networks (Internet Service Providers) that show WFH on a given day.We show that these tools help us relate our observations to news reports of Covid-19-induced changes and, in some cases, lockdowns due to other causes. Our visualization is publicly available at https://covid.ant.isi.edu, as is our underlying data.

Datasets from this work will be available from our website and can be seen now at https://covid.ant.isi.edu. We thank NSF grants 2028279 and CNS-2007106 for supporting this work.

Categories
DNS Papers Publications

New paper and talk “Institutional Privacy Risks in Sharing DNS Data” at Applied Networking Research Workshop 2021

Basileal Imana presented the paper “Institutional Privacy Risks in Sharing DNS Data” by Basileal Imana, Aleksandra Korolova and John Heidemann at Applied Networking Research Workshop held virtually from July 26-28th, 2021.

From the abstract:

We document institutional privacy as a new risk
posed by DNS data collected at authoritative servers, even
after caching and aggregation by DNS recursives. We are the
first to demonstrate this risk by looking at leaks of e-mail
exchanges which show communications patterns, and leaks
from accessing sensitive websites, both of which can harm an
institution’s public image. We define a methodology to identify queries from institutions and identify leaks. We show the
current practices of prefix-preserving anonymization of IP
addresses and aggregation above the recursive are not sufficient to protect institutional privacy, suggesting the need for
novel approaches.

Number of MX and DNSBL queries in a week-long root DNS data that can potentially leak email-related activity

The data from this paper is available upon request, please see our project page.

Categories
Uncategorized

new conference paper “Anycast in Context: A Tale of Two Systems” at SIGCOMM 2021

We published a new paper “Anycast in Context: A Tale of Two Systems” by Thomas Koch, Ke Li, Calvin Ardi*, Ethan Katz-Bassett, Matt Calder**, and John Heidemann* (of Columbia, where not otherwise indicated, *USC/ISI, and **Microsoft and Columbia) at ACM SIGCOMM 2021.

From the abstract:

Anycast is used to serve content including web pages and DNS, and anycast deployments are growing. However, prior work examining root DNS suggests anycast deployments incur significant inflation, with users often routed to suboptimal sites. We reassess anycast performance, first extending prior analysis on inflation in the root DNS. We show that inflation is very common in root DNS, affecting more than 95% of users. However, we then show root DNS latency hardly matters to users because caching is so effective. These findings lead us to question: is inflation inherent to anycast, or can inflation be limited when it matters? To answer this question, we consider Microsoft’s anycast CDN serving latency-sensitive content. Here, latency matters orders of magnitude more than for root DNS. Perhaps because of this need, only 35% of CDN users experience any inflation, and the amount they experience is smaller than for root DNS. We show that CDN anycast latency has little inflation due to extensive peering and engineering. These results suggest prior claims of anycast inefficiency reflect experiments on a single application rather than anycast’s technical potential, and they demonstrate the importance of context when measuring system performance.

Tom also blogged about this work at APNIC.

Categories
Announcements

reblogging: the diurnal Internet and DNS backscatter

We are happy to share that two of our older topics have appeared more recently in other venues.

Our animations of the diurnal Internet, originally seen in our 2014 ACM IMC paper and our blog posts, was noticed by Gerald Smith who used it to start a discussion with seventh-grade classes in Mahe, India and (I think) Indiana, USA as part of his Fullbright work. It’s great to see research work that useful to middle-schoolers!

Kensuke Fukuda recently posted about our work on identifying IPv6 scanning with DNS backscatter at the APNIC blog. This work was originally published at the 2018 ACM IMC and posted in our blog. It’s great to see that work get out to a new audience.

Categories
Publications Technical Report

new technical report “Plumb: Efficient Processing of Multi-User Pipelines (Poster)”

We released a new technical report “Plumb: Efficient Processing of Multi-User Pipelines (Poster)”, by Abdul Qadeer and John Heidemann, as ISI-TR-731.  This work was originally presented at ACM Symposium on Cloud Computing (the poster abstract is available at ACM). The poster abstract with a small version of the poster is available at https://www.isi.edu/publications/trpublic/pdfs/isi-tr-731.pdf

aqadeer at SoCC 2018 Carlsbad CA

From the abstract:

As the field of big data analytics matures, workflows are increasingly complex and often include components that are shared by different users. Individual workflows often include multiple stages, and when groups build on each other’s work it is easy to lose track of computation that may be shared across different groups.

The contribution of this poster is to provide an organization-wide processing substrate Plumb that can be used to solve commonly occurring problems and to achieve a common goal. Plumb makes multi-user sharing a first-class concern by providing pipeline-graph abstraction. This abstraction is simple and based on fundamental model of input-processing-output but is powerful to capture processing and data duplication. Plumb then employs best available solutions to tackle problems of large-block processing under structural and computational skew without user intervention.

We expect to release the Plumb software this fall; please contact us if you have questions or interest in using it.

Categories
Papers Publications

new paper “Precise Detection of Content Reuse in the Web” to appear in ACM SIGCOMM Computer Communication Review

We have published a new paper “Precise Detection of Content Reuse in the Web” by Calvin Ardi and John Heidemann, in the ACM SIGCOMM Computer Communication Review (Volume 49 Issue 2, April 2019) newsletter.

From the abstract:

With vast amount of content online, it is not surprising that unscrupulous entities “borrow” from the web to provide content for advertisements, link farms, and spam. Our insight is that cryptographic hashing and fingerprinting can efficiently identify content reuse for web-size corpora. We develop two related algorithms, one to automatically discover previously unknown duplicate content in the web, and the second to precisely detect copies of discovered or manually identified content. We show that bad neighborhoods, clusters of pages where copied content is frequent, help identify copying in the web. We verify our algorithm and its choices with controlled experiments over three web datasets: Common Crawl (2009/10), GeoCities (1990s–2000s), and a phishing corpus (2014). We show that our use of cryptographic hashing is much more precise than alternatives such as locality-sensitive hashing, avoiding the thousands of false-positives that would otherwise occur. We apply our approach in three systems: discovering and detecting duplicated content in the web, searching explicitly for copies of Wikipedia in the web, and detecting phishing sites in a web browser. We show that general copying in the web is often benign (for example, templates), but 6–11% are commercial or possibly commercial. Most copies of Wikipedia (86%) are commercialized (link farming or advertisements). For phishing, we focus on PayPal, detecting 59% of PayPal-phish even without taking on intentional cloaking.