Categories
Uncategorized

congratulations to Tarang Saluja for his summer undergraduate research internship

Tarang Saluja completed his summer undergraduate research internship at ISI, working with John Heidemann and Yuri Pradkin on his project “Differences in Monitoring the DNS Root Over IPv4 and IPv6”.

In his project, Tarang examined RIPE Atlas’s DNSmon, a measurement system that monitors the Root Server System. DNSmon examines both IPv4 and IPv6, and its IPv6 reports show query loss rates that are consistently higher than IPv4, often 4-6% IPv6 loss vs. no or 2% IPv4 loss. Prior results by researchers at RIPE suggested these differences were due to problems at specific Atlas Vantage Points (VPs, also called Atlas Probes).

Tarang Saluja describing his research to an ISI researcher, at the ISI REU Poster Session on 2022-08-01.

Building on Guillermo Baltra’s studies of partial connectivity in the Internet, Tarang classified Atlas VPs with problems as islands and peninsulas. Islands think they are on IPv6, but cannot reach any of the 13 Root DNS “letters” over IPv6, indicating that the VP has a local network configuration problem. Peninsulas can reach some letters, but not others, indicating a routing problem somewhere in the core of the Internet.
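The island and peninsula definitions above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method used in the project: the function name `classify_vp`, the `VPStatus` labels, and the idea of summarizing each VP as a plain set of reachable letters are all hypothetical, and a real analysis would likely need to account for loss rates and observations over time.

```python
from enum import Enum

# The 13 Root DNS "letters" (a through m).
ROOT_LETTERS = tuple("abcdefghijklm")


class VPStatus(Enum):
    HEALTHY = "healthy"      # reached every letter over IPv6
    PENINSULA = "peninsula"  # reached some letters, but not others
    ISLAND = "island"        # reached no letters at all


def classify_vp(reachable_letters):
    """Classify one vantage point from the letters it reached over IPv6.

    Assumption: the caller has already reduced the VP's measurements
    to a simple collection of reachable letters.
    """
    reached = set(reachable_letters) & set(ROOT_LETTERS)
    if not reached:
        return VPStatus.ISLAND
    if len(reached) < len(ROOT_LETTERS):
        return VPStatus.PENINSULA
    return VPStatus.HEALTHY


# Hypothetical VPs, for illustration only.
print(classify_vp([]))               # no letters reached
print(classify_vp(["a", "k", "m"]))  # some letters reached
print(classify_vp(ROOT_LETTERS))     # all letters reached
```

Setting aside the VPs that this kind of check labels as islands or peninsulas is what leaves a cleaner set of VPs for monitoring.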

Tarang’s work is important because these observations lead to potential solutions. Islands suggest VPs that do not support IPv6 and so should not be used for monitoring. Peninsulas point to IPv6 routing problems that need to be addressed by ISPs. Setting VPs with these problems aside provides a more accurate view of what IPv6 should be, and allows us to use DNSmon to detect more subtle problems. Together, his work points the way to improving IPv6 for everyone and improving Root DNS access over IPv6.

Tarang’s work was part of the ISI Research Experiences for Undergraduates program at USC/ISI. We thank Jelena Mirkovic (PI) for coordinating another year of this great program, and NSF for support through award #2051101.

Large Canadian Internet Outage

News reports (for example, at the Verge and Slashdot) mention a large outage in Rogers, a major Canadian telecommunications provider.

We see lots of evidence for this in our Internet outage detection system.

It’s big! Maybe 30% of Toronto and southern Ontario networks, plus a lot of outages in New Brunswick.

Ontario:

Internet outages in Ontario, Canada. The largest circle represents about 6500 /24 network blocks down near Toronto, about 30% of the /24 blocks in that area. See details on our outage website.

New Brunswick:

Internet outages in New Brunswick, Canada. The largest circle here represents 196 /24 network blocks down near Moncton, more than 45% of the /24 blocks there. The red circles are areas where most or all network blocks are currently out. See details on our outage website.

An update: Newfoundland also sees a lot of outages. Quebec looks in pretty good shape, though.

And it’s lasting a long time. It looks like it started at 5am Eastern time (2022-07-08t09:00Z), and it has lasted 9.5 hours so far!

We wish Rogers personnel and our Canadian neighbors the best.

Update at 2022-07-09t06:15Z (2:15am Eastern time): Toronto is doing much better, with “only” 10% of blocks unreachable (22808 of 21.5k in the 43.8N,79.3W 0.5 grid cell). New Brunswick and Newfoundland still look the same, with outages in about 50% of blocks.

Update at 2022-07-09t21:10Z (5:10pm Eastern time): It looks like many Rogers networks recovered at 2022-07-09t05:15Z (1:15am Eastern time). This includes all of New Brunswick and Newfoundland and most of Ontario. Trinocular has about a one-hour delay while it computes results, so I did not see this result when I checked in the prior update–I needed to wait 15 minutes more.
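The updates above pair each UTC timestamp with Eastern time. As a small worked example, the sketch below redoes that conversion and computes the time between the reported start and the reported recovery. It assumes Eastern Daylight Time (UTC-4) was in effect, and the helper name `parse_utc` is made up for this illustration; it only does arithmetic on the timestamps quoted in this post.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Daylight Time (UTC-4) applies to these July dates.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_utc(stamp):
    """Parse a timestamp written like 2022-07-08t09:00Z as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dt%H:%MZ")
    return parsed.replace(tzinfo=timezone.utc)


start = parse_utc("2022-07-08t09:00Z")      # reported start of the outage
recovered = parse_utc("2022-07-09t05:15Z")  # reported recovery of many networks

print(start.astimezone(EDT).strftime("%Y-%m-%d %H:%M %Z"))
print(recovered.astimezone(EDT).strftime("%Y-%m-%d %H:%M %Z"))

# Hours between the two reported timestamps.
print((recovered - start).total_seconds() / 3600)
```

By these two timestamps, the gap between start and recovery is 20 hours and 15 minutes.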

Internet Outages Timelines and Events in 2022

We recently added timeline support to our Outage World map–clicking on an outage bubble pops up a window with a sparkline (a small graph) showing maximum outages on each day for the current quarter, and clicking on the “daily timeline” tab shows outages for the current 24 hours. These graphs help provide context for how long an outage lasts, and whether there were other outages in the same quarter.

As an example, here is a major outage affecting most of central and southern Mexico on 2022-01-05. The timeline of Mexico City shows how unusual this outage was:

Some other big outages in 2022 include this big outage in Italy on April 27 from 18:00 to 23:59:

and in southwest Florida on April 24 at 3:15pm Eastern Time (that’s 2022-04-24t19:15Z) that was confirmed as a fiber cut:

Thanks to Erica Stutz for adding timelines to the outage code (as a follow-on to her work on Covid-19 Work-from-Home visualization) and to Yuri Pradkin for spotting these events.

new paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” Awarded Best Paper at the Passive and Active Measurement Conference

On March 29, 2022 the paper “Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast” by Giovane C. M. Moura, John Heidemann, Wes Hardaker, Pithayuth Charnsethikul, Jeroen Bulten, João M. Ceron, and Cristian Hesselman appeared at the 2022 Passive and Active Measurement Conference. We’re happy that it was awarded Best Paper for this year’s conference!

From the abstract:

Google latency for .nl before (left red area) and after (middle green area) DNS polarization was corrected. Polarization was detected with ENTRADA using the work from this paper.

DNS latency is a concern for many service operators: CDNs exist to reduce service latency to end-users but must rely on global DNS for reachability and load-balancing. Today, DNS latency is monitored by active probing from distributed platforms like RIPE Atlas, with Verfploeter, or with commercial services. While Atlas coverage is wide, its 10k sites see only a fraction of the Internet. In this paper we show that passive observation of TCP handshakes can measure live DNS latency, continuously, providing good coverage of current clients of the service. Estimating RTT from TCP is an old idea, but its application to DNS has not previously been studied carefully. We show that there is sufficient TCP DNS traffic today to provide good operational coverage (particularly of IPv6), and very good temporal coverage (better than existing approaches), enabling near-real time evaluation of DNS latency from real clients. We also show that DNS servers can optionally solicit TCP to broaden coverage. We quantify coverage and show that estimates of DNS latency from TCP is consistent with UDP latency. Our approach finds previously unknown, real problems: DNS polarization is a new problem where a hypergiant sends global traffic to one anycast site rather than taking advantage of the global anycast deployment. Correcting polarization in Google DNS cut its latency from 100ms to 10ms; and from Microsoft Azure cut latency from 90ms to 20ms. We also show other instances of routing problems that add 100-200ms latency. Finally, real-time use of our approach for a European country-level domain has helped detect and correct a BGP routing misconfiguration that detoured European traffic to Australia. We have integrated our approach into several open source tools: Entrada, our open source data warehouse for DNS, a monitoring tool (ANTS), which has been operational for the last 2 years on a country-level top-level domain, and a DNS anonymization tool in use at a root server since March 2021.
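To make the idea of estimating RTT from TCP handshakes concrete, here is a minimal sketch. It assumes a server-side view in which the RTT is approximated by the time between sending the SYN-ACK and receiving the client’s ACK. The record layout, the function names, and the sample values are hypothetical, and this is not the implementation in ENTRADA or the other tools from the paper.

```python
from statistics import median


def handshake_rtt_ms(synack_sent_s, ack_received_s):
    """Approximate one RTT (in ms) from server-side handshake timestamps (in s)."""
    return (ack_received_s - synack_sent_s) * 1000.0


def median_rtt_by_client(handshakes):
    """Group handshake RTT samples by client and take the median of each group.

    handshakes: iterable of (client, synack_sent_s, ack_received_s) tuples.
    """
    samples = {}
    for client, synack_sent_s, ack_received_s in handshakes:
        rtt = handshake_rtt_ms(synack_sent_s, ack_received_s)
        samples.setdefault(client, []).append(rtt)
    return {client: median(values) for client, values in samples.items()}


# Made-up observations, using documentation prefixes as client labels.
observed = [
    ("192.0.2.0/24", 10.0, 10.25),
    ("192.0.2.0/24", 20.0, 20.5),
    ("192.0.2.0/24", 30.0, 30.125),
    ("2001:db8::/32", 40.0, 40.0625),
]
print(median_rtt_by_client(observed))
```

Taking a median per client is one simple way to damp outliers such as delayed ACKs; other summaries would also be reasonable.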

The tools we developed in this paper are freely available, including patches to Knot, improvements to dnsanon, improvements to ENTRADA, and the new tool Anteater. Unfortunately data from the paper was from operational DNS systems and so cannot be shared due to privacy concerns.

This paper was made possible in part through DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDOS) and by NWO, NSF CNS-1925737 (DIINER), and the Concordia Project, a European Union Horizon 2020 Research and Innovation program under Grant Agreement No 830927.

congratulations to Erica Stutz for her summer undergraduate internship

Erica Stutz completed her summer undergraduate research internship at ISI, working with John Heidemann, Yuri Pradkin, and Xiao Song on her project “Visualizing COVID-19 Work-from-Home”.

In this project, Erica developed a new Covid-19 Work-From-Home website combining Xiao’s WFH data with our existing outage website, and adding new interactive drill-down methods to display additional information to the user.

Visualizing Covid-19 work-from-home: here we look at China, Korea, and Japan and pop-up information about Laiwu, China. The popup shows WFH behavior for that location for the first 6 months of 2020.

We hope Erica’s new website makes it easier to evaluate COVID-19 WFH changes, and we look forward to continuing to work with Erica on this topic.

Erica worked virtually at USC/ISI in summer 2021 as part of the ISI Research Experiences for Undergraduates program. We thank Jelena Mirkovic (PI) for coordinating the second year of this great program, and NSF for support through award #2051101.

network outages in Louisiana with Hurricane Ida

We’ve been watching the situation in Louisiana develop with Hurricane Ida with our Trinocular Internet outage detection system.

Internet outages in Louisiana at 8:30pm Sunday evening August 29, corresponding to Hurricane Ida’s landfall.

Data as of 2021-08-30t01:30Z, which is 8:30pm Sunday night August 29 in New Orleans, shows about half of the networks in the New Orleans area being unreachable (mostly IPv4 home networks). Following shortly after landfall, these outages correspond with news reports about widespread power loss. Current data is appearing on our Internet outage map.

We wish the residents of Louisiana the best and hope for a rapid recovery.

ANT research group lunch

At the end of June we had an ANT research group lunch to celebrate four (!) recent PhD defenses in 2020 and 2021: Hang Guo, Calvin Ardi, Lan Wei, and Abdul Qadeer. Although not everyone could be there (Hang has already moved for his new job, and the ANT lab includes a number of people outside of L.A. who could not make it), the students, staff, and family in L.A. had a great time at Vista del Mar Park near the beach!

A big thanks to Basileal Imana and ASM Rizvi for coordinating delivery of Ethiopian food for lunch.

We are also very thankful that vaccine availability in the U.S. is widespread and we were able to get together face-to-face after a year of Covid limitations. I’m happy that we’ve been able to do good work throughout the pandemic with remote collaboration tools and occasional on-site access, but it was nice to see old friends face-to-face again and share a meal. We hope the fall’s in-person classes at USC go well.

new conference paper “Anycast in Context: A Tale of Two Systems” at SIGCOMM 2021

We published a new paper “Anycast in Context: A Tale of Two Systems” by Thomas Koch, Ke Li, Calvin Ardi*, Ethan Katz-Bassett, Matt Calder**, and John Heidemann* (of Columbia, where not otherwise indicated, *USC/ISI, and **Microsoft and Columbia) at ACM SIGCOMM 2021.

From the abstract:

Anycast is used to serve content including web pages and DNS, and anycast deployments are growing. However, prior work examining root DNS suggests anycast deployments incur significant inflation, with users often routed to suboptimal sites. We reassess anycast performance, first extending prior analysis on inflation in the root DNS. We show that inflation is very common in root DNS, affecting more than 95% of users. However, we then show root DNS latency hardly matters to users because caching is so effective. These findings lead us to question: is inflation inherent to anycast, or can inflation be limited when it matters? To answer this question, we consider Microsoft’s anycast CDN serving latency-sensitive content. Here, latency matters orders of magnitude more than for root DNS. Perhaps because of this need, only 35% of CDN users experience any inflation, and the amount they experience is smaller than for root DNS. We show that CDN anycast latency has little inflation due to extensive peering and engineering. These results suggest prior claims of anycast inefficiency reflect experiments on a single application rather than anycast’s technical potential, and they demonstrate the importance of context when measuring system performance.

Tom also blogged about this work at APNIC.

new workshop report “Overcoming Measurement Barriers to Internet Research” (WOMBIR 2021) in ACM CCR

WOMBIR 2021 was the NSF-sponsored Workshop on Overcoming Measurement Barriers to Internet Research. This workshop was held in two sessions over several days in January and April 2021, chaired by k.c. claffy, David Clark, Fabian Bustamante, John Heidemann, and Mattijs Jonker. The final report includes contributions from Aaron Schulman and Ellen Zegura as well as all the workshop participants.

From the abstract:

In January and April 2021 we held the Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR) with the goal of understanding challenges in network and security data set collection and sharing. Most workshop attendees provided white papers describing their perspectives, and many participated in short-talks and discussion in two virtual workshops over five days. That discussion produced consensus around several points. First, many aspects of the Internet are characterized by decreasing visibility of important network properties, which is in tension with the Internet’s role as critical infrastructure. We discussed three specific research areas that illustrate this tension: security, Internet access; and mobile networking. We discussed visibility challenges at all layers of the networking stack, and the challenge of gathering data and validating inferences. Important data sets require longitudinal (long-term, ongoing) data collection and sharing, support for which is more challenging for Internet research than other fields. We discussed why a combination of technical and policy methods are necessary to safeguard privacy when using or sharing measurement data. Workshop participant proposed several opportunities to accelerate progress, some of which require coordination across government, industry, and academia.

new talk “Observing the Global IPv4 Internet: What IP Addresses Show” as an SKC Science and Technology Webinar

John Heidemann gave the talk “Observing the Global IPv4 Internet: What IP Addresses Show” at the SKC Science and Technology Webinar, hosted by Deepankar Medhi (U. Missouri-Kansas City and NSF) on June 18, 2021.  A video of the talk is on YouTube at https://www.youtube.com/watch?v=4A_gFXi2WeY. Slides are available at https://www.isi.edu/~johnh/PAPERS/Heidemann21a.pdf.

From the abstract:

Covid and non-Covid network changes in India; part of a talk about measuring the IPv4 Internet.

Since 2014 the ANT lab at USC has been observing the visible IPv4 Internet (currently 5 million networks measured every 11 minutes) to detect network outages. This talk explores how we use this large-scale, active measurement to estimate Internet reliability and understand the effects of real-world events such as hurricanes. We have recently developed new algorithms to identify Covid-19-related Work-from-Home and other Internet shutdowns in this data. Our Internet outage work is joint work of John Heidemann, Lin Quan, Yuri Pradkin, Guillermo Baltra, Xiao Song, and Asma Enayet with contributions from Ryan Bogutz, Dominik Staros, Abdulla Alwabel, and Aqib Nisar.

This project is joint work of a number of people listed in the abstract above, and is supported by NSF 2028279 (MINCEQ) and CNS-2007106 (EIEIO). All data from this paper is available at no cost to researchers.