Early longitudinal results in measuring the usage of Mozilla’s DNS Canary

Mozilla announced the creation of a “use-application-dns.net” “Canary Domain” that could be configured within ISPs to disable Firefox’s default use of DNS over HTTPS. On 2019/09/21 Wes Hardaker created a RIPE Atlas measurement to study resolvers within ISPs that had been configured to return an NXDOMAIN response. This measurement is configured to have 1000 Atlas probes query for the use-application-dns.net name once a day.

The full description of methodology is on Wes’ ISI site, which should receive regular updates to the graph.
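As a rough illustration of the daily tally this kind of measurement produces, the sketch below computes the fraction of probes whose resolver answered NXDOMAIN for the canary name. The (probe_id, rcode) input shape and the function name are assumptions made for this example; this is not the RIPE Atlas result format, nor the analysis code behind the graph.

```python
from collections import Counter


def nxdomain_fraction(results):
    """Return the fraction of probes whose resolver answered NXDOMAIN.

    `results` is a hypothetical list of (probe_id, rcode) pairs for one day;
    it is an assumed shape, not the actual RIPE Atlas result format.
    """
    if not results:
        return 0.0
    counts = Counter(rcode for _probe_id, rcode in results)
    return counts["NXDOMAIN"] / len(results)


# Made-up sample: four probes, one behind a resolver that blocks the canary.
day = [(1, "NOERROR"), (2, "NXDOMAIN"), (3, "NOERROR"), (4, "NOERROR")]
print(nxdomain_fraction(day))
```

Repeating this tally for each day's results would give one point per day on a longitudinal graph.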


Posted in Uncategorized | Leave a comment

group lunch in honor of a departure and two arrivals

On November 14 we had a group lunch near ISI to celebrate the completion of Joao Ceron’s visit from the University of Twente as a visiting scholar, to welcome Asma Enayet to the group as a new PhD student, and to welcome Hang Guo’s son into the world. (Hang was understandably not able to make the lunch.) Happy Thanksgiving to all!

A group lunch in honor of Asma (left) and Joao (6th left).

new paper “Identifying Important Internet Outages” at the Sixth National Symposium for NSF REU Research in Data Science, Systems, and Security

We will publish a new paper “Identifying Important Internet Outages” by Ryan Bogutz, Yuri Pradkin, and John Heidemann, in the Sixth National Symposium for NSF REU Research in Data Science, Systems, and Security in Los Angeles, California, USA, on December 12, 2019.

From the abstract:

[Bogutz19a, figure 1]: Our sideboard showing important outages on 2019-03-08, including this outage in Venezuela.

Today, outage detection systems can track outages across the whole IPv4 Internet (millions of networks). However, it becomes difficult to find meaningful, interesting events in this huge dataset, since three months of data can easily include 660M observations and thousands of outage events. We propose an outage reporting system that sifts through this data to find the most interesting events. We explore multiple metrics to evaluate “interesting”, reflecting the size and severity of outages. We show that defining interest as the product of size by severity works well, avoiding degenerate cases like complete outages affecting a few people, and apparently large outages that affect only a small fraction of people in an area. We have integrated outage reporting into our existing public website (https://outage.ant.isi.edu) with the goal of making near-real-time outage information accessible to the general public. Such data can help answer questions like “what are the most significant outages today?”, “did Florida have major problems in an ongoing hurricane?”, and “are there power outages in Venezuela?”.
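To make the "size by severity" idea concrete, here is a minimal sketch. It assumes that size is the number of affected addresses and severity is the affected fraction of the region; those definitions, the Outage class, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    region: str
    affected: int    # assumed "size": addresses that lost service
    population: int  # assumed total addresses in the region

    @property
    def severity(self) -> float:
        # Assumed "severity": fraction of the region that is out.
        return self.affected / self.population

    @property
    def interest(self) -> float:
        # Interest as the product of size and severity.
        return self.affected * self.severity


def rank(outages):
    """Return outages sorted from most to least interesting."""
    return sorted(outages, key=lambda o: o.interest, reverse=True)


# Made-up examples of the two degenerate cases plus one major outage.
outages = [
    Outage("tiny-total", affected=10, population=10),
    Outage("large-partial", affected=50_000, population=5_000_000),
    Outage("major", affected=400_000, population=1_000_000),
]
for o in rank(outages):
    print(o.region, round(o.interest, 1))
```

With these made-up numbers, the complete outage of ten addresses and the large but shallow outage both rank well below the outage that is both large and severe, which is the behavior the abstract describes.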

The data from this paper is available publicly and on our website. The technical report ISI-TR-735 includes some additional data.


new technical report “Improving the Optics of Active Outage Detection (extended)”

We have released a new technical report “Improving the Optics of Active Outage Detection (extended)”, by Guillermo Baltra and John Heidemann, as ISI-TR-733.

From the abstract:

A sample block showing changes in block usage (c), and outage detection results of Trinocular (b) and improved with the Full Block Scanning Algorithm (a).

There is a growing interest in carefully observing the reliability of the Internet’s edge. Outage information can inform our understanding of Internet reliability and planning, and it can help guide operations. Outage detection algorithms using active probing from third parties have been shown to be accurate for most of the Internet, but inaccurate for blocks that are sparsely occupied. Our contributions include a definition of outages, which we use to determine how many independent observers are required to determine global outages. We propose a new Full Block Scanning (FBS) algorithm that gathers more information for sparse blocks to reduce false outage reports. We also propose ISP Availability Sensing (IAS) to detect maintenance activity using only external information. We study a year of outage data and show that FBS has a True Positive Rate of 86%, and show that IAS detects maintenance events in a large U.S. ISP.

All data from this paper will be publicly available.


Talks at DNS-OARC 31

Wes Hardaker gave two presentations at DNS-OARC on November 1st, 2019. The first covered the previously announced “Cache me if you can” paper; the talk is on YouTube, and the slides are available as well. The second presented Haoyu Jiang’s work from the summer of 2018 analyzing B-Root DNS traffic in the 2018 DITL data for the levels of traffic sent by the Chrome web browser, associated with different languages, and sent with different label lengths. It is also available on YouTube, along with the slides.


new conference paper “Cache Me If You Can: Effects of DNS Time-to-Live” at ACM IMC 2019

We will publish a new paper “Cache Me If You Can: Effects of DNS Time-to-Live” by Giovane C. M. Moura, John Heidemann, Ricardo de O. Schmidt, and Wes Hardaker, in the ACM Internet Measurement Conference (IMC 2019) in Amsterdam, the Netherlands.

From the abstract:

Figure 10a from [Moura19b], showing the distribution of latency with small TTLs before (right in blue) and with larger TTLs after (left in red) the .uy domain reviewed our work and lengthened their domain’s cache lifetimes to reduce latency to their customers.

DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand due to interactions across the distributed DNS service, where resolvers receive TTLs in different ways (answers and hints), TTLs are specified in multiple places (zones and their parent’s glue), and while DNS resolution must be security-aware. This paper provides the first careful evaluation of how these multiple, interacting factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations in TTL choice for different situations, and for where they must be configured. We show that longer TTLs have significant promise in reducing latency, reducing it from 183ms to 28.7ms for one country-code TLD.
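As a toy illustration of why longer TTLs help, the sketch below counts how many queries a single cache answers without going back to the authoritative server. It is a deliberately simplified model of one resolver and one record, with made-up query times; it ignores the parent/child and answer/hint interactions the paper studies and is not the paper's methodology.

```python
def cache_hits(query_times, ttl):
    """Count queries answered from cache for a single record.

    A miss fetches the record and caches it until `now + ttl` seconds.
    """
    hits = 0
    expires = None  # time at which the cached copy stops being valid
    for now in query_times:
        if expires is not None and now < expires:
            hits += 1
        else:
            expires = now + ttl  # miss: refetch and restart the TTL clock
    return hits


# Assumed workload: one query every 10 seconds for an hour (360 queries).
queries = list(range(0, 3600, 10))
for ttl in (30, 300, 3600):
    print(ttl, cache_hits(queries, ttl))
```

Every miss costs a round trip to an authoritative server, so in this model a larger TTL directly reduces the number of slow lookups.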

We have also reported on this work at the RIPE and APNIC blogs.


congratulations to Ryan Bogutz for his summer undergraduate internship

Ryan Bogutz completed his summer undergraduate research internship at ISI this summer, working with John Heidemann and Yuri Pradkin on his project “Identifying Interesting Outages”.

Ryan Bogutz with his poster at the ISI summer undergraduate research poster session.

In this project, Ryan examined Internet outage data from Trinocular, developing an outage report that summarizes the most “interesting” outages each day. Yuri integrated this report into our outage website, where it is available as a left side panel.

We hope Ryan’s new report makes it easier to evaluate Internet outages on a given day, and we look forward to continuing to work with Ryan on this topic.

Ryan visited USC/ISI in summer 2019 as part of the ISI Research Experiences for Undergraduates. We thank Jelena Mirkovic (PI) for coordinating the second year of this great program, and NSF for support through award #1659886.

See also ISI’s post about this summer undergraduate program.


reblogging: the diurnal Internet and DNS backscatter

We are happy to share that two of our older topics have appeared more recently in other venues.

Our animations of the diurnal Internet, originally seen in our 2014 ACM IMC paper and our blog posts, were noticed by Gerald Smith, who used them to start a discussion with seventh-grade classes in Mahe, India and (I think) Indiana, USA as part of his Fulbright work. It’s great to see research work that is useful to middle-schoolers!

Kensuke Fukuda recently posted about our work on identifying IPv6 scanning with DNS backscatter at the APNIC blog. This work was originally published at the 2018 ACM IMC and posted in our blog. It’s great to see that work get out to a new audience.


new technical report “Plumb: Efficient Processing of Multi-User Pipelines (Poster)”

We released a new technical report “Plumb: Efficient Processing of Multi-User Pipelines (Poster)”, by Abdul Qadeer and John Heidemann, as ISI-TR-731. This work was originally presented at the ACM Symposium on Cloud Computing (the poster abstract is available at ACM). The poster abstract with a small version of the poster is available at https://www.isi.edu/publications/trpublic/pdfs/isi-tr-731.pdf

aqadeer at SoCC 2018 Carlsbad CA

From the abstract:

As the field of big data analytics matures, workflows are increasingly complex and often include components that are shared by different users. Individual workflows often include multiple stages, and when groups build on each other’s work it is easy to lose track of computation that may be shared across different groups.

The contribution of this poster is to provide an organization-wide processing substrate Plumb that can be used to solve commonly occurring problems and to achieve a common goal. Plumb makes multi-user sharing a first-class concern by providing pipeline-graph abstraction. This abstraction is simple and based on fundamental model of input-processing-output but is powerful to capture processing and data duplication. Plumb then employs best available solutions to tackle problems of large-block processing under structural and computational skew without user intervention.

We expect to release the Plumb software this fall; please contact us if you have questions or interest in using it.


new paper “Precise Detection of Content Reuse in the Web” to appear in ACM SIGCOMM Computer Communication Review

We have published a new paper “Precise Detection of Content Reuse in the Web” by Calvin Ardi and John Heidemann, in the ACM SIGCOMM Computer Communication Review (Volume 49 Issue 2, April 2019) newsletter.

From the abstract:

With vast amount of content online, it is not surprising that unscrupulous entities “borrow” from the web to provide content for advertisements, link farms, and spam. Our insight is that cryptographic hashing and fingerprinting can efficiently identify content reuse for web-size corpora. We develop two related algorithms, one to automatically discover previously unknown duplicate content in the web, and the second to precisely detect copies of discovered or manually identified content. We show that bad neighborhoods, clusters of pages where copied content is frequent, help identify copying in the web. We verify our algorithm and its choices with controlled experiments over three web datasets: Common Crawl (2009/10), GeoCities (1990s–2000s), and a phishing corpus (2014). We show that our use of cryptographic hashing is much more precise than alternatives such as locality-sensitive hashing, avoiding the thousands of false-positives that would otherwise occur. We apply our approach in three systems: discovering and detecting duplicated content in the web, searching explicitly for copies of Wikipedia in the web, and detecting phishing sites in a web browser. We show that general copying in the web is often benign (for example, templates), but 6–11% are commercial or possibly commercial. Most copies of Wikipedia (86%) are commercialized (link farming or advertisements). For phishing, we focus on PayPal, detecting 59% of PayPal-phish even without taking on intentional cloaking.
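The core insight, that cryptographic hashes match only exact copies, can be sketched in a few lines. The example below hashes each paragraph with SHA-256 and measures how much of a page matches previously seen content. Splitting on blank lines, the function names, and the sample text are assumptions for illustration; this is not the paper's discovery or detection algorithm.

```python
import hashlib


def chunk_hashes(text):
    """Return the set of SHA-256 digests of the non-empty paragraphs in `text`."""
    paragraphs = (p.strip() for p in text.split("\n\n"))
    return {hashlib.sha256(p.encode("utf-8")).hexdigest() for p in paragraphs if p}


def reuse_fraction(page, known):
    """Fraction of the page's distinct paragraphs whose hash appears in `known`."""
    hashes = chunk_hashes(page)
    if not hashes:
        return 0.0
    return len(hashes & known) / len(hashes)


# Made-up content: the copy reuses one of the original's two paragraphs.
original = "Alpha paragraph.\n\nBeta paragraph."
copy = "Beta paragraph.\n\nSome new advertisement text."
known = chunk_hashes(original)
print(reuse_fraction(copy, known))
```

Because any change to a paragraph produces a different digest, this kind of matching trades recall on near-duplicates for precision, in contrast to approximate methods such as locality-sensitive hashing.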
