Categories
DNS Papers Publications

new conference paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” at ACM IMC 2018

We have published a new paper “When the Dike Breaks: Dissecting DNS Defenses During DDoS” in the ACM Internet Measurements Conference (IMC 2018) in Boston, Mass., USA.

From the abstract:

Caching and retries protect half of clients even with 90% loss and an attack twice the cache duration. (Figure 7c from [Moura18b].)

The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes—some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is relatively simple, the \emph{system} has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS \emph{user experience}, and server-side traffic. We find that, for about 30\% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages that last less than cache lifetimes, Caching and retries together allow up to half of the clients to tolerate DDoS attacks longer than cache lifetimes, with 90\% query loss, and almost all clients to tolerate attacks resulting in 50\% packet loss. While clients may get service during an attack, tail-latency increases for clients. For servers, retries during DDoS attacks increase normal traffic up to $8\times$. Our findings about caching and retries help explain why users see service outages from some real-world DDoS events, but minimal visible effects from others.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/dns/#Moura18b_data.

 

Categories
Papers Publications

new conference paper “The Policy Potential of Measuring Internet Outages” at TPRC

We have published a new paper “The Policy Potential of Measuring Internet Outages” in TPRC46, the Research Conference on Communications, Information and Internet Policy, to be presented on September 21, 2018 at the American University, Washington College of Law.

Outages from Hurricane Irma after landfall in Florida on 2017-09-11, observed with Trinocular.

From the abstract of our paper:

Today it is possible to evaluate the reliability of the Internet. Prior approaches to measure network reliability required telecommunications providers reporting the status of their own networks, resulting in limits on the precision, timeliness, and availability of the results. Recent work in Internet measurement has shown that network outages can be observed with active measurements from a few sites, and from passive measurements of network telescopes (large, unused address space) or large network services such as content-delivery networks. We suggest that these kinds of *third-party* observations of network outages can provide data that is precise and timely. We discuss early results of Trinocular, an outage detection system using active probing developed at the University of Southern California. Trinocular has been operating continuously since November 2013, and we provide (at no charge) data covering about 4 million network blocks from around the world. This paper describes some results of Trinocular showing outages in a large U.S. Internet Service Provider, and those resulting from the 2017 Hurricane Irma in Florida. Our data shows the impact of the Broadband America policy for always-on networks, and we discuss how it might be used to address future policy questions and assist in disaster planning and recovery.

Data we describe in this paper is at https://ant.isi.edu/datasets/outage/, with visualizations at https://ant.isi.edu/outage/world/.

This paper is joint work of John Heideman, Yuri Pradkin, and Guillermo Baltra from USC/ISI, with work carried out as part of LACANIC and DIVOICE projects with DHS S&T/CSD support.

Categories
Students

congratulations to Christopher Morales Ramos for his summer undergrad internship

We would to thank Christopher Morales Ramos for his summer internship at ANT, as part of the NSF-sponsored Research Experiences for Undergraduates (REU) Program at ISI in 2018:
Human Communication in a Connected World
. Christopher interned with us as part of his studies at the University of Puerto Rico where he is an undergraduate student in computer science.

Yuri Pradkin, Christopher Morales Ramos, and John Heidemann, with Christopher’s summer undergraduate research project poster.

Christopher’s project was improving the accuracy in estimating Round Trip Time (RTT) measurements from icmptrain our high-speed IPv4 prober, while minimizing the amount of traffic that was sent.  In addition to improving RTT estimates, his work can lead to better geolocation estimates.

His research at ISI was jointly advised by Yuri Pradkin and John Heidemann, as part of the ISI REU program directed by Jelena Mirkovic.

Categories
Publications Technical Report

new technical report “When the Dike Breaks: Dissecting DNS Defenses During DDoS (extended)”

We released a new technical report “When the Dike Breaks: Dissecting DNS Defenses During DDoS (extended)”, ISI-TR-725, available at https://www.isi.edu/~johnh/PAPERS/Moura18a.pdf.

Moura18a Figure 6a, Answers received during a DDoS attack causing 100% packet loss with pre-loaded caches.

From the abstract:

The Internet’s Domain Name System (DNS) is a frequent target of Distributed Denial-of-Service (DDoS) attacks, but such attacks have had very different outcomes—some attacks have disabled major public websites, while the external effects of other attacks have been minimal. While on one hand the DNS protocol is a relatively simple, the system has many moving parts, with multiple levels of caching and retries and replicated servers. This paper uses controlled experiments to examine how these mechanisms affect DNS resilience and latency, exploring both the client side’s DNS user experience, and server-side traffic. We find that, for about about 30% of clients, caching is not effective. However, when caches are full they allow about half of clients to ride out server outages, and caching and retries allow up to half of the clients to tolerate DDoS attacks that result in 90% query loss, and almost all clients to tolerate attacks resulting in 50% packet loss. The cost of such attacks to clients are greater median latency. For servers, retries during DDoS attacks increase normal traffic up to 8x. Our findings about caching and retries can explain why some real-world DDoS cause service outages for users while other large attacks have minimal visible effects.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/dns/#Moura18a_data.

 

Categories
Presentations

new talk “Internet Outages: Reliablity and Security” from U. of Oregon Cybersecurity Day 2018

John Heidemann gave the talk “Internet Outages: Reliablity and Security” at the University of Oregon Cybersecurity Day in Eugene, Oregon on April 23, 2018.  Slides are available at https://www.isi.edu/~johnh/PAPERS/Heidemann18e.pdf.

Network outages as a security problem.

From the abstract:

The Internet is central to our lives, but we know astoundingly little about it. Even though many businesses and individuals depend on it, how reliable is the Internet? Do policies and practices make it better in some places than others?

Since 2006, we have been studying the public face of the Internet to answer these questions. We take regular censuses, probing the entire IPv4 Internet address space. For more than two years we have been observing Internet reliability through active probing with Trinocular outage detection, revealing the effects of the Internet due to natural disasters like Hurricanes from Sandy to Harvey and Maria, configuration errors that sometimes affect millions of customers, and political events where governments have intervened in Internet operation. This talk will describe how it is possible to observe Internet outages today and what they are beginning to say about the Internet and about the physical world.

This talk builds on research over the last decade in IPv4 censuses and outage detection and includes the work of many of my collaborators.

Data from this talk is all available; see links on the last slide.

Categories
Announcements Projects

new project “Interactive Internet Outages Visualization to Assess Disaster Recovery”

We are happy to announce a new project, Interactive Internet Outages Visualization to Assess Disaster Recovery.   This project is supporting the use of Internet outage measurements to help understand and recover from natural disasters. It will expand on the visualization of Internet outages found at https://ant.isi.edu/outage/world/.

This visualization was initially seeded by a Michael Keston research grant here at ISI, and the outage measurement techniques and ongoing data collection has been developed with the support of DHS (the LANDER-2007, LACREND, LACANIC, and Retro-future Bridge and Outages projects).

Categories
Publications Technical Report

new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”

We released a new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”, ISI-TR-724, available at https://www.isi.edu/~johnh/PAPERS/Heidemann18b.pdf.

From the abstract:

Clustering (from our event clustering algorithm) of 2014q3 outages from 172/8, showing 7 weeks including the 2014-08-27 Time Warner outage.

Internet reliability has many potential weaknesses: fiber rights-of-way at the physical layer, exchange-point congestion from DDOS at the network layer, settlement disputes between organizations at the financial layer, and government intervention the political layer. This paper shows that we can discover common points-of-failure at any of these layers by observing correlated failures. We use end-to-end observations from data-plane-level connectivity of edge hosts in the Internet. We identify correlations in connectivity: networks that usually fail and recover at the same time suggest common point-of-failure. We define two new algorithms to meet these goals. First, we define a computationally-efficient algorithm to create a linear ordering of blocks to make correlated failures apparent to a human analyst. Second, we develop an event-based clustering algorithm that directly networks with correlated failures, suggesting common points-of-failure. Our algorithms scale to real-world datasets of millions of networks and observations: linear ordering is O(n log n) time and event-based clustering parallelizes with Map/Reduce. We demonstrate them on three months of outages for 4 million /24 network prefixes, showing high recall (0.83 to 0.98) and precision (0.72 to 1.0) for blocks that respond. We also show that our algorithms generalize to identify correlations in anycast catchments and routing.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/outage/, and we expect to release the software for this paper in the coming months (contact us if you are interested).

Categories
Announcements In-the-news

news story about measuring Internet outages

PCMag released a news story on January 3, 2018 about our measuring Internet outages, including discussion about the 2017 hurricanes like Irma, and our new worldwide outage browser.

Categories
Announcements Outages

new website for browsing Internet outages

We are happy to announce a new website at https://ant.isi.edu/outage/world/ that supports our Internet outage data collected from Trinocular.

The ANT Outage world browser, showing Hurricane Irma just after landfall in Florida in Sept. 2017.

Our website supports browsing more than two years of outage data, organized by geography and time.  The map is a google-maps-style world map, with circle on it at even intervals (every 0.5 to 2 degrees of latitude and longitude, depending on the zoom level).  Circle sizes show how many /24 network blocks are out in that location, while circle colors show the percentage of outages, from blue (only a few percent) to red (approaching 100%).

We hope that this website makes our outage data more accessible to researchers and the public.

The raw data underlying this website is available on request, see our outage dataset webpage.

The research is funded by the Department of Homeland Security (DHS) Cyber Security Division (through the LACREND and Retro-Future Bridge and Outages projects) and Michael Keston, a real estate entrepreneur and philanthropist (through the Michael Keston Endowment).  Michael Keston helped support this the initial version of this website, and DHS has supported our outage data collection and algorithm development.

The website was developed by Dominik Staros, ISI web developer and owner of Imagine Web Consulting, based on data collected by ISI researcher Yuri Pradkin. It builds on prior work by Pradkin, Heidemann and USC’s Lin Quan in ISI’s Analysis of Network Traffic Lab.

ISI has featured our new website on the ISI news page.

 

Categories
Announcements Projects

new project LACANIC

We are happy to announce a new project, LACANIC, the Los Angeles/Colorado Application and Network Information Community.

The LACANIC project’s goal is to develop datasets to improve Internet security and readability. We distribute these datasets through the DHS IMPACT program.

As part of this work we:

  • provide regular data collection to collect long-term, longitudinal data
  • curate datasets for special events
  • build websites and portals to help make data accessible to casual users
  • develop new measurement approaches

We provide several types of datasets:

  • anonymized packet headers and network flow data, often to document events like distributed denial-of-service (DDoS) attacks and regular traffic
  • Internet censuses and surveys for IPv4 to document address usage
  • Internet hitlists and histories, derived from IPv4 censuses, to support other topology studies
  • application data, like DNS and Internet-of-Things mapping, to document regular traffic and DDoS events
  • and we are developing other datasets

LACANIC allows us to continue some of the data collection we were doing as part of the LACREND project, as well as develop new methods and ways of sharing the data.

LACANIC is a joint effort of the ANT Lab involving USC/ISI (PI: John Heidemann) and Colorado State University (PI: Christos Papadopoulos).

We thank DHS’s Cyber Security Division for their continued support!