Categories
Students

congratulations to Xue Cai for her new PhD

I would like to congratulate Dr. Xue Cai for defending her PhD and filing her doctoral dissertation “Global Analysis and Modeling on Decentralized Internet” in Dec. 2013.

Xue Cai (left) and John Heidemann, after her PhD defense.

From the abstract:

Better understanding about Internet infrastructure is crucial to improve the reliability, performance, and security of web services. The need for this understanding then drives research in network measurements. Internet measurements explore a variety of data related to a specific topic and then develop approaches to transform data into useful understanding about the topic. This process is not straightforward since available data often only contains indirect information that may appear to have limited connection to the topic.
This body of work asserts that systematic approaches can overcome data limitations to improve understanding about important aspects of the Internet infrastructure. We demonstrate the validity of our thesis statement by providing three specific examples that develop novel approaches and provide novel understanding compared to prior work. In particular, we employ four systematic approaches—statistical, clustering, modeling, and what-if approach—to understand three important aspects of the Internet: the efficiency and management of IPv4 addresses, the ownership of Autonomous Systems (ASes), and the robustness of web services when facing critical facility disruption. These approaches have addressed a variety of challenges posed by indirect, incomplete, over-fit, noisy and unknown data; they in turn enable us to improve understanding about the Internet.
Each of our three studies explores a different area of the problem space and opens a much larger area of opportunity. The data limitations addressed by our approaches also occur in many other problems. We believe our approaches can inspire future work to solve these problems and in turn provide more useful understanding about the Internet.

Categories
Publications Technical Report

new technical report “A Holistic Framework for Bridging Regional Threats to User QoE”

We just released a new technical report “A Holistic Framework for Bridging Regional Threats to User QoE”, ISI-TR-2013-687, available as https://www.isi.edu/~johnh/PAPERS/Cai13c.pdf

Estimated impact on user QoE in four cable cut incidents (Figure 13 from [Cai13c])

From the abstract:

Submarine cable cuts have become increasingly common, with five incidents breaking more than ten cables in the last three years. Today, around 300 cables carry the majority of international Internet traffic, so a single cable cut can affect millions of users, and repairs to any cut are expensive and time consuming. Prior work has either measured the impact following incidents, or predicted the results of network changes to relatively abstract Internet topological models. In this paper, we develop a new approach to model cable cuts. Our approach differs by following problems drawn from real-world occurrences all the way to their impact on end-users. Because our approach spans many layers, no single organization can provide all the data needed to apply the model. We therefore perform what-if analysis to study a range of possibilities. With this approach we evaluate four incidents in 2012 and 2013; our analysis suggests general rules that assess the degree of a country’s vulnerability to a cut.

 

Categories
Papers Publications

new conference paper “Trinocular: Understanding Internet Reliability Through Adaptive Probing” in SIGCOMM 2013

The paper “Trinocular: Understanding Internet Reliability Through Adaptive Probing” was accepted by SIGCOMM’13 in Hong Kong, China (available at http://www.isi.edu/~johnh/PAPERS/Quan13c with cite and pdf, or direct pdf).

100% detection of outages one round or longer (figure 3 from the paper)

From the abstract:

Natural and human factors cause Internet outages—from big events like Hurricane Sandy in 2012 and the Egyptian Internet shutdown in Jan. 2011 to small outages every day that go unpublicized. We describe Trinocular, an outage detection system that uses active probing to understand reliability of edge networks. Trinocular is principled: deriving a simple model of the Internet that captures the information pertinent to outages, and populating that model through long-term data, and learning current network state through ICMP probes. It is parsimonious, using Bayesian inference to determine how many probes are needed. On average, each Trinocular instance sends fewer than 20 probes per hour to each /24 network block under study, increasing Internet “background radiation” by less than 0.7%. Trinocular is also predictable and precise: we provide known precision in outage timing and duration. Probing in rounds of 11 minutes, we detect 100% of outages one round or longer, and estimate outage duration within one-half round. Since we require little traffic, a single machine can track 3.4M /24 IPv4 blocks, all of the Internet currently suitable for analysis. We show that our approach is significantly more accurate than the best current methods, with about one-third fewer false conclusions, and about 30% greater coverage at constant accuracy. We validate our approach using controlled experiments, use Trinocular to analyze two days of Internet outages observed from three sites, and re-analyze three years of existing data to develop trends for the Internet.
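The abstract's idea of using Bayesian inference to decide how many probes to send can be illustrated with a minimal sketch. This is not the Trinocular implementation: the function names, the chance of a stray reply from a down block (`p_reply_when_down`), the belief thresholds, and the probe cap are all illustrative assumptions. The sketch only shows why a block whose addresses usually answer needs very few probes, while a sparsely responsive block needs more.

```python
def update_belief(belief_up, got_reply, availability, p_reply_when_down=0.01):
    """One Bayes update of P(block is up) after probing a single address.

    availability: assumed long-term fraction of probes answered when the
    block is up. p_reply_when_down is a hypothetical small constant.
    """
    if got_reply:
        like_up, like_down = availability, p_reply_when_down
    else:
        like_up, like_down = 1.0 - availability, 1.0 - p_reply_when_down
    numerator = like_up * belief_up
    return numerator / (numerator + like_down * (1.0 - belief_up))


def probes_until_confident(replies, availability, prior_up=0.5,
                           up_threshold=0.9, down_threshold=0.1,
                           max_probes=15):
    """Consume probe results until belief crosses a threshold.

    Returns (state, probes_used, belief). Thresholds and max_probes are
    assumptions chosen for illustration only.
    """
    belief = prior_up
    used = 0
    for got_reply in replies:
        if used >= max_probes:
            break
        belief = update_belief(belief, got_reply, availability)
        used += 1
        if belief >= up_threshold:
            return "up", used, belief
        if belief <= down_threshold:
            return "down", used, belief
    return "uncertain", used, belief


# A single reply is strong evidence that the block is up.
print(probes_until_confident([True], availability=0.8))
# Silence from a usually-responsive block is conclusive quickly...
print(probes_until_confident([False] * 15, availability=0.8))
# ...while a sparsely responsive block needs more probes to call it down.
print(probes_until_confident([False] * 15, availability=0.3))
```

Stopping as soon as the belief is decisive is what keeps the probe count low in this sketch; the actual model and parameters are described in the paper.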

Citation: Lin Quan, John Heidemann and Yuri Pradkin. Trinocular: Understanding Internet Reliability Through Adaptive Probing. In Proceedings of the ACM SIGCOMM Conference. Hong Kong, China, ACM. August, 2013. <http://www.isi.edu/~johnh/PAPERS/Quan13c>.

Datasets (listed here) used in generating this paper are available or will be available before the conference presentation.

Categories
Presentations

New Poster “Poster Abstract: Towards Active Measurements of Edge Network Outages” in PAM 2013

Lin Quan presented our outage work: “Poster Abstract: Towards Active Measurements of Edge Network Outages” at the PAM 2013 conference. Poster abstract is available at http://www.isi.edu/~johnh/PAPERS/Quan13a/index.html


End-to-end reachability is a fundamental service of the Internet. We study network outages caused by natural disasters, and political upheavals. We propose a new approach to outage detection using active probing. Like prior outage detection methods, our method uses ICMP echo requests (“pings”) to detect outages, but we probe with greater density and finer granularity, showing pings can detect outages without supplemental probing. The main contribution of our work is to define how to interpret pings as outages: defining an outage as a sharp change in block responsiveness relative to recent behavior. We also provide preliminary analysis of outage rate in the Internet edge. Space constrains this poster abstract to only sketches of our approach; details and validation are in our technical report. Our data is available at no charge, see http://www.isi.edu/ant/traces/internet_outages/.
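To make "a sharp change in block responsiveness relative to recent behavior" concrete, here is a minimal sketch. It assumes each block is summarized by the number of addresses that answered in each probing round; the window length, the drop ratio, and the function name are hypothetical choices, and the precise definition used in the work is in the technical report.

```python
from collections import deque


def detect_outage_rounds(responsive_counts, window=4, drop_ratio=0.1):
    """Return indices of rounds whose count falls sharply below the recent mean.

    responsive_counts: addresses in one block that replied, per round.
    window and drop_ratio are illustrative assumptions, not published values.
    """
    recent = deque(maxlen=window)
    outage_rounds = []
    for i, count in enumerate(responsive_counts):
        if len(recent) == window:
            baseline = sum(recent) / window
            if baseline > 0 and count < drop_ratio * baseline:
                outage_rounds.append(i)
                continue  # keep outage rounds out of the baseline
        recent.append(count)
    return outage_rounds


# A block that normally has about 18 responders goes silent for two rounds.
print(detect_outage_rounds([18, 17, 19, 18, 0, 0, 17, 18]))
# A gradual decline is not a sharp change and is not flagged.
print(detect_outage_rounds([18, 16, 14, 12, 10, 8]))
```

Comparing against the block's own recent history, rather than a fixed count, is what lets sparsely and densely populated blocks share one rule in this sketch.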

This work is based on our technical report: http://www.isi.edu/~johnh/PAPERS/Quan12a/index.html, joint work by Lin Quan, John Heidemann and Yuri Pradkin.

Categories
Presentations

new talk “Long-term Data Collection and Analysis of Outages at the Edge” given at the AIMS workshop

John Heidemann gave the talk “Long-term Data Collection and Analysis of Outages at the Edge” at UCSD, San Diego, California on Feb. 8, 2013 as part of the CAIDA Active Internet Measurement Systems (AIMS) Workshop.  Slides are available at http://www.isi.edu/~johnh/PAPERS/Heidemann13e.html.


This talk describes our analysis of outages in edge networks at the time of Hurricane Sandy, and how that work was enabled by long-term data collection. The analysis showed U.S. networks had double the outage rate (from 0.2% to 0.4%) on 2012-10-30, the day after Sandy landfall, and recovered after four days. We highlighted long-term data collection of Internet Surveys, a random sample of about 41,000 /24 blocks, and the characteristics that make that data suitable for re-analysis. The talk was part of the CAIDA Workshop on Active Internet Measurement Systems, hosted at UCSD.

This work is based on our recent technical report “A Preliminary Analysis of Network Outages During Hurricane Sandy”, joint work of John Heidemann, Lin Quan, and Yuri Pradkin.

Categories
Presentations

new abstract “Third-Party Measurement of Network Outages in Hurricane Sandy” and talk with video at FCC Workshop on Network Resiliency

We recently posted our abstract “Third-Party Measurement of Network Outages in Hurricane Sandy” at http://www.isi.edu/~johnh/PAPERS/Heidemann13c.html and the talk “Active Probing of Edge Networks: Hurricane Sandy and Beyond” at http://www.isi.edu/~johnh/PAPERS/Heidemann13d.html

These were part of the FCC Workshop on Network Resiliency at Brooklyn Law College, Brooklyn, NY on Feb. 6, 2013, chaired by Henning Schulzrinne.

Video from our talk and for the whole workshop is on YouTube.


A summary of the talk:

This talk summarized our analysis of outages in edge networks at the time of Hurricane Sandy. This analysis showed U.S. networks had double the outage rate (from 0.2% to 0.4%) on 2012-10-30, the day after Sandy landfall, and recovered after four days. It also describes our goal of tracking all outages in the Internet. The talk was part of the FCC workshop on Network Resiliency, hosted at Brooklyn Law College by Henning Schulzrinne.

This work is based on our recent technical report “A Preliminary Analysis of Network Outages During Hurricane Sandy”, joint work of John Heidemann, Lin Quan, and Yuri Pradkin.

 

 

Categories
Presentations

new talk “Active Probing of Edge Networks: Outages During Hurricane Sandy” at NANOG57

John Heidemann gave the talk “Active Probing of Edge Networks: Outages During Hurricane Sandy” at NANOG57 in Orlando, Florida on Feb. 5, 2013 as part of a panel on Hurricane Sandy, hosted by James Cowie at Renesys. Slides are available at http://www.isi.edu/~johnh/PAPERS/Heidemann13b.html.


This talk summarizes our analysis of outages in edge networks at the time of Hurricane Sandy. This analysis showed U.S. networks had double the outage rate (from 0.2% to 0.4%) on 2012-10-30, the day after Sandy landfall, and recovered after four days. The talk was part of the panel “Internet Impacts of Hurricane Sandy”, moderated by James Cowie, with presentations by John Heidemann, USC/Information Sciences Institute; Emile Aben, RIPE NCC; Patrick Gilmore, Akamai; Doug Madory, Renesys.

This work is based on our recent technical report “A Preliminary Analysis of Network Outages During Hurricane Sandy”, joint work of John Heidemann, Lin Quan, and Yuri Pradkin.

 

 

Categories
Publications Technical Report

new tech report “A Preliminary Analysis of Network Outages During Hurricane Sandy”

We just released a new technical report “A Preliminary Analysis of Network Outages During Hurricane Sandy”, available at ftp://ftp.isi.edu/isi-pubs/tr-685.pdf and at http://www.isi.edu/~johnh/PAPERS/Heidemann12d.pdf.

From the abstract:

This document describes our analysis of Internet outages during the October 2012 Hurricane Sandy. We assess network reliability by pinging a sample of networks and observing those that respond and then stop responding. While there are always occasional network outages, we see that the outage rate in U.S. networks doubled when the hurricane made landfall, then took about four days to recover. We confirm that this increase was due to outages in New York and New Jersey.

Categories
Papers Publications

New Workshop paper “Visualizing Sparse Internet Events: Network Outages and Route Changes”


The paper “Visualizing Sparse Internet Events: Network Outages and Route Changes” was accepted by WIV’12 in Boston, MA (available at http://www.isi.edu/~johnh/PAPERS/Quan12b.html).

From the abstract:

To understand network behavior, researchers and enterprise network operators must interpret large amounts of network data. To understand and manage network events such as outages, route instability, and spam campaigns, they must interpret data that covers a range of networks and evolves over time. We propose a simple clustering algorithm that helps identify spatial clusters of network events based on correlations in event timing, producing 2-D visualizations. We show that these visualizations reveal the extent, timing, and dynamics of network outages such as the January 2011 Egyptian change of government, and the March 2011 Japanese earthquake. We also show they reveal correlations in routing changes that are hidden from AS-path analysis.

Citation: Lin Quan, John Heidemann and Yuri Pradkin. Visualizing Sparse Internet Events: Network Outages and Route Changes. In Proceedings of the First ACM Workshop on Internet Visualization. Boston, MA. November, 2012. <http://www.isi.edu/~johnh/PAPERS/Quan12b.html>.

Categories
Publications Technical Report

New Tech Report “Detecting Internet Outages with Precise Active Probing (extended)”

We just published a new technical report “Detecting Internet Outages with Precise Active Probing (extended)”, available at ftp://ftp.isi.edu/isi-pubs/tr-678b.pdf. This is an update of ISI-TR-678.

From the abstract:

Parts of the Internet are down every day, from the intentional shutdown of the Egyptian Internet in Jan. 2011 and natural disasters such as the Mar. 2011 Japanese earthquake, to the thousands of small outages caused by localized accidents, and human error, maintenance, or choices. Understanding these events requires efficient and accurate detection methods, motivating our new system to detect network outages by active probing. We show that a single computer can track outages across the entire analyzable IPv4 Internet, probing a sample of 20 addresses in all 2.5M responsive /24 address blocks. We show that our approach is significantly more accurate than the best current methods, with 31% fewer false conclusions, while providing 14% greater coverage and requiring about the same probing traffic. We develop new algorithms to identify outages and cluster them to events, providing the first visualization of outages. We carefully validate our approach, showing consistent results over two years and from three different sites. Using public BGP archives and news sources we confirm 83% of large events. For a random sample of 50 observed events, we find 38% in partial control-plane information, reaffirming prior work that small outages are often not caused by BGP. Through controlled emulation we show that our approach detects 100% of full-block outages that last at least twice our probing interval. Finally, we report on Internet stability as a whole, and the size and duration of typical outages, using core-to-edge observations with much larger coverage than prior mesh-based studies. We find that about 0.3% of the Internet is likely to be unreachable at any time, suggesting the Internet provides only 2.5 “nines” of availability.
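The closing "2.5 nines" figure follows from the 0.3% unreachability estimate if "nines" is read in the usual way, as the negative base-10 logarithm of the unavailable fraction. A short check, with helper names that are ours rather than the report's:

```python
import math


def nines(unavailable_fraction):
    """Number of 'nines' of availability for a given unavailable fraction."""
    return -math.log10(unavailable_fraction)


def downtime_hours_per_year(unavailable_fraction):
    """Equivalent downtime if the same fraction applied to one 365-day year."""
    return unavailable_fraction * 365 * 24


print(round(nines(0.003), 2))                    # 2.52
print(round(downtime_hours_per_year(0.003), 2))  # 26.28
```

For comparison, 0.1% unavailability corresponds to exactly three nines under the same definition.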