Detecting Internet Outages with Active Probing (extended)

Quan, Lin and Heidemann, John
USC/Information Sciences Institute

Lin Quan and John Heidemann 2011. Detecting Internet Outages with Active Probing (extended). Technical Report ISI-TR-2011-672. USC/Information Sciences Institute.


With businesses, governments, and individuals increasingly dependent on the Internet, understanding its reliability is more important than ever. Network outages vary in scope and cause, from the intentional shutdown of the Egyptian Internet in February 2011, to outages caused by the effects of the March 2011 earthquakes on undersea cables entering Japan, to the thousands of small, daily outages caused by localized accidents or human error. In this paper we present a new method to detect network outages by probing entire address blocks. Using 24 datasets, each a 2-week study of 22,000 /24 address blocks randomly sampled from the Internet, we develop new algorithms to identify and visualize outages and to cluster those outages into network-level events. We validate our approach by comparing our data-plane results against control-plane observations from BGP routing and against news reports, examining both major and randomly selected events. We confirm that our results are stable when probing from two different locations and over more than one and a half years of observations. We show that our approach of probing all addresses in a /24 block is significantly more accurate than prior approaches that use a single representative address for each routed block, cutting the fraction of false outage observations from 44% to under 1%. We use our approach to study several large outages, such as those mentioned above. We also develop a general estimate of how much of the Internet is regularly down, finding that about 0.3% of the Internet is likely to be unreachable at any time. By providing a baseline estimate of Internet outages, our work lays the groundwork for evaluating ISP reliability.
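The abstract's central contrast, judging a block by all of its addresses rather than by one representative, can be illustrated with a minimal sketch. The decision rule here (a /24 is treated as down in a probing round only if none of its historically responsive addresses reply) and the names `block_down` and `representative_down` are hypothetical simplifications for illustration, not the detection algorithm developed in the paper.

```python
from typing import Dict, Set

# One probing round for a /24 block: last octet (0-255) -> did it reply?
Round = Dict[int, bool]


def block_down(round_: Round, ever_responsive: Set[int]) -> bool:
    """Hypothetical full-block rule: report an outage only if none of the
    addresses that have ever replied answer in this round."""
    if not ever_responsive:
        return False  # no history, so no basis for declaring an outage
    return not any(round_.get(octet, False) for octet in ever_responsive)


def representative_down(round_: Round, representative: int) -> bool:
    """Single-representative rule: the whole block is judged by one address."""
    return not round_.get(representative, False)


if __name__ == "__main__":
    ever_responsive = {1, 17, 200}
    # The representative host .1 is switched off, but .17 still replies:
    # the block is reachable, so this is not a real outage.
    round_ = {1: False, 17: True, 200: False}
    print(representative_down(round_, representative=1))  # True (false outage)
    print(block_down(round_, ever_responsive))            # False
```

In this example, a single host going quiet is enough to make the single-representative rule report an outage, while the full-block rule still sees the block as reachable. This is the kind of false outage observation that probing every address in the block is meant to avoid.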


