We just published a new technical report “Detecting Internet Outages with Precise Active Probing (extended)”, available at ftp://ftp.isi.edu/isi-pubs/tr-678b.pdf. This is an update of ISI-TR-678.
From the abstract:
Parts of the Internet are down every day, from the intentionalshutdown of the Egyptian Internet in Jan. 2011 and natural disasterssuch as the Mar. 2011 Japanese earthquake, to the thousands of smalloutages caused by localized accidents, and human error, maintenance,or choices. Understanding these events requires efficient andaccurate detection methods, motivating our new system to detectnetwork outages by active probing. We show that a single computer cantrack outages across the entire analyzable IPv4 Internet, probing asample of 20 addresses in all 2.5M responsive /24 address blocks. Weshow that our approach is significantly more accurate than the bestcurrent methods, with 31% fewer false conclusions, while providing 14%greater coverage and requiring about the same probing traffic. Wedevelop new algorithms to identify outages and cluster them to events,providing the first visualization of outages. We carefully validateour approach, showing consistent results over two years and from threedifferent sites. Using public BGP archives and news sources weconfirm 83% of large events. For a random sample of 50 observedevents, we find 38% in partial control-plane information, reaffirmingprior work that small outages are often not caused by BGP. Throughcontrolled emulation we show that our approach detects 100% offull-block outages that last at least twice our probing interval.Finally, we report on Internet stability as a whole, and the size andduration of typical outages, using core-to-edge observations with muchlarger coverage than prior mesh-based studies. We find that about0.3% of the Internet is likely to be unreachable at any time,suggesting the Internet provides only 2.5 “nines” of availability.