Categories
Publications Technical Report

new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”

We released a new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”, ISI-TR-724, available at https://www.isi.edu/~johnh/PAPERS/Heidemann18b.pdf.

From the abstract:

Clustering (from our event clustering algorithm) of 2014q3 outages from 172/8, showing 7 weeks including the 2014-08-27 Time Warner outage.

Internet reliability has many potential weaknesses: fiber rights-of-way at the physical layer, exchange-point congestion from DDOS at the network layer, settlement disputes between organizations at the financial layer, and government intervention the political layer. This paper shows that we can discover common points-of-failure at any of these layers by observing correlated failures. We use end-to-end observations from data-plane-level connectivity of edge hosts in the Internet. We identify correlations in connectivity: networks that usually fail and recover at the same time suggest common point-of-failure. We define two new algorithms to meet these goals. First, we define a computationally-efficient algorithm to create a linear ordering of blocks to make correlated failures apparent to a human analyst. Second, we develop an event-based clustering algorithm that directly networks with correlated failures, suggesting common points-of-failure. Our algorithms scale to real-world datasets of millions of networks and observations: linear ordering is O(n log n) time and event-based clustering parallelizes with Map/Reduce. We demonstrate them on three months of outages for 4 million /24 network prefixes, showing high recall (0.83 to 0.98) and precision (0.72 to 1.0) for blocks that respond. We also show that our algorithms generalize to identify correlations in anycast catchments and routing.

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/outage/, and we expect to release the software for this paper in the coming months (contact us if you are interested).

Categories
Announcements Projects

new project LACANIC

We are happy to announce a new project, LACANIC, the Los Angeles/Colorado Application and Network Information Community.

The LACANIC project’s goal is to develop datasets to improve Internet security and readability. We distribute these datasets through the DHS IMPACT program.

As part of this work we:

  • provide regular data collection to collect long-term, longitudinal data
  • curate datasets for special events
  • build websites and portals to help make data accessible to casual users
  • develop new measurement approaches

We provide several types of datasets:

  • anonymized packet headers and network flow data, often to document events like distributed denial-of-service (DDoS) attacks and regular traffic
  • Internet censuses and surveys for IPv4 to document address usage
  • Internet hitlists and histories, derived from IPv4 censuses, to support other topology studies
  • application data, like DNS and Internet-of-Things mapping, to document regular traffic and DDoS events
  • and we are developing other datasets

LACANIC allows us to continue some of the data collection we were doing as part of the LACREND project, as well as develop new methods and ways of sharing the data.

LACANIC is a joint effort of the ANT Lab involving USC/ISI (PI: John Heidemann) and Colorado State University (PI: Christos Papadopoulos).

We thank DHS’s Cyber Security Division for their continued support!

 

Categories
Papers Publications

new conference paper “A Look at Router Geolocation in Public and Commercial Databases” in IMC 2017

The paper “A Look at Router Geolocation in Public and Commercial Databases” has appeared in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

Regional breakdown of the geolocation error for the geolocation databases vs. ground truth data.

Internet measurement research frequently needs to map infrastructure components, such as routers, to their physical locations. Although public and commercial geolocation services are often used for this purpose, their accuracy when applied to network infrastructure has not been sufficiently assessed. Prior work focused on evaluating the overall accuracy of geolocation databases, which is dominated by their performance on end-user IP addresses. In this work, we evaluate the reliability of router geolocation in databases. We use a dataset of about 1.64M router interface IP addresses extracted from the CAIDA Ark dataset to examine the country- and city-level coverage and consistency of popular public and commercial geolocation databases. We also create and provide a ground-truth dataset of 16,586 router interface IP addresses and their city-level locations, and use it to evaluate the databases’ accuracy with a regional breakdown analysis. Our results show that the databases are not reliable for geolocating routers and that there is room to improve their country- and city-level accuracy. Based on our results, we present a set of recommendations to researchers concerning the use of geolocation databases to geolocate routers.

The work in this paper was joint work by Manaf Gharaibeh, Anant Shah, Han Zhang, Christos Papadopoulos (Colorado State University), Brad Huffaker (CAIDA / UC San Diego), and Roya Ensafi (University of Michigan). The findings of this work are highlighted in an APNIC blog post “Should we trust the geolocation databases to geolocate routers?”. The ground truth datasets used in the paper are available via IMPACT.

Categories
Papers Publications

new workshop paper “Assessing Co-Locality of IP Blocks” in GI 2016

The paper “Assessing Co-Locality of IP Blocks” appeared in the 19th IEEE  Global Internet Symposium on April 11, 2016 in San Francisco, CA, USA and is available at (http://www.cs.colostate.edu/~manafgh/publications/Assessing-Co-Locality-of-IP-Block-GI2016.pdf). The datasets are available at (https://ant.isi.edu/datasets/geolocation/).

From the abstract:

isi_all_blocks_clustersCountMany IP Geolocation services and applications assume that all IP addresses within the same /24 IPv4 prefix (a /24 block) reside in close physical proximity. For blocks that contain addresses in very different locations (such as blocks identifying network backbones), this assumption can result in a large geolocation error. In this paper we evaluate the co-location assumption. We first develop and validate a hierarchical clustering method to find clusters of IP addresses with similar observed delay measurements within /24 blocks. We validate our methodology against two ground-truth datasets, confirming that 93% of the identified multi-cluster blocks are true positives with multiple physical locations and an upper bound for false positives of only about 5.4%. We then apply our methodology to a large dataset of 1.41M /24 blocks extracted from a delay-measurement study of the entire responsive IPv4 address space. We find that about 247K (17%) out of 1.41M blocks are not co-located, thus quantifying the error in the /24 block co-location assumption.

The work in this paper is by Manaf Gharaibeh, Han Zhang, Christos Papadopoulos (Colorado State University) and John Heidemann (USC/ISI).

Categories
Publications Technical Report

new technical report “Assessing Co-Locality of IP Blocks”

We have released a new technical report “Assessing Co-Locality of IP Blocks”, CSU TR15-103, available at http://www.cs.colostate.edu/TechReports/Reports/2015/tr15-103.pdf.

From the abstract:

isi_all_blocks_clustersCount_CDF
CDF of number of clusters per block, suggesting the number of potential multi-location blocks. (Figure 2 from [Gharaibeh15a].)

Many IP Geolocation services and applications assume that all IP addresses with the same /24 IPv4 prefix (a /24 block) are in the same location. For blocks that contain addresses in very different locations (such blocks identifying network backbones), this assumption can result in large geolocation error. This paper evaluates this assumption using a large dataset of 1.41M /24 blocks extracted from a delay measurements dataset for the entire
responsive IPv4 address space. We use hierarchal clustering to find clusters of IP addresses with similar observed delay measurements within /24 blocks. Blocks with multiple clusters often span different geographic locations. We evaluate this claim against two ground-truth datasets, confirming that 93% of identified multi-cluster blocks are true positives with multiple locations, while only 13% of blocks identified as single-cluster appear to be multi-location in ground truth. Applying the clustering process to the whole dataset suggests that about 17% (247K) of blocks are likely multi-location.

This work is by Manaf Gharaibeh, Han Zhang, Christos Papadopoulos (Colorado State University), and John Heidemann (USC/ISI). The datasets used in this work are new analysis of an existing geolocation dataset as collected by Hu et al. (http://www.isi.edu/~johnh/PAPERS/Hu12a.pdf).  These source datasets are available upon request from http://www.predict.org and via our website, and we expect trial datasets in our new work to also be available there and through PREDICT by the end of 2015.

Categories
Presentations

new animation “Watching the Internet Sleep”

Does the Internet sleep? Yes, and we have the video!

We have recently put together a video showing 35 days of Internet address usage as observed from Trinocular, our outage detection system.

The Internet sleeps: address use in South America is low (blue) in the early morning, while India is high (red) in afternoon.
The Internet sleeps: address use in South America is low (blue) in the early morning, while India is high (red) in afternoon.

The Internet sleeps: address use in South America is low (blue) in the early morning, while India is high (red) in afternoon.  When we look at address usage over time, we see that some parts of the globe have daily swings of +/-10% to 20% in the number of active addresses. In China, India, eastern Europe and much of South America, the Internet sleeps.

Understanding when the Internet sleeps is important to understand how different country’s network policies affect use, it is part of outage detection, and it is a piece of improving our long-term goal of understanding exactly how big the Internet is.

See http://www.isi.edu/ant/diurnal/ for the video, or read our technical paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” by Quan, Heidemann, and Pradkin, to appear at ACM IMC, Nov. 2014.

Datasets (listed here) used in generating this video are available.

This work is partly supported by DHS S&T, Cyber Security division, agreement FA8750-12-2-0344 (under AFRL) and N66001-13-C-3001 (under SPAWAR).  The views contained
herein are those of the authors and do not necessarily represent those of DHS or the U.S. Government.  This work was classified by USC’s IRB as non-human subjects research (IIR00001648).

Categories
Papers Publications

new conference paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” in IMC 2014

The paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” will appear at ACM Internet Measurements Conference 2014 in Vancouver, Canada (available at http://www.isi.edu/~johnh/PAPERS/Quan14c/ with cite and pdf, or direct pdf).

Predicting longitude from observed diurnal phase ([Quan14c], figure 14c)
Predicting longitude from observed diurnal phase for 287k geolocatable, diurnal blocks ([Quan14c], figure 14c)
From the abstract:

As the Internet matures, policy questions loom larger in its operation. When should an ISP, city, or government invest in infrastructure? How do their policies affect use? In this work, we develop a new approach to evaluate how policies, economic conditions and technology correlates with Internet use around the world. First, we develop an adaptive and accurate approach to estimate block availability, the fraction of active IP addresses in each /24 block over short timescales (every 11 minutes). Our estimator provides a new lens to interpret data taken from existing long-term outage measurements, thus requiring no additional traffic. (If new collection was required, it would be lightweight, since on average, outage detection requires less than 20 probes per hour per /24 block; less than 1% of background radiation.) Second, we show that spectral analysis of this measure can identify diurnal usage: blocks where addresses are regularly used during part of the day and idle in other times. Finally, we analyze data for the entire responsive Internet (3.7M /24 blocks) over 35 days. These global observations show when and where the Internet sleeps—networks are mostly always-on in the US and Western Europe, and diurnal in much of Asia, South America, and Eastern Europe. ANOVA (Analysis of Variance) testing shows that diurnal networks correlate negatively with country GDP and electrical consumption, quantifying that national policies and economics relate to networks.

Citation: Lin Quan, John Heidemann, and Yuri Pradkin. When the Internet Sleeps: Correlating Diurnal Networks With External Factors. In Proceedings of the ACM Internet Measurement Conference, p. to appear. Vancouver, BC, Canada, ACM. November, 2014.

All data in this paper is available to researchers at no cost, and source code to our analysis tools is available on request; see our diurnal datasets webpage.

This work is partly supported by DHS S&T, Cyber Security division, agreement FA8750-12-2-0344 (under AFRL) and N66001-13-C-3001 (under SPAWAR).  The views contained
herein are those of the authors and do not necessarily represent those of DHS or the U.S. Government.  This work was classified by USC’s IRB as non-human subjects research (IIR00001648).

Categories
Publications Technical Report

new technical report “A Holistic Framework for Bridging Regional Threats to User QoE”

We just released a new technical report “A Holistic Framework for Bridging Regional Threats to User QoE”, ISI-TR-2013-687, available as https://www.isi.edu/~johnh/PAPERS/Cai13c.pdf

Estimated impact on user QoE in four cable cut incidents (Figure 13 from [Cai13c])

From the abstract:

Submarine cable cuts have become increasingly common, with five incidents breaking more than ten cables in the last three years. Today, around~300 cables carry the majority of international Internet traffic, so a single cable cut can affect millions of users, and repairs to any cut are expensive and time consuming. Prior work has either measured the impact following incidents, or predicted the results of network changes to relatively abstract Internet topological models. In this paper, we develop a new approach to model cable cuts. Our approach differs by following problems drawn from real-world occurrences all the way to their impact on end-users. Because our approach spans many layers, no single organization can provide all the data needed to apply the model. We therefore perform what-if analysis to study a range of possibilities. With this approach we evaluate four incidents in 2012 and 2013; our analysis suggests general rules that assess the degree of a country’s vulnerability to a cut.

 

Categories
Announcements Data

Complete IPv4 geolocation dataset now available

complete_geoloc_map

We recently finished the work of geolocating all IPv4 addresses and plotted a “complete IP geolocation map“.

This work is based on our previous IMC paper “Towards Geolocation of Millions of IP Addresses“, joint work of Zi Hu, John Heidemann, and Yuri Pradkin.

Processed data from this work is visible on our browsable web map.  The raw data from this effort is available through PREDICT or from the authors.

Categories
Papers Publications

New conference paper “Towards Geolocation of Millions of IP Addresses” at IMC 2012

The paper “Towards Geolocation of Millions of IP Addresses” was accepted by IMC 2012 in Boston, MA (available at http://www.isi.edu/~johnh/PAPERS/Hu12a.html).

From the abstract:

Previous measurement-based IP geolocation algorithms have focused on accuracy, studying a few targets with increasingly sophisticated algorithms taking measurements from tens of vantage points (VPs). In this paper, we study how to scale up existing measurement-based geolocation algorithms like Shortest Ping and CBG to cover the whole Internet. We show that with many vantage points, VP proximity to the target is the most important factor affecting accuracy. This observation suggests our new algorithm that selects the best few VPs for each target from many candidates. This approach addresses the main bottleneck to geolocation scalability: minimizing traffic into each target (and also out of each VP) while maintaining accuracy. Using this approach we have currently geolocated about 35% of the allocated, unicast, IPv4 address-space (about 85% of the addresses in the Internet that can be directly geolocated). We visualize our geolocation results on a web-based address-space browser.

Citation: Zi Hu and John Heidemann and Yuri Pradkin. Towards Geolocation of Millions of IP Addresses. In Proceedings of the ACM Internet Measurement Conference, p. to appear. Boston, MA, USA, ACM. 2012. <http://www.isi.edu/~johnh/PAPERS/Hu12a.html>