Categories
Students Uncategorized

congratulations to Manaf Gharaibeh for his PhD

I would like to congratulate Dr. Manaf Gharaibeh for defending his PhD at Colorado State University in February 2020 and completing his doctoral dissertation “Characterizing the Visible Address Space to Enable Efficient, Continuous IP Geolocation” in March 2020.

From the abstract:

Manaf Gharaibeh’s PhD defense, with Christos Papadopoulos.

Internet Protocol (IP) geolocation is vital for location-dependent applications and many network research problems. The benefits to applications include enabling content customization, proximal server selection, and management of digital rights based on the location of users, to name a few. The benefits to networking research include providing geographic context useful for several purposes, such as to study the geographic deployment of Internet resources, bind cloud data to a location, and to study censorship and monitoring, among others.
The measurement-based IP geolocation is widely considered as the state-of-the-art client-independent approach to estimate the location of an IP address. However, full measurement-based geolocation is prohibitive when applied continuously to the entire Internet to maintain up-to-date IP-to-location mappings. Furthermore, many IP address blocks rarely move, making it unnecessary to perform such full geolocation.
The thesis of this dissertation states that we can enable efficient, continuous IP geolocation by identifying clusters of co-located IP addresses and their location stability from latency observations. In this statement, a cluster indicates a group of an arbitrary number of adjacent co-located IP addresses (a few up to a /16). Location stability indicates a measure of how often an IP block changes location. We gain efficiency by allowing IP geolocation systems to geolocate IP addresses as units, and by detecting when a geolocation update is required, optimizations not explored in prior work. We present several studies to support this thesis statement.
We first present a study to evaluate the reliability of router geolocation in popular geolocation services, complementing prior work that evaluates end-hosts geolocation in such services. The results show the limitations of these services and the need for better solutions, motivating our work to enable more accurate approaches. Second, we present a method to identify clusters of co-located IP addresses by the similarity in their latency. Identifying such clusters allows us to geolocate them efficiently as units without compromising accuracy. Third, we present an efficient delay-based method to identify IP blocks that move over time, allowing us to recognize when geolocation updates are needed and avoid frequent geolocation of the entire Internet to maintain up-to-date geolocation. In our final study, we present a method to identify cellular blocks by their distinctive variation in latency compared to WiFi and wired blocks. Our method to identify cellular blocks allows a better interpretation of their latency estimates and to study their geographic properties without the need for proprietary data from operators or users.

Categories
Papers Publications

new conference paper “A Look at Router Geolocation in Public and Commercial Databases” in IMC 2017

The paper “A Look at Router Geolocation in Public and Commercial Databases” has appeared in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

Regional breakdown of the geolocation error for the geolocation databases vs. ground truth data.

Internet measurement research frequently needs to map infrastructure components, such as routers, to their physical locations. Although public and commercial geolocation services are often used for this purpose, their accuracy when applied to network infrastructure has not been sufficiently assessed. Prior work focused on evaluating the overall accuracy of geolocation databases, which is dominated by their performance on end-user IP addresses. In this work, we evaluate the reliability of router geolocation in databases. We use a dataset of about 1.64M router interface IP addresses extracted from the CAIDA Ark dataset to examine the country- and city-level coverage and consistency of popular public and commercial geolocation databases. We also create and provide a ground-truth dataset of 16,586 router interface IP addresses and their city-level locations, and use it to evaluate the databases’ accuracy with a regional breakdown analysis. Our results show that the databases are not reliable for geolocating routers and that there is room to improve their country- and city-level accuracy. Based on our results, we present a set of recommendations to researchers concerning the use of geolocation databases to geolocate routers.

The work in this paper was joint work by Manaf Gharaibeh, Anant Shah, Han Zhang, Christos Papadopoulos (Colorado State University), Brad Huffaker (CAIDA / UC San Diego), and Roya Ensafi (University of Michigan). The findings of this work are highlighted in an APNIC blog post “Should we trust the geolocation databases to geolocate routers?”. The ground truth datasets used in the paper are available via IMPACT.

Categories
Papers Publications

new conference paper “Broad and Load-aware Anycast Mapping with Verfploeter” in IMC 2017

The paper “Broad and Load-aware Anycast Mapping with Verfploeter” will appear in the 2017 Internet Measurement Conference (IMC) on November 1-3, 2017 in London, United Kingdom.

From the abstract:

IP anycast provides DNS operators and CDNs with automatic failover and reduced latency by breaking the Internet into catchments, each served by a different anycast site. Unfortunately, understanding and predicting changes to catchments as anycast sites are added or removed has been challenging. Current tools such as RIPE Atlas or commercial equivalents map from thousands of vantage points (VPs), but their coverage can be inconsistent around the globe. This paper proposes Verfploeter, a new method that maps anycast catchments using active probing. Verfploeter provides around 3.8M passive VPs, 430x the 9k physical VPs in RIPE Atlas, providing coverage of the vast majority of networks around the globe. We then add load information from prior service logs to provide calibrated predictions of anycast changes. Verfploeter has been used to evaluate the new anycast deployment for B-Root, and we also report its use of a nine-site anycast testbed. We show that the greater coverage made possible by Verfploeter’s active probing is necessary to see routing differences in regions that have sparse coverage from RIPE Atlas, like South America and China.

Distribution of load across two anycast sites of B-root using Verfploeter.
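As a rough illustration of how a catchment map can be combined with load information, here is a minimal sketch. It assumes each responding /24 block has already been attributed to an anycast site and that a per-block query count is available from earlier service logs. The function name, site labels, addresses, and numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict


def load_share_by_site(catchment: Dict[str, str], load: Dict[str, float]) -> Dict[str, float]:
    """Return the fraction of total known load that lands at each anycast site.

    catchment maps a /24 block to the site that serves it (hypothetical input).
    load maps a /24 block to its historical query count (hypothetical input).
    Blocks with no load record contribute zero.
    """
    per_site = defaultdict(float)
    for block, site in catchment.items():
        per_site[site] += load.get(block, 0.0)
    total = sum(per_site.values())
    if total == 0:
        return {site: 0.0 for site in per_site}
    return {site: value / total for site, value in per_site.items()}


# Made-up example: three blocks split across two sites.
catchment = {
    "192.0.2.0/24": "site-A",
    "198.51.100.0/24": "site-B",
    "203.0.113.0/24": "site-A",
}
load = {
    "192.0.2.0/24": 300.0,
    "198.51.100.0/24": 100.0,
    "203.0.113.0/24": 100.0,
}
shares = load_share_by_site(catchment, load)
print(shares)
```

Counting blocks alone would put two thirds of the blocks at site-A, while weighting by load shifts the estimate, which is the reason for adding load data to the catchment map.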

The work in this paper was joint work by Wouter B. de Vries, Ricardo de O. Schmidt (Univ. of Twente), Wes Hardaker, John Heidemann (USC/ISI), Pieter-Tjerk de Boer and Aiko Pras (Univ. of Twente). The datasets used in the paper are available at https://ant.isi.edu/datasets/anycast/index.html#verfploeter.

Categories
Announcements Collaborations Papers

best paper award at AINTEC 2016

Best paper award to Shah, Fontugne, and Papadopoulos at AINTEC 2016

Congratulations to Anant Shah, Christos Papadopoulos (Colorado State University) and Romain Fontugne (Internet Initiative Japan) for the award of best paper at AINTEC 2016 to their paper “Towards Characterizing International Routing Detours”.

See our prior blog post for more information about the paper and its data, and the APNIC blog post about this paper.

Categories
Papers Publications

new conference paper “Towards Characterizing International Routing Detours” in AINTEC 2016

The paper “Towards Characterizing International Routing Detours” appeared in the 12th Asian Internet Engineering Conference on Dec 1, 2016 in Bangkok, Thailand and is available at http://dl.acm.org/citation.cfm?id=3012698. The datasets are available at http://geoinfo.bgpmon.io.

From the abstract:

There are currently no requirements (technical or otherwise) that routing paths must be contained within national boundaries. Indeed, some paths experience international detours, i.e., originate in one country, cross international boundaries and return to the same country. In most cases these are sensible traffic engineering or peering decisions at ISPs that serve multiple countries. In some cases such detours may be suspicious. Characterizing international detours is useful to a number of players: (a) network engineers trying to diagnose persistent problems, (b) policy makers aiming at adhering to certain national communication policies, (c) entrepreneurs looking for opportunities to deploy new networks, or (d) privacy-conscious states trying to minimize the amount of internal communication traversing different jurisdictions.

In this paper we characterize international detours in the Internet during the month of January 2016. To detect detours we sample BGP RIBs every 8 hours from 461 RouteViews and RIPE RIS peers spanning 30 countries. We use geolocation of ASes which geolocates each BGP prefix announced by each AS, mapping its presence at IXPs and geolocation infrastructure IPs. Finally, we analyze each global BGP RIB entry looking for detours. Our analysis shows more than 5K unique BGP prefixes experienced a detour. 132 prefixes experienced more than 50% of the detours. We observe about 544K detours. Detours either last for a few days or persist the entire month. Out of all the detours, more than 90% were transient detours that lasted for 72 hours or less. We also show different countries experience different characteristics of detours.
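The detour definition in the abstract (a path that originates in one country, crosses international boundaries, and returns to the same country) can be sketched as a simple check over a sequence of per-hop country codes. This is only an illustration under the assumption that each hop has already been geolocated to a single country. The function name and the sample paths are hypothetical and do not reproduce the paper's pipeline.

```python
from typing import Sequence


def is_international_detour(countries: Sequence[str]) -> bool:
    """Return True if a path starts and ends in the same country
    but passes through at least one other country in between."""
    if len(countries) < 3:
        return False
    origin = countries[0]
    if countries[-1] != origin:
        return False
    return any(country != origin for country in countries[1:-1])


# Hypothetical per-hop country codes for three paths.
paths = {
    "domestic": ["BR", "BR", "BR"],
    "detour": ["BR", "US", "BR"],
    "international": ["BR", "US", "DE"],
}
detours = [name for name, hops in paths.items() if is_international_detour(hops)]
print(detours)
```

Only the second path counts as a detour here: the first never leaves the country, and the third legitimately ends abroad.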

This work won the Best Paper Award at AINTEC 2016. An APNIC blog post about this paper is also available.

The work in this paper is by Anant Shah, Christos Papadopoulos (Colorado State University) and Romain Fontugne (Internet Initiative Japan).

Categories
Papers Publications

new workshop paper “Assessing Co-Locality of IP Blocks” in GI 2016

The paper “Assessing Co-Locality of IP Blocks” appeared in the 19th IEEE Global Internet Symposium on April 11, 2016 in San Francisco, CA, USA and is available at http://www.cs.colostate.edu/~manafgh/publications/Assessing-Co-Locality-of-IP-Block-GI2016.pdf. The datasets are available at https://ant.isi.edu/datasets/geolocation/.

From the abstract:

Many IP Geolocation services and applications assume that all IP addresses within the same /24 IPv4 prefix (a /24 block) reside in close physical proximity. For blocks that contain addresses in very different locations (such as blocks identifying network backbones), this assumption can result in a large geolocation error. In this paper we evaluate the co-location assumption. We first develop and validate a hierarchical clustering method to find clusters of IP addresses with similar observed delay measurements within /24 blocks. We validate our methodology against two ground-truth datasets, confirming that 93% of the identified multi-cluster blocks are true positives with multiple physical locations and an upper bound for false positives of only about 5.4%. We then apply our methodology to a large dataset of 1.41M /24 blocks extracted from a delay-measurement study of the entire responsive IPv4 address space. We find that about 247K (17%) out of 1.41M blocks are not co-located, thus quantifying the error in the /24 block co-location assumption.
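To give a feel for clustering addresses in a /24 block by delay similarity, here is a small sketch of agglomerative (hierarchical) clustering in plain Python. The distance metric (mean absolute RTT difference), the single-linkage rule, the threshold, and the sample data are all assumptions made for illustration. The paper's actual metric, linkage, and parameters may differ.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def delay_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute RTT difference (ms) across the same vantage points."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cluster_by_delay(rtts: Dict[str, Sequence[float]], threshold_ms: float) -> List[List[str]]:
    """Single-linkage agglomerative clustering.

    Repeatedly merge the two closest clusters, stopping once the closest
    pair is farther apart than threshold_ms.
    """
    clusters = [[ip] for ip in sorted(rtts)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(
                delay_distance(rtts[a], rtts[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold_ms:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


# Made-up RTTs (ms) from three vantage points to four addresses in one /24.
rtts = {
    "192.0.2.1": [10.0, 80.0, 150.0],
    "192.0.2.2": [11.0, 82.0, 149.0],
    "192.0.2.200": [95.0, 20.0, 60.0],
    "192.0.2.201": [97.0, 21.0, 58.0],
}
clusters = cluster_by_delay(rtts, threshold_ms=5.0)
print(len(clusters), clusters)
```

A block that yields more than one cluster, as this made-up block does, would be a candidate multi-location block under the co-location test described above.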

The work in this paper is by Manaf Gharaibeh, Han Zhang, Christos Papadopoulos (Colorado State University) and John Heidemann (USC/ISI).

Categories
Publications Technical Report

new technical report “Assessing Co-Locality of IP Blocks”

We have released a new technical report “Assessing Co-Locality of IP Blocks”, CSU TR15-103, available at http://www.cs.colostate.edu/TechReports/Reports/2015/tr15-103.pdf.

From the abstract:

CDF of number of clusters per block, suggesting the number of potential multi-location blocks. (Figure 2 from [Gharaibeh15a].)

Many IP Geolocation services and applications assume that all IP addresses with the same /24 IPv4 prefix (a /24 block) are in the same location. For blocks that contain addresses in very different locations (such as blocks identifying network backbones), this assumption can result in large geolocation error. This paper evaluates this assumption using a large dataset of 1.41M /24 blocks extracted from a delay measurements dataset for the entire responsive IPv4 address space. We use hierarchical clustering to find clusters of IP addresses with similar observed delay measurements within /24 blocks. Blocks with multiple clusters often span different geographic locations. We evaluate this claim against two ground-truth datasets, confirming that 93% of identified multi-cluster blocks are true positives with multiple locations, while only 13% of blocks identified as single-cluster appear to be multi-location in ground truth. Applying the clustering process to the whole dataset suggests that about 17% (247K) of blocks are likely multi-location.

This work is by Manaf Gharaibeh, Han Zhang, Christos Papadopoulos (Colorado State University), and John Heidemann (USC/ISI). The datasets used in this work are a new analysis of an existing geolocation dataset collected by Hu et al. (http://www.isi.edu/~johnh/PAPERS/Hu12a.pdf). These source datasets are available upon request from http://www.predict.org and via our website, and we expect trial datasets in our new work to also be available there and through PREDICT by the end of 2015.

Categories
Papers Publications

new conference paper “Mapping the Expansion of Google’s Serving Infrastructure” in IMC 2013 and WSJ Blog

The paper “Mapping the Expansion of Google’s Serving Infrastructure” (by Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann and Ramesh Govindan) will appear in the 2013 ACM Internet Measurement Conference (IMC) in Barcelona, Spain in Oct. 2013.

This work was also featured today in Digits, the technology news and analysis blog from the Wall Street Journal, and at USC’s press room.

A copy of the paper is available at http://www.isi.edu/~johnh/PAPERS/Calder13a, and data from the work is available at http://mappinggoogle.cs.usc.edu, from http://www.isi.edu/ant/traces/mapping_google/index.html, and from http://www.predict.org.

Growth of Google’s infrastructure, measured in IP addresses [Calder13a] figure 5a

From the paper’s abstract:

Modern content-distribution networks both provide bulk content and act as “serving infrastructure” for web services in order to reduce user-perceived latency. Serving infrastructures such as Google’s are now critical to the online economy, making it imperative to understand their size, geographic distribution, and growth strategies. To this end, we develop techniques that enumerate IP addresses of servers in these infrastructures, find their geographic location, and identify the association between clients and clusters of servers. While general techniques for server enumeration and geolocation can exhibit large error, our techniques exploit the design and mechanisms of serving infrastructure to improve accuracy. We use the EDNS-client-subnet DNS extension to measure which clients a service maps to which of its serving sites. We devise a novel technique that uses this mapping to geolocate servers by combining noisy information about client locations with speed-of-light constraints. We demonstrate that this technique substantially improves geolocation accuracy relative to existing approaches. We also cluster server IP addresses into physical sites by measuring RTTs and adapting the cluster thresholds dynamically. Google’s serving infrastructure has grown dramatically in the ten months, and we use our methods to chart its growth and understand its content serving strategy. We find that the number of Google serving sites has increased more than sevenfold, and most of the growth has occurred by placing servers in large and small ISPs across the world, not by expanding Google’s backbone.
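The speed-of-light constraint mentioned in the abstract can be illustrated with a short feasibility check: a measured round-trip time bounds how far apart two endpoints can be. The sketch below is not the paper's technique, which also combines noisy client locations. It assumes signals travel at about two thirds of the speed of light in fiber (a common rule of thumb), and the function names and coordinates are chosen here for illustration.

```python
import math

SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # km travelled per millisecond in vacuum
FIBER_FACTOR = 2.0 / 3.0  # assumed propagation speed in fiber relative to c
EARTH_RADIUS_KM = 6371.0


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on one-way distance implied by a round-trip time."""
    return (rtt_ms / 2.0) * SPEED_OF_LIGHT_KM_PER_MS * FIBER_FACTOR


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_feasible(candidate, client, rtt_ms: float) -> bool:
    """True if the candidate server location is within reach of the client
    given the measured RTT."""
    return haversine_km(*candidate, *client) <= max_distance_km(rtt_ms)


# Approximate city coordinates, used only as sample inputs.
los_angeles = (34.05, -118.24)
san_francisco = (37.77, -122.42)
new_york = (40.71, -74.01)

print(is_feasible(san_francisco, los_angeles, rtt_ms=10.0))
print(is_feasible(new_york, los_angeles, rtt_ms=10.0))
```

With a 10 ms RTT the bound is roughly 1,000 km, so a server near San Francisco is consistent with a Los Angeles client while one near New York is ruled out.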

Categories
Publications Technical Report

new technical report “Mapping the Expansion of Google’s Serving Infrastructure”

We just released a new technical report “Mapping the Expansion of Google’s Serving Infrastructure”, available as https://www.isi.edu/~johnh/PAPERS/Calder13a.pdf

Growth of Google’s serving network (measured here in IP addresses).

From the abstract:

Modern content-distribution networks both provide bulk content and act as “serving infrastructure” for web services in order to reduce user-perceived latency. These serving infrastructures (such as Google’s) are now critical to the online economy, making it imperative to understand their size, geographic distribution, and growth strategies. To this end, we develop techniques that enumerate servers in these infrastructures, find their geographic location, and identify the association between clients and servers. While general techniques for server enumeration and geolocation can exhibit large error, our techniques exploit the design and mechanisms of serving infrastructure to improve accuracy. We use the EDNS-client-subnet extension to DNS to measure which clients a service maps to which of its servers. We devise a novel technique that uses this mapping to geolocate servers by combining noisy information about client locations with speed-of-light constraints. We demonstrate that this technique substantially improves geolocation accuracy relative to existing approaches. We also cluster servers into physical sites by measuring RTTs and adapting the cluster thresholds dynamically. Google’s serving infrastructure has grown dramatically in the last six months, and we use our methods to chart its growth and understand its content serving strategy. We find that Google has almost doubled in size, and that most of the growth has occurred by placing servers in large and small ISPs across the world, not by expanding on Google’s backbone.

Datasets from this work will be made available. For now, please contact the authors if you’re interested.

Categories
Announcements Data

Complete IPv4 geolocation dataset now available

We recently finished the work of geolocating all IPv4 addresses and plotted a “complete IP geolocation map”.

This work is based on our previous IMC paper “Towards Geolocation of Millions of IP Addresses”, joint work of Zi Hu, John Heidemann, and Yuri Pradkin.

Processed data from this work is visible on our browsable web map. The raw data from this effort is available through PREDICT or from the authors.