Categories
Announcements Collaborations Data Internet Outages

welcoming Greece to the ANT Internet Census

We’re happy to welcome Greece to our browsable Internet map at http://www.isi.edu/ant/address/browse/ !  Of course Greece has always been in our Internet censuses, but George Xylomenos and George Polyzos of the Athens University of Economics and Business (their lab) helped set up a new observation site.  Greece now provides a new vantage point for Internet censuses.

The differences in the census are small, as one would hope, since it’s a global Internet.  However, when we look at latency (the time it takes for an IP address to reply to our requests), Greece gives us a European view.

Compare the lower-left corner of the Internet, since that is European IPv4 address space:

it61g RTTs
Round-trip times from our Greek vantage point (in AUEB.gr) to the world. Observe that European IP addresses in the lower left corner are nearby (light colored).
it61w RTTs
Round-trip times from our Los Angeles-based vantage point (at isi.edu) to the world. Observe that European IP addresses in the lower left corner are distant (darker gray).

Big thanks to George Xylomenos and George Polyzos of AUEB (σας ευχαριστώ!), and to AUEB for institutional funding for this work.  We also thank Christos Papadopoulos (Colorado State) for helping with many details, and Colin Perkins (U. Glasgow) for discussions about potential European hosts.

Data from our Greece census is available to researchers at no cost on the same terms as our existing census data.  See our datasets page for details. Greek data starts with it61 as of 2014-08-29.

Categories
Students

congratulations to Lin Quan for his new PhD

I would like to congratulate Dr. Lin Quan for defending his PhD in Dec. 2013 and his doctoral dissertation “Learning about the Internet through Efficient Sampling and Aggregation” in Jan. 2014.

Lin Quan (left) and John Heidemann, after Lin’s PhD defense.

From the abstract:

The Internet is important for nearly all aspects of our society, affecting ordinary people, businesses, and social activities. Because of its importance and wide-spread applications, we want to have good knowledge about Internet’s operation, reliability and performance, through various kinds of measurements. However, despite the wide usage, we only have limited knowledge of its overall performance and reliability. The first reason of this limited knowledge is that there is no central governance of the Internet, making both active and passive measurements hard. The second reason is the huge scale of the Internet. This makes brute-force analysis hard because of practical computing resource limits such as CPU, memory and probe rate.

This thesis states that sampling and aggregation are necessary to overcome resource constraints in time and space to learn about better knowledge of the Internet. Many other Internet measurement studies also utilize sampling and aggregation techniques to discover properties of the Internet. We distinguish our work by exploring novel mechanisms and new knowledge in several specific areas. First, we aggregate short-time-scale observations and use an efficient multi-time-scale query scheme to discover the properties and reasons of long-lived Internet flows. Second, we sample and probe /24 blocks in the IPv4 address space, and use greedy clustering algorithms to efficiently characterize Internet outages. Third, we show an efficient and effective aggregation technique by visualization and clustering. This technique makes both manual inspection and automated characterization easier. Last, we develop an adaptive probing system to study global scale Internet reliability. It samples and adapts probe rate within each /24 block for accurate beliefs. By aggregation and correlation to other domains, we are also able to study broader policy effects on Internet use, such as political causes, economic conditions, and access technologies.

This thesis provides several examples of Internet knowledge discovery with new mechanisms of sampling and aggregation techniques. We believe our approaches of new sampling and aggregation mechanisms can be used by and will inspire new ways for future Internet measurement systems to overcome resource constraints, such as large amount and dispersed data.

 

Categories
Announcements Data

Complete IPv4 geolocation dataset now available

complete_geoloc_map

We recently finished the work of geolocating all IPv4 addresses and plotted a “complete IP geolocation map”.

This work is based on our previous IMC paper “Towards Geolocation of Millions of IP Addresses”, joint work of Zi Hu, John Heidemann, and Yuri Pradkin.

Processed data from this work is visible on our browsable web map.  The raw data from this effort is available through PREDICT or from the authors.

Categories
Papers Publications

New conference paper “Towards Geolocation of Millions of IP Addresses” at IMC 2012

The paper “Towards Geolocation of Millions of IP Addresses” was accepted by IMC 2012 in Boston, MA (available at http://www.isi.edu/~johnh/PAPERS/Hu12a.html).

From the abstract:

Previous measurement-based IP geolocation algorithms have focused on accuracy, studying a few targets with increasingly sophisticated algorithms taking measurements from tens of vantage points (VPs). In this paper, we study how to scale up existing measurement-based geolocation algorithms like Shortest Ping and CBG to cover the whole Internet. We show that with many vantage points, VP proximity to the target is the most important factor affecting accuracy. This observation suggests our new algorithm that selects the best few VPs for each target from many candidates. This approach addresses the main bottleneck to geolocation scalability: minimizing traffic into each target (and also out of each VP) while maintaining accuracy. Using this approach we have currently geolocated about 35% of the allocated, unicast, IPv4 address-space (about 85% of the addresses in the Internet that can be directly geolocated). We visualize our geolocation results on a web-based address-space browser.

Citation: Zi Hu and John Heidemann and Yuri Pradkin. Towards Geolocation of Millions of IP Addresses. In Proceedings of the ACM Internet Measurement Conference, p. to appear. Boston, MA, USA, ACM. 2012. <http://www.isi.edu/~johnh/PAPERS/Hu12a.html>
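
The abstract names Shortest Ping as one of the existing algorithms being scaled up, and notes that vantage-point proximity matters most. As a rough illustration only (not the paper's implementation), Shortest Ping can be sketched as: place the target at the location of whichever vantage point sees the smallest round-trip time. The `VantagePoint` type, the function names, and the sample RTT values below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VantagePoint:
    """A probing site with a known location (hypothetical type)."""
    name: str
    lat: float
    lon: float


def shortest_ping(
    rtts_ms: Dict[VantagePoint, Optional[float]]
) -> Optional[Tuple[VantagePoint, float]]:
    """Return the VP with the smallest RTT to the target, and that RTT.

    Shortest Ping estimates the target's location as the location of
    this VP. Returns None if no VP received a reply.
    """
    answered = {vp: rtt for vp, rtt in rtts_ms.items() if rtt is not None}
    if not answered:
        return None
    best = min(answered, key=answered.get)
    return best, answered[best]


def closest_vps(
    rtts_ms: Dict[VantagePoint, Optional[float]], k: int
) -> List[VantagePoint]:
    """Pick the k VPs with the lowest RTTs.

    This is one plausible way to choose "the best few VPs" per target;
    the selection rule used in the paper is not given in the abstract.
    """
    answered = [(rtt, vp.name, vp) for vp, rtt in rtts_ms.items() if rtt is not None]
    answered.sort(key=lambda item: (item[0], item[1]))
    return [vp for _, _, vp in answered[:k]]


if __name__ == "__main__":
    la = VantagePoint("los-angeles", 34.0, -118.2)
    athens = VantagePoint("athens", 38.0, 23.7)
    fort_collins = VantagePoint("fort-collins", 40.6, -105.1)
    # Made-up RTTs to a single target, in milliseconds.
    sample = {la: 180.0, athens: 12.5, fort_collins: None}
    print(shortest_ping(sample))
    print([vp.name for vp in closest_vps(sample, 2)])
```

Limiting each target to a few nearby vantage points is what keeps probe traffic into the target low, which the abstract identifies as the main bottleneck to scaling.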

 

Categories
Announcements

IP Geolocation in our Browsable IPv4 Map

We’re happy to announce that our browsable Internet map at http://www.isi.edu/ant/address/browse/ now includes IP geolocation.

We plot the latitude and longitude of each IP address around the world as a specific color, placing them on our IPv4 map (the zoomable Hilbert curve).  Thus we can show how blocks of IPv4 addresses map (above) to the globe (below).

AMITE Geolocation of IPv4 as of 2012-06-28
Hue and lightness to longitude and latitude.

On the IP map, we show latitude/longitude by color.  For each address, the longitude is the hue (the colors around the rainbow), so North America is blue; South America, fuchsia; Europe and Africa, red; and Asia to Australia yellow to green.  The latitude controls lightness, so things north of the equator are darker, while those south of the equator are lighter. Thus Japan is dark green, while Australia is teal, and Scandinavia is dark red, while South Africa is orange.  (We have released the source code to do this mapping with a BSD license.)
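
To make the color scheme concrete, here is a minimal sketch of a longitude-to-hue, latitude-to-lightness mapping. It is not the released source code. The hue offset (longitude 0 maps to red), the lightness range of 0.2 to 0.8, and full saturation are all assumptions chosen to roughly match the description above, and the function names are hypothetical.

```python
import colorsys


def lat_lon_to_hls(lat, lon):
    """Map a latitude/longitude pair to (hue, lightness, saturation).

    Assumptions (not taken from the released code):
    - hue is the longitude wrapped onto [0, 1), so longitude 0 is red
    - lightness runs from 0.2 at the north pole to 0.8 at the south pole
    - saturation is always 1.0
    """
    hue = (lon % 360.0) / 360.0
    lightness = 0.5 - 0.3 * (lat / 90.0)
    return hue, lightness, 1.0


def lat_lon_to_rgb(lat, lon):
    """Return an (r, g, b) tuple of integers in 0..255 for a location."""
    hue, lightness, saturation = lat_lon_to_hls(lat, lon)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return round(r * 255), round(g * 255), round(b * 255)


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    for name, lat, lon in [
        ("Gulf of Guinea (0, 0)", 0.0, 0.0),
        ("Japan", 36.0, 139.0),
        ("Australia", -25.0, 135.0),
        ("central North America", 40.0, -100.0),
    ]:
        print(name, lat_lon_to_rgb(lat, lon))
```

With this choice, northern locations come out darker than southern ones at the same longitude, and the Americas fall in the blue-to-magenta part of the color wheel.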

The IP map shows all 4 billion IPv4 addresses on the Hilbert curve.  We have discussed this mapping before (see our poster).
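
For readers unfamiliar with the Hilbert curve layout, the sketch below uses the standard distance-to-coordinates conversion to place an IPv4 address on a square grid, so that numerically adjacent blocks stay adjacent on the map. The function names are hypothetical, and the orientation (which corner holds 0.0.0.0, and whether y grows up or down) is an assumption that may differ from the ANT map.

```python
import ipaddress


def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve to (x, y).

    The curve fills a 2**order by 2**order grid, so d must be in
    range(4**order). This is the standard iterative conversion.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(ip, order):
    """Place an IPv4 address on a grid with 4**order cells.

    The top 2*order bits of the address select the cell, so order=4
    gives one cell per /8 and order=12 gives one cell per /24.
    """
    if not 1 <= order <= 16:
        raise ValueError("order must be between 1 and 16")
    addr = int(ipaddress.IPv4Address(ip))
    d = addr >> (32 - 2 * order)
    return hilbert_d2xy(order, d)


if __name__ == "__main__":
    # One cell per /8: a 16x16 grid.
    for ip in ["0.0.0.0", "128.9.0.0", "255.255.255.255"]:
        print(ip, ip_to_cell(ip, 4))
```

Because consecutive distances always land in neighboring cells, a contiguous address block appears as a compact region instead of a long thin stripe.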

Our IP map is zoomable and draggable, so one can look at particular regions of interest.  For example, here is 128/8, including ISI (in Los Angeles, dark blue), between UC San Diego (also dark blue) and University of Maryland (US east coast, so purple), while the Finnish University of Helsinki is dark brown, and the Australian University of Melbourne is lime green.

Annotated IPv4 geolocation

Our geolocation data comes from three sources:

All of these geolocation sources have varying levels of accuracy; however, we hope that the ability to visually relate IP addresses (on the Hilbert curve) with geolocation (via latitude and longitude as shown by color) provides a fresh look at IP addresses and their locations.

This geolocation work is due to Zi Hu, Yuri Pradkin, and John Heidemann.  This work and visualization have been supported by the AMITE project through DHS, and the data (both processed geolocation results and raw data if you can improve our accuracy) will be available through the LANDER project’s datasets and the PREDICT program.

 

Categories
Presentations

New Video About Address Utilization and Allocations on Map Browser

The ANT project released a video describing Internet address allocation and how we study address utilization with IPv4 censuses. Aniruddh Rao prepared this video, working with John Heidemann and Xue Cai.

a scene from the ANT video describing address allocation and census taking

We have also updated our web-based IPv4 address browser to show which organization each address block is allocated to. The map now visualizes the whois allocation data; we thank the five regional internet registries for sharing this data with us and authorizing this visualization.

organizations in our Internet map

Finally, our web-based IPv4 address browser now has better time travel, with nearly 30 different censuses from Dec. 2005 to Nov. 2010, and we continue to update the map regularly.

Data collection for this work is through the LANDER project, and the map browser improvements are due to AMITE, both supported by DHS. Video preparation was supported by these projects and NSF through the MADCAT project.

Categories
Papers Publications

New conference paper “Selecting Representative IP Addresses for Internet Topology Studies” to appear at IMC

The paper “Selecting Representative IP Addresses for Internet Topology Studies” (available at http://www.isi.edu/~xunfan/research/Fan10a.pdf) was accepted to appear at the ACM Internet Measurement Conference 2010 in Melbourne, Australia.

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most random addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed by censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. To appear in Proceedings of the ACM Internet Measurement Conference (IMC). Melbourne, Australia, ACM. November, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html
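
The three hitlist goals in the abstract (reachability, completeness, stability) can be illustrated with a toy selection rule for a single /24 block. This is a sketch under our own assumptions, not the algorithm from the paper: it scores each address by how many past censuses it answered, and keeps the previous representative whenever it ties for the best score. The function name and the data layout are hypothetical.

```python
from typing import Dict, List, Optional


def pick_representative(
    history: Dict[int, List[bool]],
    previous: Optional[int] = None,
) -> Optional[int]:
    """Choose one representative address for a /24 block.

    history maps the last octet of each probed address to its ping
    results, one boolean per census, oldest first.
    previous is the last octet of the current representative, if any.

    Returns the chosen last octet, or None when no address ever
    responded (no informed choice is possible; a caller could fall
    back to an arbitrary address to keep the list complete).
    """
    scores = {octet: sum(replies) for octet, replies in history.items()}
    if not scores:
        return None
    best_score = max(scores.values())
    if best_score == 0:
        return None
    # Stability: keep the old representative if it is still among the best.
    if previous is not None and scores.get(previous) == best_score:
        return previous
    # Otherwise break ties deterministically by lowest octet.
    return min(octet for octet, score in scores.items() if score == best_score)


if __name__ == "__main__":
    # Made-up census history for one block.
    block = {
        1: [True, False, True],
        7: [True, True, True],
        9: [True, True, True],
    }
    print(pick_representative(block))
    print(pick_representative(block, previous=9))
```

Preferring the existing representative on ties is one simple way to limit churn between successive versions of the list.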

Categories
Papers Publications

new conference paper “Understanding Block-level Address Usage in the Visible Internet” at SIGCOMM

The paper “Understanding Block-level Address Usage in the Visible Internet” was accepted and presented at SIGCOMM’10 in New Delhi, India (available at http://www.isi.edu/~johnh/PAPERS/Cai10a.html).

From the abstract:

Although the Internet is widely used today, we have little information about the edge of the network. Decentralized management, firewalls, and sensitivity to probing prevent easy answers and make measurement difficult. Building on frequent ICMP probing of 1% of the Internet address space, we develop clustering and analysis methods to estimate how Internet addresses are used. We show that adjacent addresses often have similar characteristics and are used for similar purposes (61% of addresses we probe are consistent blocks of 64 neighbors or more). We then apply this block-level clustering to provide data to explore several open questions in how networks are managed. First, we provide information about how effectively network address blocks appear to be used, finding that a significant number of blocks are only lightly used (most addresses in about one-fifth of /24 blocks are in use less than 10% of the time), an important issue as the IPv4 address space nears full allocation. Second, we provide new measurements about dynamically managed address space, showing nearly 40% of /24 blocks appear to be dynamically allocated, and dynamic addressing is most widely used in countries more recent to the Internet (more than 80% in China, while less than 30% in the U.S.). Third, we distinguish blocks with low-bitrate last-hops and show that such blocks are often underutilized.

Citation: Xue Cai and John Heidemann. Understanding Block-level Address Usage in the Visible Internet. In Proceedings of the ACM SIGCOMM Conference, p. to appear. New Delhi, India, ACM. August, 2010. <http://www.isi.edu/~johnh/PAPERS/Cai10a.html>.

Categories
Publications Technical Report

New tech report “Selecting Representative IP Addresses for Internet Topology Studies”

We just published a new technical report “Selecting Representative IP Addresses for Internet Topology Studies” (available at ftp://ftp.isi.edu/isi-pubs/tr-666.pdf).

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most random addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed by censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. Technical Report N. ISI-TR-666, USC/Information Sciences Institute, June, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html

Categories
Announcements

multiple views in browsable Internet address map

We’re happy to announce an update to our browsable Internet map at http://www.isi.edu/ant/address/browse/. Our map now includes FIND ME and MULTIPLE VIEWS.

screenshot of browsing RTTs in the Internet

FIND ME: To locate any host on the map, click in the IP address box (at the top right) and type in a hostname. A pushpin will appear at that address, with a bubble indicating the hostname and IP address, and the map will scroll to the location. No more manually finding addresses!

MULTIPLE VIEWS allow users to flip between different data types, census dates, and source locations:

  1. DATA TYPES: We now plot round-trip times in addition to prior ping responsiveness. See how far away the Internet is! (At least from our probing sites.)
  2. CENSUS DATES: We currently plot five datasets from Nov 2006 to June 2009. Travel through time to see the Internet of yesteryear!
  3. SOURCE LOCATIONS: We collect data from two different locations: Los Angeles and Colorado State University, to help understand if we have observation bias. See the Internet from sea level, or a mile high!

To select different views, click the +-sign on the right of the screen and pick from the menus.

Data collection for this work is through the LANDER project http://www.isi.edu/ant/lander/, and the visualization improvements are due to AMITE http://www.isi.edu/ant/amite/, both supported by DHS.  We thank OpenLayers.org for the customizable front-end.