Categories
Announcements Data

Complete IPv4 geolocation dataset now available

We recently finished geolocating all IPv4 addresses and plotted a “complete IP geolocation map”.

This work is based on our previous IMC paper “Towards Geolocation of Millions of IP Addresses”, joint work by Zi Hu, John Heidemann, and Yuri Pradkin.

Processed data from this work is visible on our browsable web map.  The raw data from this effort is available through PREDICT or from the authors.

Categories
Announcements

ANT project blog moved

The ANT Project blog has moved from http://www.isi.edu/ant/blog to its new location at http://ant.isi.edu/blog/

If you’re watching the blog via RSS, you may want to update your feedreader.

Categories
Announcements

IP Geolocation in our Browsable IPv4 Map

We’re happy to announce that our browsable Internet map at http://www.isi.edu/ant/address/browse/ now includes IP geolocation.

We plot the latitude and longitude of each IP address around the world as a specific color, placing them on our IPv4 map (the zoomable Hilbert curve).  Thus we can show how blocks of IPv4 addresses map (above) to the globe (below).

AMITE Geolocation of IPv4 as of 2012-06-28
Hue and lightness to longitude and latitude.

On the IP map, we show latitude/longitude by color.  For each address, the longitude is the hue (the colors around the rainbow), so North America is blue; South America, fuchsia; Europe and Africa, red; and Asia to Australia, yellow to green.  The latitude controls lightness, so things north of the equator are darker, while those south of the equator are lighter.  Thus Japan is dark green, while Australia is teal, and Scandinavia is dark red, while South Africa is orange.  (We have released the source code to do this mapping with a BSD license.)
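To make the color scheme concrete, here is a minimal Python sketch of one way to turn a latitude and longitude into a color.  It is not the released source code: the function name `geo_to_rgb`, the choice of hue = longitude modulo 360, and the lightness range (`min_light`, `max_light`) are all assumptions, picked only because they roughly reproduce the colors described above.

```python
import colorsys

def geo_to_rgb(lat, lon, min_light=0.2, max_light=0.8):
    """Map (lat, lon) in degrees to an (r, g, b) tuple of 0..255 ints.

    Assumptions (not taken from the released code):
      * hue is the longitude wrapped into 0..360 degrees,
      * lightness falls linearly from max_light at the south pole
        to min_light at the north pole.
    """
    hue = (lon % 360.0) / 360.0            # 0 deg -> red, -120 deg -> blue
    frac = (90.0 - lat) / 180.0            # 0 at north pole, 1 at south pole
    lightness = min_light + frac * (max_light - min_light)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

With these assumptions, a point at latitude 0, longitude 0 comes out pure red, Los Angeles (about 34, -118) comes out a darkish blue, and moving a point south at a fixed longitude makes it lighter.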

The IP map shows all 4 billion IPv4 addresses on the Hilbert curve.  We have discussed this mapping before (see our poster).
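For readers unfamiliar with the Hilbert curve, the sketch below shows the standard textbook conversion from a position along the curve to (x, y) grid coordinates, applied to an IPv4 address treated as a 32-bit integer.  The names `hilbert_d2xy` and `ip_to_xy` are hypothetical, and the orientation and one-address-per-cell granularity are assumptions; the map itself may lay things out differently.

```python
import ipaddress

def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve into (x, y)
    on a grid of 2**order by 2**order cells."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def ip_to_xy(addr, order=16):
    """Place a dotted-quad IPv4 address on a 65536 x 65536 grid
    (assumed layout: one address per cell)."""
    return hilbert_d2xy(order, int(ipaddress.IPv4Address(addr)))
```

The useful property is that consecutive addresses always land in adjacent cells, so a contiguous block of addresses shows up as a compact region rather than a long thin stripe.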

Our IP map is zoomable and draggable, so one can look at particular regions of interest.  For example, here is 128/8, including ISI (in Los Angeles, dark blue), between UC San Diego (also dark blue) and University of Maryland (US east coast, so purple), while the Finnish University of Helsinki is dark brown, and the Australian University of Melbourne is lime green.

Annotated IPv4 geolocation

Our geolocation data comes from three sources:

All of these geolocation sources have varying levels of accuracy.  However, we hope that the ability to visually relate IP addresses (on the Hilbert curve) with geolocation (via latitude and longitude as shown by color) provides a fresh look at IP addresses and their locations.

This geolocation work is due to Zi Hu, Yuri Pradkin, and John Heidemann.  This work and visualization have been supported by the AMITE project through DHS, and the data (both processed geolocation results and raw data if you can improve our accuracy) will be available through the LANDER project’s datasets and the PREDICT program.

 

Categories
Announcements

Multiple views in browsable Internet address map

We’re happy to announce an update to our browsable Internet map at http://www.isi.edu/ant/address/browse/. Our map now includes FIND ME and MULTIPLE VIEWS.

screenshot of browsing RTTs in the Internet

FIND ME: To locate any host on the map, click in the IP address box (at the top right) and type in a hostname. A pushpin will appear at that address, with a bubble indicating the hostname and IP address, and the map will scroll to the location. No more manually finding addresses!

MULTIPLE VIEWS allow users to flip between different data types, census dates, and source locations:

  1. DATA TYPES: We now plot round-trip times in addition to prior ping responsiveness. See how far away the Internet is! (At least from our probing sites.)
  2. CENSUS DATES: We currently plot five datasets from Nov 2006 to June 2009. Travel through time to see the Internet of yesteryear!
  3. SOURCE LOCATIONS: We collect data from two different locations: Los Angeles and Colorado State University, to help understand if we have observation bias. See the Internet from sea level, or a mile high!

To select different views, click the +-sign on the right of the screen and pick from the menus.

Data collection for this work is through the LANDER project http://www.isi.edu/ant/lander/, and the visualization improvements are due to AMITE http://www.isi.edu/ant/amite/, both supported by DHS.  We thank OpenLayers.org for the customizable front-end.

Categories
Announcements Collaborations Software releases

ANT extensions for bzip2-splitting to appear in Hadoop

The ANT project is happy to announce that our extensions to Hadoop to support splitting of bzip2-compressed files have been accepted to appear in the next Hadoop release (which will be 0.21.0).

Support for compression is important in map/reduce because it reduces the amount of I/O, and because important input files (for us, our Internet address censuses) are provided in compressed format.

Splitting is important in map/reduce, because splitting allows many computers to process parts of a few big files.  Since the whole point of Hadoop and map/reduce is processing big files (for us, 4GB or more) with many computers (for us, dozens to hundreds), splitting is really essential.

Until now, Hadoop did not support splitting of compressed files.  Instead, if input data was compressed, you got at most one computer per file.  Some work-arounds were possible, but they were basically unpleasant, and often required that one rewrite all the input data in some other format.

Our extensions (see HADOOP-4012 and MAPREDUCE-830, plus HADOOP-3646 that went into 0.19.0) support Hadoop execution over bzip2 files with automatic splitting.  Getting this done was trickier than one might expect:  Hadoop really wants to decide where to split files, yet bzip2 can only support splits at specific locations that are different, and users don’t care about either of these but instead only about their record boundaries.  Fortunately, we were able to align all of these constraints, and deal with the corner cases that inevitably arise.  (What if the bzip2 marker appears in normal data?  What happens when markers exactly align, or are off-by-one?)
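To illustrate why bzip2 only allows splits at specific locations, here is a small Python sketch that scans a compressed stream for the 48-bit marker that begins every bzip2 block.  This is not the code from our Hadoop patches; the function name `find_block_markers` is hypothetical, and the scan is deliberately naive.  It shows two of the complications mentioned above: blocks start at arbitrary bit offsets rather than on byte boundaries, and a match is only a candidate, since the same bit pattern could in principle occur inside compressed data.

```python
import bz2

BLOCK_MAGIC = 0x314159265359   # 48-bit marker at the start of each bzip2 block
MASK = (1 << 48) - 1

def find_block_markers(data):
    """Return the bit offsets in `data` where the 48-bit block marker starts.

    Offsets are candidates only: the pattern may also appear by chance
    inside compressed data, so a real splitter must validate each one.
    """
    offsets = []
    window = 0   # the most recent 48 bits seen
    nbits = 0    # total bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            nbits += 1
            if nbits >= 48 and window == BLOCK_MAGIC:
                offsets.append(nbits - 48)
    return offsets
```

On a stream produced by `bz2.compress`, the first offset returned is 32, directly after the 4-byte stream header; later blocks generally start at offsets that are not multiples of 8.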

Abdul Qadeer did this work in 2008, working with Yuri Pradkin and me (John Heidemann), and continued to work with the patch until it was committed.  We especially thank Chris Douglas at Yahoo for shepherding the patch through the Hadoop bug tracking system, including helping clean it up and add test cases.  And we thank Doug Cutting for initially suggesting bzip2 as a splittable compression scheme.

This work was supported by NSF through the MR-Net research project (CNS-0823774).

Categories
Announcements

Hello world!

Welcome to the ANT Project Blog.  Folks are welcome to subscribe to the RSS feed for this blog if they wish to track research related to the analysis of Internet traffic in the ANT group at ISI, USC, and CSU.  We expect this blog to be very low traffic (research takes time!).