This web page documents the format of our IP Address Space Hitlist. Our address space hitlist is available upon request.
An IP Address Space Hitlist is a list of IP addresses in the Internet that we believe are reachable via ping. Topology and routing studies often look at traceroutes to these addresses to understand Internet topology.
Example topology studies and tools include CAIDA Archipelago (and previously skitter) and studies of routing and reachability (for example, “Testing the reachability of (new) address space”).
Our hitlists are primarily derived from our Address Censuses. We update our hitlists using new census data as it comes in. In addition, to start our list, we began with the data from Olaf Maennel (and Randy Bush, Matthew Roughan, and Steve Uhlig), used in their paper “Internet Optometry: Assessing the Broken Glasses in Internet Reachability”. Our understanding is that this list dates back to the CAIDA Skitter list started by kc claffy et al.
The selection goals and methodology for our list:
Our goal is to provide representatives that are responsive, complete and stable.
By complete, we mean we report one representative address for every allocated /24.
By stable, we mean that the hitlist is reasonably stable over time.
To promote responsiveness, we select the one IP address in each /24 prefix that was most responsive historically. We do not guarantee that this address responded in the most recent census, but we bias our selection to favor recent results.
To be complete, if a prefix used to respond but has not responded recently, we pick a random address in that prefix and report a score of -1.
To be complete, if we have never seen any responses for a prefix, we take x.x.x.1 as the representative, with a score of -2 (see details below). We report allocated /24 prefixes according to the IANA address allocation (http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.txt).
To promote stability, we only switch addresses when the switch improves the score significantly (by 34).
Currently we consider a history of 16 prior censuses.
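The selection rules above can be sketched roughly as follows. This is only an illustration, not our production code: the function name pick_representative, the (ip, score) tuples, and the choice of ">=" when comparing against the margin of 34 are all assumptions made for the example, and how scores are computed from the census history is not shown.

```python
import ipaddress
import random

SWITCH_MARGIN = 34        # "significant" improvement, taken from the text above
NO_RECENT_RESPONSE = -1   # prefix responded in the past, but not recently
NEVER_RESPONDED = -2      # prefix has never responded


def pick_representative(prefix, current, best_candidate, ever_responded, rng=random):
    """Hypothetical sketch of the per-/24 selection rules.

    prefix:         the /24, for example "10.0.42.0/24"
    current:        (ip, score) of the existing representative, or None
    best_candidate: (ip, score) of the most responsive address in the
                    recent history, or None if nothing responded recently
    ever_responded: True if any address in the prefix has ever responded
    """
    net = ipaddress.ip_network(prefix)
    if best_candidate is None:
        if not ever_responded:
            # Never any response: use x.x.x.1 with score -2.
            return str(net.network_address + 1), NEVER_RESPONDED
        # Used to respond, but not recently: random address, score -1.
        offset = rng.randrange(net.num_addresses)
        return str(net.network_address + offset), NO_RECENT_RESPONSE
    if current is None or current[1] < 0:
        # No usable representative yet, so take the best candidate.
        return best_candidate
    # Stability: switch only when the improvement reaches the margin
    # (assumption: an improvement of exactly 34 is enough).
    if best_candidate[1] - current[1] >= SWITCH_MARGIN:
        return best_candidate
    return current
```

For example, with a current representative scoring 38, a candidate scoring 60 would not replace it under this sketch, while a candidate scoring 72 would.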
The format is a simple text file, one entry per line, with fields separated by tabs. The first line is a header indicating the column names:
#fsdb -F t hex_ip score ip
The meaning of each column:
hex_ip: the representative IP address for that prefix, in hexadecimal
score: our estimate of the quality of the result, from 0 to 99. A 99 represents a perfect response for all the history we consider. Scores are not linear (a score of 50 is not necessarily half as responsive as a score of 99). As a special case, scores less than 0 have particular meanings: -1 means there were no positive responses for any address in the prefix in the recent history; -2 means there have never been any positive responses.
ip: the representative IP address for that prefix, in dotted-quad notation.
Here’s a partial example of a file (with fake IP addresses):
    #fsdb -F t hex_ip score ip
    0a0025d6        99      10.0.37.214
    0a002615        99      10.0.38.21
    0a002713        -1      10.0.39.19
    0a0029b2        38      10.0.41.178
    0a002a01        -2      10.0.42.1
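As an illustration of the format, here is a minimal Python sketch that reads such a file and checks that the two address columns agree. The function name parse_hitlist is hypothetical, and the sketch assumes that every line starting with "#" (including the header) can be skipped.

```python
import ipaddress


def parse_hitlist(lines):
    """Yield (hex_ip, score, ip) for each data line of a hitlist."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the header and any other comment lines
        # Fields are tab-separated; split() also tolerates runs of spaces.
        hex_ip, score, ip = line.split()
        # Both address columns should encode the same IPv4 address.
        if int(hex_ip, 16) != int(ipaddress.IPv4Address(ip)):
            raise ValueError(f"hex_ip {hex_ip} does not match ip {ip}")
        yield hex_ip, int(score), ip


sample = """\
#fsdb -F t hex_ip score ip
0a0025d6\t99\t10.0.37.214
0a002a01\t-2\t10.0.42.1
"""
entries = list(parse_hitlist(sample.splitlines()))
```

To read a real file, pass an open file object instead of the sample lines, for example parse_hitlist(open(path)) with your own path.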
Pre-release hitlists (alpha and beta) omit some results. Alpha hitlists do not include representatives with scores -1 or -2; beta hitlists omit representatives with scores of -2.
We recommend that researchers using the hitlist comply with best practices for active measurement.
1. Make your measurement computer easy to identify as research infrastructure.
a. Give it a meaningful hostname.
b. Run a webserver on it with a web page that gives information about the experiment.
c. If possible, include identifying information in the active measurement.
2. Maintain an opt-out list.
Our hitlist is algorithmically derived from ISI IPv4 census data. We take these steps for ISI IPv4 censuses, and addresses of people who opt out of the censuses will eventually get a score of -1 in the hitlist.
Users of our hitlist may want to discard targets with negative scores before using it.
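A minimal sketch of such filtering, which also honors a local opt-out list, might look like the following. The function name usable_targets and the representation of the opt-out list as CIDR prefix strings are assumptions for illustration only.

```python
import ipaddress


def usable_targets(entries, opt_out_prefixes=()):
    """Return the addresses worth probing.

    entries:          iterable of (ip, score) pairs from the hitlist
    opt_out_prefixes: CIDR strings from your own opt-out list (hypothetical)
    """
    blocked = [ipaddress.ip_network(p) for p in opt_out_prefixes]
    targets = []
    for ip, score in entries:
        if score < 0:
            continue  # -1 and -2 mark prefixes with no recent or no responses
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in blocked):
            continue  # never probe networks that asked to opt out
        targets.append(ip)
    return targets
```

For instance, given the entries from the example file above and an opt-out list containing 10.0.41.0/24, only 10.0.37.214 and 10.0.38.21 would remain.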