Internet Address Survey Binary Format Description

This web page describes the binary format of our address survey data. We provide open-source parsers that read this format: print_datafile converts it to text, and ICMPTrainRecordReader reads it from within Hadoop. This page can also guide the implementation of new parsers.

A sample view of Internet Census internet_address_census_it66w-20150911: it66w, 2015-09-11

Binary Format Version 3

Binary format version 3 is used for surveys and censuses starting with it29*-20091102.

A dataset is divided into one or more files, each named after the probing machine used. Each binary file is compressed with bzip2 and contains records of two types: DATAv3 and TEXTv3. DATA-type records describe the result of a single probe, while TEXT-type records can store arbitrary text (metadata). All fields are stored in big-endian byte order.

DATAv3 record type

Field Name | Byte Length | Description
Type | 1 | =5. This is the type of a DATAv3 record; it is always set to 5.
Length | 1 | =24. This is the length of a DATAv3 record in bytes; it is always set to 24.
ICMP reply type | 1 | Type of ICMP message received: 0 for echo reply, 3 for destination unreachable, and 8 for no reply. Other types are possible; see RFC 792 for details.
ICMP reply code | 1 | Code of ICMP message received. Typical values are:
    - 0 for echo reply (type 0),
    - 0 for destination network unreachable (type 3),
    - 2 for destination protocol unreachable (type 3),
    - 9 for network administratively prohibited (type 3),
    - 10 for host administratively prohibited (type 3),
    - 13 for communication administratively prohibited (type 3).
    Type 3 with codes 9, 10, or 13 typically indicates a firewall or router blocking access to the destination. Other codes are possible; see RFC 792 for details.
Reserved | 2 | =0. Reserved for future use.
Flags | 1 | Flags marking this record:
    - bit 0: packet dumped (if 1, this record is associated with a pcap capture; see Pcap capture below).
    - bit 1: (v3 and on) for ICMP_UNREACHABLEs: matched the reflected header’s destination to a probing record (see v3 Packet Matching).
    - bit 2: (v3 and on) for ICMP_UNREACHABLEs: matched the source of the packet to a probing record (see v3 Packet Matching).
    - bits 3, 4: (v3 and on, from it50 and on, zeros before) result of sending and matching a payload “cookie” in ECHO_REQUESTs (see Payload matching).
TTL | 1 | Remaining TTL of the response. This field is copied from the IP header of the response packet and should equal the initial TTL minus the number of hops from the probed host to the probing system. Because we do not know the initial TTL, the number of hops can only be estimated. Special case: for type 3 responses, the original datagram of the probe is included with the reply. In such cases, we set this field to the value from the original probe IP header. This means that for type 3 the hop distance is 64 minus this value.
Timestamp | 4 | Time the probe was sent (or, if not available, the time the response was received), in seconds since the Epoch. See also Pcap capture for other semantics of this field.
RTT | 4 | Round-trip time in microseconds from when the probe was sent until we got a response. If we were unable to match a response with a probe, this is set to zero. See also Pcap capture for other semantics of this field.
Probe IP | 4 | IP address that was probed. If the reply could not be matched to a probe, this field is set to zero.
Response IP | 4 | IP address of the response. If the probe timed out without a reply (ICMP reply type is 8), this is set to zero.
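The field table above can be turned into a small decoder. The following is a minimal sketch, not the print_datafile implementation: it assumes the fields appear on disk in the order listed in the table, and the names data_v3_t, be32, and parse_data_v3 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_LEN        24  /* every v3 record is 24 bytes long */
#define REC_TYPE_DATA3 5   /* Type value of a DATAv3 record */

/* Hypothetical in-memory form of one DATAv3 record (host byte order). */
typedef struct {
    uint8_t  icmp_type, icmp_code, flags, ttl;
    uint32_t time_s, rtt_us, probe_addr, reply_addr;
} data_v3_t;

/* Read a 32-bit big-endian integer starting at p. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode one 24-byte record; returns 0 on success, -1 if it is not DATAv3. */
static int parse_data_v3(const uint8_t buf[REC_LEN], data_v3_t *out)
{
    assert(buf != NULL && out != NULL);
    if (buf[0] != REC_TYPE_DATA3 || buf[1] != REC_LEN)
        return -1;
    out->icmp_type  = buf[2];
    out->icmp_code  = buf[3];
    /* buf[4] and buf[5] are the Reserved field and are skipped. */
    out->flags      = buf[6];
    out->ttl        = buf[7];
    out->time_s     = be32(buf + 8);
    out->rtt_us     = be32(buf + 12);
    out->probe_addr = be32(buf + 16);
    out->reply_addr = be32(buf + 20);
    return 0;
}
```

A caller would decompress the file with bzip2 first, read it 24 bytes at a time, and pass each chunk to parse_data_v3, treating a return value of -1 as "not a DATA record".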

TEXTv3 record type

Field Name | Byte Length | Description
Type | 1 | =6. This is the type of a TEXTv3 record; it is always set to 6.
Length | 1 | =24. This is the length of a TEXTv3 record in bytes; it is always set to 24.
Text | 22 | Arbitrary text (metadata).

Text is in UTF-8 encoding (in practice, it uses only the 7-bit ASCII subset). Text shorter than one record is NUL-padded. There is no guarantee that text will include a trailing NUL (text exactly 22 characters long fills exactly one record). Text longer than one record is stored in multiple records; those records are concatenated on output, and intermediate records do not include NUL characters.
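The padding and concatenation rules above matter when reassembling text that spans several records. The sketch below shows one way to append a TEXTv3 payload to a growing string without relying on a trailing NUL. The function name text_v3_append and its buffer-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_LEN        24
#define REC_TYPE_TEXT3 6
#define TEXT_LEN       22  /* payload bytes per TEXTv3 record */

/*
 * Append the text of one TEXTv3 record to dst, which must already hold a
 * NUL-terminated string and have room for cap bytes in total.
 * Returns the new string length, or -1 on a bad record or lack of space.
 */
static int text_v3_append(const uint8_t rec[REC_LEN], char *dst, size_t cap)
{
    assert(rec != NULL && dst != NULL);
    if (rec[0] != REC_TYPE_TEXT3 || rec[1] != REC_LEN)
        return -1;
    /* The payload may lack a trailing NUL, so bound the scan at 22 bytes. */
    const uint8_t *nul = memchr(rec + 2, '\0', TEXT_LEN);
    size_t n = nul ? (size_t)(nul - (rec + 2)) : TEXT_LEN;
    size_t used = strlen(dst);
    if (used + n + 1 > cap)
        return -1;
    memcpy(dst + used, rec + 2, n);
    dst[used + n] = '\0';
    return (int)(used + n);
}
```

Calling text_v3_append on consecutive TEXT records with the same destination buffer yields the concatenated text; how a reader decides where one message ends and the next begins is not specified here and is left to the caller.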

Pcap capture

Some data files have accompanying pcap capture files. Normally an ICMP Echo Request is answered with an ICMP Echo Reply or an ICMP Unreachable. Other responses are sometimes possible, for example type 4 (source quench), type 5 (redirect), and type 11 (time exceeded). In later versions of our software we save whole received datagrams with ICMP types other than 0, 3, and 11 (too many of those!) to pcap capture files. To facilitate matching file records to pcap packet records, the timestamp of each saved pcap packet is stored in the DATA record: tv_sec in the Timestamp field and tv_usec in the RTT field. These timestamps are guaranteed to be unique, giving a one-to-one mapping between the binary format described here and the pcap traces for ICMP reply types other than 0, 3, and 11.
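As an illustration of the timestamp convention above, a record with flag bit 0 set can be paired with a pcap packet by comparing the two fields against the packet's capture time. This is a sketch under assumptions: the struct rec_ts_t and the name IPR_FLAG_DUMPED are hypothetical, and the pcap timestamp is passed as two plain integers so that the example does not depend on libpcap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPR_FLAG_DUMPED 0x01  /* hypothetical name for flag bit 0 */

/* The fields of a decoded DATA record that matter here (hypothetical). */
typedef struct {
    uint8_t  flags;
    uint32_t time_s;  /* Timestamp field: pcap tv_sec when dumped */
    uint32_t rtt_us;  /* RTT field: pcap tv_usec when dumped */
} rec_ts_t;

/* True if rec refers to the pcap packet captured at (tv_sec, tv_usec). */
static bool rec_matches_pcap(const rec_ts_t *rec,
                             uint32_t tv_sec, uint32_t tv_usec)
{
    assert(rec != NULL);
    if (!(rec->flags & IPR_FLAG_DUMPED))
        return false;  /* not dumped: the fields keep their normal meaning */
    return rec->time_s == tv_sec && rec->rtt_us == tv_usec;
}
```

Because the timestamps are described as unique, at most one packet in the trace should satisfy this test for a given record.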

Background

ICMP_UNREACHABLEs sent to us in response to our ICMP_ECHO_REQUESTs should contain a copy of the received IP header and the ICMP_ECHO_REQUEST. We call this copy a reflected request. In v2 of the application and data format, we implicitly trusted the destination address of the reflected packet (from its IP header) to be the probe address. Thus, when we received an ICMP_UNREACHABLE, we looked this address up in the prober’s cache and, if it was found, matched the response to the request (one indication of a matched response is a non-zero RTT). So far so good. However, if no match was found, we simply assumed that the probe address was the destination address of the reflected packet and recorded it as such in the data. In some cases this turned out to be unreliable, apparently because of NAT. In the examples that we examined manually, we saw the reflected destination address re-mapped to private address space (10/8): NATs were modifying IP addresses in the “outer” IP headers, but not in the reflected (embedded) IP headers. As a result, we had probe records containing unreliable (often private) addresses in the probe field.

Reading and Writing Binary Format

Sample code for reading and printing binary records is provided:

  • Converting to textual format (and sample reading code): see print_datafile at our address survey software page. (Or if you are at ISI, see git ANT/CODE/icmptrain_print)

    When printed, the ICMP reply type and reply code are often output together as typeandcode, a 4-digit hexadecimal field (type in the high byte, code in the low byte). Common values, in decreasing order of frequency:

    typeandcode: meaning
    0800: no reply
    0000: echo reply, the expected result when a host is present
    03xx: error reply, where xx gives the ICMP reply code

    Other values are possible but quite rare.

  • Reading from within Hadoop: We have a package, ICMPTrainRecordReader, that allows Hadoop to read binary format v3 directly. Some usage examples are documented in the ISI wiki (for internal use only). If you’re an outside user, you can download the source and pre-built jar of the reader (icmptrain-hadoop-reader.tar.gz) from our address survey software page. Run tar zxvf icmptrain-hadoop-reader.tar.gz to extract it.
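The typeandcode value mentioned in the list above can be rebuilt from the two one-byte fields of a DATA record. The sketch below is illustrative only: the function names are hypothetical, and the choice of lowercase hexadecimal digits is an assumption rather than a statement about what print_datafile emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Combine the one-byte reply type and code into a 16-bit typeandcode. */
static uint16_t typeandcode(uint8_t icmp_type, uint8_t icmp_code)
{
    return (uint16_t)((icmp_type << 8) | icmp_code);
}

/* Write typeandcode as four hex digits plus NUL; buf must hold 5 bytes. */
static void format_typeandcode(uint16_t tc, char buf[5])
{
    assert(buf != NULL);
    snprintf(buf, 5, "%04x", (unsigned)tc);
}
```

For example, type 3 with code 13 (communication administratively prohibited) gives the value 0x030d, which falls under the 03xx row of the table.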

How to Interpret IP Addresses

Because probing and the world are complicated, researchers should take care with how they interpret probeaddr and replyaddr.

Guaranteed Results

If you want to guarantee you only get the exact addresses we probe, then:

  1. if (typeandcode == 0x0000 && probeaddr != 0x00000000), then return probeaddr (a correct, matched reply),
  2. if (typeandcode == 0x0800 && replyaddr == 0x00000000), then return probeaddr (timeout with non-reply),
  3. if (type == 0x03 && (flags & 6) != 0), then return probeaddr (an ICMP error where we could confirm the reply address).
  4. else return replyaddr (icmptrain couldn’t figure it out, so just use the srcip in the header)
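The four rules above translate directly into code. The following sketch assumes a record has already been decoded into host byte order; the struct rec_addr_t and the function name guaranteed_addr are hypothetical, while the two flag values are the ones defined for format v3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPR_FLAG_MATCH_RH   0x02
#define IPR_FLAG_MATCH_SRC  0x04

/* Hypothetical decoded record; only the fields used below. */
typedef struct {
    uint8_t  icmp_type, icmp_code, flags;
    uint32_t probe_addr, reply_addr;
} rec_addr_t;

/* Return the address to attribute this record to, following rules 1-4. */
static uint32_t guaranteed_addr(const rec_addr_t *r)
{
    assert(r != NULL);
    uint16_t tc = (uint16_t)((r->icmp_type << 8) | r->icmp_code);
    if (tc == 0x0000 && r->probe_addr != 0)
        return r->probe_addr;   /* rule 1: matched echo reply */
    if (tc == 0x0800 && r->reply_addr == 0)
        return r->probe_addr;   /* rule 2: timeout */
    if (r->icmp_type == 3 &&
        (r->flags & (IPR_FLAG_MATCH_RH | IPR_FLAG_MATCH_SRC)) != 0)
        return r->probe_addr;   /* rule 3: confirmed ICMP error */
    return r->reply_addr;       /* rule 4: fall back to the source IP */
}
```

The mask IPR_FLAG_MATCH_RH | IPR_FLAG_MATCH_SRC equals 6, so the third test is the same as (flags & 6) != 0 in the rule list.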

Pretty Good Results (and simpler)

If you want pretty good results:

  1. if probeaddr != 0, then return probeaddr
  2. if probeaddr == 0, then return replyaddr

Details

If you want to take more care and understand all the cases, read below.

replyaddr is always taken directly from the incoming packet.

However, replyaddr cannot always be trusted, for two reasons: (1) return addresses can be forged, and (2) the return address can differ from the original target address of the probe if the target is multihomed or is a NAT with port forwarding. We do not know how common these cases are, but we believe they occur in a few percent of reply records. Because of these cases, one can get replyaddr values that lie outside the network that was probed. (For example, we see some replyaddrs in 10/8, which we never probe.)

The probeaddr is handled more carefully because we usually know what addresses we probe. We describe the exact algorithm we use to compute the probe address below (v2 Packet Matching and v3 Packet Matching).

In general, probeaddr can be fully trusted if (1) typeandcode == 0x0000 and probeaddr != 0x00000000 (a correct, matched reply), or (2) if typeandcode == 0x0800 && replyaddr == 0x00000000 (timeout with non-reply), or (3) type == 0x03 && (flags & 6) != 0 (an ICMP error where we could confirm the reply address).

Probeaddr may be less trustworthy if type == 0x03 (an ICMP error reply). If type == 0x03 && (flags & 6) == 0, then we were not able to match the packet, and probeaddr may be incorrect (if present, it was taken from the reply packet, but not all OSes reflect the original echo request back in the reply, and NATs can interfere).

As a result, users of this data may wish to filter out and discard untrustworthy records (as defined above).

Distinguishing timeouts from ECHO_REQUESTs

Both timeouts and received ICMP ECHO_REQUESTs have typeandcode set to 0x0800. To discriminate between them, consider the value of replyaddr: it is always 0 (0x00000000) for timeouts; for non-timeouts it is set to the source address of the ECHO_REQUEST. In addition, if pcap dump is enabled, these packets are also dumped and (flags & 1) == 1.
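The test described above is short enough to show in full. This sketch uses hypothetical names (rec_0800_t, rec_kind_t, classify_0800) and looks only at replyaddr; it does not consult the pcap flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REC_OTHER, REC_TIMEOUT, REC_ECHO_REQUEST } rec_kind_t;

/* Hypothetical decoded record; only the fields used below. */
typedef struct {
    uint8_t  icmp_type, icmp_code;
    uint32_t reply_addr;
} rec_0800_t;

/* Separate timeouts from received ECHO_REQUESTs among 0x0800 records. */
static rec_kind_t classify_0800(const rec_0800_t *r)
{
    assert(r != NULL);
    if (r->icmp_type != 8 || r->icmp_code != 0)
        return REC_OTHER;  /* typeandcode is not 0x0800 */
    return r->reply_addr == 0 ? REC_TIMEOUT : REC_ECHO_REQUEST;
}
```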

v2 Packet Matching

In v2 of the application and data file, we used this algorithm to match ICMP_UNREACHABLE responses:

  1. if match_in_output_cache(reflected_header->dst_ip):
     then output with
         out.probe_ip = reflected_header->dst_ip
         out.response_ip = src_ip
         out.rtt = calc_rtt()
  2. else
         out.probe_ip = reflected_header->dst_ip
         out.response_ip = src_ip
         out.rtt = 0

As discussed in Background, we only attempted to match the reflected header, and if there was no match, we still trusted the reflected header’s destination address to be the probe address.

v3 Packet Matching

Version 3 of the application/data-file adds 2 additional flags:

#define IPR_FLAG_MATCH_RH   0x02
#define IPR_FLAG_MATCH_SRC  0x04

They capture the three possible outcomes of matching an ICMP_UNREACHABLE:

  1. if match_in_output_cache(reflected_header->dst_ip):
     then output with
        out.probe_ip = reflected_header->dst_ip
        out.response_ip = src_ip
        out.rtt = calc_rtt()
        out.flags |= IPR_FLAG_MATCH_RH
        if reflected_header->dst_ip == src_ip:
        then 
            out.flags |= IPR_FLAG_MATCH_SRC
     (i.e.: first try to match the reflected header and if it's found:
      PROBABLY_CLEAN if reflected_header->dst_ip == src_ip
      MAYBE_MULTI_HOMED if reflected_header->dst_ip != src_ip)

  2. else if match_in_output_cache(src_ip):
        out.probe_ip = src_ip
        out.response_ip = reflected_header->dst_ip
        out.rtt = calc_rtt()
        out.flags |= IPR_FLAG_MATCH_SRC

     (i.e.: next try to match to source of response:
      PROBABLY_NAT if out.flags & IPR_FLAG_MATCH_SRC)

  3. else:
         out.probe_ip = reflected_header->dst_ip
         out.response_ip = src_ip
         out.rtt = 0
     (i.e. none of the two flags set, really: PROBABLY_SPURIOUS_REPLY, we also
      save such entire packets in a pcap trace)

In summary, when analyzing a record of an ICMP_UNREACHABLE, consider:

  1. IPR_FLAG_MATCH_RH is set, IPR_FLAG_MATCH_SRC is set:
    • means that the reflected header’s destination IP was found in our probed cache and is equal to the source IP of the Unreachable: the most likely interpretation is that the Unreachable was sent by the probed system (or on its behalf, e.g. by a NAT). In that case the fields are set as follows:

        probe_addr = src_ip
        reply_addr = src_ip
        rtt        = ...
      
  2. IPR_FLAG_MATCH_RH is set and IPR_FLAG_MATCH_SRC is not:
    • means that while the reflected header’s destination IP was found in our probed cache, the source IP of the Unreachable was different: most likely the Unreachable was sent by an intermediate system or a multi-homed target. The fields are set as follows:

        probe_addr = reflected_header->dst_ip
        reply_addr = src_ip
        rtt        = ...
      
  3. IPR_FLAG_MATCH_RH is unset and IPR_FLAG_MATCH_SRC is set:
    • means that the source IP of the Unreachable was found in the probed cache. This most likely indicates a NAT. The address fields are:

        probe_addr = src_ip
        reply_addr = reflected_header->dst_ip
        rtt        = ...
      
  4. IPR_FLAG_MATCH_RH is unset and IPR_FLAG_MATCH_SRC is unset:
    • means that neither source IP of the unreachable nor reflected header’s destination IP was found in our probed cache. It could be a spurious reply or a reply that was delayed (queued) for a long time, usually several seconds. The fields are set as:

        probe_addr = reflected_header->dst_ip
        reply_addr = src_ip
        rtt        = 0
      

    Such packets are saved in a pcap trace.
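When reading v3 data, the four cases in the summary above can be recovered from the Flags byte alone. The sketch below maps the two flag bits to the labels used in the matching pseudocode. The enum and function names are hypothetical, and the result is only meaningful for records whose ICMP reply type is 3.

```c
#include <assert.h>
#include <stdint.h>

#define IPR_FLAG_MATCH_RH   0x02
#define IPR_FLAG_MATCH_SRC  0x04

typedef enum {
    UNREACH_SPURIOUS,    /* neither flag set: unmatched, rtt is 0 */
    UNREACH_NAT,         /* MATCH_SRC only */
    UNREACH_MULTIHOMED,  /* MATCH_RH only: intermediate or multi-homed */
    UNREACH_CLEAN        /* both flags set */
} unreach_kind_t;

/* Classify an ICMP_UNREACHABLE record by its two matching flags. */
static unreach_kind_t classify_unreachable(uint8_t flags)
{
    switch (flags & (IPR_FLAG_MATCH_RH | IPR_FLAG_MATCH_SRC)) {
    case IPR_FLAG_MATCH_RH | IPR_FLAG_MATCH_SRC: return UNREACH_CLEAN;
    case IPR_FLAG_MATCH_RH:                      return UNREACH_MULTIHOMED;
    case IPR_FLAG_MATCH_SRC:                     return UNREACH_NAT;
    default:                                     return UNREACH_SPURIOUS;
    }
}

/* Label for each class, borrowed from the comments in the pseudocode. */
static const char *unreach_kind_name(unreach_kind_t k)
{
    static const char *const names[] = {
        "PROBABLY_SPURIOUS_REPLY", "PROBABLY_NAT",
        "MAYBE_MULTI_HOMED", "PROBABLY_CLEAN"
    };
    assert((unsigned)k < sizeof names / sizeof names[0]);
    return names[k];
}
```

Masking with both flags first means the pcap and cookie bits in the same byte do not affect the classification.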

Payload matching

Starting with it50, we include a 4-byte payload “cookie” in each ICMP echo request sent. ICMP Unreachables, when they include a copy of the original echo request, may or may not return the payload cookie. When it is available, we attempt to match the received cookie to the cookie sent, which provides additional protection against spoofing. Thus, from it50 forward, we use 2 more flags:

#define IPR_FLAG_COOKIE1        0x08
#define IPR_FLAG_COOKIE2        0x10

Their meaning is as follows:

COOKIE2, COOKIE1: 00    - cookie not tried (backward compatible with older v3 censuses/surveys)
COOKIE2, COOKIE1: 11    - cookie tried, matched
COOKIE2, COOKIE1: 10    - cookie tried, not matched
COOKIE2, COOKIE1: 01    - cookie tried, not returned
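The four cookie states listed above can be decoded from the Flags byte as follows. This is a small sketch; the enum cookie_state_t and the function names are hypothetical, while the two flag values are those defined above.

```c
#include <assert.h>
#include <stdint.h>

#define IPR_FLAG_COOKIE1        0x08
#define IPR_FLAG_COOKIE2        0x10

typedef enum {
    COOKIE_NOT_TRIED,     /* COOKIE2, COOKIE1 = 00 */
    COOKIE_NOT_RETURNED,  /* COOKIE2, COOKIE1 = 01 */
    COOKIE_NOT_MATCHED,   /* COOKIE2, COOKIE1 = 10 */
    COOKIE_MATCHED        /* COOKIE2, COOKIE1 = 11 */
} cookie_state_t;

/* Decode the two cookie bits of the Flags byte. */
static cookie_state_t cookie_state(uint8_t flags)
{
    int c2 = (flags & IPR_FLAG_COOKIE2) != 0;
    int c1 = (flags & IPR_FLAG_COOKIE1) != 0;
    if (c2 && c1) return COOKIE_MATCHED;
    if (c2)       return COOKIE_NOT_MATCHED;
    if (c1)       return COOKIE_NOT_RETURNED;
    return COOKIE_NOT_TRIED;
}

/* Human-readable label for a cookie state. */
static const char *cookie_state_name(cookie_state_t s)
{
    static const char *const names[] = {
        "not tried", "tried, not returned", "tried, not matched",
        "tried, matched"
    };
    assert((unsigned)s < sizeof names / sizeof names[0]);
    return names[s];
}
```

Records from censuses before it50 always decode as COOKIE_NOT_TRIED, since both bits are zero there.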

Binary Format Version 2

Field Name | Byte Length | Description
Type | 1 | =3. This is the type of a DATAv2 record; it is always set to 3.
Type | 1 | =4. This is the type of a TEXTv2 record; it is always set to 4.

All other fields are the same as in format v3, except for the interpretation of ICMP_UNREACHABLEs and a couple of additional flags that v3 adds, as discussed above.

Binary Format Version 1

In older datasets we used binary format version 1. We no longer distribute our datasets in this format. The following description is therefore provided only for those who obtained older versions of our datasets. The main differences between version 1 and version 2 are:

  • v1 is variable length, while v2 is fixed length
  • v1 did not have pcap captures
  • v1 did not record ICMP reply types other than 0, 3, and 8.

DATAv1 records can be described by the following “C” structure:

#define IPR_TYPE_DATAv1         0x1  /* XXX deprecated in future use */
#define IPR_TYPE_DATAv1_LEN     sizeof(icmptrain_probe_record_datav1_t)
typedef struct icmptrain_probe_record_datav1_ {
    /* all fields in network byte order */
    uint8_t ipr_type;           /* record type   = IPR_TYPE_DATAv1 */
    uint8_t ipr_len;            /* record length = IPR_TYPE_DATAv1_LEN */
    uint8_t ipr_reply_type;     /* reply type (or IPR_REPLY_NOREPLY) */
    uint8_t ipr_ttl;            /* remaining ttl of the response */
    uint32_t    ipr_time_s;     /* sent (if not available, received) seconds since the Epoch */
    uint32_t    ipr_rtt_us;     /* round-trip time in microseconds */
    uint32_t    ipr_probe_addr; /* probed address */
    uint32_t    ipr_reply_addr; /* if different from probe_addr, or 0 */
} icmptrain_probe_record_datav1_t;

TEXTv1 records are represented by the following structure:

#define IPR_TYPE_TXT_v1         0x2  /* XXX deprecated in future use */
#define IPR_TYPE_TXT_v1_LEN     255
typedef struct icmptrain_probe_record_txt_v1_ {
    uint8_t ipr_type;           /* = IPR_TYPE_TXT_v1 */ 
    uint8_t ipr_len;            /* = IPR_TYPE_TXT_v1_LEN */
#define IPR_MSG_TXT_v1_MAX  (IPR_TYPE_TXT_v1_LEN-2)
    char    ipr_msg[IPR_MSG_TXT_v1_MAX];
} icmptrain_probe_record_txt_v1_t;
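Because v1 records are variable length, a reader has to use the ipr_len byte to step from one record to the next. The sketch below walks a decompressed v1 buffer and counts records by type. It assumes, as the two structures above suggest, that ipr_len covers the whole record including the two header bytes; the function name count_v1_records is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPR_TYPE_DATAv1  0x1
#define IPR_TYPE_TXT_v1  0x2

/*
 * Walk a buffer of v1 records and count DATA and TEXT records.
 * Returns 0 on success, -1 if a record is truncated or its length is invalid.
 */
static int count_v1_records(const uint8_t *buf, size_t len,
                            size_t *n_data, size_t *n_text)
{
    assert(buf != NULL && n_data != NULL && n_text != NULL);
    *n_data = 0;
    *n_text = 0;
    size_t off = 0;
    while (off < len) {
        if (len - off < 2)
            return -1;                  /* type/length header is cut off */
        uint8_t type = buf[off];
        uint8_t rlen = buf[off + 1];
        if (rlen < 2 || rlen > len - off)
            return -1;                  /* bad or truncated record */
        if (type == IPR_TYPE_DATAv1)
            ++*n_data;
        else if (type == IPR_TYPE_TXT_v1)
            ++*n_text;
        off += rlen;                    /* step to the next record */
    }
    return 0;
}
```

Records with an unrecognized type are skipped using their length byte rather than treated as an error, which is a design choice of this sketch and not something the format description requires.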