Description of IP Accumulation Datasets

This web page documents our datasets on IPv4 accumulation: counts of the number of active addresses per /24 block.

Datasets are distributed as a number of files, named part-NNNNN.xz, where NNNNN are decimal digits.

Each file contains tab-separated values, with the following header line:

#fsdb -F t block timebin lit available  

This header defines the schema:

  • block: an 8-hex-digit version of the IP block. The last two hex digits (the final byte) will always be 00, since each block is a /24.

  • timebin: a Unix timestamp indicating when this period begins

  • lit: the number of IP addresses that have been responsive since the last time they were scanned

  • available: The number of IP addresses that are being scanned.
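As a rough illustration of the schema above, the following Python sketch reads one file and converts each field to a typed value. It assumes the files are standard xz-compressed text, that every data line has exactly the four columns named in the header, and that any line starting with "#" is a header or comment. The function names (block_to_network, parse_row, iter_rows) are hypothetical and not part of the dataset distribution.

```python
import ipaddress
import lzma


def block_to_network(block_hex):
    """Decode an 8-hex-digit block such as '0104de00' into its /24 network."""
    return ipaddress.IPv4Network((int(block_hex, 16), 24))


def parse_row(line):
    """Split one tab-separated data line into typed fields.

    Assumes exactly four columns: block, timebin, lit, available.
    """
    block, timebin, lit, available = line.rstrip("\n").split("\t")
    return {
        "block": block_to_network(block),
        "timebin": int(timebin),      # Unix timestamp, start of the period
        "lit": int(lit),              # responsive addresses
        "available": int(available),  # addresses being scanned
    }


def iter_rows(path):
    """Yield parsed rows from one part-NNNNN.xz file.

    Lines starting with '#' (assumed to be header or comment lines)
    and blank lines are skipped.
    """
    with lzma.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield parse_row(line)
```

For example, block_to_network("0104de00") gives the network 1.4.222.0/24.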

Typically available holds a constant value, corresponding to the number of ever-active addresses in the target block. For the first few timebins, available may be smaller until all addresses have been scanned at least once. After that, each subsequent entry is incrementally updated.
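One way to separate the initial warm-up timebins from the steady state is sketched below in Python. It assumes that the last available value seen for a block is its steady-state value; the helper name warmup_length and the input counts are illustrative only.

```python
def warmup_length(available_series):
    """Return how many leading timebins have available below its final value.

    Assumes the last entry of available_series is the steady-state value.
    """
    if not available_series:
        return 0
    steady = available_series[-1]
    # Index of the first timebin that has reached the steady-state value.
    return next(i for i, v in enumerate(available_series) if v >= steady)


# Illustrative counts: two warm-up timebins, then steady state at 256.
print(warmup_length([253, 255, 256, 256]))  # prints 2
```

Timebins before the returned index could then be skipped when computing ratios such as lit / available.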

Data is sorted by block and then timebin.

Blocks are distributed randomly across all files.
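Because rows are sorted by block and then timebin, all rows for one block should be contiguous, so a single streaming pass can collect each block's time series without loading a whole file. The Python sketch below assumes the sort order holds within each file and that rows arrive as (block, timebin, lit, available) tuples. The name series_by_block and the sample values are hypothetical. Since blocks are spread randomly across files, finding one particular block would still mean scanning every file.

```python
from itertools import groupby


def series_by_block(rows):
    """Yield (block, [(timebin, lit, available), ...]) for each run of rows.

    rows must already be sorted by block, so each block forms one
    contiguous run; groupby does not re-sort its input.
    """
    for block, group in groupby(rows, key=lambda row: row[0]):
        yield block, [(timebin, lit, available)
                      for _, timebin, lit, available in group]


# Illustrative rows for two blocks (values are made up).
rows = [
    ("0104de00", 1577900940, 149, 253),
    ("0104de00", 1577901600, 150, 255),
    ("0104df00", 1577900940, 12, 40),
]
for block, series in series_by_block(rows):
    print(block, len(series))
```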

Timebins are uniformly spaced.

Sample data showing one block:

block     timestamp   duration  n_active_ip  probed_ip
0104de00  1577900940  660       149          253
0104de00  1577901600  660       150          255
0104de00  1577902260  3960      152          256
0104de00  1577906220  660       153          256
…