Internet Address Survey Bitstring Format Description

Our earliest Internet address surveys stored data as a bitstring, described below. This approach has long since been retired in favor of a 24B binary format that preserves far more information.

Historical Bitstring Address Survey Format

We encode the whole dataset as a single bitstring. Each bit is set to 1 if the corresponding address responded to our probes and to 0 if it did not. Since IPv4 addresses are 32 bits long, the bitstring must contain 2^32 bits to cover the whole IPv4 address range.

For example, to check whether address 123.45.67.89 is present in the dataset, we only need to compute the index of its bit in the bitstring and check whether that bit is set. The index is given by:

bit_index = (123 << 24) | (45 << 16) | (67 << 8) | 89
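The lookup can be sketched in Python as follows. This is a minimal illustration, not the original survey tooling: the function names are hypothetical, and the order of bits within each byte is an assumption (most significant bit first), since the description above does not specify it.

```python
def bit_index(address: str) -> int:
    """Map a dotted-quad IPv4 address to its bit position in the bitstring."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def is_set(bitstring: bytes, address: str) -> bool:
    """Return True if the bit for `address` is 1 in `bitstring`.

    ASSUMPTION: bit 0 of the bitstring is the most significant bit of
    byte 0. The format description does not state the in-byte bit order,
    so flip the mask if the actual data is least-significant-bit first.
    """
    byte_offset, bit_offset = divmod(bit_index(address), 8)
    return bool(bitstring[byte_offset] & (0x80 >> bit_offset))
```

For 123.45.67.89 the index is 0x7B2D4359: each octet of the address becomes one byte of the index.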

Since the bitstring holds one bit per IPv4 address, the dataset occupies 2^32 bits, or 2^29 bytes (512 MB, counting 1 MB as 2^20 bytes). This representation is particularly useful for set operations such as union (a bitwise OR of the two bitstrings) and intersection (a bitwise AND of the two bitstrings).
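As a rough sketch of these set operations, two bitstrings of equal length can be combined through Python's arbitrary-precision integers. The helper names below are hypothetical and the approach is chosen for brevity, not speed; a full 512 MB bitstring would more likely be processed in chunks or with an array library.

```python
def _combine(a: bytes, b: bytes, op) -> bytes:
    """Apply a bitwise operator to two equal-length bitstrings."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have the same length")
    x = int.from_bytes(a, "big")
    y = int.from_bytes(b, "big")
    return op(x, y).to_bytes(len(a), "big")


def union(a: bytes, b: bytes) -> bytes:
    """Addresses that responded in either survey (bitwise OR)."""
    return _combine(a, b, lambda x, y: x | y)


def intersection(a: bytes, b: bytes) -> bytes:
    """Addresses that responded in both surveys (bitwise AND)."""
    return _combine(a, b, lambda x, y: x & y)


# Size check: 2^32 bits packed 8 per byte is 512 * 2^20 bytes.
FULL_SIZE_BYTES = 2**32 // 8
```

Because both operations act on whole bytes at once, the result does not depend on the order of bits within a byte.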