ANT Datasets

ANT provides a number of datasets in different formats.

Getting Datasets

See our separate datasets requests page for the steps to get access to our data.

See also our list of all datasets, and pointers to their formats.

Dataset Categories

We have several categories of datasets:

Address Space Allocation Data

Address Space Allocation Data contains Internet addresses that have some properties that characterize Internet topology (for example, addresses that respond with different codes, or that appear to be dynamic, etc.). The IP addresses in this dataset are not typically anonymized because they are determined from measurement traffic and not actual sender-receiver communications and so are not associated with specific individuals. This data can be used to better understand the Internet topology and address usage.

Specific sub-categories of address space allocation data include:

Other Internet Topology data

Much of our data is IP Packet Headers: These datasets consist of headers from network traffic, containing information such as anonymized source and destination IP addresses and other IP and transport (e.g., TCP, UDP, ICMP, SCTP) header fields. No packet contents are included. Depending on the specific dataset, this category of data can be used to characterize typical Internet traffic, or traffic anomalies such as DDoS attacks, port scans, or worm outbreaks.
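As a rough illustration of what a header-only record carries, the following Python sketch unpacks the fixed 20-byte IPv4 header from raw bytes. The function name parse_ipv4_header and the sample bytes are assumptions made for illustration; they do not describe the file format of any particular ANT dataset.

```python
import ipaddress
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (hypothetical helper)."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    # Network byte order: version/IHL, TOS, total length, identification,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is counted in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,  # 1 = ICMP, 6 = TCP, 17 = UDP, 132 = SCTP
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


# Made-up sample header that uses documentation addresses (RFC 5737).
sample = bytes.fromhex("4500003c1c46400040060000c0000201c6336407")
header = parse_ipv4_header(sample)
print(header["src"], header["dst"], header["protocol"])
```

In an anonymized dataset the src and dst fields would hold anonymized addresses, so they identify a consistent endpoint within the dataset rather than a real host.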

We also have traffic flow data: Network traffic can be represented as flows between two endpoints. These datasets contain traffic flow information, which includes a variety of attributes such as source and destination IP address, source and destination port, protocol type, and packet and byte counts. This data can be in different formats generated by a range of collection tools such as NetFlow, IPFIX, and argus, or variants. IP addresses in these files are anonymized on a per-dataset or per-time-interval basis. These datasets are useful for research in areas such as network economics and accounting, network planning, analysis, security, denial-of-service attacks, and network monitoring, as well as traffic visualization.
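To make the idea of a flow concrete, here is a minimal Python sketch that groups packets by the usual 5-tuple (source address, destination address, source port, destination port, protocol) and totals packet and byte counts. The Packet type and aggregate_flows function are hypothetical names for illustration only. Real collectors such as NetFlow or IPFIX exporters also apply timeouts and record many more fields, which this sketch omits.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record with only the fields needed here."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    length: int


def aggregate_flows(packets: Iterable[Packet]) -> dict:
    """Group packets by 5-tuple and total their packet and byte counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.length
    return dict(flows)


# Made-up packets between two documentation addresses (RFC 5737).
packets = [
    Packet("192.0.2.1", "198.51.100.7", 40000, 443, 6, 60),
    Packet("192.0.2.1", "198.51.100.7", 40000, 443, 6, 1500),
    Packet("198.51.100.7", "192.0.2.1", 443, 40000, 6, 52),
]
flows = aggregate_flows(packets)
for key, counts in flows.items():
    print(key, counts)
```

Note that this sketch treats each direction as its own flow, so the reply packet above is counted separately from the two packets it answers.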

DNS Data

We have DNS data (Domain Name System), showing DNS-protocol lookups. It comes in three flavors:

Public DNS data: This dataset consists of Domain Name System data derived from public sources, such as public DNS servers. It is not associated with users and has no privacy constraints.

An example public DNS dataset is our reverse DNS data rdns_ipv4-20160312.

Anonymized DNS data: This dataset consists of Domain Name System data that contains no identifying information for individuals, either because it was anonymized, because it is aggregated to a level at which individuals' queries are obscured, or because it contains only experimental data from test programs (not individuals).

An example anonymized DNS dataset is our DITL data DITL_B_Root-20160405.

Limited DNS data: This dataset consists of Domain Name System data that does not directly identify individuals, but that cannot be combined with other data sources.

Service Enumeration Data

Our service enumeration data comes from our efforts to enumerate different Internet services, currently including:

Anycast enumeration datasets: Active probing information for DNS anycast services such as root DNS. Typically, probes are made from many vantage points with the goal of enumerating all anycast nodes in the service. Anycast enumeration datasets are useful for understanding the operational status and geographic reach of anycast services and nodes. For detailed information about the dataset, please refer to the dataset description page.

Google front-ends enumeration and mapping: Active DNS queries with EDNS-client-subnet allow enumeration of Google front-end IP addresses. Given all the front-end IP addresses, we use a new technique to geolocate the front-ends and cluster them into serving sites. For detailed information about the dataset, please refer to the dataset description page.

Reverse DNS data: We collect and provide a crawl of the IPv4 Reverse DNS domain names.

An example reverse DNS dataset is rdns_ipv4-20160312.
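For readers unfamiliar with reverse DNS, the short Python sketch below shows how an IPv4 address maps to the in-addr.arpa name that a PTR lookup would query. It relies only on the standard ipaddress module. The helper name reverse_name is an assumption for illustration, and the sketch says nothing about how the crawl itself is performed or how the dataset files are laid out.

```python
import ipaddress


def reverse_name(addr: str) -> str:
    """Return the reverse DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(addr).reverse_pointer


# 192.0.2.1 is a documentation address (RFC 5737), used here as a stand-in.
print(reverse_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```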

Other DNS data: We have data related to DNS backscatter.

Internet Outage Data

Datasets in this category record information about Internet outages, that is, address blocks that become unreachable. Typically, outages are inferred from active probing. A dataset may include /24 block-level outages over time, or lists of inferred outages that affect larger parts of the Internet. Outage data can be useful for understanding Internet reliability.

For details of the dataset, please refer to the description page.
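The notion of a /24 block-level outage over time can be sketched in Python as follows. This is a deliberately simplified illustration, not the inference method behind these datasets: it assumes a block is down in a probing round when it was probed and no probed address replied, and it reports each run of down rounds as one outage. The function names and the (round, address, responded) input layout are hypothetical.

```python
import ipaddress
from collections import defaultdict


def block_of(addr: str) -> str:
    """Return the /24 block containing an IPv4 address."""
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


def infer_outages(observations):
    """Return {block: [(first_down_round, last_down_round), ...]}.

    observations: iterable of (round_number, address, responded) tuples.
    Simplifying assumption: a block is down in a round when it was probed
    and none of the probed addresses replied.
    """
    up = defaultdict(dict)  # block -> {round: did any address respond?}
    for rnd, addr, responded in observations:
        blk = block_of(addr)
        up[blk][rnd] = up[blk].get(rnd, False) or responded

    outages = {}
    for blk, rounds in up.items():
        spans, start, prev = [], None, None
        for rnd in sorted(rounds):
            if not rounds[rnd]:
                if start is None:
                    start = rnd  # first down round of a new outage
                prev = rnd
            elif start is not None:
                spans.append((start, prev))  # an up round ends the outage
                start = None
        if start is not None:
            spans.append((start, prev))  # outage still open at the end
        outages[blk] = spans
    return outages


# Made-up probe results using documentation addresses (RFC 5737).
observations = [
    (1, "192.0.2.10", True), (1, "192.0.2.20", False),
    (2, "192.0.2.10", False), (2, "192.0.2.20", False),
    (3, "192.0.2.10", False),
    (4, "192.0.2.20", True),
    (1, "198.51.100.5", True), (2, "198.51.100.5", True),
]
outages = infer_outages(observations)
print(outages)
```

In this made-up input, the block 192.0.2.0/24 is reported down for rounds 2 through 3, while 198.51.100.0/24 has no outages.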

Other types of data

Other paper-specific datasets: We have several other datasets specific to papers we have published, including P2P traffic detection, TCP SYNs, etc.

Data Formats

(Documentation about dataset formats is now here.)