LANDER:as to org mapping inferred truth-20100507

README version: 4058, last modified: 2014-06-6.

This file describes the trace dataset "as_to_org_mapping_inferred_truth-20100507" provided by the LANDER project.

Contents

  • 1 LANDER Metadata
  • 2 Dataset Contents
  • 3 Data Format
      • 3.1 Syntax
      • 3.2 Schema
      • 3.3 How Organization vs. AS files relate
  • 4 Clustering Method
  • 5 Citation
  • 6 Results Using This Dataset
  • 7 User Annotations

LANDER Metadata

dataSetName                as_to_org_mapping_inferred_truth-20100507
status                     usc-web-and-predict
shortDesc                  The ASes clustered into 9 organizations
longDesc                   This dataset identified the ASes belonging to 9 large Internet organizations. We
                           determined these ASes by manual inspection of RIR whois information, using AS
                           names and external information (company web pages, Wikipedia, etc.) to infer a
                           feasible ground truth.
datasetClass               Quasi-Restricted
commercialAllowed          true
requestReviewRequired      true
productReviewRequired      false
ongoingMeasurement         true
submissionMethod           Upload
collectionStartDate        2010-05-07
collectionStartTime        00:00:00
collectionEndDate          2010-05-07
collectionEndTime          00:00:00
availabilityStartDate      2013-03-04
availabilityStartTime      18:10:01
availabilityEndDate        2030-01-01
availabilityEndTime        00:00:00
anonymization              none
archivingAllowed           false
keywords                   category:internet-topology-data, subcategory:as-organizational-data, internet,
                           topology, one-time
format                     text
access                     https
hostName                   USC-LANDER
providerName               USC
groupingId
groupingSummaryFlag        false
retrievalInstructions      download
byteSize                   1048576
expirationDays             14
uncompressedSize           35602
impactDoi                  10.23721/109/1353656
useAgreement               dua-ni-160816
irbRequired                false
privateAccessInstructions  See https://ant.isi.edu/datasets/#getting-datasets for information on obtaining
                           this dataset.
                           See

Dataset Contents

as_to_org_mapping_inferred_truth-20100507.README.txt      copy of this README
orgs.fsdb      9 large Internet organizations and their corresponding AS cluster IDs
ases.fsdb      ASes belonging to the 9 organizations, each annotated with a cluster ID
.sha1sum       SHA-1 checksums

The file ".sha1sum" contains the SHA-1 checksums of the individual compressed files. You can check the integrity of the distribution by independently calculating the SHA-1 sums of the files and comparing them with those listed in ".sha1sum". If the sha1sum utility is installed on your system, run:

    sha1sum --check .sha1sum

Data Format

Syntax

Each of the *.fsdb files is in FSDB format, a simple whitespace-separated text database format in which each line is a database row and whitespace separates the columns.
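The syntax described above can be read with a few lines of Python. This is a minimal sketch, not the official FSDB tooling. It assumes that the first line is a header starting with "#fsdb", optionally followed by flags such as "-F t" (taken here to mean tab-separated fields) and then the column names, and that any other line starting with "#" is a comment. The function name read_fsdb is hypothetical.

```python
import io


def read_fsdb(stream):
    """Parse an FSDB-style text stream into (columns, rows).

    Assumptions (not confirmed by this README): the header line starts
    with "#fsdb", flags such as "-F t" precede the column names, and
    other lines starting with "#" are comments.
    """
    columns = None
    rows = []
    sep = None  # None makes str.split() split on any run of whitespace
    for raw in stream:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            tokens = line.split()
            if columns is None and tokens[0] == "#fsdb":
                tokens = tokens[1:]
                # Consume flags given as "-F t" or in the joined form "-Ft".
                while tokens and tokens[0].startswith("-"):
                    flag = tokens.pop(0)
                    value = flag[2:] or (tokens.pop(0) if tokens else "")
                    if flag[:2] == "-F" and value == "t":
                        sep = "\t"  # assumed meaning: tab-separated fields
                columns = tokens
            continue  # header or comment line: no data row
        if columns is None:
            raise ValueError("data row found before the #fsdb header")
        rows.append(dict(zip(columns, line.split(sep))))
    return columns, rows


# Made-up sample content, for illustration only.
sample = "#fsdb -F t clusterid orgname\nmanual-1\tExample Org\n# a comment\n"
columns, rows = read_fsdb(io.StringIO(sample))
print(columns, rows)
```

Each row comes back as a dictionary keyed by column name, so later steps can refer to columns such as clusterid by name rather than by position.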

Schema

The files are simple databases that use 5 columns in total:

clusterid  the unique identifier of an AS cluster, in the format "method-id", where "method"
           indicates the clustering method (always "manual" in this dataset, because the ASes
           were clustered by manual inspection) and "id" is a unique identifier within that
           method, such as "manual-1".
orgname    the name of the organization.
asn        the AS number, the unique identifier of an Autonomous System (AS).
rir        the Regional Internet Registry (RIR) the AS belongs to; one of
           {arin, ripe, apnic, lacnic, afrinic}.
asname     the name of the AS.

A value of "-" in a column means that the information is not available for that organization or AS.

How Organization vs. AS files relate

The dataset has two related files: the organization file (orgs.fsdb) and the AS file (ases.fsdb). The organization file lists the 9 organizations, one per row. The AS file lists all ASes belonging to the 9 organizations, along with the organization each AS belongs to. To see which ASes belong to the same organization (that is, share the same clusterid), join the organization file with the AS file on clusterid.
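The join on clusterid can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's own tooling: rows are assumed to be already loaded as dictionaries keyed by column name, orgs.fsdb is assumed to carry the clusterid and orgname columns, and ases.fsdb is assumed to carry at least clusterid and asn. The function name join_on_clusterid is hypothetical, and the sample rows are made up (the AS numbers come from the range reserved for documentation, not from the dataset).

```python
from collections import defaultdict


def join_on_clusterid(org_rows, as_rows):
    """Map each organization name to the ASNs that share its clusterid."""
    name_by_cluster = {row["clusterid"]: row["orgname"] for row in org_rows}
    asns_by_org = defaultdict(list)
    for row in as_rows:
        org = name_by_cluster.get(row["clusterid"])
        if org is not None:  # ignore ASes whose cluster has no organization row
            asns_by_org[org].append(row["asn"])
    return dict(asns_by_org)


# Made-up rows for illustration only; they are NOT taken from the dataset.
org_rows = [
    {"clusterid": "manual-1", "orgname": "Example Org A"},
    {"clusterid": "manual-2", "orgname": "Example Org B"},
]
as_rows = [
    {"clusterid": "manual-1", "asn": "64496", "rir": "arin", "asname": "-"},
    {"clusterid": "manual-1", "asn": "64497", "rir": "ripe", "asname": "-"},
    {"clusterid": "manual-2", "asn": "64498", "rir": "apnic", "asname": "-"},
]

for org, asns in join_on_clusterid(org_rows, as_rows).items():
    print(org, len(asns), asns)
```

Building the clusterid-to-name lookup first keeps the join to a single pass over the AS rows, which is more than enough for a dataset of this size.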

Clustering Method

This dataset identifies the ASes belonging to 9 large Internet organizations. We determined these ASes by manual inspection of RIR whois information, using AS names and external information (company web pages, Wikipedia, etc.) to infer a feasible ground truth.

We inferred the ground truth around May 7th, 2010 (hence the dataset suffix, 20100507). Because the dataset is a snapshot, expect it to be correct ONLY around that time.

The 9 organizations are four telecommunications companies, Verizon (234 ASes), Comcast (48), Time Warner Cable (35), and China Mobile (CN Mobile) (10); four content providers, Yahoo (76), Akamai (32), Google (21), and Limelight (11); and a root-DNS provider, Internet Systems Consortium (ISC) (55).

We identified ASes as part of the same organization by manual inspection of keywords in AS names, such as organization names, subsidiary names, and the names of merged or acquired companies.

Although this data is the best we could infer, and we have used it as the best available ground truth for testing automated algorithms, we cannot guarantee its completeness or accuracy.

Citation

If you use this trace to conduct additional research, please cite it as:

    Internet Addresses Survey dataset, PREDICT ID USC-LANDER/as_to_org_mapping_inferred_truth-20100507. Traces generated on 2010-05-07. Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

Results Using This Dataset

This dataset has been used in the following previously published work:

  • Xue Cai, John Heidemann, Balachander Krishnamurthy, and Walter Willinger. Towards an AS-to-Organization Map. In Proceedings of the ACM Internet Measurement Conference (IMC). Melbourne, Australia, ACM. November, 2010. http://www.isi.edu/~johnh/PAPERS/Cai10c.pdf

User Annotations

Currently no annotations.