Community Labeling and Sharing of Security and Networking Test datasets (CLASSNET)

Project Description

Community Labeling and Sharing of Security and Networking Test datasets (CLASSNET) will support network and security research by providing new, labeled, rich, and diverse datasets to the research community. First, the project will develop a framework for collaborative, community-driven enrichment and labeling of data, enabling use of our datasets for machine learning in networking and security. Second, the CLASSNET project will make data available to researchers through multiple methods, ensuring privacy of data while enabling flexible data computation. Finally, the project will generate diverse continuous (constantly and automatically updated) and curated (selected by humans) datasets for research use.

The CLASSNET project will innovate in three dimensions: data labeling, data distribution, and data sources. For data labeling, CLASSNET will provide a collaborative framework for low-friction sharing of annotations among researchers. The framework will incentivize labeling with feedback mechanisms and user credits, and will support bulk, automatic, algorithmic labeling. For data distribution, CLASSNET will support multiple ways of accessing data, ranging from downloading anonymized data to processing data in the cloud, on provider machines, or via the code-to-data approach. Finally, CLASSNET data sources will provide new, diverse, continuous, and curated datasets that are useful for network and security research, including traffic packets and flows, network telescope data, DNS data, and Internet topology data.

The immediate impact of this project will include new types of labeled, curated and continuous datasets that enable new security, networking, and ML research and education, impacting a large community.

The broader impact of this data will be to foster research and education that will make the Internet safer, more stable, and more secure, and that will increase the community’s knowledge about the Internet. Given the Internet’s importance for tele-work, tele-medicine, remote learning, e-commerce, and e-government, these improvements will have a broad societal impact. In addition, CLASSNET datasets will support data-driven exercises for graduate and undergraduate education, and new PhD research. The CLASSNET project’s innovations in multiple pathways to data access, combined with the automated and incentivized enrichment framework, will improve the state of the art for responsible data sharing in related disciplines of information technology.

Data from CLASSNET will be made available to researchers at no cost, and used to support education and research.

Update March 2023: If you want data, please check out our new project web portal.

Support

CLASSNET is supported by NSF/CISE under NSF award CRI-8115780.

CLASSNET is a joint effort of USC/ISI and Merit Network, Inc.

People

  • Wes Hardaker, co-PI on this project, researcher (USC/ISI)
  • John Heidemann, co-PI on this project, project leader and professor (USC/ISI)
  • Jelena Mirkovic, PI on this project, project leader and assistant professor (USC/ISI)
  • Yuri Pradkin, researcher (USC/ISI)

Publications

  • ASM Rizvi, Tingshan Huang, Rasit Esrefoglu and John Heidemann 2024. Anycast Polarization in The Wild. Proceedings of the Passive and Active Measurement Workshop (Virtual Location, Mar. 2024). [PDF] [Dataset]
  • G. Moura, W. Hardaker, J. Heidemann and M. Davids 2022. Considerations for Large Authoritative DNS Server Operators. Technical Report 9199. Internet Request For Comments. [DOI] [PDF]
  • John Heidemann, Jelena Mirkovic, Wes Hardaker and Michalis Kallitsis 2021. Collecting, Labeling, and Using Networking Data: the Intersection of AI and Networking. NSF Workshop on AI for Networking (Virtual Event, Oct. 2021). [PDF]

For related publications, please see the ANT publications web page.

Software

  • dnstapmq: Convert dnstap data to message_question format.

See also the ANT distribution web page.

Datasets

See the COMUNDA web portal to get datasets.

We also make all datasets available through the ANT dataset page.