Categories
Presentations

new talk “New Opportunities for Research and Experiments in Internet Naming And Identification” at the AIMS Workshop

John Heidemann gave the talk “New Opportunities for Research and Experiments in Internet Naming And Identification” at the AIMS 2016 workshop at CAIDA, La Jolla, California on February 11, 2016.  Slides are available at http://www.isi.edu/~johnh/PAPERS/Heidemann16a.pdf.

Needs for new naming and identity research prompt new research infrastructure, enabling new research directions.
Needs for new naming and identity research prompt new research infrastructure, enabling new research directions.

From the abstract:

DNS is central to Internet use today, yet research on DNS today is challenging: many researchers find it challenging to create realistic experiments at scale and representative of the large installed base, and datasets are often short (two days or less) or otherwise limited. Yes DNS evolution presses on: improvements to privacy are needed, and extensions like DANE provide an opportunity for DNS to improve security and support identity management. We exploring how to grow the research community and enable meaningful work on Internet naming. In this talk we will propose new research infrastructure to support to realistic DNS experiments and longitudinal data studies. We are looking for feedback on our proposed approaches and input about your pressing research problems in Internet naming and identification.

For more information see our project website.

Categories
Publications Technical Report

new technical report “BotDigger: Detecting DGA Bots in a Single Network”

We have released a new technical report “BotDigger: Detecting DGA Bots in a Single Network”, CS-16-101, available at http://www.cs.colostate.edu/~hanzhang/papers/BotDigger-techReport.pdf

The code of BotDigger is available on GitHub at: https://github.com/hanzhang0116/BotDigger

From the abstract:

To improve the resiliency of communication between bots and C&C servers, bot masters began utilizing Domain Generation Algorithms (DGA) in recent years. Many systems have been introduced to detect DGA-based botnets. However, they suffer from several limitations, such as requiring DNS traffic collected across many networks, the presence of multiple bots from the same botnet, and so forth. BotDiggerOverviewThese limitations make it very hard to detect individual bots when using traffic collected from a single network. In this paper, we introduce BotDigger, a system that detects DGA-based bots using DNS traffic without a priori knowledge of the domain generation algorithm. BotDigger utilizes a chain of evidence, including quantity, temporal and linguistic evidence
to detect an individual bot by only monitoring traffic at the DNS servers of a single network. We evaluate BotDigger’s performance using traces from two DGA-based botnets: Kraken and Conflicker. Our results show that BotDigger detects all the Kraken bots and 99.8% of Conficker bots. A one-week DNS trace captured from our university and three traces collected from our research lab are used to evaluate false positives. The results show that the false positive rates are 0.05% and 0.39% for these two groups of background traces, respectively.

This work is by Han Zhang, Manaf Gharaibeh, Spiros Thanasoulas and Christos Papadopoulos (Colorado State University).

Categories
Publications Technical Report

new technical report “Assessing Co-Locality of IP Blocks”

We have released a new technical report “Assessing Co-Locality of IP Blocks”, CSU TR15-103, available at http://www.cs.colostate.edu/TechReports/Reports/2015/tr15-103.pdf.

From the abstract:

isi_all_blocks_clustersCount_CDF
CDF of number of clusters per block, suggesting the number of potential multi-location blocks. (Figure 2 from [Gharaibeh15a].)

Many IP Geolocation services and applications assume that all IP addresses with the same /24 IPv4 prefix (a /24 block) are in the same location. For blocks that contain addresses in very different locations (such blocks identifying network backbones), this assumption can result in large geolocation error. This paper evaluates this assumption using a large dataset of 1.41M /24 blocks extracted from a delay measurements dataset for the entire
responsive IPv4 address space. We use hierarchal clustering to find clusters of IP addresses with similar observed delay measurements within /24 blocks. Blocks with multiple clusters often span different geographic locations. We evaluate this claim against two ground-truth datasets, confirming that 93% of identified multi-cluster blocks are true positives with multiple locations, while only 13% of blocks identified as single-cluster appear to be multi-location in ground truth. Applying the clustering process to the whole dataset suggests that about 17% (247K) of blocks are likely multi-location.

This work is by Manaf Gharaibeh, Han Zhang, Christos Papadopoulos (Colorado State University), and John Heidemann (USC/ISI). The datasets used in this work are new analysis of an existing geolocation dataset as collected by Hu et al. (http://www.isi.edu/~johnh/PAPERS/Hu12a.pdf).  These source datasets are available upon request from http://www.predict.org and via our website, and we expect trial datasets in our new work to also be available there and through PREDICT by the end of 2015.

Categories
Papers Publications

new conference paper “Detecting Malicious Activity with DNS Backscatter”

The paper “Detecting Malicious Activity with DNS Backscatter” will appear at the ACM Internet Measurements Conference in October 2015 in Tokyo, Japan.  A copy is available at http://www.isi.edu/~johnh/PAPERS/Fukuda15a.pdf).

How newtork activity generates DNS backscatter that is visible at authority servers. (Figure 1 from [Fukuda15a]).
How newtork activity generates DNS backscatter that is visible at authority servers. (Figure 1 from [Fukuda15a]).
From the abstract:

Network-wide activity is when one computer (the originator) touches many others (the targets). Motives for activity may be benign (mailing lists, CDNs, and research scanning), malicious (spammers and scanners for security vulnerabilities), or perhaps indeterminate (ad trackers). Knowledge of malicious activity may help anticipate attacks, and understanding benign activity may set a baseline or characterize growth. This paper identifies DNS backscatter as a new source of information about network-wide activity. Backscatter is the reverse DNS queries caused when targets or middleboxes automatically look up the domain name of the originator. Queries are visible to the authoritative DNS servers that handle reverse DNS. While the fraction of backscatter they see depends on the server’s location in the DNS hierarchy, we show that activity that touches many targets appear even in sampled observations. We use information about the queriers to classify originator activity using machine-learning. Our algorithm has reasonable precision (70-80%) as shown by data from three different organizations operating DNS servers at the root or country-level. Using this technique we examine nine months of activity from one authority to identify trends in scanning, identifying bursts corresponding to Heartbleed and broad and continuous scanning of ssh.

The work in this paper is by Kensuke Fukuda (NII/Sokendai) and John Heidemann (USC/ISI) and was begun when Fukuda-san was a visiting scholar at USC/ISI.  Kensuke Fukuda’s work in this paper is partially funded by Young Researcher Overseas Visit Program by Sokendai, JSPS Kakenhi, and the Strategic International Collaborative R&D Promotion Project of the Ministry of Internal Affairs and Communication in Japan, and by the European Union Seventh Framework Programme.  John Heidemann’s work is partially supported by US DHS S&T, Cyber Security division.

Some of the datasets in this paper are available to researchers, either from the authors or through DNS-OARC.  We list DNS backscatter datasets and methods to obtain them at https://ant.isi.edu/datasets/dns_backscatter/index.html.

 

Categories
Students

congratulations to Xun Fan for his new PhD

I would like to congratulate Dr. Xun Fan for defending his PhD in May 2015 and completing his doctoral dissertation “Enabling Efficient Service Enumeration Through Smart Selection of Measurements” in July 2015.

Xun Fan (left) and John Heidemann, after Xun's PhD defense.
Xun Fan (left) and John Heidemann, after Xun’s PhD defense.

From the abstract:

The Internet is becoming more and more important in our daily lives. Both the government and industry invest in the growth of the Internet, bringing more users to the world of networks. As the Internet grows, researchers and operators need to track and understand the behavior of global Internet services to achieve smooth operation. Active measurements are often used to study behavior of large Internet service, and efficient service enumeration is required. For example, studies of Internet topology may need active probing to all visible network prefixes; monitoring large replicated service requires periodical enumeration of all service replicas. To achieve efficient service enumeration, it is important to select probing sources and destinations wisely. However, there are challenges for making smart selection of probing sources and destinations. Prior methods to select probing destinations are either inefficient or hard to maintain. Enumerating replicas of large Internet services often requires many widely distributed probing sources. Current measurement platforms don’t have enough probing sources to approach complete enumeration of large services.

This dissertation makes the thesis statement that smart selection of probing sources and destinations enables efficient enumeration of global Internet services to track and understand their behavior. We present three studies to demonstrate this thesis statement. First, we propose new automated approach to generate a list of destination IP addresses that enables efficient enumeration of Internet edge links. Second, we show that using large number of widely distributed open resolvers enables efficient enumeration of anycast nodes which helps study abnormal behavior of anycast DNS services. In our last study, we efficiently enumerate Front-End (FE) Clusters of Content Delivery Networks (CDNs) and use the efficient enumeration to track and understand the dynamics of user-to-FE Cluster mapping of large CDNs. We achieve the efficient enumeration of CDN FE Clusters by selecting probing sources from a large set of open resolvers. Our selected probing sources have smaller number of open resolvers but provide same coverage on CDN FE Cluster as the larger set.

In addition to our direct results, our work has also been used by several published studies to track and understand the behavior of Internet and large network services. These studies not only support our thesis as additional examples but also suggest this thesis can further benefit many other studies that need efficient service enumeration to track and understand behavior of global Internet services.

Categories
Papers Publications

new workshop paper “Privacy Principles for Sharing Cyber Security Data” in IWPE 15

The paper “Privacy Principles for Sharing Cyber Security Data” (available at https://www.isi.edu/~calvin/papers/Fisk15a.pdf) will appear at the International Workshop on Privacy Engineering (co-located with IEEE Symposium on Security and Privacy) on May 21, 2015 in San Jose, California.

From the abstract:

Sharing cyber security data across organizational boundaries brings both privacy risks in the exposure of personal information and data, and organizational risk in disclosing internal information. These risks occur as information leaks in network traffic or logs, and also in queries made across organizations. They are also complicated by the trade-offs in privacy preservation and utility present in anonymization to manage disclosure. In this paper, we define three principles that guide sharing security information across organizations: Least Disclosure, Qualitative Evaluation, and Forward Progress. We then discuss engineering approaches that apply these principles to a distributed security system. Application of these principles can reduce the risk of data exposure and help manage trust requirements for data sharing, helping to meet our goal of balancing privacy, organizational risk, and the ability to better respond to security with shared information.

The work in the paper is by Gina Fisk (LANL), Calvin Ardi (USC/ISI), Neale Pickett (LANL), John Heidemann (USC/ISI), Mike Fisk (LANL), and Christos Papadopoulos (Colorado State). This work is supported by DHS S&T, Cyber Security division.

Categories
Papers Publications

new workshop paper “Assessing Affinity Between Users and CDN Sites” in TMA 2015

The paper “Assessing Affinity Between Users and CDN Sites” (available at http://www.isi.edu/~xunfan/research/Fan15a.pdf) will appear at the Traffic Monitoring and Analysis Workshop in April 2015 in Barcelona, Spain.

From the abstract:

count_cid_per_clientLarge web services employ CDNs to improve user performance. CDNs improve performance by serving users from nearby FrontEnd (FE) Clusters. They also spread users across FE Clusters when one is overloaded or unavailable and others have unused capacity. Our paper is the first to study the dynamics of the user-to-FE Cluster mapping for Google and Akamai from a large range of client prefixes. We measure how 32,000 prefixes associate with FE Clusters in their CDNs every 15 minutes for more than a month. We study geographic and latency effects of mapping changes, showing that 50–70% of prefixes switch between FE Clusters that are very distant from each other (more than 1,000 km), and that these shifts sometimes (28–40% of the time) result in large latency shifts (100 ms or more). Most prefixes see large latencies only briefly, but a few (2–5%) see high latency much of the time. We also find that many prefixes are directed to several countries over the course of a month, complicating questions of jurisdiction.

Citation: Xun Fan, Ethan Katz-Bassett and John Heidemann.Assessing Affinity Between Users and CDN Sites. To appear in Traffic Monitoring and Analysis Workshop. Barcelona, Spain. April, 2015.

All data in this paper is available to researchers at no cost on request. Please see our CDN affinity dataset webpage.

This research is partially sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, HSARPA, Cyber Security Division, BAA 11-01-RIKA and Air Force Re-search Laboratory, Information Directorate under agreement number FA8750-12-2-0344, NSF CNS-1351100, and via SPAWAR Systems Center Pacific under Contract No. N66001-13-C-3001. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwith-standing any copyright notation thereon. The views contained herein are those of the authors and
do not necessarily represent those of DHS or the U.S. Government.

 

Categories
Papers Publications

new workshop paper “Measuring DANE TLSA Deployment” in TMA 2015

The paper “Measuring DANE TLSA Deployment” will appear at the Traffic Monitoring and Analysis Workshop in April 2015 in Barcelona, Spain (available at http://www.isi.edu/~liangzhu/papers/dane_tlsa.pdf).

From the abstract:

The DANE (DNS-based Authentication of Named Entities) framework uses DNSSEC to provide a source of trust, and with TLSA it can serve as a root of trust for TLS certificates. This serves to complement traditional certificate authentication methods, which is important given the risks inherent in trusting hundreds of organizations—risks already demonstrated with multiple compromises. The TLSA protocol was published in 2012, and this paper presents the first systematic study of its deployment. We studied TLSA usage, developing a tool that actively probes all signed zones in .com and .net for TLSA records. We find the TLSA use is early: in our latest measurement, of the 485k signed zones, we find only 997 TLSA names. We characterize how it is being used so far, and find that around 7–13% of TLSA records are invalid. We find 33% of TLSA responses are larger than 1500 Bytes and will very likely be fragmented.

The work in the paper is by Liang Zhu (USC/ISI), Duane Wessels and Allison Mankin (both of Verisign Labs), and John Heidemann (USC/ISI).

Categories
Presentations

new talk “Internet Populations (Good and Bad): Measurement, Estimation, and Correlation” at the ICERM Workshop on Cybersecurity

John Heidemann gave the talk “Internet Populations (Good and Bad): Measurement, Estimation, and Correlation” at the ICERM Workshop on Cybersecurity at Brown University, Providence, Rhode Island on October 22, 2014. Slides are available at http://www.isi.edu/~johnh/PAPERS/Heidemann14e/.

Can we improve the mathematical tools we use to measure and understand the Internet?
Can we improve the mathematical tools we use to measure and understand the Internet?

From the abstract:

Our research studies the Internet’s public face. Since 2006 we have been taking censuses of the Internet address space (pinging all IPv4 addresses) every 3 months. Since 2012 we have studied network outages and events like Hurricane Sandy, using probes of much of the Internet every 11 minutes. Most recently we have evaluated the diurnal Internet, finding countries where most people turn off their computers at night. Finally, we have looked at network reputation, identifying how spam generation correlates with network location, and others have studies multiple measurements of “network reputation”.

A common theme across this work is one must estimate characteristics of the edge of the Internet in spite of noisy measurements and a underlying changes. One also need to compare and correlate these imperfect measurements with other factors (from GDP to telecommunications policies).

How do these applications relate to the mathematics of census taking and measurement, estimation, and correlation? Are there tools we should be using that we aren’t? Do the properties of the Internet suggest new approaches (for example where rapid full enumeration is possible)? Does correlation and estimates of network “badness” help us improve cybersecurity by treating different parts of the network differently?

Categories
Presentations

new animation “Watching the Internet Sleep”

Does the Internet sleep? Yes, and we have the video!

We have recently put together a video showing 35 days of Internet address usage as observed from Trinocular, our outage detection system.

The Internet sleeps: address use in South America is low (blue) in the early morning, while India is high (red) in afternoon.
The Internet sleeps: address use in South America is low (blue) in the early morning, while India is high (red) in afternoon.

The Internet sleeps: address use in South America is low (blue) in the early morning, while India is high (red) in afternoon.  When we look at address usage over time, we see that some parts of the globe have daily swings of +/-10% to 20% in the number of active addresses. In China, India, eastern Europe and much of South America, the Internet sleeps.

Understanding when the Internet sleeps is important to understand how different country’s network policies affect use, it is part of outage detection, and it is a piece of improving our long-term goal of understanding exactly how big the Internet is.

See http://www.isi.edu/ant/diurnal/ for the video, or read our technical paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” by Quan, Heidemann, and Pradkin, to appear at ACM IMC, Nov. 2014.

Datasets (listed here) used in generating this video are available.

This work is partly supported by DHS S&T, Cyber Security division, agreement FA8750-12-2-0344 (under AFRL) and N66001-13-C-3001 (under SPAWAR).  The views contained
herein are those of the authors and do not necessarily represent those of DHS or the U.S. Government.  This work was classified by USC’s IRB as non-human subjects research (IIR00001648).