Categories
Papers Publications

new conference paper “Who Knocks at the IPv6 Door? Detecting IPv6 Scanning” at ACM IMC 2018

We have published a new paper “Who Knocks at the IPv6 Door? Detecting IPv6 Scanning” by Kensuke Fukuda and John Heidemann at the ACM Internet Measurement Conference (IMC 2018) in Boston, Massachusetts, USA.

DNS backscatter from IPv4 and IPv6 ([Fukuda18a], figure 1).
From the abstract:

DNS backscatter detects internet-wide activity by looking for common reverse DNS lookups at authoritative DNS servers that are high in the DNS hierarchy. Both DNS backscatter and monitoring unused address space (darknets or network telescopes) can detect scanning in IPv4, but with IPv6’s vastly larger address space, darknets become much less effective. This paper shows how to adapt DNS backscatter to IPv6. IPv6 requires new classification rules, but these reveal large network services, from cloud providers and CDNs to specific services such as NTP and mail. DNS backscatter also identifies router interfaces suggesting traceroute-based topology studies. We identify 16 scanners per week from DNS backscatter using observations from the B-root DNS server, with confirmation from backbone traffic observations or blacklists. After eliminating benign services, we classify another 95 originators in DNS backscatter as potential abuse. Our work also confirms that IPv6 appears to be less carefully monitored than IPv4.
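DNS backscatter works because a scanned host (or a middlebox near it) often looks up the scanner’s address in reverse DNS, and those PTR queries are visible to authoritative servers high in the hierarchy. As a small illustration of the mechanism (our sketch, not code from the paper), here is how the reverse-DNS query names are formed for IPv4 and IPv6 in Python:

    # Sketch: the reverse-DNS (PTR) names whose lookups form DNS
    # backscatter; IPv6 reverses each nibble, so names are much longer.
    import ipaddress

    def reverse_name(addr: str) -> str:
        """Return the in-addr.arpa / ip6.arpa name for an address."""
        return ipaddress.ip_address(addr).reverse_pointer

    print(reverse_name("192.0.2.1"))
    # 1.2.0.192.in-addr.arpa
    print(reverse_name("2001:db8::1"))
    # 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa

In DNS backscatter the address being looked up is the potential scanner, and the set of hosts making those lookups provides the signal the classifier uses.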

Categories
Announcements Students

congratulations to Liang Zhu for his new PhD

I would like to congratulate Dr. Liang Zhu for defending his PhD in August 2018 and completing his doctoral dissertation “Balancing Security and Performance of Network Request-Response Protocols” in September 2018.

Liang Zhu (left) and John Heidemann, after Liang’s PhD defense.

From the abstract:

The Internet has become a popular tool to acquire information and knowledge. Usually information retrieval on the Internet depends on request-response protocols, where clients and servers exchange data. Despite their wide use, request-response protocols bring challenges for security and privacy. For example, source-address spoofing enables denial-of-service (DoS) attacks, and eavesdropping on unencrypted data leaks sensitive information in request-response protocols. There is often a trade-off between security and performance in request-response protocols. More advanced protocols, such as Transport Layer Security (TLS), have been proposed to solve these problems of source spoofing and eavesdropping. However, developers often avoid adopting those advanced protocols due to performance costs such as client latency and server memory requirements. We need to understand the trade-off between security and performance for request-response protocols and find a reasonable balance, instead of blindly prioritizing one of them.
The thesis of this dissertation states that it is possible to improve the security of network request-response protocols without compromising performance, through protocol and deployment optimizations that are demonstrated by measurements of protocol developments and deployments. We support the thesis statement through three specific studies, each of which uses measurements and experiments to evaluate the development and optimization of a request-response protocol. We show that security benefits can be achieved with modest performance costs. In the first study, we measure the latency of OCSP in TLS connections. We show that OCSP has low latency due to its wide use of CDNs and caching, making certificate revocation checks practical for securing TLS. In the second study, we propose to use TCP and TLS for DNS to solve a range of fundamental problems in DNS security and privacy. We show that DNS over TCP and TLS can achieve favorable performance with selective optimization. In the third study, we build a configurable, general-purpose DNS trace replay system that emulates the global DNS hierarchy in a testbed and enables efficient DNS experiments at scale. We use this system to further demonstrate the reasonable performance of DNS over TCP and TLS at scale in the real world.

In addition to supporting our thesis, our studies make their own research contributions. Specifically, in the first work, we conducted new measurements of OCSP by examining network traffic of OCSP and showed a significant improvement in OCSP latency: a median latency of only 20 ms, much less than the 291 ms observed in prior work. We showed that CDNs serve 94% of OCSP traffic and that OCSP use is ubiquitous. In the second work, we selected necessary protocol and implementation optimizations for DNS over TCP/TLS, and suggested how to run a production TCP/TLS DNS server [RFC7858]. We suggested appropriate connection timeouts for DNS operations: 20 s at authoritative servers and 60 s elsewhere. We showed that the cost of DNS over TCP/TLS can be modest. Our trace analysis showed that connection reuse can be frequent (60%-95% for stub and recursive resolvers). We showed that server memory is manageable (an additional 3.6 GB for a recursive server), and that the latency of connection-oriented DNS is acceptable (9%-22% slower than UDP). In the third work, we showed how to build a DNS experimentation framework that can scale to emulate a large DNS hierarchy and replay large traces. We used this experimentation framework to explore how traffic volume changes (increasing by 31%) when all DNS queries employ DNSSEC. Our DNS experimentation framework can benefit other studies on DNS performance evaluation.
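A concrete way to see the connection-reuse idea from the second study is to amortize one TCP (or TLS) handshake over many DNS queries. The sketch below is ours, not the dissertation’s software; it assumes the third-party dnspython package, and the resolver address is just an example:

    # Sketch: several DNS queries over one TCP connection, the kind of
    # reuse that makes connection-oriented DNS affordable.  Not the
    # dissertation's code; assumes the dnspython package is installed.
    import socket
    import dns.message
    import dns.query

    server = "8.8.8.8"                     # example public resolver
    sock = socket.create_connection((server, 53), timeout=60)
    for name in ("example.com", "example.org"):
        query = dns.message.make_query(name, "A")
        reply = dns.query.tcp(query, where=server, sock=sock)  # reused
        print(name, reply.answer)
    sock.close()

With reuse rates of 60%-95%, most queries skip the handshake entirely, which is why the measured latency penalty stays in the 9%-22% range.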

Categories
Publications Technical Report

new technical report “Detecting IoT Devices in the Internet (Extended)”

We have released a new technical report “Detecting IoT Devices in the Internet (Extended)” as ISI-TR-726.

ISP-Level Deployment for 26 IoT Device Types. From Figure 2 of [Guo18c].
From the abstract of our technical report:

Distributed Denial-of-Service (DDoS) attacks launched from compromised Internet-of-Things (IoT) devices have shown how vulnerable the Internet is to large-scale DDoS attacks. To understand the risks of these attacks requires learning about these IoT devices: where are they? how many are there? how are they changing? This paper describes three new methods to find IoT devices on the Internet: server IP addresses in traffic, server names in DNS queries, and manufacturer information in TLS certificates. Our primary methods (IP addresses and DNS names) use knowledge of servers run by the manufacturers of these devices. We have developed these approaches with 10 device models from 7 vendors. Our third method uses TLS certificates obtained by active scanning. We have applied our algorithms to a number of observations. Our IP-based algorithms see at least 35 IoT devices on a college campus, and 122 IoT devices in customers of a regional IXP. We apply our DNS-based algorithm to traffic from 5 root DNS servers from 2013 to 2018, finding huge growth (about 7×) in ISP-level deployment of 26 device types. DNS also shows similar growth in IoT deployment in residential households from 2013 to 2017. Our certificate-based algorithm finds 254k IP cameras and network video recorders from 199 countries around the world.
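Our DNS-based method boils down to matching the names devices query against servers run by their manufacturers. Here is a minimal sketch of that matching step; the vendor names and queries are hypothetical (the paper’s actual server lists come from studying the 10 device models):

    # Sketch of the DNS-name method: flag queried names that fall
    # under manufacturer-run server suffixes.  The vendor suffixes and
    # queries below are hypothetical, not the paper's lists.
    DEVICE_SERVERS = {
        "vendor-a-camera": (".devices.vendor-a.example",),
        "vendor-b-plug":   (".cloud.vendor-b.example",),
    }

    def classify(qname: str):
        """Return the device type whose server suffix matches, if any."""
        for device, suffixes in DEVICE_SERVERS.items():
            if qname.endswith(suffixes):
                return device
        return None

    print(classify("fw1.devices.vendor-a.example"))  # vendor-a-camera
    print(classify("www.example.com"))               # None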

We have made the operational traffic we captured from the 10 IoT devices we own public at https://ant.isi.edu/datasets/iot/. We also use operational traffic of 21 IoT devices shared by the University of New South Wales at http://149.171.189.1/.

This technical report is joint work of Hang Guo and John Heidemann from USC/ISI.

Categories
Software releases

release of the cryptopANT library for IP address anonymization

cryptopANT v1.0 (stable) has been released (available at https://ant.isi.edu/software/cryptopANT/).

cryptopANT is a C library for IP address anonymization using the Crypto-PAn algorithm, originally developed at Georgia Tech. The library supports anonymization and de-anonymization (provided you possess the secret key) of IPv4, IPv6, and MAC addresses. The software release includes sample utilities that anonymize IP addresses in text, but we expect most use of the library will be as part of other programs. The Crypto-PAn anonymization scheme was developed by Xu, Fan, Ammar, and Moon at Georgia Tech and described in “Prefix-Preserving IP Address Anonymization”, Computer Networks, Volume 46, Issue 2, 7 October 2004, Pages 253-272, Elsevier. Our library is independent of (and not binary-compatible with) theirs.
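Prefix-preserving means that two addresses sharing their first k bits anonymize to addresses that also share their first k bits, so subnet structure survives anonymization. The toy Python sketch below only illustrates that property: each output bit is the input bit XORed with a keyed function of the preceding prefix. cryptopANT itself uses a block cipher, and this HMAC stand-in is not compatible with it:

    # Toy prefix-preserving anonymizer in the style of Crypto-PAn.
    # Illustration only; NOT cryptopANT and not compatible with it.
    import hmac, hashlib, ipaddress

    KEY = b"example-secret"   # placeholder key for the illustration

    def anonymize_v4(addr: str) -> str:
        bits = format(int(ipaddress.IPv4Address(addr)), "032b")
        out = []
        for i, bit in enumerate(bits):
            # keyed one-bit function of the i-bit prefix seen so far
            mac = hmac.new(KEY, bits[:i].encode(), hashlib.sha256)
            out.append(str(int(bit) ^ (mac.digest()[0] & 1)))
        return str(ipaddress.IPv4Address(int("".join(out), 2)))

    # Two addresses in the same /24 stay in the same (anonymized) /24:
    print(anonymize_v4("192.0.2.1"))
    print(anonymize_v4("192.0.2.99"))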

Although this is its first release as a standalone library, the code has been in use for more than 10 years in other tools; it has been part of our other software packages, such as dag_scrubber, for years. By popular request, we are finally releasing it as a separate package.

The library is packaged with an example binary (scramble_ips) that can be used to anonymize IPs in text files.

See also the Crypto-PAn page at Georgia Tech.

Categories
Papers Publications

new conference paper “Detecting ICMP Rate Limiting in the Internet” in PAM 2018

We have published a new conference paper, “Detecting ICMP Rate Limiting in the Internet”, at PAM 2018 (the Passive and Active Measurement Conference) in Berlin, Germany.

Figure 4 from [Guo18a]: confirming a block is rate limited with additional probing, comparing experimental results with models of rate-limited and non-rate-limited behavior.
From the abstract of our conference paper:

ICMP active probing is at the center of many network measurements. Rate limiting of ICMP traffic, if undetected, could distort measurements and create false conclusions. To settle this concern, we look systematically for ICMP rate limiting in the Internet. We create FADER, a new algorithm that can identify rate limiting from user-side traces with minimal new measurement traffic. We validate the accuracy of FADER with many different network configurations in testbed experiments and show that it almost always detects rate limiting. With this confidence, we apply our algorithm to a random sample of the whole Internet, showing that rate limiting exists but that for slow probing rates, rate limiting is very rare. For our random sample of 40,493 /24 blocks (about 2% of the responsive space), we confirm 6 blocks (0.02%!) see rate limiting at 0.39 packets/s per block. We look at higher rates in public datasets and suggest that the fall-off in responses as rates approach 1 packet/s per /24 block is consistent with rate limiting. We also show that even very slow probing (0.0001 packets/s) can encounter rate limiting of NACKs that are concentrated at a single router near the prober.
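The detection intuition behind the figure: an unlimited block answers probes in proportion to the probing rate, while a rate-limited block’s answers flatten at its limit. Below is a simplified model of that behavior, ours and for illustration only, not the FADER algorithm itself:

    # Simplified model behind rate-limit detection: responses track
    # the probe rate until they hit the block's limit.  Illustration
    # only; not the FADER algorithm itself.
    def model_response_rate(probe_rate: float, limit: float) -> float:
        """Expected responses/s from a block rate limited at `limit`."""
        return min(probe_rate, limit)

    for rate in (0.1, 0.39, 1.0, 10.0):
        print(rate, model_response_rate(rate, limit=0.39))
    # responses stop growing at 0.39 packets/s, the limit we confirmed
    # for the six rate-limited blocks in our random sample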

Datasets we used in this paper are all public. ISI Internet Census and Survey data (including the it71w, it70w, it56j, it57j, and it58j censuses and surveys) are available at https://ant.isi.edu/datasets/index.html. The ZMap 50-second experiment data are from their WOOT 14 paper and can be obtained from the ZMap authors upon request.

This conference paper is joint work of Hang Guo and John Heidemann from USC/ISI.

Categories
Publications Technical Report

new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”

We released a new technical report “Back Out: End-to-end Inference of Common Points-of-Failure in the Internet (extended)”, ISI-TR-724, available at https://www.isi.edu/~johnh/PAPERS/Heidemann18b.pdf.

From the abstract:

Clustering (from our event clustering algorithm) of 2014q3 outages from 172/8, showing 7 weeks including the 2014-08-27 Time Warner outage.

Internet reliability has many potential weaknesses: fiber rights-of-way at the physical layer, exchange-point congestion from DDoS at the network layer, settlement disputes between organizations at the financial layer, and government intervention at the political layer. This paper shows that we can discover common points-of-failure at any of these layers by observing correlated failures. We use end-to-end observations from data-plane-level connectivity of edge hosts in the Internet. We identify correlations in connectivity: networks that usually fail and recover at the same time suggest a common point-of-failure. We define two new algorithms to meet these goals. First, we define a computationally efficient algorithm to create a linear ordering of blocks that makes correlated failures apparent to a human analyst. Second, we develop an event-based clustering algorithm that directly groups networks with correlated failures, suggesting common points-of-failure. Our algorithms scale to real-world datasets of millions of networks and observations: linear ordering is O(n log n) time and event-based clustering parallelizes with Map/Reduce. We demonstrate them on three months of outages for 4 million /24 network prefixes, showing high recall (0.83 to 0.98) and precision (0.72 to 1.0) for blocks that respond. We also show that our algorithms generalize to identify correlations in anycast catchments and routing.
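The underlying signal is simple: blocks that go down and come back up at the same moments probably share infrastructure. Here is a hedged sketch of that pairwise test on up/down timelines; the paper’s two algorithms (linear ordering and event clustering) scale this idea to millions of blocks:

    # Sketch of the core signal only: agreement between two blocks'
    # up/down timelines (1=up, 0=down) hints at a shared
    # point-of-failure.  The paper's algorithms scale this idea up.
    def agreement(a, b):
        """Fraction of observation intervals where two blocks agree."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    block_a = [1, 1, 0, 0, 1, 1, 1, 0]   # hypothetical /24 timelines
    block_b = [1, 1, 0, 0, 1, 1, 1, 0]   # fails along with block_a
    block_c = [1, 0, 1, 1, 1, 0, 1, 1]
    print(agreement(block_a, block_b))   # 1.0: correlated failures
    print(agreement(block_a, block_c))   # low: likely unrelated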

Datasets from this paper are available at no cost and are listed at https://ant.isi.edu/datasets/outage/, and we expect to release the software for this paper in the coming months (contact us if you are interested).

Categories
Presentations

new talk “Collecting and Visualizing Outages Over the Long Haul” at the AIMS Workshop 2017

John Heidemann gave the talk “Collecting and Visualizing Outages Over the Long Haul” at CAIDA’s Active Internet Measurement (AIMS) Workshop in San Diego, California, USA on March 2, 2017.  Slides are available at http://www.isi.edu/~johnh/PAPERS/Heidemann17b.pdf.
From the abstract:

Unmeasurable blocks over time, a challenge in long-haul outage measurement, from [Alwabel15a]
We have been collecting data about outages in the Internet since Oct. 2014. Our outage detection system, Trinocular, uses active probing from four sites to study about 4 million /24 IPv4 address blocks. Long-duration measurements bring challenges that don’t occur in short observations. Most importantly, our target (“the Internet”) changes as we measure it, as new blocks come on-line, old blocks are reused in different ways, and ISPs observe and sometimes block our traffic. Our measurement platform also sees occasional hardware failures. Visualization can assist detection of these problems, allowing human perception to detect changes in data collection that have not previously been anticipated. This talk will discuss the challenges of long-term outage measurement and describe our new algorithm that scales to support clustering of 4M blocks and 3 months of observations for visualization.
Our visualization is joint work with Yuri Pradkin, and analysis of our long-term outages includes work with Abdulla Alwabel.

This talk draws on work from [Alwabel15a].  Data from this talk is available at https://ant.isi.edu/datasets/outage/, and visualizations can be found at https://ant.isi.edu/outage/browse/.

Categories
Software releases

new software dnsanon_rssac

On 2016-06-13 we released version 1.3 of dnsanon_rssac, a tool that processes DNS data seen in packet captures (typically in pcap format) to generate RSSAC-002 statistics reports.

Our tool is at https://ant.isi.edu/software/dnsanon_rssac/index.html, with a description at https://ant.isi.edu/software/dnsanon_rssac/README.html. It builds on dnsanon.

The main goal of our implementation is that partial processing can be done independently and the results merged later. Merging works both for files captured at different times of day and for files captured at different anycast sites.
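Merging is possible because RSSAC-002 metrics are largely counters, and counters add. A minimal sketch of the merge step, with illustrative metric names rather than the exact RSSAC-002 schema:

    # Sketch: partial RSSAC-style reports merge by per-metric
    # addition, whether the parts cover different hours or different
    # anycast sites.  Metric names are illustrative, not the schema.
    from collections import Counter

    site_a = Counter({"udp-queries": 1200, "tcp-queries": 40})
    site_b = Counter({"udp-queries":  900, "tcp-queries": 25})

    merged = site_a + site_b   # element-wise sums across the parts
    print(dict(merged))        # {'udp-queries': 2100, 'tcp-queries': 65}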

Our software stack has run at B-Root since February 2016 and has been in production use since May 2016.

To our knowledge, this tool is the first to implement the RSSAC-002v3 specification.


Categories
Papers Publications

new workshop paper “BotDigger: Detecting DGA Bots in a Single Network” in TMA 2016

The paper “BotDigger: Detecting DGA Bots in a Single Network” appeared at the TMA Workshop on April 8, 2016 in Louvain-la-Neuve, Belgium (available at http://www.cs.colostate.edu/~hanzhang/papers/BotDigger-TMA16.pdf).

The code of BotDigger is available on GitHub at: https://github.com/hanzhang0116/BotDigger

From the abstract:

To improve the resiliency of communication between bots and C&C servers, bot masters began utilizing Domain Generation Algorithms (DGA) in recent years. Many systems have been introduced to detect DGA-based botnets. However, they suffer from several limitations, such as requiring DNS traffic collected across many networks, the presence of multiple bots from the same botnet, and so forth. These limitations make it very hard to detect individual bots when using traffic collected from a single network. In this paper, we introduce BotDigger, a system that detects DGA-based bots using DNS traffic without a priori knowledge of the domain generation algorithm. BotDigger utilizes a chain of evidence, including quantity, temporal, and linguistic evidence, to detect an individual bot by only monitoring traffic at the DNS servers of a single network. We evaluate BotDigger’s performance using traces from two DGA-based botnets: Kraken and Conficker. Our results show that BotDigger detects all the Kraken bots and 99.8% of Conficker bots. A one-week DNS trace captured from our university and three traces collected from our research lab are used to evaluate false positives. The results show that the false positive rates are 0.05% and 0.39% for these two groups of background traces, respectively.
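Of the three evidence classes, the linguistic one is easiest to illustrate: algorithmically generated labels tend to look random. Below is a toy single-feature example (character entropy); BotDigger’s actual linguistic evidence is richer than this one signal:

    # Toy linguistic signal: DGA-generated labels tend to have higher
    # character entropy than human-chosen ones.  One illustrative
    # feature only, not BotDigger's actual feature set.
    import math
    from collections import Counter

    def label_entropy(label: str) -> float:
        """Shannon entropy in bits per character of a domain label."""
        counts = Counter(label)
        n = len(label)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    print(label_entropy("colostate"))     # lower: human-chosen
    print(label_entropy("xkqzt3vbn9aw"))  # higher: DGA-like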

The work in this paper is by Han Zhang, Manaf Gharaibeh, Spiros Thanasoulas, and Christos Papadopoulos (Colorado State University).

Categories
Publications Technical Report

new technical report “BotDigger: Detecting DGA Bots in a Single Network”

We have released a new technical report “BotDigger: Detecting DGA Bots in a Single Network”, CS-16-101, available at http://www.cs.colostate.edu/~hanzhang/papers/BotDigger-techReport.pdf.

The code of BotDigger is available on GitHub at: https://github.com/hanzhang0116/BotDigger

From the abstract:

To improve the resiliency of communication between bots and C&C servers, bot masters began utilizing Domain Generation Algorithms (DGA) in recent years. Many systems have been introduced to detect DGA-based botnets. However, they suffer from several limitations, such as requiring DNS traffic collected across many networks, the presence of multiple bots from the same botnet, and so forth. These limitations make it very hard to detect individual bots when using traffic collected from a single network. In this paper, we introduce BotDigger, a system that detects DGA-based bots using DNS traffic without a priori knowledge of the domain generation algorithm. BotDigger utilizes a chain of evidence, including quantity, temporal, and linguistic evidence, to detect an individual bot by only monitoring traffic at the DNS servers of a single network. We evaluate BotDigger’s performance using traces from two DGA-based botnets: Kraken and Conficker. Our results show that BotDigger detects all the Kraken bots and 99.8% of Conficker bots. A one-week DNS trace captured from our university and three traces collected from our research lab are used to evaluate false positives. The results show that the false positive rates are 0.05% and 0.39% for these two groups of background traces, respectively.

This work is by Han Zhang, Manaf Gharaibeh, Spiros Thanasoulas, and Christos Papadopoulos (Colorado State University).