In October 2012, Hurricane Sandy made landfall on the U.S. East Coast, causing widespread power outages. We were able to see the effects of Hurricane Sandy by analyzing active probing of the Internet. We first reported this work in a technical report and then, with more refined analysis, in a peer-reviewed paper.
The four days before landfall and seven after show some interesting results: on the day of landfall we see about three times the number of outages relative to “typical” U.S. networks, and it takes about four days to recover back to typical conditions.
This analysis uses dataset usc-lander / internet_address_survey_reprobing_it50j, available for research use from PREDICT, or by request from us if PREDICT access is not possible.
This animation was first shown at the Dec. 2014 DHS Cyber Security Division R&D Showcase and Technical Workshop as part of the talk “Towards Understanding Internet Reliability” given by John Heidemann. This work was supported by DHS, most recently through the LACREND project.
The paper “BotTalker: Generating Encrypted, Customizable C&C Traces” will appear at the 14th annual IEEE Symposium on Technologies for Homeland Security (HST ’15) in April 2015 (available at http://www.cs.colostate.edu/~zhang/papers/BotTalker.pdf)
From the abstract:
Encrypted botnets have seen increasing use in recent years. To enable research in detecting encrypted botnets, researchers need samples of encrypted botnet traces with ground truth, which are very hard to get. Traces that are available are not customizable, which prevents testing under various controlled scenarios. To address this problem we introduce BotTalker, a tool that can be used to generate customized encrypted botnet communication traffic. BotTalker emulates the actions a bot would take to encrypt communication. It includes a highly configurable encrypted-traffic converter along with real, non-encrypted bot traces and background traffic. The converter is able to convert non-encrypted botnet traces into encrypted ones by providing customization along three dimensions: (a) selection of a real encryption algorithm, (b) flow- or packet-level conversion and SSL emulation, and (c) IP address substitution. To the best of our knowledge, BotTalker is the first work that provides users with customized encrypted botnet traffic. In the paper we also apply BotTalker to evaluate the damage resulting from encrypted botnet traffic on a widely used botnet detection system – BotHunter – and two IDSs – Snort and Suricata. The results show that encrypted botnet traffic foils bot detection in these systems.
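BotTalker itself is available from the authors. As a very rough illustration of just the flow-level conversion idea, here is a hypothetical Python sketch (not the actual BotTalker implementation; file names and key handling are placeholders) that encrypts the TCP payloads of an existing trace while leaving headers intact:

```python
# Hypothetical sketch: encrypt TCP payloads of a bot trace, keeping headers.
# Assumes scapy and pycryptodome are installed; "bot_trace.pcap" is a stand-in name.
from scapy.all import rdpcap, wrpcap, IP, TCP, Raw
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(16)                 # one AES-128 key for the whole trace

def encrypt_payloads(in_pcap, out_pcap):
    packets = rdpcap(in_pcap)
    for pkt in packets:
        if pkt.haslayer(TCP) and pkt.haslayer(Raw):
            cipher = AES.new(key, AES.MODE_CTR, nonce=get_random_bytes(8))
            pkt[Raw].load = cipher.encrypt(bytes(pkt[Raw].load))
            # let scapy recompute lengths and checksums for the modified packet
            if pkt.haslayer(IP):
                del pkt[IP].len
                del pkt[IP].chksum
            del pkt[TCP].chksum
    wrpcap(out_pcap, packets)

encrypt_payloads("bot_trace.pcap", "bot_trace_encrypted.pcap")
```

This only illustrates one of the three dimensions above; SSL emulation and IP address substitution are what BotTalker adds on top of simple payload encryption.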
This work is advised by Christos Papadopoulos and supported by LACREND.
We’ve been taking Internet IPv4 censuses regularly since 2006. In each census, we probe the entire allocated IPv4 address space. You may browse 8 years of data at our IPv4 address browser.
These eight years show some interesting events, from an early “open” Internet in 2006, to full allocation of IPv4 by ICANN in 2011, to higher utilization in 2014.
This animation was first shown at the Dec. 2014 DHS Cyber Security Division R&D Showcase and Technical Workshop as part of the talk “Towards Understanding Internet Reliability” given by John Heidemann. This work was supported by DHS, most recently through the LACREND project.
The paper “Measuring DANE TLSA Deployment” will appear at the Traffic Monitoring and Analysis Workshop in April 2015 in Barcelona, Spain (available at http://www.isi.edu/~liangzhu/papers/dane_tlsa.pdf).
From the abstract:
The DANE (DNS-based Authentication of Named Entities) framework uses DNSSEC to provide a source of trust, and with TLSA it can serve as a root of trust for TLS certificates. This serves to complement traditional certificate authentication methods, which is important given the risks inherent in trusting hundreds of organizations—risks already demonstrated with multiple compromises. The TLSA protocol was published in 2012, and this paper presents the first systematic study of its deployment. We studied TLSA usage, developing a tool that actively probes all signed zones in .com and .net for TLSA records. We find that TLSA use is early: in our latest measurement, of the 485k signed zones, we find only 997 TLSA names. We characterize how it is being used so far, and find that around 7–13% of TLSA records are invalid. We find that 33% of TLSA responses are larger than 1500 bytes and will very likely be fragmented.
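For readers who want to inspect a TLSA record by hand, here is a minimal sketch using the dnspython library (the name below is a placeholder, and this is not the probing tool used in the paper):

```python
# Minimal sketch: look up and print the TLSA record for an HTTPS service.
# Uses dnspython; the name below is a placeholder, not from our measurements.
import dns.resolver

name = "_443._tcp.www.example.com"   # TLSA names embed port and protocol
try:
    answer = dns.resolver.resolve(name, "TLSA")
    for rdata in answer:
        print(name, "usage=%d selector=%d mtype=%d" %
              (rdata.usage, rdata.selector, rdata.mtype))
        print("  association data:", rdata.cert.hex())
except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
    print("no TLSA record for", name)
```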
The work in the paper is by Liang Zhu (USC/ISI), Duane Wessels and Allison Mankin (both of Verisign Labs), and John Heidemann (USC/ISI).
Digit-1.1 has been released (available at https://ant.isi.edu/software/tdns/index.html). Digit is a DNS client-side tool that can perform DNS queries over different protocols, such as UDP, TCP, and TLS. It is primarily designed to evaluate the client-side latency of using DNS over TCP/TLS, as described in the technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security” (http://www.isi.edu/~johnh/PAPERS/Zhu14b/index.html).
A README in the package has detailed instructions about how to use this software.
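As a rough illustration of the kind of client-side comparison Digit enables, here is a hypothetical sketch using the dnspython library (not Digit itself; the resolver address and query name are placeholders, and TLS, connection reuse, and pipelining are omitted):

```python
# Rough sketch (not Digit itself): compare DNS query latency over UDP and TCP.
# Uses dnspython; resolver address and query name are placeholders.
import time
import dns.message
import dns.query

resolver = "8.8.8.8"                       # placeholder recursive resolver
query = dns.message.make_query("example.com", "A")

start = time.time()
dns.query.udp(query, resolver, timeout=3)
print("UDP latency: %.1f ms" % ((time.time() - start) * 1000))

start = time.time()
dns.query.tcp(query, resolver, timeout=3)  # includes a TCP handshake each time
print("TCP latency: %.1f ms" % ((time.time() - start) * 1000))
```

Digit goes well beyond this, measuring the effect of connection reuse and TLS session setup; see its README for the supported options.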
Our research studies the Internet’s public face. Since 2006 we have been taking censuses of the Internet address space (pinging all IPv4 addresses) every 3 months. Since 2012 we have studied network outages and events like Hurricane Sandy, using probes of much of the Internet every 11 minutes. Most recently we have evaluated the diurnal Internet, finding countries where most people turn off their computers at night. Finally, we have looked at network reputation, identifying how spam generation correlates with network location, and others have studied multiple measurements of “network reputation”.
A common theme across this work is that one must estimate characteristics of the edge of the Internet in spite of noisy measurements and underlying changes. One also needs to compare and correlate these imperfect measurements with other factors (from GDP to telecommunications policies).
How do these applications relate to the mathematics of census taking and measurement, estimation, and correlation? Are there tools we should be using that we aren’t? Do the properties of the Internet suggest new approaches (for example, where rapid full enumeration is possible)? Do correlation and estimates of network “badness” help us improve cybersecurity by treating different parts of the network differently?
We have recently put together a video showing 35 days of Internet address usage as observed from Trinocular, our outage detection system.
The Internet sleeps: address use in South America is low (blue) in the early morning, while address use in India is high (red) in the afternoon. When we look at address usage over time, we see that some parts of the globe have daily swings of +/-10% to 20% in the number of active addresses. In China, India, eastern Europe and much of South America, the Internet sleeps.
Understanding when the Internet sleeps is important for understanding how different countries’ network policies affect use; it is part of outage detection; and it contributes to our long-term goal of understanding exactly how big the Internet is.
This work is partly supported by DHS S&T, Cyber Security division, agreement FA8750-12-2-0344 (under AFRL) and N66001-13-C-3001 (under SPAWAR). The views contained herein are those of the authors and do not necessarily represent those of DHS or the U.S. Government. This work was classified by USC’s IRB as non-human subjects research (IIR00001648).
The differences in the census are small, as one would hope, since it’s a global Internet. However, when we look at latency (the time it takes for an IP address to reply to our requests), Greece gives us a European view.
Compare the lower-left corner of the Internet, since that is European IPv4 address space:
Big thanks to George Xylomenos and George Polyzos of AUEB (σας ευχαριστώ – thank you!), and to AUEB for institutional funding for this work. We also thank Christos Papadopoulos (Colorado State) for helping with many details, and Colin Perkins (U. Glasgow) for discussions about potential European hosts.
Data from our Greece census is available to researchers at no cost on the same terms as our existing census data. See our datasets page for details. Greek data starts with it61 as of 2014-08-29.
The DANE (DNS-based Authentication of Named Entities) framework uses DNSSEC to provide a source of trust, and with TLSA it can serve as a root of trust for TLS certificates. This alternative to traditional certificate authorities is important given the risks inherent in trusting hundreds of organizations—risks already demonstrated with multiple compromises. The TLSA protocol was published in 2012, and this talk presents the first systematic study of its deployment. We studied TLSA usage, developing a tool that actively probes all signed zones in .com and .net for TLSA records. We find that TLSA use is early: in our latest measurement, of the 461k signed zones, we find only 701 TLSA names. We characterize how it is being used so far, and find that around 7–12% of TLSA records are invalid. We find that 31% of TLSA responses are larger than 1500 bytes and are IP-fragmented.
The work in the talk is by Liang Zhu (USC/ISI), Duane Wessels and Allison Mankin (both of Verisign), and John Heidemann (USC/ISI).
The paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” will appear at the ACM Internet Measurement Conference 2014 in Vancouver, Canada (available at http://www.isi.edu/~johnh/PAPERS/Quan14c/).
From the abstract:
As the Internet matures, policy questions loom larger in its operation. When should an ISP, city, or government invest in infrastructure? How do their policies affect use? In this work, we develop a new approach to evaluate how policies, economic conditions, and technology correlate with Internet use around the world. First, we develop an adaptive and accurate approach to estimate block availability, the fraction of active IP addresses in each /24 block over short timescales (every 11 minutes). Our estimator provides a new lens to interpret data taken from existing long-term outage measurements, thus requiring no additional traffic. (If new collection were required, it would be lightweight, since on average, outage detection requires less than 20 probes per hour per /24 block; less than 1% of background radiation.) Second, we show that spectral analysis of this measure can identify diurnal usage: blocks where addresses are regularly used during part of the day and idle at other times. Finally, we analyze data for the entire responsive Internet (3.7M /24 blocks) over 35 days. These global observations show when and where the Internet sleeps—networks are mostly always-on in the US and Western Europe, and diurnal in much of Asia, South America, and Eastern Europe. ANOVA (Analysis of Variance) testing shows that diurnal networks correlate negatively with country GDP and electrical consumption, quantifying that national policies and economics relate to networks.
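As a rough illustration of the spectral step, the sketch below runs an FFT over a synthetic block-availability time series sampled every 11 minutes and checks for a strong component at a one-day period (synthetic data and a simple threshold; not our actual Trinocular analysis code, which is available on request):

```python
# Rough sketch: detect a diurnal (24-hour) component in block availability.
# The availability series here is synthetic; real data comes from Trinocular.
import numpy as np

interval_min = 11                                    # one sample every 11 minutes
samples_per_day = int(round(24 * 60 / interval_min))
days = 35
t = np.arange(days * samples_per_day)

# synthetic availability: 60% baseline with a 20% diurnal swing plus noise
avail = 0.6 + 0.2 * np.sin(2 * np.pi * t / samples_per_day) \
            + 0.05 * np.random.randn(t.size)

spectrum = np.abs(np.fft.rfft(avail - avail.mean()))
freqs = np.fft.rfftfreq(t.size, d=interval_min / (24 * 60))   # cycles per day

diurnal_bin = np.argmin(np.abs(freqs - 1.0))         # component nearest 1 cycle/day
if spectrum[diurnal_bin] > 3 * np.median(spectrum):
    print("block looks diurnal")
else:
    print("block looks always-on")
```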
All data in this paper is available to researchers at no cost, and source code to our analysis tools is available on request; see our diurnal datasets webpage.
This work is partly supported by DHS S&T, Cyber Security division, agreement FA8750-12-2-0344 (under AFRL) and N66001-13-C-3001 (under SPAWAR). The views contained herein are those of the authors and do not necessarily represent those of DHS or the U.S. Government. This work was classified by USC’s IRB as non-human subjects research (IIR00001648).