
new workshop paper “Measuring DANE TLSA Deployment” in TMA 2015

The paper “Measuring DANE TLSA Deployment” will appear at the Traffic Monitoring and Analysis Workshop in April 2015 in Barcelona, Spain (available at http://www.isi.edu/~liangzhu/papers/dane_tlsa.pdf).

From the abstract:

The DANE (DNS-based Authentication of Named Entities) framework uses DNSSEC to provide a source of trust, and with TLSA it can serve as a root of trust for TLS certificates. This serves to complement traditional certificate authentication methods, which is important given the risks inherent in trusting hundreds of organizations—risks already demonstrated with multiple compromises. The TLSA protocol was published in 2012, and this paper presents the first systematic study of its deployment. We studied TLSA usage, developing a tool that actively probes all signed zones in .com and .net for TLSA records. We find that TLSA use is early: in our latest measurement, of the 485k signed zones, we find only 997 TLSA names. We characterize how it is being used so far, and find that around 7–13% of TLSA records are invalid. We find that 33% of TLSA responses are larger than 1500 bytes and will very likely be fragmented.
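As a rough illustration of what a TLSA probe involves (this is not the paper's measurement tool), the sketch below looks up the TLSA record for a TLS service and compares it against the certificate the server actually presents, for the common DANE-EE case (usage 3, selector 0, matching type 1, i.e. SHA-256 over the full certificate). It assumes the dnspython package, and the host name is a placeholder.

    # Illustrative sketch (not the paper's probing tool): look up a TLSA record
    # and compare it to the certificate the server presents.
    # Assumes the dnspython package; "example.net" is a placeholder host.
    import binascii
    import hashlib
    import ssl

    import dns.resolver  # pip install dnspython

    host, port = "example.net", 443
    tlsa_name = f"_{port}._tcp.{host}"          # TLSA owner name per RFC 6698

    # Fetch the server's certificate and hash it (selector 0 = full certificate,
    # matching type 1 = SHA-256).
    pem = ssl.get_server_certificate((host, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    cert_sha256 = hashlib.sha256(der).hexdigest()

    for rr in dns.resolver.resolve(tlsa_name, "TLSA"):
        rr_hash = binascii.hexlify(rr.cert).decode()
        match = (rr.usage == 3 and rr.selector == 0 and rr.mtype == 1
                 and rr_hash == cert_sha256)
        print(f"{tlsa_name}: usage={rr.usage} selector={rr.selector} "
              f"mtype={rr.mtype} match={match}")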

The work in the paper is by Liang Zhu (USC/ISI), Duane Wessels and Allison Mankin (both of Verisign Labs), and John Heidemann (USC/ISI).


new conference paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” in IMC 2014

The paper “When the Internet Sleeps: Correlating Diurnal Networks With External Factors” will appear at the ACM Internet Measurement Conference (IMC) 2014 in Vancouver, Canada (citation and PDF available at http://www.isi.edu/~johnh/PAPERS/Quan14c/).

Predicting longitude from observed diurnal phase for 287k geolocatable, diurnal blocks ([Quan14c], figure 14c)
From the abstract:

As the Internet matures, policy questions loom larger in its operation. When should an ISP, city, or government invest in infrastructure? How do their policies affect use? In this work, we develop a new approach to evaluate how policies, economic conditions, and technology correlate with Internet use around the world. First, we develop an adaptive and accurate approach to estimate block availability, the fraction of active IP addresses in each /24 block over short timescales (every 11 minutes). Our estimator provides a new lens to interpret data taken from existing long-term outage measurements, thus requiring no additional traffic. (If new collection were required, it would be lightweight, since on average, outage detection requires less than 20 probes per hour per /24 block; less than 1% of background radiation.) Second, we show that spectral analysis of this measure can identify diurnal usage: blocks where addresses are regularly used during part of the day and idle in other times. Finally, we analyze data for the entire responsive Internet (3.7M /24 blocks) over 35 days. These global observations show when and where the Internet sleeps—networks are mostly always-on in the US and Western Europe, and diurnal in much of Asia, South America, and Eastern Europe. ANOVA (Analysis of Variance) testing shows that diurnal networks correlate negatively with country GDP and electrical consumption, quantifying that national policies and economics relate to networks.
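As a rough, hedged illustration of the spectral-analysis step (not the paper's estimator), the sketch below takes a block-availability time series sampled every 11 minutes and checks whether its dominant frequency corresponds to a roughly 24-hour period; the input series is synthetic and made up for the example.

    # Illustrative sketch: detect a diurnal (about 24-hour) cycle in a block
    # availability time series sampled every 11 minutes, using an FFT.
    # The synthetic data below is made up; the paper's estimator is more involved.
    import numpy as np

    SAMPLE_MINUTES = 11
    samples_per_day = 24 * 60 / SAMPLE_MINUTES           # ~130.9 samples per day

    # Synthetic 35-day availability series: a daily cycle plus noise.
    n = int(35 * samples_per_day)
    t = np.arange(n)
    availability = (0.5 + 0.3 * np.sin(2 * np.pi * t / samples_per_day)
                    + 0.05 * np.random.default_rng(0).standard_normal(n))

    # Spectral analysis: find the strongest non-DC frequency component.
    spectrum = np.abs(np.fft.rfft(availability - availability.mean()))
    freqs = np.fft.rfftfreq(n, d=SAMPLE_MINUTES / 60.0)  # cycles per hour
    peak = freqs[np.argmax(spectrum)]
    period_hours = 1.0 / peak if peak > 0 else float("inf")

    print(f"dominant period ~ {period_hours:.1f} hours")
    print("looks diurnal" if 22 <= period_hours <= 26 else "not clearly diurnal")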

Citation: Lin Quan, John Heidemann, and Yuri Pradkin. When the Internet Sleeps: Correlating Diurnal Networks With External Factors. In Proceedings of the ACM Internet Measurement Conference, to appear. Vancouver, BC, Canada, ACM. November 2014.

All data in this paper is available to researchers at no cost, and source code to our analysis tools is available on request; see our diurnal datasets webpage.

This work is partly supported by DHS S&T, Cyber Security Division, agreements FA8750-12-2-0344 (under AFRL) and N66001-13-C-3001 (under SPAWAR). The views contained herein are those of the authors and do not necessarily represent those of DHS or the U.S. Government. This work was classified by USC’s IRB as non-human subjects research (IIR00001648).


new technical report “Web-scale Content Reuse Detection (extended)”

We released a new technical report “Web-scale Content Reuse Detection (extended)”, ISI-TR-2014-692, available at http://www.isi.edu/publications/trpublic/files/tr-692.pdf.

From the abstract:

Discovering the amount of chunk-level duplication in Geocities (2008/2009, 97M chunks, Fig. 11).

With the vast amount of accessible, online content, it is not surprising that unscrupulous entities “borrow” from the web to provide filler for advertisements, link farms, and spam, and make a quick profit. Our insight is that cryptographic hashing and fingerprinting can efficiently identify content reuse for web-size corpora. We develop two related algorithms, one to automatically discover previously unknown duplicate content in the web, and the second to detect copies of discovered or manually identified content in the web. Our detection can also identify bad neighborhoods, clusters of pages where copied content is frequent. We verify our approach with controlled experiments with two large datasets: a Common Crawl subset of the web, and a copy of Geocities, an older set of user-provided web content. We then demonstrate that we can discover otherwise unknown examples of duplication for spam, and detect both discovered and expert-identified content in these large datasets. Utilizing an original copy of Wikipedia as identified content, we find 40 sites that reuse this content, 86% for commercial benefit.
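To make the hashing idea concrete, here is a hedged toy version of chunk-level duplicate discovery (it is not the paper's algorithm): split each document into chunks, hash every chunk, and count how often each hash recurs across the corpus. The corpus and the fixed chunk size are placeholders.

    # Illustrative toy of chunk-level duplicate discovery via cryptographic
    # hashing (not the paper's algorithm): hash fixed-size chunks of each
    # document and count recurring hashes across the corpus.
    import hashlib
    from collections import defaultdict

    CHUNK_BYTES = 64  # placeholder chunk size; real systems choose this carefully

    def chunks(text, size=CHUNK_BYTES):
        data = text.encode("utf-8")
        for i in range(0, len(data), size):
            yield data[i:i + size]

    # Placeholder corpus standing in for crawled pages.
    corpus = {
        "page_a": "lorem ipsum dolor sit amet " * 10,
        "page_b": "lorem ipsum dolor sit amet " * 10 + "unique trailing text",
        "page_c": "completely different content with no reuse at all",
    }

    seen = defaultdict(set)            # chunk hash -> pages containing that chunk
    for page, text in corpus.items():
        for chunk in chunks(text):
            seen[hashlib.sha256(chunk).hexdigest()].add(page)

    duplicated = {h: pages for h, pages in seen.items() if len(pages) > 1}
    print(f"{len(duplicated)} of {len(seen)} distinct chunks appear on multiple pages")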


new technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security (extended)”

We released a new technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security (extended)”, ISI-TR-2014-693, available as http://www.isi.edu/~johnh/PAPERS/Zhu14b.pdf

From the abstract:

DNS is the canonical protocol for connectionless UDP. Yet DNS today is challenged by eavesdropping that compromises privacy, source-address spoofing that results in denial-of-service (DoS) attacks on the server and third parties, injection attacks that exploit fragmentation, and size limitations that constrain policy and operational choices. We propose T-DNS to address these problems. It uses TCP to smoothly support large payloads and to mitigate spoofing and amplification for DoS. T-DNS uses transport-layer security (TLS) to provide privacy from users to their DNS resolvers and optionally to authoritative servers. Expectations about DNS suggest connections will balloon client latency and overwhelm servers with state, but our evaluation shows costs are modest: end-to-end latency from TLS to the recursive resolver is only about 9% slower when UDP is used to the authoritative server, and 22% slower with TCP to the authoritative. With diverse traces we show that frequent connection reuse is possible (60–95% for stub and recursive resolvers, although half that for authoritative servers), and after connection establishment, we show TCP and TLS latency is equivalent to UDP. With conservative timeouts (20 s at authoritative servers and 60 s elsewhere) and conservative estimates of connection state memory requirements, we show that server memory requirements match current hardware: a large recursive resolver may have 24k active connections requiring about 3.6 GB additional RAM. We identify the key design and implementation decisions needed to minimize overhead: query pipelining, out-of-order responses, TLS connection resumption, and plausible timeouts.
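For readers unfamiliar with DNS over a stream transport, the sketch below sends a single query over TLS using the 2-byte length prefix that DNS uses on TCP (RFC 1035 framing). It is only a minimal illustration, not the T-DNS implementation: it assumes the dnspython package for message encoding, and the resolver address and the later-standardized DNS-over-TLS port 853 are example choices.

    # Minimal sketch of one DNS query over TLS with the 2-byte length prefix
    # that DNS uses on stream transports (RFC 1035 framing). Not the T-DNS code;
    # assumes dnspython for message encoding, and the resolver/port are examples.
    import socket
    import ssl
    import struct

    import dns.message  # pip install dnspython

    RESOLVER = "8.8.8.8"   # example public resolver that accepts DNS over TLS
    PORT = 853             # DNS-over-TLS port, standardized later in RFC 7858

    def recv_exact(sock, n):
        """Read exactly n bytes from a stream socket."""
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("connection closed early")
            buf += chunk
        return buf

    query = dns.message.make_query("www.isi.edu", "A")
    wire = query.to_wire()

    ctx = ssl.create_default_context()
    with socket.create_connection((RESOLVER, PORT), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=RESOLVER) as tls:
            # Stream framing: 2-byte big-endian length, then the DNS message.
            tls.sendall(struct.pack("!H", len(wire)) + wire)
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            reply = dns.message.from_wire(recv_exact(tls, length))

    print(reply)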

This paper is a major revision of the prior technical report ISI-TR-2014-688. Since that work we have improved our understanding of the availability of TCP Fast Open and TLS resumption, and we have tightened our estimates on memory based on external reports (section 5.2). This new information has allowed us to conduct additional experiments, improve our modeling, and provide a more accurate view of what is possible today; as a result, our estimates of latency and memory consumption are both lower than in our prior technical report. We have also added information about packet size limitations (Figure 2), experiments evaluating DNSCrypt/DNSCurve (section 6.1), and an analysis of DTLS, and we have covered a broader range of RTTs in our experiments. We believe these additions strengthen our central claims: that connectionless DNS causes multiple problems and that T-DNS addresses those problems with a modest increase in latency and memory suitable for current hardware.


new technical report “When the Internet Sleeps: Correlating Diurnal Networks With External Factors (extended)”

We released a new technical report “When the Internet Sleeps: Correlating Diurnal Networks With External Factors (extended)”, ISI-TR-2014-691, by Lin Quan, John Heidemann, and Yuri Pradkin, available as http://www.isi.edu/~johnh/PAPERS/Quan14b.pdf

Comparing observed diurnal phase and geolocation longitude for 287k geolocatable, diurnal blocks ([Quan14b], figure 14b)
From the abstract:

As the Internet matures, policy questions loom larger in its operation. When should an ISP, city, or government invest in infrastructure? How do their policies affect use? In this work, we develop a new approach to evaluate how policies, economic conditions, and technology correlate with Internet use around the world. First, we develop an adaptive and accurate approach to estimate block availability, the fraction of active IP addresses in each /24 block over short timescales (every 11 minutes). Our estimator provides a new lens to interpret data taken from existing long-term outage measurements, thus requiring no additional traffic. (If new collection were required, it would be lightweight, since on average, outage detection requires less than 20 probes per hour per /24 block; less than 1% of background radiation.) Second, we show that spectral analysis of this measure can identify diurnal usage: blocks where addresses are regularly used during part of the day and idle in other times. Finally, we analyze data for the entire responsive Internet (3.7M /24 blocks) over 35 days. These global observations show when and where the Internet sleeps—networks are mostly always-on in the US and Western Europe, and diurnal in much of Asia, South America, and Eastern Europe. ANOVA testing shows that diurnal networks correlate negatively with country GDP and electrical consumption, quantifying that national policies and economics relate to networks.
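The figure above compares observed diurnal phase with longitude; the underlying intuition is that solar time shifts by 15 degrees of longitude per hour, so the UTC hour at which a block's activity peaks hints at where it sits east to west. The sketch below illustrates only that arithmetic; the assumed local peak hour and the example phase are made up, and the paper's comparison is considerably more careful.

    # Rough illustration of turning an observed diurnal phase (UTC hour of peak
    # activity) into a longitude estimate, using 15 degrees of longitude per hour.
    # The assumed local peak hour and the example phase are made up; this is not
    # the paper's method, just the underlying arithmetic.

    ASSUMED_LOCAL_PEAK_HOUR = 14.0   # assumption: activity peaks mid-afternoon local time

    def longitude_from_phase(utc_peak_hour):
        """Estimate longitude (degrees east) from the UTC hour of peak activity."""
        offset_hours = ASSUMED_LOCAL_PEAK_HOUR - utc_peak_hour
        lon = offset_hours * 15.0                 # 360 degrees / 24 hours
        return (lon + 180.0) % 360.0 - 180.0      # wrap into [-180, 180)

    # Example: a block whose activity peaks around 06:00 UTC would be placed
    # near 120 degrees east (roughly East Asian longitudes).
    print(longitude_from_phase(6.0))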

Data from this paper is available from http://www.isi.edu/ant/traces/internet_outages/index.html, and from http://www.predict.org as dataset internet_outage_adaptive_a12w-20130424.


new technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security”

We released a new technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security”, ISI-TR-2014-688, available as http://www.isi.edu/~johnh/PAPERS/Zhu14a.pdf

 

From the abstract:

This paper explores connection-oriented DNS to improve DNS security and privacy. DNS is the canonical example of a connectionless, single packet, request/response protocol, with UDP as its dominant transport. Yet DNS today is challenged by eavesdropping that compromises privacy, source-address spoofing that results in denial-of-service (DoS) attacks on the server and third parties, injection attacks that exploit fragmentation, and size limitations that constrain policy and operational choices. We propose T-DNS to address these problems: it uses TCP to smoothly support large payloads and to mitigate spoofing and amplification for DoS. T-DNS uses transport-layer security (TLS) to provide privacy from users to their DNS resolvers and optionally to authoritative servers. Traditional wisdom is that connection setup will balloon latency for clients and overwhelm servers. These are myths—our model of end-to-end latency shows TLS to the recursive resolver is only about 21% slower, with UDP to the authoritative server. End-to-end latency is 90% slower with TLS to recursive and TCP to authoritative. Experiments behind these models show that after connection establishment, TCP and TLS latency is equivalent to UDP. Using diverse trace data we show that frequent connection reuse is possible (60–95% for stub and recursive resolvers, although half that for authoritative servers). With conservative timeouts (20 s at authoritative servers and 60 s elsewhere) we show that server memory requirements match current hardware: a large recursive resolver may have 25k active connections consuming about 9 GB of RAM. We identify the key design and implementation decisions needed to minimize overhead—query pipelining, out-of-order responses, TLS connection resumption, and plausible timeouts.
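As a back-of-envelope check on the memory numbers quoted in the two abstracts (purely illustrative arithmetic, not figures taken from the reports beyond what is quoted above), dividing total additional RAM by the number of active connections gives the implied per-connection state, and shows how much the revised report above tightened the estimate.

    # Back-of-envelope arithmetic from the numbers quoted in the two abstracts:
    # implied per-connection memory = total additional RAM / active connections.
    # Purely illustrative; the reports derive these figures in more detail.

    def per_connection_kib(total_gb, connections):
        return total_gb * 1024**3 / connections / 1024  # KiB per connection

    print(f"ISI-TR-2014-688: ~{per_connection_kib(9.0, 25_000):.0f} KiB per connection")
    print(f"ISI-TR-2014-693: ~{per_connection_kib(3.6, 24_000):.0f} KiB per connection")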

 


new conference paper “The Need for End-to-End Evaluation of Cloud Availability” in PAM 2014

The paper “The Need for End-to-End Evaluation of Cloud Availability” was published at PAM 2014 in Marina del Rey, CA (available at http://www.isi.edu/~zihu/paper/cloud_availability.pdf).

From the abstract:

People’s computing lives are moving into the cloud, making understanding cloud availability increasingly critical. Prior studies of Internet outages have used ICMP-based pings and traceroutes. While these studies can detect network availability, we show that they can be inaccurate at estimating cloud availability. Without care, ICMP probes can underestimate availability because ICMP is not as robust as application-level measurements such as HTTP. They can overestimate availability if they measure reachability of the cloud’s edge, missing failures in the cloud’s back-end. We develop methodologies sensitive to five “nines” of reliability, and then we compare ICMP and end-to-end measurements for both cloud VM and storage services. We show case studies where one fails and the other succeeds, and our results highlight the importance of application-level retries to reach high precision. When possible, we recommend end-to-end measurement with application-level protocols to evaluate the availability of cloud services.
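To illustrate what an end-to-end, application-level probe with retries looks like (a hedged sketch, not the paper's measurement system), the snippet below issues an HTTP request to a cloud-hosted URL and retries transient failures before declaring the service down; the URL, retry count, and timeouts are placeholders.

    # Hedged sketch of an end-to-end availability probe with application-level
    # retries (not the paper's measurement system). The URL, retry count, and
    # timeouts are placeholders.
    import time
    import urllib.error
    import urllib.request

    PROBE_URL = "https://example-bucket.s3.amazonaws.com/health.txt"  # placeholder
    RETRIES = 3
    TIMEOUT_S = 5
    RETRY_WAIT_S = 2

    def probe_once(url):
        """One application-level check: did we get an HTTP 2xx back?"""
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
                return 200 <= resp.status < 300
        except (urllib.error.URLError, OSError):
            return False

    def probe_with_retries(url):
        """Retry transient failures so a single lost packet is not scored as an outage."""
        for _ in range(RETRIES):
            if probe_once(url):
                return True
            time.sleep(RETRY_WAIT_S)
        return False

    print("service up" if probe_with_retries(PROBE_URL) else "service down")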

Citation: Zi Hu, Liang Zhu, Calvin Ardi, Ethan Katz-Bassett, Harsha Madhyastha, John Heidemann, and Minlan Yu. The Need for End-to-End Evaluation of Cloud Availability. In Proceedings of the Passive and Active Measurement Conference (PAM). Los Angeles, CA, USA, March 2014.


new technical report “A Holistic Framework for Bridging Regional Threats to User QoE”

We just released a new technical report “A Holistic Framework for Bridging Regional Threats to User QoE”, ISI-TR-2013-687, available as https://www.isi.edu/~johnh/PAPERS/Cai13c.pdf

Estimated impact on user QoE in four cable cut incidents (Figure 13 from [Cai13c])

From the abstract:

Submarine cable cuts have become increasingly common, with five incidents breaking more than ten cables in the last three years. Today, around 300 cables carry the majority of international Internet traffic, so a single cable cut can affect millions of users, and repairs to any cut are expensive and time-consuming. Prior work has either measured the impact following incidents, or predicted the results of network changes to relatively abstract Internet topological models. In this paper, we develop a new approach to model cable cuts. Our approach differs by following problems drawn from real-world occurrences all the way to their impact on end-users. Because our approach spans many layers, no single organization can provide all the data needed to apply the model. We therefore perform what-if analysis to study a range of possibilities. With this approach we evaluate four incidents in 2012 and 2013; our analysis suggests general rules that assess the degree of a country’s vulnerability to a cut.
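The what-if flavor of the analysis can be illustrated with a toy connectivity model (a sketch over made-up data, not the paper's multi-layer model): represent countries and landing hubs as a graph, remove a cable, and see which countries lose all paths to the rest of the network. Here a single-homed country is cut off while a dual-homed one is not, which is the kind of vulnerability distinction the paper's rules aim to capture.

    # Toy what-if sketch of a cable-cut analysis (made-up topology, not the
    # paper's multi-layer model): remove a "cable" edge and see which nodes
    # can still reach a stand-in for the rest of the Internet.
    from collections import deque

    edges = {
        ("CountryA", "Hub1"),                         # CountryA is single-homed
        ("CountryB", "Hub1"), ("CountryB", "Hub2"),   # CountryB has two cables
        ("Hub1", "Hub2"),
        ("Rest", "Hub2"),                             # the rest of the Internet
    }

    def reachable(edge_set, start):
        """Breadth-first search over an undirected edge set."""
        adj = {}
        for u, v in edge_set:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    before = reachable(edges, "Rest")
    for cut in [("CountryA", "Hub1"), ("CountryB", "Hub2")]:
        after = reachable(edges - {cut}, "Rest")
        lost = sorted(before - after)
        print(f"cutting {cut}: disconnected -> {lost or 'none'}")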

 


new conference paper “Mapping the Expansion of Google’s Serving Infrastructure” in IMC 2013 and WSJ Blog

The paper “Mapping the Expansion of Google’s Serving Infrastructure” (by Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann and Ramesh Govindan) will appear at the 2013 ACM Internet Measurement Conference (IMC) in Barcelona, Spain in Oct. 2013.

This work was also featured today in Digits, the technology news and analysis blog from the Wall Street Journal, and at USC’s press room.

A copy of the paper is available at http://www.isi.edu/~johnh/PAPERS/Calder13a, and data from the work is available at http://mappinggoogle.cs.usc.edu, from http://www.isi.edu/ant/traces/mapping_google/index.html, and from http://www.predict.org.

Growth of Google’s infrastructure, measured in IP addresses ([Calder13a], figure 5a)

From the paper’s abstract:

Modern content-distribution networks both provide bulk content and act as “serving infrastructure” for web services in order to reduce user-perceived latency. Serving infrastructures such as Google’s are now critical to the online economy, making it imperative to understand their size, geographic distribution, and growth strategies. To this end, we develop techniques that enumerate IP addresses of servers in these infrastructures, find their geographic location, and identify the association between clients and clusters of servers. While general techniques for server enumeration and geolocation can exhibit large error, our techniques exploit the design and mechanisms of serving infrastructure to improve accuracy. We use the EDNS-client-subnet DNS extension to measure which clients a service maps to which of its serving sites. We devise a novel technique that uses this mapping to geolocate servers by combining noisy information about client locations with speed-of-light constraints. We demonstrate that this technique substantially improves geolocation accuracy relative to existing approaches. We also cluster server IP addresses into physical sites by measuring RTTs and adapting the cluster thresholds dynamically. Google’s serving infrastructure has grown dramatically over the ten months of our study, and we use our methods to chart its growth and understand its content serving strategy. We find that the number of Google serving sites has increased more than sevenfold, and most of the growth has occurred by placing servers in large and small ISPs across the world, not by expanding Google’s backbone.
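The enumeration technique rests on the EDNS-client-subnet extension, which lets a DNS query carry a client prefix so the answering side responds as it would for that network. Below is a hedged sketch of issuing such a query with the dnspython package; the client prefix is a placeholder, querying a Google authoritative server this way is only an example, and this is not the paper's measurement pipeline.

    # Hedged sketch of an EDNS-client-subnet query (not the paper's pipeline):
    # ask for a name as if the query originated from a chosen client prefix,
    # then print the answers returned for that prefix. Assumes dnspython.
    import socket

    import dns.edns
    import dns.message
    import dns.query  # pip install dnspython

    CLIENT_PREFIX = "203.0.113.0"   # placeholder client network (TEST-NET-3)
    PREFIX_LEN = 24

    # One of Google's authoritative nameservers; resolving it here keeps the
    # example self-contained.
    NAMESERVER = socket.gethostbyname("ns1.google.com")

    ecs = dns.edns.ECSOption(CLIENT_PREFIX, PREFIX_LEN)
    query = dns.message.make_query("www.google.com", "A")
    query.use_edns(edns=0, options=[ecs])

    response = dns.query.udp(query, NAMESERVER, timeout=5)
    for rrset in response.answer:
        print(rrset)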


new conference paper “Replay of Malicious Traffic in Network Testbeds” in IEEE Conf. on Technologies for Homeland Security (HST)

The paper “Replay of Malicious Traffic in Network Testbeds” (by Alefiya Hussain, Yuri Pradkin, and John Heidemann) will appear in the 13th IEEE Conference on Technologies for Homeland Security (HST) in Waltham, Mass. in Nov. 2013. The paper is available at http://www.isi.edu/~johnh/PAPERS/Hussain13a.

From the paper’s abstract:

In this paper we present tools and methods to integrate attack measurements from the Internet with controlled experimentation on a network testbed. We show that this approach provides greater fidelity than synthetic models. We compare the statistical properties of real-world attacks with synthetically generated constant bit rate attacks on the testbed. Our results indicate that trace replay provides fine time-scale details that may be absent in constant bit rate attacks. Additionally, we demonstrate the effectiveness of our approach to study new and emerging attacks. We replay an Internet attack captured by the LANDER system on the DETERLab testbed within two hours.
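As a hedged, synthetic illustration of why trace replay preserves fine-timescale structure that a constant-bit-rate model flattens (this is not the paper's analysis, and the numbers are made up), the sketch below compares the variability of per-interval packet counts for a constant-rate stream and a bursty, trace-like stream.

    # Hedged, synthetic illustration (not the paper's analysis): compare the
    # fine-timescale variability of a constant-bit-rate attack model with a
    # bursty, trace-like packet series using the coefficient of variation.
    import numpy as np

    rng = np.random.default_rng(1)
    bins = 600                          # e.g. 600 bins of 100 ms each

    # Constant-bit-rate model: identical packet count in every bin.
    cbr = np.full(bins, 1000.0)

    # Bursty, trace-like series: occasional large spikes over a lower baseline.
    bursty = rng.poisson(200, bins) * rng.choice([0.5, 1.0, 10.0], size=bins,
                                                 p=[0.6, 0.3, 0.1])

    def coefficient_of_variation(x):
        return x.std() / x.mean()

    print(f"CBR    CoV: {coefficient_of_variation(cbr):.3f}")     # 0: no fine-scale detail
    print(f"bursty CoV: {coefficient_of_variation(bursty):.3f}")  # > 0: bursts preserved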

Data from the paper is available as DoS_DNS_amplification-20130617 from the authors or from http://www.predict.org, and the tools are available at DETERLab.