Categories
Papers Publications

new conference paper “Low-Rate, Flow-Level Periodicity Detection” at Global Internet 2011

Visualization of low-rate periodicity, before and after installation of a keylogger. [Bartlett11a, figure 3]
The paper “Low-Rate, Flow-Level Periodicity Detection”, by Genevieve Bartlett, John Heidemann, and Christos Papadopoulos, is being presented at IEEE Global Internet 2011 in Shanghai, China this week. The full text is available at http://www.isi.edu/~johnh/PAPERS/Bartlett11a.pdf.

The abstract summarizes the work:

As desktops and servers become more complicated, they employ an increasing amount of automatic, non-user initiated communication. Such communication can be good (OS updates, RSS feed readers, and mail polling), bad (keyloggers, spyware, and botnet command-and-control), or ugly (adware or unauthorized peer-to-peer applications). Communication in these applications is often regular, but with very long periods, ranging from minutes to hours. This infrequent communication and the complexity of today’s systems make these applications difficult for users to detect and diagnose. In this paper we present a new approach to identify low-rate periodic network traffic and changes in such regular communication. We employ signal-processing techniques, using discrete wavelets implemented as a fully decomposed, iterated filter bank. This approach not only detects low-rate periodicities, but also identifies approximate times when traffic changed. We implement a self-surveillance application that externally identifies changes to a user’s machine, such as interruption of periodic software updates, or an installation of a keylogger.
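
To give a concrete flavor of the technique (this is an illustrative sketch, not the paper’s implementation), the Python snippet below runs a fully decomposed, iterated Haar filter bank over a binned count of flow starts and compares per-band energies before and after a periodic source appears. The one-minute bins, the decomposition depth, and the toy 64-minute periodic traffic are assumptions chosen for the example.

```python
# Illustrative sketch only: a fully decomposed, iterated Haar filter bank
# (a wavelet packet transform) applied to per-minute counts of flow starts.
# Bin width, depth, and the toy traffic below are assumptions, not the
# parameters used in the paper.
import numpy as np

def haar_packet_energies(signal, levels):
    """Fully decompose `signal` with an iterated Haar filter bank and
    return the energy in each leaf sub-band."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            band = band[: len(band) - len(band) % 2]         # force even length
            pairs = band.reshape(-1, 2)
            low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # approximation branch
            high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # detail branch
            next_bands.extend([low, high])                   # decompose both branches
        bands = next_bands
    return np.array([np.sum(b ** 2) for b in bands])

# Toy example: background noise, then the same trace with a hypothetical
# process that adds traffic every 64 minutes.
rng = np.random.default_rng(0)
minutes = 2 ** 13                                    # about 5.7 days of 1-minute bins
quiet = rng.poisson(0.05, size=minutes).astype(float)
noisy = quiet.copy()
noisy[::64] += 5.0                                   # hypothetical low-rate periodic traffic

e_before = haar_packet_energies(quiet - quiet.mean(), levels=6)
e_after = haar_packet_energies(noisy - noisy.mean(), levels=6)
print("strongest band energy: %.1f before, %.1f after" % (e_before.max(), e_after.max()))
```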

The datasets used in this paper are available on request, and through PREDICT.

An expanded version of the paper is available as a technical report “Using low-rate flow periodicities in anomaly detection” by Bartlett, Heidemann and Papadopoulos. Technical Report ISI-TR-661, USC/Information Sciences Institute, Jul 2009. http://www.isi.edu/~johnh/PAPERS/Bartlett09a.pdf

Categories
Papers Publications

New conference paper “Selecting Representative IP Addresses for Internet Topology Studies” to appear at IMC

The paper “Selecting Representative IP Addresses for Internet Topology Studies” (available at http://www.isi.edu/~xunfan/research/Fan10a.pdf) was accepted to appear at the ACM Internet Measurement Conference 2010 in Melbourne, Australia.

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most random addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed by censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.
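
As a rough illustration of “informed selection” (not the paper’s actual algorithm), the sketch below picks one representative per /24 block from census response histories, scoring each address so that recent responses count more. The decay weight, the data layout, and the toy histories are assumptions made for the example.

```python
# Illustrative sketch, not the paper's algorithm: choose one representative
# address per /24 block from census response histories, preferring addresses
# that responded recently and often.  The scoring weights and toy data are
# assumptions.
import ipaddress
from collections import defaultdict

def score(history, decay=0.9):
    """Weight recent census responses more heavily (most recent response last)."""
    s, w = 0.0, 1.0
    for responded in reversed(history):
        s += w * (1.0 if responded else 0.0)
        w *= decay
    return s

def pick_representatives(histories):
    """histories: {ip: [0/1 per census]} -> {'/24 prefix': chosen ip}."""
    blocks = defaultdict(list)
    for ip, hist in histories.items():
        prefix = ipaddress.ip_network(ip + "/24", strict=False)
        blocks[prefix].append((score(hist), ip))
    best = {}
    for prefix, candidates in blocks.items():
        s, ip = max(candidates)
        if s > 0:                        # no stable representative if nothing responds
            best[str(prefix)] = ip
    return best

histories = {
    "192.0.2.1":    [1, 1, 0, 1],
    "192.0.2.77":   [0, 1, 1, 1],   # responded in the most recent censuses
    "198.51.100.5": [0, 0, 0, 0],   # never responds, e.g. behind a firewall
}
print(pick_representatives(histories))   # {'192.0.2.0/24': '192.0.2.77'}
```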

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. To appear in Proceedings of the ACM Internet Measurement Conference (IMC). Melbourne, Australia, ACM. November, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html

Categories
Papers Publications

new conference paper “On the Characteristics and Reasons of Long-lived Internet Flows” at IMC

The paper “On the Characteristics and Reasons of Long-lived Internet Flows” was accepted by IMC’10 in Melbourne, Australia (available at http://www.isi.edu/~johnh/PAPERS/Quan10a.html).

From the abstract:

Prior studies of Internet traffic have considered traffic at different resolutions and time scales: packets and flows for hours or days, aggregate packet statistics for days or weeks, and hourly trends for months. However, little is known about the long-term behavior of individual flows. In this paper, we study individual flows (as defined by the 5-tuple of protocol, source and destination IP address and port) over days and weeks. While the vast majority of flows are short, and most bytes are in short flows, we find that about 20% of the overall bytes are carried in flows that last longer than 10 minutes, and flows lasting 100 minutes or longer make up 2% of traffic. We show that long-lived flows are qualitatively different from short flows: they are generally slower, less bursty, and are due to different applications and protocols. We investigate the causes of short- and long-lived flows, and show that the traffic mix varies significantly depending on duration time scale, with computer-to-computer traffic increasingly dominant at larger time scales.
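
To make the flow definition and the byte-share numbers concrete, here is a minimal sketch (not the authors’ analysis code) that keys packet records by the 5-tuple and reports what fraction of bytes travel in flows longer than a given duration. It ignores flow timeouts and other details a real trace study must handle; the record format and toy trace are assumptions.

```python
# Illustrative sketch: aggregate packet records into 5-tuple flows and compute
# the share of bytes carried by flows longer than each duration threshold.
# The record format and toy trace are assumptions; flow timeouts are ignored.
from collections import defaultdict

def byte_share_by_duration(packets, thresholds_sec):
    """packets: iterable of (timestamp, proto, src, sport, dst, dport, nbytes)."""
    first, last, nbytes = {}, {}, defaultdict(int)
    for ts, proto, src, sport, dst, dport, size in packets:
        key = (proto, src, sport, dst, dport)       # the 5-tuple
        first.setdefault(key, ts)
        last[key] = ts
        nbytes[key] += size
    total = sum(nbytes.values())
    return {t: sum(b for k, b in nbytes.items() if last[k] - first[k] >= t) / total
            for t in thresholds_sec}

# Toy trace: a short web fetch and a long, slow background transfer.
trace = [
    (0,    "tcp", "10.0.0.1", 51000, "203.0.113.9",  80,  1500),
    (2,    "tcp", "10.0.0.1", 51000, "203.0.113.9",  80,  3000),
    (0,    "tcp", "10.0.0.2", 52000, "198.51.100.7", 443, 800),
    (4000, "tcp", "10.0.0.2", 52000, "198.51.100.7", 443, 800),
]
print(byte_share_by_duration(trace, thresholds_sec=[600, 6000]))
```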

Citation: Lin Quan and John Heidemann. On the Characteristics and Reasons of Long-lived Internet Flows. In Proceedings of the ACM Internet Measurement Conference. Melbourne, Australia, ACM. November, 2010. <http://www.isi.edu/~johnh/PAPERS/Quan10a.html>.


Categories
Papers Publications

new conference paper “Understanding Block-level Address Usage in the Visible Internet” at SIGCOMM

The paper “Understanding Block-level Address Usage in the Visible Internet” was accepted and presented at SIGCOMM’10 in New Delhi, India (available at http://www.isi.edu/~johnh/PAPERS/Cai10a.html).

From the abstract:

Although the Internet is widely used today, we have little information about the edge of the network. Decentralized management, firewalls, and sensitivity to probing prevent easy answers and make measurement difficult. Building on frequent ICMP probing of 1% of the Internet address space, we develop clustering and analysis methods to estimate how Internet addresses are used. We show that adjacent addresses often have similar characteristics and are used for similar purposes (61% of addresses we probe are consistent blocks of 64 neighbors or more). We then apply this block-level clustering to provide data to explore several open questions in how networks are managed. First, we provide information about how effectively network address blocks appear to be used, finding that a significant number of blocks are only lightly used (most addresses in about one-fifth of /24 blocks are in use less than 10% of the time), an important issue as the IPv4 address space nears full allocation. Second, we provide new measurements about dynamically managed address space, showing nearly 40% of /24 blocks appear to be dynamically allocated, and dynamic addressing is most widely used in countries that joined the Internet more recently (more than 80% in China, while less than 30% in the U.S.). Third, we distinguish blocks with low-bitrate last-hops and show that such blocks are often underutilized.
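
The sketch below illustrates the block-level idea (it is not the paper’s clustering method): it aggregates per-address response rates from ICMP probing into /24 blocks and flags blocks whose responsive addresses are in use less than 10% of the time. The 10% figure comes from the abstract; the summary statistic and toy data are assumptions.

```python
# Illustrative sketch, not the paper's method: summarize address usage per /24
# block from per-address probe response rates and flag lightly used blocks.
# The median statistic and the toy data are assumptions.
import ipaddress
from collections import defaultdict
from statistics import median

def summarize_blocks(availability, light_threshold=0.10):
    """availability: {ip: fraction of ICMP probes answered, in [0, 1]}."""
    blocks = defaultdict(list)
    for ip, frac in availability.items():
        blocks[ipaddress.ip_network(ip + "/24", strict=False)].append(frac)
    report = {}
    for block, fracs in blocks.items():
        responsive = [f for f in fracs if f > 0]
        report[str(block)] = {
            "responsive_addrs": len(responsive),
            "lightly_used": bool(responsive) and median(responsive) < light_threshold,
        }
    return report

availability = {
    "192.0.2.1": 0.95, "192.0.2.2": 0.90, "192.0.2.3": 0.88,          # stable servers
    "203.0.113.10": 0.04, "203.0.113.11": 0.07, "203.0.113.12": 0.0,  # mostly idle block
}
print(summarize_blocks(availability))
```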

Citation: Xue Cai and John Heidemann. Understanding Block-level Address Usage in the Visible Internet. In Proceedings of the ACM SIGCOMM Conference. New Delhi, India, ACM. August, 2010. <http://www.isi.edu/~johnh/PAPERS/Cai10a.html>.

Categories
Publications Technical Report

New tech report “Selecting Representative IP Addresses for Internet Topology Studies”

We just published a new technical report “Selecting Representative IP Addresses for Internet Topology Studies” (available at ftp://ftp.isi.edu/isi-pubs/tr-666.pdf).

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most random addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed by censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.
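
Since this entry shares its abstract with the conference version above, here is a sketch of a different facet: measuring stability and later responsiveness of a hitlist. The two metrics below (how many blocks keep the same representative between successive hitlists, and how many representatives still answer probes months later) and the toy data are illustrative assumptions, not the paper’s evaluation code.

```python
# Illustrative sketch: two simple hitlist quality metrics.  The definitions
# and toy data are assumptions, not the paper's evaluation.

def hitlist_stability(old_hitlist, new_hitlist):
    """Fraction of blocks present in both hitlists that keep the same representative."""
    common = set(old_hitlist) & set(new_hitlist)
    if not common:
        return 0.0
    return sum(old_hitlist[b] == new_hitlist[b] for b in common) / len(common)

def later_response_rate(hitlist, responded):
    """Fraction of representatives that still answered probes months later."""
    return sum(ip in responded for ip in hitlist.values()) / len(hitlist)

old = {"192.0.2.0/24": "192.0.2.77", "198.51.100.0/24": "198.51.100.5"}
new = {"192.0.2.0/24": "192.0.2.77", "198.51.100.0/24": "198.51.100.9"}
responded_later = {"192.0.2.77"}          # addresses answering a probe three months on

print("stability:", hitlist_stability(old, new))
print("later response rate:", later_response_rate(new, responded_later))
```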

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. Technical Report ISI-TR-666, USC/Information Sciences Institute, June, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html

Categories
Papers Publications

new paper “Uses and Challenges for Network Datasets”

We just posted a pre-print of the paper “Uses and Challenges for Network Datasets”, to appear at IEEE CATCH in March.  The pre-print is at <http://www.isi.edu/~johnh/PAPERS/Heidemann09a.html>.

The abstract summarizes the paper:

Network datasets are necessary for many types of network research.  While there has been significant discussion about specific datasets, there has been less about the overall state of network data collection.  The goal of this paper is to explore the research questions facing the Internet today, the datasets needed to answer those questions, and the challenges to using those datasets.  We suggest several practices that have proven important in use of current data sets, and open challenges to improve use of network data.

More specifically, the paper tries to answer the question Jody Westby put to PREDICT PIs: “why take data, what is it good for?” While a simple question, it’s not easy to answer (at least, my attempt to dash off a quick answer in e-mail failed). The paper is an attempt at a more thoughtful answer.

The paper tries to summarize and point to a lot of ongoing work, but I know that our coverage was insufficient.  We welcome feedback about what we’re missing.

Citation: John Heidemann and Christos Papadopoulos. Uses and Challenges for Network Datasets. In Proceedings of the IEEE Cybersecurity Applications and Technologies Conference for Homeland Security (CATCH), pp. 73-82. Washington, DC, USA, IEEE. March, 2009. http://www.isi.edu/~johnh/PAPERS/Heidemann09a.html