Categories
Papers Publications

Paper at Global Internet 2010

Chris Wilcox presented a paper titled “Correlating Spam Activity with IP Address Characteristics” In Global Inernet 2010. The paper uses Lander survey data as well as spam data from eSoft.

Abstract: It is well known that spam bots mostly utilize compromised machines with certain address characteristics, such as dynamically allocated addresses, machines in specific geographic areas and IP ranges from AS’ with more tolerant spam policies. Such machines tend to be less diligently administered and may exhibit less stability, more volatility, and shorter uptimes. However, few studies have attempted to quantify how such spambot address characteristics compare with non-spamming hosts.
Quantifying these characteristics may help provide important information for comprehensive spam mitigation.
We use two large datasets, namely a commercial blacklist
and an Internet-wide address visibility study to quantify address characteristics of spam and non-spam networks. We find that spam networks exhibit significantly less availability and uptime, and higher volatility than non-spam networks. In addition, we conduct a collateral damage study of a common practice where an ISP blocks the entire /24 prefix if spammers are detected in that range. We find that such a policy blacklists a significant portion of legitimate mail servers belonging to the same prefix.

Categories
Papers Publications

Paper at NPSec

Steve DiBenedetto presented a paper titled “Fingerprinting Custom Botnet Protocol Stacks” at NPSec 2010, in Kyoto Japan.

Categories
Presentations

New Video About Address Utilization and Allocations on Map Browser

The ANT project released a video describing Internet address allocation and how we study address utilization with IPv4 censuses. Aniruddh Rao prepared this video, working with John Heidemann and Xue Cai.

a scene from the ANT video describing address allocation and census taking

We have also updated our web-based IPv4 address browser to provide information about to what organizations each address block is allocated. The map now visualizes the whois allocation data; we thank the five regional internet registries for sharing this data with us and authorizing this visualization.

organizations in our Internet map

Finally, our web-based IPv4 address browser now has better time travel, with nearly 30 different census from Dec. 2005 to Nov. 2010, and we continue to update the map regularly.

Data collection for this work is through the LANDER project, and the map browser improvements are due to AMITE, both supported by DHS. Video preparation was supported by these projects and NSF through the MADCAT project.

Categories
Papers Publications

New conference paper “Selecting Representative IP Addresses for Internet Topology Studies” to appear at IMC

The paper “Selecting Representative IP Addresses for Internet Topology Studies” (available at http://www.isi.edu/~xunfan/research/Fan10a.pdf) was accepted to appear at the ACM Internet Measurement Conference 2010 in Melbourne, Australia.

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most andom addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. To appear in Proceedings of the ACM Internet Measurement Conference (IMC). Melbourne, Australia, ACM. November, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html

Categories
Papers Publications

new conference paper “Towards an AS-to-Organization Map” to appear at IMC

The paper “Towards an AS-to-Organization Map” was accepted by IMC’10 in Melbourne, Australia (available at http://www.isi.edu/~johnh/PAPERS/Cai10c.html).

From the abstract:

An understanding of Internet topology is central to answer various questions ranging from network resilience to peer selection or data center location. While much of prior work has examined AS-level connectivity, meaningful and relevant results from such an abstract view of Internet topology have been limited. For one, semantically, AS relationships capture business relationships and not physical connectivity. Additionally, many organizations often use multiple ASes, either to implement different routing policies, or as legacies from mergers and acquisitions. In this paper, we move beyond the traditional AS graph view of the Internet to define the problem of AS-to-organization mapping. We describe our initial steps at automating the capture of the rich semantics inherent in the AS-level ecosystem where routing and connectivity intersect with organizations. We discuss preliminary methods that identify multi-AS organizations from WHOIS data and illustrate the challenges posed by the quality of the available data and the complexity of real-world organizational relationships.

Citation: Xue Cai, John Heidemann, Balachander Krishnamurthy, and Walter Willinger. Towards an AS-to-Organization Map. In Proceedings of the ACM Internet Measurement Conference, p. to appear. Melbourne, Australia, ACM. November, 2010.

Categories
Papers Publications

New journal paper “Parametric Methods for Anomaly Detection in Aggregate Traffic” to appear in TON

The paper “Parametric Methods for Anomaly Detection in Aggregate Traffic” was accepted for publication in ACM/IEEE Transactions on Networking (available at http://www.isi.edu/~johnh/PAPERS/Thatte10a.html).

From the abstract:

This paper develops parametric methods to detect network anomalies using only aggregate traffic statistics, in contrast to other works requiring flow separation, even when the anomaly is a small fraction of the total traffic. By adopting simple statistical models for anomalous and background traffic in the time-domain, one can estimate model parameters in realtime, thus obviating the need for a long training phase or manual parameter tuning. The proposed bivariate Parametric Detection Mechanism (bPDM) uses a sequential probability ratio test, allowing for control over the false positive rate while examining the trade-off between detection time and the strength of an anomaly. Additionally, it uses both traffic-rate and packet-size statistics, yielding a bivariate model that eliminates most false positives. The method is analyzed using the bitrate SNR metric, which is shown to be an effective metric for anomaly detection. The performance of the bPDM is evaluated in three ways: first, synthetically-generated traffic provides for a controlled comparison of detection time as a function of the anomalous level of traffic. Second, the approach is shown to be able to detect controlled artificial attacks over the USC campus network in varying real traffic mixes. Third, the proposed algorithm achieves rapid detection of real denial-of-service attacks as determined by the replay of previously captured network traces. The method developed in this paper is able to detect all attacks in these scenarios in a few seconds or less.

Citation: Gautam Thatte, Urbashi Mitra, and John Heidemann. Parametric Methods for Anomaly Detection in Aggregate Traffic. ACM/IEEE Transactions on Networking, p. accepted to appear, August, 2010. (Likely publication in 2011). <http://www.isi.edu/~johnh/PAPERS/Thatte10a.html>.
Categories
Papers Publications

new conference paper “On the Characteristics and Reasons of Long-lived Internet Flows” at IMC

The paper “On the Characteristics and Reasons of Long-lived Internet Flows” was accepted by IMC’10 in Melbourne, Australia (available at http://www.isi.edu/~johnh/PAPERS/Quan10a.html).

From the abstract:

Prior studies of Internet traffic have considered traffic at different resolutions and time scales: packets and flows for hours or days, aggregate packet statistics for days or weeks, and hourly trends for months. However, little is known about the long-term behavior of individual flows. In this paper, we study individual flows (as defined by the 5-tuple of protocol, source and destination IP address and port) over days and weeks. While the vast majority of flows are short, and most bytes are in short flows, we find that about 20% of the overall bytes are carried in flows that last longer than 10 minutes, and flows lasting 100 minutes or longer make up 2% of traffic. We show that long-lived flows are qualitatively different from short flows: they are generally slower, less bursty, and are due to different applications and protocols. We investigate the causes of short- and long-lived flows, and show that the traffic mix varies significantly depending on duration time scale, with computer-to-computer traffic more and more dominating in larger time scales.

Citation: Lin Quan and John Heidemann. On the Characteristics and Reasons of Long-lived Internet Flows. In Proceedings of the ACM Internet Measurement Conference. Melbourne, Australia, ACM. November, 2010. <http://www.isi.edu/~johnh/PAPERS/Quan10a.html>.


Categories
Papers Publications

new conference paper “Understanding Block-level Address Usage in the Visible Internet” at SIGCOMM

The paper “Understanding Block-level Address Usage in the Visible Internet” was accepted and presented at SIGCOMM’10 in New Delhi, India (available at http://www.isi.edu/~johnh/PAPERS/Cai10a.html).

From the abstract:

Although the Internet is widely used today, we have little information about the edge of the network. Decentralized management, firewalls, and sensitivity to probing prevent easy answers and make measurement difficult. Building on frequent ICMP probing of 1% of the Internet address space, we develop clustering and analysis methods to estimate how Internet addresses are used. We show that adjacent addresses often have similar characteristics and are used for similar purposes (61% of addresses we probe are consistent blocks of 64 neighbors or more). We then apply this block-level clustering to provide data to explore several open questions in how networks are managed. First, we provide information about how effectively network address blocks appear to be used, finding that a significant number of blocks are only lightly used (most addresses in about one-fifth of /24 blocks are in use less than 10% of the time), an important issue as the IPv4 address space nears full allocation. Second, we provide new measurements about dynamically managed address space, showing nearly 40% of /24 blocks appear to be dynamically allocated, and dynamic addressing is most widely used in countries more recent to the Internet (more than 80% in China, while less than 30% in the U.S.). Third, we distinguish blocks with low-bitrate last-hops and show that such blocks are often underutilized.

Citation: Xue Cai and John Heidemann. Understanding Block-level Address Usage in the Visible Internet. In Proceedings of the ACM SIGCOMM Conference , p. to appear. New Delhi, India, ACM. August, 2010. <http://www.isi.edu/~johnh/PAPERS/Cai10a.html>.

Categories
Publications Technical Report

New tech report “Selecting Representative IP Addresses for Internet Topology Studies”

We just published a new technical report “Selecting Representative IP Addresses for Internet Topology Studies” (available at ftp://ftp.isi.edu/isi-pubs/tr-666.pdf) .

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most andom addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. Technical Report N. ISI-TR-666, USC/Information Sciences Institute, June, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html

Categories
Papers Publications

New conference paper “Correlating Spam Activity with IP Address Characteristics” at Global Internet

The paper “Correlating Spam Activity with IP Address Characteristics” (available at PDF Format) was accepted and presented at Global Internet 2010. The focus of this paper is to quantify the collateral damage (legitimate mail servers incorrectly blacklisted) caused by the practice of blocking /24 address blocks based on the presence of spammers. The paper also revisits the differences in IP address characteristics and domain names between spammers and non-spammers.

From the abstract:

It is well known that spam bots mostly utilize compromised machines with certain address characteristics, such as dynamically allocated addresses, machines in specific geographic areas and IP ranges from AS’ with more tolerant spam policies. Such machines tend to be less diligently administered and may exhibit less stability, more volatility, and shorter uptimes. However, few studies have attempted to quantify how such spam bot address characteristics compare with non-spamming hosts. Quantifying these characteristics may help provide important information for comprehensive spam mitigation.

We use two large datasets, namely a commercial blacklist and an Internet-wide address visibility study to quantify address characteristics of spam and non-spam networks. We find that spam networks exhibit significantly less availability and uptime, and higher volatility than non-spam networks. In addition, we conduct a collateral damage study of a common practice where an ISP blocks the entire /24 prefix if spammers are detected in that range.  We find that such a policy blacklists a significant portion of legitimate mail servers belonging to the same prefix.

Citation: Chris Wilcox, Christos Papadopoulos, John Heidemann. Correlating Spam Activity with IP Address Characteristics.  Proceedings of the IEEE Global Internet Conference, San Diego, CA, USA, IEEE.  March, 2010.