This paper develops parametric methods to detect network anomalies using only aggregate traffic statistics, in contrast to other works requiring flow separation, even when the anomaly is a small fraction of the total traffic. By adopting simple statistical models for anomalous and background traffic in the time-domain, one can estimate model parameters in realtime, thus obviating the need for a long training phase or manual parameter tuning. The proposed bivariate Parametric Detection Mechanism (bPDM) uses a sequential probability ratio test, allowing for control over the false positive rate while examining the trade-off between detection time and the strength of an anomaly. Additionally, it uses both traffic-rate and packet-size statistics, yielding a bivariate model that eliminates most false positives. The method is analyzed using the bitrate SNR metric, which is shown to be an effective metric for anomaly detection. The performance of the bPDM is evaluated in three ways: first, synthetically-generated traffic provides for a controlled comparison of detection time as a function of the anomalous level of traffic. Second, the approach is shown to be able to detect controlled artificial attacks over the USC campus network in varying real traffic mixes. Third, the proposed algorithm achieves rapid detection of real denial-of-service attacks as determined by the replay of previously captured network traces. The method developed in this paper is able to detect all attacks in these scenarios in a few seconds or less.
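As a rough illustration of the sequential probability ratio test mentioned in the abstract above, here is a minimal sketch of Wald's test for a single Gaussian statistic with known parameters. It is not the bPDM itself: the paper's method is bivariate and estimates its model parameters online, whereas the function name `sprt_gaussian`, the fixed `mu0`/`mu1`/`sigma` values, and the default error rates here are all assumptions made for illustration.

```python
import math


def sprt_gaussian(samples, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald's SPRT for H0: mean == mu0 versus H1: mean == mu1 (known sigma).

    alpha is the target false-positive rate, beta the target miss rate.
    Returns (decision, n_used), where decision is "H0", "H1", or "undecided".
    This is a simplified, univariate stand-in, not the paper's detector.
    """
    upper = math.log((1 - beta) / alpha)  # decide H1 at or above this
    lower = math.log(beta / (1 - alpha))  # decide H0 at or below this
    llr = 0.0  # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio of one Gaussian observation under H1 vs. H0.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n


if __name__ == "__main__":
    # Hypothetical traffic-rate samples that sit at the anomalous mean.
    print(sprt_gaussian([12.0] * 10, mu0=10.0, mu1=12.0, sigma=1.0))
```

The sketch shows the trade-off the abstract refers to: the thresholds depend only on the chosen error rates, so a stronger anomaly (a larger gap between `mu0` and `mu1` relative to `sigma`) crosses them in fewer samples.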
The paper “On the Characteristics and Reasons of Long-lived Internet Flows” was accepted by IMC’10 in Melbourne, Australia (available at http://www.isi.edu/~johnh/PAPERS/Quan10a.html).
From the abstract:
Prior studies of Internet traffic have considered traffic at different resolutions and time scales: packets and flows for hours or days, aggregate packet statistics for days or weeks, and hourly trends for months. However, little is known about the long-term behavior of individual flows. In this paper, we study individual flows (as defined by the 5-tuple of protocol, source and destination IP address and port) over days and weeks. While the vast majority of flows are short, and most bytes are in short flows, we find that about 20% of the overall bytes are carried in flows that last longer than 10 minutes, and flows lasting 100 minutes or longer make up 2% of traffic. We show that long-lived flows are qualitatively different from short flows: they are generally slower, less bursty, and are due to different applications and protocols. We investigate the causes of short- and long-lived flows, and show that the traffic mix varies significantly depending on duration time scale, with computer-to-computer traffic more and more dominating in larger time scales.
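To make the 5-tuple flow definition in the abstract concrete, here is a small sketch that groups packet records by 5-tuple and computes what fraction of bytes is carried in flows longer than a given duration. The record layout, the function names `flow_durations` and `byte_fraction_longer_than`, and the sample data are hypothetical. The sketch also ignores flow timeouts and other details that a real measurement study must handle.

```python
def flow_durations(packets):
    """Group packets into flows keyed by the 5-tuple.

    packets: iterable of
        (timestamp_s, proto, src_ip, src_port, dst_ip, dst_port, nbytes)
    Returns {five_tuple: (duration_s, total_bytes)}.
    Assumption: a flow is every packet sharing a 5-tuple, with no idle timeout.
    """
    flows = {}
    for ts, proto, src, sport, dst, dport, nbytes in packets:
        key = (proto, src, sport, dst, dport)
        first, last, total = flows.get(key, (ts, ts, 0))
        flows[key] = (min(first, ts), max(last, ts), total + nbytes)
    return {k: (last - first, total) for k, (first, last, total) in flows.items()}


def byte_fraction_longer_than(flows, min_duration_s):
    """Fraction of all bytes carried by flows lasting longer than min_duration_s."""
    total = sum(nbytes for _, nbytes in flows.values())
    if total == 0:
        return 0.0
    long_bytes = sum(nbytes for dur, nbytes in flows.values() if dur > min_duration_s)
    return long_bytes / total


if __name__ == "__main__":
    # Made-up records: one 700-second flow and one 5-second flow.
    sample = [
        (0.0, "tcp", "10.0.0.1", 1234, "10.0.0.2", 80, 100),
        (700.0, "tcp", "10.0.0.1", 1234, "10.0.0.2", 80, 100),
        (5.0, "udp", "10.0.0.3", 5353, "10.0.0.4", 53, 300),
        (10.0, "udp", "10.0.0.3", 5353, "10.0.0.4", 53, 500),
    ]
    flows = flow_durations(sample)
    print(byte_fraction_longer_than(flows, 600))  # share of bytes in flows over 10 minutes
```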
Citation: Lin Quan and John Heidemann. On the Characteristics and Reasons of Long-lived Internet Flows. In Proceedings of the ACM Internet Measurement Conference. Melbourne, Australia, ACM. November, 2010. <http://www.isi.edu/~johnh/PAPERS/Quan10a.html>.
The paper “Understanding Block-level Address Usage in the Visible Internet” was accepted and presented at SIGCOMM’10 in New Delhi, India (available at http://www.isi.edu/~johnh/PAPERS/Cai10a.html).
From the abstract:
Although the Internet is widely used today, we have little information about the edge of the network. Decentralized management, firewalls, and sensitivity to probing prevent easy answers and make measurement difficult. Building on frequent ICMP probing of 1% of the Internet address space, we develop clustering and analysis methods to estimate how Internet addresses are used. We show that adjacent addresses often have similar characteristics and are used for similar purposes (61% of addresses we probe are consistent blocks of 64 neighbors or more). We then apply this block-level clustering to provide data to explore several open questions in how networks are managed. First, we provide information about how effectively network address blocks appear to be used, finding that a significant number of blocks are only lightly used (most addresses in about one-fifth of /24 blocks are in use less than 10% of the time), an important issue as the IPv4 address space nears full allocation. Second, we provide new measurements about dynamically managed address space, showing nearly 40% of /24 blocks appear to be dynamically allocated, and dynamic addressing is most widely used in countries more recent to the Internet (more than 80% in China, while less than 30% in the U.S.). Third, we distinguish blocks with low-bitrate last-hops and show that such blocks are often underutilized.
Citation: Xue Cai and John Heidemann. Understanding Block-level Address Usage in the Visible Internet. In Proceedings of the ACM SIGCOMM Conference, p. to appear. New Delhi, India, ACM. August, 2010. <http://www.isi.edu/~johnh/PAPERS/Cai10a.html>.
From the abstract:
It is well known that spam bots mostly utilize compromised machines with certain address characteristics, such as dynamically allocated addresses, machines in specific geographic areas and IP ranges from AS’ with more tolerant spam policies. Such machines tend to be less diligently administered and may exhibit less stability, more volatility, and shorter uptimes. However, few studies have attempted to quantify how such spam bot address characteristics compare with non-spamming hosts. Quantifying these characteristics may help provide important information for comprehensive spam mitigation.
We use two large datasets, namely a commercial blacklist and an Internet-wide address visibility study, to quantify address characteristics of spam and non-spam networks. We find that spam networks exhibit significantly less availability and uptime, and higher volatility than non-spam networks. In addition, we conduct a collateral damage study of a common practice where an ISP blocks the entire /24 prefix if spammers are detected in that range. We find that such a policy blacklists a significant portion of legitimate mail servers belonging to the same prefix.
Citation: Chris Wilcox, Christos Papadopoulos, John Heidemann. Correlating Spam Activity with IP Address Characteristics. Proceedings of the IEEE Global Internet Conference, San Diego, CA, USA, IEEE. March, 2010.
The paper “Improved Internet Traffic Analysis via Optimized Sampling” (available in PDF format) was accepted to ICASSP 2010. The paper focuses on the best down-sampling methods to use when measuring Internet traffic, in order to preserve signal information for traffic analysis techniques such as anomaly detection.
From the abstract:
Applications to evaluate Internet quality-of-service and increase network security are essential to maintaining reliability and high performance in computer networks. These applications typically use very accurate, but high cost, hardware measurement systems. Alternate, less expensive software based systems are often impractical for use with analysis applications because they reduce the number and accuracy of measurements using a technique called interrupt coalescence, which can be viewed as a form of sampling. The goal of this paper is to optimize the way interrupt coalescence groups packets into measurements so as to retain as much of the packet timing information as possible. Our optimized solution produces estimates of timing distributions much closer to those obtained using hardware based systems. Further, we show that for a real Internet analysis application, periodic signal detection, using measurements generated with our method improved detection times by at least 36%.
Citation: Sean McPherson and Antonio Ortega. Improved Internet Traffic Analysis via Optimized Sampling. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, p. to appear. Dallas, TX, USA, IEEE. March, 2010.
We just posted a pre-print of the paper “Uses and Challenges for Network Datasets”, to appear at IEEE CATCH in March. The pre-print is at <http://www.isi.edu/~johnh/PAPERS/Heidemann09a.html>.
The abstract summarizes the paper:
Network datasets are necessary for many types of network research. While there has been significant discussion about specific datasets, there has been less about the overall state of network data collection. The goal of this paper is to explore the research questions facing the Internet today, the datasets needed to answer those questions, and the challenges to using those datasets. We suggest several practices that have proven important in use of current data sets, and open challenges to improve use of network data.
More specifically, the paper tries to answer the question Jody Westby put to PREDICT PIs: “why take data, what is it good for?” While a simple question, it’s not easy to answer (at least, my attempt to dash off a quick answer in e-mail failed). The paper is an attempt at a more thoughtful answer.
The paper tries to summarize and point to a lot of ongoing work, but I know that our coverage was insufficient. We welcome feedback about what we’re missing.
John Heidemann and Christos Papadopoulos. Uses and Challenges for Network Datasets. In
The IMC paper “Census and Survey of the Visible Internet” was described in an article “Probe Sees Unused Internet” in the MIT Technology Review by Robert Lemos.
The article provides a nice summary of the issues, but it reaches a conclusion that is stronger than the study supports. The subhead of the article is “A survey shows that addresses are not running out as quickly as we’d thought”, and the article draws the conclusion: “the problem [of IPv4 address exhaustion] may not be as bad as many fear.”
The article’s conclusion, I think, oversimplifies matters: it is only true if the “better things we should be doing in managing the IPv4 address space” are free. The Internet Census we carried out supports the opportunity for better IPv4 address space management. But an open question is the cost of such management. Historically, with plentiful IPv4 addresses, IPv4 management costs have been small, but better IPv4 management will likely be much more costly. This ongoing cost of IPv4 management needs to be weighed against the one-time cost of conversion to IPv6, followed by lower IPv6 management costs.
To me, one exciting conclusion from the Internet Census we carried out is that we now have data that allows us to start evaluating these trade-offs. The answer may be that more careful IPv4 management gets us a few years, or that the cost of more careful IPv4 management makes IPv6 an obvious choice. In either case, resolving this transition is important for the Internet community.
We are happy to report that the paper “Census and Survey of the Visible Internet” has been accepted to appear at the Internet Measurement Conference in Vouliagmeni, Greece in October 2008.
A preprint is available at http://www.isi.edu/~johnh/PAPERS/Heidemann08c.html, and an extended version is available as an updated technical report at http://www.isi.edu/~johnh/PAPERS/Heidemann08a.html.
Citation: John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, and Joseph Bannister. Census and Survey of the Visible Internet. In
A number of folks expressed interest in our ANT census of the Internet address space at <http://www.isi.edu/ant/address/>.
We have three recent updates: a new TECHNICAL REPORT, a BROWSABLE INTERNET ADDRESS MAP, and a PROJECT BLOG.
We have released a new TECHNICAL REPORT describing the methodology, ISI-TR-2008-649 at <http://www.isi.edu/~johnh/PAPERS/Heidemann08a.html>.
This report should completely supersede our previous report (#640), adding:
- evaluation of ping accuracy, both absolutely and relative to TCP probing
- estimation of error in our evaluations of host and server counts
- validation of our approach to firewall detection
- significant improvements in organization and presentation
We have also put up a BROWSABLE INTERNET ADDRESS MAP at <http://www.isi.edu/ant/address/browse/>.
Built on the Google Maps engine, this map lets you zoom from an overview to any part of the address space, including showing individual hosts (permuted for anonymization).
Finally, we now have a PROJECT BLOG to allow folks to track future developments: <http://ant.isi.edu/blog/>. We plan to do all future announcements via the blog rather than with general e-mail messages, so folks can opt in to what they want to hear.
We welcome any comments about the map or technical report, either to our group mailing list (ant, then at isi.edu), or to individuals.
-ANT folks (John Heidemann, Yuri Pradkin, Ramesh Govindan, Christos Papadopoulos, Genevieve Bartlett, Joseph Bannister)