Categories
Publications Technical Report

New Tech Report “An Organization-Level View of the Internet and its Implications (extended)”

We just published a new technical report “An Organization-Level View of the Internet and its Implications (extended)”, available at ftp://ftp.isi.edu/isi-pubs/tr-679.pdf.
From the abstract:

We present a new clustering approach for mapping ASes to organizations, to develop an organization-level view of the Internet’s AS ecosystem. We demonstrate that the choice of clustering method and use of a new (though unconventional) data source in the form of company subsidiary information contained in the U.S. SEC Form 10-K filings are both essential to get accurate results. Evaluating our mapping and validating it against carefully chosen datasets shows few (less than 10%) false negatives for 90% of organizations and few false positives for 60% of our organizations. We apply our map to show the importance of an organization-level view of the Internet by contrasting it with the commonly-used view that considers only an organization’s “main” AS. We find that this main-AS view sometimes severely underrepresents the influence of an organization in terms of announced addresses, geographic footprint, and peerings at Internet eXchange Points (IXPs). For example, for 20% of our organizations, the main-AS view detects only 10-60% of the cities covered by the corresponding organization-level view.
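The abstract does not describe the clustering algorithm itself. As a rough illustration only, the Python sketch below assumes each AS comes with a set of identifying keys (hypothetical stand-ins for registry contact names or subsidiaries listed in a 10-K filing) and merges any ASes that share a key, using union-find. It also computes the main-AS city coverage ratio that the abstract's last sentence describes. Naive transitive merging like this is a baseline, not the report's method; the abstract stresses that the choice of clustering method matters for accuracy.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_ases(as_keys):
    """Group ASNs that share any identifying key (hypothetical input format).

    as_keys maps an ASN to a set of keys, e.g. a normalized registrant name
    or a parent company taken from a subsidiary list.
    """
    uf = UnionFind()
    first_as_for_key = {}
    for asn, keys in as_keys.items():
        uf.find(asn)  # register ASes that share nothing as singletons
        for key in keys:
            if key in first_as_for_key:
                uf.union(asn, first_as_for_key[key])
            else:
                first_as_for_key[key] = asn
    orgs = defaultdict(set)
    for asn in as_keys:
        orgs[uf.find(asn)].add(asn)
    return list(orgs.values())


def main_as_coverage(org_ases, main_as, cities_by_as):
    """Fraction of an organization's cities that its main AS alone covers."""
    org_cities = set().union(*(cities_by_as.get(a, set()) for a in org_ases))
    if not org_cities:
        return 1.0
    return len(cities_by_as.get(main_as, set())) / len(org_cities)


if __name__ == "__main__":
    # Hypothetical ASNs and keys, for illustration only.
    as_keys = {
        100: {"Example Corp"},
        200: {"Example Corp", "Example Sub LLC"},
        300: {"Example Sub LLC"},
        400: {"Other Net"},
    }
    print(sorted(sorted(org) for org in cluster_ases(as_keys)))
    # [[100, 200, 300], [400]]
    cities = {100: {"LA"}, 200: {"NYC", "LA"}, 300: {"London", "Tokyo"}}
    print(main_as_coverage({100, 200, 300}, 100, cities))
    # 0.25
```

Here ASes 100 and 200 share a parent name and 200 and 300 share a subsidiary, so all three land in one organization; treating AS 100 as the main AS sees only 1 of the organization's 4 cities.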


New Tech Report “Detecting Internet Outages with Precise Active Probing (extended)”

We just published a new technical report “Detecting Internet Outages with Precise Active Probing (extended)”, available at ftp://ftp.isi.edu/isi-pubs/tr-678b.pdf. This is an update of ISI-TR-678.

From the abstract:

Parts of the Internet are down every day, from the intentional shutdown of the Egyptian Internet in Jan. 2011 and natural disasters such as the Mar. 2011 Japanese earthquake, to the thousands of small outages caused by localized accidents, and human error, maintenance, or choices. Understanding these events requires efficient and accurate detection methods, motivating our new system to detect network outages by active probing. We show that a single computer can track outages across the entire analyzable IPv4 Internet, probing a sample of 20 addresses in all 2.5M responsive /24 address blocks. We show that our approach is significantly more accurate than the best current methods, with 31% fewer false conclusions, while providing 14% greater coverage and requiring about the same probing traffic. We develop new algorithms to identify outages and cluster them to events, providing the first visualization of outages. We carefully validate our approach, showing consistent results over two years and from three different sites. Using public BGP archives and news sources we confirm 83% of large events. For a random sample of 50 observed events, we find 38% in partial control-plane information, reaffirming prior work that small outages are often not caused by BGP. Through controlled emulation we show that our approach detects 100% of full-block outages that last at least twice our probing interval. Finally, we report on Internet stability as a whole, and the size and duration of typical outages, using core-to-edge observations with much larger coverage than prior mesh-based studies. We find that about 0.3% of the Internet is likely to be unreachable at any time, suggesting the Internet provides only 2.5 “nines” of availability.
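The “2.5 nines” figure follows directly from the 0.3% estimate: the number of nines is the negative base-10 logarithm of the unavailable fraction, so 99% availability is two nines and 99.9% is three. A quick check in Python (the helper name is ours):

```python
import math


def nines(unavailable_fraction):
    """Availability as a count of nines: -log10 of the unavailable fraction."""
    return -math.log10(unavailable_fraction)


if __name__ == "__main__":
    u = 0.003  # about 0.3% of the Internet unreachable at any time
    print(f"availability = {1 - u:.1%}, nines = {nines(u):.2f}")
    # availability = 99.7%, nines = 2.52
```

99.7% availability sits between two and three nines, which rounds to the 2.5 quoted in the abstract.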


New tech report “Characterizing Anycast in the Domain Name System”

We just published a new technical report on our anycast enumeration work, including some exciting new results. Check out “Characterizing Anycast in the Domain Name System” (available at ftp://ftp.isi.edu/isi-pubs/tr-681.pdf).

From the abstract:

IP anycast is a central part of production DNS. While prior work has explored proximity, affinity and load balancing for some anycast services, there has been little attention to third-party discovery and enumeration of components of an anycast service. Enumeration can reveal abnormal service configurations, benign masquerading or hostile hijacking of anycast services, and can help characterize the extent of anycast deployment. In this paper, we discuss two methods to identify and characterize anycast nodes. The first uses an existing anycast diagnosis method based on CHAOS-class DNS records but augments it with traceroute to resolve ambiguities. The second proposes Internet-class DNS records which permit accurate discovery through the use of existing recursive DNS infrastructure. We validate these two methods against three widely-used anycast DNS services, using a very large number (60k and 300k) of vantage points, and show that they can provide excellent precision and recall. Finally, we use these methods to evaluate anycast deployments in top-level domains (TLDs), and find one case where a third-party operates a server masquerading as a root DNS anycast node as well as a noticeable proportion of unusual anycast proxies. We also show that, across all TLDs, up to 72% use anycast, and that, of about 30 anycast providers, the two largest serve nearly half the anycasted TLD nameservers.

Citation: Xun Fan, John Heidemann and Ramesh Govindan. Characterizing Anycast in the Domain Name System. Technical Report N. ISI-TR-681, USC/Information Sciences Institute, May, 2012. ftp://ftp.isi.edu/isi-pubs/tr-681.pdf
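For readers unfamiliar with the CHAOS-class method, the sketch below builds the raw DNS query for a CHAOS-class TXT record using only the Python standard library. It assumes the conventional name hostname.bind (id.server is another common choice; see RFC 4892). Sending it to an anycast address over UDP port 53 returns a string chosen by whichever node answered, and comparing those strings across many vantage points enumerates nodes. The report's traceroute augmentation and its Internet-class records are not shown, and response parsing is left out.

```python
import random
import socket
import struct

QTYPE_TXT = 16  # TXT record type (RFC 1035)
QCLASS_CH = 3   # CHAOS class (RFC 1035)


def encode_name(name):
    """Encode a dotted DNS name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def build_chaos_query(name="hostname.bind", query_id=None):
    """Build a DNS query message asking for a CHAOS-class TXT record."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (all zero, no recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPE_TXT, QCLASS_CH)
    return header + question


def ask(server_ip, name="hostname.bind", timeout=2.0):
    """Send the query over UDP and return the raw, unparsed response bytes."""
    query = build_chaos_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server_ip, 53))
        response, _ = sock.recvfrom(4096)
    return response
```

Because anycast routes each vantage point to its nearest node, running `ask` from many locations against the same service address and collecting the distinct TXT answers gives a first-cut node list; the report's contribution is handling the cases where those answers are missing or misleading.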


New tech report “Identifying and Characterizing Anycast in the Domain Name System”

We just published a new technical report “Identifying and Characterizing Anycast in the Domain Name System” (available at ftp://ftp.isi.edu/isi-pubs/tr-671.pdf).

From the abstract:

Since its first appearance, IP anycast has become essential for critical network services such as the Domain Name System (DNS). Despite this, there has been little attention to independently identifying and characterizing anycast nodes. External evaluation of anycast allows third-party auditing of its benefits and is essential to discovering benign masquerading or hostile hijacking of anycast services. In this paper, we develop ACE, an approach to identify and characterize anycast nodes. ACE’s first method uses DNS queries for CHAOS records, the recommended debugging service for anycast, and is suitable for cooperative anycast services. Its second method uses traceroute to identify all anycast services by their connectivity to the Internet. Each individual method has ambiguities in some circumstances; we show a combined method improves on both. We validate ACE against two widely used anycast DNS services that provide ground truth. ACE has good precision, with 88% of its results corresponding to unique anycast nodes of the F-root DNS service. Its recall is affected by the number and diversity of vantage points. We use ACE for an initial study of how anycast is used for top-level domain servers. We find one case where a third-party server operates on a root-DNS IP address, masquerading to capture traffic for its organization. We also study the 1164 nameserver IP addresses used by all generic and country-code top-level domains in April 2011. This study shows evidence that at least 14% and perhaps 32% use anycast.

Citation: Xun Fan, John Heidemann and Ramesh Govindan. Identifying and Characterizing Anycast in the Domain Name System. Technical Report N. ISI-TR-671, USC/Information Sciences Institute, June, 2011. ftp://ftp.isi.edu/isi-pubs/tr-671.pdf

Data from this paper will be available from PREDICT through the LANDER project; contact the authors for details.
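The combined method is described only at a high level. A minimal sketch of one way to merge the two signals, assuming each vantage point yields a CHAOS answer (or None) and the last router hop before the anycast address (or None); the combining rule itself is our assumption, not ACE’s published algorithm:

```python
def count_anycast_nodes(observations):
    """Estimate distinct anycast nodes from (chaos_id, last_hop) pairs.

    Hypothetical combining rule: a CHAOS id names a node directly. When a
    vantage point gets no CHAOS answer, its traceroute last hop is matched
    against hops that other vantage points saw in front of a known CHAOS id,
    and counted as a new node only if no such match exists.
    """
    hop_to_id = {}
    nodes = set()
    # First pass: nodes that identify themselves via CHAOS.
    for chaos_id, last_hop in observations:
        if chaos_id is not None:
            nodes.add(("chaos", chaos_id))
            if last_hop is not None:
                hop_to_id.setdefault(last_hop, chaos_id)
    # Second pass: fall back to traceroute when CHAOS is silent.
    for chaos_id, last_hop in observations:
        if chaos_id is None and last_hop is not None:
            if last_hop in hop_to_id:
                continue  # already counted through its CHAOS id
            nodes.add(("hop", last_hop))
    return len(nodes)


if __name__ == "__main__":
    obs = [("lax1", "r1"), ("lax1", "r2"), ("ams1", "r3"),
           (None, "r3"), (None, "r9"), (None, None)]
    print(count_anycast_nodes(obs))
    # 3
```

In this toy input, the vantage point with no CHAOS reply behind hop r3 is attributed to node ams1 rather than double-counted, while the unexplained hop r9 is counted as a possibly uncooperative node.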


New tech report “Detecting Internet Outages with Active Probing”

We just published a new technical report “Detecting Internet Outages with Active Probing”, available at ftp://ftp.isi.edu/isi-pubs/tr-672.pdf.

From the abstract:

With businesses, governments, and individuals increasingly dependent on the Internet, understanding its reliability is more important than ever. Network outages vary in scope and cause, from the intentional shutdown of the Egyptian Internet in February 2011, to outages caused by the effects of March 2011 earthquakes on undersea cables entering Japan, to the thousands of small, daily outages caused by localized accidents or human error. In this paper we present a new method to detect network outages by probing entire blocks. Using 24 datasets, each a 2-week study of 22,000 /24 address blocks randomly sampled from the Internet, we develop new algorithms to identify and visualize outages and to cluster those outages into network-level events. We validate our approach by comparing our data-plane results against control-plane observations from BGP routing and news reports, examining both major and randomly selected events. We confirm our results are stable from two different locations and over more than one and a half years of observations. We show that our approach of probing all addresses in a /24 block is significantly more accurate than prior approaches that use a single representative for all routed blocks, cutting the number of mistaken outage observations from 44% to under 1%. We use our approach to study several large outages such as those mentioned above. We also develop a general estimate for how much of the Internet is regularly down, finding about 0.3% of the Internet is likely to be unreachable at any time. By providing a baseline estimate of Internet outages, our work lays the groundwork to evaluate ISP reliability.

Citation: Lin Quan and John Heidemann. Detecting Internet Outages with Active Probing. Technical Report N. ISI-TR-672. USC/Information Sciences Institute, May 2011. ftp://ftp.isi.edu/isi-pubs/tr-672.pdf
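Why probing many addresses per block beats a single representative can be seen with a simple probability argument. The sketch below assumes each probe to a block that is up is answered independently with the block’s historical availability A, so the chance that n probes all go unanswered anyway is (1 - A)^n. The function names and the 1% threshold are illustrative, not the report’s.

```python
def prob_all_silent_if_up(availability, probes):
    """Chance that `probes` independent probes all go unanswered while the block is up."""
    return (1.0 - availability) ** probes


def block_is_down(responses, availability, threshold=0.01):
    """Hypothetical decision rule for one probing round of a /24 block.

    responses: one boolean per probed address (True = answered).
    availability: historical fraction of probes this block answers.
    Declares an outage only if no probe answered AND that much silence
    would be unlikely (below `threshold`) for a block that is actually up.
    """
    if any(responses):
        return False
    return prob_all_silent_if_up(availability, len(responses)) < threshold


if __name__ == "__main__":
    print(block_is_down([False], 0.5))       # one silent representative
    # False
    print(block_is_down([False] * 20, 0.5))  # twenty silent addresses
    # True
```

With A = 0.5, a single silent representative goes unanswered half the time even when its block is up, so treating that silence as an outage is often a mistake; twenty silent probes would happen by chance less than once in a million rounds.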

Categories
Papers Publications

New conference paper “Low-Rate, Flow-Level Periodicity Detection” at Global Internet 2011

Visualization of low-rate periodicity, before and after installation of a keylogger. [Bartlett11a, figure 3]
The paper “Low-Rate, Flow-Level Periodicity Detection”, by Genevieve Bartlett, John Heidemann, and Christos Papadopoulos is being presented at IEEE Global Internet 2011 in Shanghai, China this week. The full text is available at http://www.isi.edu/~johnh/PAPERS/Bartlett11a.pdf.

The abstract summarizes the work:

As desktops and servers become more complicated, they employ an increasing amount of automatic, non-user initiated communication. Such communication can be good (OS updates, RSS feed readers, and mail polling), bad (keyloggers, spyware, and botnet command-and-control), or ugly (adware or unauthorized peer-to-peer applications). Communication in these applications is often regular, but with very long periods, ranging from minutes to hours. This infrequent communication and the complexity of today’s systems make these applications difficult for users to detect and diagnose. In this paper we present a new approach to identify low-rate periodic network traffic and changes in such regular communication. We employ signal-processing techniques, using discrete wavelets implemented as a fully decomposed, iterated filter bank. This approach not only detects low-rate periodicities, but also identifies approximate times when traffic changed. We implement a self-surveillance application that externally identifies changes to a user’s machine, such as interruption of periodic software updates, or an installation of a keylogger.

The datasets used in this paper are available on request, and through PREDICT.

An expanded version of the paper is available as a technical report “Using low-rate flow periodicities in anomaly detection” by Bartlett, Heidemann and Papadopoulos. Technical Report ISI-TR-661, USC/Information Sciences Institute, Jul 2009. http://www.isi.edu/~johnh/PAPERS/Bartlett09a.pdf
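To make the filter-bank idea concrete, here is a small pure-Python sketch using the Haar wavelet, the simplest choice; the paper’s choice of wavelet and its “fully decomposed” structure may differ from this plain dyadic decomposition. It repeatedly splits binned packet counts into low-pass and high-pass halves, iterating on the low-pass branch, and reports energy per scale. A beacon every 8 bins puts energy into the three finest scales and none above them, so a new periodic process, or one that stops, reshapes this profile between time windows.

```python
import math


def haar_step(x):
    """One analysis stage: split x into low-pass (approx) and high-pass (detail) halves."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_filter_bank(x):
    """Iterate the Haar stage on the approximation branch until one sample is left.

    Returns (details, final_approx); details[0] is the finest scale.
    The input length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    approx = list(x)
    while len(approx) > 1:
        approx, d = haar_step(approx)
        details.append(d)
    return details, approx


def energy_by_scale(x):
    """Sum of squared detail coefficients at each scale, finest first."""
    details, _ = haar_filter_bank(x)
    return [sum(c * c for c in d) for d in details]


if __name__ == "__main__":
    # Hypothetical beacon: one packet every 8 time bins over 64 bins.
    beacon = [1 if t % 8 == 0 else 0 for t in range(64)]
    print([round(e, 3) for e in energy_by_scale(beacon)])
    # [4.0, 2.0, 1.0, 0.0, 0.0, 0.0]
```

The Haar filters are orthonormal, so the detail energies plus the final approximation’s energy add up to the input’s energy; a change in traffic shows up as energy moving between scales rather than appearing or vanishing.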


Paper at Global Internet 2010

Chris Wilcox presented a paper titled “Correlating Spam Activity with IP Address Characteristics” at Global Internet 2010. The paper uses LANDER survey data as well as spam data from eSoft.

Abstract: It is well known that spam bots mostly utilize compromised machines with certain address characteristics, such as dynamically allocated addresses, machines in specific geographic areas, and IP ranges from ASes with more tolerant spam policies. Such machines tend to be less diligently administered and may exhibit less stability, more volatility, and shorter uptimes. However, few studies have attempted to quantify how such spambot address characteristics compare with non-spamming hosts. Quantifying these characteristics may help provide important information for comprehensive spam mitigation. We use two large datasets, namely a commercial blacklist and an Internet-wide address visibility study, to quantify address characteristics of spam and non-spam networks. We find that spam networks exhibit significantly less availability and uptime, and higher volatility than non-spam networks. In addition, we conduct a collateral damage study of a common practice where an ISP blocks the entire /24 prefix if spammers are detected in that range. We find that such a policy blacklists a significant portion of legitimate mail servers belonging to the same prefix.
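The collateral-damage measurement reduces to grouping addresses by /24 prefix. A minimal sketch with Python’s ipaddress module, on made-up addresses from the RFC 5737 documentation ranges (not the paper’s data):

```python
import ipaddress


def slash24(addr):
    """Return the /24 prefix containing an IPv4 address."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)


def collateral_fraction(spam_ips, legit_mail_servers):
    """Fraction of legitimate mail servers that share a /24 with a known spammer."""
    if not legit_mail_servers:
        return 0.0
    blocked = {slash24(ip) for ip in spam_ips}
    hit = sum(1 for ip in legit_mail_servers if slash24(ip) in blocked)
    return hit / len(legit_mail_servers)


if __name__ == "__main__":
    spam = ["192.0.2.77", "198.51.100.5"]
    legit = ["192.0.2.25", "203.0.113.10", "198.51.100.200", "203.0.113.11"]
    print(collateral_fraction(spam, legit))
    # 0.5
```

In this toy input, blocking the two spam /24s also blocks 2 of the 4 legitimate servers.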


Paper at NPSec

Steve DiBenedetto presented a paper titled “Fingerprinting Custom Botnet Protocol Stacks” at NPSec 2010, in Kyoto, Japan.

Categories
Presentations

New Video About Address Utilization, and Allocations on the Map Browser

The ANT project released a video describing Internet address allocation and how we study address utilization with IPv4 censuses. Aniruddh Rao prepared this video, working with John Heidemann and Xue Cai.

a scene from the ANT video describing address allocation and census taking

We have also updated our web-based IPv4 address browser to show which organization each address block is allocated to. The map now visualizes the whois allocation data; we thank the five Regional Internet Registries for sharing this data with us and authorizing this visualization.

organizations in our Internet map

Finally, our web-based IPv4 address browser now has better time travel, with nearly 30 different censuses from Dec. 2005 to Nov. 2010, and we continue to update the map regularly.

Data collection for this work is through the LANDER project, and the map browser improvements are due to AMITE, both supported by DHS. Video preparation was supported by these projects and NSF through the MADCAT project.


New conference paper “Selecting Representative IP Addresses for Internet Topology Studies” to appear at IMC

The paper “Selecting Representative IP Addresses for Internet Topology Studies” (available at http://www.isi.edu/~xunfan/research/Fan10a.pdf) was accepted to appear at the ACM Internet Measurement Conference 2010 in Melbourne, Australia.

From the abstract:

An Internet hitlist is a set of addresses that cover and can represent the Internet as a whole. Hitlists have long been used in studies of Internet topology, reachability, and performance, serving as the destinations of traceroute or performance probes. Most early topology studies used manually generated lists of prominent addresses, but evolution and growth of the Internet make human maintenance untenable. Random selection scales to today’s address space, but most random addresses fail to respond. In this paper we present what we believe is the first automatic generation of hitlists informed by censuses of Internet addresses. We formalize the desirable characteristics of a hitlist: reachability, each representative responds to pings; completeness, they cover all the allocated IPv4 address space; and stability, list evolution is minimized when possible. We quantify the accuracy of our automatic hitlists, showing that only one-third of the Internet allows informed selection of representatives. Of informed representatives, 50–60% are likely to respond three months later, and we show that causes for non-responses are likely due to dynamic addressing (so no stable representative exists) or firewalls. In spite of these limitations, we show that the use of informed hitlists can add 1.7 million edge links (a 5% growth) to traceroute-based Internet topology studies. Our hitlists are available free-of-charge and are in use by several other research projects.

Citation: Xun Fan and John Heidemann. Selecting Representative IP Addresses for Internet Topology Studies. To appear in Proceedings of the ACM Internet Measurement Conference (IMC). Melbourne, Australia, ACM. November, 2010. http://www.isi.edu/~johnh/PAPERS/Fan10a.html
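The abstract lists reachability, completeness, and stability as goals but not the selection rule. As a hedged illustration, the sketch below scores each address in a /24 by its census history, weighting recent censuses more (the 0.5 decay is an arbitrary assumption), keeps the previous representative when it scores as well as the best (stability), and falls back to a default guess when no address ever responded, the uninformed case that, per the abstract, covers about two-thirds of the Internet. All names and parameters here are hypothetical.

```python
def score(history, decay=0.5):
    """Recency-weighted response count; history is oldest-to-newest booleans."""
    s, w = 0.0, 1.0
    for responded in reversed(history):
        if responded:
            s += w
        w *= decay
    return s


def pick_representative(block_history, previous=None, default_octet=1):
    """Choose one last octet to represent a /24 block.

    block_history maps a last octet to that address's census history.
    Returns (octet, informed); informed is False when no address in the
    block ever responded, in which case a default guess is used.
    """
    scores = {octet: score(h) for octet, h in block_history.items()}
    # Highest score wins; ties go to the lowest octet.
    best = max(scores, key=lambda o: (scores[o], -o), default=None)
    if best is None or scores[best] == 0:
        return default_octet, False
    # Stability: keep the previous choice if it still scores as well as the best.
    if previous is not None and scores.get(previous, 0) >= scores[best]:
        return previous, True
    return best, True


if __name__ == "__main__":
    history = {1: [False, False, False], 7: [True, True, True], 9: [True, True, False]}
    print(pick_representative(history, previous=9))
    # (7, True)
```

Here .9 stopped answering in the latest census, so the steadily responsive .7 replaces it; a block where nothing ever answered gets an uninformed default.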