Papers Publications

new conference paper “BotTalker: Generating Encrypted, Customizable C&C Traces” in HST 2015

The paper “BotTalker: Generating Encrypted, Customizable C&C Traces” will appear at the 14th annual IEEE Symposium on Technologies for Homeland Security (HST ’15) in April 2015 (available at

From the abstract:

Encrypted botnets have seen an increasing use in recent years. To enable research in detecting encrypted botnets researchers need samples of encrypted botnet traces with ground truth, which are very hard to get. Traces that are available are not customizable, which prevents testing under various controlled scenarios. To address this problem we introduce BotTalker, a tool that can be used to generate customized encrypted botnet communication traffic. BotTalker emulates the actions a bot would take to encrypt communication. It includes a highly configurable encrypted-traffic converter along with real, non-encrypted bot traces and background traffic. The converter is able to convert non-encrypted botnet traces into encrypted ones by providing customization along three dimensions: (a) selection of real encryption algorithm, (b) flow or packet level conversion, SSL emulation and (c) IP address substitution. To the best of our knowledge, BotTalker is the first work that provides users customized encrypted botnet traffic. In the paper we also apply BotTalker to evaluate the damage resulting from encrypted botnet traffic on a widely used botnet detection system – BotHunter and two IDS’ – Snort and Suricata. The results show that encrypted botnet traffic foils bot detection in these systems.

This work is advised by Christos Papadopoulos and supported by LACREND.

Software releases

Digit-1.1 release

Digit-1.1 has been released (available at from 2014-11-08 16:17:45). Digit is a DNS client-side tool that can perform DNS queries via different protocols such as UDP, TCP, and TLS. This tool is primarily designed to evaluate the client-side latency of using DNS over TCP/TLS, as described in the technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security” (

A README in the package has detailed instructions about how to use this software.
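For readers who have not looked at DNS over TCP on the wire, here is a minimal Python sketch of the one framing difference from UDP: each message is preceded by a two-byte length (RFC 1035). This is not Digit's code. The function names (`build_query`, `frame_for_tcp`, `query_over_tcp`) and the default message ID are our own choices for illustration, and TLS is left out.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query in RFC 1035 wire format (illustrative only)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as TCP transport requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, since TCP may deliver data in pieces."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        data += chunk
    return data


def query_over_tcp(server: str, message: bytes, port: int = 53,
                   timeout: float = 5.0) -> bytes:
    """Send one framed query over TCP and return the raw reply message."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(message))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

Wrapping a call to `query_over_tcp` in `time.perf_counter()` readings gives a crude client-side latency number. The same message sent over UDP would omit the length prefix.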

Publications Technical Report

new technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security (extended)”

We released a new technical report “T-DNS: Connection-Oriented DNS to Improve Privacy and Security (extended)”, ISI-TR-2014-693, available as

From the abstract:

DNS is the canonical protocol for connectionless UDP. Yet DNS today is challenged by eavesdropping that compromises privacy, source-address spoofing that results in denial-of-service (DoS) attacks on the server and third parties, injection attacks that exploit fragmentation, and size limitations that constrain policy and operational choices. We propose T-DNS to address these problems. It uses TCP to smoothly support large payloads and to mitigate spoofing and amplification for DoS. T-DNS uses transport-layer security (TLS) to provide privacy from users to their DNS resolvers and optionally to authoritative servers. Expectations about DNS suggest connections will balloon client latency and overwhelm server with state, but our evaluation shows costs are modest: end-to-end latency from TLS to the recursive resolver is only about 9% slower when UDP is used to the authoritative server, and 22% slower with TCP to the authoritative. With diverse traces we show that frequent connection reuse is possible (60–95% for stub and recursive resolvers, although half that for authoritative servers), and after connection establishment, we show TCP and TLS latency is equivalent to UDP. With conservative timeouts (20 s at authoritative servers and 60 s elsewhere) and conservative estimates of connection state memory requirements, we show that server memory requirements match current hardware: a large recursive resolver may have 24k active connections requiring about 3.6 GB additional RAM. We identify the key design and implementation decisions needed to minimize overhead: query pipelining, out-of-order responses, TLS connection resumption, and plausible timeouts.

This paper is a major revision of the prior technical report ISI-TR-2014-688. Since that work we have improved our understanding of the availability of TCP fast open and TLS resumption, and we have tightened our estimates on memory based on external reports (section 5.2). This additional information has allowed us to conduct additional experiments, improve our modeling, and provide a more accurate view of what is possible today; our estimates of latency and memory consumption are both lower than in our prior technical report as a result. We have also added additional information about packet size limitations (Figure 2), experiments evaluating DNSCrypt/DNSCurve (section 6.1), analysis of DTLS, and covered a broader range of RTTs in our experiments. We believe these additions strengthen our central claims: that connectionless DNS causes multiple problems and that T-DNS addresses those problems with modest increase in latency and memory suitable for current hardware.
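To make "query pipelining" and "out-of-order responses" concrete, here is a small Python sketch of the client-side bookkeeping they imply: several queries are in flight on one connection, and each reply is matched to its query by the 16-bit DNS message ID rather than by arrival order. The class and method names are hypothetical, and this is not taken from the T-DNS implementation.

```python
import struct


class PipelinedQueries:
    """Track in-flight queries on one connection, keyed by DNS message ID."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending = {}  # message ID -> query name still awaiting a reply

    def send(self, name: str) -> int:
        """Register a query and return the message ID to put in its header."""
        query_id = self._next_id
        if query_id in self._pending:
            raise RuntimeError("message ID still in flight")
        self._next_id = (self._next_id + 1) & 0xFFFF  # IDs are 16 bits
        self._pending[query_id] = name
        return query_id

    def receive(self, response: bytes) -> str:
        """Match a reply to its query using the ID in the first two bytes."""
        (response_id,) = struct.unpack("!H", response[:2])
        return self._pending.pop(response_id)

    def outstanding(self) -> int:
        """Number of queries sent but not yet answered."""
        return len(self._pending)


# Three queries go out back to back; replies arrive in a different order.
tracker = PipelinedQueries()
ids = [tracker.send(n) for n in ("a.example", "b.example", "c.example")]
fake_replies = [struct.pack("!HH", i, 0x8180) for i in reversed(ids)]
matched = [tracker.receive(reply) for reply in fake_replies]
```

Because matching is by ID, a slow lookup does not hold up the answers to queries sent after it on the same connection.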


new video “A Retrospective on an Australian Routing Event”

On 2012-02-23, hardware problems in a router at an Australian ISP (Dodo) caused it to announce many global routes to its own ISP (Telstra), and from there to others.

The result: for 45 minutes, millions of Australians lost international Internet connectivity.

While this problem was detected and corrected in less than an hour, this kind of problem can reoccur.

In this video we show the Internet address space (IPv4) from Sydney, Australia. Colors show estimated physical location (blue: North America, red: Europe, green: Asia). Addresses map to a Hilbert curve, and nearby addresses form squares. White boxes show routing changes, with bursts after 02:40 UTC.

In the visualization we see many routing changes across much of the Internet (the many white boxes), evidence of routing instability in Sydney.

A copy of this video is also available at Vimeo (some systems may have problems viewing the above embedded video, but Vimeo is a good alternative).

This video was made by Kaustubh Gadkari, John Heidemann, Cathie Olschanowsky, Christos Papadopoulos, Yuri Pradkin, and Lawrence Weikum at University of Southern California/Information Sciences Institute (USC/ISI) and Colorado State University/Computer Science (CSU).

This video uses software developed at USC/ISI and CSU:  Retro-future Time Travel, the LANDER IPv4 Web Address Browser, and BGPMon, the BGP logging and monitor.  Data from this video is available from BGPMon and PREDICT (or the authors).

This work was supported by DHS S&T (BGPMon, contract N66001-08-C-2028; LANDER, contract D08PC75599, admin. by SPAWAR; LACREND, contract FA8750-12-2-0344, admin. by AFRL; Retro-future, contract N66001-13-C-3001, admin. by SPAWAR), and NSF/CISE (BGPMon, grant CNS-1305404).  Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of funding and administrative agencies.

Papers Publications

new conference paper “Replay of Malicious Traffic in Network Testbeds” in IEEE Conf. on Technologies for Homeland Security (HST)

The paper “Replay of Malicious Traffic in Network Testbeds” (by Alefiya Hussain, Yuri Pradkin, and John Heidemann) will appear in the 13th IEEE Conference on Technologies for Homeland Security (HST) in Waltham, Mass. in Nov. 2013. The paper is available at

From the paper’s abstract:

In this paper we present tools and methods to integrate attack measurements from the Internet with controlled experimentation on a network testbed. We show that this approach provides greater fidelity than synthetic models. We compare the statistical properties of real-world attacks with synthetically generated constant bit rate attacks on the testbed. Our results indicate that trace replay provides fine time-scale details that may be absent in constant bit rate attacks. Additionally, we demonstrate the effectiveness of our approach to study new and emerging attacks. We replay an Internet attack captured by the LANDER system on the DETERLab testbed within two hours.

Data from the paper is available as DoS_DNS_amplification-20130617 from the authors or, and the tools are at deterlab).

Software releases

Software to Generate IP Hitlists with Hadoop Now Available

We are happy to release the set of map/reduce processing scripts that run in Hadoop to consume our Internet address censuses and output hitlists, as described in the paper “Selecting Representative IP Addresses for Internet Topology Studies”.

These scripts depend on our internal Hadoop configuration and so will require some modification to work elsewhere, but we make them available and encourage feedback about their use.
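As a rough picture of the map/reduce shape of this job, here is a plain-Python sketch (no Hadoop) that groups census observations by /24 block and picks one representative per block. The input format, the function names, and the scoring rule (most positive responses, ties to the lowest address) are simplifying assumptions for illustration. They are not the released scripts or the paper's exact selection method.

```python
import ipaddress
from collections import defaultdict


def map_phase(observations):
    """Emit (/24 prefix, (address, responded)) for each census observation.

    `observations` is assumed to be an iterable of (address, responded) pairs.
    """
    for address, responded in observations:
        prefix = ipaddress.ip_network(f"{address}/24", strict=False)
        yield str(prefix), (address, responded)


def reduce_phase(pairs):
    """For each /24, choose the address with the most positive responses."""
    counts = defaultdict(lambda: defaultdict(int))
    for prefix, (address, responded) in pairs:
        counts[prefix][address] += 1 if responded else 0
    hitlist = {}
    for prefix, per_address in counts.items():
        # Highest count wins; ties go to the numerically lowest address.
        hitlist[prefix] = max(
            per_address,
            key=lambda a: (per_address[a], -int(ipaddress.ip_address(a))),
        )
    return hitlist


# Made-up observations using documentation address ranges.
observations = [
    ("192.0.2.1", True), ("192.0.2.1", False),
    ("192.0.2.7", True), ("192.0.2.7", True),
    ("198.51.100.9", False), ("198.51.100.20", True),
]
hitlist = reduce_phase(map_phase(observations))
```

In a real Hadoop job the grouping by prefix happens in the shuffle between map and reduce. Here the reducer does it in memory for brevity.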


IP Geolocation in our Browsable IPv4 Map

We’re happy to announce that our browsable Internet map at now includes IP geolocation.

We plot the latitude and longitude of each IP address around the world as a specific color, placing them on our IPv4 map (the zoomable Hilbert curve).  Thus we can show how blocks of IPv4 addresses map (above) to the globe (below).

AMITE Geolocation of IPv4 as of 2012-06-28
Hue and lightness to longitude and latitude.

On the IP map, we show latitude/longitude by color. For each address, the longitude is the hue (the colors around the rainbow), so North America is blue; South America, fuchsia; Europe and Africa, red; and Asia to Australia, yellow to green. The latitude controls lightness, so things north of the equator are darker, while those south of the equator are lighter. Thus Japan is dark green, while Australia is teal, and Scandinavia is dark red, while South Africa is orange. (We have released the source code to do this mapping with a BSD license.)
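A hue-and-lightness mapping of this kind can be sketched in a few lines of Python. The constants below (hue equal to longitude in degrees around the color wheel, lightness varying linearly from 0.2 at the north pole to 0.8 at the south pole) are guesses chosen to roughly reproduce the colors described above. They are not the values used in the released source code.

```python
import colorsys


def geo_to_rgb(latitude: float, longitude: float) -> tuple:
    """Map a location to an (r, g, b) color with components in 0..255.

    Assumed mapping: longitude -> hue, latitude -> lightness, full saturation.
    """
    # Longitude 0 (Greenwich) becomes hue 0 (red); hues wrap around at 360.
    hue = (longitude % 360.0) / 360.0
    # North of the equator is darker, south is lighter.
    lightness = 0.5 - 0.3 * (latitude / 90.0)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


# Approximate coordinates, for illustration only.
japan = geo_to_rgb(36.0, 140.0)
australia = geo_to_rgb(-25.0, 135.0)
north_america = geo_to_rgb(40.0, -100.0)
```

With these assumed constants Japan comes out green-dominant and fairly dark, Australia is a lighter green, and central North America is blue-dominant.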

The IP map shows all 4 billion IPv4 addresses on the Hilbert curve. We have discussed this mapping before (see our poster).

Our IP map is zoomable and draggable, so one can look at particular regions of interest. For example, here is 128/8, including ISI (in Los Angeles, dark blue), between UC San Diego (also dark blue) and University of Maryland (US east coast, so purple), while the Finnish University of Helsinki is dark brown, and the Australian University of Melbourne is lime green.
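For readers unfamiliar with the Hilbert curve, the following Python sketch uses the standard distance-to-coordinate conversion to place the 256 /8 blocks on a 16x16 grid. The orientation (where block 0 sits and which way the curve first turns) is whatever this textbook algorithm produces, and may differ from the layout of our map. The helper name `slash8_cell` is made up for this example.

```python
def hilbert_d2xy(side: int, d: int) -> tuple:
    """Convert distance d along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two; d ranges over 0 .. side*side - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def slash8_cell(address: str) -> tuple:
    """Grid cell of the /8 block containing a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    return hilbert_d2xy(16, first_octet)


cells = [hilbert_d2xy(16, block) for block in range(256)]
```

Two properties make the curve useful for address maps: consecutive blocks always land in neighboring cells, and any aligned group of four blocks (such as 128/8 through 131/8) fills a 2x2 square, so CIDR prefixes appear as squares or rectangles.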

Annotated IPv4 geolocation

Our geolocation data comes from three sources:

All of these geolocation sources have varying levels of accuracy; however, we hope that the ability to visually relate IP addresses (on the Hilbert curve) with geolocation (via latitude and longitude as shown by color) provides a fresh look at IP addresses and their locations.

This geolocation work is due to Zi Hu, Yuri Pradkin, and John Heidemann. This work and visualization have been supported by the AMITE project through DHS, and the data (both processed geolocation results and raw data if you can improve our accuracy) will be available through the LANDER project’s datasets and the PREDICT program.


Papers Publications

new conference paper “Towards an AS-to-Organization Map” to appear at IMC

The paper “Towards an AS-to-Organization Map” was accepted by IMC’10 in Melbourne, Australia (available at

From the abstract:

An understanding of Internet topology is central to answer various questions ranging from network resilience to peer selection or data center location. While much of prior work has examined AS-level connectivity, meaningful and relevant results from such an abstract view of Internet topology have been limited. For one, semantically, AS relationships capture business relationships and not physical connectivity. Additionally, many organizations often use multiple ASes, either to implement different routing policies, or as legacies from mergers and acquisitions. In this paper, we move beyond the traditional AS graph view of the Internet to define the problem of AS-to-organization mapping. We describe our initial steps at automating the capture of the rich semantics inherent in the AS-level ecosystem where routing and connectivity intersect with organizations. We discuss preliminary methods that identify multi-AS organizations from WHOIS data and illustrate the challenges posed by the quality of the available data and the complexity of real-world organizational relationships.

Citation: Xue Cai, John Heidemann, Balachander Krishnamurthy, and Walter Willinger. Towards an AS-to-Organization Map. In Proceedings of the ACM Internet Measurement Conference, p. to appear. Melbourne, Australia, ACM. November, 2010.
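To give a flavor of what identifying multi-AS organizations from WHOIS data can look like, here is a small Python sketch that clusters AS numbers sharing either a normalized organization name or a contact e-mail domain, using union-find. The record format, the two matching keys, and the sample records are invented for illustration. They are not the paper's method or data.

```python
from collections import defaultdict


def cluster_ases(records):
    """Group AS numbers that share an organization name or e-mail domain.

    `records` is assumed to be an iterable of (asn, org_name, contact_email).
    Returns a sorted list of sorted AS-number lists, one per cluster.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for asn, org_name, email in records:
        # Link each AS to the attribute values it carries; ASes that share
        # a value end up in the same set.
        union(("as", asn), ("org", org_name.strip().lower()))
        union(("as", asn), ("domain", email.rsplit("@", 1)[-1].lower()))

    clusters = defaultdict(set)
    for node in list(parent):
        if node[0] == "as":
            clusters[find(node)].add(node[1])
    return sorted(sorted(members) for members in clusters.values())


# Made-up records using documentation AS numbers and example domains.
records = [
    (64496, "Example Networks", "noc@example.com"),
    (64497, "EXAMPLE NETWORKS ", "hostmaster@example.com"),
    (64498, "Example Hosting", "noc@example.com"),
    (64499, "Other Org", "admin@example.net"),
]
clusters = cluster_ases(records)
```

Real WHOIS data is much messier than this: shared registrar contacts or free e-mail domains would wrongly merge unrelated organizations under a rule this simple, which is the kind of data-quality challenge the abstract refers to.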

Announcements Collaborations Software releases

ANT extensions for bzip2-splitting to appear in Hadoop

The ANT project is happy to announce that our extensions to Hadoop to support splitting of bzip2-compressed files have been accepted to appear in the next Hadoop release (which will be 0.21.0).

Support for compression is important in map/reduce because it reduces the amount of I/O, and because important input files (for us, our Internet address censuses) are provided in compressed format.

Splitting is important in map/reduce, because splitting allows many computers to process parts of a few big files.  Since the whole point of Hadoop and map/reduce is processing big files (for us, 4GB or more) with many computers (for us, dozens to hundreds), splitting is really essential.

Until now, Hadoop did not support splitting of compressed files. Instead, if input data was compressed, you got at most one computer per file. Some work-arounds were possible, but they were basically unpleasant, and often required that one rewrite all the input data in some other format.

Our extensions (see HADOOP-4012 and MAPREDUCE-830, plus HADOOP-3646 that went into 0.19.0) support Hadoop execution over bzip2 files with automatic splitting.  Getting this done was trickier than one might expect:  Hadoop really wants to decide where to split files, yet bzip2 can only support splits at specific locations that are different, and users don’t care about either of these but instead only about their record boundaries.  Fortunately, we were able to align all of these constraints, and deal with the corner cases that inevitably arise.  (What if the bzip2 marker appears in normal data?  What happens when markers exactly align, or are off-by-one?)
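To show why bzip2 dictates its own split points, here is a short Python sketch that scans a compressed stream for the 48-bit block-header magic number (0x314159265359). Blocks are bit-aligned rather than byte-aligned, so the scan has to test every bit offset. This is only an illustration of the idea using Python's standard `bz2` module. It is not the Hadoop patch, and the function name is our own.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # bzip2 block-header magic (48 bits)
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_block_markers(data: bytes) -> list:
    """Return bit offsets where the block-header magic appears in `data`."""
    offsets = []
    window = 0  # the most recent 48 bits seen
    for byte_index, byte in enumerate(data):
        for bit_in_byte in range(8):
            bit = (byte >> (7 - bit_in_byte)) & 1
            window = ((window << 1) | bit) & MASK
            bits_seen = byte_index * 8 + bit_in_byte + 1
            if bits_seen >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bits_seen - MAGIC_BITS)
    return offsets


# A small compression level gives small blocks, so this input spans several.
compressed = bz2.compress(b"one record per line\n" * 20000, compresslevel=1)
markers = find_block_markers(compressed)
```

The first marker sits right after the 4-byte stream header, at bit offset 32. Any later match is only a candidate split point: the same 48 bits can occur by chance inside compressed data, which is one of the corner cases mentioned above, and a record reader still has to skip forward to the next record boundary after decompressing from a marker.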

Abdul Qadeer did this work in 2008, working with Yuri Pradkin and me (John Heidemann), and continued to work with the patch through its getting committed. We especially thank Chris Douglas at Yahoo for shepherding the patch through the Hadoop bug tracking system, including helping clean it up and add test cases. And we thank Doug Cutting for initially suggesting bzip2 as a splittable compression scheme.

This work was supported by NSF through the MR-Net research project (CNS-0823774).