Categories
Uncategorized

congratulations to Abdul Qadeer for his PhD

I would like to congratulate Dr. Abdul Qadeer for defending his PhD at the University of Southern California in March 2021 and completing his doctoral dissertation “Efficient Processing of Streaming Data in Multi-User and Multi-Abstraction Workflows”.

From the abstract:

Abdul Qadeer after his defense.

Ever-increasing data and evolving processing needs force enterprises to scale-out expensive computational resources to prioritize processing for timely results. Teams process their organization’s data either independently or using ad hoc sharing mechanisms. Often different users start with the same data and the same initial stages (decrypt, decompress, clean, anonymize). As their workflows evolve, later stages often diverge, and different stages may work best with different abstractions. The result is workflows with some overlap, some variations, and multiple transitions where data handling changes between continuous, windowed, and per-block. The system processing this diverse, multi-user, multi-abstraction workflow should be efficient and safe, but also must cope with fault recovery.

Analytics from multiple users can cause redundant processing and data, or encounter performance anomalies due to skew. Skew arises due to static or dynamic imbalance in the workflow stages. Both redundancy and skew waste compute resources and add latency to results. When users bridge between multiple abstractions, such as from per-block processing to windowed processing, they often employ custom code. These transitions can be error prone due to corner cases, can easily add latency as an inefficiency, and custom code is often a source of errors and maintenance difficulty. We need new solutions to manage the above challenges and to expose opportunities for data sharing explicitly. Our thesis is: new methods enable efficient processing of multi-user and multi-abstraction workflows of streaming data. We present two new methods for efficient stream processing—optimizations for multi-user workflows, and multiple abstractions for application coverage and efficient bridging.

These algorithms use a pipeline-graph to detect duplication of code and data across multiple users and cleanly delineate workflow stages for skew management. The pipeline-graph is our job description language that allows developers to specify their need easily and enables our system to automatically detect duplication and manage skew. The pipeline-graph acts as a shared canvas for collaboration amongst users to extend each other’s work. To efficiently implement our deduplication and skew management algorithms, we present streaming data to processing stages as fixed-sized but large blocks. Large-blocks have low meta-data overhead per user, provide good parallelism, and help with fault recovery.

Our second method enables applications to use a different abstraction on a different workflow stage. We provide three key abstractions and show that they cover many classes of analytics and our framework can bridge them efficiently. We provide Block-Streaming, Windowed-Streaming, and Stateful-Streaming abstractions. Block-Streaming is suitable for single-pass applications that care about temporal or spatial locality. Windowed-Streaming allows applications to process accumulated data (time-aligned blocks to sync with external information) and reductions like summation, averages, or other MapReduce-style analytics. We believe our three abstractions allow many classes of analytics and enable processing of one block, many blocks, or infinite stream. Plumb allows multiple abstractions in different parts of the workflow and provides efficient bridging between them so that users could make complex analytics from individual stages without worrying about data movement.

Our methods aim for good throughput, low latency, and clean and easy-to-use support for more applications to achieve better efficiency than our prior hand-tuned but often brittle system. The Plumb framework is the implementation of our solutions and a testbed to validate them. We use real-world workloads from the B-Root DNS domain to demonstrate effectiveness of our solutions. Our processing deduplication increases throughput up to $6\times$, reduces storage by 75%, as compared to their pre-Plumb counterparts. Plumb reduces CPU wastage due to structural skew up to half and reduces latency due to computational skew by 50%. Plumb has cut per-block latency by 74% and latency of daily statistics by 97%, while reducing code size by 58% and lowering manual intervention to handle problems by 73% as compared to pre-Plumb system.

The operational use of Plumb for the B-Root service provides a multi-year validation of our design choices under many traffic conditions. Over the last three years, Plumb has processed more than 12PB of DNS packet data and daily statistics. We show that our abstractions apply to many applications in the domain of networking big-data and beyond.

Categories
Publications Students

congratulations to Lan Wei for her new PhD

I would like to congratulate Dr. Lan Wei for defending her PhD in September 2020 and completing her doctoral dissertation “Anycast Stability, Security and Latency in The Domain Name System (DNS) and Content Deliver Networks (CDNs)” in December 2020.

From the abstract:

Clients’ performance is important for both Content-Delivery Networks (CDNs) and the Domain Name System (DNS). Operators would like the service to meet expectations of their users. CDNs providing stable connections will prevent users from experiencing downloading pause from connection breaks. Users expect DNS traffic to be secure without being intercepted or injected. Both CDN and DNS operators care about a short network latency, since users can become frustrated by slow replies.


Many CDNs and DNS services (such as the DNS root) use IP anycast to bring content closer to users. Anycast-based services announce the same IP address(es) from globally distributed sites. In an anycast infrastructure, Internet routing protocols will direct users to a nearby site naturally. The path between a user and an anycast site is formed on a hop-to-hop basis—at each hop} (a network device such as a router), routing protocols like Border Gateway Protocol (BGP) makes the decision about which next hop to go to. ISPs at each hop will impose their routing policies to influence BGP’s decisions. Without globally knowing (also unable to modify) the distributed information of BGP routing table of every ISP on the path, anycast infrastructure operators are unable to predict and control in real-time which specific site a user will visit and what the routing path will look like. Also, any change in routing policy along the path may change both the path and the site visited by a user. We refer to such minimal control over routing towards an anycast service, the uncertainty of anycast routing. Using anycast spares extra traffic management to map users to sites, but can operators provide a good anycast-based service without precise control over the routing?


This routing uncertainty raises three concerns: routing can change, breaking connections; uncertainty about global routing means spoofing can go undetected, and lack of knowledge of global routing can lead to suboptimal latency. In this thesis, we show how we confirm the stability, how we confirm the security, and how we improve the latency of anycast to answer these three concerns. First, routing changes can cause users to switch sites, and therefore break a stateful connection such as a TCP connection immediately. We study routing stability and demonstrate that connections in anycast infrastructure are rarely broken by routing instability. Of all vantage points (VPs), fewer than 0.15% VP’s TCP connections frequently break due to timeout in 5s during all 17 hours we observed. We only observe such frequent TCP connection break in 1 service out of all 12 anycast services studied. A second problem is DNS spoofing, where a third-party can intercept the DNS query and return a false answer. We examine DNS spoofing to study two aspects of security–integrity and privacy, and we design an algorithm to detect spoofing and distinguish different mechanisms to spoof anycast-based DNS. We show that DNS spoofing is uncommon, happening to only 1.7% of all VPs, although increasing over the years. Among all three ways to spoof DNS–injections, proxies, and third-party anycast site (prefix hijack), we show that third-party anycast site is the least popular one. Last, diagnosing poor latency and improving the latency can be difficult for CDNs. We develop a new approach, BAUP (bidirectional anycast unicast probing), which detects inefficient routing with better routing replacement provided. We use BAUP to study anycast latency. By applying BAUP and changing peering policies, a commercial CDN is able to significantly reduce latency, cutting median latency in half from 40ms to 16ms for regional users.

Lan defended her PhD when USC was on work-from-home due to COVID-19; she is the third ANT student with a fully on-line PhD defense.

Categories
Data

fighting bit rot in log-term data archives with babarchive

As part of research at ANT we generate a lot of data, and our goal is to keep it safe even in the face of an imperfect world of data storage.

When we say a lot, we mean hundreds of terabytes: As of May 2020, we have releasable 860 datasets making up 134 TB of storage (510TB if we uncompressed it). We provide this data at no cost to researchers, and since 2008 we’ve provided 2049 datasets (338 TB, or 1.1PB if uncompressed!) to 406 researchers!

These datasets range from packet captures of “normal” traffic, to curated captures of DDoS attacks, as well as dozens of research paper-specific datasets, 16 years of Internet censuses and 7 years of Internet outages, plus target lists for IPv4 that are regularly used for traffic studies and tools like Verfploeter anycast mapping.

As part of keeping this data, our goal is to keep this data. We want to fight bit rot and data loss. That means the RAID-6 for primary storage, with monitoring and timely disk replacement. It means off site backup (with a big thanks to our collaborators at Colorado State University, Christos Papadopoulos, Craig Partridge, and Dimitrios Kounalakis for their help). And it means watching bits to make sure they don’t spontaneously change.

One might think that bits at rest stay at rest, but… not always. We’ve seen three times when disks have spontaneously changed a byte over the last 20 years. In 2011 and 2012 I had bit flips on my personal files, and in 2020 we had a byte flip on a packet capture.

How do we know? We have application-level checksums of every file, and every day we take 10 minutes to check at least one dataset against its checksums. (Over time, we cover all datasets and then start all over.)

Our checksumming software is babarchive–our own wrapper around collecting SHA-256 checksums over a directory tree. We encourage other researchers interested in long-term data curation to carry out active content monitoring (in addition to backups and RAID).

A huge thanks to our research sponsors: DHS (through the LANDER, LACREND, and LACANIC projects), NSF (through the MADCAT, MR-Net), and DARPA (through GAWSEED).

Categories
Students

congratulations to Calvin Ardi for his new PhD

I would like to congratulate Dr. Calvin Ardi for defending his PhD in April 2020 and completing his doctoral dissertation “Improving Network Security through Collaborative Sharing” in June 2020.

From the abstract:

Calvin Ardi and John Heidemann (inset), after Calvin filed his PhD dissertation.

As our world continues to become more interconnected through the
Internet, cybersecurity incidents are correspondingly increasing in
number, severity, and complexity. The consequences of these attacks
include data loss, financial damages, and are steadily moving from the
digital to the physical world, impacting everything from public
infrastructure to our own homes. The existing mechanisms in
responding to cybersecurity incidents have three problems: they
promote a security monoculture, are too centralized, and are too slow.


In this thesis, we show that improving one’s network security strongly
benefits from a combination of personalized, local detection, coupled
with the controlled exchange of previously-private network information
with collaborators. We address the problem of a security monoculture
with personalized detection, introducing diversity by tailoring to the
individual’s browsing behavior, for example. We approach the problem
of too much centralization by localizing detection, emphasizing
detection techniques that can be used on the client device or local
network without reliance on external services. We counter slow
mechanisms by coupling controlled sharing of information with
collaborators to reactive techniques, enabling a more efficient
response to security events.


We prove that we can improve network security by demonstrating our
thesis with four studies and their respective research contributions
in malicious activity detection and cybersecurity data sharing. In
our first study, we develop Content Reuse Detection, an approach to
locally discover and detect duplication in large corpora and apply our
approach to improve network security by detecting “bad
neighborhoods” of suspicious activity on the web. Our second study
is AuntieTuna, an anti-phishing browser tool that implements personalized,
local detection of phish with user-personalization and improves
network security by reducing successful web phishing attacks. In our
third study, we develop Retro-Future, a framework for controlled information
exchange that enables organizations to control the risk-benefit
trade-off when sharing their previously-private data. Organizations
use Retro-Future to share data within and across collaborating organizations,
and improve their network security by using the shared data to
increase detection’s effectiveness in finding malicious activity.
Finally, we present AuntieTuna2.0 in our fourth study, extending the proactive
detection of phishing sites in AuntieTuna with data sharing between friends.
Users exchange previously-private information with collaborators to
collectively build a defense, improving their network security and
group’s collective immunity against phishing attacks.

Calvin defended his PhD when USC was on work-from-home due to COVID-19; he is the second ANT student with a fully on-line PhD defense.

Categories
Papers Publications

new conference paper “LDplayer: DNS Experimentation at Scale” at ACM IMC 2018

We have published a new paper LDplayer: DNS Experimentation at Scale by Liang Zhu and John Heidemann, in the ACM Internet Measurements Conference (IMC 2018) in Boston, Mass., USA.

Figure 14a: Evaluation of server memory with different TCP timeouts and minimal RTT (<1 ms). Trace: B-Root-17a. Protocol: TLS

From the abstract:

DNS has evolved over the last 20 years, improving in security and privacy and broadening the kinds of applications it supports. However, this evolution has been slowed by the large installed base and the wide range of implementations. The impact of changes is difficult to model due to complex interactions between DNS optimizations, caching, and distributed operation. We suggest that experimentation at scale is needed to evaluate changes and facilitate DNS evolution. This paper presents LDplayer, a configurable, general-purpose DNS experimental framework that enables DNS experiments to scale in several dimensions: many zones, multiple levels of DNS hierarchy, high query rates, and diverse query sources. LDplayer provides high fidelity experiments while meeting these requirements through its distributed DNS query replay system, methods to rebuild the relevant DNS hierarchy from traces, and efficient emulation of this hierarchy on minimal hardware. We show that a single DNS server can correctly emulate multiple independent levels of the DNS hierarchy while providing correct responses as if they were independent. We validate that our system can replay a DNS root traffic with tiny error (± 8 ms quartiles in query timing and ± 0.1% difference in query rate). We show that our system can replay queries at 87k queries/s while using only one CPU, more than twice of a normal DNS Root traffic rate. LDplayer’s trace replay has the unique ability to evaluate important design questions with confidence that we capture the interplay of caching, timeouts, and resource constraints. As an example, we demonstrate the memory requirements of a DNS root server with all traffic running over TCP and TLS, and identify performance discontinuities in latency as a function of client RTT.

Categories
Software releases

release of the cryptopANT library for IP address anonymization

cryptopANT v1.0 (stable) has been released (available at https://ant.isi.edu/software/cryptopANT/)

cryptopANT is a C library for IP address anonymization using crypto-PAn algorithm, originally defined by Georgia Tech. The library supports anonymization and de-anonymization (provided you possess a secret key) of IPv4, IPv6, and MAC addresses. The software release includes sample utilities that anonymize IP addresses in text, but we expect most use of the library will be as part of other programs. The Crypto-PAn anonymization scheme was developed by Xu, Fan, Ammar, and Moon at Georgia Tech and described in“Prefix-Preserving IP Address Anonymization”, Computer Networks, Volume 46, Issue 2, 7 October 2004, Pages 253-272, Elsevier. Our library is independent (and not binary compatible) of theirs.

Despite this being the first release as a library, the code has been in use for more than 10 years in other tools.  It had been part of our other software packages, such as dag_scrubber for years.  By popular request, we’re finally releasing it as a separate package.

The library is packaged with an example binary (scramble_ips) that can be used to anonymize text ips.

See also the crypto-PAn page at Georgia Tech here.

Categories
Announcements Projects

new project “Interactive Internet Outages Visualization to Assess Disaster Recovery”

We are happy to announce a new project, Interactive Internet Outages Visualization to Assess Disaster Recovery.   This project is supporting the use of Internet outage measurements to help understand and recover from natural disasters. It will expand on the visualization of Internet outages found at https://ant.isi.edu/outage/world/.

This visualization was initially seeded by a Michael Keston research grant here at ISI, and the outage measurement techniques and ongoing data collection has been developed with the support of DHS (the LANDER-2007, LACREND, LACANIC, and Retro-future Bridge and Outages projects).

Categories
Papers Publications

new conference paper “Detecting ICMP Rate Limiting in the Internet” in PAM 2018

We have published a new conference “Detecting ICMP Rate Limiting in the Internet” in PAM 2018 (the Passive and Active Measurement Conference) in Berlin, Germany.

Figure 4 from [Guo18a] Confirming a block is rate limited with additional probing
Figure 4 from [Guo18a] confirming a bock is rate limited, comparing experimental results with models of rate-limited and non-rate-limited behavior.
From the abstract of our conference paper:

Comparing model and experimental effects of rate limiting (Figure 4 from [Guo18a] )
ICMP active probing is the center of many network measurements. Rate limiting to ICMP traffic, if undetected, could distort measurements and create false conclusions. To settle this concern, we look systematically for ICMP rate limiting in the Internet. We create FADER, a new algorithm that can identify rate limiting from user-side traces with minimal new measurement traffic. We validate the accuracy of FADER with many different network configurations in testbed experiments and show that it almost always detects rate limiting. With this confidence, we apply our algorithm to a random sample of the whole Internet, showing that rate limiting exists but that for slow probing rates, rate-limiting is very rare. For our random sample of 40,493 /24 blocks (about 2% of the responsive space), we confirm 6 blocks (0.02%!) see rate limiting at 0.39 packets/s per block. We look at higher rates in public datasets and suggest that fall-off in responses as rates approach 1 packet/s per /24 block is consistent with rate limiting. We also show that even very slow probing (0.0001 packet/s) can encounter rate limiting of NACKs that are concentrated at a single router near the prober.

Datasets we used in this paper are all public. ISI Internet Census and Survey data (including it71w, it70w, it56j, it57j and it58j census and survey) are available at https://ant.isi.edu/datasets/index.html. ZMap 50-second experiments data are from their WOOT 14 paper and can be obtained from ZMap authors upon request.

This conference report is joint work of Hang Guo and  John Heidemann from USC/ISI.

Categories
DNS Papers Presentations Publications

New paper and talk “Enumerating Privacy Leaks in DNS Data Collected above the Recursive” at NDSS DNS Privacy Workshop 2018

Basileal Imana presented the paper “Enumerating Privacy Leaks in DNS Data Collected  above the Recursive” at NDSS DNS Privacy Workshop in San Diego, California, USA on February 18, 2018. Talk slides are available at https://ant.isi.edu/~imana/presentations/Imana18b.pdf and paper is available at  https://ant.isi.edu/~imana/papers/Imana18a.pdf, or can be found at the DNS privacy workshop page.

From the abstract:

Threat model for enumerating leaks above the recursive (left). Percentage of four categories of queries containing IPv4 addresses in their QNAMEs. (right)

As with any information system consisting of data derived from people’s actions, DNS data is vulnerable to privacy risks. In DNS, users make queries through recursive resolvers to authoritative servers. Data collected below (or in) the recursive resolver directly exposes users, so most prior DNS data sharing focuses on queries above the recursive resolver. Data collected above a recursive resolver has largely been seen as posing a minimal privacy risk since recursive resolvers typically aggregate traffic for many users, thereby hiding their identity and mixing their traffic. Although this assumption is widely made, to our knowledge it has not been verified. In this paper we re-examine this assumption for DNS traffic above the recursive resolver. First, we show that two kinds of information appear in query names above the recursive resolver: IP addresses and sensitive domain names, such as those pertaining to health, politics, or personal or lifestyle information. Second, we examine how often these classes of potentially sensitive names appear in Root DNS traffic, using 48 hours of B-Root data from April 2017.

This is a joint work by Basileal Imana (USC), Aleksandra Korolova (USC) and John Heidemann (USC/ISI).

The DITL dataset (ITL_B_Root-20170411) used in this work is available from DHS IMPACT, the ANT project, and through DNS-OARC.

Categories
Announcements In-the-news

news story about measuring Internet outages

PCMag released a news story on January 3, 2018 about our measuring Internet outages, including discussion about the 2017 hurricanes like Irma, and our new worldwide outage browser.