Categories
Papers Publications

New conference paper:  Inferring Changes in Daily Human Activity from Internet Response

Our new paper “Inferring Changes in Daily Human Activity from Internet Response” will appear at The 2023 Internet Measurement Conference (IMC 2023).

From the abstract:

Network traffic is often diurnal, with some networks peaking during the workday and many homes during evening streaming hours. Monitoring systems consider diurnal trends for capacity planning and anomaly detection. In this paper, we reverse this inference and use diurnal network trends and their absence to infer human activity. We draw on existing and new ICMP echo-request scans of more than 5.2M /24 IPv4 networks to identify diurnal trends in IP address responsiveness. Some of these networks are change-sensitive, with diurnal patterns correlating with human activity. We develop algorithms to clean this data, extract underlying trends from diurnal and weekly fluctuation, and detect changes in that activity. Although firewalls hide many networks, and Network Address Translation often hides human trends, we show about 168k to 330k (3.3–6.4% of the 5.2M) /24 IPv4 networks are change-sensitive. These blocks are spread globally, representing some of the most active 60% of 2 × 2◦ geographic gridcells, regions that include 98.5% of ping-responsive blocks. Finally, we detect interesting changes in human activity. Reusing existing data allows our new algorithm to identify changes, such as Work-from-Home due to the global reaction to the emergence of Covid-19 in 2020. We also see other changes in human activity, such as national holidays and government-mandated curfews. This ability to detect trends in human activity from the Internet data provides a new ability to understand our world, complementing other sources of public information such as news reports and wastewater virus observation.

The human-activity changes for 2020h1 by continent. It shows the global count of downward trends in changes for each continent over six months. Although aggregated, we see several trends. First, the large percentage of changes in Asia around 2020-01-20 (at (i)) might correspond to the Spring Festival, celebrated widely in many Asian countries and regions. Most of the rest of the world showed significant changes around 2020-03-20 (at (ii) and (iii)), corresponding to initial Covid pandemic control measures.

This paper is a joint work of Xiao Song from USC, Guillermo Baltra from USC, and John Heidemann from USC/ISI. Datasets from this paper can be found at https://ant.isi.edu/datasets/ip_accumulation. This work was supported by NSF (MINCEQ, NSF 2028279; EIEIO CNS-2007106; and InternetMap (CSN-2212480).

Categories
Papers Publications

New paper: Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest

Our new paper “Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest” will appear at The 26th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW 2023).

From the abstract:

Overview of our proposed platform-supported framework for auditing relevance estimators while protecting the privacy of audit participants and the business interests of platforms.

Concerns of potential harmful outcomes have prompted proposal of legislation in both the U.S. and the E.U. to mandate a new form of auditing where vetted external researchers get privileged access to social media platforms. Unfortunately, to date there have been no concrete technical proposals to provide such auditing, because auditing at scale risks disclosure of users’ private data and platforms’ proprietary algorithms. We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation. The first contribution of our work is to enumerate the challenges and the limitations of existing auditing methods to implement these policies at scale. Second, we suggest that limited, privileged access to relevance estimators is the key to enabling generalizable platform-supported auditing of social media platforms by external researchers. Third, we show platform-supported auditing need not risk user privacy nor disclosure of platforms’ business interests by proposing an auditing framework that protects against these risks. For a particular fairness metric, we show that ensuring privacy imposes only a small constant factor increase (6.34x as an upper bound, and 4x for typical parameters) in the number of samples required for accurate auditing. Our technical contributions, combined with ongoing legal and policy efforts, can enable public oversight into how social media platforms affect individuals and society by moving past the privacy-vs-transparency hurdle.

A 2-minute video overview of the work can be found here.

This paper is a joint work of Basileal Imana from USC, Aleksandra Korolova from Princeton University, and John Heidemann from USC/ISI.

Categories
DNS Internet Papers Publications Uncategorized

new paper “Defending Root DNS Servers Against DDoS Using Layered Defenses” at COMSNETS 2023 (best paper!)

Our paper titled “Defending Root DNS Servers Against DDoS Using Layered Defenses” will appear at COMSNETS 2023 in January 2023. In this work, by ASM Rizvi, Jelena Mirkovic, John Heidemann, Wes Hardaker, and Robert Story, we design an automated system named DDIDD with multiple filters to handle an ongoing DDoS attack on a DNS root server. We evaluated ten real-world attack events on B-root and showed DDIDD could successfully mitigate these attack events. We released the datasets for these attack events on our dataset webpage (dataset names starting with B_Root_Anomaly).

Update in January: we are happy to announce that this paper was awarded Best Paper for COMSNETS 2023! Thanks for the recognition.

Table II from [Rizvi23a] shows the performance of each individual filter, with near-best results in bold. This table shows that one filter covers all cases, but together in DDIDD they provide very tood defense.

From the abstract:

Distributed Denial-of-Service (DDoS) attacks exhaust resources, leaving a server unavailable to legitimate clients. The Domain Name System (DNS) is a frequent target of DDoS attacks. Since DNS is a critical infrastructure service, protecting it from DoS is imperative. Many prior approaches have focused on specific filters or anti-spoofing techniques to protect generic services. DNS root nameservers are more challenging to protect, since they use fixed IP addresses, serve very diverse clients and requests, receive predominantly UDP traffic that can be spoofed, and must guarantee high quality of service. In this paper we propose a layered DDoS defense for DNS root nameservers. Our defense uses a library of defensive filters, which can be optimized for different attack types, with different levels of selectivity. We further propose a method that automatically and continuously evaluates and selects the best combination of filters throughout the attack. We show that this layered defense approach provides exceptional protection against all attack types using traces of real attacks from a DNS root nameserver. Our automated system can select the best defense within seconds and quickly reduce the traffic to the server within a manageable range while keeping collateral damage lower than 2%. We can handle millions of filtering rules without noticeable operational overhead.

This work is partially supported by the National Science
Foundation (grant NSF OAC-1739034) and DHS HSARPA
Cyber Security Division (grant SHQDC-17-R-B0004-TTA.02-
0006-I), in collaboration with NWO.

A screen capture of the presentation of the best paper award.

Categories
Internet Papers Publications Software releases

new paper “Chhoyhopper: A Moving Target Defense with IPv6” at NDSS MADWeb Workshop 2022

On April 24, 2022 we will publish a new paper titled “Chhoyhopper: A Moving Target Defense with IPv6” by A S M Rizvi and John Heidemann at the 4th Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb 2022), co-located with NDSS. We provide Chhoyhopper as an open-source tool for SSH and HTTPS—try it out!

From the abstract:

Services on the public Internet are frequently scanned, then subject to brute-force password attempts and Denial-of-Service (DoS) attacks. We would like to run such services stealthily, where they are available to friends but hidden from adversaries. In this work, we propose a discovery-resistant moving target defense named “Chhoyhopper” that utilizes the vast IPv6 address space to conceal publicly available services. The client meets the server at an IPv6 address that changes in a pattern based on a shared, pre-distributed secret and the time of day. By hopping over a /64 prefix, services cannot be found by active scanners, and passively observed information is useless after two minutes. We demonstrate our system with the two important applications—SSH and HTTPS, and make our system publicly available.

Client and server interaction in Chhoyhopper. A Client with the right secret key can only get access into the system.

Thanks: A S M Rizvi and John Heidemann’s work on this paper is supported, in part, by the DHS HSARPA Cyber Security Division via contract number HSHQDC-17-R-B0004-TTA.02-0006-I (PAADDoS), and by DARPA under Contract No. HR001120C0157 (SABRES). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DARPA. We thank Rayner Pais who prototyped an early version of Chhoyhopper and version in IPv4 hopping over ports.

Categories
Papers Publications

new symposium paper “Visualizing Internet Measurements of Covid-19 Work-from-Home” at IEEE Symposium on REU Research in Data Science, Systems, and Security

We published a new paper “Visualizing Internet Measurements of Covid-19 Work-from-Home” by Erica Stutz (Swarthmore College), Yuri Pradkin, Xiao Song, and John Heidemann (USC/ISI) at the Symposium for REU Research in Data Science, Systems, and Security, co-located with IEEE BigData 2021.

A screenshot from our Covid-WFH website showing an event in Malaysia on 2020-04-02.
A change in Internet use seen in Malaysia on 2020-04-02, present in our Covid-WFH data but discovered through our website.

From the abstract:

The Covid-19 pandemic disrupted the world as businesses and schools shifted to work-from-home (WFH), and comprehensive maps have helped visualize how those policies changed over time and in different places. We recently developed algorithms that infer the onset of WFH based on changes in observed Internet usage. Measurements of WFH are important to evaluate how effectively policies are implemented and followed, or to confirm policies in countries with less transparent journalism.This paper describes a web-based visualization system for measurements of Covid-19-induced WFH. We build on a web-based world map, showing a geographic grid of observations about WFH. We extend typical map interaction (zoom and pan, plus animation over time) with two new forms of pop-up information that allow users to drill-down to investigate our underlying data.We use sparklines to show changes over the first 6 months of 2020 for a given location, supporting identification and navigation to hot spots. Alternatively, users can report particular networks (Internet Service Providers) that show WFH on a given day.We show that these tools help us relate our observations to news reports of Covid-19-induced changes and, in some cases, lockdowns due to other causes. Our visualization is publicly available at https://covid.ant.isi.edu, as is our underlying data.

Datasets from this work will be available from our website and can be seen now at https://covid.ant.isi.edu. We thank NSF grants 2028279 and CNS-2007106 for supporting this work.

Categories
DNS Papers Publications

New paper and talk “Institutional Privacy Risks in Sharing DNS Data” at Applied Networking Research Workshop 2021

Basileal Imana presented the paper “Institutional Privacy Risks in Sharing DNS Data” by Basileal Imana, Aleksandra Korolova and John Heidemann at Applied Networking Research Workshop held virtually from July 26-28th, 2021.

From the abstract:

We document institutional privacy as a new risk
posed by DNS data collected at authoritative servers, even
after caching and aggregation by DNS recursives. We are the
first to demonstrate this risk by looking at leaks of e-mail
exchanges which show communications patterns, and leaks
from accessing sensitive websites, both of which can harm an
institution’s public image. We define a methodology to identify queries from institutions and identify leaks. We show the
current practices of prefix-preserving anonymization of IP
addresses and aggregation above the recursive are not sufficient to protect institutional privacy, suggesting the need for
novel approaches.

Number of MX and DNSBL queries in a week-long root DNS data that can potentially leak email-related activity

The data from this paper is available upon request, please see our project page.

Categories
Papers Publications Uncategorized

new conference paper “Efficient Processing of Streaming Data using Multiple Abstractions” at IEEE Cloud

We have published a new paper “Efficient Processing of Streaming Data using Multiple Abstractions” at the IEEE Cloud 2021 conference. (to be available at https://conferences.computer.org/cloud/2021/)

We show that one framework can efficiently support multiple abstractions. We provide three abstractions of Block, Windowed, and Stateful streaming and demonstrate that many application classes can be developed with ease, correctness, and low processing latency.

From the abstract of our paper:

Large websites and distributed systems employ sophisticated analytics to evaluate successes to celebrate and problems to be addressed. As analytics grow, different teams often require different frameworks, with dozens of packages supporting with streaming and batch processing, SQL and no-SQL. Bringing multiple frameworks to bear on a large, changing dataset often create challenges where data transitions—these impedance mismatches can create brittle glue logic and performance problems that consume developer time. We propose Plumb, a meta-framework that can bridge three different abstractions to meet the needs of a large class of applications in a common workflow. Large-block streaming (Block-Streaming) is suitable for single-pass applications that care about the temporal and spatial locality. Windowed-Streaming allows applications to process a group of data and many reductions. Stateful-Streaming enables applications to keep a long-term state and always-on behavior. We show that it is possible to bridge abstractions, with a common, high-level workflow specification, while the system transitions data batch processing and block- and record-level streaming as required. The challenge in bridging abstractions is to minimize latency while allowing applications to select between sequential and parallel operation, while handling out-of-order data delivery, component failures, and providing clear semantics in the face of missing data. We demonstrate these abstractions evaluating a 10-stage workflow of DNS analytics that has been in production use with Plumb for 2 years, comparing to a brittle hand-built system that has run for more than 3 years.

This conference paper is joint work of Abdul Qadeer and  John Heidemann from USC/ISI.

Plumb is open source software and will be available at: https://ant.isi.edu/software/plumb/index.html

Update 2021-09-26: This paper was given a “special paper award” at IEEE Conference on Cloud Computing 2021! Congratulations, Abdul!

Categories
Data Papers Publications

New paper “Auditing for Discrimination in Algorithms Delivering Job Ads” at TheWebConf 2021

We published a new paper “Auditing for Discrimination in Algorithms Delivering Job Ads” by Basileal Imana (University of Southern California), Aleksandra Korolova (University of Southern California) and John Heidemann (University of Southern California/ISI) at TheWebConf 2021 (WWW ’21).

From the abstract:

Skew in the delivery of real-world ads on Facebook (FB) but not LinkedIn (LI).
Comparison of ad delivery using “Reach” (R) and “Conversion” (C) campaign objectives on Facebook. There is skew for both cases but less skew for “Reach”.

Ad platforms such as Facebook, Google and LinkedIn promise value for advertisers through their targeted advertising. However, multiple studies have shown that ad delivery on such platforms can be skewed by gender or race due to hidden algorithmic optimization by the platforms, even when not requested by the advertisers. Building on prior work measuring skew in ad delivery, we develop a new methodology for black-box auditing of algorithms for discrimination in the delivery of job advertisements. Our first contribution is to identify the distinction between skew in ad delivery due to protected categories such as gender or race, from skew due to differences in qualification among people in the targeted audience. This distinction is important in U.S. law, where ads may be targeted based on qualifications, but not on protected categories. Second, we develop an auditing methodology that distinguishes between skew explainable by differences in qualifications from other factors, such as the ad platform’s optimization for engagement or training its algorithms on biased data. Our method controls for job qualification by comparing ad delivery of two concurrent ads for similar jobs, but for a pair of companies with different de facto gender distributions of employees. We describe the careful statistical tests that establish evidence of non-qualification skew in the results. Third, we apply our proposed methodology to two prominent targeted advertising platforms for job ads: Facebook and LinkedIn. We confirm skew by gender in ad delivery on Facebook, and show that it cannot be justified by differences in qualifications. We fail to find skew in ad delivery on LinkedIn. Finally, we suggest improvements to ad platform practices that could make external auditing of their algorithms in the public interest more feasible and accurate.

This paper was awarded runner-up for best student paper at The Web Conference 2021.

The data from this paper is upon request, please see our dataset page.

This work was reported in the popular press: The InterceptMIT Technology ReviewWall Street JournalThe RegisterVentureBeatReutersThe VergeEngadgetAssociated Press.

Categories
Papers Publications

new journal paper “Plumb: Efficient Stream Processing of Multi-User Pipelines” in the Journal of Software: Practice and Experience

We have published a new journal paper “Plumb: Efficient Stream Processing of Multi-User Pipelines” in Wiley’s Journal of Software: Practice and Experience, available at https://onlinelibrary.wiley.com/doi/10.1002/spe.2909

Plumb provides a new pipeline-graph abstraction that allows multiple users to specify workflows in which Plumb can detect and elimiate duplicate processing and handle processing skew due to unbalanced data or stages. The end result is that users get their results faster and a shared cluster is efficiently utilized.

From the abstract of our journal paper:

Operational services run 24×7 and require analytics pipelines to evaluate performance. In mature services such as DNS, these pipelines often grow to many stages developed by multiple, loosely-coupled teams. Such pipelines pose two problems: first, computation and data storage may be duplicated across components developed by different groups, wasting resources. Second, processing can be skewed, with structural skew occurring when different pipeline stages need different amounts of resources, and computational skew occurring when a block of input data requires increased resources. Duplication and structural skew both decrease efficiency, increasing cost, latency, or both. Computational skew can cause pipeline failure or deadlock when resource consumption balloons; we have seen cases where pessimal traffic increases CPU requirements 6-fold. Detecting duplication is challenging when components from multiple teams evolve independently and require fault isolation. Skew management is hard due to dynamic workloads coupled with the conflicting goals of both minimizing latency and maximizing utilization. We propose Plumb, a framework to abstract stream processing as large-block streaming (LBS) for a multi-stage, multi-user workflow. Plumb users express analytics as a DAG of processing modules, allowing Plumb to integrate and optimize workflows from multiple users. Many real-world applications map to the LBS abstraction. Plumb detects and eliminates duplicate computation and storage, and it detects and addresses both structural and computational skew by tracking computation across the pipeline. We exercise Plumb using the analytics pipeline for B-Root DNS. We compare Plumb to a hand-tuned system, cutting latency to one-third the original, and requiring 39% fewer container hours, while supporting more flexible, multi-user analytics and providing greater robustness to DDoS-driven demands.

This journal paper is joint work of Abdul Qadeer and  John Heidemann from USC/ISI.

Plumb is open source software and we will be interested in beta testers. Please contact us if you think it would be useful to manage your workflows over one or a cluster of computers.

Categories
Papers Publications

new journal paper “Detecting IoT Devices in the Internet” in IEEE/ACM Transactions on Networking

We have published a new journal paper “Detecting IoT Devices in the Internet” in IEEE/ACM Transactions on Networking, available at https://www.isi.edu/~johnh/PAPERS/Guo20c.pdf

Figure 5 from [Guo20c] showing per-device-type AS penetrations from 2013 to 2018 for 16 of the 23 device types we studies (omitting 7 device types appearing in less than10 ASes)

From the abstract of our journal paper:

Distributed Denial-of-Service (DDoS) attacks launched from compromised Internet-of-Things (IoT) devices have shown how vulnerable the Internet is to largescale DDoS attacks. To understand the risks of these attacks requires learning about these IoT devices: where are they? how many are there? how are they changing? This paper describes three new methods to find IoT devices on the Internet: server IP addresses in traffic, server names in DNS queries, and manufacturer information in TLS certificates. Our primary methods (IP addresses and DNS names) use knowledge of servers run by the manufacturers of these devices. Our third method uses TLS certificates obtained by active scanning. We have applied our algorithms to a number of observations. With our IP-based algorithm, we report detections from a university campus over 4 months and from traffic transiting an IXP over 10 days. We apply our DNS-based algorithm to traffic from 8 root DNS servers from 2013 to 2018 to study AS-level IoT deployment. We find substantial growth (about 3.5×) in AS penetration for 23 types of IoT devices and modest increase in device type density for ASes detected with these device types (at most 2 device types in 80% of these ASes in 2018). DNS also shows substantial growth in IoT deployment in residential households from 2013 to 2017. Our certificate-based algorithm finds 254k IP cameras and network video recorders from 199 countries around the world.

We make operational traffic we captured from 10 IoT devices we own public at https://ant.isi.edu/datasets/iot/. We also use operational traffic of 21 IoT devices shared by University of New South Wales at http://149.171.189.1/.

This journal paper is joint work of Hang Guo and  John Heidemann from USC/ISI.