Categories
Papers Publications

new conference best paper “External Evaluation of Discrimination Mitigation Efforts in Meta’s Ad Delivery”

Our new paper “External Evaluation of Discrimination Mitigation Efforts in Meta’s Ad Delivery” (PDF) will appear at the eighth annual ACM FAccT conference (FAccT 2025), held June 23-26, 2025 in Athens, Greece.

We are happy to note that this paper received one of the three Best Paper awards at FAccT 2025!

Comparison of total reach and cost per 1000 reach with and without VRS enabled (Figure 5a)

From the abstract:

The 2022 settlement between Meta and the U.S. Department of Justice to resolve allegations of discriminatory advertising resulted in a first-of-its-kind change to Meta’s ad delivery system, aimed at addressing algorithmic discrimination in its housing ad delivery. In this work, we explore direct and indirect effects of both the settlement’s choice of terms and the Variance Reduction System (VRS) implemented by Meta on the actual reduction in discrimination. We first show that the settlement terms allow for an implementation that does not meaningfully improve access to opportunities for individuals. The settlement measures the impact of ad delivery in terms of impressions, instead of unique individuals reached by an ad; it allows the platform to level down access, reducing disparities by decreasing the overall access to opportunities; and it allows the platform to selectively apply VRS to only small advertisers. We then conduct experiments to evaluate VRS with real-world ads, and show that while VRS does reduce variance, it also raises advertiser costs (measured per individual reached), thereby decreasing user exposure to opportunity ads for a given ad budget. VRS thus passes the cost of decreasing variance on to advertisers. Finally, we explore an alternative approach to achieving the settlement’s goals that is significantly more intuitive and transparent than VRS. We show our approach outperforms VRS by both increasing ad exposure for users from all groups and reducing cost to advertisers, thus demonstrating that the increase in cost to advertisers when implementing the settlement is not inevitable. Our methodologies use a black-box approach that relies on capabilities available to any regular advertiser, rather than on privileged access to data, allowing others to reproduce or extend our work.
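
To make the cost metric concrete: cost per 1000 unique individuals reached is just spend divided by unique reach, scaled by 1000. A minimal sketch follows; the budget and reach numbers are hypothetical, chosen only for illustration, and are not results from the paper:

```python
def cost_per_1000_reached(spend_usd, unique_people_reached):
    # Cost to reach 1000 unique individuals: a per-person metric,
    # in contrast to Meta's per-impression accounting.
    return 1000 * spend_usd / unique_people_reached

budget = 100.0              # hypothetical ad budget in USD
reach_without_vrs = 8000    # hypothetical unique people reached
reach_with_vrs = 6500       # hypothetical: fewer people for the same budget

print(cost_per_1000_reached(budget, reach_without_vrs))  # 12.5
print(cost_per_1000_reached(budget, reach_with_vrs))     # ~15.38
```

Under these made-up numbers, the same budget reaches fewer people, so the per-person cost rises even though the per-impression price may look unchanged.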

All data in this paper is publicly available to researchers at our datasets webpage.

This paper is a joint work of Basileal Imana, Zeyu Shen, and Aleksandra Korolova from Princeton University, and John Heidemann from USC/ISI. This work was supported in part by NSF grants CNS-1956435, CNS-2344925, and CNS-2319409.

Categories
Papers Publications

new conference paper “Auditing for Bias in Ad Delivery Using Inferred Demographic Attributes”

Our new paper “Auditing for Bias in Ad Delivery Using Inferred Demographic Attributes” (PDF) will appear at the eighth annual ACM FAccT conference (FAccT 2025), held June 23-26, 2025 in Athens, Greece.

Sensitivity of detecting ad delivery skew with and without accounting for error in inferred attributes (Figure 3)

From the abstract:

Auditing social-media algorithms has become a focus of public-interest research and policymaking to ensure their fairness across demographic groups such as race, age, and gender in consequential domains such as the presentation of employment opportunities. However, such demographic attributes are often unavailable to auditors and platforms. When demographic data is unavailable, auditors commonly infer it from other available information. In this work, we study the effects of inference error on auditing for bias in one prominent application: black-box audits of ad delivery using paired ads. We show that inference error, if not accounted for, causes audits to falsely miss skew that exists. We then propose a way to mitigate the inference error when evaluating skew in ad delivery algorithms. Our method works by adjusting for expected error due to demographic inference, and it makes skew detection more sensitive when attributes must be inferred. Because inference is increasingly used for auditing, our results provide an important addition to the auditing toolbox to promote correct audits of ad delivery algorithms for bias. While the impact of attribute inference on accuracy has been studied in other domains, our work is the first to consider it for black-box evaluation of ad delivery bias, when only aggregate data is available to the auditor.
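
One standard way to adjust aggregate counts for known inference error is to treat the inference method’s confusion matrix, estimated on validation data, as known and invert it to recover counts by true group. The sketch below illustrates that general idea with hypothetical numbers; it is not necessarily the exact correction the paper proposes:

```python
import numpy as np

# Hypothetical confusion matrix for the demographic-inference method,
# estimated on a validation set: C[i, j] = P(inferred group j | true group i).
C = np.array([[0.90, 0.10],
              [0.15, 0.85]])

# Aggregate ad-delivery counts by *inferred* group (hypothetical numbers).
observed = np.array([620.0, 380.0])

# In expectation, observed = C.T @ true, so invert that relation to
# estimate counts by *true* group before testing delivery for skew.
estimated_true = np.linalg.solve(C.T, observed)
print(estimated_true)  # approximately [626.67, 373.33]
```

Without this adjustment, inference error pulls the two groups’ observed counts toward each other, which is exactly how real skew gets falsely missed.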

This paper is a joint work of Basileal Imana and Aleksandra Korolova from Princeton University, and John Heidemann from USC/ISI. This work was supported in part by NSF grants CNS-1956435, CNS-2344925, and CNS-2319409.

Categories
Papers Publications

New conference paper: Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest

Our new paper “Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest” will appear at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023).

From the abstract:

Concerns about potential harmful outcomes have prompted proposed legislation in both the U.S. and the E.U. to mandate a new form of auditing where vetted external researchers get privileged access to social media platforms. Unfortunately, to date there have been no concrete technical proposals to provide such auditing, because auditing at scale risks disclosure of users’ private data and platforms’ proprietary algorithms. We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation. The first contribution of our work is to enumerate the challenges and the limitations of existing auditing methods to implement these policies at scale. Second, we suggest that limited, privileged access to relevance estimators is the key to enabling generalizable platform-supported auditing of social media platforms by external researchers. Third, we show platform-supported auditing need not risk user privacy nor disclosure of platforms’ business interests by proposing an auditing framework that protects against these risks. For a particular fairness metric, we show that ensuring privacy imposes only a small constant factor increase (6.34× as an upper bound, and 4× for typical parameters) in the number of samples required for accurate auditing. Our technical contributions, combined with ongoing legal and policy efforts, can enable public oversight into how social media platforms affect individuals and society by moving past the privacy-vs-transparency hurdle.

Overview of our proposed platform-supported framework for auditing relevance estimators while protecting the privacy of audit participants and the business interests of platforms.
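
To put the constant-factor claim in concrete terms, a back-of-the-envelope calculation (the baseline sample count is hypothetical; the 4× and 6.34× factors are the ones quoted in the abstract):

```python
# Hypothetical baseline; the 4x and 6.34x factors come from the abstract.
n_nonprivate = 10_000              # samples needed without privacy protection
print(4 * n_nonprivate)            # typical parameters: 40000 samples
print(round(6.34 * n_nonprivate))  # worst-case upper bound: 63400 samples
```

The point is that privacy multiplies the audit’s sample budget by a small constant rather than changing its asymptotic cost.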

A 2-minute video overview of the work can be found here.

This paper is a joint work of Basileal Imana from USC, Aleksandra Korolova from Princeton University, and John Heidemann from USC/ISI.

Categories
Technical Report

new technical report: Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest

We have released a new technical report: “Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest”, available at https://arxiv.org/abs/2207.08773.

From the abstract:

Legislation has been proposed in both the U.S. and the E.U. to mandate auditing of social media algorithms by external researchers. But auditing at scale risks disclosure of users’ private data and platforms’ proprietary algorithms, and thus far there has been no concrete technical proposal that can provide such auditing. Our goal is to propose a new method for platform-supported auditing that can meet the goals of the proposed legislation. The first contribution of our work is to enumerate these challenges and the limitations of existing auditing methods to implement these policies at scale. Second, we suggest that limited, privileged access to relevance estimators is the key to enabling generalizable platform-supported auditing of social media platforms by external researchers. Third, we show platform-supported auditing need not risk user privacy nor disclosure of platforms’ business interests by proposing an auditing framework that protects against these risks. For a particular fairness metric, we show that ensuring privacy imposes only a small constant factor increase (6.34× as an upper bound, and 4× for typical parameters) in the number of samples required for accurate auditing. Our technical contributions, combined with ongoing legal and policy efforts, can enable public oversight into how social media platforms affect individuals and society by moving past the privacy-vs-transparency hurdle.

High-level overview of our proposed platform-supported framework for auditing relevance estimators while protecting the privacy of audit participants and the business interests of platforms.

This technical report is a joint work of Basileal Imana from USC, Aleksandra Korolova from Princeton University, and John Heidemann from USC/ISI.

Categories
DNS Papers Publications

New paper and talk “Institutional Privacy Risks in Sharing DNS Data” at Applied Networking Research Workshop 2021

Basileal Imana presented the paper “Institutional Privacy Risks in Sharing DNS Data” by Basileal Imana, Aleksandra Korolova, and John Heidemann at the Applied Networking Research Workshop (ANRW 2021), held virtually July 26-28, 2021.

From the abstract:

We document institutional privacy as a new risk posed by DNS data collected at authoritative servers, even after caching and aggregation by DNS recursives. We are the first to demonstrate this risk by looking at leaks of e-mail exchanges which show communications patterns, and leaks from accessing sensitive websites, both of which can harm an institution’s public image. We define a methodology to identify queries from institutions and identify leaks. We show the current practices of prefix-preserving anonymization of IP addresses and aggregation above the recursive are not sufficient to protect institutional privacy, suggesting the need for novel approaches.
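
As a flavor of what such identification can look like, here is a minimal sketch that flags DNSBL-style lookups by query name; the zone list and heuristic are illustrative only and are not the paper’s methodology (MX-related activity would instead be identified by query type):

```python
import re

# Illustrative DNSBL zones; not the zone list used in the paper.
DNSBL_ZONES = ("zen.spamhaus.org", "bl.spamcop.net")

# DNSBL lookups prepend a reversed IPv4 address to the zone,
# e.g. "4.3.2.1.zen.spamhaus.org" for a check on 1.2.3.4.
REVERSED_IP = re.compile(r"^(?:\d{1,3}\.){4}")

def looks_like_dnsbl_query(qname: str) -> bool:
    qname = qname.lower().rstrip(".")
    return qname.endswith(DNSBL_ZONES) and bool(REVERSED_IP.match(qname))

print(looks_like_dnsbl_query("4.3.2.1.zen.spamhaus.org."))  # True
print(looks_like_dnsbl_query("www.example.com."))           # False
```

Queries like these, even seen only above the recursive, reveal that some host at the querying institution is processing mail from the listed address.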

Number of MX and DNSBL queries in a week of root DNS data that can potentially leak email-related activity

The data from this paper is available upon request; please see our project page.

Categories
Data Papers Publications

New paper “Auditing for Discrimination in Algorithms Delivering Job Ads” at TheWebConf 2021

We published a new paper “Auditing for Discrimination in Algorithms Delivering Job Ads” by Basileal Imana (University of Southern California), Aleksandra Korolova (University of Southern California) and John Heidemann (University of Southern California/ISI) at TheWebConf 2021 (WWW ’21).

Skew in the delivery of real-world ads on Facebook (FB) but not LinkedIn (LI).
Comparison of ad delivery using “Reach” (R) and “Conversion” (C) campaign objectives on Facebook. There is skew in both cases, but less skew for “Reach”.

From the abstract:

Ad platforms such as Facebook, Google, and LinkedIn promise value for advertisers through their targeted advertising. However, multiple studies have shown that ad delivery on such platforms can be skewed by gender or race due to hidden algorithmic optimization by the platforms, even when not requested by the advertisers. Building on prior work measuring skew in ad delivery, we develop a new methodology for black-box auditing of algorithms for discrimination in the delivery of job advertisements. Our first contribution is to distinguish skew in ad delivery due to protected categories such as gender or race from skew due to differences in qualification among people in the targeted audience. This distinction is important in U.S. law, where ads may be targeted based on qualifications, but not on protected categories. Second, we develop an auditing methodology that distinguishes skew explainable by differences in qualifications from skew due to other factors, such as the ad platform’s optimization for engagement or training its algorithms on biased data. Our method controls for job qualification by comparing ad delivery of two concurrent ads for similar jobs, but for a pair of companies with different de facto gender distributions of employees. We describe the careful statistical tests that establish evidence of non-qualification skew in the results. Third, we apply our proposed methodology to two prominent targeted advertising platforms for job ads: Facebook and LinkedIn. We confirm skew by gender in ad delivery on Facebook, and show that it cannot be justified by differences in qualifications. We fail to find skew in ad delivery on LinkedIn. Finally, we suggest improvements to ad platform practices that could make external auditing of their algorithms in the public interest more feasible and accurate.
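
The paper describes the statistical tests in detail; as a rough sketch of the general flavor, a generic two-proportion z-test comparing the fraction of impressions delivered to one gender across two paired ads might look like the following (the counts are hypothetical, and this is not the paper’s exact methodology):

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2):
    """Generic two-sided two-proportion z-test (textbook version)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical counts: impressions shown to women for two paired job ads
# targeted at the same audience.
z, p = two_proportion_ztest(x1=450, n1=1000, x2=520, n2=1000)
print(f"z={z:.2f}, p={p:.4f}")  # a small p suggests delivery skew
```

Because the two ads target the same audience and differ only in the advertising company, a significant difference in delivered proportions cannot be explained by audience qualifications alone.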

This paper was awarded runner-up for best student paper at The Web Conference 2021.

The data from this paper is available upon request; please see our dataset page.

This work was reported in the popular press: The Intercept, MIT Technology Review, Wall Street Journal, The Register, VentureBeat, Reuters, The Verge, Engadget, and Associated Press.

Categories
DNS Papers Presentations Publications

New paper and talk “Enumerating Privacy Leaks in DNS Data Collected above the Recursive” at NDSS DNS Privacy Workshop 2018

Basileal Imana presented the paper “Enumerating Privacy Leaks in DNS Data Collected above the Recursive” at the NDSS DNS Privacy Workshop in San Diego, California, USA on February 18, 2018. Talk slides are available at https://ant.isi.edu/~imana/presentations/Imana18b.pdf and the paper is available at https://ant.isi.edu/~imana/papers/Imana18a.pdf, or can be found at the DNS privacy workshop page.

From the abstract:

Threat model for enumerating leaks above the recursive (left). Percentage of four categories of queries containing IPv4 addresses in their QNAMEs (right).

As with any information system consisting of data derived from people’s actions, DNS data is vulnerable to privacy risks. In DNS, users make queries through recursive resolvers to authoritative servers. Data collected below (or in) the recursive resolver directly exposes users, so most prior DNS data sharing focuses on queries above the recursive resolver. Data collected above a recursive resolver has largely been seen as posing a minimal privacy risk since recursive resolvers typically aggregate traffic for many users, thereby hiding their identity and mixing their traffic. Although this assumption is widely made, to our knowledge it has not been verified. In this paper we re-examine this assumption for DNS traffic above the recursive resolver. First, we show that two kinds of information appear in query names above the recursive resolver: IP addresses and sensitive domain names, such as those pertaining to health, politics, or personal or lifestyle information. Second, we examine how often these classes of potentially sensitive names appear in Root DNS traffic, using 48 hours of B-Root data from April 2017.
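
For the first class of leaks, a simple heuristic is enough to flag query names that embed an IPv4 address; the sketch below is illustrative only and is not the classifier used in the paper:

```python
import re

# Dotted-quad appearing as leading or embedded labels of a query name.
# A simple illustrative heuristic, not the paper's classifier.
IPV4_IN_QNAME = re.compile(r"(?:^|\.)((?:\d{1,3}\.){3}\d{1,3})(?:\.|$)")

def qname_contains_ipv4(qname: str) -> bool:
    m = IPV4_IN_QNAME.search(qname)
    return bool(m) and all(int(octet) <= 255 for octet in m.group(1).split("."))

print(qname_contains_ipv4("4.3.2.1.in-addr.arpa."))  # True
print(qname_contains_ipv4("www.example.com."))       # False
```

Counting how often a heuristic like this fires across a trace gives the kind of per-category percentages shown in the figure above.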

This is a joint work by Basileal Imana (USC), Aleksandra Korolova (USC) and John Heidemann (USC/ISI).

The DITL dataset (DITL_B_Root-20170411) used in this work is available from DHS IMPACT, the ANT project, and through DNS-OARC.