Categories
Data Papers Publications

New paper “Auditing for Discrimination in Algorithms Delivering Job Ads” at TheWebConf 2021

We published a new paper “Auditing for Discrimination in Algorithms Delivering Job Ads” by Basileal Imana (University of Southern California), Aleksandra Korolova (University of Southern California) and John Heidemann (University of Southern California/ISI) at TheWebConf 2021 (WWW ’21).

From the abstract:

Skew in the delivery of real-world ads on Facebook (FB) but not LinkedIn (LI).
Comparison of ad delivery using “Reach” (R) and “Conversion” (C) campaign objectives on Facebook. There is skew for both cases but less skew for “Reach”.

Ad platforms such as Facebook, Google and LinkedIn promise value for advertisers through their targeted advertising. However, multiple studies have shown that ad delivery on such platforms can be skewed by gender or race due to hidden algorithmic optimization by the platforms, even when not requested by the advertisers. Building on prior work measuring skew in ad delivery, we develop a new methodology for black-box auditing of algorithms for discrimination in the delivery of job advertisements. Our first contribution is to identify the distinction between skew in ad delivery due to protected categories such as gender or race, from skew due to differences in qualification among people in the targeted audience. This distinction is important in U.S. law, where ads may be targeted based on qualifications, but not on protected categories. Second, we develop an auditing methodology that distinguishes between skew explainable by differences in qualifications from other factors, such as the ad platform’s optimization for engagement or training its algorithms on biased data. Our method controls for job qualification by comparing ad delivery of two concurrent ads for similar jobs, but for a pair of companies with different de facto gender distributions of employees. We describe the careful statistical tests that establish evidence of non-qualification skew in the results. Third, we apply our proposed methodology to two prominent targeted advertising platforms for job ads: Facebook and LinkedIn. We confirm skew by gender in ad delivery on Facebook, and show that it cannot be justified by differences in qualifications. We fail to find skew in ad delivery on LinkedIn. Finally, we suggest improvements to ad platform practices that could make external auditing of their algorithms in the public interest more feasible and accurate.

This paper was awarded runner-up for best student paper at The Web Conference 2021.

The data from this paper is upon request, please see our dataset page.

This work was reported in the popular press: The InterceptMIT Technology ReviewWall Street JournalThe RegisterVentureBeatReutersThe VergeEngadgetAssociated Press.

Categories
DNS Papers Presentations Publications

New paper and talk “Enumerating Privacy Leaks in DNS Data Collected above the Recursive” at NDSS DNS Privacy Workshop 2018

Basileal Imana presented the paper “Enumerating Privacy Leaks in DNS Data Collected  above the Recursive” at NDSS DNS Privacy Workshop in San Diego, California, USA on February 18, 2018. Talk slides are available at https://ant.isi.edu/~imana/presentations/Imana18b.pdf and paper is available at  https://ant.isi.edu/~imana/papers/Imana18a.pdf, or can be found at the DNS privacy workshop page.

From the abstract:

Threat model for enumerating leaks above the recursive (left). Percentage of four categories of queries containing IPv4 addresses in their QNAMEs. (right)

As with any information system consisting of data derived from people’s actions, DNS data is vulnerable to privacy risks. In DNS, users make queries through recursive resolvers to authoritative servers. Data collected below (or in) the recursive resolver directly exposes users, so most prior DNS data sharing focuses on queries above the recursive resolver. Data collected above a recursive resolver has largely been seen as posing a minimal privacy risk since recursive resolvers typically aggregate traffic for many users, thereby hiding their identity and mixing their traffic. Although this assumption is widely made, to our knowledge it has not been verified. In this paper we re-examine this assumption for DNS traffic above the recursive resolver. First, we show that two kinds of information appear in query names above the recursive resolver: IP addresses and sensitive domain names, such as those pertaining to health, politics, or personal or lifestyle information. Second, we examine how often these classes of potentially sensitive names appear in Root DNS traffic, using 48 hours of B-Root data from April 2017.

This is a joint work by Basileal Imana (USC), Aleksandra Korolova (USC) and John Heidemann (USC/ISI).

The DITL dataset (ITL_B_Root-20170411) used in this work is available from DHS IMPACT, the ANT project, and through DNS-OARC.