Page:On the Robustness of Topics API to a Re-Identification Attack.pdf/12

 how different parameter choices impact the probability of re-identifying users, and choosing the best combination.


 * The profile of a user $$\mathcal{P}_{u,e}$$ is currently populated by the top $$z$$ topics in epoch $$e$$. It could be worth considering other approaches to building the profile that better balance the utility for advertisers against the privacy of users.
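As a rough illustration of the current profile construction, the sketch below selects the $$z$$ most frequent topics from a user's visits in one epoch. The function name `top_z_profile`, the input format (a list of topic labels, one per visit), and the alphabetical tie-breaking rule are assumptions made for this example, not details taken from the Topics API specification.

```python
from collections import Counter

def top_z_profile(visited_topics, z):
    """Return the z most frequent topics among one epoch's visits.

    visited_topics: list of topic labels, one per visited page
    (hypothetical input format). Ties are broken alphabetically,
    which is an assumption of this sketch.
    """
    counts = Counter(visited_topics)
    # Sort by descending count, then by label for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:z]]

# Toy epoch: "News" is visited 3 times, "Sports" twice, the rest once.
epoch_visits = ["News", "Sports", "News", "Travel", "Sports", "News", "Cooking"]
profile = top_z_profile(epoch_visits, z=2)
```

An alternative profile-building approach would replace only the ranking step, which makes it straightforward to compare how each choice affects utility and privacy.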

Future directions. Our work provides a first study of how re-identification attacks can be carried out against the Topics API, and several angles remain to be explored. Our experiments can be readily extended in several directions.

First, we rely on a dataset collected from the volunteers who participated in the EasyPIMS experimentation. As such, we cannot verify that the dataset is representative of general human behaviour. Gathering this kind of personal data is cumbersome, but a larger and more heterogeneous audience may help in drawing more solid conclusions. In a similar direction, it would be interesting to evaluate more diverse approaches to generating the population, covering different usage patterns, classes of users, etc. Similarly, the study of the Topics API parameters can be extended to balance the utility and privacy of the exposed information.

Other research directions include the extension of the threat model. The attacker could use more sophisticated techniques to match profiles than a simplistic exact match on the presence of a topic. For instance, the attacker could leverage the frequencies with which topics are exposed, or design a maximum-likelihood algorithm to maximize the probability of correct re-identification. Several methodologies proposed in the literature can be reused for this goal.
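To make the maximum-likelihood idea concrete, the following is a minimal sketch, not the attack evaluated in this work. It assumes a simplified exposure model: each observed topic is drawn uniformly from the whole taxonomy with probability `p`, and uniformly from the candidate's profile otherwise. It also assumes a single static profile per candidate. The function names, the toy candidates, and the parameter values are all hypothetical.

```python
import math

def log_likelihood(observed, profile, p, taxonomy_size):
    """Log-likelihood of the observed topics under one candidate profile.

    Assumed model (a simplification for this sketch): each exposed topic is,
    with probability p, drawn uniformly from the whole taxonomy and, with
    probability 1 - p, drawn uniformly from the candidate's profile.
    Requires 0 < p <= 1 so that every topic has non-zero probability.
    """
    z = len(profile)
    total = 0.0
    for topic in observed:
        from_profile = (1.0 - p) / z if z and topic in profile else 0.0
        total += math.log(from_profile + p / taxonomy_size)
    return total

def most_likely_user(observed, candidates, p, taxonomy_size):
    """Return the candidate whose profile best explains the observations."""
    return max(
        candidates,
        key=lambda user: log_likelihood(observed, candidates[user], p, taxonomy_size),
    )

# Toy data: two candidate profiles and three topics seen by the attacker.
candidates = {
    "alice": {"News", "Sports"},
    "bob": {"Travel", "Cooking"},
}
observed = ["News", "Travel", "Sports"]
best = most_likely_user(observed, candidates, p=0.1, taxonomy_size=10)
```

Unlike an exact match, this scoring tolerates an occasional topic outside the profile (here "Travel" for the first candidate) and weights repeated exposures, so topic frequencies contribute to the decision.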

Finally, we assume the attacker has no background knowledge of the victims. As noted above, this assumption can be relaxed to study to what extent additional information on the user (e.g., retrieved through browser fingerprinting techniques) can help the attacker.

The Topics API is a novel proposal by Google, and we believe the research community should work further to understand the implications of its design, as it might become the de facto standard for online advertising in the near future.

This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU and the project "National Center for HPC, Big Data and Quantum Computing", CN00000013 (Bando M42C – Investimento 1.4 – Avviso "Centri Nazionali" – D.D. n. 3138 of 16.12.2021, funded with MUR Decree n. 1031 of 17.06.2022).
