Page:Privacy Rarely Considered, Exploring Considerations in the Adoption of Third-Party Services by Websites.pdf/5

Privacy Rarely Considered: Exploring Considerations in the Adoption of Third-Party Services by Websites

2020 to identify e-mail addresses of people working on websites, as indicated by the respective commit including file extensions related to web development (.js, .php, .css, .html, .htm). Anticipating a low response rate, we sent invitations to 37,000 email addresses, in addition to 12,000 contacted during pilot testing.

4.3 Research Ethics

Prior to conducting the study we looked into opportunities for ethical and data protection review at our institutions. At the time this study was designed, conducted, and evaluated, the authors were affiliated with Leibniz University Hannover (LUH) and Ruhr University Bochum (RUB), both located in Germany, and the University of Michigan (U-M) in the US. RUB only had an IRB for research in psychology, which was not meant to be mandatorily consulted by security and privacy researchers. LUH's IRB only targeted project proposals, not individual research papers. The co-author from U-M did not directly work with raw response data or interact with participants and confirmed with U-M's IRB that their oversight and approval was therefore not required. Nevertheless, we followed best practices for research conduct and transparency.

To ensure GDPR compliance of our study, we consulted RUB's and LUH's data protection officers. They both independently considered our study design and specifically the approach for GitHub recruitment to be covered by the GDPR's research privilege.

In Q2-2 we required some participants to provide the URL of a website they had worked on, following Mhaidli et al.'s study design [55]. We explained in the initial consent form that this data would only be used to check the website for the presence of third-party services. Participants required to fill this field were able to drop out or proceed without penalty by entering arbitrary input.

Regarding recruitment, we carefully considered the implications of sending email invitations to website contacts and GitHub developers at a large scale. As mentioned above, the two consulted DPOs considered this recruitment approach to be GDPR-compliant. We contacted each email address only once (i.e., we did not send any confirmations or reminders) and gave email recipients a one-click option to opt out of further contact. Still, we received a small number of emails with negative sentiments from people who were not aware that their public GitHub commits contained their email address. Upon this feedback we put up a page on our institution's website that explained our study, why the GitHub-recruited recipient's email address was visible in commits into public repositories, and what steps could be taken to hide it. Despite these efforts, one recipient filed a complaint with our state's data protection authority, upon which we immediately stopped recruitment via GitHub, rather than waiting for the outcome. Three months later the DPA informed us that they did not consider the GDPR's research privilege to apply, because GitHub users, who are often unaware of their commit email addresses being publicly available, do not expect to be contacted via these addresses for the purpose of scientific research. We discuss the concrete problem with GitHub's mechanics for email addresses in more detail in Section 7.4. The DPA advised us to refrain from future recruitment via public GitHub commits but did not take formal action.

Proceedings on Privacy Enhancing Technologies 2023(1)

When we designed and launched the study, ethical concerns with recruitment via public GitHub commits were not obvious: The method was established in the community [1, 2, 26, 56, 73, 74, 92], even post-GDPR [71, 81, 84], and had passed ethical or IRB review at different universities in the US, Europe, Australia, and at the NIST Human Subjects Protection Office.
As such we followed established research practice at the time, as well as sought consultation/approval regarding GDPR from two data protection officers from different institutions, who independently concluded the recruitment method to be covered by the GDPR's research privilege. In hindsight, we agree with participants' and the DPA's concerns regarding GitHub recruitment, which is why we decided to fully discuss our experience in this paper. We consider this aspect of our work a valuable lesson learned for the community in how legal or ethical assessment of established study methods can, and should, evolve. Section 7.4 discusses implications for future work.

We want to stress that all participants whose data is reported in this paper provided their information with informed consent, obtained both at the beginning of the survey and at the end after debriefing about the study's privacy focus. The issue pointed out by the DPA lies with the recruitment method, not with the data we received from the willing and consenting survey participants.

4.4 Data Cleaning

Across all recruitment phases, 2,177 people opened the survey link, 667 proceeded past the welcome page, and 452 completed the survey. Out of these, we removed 41 that had not seen Parts 3 and 4 due to a lack of reported involvement, nine who selected contradictory levels of involvement, and seven who provided multiple websites. To increase data quality, we examined response times. Average completion time was 20:42 minutes. We did not observe any suspicious patterns and thus did not remove any answers. This left us with a total of 395 valid responses.

Two authors inspected all open-response "Other" answers and re-coded answers that matched existing closed-ended options after discussion and mutual agreement. For website analysis, one author inspected all provided URLs (Q2-0) and removed all answers that were not URLs (e.g., "client confidential") or could not be resolved to a website.

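The exclusion counts and the URL screening described above can be illustrated with a short sketch. It only re-derives the reported total of valid responses and shows one possible syntactic pre-filter for free-text URL answers. The function name `looks_like_url` and its rules are assumptions made for illustration; the paper describes a manual inspection by one author, and the additional check of whether a URL resolves to a website is not reproduced here.

```python
from urllib.parse import urlparse

# Counts reported in Section 4.4.
completed = 452
excluded = {
    "had not seen Parts 3 and 4": 41,
    "contradictory levels of involvement": 9,
    "provided multiple websites": 7,
}
valid_responses = completed - sum(excluded.values())


def looks_like_url(answer: str) -> bool:
    """Hypothetical syntactic pre-filter; not the authors' procedure.

    Accepts answers that parse as an http(s) URL with a dotted host name.
    It does not test whether the URL actually resolves to a website.
    """
    text = answer.strip()
    if not text or " " in text:
        return False  # e.g. "client confidential"
    if "://" not in text:
        text = "https://" + text  # tolerate answers without a scheme
    try:
        parsed = urlparse(text)
        host = parsed.hostname or ""
    except ValueError:
        return False  # malformed input such as an unclosed IPv6 bracket
    return parsed.scheme in ("http", "https") and "." in host


if __name__ == "__main__":
    print("valid responses:", valid_responses)
    for sample in ["example.org", "client confidential", "n/a"]:
        print(repr(sample), looks_like_url(sample))
```

A filter of this kind could at most narrow down the answers that need manual review; answers that pass it would still have to be checked by hand and resolved, as the text describes.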
4.5 Data Analysis

Two of the authors applied thematic analysis [10] to the answers to open-ended questions. First they independently reviewed the data to identify recurring themes and created individual codebook drafts for each question. Next, they discussed these drafts and merged them into a first joint codebook. All data was then jointly coded by both researchers, who discussed problematic cases until an agreement was reached, which at times required refining codes' definitions and scopes and, thus, revisiting previously coded answers. We did not compute inter-rater reliability, as the number of responses was small enough to not require splitting up between multiple researchers [54]. Each open-ended response could be assigned one or more codes, as participants often mentioned more than one relevant talking point. Appendix B contains the final codebooks.

To assess to what extent participants' responses about websites' integrated functionalities matched actual practice, we checked the provided websites with OpenWPM [16]. For each provided URL, we accessed the front page, searched it for links to subpages, and visited up to 100 unique pages randomly selected from these to