Privacy Rarely Considered: Exploring Considerations in the Adoption of Third-Party Services by Websites

Proceedings on Privacy Enhancing Technologies 2023(1)

ensure we gained a complete picture [87]. We performed crawls from Germany, California, and India to cover possible differences between jurisdictions [11, 32, 90]. For each page, we collected all HTTP(S) requests and compared the third-party services found with those mentioned in the respective survey response, using the WhoTracks.me [39] categorization as a basis. Finally, we compiled metadata on the provided websites: top-level domains (TLDs), website topics based on the McAfee Real-Time Database [53], and popularity based on the same Tranco list we used for recruitment.

For data analysis, we mainly rely on descriptive statistics, because the variance in response counts per website functionality would often leave statistical tests underpowered. Where statistical tests were appropriate and possible, we used Fisher's exact tests to check whether differences between categories were significant and corrected for multiple testing with the Benjamini-Hochberg procedure.
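The test-and-correct step described above can be sketched in Python as follows. This is a minimal, stdlib-only sketch under our own assumptions, not the authors' analysis code: the hypothetical `fisher_exact_2x2` computes a two-sided Fisher's exact p-value for one 2x2 contingency table by summing the hypergeometric probabilities of all tables (with the same margins) that are no more likely than the observed one, and the hypothetical `benjamini_hochberg` turns a family of raw p-values into adjusted ones.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Two-sided rule: sum every table that is no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties)
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one would compute a raw p-value per comparison, pass the whole list to `benjamini_hochberg`, and treat comparisons whose adjusted value falls at or below the chosen false discovery rate (for example 0.05) as significant.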

Our results show that, as in other domains, user privacy is rarely considered in web development. Yet we do find that regulators' guidelines influence decisions for some types of functionality, and that self-hosting is a prominently considered alternative to third-party use. We also find a widespread lack of awareness that using a third-party service implies transmitting IP addresses and device metrics to that third party.

We first describe our sample of 395 participants and the 361 websites they provided as the reference for the main part of the survey.

5.1.1 Participant Demographics and Background. Participants predominantly identified as men (85.1%; Q5-2), were most frequently in the 18-24 (33.4%) or 25-34 (30.6%) age ranges (Q5-1), and most often held a bachelor's degree (35.2%; Q5-3). Most reported degrees (Q5-4) were in technical fields; the most common non-technical degree was in business/economics (10.4%). This is consistent with demographic surveys of people working with web technologies, the large majority of whom are men, typically in the 24-34 age range, holding a bachelor's degree in a technical field [12, 27, 78, 94]. Participants' work with websites (Q1-2) was most frequently a full-time position (41.8%), though freelancing and part-time employment were also common, as was unpaid work as a hobbyist (31.4%). In the last three years, participants had most frequently worked on 2-5 websites (43.8%; Q1-1). As for previous experience with the ten website functionalities (Q1-3), all but one participant reported experience with at least one functionality, with a mean of 5.28 (sd 2.37, median 5). Experience with front-end programming or design libraries (83.0%) and user login or authentication (80.5%) was most common, while the fewest participants had worked with privacy plugins (29.9%) and advertising (23.0%). Participants held on average 3.4 different website-specific roles (sd 2.58, min 1, max 13, median 3; Q2-1) and most often worked as (web) developer, programmer, or software engineer (85.3%). Other frequently reported roles included administrator/web operator, user experience design, content creator or contributor, and product or project manager. Most participants worked either alone (35.7%) or in teams of 2-5 people (35.7%) (Q2-2). In total, 42.0% had received prior privacy training. The most common sources of such training were self-study (38.6% of participants with training), employer training, courses at a university or school, and other non-online courses, including certifications such as CISSP.
Table 6 in Appendix D has detailed data about participants' demographics and background in their work with websites.

5.1.2 Websites Provided by Participants. In Q2-0, we asked participants to provide a website they had recently worked on, which would serve as the reference for Parts 3 and 4 of the survey. After data cleaning, 361 unique valid websites remained, for which we compiled descriptive statistics. The most frequent TLDs were .com, .org, and .de, followed by domains associated with web development, such as .github.io or .dev. McAfee thematic classifications were available for 264 (83.8%) domains, the most common being Business, Internet Services, and Education/Reference. A total of 141 registered domains (44.8%) appeared on the Tranco top 1-million list, with a mean rank of 104,767 (min 5, max 958,899, sd 168,620.3, median 46,695). Overall, participants mainly reported international sites aimed at providing services or information, but also a substantial number of smaller and/or personal sites hosted on popular platforms and a multitude of other thematic categories, yielding a diverse sample of websites. Participants named 72 different countries as the seat of the company behind the website (Q2-3). Coding of the open-ended answers to Q2-4 revealed that the websites were mostly targeted at a global or multi-regional audience; Table 7 in Appendix D also lists the most popular individual target regions. Almost half of the websites (44.8%) were reported not to have a website-specific revenue model (Q2-5). On average, the websites relied on 0.91 sources of revenue (sd 1.03, min 0, max 5, median 1). The most common were products/services sold on the website (20.5%), subscriptions/memberships (17.5%), and revenue streams not explicitly listed in Q2-5 (14.4%). Table 7 in Appendix D contains the full website statistics.
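The Tranco lookup and rank summary reported above can be sketched as follows. The sketch assumes a Tranco list in its usual `rank,domain` CSV form and website domains that have already been reduced to registered domains (a real pipeline would do this with the Public Suffix List). The function names are hypothetical, and since the text does not say whether the reported standard deviation is the sample or population version, using the sample version here is an assumption.

```python
import csv
import io
import statistics


def load_tranco(csv_text: str) -> dict[str, int]:
    """Map domain -> rank from Tranco-style 'rank,domain' CSV lines."""
    ranks = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            ranks[row[1].strip().lower()] = int(row[0])
    return ranks


def tranco_summary(registered_domains: set[str], ranks: dict[str, int]) -> dict:
    """Share of (already registered-domain-reduced) domains on the list, plus rank stats."""
    found = sorted(ranks[d] for d in registered_domains if d in ranks)
    share = len(found) / len(registered_domains) if registered_domains else 0.0
    summary = {"listed": len(found), "share": share}
    if found:
        summary.update(
            mean=statistics.mean(found),
            median=statistics.median(found),
            minimum=found[0],
            maximum=found[-1],
            # Assumed: sample standard deviation (needs at least two listed domains)
            sd=statistics.stdev(found) if len(found) > 1 else 0.0,
        )
    return summary
```

Domains absent from the list (for example small personal sites) simply lower the share and do not contribute to the rank statistics.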

To find out whether privacy played a role in the decision on how to integrate a desired functionality, we investigated which functionalities were present on participants' websites, whether they were integrated via first- or third-party solutions, and the underlying decision process, including considered alternatives, consulted information sources, and the people involved.

5.2.1 Integrated Functionalities. In Q2-6, we asked participants which of the ten functionalities in Table 1 were present on their website. Participants' websites used on average 5.2 of them (sd 2.3, min 1, max 10, median 5). The "present" column of Table 2 lists how often each functionality was mentioned; the reported prevalence differs greatly between functionalities. The most commonly used were programming or design resources (355 / 89.9% of websites), customer interaction tools (268 / 67.8%), and web analytics (251 / 63.5%). To assess how many third parties the websites actually use, we combined the data collected from all three crawl locations to ensure that no configurations dependent on visitors' IP address or region biased our results. Of the 361 unique websites provided, 10 could not be accessed. On average, each website contacted 6.2 third-party domains (min 0, max 144, sd 6.95, median 3), and 80 sites made no requests to third parties at all.
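Combining crawls from several vantage points (here Germany, California, and India) amounts to taking, per website, the union of third-party domains seen from each location. The following is a minimal sketch under stated assumptions: each crawl is a hypothetical mapping from a website's domain to the set of domains it requested, those request targets are assumed to be already reduced to domains, and the first-party check is deliberately naive (same domain or a subdomain of it) rather than using the Public Suffix List or the WhoTracks.me categorization the study used as a basis. All names are hypothetical.

```python
import statistics


def is_third_party(host: str, site: str) -> bool:
    """Hypothetical, simplified check: first-party means the site's domain or a subdomain."""
    host, site = host.lower().rstrip("."), site.lower().rstrip(".")
    return not (host == site or host.endswith("." + site))


def third_parties_per_site(crawls: list[dict[str, set[str]]]) -> dict[str, set[str]]:
    """Union the third-party domains each site contacted across all crawl locations."""
    combined: dict[str, set[str]] = {}
    for crawl in crawls:  # one mapping per vantage point: site -> requested domains
        for site, hosts in crawl.items():
            found = {h for h in hosts if is_third_party(h, site)}
            combined.setdefault(site, set()).update(found)
    return combined


def summarize(combined: dict[str, set[str]]) -> dict[str, float]:
    """Descriptive statistics of third-party counts per site (needs at least two sites)."""
    counts = [len(hosts) for hosts in combined.values()]
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "sd": statistics.stdev(counts),  # assumed: sample standard deviation
        "min": min(counts),
        "max": max(counts),
        "no_third_party": sum(c == 0 for c in counts),
    }
```

Taking the union means a third party loaded only for visitors from one region still counts for that site, and websites unreachable from every location simply do not appear in the combined result.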