Page:Not Your Average App, A Large-scale Privacy Analysis of Android Browsers.pdf/2

Proceedings on Privacy Enhancing Technologies YYYY(X)

methodological challenges that we address in this work. First, neither the Android OS nor app markets provide a comprehensive way to determine whether an app is a mobile browser. To address this challenge, we combine code inspection and dynamic analysis methods to identify the apps that match our definition of a mobile browser.

Second, both browsers and websites expose data to other parties during use, and a simple analysis of network traffic traces cannot distinguish whether data was exposed by the browser or by a website. To address this, we develop a novel browser analysis pipeline that uses webpage replays and a baseline browser to attribute data collection to either the website or the browser itself. We further use a combination of static and dynamic analysis techniques to identify which browser component (for instance, a third-party library) is responsible for data collection, and to gather concrete evidence of these behaviors.

Third, browsers can implement privacy-enhancing features to protect users from harm (e.g., use of TLS encryption, limited access to JavaScript APIs that retrieve sensitive information, blocking websites from contacting third parties), but there are few standard benchmarks to test for them. To overcome these challenges, we build a new test suite that addresses this need, along with instrumentation and automation to capture and study browsers' behaviors automatically and at scale.
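The attribution step of the pipeline can be illustrated with a minimal sketch: requests observed in both the candidate browser and the baseline browser on the same replayed page are attributed to the website, while requests unique to the candidate are attributed to the browser itself. The function name and data shapes below are ours, for illustration only; the paper's actual pipeline is more involved.

```python
def attribute_requests(candidate_requests, baseline_requests):
    """Split a candidate browser's observed requests into website-originated
    vs. browser-originated, using a baseline browser on the same page replay.
    Requests the baseline also makes are attributed to the website; the rest
    are attributed to the browser itself."""
    baseline = set(baseline_requests)
    website = [r for r in candidate_requests if r in baseline]
    browser = [r for r in candidate_requests if r not in baseline]
    return website, browser

# Illustrative input: the candidate browser adds a beacon the baseline lacks.
website, browser = attribute_requests(
    candidate_requests=["site.example/page", "tracker.example/beacon"],
    baseline_requests=["site.example/page"],
)
```

Because both browsers load the same recorded replay, any request that appears only in the candidate's trace cannot have been triggered by the page content alone.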

We use the above methods to analyze a dataset of 424 browsers and observe the following:


 * We find that 65% of browsers enhance privacy by blocking tracking scripts by default. Similarly, 51% of browsers block scripts that access protected JavaScript APIs.
 * We see that most browsers do not default to HTTPS (only 2% of browsers do so) and that 10% of browsers do not properly validate TLS certificates, making them vulnerable to person-in-the-middle attacks.
 * We find that 63% of browsers contain at least one third-party library related to advertisement and tracking, and that these libraries are often responsible for browsers requesting "dangerous" permissions. Furthermore, our run-time behavior analysis shows that 32% of browsers share PII with other parties over the Internet, and that 19% of browsers do the same for browsing history. While 14% of these browsers share this information for a specific feature (such as web search APIs), we find that 3% send it alongside personal data (thus harming user privacy). We also show that browsers often share both resettable and non-resettable identifiers with third-party servers across the Internet, including four instances of ID bridging, a practice that completely defeats the use of resettable identifiers.
 * We conduct a multidimensional analysis of individual browsers to understand their overall privacy disposition. We find that few browsers uniformly improve privacy (e.g., FOSS Browser), while many exhibit multiple harms (e.g., Yandex, Baidu, Opera, etc.). We also find mixed behaviors: of the 276 (65%) browsers that block tracking content, we see that 70% also allow tracking requests, 23% expose PII, 14% share browsing history, and 7% fail to validate certificates.
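The ID bridging behavior noted above occurs when a request carries a resettable identifier (the AdID) alongside a non-resettable one (e.g., IMEI or Android ID), letting a server re-link the AdID after the user resets it. A minimal detection sketch (the function name, identifier sets, and request shape are ours, for illustration):

```python
RESETTABLE = {"adid"}
NON_RESETTABLE = {"imei", "android_id", "mac"}

def find_id_bridging(requests):
    """Flag requests that transmit both a resettable and a non-resettable
    identifier together; such requests let a server re-link the AdID after
    a reset, defeating the purpose of resettable identifiers."""
    flagged = []
    for req in requests:
        keys = {k.lower() for k in req["params"]}
        if keys & RESETTABLE and keys & NON_RESETTABLE:
            flagged.append(req["url"])
    return flagged

# Illustrative traffic: one bridged request, one benign request.
hits = find_id_bridging([
    {"url": "https://ads.example/collect", "params": {"adid": "x", "imei": "y"}},
    {"url": "https://cdn.example/app.js", "params": {"v": "1"}},
])
```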
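The certificate-validation failure noted above corresponds, in effect, to a TLS client that disables verification entirely. The contrast can be sketched with Python's standard `ssl` module (the helper name is ours; Android browsers configure this through platform APIs instead):

```python
import ssl

def make_tls_context(validate: bool) -> ssl.SSLContext:
    """Build a client-side TLS context. `validate=True` mirrors a browser
    that checks certificates and hostnames; `validate=False` mirrors the
    vulnerable behavior of accepting any certificate."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    if not validate:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE  # accepts any certificate -> MITM risk
    return ctx

strict = make_tls_context(True)
lax = make_tls_context(False)
```

A client using the `lax` configuration completes the handshake with any certificate an on-path attacker presents, which is what makes the 10% of browsers above vulnerable to person-in-the-middle attacks.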

Our study has important implications for (1) end users who adopt a non-default web browser on Android, and for (2) automatic app analysis processes. For the former, our results can help guide their decisions about which browser to install from a privacy standpoint.

In fact, understanding the privacy risks of the mobile browser ecosystem is critical in the EU as Google Chrome is no longer the default browser on Android and EU citizens can choose any browser when configuring their device [43, 52]. For the latter, we show that existing sandboxes may be ineffective for studying the privacy risks of mobile browsers, and they can benefit from our novel methodology to analyze browser apps and gain actual visibility into their behavior. We reported our findings to Google, which is currently investigating potential corresponding breaches of their policies. We also responsibly disclosed observed security vulnerabilities to developers. To support reproducibility and foster further research in this area, we make our code and data publicly available at https://github.com/NEU-SNS/mobile-browser.

2 THREAT & PROTECTION MODELS

This section describes our privacy threat and protection models. Our models are motivated by the fact that mobile browsers occupy a privileged position in terms of the type of data they can access: (1) all Android devices are expected to have at least one mobile browser [16, 51]; (2) like other apps, mobile browsers can access permission-protected device sensors, system resources, and unique identifiers; and (3) being endpoints for web traffic, they have access to all data from web browsing, e.g., page content or browsing history. Note that our study focuses on browser behavior in the default mode, as opposed to "safe", "private", or "incognito" modes. Previous work has shown that these modes have their own security and privacy issues [13], and that they do not prevent browsers from collecting user data [50, 116].

2.1 Privacy Threat Model

We assume that a benign browser should render webpages, follow best practices for connection security, and adhere to data minimization principles when exposing user data. We acknowledge that such a browser may not exist in practice, but we use it as a baseline for comparison in our threat model. We consider a browser "privacy harmful" if it deviates from this benign browser model by exhibiting any of the following behaviors:

Data dissemination not required for page rendering.

Browsers may collect and share sensitive data (e.g., location, unique identifiers) with first parties (the browser developer) or third parties (e.g., data brokers, advertisers, analytics companies). Such sharing might be required to implement site features (e.g., geofencing or localized search results), or used for secondary purposes (e.g., for monetization through ads and tracking).

In this paper, sensitive data includes personally identifiable information (PII) and users' browsing history. Specifically, we consider the following data types to be PII: IMEI, Advertising ID (AdID), Android ID, MAC address, and geolocation. Note that the IMEI, Android ID, and MAC addresses are non-resettable IDs, while the AdID can be reset by users. We also consider the list of installed apps to be sensitive, as it can be used to profile users [115]. For browsing history leaks, we consider cases where over half of the websites visited are transmitted to another party over the Internet. While browsers may share the visited websites as part of their functionality (e.g., URL safety checks [67, 101]), this data can