Collective privacy recovery: Data-sharing coordination via decentralized artificial intelligence

Abstract Collective privacy loss is becoming a colossal problem, an emergency for personal freedoms and democracy. But are we prepared to handle personal data as a scarce resource and collectively share data under the doctrine: as little as possible, as much as necessary? We hypothesize a significant privacy recovery if a population of individuals, the data collective, coordinates to share the minimum data required to run online services with the required quality. Here, we show how to automate and scale up complex collective arrangements for privacy recovery using decentralized artificial intelligence. For this, we compare for the first time attitudinal, intrinsic, rewarded, and coordinated data sharing in a rigorous living-lab experiment of high realism involving >27,000 real data disclosures. Using causal inference and cluster analysis, we differentiate the criteria predicting privacy and five key data-sharing behaviors. Strikingly, data-sharing coordination proves to be a win-win for all: remarkable privacy recovery for people and evident cost reductions for service providers.


Introduction
Control over sharing or giving access to personal data from pervasive devices, such as smartphones, turns out to be complex, involving critical decisions for privacy with impact on society. How can we run data-intensive online services that improve everyday life without compromising personal values and freedoms? For instance, four apps [1] or spatio-temporal points [2] are enough to identify 91.2% and 95% of individuals respectively. In practice, the data-sharing doctrine 'as little as possible, as much as necessary' has not yet found a systematic and scalable applicability. The quality of online services is often a result of collective data-sharing decisions made by the individuals consuming these services, for instance, traffic predictions using mobility data [2,3]. To achieve a minimum quality of service for a population of individuals while maximizing their privacy, a collective arrangement (i.e. coordination) is required.

Figure 1: Tragedies of data-sharing commons showing a coordination deficiency. We hypothesize that while individuals may rationally intend to share a sufficient level of data, they end up intrinsically sharing an insufficient level. If rewarded, data sharing is excessive, with significant privacy loss. When coordination is introduced via a trustworthy AI-based decision-support system, significant privacy is recovered while achieving the desired quality of service. These studied hypotheses are formalized into four data-sharing conditions: (i) attitudinal, (ii) intrinsic, (iii) rewarded and (iv) coordinated.

3. Rewarded data sharing introduces monetary rewards calculated to improve the individual's choice: privacy or rewards, see Fig. 9b. To account for threats to validity and trace any order effects, this experimental condition is repeated twice (2 · 24 hours) by clearing the privacy-reward balance and collecting new data from sensors to share (Fig. 8). To challenge privacy preservation, the rewards are personalized by inflating and deflating the amounts based on each individual's privacy perception derived from attitudinal data sharing, see Section 1 in SI. This design choice is also expected to engage participants more effectively by rewarding the data-sharing scenarios fairly, according to their personal values [8], while discouraging dropouts.

4. Coordinated data sharing relies on the AI-based personal assistants. They use the intrinsic and rewarded data-sharing levels as discrete options to choose from (ex-post condition). Each assistant makes an optimized choice among these that recovers the collective privacy loss of the rewarded data sharing, while reducing the mismatch (discrepancy/fitness measure) between the data shared and the data required by a service provider. This is a quality-of-service indicator with general applicability in adaptive sensor selection and flexible data fusion for several smart city and industrial applications [32,33,34]. Matching can also be applied by a coordinated data collective to preserve k-anonymity in a bottom-up way, i.e. no more than k individuals share any combination of personal data [6,12,35].
Smartphone sensor data play a pivotal role in privacy. This paper studies the sharing of smartphone sensor data with five discrete choices to choose from (uniform sampling of 100% to 0% of sensor data in steps of 25%), see Fig. 9b. These choices are applied to the total sensor data collected at a fixed frequency of 30 sec (100% of data). This is a simple and general discrete-choice model that keeps the complexity of the experiment manageable. It can be extended to more complex spatio-temporal models as discussed in Section 3. The study of smartphone sensor data is particularly impactful for both privacy and the quality of online services. Sensor fusion has a paramount role in applications of smart homes, grids and transportation [32]. There is evidence that smartphone app developers delegate privacy to end-users, as the former face challenges in providing privacy solutions at the design and implementation phase [36]. In practice though, it is the powerful data intermediaries that leverage the terms of data-sharing agreements [7,1]. Sharing smartphone sensor data can be regulated via privacy-protection mechanisms with a natural utility-driven interpretation (buy-sell) such as differential privacy [5]. Given the symbiotic relationship of individuals with their smartphones, capturing high-dimensional and diverse sensor data for different application scenarios, the study comes with a universal scope on privacy.
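To make the discrete-choice model concrete, the following minimal Python sketch maps a chosen data-sharing level to a uniform sample of the 30-second sensor records. The five-level mapping and function names are illustrative assumptions, not the platform's actual implementation.

```python
import random

# Assumed mapping: level 1 shares all data, level 5 shares none (steps of 25%).
SHARE_FRACTIONS = {1: 1.00, 2: 0.75, 3: 0.50, 4: 0.25, 5: 0.00}

def sample_sensor_data(records, level, seed=42):
    """Uniformly sample the fraction of sensor records implied by the
    chosen data-sharing level (1 = share all, 5 = share none)."""
    fraction = SHARE_FRACTIONS[level]
    k = round(len(records) * fraction)
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(records)), k))  # indices of shared records

# Example: one day of 30-second readings (2880 records), shared at level 3 (50%).
day = list(range(24 * 60 * 2))
print(len(sample_sensor_data(day, level=3)))  # -> 1440
```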
A novel approach to understanding data-sharing decisions. The performed living-lab experiment is the first of its kind: (i) It brings together all four data-sharing conditions for comparison, including the novel one of coordinated data sharing. This distinguishes it from earlier survey studies and empirical observations focusing on the two dimensions of intentions vs. behavior that comprise the privacy paradox [37,38]. (ii) The experimental design uses mixed modalities to achieve rigor within a controlled lab environment as well as realism, scale and external validity by tracing behavior out of the lab using a smartphone platform developed for this purpose (see Section 4.2). (iii) The 4×4×4 factorial design results in 64 data-sharing scenarios (see Fig. 2). They involve the three data-sharing criteria that model the involved trust (data collectors) and risks (data type and context), and they are the ones that explain malleable data-sharing behaviors [15,8,39]. This large spectrum comes in contrast to earlier experiments and field tests made within a single context and involving a specific data-sharing scenario such as online social lending [

Results
Three key results are illustrated in this paper: (i) Coordinated data sharing is efficient: it recovers privacy for people and reduces costs for service providers by accessing less but better-quality data. (ii) Data collector and context are the most important criteria with which individuals make data-sharing choices. For rewarded choices with privacy loss though, the type of shared data becomes the most important criterion. (iii) Individuals exhibit five key group-behavior changes from intrinsic to rewarded data sharing. These behaviors are stable, yet self-reinforcing.

Coordinated data sharing recovers privacy and lowers costs
The privacy level and data-sharing quality (mismatch) are shown in Fig. 3 for the 64 data-sharing scenarios and the different experimental conditions. Fig. 4 aggregates these measurements for each of the four sensors, data collectors and contexts. The shaded areas in Fig. 3a illustrate the expected privacy level, derived from the mean privacy level of the sensor, collector and context that comprise each data-sharing scenario (see Section 4.3 for the exact calculations).
The key observations are summarized as follows: (i) Coordinated data sharing results in significant privacy recovery (Fig. 3a and 4a) as well as more efficient data sharing (Fig. 3b and 4b) at a lower cost for service providers (Fig. 5). (ii) Intrinsic data sharing positively correlates to attitudinal data sharing but has a narrower range (Fig. 4a). (iii) Consecutive rewarded data sharing results in significant (and similar) privacy loss via, though, different data-sharing choices (Fig. 3a and 4a). (iv) The privacy loss, rather than the privacy level, under rewarded data sharing is correlated to the perceived privacy sensitivity (Fig. 4a). (v) Individuals improve their privacy by sharing data with lower privacy sensitivity than when improving rewards, while they keep sharing data with privacy-intrusive collectors under privacy-intrusive contexts (Fig. 3a).

Coordinated data sharing for efficiency and privacy recovery. Fig. 3b illustrates the mismatch (absolute error) between a privacy-goal signal (very low and very high privacy preservation) and the aggregated data-sharing choices made via the AI approach (both standardized). Coordinated data sharing has a lower average mismatch than intrinsic and rewarded data sharing for both goal signals: 22.8% < 30.1% < 40.2% for very high and 6.2% < 12.1% < 15.2% for very low privacy preservation respectively. With the very high privacy-preservation goal, matching is harder as there is mainly one data-sharing plan (intrinsic), out of the three to choose from, containing data-sharing choices with high privacy preservation. On the contrary, with the very low privacy-preservation goal, mismatch is minimal by combining data-sharing plans from both the 1st and 2nd rewarded data-sharing conditions. This trend is also confirmed in the other three privacy-goal signals (see Fig. 11, Section 8 of SI). For the very low and very high privacy-preservation goals, health (4.7%, 16.5%) and noise (5.7%, 16.6%) show a low mismatch on average, while government (7.3%, 32.3%) and social networking (7.1%, 33.8%) show a high one, see Fig. 4b. Via coordinated data sharing, social networking shows the highest mismatch reduction of 66.6% and 45.5% under the very low and very high privacy-preservation goals. The overall average privacy recovery from rewarded to coordinated data sharing is 77%. These results demonstrate the unprecedented potential of coordinated data sharing to protect privacy while retaining data-sharing efficiency (see also Fig. 12, Section 9 of SI illustrating different privacy-recovery valuations). Coordinated data sharing operates close to intrinsic data sharing with a minor (but significant: t(63) = 9.64, p = 1.00 × 10^-5 for the very low and t(63) = 7.81, p = 1.00 × 10^-5 for the very high privacy-preservation goal) additional privacy sacrifice that benefits data-sharing efficiency and, as a result, the data collective as a whole.
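For illustration, the mismatch measure used here can be sketched as follows: both the aggregated data-sharing signal and the goal signal are standardized and their mean absolute error is taken. The 64-value toy data below stands in for the real signals.

```python
import numpy as np

def standardize(x):
    """Z-score a signal: zero mean, unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def mismatch(shared, goal):
    """Mean absolute error between the standardized aggregated
    data-sharing signal and the standardized goal signal."""
    return np.mean(np.abs(standardize(shared) - standardize(goal)))

# Toy example over the 64 data-sharing scenarios.
rng = np.random.default_rng(0)
goal = rng.random(64)                     # stand-in privacy-preservation goal signal
shared = goal + rng.normal(0, 0.2, 64)    # aggregated choices roughly tracking it
print(round(mismatch(shared, goal), 3))
```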
Figure 3: Coordinated data sharing shows higher efficiency than intrinsic and rewarded data sharing. Privacy and mismatch for the 64 data-sharing scenarios. (b) Data-sharing mismatch (ε, absolute error of standardized signals) between three data-sharing conditions and the privacy-preservation goal signals of very high and very low. Values are sorted from lowest to highest mismatch according to coordinated data sharing.

Coordinated data sharing reduces data-collection costs. Fig. 5 illustrates the data-collection costs. The monetary cost of the 1st and 2nd rewarded data sharing for data collectors is 960.18 CHF and 905.14 CHF respectively. This cost is higher than the monetary value of the data shared intrinsically, which is 628.22 CHF. Strikingly, the cost of coordinated data sharing is on average 832.56 CHF (σ = 15.93), which is 10.7% lower than rewarded data sharing. These costs include the monetary value of intrinsic data sharing. If this value is excluded, assuming that this data is shared for free (as happened in the experiment), the cost drops further to 626.77 CHF, which is on average 32.9% lower than rewarded data sharing. It is remarkable that the monetary value of coordinated data sharing is similar to that of intrinsic data sharing; however, it yields data of higher utility for service providers. As a result, coordinated data sharing is a win-win for all: lower data-collection costs for service providers, higher quality of service via improved data-sharing efficiency and significant privacy recovery for the participants of the data collective.

Figure 4: Privacy and data-sharing mismatch level of the different sensors, collectors and contexts under intrinsic, rewarded and coordinated data sharing. The privacy level of attitudinal data sharing is also shown. The 12 colored lines are ranked according to the privacy loss (intrinsic − 1st rewarded data sharing) and the mismatch reduction (1st rewarded − coordinated data sharing). (b) Data-sharing mismatch between three data-sharing conditions and the privacy-preservation goal signals of very low (left) and very high (right).
Attitudinal-intrinsic data sharing. Privacy preservation under intrinsic data sharing is 21.7% higher than the perceived privacy (Fig. 4a). While this difference is not significant (t(11) = −2.07, p = 0.06), the privacy levels between the 12 elements of attitudinal and intrinsic data sharing are positively correlated (R = 0.63, t(10) = 2.54, p = 0.029), despite the significant drop of 95.3% in the dispersion (variance). This result shows that data sharing operates in a narrower decision space than the perceived privacy. Social networking (0.78, 0.64) and corporation (0.64, 0.62) come with both high privacy sensitivity and preservation, while education (0.31, 0.5) and accelerometer (0.2, 0.53) show low privacy sensitivity and preservation.

Figure 5: Coordinated data sharing reduces data-collection cost by 10.7%-32.9% compared to rewarded data sharing. This cost is comparable to intrinsic data sharing. Rewarded data sharing results in excessive data with 48.5% higher cost than intrinsic data sharing. Coordinated data sharing is calculated with and without the intrinsic cost. The gray points are random permutations of the initial conditions in the optimization process.

Attitudinal-rewarded data sharing. Rewarded participants sacrifice privacy by 32.4% (t(11) = 2.72, p = 0.013) and 34% (t(11) = 2.85, p = 0.009) compared to attitudinal data sharing (Fig. 3a). The privacy level under the two rewarded data-sharing conditions is not correlated to the perceived privacy sensitivity (attitudinal) of the different sensors, collectors and contexts (R = 0.36, t(10) = 1.22, p = 0.24 and R = −0.39, t(10) = 1.53, p = 0.15 in Fig. 4a). Strikingly though, it is the privacy loss (intrinsic − rewarded data sharing) that correlates to attitudinal data sharing (R = 0.64, t(10) = 2.64, p = 0.025 and R = 0.77, t(10) = 3.82, p = 0.0033).

Which data-sharing scenarios improve privacy and rewards? Under rewards, data-sharing scenarios are automatically retrieved to fulfill participants' goal, i.e. data-sharing options with the highest improvement of privacy or rewards, see Fig. 9. Fig. 3a marks the top-5 scenarios that result in the highest mean privacy and reward gain (all ranked scenarios are presented in Fig. 8 and Table 10 of SI). The most highly privacy-gaining scenarios involve non-privacy-sensitive sensor data, such as accelerometer, which are shared though with privacy-intrusive data collectors and contexts such as social networking and corporation. In contrast, the most highly reward-gaining scenarios involve privacy-sensitive sensor data, such as GPS, which are also shared with the privacy-intrusive data collectors and contexts of social networking and corporations. These observations reveal the following: Individuals improve privacy or rewards by sharing data under privacy-sensitive contexts with privacy-intrusive collectors. Nonetheless, compared to improving rewards, individuals switch to sharing data with lower privacy sensitivity when improving their privacy.

Rewarded individuals better distinguish data than collectors/contexts
Here we study the causal link between the data-sharing criteria/elements (independent variables) and the privacy/reward gains (dependent variables) in the different experimental conditions. Four explanatory models based on a conjoint analysis are outlined in Section 4.5. Fig. 6a illustrates the regression coefficients of the models, while Fig. 6b shows the relative importance of the data-sharing criteria and their elements calculated from these coefficients. All models come with R^2 > 0.8 and with statistically significant values of relative importance (p < 0.05) for the vast majority of data-sharing elements, as shown in Table 13, Section 11 of SI. Fig. 6b also shows the perceived relative importance derived from the self-reported entry survey questions.

Figure 6: Rewarded individuals who share data shift the importance from collectors and contexts to data. Via a conjoint analysis, four multiple linear regression models are compared, explaining how the different data-sharing criteria and elements influence different key data-sharing behaviors. (b) The relative importance (partworth utilities) of the data-sharing criteria and elements (relative within each criterion) derived from the different regression models of conjoint analysis and the perceived privacy sensitivity. The data collector is the most important criterion for the models based on privacy. In contrast, the sensor type is the most important criterion for the model based on reward gains.
The data collector is the most important criterion (40.73% on average, Fig. 6b) for all models that predict privacy, and this criterion explains privacy loss (Fig. 6a). Context follows with 33.91% importance, also explaining privacy loss, while sensor type shows the lowest importance of 25.36%, explaining the privacy gains. The consistency of these three privacy models reveals the following: the data collectors with whom individuals share data determine to a high extent (i) the privacy level under intrinsic or coordinated data sharing and (ii) the privacy loss under rewarded data sharing. The type of data they share plays a smaller role, though a positive one for privacy preservation.
The models align well with the perception of individuals: 29.4%, 37.85% and 32.75% for sensor type, collector and context respectively (Fig. 6b). In contrast, for data-sharing choices of individuals with reward gains, the dominant criterion is the type of sensor data with 45.4% relative importance over the data collector and context with 24.55% and 30.01% respectively. The collectors and contexts explain the loss of rewards, while the type of sensor, and in particular the GPS, explains reward gain. GPS, as a privacy-sensitive sensor, provides high reward gains, and individuals are likely to be accustomed to apps accessing their GPS data, which is likely to reduce privacy preservation. Choices that improve rewards suggest a radically different decision frame than the ones that improve privacy: a shift from protecting to sharing GPS data without strongly distinguishing anymore the data collectors and contexts. Fig. 6b also provides the following observations: The relative importance of the perceived privacy sensitivity over the 12 data-sharing elements is positively correlated with all models based on privacy: R = 0.97, t(10) = 12.22, p = 2.46 × 10^-7 for rewarded data sharing, R = 0.84, t(10) = 4.87, p = 0.00066 for intrinsic−rewarded, R = 0.69, t(10) = 3.025, p = 0.013 for coordinated data sharing and R = 0.67, t(10) = 2.89, p = 0.016 for the intrinsic one. All models come with a positive relative importance for GPS (12.67%), corporation (15.16%) and social networking (20.42%), while a negative one for accelerometer (−11.85%), light (−8.9%), educational institutes (−21.52%), transportation (−6.13%) and health (−6.63%).

From intrinsic to rewarded data sharing: five behavior changes
Identifying group behaviors. Table 1 provides an overview of all nine possible behavioral transitions that can happen in data sharing as a result of introducing monetary rewards. A clustering and stability analysis are performed on the experimental data projected in Fig. 7a (intrinsic vs. 1st rewarded), which reveal five robust behavioral patterns out of the 9 possible ones (similar groups are observed for intrinsic vs. 2nd rewarded). See Section 4.6 for more information. Some individuals are oblivious to rewards. Yet, these are the ones who intrinsically share a significant amount of data (privacy ignorants and privacy neutrals) or do not share data (privacy preservers). Reward seekers increase the data-sharing level when rewarded, while reward opportunists intrinsically preserve privacy but eventually share a significant amount of data when rewarded. It is astonishing that a moderate sacrifice of privacy preservation by rewards is not observed (privacy sacrificers in Table 1), meaning that rewards significantly polarize individuals to either keep protecting privacy or give up significant privacy. There are also no cases observed in which rewards motivate a change to privacy protection; however, rewards reinforce privacy protection for privacy preservers.

Group behaviors converge to stable ones, while boundary ones polarize. The behavioral pattern of privacy sacrificer (Table 1) is found to be a transient one, observed within the reward opportunists during the first unique responses to the 64 data-sharing scenarios (see Fig. 7b). When, though, these individuals get more involved in reevaluating their decisions, they converge to a further privacy sacrifice of 30.9%. The minimum number of questions answered by all groups is 250. This incremental privacy decline in reoccurring decision-making is also observed in reward seekers and privacy ignorants that decrease their privacy level by 55 [...] data-sharing decisions. Such a privacy increase of 8.1% is also observed for privacy neutrals. Strikingly, the two boundary behavioral patterns of privacy preservers and privacy ignorants show polarization from the very first data-sharing decisions. These individuals reinforce privacy preservation and privacy ignorance respectively throughout the choices they make and regardless of whether these choices are the primary ones (the first 64 questions) or the reassessments (the follow-up reinvoked questions). A similar behavior is documented for data sharing in social media [44,15,8], though this is the first evidence of such behavior in a broader context, involving both privacy and rewards dilemmas.

How privacy sensitivity of data-sharing criteria explains group behaviors. Fig. 7c shows all group pairs and the differences between these groups in terms of how privacy sensitive they regard each data-sharing criterion (attitudinal). Statistically significant observations (p ≤ 0.05) and those close to the significance threshold are marked in Fig. 7c. These results are derived with a post hoc Tukey's range test (α = 0.05) after a one-way Analysis of Variance (ANOVA). The independent variable is calculated within the groups by the privacy change from intrinsic to rewarded data sharing. The dependent variables are the privacy sensitivity of the data-sharing criteria and their elements. Several of these criteria explain the data-sharing groups with a statistical significance (see Fig. 7c). In Fig. 7c, the data collector (p = 0.017) and the GPS sensor (p = 0.052) explain the privacy-sensitivity difference between reward opportunists and privacy ignorants: rewarded individuals of these groups share a significant level of data, while reward opportunists preserve privacy without rewards. Compared to privacy ignorants, reward opportunists find the data collector and GPS more privacy intrusive by 24.2% and 20.4% respectively. Similarly, the context of health (p = 0.042) and the GPS sensor (p = 0.033) explain the divergence between privacy neutrals and privacy ignorants. Privacy neutrals find these two data-sharing criteria 26.6% and 20.9% more privacy intrusive than privacy ignorants. Privacy neutrals also find sensors (p = 0.033) more privacy intrusive than reward seekers by 18%, which explains the higher data sharing of reward seekers under rewards. Finally, the data-sharing criterion of educational institute determines when individuals share a very high or very low level of data with or without rewards: privacy preservers find the context of education (p = 0.058) 25.9% more privacy intrusive than privacy ignorants.

Discussion
The findings reveal that a significant privacy recovery is attainable within the modus operandi of a data collective. This is a radical shift from the mainstream thought of privacy as a personal value to privacy as a collective value [45], a public good shared within a community of citizens generating data. Coordinated data sharing supported by a trustworthy decentralized AI automates and scales up collective arrangements for sharing under the doctrine 'as little as possible, as much as necessary'. Such optimized arrangements would otherwise be too complex and expensive to achieve in a transparent way with existing top-down privacy policies and regulations or even with automated data-access committees [46].
Findings also reveal that data collectives create tangible benefits for online service providers that collect or access data shared in a coordinated way: data-collection costs drop dramatically and data are used more purposefully to deliver the required quality of service. This can create further remarkable cost reductions, such as reduced data storage, security, energy and carbon-footprint costs, as well as costs of legal disputes that are more likely to arise when dealing with excessive personal data.
Within rising information asymmetries and monopolies of knowledge in existing data markets and big tech, the capability of data collectives to coordinate data sharing at large scale has so far been a gap [47,48]. This is underlined in promising solutions from political and economic theory such as data-owning democracy [49], digital socialism [47] and peer-to-peer digital commons [50]. Establishing data collectives at a community or municipality level can create alternative forms of data ownership and control; they can empower citizen participation based on an agenda of using digital assets for priorities such as social welfare and environmental sustainability [48,51]. These blueprints can be the basis of alternative data-market designs that encourage business models based on social innovation without over-relying on excessive free personal data. Data collectives can further benefit from scale, for instance, by increasing the number of individuals who coordinate their data-sharing decisions or by increasing individuals' contributions via more alternative data-sharing options. The AI system based on collective learning has a higher degree of freedom to calculate data-sharing choices that match the required data and recover more privacy in larger populations [25]. It is also decentralized to make coordination more resilient to computational bottlenecks.
Science can also benefit from data collectives. They can scale up open data and citizen science initiatives, while improving the transparency and reproducibility of research. Moreover, data collectives can be a response to the current opaque models of generative AI such as ChatGPT. Selective data shared as a result of coordination can be used to train open and more transparent generative AI models, ethically aligned to community values. This could be a new type of 'curriculum' for training AI, institutionalized in a bottom-up way via data collectives.
Choices under intrinsic and rewarded data sharing prioritize different criteria. Individuals distinguish data collectors and contexts better than the type of data they share. In contrast, rewarded individuals who give up privacy better distinguish the type of data they share, in particular the GPS. Thus, rewards diminish the importance of who collects data and for what purpose. In this case, data collectors may have no competitive advantage against each other but instead accumulate excessive and irrelevant data that increase their costs and risks.
The perceived privacy sensitivity of the data-sharing criteria explains different key data-sharing behaviors (groups), for instance, individuals who do not preserve privacy vs. individuals who sacrifice privacy under rewards. Raising awareness about the privacy sensitivity of data collectors can influence data-sharing decisions. This has implications for how privacy policies and data consents are designed to be more transparent and user-friendly. Data-sharing choices that preserve and give up significant privacy tend to polarize, thus highlighting the value of privacy for individuals who have it rather than for the ones who do not [15]. Coordinated data sharing breaks this vicious cycle by redistributing the privacy cost among the individuals for the benefit of all. This demonstrates opportunities for digitally networked societies without borders to reconcile different cultural norms on privacy. Future work can unleash further opportunities to reclaim privacy in the digital age: Spatio-temporal coordinated data sharing can automate and scale up the "right to be forgotten", which improves both privacy control and the willingness to share data, e.g. by 10%-18% [13]. The feasibility of collective learning using optimization scenarios in time and space has been demonstrated earlier for Smart City applications [25]. Nevertheless, defining and conveying to individuals the context of data use is not always straightforward and further work is required in this area, for instance, on semantics and ontologies [46]. Moreover, beyond purposeful data sharing, speculative data analysis out of a specific context can also encourage innovation and creativity. In such scenarios, data collectors may have a more significant role for trust in data-sharing decisions. The acceptance of coordinated data-sharing recommendations requires a follow-up study, in particular on the incentives and the interface design of the AI system for the broader population. Notwithstanding, earlier results demonstrate significant coordination capacity even when large portions of the population are not flexible [52]. The explainability of coordinated data sharing based on decentralized AI is particularly challenging and is expected to further strengthen trust in data collectives.

Methods
We outline here the experimental design and the developed technical infrastructure. We also illustrate the methods with which we analyzed the experimental data and the AI-based decision-support system with which coordinated data sharing is performed.

Living-lab experimental design
A novel design for a 'living-lab' experiment is introduced. It defines a mixed-mode experiment that seamlessly integrates into participants' everyday life, while the overall experimental process is orchestrated via the controlled environment and experimental protocols of the Decision Science Laboratory (DeSciL) of ETH Zurich [53]. The proposed experiment has received ethical approval by DeSciL and the Ethics Commission at ETH Zurich (#EK 2016-N-40). To improve the realism of the experiment and comply with the non-deceiving policy of DeSciL, letters of support were collected from data collectors to confirm their interest in accessing the collected sensor data of participants. The study consists of three phases: (i) entry, (ii) core and (iii) exit. Fig. 8 provides an outline of the overall experimental process and the developed data-collection infrastructure (details are documented in Section 3 of SI).

Recruitment approach and sampling biases. The living-lab experimentation involves the recruitment of 123 participants during the entry phase, out of which 116 completed the exit phase and 89 participated in all phases. Aggregated privacy-reward records for all experimental conditions are available for 84 participants. Responses to the data-sharing scenarios for all experimental conditions are available for 73 participants. In the context of this study, a higher number of participants is particularly challenging and probably unrealistic, as it would require significantly more resources for compensation and infrastructure, a sacrifice of rigor, and much looser control of the experimental process. Instead, priority is given to a satisfactory compensation per participant for active participation in all experimental phases (see Section 3.5 in SI) and to appropriately incentivizing a large number of data-sharing choices: 27,403 in total. Moreover, the development of a data-collection platform, including the data-access web portal and the mixed-mode experimental process, preserves a high degree of realism within well-controlled laboratory conditions, resulting in a novel high-quality dataset for causal inference.
Participants were recruited from the DeSciL pool [54], mainly consisting of students of ETH Zurich and University of Zurich (see the invitation in Section 2.2 of SI). This pool is not representative of the population and is subject to sampling biases. However, smartphone users who use a broad range of apps requiring the sharing of sensor data are mainly young people [55,56,57], and therefore the students' profile fits well with the nature of the conducted experiment. Participants with technological literacy are also more likely to be familiar with data-sharing dilemmas involving a privacy cost to gain access to smartphone app services. Studying such a sample of participants can make results more compelling, as shown in earlier experiments conducted on such a recruitment basis [58]. Only Android smartphone users are recruited; they represent a large portion of the population, for instance, 39.8% in Switzerland, 68.6% in Europe and 72% worldwide in 2016 according to StatCounter. Moreover, several smartphone apps with data-sharing decisions are made for both Android and iOS. Therefore, there is no substantial evidence to suggest different decision patterns across the market share in the population, as also supported in earlier work [58]. Recruitment is performed in 8 sessions on a weekly basis. To eliminate any further temporal bias, each of the three phases in Fig. 8 took place on the same day of the week. Table 2 in SI provides an overview of the experimental sessions.

Entry phase. It takes place at DeSciL and involves the following: (i) Collection of basic demographics about participants and information about their privacy profile using the survey questions of Table 4 in SI. (ii) Use of the privacy-intrusion level assigned to each data-sharing criterion and its elements (Questions B.9-B.12) to calibrate the calculation of the monetary rewards for the core phase according to the model illustrated in Section 1 of the SI. (iii) Collection of the intrinsic data-sharing decisions by letting participants choose once the data-sharing level for each of the 64 data-sharing scenarios (see Fig. 3b in SI). The following question implements the data-sharing scenarios:

Factorial Question. Please choose the amount of <sensor type> sensor data shared with <data collector> to be used in the context of <context>.
There are in total five possible data-sharing levels to choose from (see Fig. 3b in SI).

Core phase. It takes place out of the lab and lasts for two days (48 hours), starting right after the completion of the entry phase. During the 24 hours of each day, participants are voluntarily involved in an (unlimited) sequence of dilemmas of either improving their privacy or their rewards by sharing less or more data respectively in a data-sharing scenario. Fig. 9 illustrates the two app screens for the privacy-rewards dilemma and the data-sharing scenario that follows. First, participants decide what to improve based on the privacy-rewards balance they currently have (Fig. 9a). Next, a data-sharing scenario is automatically retrieved with the latest choice made (Fig. 9b), marking the options that fulfill their goal (the improvement box, see Arrow 6). The retrieved scenario is the one that maximizes the improvement of the chosen goal, i.e. privacy or rewards. For each option, the app informs participants about the rewards and privacy they gain or lose (Arrows 3 and 4 respectively). After a choice, the participant moves back to the main screen of Fig. 9a with an updated privacy-rewards balance. The first 64 unique data-sharing scenarios are the ones that participants decided about during the entry phase. The difference in this core phase is that data sharing is rewarded based on two factors defined in the data-sharing model (see Section 1 of SI): (i) the data-sharing level (the higher, the more rewards) and (ii) how privacy-intrusive the data-sharing scenario is according to each participant. More rewards are allocated to data-sharing scenarios involving criteria regarded as highly privacy-intrusive by a participant. The latter personalization is derived from the responses of the entry phase (Questions B.9-B.12 in Table 4 of SI) without explicitly making participants aware of this.
Within the 24 hours, participants can change their goal based on their privacy-reward balance. They continue responding to further retrieved data-sharing scenarios that can satisfy their goal, i.e. improve privacy or rewards, see Fig. 9a. This allows studying how data-sharing decisions evolve. Each decision in a data-sharing scenario overwrites the previous one for the calculation of the privacy-reward balance. At the end of the 24 hours, the process completes by locking the decisions of the 64 scenarios and sharing the data to the data-access web portal. This process runs for two days to validate the results, confirming similar data-sharing behavior on both days (see Fig. 3a as well as Fig. 9a and 9b in SI).

Exit phase. The participants of each experimental session return to DeSciL on the 4th day. They answer a survey questionnaire, participate in an interview and receive their calculated compensation. The survey consists of questions that cover the following aspects (see Tables 6 to 9 in SI): (i) smartphone use, (ii) user interface and functionality of the app, (iii) rewards and privacy, (iv) experimental process. The data collected during this phase have a supportive role, serving the validation and interpretation of the results produced during the entry and core phases. See Section 3.4 of SI for further details.

Compensation and monetary incentives. Participants are compensated for their engagement in the experiment as well as for the sensor data they share. The engagement covers (i) showing up in the lab (2 · 10 = 20 CHF), (ii) completing the lab activities (15 + 5 = 20 CHF) and (iii) using the app in terms of answering at least once all 64 data-sharing scenarios (2 · 2.5 = 5 CHF). The rewards for the app use are distributed with a geometric progression over the data-sharing scenarios to eliminate dropout effects (see Section 3.5 of SI). Those who successfully complete all experimental phases receive the total fixed compensation of 45 CHF and an additional maximum reward of 2 · 15 = 30 CHF based on the amount of shared data. Fig. 8 shows how the total maximum amount of 75 CHF is allocated over the experimental process. Section 3.5 of SI further motivates the allocation of these compensations. The experimental data are stored on a remote server and locally on the smartphone for redundancy, so that they can be restored during the exit phase by moderators in case of software or communication failures.

Technical infrastructure
The developed infrastructure consists of the following interactive systems: (i) the local and (ii) the remote data-management system, (iii) the smartphone app and (iv) the data-access web portal. The two data-management systems synchronize and secure the shared sensor data as well as the experimental data. The smartphone app is developed to run on Android devices. The data-access web portal stores the shared data and provides authorized access to the registered participants of the experiment as well as to the data collectors involved in the data-sharing scenarios. Making this system available improves the realism of the experiment by realizing the actual data-sharing decisions, while allowing the experimental design to comply with the non-deceiving policy of DeSciL. See Section 4 of SI for further details.

Privacy calculations for sensors, collectors and contexts
The privacy measurements in Fig. 4a are made as follows: In the case of the attitudinal data-sharing condition, the mean privacy level is calculated by normalizing (in [0, 1] over all participants) the privacy sensitivity reported in Questions B.10-B.12 during the entry phase. In the intrinsic, rewarded and coordinated data-sharing conditions, the privacy level of a certain sensor, data collector or context is the normalized privacy mean across all participants for the 16/64 data-sharing scenarios that contain this element (see Fig. 3a). In the coordinated data-sharing conditions, this is calculated using the mean privacy level of the data-sharing scenarios selected over all 10 repetitions of the coordination with a random positioning of the agents (see Section 4.4 for more information).
The expected privacy level of a data-sharing scenario (see shaded areas in Fig. 3a) is calculated as the mean privacy level of the sensor, collector and context that comprise the data-sharing scenario. The expected privacy level of a certain sensor, data collector or context is the mean expected privacy level over the 16/64 data-sharing scenarios containing this element. The relative difference between the actual privacy level and the expected one defines the privacy reinforcement. Detailed measurements are illustrated in Fig. 13, Section 10 of SI.
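A minimal sketch of these calculations, with hypothetical privacy levels for a few elements; the element names and values below are illustrative only.

```python
import numpy as np

# Hypothetical mean privacy levels per element (normalized in [0, 1]).
sensor_privacy    = {"GPS": 0.35, "accelerometer": 0.60}
collector_privacy = {"corporation": 0.40, "university": 0.65}
context_privacy   = {"social networking": 0.30, "health": 0.55}

def expected_privacy(sensor, collector, context):
    """Expected privacy of a scenario: the mean privacy level of the
    sensor, collector and context that comprise it."""
    return np.mean([sensor_privacy[sensor],
                    collector_privacy[collector],
                    context_privacy[context]])

def privacy_reinforcement(actual, expected):
    """Relative difference between actual and expected privacy level."""
    return (actual - expected) / expected

exp = expected_privacy("GPS", "corporation", "social networking")
print(round(exp, 3), round(privacy_reinforcement(0.42, exp), 3))
```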

Coordinated data-sharing via decentralized AI
Coordinated data sharing is modeled as a decentralized discrete-choice multi-agent combinatorial optimization problem. It is designed to recover the excessive privacy loss of rewarded data sharing. A decision-support system implements the optimization that achieves the coordination. The discrete-choice model and the coordination method are outlined below.

Data-sharing plans and elicitation of privacy sensitivity. Each participant comes with three data-sharing plans extracted from the living-lab experiment as follows: each plan is a sequence of 64 real values that represent the data-sharing choices made in each scenario and each experimental condition: intrinsic, 1st rewarded and 2nd rewarded. Each plan has a privacy cost represented by a real value. It is calculated as the mean normalized level (in [0, 1]) of shared data over the data-sharing scenarios. Alternative privacy valuation schemes are assessed in Section 9 of SI.

Steering data sharing using privacy-preservation goal signals. A goal signal represents a data-collection scenario with the minimum required data to enable a data-driven service or application [32,33,34]. Five privacy-preservation goal signals for data sharing are generated using the intrinsic data-sharing choices of participants. Each goal signal is a sequence of 64 values corresponding to the data-sharing scenarios. For each data-sharing option out of the five possible ones, a goal signal is calculated with the 64 values representing the probability of participants choosing this data-sharing option without rewards. Similarly to the data-sharing options, the five goal signals are referred to within the range of very low to very high privacy preservation. Fig. 10, Section 7 in SI illustrates the five goal signals.

Coordinated data sharing. The goal of the data collective is to choose and aggregate (sum up element-wise) the data-sharing plans of all individuals such that the resulting signal matches a given goal signal for privacy preservation. This matching is measured here with the residual sum of squares between these two signals (standardized). As this goal cannot be satisfied by letting individual participants choose independently the plan with the best matching (minimizing a nonlinear cost function), coordination between participants' choices is required. This discrete-choice coordination problem is combinatorial NP-hard and requires approximating solutions [25]. The coordination capability can be generalized to a multi-objective combinatorial optimization problem in which the data collective minimizes the following cost function:

(1 − α − β) · PRIVACY INEFFICIENCY + α · PRIVACY UNFAIRNESS + β · PRIVACY COST,   (1)

where the privacy inefficiency is the residual sum of squares between the aggregated data-sharing plans and the goal signal, the privacy cost is the mean cost of the selected plans and the privacy unfairness is the dispersion (variance) of the privacy cost over individuals. The parameters α and β, for α + β ≤ 1 and α, β ∈ [0, 1], are self-determined by each individual and model a behavioral continuum between selfish vs. altruistic data-sharing behavior. A selfish individual that minimizes its privacy cost without coordinating its data sharing with other individuals is determined by β = 1, α = 0. An individual that minimizes the collective privacy inefficiency without counting its personal privacy cost is an altruistic one with β = 0, α = 0. Such altruistic individuals can balance for privacy unfairness by increasing the α parameter.
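The following sketch evaluates this multi-objective cost for a given selection of plans, using the weighting form of Equation 1 as reconstructed above; the function and parameter names are illustrative.

```python
import numpy as np

def collective_cost(selected_plans, plan_costs, goal, alpha, beta):
    """Multi-objective cost of the data collective (cf. Equation 1):
    (1 - alpha - beta) * inefficiency + alpha * unfairness + beta * cost.
    selected_plans: n x 64 matrix of the chosen data-sharing plans;
    plan_costs: privacy cost of each chosen plan (one value per individual)."""
    aggregate = selected_plans.sum(axis=0)                 # element-wise aggregation
    a = (aggregate - aggregate.mean()) / aggregate.std()   # standardized signals
    g = (goal - goal.mean()) / goal.std()
    inefficiency = np.sum((a - g) ** 2)   # residual sum of squares to the goal
    cost = np.mean(plan_costs)            # mean privacy cost of selected plans
    unfairness = np.var(plan_costs)       # dispersion of privacy costs
    return (1 - alpha - beta) * inefficiency + alpha * unfairness + beta * cost

# Toy example: 10 individuals, one selected 64-value plan each.
rng = np.random.default_rng(0)
plans, costs, goal = rng.random((10, 64)), rng.random(10), rng.random(64)
print(round(collective_cost(plans, costs, goal, alpha=0.2, beta=0.3), 2))
```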
A decentralized computational approach for coordination. The collective learning method of I-EPOS is used to cope with the computational and communication complexity of the coordinated data-sharing problem [25]. This algorithm is used as a decision-support system that automates and scales up the coordination, which would otherwise be too complex and infeasible for humans to perform without digital assistance. As featured by UNESCO IRCAI [29], this method is particularly fitting in this privacy context: (i) The algorithm itself is privacy-preserving by design as it exclusively relies on exchanging aggregated (and not individual) information. The use of differential privacy and homomorphic encryption can also enhance the overall security of information aggregation. (ii) The algorithm is highly cost-effective with a low computational and communication complexity compared to other multi-agent approaches for combinatorial optimization problems [25]. The data-sharing choices calculated by the algorithm can rapidly match the goal signal with a low communication exchange between the agents. (iii) The algorithm is open-source, decentralized and can scale up without relying on a trusted third party, which makes it particularly applicable for bottom-up data collectives. (iv) The algorithm can operate in different faulty environments and application scenarios [59].
Collective learning parameterization. Agents are self-organized in a binary balanced tree within which they are positioned randomly. Coordination repeats 10 times, each with a different random positioning of the agents. For each random positioning, collective learning runs for 50 learning iterations. Each iteration proceeds from leaves to root and back to leaves. It results in the selection of data-sharing plans that minimize at an aggregate level the cost function in Equation 1. More information about the algorithm can be found in earlier work [25].
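For intuition only, the sketch below replaces the tree-based I-EPOS procedure with a plain sequential best-response loop: each agent repeatedly revisits its discrete choice among its three plans to reduce the collective cost (reusing collective_cost from the sketch above). The real algorithm aggregates choices bottom-up and top-down along the tree; this stand-in only conveys the iterative, discrete-choice coordination idea.

```python
import numpy as np

def coordinate(plans, costs, goal, alpha=0.0, beta=0.0, iterations=50, seed=0):
    """Simplified stand-in for collective learning: plans is an n x 3 x 64
    array (three data-sharing plans per agent), costs an n x 3 array of
    privacy costs. Returns the index of the selected plan per agent."""
    rng = np.random.default_rng(seed)
    n, options = plans.shape[0], plans.shape[1]
    choice = rng.integers(0, options, size=n)        # random initial selection

    def cost_if(i, c):                               # collective cost if agent i picks c
        sel = [choice[j] if j != i else c for j in range(n)]
        return collective_cost(np.array([plans[j, sel[j]] for j in range(n)]),
                               np.array([costs[j, sel[j]] for j in range(n)]),
                               goal, alpha, beta)

    for _ in range(iterations):
        for i in rng.permutation(n):                 # random agent order per iteration
            choice[i] = min(range(options), key=lambda c: cost_if(i, c))
    return choice

# Toy example: 8 agents; privacy cost = mean shared level of each plan.
plans = np.random.default_rng(1).random((8, 3, 64))
costs = plans.mean(axis=2)
goal = np.random.default_rng(2).random(64)
print(coordinate(plans, costs, goal, iterations=5))
```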

Causal inference with conjoint analysis
The complete factorial design of 3 data-sharing criteria, each with 4 elements, results in 64 scenarios encoded by a sequence of 12 − 3 = 9 dummy variables. These represent the membership of a certain sensor, collector and context in a data-sharing scenario. Multiple linear regression models are constructed using the nine dummy variables as independent variables (4 − 1 = 3 variables per data-sharing criterion are used to resolve the linear dependency problem in multiple regression). The dependent variables that distinguish the regression models include the following (Fig. 6): privacy (intrinsic, intrinsic−2nd rewarded, coordinated with very low privacy-preservation goal) and gained rewards (1st and 2nd rewarded data sharing with those individuals who intend to and do improve rewards as in Fig. 9). These privacy and reward values across the 64 data-sharing scenarios of the full factorial design are used for a rating-based conjoint analysis. Other regression models with lower statistical power are assessed and further illustrated in Fig. 14, Table 13, Section 11 of SI. The regression models result in the 12 coefficients for each data-sharing element as shown in Fig. 6a. Together with a constant (Table 13 in SI), they predict the dependent variable. Using the coefficients, the partworth utilities are estimated, which calculate the relative importance of each data-sharing criterion and element (Equations 11 and 12 in SI). For each data-sharing element, the relative importance is calculated across the elements of the criterion it belongs to (Equation 12) or across all elements (Equation 13). The latter is shown in Fig. 15 of SI. The conjoint analysis models are compared to the mean relative perceived privacy sensitivity as declared by participants in Questions B.9-B.12 in Table 4 of SI.
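A minimal sketch of the dummy-coded regression and the relative-importance calculation; it follows the standard conjoint convention that a criterion's importance is its partworth range over the sum of ranges, intended to mirror (not reproduce) Equations 11-12 of the SI. The ratings are random stand-ins.

```python
import numpy as np
from itertools import product

def dummy_code(scenarios):
    """Dummy-code the 3 x 4 factorial design: 4 - 1 = 3 variables per
    criterion (element 0 is the reference), plus a leading constant."""
    X = []
    for s in scenarios:
        row = [1.0]                                   # constant term
        for element in s:                             # element in {0, 1, 2, 3}
            row += [1.0 if element == e else 0.0 for e in (1, 2, 3)]
        X.append(row)
    return np.array(X)

scenarios = list(product(range(4), repeat=3))         # 64 data-sharing scenarios
X = dummy_code(scenarios)                             # 64 x 10 design matrix

rng = np.random.default_rng(1)
y = rng.random(64)                                    # stand-in privacy/reward ratings

coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS fit
# Relative importance of each criterion: partworth range over sum of ranges
# (the reference element has a partworth of 0).
ranges = [np.ptp(np.r_[0.0, coef[1 + 3*u: 4 + 3*u]]) for u in range(3)]
print([round(r / sum(ranges), 3) for r in ranges])
```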

Extraction and validation of group behavior
How groups are extracted. To extract the data-sharing group behaviors, the participants' privacy levels under intrinsic and 1st/2nd rewarded data sharing are clustered using three clustering techniques of R: (i) k-means [60] (kmeans), (ii) hierarchical clustering [61,62] (hclust) and (iii) partitioning around medoids [63] (pamkCBI). A subset of 110 participants who made both intrinsic and rewarded data-sharing decisions were clustered. An optimum number of five clusters is confirmed in all three methods, corresponding to the data-sharing groups marked in Fig. 7a. An overview of observed and unobserved group behaviors is outlined in Table 1.

How groups are validated. In the case of k-means and hierarchical clustering, the optimum number of five clusters is derived by performing a bootstrap evaluation (clusterboot of R) of the clusters [64]. It assesses both the stability of the clusters and the stability of the different clustering algorithms. The pamkCBI algorithm performs partitioning around medoids. The number of clusters is estimated by the optimum average silhouette width [65,66]. However, a bootstrap evaluation is also performed for pamkCBI for a complete comparison of the three algorithms. An outline of the clusters' stability (mean Jaccard similarity) and the number of dissolved clusters for 100 bootstrap iterations is given in Table 14 of SI. Visual inspections show that all three algorithms find the same clusters, while k-means achieves a mean Jaccard similarity (bootmean) higher than 0.75 for all clusters, which indicates stable clusters. As such, the groups of k-means are analyzed in this paper (Fig. 7). Note also that the population split over the data-sharing groups matches well with Westin's general population privacy indexes; see further Section 12 of SI.
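The analysis itself was performed in R; the Python sketch below conveys the same bootstrap-stability idea in the spirit of clusterboot, with random stand-in data and an assumed Jaccard matching scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

def jaccard(a, b):
    return len(a & b) / len(a | b)

def bootstrap_stability(X, k=5, iterations=100, seed=0):
    """Mean Jaccard similarity between the original clusters and their best
    match in clusterings of bootstrap resamples (values > 0.75 ~ stable)."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    base_sets = [set(np.flatnonzero(base == c)) for c in range(k)]
    scores = np.zeros(k)
    for _ in range(iterations):
        idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap resample
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx])
        boot_sets = [set(idx[np.flatnonzero(labels == j)]) for j in range(k)]
        for c, members in enumerate(base_sets):
            scores[c] += max(jaccard(members, s) for s in boot_sets if s)
    return scores / iterations

# Toy data standing in for 110 participants' intrinsic vs. rewarded privacy levels.
X = np.random.default_rng(2).random((110, 2))
print(np.round(bootstrap_stability(X, k=5, iterations=20), 2))
```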

Data Availability
The

Code Availability
The source code of the AI system is under active development at https://github.com/epournaras/epos. Source code used and developed for this paper is made available at https://doi.org/10.5281/zenodo.7457575.

Mathematical notations

Table 1 provides an overview of the mathematical notations.

s_{i,j}: The selected data-sharing level of individual i for a data-sharing scenario j
D_j: A data-sharing scenario j
f_i: A data-sharing decision function of individual i in a data-sharing scenario D_j
d_{j,u}: An element of criterion u in a data-sharing scenario j
n: Number of individuals
w_{i,u}: The weight of criterion u by an individual i
o: The index of an element of a data-sharing criterion
w_{i,o,u}: The weight of an element o of a criterion u by an individual i
W_{i,j}: The weight of a data-sharing scenario j by an individual i
B: Maximum (monetary) budget
B_p: Rewards for participation
B_s: Rewards for data sharing
r̂_{i,j}: The maximum rewards of individual i for a data-sharing scenario j
W_i: The total weight of all data-sharing scenarios by an individual i
r_{i,j}: The actual rewards of an individual i for a data-sharing scenario j
p_i: The privacy level of an individual i derived from the data-sharing choices
λ_{u,o}: The coefficient of a data-sharing element o in the criterion u
D_{u,o}: The dummy variable for the absence or presence of the data-sharing element o in the criterion u
ϵ: The error of the regression model
P_u: The partworth utility (relative importance) of criterion u
P̂_{u,o}: The partworth utility (relative importance) of element o in criterion u among all criteria
P_{u,o}: The partworth utility (relative importance) of element o within criterion u
P_j: The mean privacy level of a data-sharing scenario j
ε: The mismatch (absolute error) of data sharing from a privacy-preservation goal signal
R_j: The mean rewards level of a data-sharing scenario j
r_i: The rewards of individual i gained over the data-sharing scenarios
r̃_i: The hypothetical rewards of an individual i gained over the data-sharing scenarios under intrinsic data sharing
c_i: The privacy cost of a data-sharing plan generated by individual i as a function of r_i
α, β: The weights of privacy unfairness and privacy cost respectively in the optimization cost function

Data-sharing criteria
Let k factors, referred to as criteria, govern the level of data sharing that an individual, i.e. a citizen, chooses. This ranges from sharing no data to sharing all locally available data on an individual's device such as a smartphone. Each criterion u ∈ {1, ..., k} has a number of possible elements l_u. For instance, the type of sensor data is a criterion with the following elements (see Figure 2 in the main paper): GPS location, light sensor, etc. The former element may be regarded as more privacy intrusive than the latter one. The total number

m = l_1 · l_2 · ... · l_k

of combinations between the l_u elements of the k criteria defines the scenarios of data sharing, which are the ones studied in this paper. For each data-sharing scenario j ∈ {1, ..., m}, individuals have a number of z discrete data-sharing options, where the first option corresponds to sharing all collected data, whereas the zth option corresponds to sharing no data. Each individual i selects a data-sharing level s_{i,j} ∈ {1, ..., z} for scenario j. For simplicity, assume that the actual level of data sharing decreases linearly from 1 to z by, for instance, averaging, obfuscating or resampling the data to share (e.g. with a period proportional to s_{i,j}). The data-sharing level s_{i,j} is the result of a function

s_{i,j} = f_i(D_j),

where D_j = (d_{j,u})_{u=1}^{k} represents the data-sharing scenario j as the sequence of elements d_{j,u} ∈ {1, ..., l_u} over all k criteria. For the sake of simplicity in the model illustration, the number of criteria k and the number of elements l_u for each criterion u are assumed finite and fixed for all n individuals.
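As an illustration, the 4 × 4 × 4 design of this paper can be enumerated directly; the element names below are examples rather than the exact experimental elements.

```python
from itertools import product

sensors    = ["GPS", "accelerometer", "light", "noise"]            # example elements
collectors = ["corporation", "government", "university", "NGO"]
contexts   = ["health", "social networking", "education", "transportation"]

# m = l_1 * l_2 * l_3 = 4 * 4 * 4 = 64 data-sharing scenarios.
scenarios = list(product(sensors, collectors, contexts))
print(len(scenarios))   # -> 64
print(scenarios[0])     # -> ('GPS', 'corporation', 'health')
```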

A weighting scheme for personalized privacy valuation
Let the weight w_{i,u} ∈ [0, 1] denote how privacy-sensitive a criterion u is for an individual i relative to the rest of the criteria, such that

∑_{u=1}^{k} w_{i,u} = 1.

The weight W_{i,j} of a data-sharing scenario j is determined by each criterion weight w_{i,u} and each element weight w_{i,o,u} it consists of as follows:

W_{i,j} = ∑_{u=1}^{k} w_{i,u} · w_{i,o,u},

where o = d_{j,u} is the element of criterion u in the data-sharing scenario j.
The weighting scheme is used to model the heterogeneity in the availability of data that stems from the individuals' privacy perception, i.e. privacy-sensitive data are expected to be scarcer and, as a result, to have higher value in data sharing.
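A minimal sketch of this scenario weighting, assuming the sum-of-products form reconstructed above; all weight values are illustrative.

```python
def scenario_weight(w_crit, w_elem, scenario):
    """Weight W[i, j] of a scenario: sum over criteria of the criterion
    weight times the weight of the element used for that criterion."""
    return sum(w_crit[u] * w_elem[u][scenario[u]] for u in range(len(w_crit)))

w_crit = [0.5, 0.3, 0.2]                    # criterion weights, summing to 1
w_elem = [[0.4, 0.3, 0.2, 0.1]] * 3         # element weights per criterion
print(scenario_weight(w_crit, w_elem, (0, 2, 1)))   # -> 0.32
```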

Calculating rewards and privacy
The calculation of rewards and privacy relies on the weighting scheme for personalized privacy valuation (Section 1.2). Assume there is a maximum (monetary) budget B to incentivize data sharing that is split as follows:

B = B_p + B_s,

where B_p rewards participation, meaning the cognitive effort required for individuals to make choices for all data-sharing scenarios, and B_s rewards the actual data sharing. Moreover, assume that the weights of each criterion/element represent the actual intrinsic privacy concerns of individuals. The maximum rewards r̂_{i,j} of an individual i for each data-sharing scenario j are allocated according to the self-determined privacy-intrusion level of the data-sharing scenario as follows:

r̂_{i,j} = B_s · W_{i,j} / W_i,

where the weight W_i sums up the weights of all scenarios as follows:

W_i = ∑_{j=1}^{m} W_{i,j}.

The actual received rewards of an individual i with a data-sharing level s_{i,j} under a data-sharing scenario j are calculated as follows:

r_{i,j} = r̂_{i,j} · (z − s_{i,j}) / (z − 1).

The privacy of an individual i over all selections made in the m data-sharing scenarios is calculated as follows:

p_i = (1/m) · ∑_{j=1}^{m} (s_{i,j} − 1) / (z − 1).
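A sketch implementing the reward and privacy formulas as reconstructed above; the linear forms and the per-individual budget split are assumptions consistent with the surrounding definitions, and the toy values are illustrative.

```python
import numpy as np

def max_rewards(B_s, W_i):
    """Maximum rewards per scenario, allocated in proportion to the
    personalized scenario weights: r_hat[j] = B_s * W_i[j] / sum(W_i)."""
    return B_s * W_i / W_i.sum()

def actual_rewards(r_hat, s, z=5):
    """Rewards for data-sharing levels s (1 = share all, z = share none):
    a linearly decreasing fraction of the maximum rewards."""
    return r_hat * (z - s) / (z - 1)

def privacy_level(s, z=5):
    """Privacy over all scenarios: 1 when nothing is shared, 0 when all is."""
    return np.mean((s - 1) / (z - 1))

# Toy example: one individual, 64 scenarios, data-sharing budget B_s = 15 CHF.
rng = np.random.default_rng(3)
W_i = rng.random(64)                        # personalized scenario weights
s = rng.integers(1, 6, size=64)             # chosen data-sharing levels in {1..5}
r_hat = max_rewards(15.0, W_i)
print(round(actual_rewards(r_hat, s).sum(), 2), round(privacy_level(s), 3))
```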

Recruitment Process
The split of the recruitment process into multiple sessions as well as the invitation for the recruitment are illustrated in this section.

Recruitment sessions
Splitting the recruitment of participants and the experiment into multiple sessions serves the following: (i) Guaranteeing enough time to recruit participants from the pool. (ii) Having a manageable number of participants to moderate during the experimental process. (iii) Scaling up the number of participants incrementally so that potential failures do not influence the overall experiment. The entry phase takes place on Mondays, the core phase during Mondays-Wednesdays and the exit phase on Thursdays. 93.6% of the participants did not know about the experiment before participating (Question D.28 in Table 9).

E-mail invitation for recruitment
The invitation sent to the DeSciL pool of participants for the recruitment is presented below:

Dear <firstname> <lastname>,

We would like to invite you to an upcoming experiment '<experiment name>'. The experiment will be carried out in English, so you should be fluent in English in order to register for this study.
The experiment requires your participation at the ETH Decision Science Laboratory on TWO different days and the use of your mobile phone (Android only) on two other days to answer some questions.
Your participation in the experiment will be maximally compensated as follows:

Experimental Design
The preparatory, entry, core and exit phases of the conducted experiment are outlined here in more detail. The compensation and monetary incentives introduced to engage participants are also described.

Preparatory phase
The preparatory phase has a supportive role in the overall experiment as participants are neither compensated nor selected rigorously. Participants of the preparatory phase are selected from the network of employees at ETH Zurich (convenience sampling). The findings of the preparatory phase are not conclusive and mainly serve the design of the following phases. Nevertheless, this phase was scaled up to approximately 200 participants within 3 months, starting on 19.05.2016.
The preparatory phase consists of a web survey implemented in Qualtrics [1] with the questions outlined in Table 3. The goal of the preparatory phase is to provide some first insights about the perception of privacy from the perspective of the three studied aspects: sensor type, data collector and context. Questions A.9-A.14 are designed for this purpose. Questions A.6-A.8 provide information about the smartphone usage profiles, whereas Question A.15 scrutinizes the type of incentives that motivate participants to share mobile sensor data. Questions A.1-A.5 collect demographic information.

Entry phase
The participants of each experimental session are verified by the DeSciL staff members by presenting a personal identification document, i.e. a passport or student card; nevertheless, the actual identity of the participants remains anonymous to the researchers using the lab. Participants are not allowed to interact with each other during the experiment, and any questions need to be addressed in private directly to the experiment moderators by moving to a separate room. In this way, biases about how each participant perceives and understands the experimental process are eliminated. This process is communicated to the participants before the beginning of the experiment. Next, participants are seated in a room with instructions about the experiment (Figure 1) and the informed consent (Figure 2) placed in front of them.
The Android app was made available in the Google Play online store for the participants to download, see Figure 3a. The app generates locally in the background a unique ID used as the identifier of the participants in the experiment as well as in the data collected in the database. This ID can be viewed in the app by participants. The first screens of the app present the survey questions B.1-B.8 of Table 4. The next screens personalize the sharing of sensor data. Initially, the three criteria of (i) sensor type, (ii) data collector and (iii) context receive their weights according to the perception of each participant, see Section 1.2.

(Figure 1, entry-phase instruction slides: reading the instructions is a requirement; all requirements of the invitation e-mail must be fulfilled to participate, otherwise the experiment moderators must be contacted immediately; interaction with other participants is not allowed; questions are addressed to the experiment moderators in private. The data collectors are introduced as third parties requesting access to sensor data. The classification screens offer five compulsory choices with no default option, ranging from "very low privacy intrusion", i.e. affecting privacy minimally, to "very high privacy intrusion", i.e. affecting privacy maximally, for the features, the sensor data and the data collectors. The final slides announce the end of the entry phase (Day 1) and the start of the core phase: Day 2 lasts the next 24 hours and Day 3 the following 24 hours, and participants are asked to read the core-phase instructions before answering further questions.)

Moving to the next screens, the same personalization process is repeated within each criterion: for the different sensor types (group Question B.10), data collectors (group Question B.11) and contexts (group Question B.12). Table 5 illustrates the three criteria and their elements during the experiment. For each feature, participants classify the perceived privacy intrusion on a five-level scale, and the same classification is made for sensor data and data collectors (Figures 4b and 4c respectively). Two out of the top-3 highly privacy-intrusive sensors are selected. These are the GPS (privacy intrusion of 0.85) and the microphone (privacy intrusion of 0.78). The camera sensor is ranked 2nd with a privacy intrusion of 0.83. It is not selected as it requires the collection of more complex data and higher storage space on the smartphones. The accelerometer (ranked 6th with a privacy intrusion of 0.47) and light (ranked 7th with a privacy intrusion of 0.46) sensors are the other two selected, belonging to the middle range of privacy intrusion. Figure 3b illustrates an instantiation example of the factorial question. After answering all questions, participants complete their participation in the entry phase and the smartphone app initializes the core phase. They receive the instructions of the core phase and depart from DeSciL. Note that the answers to the instantiations of the factorial question during the entry phase are not monetarily rewarded. The answers to these questions are the baseline with which the rewarded sharing of mobile sensor data during the core phase is compared.

Core phase
The core phase is initialized right after the completion of the entry phase, when participants also receive the instructions shown in Figure 5. At this phase they also receive the instructions about the data-access portal, see Figure 6. The core phase lasts for two full days (48 hours, Monday to Tuesday and Tuesday to Wednesday, as shown in Table 2). It takes place outside the DeSciL lab and integrates into the daily life of participants. At the beginning of each day in the core phase, the rewards are zero, as no data sharing is performed unless the participants consent to it via their responses to the data-sharing scenarios.

Exit phase
The exit phase is performed on Thursdays, the 4th day of each experimental session (see Table 2), and involves the return of the participants to DeSciL. The staff members of the lab verify the identity of the participants, who are then seated at lab computers to fill in an online survey created in Qualtrics.

(Figures 5 and 6, core-phase instruction slides: participants are not allowed to interact with other participants about the experiment; all requirements of the invitation e-mail must be fulfilled, otherwise the experiment moderators must be contacted immediately. Figure 6 shows the instructions on the data-access portal presented to the participants starting the core phase and after finishing the entry phase.)

The questions of the exit survey are outlined in Tables 6 to 9. The matching of the data collected in this phase with the data of the previous phases is performed with the user ID inserted in Question D.1. The exit survey begins by acquiring general information about the mobile phone used during the experiment, as shown in Table 6. Questions about the user interface and functionality of the mobile app follow (Table 7). The ease of use and the quality of the app are evaluated in Questions D.4 and D.5 respectively, with the group Question D.6 evaluating the satisfaction level with several features such as colors, formulation of questions, number of questions and others. The group Question D.7 evaluates how comprehensible and useful the user interface features are (Figure 9 in the main paper). These questions are used to detect possible biases that may affect data-sharing choices. The questions of Table 8 follow and concern the rewards and privacy. A few factors evaluated are the awareness about privacy (Question D.9.1), the ease of privacy adjustments (Question D.9.3), the satisfaction level with rewards (Question D.10), the data-sharing incentivization by rewards (Question D.11) and others. These questions further explain the data-sharing choices made during the entry and core phases. Table 9 includes questions about the experimental process: they evaluate the satisfaction level with several experimental aspects (Question D.17), the participation level and technical problems (Questions D.18-D.24) as well as the user experience of the data-access portal (Questions D.25-D.27). After the exit survey, participants have an interview with the moderators of the experimental session. The goal of the interview is to scrutinize in a more qualitative way how participants perceive the overall experimental process as well as to discuss some behavioral artifacts observed in the data collected by the Kinvey backend during the previous phases. Moreover, when data are not successfully transferred to Kinvey, the data are manually transferred from the participants' phones to the moderators' computers after the participants' consent. At the end of the interview, the moderators compute and validate the final total compensation of each participant, who receives the compensation from the lab moderators before departing from DeSciL.

Compensation and monetary incentives
The computed rewards are personalized according to the model of Section 1. The entry phase receives higher compensation as it requires the initial engagement and the execution of more complex tasks with the smartphone compared to the exit phase.
The distribution of the rewards for the app use follows a geometric progression and is implemented by transforming Equation 7 so that the rewards decrease geometrically, rather than linearly, over the data-sharing levels. Here, r̄_{i,j} is the maximum rewards that can be gained in sharing scenario j, computed by Equation 5; z = 5 is the number of sharing options; s_{i,j} is the participant's selection; B_p is the participation budget and B is the total available budget. The allocated amounts for the compensation of participants are decided empirically after consultation with the DeSciL staff members. Factors that influence the decisions are the following: the available budget, the target of employing around 100 participants, the complexity of the designed experimental process, the Swiss economy and the student profile of the participants in the DeSciL pool. The amounts reflect a trade-off: high enough to incentivize and engage participants with this novel experimental process, while not too high to study the data-sharing dilemmas between privacy and monetary rewards. The effectiveness of the selected amounts is evaluated using Questions D.9-D.13 of Table 8. These results show that the designed rewards were effective for their purpose. 57.7% of the participants were too busy to answer more questions, while 33.6% needed further motivation (Question D.21).
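A sketch of such a geometric schedule follows; the common ratio of 1/2 and the convention that sharing nothing yields no rewards are assumptions for illustration, not the exact implemented parameters:

```python
z = 5  # number of sharing options

def geometric_rewards(r_max: float, s: int, ratio: float = 0.5) -> float:
    """Rewards shrinking geometrically (assumed ratio) with each step away
    from full sharing; sharing nothing (s = z) yields no rewards."""
    if s == z:
        return 0.0
    return r_max * ratio ** (s - 1)

for s in range(1, z + 1):
    print(s, geometric_rewards(1.0, s))  # 1.0, 0.5, 0.25, 0.125, 0.0
```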

Implementation of the Technical Infrastructure
The data collected by the participants' smartphone app are stored and managed locally by an implementation of the nervousnet framework [2] that provides high-level application programming interfaces (APIs) to store, query and analyze data on smartphones. Remotely on the server, the data are stored and managed by Kinvey [3], which provides secure communication by using TLS/SSL encryption between the smartphones and the Kinvey backend. The data-access web portal relies on Node.js and a MongoDB database. The quality of the app (Question D.5) is evaluated positively by 61% of the participants. The mobile phone remained turned on during the experiment for 82.7% of the participants (Question D.18), while only 13.8% of the participants ran out of battery (Question D.19) and 25.9% reported battery drain problems (Question D.24).

Reward Gain

Figure 8 illustrates the mean privacy and reward gain of the data-sharing scenarios retrieved as a response to choosing to improve privacy and rewards respectively (see Figure 9 in the main paper). Table 10 outlines the mean privacy and reward gain of the different data-sharing elements that constitute the 64 data-sharing scenarios.

Privacy Loss and Rewarded Data-sharing Choices of Groups

Figures 9a and 9b illustrate the probability and cumulative density functions for the intrinsic, 1st and 2nd rewarded data sharing. The two experimental conditions for rewarded data sharing show very similar densities, while intrinsic data sharing comes with a single peak around the privacy level of 0.55. Figure 9c shows the privacy level over consecutive data-sharing choices under the 2nd rewarded data sharing. Compared to Figure 7b in the main paper showing the 1st rewarded data sharing, the group behaviors are similar. Reward opportunists show a further decline of their privacy level.

Figure 10 illustrates the five goal signals of privacy preservation. They represent a distribution of the required amount of data over the 64 data-sharing scenarios, referred to within the range of very high to very low privacy preservation. This is because each signal measures the ratio of participants who choose a certain data-sharing level for each data-sharing scenario under intrinsic data sharing. Note that for each data-sharing scenario in Figure 10, the shares of participants sum up to 1.
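The construction of the goal signals can be sketched as follows, with synthetic choices standing in for the participants' intrinsic data-sharing levels:

```python
import numpy as np

z = 5
rng = np.random.default_rng(0)

# Intrinsic choices of n participants over m = 64 scenarios (synthetic).
n, m = 100, 64
choices = rng.integers(1, z + 1, size=(n, m))

def goal_signal(level: int) -> np.ndarray:
    """Per scenario, the share of participants who chose the given
    data-sharing level under intrinsic data sharing."""
    return (choices == level).mean(axis=0)

signals = np.stack([goal_signal(l) for l in range(1, z + 1)])
print(np.allclose(signals.sum(axis=0), 1.0))  # shares sum to 1 per scenario
```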

Data-sharing Mismatch

Figure 11 shows the data-sharing mismatch for the remaining three goal signals of privacy preservation: low, medium and high. The results confirm the findings illustrated in Figure 3b of the main paper: the mismatch increases for higher privacy-preservation goals, as agents mainly have one privacy-preserving option (the intrinsic one) to choose from.
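A minimal sketch of this mismatch measure, with synthetic vectors standing in for the required and the collectively shared data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Required data per scenario (a goal signal) vs. data actually shared by
# the collective under some condition (synthetic values over 64 scenarios).
required = rng.random(64)
shared = required + rng.normal(0.0, 0.1, 64)

# Data-sharing mismatch as the root mean square error between the two.
eps = np.sqrt(np.mean((shared - required) ** 2))
print(eps)
```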

Valuations of Collective Privacy Recovery
Four different valuations of privacy are compared in Table 11. All valuations are a function of r_i, the mean privacy level over all data-sharing scenarios, measured by the gained rewards as outlined in Equation 7 (a short sketch of the four schemes follows this list):

• Absolute shared data: The privacy cost C_i(r_i) equals the gained rewards r_i. This is the default valuation used throughout the main paper. The minimum privacy cost is 0, while the maximum is 17.5, which is the maximum rewards that an individual could gain in the lab.

• Absolute sacrificed rewards: The privacy cost C_i(r_i) equals the gained rewards r_i minus the fixed data-sharing rewards B_s. This scheme is equivalent to the absolute shared data as B_s is constant. This valuation measures more directly the loss of rewards in exchange for privacy preservation. The minimum privacy cost is −17.5, while the maximum is 0.
• Relative shared data: The privacy cost C_i(r_i) equals the gained rewards r_i minus the privacy level under intrinsic data sharing, measured as well in terms of (hypothetical) gained rewards. This valuation measures the additional privacy cost of data sharing over the intrinsic one, assuming that intrinsic data sharing comes with no privacy cost. Depending on the level of intrinsic data sharing, the minimum privacy cost is −17.5, while the maximum is 17.5 (the behavior of a reward opposer and a reward opportunist respectively, as shown in Table 1 of the main paper).

• Relative sacrificed rewards: The privacy cost C_i(r_i) equals the gained rewards r_i minus the privacy preservation under intrinsic data sharing, measured by B_s minus the (hypothetical) rewards gained under intrinsic data sharing. This scheme is equivalent to the one of absolute sacrificed rewards with the addition of the privacy cost under intrinsic data sharing. Depending on the level of intrinsic data sharing, the minimum privacy cost is −17.5, while the maximum is 17.5.

Figure 11: Data-sharing mismatch (root mean square error ε) for the 64 data-sharing scenarios and for the three goal signals of high, medium and low privacy preservation. Values are sorted from lowest to highest mismatch according to the coordinated data sharing. Coordinated data sharing shows higher efficiency than intrinsic and rewarded data sharing.
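A minimal sketch of the four schemes (the reward values passed in are hypothetical; B_s = 17.5 as stated above):

```python
B_s = 17.5  # maximum data-sharing rewards an individual could gain in the lab

def privacy_costs(r_i: float, r_intrinsic: float) -> dict:
    """The four valuation schemes of Table 11 as functions of the gained
    rewards r_i and the (hypothetical) intrinsic rewards r_intrinsic."""
    return {
        "absolute shared data": r_i,
        "absolute sacrificed rewards": r_i - B_s,
        "relative shared data": r_i - r_intrinsic,
        "relative sacrificed rewards": r_i - (B_s - r_intrinsic),
    }

print(privacy_costs(r_i=12.0, r_intrinsic=7.0))
```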
The collective privacy recovery under intrinsic, rewarded and coordinated data sharing is assessed using the four different valuation schemes under the very high and very low privacy-preservation goals. Figure 12 shows the privacy cost per individual for each of these cases. All lines are sorted from lowest to highest privacy cost. Each plot in Figure 12 comes with the mean relative privacy gain and loss of the coordinated data sharing compared to rewarded and intrinsic data sharing respectively. The privacy cost of intrinsic data sharing corresponds to the data-sharing plan with the minimum privacy cost and is calculated using EPOS with α = 0, β = 1. In contrast, the privacy cost of rewarded data sharing corresponds to the data-sharing plan with the maximum privacy cost and is calculated using EPOS with α = 0, β = 1 and data-sharing plans with reversed sign.

Figure 12: The four privacy valuations illustrated in Table 11. The privacy cost is measured for the intrinsic, rewarded and coordinated data sharing under the very high and very low privacy-preservation goal. The highest privacy gain is observed for the relative shared data and the relative sacrificed rewards.
The highest privacy gains are observed under the valuation scheme of relative shared data: 54% and 55.6% for the very high and very low privacy-preservation goals respectively. This means that coordinated data sharing shows a further privacy recovery when evaluating the data-sharing choices based on the additional privacy cost that individuals pay over the intrinsic data sharing. The relative sacrificed rewards follow with 43.1% and 47.2% respectively. The default valuation scheme of absolute shared data has the lowest privacy gain of 41.4% and 43.7% respectively, which equals that of the absolute sacrificed rewards as B_s is constant (lines shifted to negative values). The mean privacy gain for the very low privacy-preservation goal is 2.7% higher than for the very high one. Similar to the observation in Figure 3a of the main paper, two rewarded options of individuals with low privacy on average provide higher flexibility than a single one with high privacy preservation.
With their higher privacy gains, the alternative valuation schemes find applicability in the further adoption of the data-sharing plans recommended to users. They can also be used to provide augmented explanations of what these recommended plans mean for the data collective, while raising awareness of the different privacy manifestations and collective privacy gains.

Figure 13 illustrates the privacy reinforcement of the different data-sharing elements. The key finding is that the perceived privacy sensitivity of the data-sharing elements is likely to reinforce privacy under intrinsic and coordinated data sharing rather than under the rewarded ones. The mean absolute privacy reinforcement under intrinsic data sharing is higher than under the 1st rewarded and the two coordinated data-sharing conditions: 4% > 2.27% > 2.24% > 0.65% respectively. Under intrinsic data sharing, social networking, corporation, the noise sensor and NGO reinforce a privacy gain, while education, the accelerometer and transportation a privacy loss. Privacy reinforcement under intrinsic data sharing is correlated with the attitudinal privacy sensitivity (R = 0.63, t(10) = 2.57, p = 0.028). This means that privacy risk awareness is likely to reinforce privacy protection. There is a correlation in the privacy reinforcement under intrinsic and the 1st rewarded data sharing (R = 0.73, t(10) = 3.4, p = 0.0067). In the 2nd rewarded data sharing, GPS shifts to a 3.5% reinforcement of privacy loss, while environment shifts to an 8.4% reinforcement of privacy gain. Coordinated data sharing with the very low privacy-preservation goal is positively correlated to attitudinal (R = 0.65, t(10) = 3.68, p = 0.023), intrinsic (R = 0.96, t(10) = 11.58, p = 4.07 × 10⁻⁷) and the 1st rewarded (R = 0.62, t(10) = 2.48, p = 0.032) data sharing. With the very high privacy-preservation goal, the correlation to attitudinal data sharing is negative: R = −0.61, t(10) = −2.43, p = 0.035.

Conjoint Analysis
The assumptions of conjoint analysis are discussed and assessed in the context of the conducted experiment [4]. No direct carryover effects are involved under intrinsic data sharing, as participants are exposed to each data-sharing scenario once. Under rewarded data sharing, the privacy-rewards balance introduces a carryover effect that is a subject of study in this paper. Because rewards are personalized (i.e. each data-sharing scenario is retrieved to satisfy the intended action of improving rewards or privacy) and because responses to repeated data-sharing scenarios are made on demand, carryover effects mainly originate from tuning the privacy-rewards balance. No influential order effects are anticipated within the designed rating-based conjoint experiment. Regarding the order of the data-sharing elements, each data-sharing scenario is presented in natural language as determined by the Factorial Question in Section 4.1 of the main paper. Decision-making quality is not expected to decrease for k = 3 < 10 data-sharing criteria, as shown in earlier experimental tests in the literature [5,4]. As this is not a choice-based conjoint experiment, the order of the data-sharing levels (Figure 9b of the main paper) simply adheres to design principles of Likert scales and graphical user interfaces. As the experiment relies on a full-factorial design without rendering any data-sharing scenario infeasible, order effects among the scenarios are unlikely. It is, though, personalization under rewarded data sharing that can yield, in theory, atypical data-sharing choices, i.e. ones that increase the accumulated rewards when a participant chooses to improve privacy, and vice versa. Excluding these or reducing the likelihood of their occurrence is expected to improve external validity [4], i.e. participants do not lose interest or react contrary to their privacy-reward improvement goal.
The performed conjoint analysis relies on the following multiple linear regression model:

$$P_j = \beta_0 + \sum_{u=1}^{k} \sum_{o=1}^{l_u - 1} \beta_{u,o} \, x_{j,u,o} + \epsilon_j, \tag{10}$$

where x_{j,u,o} ∈ {0, 1}, o ∈ {1, ..., l_u − 1}, are the independent dummy variables that represent the absence or presence of a data-sharing element within a data-sharing scenario. Note that one data-sharing element for each criterion is removed from the model (accelerometer, corporation, social networking) to resolve the linear dependency, with which the effect of the confounded variables could not be separated by the regression.
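A sketch of fitting such a dummy-coded model with ordinary least squares; the element names mirror the experiment, but the response values are synthetic and the exact coding details are assumptions:

```python
import numpy as np
from itertools import product

# Criteria with the first element of each acting as the dropped reference
# level (accelerometer, corporation, social networking, as in the paper).
criteria = {
    "sensor": ["accelerometer", "gps", "light", "noise"],
    "collector": ["corporation", "university", "ngo", "education"],
    "context": ["social networking", "health", "transportation", "environment"],
}
scenarios = list(product(*criteria.values()))  # full factorial: 64 scenarios

def dummies(scenario: tuple) -> list:
    """0/1 indicators for every non-reference element present in a scenario."""
    row = []
    for (_, elements), chosen in zip(criteria.items(), scenario):
        row += [1.0 if chosen == e else 0.0 for e in elements[1:]]
    return row

X = np.array([[1.0] + dummies(sc) for sc in scenarios])  # intercept + 9 dummies
rng = np.random.default_rng(2)
P = rng.random(len(scenarios))  # synthetic stand-in for privacy P_j

beta, *_ = np.linalg.lstsq(X, P, rcond=None)
print(beta.round(3))  # intercept followed by the element coefficients
```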
Using the estimated coefficients, the partworth utilities can be estimated for each data-sharing criterion u by normalizing the range of its element coefficients, with the removed reference element of each criterion assigned a coefficient of 0:

$$P_u = \frac{\max_{o} \beta_{u,o} - \min_{o} \beta_{u,o}}{\sum_{u'=1}^{k} \left( \max_{o} \beta_{u',o} - \min_{o} \beta_{u',o} \right)}.$$
The partworth utilities measure the relative importance of the criteria within a regression model: which of the data type, collector or context is the most important when individuals make data-sharing decisions. Similarly, the relative importance of each data-sharing element within its criterion is calculated as follows:

$$P_{u,o} = \frac{\beta_{u,o}}{\sum_{o'=1}^{l_u - 1} |\beta_{u,o'}|}.$$

The relative importance calculation can also be adjusted for each data-sharing element among all criteria as follows:

$$\tilde{P}_{u,o} = \frac{\beta_{u,o}}{\sum_{u'=1}^{k} \sum_{o'=1}^{l_{u'} - 1} |\beta_{u',o'}|}.$$

The model of Equation 10 is evaluated at the population level for different dependent variables of privacy P_j and rewards R_j with values over the 64 data-sharing scenarios. These variables are selected among the different experimental conditions and they determine the compared conjoint-analysis models. The regression coefficients are illustrated in Table 12 and Figure 14. The rest of the conjoint analysis and metrics are shown in Table 13.

Eight models with privacy as the dependent variable are assessed: intrinsic, 1st rewarded, 2nd rewarded, intrinsic − 1st rewarded, intrinsic − 2nd rewarded, 1st rewarded − 2nd rewarded, and coordinated for very low and very high privacy preservation. One model with rewards as the dependent variable is assessed: the 1st and 2nd rewarded of those individuals who intend to and do improve rewards, as in Figure 9 in the main paper. In addition, the following four models with the mismatch as the dependent variable are assessed: intrinsic, rewarded, and coordinated from very low to very high privacy preservation. As they perform statistically poorly, they are not shown in Table 13.

Figure 14: Coefficients of the multiple linear regression used in conjoint analysis. Nine models with different dependent variables for privacy and rewards are compared. Four of these models with R² > 0.8 are shown in the main paper, Figure 6a.

Figure 15 illustrates the relative importance (P̃_u, P̃_{u,o}) of the data-sharing criteria and elements among all criteria, in contrast to Figure 6 in the main paper that shows the relative importance (P_u, P_{u,o}) of the elements within each criterion. The relative importance (P_u) of the data-sharing criteria is the same as shown in Figure 6 of the main paper. For all models, sensor data such as GPS (46.82%), noise (41.4%) and light (16.04%) show the highest mean positive relative importance among all elements of the three criteria, while education (−42.74%) from the collectors and environment (−29.78%), health (−27.8%) and transportation (−27.35%) from the context show the lowest.
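A sketch of the relative-importance computation from estimated coefficients, following the standard conjoint convention of normalizing partworth ranges; the coefficients are illustrative and the normalization may differ in detail from the paper's exact formulas:

```python
# Element coefficients per criterion, with each dropped reference element
# fixed at 0 (illustrative values, standard partworth convention).
partworths = {
    "sensor": {"accelerometer": 0.0, "gps": 0.30, "light": 0.10, "noise": 0.25},
    "collector": {"corporation": 0.0, "university": -0.05, "ngo": 0.08,
                  "education": -0.20},
    "context": {"social networking": 0.0, "health": -0.12,
                "transportation": -0.10, "environment": -0.15},
}

# Relative importance of a criterion: the range of its partworths divided
# by the sum of ranges over all criteria.
ranges = {u: max(pw.values()) - min(pw.values()) for u, pw in partworths.items()}
total = sum(ranges.values())
importance = {u: round(100 * r / total, 1) for u, r in ranges.items()}
print(importance)  # {'sensor': 41.1, 'collector': 38.4, 'context': 20.5}
```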

Validation of Groups
Table 14 illustrates the results of the bootstrap evaluation method for the 5 different group behaviors extracted from the experimental data. Furthermore, the split of the participants over the data-sharing groups is compared to privacy categories identified in the general population by studies such as the ones of Westin [7,8]. This comparison can only be indicative though: a random sample from a US population back in 1990 is compared to a non-random sample from a Swiss population in 2016. Moreover, the survey questions are not identical to the formulated data-sharing prompts. Nevertheless, this comparison has value given that there are groups that capture the intended privacy of a broader population vs. groups that capture the actual data-sharing decisions of typical smartphone users.

Westin's studies classify individuals in three behavioral categories based on survey responses: privacy fundamentalists, pragmatists and unconcerned. They cover the whole spectrum of data-sharing levels depicted in the example of Table 1 in the main paper. Based on this, we match the data-sharing groups to Westin's categories under intrinsic data sharing, i.e. the data-sharing behavior of individuals under rewarded data sharing is not considered. The matching is illustrated in Table 15. The observed group sizes show a remarkable match to Westin's privacy categories.

Analysis of Variance for Data-sharing Criteria and Groups
The Analysis of Variance (ANOVA) is made with IBM SPSS 24.0. Figure 16 illustrates the analysis over the data-sharing elements. The tested hypothesis is not confirmed for: accelerometer (p = 0.04), transportation (p = 0.039) and environment (p = 0.005). The whole report analysis is illustrated in Table 16. The report analysis of the post hoc Tukey's range test (α = 0.05) is illustrated in Tables 17, 18 and 19. Significant differences among the data-sharing groups are observed for the sensors, in particular the GPS, as well as the transportation context. Moreover, the following data-sharing elements fall close to the significance threshold: the environment and education contexts, the accelerometer and noise sensors, and the educational institutes as data collector.
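A sketch of this analysis with synthetic group data, using SciPy in place of SPSS (scipy.stats.tukey_hsd requires SciPy ≥ 1.8); the group sizes and distributions are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Privacy levels of three synthetic participant groups, standing in for
# the data-sharing groups compared in the paper.
g1 = rng.normal(0.55, 0.10, 40)
g2 = rng.normal(0.60, 0.10, 40)
g3 = rng.normal(0.45, 0.10, 40)

# One-way ANOVA: do the group means differ?
f, p = stats.f_oneway(g1, g2, g3)
print(f"F = {f:.2f}, p = {p:.4g}")

# Post hoc Tukey's range test (alpha = 0.05) for pairwise comparisons.
print(stats.tukey_hsd(g1, g2, g3))
```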