Hiding opinions from machine learning

Abstract Recent breakthroughs in machine learning and big data analysis are allowing our online activities to be scrutinized at an unprecedented scale, and our private information to be inferred without our consent or knowledge. Here, we focus on algorithms designed to infer the opinions of Twitter users toward a growing number of topics, and consider the possibility of modifying the profiles of these users in the hope of hiding their opinions from such algorithms. We ran a survey to understand the extent of this privacy threat, and found evidence suggesting that a significant proportion of Twitter users wish to avoid revealing at least some of their opinions about social, political, and religious issues. Moreover, our participants were unable to reliably identify the Twitter activities that reveal one’s opinion to such algorithms. Given these findings, we consider the possibility of fighting AI with AI, i.e., instead of relying on human intuition, people may have a better chance at hiding their opinion if they modify their Twitter profiles following advice from an automated assistant. We propose a heuristic that identifies which Twitter accounts the users should follow or mention in their tweets, and show that such a heuristic can effectively hide the user’s opinions. Altogether, our study highlights the risk associated with developing machine learning algorithms that analyze people’s profiles, and demonstrates the potential to develop countermeasures that preserve the basic right of choosing which of our opinions to share with the world.

This research is conducted by a team of researchers at the University of Edinburgh and New York University Abu Dhabi: Abeer Aldayel, Walid Magdy, Talal Rahwan, and Marcin Waniek. The research has been approved by the respective Institutional Review Boards.
For questions about the rights of research participants, you may contact the Institutional Review Boards Committee, New York University Abu Dhabi, irbnyuad@nyu.edu.
If you have any questions, suggestions or concerns, please feel free to reach out to us at a.aldayel@ed.ac.uk, an email address that only researchers associated with this project have access to.
Please do not complete the survey more than once. Upon finishing the survey you will receive a completion code. The payment of $2.50 will be made once you've entered that code in the space provided. Please do not close the browser with your MTurk account.
By continuing you agree that: You have read the above information and agree to participate in the study.

1. What is your worker ID?
[empty field to be filled by a number]
2. What is your state of residence?
[drop-down list to choose state]
3. What is the sex listed on your birth certificate?
[empty field to be filled by a number]
7. What is your highest completed level of education?
(a) Full-time employed (b) Part-time employed (c) Unemployed (d) Caregiver (e.g., children, elderly) or homemaker (e) Retired (f) Full-time student (g) Other [empty field to be filled by text]
9. What was your yearly personal income in 2019 (include salary, interest, returns on investments, etc.)?
(a) Strongly against (b) Against (c) Neither (d) In favor (e) Strongly in favor
2. If a person is using one of the below words in a tweet, what would you assume is the stance of that person towards Hillary Clinton?
[Next to every word is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
(a) Clinton (b) say (c) needs (d) vote (e) way (f) bet
3. If a person is following one of the below accounts, what would you assume is the stance of that person towards Hillary Clinton?
[Next to every account is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
(a) @heyalaurena: Explorer of the world (b) @nianticproject: A slow leak of information is coming my way.
[Next to every account is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
[Next to every word is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
If a person is following one of the below accounts, what would you assume is the stance of that person towards Feminist Movement?
[Next to every account is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
(a) @char98 x: A&N (b) @emma nicole246: 717 (c) @husamhsm: Alfredo tweeted me 'hi' the next day justin followed me holy shit -justin followed December 16th 2013 (d) @twitterfashion: Twitter, but make it fashion (e) @allahislamquran: #Allah #Islam #Quran One and Only [the sentence "in the name of God", inserted here in Arabic] & Prophets [the sentence "peace be upon him", inserted here in Arabic] #AllahIslamQuran (f) @imrankhanpti: Prime Minister of Pakistan
4. If a person posted a tweet that mentions one of the below accounts, what would you assume is the stance of that person towards Feminist Movement?
[Next to every account is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
(a) @flaccid joe: Fuck that bitch, this is Russia. -Big Igor (b) @x aeon x: Average centrist moderate independent in the middle. Between left & right. Not up or down. Horizontal. 5/10. Only my sexuality is bent. who/whom/whose (c) @NoMaaam: If you're mad its because its true. Pressing block only proves ur ignorance bc ur blocking information. #feminismisawful (d) @LZats: Literary Agent. Host@printrunpodcast w@erikhane Intersectional feminist. Geek. Beer lover/tea snob. Pibble enthusiast. She/her. Tweets my own (e) @NinjaEconomics: Clever musings and first-world complaints from a manic pixie wannabe-economist.
(a) Strongly against (b) Against (c) Neither (d) In favor (e) Strongly in favor
2. If a person is using one of the below words in a tweet, what would you assume is the stance of that person towards Atheism?
[Next to every word is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
3. If a person is following one of the below accounts, what would you assume is the stance of that person towards Atheism?
[Next to every account is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
(a) @baptism saves: Pastor at (link: http://www.churchofgodonline.org) churchofgodonline.org and church of God @kingjamesonline. "The like figure whereunto even baptism doth also now save us:
[Next to every account is a group of radio buttons with options "Strongly against", "Against", "Neither", "In favor", and "Strongly in favor"]
(a) @VictoriaOsteen (b) @nytimes: "The Weekly" is our new TV series. Episodes air Sundays at 10 p.m. on FX and on Hulu the next day. (c) @KLOVEnews: Thanks for stopping by for news + useful, encouraging and faith-based stories. Got news for us? newstip@klove.com (d) @stephenfry: How can I tell you what I think until I've heard what I'm going to say? (e) @Gr8Darwinians: Biped ape who loves science & punk rock music. Blind acceptance is a sign for stupid fools to stand in line (f) @Twitter: What's happening?!

S1.6 The Need to Hide Stance
Indicate the degree to which you feel the need to avoid revealing your stance on Twitter towards the following topics.

Hillary Clinton
[Likert scale with values: 0, 1, . . . , 10, where "0" is labeled as "I want to reveal my stance" and "10" is labeled as "I strongly want to keep my stance private"]

Feminist Movement
[Likert scale with values: 0, 1, . . . , 10, where "0" is labeled as "I want to reveal my stance" and "10" is labeled as "I strongly want to keep my stance private"]

Atheism
[Likert scale with values: 0, 1, . . . , 10, where "0" is labeled as "I want to reveal my stance" and "10" is labeled as "I strongly want to keep my stance private"]

Table S1: Numeric values from the left column of Figure 2a in the main article. The first column contains the description of the feature, in the same format as in Figure 2. The remaining columns present the percentage of responses that identified the feature as indicating each of the five stances.

Table S2: Numeric values from the right column of Figure 2a in the main article. The first column contains the description of the feature, in the same format as in Figure 2. The remaining columns present the percentage of responses that identified the feature as indicating each of the five stances.

Table S3: The F1 scores of stance detection algorithms before the hiding process. The table presents the F1 scores of four stance detection algorithms on the SemEval dataset, focusing on the Twitter users whose stance is specified as either "in favor" or "against" one of the following five topics: feminism, Hillary Clinton, atheism, climate change, and abortion. The algorithms use the original set of features, i.e., before tampering with it via our stance obfuscation heuristics. The features are either related to the users' contacts (i.e., the Twitter accounts they follow), or the users' interactions (i.e., the Twitter accounts and the websites mentioned in their tweets). For each type of feature, the table presents the F1 score for the users whose stance is "in favor", the F1 score for those whose stance is "against", and the average of those two numbers.

Table S4: Detailed information about the size of our dataset. The first column contains the number of users extracted from the SemEval dataset. The second column contains the number of tweets from the home timeline of the users; these are the tweets from which the users' interactions were inferred. The third column contains the number of accounts that were mentioned by the users in their tweets. The fourth column contains the number of web domains that were mentioned by the users in their tweets. The fifth column contains the number of accounts that were followed by the users.

Figure S1: The distribution of the need to avoid revealing one's stance. For every topic (feminism, Hillary Clinton, and atheism), the participants indicated on a Likert scale from 0 to 10 the degree to which they feel the need to avoid revealing their stance on Twitter. For each topic, results are presented separately for each stance towards that topic, where stance ranges from "Strongly against" (dark red) to "Strongly in favor" (dark green). Each plot presents the distribution of responses for a given topic and a given stance towards that topic. [Axis labels: Need to keep stance private; Proportion]

S3 Robustness Checks
When presenting Figure 2a in the main article, we noted that only 20 out of the 54 features included in our survey were correctly classified by more than half the participants. This section shows that similar trends hold when omitting the participants who had neutral opinions about the topic in question (Figure S2), when considering only those who had strong opinions (Figure S3), and when considering only those who reported 8 or above when assessing their need to avoid revealing their stance on Twitter (Figure S4).

Figure S2: The same as Figure 2, but for each topic we exclude participants whose stance towards that topic is neutral. [Legend: Strongly in favor, In favor, Neither, Against, Strongly against]

Figure S3: The same as Figure 2, but for each topic we consider only the participants who are either strongly in favor or strongly against that topic. [Legend: Strongly in favor, In favor, Neither, Against, Strongly against]

Figure S4: The same as Figure 2, but for each topic we consider only the participants who reported 8 or above when assessing their need to avoid revealing their stance towards that topic on Twitter. [Legend: Strongly in favor, In favor, Neither, Against, Strongly against]

S4 Computational Complexity of Optimal Hiding
In this section, we analyze the optimization problem faced by an automated assistant whose goal is to hide the evader's opinion from stance detection algorithms by modifying their Twitter account. We formulate this problem as follows:
Definition S1 (Optimal Stance Obfuscation). This problem is defined by a tuple, $(X, A, c, b, f)$, where $X \in \mathcal{X}$ is a binary vector of the evader's features, $A$ is a set of actions available to the evader, $c : A \to \mathbb{R}$ is a function describing the cost of each action, $b \in \mathbb{R}$ is the budget of the evader specifying the total cost that the evader is willing to incur, and $f : \mathcal{X} \to \{0, 1\}$ is a binary function that classifies the evader's stance based on their features. The goal is then to identify a set of actions $A^* \subseteq A$ to be performed by the evader such that their total cost does not exceed $b$ and the probability of the evader's stance being classified as 1 is minimized, i.e., $A^*$ is in:
$$\arg\min_{A' \subseteq A \,:\, \sum_{a \in A'} c(a) \le b} P\left(f(X^{A'}) = 1\right),$$
where $X^{A'}$ is the vector of the evader's features after performing the actions in $A'$.
Here, the evader's goal is for their stance to be classified as 0 instead of 1 by the stance detection algorithm f . We introduce the notion of "cost" to reflect the fact that the evader may be more willing to perform certain actions than others. We also introduce the notion of "budget" to reflect the fact that the evader may have a limit on the modifications they are willing to make to their Twitter profile in return for their privacy.
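To make Definition S1 concrete, the following is a minimal brute-force sketch in Python, not the method used in the article. It assumes that each action is modeled as a function on feature tuples and that the classifier is exposed through a hypothetical callable `prob_of_one` returning the probability of the stance being classified as 1; the toy classifier, action names, and costs are invented for illustration only. The search is exponential in the number of actions, which is in line with the hardness result that follows.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

Features = Tuple[int, ...]
Action = Hashable


def optimal_obfuscation(
    x: Features,
    actions: Dict[Action, Callable[[Features], Features]],
    cost: Dict[Action, float],
    budget: float,
    prob_of_one: Callable[[Features], float],
) -> Tuple[FrozenSet[Action], float]:
    """Try every action subset within budget; keep the one minimizing P(f = 1)."""
    best_set: FrozenSet[Action] = frozenset()
    best_prob = prob_of_one(x)  # baseline: perform no action at all
    names = list(actions)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if sum(cost[a] for a in subset) > budget:
                continue  # total cost exceeds the budget b
            modified = x
            for a in subset:
                modified = actions[a](modified)
            p = prob_of_one(modified)
            if p < best_prob:
                best_set, best_prob = frozenset(subset), p
    return best_set, best_prob


def set_bit(i: int) -> Callable[[Features], Features]:
    """Hypothetical feature-addition action: set position i to 1."""
    return lambda v: v[:i] + (1,) + v[i + 1:]


def toy_prob(v: Features) -> float:
    """Invented classifier: the more features are set, the less likely stance 1."""
    return 1.0 - sum(v) / len(v)


x0 = (0, 0, 0, 0)
acts = {f"add_{i}": set_bit(i) for i in range(4)}
costs = {"add_0": 1.0, "add_1": 1.0, "add_2": 2.0, "add_3": 3.0}
best, prob = optimal_obfuscation(x0, acts, costs, 2.0, toy_prob)
print(sorted(best), prob)
```

With a budget of 2, the only affordable pair is the two unit-cost actions, so the search selects them.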
Next, we analyze the computational complexity of the Optimal Stance Obfuscation problem. In our analysis, we focus on "k-nearest neighbors" as the stance detection algorithm from which the evader wishes to hide. We choose this algorithm not only due to its popularity as a general-purpose classifier, but also due to its closed-form formulation which makes it amenable to theoretical analysis. Despite the simplicity of this algorithm, the following theorem shows that the corresponding problem is NP-complete.
Theorem S1. The Optimal Stance Obfuscation problem is NP-complete given the k-nearest neighbors algorithm when the set of actions is limited either to feature addition or to feature removal.
Proof. Assuming that the actions of the evader can be performed in polynomial time and that the measure of distance used in the k-nearest neighbors algorithm can be computed in polynomial time, the problem is trivially in NP since the algorithm's outcome can be computed in polynomial time.
We will now show that the problem is NP-hard. To this end, we will show a reduction from the NP-complete 3-SAT problem. An instance $(Y, C)$ of this problem is defined by a set of $n$ variables $Y = \{y_1, \ldots, y_n\}$ and a conjunction of $m$ clauses $C = C_1 \wedge \ldots \wedge C_m$, where each clause $C_i$ is a disjunction of exactly 3 literals from $y_1, \ldots, y_n, \neg y_1, \ldots, \neg y_n$. The goal is to find an assignment of true and false to the variables in $Y$ such that the conjunction $C$ is satisfied.
We will first focus on the setting where the set of actions is limited to feature addition, after which we will describe the modifications of the proof necessary for the case of feature removal.
Based on the given instance of the 3-SAT problem, we will construct an instance of the Optimal Stance Obfuscation problem, and show a correspondence between the optimal solutions of both instances, thereby showing the NP-hardness. To this end, let $(Y, C)$ be the given instance of the 3-SAT problem, where $Y = \{y_1, \ldots, y_n\}$ and $C = C_1 \wedge \ldots \wedge C_m$. We assume that no clause contains both a negative and a positive literal of the same variable. Based on $(Y, C)$, we construct an instance $(X, A, c, b, f)$ of the Optimal Stance Obfuscation problem as follows (Figure S5 presents an example of the construction and the reduction process):
• the initial vector of the evader's features is $X$ such that $\forall_i\, X[i] = 0$ and $|X| = 2n$, i.e., it is a vector of $2n$ zeroes;
• the set of actions $A = \{a_1, \ldots, a_n, \bar{a}_1, \ldots, \bar{a}_n\}$, where $a_i$ is the action that involves setting the value of $X[i]$ to 1, while $\bar{a}_i$ is the action that involves setting the value of $X[n+i]$ to 1;
• the cost of every action is 1, i.e., $\forall_i\, c(a_i) = 1$;
• the budget of the evader is $b = n$;
• $f$ is the k-nearest neighbors algorithm.

Figure S5: An example of the reduction from the 3-SAT problem to the Optimal Stance Obfuscation problem as described in the proof of Theorem S1, where the set of evader's actions is limited to feature addition. Solutions to both problems are highlighted in green, while the corresponding positive and negative elements of both problems are highlighted in blue and red, respectively. The example 3-SAT instance is $Y = \{y_1, y_2, y_3, y_4\}$ and $C = C_1 \wedge C_2 \wedge C_3 \wedge C_4$, with $C_1 = \neg y_1 \vee y_2 \vee y_3$, $C_2 = y_1 \vee y_2 \vee \neg y_4$, $C_3 = y_1 \vee y_3 \vee y_4$, and $C_4 = \neg y_2 \vee \neg y_3 \vee \neg y_4$. [Panels: 3-SAT instance; Optimal Stance Obfuscation instance; 3-SAT solution; Optimal Stance Obfuscation solution]
As for the details of the algorithm $f$, we assume that $k = 2m - 1$, and assume that the algorithm uses the following distance measure:

We also assume that the set of training examples consists of:
• $2m$ examples $B_1, \ldots, B_{2m}$ where $\forall_i \forall_j\, B_i[j] = 0$ and $\forall_i\, |B_i| = 2n$, i.e., every $B_i$ is a vector of $2n$ zeroes;
• $m$ examples $C_1, \ldots, C_m$ where for clause $C_i$ and for $j \le n$ we set: and for $j > n$ we set C i n 1$ C i n 2$ 1,
The decision for every $B_i$ is 1, while the decision for every $C_i$ is 0.
Notice that given the set of actions $A$, the budget $b$ and the initial features vector $X$, the evader can set any $n$ positions of the vector $X$ to 1, while keeping the rest as 0. Intuitively, setting the $i$-th position to 1 for $i \le n$ corresponds to choosing $y_i = \text{true}$ in the given instance of the 3-SAT problem, while setting the $(n+i)$-th position to 1 corresponds to choosing $y_i = \text{false}$.
Notice also that for a given $A^* \subseteq A$ the distance between $X^{A^*}$ and any of the examples $B_i$ is $d(X^{A^*}, B_i) = 2n$. At the same time, the distance between $X^{A^*}$ and a given example $C_i$ is $d(X^{A^*}, C_i) = 2n - \mu_i$, where $\mu_i$ is the number of positions $j$ such that either:
• $j \le n$, $C_i[j] = 1$ and $C_i[n+j] = 0$ (because of how $C_i$ is constructed, this would be the case if and only if $C_i$ contains the literal $y_j$), $X^{A^*}[j] = 1$ and $X^{A^*}[n+j] = 0$, or
• $j > n$, $C_i[j] = 1$ and $C_i[j-n] = 0$ (because of how $C_i$ is constructed, this would be the case if and only if $C_i$ contains the literal $\neg y_{j-n}$), $X^{A^*}[j] = 1$ and $X^{A^*}[j-n] = 0$.
In such a situation we will say that $X^{A^*}$ and $C_i$ match in position $j$. Hence, the distance between $X^{A^*}$ and a given $C_i$ is smaller than the distance between $X^{A^*}$ and any $B_i$ if and only if $X^{A^*}$ and $C_i$ match in at least one position $j \le 2n$.
Since the algorithm $f$ is the k-nearest neighbors with $k = 2m - 1$, and since there are $2m$ examples $B_i$ (with decision 1) and only $m$ examples $C_i$ (with decision 0), the algorithm $f$ assigns to $X^{A^*}$ the decision 0 (desirable for the evader) with maximal probability if and only if $X^{A^*}$ matches with all examples $C_i$ in at least one position.
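The majority argument above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the article's implementation: it assumes $k = 2m - 1$, a majority vote among the $k$ nearest examples, and ties at the cutoff distance broken uniformly at random. The function name `prob_label_zero` and the example distances are hypothetical.

```python
from math import comb
from typing import Sequence


def prob_label_zero(distances: Sequence[int], labels: Sequence[int], k: int) -> float:
    """Probability that a k-NN majority vote returns 0 under random tie-breaking."""
    ranked = sorted(zip(distances, labels))
    cutoff = ranked[k - 1][0]  # distance of the k-th nearest example
    closer_total = sum(1 for d, _ in ranked if d < cutoff)
    closer_zeros = sum(1 for d, y in ranked if d < cutoff and y == 0)
    tied_total = sum(1 for d, _ in ranked if d == cutoff)
    tied_zeros = sum(1 for d, y in ranked if d == cutoff and y == 0)
    slots = k - closer_total  # neighbours still to be drawn from the tied group
    favourable = 0
    for h in range(min(tied_zeros, slots) + 1):
        # h = number of label-0 examples drawn from the tied group
        if 2 * (closer_zeros + h) > k:
            favourable += comb(tied_zeros, h) * comb(tied_total - tied_zeros, slots - h)
    return favourable / comb(tied_total, slots)


# Toy numbers: n = 4 variables, m = 3 clauses, hence k = 2m - 1 = 5.
n, m = 4, 3
k = 2 * m - 1
b_dist, b_lab = [2 * n] * (2 * m), [1] * (2 * m)

# Every C_i is matched at least once (mu = 1, 2, 1): all C_i are strictly closer.
p_all = prob_label_zero([2 * n - mu for mu in (1, 2, 1)] + b_dist, [0] * m + b_lab, k)

# One C_i is unmatched (mu = 0): it is tied with the 2m examples B_i.
p_miss = prob_label_zero([2 * n - mu for mu in (1, 0, 1)] + b_dist, [0] * m + b_lab, k)
print(p_all, p_miss)
```

In this toy setting the decision 0 is certain only when every $C_i$ is matched; leaving one clause unmatched drops the probability to 3/7.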
To prove the NP-hardness, we will now show that the constructed instance of the Optimal Stance Obfuscation problem has a solution if and only if the given instance of the 3-SAT problem has a solution.
Assume that there exists a solution $y^*$ to the given instance of the 3-SAT problem, i.e., an assignment of values to variables $y_i$ such that all clauses in $C$ are satisfied, and let $y^*_i$ denote the value assigned to $y_i$ in this solution. Moreover, let $A^*$ be the set of actions such that $a_i \in A^*$ if and only if $y^*_i = \text{true}$ and $\bar{a}_i \in A^*$ if and only if $y^*_i = \text{false}$. Since the assignment $y^*$ is a solution to the given instance of the 3-SAT problem, for every clause $C_i$ there exists either a literal $y_j$ in this clause such that $y^*_j = \text{true}$, or a literal $\neg y_j$ in this clause such that $y^*_j = \text{false}$. In the first case we have $C_i[j] = 1$, $C_i[n+j] = 0$, $X^{A^*}[j] = 1$ and $X^{A^*}[n+j] = 0$, while in the second case we have $C_i[n+j] = 1$, $C_i[j] = 0$, $X^{A^*}[n+j] = 1$ and $X^{A^*}[j] = 0$. Therefore, $X^{A^*}$ matches with every example $C_i$ in at least one position, and it is assigned the decision 0 by the algorithm $f$ with probability 1. Hence, $A^*$ is a solution to the constructed instance of the Optimal Stance Obfuscation problem.
To prove the other implication, assume that there exists a solution $A^*$ to the constructed instance of the Optimal Stance Obfuscation problem. Let $y^*_i = \text{true}$ if $a_i \in A^* \wedge \bar{a}_i \notin A^*$, and let $y^*_i = \text{false}$ if $\bar{a}_i \in A^* \wedge a_i \notin A^*$. Otherwise, assign the value of $y^*_i$ randomly (as it is not crucial for satisfying the constraints in the 3-SAT problem instance). Since the algorithm $f$ assigns the decision 0 to $X^{A^*}$ with probability 1, it must match with every $C_i$ in at least one position, i.e., for every $C_i$ there exists $j$ such that either:
• $j \le n$, $C_i[j] = 1$, $C_i[n+j] = 0$, $X^{A^*}[j] = 1$ and $X^{A^*}[n+j] = 0$, or
• $j > n$, $C_i[j] = 1$, $C_i[j-n] = 0$, $X^{A^*}[j] = 1$ and $X^{A^*}[j-n] = 0$.
Because of how $C_i$ and $y^*$ are constructed, in the first case we have that clause $C_i$ in the given instance of the 3-SAT problem contains literal $y_j$ and $y^*_j = \text{true}$, while in the second case we have that clause $C_i$ contains literal $\neg y_{j-n}$ and $y^*_{j-n} = \text{false}$. Hence, the assignment $y^*$ satisfies every clause $C_i$ in $C$, which makes it a solution to the given instance of the 3-SAT problem.
Therefore, we showed that the constructed instance of the Optimal Stance Obfuscation problem has a solution if and only if the given instance of the 3-SAT problem has a solution. This concludes the proof for the case where the set of actions is limited to feature addition.
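The equivalence established for feature addition can be sanity-checked on small instances. The following Python sketch makes several assumptions that should be read as illustrative only: a clause is encoded as a tuple of signed integers (a hypothetical encoding where $+j$ stands for $y_j$ and $-j$ for $\neg y_j$), the example vector for a clause has a 1 at position $j$ for each literal $y_j$ and at position $n + j$ for each literal $\neg y_j$, and the k-nearest neighbors step is replaced by the matching condition that the proof shows to be equivalent to being classified as 0 with probability 1. The four-clause formula is chosen to mirror the example of Figure S5.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Clause = Tuple[int, int, int]  # +j encodes y_j, -j encodes "not y_j" (1-based)


def clause_vector(clause: Clause, n: int) -> List[int]:
    """Assumed training example for a clause: one position per literal it contains."""
    vec = [0] * (2 * n)
    for lit in clause:
        vec[lit - 1 if lit > 0 else n + (-lit) - 1] = 1
    return vec


def match_count(x: Sequence[int], c: Sequence[int], n: int) -> int:
    """Number of matching positions (mu_i in the proof) between x and c."""
    count = 0
    for j in range(2 * n):
        partner = j + n if j < n else j - n
        if c[j] == 1 and c[partner] == 0 and x[j] == 1 and x[partner] == 0:
            count += 1
    return count


def satisfiable(clauses: Sequence[Clause], n: int) -> bool:
    """Brute-force 3-SAT check over all 2^n assignments."""
    for values in product([False, True], repeat=n):
        if all(any(values[abs(l) - 1] == (l > 0) for l in cl) for cl in clauses):
            return True
    return False


def hiding_set_exists(clauses: Sequence[Clause], n: int) -> bool:
    """Is there a set of at most n additions that matches every clause example?"""
    vectors = [clause_vector(cl, n) for cl in clauses]
    for size in range(n + 1):  # unit costs and budget b = n
        for chosen in combinations(range(2 * n), size):
            x = [0] * (2 * n)
            for idx in chosen:
                x[idx] = 1
            if all(match_count(x, c, n) >= 1 for c in vectors):
                return True
    return False


example = [(-1, 2, 3), (1, 2, -4), (1, 3, 4), (-2, -3, -4)]
# All eight sign patterns over three variables: a classic unsatisfiable formula.
unsat = [(a * 1, b * 2, c * 3) for a, b, c in product((1, -1), repeat=3)]
print(satisfiable(example, 4), hiding_set_exists(example, 4))
print(satisfiable(unsat, 3), hiding_set_exists(unsat, 3))
```

Both checks agree on the satisfiable and the unsatisfiable instance, as the proof predicts; the exhaustive search is only feasible for very small $n$.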
In order to obtain the proof for the case of feature removal, we have to perform the following modifications:
• the initial vector of the evader's features is $X$ such that $\forall_i\, X[i] = 1$ and $|X| = 2n$, i.e., it is a vector of $2n$ ones;
• the set of actions $A = \{a_1, \ldots, a_n, \bar{a}_1, \ldots, \bar{a}_n\}$, where $a_i$ corresponds to setting the value of $X[n+i]$ to 0, while $\bar{a}_i$ corresponds to setting the value of $X[i]$ to 0.
The remainder of the proof follows exactly the same reasoning.

S5 Evaluating Cross-topic Implications
In the main manuscript, we evaluated the impact of our heuristic on a given topic, without considering the possible side effect of accidentally influencing one's perceived stance towards the other two topics. In this section, we run a set of experiments intended to analyze such cross-topic implications. To this end, we focus on different subsets of topics; these subsets are: {Atheism, Clinton}, {Atheism, Feminism}, {Clinton, Feminism}, and {Atheism, Feminism, Clinton}. For each subset, we calculate the overlap between the features that indicate one's stance towards all the topics in that subset. Intuitively, if this overlap is relatively small, it suggests that changing the features that are related to a topic has a limited impact on the other topic(s). In our cross-topic experiments, we took the 1000 most indicative features for each topic and each stance (in favor vs. against), and computed the overlap between these features for each subset of topics. The results of this analysis can be found in Figure S6. As expected, the overlap increases as we increase the number of features taken into consideration. Consequently, to minimize the cross-topic influence, one must limit the number of features that are modified by the heuristic. For example, if the user only modifies 25 features (a modification that effectively hides their stance, as shown in Figure 3 in the main article), none of these 25 would appear among the top 25 features for the other topics. The only exception is the subset {Clinton, Feminism}, since the top 25 features related to Hillary Clinton and the top 25 features related to feminism have a single feature in common. These results suggest that there is a trade-off between hiding one's opinion towards a topic and affecting one's perceived opinion towards other topics. They also suggest that, as long as the hiding efforts are not excessive, the cross-topic influence is likely to be limited.
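The overlap computation described above can be sketched as follows. This is a minimal illustration, assuming that each topic comes with two ranked lists of features (most indicative first), that a budget of x features per topic is split evenly between the two stances as in the caption of Figure S6, and that overlap is measured as the number of features common to all topics in the subset. The function names, the dictionary layout, and the feature names are hypothetical and do not come from the article's data.

```python
from typing import Dict, List, Sequence, Set


def top_features(in_favor: Sequence[str], against: Sequence[str], x: int) -> Set[str]:
    """Take the x // 2 highest-ranked features for each stance."""
    half = x // 2
    return set(in_favor[:half]) | set(against[:half])


def overlap(rankings: Dict[str, Dict[str, List[str]]], topics: Sequence[str], x: int) -> int:
    """Number of features shared by the top-x feature sets of all given topics."""
    sets = [
        top_features(rankings[t]["favor"], rankings[t]["against"], x) for t in topics
    ]
    return len(set.intersection(*sets))


# Invented rankings, for illustration only.
rankings = {
    "atheism": {
        "favor": ["@a1", "@shared", "@a2"],
        "against": ["@a3", "@a4", "@a5"],
    },
    "feminism": {
        "favor": ["@f1", "@f2", "@shared"],
        "against": ["@f3", "@a4", "@f4"],
    },
}
pair = ["atheism", "feminism"]
curve = [overlap(rankings, pair, x) for x in (2, 4, 6)]
print(curve)
```

On this toy data the overlap grows with the number of features considered, which is the qualitative pattern the section reports.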
Figure S6: The overlap between the features that are most indicative of one's stance towards each topic. The x-axis represents the number of highest-ranked features that are considered in the analysis, the y-axis represents the overlap between those features, while different colors represent different subsets of topics. For example, the green line depicts the overlap between the features that indicate one's stance towards atheism and towards feminism. For each of these two topics, x = 100 represents the 50 highest-ranked features indicating in-favor and the 50 highest-ranked features indicating against, resulting in a total of 100 features per topic. The y-axis would then represent the overlap between the 100 features related to atheism and the 100 features related to feminism.