Exploring user perceptions of deletion in mobile instant messaging applications

Contemporary mobile messaging provides rich text and multimedia functionality leaving detailed trails of sensitive user information that can span long periods of time. Allowing users to manage the privacy implications both on the sender and the receiver side can help to increase confidence in the use of communication applications. In October 2017, one of the mobile messengers with the largest user base, WhatsApp, introduced a feature to delete past messages from communication, both from the sender’s and the recipient’s devices. In this article, we compare the deletion features of 17 popular messaging applications. Implementations of these features widely differ across the applications we examined. We further report on a study with 125 participants conducted in a between-subjects design. We explore users’ preferences for deleting mobile messages, and we investigate how well they comprehend this functionality as implemented in popular messaging applications. We found statistically significant differences in users’ understanding of message deletion between our three test conditions, comprising WhatsApp, Facebook Messenger, and Skype. Eighty percent of participants in the WhatsApp condition could correctly assess the effects of deleting messages, compared to only 49% in the Skype condition. In addition, we provide insights into qualitative feedback received from our participants. Our findings indicate that message deletion is seen as a potentially useful feature that users may be able to use in different ways, including editing messages. Furthermore, users can more precisely estimate the capabilities of a deletion function when its effects are transparently explained in the application’s user interface.


Introduction
As internet-connected smartphones are prevalent nowadays, instant messaging applications on these devices are very popular, resulting in more and more people using mobile messaging apps in their daily communication with their peers [1]. In addition to one-to-one conversations, these apps facilitate group chats and support various message types, such as text, picture, video, or voice messages.
In contrast to face-to-face talks or telephone calls, the course of a conversation in mobile messaging is usually logged by each participant. Logging makes the communication persistent and allows previously uninvolved third parties to retrieve past communication from the message history. Since communication in mobile messengers is often informal, it seems plausible that messages are often of ephemeral nature and not meant to be stored permanently [2].
Moreover, the increasing use of mobile messaging in everyday life carries the risk of accidentally sending messages to the wrong recipient. This can be a serious threat to users' privacy, especially when the communication contains sensitive personal information [3,4]. Even if the availability of a proper revocation mechanism cannot completely eliminate these threats, the risks could at least be reduced when there is a chance to delete such a message before the recipient has read it. Related topics concerning the protection of personal privacy, most importantly the "Right to be Forgotten," have found considerable attention over the last years, especially in Europe, resulting in the establishment of the "General Data Protection Regulation (GDPR)" [5,6].
Users delete messages for various reasons, which we describe in the following. Users can freely decide to maintain their local message history, but also to delete specific messages from their own devices, e.g., to free memory on the device. In contrast, deleting messages from the recipient's message history is typically a much more difficult issue, especially in decentralized, open systems such as email or nonproprietary instant messaging networks where users can run their own client and server software. However, the most popular mobile messaging apps are part of closed ecosystems, which are not designed for interoperability, and therefore most users rely on client software and servers provided by a particular vendor.
In October 2017, the messenger "WhatsApp" introduced a new feature that allows users to choose whether a sent message is to be deleted only locally or also from the recipient's conversation log [7,8]. If users choose the latter, the message is replaced with a note indicating that the message has been deleted. This also applies to messages the recipient has already read. The release of the "Deleting Messages for Everyone" 1 feature indicates that the actual effect of the deletion functionality had not been explicitly stated before, thus raising the question whether the effects of such functions are apparent to the users. Other popular mobile instant messaging applications such as "Facebook Messenger" and "Skype" present the functionality for deleting messages in a similar fashion but have different effects.
To shed light on users' perception of message deletion, we conducted a user study to investigate whether the participants understand the actual functionality of message deletion in instant messaging applications. In particular, we explore the following research questions on user expectations toward these functions:
RQ1 What are users' preferences for the functionality of deletion mechanisms?
RQ2 Do specific implementations of this functionality match users' perceptions, i.e., do users correctly estimate the consequences of a particular message deletion?
This is interesting because, right now, users are bound to choices that designers and developers made long before, when initially building their applications. It is unclear to what extent actual users and their feedback were involved in the underlying decision processes. In order to make devices such as smartphones better agents for their users, the capabilities of applications need to fit the users' needs. In particular, users should not have to face surprises because an effect triggered by their action does not match what they expected the action to do. While our study explores users' preferences and expectations in messaging applications, it can also provide valuable insights that help developers design application features that are more comprehensible and usable. It has been well-known for almost two decades that failures in user interface design make it impossible for users to apply security features correctly [9][10][11].
Our major findings and contributions in this work are threefold:
1. We show that those participants of our study who have deleted messages had various reasons for doing so, ranging from spelling correction to withdrawing messages that have been sent mistakenly or that are considered inappropriate in retrospect.
2. Regarding the scope of deletion, our results indicate that users appreciate being able to select for each individual message whether it should only be deleted from their own device or also from the recipient's, as expressed by more than 40% of participants in our study.
3. Our results indicate that the participants can better assess the effects of deleting messages when the functionalities are explained transparently. We reveal that the example implementation of WhatsApp can help developers to improve the user experience of their applications.

Deleting messages
Mobile messaging, i.e., communication using mobile devices such as smartphones via apps such as WhatsApp, Facebook Messenger, or WeChat, has a large user base and is regularly used for personal communication with friends or family [12,13]. Many of these apps offer the possibility to delete messages, while the concrete implementations widely differ between different apps. We investigated the characteristics of the implemented deletion functionality for 17 popular messaging applications (cf. Table 1). We selected apps with a high number of monthly active users [14], concentrating on apps whose primary focus is messaging, and additionally including messengers focused on protecting user privacy, such as Signal or Threema.
Our overview captures the situation as of January 2019; application developers may change the functionality in subsequent versions. All properties were examined in a scenario where two individuals engage in a one-to-one conversation, both using a single mobile device such as a smartphone. We identified conceptual differences between the implementations, which we discuss in the remainder of this section.

Local versus global deletion
The effects of deleting a message differ between applications. Except for Google Hangouts, all applications under consideration (Table 1) support "Local Deletion" from the conversation history of the sender's device. Hangouts differs in that it only allows deleting the entire conversation history with the respective contact.
The majority of applications also allow messages to be removed from the recipient's device, denoted "Global Deletion." Popular applications supporting this feature include WhatsApp, WeChat, and Skype.
We say a messenger provides "Separate Functions" if it allows the user to determine the scope of deletion. This property only applies to messengers supporting both types of deleting messages, i.e., locally and globally. If a message can only be removed from all devices at the same time (which applies to Instagram Direct, Skype, and Snapchat), this messenger does not provide separate functions. Among those messengers providing separate functions, we observe two flavors of separation. WhatsApp, KakaoTalk, and Wire let the user choose between the two options "Delete for me" and "Delete for everyone" (Fig. 1) in a prompt appearing after the message has been selected for deletion. WeChat, Line, and Viber provide two distinct menu items for these functions.
There are several messengers explaining the effects in a dialog that has to be confirmed. For example, Facebook explains that the user can only "delete [their] copy of the message." In KakaoTalk, where the user can explicitly choose between "Delete for me" and "Delete for everyone," selecting local deletion triggers a reminder that "the message will only be deleted from your chatroom and will still be visible to your friend(s)." In contrast, Skype only provides one "Delete" functionality that deletes the message for all participants in the conversation, without providing any further explanation or choice. Given these different levels of detail in explaining functionality, we think that the effects of the deleting mechanism of a particular messenger are not always obvious.
When a message is deleted globally, several messengers also confirm this in a dialog, along with possible limitations of the functionality. Line and KakaoTalk explain that deleting for everyone may not work depending on the application version used by the other participants in the conversation. WeChat and WhatsApp also show this hint, but only when the functionality is used for the first time.
The naming of local and global deletion can also be used to make users aware of their different functionality. While local-only deletion is mostly called "Delete" (except for "Hide message" in GroupMe), there are various names for deleting a message globally. In Instagram Direct and Line, this feature is called "Unsend"; WeChat names its global deletion "Recall." In some cases, deleting a message globally is only available within a specific time span, ranging from 2 minutes in WeChat to 24 hours in Line.
If a user can identify when messages in a conversation have been deleted, we say the messenger leaves "Residuals." The applications providing local-only deletion never show any residuals when a message is deleted locally. When a message is deleted globally, KakaoTalk, Line, Snapchat, WeChat, and WhatsApp leave a hint in place of the former message within the conversation, stating that a message has been deleted. The hint appears on the devices of all participants in the conversation. Wire follows a different approach: On the recipient's device, it replaces the message with only the name of the sender; it does not leave any residuals on the sender's side.
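The distinction between local deletion, global deletion, and residuals described in this section can be illustrated with a small model. The following is only a sketch under simplifying assumptions: the class names, the `leaves_residuals` flag, and the residual text are hypothetical and do not reproduce the implementation of any messenger we examined.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    LOCAL = "delete for me"
    GLOBAL = "delete for everyone"


# Hypothetical residual text; the actual wording differs between messengers.
TOMBSTONE = "This message was deleted"


@dataclass
class Device:
    owner: str
    log: dict = field(default_factory=dict)  # message id -> displayed text


@dataclass
class Conversation:
    devices: list
    leaves_residuals: bool = True  # assumption: hint shown on all devices

    def send(self, msg_id, text):
        # Every participant's device logs the message.
        for device in self.devices:
            device.log[msg_id] = text

    def delete(self, msg_id, requester, scope):
        for device in self.devices:
            if scope is Scope.LOCAL:
                # Local deletion only touches the requester's own log
                # and never leaves a residual.
                if device.owner == requester:
                    device.log.pop(msg_id, None)
            elif self.leaves_residuals:
                # Global deletion with residuals: replace the text by a hint.
                device.log[msg_id] = TOMBSTONE
            else:
                # Global deletion without residuals: remove silently.
                device.log.pop(msg_id, None)


alice, bob = Device("alice"), Device("bob")
chat = Conversation([alice, bob])
chat.send(1, "hello")
chat.send(2, "oops")
chat.delete(1, "alice", Scope.LOCAL)   # Bob still sees message 1
chat.delete(2, "alice", Scope.GLOBAL)  # both see only the hint
```

In this model, a messenger without separate functions would simply expose only one of the two `Scope` values to the user.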

Deleting quoted messages
Several messaging applications include features to reply to a message, i.e., to send a new message in which the original one is embedded as a quotation. We examined how the messengers offering a reply functionality handle the interplay of replies and deleted messages in three scenarios.
1. Alice sends a message to Bob, Bob replies to that message, and then Alice locally deletes the original message from her device.
2. Alice sends a message to Bob, Alice deletes it locally, Bob then replies to the message.
3. Alice sends a message to Bob, Bob replies to that message, and then Alice globally deletes the message. This case only applies to applications offering global deletion.
Our observations are listed in Table 2. Only in Line and Wire is the quoted message deleted along with the original message in all three scenarios. Instead of the original message, both applications embed a notification stating that the message is not available. In Telegram, the quoted message is only deleted in the second scenario, i.e., when the original message is deleted before the recipient has replied. In this case, Telegram only shows the reply as a standalone message. In all other messengers, deleting does not affect quotations of a message. In the second scenario, Signal shows an embedded notification along with the quote, stating that the original message could not be found, while still displaying the original message.
Notes: The numbers of monthly active users are listed in millions. The "Residuals" properties denote whether a messenger leaves a hint indicating that a message has been deleted (both for local and global deletion). The "Separate Functions" property only applies if the messenger supports both types of deletion. When an application supports messages that automatically disappear, this is denoted "Ephemeral Messages." All properties apply to the latest available application versions as of January 2019.

Additional properties
Three messengers, Skype, Viber, and Wire, allow users to edit the content of a message in retrospect. In Skype and Viber, we were able to edit messages 2 days after they had been sent, but we could not determine if these messengers have a time limit for editing. All three applications indicate if a particular message has been edited. The majority of messengers we considered allow users to delete received messages from their device. This functionality applies to all messengers except for Hangouts, which does not allow deleting individual messages at all, and the messengers that do not allow local-only deletion, i.e., Instagram Direct, Skype, and Snapchat. Deleting a received message only takes effect locally, i.e., the recipient cannot delete the message from the sender's device.
Five messengers we considered support some concept of "Ephemeral Messages" but differ in their implementations. An ephemeral message is automatically deleted from all devices in the conversation after a specific time span. Instagram Direct provides ephemerality on a per-message basis and limits it to media content such as photos, which automatically disappear 10 seconds after being displayed. In other applications, ephemerality is configured on a per-conversation basis. Both participants can change the conversation settings, which always affect both sides. Signal and Telegram allow setting time spans between a few seconds and one week. The time span can be changed during the course of the conversation, but the new time span only applies to future messages and does not affect older messages. Similarly, Snapchat users can set a timer for the conversation but only have two options: immediately after viewing and after 24 hours. Hangouts allows users to turn off the conversation history, but the message expiration time is neither explicitly specified nor configurable. Contrary to the other applications, ephemeral messages in Hangouts do not expire individually but in groups.
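The per-conversation timer behavior, where a changed time span only applies to future messages, can be sketched as follows. This is an illustrative model, not the implementation of any messenger: the class and method names are hypothetical, and we assume for simplicity that the time span is counted from the moment of sending.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    text: str
    sent_at: float
    ttl: Optional[float]  # seconds until expiry, fixed at send time; None = keep

    def expired(self, now: float) -> bool:
        return self.ttl is not None and now >= self.sent_at + self.ttl


@dataclass
class EphemeralConversation:
    ttl: Optional[float] = None  # current conversation-wide setting
    messages: list = field(default_factory=list)

    def set_timer(self, ttl: Optional[float]) -> None:
        # Changing the setting does not touch messages that were already sent.
        self.ttl = ttl

    def send(self, text: str, now: float) -> None:
        # Each message stores the time span that was active when it was sent.
        self.messages.append(Message(text, now, self.ttl))

    def visible(self, now: float) -> list:
        return [m.text for m in self.messages if not m.expired(now)]


conv = EphemeralConversation()
conv.set_timer(60)
conv.send("first", now=0)    # expires at t = 60
conv.set_timer(5)
conv.send("second", now=10)  # expires at t = 15
still_visible = conv.visible(now=20)
```

Storing the time span with each message, rather than looking up the conversation setting at expiry time, is what keeps older messages unaffected by later changes.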
The functionality to delete an entire conversation is an additional feature supported by all mobile instant messaging applications we considered, except for GroupMe. Deleting a conversation only takes effect locally and can basically also be achieved by manually deleting all messages in the conversation. However, in Hangouts, deleting a conversation is the only way to delete messages locally.

Research questions
The different messaging applications comprise a variety of implementations of deletion functionalities. We consider this a broad selection of offers made by the application developers to their users. From the opposite perspective, this directly raises the question which features users actually prefer for their everyday conversations. Therefore, following our first research question (RQ1), we study how commonly deleting is applied by users, whether there is a need for this functionality, and in particular, which options users prefer, inspired by the currently available options.
In our second research question (RQ2), we examine whether users can correctly assess the capabilities of deletion functions and whether we can identify differences in distinct implementations of these functions. We expect that the variety of implementations of deletion mechanisms is confusing for users. For example, deleting a message in Skype removes messages from all participants' conversations, whereas the identically named function in, e.g., Facebook Messenger only removes messages from the sender's log. Facebook Messenger is more transparent in that it provides a prompt stating that the message is only deleted locally and requires the user to confirm before the message is eventually deleted. It is unclear how an average user who has not explicitly explored the actual impact of deleting in a particular app can objectively assess what happens when a message is deleted.

Method
In this section, we discuss the design and method of our study in detail. We conducted a between-subjects study comprising three test conditions (study groups). During the study, the participants in each condition interact with one particular mobile instant messenger and answer 16 questions on a laptop.

Test conditions
For our practical study, we assigned participants to one of three test conditions based on the different instant messaging applications:
• Skype (version 8.13) deletes messages from the message logs of all participants in the conversation;
• Facebook Messenger (version 151.0) allows the sender to delete messages from their own conversation history only;
• WhatsApp (version 2.18) allows users to select whether to delete the message just from the sender's conversation history or for all parties involved.
These messengers were selected because they have a large user base and they implement different behaviors of the message deletion functionality, thus representing all sound variations of deleting messages (Table 1). During the assignment, we did not take into account if a participant had used the respective messenger application before. All versions correspond to the most current versions available as of February 2018.

Study design
Our study comprised five steps: (i) Introduction, (ii) Practical task: writing and deleting a message, (iii) Questionnaire Part I: preferences for deleting messages, (iv) Questionnaire Part II: reconsidering effects of deleting, and (v) Debriefing.
After the introduction, the participants completed an initial practical task in which they sent and deleted a message using an instant messaging application. Subsequently, participants answered a two-part questionnaire, during which the interviewers showed them the result of the experiment. We explain these steps in more detail in the following.
Step 1. Introduction In the first step, we explained to each participant the reason for and purpose of the study as well as the study procedure. Furthermore, we informed them that we did not collect personal data, how long participation typically took, and how we compensated them.
Step 2. Practical task: writing and deleting a message Next, the first stage of the practical task followed. We asked the participants to write, send, and delete a message using a specific instant messaging service.
For this task, we gave the participants a mobile phone ("Samsung Galaxy S6" running "Android 7") with the specific messaging service already opened to keep the task of sending and deleting a message as simple as possible; we did not ask the participants to use their own mobile devices. We used our lab mobile phone in order to create a more controlled environment where the messaging service was installed and working, and the contact phone number was already in the contact list. Participants were asked to type an arbitrary message, but if they struggled to come up with a message of their own, we suggested that they send "hello." On a second phone, we then showed the participants that the message had arrived at the recipient's device, and asked them to delete the message on the device they had used to send the message. If necessary, we helped the participants figure out how to delete the message. At this point, we did not show them the effect of deleting the message on the recipient's device, which we deferred until Step 4.
Step 3. Questionnaire Part I: expectations of deleting messages At this point, the participants started to fill out a questionnaire. The first part of the questionnaire consisted of a few warm-up questions about the participants' usage of mobile devices and instant messaging, and whether they deleted instant messages and why. It further contained questions about the participants' expectations concerning the experiment, i.e., whether the message was deleted everywhere or only from the sending device. Additionally, we asked the participants which deletion behavior they would prefer. Demographic data were also collected in this part of the questionnaire.
The full set of questions and their answers can be found in the Appendix.
Step 4. Questionnaire Part II: reconsidering effects of deleting Before the participants proceeded with the second part of the questionnaire, we revealed the outcome of the experiment by showing the participants the message history of the recipient's device. This allowed them to see the effect of deleting the message on the recipient's side.
Subsequently, the participants continued with the second part of the questionnaire, specifically focusing on questions about the message deletion and whether it behaved as expected. In the last two questions, we asked the participants whether there should be limitations for deleting messages from the recipient's message history.
These questions were primarily addressed to participants of the WhatsApp and Skype conditions since these messengers allow deleting messages from the recipient's message history.
Step 5. Debriefing After these final questions, we thanked the participants for their participation. If they had any questions about the study, we answered them in this step.
Finally, we deleted the entire message history to preserve the privacy of the participants and to allow the next participant to start with an empty message history.

Pilot study
In December 2017, we conducted a pilot study to evaluate the procedure of the study, determine the duration per participant, and test the comprehensibility of the questions. We tested the study on 8 colleagues from a co-located department (75% male, 25% female, age ranging from 25 to 59 years). The participants did not have any prior knowledge of the study and its goals. As a result, three questions were removed from the questionnaire as they turned out to be somewhat redundant or too imprecise. We also decided to let participants fill out the questionnaire on laptops instead of using structured interviews or paper-based questionnaires, to avoid errors during data collection and to simplify administering the responses.

Study protocol, recruitment, and demographics
Over a period of 3 days in February 2018, we collected a total of 135 responses from visitors to the main cafeteria of Ruhr University Bochum, located in the largest metropolitan area in Germany. The main cafeteria is centrally located and frequented by students and staff from all departments. We set up two tables in relatively quiet corners near the two main entrance doors and recruited participants from the passing students and staff. This setup allowed for rather quick recruitment of participants but may also have biased the sample. However, as the cafeteria serves all departments, we expected participants with a wide variety of backgrounds.
Study participants could choose whether to participate in the study in English or German. Ninety-three percent of the participants chose to answer in German. Completing the study took on average 5 minutes, and we compensated each participant with two chocolate bars regardless of whether they completed the study or aborted early.
Although all participants completed the study, we discarded 10 responses because of incomplete answers, resulting in 125 responses we used in the evaluation. The participants were randomly assigned to one of the three conditions.
When the participants answered the questionnaire, the interviewers kept their distance in order to not create additional pressure, while staying available for questions. We did not record the exact number of questions raised by participants, but we estimate that fewer than five participants asked for clarification of survey questions.
We collected demographic data from the study participants. A total of 32% of the participants identified as female and 64% as male. The median age is 25 years in a range from 18 to 75 years. Table 3 summarizes the responses to the demographic questions.
We also asked the participants to self-estimate their proficiency in using mobile devices on a five-point scale from beginner (1) to expert (5). According to their answers, more than 60% of the participants rated their experience in using mobile devices as four or five.

Response preparation
For three questions (i.e., Q5, Q14, and Q16), our participants were asked to provide their responses as free text. We used a coding approach to prepare the responses for analysis. Three authors independently created codes based on the free-text responses and assigned each response one or more codes resulting in three independent codings per question.
Subsequently, one of the coders created a codebook for each question based on the three individual codings. The codebook comprised a list of keywords, each accompanied by a short descriptive sentence. Creating the codebook required minor modifications such as renaming or merging particular keywords.
All changes in the individual codings have been documented and required approval of the respective coder. When the authors had used different codes for a response, this response was assigned the union of codes assigned by the three coders. In the last step, the codebook was approved by the three coders.
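The merging rule for diverging codings, i.e., assigning each response the union of the codes chosen by the three coders, can be written down in a few lines. The sketch below uses hypothetical response identifiers (`R1`, `R2`) and invented code assignments purely for illustration; only the code labels themselves are taken from our codebook.

```python
def merge_codings(*codings):
    """Assign each response the union of the codes given by all coders."""
    merged = {}
    for coding in codings:
        for response_id, codes in coding.items():
            merged.setdefault(response_id, set()).update(codes)
    return merged


# Invented example data: three coders, two responses.
coder_a = {"R1": {"revision"}, "R2": {"regret"}}
coder_b = {"R1": {"revision"}, "R2": {"inappropriate"}}
coder_c = {"R1": {"revision", "wrong recipient"}, "R2": {"regret"}}

final_coding = merge_codings(coder_a, coder_b, coder_c)
```

Because the union is used, a response can carry more than one code, which is why the number of code assignments may exceed the number of responses.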

Ethical considerations
Our university does not have an IRB or ethics board that covers the type of our study. However, we have taken great care to adhere to principles of ethical research [15]. Our study was designed such that it did not contain deceiving questions. In case participants asked immediately after deleting the message how the deletion affected the recipient's message history, we asked them to be patient until they had completed the first part of the questionnaire. Furthermore, we did not store any data that would allow us to link participants to their responses.
In the recruitment process, each participant was informed that they were participating in a scientific study, about the purpose of the study, the possibility to withdraw at any time without giving any reasons, and that no personally identifying information would be stored.
At the beginning of the questionnaire, the participants were shown an introductory text summarizing the information previously given orally during recruitment.
We also informed the participants about the estimated duration of the study (approx. 5 minutes) and their compensation (two chocolate bars). However, some participants required up to 10 minutes or more because they provided detailed answers to the free text questions. Answering these questions was not mandatory and could be omitted. The demographic questions were also completely optional, and the participants could skip them without providing an answer or choose the "I prefer not to answer" option.

Results
In this section, we present the results of our study along our two research questions. We report our findings on the participants' preferences and expectations of deleting messages. The results from the practical task to delete a message are presented and analyzed as to whether and to what extent users correctly assess the actual capabilities of deletion functionality.

User preferences for message deletion
To answer our first research question (RQ1), we consider the participants' preferences for the functionality of deletion mechanisms as expressed in the questionnaire. Here we are faced with subjective wishes and concerns of users. First, we analyze the prevalence of deleting messages, i.e., if users actually use message deletion features in their daily lives, how often, and with what intentions they use them. Subsequently, we analyze users' preferences regarding several features of deletion to find out what technical implementation they think best fits their needs.

Frequency of message deletion
To learn about the prevalence of message deletion, we directly asked the participants how often they delete messages in instant messaging (Q4: How often do you delete instant messages?). Response options ranged from "Several times a day" to "Almost never," including "I don't know." The distribution of responses is shown in Fig. 2. We can see that, on average, message deletion is a relatively infrequent event: 56.8% of users (k = 71) almost never delete messages, and only 10.4% (k = 13) of participants use it on a daily basis.
Among those 39.2% (k = 49) of participants who have used message deletion before, we see that usage patterns widely differ. We find about equal numbers of participants using message deletion "a few times a year/month/week" as well as "several times a day" (each approx. 10%).
Thus, the results from our sample of participants do not show a clear trend regarding the prevalence of message deletion. They also do not capture differences between actual message deletion and mere editing, which was provided as a reason for deletion by 13.6% (k = 17) of our participants.

Reasons to delete messages
Participants could describe their reasons for deleting messages in a free text response, since we expected a wide variety of answers.
We derived 11 codes from our participants' responses that we assigned 68 times to 42 responses, following the coding approach described in section "Response preparation." Three responses were left out as the coders agreed that they were too ambiguous. The frequency of each code is shown in Fig. 3.
Revising messages was the most frequently named reason to delete messages ["Usually, I don't [delete messages], except for typos" 2 (P115)]. We coded these responses as "revision" (k = 17) since they indicate that the participants delete messages with the intention of replacing or editing them instead of removing them. Participants stated that they revise messages "because they contain mistakes (typos)" (P33) or to "reconsider the wording" (P34). Conversation consistency may also play a role when deleting messages: "If I misspelled something and nobody has read it yet" (P94).
Participants also stated that they delete messages "if . . . inappropriate" (P129) and "sometimes . . . because I have said something inappropriate" (P74). They consider some of their messages as "inappropriate" (k = 8) in retrospect and delete them for this reason.
Most of the responses coded as "inappropriate" indicate some sort of "regret" (k = 7) of having sent the message in the first place. Explanations reported by participants include that they "texted without thinking" (P122) or because of "spontaneous emotions that [they] regretted afterwards" (P36).
Messages are also deleted when they are considered "obsolete" (k = 7), e.g., if they are "no longer relevant" (P76) or "the circumstances under which I had sent the message have changed . . ." (P52).
Another closely related reason to delete messages is if messages have been sent to the "wrong recipient" (k = 6). Participants described that they deleted messages because they had sent them to the "wrong recipient or the message was stupid and [they] wanted to take it back" (P40). Other similar responses were: "Sent to the wrong group/person or just because it was not clear enough" (P2) and "typo/send to wrong person" (P110).
Another frequently mentioned motivation for deleting messages is to free memory on the mobile device. Nine of the participants gave responses including "lack of memory" (P39), "no memory" (P53), or "just because they consume some memory" (P84) that suggest limited "storage capacity" (k = 9) as a reason to delete.
Text messages do not consume much memory, but "multimedia" content such as images or videos does. We assume that the participants mentioning "storage capacity" issues also had "multimedia" content in mind. However, only three responses explicitly referred to "multimedia" content, e.g., P25, who wrote "media files that consume too much memory or group chats that are not interesting."
In summary, we can distinguish three major categories: users delete messages to make corrections, free storage, or for privacy reasons. We consider the following reasons for deletion as privacy-related: messages being "inappropriate," "obsolete," "regretted" by the sender, "sent mistakenly," or to the "wrong recipient." Thus, 54.4% (k = 37) of the codes are somehow privacy-related, distributed across 61.9% (k = 26) of the (n = 42) free-text answers we have received, that is 20.8% of all 125 participants.

[Fig. 3: "What are your reasons for deleting messages?" (axis: Count)]

2 In the following, German-language quotes by participants were translated to English by the authors.
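The shares of privacy-related codes reported above can be rechecked with a few lines of arithmetic. The following is a minimal sketch that only uses the counts stated in the text; the variable and function names are illustrative and not part of the study materials.

```python
# Counts as reported in the text; all names here are illustrative.
privacy_codes = 37     # code assignments classed as privacy-related
total_codes = 68       # all code assignments
privacy_answers = 26   # free-text answers carrying a privacy-related code
coded_answers = 42     # free-text answers that received codes
participants = 125     # all study participants


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


code_share = share(privacy_codes, total_codes)            # share of codes
answer_share = share(privacy_answers, coded_answers)      # share of coded answers
participant_share = share(privacy_answers, participants)  # share of all participants

print(code_share, answer_share, participant_share)
```

The three values correspond to the 54.4%, 61.9%, and 20.8% figures given in the text.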

Preferences for deleting messages
We have asked users about their preferred variant for deleting messages (Q7: Which of the following do you prefer when you delete a message?), i.e., from which message histories they prefer messages to be removed. Participants could pick one of four predefined answers. The results are listed in Table 4. The majority of participants (84%, k = 105) preferred either the message to be deleted from both the sender's and recipient's logs or to be given the choice between global and local deletion whenever they delete a message. These numbers are supported by our observations of the study participants who were assigned to the WhatsApp condition in the experiment. Thirty-six of them chose the "Delete for Everyone" option, while only three decided to remove the message from their message history only. This indicates two things: First, our results suggest that the majority of users who have decided to delete a message prefer deletion to have global effects. Second, there also appears to be a need for a selection mechanism on a per-message basis, which implies that users desire more granular functionality and also higher transparency when they delete messages.

Preferences for notifications
We further asked participants how they perceived notifications in the contexts of messaging and deletion. First, we asked users about notifications indicating whether a message has been read. The 39 participants who supported limiting the delete function (Q15) were asked to further specify the type of limitation. (Q16: How should the delete function be limited?) We coded their free-text answers into six different categories. We left seven answers uncategorized, as we agreed that these were too ambiguous or not related to the question, and assigned a total of 35 tags to the remaining 32 answers. The distribution of the answers is illustrated in Fig. 4.
The most frequent proposals were to either allow deletion only for "unread" messages (k = 12) or to limit the deletion functionality based on "time" (k = 11).
Arguments in favor of restricting deletion to unread messages include possible manipulation ["because they have not yet caused a reaction on the recipient's side. If any message can be deleted, the recipient can probably be led to believe they had only imagined the deleted message to exist, which could be exploited . . ." (P69)] and discomfort with conversations partially disappearing from the recipient's conversation history ["It makes me feel uncomfortable if I cannot look up conversations that have already taken place" (P25)].
For time-restricted deletion, the suggestions for how long the functionality should be available range from "5 minutes" (P89) to "24 hours," with 1 hour being the most common proposal, suggested four times. Participants reported favoring time-restricted deletion because of "typing errors or [if] the message was supposed to be sent to another recipient" (P39) as opposed to "things . . . being distorted in the long run" (P56) and the need for a consistent message history to "prove things" (P67). These answers suggest that participants see a connection between the time and purpose of deletion. Another participant argued that the recipient could be expected to have read the message after a certain time has passed, so being able to delete it later would "no longer [be] worth it" (P43).
Five participants expressed that they opposed message deletion in general as "some information could be important for another person" (P128) or because "it creates an illusion of deletability that cannot be satisfied - just think of screenshots" (P101). Discomfort with others manipulating information already stored on one's device was also mentioned ["It shouldn't be possible to delete data you know to exist on your device. Especially if you are not notified of it." (P112)].
Four participants proposed to restrict deletion to the "latest message" only. This restriction is actually implemented in the WeChat messenger (which we did not cover in our study) and appears interesting in that it keeps the conversation history consistent. Deleting a message after one or more follow-up messages can change the entire context of the subsequent conversation.
Quite interestingly, two participants suggested that for each conversation all partners should be required to "consent" whether and under which circumstances messages can be removed from the conversation history. People should be able to "pick the messenger that best matches [their] needs" (P112), including the need for a deletion functionality. "Before the conversation begins, every participant should be able to determine if . . . and for how long message should be able to be deleted. . . . If the circumstances permit or require it, a new conversation could be generated (as in 3-person settings on Facebook)" (P112).

[Fig. 4: "How should the delete function be limited?" (axis: Count)]
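To make the participants' proposals more concrete, the restrictions named above ("unread," "time," and "latest message") can be expressed as simple predicates that a messenger might check before offering global deletion. This is only an illustrative sketch: the `Message` fields, the rule functions, and `may_delete_for_everyone` are hypothetical names and do not describe how any of the examined messengers is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    # Hypothetical message state; not taken from any real messenger API.
    sent_at: datetime
    read_by_recipient: bool
    is_latest_in_chat: bool


def unread_only(msg, now):
    """'unread' proposal: allow deletion only before the recipient has read it."""
    return not msg.read_by_recipient


def within(limit):
    """'time' proposal: allow deletion only within `limit` after sending."""
    def rule(msg, now):
        return now - msg.sent_at <= limit
    return rule


def latest_only(msg, now):
    """'latest message' proposal: allow deletion only of the newest message."""
    return msg.is_latest_in_chat


def may_delete_for_everyone(msg, now, rules):
    """Global deletion is offered only if every configured rule permits it."""
    return all(rule(msg, now) for rule in rules)
```

For example, passing `[within(timedelta(hours=1))]` as `rules` would model the one-hour limit that participants suggested most often, while an empty rule list models unrestricted deletion.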

User perception of message deletion
Next, we analyze users' perception of message deletion, i.e., what our participants expect to happen when they delete a message, and whether the actual outcome of deleting a message matches what they expect. We use these analyses to answer our second research question (RQ2).

Expectation matching in real implementations
We asked our participants from which devices they believed their message had been deleted. (Q6: We just asked you to send a message and then to delete it. What do you think: where has the message been deleted?) Participants could choose each of the two devices involved in the conversation and specify additional answers as free text. We present the summary of answers to Q6 in Table 5.
In Facebook Messenger, the participants who only selected the sender estimated the outcome of the deletion correctly. In Skype, the outcome is correctly estimated when participants selected both the sender and the recipient. In WhatsApp, there are two options, sender only and both parties, depending on what participants actually selected when they deleted the message in the experiment; we consider both cases a correct prediction. We did not capture if participants had experience with the messenger they tested or if they knew about its actual functionality before.
In Step 4 of the study, we disclosed to the participants whether the message they had sent and deleted was still available on the recipient's device. We then asked the participants if the result matched their expectations. (Q13: Does this result match your expectations?) Additionally, the participants could provide a free text answer to specify differences between their expectations and the result. (Q14: Why does this result match your expectations? Why not?) The results are summarized in Table 6. Overall, 66.4% of the participants (k = 83) stated that the observed behavior matched their expectations; however, the results depend on which messenger was used. For Facebook Messenger 71% agreed, for Skype 49% agreed, and for WhatsApp 80% agreed.
We used a chi-square test of independence to test if these differences among the three messengers are statistically significant and found a significant influence ($\chi^2 = 9.1468$, $df = 2$, $P = 0.01032$). For post-hoc testing, we used chi-square tests on pairs of messengers and applied corrections for multiple testing. We used Bonferroni correction as a conservative choice, as the number of tests is small. Among the post-hoc tests on pairs of messengers, we found significant differences between Skype and WhatsApp ($\chi^2 = 6.8806$, $df = 1$, $P = 0.02613$). The share of participants who could correctly assess the effects of message deletion was about 30 percentage points higher in the WhatsApp condition than in the Skype condition (80% vs. 49%). A summary of the test results is shown in Table 7. We used a significance level of $\alpha = 0.05$.
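The testing procedure can be sketched in plain Python. Note the assumptions: the per-messenger counts below are not taken from the paper's tables. They are a reconstruction chosen to be consistent with the reported percentages (80%, 71%, 49%), the overall 83 of 125 matches, and the 39 WhatsApp participants mentioned earlier. Applying Yates' continuity correction to the pairwise 2x2 test is likewise an assumption; the helper names are illustrative.

```python
import math


def chi_square(table, yates=False):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                # Continuity correction, commonly applied to 2x2 tables.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


def p_value(stat, df):
    """Upper-tail chi-square probability via closed forms for df = 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# ASSUMED counts (matched, not matched); reconstructed, not from the paper.
counts = {
    "WhatsApp": (31, 8),
    "Facebook Messenger": (32, 13),
    "Skype": (20, 21),
}

# Omnibus test across all three messengers (2 degrees of freedom).
omnibus = chi_square(list(counts.values()))
omnibus_p = p_value(omnibus, df=2)

# Post-hoc pair Skype vs. WhatsApp, Bonferroni-corrected for three pairs.
pair = chi_square([counts["Skype"], counts["WhatsApp"]], yates=True)
pair_p = min(1.0, 3 * p_value(pair, df=1))

print(round(omnibus, 4), round(omnibus_p, 5), round(pair, 4), round(pair_p, 5))
```

Under these assumed counts, the statistics come out close to the values reported in the text, which suggests (but does not confirm) that the analysis used an uncorrected omnibus test and continuity-corrected pairwise tests.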
Evaluating the results for Q14, we realized that Q13 can refer to multiple dimensions of message deletion instead of just the question whether the message was deleted from the recipient's device or not. Some participants stated that they had correctly anticipated the message being deleted (or not) but were surprised by other aspects of the process such as deletion notifications or (the lack of) other residuals on the recipient's device. Consequently, they provided a "no" answer even though they had correctly predicted which devices the message would be deleted from. This makes it harder to assess the binary answers to Q13; in retrospect, a more fine-grained answer space should have been provided. We address this issue in the Limitations section.

Reasons for non-matching expectations
Prior to the experiment, we expected a higher rate of expectation matching, particularly in the WhatsApp condition, where participants were able to explicitly choose which message history they would like to delete the message from. Therefore, we analyze the reasons why the expectations did not match. Participants could specify in detail the reasons why and how the result differed from what they had expected. (Q14: Why does this result match your expectations? Why not?) We received 32 free-text answers and coded them, again, as described in section "Response preparation." One participant noted having expected a prompt to choose whether the message should be deleted locally or globally. The coders agreed to drop this answer as it is not related to the disclosure of the result at the end of the experiment. We categorized the remaining 31 answers into five disjoint categories as illustrated in Fig. 5.
The majority of responses (k = 20) simply referred to surprises because a message was deleted (Deletion) or because it was not deleted (No Deletion). Messages deleted from the recipient's device surprised participants who did not expect global deletion to work at all, only with unread messages ["I thought if a message has already been read, it can no longer be deleted from the recipient's device" (P40)] or only with certain messengers ["I thought it only worked in [. . .] deleted permanently" (P57) or stated that "once data has been transferred, the recipient should be able to manage it on their device autonomously" (P133). Conversely, eight participants in the Facebook Messenger condition stated they had expected the message to also disappear from the recipient's device, with one explicitly referring to their experience with other messengers ["I use Telegram, so I'm used to being able to delete messages from both devices . . ." (P94)], and another questioning the concept of local deletion because "it contradicts the purpose of deletion if the recipient can still read the message" (P73). Another participant in this condition thought technical problems "such as a failed connection to the server or (probably intentional) client malfunction" (P97) were to blame for the message still existing on the recipient's device.
Ten participants referred to the delete notification as the reason why the outcome did not match their expectations. Two of them had expected the recipient to "at least" (P8) be notified that a message has been deleted "because . . . this happens from experience, e.g., on WhatsApp" (P105). In turn, eight had expected the message to disappear without a trace and were surprised by the delete notification because it "sparks mistrust" (P36) or "doesn't matter . . . and it doesn't convey any message" (P48).
The reasons for mismatched expectations do not apply to all three messengers equally, e.g., only participants who used Facebook Messenger could expect a deletion that did not occur (k = 10). Quite interestingly, the answers indicate that the expectation mismatch partially originated from the notification that a message has been deleted (Skype: 3, WhatsApp: 5).

Limitations
We have planned and conducted our study thoroughly. However, our sampling approach introduces certain limitations. We have reached a large number of participants with moderate effort, but this resulted in a sample biased toward younger people who have (by their own judgment) higher than average experience with mobile devices. For better general applicability of our results, a sample with a more representative age distribution and more objective assessment of experience would be desirable.
The study environment for the practical task and answering the questionnaire was rather busy compared to an in-lab setup, which is, however, more representative of normal smartphone usage.
In our survey, several questions only offered binary (yes/no) answer options. Most of the binary answers were used in the warm-up questions. Only the answers to Q13 were used for quantitative evaluation, and these are supported by the qualitative answers to Q14. Answer ranges based, e.g., on Likert scales might have been a better instrument to capture varying levels of people's opinions. Our goal was to obtain a coarse estimation of expectations on message deletion, not necessarily representing all possible aspects. The use of a survey with predominantly closed questions facilitated the analysis compared to interviews, at the expense of limiting the participants' ability to express differentiated answers.
The test conditions also limit the applicability of our findings, in that we only tested three different implementations of messengers and did not cover all deletion features such as ephemeral messages. The three messengers we tested are, however, among the most popular ones and comprise different realizations of the deletion functionality.

Discussion
The term "deleting messages" can be ambiguous as it can be unclear whether messages are removed from the sender's or the recipient's log, or both. Considering our second research question (RQ2), users could not always correctly estimate the consequences of a particular deletion action. Participants in our study could not correctly assess the actual effects of deleting a message in an application that does not adequately explain its functionality. This mismatch could possibly be remedied by improving the interface design, i.e., better explaining the consequences of selecting "delete." A convenient example for this is WhatsApp's implementation that lets users directly decide whether they prefer a message to be deleted only locally or also on the recipient's side. While WhatsApp's implementation is the most transparent one, it also meets the desire expressed by a considerable number of participants. Regarding our first research question (RQ1), 84% preferred to be able to delete messages from the recipient's device, either by always deleting on both devices or by having an explicit choice between local and global deletion.
We have seen that our participants consider message deletion a useful feature they would use in a diverse range of ways. Since 13.6% of our participants indicated that they deleted messages to revise them and send them again instead of actually removing them, we recommend that application developers consider including a dedicated "edit" feature in their applications.
It is still to be investigated whether a clearer description of the delete function on the user interface can better clarify where messages are deleted, even when no choice is given to the user. One example could be the Line messenger, which explicitly advises a user that the respective message is only deleted from the user's local conversation history and that the recipients will still be able to read it.
It is interesting that a majority of participants (68.8%) did not express an explicit desire for limits on the delete functionality. Participants in favor of such limitations explained that they wanted to preserve a consistent conversation. The limitation to seven minutes originally implemented by WhatsApp appears appropriate according to the majority of reasons users stated for deleting messages. This time span is sufficient to correct or improve messages and to withdraw messages that have been sent mistakenly or to a wrong recipient. However, it remains unclear how this limitation was determined. In early March 2018, the time limit for message deletion in WhatsApp was extended to 68 minutes and 16 seconds (i.e., $2^{12}$ seconds) [16], which suggests that the rationale for the concrete time limit may also be purely technical. Another interesting proposal, not yet implemented in any of the messenger applications we have examined, might be consent-based deleting. In such a scenario, messages can only be deleted if all participants in the conversation have explicitly agreed to this beforehand, on a per-conversation basis. Such a mechanism could balance the individual interests of both the sender (to keep control of potentially sensitive data) and the receiver (to keep track of the conversation). Unlimited availability of the functionality to delete messages could evoke malicious deletion, e.g., to alter the context of a conversation retroactively. Consent-based deletion might help to reduce these threats.
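The equivalence between the extended WhatsApp time limit and a power of two mentioned above is easy to verify. The short check below is only a worked calculation; the variable names are illustrative.

```python
# 2 to the power of 12 seconds, split into whole minutes and remaining seconds.
limit_seconds = 2 ** 12
minutes, seconds = divmod(limit_seconds, 60)

print(f"{limit_seconds} s = {minutes} min {seconds} s")  # 4096 s = 68 min 16 s
```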
These examples show how the user experience of messaging applications could be improved, in particular, concerning message deletion. Application developers could provide a notification indicating where a message has been deleted from, or implement a dialog for explicit selection, to improve users' understanding of the capabilities of deletion functionality.

Future work
In the future, the study could be repeated with a larger and more heterogeneous sample with different age ranges and educational backgrounds to review our findings for generalizability beyond a university context. Such an extended study should consider if and to what extent our study scenario can be applied to a wider population and also take into account their communication behavior. A study replication could also cover additional messaging applications and other communication tools, and capture a wider range of deletion features, e.g., ephemeral messages.
While we considered aspects such as reasons for deletion, frequency, etc., only independent of each other, future research could explore dependencies, i.e., whether people who delete messages more frequently have different reasons for it.
While initially gathering information about the variety of deletion features, we observed that there are diverse names for these functions across different messengers (e.g., Delete, Recall, Unsend). This raises the question of whether user perception differs depending on how the functionality is named.

Related work
We explored how users perceive message deletion in instant messengers on mobile devices. Murillo et al. [17] investigated how users understand data deletion in general and in the scenarios of email and social media. Their interview study with drawing tasks found that users' view of deletion is either purely UI-based or backend-aware. In the email setting, half of the participants understood that shared copies of a deleted message may still exist in other locations, as opposed to only a few participants in the social media scenario with its complex data sharing mechanisms. Reasons for deletion were found to differ as well: while the main reason identified in both scenarios was data being outdated or no longer needed, limited storage and the removal of embarrassing content were only reported in the email and social media settings, respectively. Reasons for deletion being diverse is similar to our findings.
Our findings on reasons for message deletion are similar to the results of Almuhimedi et al. [18] who conducted a large-scale study on deleted tweets. We have found that regret and content being considered inappropriate were among the major reasons for users to delete instant messages. Several studies have already considered users' regret about their postings in Online Social Networks and examined reasons, consequences, and coping strategies [19][20][21], but without focusing on instant messaging.
Rost et al. [22] implemented a history-less mobile messaging app and explored how a lack of conversation history influences communication practices in mobile instant messaging. Their trial study found that if users only have access to the latest message received or sent, they perceive their communication as being more similar to a conversation in person and requiring more effort. Users reported that the lack of history made them feel more at ease with what they wrote, and they considered this feature useful to exchange and protect private information.
Apart from users' perceptions, questions about the security of the implementations of such features arise. Messenger security, e.g., with focus on their end-to-end encryption, has been well studied and shown to have flaws [23][24][25]. A broad overview of security features in instant messaging is provided by Unger et al. [26].
For several years, the privacy paradox has received remarkable attention: users' attitudes concerning their online privacy differ from how they actually behave in online contexts [27][28][29]. One explanation for this phenomenon might be that privacy is just considered a feature that can be traded in for other valuable goods or services [30][31][32].
Other findings suggest that users cannot review the entire consequences of their behavior because the systems they use do not adequately inform them. Abu-Salma et al. [33] examined the use of various security features in Telegram, along with a usability inspection, and revealed that a sparse presentation of multiple security alternatives could lead to confusion among users. In a cloud computing context, Ramokapane et al. [34] have found that users fail to delete contents because of poorly designed interfaces. Acquisti et al. [35] provide an overview of how users can be better assisted in their security choices.
Despite shortcomings of user interfaces, users were shown to adapt their use of online communication tools to the situation. Sleeper et al. [36] found that users select different messaging or communication channels depending on the purpose or target audience. Ruoti et al. [37] showed that users reflect on their online posture in light of never being perfectly safe on the Internet.
Reasons for content deletion or de-referencing have also been explored in the context of social networks [38,39]. In this scenario, users have only little control over the dissemination of their content once it has been released. Disassociation and hiding were identified as appropriate alternatives to deletion [40].

Conclusion
In this work, we studied users' preferences for the deletion functionality in instant messengers. We also investigated whether users could accurately determine from which conversation histories their messages were removed upon performing a deletion. We tested three different messengers (WhatsApp, Skype, Facebook Messenger) in a user study with 125 participants.
If deletion features were available, we saw participants use them in different ways, including editing messages. The majority of our participants preferred to be able to remove messages also from a recipient's device.
Deletion functionality in WhatsApp is different from the other two messengers in that users can explicitly select whether they want to delete a message from their local conversation history only or also from the recipients' logs. We found that the rate of correctly predicting the effects of deleting messages was about 30 percentage points higher in the WhatsApp condition than in the Skype condition. We suggest that developers of other instant messaging applications describe the effect of message deletion more explicitly, e.g., by providing a dialog for selection as in WhatsApp or by including a notification indicating where the message has been deleted from.

Q14
Why does this result match your expectations? Why not?
• Free text

7. Questionnaire Page 6

Q15
Do you think the delete function should be limited (e.g., only messages of the last hour, only the latest message, only unread messages could be deleted)?

Q16
How should the delete function be limited? Please specify.
• Free text