People, places, and time: a large-scale, longitudinal study of transformed avatars and environmental context in group interaction in the metaverse

As the metaverse expands, understanding how people use virtual reality to learn and connect is increasingly important. We used the Transformed Social Interaction paradigm (Bailenson, J. N., Beall, A. C., Loomis, J., Blascovich, J., & Turk, M. (2004). Transformed social interaction: Decoupling representation from behavior and form in collaborative virtual environments. Presence: Teleoperators and Virtual Environments, 13(4), 428–441) to examine different avatar identities and environments over time. In Study 1 (n = 81), entitativity, presence, enjoyment, and realism increased over 8 weeks. Avatars that resembled participants increased synchrony (similarity in moment-to-moment nonverbal behavior between participants). Moreover, compared to uniform avatars, self-avatars increased self-presence and realism but decreased enjoyment. In Study 2 (n = 137), participants cycled through 192 unique virtual environments. As visible space increased, so did nonverbal synchrony, perceived restorativeness, entitativity, pleasure, arousal, self- and spatial presence, enjoyment, and realism. Outdoor environments increased perceived restorativeness and enjoyment more than indoor environments did. Self-presence and realism increased over time in both studies. We discuss implications of avatar appearance and environmental context for social behavior in classroom contexts over time.

The metaverse, persistent immersive virtual worlds often viewed through virtual reality (VR) headsets, is receiving increasing attention in industry, media, and academia. What makes these virtual worlds unique is that people can easily transform their avatar's appearance or environmental context. With the touch of a button, a person can be anyone and anywhere. As the medium moves from gaming arcades and laboratories to consumers' homes and universities, it is becoming increasingly important from both a theoretical and societal well-being standpoint to understand how these Transformed Social Interactions (TSI, Bailenson et al., 2004) influence people and their relationships with others, especially over prolonged periods of time.
One way of understanding the metaverse is through the literature on collaborative virtual environments (CVEs). While researchers have been studying CVEs for decades (for a recent review, see Aseeri & Interrante, 2021), several important challenges have limited these investigations (for exceptions, see Khojasteh & Won, 2021; Moustafa & Steed, 2018; and Bailenson & Yee, 2006). First, due to the high expense and technical challenges of VR implementation, researchers are often forced to rely on small samples and either one-shot designs or a limited number of sessions (Lanier et al., 2019). This raises the question: As the novelty wears off over time, will people have a "better" or "worse" experience as they adapt to the medium? Second, while current metaverse platforms host groups of various sizes, the majority of research on CVEs features dyads or occasionally triads (for a review, see Han et al., 2022), whose turn-taking, gaze behavior, and other group dynamics differ markedly from those of larger groups.
The current research used two longitudinal field experiments to systematically examine multiple sets of larger groups and how social dynamics evolve over time in CVEs. From a statistical standpoint, we take a multivariate approach to observe how multiple constructs change over time and how they may be interrelated. From a theoretical standpoint, we extend the TSI paradigm to CVEs. Study 1 examines the visual appearance of avatars, a construct that has received much attention in the literature (for a review, see Ratan et al., 2020), by comparing avatars customized to look like the self with uniform avatars that mask visual identity cues. Study 2 examines environmental context by leveraging the ability to easily create VR scenes that differ in spaciousness and setting.

Transformed social interaction
In CVEs, people are represented by avatars, or visual representations of the self. CVEs track various verbal and nonverbal signals and map them onto these digital beings. CVEs can also systematically filter avatars' behavioral actions and physical appearance, amplifying or suppressing features and nonverbal signals in real time. As predicted by TSI, the behaviors and appearance of avatars in CVEs can have a drastic impact on people's perceptions, as well as on their persuasive and instructional abilities (for a review, see Bailenson et al., 2008b). Of the three categories outlined by TSI (self-representations, i.e., avatars; contextual situations, i.e., virtual environments; and sensory capabilities), the current study focuses on the first two.
While research on TSI has spanned two decades, gaps remain in our understanding of how these transformations affect people. For example, Szolin et al. (2022) systematically reviewed the literature on avatar transformation and behavioral and attitudinal change in the context of video games. The authors underscored that previous studies fail to distinguish between different types of virtual environments, such as commercially available video games and bespoke research-focused virtual environments. They concluded that this field is still at a relatively early stage of psychological investigation, and that further empirical investigation of avatar transformation across different contexts is needed to understand how these transformations affect people both during and after the virtual experience. In the same vein, a growing literature examines how other transformations change behavior, such as how body manipulations produce social, perceptual, and behavioral effects (for a review, see Gonzalez-Franco & Peck, 2018). However, most studies neither examine these transformations in the context of networked social interactions nor examine how the virtual environment itself may influence behaviors and attitudes.

Transforming self-representations
Previous research shows that even subtle transformations in avatars' behavioral or visual (e.g., photographic or anthropomorphic) resemblance can affect how people engage with and perceive others (Blascovich, 2002; Nowak & Biocca, 2003; Yee et al., 2011). For example, Roth et al. (2018) manipulated the type of gaze (natural, hybrid, synthesized, or random) exhibited by avatars in dyadic social interactions. Based on trends in perceived virtual rapport, interpersonal attraction, and trust, they reported that natural gaze was superior, and that synthesized and hybrid gaze were better than random gaze. Other research found that overriding avatars' actual head movements to mimic another person's motions, thereby increasing synchrony, led to greater liking between interaction partners (Bailenson & Yee, 2005). Finally, a longitudinal study by Bailenson and Yee (2006) implemented TSI conditions to examine the impact of visual and behavioral similarity on group cohesion and task performance. They found that in certain tasks, groups performed better when they saw their own face on their partners' avatars (i.e., shared visual similarity). They also reported that, even with TSI manipulations, entitativity increased over time. However, the small sample size of this study limited the generalizability of these findings.

Transforming the situation
In VR, each person can have a unique viewpoint and sensory information, as it is possible to vary the number of visible avatars, their spatial arrangement, or even the colors in a scene. TSI research has shown that these factors alter sensory perception, social interaction, and performance (Bailenson et al., 2008b). For example, Hasenbein et al. (2022) transformed the seating position of a student in a virtual classroom, manipulating how many rows of peer learners sat between the student and the teacher and screen. Using eye tracking, they found that different seating arrangements led to different patterns of gaze transitions among the virtual peer learners, the teacher, and the screen, as well as different gaze distributions. Such spatial transformations of students' seat positions have also been shown to influence memory (Bailenson et al., 2008a) and persuasion (McCall et al., 2009). Similarly, Miller et al. (2021) found that teams of designers had more positive interactions when working in a VR conference room than in a VR garage. Other factors, such as the size of a virtual room and the kinds of objects placed within it, have also been shown to influence outcomes such as attention and navigation.
Visible space: panoramic and constrained environments
One environmental factor of interest is the amount of space visible from a given viewpoint. Previous research has shown differences in outcomes between constrained and panoramic environments (i.e., environments in which people can see wide and far). For example, taller (i.e., more spacious) ceilings have been shown to prime feelings of freedom and encourage a more global, abstract way of processing information (Meyers-Levy & Zhu, 2007). Similarly, compared to confining environments, spacious environments were found to foster more self-disclosure (Okken et al., 2012). Finally, compared to smaller rooms, larger rooms were found to promote more engagement in informal learning among students (Wu et al., 2017).
Setting: outdoor and indoor environments
Natural settings have been shown to have beneficial effects (Bratman et al., 2015; Lederbogen et al., 2011). Living in environments with access to green space has been shown to lead to lower mental distress and greater well-being in the long term (White et al., 2013). Even brief, simulated exposure to nature, such as viewing slides or pictures of landscapes, has been shown to reduce stress (van den Berg et al., 2014), improve self-esteem (Barton & Pretty, 2010), promote physiological restoration (Ulrich et al., 1991), and increase the ability to focus (Berto, 2005).
VR is a medium uniquely suited for simulating natural settings. For instance, Anderson et al. (2017) showed that 360° videos of natural environments, compared to those of indoor environments, led to greater relaxation and reduced negative affect. The immersion offered by VR enhances the restorative potential of mediated natural environments for physiological well-being (for a review, see Browning et al., 2020). To our knowledge, the effect of virtual nature has not been studied in the presence of other avatars within CVEs.

Group behavior
When individuals are linked by social relationships that make up a group, they become interdependent and influence each other's behaviors, emotions, and perceptions (Janis, 1973; Milgram, 1963; Sherif, 1937). The moment a collection of individuals perceives itself as a group (a perception known as entitativity; Campbell, 1958), a series of psychological and interpersonal changes occurs (Forsyth, 1990; Harasty, 1996). Given the social nature of CVEs, meaningfully understanding the effects of virtual experiences in such systems requires investigating individuals not only on their own but also as members of a group.
Nonverbal dynamics also play a critical role in groups. During face-to-face communication, interactants tend to be in "synchrony," that is, in similar states or showing similar behaviors at similar times (Condon & Ogston, 1966). VR's ability to track motion allows for the examination of motion synchrony. Higher motion synchrony has been shown to be related to greater rapport between teachers and students (LaFrance, 1979) and more creativity (Won et al., 2014). Previously, researchers have manipulated synchrony with avatars displayed on screens (Oh Kruzic et al., 2020), and with agents (Tarr et al., 2018) and avatars (Sun et al., 2019; Miller et al., 2021) in VR.

Time
Human processes are complex and rarely, if ever, occur in isolation. It is critical to understand how human behavior and activity emerge from different components of a system that influence and change one another over time (see dynamic systems theory; Newman & Newman, 2020). Studying individuals who are repeatedly exposed to media stimuli and adapting to new technologies provides a unique opportunity to gain insight into both the long-term and short-term processes those media invoke (e.g., Bailenson & Yee, 2006; Brinberg et al., 2021). Inferences based on single-session exposures, or on just a few sessions during which participants are still adjusting to the novelty of a medium, can be confounded by technical difficulties and can be misleading. A number of researchers have theorized that, while first-time VR users may feel unfamiliar with the medium, their experience in VR should improve with use (e.g., Loomis, 1992). Other perspectives suggest the opposite, arguing that habituation can diminish effects that were initially driven by novelty (e.g., Lombard & Ditton, 1997).
TSI is a paradigm that should be particularly sensitive to repeated exposure over time, as people learn to identify, adapt to, and accommodate the changes. For instance, while seeing a uniform avatar on everyone in a room may be jarring at first, people may habituate with time. Alternatively, the effects of similarity may amplify. An early study by Bailenson and Yee (2006) followed three triads of participants for 15 sessions over a 10-week period as they collaborated for approximately 45 min per session. In addition to examining time, the researchers manipulated two types of TSI: behavioral and visual similarity of group members. Results demonstrated changes in task performance, subjective ratings, nonverbal behavior, and simulator sickness over time as participants became familiar with the system. Furthermore, even in the presence of TSI that created mismatches between types of behavior, entitativity remained high over time, suggesting that people are able to retain symbolic meaning even with the starkest degree of social cues. However, the small sample reduced statistical power, and further research is needed to examine how people evolve in response to these transformations of people and place.

Current study
The current work investigates how transformations of who you are and where you are evolve over time. Using TSI as its central framework, this work addresses two of its categories, self-representations and contextual situations, through two large-scale, longitudinal field experiments examining how VR-based transformations of avatar appearance (Study 1) and environmental context (Study 2) influence interactants in group settings. Given the critical role that self-representations play in how individuals perceive their experience and their communication partners in a virtual environment, we manipulated the visual appearance of the avatars such that members of the same group were represented either by avatars that resembled their physical selves or by avatars that were uniform across all members. Similarly, given that where you are, and the kind of environment that surrounds you, can lead to differing outcomes, we manipulated the type of virtual environment in which group members interacted. Across both studies, we measured behavioral and self-report variables central to understanding people's experiences in virtual environments, such as presence and realism. We additionally collected measures of group outcomes, such as synchrony and entitativity. Using linear growth models with time-invariant and time-varying covariates, we built two models to understand how these outcomes change across time and vary based on individual differences.
Both field experiments were embedded in a 10-week course about VR and its intersections with various disciplines. During each course, participants were provided with a VR headset, which they used to attend eight weekly instructor-led discussion sessions in medium-sized groups (Study 1: 9–14 members; Study 2: 5–8 members). Embedding each study in a course allowed for naturalistic manipulation of our variables of interest and unobtrusive measurement of behaviors. Nonverbal behavior was measured by recording 18 degrees of freedom of movement from each participant (e.g., pitch, yaw, and roll of the head and both hands) to compute motion synchrony for each group. Participants' attitudes toward their experience were measured using weekly surveys. We additionally explored how these outcomes may vary with individual differences.
Our key contributions to the field are as follows:
• Time plays a critical role in people's experience in social VR. Across both studies, self-presence and realism increased over time and with VR use. In Study 1, entitativity, social and spatial presence, enjoyment, and realism increased over time and with VR use.
• Who you are and what you look like matter in social VR. When people are represented by avatars that resemble their physical selves, they are more nonverbally "in sync" with other people and view the virtual environment and people as more realistic. On the other hand, having a uniform avatar leads to greater enjoyment.
• Where you are matters in social VR. When people are in more spacious virtual environments, they are more nonverbally "in sync" with other people and report greater restoration, entitativity, pleasure, arousal, presence, enjoyment, and realism than when they are in constrained environments. This finding is especially important and novel, as very large indoor spaces are difficult to study in the real world. Similarly, when people are in outdoor environments with elements of nature, they report greater restoration and enjoyment than when they are in indoor environments.

Study 1
Study 1 focuses on the transformation of avatar appearance and investigates the following research questions: first, how will sharing visual similarity with group members influence nonverbal synchrony and entitativity over time? Second, how will perceived self, social, and spatial presence change over time and with different avatars? Third, how will perceived enjoyment of interacting in a virtual environment change over time and with different avatars? Finally, how will perceived realism change over time and with different avatars?

Method
Participants
Participants were 101 university students enrolled in a 10-week course about VR. At the beginning of the course, students were invited to participate in an Institutional Review Board (IRB)-approved study of how repeated exposure to VR influenced their individual and group behavior. While all students in the course took part in all the VR activities, only data from those who consented to participate were included in the study. Of the 101 students in the course, 93 consented to participate. The 81 participants who attended five or more of the eight weekly sessions (male = 47, female = 30, other = 2, declined to or did not respond = 2) were between 18 and 58 years old (M = 22.26, SD = 5.19; 18–23: n = 68; 24–29: n = 7; 30–35: n = 3; 35–40: n = 1; 55–60: n = 1; declined to or did not respond: n = 1) and identified as Asian or Asian-American (n = 30), White (n = 21), African, African-American, or Black (n = 11), Hispanic or Latinx (n = 9), multiracial (n = 5), or Middle Eastern (n = 1), or declined to or did not respond (n = 4). Participants had varying levels of experience with VR, with 48 (59%) having never used VR before. Prior to the course, 38 participants were not familiar with anyone in their discussion group, and others reported knowing one member (n = 13) or more (two: n = 12; three: n = 1; four: n = 2; five: n = 2). Safeguards implemented to ensure privacy and consent included review by both the IRB and a second university ethics organization, and third-party oversight of the consent process and data collection.
Hardware and VR equipment
Participants were provided with Oculus Quest 2 headsets (standalone head-mounted display with 1832 × 1920 resolution per eye, 104.00° horizontal FOV, 98.00° vertical FOV, 90 Hz refresh rate, six-degree-of-freedom inside-out head and hand tracking, 503 g) and two hand controllers (126 g) for use in their personal environment. Of the 81 participants, 2 owned personal headsets (PC-based Valve Index) and participated using those devices.
Virtual environment: ENGAGE
Weekly sessions were hosted in ENGAGE, a collaborative social VR platform designed for education. Every week, the virtual environment consisted of a private (password-restricted) "Engineering Workshop" room, a large, open-space area that allowed participants to walk or teleport freely, create 3D drawings, write on personal whiteboards and sticky notes, add immersive effects and 3D objects, and display media content. The large space accommodated the use of 3D audio, which allowed participants to split off into smaller groups without audio overlap.
Avatar: self vs. uniform
In ENGAGE, participants are represented by human avatars (Figure 1). Participants embodied one of two possible representations, depending on avatar condition. In the self-avatar condition, participants customized their avatars with various combinations of outfits, gender, age, skin complexion, weight, hairstyles, and facial features. In the uniform avatar condition, all participants used the same pre-selected avatar, built from the customization options available within ENGAGE. Through pre-testing and iteration, we chose an avatar that was ambiguous in gender and race. Prior to the study, we conducted a survey showing screenshots of five avatars that varied in gender, skin tone, and facial features, and asked participants (n = 27) about each avatar's perceived gender presentation and racial category and whether they would feel comfortable being visually represented by the avatar in a virtual environment. We created the uniform avatar from the features that yielded the highest perceived gender and racial neutrality and the greatest comfort with representation. A detailed description of the items, results, and sample avatars from the pre-test can be found in Appendix A. The final uniform avatar had no hair (to avoid racial marking tendencies; MacLin & Malpass, 2001) and a neutral skin color (given the available options) intended to be racially ambiguous.1

Procedure
Participants selected a discussion group that fit their schedule, resulting in eight consistent groups that met weekly for 8 weeks and ranged in size from 9 to 14 members (M = 12.63, SD = 1.77). Two training sessions were held in the first 2 weeks of the course, during which participants were taught how to use the ENGAGE interface and navigate the virtual environment. During these training sessions, the teaching staff was available to assist in real time, both via video conferencing and within the virtual environment, when participants encountered technical mishaps. A Zoom call also remained open during all discussion sessions, where participants could take off their headsets and ask for technical support (Figure 2).

Transformed avatars and environments in VR
The structure of weekly activities varied with course content (Table 1). The sessions involved discussion with either the whole group or smaller groups of three to four, and typically followed a three-question format in which participants were asked what they liked, what they were concerned about, and how what they learned might influence the future.2 The discussions preserved nonverbal spatial cues such as interpersonal distance, head orientation, and spatialized sound (Figure 3, bottom right). Some sessions leveraged the physical activity affordances of ENGAGE, including working together on a shared object (Figure 3, top left) and creating new computer graphic content together (Figure 3, …).
Each week, groups were assigned to one of the two avatar conditions (self vs. uniform) via a Latin square randomization scheme that ensured that each group spent 4 weeks in each condition, counterbalanced across weeks and meeting days (Table 2). In each session, all members of a group wore either their self or uniform avatar to attend the discussion.
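The balance properties of this assignment can be checked mechanically. The sketch below is a minimal illustration, not the authors' randomization code: it encodes the group-by-week schedule from Table 2 as strings ("S" = self avatar, "U" = uniform avatar) and verifies that every group spends 4 of its 8 weeks in each condition and that every week has equally many self and uniform groups. The helper names and the four-pattern encoding are assumptions for illustration, and counterbalancing across meeting days is not modeled.

```python
# Hypothetical encoding of the Study 1 schedule: S = self avatar, U = uniform avatar.
# Groups 5-8 repeat the sequences of groups 1-4.
BASE_PATTERNS = ["SSUUSSUU", "SUSUSUSU", "USUSUSUS", "UUSSUUSS"]


def build_schedule(n_groups: int = 8) -> list[list[str]]:
    """Return one week-by-week condition sequence per group."""
    return [list(BASE_PATTERNS[g % len(BASE_PATTERNS)]) for g in range(n_groups)]


def is_balanced(schedule: list[list[str]]) -> bool:
    """True if every group and every week split evenly between the two conditions."""
    n_groups, n_weeks = len(schedule), len(schedule[0])
    groups_ok = all(row.count("S") == row.count("U") == n_weeks // 2 for row in schedule)
    weeks_ok = all(
        sum(row[w] == "S" for row in schedule) == n_groups // 2 for w in range(n_weeks)
    )
    return groups_ok and weeks_ok


if __name__ == "__main__":
    for g, row in enumerate(build_schedule(), start=1):
        print(f"Group {g}: {' '.join(row)}")
    print("balanced:", is_balanced(build_schedule()))
```

Checking both margins matters: a schedule in which every group gets 4 weeks of each condition could still confound avatar type with week (e.g., all groups using self-avatars in the first 4 weeks), which the per-week check rules out.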

Measures
Multiple aspects of individuals' behaviors and attitudes were measured at the start of the study (pre-test), and during and after each of the eight weekly sessions (motion and weekly surveys; see Table 3).

Weekly repeated measures
Individual ratings were obtained after each weekly VR session through analysis of behavioral motion time series or surveys. To reduce fatigue, repetitiveness, and burden, item sets for each construct were purposely designed to be brief (Conner & Lehman, 2012).

Nonverbal behavior: motion synchrony
Following prior work on synchrony in VR (Miller et al., 2021; Sun et al., 2019), motion synchrony was first calculated for each pair of individuals in a session. Specifically, we calculated the Spearman correlation between two individuals' avatar head speeds, sampled every one-thirtieth of a second (30 Hz) during the 30-min session, at every time offset within ±2.5 s (excluding an offset of 0). These 150 correlations were then averaged to obtain a synchrony measure for each pair of participants for each week (pre-registration at https://osf.io/3c4aj/). Synchrony for a given individual in a given week was then calculated as the average of the synchrony scores for all pairs from that session that included that individual. A detailed description of how motion synchrony was calculated can be found in Appendix B.
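The pairwise procedure above can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the pre-registered analysis code: it assumes each participant's head-speed series has already been computed and aligned to a common 30 Hz clock, uses average ranks for ties, and treats a positive offset as comparing participant A's movement with B's movement that many frames earlier. All function names are hypothetical. With the defaults, the offsets run from ±1 to ±75 frames (±2.5 s at 30 Hz), which yields the 150 correlations per pair described above.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data.

    Raises ZeroDivisionError if either window is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def offsets(hz=30, max_offset_s=2.5):
    """All nonzero frame offsets within +/- max_offset_s seconds."""
    max_lag = round(hz * max_offset_s)  # 75 frames at 30 Hz
    return [lag for lag in range(-max_lag, max_lag + 1) if lag != 0]


def pair_synchrony(speed_a, speed_b, hz=30, max_offset_s=2.5):
    """Mean Spearman correlation of two head-speed series across all nonzero offsets."""
    n = min(len(speed_a), len(speed_b))
    a, b = speed_a[:n], speed_b[:n]
    corrs = []
    for lag in offsets(hz, max_offset_s):
        if lag > 0:
            x, y = a[lag:], b[:-lag]  # A at frame t + lag vs. B at frame t
        else:
            x, y = a[:lag], b[-lag:]  # A at frame t vs. B at frame t + |lag|
        corrs.append(spearman(x, y))
    return mean(corrs)


def individual_synchrony(speeds, **kwargs):
    """Average each participant's pairwise synchrony with everyone else in the session.

    speeds: dict mapping participant ID to that participant's head-speed series.
    """
    pair_scores = {
        (p, q): pair_synchrony(speeds[p], speeds[q], **kwargs)
        for p, q in combinations(sorted(speeds), 2)
    }
    return {
        person: mean(score for pair, score in pair_scores.items() if person in pair)
        for person in speeds
    }
```

Excluding the zero offset means the score reflects leader-follower coupling within a short window rather than strictly simultaneous movement. For a 30-min session at 30 Hz, a vectorized implementation (e.g., ranking with a numerical library) would be far faster than this pure-Python loop.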

Entitativity
Entitativity was measured by seven items adapted from Rydell and McConnell (2005) using a 7-point Likert scale (1 = Strongly disagree, 7 = Strongly agree). Sample items include "My discussion group is important to its members" and "Members of my discussion group are affected by the behaviors of other members." Weekly entitativity scores were calculated as the mean of the seven items (Cronbach's α = 0.9), with higher scores indicating greater entitativity.
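Cronbach's α summarizes how consistently a set of items moves together. A minimal sketch of the textbook formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), follows; the function name is hypothetical, and the input layout (one row of item scores per completed survey) is an assumption, since how responses were pooled across weeks is not specified here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item scores per response.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of row totals)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Sample variances are used in both the numerator and the denominator; because they share the same n − 1 divisor, population variances would give the same α.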
Self, social, and spatial presence
Self, social, and spatial presence were measured by items adapted from prior work (Oh et al., 2019) using a 7-point Likert scale (1 = Strongly disagree to 7 = Strongly agree). Self-presence was measured as the level of agreement with two items: "I felt like my avatar's body was my own body," and "When something happened to my avatar, I felt like it was happening to me." Social presence was measured as the level of agreement with two items: "I felt like I was in the same room as my classmates," and "I felt like my classmates were aware of my presence." Spatial presence was measured as the level of agreement with two items: "I felt like I was really there inside the virtual environment" and "I felt as if I could reach out and touch the objects or people in the virtual environment." Weekly scores for each of the three types of presence were calculated as the mean of the two items, with higher scores indicating greater perceived presence. Internal consistencies across all participants and weeks, calculated using the Spearman-Brown formula as recommended for two-item measures (Eisinga et al., 2013), were 0.86 for self-presence, 0.80 for social presence, and 0.85 for spatial presence.

Table 2. Avatar condition (self vs. uniform) by discussion group and week
Group     Week 1   Week 2   Week 3   Week 4   Week 5   Week 6   Week 7   Week 8
Group 1   Self     Self     Uniform  Uniform  Self     Self     Uniform  Uniform
Group 2   Self     Uniform  Self     Uniform  Self     Uniform  Self     Uniform
Group 3   Uniform  Self     Uniform  Self     Uniform  Self     Uniform  Self
Group 4   Uniform  Uniform  Self     Self     Uniform  Uniform  Self     Self
Group 5   Self     Self     Uniform  Uniform  Self     Self     Uniform  Uniform
Group 6   Self     Uniform  Self     Uniform  Self     Uniform  Self     Uniform
Group 7   Uniform  Self     Uniform  Self     Uniform  Self     Uniform  Self
Group 8   Uniform  Uniform  Self     Self     Uniform  Uniform  Self     Self
Note. This design ensured that each group experienced each condition for 4 weeks and that each condition appeared equally often across the weekly schedule.
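For two-item scales, the Spearman-Brown coefficient reported above is the inter-item Pearson correlation r stepped up to the reliability of the two-item composite, ρ = 2r / (1 + r) (Eisinga et al., 2013). A minimal sketch, with hypothetical function names:

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown_two_items(item1, item2):
    """Spearman-Brown reliability of the mean of two items: 2r / (1 + r)."""
    r = pearson_r(item1, item2)
    return 2 * r / (1 + r)
```

For example, two items correlating at r = .5 give 2(.5) / 1.5 ≈ .67. The coefficient approaches 1 as r does, and it is always at least as large as r for positive r, which is why it is preferred over the raw inter-item correlation as a reliability estimate for the item mean.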
Enjoyment
Enjoyment was measured with two items, "How much did you like interacting in the virtual environment?" and "How much fun did you have in the virtual environment?", using a 5-point Likert scale (1 = Not at all, 5 = Extremely). Weekly enjoyment scores were calculated as the mean of the two items (Spearman-Brown coefficient = 0.91), with higher scores indicating greater enjoyment of the virtual environment.

Realism
Perceived photorealism of the virtual environment and people, which refers to the rendering quality of the image, was measured weekly by a single item adapted from Nowak et al. (2009) using a slider scale (0 = Cartoon-like, 100 = Photorealistic). This one-item scale focuses on a single dimension of realism, photorealism. Realism has multiple dimensions, each with distinct effects on people's perceptions of mediated environments and characters. In this study, the most critical dimension was not whether the avatar was a fantasy character or could exist offline, but the quality of its imagery. Because the original scale included items that may tap other dimensions, those items were excluded.

Individual differences measures
Individual differences measures were obtained during the pre-test and through analysis of motion data obtained throughout the entire study period.

Prior relationships
The number of discussion group members each individual was familiar with prior to the course (e.g., 0, 1, 2, 3 people) was measured at the start of the study to evaluate whether prior familiarity with any group members influenced how the dependent variables evolved over time (…).
Prior VR use
Individuals' prior experience with VR was measured at the start of the study. Individuals were asked whether they had ever used a VR headset before (1 = Yes, 0 = No) and, if so, how many times they had experienced VR (never: n = 41; once: n = 6; twice: n = 6; three times: n = 7; more than three times: n = 20; declined to or did not respond: n = 1).

Group identification
Individual ratings for group identification (a person's identification with a group they belong to, such as an organization, club, or sports team) were measured at the start of the study using eight items adapted from an in-group identification scale and an organizational identification scale (Leach et al., 2008; Mael & Ashforth, 1992). Sample items, each answered using a 7-point Likert scale (1 = Strongly disagree, 7 = Strongly agree), included "The fact that I am part of my group is an important part of my identity" and "When I talk about my group, I usually say 'we' rather than 'they.'" Individual group identification scores were calculated as the mean of the eight items (Cronbach's α = 0.89), with higher scores indicating greater identification with the group.
Other individual differences
Additional individual differences predictors were examined in our preliminary models, including gender, computer and online learning self-efficacy, loneliness, Zoom fatigue, and video game usage, but were trimmed from the reported models because none of these variables were related to baseline levels, rates of change, or avatar effects for any of the seven outcomes.

Data analysis
Individual differences in how behaviors and attitudes changed across the 8 weeks and in relation to avatar type (self vs. uniform), and how these effects were related to individual differences in prior relationships, prior VR experience, and group identification, were examined using linear growth models with time-invariant and time-varying covariates (Grimm et al., 2016). Because between-group variance was small, we used a two-level structure with repeated measures nested within individuals. Specifically, each of the seven weekly repeated-measures outcomes was modeled as

$$\text{outcome}_{ti} = \beta_{0i} + \beta_{1i}\,\text{week}_{ti} + \beta_{2i}\,\text{avatar}_{ti} + e_{ti}$$

where the outcome of interest for person i at occasion t, outcomeₜᵢ (e.g., social presence), is modeled as a function of a person-specific intercept, β₀ᵢ; a person-specific linear slope, β₁ᵢ, that indicates rate of change over time; a person-specific avatar effect, β₂ᵢ, that indicates the difference between avatar conditions; and residual error, eₜᵢ, assumed to be normally distributed with standard deviation σₑ. The person-specific intercepts, linear slopes, and avatar effects are simultaneously modeled as

$$\begin{aligned}
\beta_{0i} &= \gamma_{00} + \gamma_{01}\,\text{PriorRel}_i + \gamma_{02}\,\text{PriorVR}_i + \gamma_{03}\,\text{GroupID}_i + u_{0i} \\
\beta_{1i} &= \gamma_{10} + \gamma_{11}\,\text{PriorRel}_i + \gamma_{12}\,\text{PriorVR}_i + u_{1i} \\
\beta_{2i} &= \gamma_{20} + u_{2i}
\end{aligned}$$

where γ₀₀ and γ₁₀ describe the linear trajectory of change for the prototypical individual; γ₂₀ describes the prototypical effect of the uniform avatar manipulation; γ₀₁, γ₀₂, and γ₀₃ indicate how prior relationships (PriorRelᵢ), prior VR experience (PriorVRᵢ), and group identification (GroupIDᵢ), respectively, are related to individual differences in the initial level; γ₁₁ and γ₁₂ indicate how prior relationships and prior VR experience are related to individual differences in rate of change; and u₀ᵢ, u₁ᵢ, and u₂ᵢ are residual unexplained differences assumed to be multivariate normally distributed with standard deviations σᵤ₀, σᵤ₁, and σᵤ₂ and correlations ρᵤ₀ᵤ₁, ρᵤ₀ᵤ₂, and ρᵤ₁ᵤ₂.
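To make the two-level structure concrete, the following sketch generates one person's weekly data from the growth model described above. It illustrates the data-generating process only; it is not the fitting procedure (the models were fit in R with lme4). Several assumptions are made for simplicity: weeks are numbered 0, 1, 2, ... in session order, avatar is coded 1 = uniform and 0 = self, and the person-level random effects are drawn independently even though the model allows them to correlate. All names are hypothetical. Setting every standard deviation to zero returns the fixed-effects trajectory for the given covariate values.

```python
import random
from dataclasses import dataclass


@dataclass
class FixedEffects:
    g00: float  # intercept for the prototypical person
    g01: float  # prior relationships -> intercept
    g02: float  # prior VR experience -> intercept
    g03: float  # group identification -> intercept
    g10: float  # linear change per week
    g11: float  # prior relationships -> rate of change
    g12: float  # prior VR experience -> rate of change
    g20: float  # uniform (1) vs. self (0) avatar effect


def simulate_person(pid, prior_rel, prior_vr, group_id, avatars, fx,
                    sd_u=(0.0, 0.0, 0.0), sd_e=0.0, rng=None):
    """Generate one person's weekly outcomes from the two-level growth model.

    avatars: per-week condition codes (1 = uniform, 0 = self), in session order.
    """
    rng = rng or random.Random()
    # Level 2: person-specific coefficients (random effects drawn independently here)
    b0 = (fx.g00 + fx.g01 * prior_rel + fx.g02 * prior_vr + fx.g03 * group_id
          + rng.gauss(0, sd_u[0]))
    b1 = fx.g10 + fx.g11 * prior_rel + fx.g12 * prior_vr + rng.gauss(0, sd_u[1])
    b2 = fx.g20 + rng.gauss(0, sd_u[2])
    # Level 1: weekly observations with normally distributed residual error
    return [
        {"id": pid, "week": week, "avatar": avatar,
         "outcome": b0 + b1 * week + b2 * avatar + rng.gauss(0, sd_e)}
        for week, avatar in enumerate(avatars)
    ]


if __name__ == "__main__":
    fx = FixedEffects(1.0, 0.1, 0.2, 0.3, 0.5, 0.05, -0.1, -0.4)
    rows = simulate_person("p01", 2, 1, 5.0, [0, 1, 0, 1, 1, 0, 1, 0], fx,
                           sd_u=(0.5, 0.1, 0.2), sd_e=1.0, rng=random.Random(1))
    for row in rows:
        print(row)
```

Simulating from a model in this way is also a common check on whether a given design (here, 8 weekly occasions per person) can recover the random-effect variances, which relates to the cases noted below in which u₂ᵢ could not be estimated.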
All models were fit to the data in R using lme4 (Bates et al., 2015) with restricted maximum likelihood estimation, incomplete data treated as missing at random, and statistical significance evaluated at α = .05. Preliminary models allowed for moderation of the avatar effect, but the week × avatar interaction was not significant in any of the seven models and so was removed. In a few cases where the data did not support estimation of all random effects, the u₂ᵢ term was removed. After the main models were run, a variety of follow-up models were used to check the sensitivity and robustness of results. These included an examination of the random effects structure through expansion of the residual error terms so that they could be time-specific (i.e., removing the homogeneity of error assumption) and sensitivity to potential outlier observations. In all cases, the pattern of results remained intact; thus, results from the more parsimonious models are reported.
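Combining the level-1 and level-2 equations gives the model-implied (prototypical) trajectories plotted in Figure 5. The sketch below is a hypothetical illustration, not the authors' lme4 code: it assumes week is coded 0 to 7 (so the intercept is the first-session level), avatar is coded uniform = 1 and self = 0, all random effects u are set to zero, and the moderator is expressed in standard-deviation units so that ±1 corresponds to the ±1 SD lines. The example values are the Study 1 enjoyment estimates for prior VR experience (γ₀₀ = 3.057, γ₁₀ = 0.061, γ₂₀ = 0.16, γ₀₂ = 0.24, γ₁₂ = −0.044); whether that predictor was standardized before fitting is not stated, so the ±1 reading is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GrowthFixedEffects:
    """Fixed effects of the Study 1 growth model (random effects set to zero)."""
    g00: float                 # prototypical level at week 0
    g10: float                 # prototypical change per week
    g20: float                 # uniform (1) vs. self (0) avatar effect
    g_mod_level: float = 0.0   # moderator effect on initial level (e.g., gamma_02)
    g_mod_slope: float = 0.0   # moderator effect on weekly change (e.g., gamma_12)


def implied_outcome(fx, week, avatar, moderator=0.0):
    """Model-implied outcome for a given week, avatar code, and moderator score."""
    b0 = fx.g00 + fx.g_mod_level * moderator   # person-specific intercept
    b1 = fx.g10 + fx.g_mod_slope * moderator   # person-specific weekly slope
    b2 = fx.g20                                # avatar effect (no moderators)
    return b0 + b1 * week + b2 * avatar


def alternating_trajectory(fx, weeks=8, start_uniform=True, moderator=0.0):
    """Trajectory for someone whose avatar condition alternates every week."""
    first = 1 if start_uniform else 0
    return [
        implied_outcome(fx, w, first if w % 2 == 0 else 1 - first, moderator)
        for w in range(weeks)
    ]


enjoyment = GrowthFixedEffects(g00=3.057, g10=0.061, g20=0.16,
                               g_mod_level=0.24, g_mod_slope=-0.044)
high_vr = alternating_trajectory(enjoyment, moderator=1.0)   # +1 SD line
low_vr = alternating_trajectory(enjoyment, moderator=-1.0)   # -1 SD line
```

The alternating avatar code produces the weekly oscillation seen in the black lines, and the two moderator values reproduce the solid/dashed pattern: the +1 SD trajectory starts higher but rises more slowly.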

Results
Results from growth models with time-varying predictors [week and avatar (uniform = 1 vs. self = 0)] and time-invariant predictors (prior relationships, prior VR experience, and group identification) are presented separately for all seven outcomes (synchrony, entitativity, self-, social, and spatial presence, enjoyment, and realism). Plots of the raw data overlaid with relevant prototypical trajectories are given in Figure 5.

Synchrony
The prototypical participant's synchrony decreased from an initial value of γ₀₀ = 0.0381, p = .006 (on the −1 to 1 correlation scale), at a rate of γ₁₀ = −0.0034, p < .001, per week. There was a significant effect of the avatar manipulation on synchrony, such that participants synchronized less in sessions with uniform avatars than in sessions with self-avatars, γ₂₀ = −0.0122, p < .001. There was no evidence that individual differences in prior relationships, prior VR experience, or group identification were uniquely related to baseline levels of synchrony (ps > .21) or rates of change in synchrony (ps > .14). Figure 4 shows how synchrony varies with time offset.

Entitativity
The prototypical participant's entitativity increased from an initial value of γ₀₀ = 5.010, p < .001 (on a 7-point scale), at a rate of γ₁₀ = 0.059, p = .002, points per week. There was no evidence that the avatar manipulation influenced entitativity, γ₂₀ = −0.022, p = .62. A prototypical trajectory showing how entitativity changed over time is in Panel A of Figure 5. Individuals with more prior relationships had higher baseline levels of entitativity, γ₀₁ = 0.22, p = .04, as evident in the contrast between the blue solid (+1 SD on prior relationships) and dashed (−1 SD on prior relationships) lines in Panel A of Figure 5. There was no evidence that individual differences in group identification or prior VR experience were uniquely related to baseline levels of entitativity (ps > .07), or that individual differences in prior relationships or prior VR experience were uniquely related to the rate of increase in entitativity (ps > .59).

Presence
Self-presence
The prototypical participant's self-presence increased from an initial value of γ₀₀ = 3.75, p < .001 (on a 7-point scale), at a rate of γ₁₀ = 0.101, p = .004, points per week. There was a significant effect of the avatar manipulation, such that individuals reported lower self-presence when using uniform avatars than self-avatars, γ₂₀ = −0.21, p = .021.

Social presence
The prototypical participant's social presence increased from an initial value of γ₀₀ = 5.23, p < .001 (on a 7-point scale), at a rate of γ₁₀ = 0.068, p = .014, points per week. There was no evidence that the avatar manipulation influenced social presence, γ₂₀ = 0.055, p = .40. Individuals with higher group identification had higher baseline levels of social presence, γ₀₃ = 0.25, p = .021, as evident in the contrast between the green solid (+1 SD on group identification) and dashed (−1 SD on group identification) lines in Panel C of Figure 5. There was no evidence that individual differences in prior relationships or prior VR experience were uniquely related to baseline levels of social presence (ps > .30).

Spatial presence
The prototypical participant's spatial presence increased from an initial value of γ₀₀ = 4.33, p < .001 (on a 7-point scale), at a rate of γ₁₀ = 0.083, p = .014, points per week. There was no evidence that the avatar manipulation influenced spatial presence, γ₂₀ = 0.069, p = .38. There was no evidence that individual differences in group identification, prior relationships, or prior VR experience were uniquely related to baseline levels of self (ps > .35) or spatial (ps > .106) presence, or to rates of increase in self (ps > .63), social (ps > .49), or spatial (ps > .43) presence.
Figure 4. Effect of avatar on synchrony. This plot demonstrates that as the time offset of motion signals shifts away from zero (i.e., as one looks toward the right and left away from the center), synchrony (Y-axis) decreases. In this plot, synchrony for each group in each session is traced as a separate partially transparent line (60 total). The average of all sessions for a given avatar condition is the darker line, with the ribbon indicating 95% confidence intervals based on the underlying distribution. Each line is produced as the average of all unordered pairs in that session (from 6 to 78, M = 36.8, SD = 17.9), which is itself calculated from about 30 min of data per participant.

Enjoyment
The prototypical participant's enjoyment increased from an initial value of γ₀₀ = 3.057, p < .001 (on a 5-point scale), at a rate of γ₁₀ = 0.061, p = .002, points per week. There was a significant effect of the avatar manipulation, such that individuals reported greater enjoyment during weeks when using uniform avatars than self-avatars, γ₂₀ = 0.16, p = .011. Prototypical trajectories showing how enjoyment changed over time for individuals who alternated weekly between the two avatar conditions are shown as bold black lines in Panel E of Figure 5. Individuals with more prior relationships had higher baseline levels of enjoyment, γ₀₁ = 0.22, p = .023, and individuals with more prior VR experience also had higher baseline levels of enjoyment, γ₀₂ = 0.24, p = .035, as evident in the contrast between the colored lines (blue, prior relationships; yellow, prior VR experience; +1 SD solid, −1 SD dashed) in Panel E of Figure 5. There was no evidence that individual differences in group identification were uniquely related to baseline levels of enjoyment (p = .37). Although there was no evidence that individual differences in prior relationships were uniquely related to the rate of increase in enjoyment (p = .96), the enjoyment of individuals with more prior VR experience increased less than that of individuals with less prior VR experience, γ₁₂ = −0.044, p = .0079, as seen in the differential rates of increase of the yellow solid and dashed lines.

Realism
The prototypical participant's perception of realism increased from an initial value of γ₀₀ = 35.62, p < .001 (on a 0–100, cartoon-like to photorealistic scale), at a rate of γ₁₀ = 0.88, p = .057, points per week. There was a significant effect of the avatar manipulation, such that individuals reported lower realism (i.e., more "cartoon-like") when using uniform avatars than self-avatars, γ₂₀ = −2.028, p = .035. Prototypical trajectories showing how realism changed over time for individuals who alternated weekly between the two avatar conditions are shown as bold black lines in Panel F of Figure 5. Individuals with more prior relationships had higher baseline levels of realism, γ₀₁ = 5.89, p = .0106, as evident in the contrast between the blue solid (+1 SD on prior relationships) and dashed (−1 SD on prior relationships) lines in Panel F of Figure 5. There was no evidence that individual differences in group identification or prior VR experience were uniquely related to baseline levels of realism (ps > .41), or that prior VR experience was related to the rate of change in realism (p = .108). The realism of individuals with more prior relationships increased less than that of individuals with fewer prior relationships, γ₁₁ = −0.704, p = .0405, as evident in the contrast between the slopes of the blue solid and dashed lines in Panel F of Figure 5.

Discussion
Study 1 examined the role of time and transformed visual appearance on participants' experience and group dynamics. Every week for 8 weeks, 81 participants, separated into eight groups, met for approximately 30 min in a CVE to discuss the course material. Overall, almost all measures, including entitativity, presence (self, social, and spatial), enjoyment, and realism, increased over time; the remaining measure, synchrony, decreased. These effects underscore the critical role that time plays in how people's experience in VR evolves. Given this, it is possible that once participants adapt to the medium and are no longer uncomfortable with the novelty of the technology, they can reap the advantages that VR and CVEs provide and feel more presence and connectedness.

Figure 5. Individual trajectories (raw data) are indicated by the light gray lines. Model-implied prototypical trajectories are indicated by the thick black lines and are shown for two hypothetical cases where the avatar conditions alternated weekly (and thus produce oscillations). When individual differences were related to baseline or rate of change, additional model-implied trajectories for individuals 1 SD above (solid color) and 1 SD below (dashed color) the average score are indicated by thick colored lines.
The investigation of synchrony demonstrated that motion synchrony occurs even when mediated in VR, consistent with previous research. It also indicated that synchrony both decreased over time and was lower in the uniform avatar condition (i.e., visually similar to one another) compared to the self-avatar condition. This may mean that synchrony serves a balancing function, where synchrony acts as a tool to increase entitativity when needed (Dale et al., 2020). Indeed, it is possible that transforming nonverbal behavior to induce synchrony can improve entitativity (Bailenson & Yee, 2005). However, future work should examine these possibilities.
Furthermore, when participants used the uniform avatar, they had lower motion synchrony, reported lower self-presence, and perceived the virtual environment and others as more cartoon-like (less photorealistic), but reported greater enjoyment interacting in the virtual environment. In addition, while entitativity increased over time, visual uniformity did not have an effect on entitativity. Similarly, while those who had prior relationships with group members started with a higher level of entitativity, there was no evidence that this individual difference was uniquely related to the increase in entitativity.
Returning to the TSI paradigm, having limited cues about others' offline bodies in a virtual environment may make differences among group members more salient and interfere with the group identification process, though this pattern did not hold over time. It is possible that sharing identical visual features with everyone in the group creates a more recreational environment (i.e., one perceived as less photorealistic), which may place less stress on how an individual is presented in front of others and less emphasis on individual behavior, ultimately leading to a lower sense of self-presence and greater enjoyment. If all members of the group look the same, the pressure of individuality and of being present in the environment may be distributed across group members. The visual cue that every member of the group shares identical features may lower an individual's sense of ownership of their self and embodiment, affecting their sense of self as an individual more than their identification with the group.
Visual uniformity or similarity is often taken into consideration when trying to create a stronger sense of group identity (Kim, 2009). However, how this transformation influences social interactions and behavior in virtual environments over time and with use has remained an open question. It could be argued that, in certain contexts, visual appearance can serve specific purposes in shaping social interactions. Given that avatar appearance did not affect entitativity and that uniform avatars lowered motion synchrony, it may not make sense to use visual uniformity as a unifier for group identification. Conversely, avatar appearance did affect variables such as self-presence; given the role of self-presence in immersion and, in turn, in attention to and connection with the environment, it may be unfavorable to use uniform avatars in a group setting and suppress individuals' visual cues. At the same time, if the goal of a social interaction is enjoyment, uniform avatars may allow people to focus less on their individual role in the group and more on enjoying the task at hand. However, further research is needed to better understand how such transformations affect enjoyment. Given that we designed our own measure of enjoyment, a more nuanced understanding may be required to draw definitive conclusions about how it interacts with shared visual cues.
Lastly, our findings highlight the importance of considering individual differences in how people's experience in VR changes over time: individual differences accounted for different initial baselines and, to varying degrees, for differences in how people's experiences evolved.

Study 2
Complementary to Study 1's focus on the transformation of avatar appearance, Study 2 focuses on the transformation of environmental context. Based on the preliminary findings of Study 1, we generated and pre-registered hypotheses related to time and the virtual environment for Study 2 (pre-registration at https://osf.io/s37xc). As Table 4 lays out, given the beneficial effects of spacious, panoramic environments and of outdoor, natural environments, we hypothesized that participants would be able to interact with one another more freely in panoramic environments than in constrained environments. We anticipated that this increase in interaction and engagement would foster a greater sense of entitativity and enjoyment. Similarly, outdoor, natural environments have been shown to have restorative properties, which should improve perceived restorativeness for these environments.
In this study, participants at each weekly session were exposed to one of four possible types of virtual environments (2 spaciousness × 2 setting conditions): a panoramic outdoor environment, a panoramic indoor environment, a constrained outdoor environment, and a constrained indoor environment. Along with the dependent variables examined in Study 1, Study 2 examines the influence of time and virtual environment on additional variables such as perceived restorativeness and affect (pleasure and arousal).

Method
Participants
Participants were 171 university students enrolled in a 10-week course about VR. At the beginning of the course, students were invited to participate in an IRB-approved study of how repeated exposure to VR influenced their individual and group behavior. While all students in the course took part in all the VR activities, only the data of those who consented to participate in the study were included. Of the 171 students in the course, 158 consented to participate in the study. The 137 participants who attended five or more of the eight weekly sessions (male = 78, female = 59) were between 18 and 49 years old (M = 20.9, SD = 2.78; n₁₈₋₂₀ = 62, n₂₁₋₂₃ = 71, n₂₄₋₄₉ = 5) and identified as Asian or Asian-American (n = 47), White (n = 41), multiracial (n = 19), African, African-American, or Black (n = 12), Hispanic or LatinX (n = 8), Native Hawaiian or other Pacific Islander (n = 5), Indigenous/Native American, Alaska Native, First Nations (n = 2), declined to or did not respond (n = 2), Middle Eastern (n = 1), and a racial group not listed (n = 1). Participants had varying levels of experience with VR (n₀ = 50, n₁ = 29, n₂ = 23, n₃₋₁₀ = 26, n₂₀₋₅₀ = 4, n₉₀ = 2, n₁₀₀ = 4). Prior to the course, 86 participants were not familiar with anyone in their discussion group, and others reported knowing one (n₁ = 40) or more members (n₂ = 13, n₃ = 5, n₄ = 6, n₅ = 1, n₇ = 1).

Virtual environments
As in Study 1, weekly discussion sessions were hosted in ENGAGE. There were four types of virtual environments (2 spaciousness × 2 setting): (a) panoramic outdoors, (b) panoramic indoors, (c) constrained outdoors, or (d) constrained indoors (Figure 6). Each environment was built by research personnel using 3D objects. In total, there were 192 uniquely built environments that differed in size of moving area and height. As Reeves et al. (2015) suggest, as variance in media grows, so should variance in media research. As the authors argue, any media chosen as a stimulus can have a list of features that may be psychologically relevant and interact with the primary factors in an experiment. Selecting one idealized representative stimulus from each end of the distribution can increase Type I, II, and III errors. Through stimulus sampling and statistical methods (e.g., mixed statistical models that include both fixed and random effects), we are able to better understand media that may be found in real-world experiences (Judd et al., 2012; Westfall et al., 2014).
To evaluate whether these manipulations work across a range of environments and to examine generalization of results across stimuli, we created 192 unique environments that were rigorously controlled in terms of our theoretical variables related to context but also contained diverse thematic features, rather than relying on a single stimulus per manipulation. The moving area of each environment was measured by adding markers to its corners and ceiling inside ENGAGE and then calculating the area from the markers' positional data. By design, the panoramic environments (

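The area calculation itself is not spelled out. One plausible way to turn corner-marker positions into a floor area is the shoelace formula for a simple polygon; the sketch below assumes the floor markers are listed in order around the outline, that y is the vertical axis (so area lies in the x–z plane), and that height is the difference between mean ceiling and mean floor marker heights. The function names and coordinate convention are hypothetical and are not part of ENGAGE's API.

```python
def floor_area(floor_markers):
    """Area of the floor polygon traced by corner markers (shoelace formula).

    floor_markers: (x, y, z) positions listed in order around the outline,
    clockwise or counter-clockwise; y is assumed to be the vertical axis.
    """
    pts = [(x, z) for x, _y, z in floor_markers]
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping the last back to the first.
    for (x1, z1), (x2, z2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * z2 - x2 * z1
    return abs(twice_area) / 2.0


def room_height(floor_markers, ceiling_markers):
    """Vertical distance between mean floor and mean ceiling marker heights."""
    floor_y = sum(p[1] for p in floor_markers) / len(floor_markers)
    ceiling_y = sum(p[1] for p in ceiling_markers) / len(ceiling_markers)
    return ceiling_y - floor_y
```

Because the shoelace formula handles any non-self-intersecting outline, it would also cover L-shaped or irregular rooms, not just rectangles.
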
Avatar
All participants were asked to use the customization tool to make an avatar that looked and felt like their offline selves.

Procedure
Participants selected a discussion group that fit their schedule and availability, resulting in 24 groups that met weekly for 8 weeks and varied in size from five to eight members (M = 6.71, SD = 0.81). The sizes of actually attended groups ranged from 2 to 11 members (Week 1 M = 6.38, SD = 1.47; Week 2 M = 6.25, SD = 1.48; Week 3 M = 6.08, SD = 1.18; Week 4 M = 6.29, SD = 1.23; Week 5 M = 6.38, SD = 1.35; Week 6 M = 6.25, SD = 1.33; Week 7 M = 6.00, SD = 1.50; Week 8 M = 5.75, SD = 2.01). Each week, each group was assigned to one of four conditions (2 × 2 design) via a Latin square randomization scheme that ensured each group experienced each condition once and that each condition appeared equally across the weekly schedule (Table 5). The sessions were led by one of three instructors. Each instructor led the same eight groups every week.
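Table 5 gives the actual schedule, which is not reproduced here. As a hypothetical sketch of how a Latin square balances conditions, the code below builds a cyclic 4 × 4 square over the four environment types, randomizes its row and column order (which preserves the Latin property), and assigns group g to row g mod 4 and session s to column s mod 4. With 24 groups, every condition then appears for exactly six groups in each session, and each group sees all four conditions in any four consecutive sessions. The function names, seeding, and modular mapping are assumptions, not the authors' procedure.

```python
import random

CONDITIONS = [
    ("panoramic", "outdoor"),
    ("panoramic", "indoor"),
    ("constrained", "outdoor"),
    ("constrained", "indoor"),
]


def cyclic_latin_square(n):
    """Cell (r, c) holds (r + c) % n, so each row and column has every index once."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def assign_conditions(num_groups=24, num_sessions=8, seed=None):
    """Map each group index to its (spaciousness, setting) condition per session."""
    rng = random.Random(seed)
    n = len(CONDITIONS)
    rows = cyclic_latin_square(n)
    rng.shuffle(rows)                 # permuting rows keeps it a Latin square
    col_order = list(range(n))
    rng.shuffle(col_order)            # so does permuting columns
    square = [[row[c] for c in col_order] for row in rows]
    return {
        g: [CONDITIONS[square[g % n][s % n]] for s in range(num_sessions)]
        for g in range(num_groups)
    }
```

Randomizing rows and columns keeps the balance guarantees while avoiding a fixed, predictable condition order across groups.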
A training session was held in the first week of the course, during which participants were guided through how to use the ENGAGE interface and the controllers to navigate the virtual environment. As in Study 1, during these training sessions, the teaching staff was available to assist via Zoom when participants faced technical mishaps in hardware and software.
The first discussion session began in the second week, during which participants completed a series of small-group activities to further familiarize them with the ENGAGE environment and its tools. All discussions, except in the fifth session, had a creative activity, which involved creating, brainstorming, or prototyping an idea using the tools available on ENGAGE (e.g., drawing with the 3D pen, bringing in 3D models, writing on whiteboards). The 30-min sessions were divided into a 10-min full-group discussion and recap of the course material, a 15-min individual creative activity based on a prompt, and a 5-min sharing of the final product of the activity portion (Table 6).

Measures
As in Study 1, multiple aspects of individuals' behaviors and attitudes were measured at the start of the study (pre-test), and during and after each of the eight weekly sessions (see Table 7).

Weekly repeated measures
Nonverbal behavior: motion synchrony
As in Study 1, synchrony was computed for each participant for each week as the rank correlation of head speed over the entire (approximately 30 min) session.
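The synchrony measure involves a few concrete steps: convert head positions to speeds, rank-correlate pairs of speed series, optionally shift one series by a time offset (the x-axis of the synchrony-by-offset figures), and average over unordered pairs. The sketch below implements Spearman correlation as Pearson correlation on average ranks in plain Python. The sampling interval, the convention that a positive lag compares a[t] with b[t + lag], and averaging a participant's correlations with every other group member to obtain a per-participant score are assumptions, not details given in the text.

```python
from itertools import combinations
from math import dist, sqrt


def head_speed(positions, dt):
    """Speed between consecutive (x, y, z) head samples taken dt seconds apart."""
    return [dist(p, q) / dt for p, q in zip(positions, positions[1:])]


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(va * vb)


def spearman(a, b):
    """Rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(a), ranks(b))


def lagged_spearman(a, b, lag):
    """Correlate a[t] with b[t + lag] for equal-length series; lag may be negative."""
    if lag >= 0:
        a, b = a[:len(a) - lag], b[lag:]
    else:
        a, b = a[-lag:], b[:len(b) + lag]
    return spearman(a, b)


def session_synchrony(speeds, lag=0):
    """Mean lagged rank correlation over all unordered pairs in one session.

    speeds maps participant id -> equal-length head-speed series.
    """
    pairs = list(combinations(sorted(speeds), 2))
    return sum(lagged_spearman(speeds[i], speeds[j], lag) for i, j in pairs) / len(pairs)


def participant_synchrony(speeds, pid, lag=0):
    """Mean correlation between one participant and every other group member."""
    others = [q for q in speeds if q != pid]
    return sum(lagged_spearman(speeds[pid], speeds[q], lag) for q in others) / len(others)
```

Evaluating session_synchrony over a range of lags would trace a curve like the ones in the synchrony figures, peaking near zero offset when movements are coordinated in time.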

Perceived restorativeness
Perceived restorativeness, the restorative quality and potential of environments, was measured with four items adapted from the Perceived Restorativeness Scale (Hartig et al., 1996), rated on a 5-point Likert scale (1 = Not at all to 5 = Extremely). Sample items include "Spending time here gave me a good break from my day-to-day routine" and "There is too much going on in this environment." Weekly perceived restorativeness scores were calculated as the mean of the four item responses (Cronbach's α = 0.71), with higher scores indicating greater perceived restorativeness of the environment.

Figure 6. Environment types used every session. There were four possible types of virtual environments (2 spaciousness × 2 setting): (1) panoramic outdoors, (2) panoramic indoors, (3) constrained outdoors, or (4) constrained indoors.

Pleasure and arousal
Individual ratings for perceived pleasure and arousal were obtained after each weekly VR session using the Self-Assessment Manikin (Bradley & Lang, 1994), a non-verbal pictorial scale, accompanied by a pair of adjectives associated with the pleasure and arousal dimensions (Pleasure: 1 = Bored to 9 = Relaxed; Arousal: 1 = Calm to 9 = Excited).
Self, social, and spatial presence
Items were adapted from Study 1 to include an additional item and to use a 5-point Likert scale (1 = Not at all to 5 = Extremely). Self, social, and spatial presence were each measured as the level of agreement with three items (Cronbach's α = 0.84 for self-presence, 0.79 for social presence, and 0.82 for spatial presence). Weekly scores for each of the three types of presence were calculated as the mean of the three items, with higher scores indicating greater perceived presence.
Individual differences measures
Additional individual differences predictors considered during model building, such as environmental identification and prior VR use, were trimmed from the reporting because none of these variables were related to baseline levels, rates of change, or environment conditions for any of the 10 outcomes.

Data analysis
Individual differences in how individuals' behaviors and attitudes changed across the 8 weeks and in relation to the spaciousness and setting conditions, and how these effects were related to gender, were examined using linear growth models with time-invariant and time-varying covariates (Grimm et al., 2016). Specifically, each of the 10 repeated measures outcomes was modeled as
$$\text{outcome}_{ti} = \beta_{0i} + \beta_{1i}\,\text{week}_{ti} + \beta_{2i}\,\text{spaciousness}_{ti} + \beta_{3i}\,\text{setting}_{ti} + \beta_{4i}\,(\text{spaciousness}_{ti} \times \text{setting}_{ti}) + e_{ti}$$
where the outcome of interest for person i at occasion t, outcomeₜᵢ, is modeled as a function of a person-specific intercept, β₀ᵢ; a person-specific linear slope, β₁ᵢ, that indicates rate of change across weeks; a person-specific spaciousness effect, β₂ᵢ, that indicates the difference between panoramic and constrained conditions; a person-specific setting effect, β₃ᵢ, that indicates the difference between outdoor and indoor conditions; an interaction term, β₄ᵢ, that indicates the extent of moderation between the spaciousness and setting manipulations; and residual error, eₜᵢ, that is assumed normally distributed with standard deviation σₑ. The person-specific intercepts, linear slopes, and spaciousness and setting effects are simultaneously modeled as
$$\begin{aligned}
\beta_{0i} &= \gamma_{00} + \gamma_{01}\,\text{gender}_i + u_{0i}\\
\beta_{1i} &= \gamma_{10} + \gamma_{11}\,\text{gender}_i\\
\beta_{2i} &= \gamma_{20} + \gamma_{21}\,\text{gender}_i\\
\beta_{3i} &= \gamma_{30} + \gamma_{31}\,\text{gender}_i\\
\beta_{4i} &= \gamma_{40}
\end{aligned}$$
where γ₀₀ and γ₁₀ describe the linear trajectory of change for the prototypical individual; γ₂₀ describes the prototypical effect of the spaciousness manipulation; γ₃₀ describes the prototypical effect of the setting manipulation; γ₄₀ describes the prototypical spaciousness × setting interaction effect; γ₀₁, γ₁₁, γ₂₁, and γ₃₁ indicate how individual differences in level, change, and the manipulations are related to gender; and u₀ᵢ represents residual unexplained differences that are assumed normally distributed with standard deviation σᵤ₀. As in Study 1, all models were fit to the data in R using the lme4 and lmerTest libraries with restricted maximum likelihood estimation, incomplete data treated as missing at random, and statistical significance evaluated at α = .05.

Synchrony
The prototypical participant's motion synchrony was positive, γ₀₀ = 0.015, p < .001, confirming H12. Motion synchrony increased slightly, but not significantly, at a rate of γ₁₀ = 0.00026, p = .559, per week over the 8 weeks of the study. There was a significant effect of the spaciousness manipulation, such that individuals had higher synchrony in panoramic environments than in constrained environments, γ₂₀ = 0.010005, p = .0004 (H4). There was no evidence that the setting manipulation influenced synchrony, γ₃₀ = 0.0019, p = .507 (H9), and no evidence of interaction effects or gender differences. Figure 7 shows the strength of synchrony across time offsets, indicating the time dependence of synchrony.

Perceived restorativeness
The prototypical participant's perceived restorativeness decreased from an initial value of γ₀₀ = 3.169, p < .001 (on a 5-point scale), at a rate of γ₁₀ = −0.027, p < .001, points per week. There was a significant effect of both the setting and spaciousness manipulations, such that individuals reported greater perceived restorativeness in panoramic environments than in constrained environments, γ₂₀ = 0.168, p = .0005 (H5), and in outdoor environments than in indoor environments, γ₃₀ = 0.14, p = .004 (H10). There was no evidence of interaction effects or gender differences.

Entitativity
The prototypical participant's entitativity decreased, though not significantly, from an initial value of γ₀₀ = 3.03, p < .001 (on a 7-point scale), at a rate of γ₁₀ = −0.005, p = .34, points per week (H1). There was a significant effect of the spaciousness manipulation, such that individuals reported greater entitativity in panoramic environments than in constrained environments, γ₂₀ = 0.093, p = .0092 (H6). There was no evidence that the setting manipulation influenced entitativity, γ₃₀ = 0.048, p = .187, and no evidence of interaction effects or gender differences.

Pleasure
The prototypical participant's pleasure decreased from an initial value of γ₀₀ = 6.17, p < .001 (on a 9-point scale), at a rate of γ₁₀ = −0.11, p < .001, points per week. There was a significant effect of the spaciousness manipulation, such that individuals reported greater pleasure in panoramic environments than in constrained environments, γ₂₀ = 0.28, p = .037 (H7a). There was no evidence of setting effects, γ₃₀ = 0.094, p = .504, interaction effects, or gender differences.

Arousal
The prototypical participant's arousal decreased from an initial value of γ₀₀ = 4.54, p < .001 (on a 9-point scale), at a rate of γ₁₀ = −0.14, p < .001, points per week. There was a significant effect of the spaciousness manipulation, such that individuals reported greater arousal in panoramic environments than in constrained environments, γ₂₀ = 0.307, p = .0339 (H7b). There was no evidence of setting effects, γ₃₀ = 0.118, p = .42, interaction effects, or gender differences.

Presence
Self-presence
The prototypical participant's self-presence increased from an initial value of γ₀₀ = 2.46, p < .001 (on a 5-point scale), at a rate of γ₁₀ = 0.022, p = .0021, points per week (H2a). There was a significant effect of the spaciousness manipulation, such that individuals reported higher self-presence in panoramic environments than in constrained environments, γ₂₀ = 0.129, p = .0048.

Social presence
The prototypical participant's social presence decreased from an initial value of γ₀₀ = 3.22, p < .001 (on a 5-point scale), at a rate of γ₁₀ = −0.0159, p = .03, points per week (H2b). There was no evidence that the spaciousness manipulation influenced social presence, γ₂₀ = 0.015, p = .74.

Figure 7. Effect of view on synchrony. This plot demonstrates that as the time offset of motion signals shifts away from zero (i.e., as one looks toward the right and left away from the center), synchrony (Y-axis) decreases. In this plot, synchrony for each group in each session is traced as a separate partially transparent line (185 total). The average of all sessions for a given view condition is the darker line, with the ribbon indicating 95% confidence intervals based on the underlying distribution. Each line is produced as the average of all unordered pairs in that session (from 3 to 36, M = 18.7, SD = 8.59), which is itself calculated from about 30 min of data per participant.

Spatial presence
The prototypical participant's spatial presence decreased from an initial value of γ₀₀ = 3.22, p < .001 (on a 5-point scale), at a rate of γ₁₀ = −0.049, p < .001, points per week (H2c). There was a significant effect of the spaciousness manipulation, such that individuals reported higher spatial presence in panoramic environments than in constrained environments, γ₂₀ = 0.128, p = .008. There was no evidence that the setting manipulation influenced self (γ₃₀ = 0.074, p = .109), social (γ₃₀ = 0.038, p = .41), or spatial (γ₃₀ = 0.071, p = .14) presence. There was no interaction between the spaciousness and setting manipulations on self, social, or spatial presence (ps > .055). Individuals who identified as female had higher baseline levels of self (γ₀₁ = 0.24, p = .039), social (γ₀₁ = 0.28, p = .016), and spatial (γ₀₁ = 0.236, p = .029) presence.

Enjoyment
The prototypical participant's enjoyment decreased from an initial value of γ₀₀ = 3.19, p < .001 (on a 5-point scale), at a rate of γ₁₀ = −0.064, p < .001, points per week. Both the setting and spaciousness manipulations influenced enjoyment, such that individuals reported higher enjoyment in panoramic environments than in constrained environments, γ₂₀ = 0.166, p = .0043 (H8), and in outdoor environments than in indoor environments, γ₃₀ = 0.13, p = .0267 (H11). However, the spaciousness × setting interaction was negative, γ₄₀ = −0.188, p = .024, indicating that enjoyment in environments that were both outdoors and panoramic was lower than the sum of the separate spaciousness and setting effects would predict. There was no evidence of gender differences.
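Because the model includes a spaciousness × setting product term, the four environment types have separate model-implied means, and a negative γ₄₀ pulls the panoramic-outdoor cell below the additive prediction. The arithmetic below is a hypothetical illustration using the reported enjoyment estimates; it assumes 0/1 dummy coding (panoramic = 1, outdoor = 1), evaluates the first session (week = 0), and ignores the gender terms, none of which the text specifies.

```python
def implied_cell_mean(panoramic, outdoor, week=0,
                      g00=3.19, g10=-0.064, g20=0.166, g30=0.13, g40=-0.188):
    """Model-implied enjoyment for 0/1 codes of panoramic and outdoor.

    Defaults are the reported Study 2 enjoyment estimates; gender terms are
    omitted (i.e., the reference gender group is assumed).
    """
    return (g00 + g10 * week + g20 * panoramic + g30 * outdoor
            + g40 * panoramic * outdoor)


# Model-implied means for the four cells at week 0, keyed by (panoramic, outdoor).
cells = {(p, o): round(implied_cell_mean(p, o), 3) for p in (0, 1) for o in (0, 1)}

# Difference of differences: (PO - CO) - (PI - CI) recovers g40.
contrast = (implied_cell_mean(1, 1) - implied_cell_mean(0, 1)) - (
    implied_cell_mean(1, 0) - implied_cell_mean(0, 0))
```

Under these assumptions the panoramic-outdoor cell (about 3.298) sits below both single-feature cells and well below the additive prediction of 3.486, which is what a negative interaction term means; the same reasoning applies to the negative γ₄₀ reported for realism.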

Realism
The prototypical participant's realism increased from an initial value of γ₀₀ = 38.34, p < .001 (on a 0–100, cartoon-like to photorealistic scale), at a rate of γ₁₀ = 1.76, p < .001, points per week (H3). There was a significant effect of the spaciousness manipulation, such that individuals reported higher realism in panoramic environments than in constrained environments, γ₂₀ = 3.57, p = .0083. There was no evidence that the setting manipulation influenced realism, γ₃₀ = 1.9009, p = .166. However, the spaciousness × setting interaction was negative, γ₄₀ = −4.0803, p = .035, indicating that realism in environments that were both outdoors and panoramic was lower than the sum of the separate spaciousness and setting effects would predict. There was no evidence of gender differences.

Discussion
Study 2 examined the role of time and environmental context (spaciousness and setting) in participants' experience and group dynamics. Overall, self-presence and realism increased over time, whereas social presence, spatial presence, and enjoyment decreased over time. Although the effects of time were less robust in Study 2, the results still indicate that people's behaviors and attitudes in VR change with time and use.
In line with our hypotheses about the benefits of spacious, panoramic environments, during the weeks in which participants were in a panoramic environment (i.e., one in which people can see wide and far), their synchrony increased, and they reported greater perceived restorativeness, entitativity, pleasure, arousal, self and spatial presence, enjoyment, and realism. Because panoramic environments naturally contain more visual components (i.e., there is more visible space and more content filling that space), the surroundings may have been more stimulating, leading to greater arousal. In panoramic environments, participants had the freedom to look around and focus their attention on different features, be it the other members of the group or what was in the immediate or distant surroundings. In contrast, a constrained environment may have led to feelings of confinement and forced people to direct their full attention to a limited number of options. A constrained environment may thus have acted as a stressor, potentially affecting the social interactions that took place in the space and prompting a more critical evaluation of the self, group members, and the experience as a whole, whereas a panoramic environment may have provided a more restorative, open space that gave participants the freedom to let their minds wander.
Similarly, in line with our hypotheses about the restorative effects of outdoor environments, during the weeks in which participants were in an outdoor environment with elements of nature, they reported greater perceived restorativeness and enjoyment. Beyond the beneficial, restorative properties of outdoor, natural environments, it is also important to consider the context in which these environments were used. Group discussions and social interactions often take place indoors, in classrooms, conference rooms, or common spaces. Meeting with group members and holding a discussion outdoors (among boulders, near ponds, or surrounded by forest) may have provided an experience that is uncommon or not easily accessible, creating novelty and, in turn, greater enjoyment. The novelty of the setting may have enriched not only participants' perception of the experience but also the social interaction itself.
However, when the environment was both panoramic and outdoors, reported enjoyment and realism were lower. Evolutionary perspectives, namely Appleton's (1975) prospect-refuge theory, may help explain this outcome. Prospect-refuge theory argues that humans have an innate preference for environments that offer both prospect and refuge. Ideal environments allow a clear view of the scene and an evaluation of opportunities (e.g., resources, places to hide) and threats (e.g., predators, hazards). Environments that pose threats to survival may trigger negative reactions such as fear and avoidance (Ulrich, 1983). It is possible that interacting in large, open natural spaces that offered no sense of protection led participants to enjoy the experience less and to be more alert to, and critical of, their virtual surroundings. If an individual's experience carries a sense of fear or danger, this may negatively influence any social interactions that take place and, as TSI would predict, continue to alter their behavior during and after their engagement in the virtual world.
Given that elements of the surrounding environment, such as how much space is visible and whether people are outdoors surrounded by nature, influence behaviors and attitudes within CVEs, the virtual environments that host such interactions can be deliberately designed and transformed. In particular, depending on the desired goals of an interaction (e.g., social, team building, educational), the virtual environment can be structured to meet different needs and foster specific dynamics within groups.

Summary of results
In Study 1, we examined the transformation of the self and others in a CVE by manipulating participants' avatar appearance. Participants wore either a self-avatar or a uniform avatar. Over time, presence (self, social, and spatial), enjoyment, entitativity, and realism all increased. Wearing a self-avatar increased nonverbal synchrony, self-presence, and realism, but decreased enjoyment. We also explored how these outcomes varied with individual differences: those with more prior relationships had higher baseline levels of entitativity, enjoyment, and realism, but their perception of realism increased less over time than that of individuals with fewer prior relationships; those with prior VR experience had higher baseline levels of enjoyment; and those with higher group identification had a higher baseline level of social presence.
In Study 2, we examined the transformation of an environmental context by manipulating the virtual environment. Results showed that, as visible space increased, so did nonverbal synchrony, perceived restorativeness, entitativity, pleasure, arousal, self and spatial presence, enjoyment, and realism. Moreover, being in an outdoor environment led to greater reported perceived restorativeness and enjoyment. However, when the virtual environment was both panoramic and outdoors, reported enjoyment and realism were lower.
Based on the preliminary findings of Study 1, we hypothesized a robust effect of time. In line with Study 1, the Study 2 results show that self-presence and realism increased over time. In contrast, social presence, spatial presence, and enjoyment decreased slightly over time. We also measured additional variables in Study 2, including perceived restorativeness, pleasure, and arousal, which likewise decreased over time.

Limitations
This study is the first large-scale, longitudinal, quantitative study of large groups in HMD-based CVEs. However, it has a variety of limitations. First, both studies were field experiments, which come with strengths and limitations. While field experiments allow researchers to implement interventions and measure outcomes in naturalistic settings, researchers have limited control over external conditions and potential intervening variables. Typical research studies rely on participant pools in social science departments or on online participants recruited through panels such as Mechanical Turk. These samples have their own strengths and limitations, and the same holds true for a field study embedded within a class on VR. While our sample was heterogeneous in terms of race and previous VR use, it is still a convenience sample of college students who were learning about the medium of VR, which makes it a very particular sample. It is possible that students' learning about the medium served as a third-variable explanation for our temporal effects. At the same time, we note that it is critical to allow students to grow accustomed to and learn about the medium before we can investigate how responses to VR change over time and understand its full potential. The current study implemented novel strategies aimed at testing how robustly these effects hold over time. Future work should investigate whether these effects hold across different contexts and populations.
Second, the current study used a stimulus sampling method to test how well our effects generalize across different types of environments. While stimulus sampling strengthens the robustness of the observed effects, to isolate and strengthen the causal argument for our manipulations, we suggest that future work explore the moderators of our variables of interest with a narrower focus.
Third, while the Oculus Quest 2 headsets and the ENGAGE platform were surprisingly robust compared to our previous experience with immersive VR technology, many sessions were lost to technical errors, and our final sample for both studies was slightly smaller than we had hoped. Our choice to focus on groups over time made this study unique in many ways but had its own costs, such as handling platform software updates that changed features of the avatars and network issues that left participants unable to join. Furthermore, because the study was simultaneously a course, we needed flexibility to accommodate participants' schedules. This included allowing participants to attend different discussion sessions when needed, which affected the membership and size of the groups across weeks. Moreover, our choice of the ENGAGE platform was driven by its features for easily creating content and recording data. However, specific aspects of this platform will likely not generalize to other platforms, each of which has unique affordances and overall qualities (Barreda-Ángeles & Hartmann, 2022).
Another limitation stems from the avatar design process. Although our selections were informed by previous research, the design was heavily limited by the options available. Factors such as gender, which came only with binary options of female and male, and skin color, which we tried to keep as close to gray and racially ambiguous as possible, may have produced an avatar that, while uniform, gave off cues of a recognizable gender and race. A further avatar-related limitation is that, in VR, a person's experience is presented from a first-person point of view, so people cannot see themselves. This raises the possibility that other cues overrode perceptions of one's own avatar. Although participants necessarily saw their avatar on the customization page every time they were randomly assigned to the uniform avatar condition, this applied to only half of the sessions. In future work, we hope to incorporate a mechanism that lets participants see themselves, reminding them of what their avatars look like throughout their experience.
Lastly, time was confounded with topic, in that the topics changed each week. While there was no pattern dictating which topics were discussed early versus late (i.e., topics did not become more difficult or more technical over time), it is important to acknowledge that a better temporal manipulation would have kept content similar over time or used a design that randomized topic order across groups.

Transformed avatars and environments in VR

Future directions
There is a growing need to understand the social dynamics of how people use CVEs. Many questions remain unanswered about how the components of the TSI paradigm (self-representation, sensory capabilities, and contextual situations) shape people as they navigate the virtual world and form groups. We examined the transformation of avatar appearance using a uniform avatar, such that all avatars looked the same. While our results suggest that a uniform avatar in a group setting may suppress individuals' visual cues and be unfavorable in that it lowers self-presence and realism, it is possible that some degree of visual similarity, rather than complete similarity, may be advantageous for group building. One avenue that warrants further research is varying the degree of similarity, or the number of similar cues shared among group members. Similarly, research shows that other factors may influence how people present themselves through avatars, such as individual motivations (e.g., whether the individual is on the platform to be immersed in a virtual environment, to have social interactions, or to achieve goals specific to the platform) or the functionality of the platform (e.g., the customization options available) (Harari et al., 2015). These differences can lead to different ways of creating and expressing the self via avatars, which ultimately shape the type of avatar an individual uses. In other words, the avatar people select to represent themselves may not reflect their true self but another version of the self, such as an "ideal self" (Bessière et al., 2007; Ducheneaut et al., 2009).

Implications
The current study is one of the first large-scale, longitudinal field experiments to investigate how multiple large groups and their social dynamics evolve over time in CVEs. From an experimental design standpoint, the study allowed behavior to be observed in a naturalistic setting rather than a controlled laboratory setting. From a statistical standpoint, we contribute to the field by using linear growth models to understand constructs and their changes over time, not in isolation but in relation to one another. We showed that choices about how avatars are created and how large a scene is change nonverbal synchrony, a hallmark measure of successful group interaction. Even minor decisions made by metaverse designers can have psychological impacts on users.
In recent years, VR headsets and content have become more accessible to the general public. As interest grows in a digital migration to the metaverse, there is a growing need to understand how the transformations afforded by CVEs affect people's behaviors and attitudes, and how they should be taken into account when designing such platforms. In particular, as the metaverse is used for purposes such as training, learning, and team building, which are often social activities involving multiple individuals, how people are represented is critical.
Currently, there is a broad body of research on how the two TSI dimensions of interest influence outcomes (e.g., for self-representations, see work related to the Proteus Effect: Praetorius & Görlich, 2020; Ratan et al., 2020; Roth et al., 2018; for contextual situations, see Bolouki, 2022; Lee et al., 2022; Nukarinen et al., 2022). The current work contributes to this literature theoretically by examining the effects of time and group interaction alongside self-representation and contextual situations.

Transforming avatar appearance
Previous research has pointed out the benefits of customizing one's avatar: avatar customization and similarity to the self do indeed increase presence (Waltemate et al., 2018). What is unique in this study, however, is the finding that uniform avatars provide greater enjoyment than self-avatars. Designers should therefore weigh these findings against the goal of the platform. For applications in which self-presence and realism are the goal, such as training, customization is best. For recreation and social interaction, on the other hand, fostering visual similarity is recommended.
The results have implications for how platform designers present avatars and what options they make available. Previous research suggests that the way avatars are presented gives rise to cues that are more useful or appropriate in some contexts than in others. For instance, Dobre et al. (2022) report that in a work setting, realistic avatars and their nonverbal behavior are more appropriate than cartoon-like avatars. Moreover, Tanis and Postmes (2003) argue that a lack of cues from a communication partner may lead to ambiguity and uncertainty. In other contexts such as gaming, however, simple cartoon-like avatars are common and have been shown to foster more positive impact and engagement (e.g., Monteiro et al., 2018). Depending on how many customization options are available, avatars can be made as individualized and as close to a tailored avatar as possible or, conversely, reduced to a limited set of options that yields highly similar avatars. Beyond aesthetics, designers should consider the goal of their platform and adjust how much control and customization people have when creating their avatars.

Transforming context
Context is a term often used in theories and models in social science and human-computer interaction, but is difficult to explicate. Some studies actively manipulate context through means such as randomly assigning students to various classrooms. Such studies are often limited by cost, as it is expensive to physically build dozens of rooms that only vary on a single parameter. For example, Meyers-Levy and Zhu (2007) examined ceiling height by constructing two false ceilings to create four rooms that differed only on the Y-axis, and needed to employ professional engineers to build the ceilings for the study. Consequently, due to the cost involved, there are very few studies that look at more than a handful of different rooms. Moreover, in most studies, there are confounds in the variables of interest (e.g., bigger rooms also have different furnishings or light patterns than smaller rooms).
Another strategy is to observe people as they move about the world and measure how various behaviors differ based on their location. Recent work on smartphones examines how locations, as tracked by smartphone GPS signals, influence social interaction and other behaviors (Matz & Harari, 2021). This approach allows for greater variance in locations but is limited to places people happen to visit, as opposed to locations designed specifically to address a theoretical question. The current study focuses on VR, but it also advances understanding of how the structure of outdoor and indoor spaces, specifically how far a person can see on the X-Z plane, influences nonverbal behaviors and attitudes. Our stimulus sampling strategy of presenting 192 distinct locations makes this one of the most rigorous studies to date of the effects of location on psychological outcomes. In particular, researchers have rarely examined panoramic indoor spaces, as they are incredibly expensive to access in the real world.
Second, we found that the benefits of spacious, panoramic environments reported in previous research carry over to virtual environments. In VR, space is free: by holding an event within a spacious environment, a host may be able to foster perceived restorativeness, pleasure, arousal, presence (self and spatial), and enjoyment among participants. Such environments may also help create a sense of community, as indicated by greater synchrony and entitativity, which may be of interest for training, teaching, and team-building purposes.
However, environments that are both panoramic and outdoors may result in lower enjoyment and realism. One potential explanation comes from Appleton's (1975) prospect-refuge theory, which is rooted in evolutionary survival instincts and posits that people innately prefer environments that provide both opportunity and safety. The panoramic outdoor environments fit this framing: their open layout offered little shelter, and a lack of access to shelter may induce threat and fear (Ulrich, 1983). Another potential reason for the lower enjoyment and realism comes from qualitative observations by discussion session leaders: several participants pointed out increased pixelation and lag in environments with more rendered trees (i.e., panoramic outdoor environments contained more elements of nature).

Groups
One question of interest for group interaction in VR concerns group size. Creating a sense of community and fostering group cohesion is often a desired goal. While entitativity increased over time in Study 1, we did not find the same result in Study 2. In building our models, we examined how much variance was accounted for by the time-varying group size (i.e., repeated measures nested within individuals nested within the discussion session they attended that week) and found little variance at the group level. This raises the question: can a group be too big in CVEs? While there is research on the role of group size in efficacy and collaboration (e.g., Guimerà et al., 2005; Kerr, 1989), more research on the ideal or maximum size of group interactions in CVEs is required before drawing conclusions. We offer some suggestions drawn from qualitative observations. First, one theme that emerged was the value of a backup communication channel. We expected a small but real likelihood of technical problems that would prevent participants from reaching out for assistance. For example, a headset may run low on battery or lose its Internet connection, the participant may fail to log in, or the multi-user service may fail altogether. Having a fallback medium was necessary and very helpful. In our case, a Zoom video-conferencing window was kept open and operated by a different instructor who was not leading the discussion session. While technical problems are inevitable in such settings, they can be addressed quickly and easily in a smaller group or with only a few students to assist. The ideal size of a group may therefore be limited by how much technical support and resources can be provided.
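A common way to quantify how much variance sits at the group level is the variance partition coefficient (an intraclass correlation) computed from a fitted multilevel model's variance components. The sketch below shows only that arithmetic: the paper does not report its variance components, so the numbers are hypothetical placeholders, and the function name `variance_shares` is our own. Because participants could attend different sessions from week to week, a full analysis might treat sessions as cross-classified with individuals rather than strictly nested; for random-intercept components, the share calculation is the same either way.

```python
def variance_shares(session_var: float, person_var: float, residual_var: float) -> dict[str, float]:
    """Proportion of total outcome variance at each level (random intercepts only).

    The session share is the variance partition coefficient: the expected
    correlation between observations from different people in the same session.
    """
    components = {"session": session_var, "person": person_var, "residual": residual_var}
    if any(v < 0 for v in components.values()):
        raise ValueError("variance components cannot be negative")
    total = sum(components.values())
    if total == 0:
        raise ValueError("total variance must be positive")
    return {level: v / total for level, v in components.items()}


if __name__ == "__main__":
    # Hypothetical variance components for illustration only (not from the study).
    shares = variance_shares(session_var=0.02, person_var=0.40, residual_var=0.58)
    for level, share in shares.items():
        print(f"{level:>8}: {share:.1%}")
```

A session share near zero, as in this made-up example, would correspond to the pattern described above: knowing which session someone attended adds little beyond knowing who they are.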
Second, as CVEs are currently structured, audio issues may arise when many people occupy the same virtual space. Unless spatialized audio is used, it is difficult for multiple people to speak at once because of how audio is output. This gives rise to turn-taking cues unique to CVEs. For instance, in ENGAGE, a participant raises and twists their wrist to indicate that they plan to unmute and talk. Similarly, each participant's username and a microphone icon showing whether they were muted floated above their avatar. In a typical interaction, participants muted their microphones to keep background noise from the real world out of the virtual conversation. To speak, participants had to turn their heads and look around the room to check whether anyone else had unmuted. As group size grows, such social cues may become less salient and harder to pick up.

Time
Studies that examine individual or group behavior over time in VR, particularly in CVEs, are extremely rare. In the current study, we were particularly interested in the evolution of groups over time and how this evolution interacted with self-representations and contextual situations. As VR users grow more comfortable with the medium and the novelty wears off, how do transformations of the self and context manifest in changes in attitudes? Across both studies, self-presence and realism increased. One possibility is that, as participants grew accustomed to the virtual body and environment, they became more comfortable and present in their avatars. With time and use, participants may have been able to focus on being present and attending to their surroundings rather than on learning how to use the medium. However, with comfort comes familiarity, and the novelty of the medium may have worn off. This could explain the decrease in pleasure, arousal, and enjoyment.
In addition to what we learned about the evolution of virtual behavior, another finding emerged: people change substantially over time, continuing up to Week 8 in our studies. Even outcomes that were not obvious in hindsight (for example, our finding that scenes are perceived as more realistic over time) change consistently with more experience. Looking only at the first session paints an inaccurate picture. In some instances, the noise in the first session masks important findings that emerge later; in this sense, single-session studies are "temporally underpowered." More problematic are instances in which the pattern observed in the first session is opposite to the pattern that consistently emerges over most subsequent weeks, as with our finding on the effect of panoramic viewing on synchrony. Given that most published VR research examines only a single dose at one time point, it is critical for future work to spend the extra resources needed to ensure that experimental effects are temporally robust.

Notes
1. However, technical limitations introduced variance: each avatar was customized on the participant's end, participants had to switch between the self and uniform avatars between sessions, and skin colors were rendered differently in individuals' headsets. Additionally, because the lower torso, age, and weight were not rendered in the HMDs, no specific instructions were provided for these features.
2. The study also varied the type of onboarding exercise participants did when first entering VR. However, because students arrived at different times and did not adhere to the movement instructions, the variable failed manipulation checks, and we do not report it given space constraints. The nature of the variable is further described in Appendix B.
3. In addition to the two variables related to context, we also attempted to manipulate the amount of translation (movement within the VR scene). However, given the nature of the collaboration tasks, there was not enough physical translation for this variable to show differences; it failed manipulation checks, and we do not report it given space constraints. The nature of the variable is further described in Appendix B.
4. Measures for entitativity, enjoyment, and realism were the same across Study 1 and Study 2, except that entitativity included one fewer item in Study 2. Cronbach's α was 0.89 for enjoyment and 0.86 for entitativity.

Data availability
The data underlying this article cannot be shared publicly due to the privacy of individuals that participated in the study. The data will be shared on reasonable request to the corresponding author.

Funding
This research was supported by the National Science Foundation grant (Award 1800922).

Conflicts of interest: None declared.
Appendix A: Uniform avatar pre-test
The following questions were asked about five different avatars. Based on the results, different features (e.g., technically female or male and skin color) were selected from each avatar option. The features closest to neutral in terms of perceived gender, race, and comfort in representation were selected.

Gender perception
Please answer the following about this avatar's gender presentation (7-point Likert scale, 1 = Not at all, 4 = Neutral, 7 = Very much).
1) This avatar is feminine
2) This avatar is masculine
3) I am easily able to identify this avatar's gender

Racial perception
Please rate the degree to which the avatar fits in the following racial categories (5-point Likert scale, 1 = Extremely, 5 = Not at all).
1) African, African-American, or Black
2) Asian or Asian-American
3) Hispanic or LatinX
4) Indigenous/Native American, Alaska Native, First Nations
5) Middle Eastern
6) Native Hawaiian or other Pacific Islander
7) White
8) More than one race

Representation comfort
Please rate how you would feel if this avatar were to visually represent you in a virtual environment (7-point Likert scale, 1 = Extremely uncomfortable, 4 = Neither uncomfortable nor comfortable, 7 = Extremely comfortable).