Prosocial Behaviour in Interethnic Encounters: Evidence from a Field Experiment with High- and Low-Status Immigrants


 Recent waves of immigration have changed the demographic face of European societies and fueled considerable debate over the consequences of ethnic diversity for social cohesion. One prominent argument in this debate holds that individuals are less willing to extend trust and solidarity across ethnic lines, leading to lower social capital in multiethnic communities. We present a direct test of this proposition in a field experiment involving native-immigrant interactions in Zurich's Central Train Station. Our intervention consists of approaching commuters with a small request for assistance (borrowing a mobile phone), which we take as a measure of prosociality. We further differentiate between reactions towards natives as well as both high- and low-status immigrant groups. Compared to native-native interactions, we find lower solidarity in native-immigrant encounters, especially in cases involving stereotypically low-status immigrants. In exploratory analyses, we further show that discrimination only obtains in 'low cost' situations where commuters could easily justify not helping (e.g. by claiming not to carry a phone). Overall our results shed light on key theoretical mechanisms underlying patterns of solidarity in contemporary multiethnic societies.


Introduction
Recent waves of immigration have changed the demographic face of European societies. According to official data from the European Commission, first- and second-generation immigrants now comprise between 20 per cent and 30 per cent of the population in countries such as France, Britain, and Germany (Eurostat, 2015). While immigration flows have contributed positively to economic growth, innovation, and competitiveness (Surowiecki, 2005; Putnam, 2007; Page, 2008; Lorenz et al., 2011), these demographic shifts have also fueled public anxieties and considerable academic debate about the potentially negative consequences of ethnic diversity for social solidarity in immigrant-receiving countries (for recent reviews, see Stichnoth and van der Straeten, 2013; Van der Meer and Tolsma, 2014; Schaeffer, 2016; Dinesen and Sønderskov, 2018).
One prominent argument linking diversity to undesirable collective outcomes holds that individuals are less willing to extend trust and solidarity across ethnic lines, leading to lower social capital in multiethnic communities (Alesina, Baqir and Easterly, 1999; Alesina and La Ferrara, 2002; Habyarimana et al., 2007; Schaeffer, 2013; Dinesen and Sønderskov, 2015; Koopmans and Schaeffer, 2015). However, while much of the diversity literature simply assumes that the scope of prosociality is ethnically bounded, only a handful of studies have sought to test this proposition directly in the European context (Bouckaert and Dhaene, 2004; Ahmed, 2010; Diekmann, Jann and Näf, 2014; Koopmans and Veit, 2014; Cettolin and Suetens, 2018). 1 Further, extant studies have typically focused on interactions between natives and members of stereotypically disadvantaged immigrant groups. 2 In contrast, we know little about natives' reactions towards non-disadvantaged groups.
In this article, we present evidence from a field experiment documenting patterns of prosocial behaviour in interactions involving both low- and high-status immigrant groups. Our approach combines the traditional strengths of experimentation (random assignment to treatment) with a realistic intervention and unobtrusive measurement of behaviour (Baldassarri and Abascal, 2017). We contribute to the diversity literature by directly testing the extent to which prosociality in real-life encounters is conditioned upon the ethnicity of one's interaction partners. Moreover, we examine the extent to which individuals treat stereotypically high- versus low-status immigrant groups differently. As such, our work engages more broadly with scholarship looking beyond the monolithic (i.e. 'color-blind') effects of diversity to understand how immigrants' characteristics influence cross-ethnic relations (Bécares et al., 2011; Laurence, 2011; Bakker and Dekker, 2012; Hainmueller and Hangartner, 2013; Gundelach and Traunmüller, 2014; Hainmueller and Hopkins, 2015; Turper et al., 2015; Bansak, Hainmueller and Hangartner, 2016; Czymara and Schmidt-Catran, 2017; Diehl et al., 2018; Winter and Zhang, 2018; Ward, 2019).
To test the proposition that prosociality is ethnically bounded, we specifically examine the behaviour of native Swiss towards both native and non-native residents of Switzerland. The Swiss setting is notable in that it allows us to study natives' interactions with both 'generic' (low-status) immigrants and (high-status) German nationals who constitute a sizable and politically salient minority (Helbling, 2011; Diehl et al., 2018). Our experimental intervention consists of approaching commuters in Zurich's Central Train Station and asking for assistance (borrowing a mobile phone to make a local call), which we take as a measure of prosociality. Confederates systematically varied the dialect in which this request was made in order to signal either a Swiss, German, or 'generic' (low-status) immigrant identity. This innovative feature of our design allows us to estimate the causal effect of ethnicity while holding constant idiosyncratic factors that may vary across confederates.
Results from 863 trials involving native Swiss commuters demonstrate a discernible pattern of anti-foreigner bias: controlling for confederate-level fixed effects, speaking in a non-Swiss dialect or accent significantly decreases the likelihood of receiving help. Moreover, while we find evidence of discrimination directed against confederates posing as (high-status) Germans, the ethnic penalty is substantively larger and more robust for 'generic' (low-status) immigrants. These results indicate that ethnic boundaries do indeed play a role in explaining the oft-cited negative association between diversity and social cohesion, although diversity's detrimental effects may be largely driven by natives' aversion towards stereotypically low-status groups (Schaub, Gereke and Baldassarri, forthcoming).
In exploratory analyses, we further show that anti-foreigner bias only obtains in experimental trials involving commuters who could plausibly deny carrying a mobile phone. In contrast, we detect no treatment effects in interventions with commuters whose phones were already visible when approached by confederates. This last result speaks to the role of situational factors in determining the 'costs' of discrimination (Merton, 1948; Crosby, Bromley and Saxe, 1980; Duckitt, 1992). More specifically, discrimination in our experiment appears to occur only in situations where individuals could easily justify not helping (e.g. by claiming not to carry a phone).

Theory, Prior Evidence, and Hypotheses
The relationship between ethnic diversity and social cohesion has been extensively studied by scholars across the social sciences. Recent meta-analyses point to a modest yet consistent negative effect of diversity on collective outcomes (Stichnoth and van der Straeten, 2013; Van der Meer and Tolsma, 2014; Schaeffer, 2016; Dinesen and Sønderskov, 2018). One mechanism frequently invoked by scholars to explain this association relates to the role of in-group biases in prosocial behaviour (Tajfel and Turner, 1979; Yamagishi, Jin and Kiyonari, 1999; Yamagishi and Mifune, 2008; Balliet, Wu and De Dreu, 2014). Specifically, humans are argued to possess a psychological disposition to create social categories that partition in-group versus out-group members and to espouse attitudes and behaviours that positively differentiate the in-group. Signals of shared group membership thus cue behavioural biases to be generous, extend trust, and cooperate in social dilemmas.
In multiethnic societies, such group biases may hamper cooperation to the extent that group boundaries are constructed along ethnic or national lines. For instance, it has been argued that individuals derive non-pecuniary benefits when co-ethnics are made better off, but remain indifferent to the welfare of non-co-ethnics (Habyarimana et al., 2007: p. 710). Other authors have posited that individuals may be better able to read the intentions and feelings of co-ethnics, with greater empathy promoting the extension of trust within ethnic boundaries (Dinesen and Sønderskov, 2015: pp. 552-553). A third line of research holds that shared ethnicity facilitates the enforcement of social norms which help to sustain cooperation and curb free-riding within the group (Fearon and Laitin, 1996; Miguel and Gugerty, 2005; Habyarimana et al., 2007; Algan, Hémet and Laitin, 2016). In summary, a diverse body of literature suggests a prominent ethnic dimension to the process of social categorization and in-group cooperation.
Of course, the precise location of ethnic boundaries is likely to vary from society to society depending on the dominant frames supplied by politics and the popular media (Posner, 2004; Miguel and Gugerty, 2005; Wimmer, 2008; Hopkins, 2010). 3 In contemporary European societies, such major fault lines are most likely to appear between the majority native population and minorities of foreign descent. Moreover, while the overarching European discourse tends to focus on typically low-skilled migrants from non-Western countries, the specific Swiss context in which our study is embedded is notable in that high-skilled immigration from the EU, and in particular Germany, has also been the subject of much political debate (Helbling, 2011; Freitag, Vatter and Mueller, 2015; Diehl et al., 2018). 4 Given this configuration of politicized groups in Switzerland, we predict that salient group boundaries exist between native Swiss on the one hand, and both 'generic' (low-status) immigrants and high-status Germans on the other. These boundaries should manifest in lower levels of prosociality displayed by members of the majority Swiss population towards both non-native groups:
H1: Natives are less prosocial towards immigrants than towards fellow natives.
The proposition that prosocial behaviour is ethnically bounded has been widely cited in the literature to explain the observed negative relationship between ethnic diversity and social cohesion (Alesina, Baqir and Easterly, 1999; Alesina and La Ferrara, 2002; Habyarimana et al., 2007; Schaeffer, 2013; Dinesen and Sønderskov, 2015; Koopmans and Schaeffer, 2015). Yet so far only a handful of studies have attempted to test whether individuals do indeed condition their behaviour on the migration background of their interaction partners. One approach in this line of research uses behavioural games to measure prosociality while exogenously manipulating the identity of the opposing party. For instance, Cettolin and Suetens (2018) administer a trust game with a nationally representative sample in the Netherlands and find that native Dutch are less trustworthy when matched with a 'non-Western' immigrant. In contrast, Bouckaert and Dhaene (2004) find no effect of ethnicity on either trust or reciprocity among Flemish and Turkish small-business owners in Ghent, Belgium using a similar experimental paradigm.
Other researchers have employed field experiments to measure prosociality in 'natural' encounters where subjects are unaware of their participation in an ongoing study. One example involves the use of the 'lost-letter' technique (Milgram, Mann and Harter, 1965) which records the rate at which letters dispersed in public places are picked up and forwarded to their intended recipients. Employing this technique in Sweden, Ahmed (2010) finds that letters addressed to individuals with Muslim names were less likely to be returned compared to letters containing Swedish names. 5 Other studies have attempted to measure prosociality directly via interpersonal helping behaviour. For instance, Diekmann, Jann and Näf (2014) record the frequency by which Zurich residents provided money to a confederate ostensibly needing to purchase a bus ticket. Using a treatment manipulation similar to ours, these authors vary the dialect (Swiss-German vs. High German, corresponding to the dialect spoken in Germany) in which the request was phrased, but find no effect of German identity on helping rates.
One distinguishing feature of those aforementioned studies which do find affirmative evidence of anti-immigrant bias relates to the specific characteristics of the immigrant groups considered. For example, Cettolin and Suetens (2018) focus on non-Western immigration to the Netherlands, which is predominantly comprised of population flows from Morocco, Turkey and the former Dutch colonies (Bakker and Dekker, 2012). Importantly, these groups are stereotypically associated with low socio-economic status and educational attainment relative to native Dutch (Heath, Rothon and Kilpi, 2008). Similar characterizations could also be made of immigrants from Muslim-majority countries living in Sweden (Snellman and Ekehammar, 2005). By contrast, Diekmann, Jann and Näf (2014) consider relatively equal status groups (Swiss and Germans). In a similar vein, Bouckaert and Dhaene (2004) interpret their null results in light of the fact that Turkish and Belgian participants were recruited from the same socio-professional ranks such that status differences were likely minimized.
This pattern of findings suggests that prosociality is likely to be particularly inhibited in interactions involving low-status immigrants. In fact, immigrants' socio-economic status has been identified as a key moderator of their acceptance by the host society. Specifically, survey research on immigration-related attitudes consistently finds that while poorly educated, low-skilled foreigners tend to bear the brunt of exclusionary sentiments, the presence of high-status 'expatriates' appears far less controversial (Hainmueller and Hopkins, 2015; Turper et al., 2015; Bansak, Hainmueller and Hangartner, 2016; Czymara and Schmidt-Catran, 2017; Diehl et al., 2018; Ward, 2019).
These findings resonate with prominent theories of intergroup conflict (Blumer, 1958; Blalock, 1967; Bobo and Hutchings, 1996) linking interethnic tensions to public concerns over the adverse economic impacts of immigration (Quillian, 1995; Semyonov, Raijman and Gorodzeisky, 2006; Schneider, 2008; Hainmueller and Hiscox, 2010; Malhotra, Margalit and Mo, 2013; Dancygier and Laitin, 2014). Under this view, immigrants provoke opposition to the extent that they threaten native jobs and increase tax burdens. While natives may be more welcoming of high-status immigrants who are perceived as better able to contribute to the economy, negative views of low-status immigrants may serve to inhibit cross-ethnic solidarity towards these groups in particular.
A complementary mechanism relates anti-immigrant attitudes to concerns about criminality (Fitzgerald, Curtis and Corliss, 2012). Such concerns may be particularly relevant in our experiment, in which the decision to render assistance introduces a risk of one's phone being stolen. Prosocial behaviour in our context thus involves an important element of trust in confederates' benign intentions. Moreover, such trust may be particularly lacking with respect to low-status immigrants who are more likely to be associated with stereotypes about criminal behaviour (Ward, 2019), or who may otherwise be perceived as having a greater incentive to steal the phone. To the extent that such beliefs manifest in a reluctance to help others in strategic situations, this perspective also suggests that prosociality will be particularly inhibited towards low-status immigrant groups:
H2: Beyond a general anti-foreigner bias, natives are less prosocial towards members of low-status immigrant groups than towards high-status groups.
We wish to highlight here that our hypotheses concern the behaviour of natives only. In contrast, we make no predictions about the behaviour of immigrants, even though arguments about the ethnic dimension of in-group favoritism and ethnic competition have been applied to both majority and minority groups outside of Europe. 6 While we acknowledge the importance of immigrants' contribution to the overall pattern of social cohesion in multi-ethnic communities, we believe that there are important conceptual reasons for focusing on natives' behaviour in the context of the European immigration debate. More specifically, though natives may readily differentiate fellow natives from immigrants, the precise shape of group boundaries is less clear a priori from a non-native perspective. For example, non-natives may view themselves as members of (i) an encompassing 'immigrant' social category, (ii) distinct ethnic or national groups (e.g. 'Tamils'), or (iii) some intermediate grouping such as 'Southern Europeans' (Wimmer, 2004). In some cases, more established immigrants may even consider natives as part of their own in-group. Given the unclear location of group boundaries with respect to immigrants, we choose to focus our attention on the behaviour of natives alone in testing the more general theoretical ideas discussed above.

Experimental Protocol
Our field experiment was conducted on two underground platforms in Zurich's Central Train Station. Confederates approached single commuters waiting on the platform 7 and explained that they had just missed their train and were consequently going to be late for a local appointment. Further, confederates stated that they wished to phone ahead to alert their meeting partner of their tardiness, but unfortunately their own phone had just run out of power. After telling this 'cover story', confederates showed commuters a piece of paper on which were written a name and local landline number. 8 Finally, confederates requested to borrow the commuter's cell phone to place the call.
A research assistant stood approximately three meters away from this interaction and discreetly recorded commuters' responses using a smartphone app. We coded as prosocial any behaviour ranging from handing over one's phone, to offering to call on the confederate's behalf, to soliciting the aid of third persons. We also coded whether commuters were holding a cell phone prior to being approached by our confederate, as it would ostensibly be harder to justify turning down a request for assistance in such circumstances. 9 Finally, research assistants were instructed to collect additional information on the gender and approximate age of the commuter, and to make their best guess as to the commuter's nationality or ethnic background based on accent and physical appearance. We use this latter information to identify the subsample of native Swiss commuters which forms the core of our analysis.
In all, we recruited seven professional actors (five female and two male) as confederates for our study. Six of the seven actors looked to be of working age (30-50 years old), while one actor was of retirement age (70 years old). Actors' profiles are provided in Supplementary Figure S1. Confederates were instructed to dress 'naturally' such that commuters would feel comfortable when approached; however, clothing could vary slightly depending on the actor and day of the experiment. Interventions were staged in the morning between 8:00 and 11:00 and in the afternoon between 15:30 and 18:30 on various weekdays (Monday to Friday) over the period 15 May to 6 September 2018 (see Supplementary Tables S1-S3). We instructed confederates to conduct an intervention only if a train departure was not imminent in order to avoid cutting short the interaction. After each intervention, research assistants approached commuters for debriefing. Commuters were informed that, should they so wish, it was possible to delete their data from our analysis (only two people requested we do so). Our experimental procedures were approved by the University of Zurich's Institutional Review Board.

Treatment Conditions
Confederates were instructed to approach commuters using either (i) Swiss-German dialect (Schweizerdeutsch), (ii) High German (Hochdeutsch), which corresponds to the 'standard' version of German spoken in Germany, or (iii) imperfect German with a detectable accent. In online pretests, we determined that both Schweizerdeutsch and Hochdeutsch were easily recognizable by Zurich residents and readily associated with their respective national groups. In contrast, our online sample found it almost impossible to accurately distinguish between different 'immigrant' accents (e.g. Eastern European vs. Iberian vs. Turkish). As such, we allowed our confederates to freely use any immigrant accent in which they felt comfortable playing their role. 10
We wish to highlight that our experimental manipulation is designed to measure reactions towards different non-native groups. In principle, an alternative design could have investigated how prosociality is shaped by individual-level status signals (e.g. dressing up and down). We note however that in-group favoritism and group competition theory derive their predictions from group-level dynamics. In other words, the theory holds that individuals experience discrimination by virtue of their membership in a (high- or low-status) immigrant group, and not because they are perceived to be individually rich or poor. We thus opted for a group-level status manipulation in order to more faithfully capture the theoretical concepts of interest.
That said, our design is not without potential drawbacks, two of which we address here. First, commuters may not associate imperfectly spoken German with socio-economic disadvantage. To address this issue, we conducted a preliminary survey experiment with an online sample of Swiss train commuters from the Zurich region. Further details on the implementation of the survey can be found in Supplementary Section S2. Survey respondents were presented with a series of pictures of our confederates matched with real voice samples of the actors reading a set of simple phrases. The voice samples existed in three distinct versions, corresponding to our three linguistic treatments. Respondents listened to one randomly assigned voice sample from each confederate, and then rated that confederate in terms of socio-economic status. 11 This procedure allows us to estimate how perceptions of each confederate vary as a function of the dialects used by the actual actors in the field experiment. Figure 1A displays the distribution of socio-economic status ratings across the three dialects (n = 882 ratings provided by 126 native Swiss respondents). 12 We have standardized and doubly-demeaned the data by (i) each confederate's average rating elicited across all dialects, and (ii) the average rating provided by each respondent across all profiles. This allows us to focus on the effect of dialect independently of both respondent-specific characteristics and idiosyncratic factors related to individual actors. We observe that confederates are rated as having significantly lower status when speaking with an imperfect German accent, as compared against either Schweizerdeutsch (b = 0.77, P < 0.001) or Hochdeutsch (b = 0.67, P < 0.001). (The full regression models underlying these results are presented in Supplementary Table S6.) Further, the distributions of ratings attached to Schweizerdeutsch and Hochdeutsch are statistically indistinguishable from each other (b = 0.10, P = 0.26).
In summary, the survey experiment provides evidence that our linguistic treatments do indeed convey the intended status connotations.
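The standardization and double-demeaning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the tuple layout, and the toy ratings are hypothetical, and adding back the grand mean (the usual two-way within transformation) is our assumption, since the text does not say how the two demeaning steps were combined.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw ratings to z-scores (mean 0, standard deviation 1)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def doubly_demean(ratings):
    """ratings: list of (respondent, confederate, dialect, z_rating) tuples.

    Subtracts each respondent's mean and each confederate's mean, then adds
    back the grand mean (assumed two-way within transformation).
    """
    grand = mean(r[3] for r in ratings)
    by_resp, by_conf = {}, {}
    for resp, conf, _, z in ratings:
        by_resp.setdefault(resp, []).append(z)
        by_conf.setdefault(conf, []).append(z)
    resp_mean = {k: mean(v) for k, v in by_resp.items()}
    conf_mean = {k: mean(v) for k, v in by_conf.items()}
    return [(resp, conf, dialect, z - resp_mean[resp] - conf_mean[conf] + grand)
            for resp, conf, dialect, z in ratings]

# Hypothetical toy data (made-up ratings, NOT the survey data):
# two respondents each rate two confederates in different dialects.
raw = [("r1", "c1", "swiss", 6), ("r1", "c2", "accent", 3),
       ("r2", "c1", "accent", 4), ("r2", "c2", "swiss", 5)]
z = standardize([row[3] for row in raw])
toy = [(r, c, d, zi) for (r, c, d, _), zi in zip(raw, z)]
adjusted = doubly_demean(toy)
```

After the transformation, what remains in each rating is the part not explained by who gave the rating or which actor was rated, so differences between dialect groups can be compared directly.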
A second issue relates to the possibility that our group-level manipulation may also affect factors other than socio-economic status. This raises the potential concern that high-status Germans may indeed receive differential treatment, but not because of their status per se. In the context of intergroup relations, the most plausible alternative mechanism relates to the concept of cultural distance (Semyonov, Raijman and Gorodzeisky, 2006; Dancygier and Laitin, 2014). Specifically, since Germans could be considered culturally similar to native Swiss, they may be better liked, and thus elicit greater prosociality in comparison to other immigrants. That said, prior studies have argued that Swiss actually consider Germans to be a salient 'cultural threat' despite superficial similarities and dislike them as a consequence (see Helbling, 2011 and citations therein).
We test an implication of this cultural threat idea via an additional item drawn from our pre-experimental survey measuring the likability of confederates employing different dialects. 13 As before, we standardize and doubly demean the data. The results are presented in Figure 1B. We observe that in comparison to Schweizerdeutsch, confederates are rated as significantly less likable when employing either an imperfect German accent (b = −0.56, P < 0.001) or Hochdeutsch (b = −0.43, P < 0.001) (see also Supplementary Table S6). Further, while confederates are rated as slightly more likable when speaking Hochdeutsch compared to imperfect German, this difference is substantively small and only marginally significant (b = 0.13, P = 0.08). In other words, our survey indicates that our field experimental treatment consists mainly of manipulating perceptions of socio-economic status, while Germans' putative cultural similarity to Swiss does little to increase their likability over other immigrants.
With this information in hand, we proceeded to train the confederates in accordance with the aforementioned experimental protocol. Particular emphasis during training was placed on the relevance of displaying identical behaviour (e.g. in terms of cover story, body language or friendliness) across all treatments. We stressed that it was of utmost importance to avoid influencing the likelihood of receiving help by acting differently in each role. In addition, confederates were instructed to voice their request in a clearly comprehensible manner when using the imperfect German accent to mitigate concerns that native Swiss may be less helpful because they simply do not understand the confederate's request. 14 We believe that our employment of professional actors contributed significantly to the success of the training.
Confederates systematically rotated through all of the dialects according to a pre-specified schedule. Each confederate was assigned to work six separate 3-hour shifts, consisting of two shifts per dialect. Dialects were assigned to confederates at the beginning of each shift and were retained throughout the shift's duration. We implemented this procedure because we determined in pretests that switching dialects after every trial distracted confederates from focusing on other aspects of their role.

Figure 1. The figure shows the distribution of (standardized) socio-economic status and likability ratings from our pre-experimental survey of native Swiss train commuters from the Zurich region. Large and significant status differences exist between confederates employing an imperfect German accent versus either Schweizerdeutsch or Hochdeutsch. In contrast, the difference in likability between imperfect German and Hochdeutsch is substantively small and only marginally significant.

Data Description
Overall, we collected information on 1,198 experimental interventions. In the main text, we report on analyses using data from 863 trials involving commuters whom we identified as native Swiss based on appearance and accent. (A description of the full dataset is presented in Supplementary Table S7.) As discussed above, we focus on this restricted sample of native Swiss because we lack clear theoretical predictions about the behaviour of non-natives. Additionally, we do not have sufficient power to analyze non-native commuters separately. Nonetheless, we do replicate all of our analyses using the full sample (n = 1,198) for robustness (see Supplementary Table S12). Table 1 displays the number of interventions involving the subset of native Swiss commuters conducted in each of the three dialects: (i) Schweizerdeutsch (native), (ii) Hochdeutsch (high-status immigrant), and (iii) imperfect German (low-status immigrant). Overall, 50.5 per cent of the native Swiss sample is male, and the sample spans all age ranges. Approximately 48 per cent of commuters were observed to be holding a mobile phone when approached by confederates. Table 1 also compares the distribution of these characteristics across language treatments and displays corresponding P values from Pearson chi-squared tests. We see that there are no statistically significant differences in basic commuter characteristics across treatments, suggesting that overall our confederates did not systematically choose to engage with different types of commuters depending upon the dialect they adopted.

Results
Native Swiss commuters rendered assistance in 68 per cent of all interventions (586 out of 863 trials). Prosocial behaviour was elicited more frequently by confederates posing as natives (74 per cent), in comparison to trials employing any non-native accent (65 per cent). A chi-squared test reveals this difference to be statistically significant (n = 863, χ²(1) = 7.39, P = 0.007). We also estimate regression models of the likelihood of receiving help with actor fixed effects (see Supplementary Table S8). These models allow us to capture the average difference in helping rates between native and non-native treatments holding confederates' characteristics constant. Results are substantively similar (b = −0.10, P = 0.002) and provide evidence in support of H1: native Swiss are less prosocial towards immigrants than towards fellow natives.
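For readers unfamiliar with the test, a Pearson chi-squared comparison of two helping rates can be sketched as follows. This is an illustrative sketch only: the function name and the counts are hypothetical and are not the study's data, and we assume no continuity correction, which the text does not specify.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts, NOT the study's data.
# Rows: native vs. non-native treatment; columns: helped vs. not helped.
stat, p = chi2_2x2(30, 10, 20, 20)
```

Plugging in the actual cell counts from the field data would be needed to reproduce the reported statistic; those counts are not given in the text, so none are assumed here.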
To test H2, we examine natives' behaviour towards high- and low-status immigrants separately. The results are shown in Figure 2A. We see that commuters do indeed differentiate between different immigrant groups: confederates employing Hochdeutsch (simulating a high-status immigrant) were helped 69 per cent of the time, compared to 61 per cent in trials involving the use of imperfect German (simulating a low-status immigrant). (Recall the helping rate elicited by confederates posing as natives is 74 per cent.) To more rigorously examine these differences, we estimate linear probability models (LPM) of the likelihood of receiving assistance with confederate-level fixed effects. 15 Our main explanatory variables consist of dummies denoting whether the intervention took place in Hochdeutsch (high-status immigrant) or imperfect German (low-status immigrant), treating interventions conducted in Schweizerdeutsch as the baseline.

Notes to Table 1: The table lists the mean of each variable calculated for the sample of native Swiss commuters, as well as within each of the treatment groups. High-Status Immigrant is denoted by "High" and Low-Status Immigrant by "Low," respectively. To test that the variables are balanced across treatment groups, we also display the test statistic from Pearson chi-squared tests with two degrees of freedom and the associated P values. There are no statistically significant differences across treatments.
Model 1 of Table 2 displays the results of the basic fixed effects regression. We see that the coefficients on both high-status and low-status immigrant treatments are negatively signed. However, only the coefficient on low-status immigrant is statistically significant: on average, low-status immigrants are helped about 13.6 percentage points less than native Swiss (P < 0.001), while the penalty with respect to high-status Germans is only 5.6 percentage points (P = 0.147). The model also indicates a significant difference between the two immigrant groups of around 8 percentage points (P = 0.04; see Model 1 in Supplementary Table S10). Overall, we take these findings as evidence in support of H2: natives are less prosocial towards members of low-status immigrant groups than towards high-status groups.
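The logic of a linear probability model with confederate fixed effects can be illustrated with a short, dependency-free sketch. It is not the authors' estimation code: the function names, dialect labels, and toy trials below are hypothetical, and the sketch returns point estimates only (no standard errors). It uses the within transformation, i.e. it subtracts each confederate's means before solving the least-squares equations for the two treatment dummies.

```python
from collections import defaultdict

def lpm_fixed_effects(trials):
    """trials: list of (confederate, dialect, helped), with dialect in
    {"swiss", "high", "low"} and helped in {0, 1}.

    Returns (b_high, b_low): differences in helping probability relative to
    the Swiss-German baseline, net of confederate fixed effects.
    """
    groups = defaultdict(list)
    for conf, dialect, helped in trials:
        groups[conf].append((float(dialect == "high"),
                             float(dialect == "low"),
                             float(helped)))

    s_hh = s_ll = s_hl = s_hy = s_ly = 0.0
    for rows in groups.values():
        n = len(rows)
        mh = sum(r[0] for r in rows) / n
        ml = sum(r[1] for r in rows) / n
        my = sum(r[2] for r in rows) / n
        for h, l, y in rows:
            # Within transformation: deviations from the confederate's means.
            dh, dl, dy = h - mh, l - ml, y - my
            s_hh += dh * dh
            s_ll += dl * dl
            s_hl += dh * dl
            s_hy += dh * dy
            s_ly += dl * dy

    # Solve the 2x2 normal equations for the two dummy coefficients.
    det = s_hh * s_ll - s_hl ** 2
    b_high = (s_ll * s_hy - s_hl * s_ly) / det
    b_low = (s_hh * s_ly - s_hl * s_hy) / det
    return b_high, b_low

def make_trials(conf, helped_by_dialect, n=10):
    """Hypothetical helper: n trials per dialect with the given helped counts."""
    out = []
    for dialect, k in helped_by_dialect.items():
        out += [(conf, dialect, 1)] * k + [(conf, dialect, 0)] * (n - k)
    return out

# Hypothetical toy data, NOT the study's data: confederate A is helped more
# often than B overall, but both face the same penalties per dialect.
toy = (make_trials("A", {"swiss": 8, "high": 7, "low": 6})
       + make_trials("B", {"swiss": 6, "high": 5, "low": 4}))
b_high, b_low = lpm_fixed_effects(toy)
```

In this toy example the two confederates differ in their baseline helping rates, yet the estimated penalties (10 and 20 percentage points) are unaffected, which is the sense in which fixed effects hold confederates' characteristics constant.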
Model 2 of Table 2 adds controls for the gender and age of commuters, as well as the date and time at which interventions took place (for brevity, we do not display these coefficients in the main text; interested readers are referred to Supplementary Table S9). In Model 3, we additionally control for whether the commuter was holding a mobile phone when approached by the confederate. This coefficient is positive and highly significant, suggesting that it is indeed more difficult to turn down requests for assistance under these circumstances. 16 Comparing Model 1 to Models 2 and 3, the empirical picture is slightly altered as the coefficient on high-status immigrant increases in size and reaches marginal statistical significance in Model 3. In contrast, the difference between the high-status and low-status immigrant treatments shrinks slightly, and its statistical significance falls just outside the 10 per cent level (see Supplementary Table S10). We stress, however, that the main message from Model 1 remains unchanged by the inclusion of covariates: we detect a robust anti-foreigner bias for low-status immigrants, and a weaker and more fragile ethnic penalty for high-status Germans.

Notes to Table 2: The table lists coefficient estimates from linear probability models with t-statistics in parentheses (+P < 0.1, *P < 0.05, **P < 0.01, ***P < 0.001, for two-sided tests). All models are estimated with confederate fixed effects. Models 2 through 4 include controls for commuters' gender and approximate age, as well as the month, day of the week, and time of day during which the intervention was conducted. Full results are reported in Supplementary Table S9.
We conduct additional exploratory analyses to examine whether our treatment effects themselves may vary by whether commuters were holding a phone when approached by confederates. We believe that the visible presence of a cell phone may moderate our results insofar as it is easier to discriminate or behave uncivically if one can plausibly deny having the ability to help (e.g. by claiming not to carry a phone). We note, however, that this aspect of the analysis was not a part of our original experimental design, and thus we did not block treatment assignment on whether a phone was visible. Nonetheless, approximately half of all interventions occurred under such circumstances, and the proportion of commuters carrying phones is roughly similar across our treatment conditions (see Table 1).
Model 4 of Table 2 includes an interaction between our linguistic treatments and a dummy variable denoting whether the commuter was holding a phone. The treatment coefficients are now interpreted as the effect of dialect in the subset of interventions where no phone was visible (n = 449). Under these circumstances, we estimate that high-status and low-status immigrants are about 15 and 23 percentage points less likely to receive assistance, respectively (see Figure 2B). Both effects are statistically significant and substantively larger than the pooled results reported in Models 1 through 3. The positive and significant interaction effects reported in Model 4 indicate that the anti-foreigner penalty is mitigated in the subset of interventions where a cell phone was visible (n = 414). As shown in Figure 2C, helping rates are not significantly different across treatments under these circumstances. This is also confirmed in a parallel regression where we set phone visible as the baseline category (see Supplementary Table S11).
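To make the reading of the Model 4 coefficients explicit, the interaction specification can be sketched as follows. The notation is ours and purely illustrative (it does not appear in Table 2): Help is the binary helping outcome for commuter i approached by confederate c, alpha_c is the confederate fixed effect, High and Low are the treatment dummies, Phone indicates a visible phone, and x_i collects the controls.

```latex
% Illustrative notation only; the symbols are assumptions, not taken from Table 2.
\begin{equation*}
\begin{aligned}
\mathrm{Help}_{ic} ={}& \alpha_c + \beta_1\,\mathrm{High}_i + \beta_2\,\mathrm{Low}_i + \beta_3\,\mathrm{Phone}_i \\
 &+ \beta_4\,(\mathrm{High}_i \times \mathrm{Phone}_i) + \beta_5\,(\mathrm{Low}_i \times \mathrm{Phone}_i)
 + \mathbf{x}_i'\boldsymbol{\gamma} + \varepsilon_{ic}
\end{aligned}
\end{equation*}

% Implied treatment effects relative to native Swiss confederates:
\begin{align*}
\text{no phone visible } (\mathrm{Phone}_i = 0):&\quad \beta_1 \ \text{(high-status)}, \qquad \beta_2 \ \text{(low-status)} \\
\text{phone visible } (\mathrm{Phone}_i = 1):&\quad \beta_1 + \beta_4 \ \text{(high-status)}, \qquad \beta_2 + \beta_5 \ \text{(low-status)}
\end{align*}
```

Under this reading, the estimates of roughly 15 and 23 percentage points correspond to beta_1 and beta_2, and positive interaction terms beta_4 and beta_5 pull the implied effects back towards zero when a phone is visible. In the same notation, beta_3 captures the effect of phone visibility on helping towards native Swiss confederates.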
To summarize, our analysis yields evidence in support of both H1 and H2. Native Swiss are less prosocial towards immigrants, and this effect is driven by particularly low helping rates elicited in the low-status condition. Additionally, we explored the extent to which our treatments are moderated by situational factors which plausibly affect the 'costs' of discrimination (for example, by making it harder to turn down a request for assistance). We find that the (limited) anti-foreigner discrimination we detect in the main analysis is magnified in cases where commuters can plausibly justify their decision not to help. In contrast, when an easy justification is unavailable, treatment effects disappear entirely. Interestingly, phone visibility has no effect on prosociality towards native Swiss, as shown by the substantively small and statistically insignificant coefficient on the Phone Visible dummy in Model 4 of Table 2. Our interpretation is that situational factors do not so much influence prosocial decision-making per se (cf. Dana, Weber and Kuang, 2007), but rather seem to moderate specifically the extent to which anti-foreigner bias manifests in individual behaviour (Merton, 1948;Crosby, Bromley and Saxe, 1980;Duckitt, 1992).
Finally, we conduct a battery of additional robustness checks and briefly report on the results here. Full tables are available in the Supplementary Materials. First, we replicate our results using the full dataset of 1,198 interventions 17 in place of the reduced native Swiss sample (Supplementary Table S12). Along these lines, we also examine separately the behaviour of non-native commuters (Supplementary Table S13). While we lack sufficient observations to draw meaningful inferences (n = 310), an exploratory analysis suggests that non-native commuters actually seem to reproduce the discriminatory patterns we observe amongst natives (although none of the coefficients reach conventional significance levels). In particular, the direction of the coefficients indicates that discrimination is targeted against low-status immigrants even within the restricted non-native sample. We will return to the substantive implications of these preliminary results in the concluding discussion.
In additional robustness checks, we replicate our main analysis using logistic regressions instead of the LPM (Supplementary Table S14). We also re-run our analyses using the decision to physically hand over one's phone to the confederate as an alternative operationalization of the dependent variable (Supplementary Table S15). Arguably, this decision provides stronger evidence of prosociality, as it involves elements of both altruism and trust (e.g. that the confederate will not run away with the phone). Finally, we check the sensitivity of our results to the influence of individual confederates by dropping confederates one at a time from our analysis (Supplementary Figure S2). None of these changes appreciably alters our conclusions.

Addressing Additional Concerns
In this section, we discuss additional issues pertaining to the internal and external validity of our findings. First, Heckman and Siegelman (1993) have expressed concerns that confederates in field experiments may privately infer the purpose of the research and consequently alter their behaviour to subtly influence the results. While we cannot definitively rule out this possibility, we stress that our training emphasized the importance of maintaining consistent behaviour across all trials. We further highlight that our experiment employed multiple confederates, such that biases introduced by a single individual are unlikely to tilt the overall results. Finally, Pager (2003) attempts to quantify the scope of Heckman and Siegelman's critique in the context of employment discrimination by comparing trials employing real actors versus fictitious resumes (where there was no scope for confederates to influence the results). Pager actually finds lower discrimination in cases of direct interaction, which is the opposite of what Heckman and Siegelman's critique would lead one to expect. Together, we believe that these considerations help to mitigate related concerns in the context of our study.
A second issue relates to the generalizability of our findings across situational domains. More specifically, the present study has examined prosociality in the context of a strategic interaction wherein helping the confederate introduces a risk of exploitation (e.g. by having one's phone stolen). We believe that such situations are inherently different from more unilateral 'altruism' scenarios represented by donations to charity or behaviour in a dictator game where little scope for opportunism exists. In the latter, notions of fairness may be highly salient, leading individuals to display greater prosociality towards low-income targets (Katz, Cohen and Glass, 1975;Liebe and Tutic, 2010;Van Doesum, Tybur and Van Lange, 2017). In contrast, such fairness concerns are absent from our study, where the 'need for help' is constant for members of both high-status and low-status immigrant groups. Instead, our experimental context may have increased the salience of stereotypes associating low-status immigrants with criminality. In such situations, we find greater discrimination against members of low-status groups. However, we acknowledge that the specific setting in which we situate our study may limit the scope of our findings and that status considerations may operate differently in other domains.
Finally, we return to an issue inherent in our decision to manipulate status at the level of groups rather than individuals. We have motivated this design choice as conceptually appropriate given our theoretical framework, but we acknowledge that it potentially compromises our ability to causally identify a status effect, as status could be correlated with other group-level differences between Germans and other immigrants. We have attempted to mitigate these concerns by drawing upon our pre-experimental survey results as well as related literature (Helbling, 2011). However, future work could build upon our design to address these issues more definitively (e.g. by manipulating status simultaneously at the individual and group levels).

General Discussion
Our article contributes to a large body of research on the consequences of diversity for social cohesion by presenting a direct test of the oft-cited, though rarely examined, proposition that prosocial behaviour in multiethnic settings is ethnically-bounded. Results from a field experiment involving Swiss train commuters demonstrate evidence of bias against both high-status and low-status immigrant groups, although the ethnic penalty is substantially larger and statistically more robust for the latter. Further exploratory analyses indicate that our results are driven by the subset of interventions in which commuters could plausibly justify withholding assistance, suggesting that situational factors shaping the 'costs' of discrimination play an important role in moderating patterns of anti-immigrant bias.
One important implication of these findings is to highlight variation in discrimination against different ethnic minority groups. These differences are often overlooked in extant research on the consequences of diversity which tends to treat all immigrants in monolithic, undifferentiated terms. Our work indicates that such differences not only matter for shaping immigration-related attitudes (Hainmueller and Hangartner, 2013;Hainmueller and Hopkins, 2015;Turper et al., 2015;Bansak, Hainmueller and Hangartner, 2016;Czymara and Schmidt-Catran, 2017;Hellwig and Sinno, 2017;Diehl et al., 2018;Ward, 2019), but also hold real behavioural consequences in interpersonal encounters. More broadly in relation to the ethnic diversity literature, our results suggest that individuals' tendency to condition prosocial behaviour on ethnicity may indeed contribute to the oft-cited negative association between diversity and collective outcomes, although the anti-foreigner penalty seems to be largely driven by natives' adverse reactions towards stereotypically low-status immigrants.
Recognizing the importance of differentiating between immigrant groups also implies a methodological rethinking of how scholars choose to operationalize diversity in empirical research. Currently, the most common approach is to measure diversity using indexes of ethnolinguistic fractionalization (ELF). 18 By construction, however, ELF is 'color-blind', in that a neighbourhood which is composed of 70 per cent Swiss and 30 per cent Germans is considered identical to a neighbourhood composed of 70 per cent Swiss and 30 per cent Albanians (for similar critiques, see Abascal and Baldassarri, 2015;Kustov and Pardelli, 2018;Winter and Zhang, 2018). Yet, our results suggest that patterns of prosocial behaviour would be quite different across these areas. Capturing these differences would require researchers to move beyond aggregate diversity indexes and focus instead on the specific composition of foreign residents in multiethnic communities (Bécares et al., 2011;Laurence, 2011;Bakker and Dekker, 2012;Gundelach and Traunmüller, 2014;Kustov and Pardelli, 2018).
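The 'color-blind' property can be illustrated with a short calculation. The sketch below assumes the standard Herfindahl-based form of the index (one minus the sum of squared group shares), which matches the interpretation of ELF as the probability that two randomly drawn residents belong to different groups. The function name `elf` and the two neighbourhood dictionaries are hypothetical and serve only to restate the 70/30 example from the text.

```python
def elf(group_counts):
    """Ethnolinguistic fractionalization: 1 minus the sum of squared group shares.

    Assumes the standard Herfindahl-based definition; group labels are ignored,
    only the relative sizes of the groups enter the index.
    """
    total = sum(group_counts.values())
    return 1.0 - sum((count / total) ** 2 for count in group_counts.values())


# Hypothetical neighbourhoods restating the example in the text (shares in per cent).
neighbourhood_a = {"Swiss": 70, "German": 30}
neighbourhood_b = {"Swiss": 70, "Albanian": 30}

elf_a = elf(neighbourhood_a)
elf_b = elf(neighbourhood_b)

# Both neighbourhoods receive the same score: 1 - (0.7^2 + 0.3^2) = 0.42
print(round(elf_a, 2), round(elf_b, 2))
```

Because the index depends only on group sizes, any measure intended to capture the status differences discussed here would need to take the identity of the groups as an input, not just their shares.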
A secondary implication relates to our finding that discrimination is only discernible in interventions where commuters could plausibly deny carrying a mobile phone. In these circumstances, it appears that situational factors provide an opening to engage in discriminatory behaviour. This explanation resonates with seminal findings from sociology and social psychology showing that prejudicial attitudes are more likely to manifest in behaviour when the costs of discriminating are low (Merton, 1948;Crosby, Bromley and Saxe, 1980;Duckitt, 1992). In contrast, when the costs are high (as in the case of commuters holding cell phones), social desirability may pressure individuals to act civically despite their private inclinations to the contrary. Our results therefore highlight the role of situational factors in mediating the mapping between preferences (in this case, to avoid helping foreigners) and actions. Building from this finding, future research could more fully explore additional influences on the 'costs' of discrimination which may potentially inhibit the expression of anti-foreigner sentiment in native-immigrant encounters.
Future work may also extend the present study by examining how immigrants' behaviour is influenced by the ethnicity of their interaction partners. More research taking account of immigrants' perspectives is needed since immigrants make up a large proportion of potential interaction partners in multiethnic neighbourhoods and thus contribute significantly to overall patterns of solidarity and cooperation. Moreover, it is possible that immigrants act more prosocially towards other immigrants, thereby partially compensating for the negative reactions of natives and cushioning the overall detrimental effects of diversity on collective outcomes. 19 That said, our exploratory analysis suggests that this is not the case: if anything, non-native commuters appear to reproduce the discriminatory patterns we observe amongst natives. These findings resonate instead with research in social psychology showing that the ethnic or racial hierarchies articulated by the dominant group tend to become embedded in society more broadly and even accepted by members of subordinate groups (Hagendoorn, 1995;Sidanius and Pratto, 2001;Snellman and Ekehammar, 2005). Future research could address these issues more definitively by employing larger non-native samples.
The present paper sidesteps these issues by examining how ethnicity shapes natives' behaviour. Accordingly, our analysis focuses on anti-immigrant discrimination as a key challenge to the cohesiveness of contemporary multiethnic societies. Here, it is important to highlight that the discrimination we document occurs in the context of anonymous, one-shot interactions. However, research drawing from theories of intergroup contact (Allport, 1954;Pettigrew and Tropp, 2006) has shown that the negative effects of diversity can be significantly mitigated via meaningful and sustained cross-ethnic interaction (Marschall and Stolle, 2004;Stolle, Soroka and Johnston, 2008). By extension, future research might fruitfully investigate whether the patterns of discrimination we uncover also obtain in other types of encounters (e.g. between coworkers, schoolmates, or neighbours) which are neither anonymous nor one-shot.
Finally, although we provide evidence in support of a key theoretical mechanism linking immigration to undesirable collective outcomes, we do not read our results to advocate for the benefits of ethnic homogeneity over diversity. Importantly, it is widely acknowledged that diversity contributes positively to economic growth, innovation, and competitiveness, and that immigration to advanced-industrial countries is needed to offset the impending fiscal effects of aging populations (Surowiecki, 2005;Putnam, 2007;Page, 2008;Lorenz et al., 2011). We believe that it is vitally important to keep sight of these benefits in the current debate about the consequences of diversity for contemporary European societies.

Notes
1 Other studies do test this proposition as applied to inter-group relations more broadly via behavioural experiments involving inter alia Ashkenazi versus Eastern Jews in Israel (Ferschtman and Gneezy, 2001); Muslims, Croats, and Serbs in Bosnia (Whitt and Wilson, 2007); Ugandan ethnic groups (Habyarimana et al., 2007); and blacks and whites in the United States (Abascal, 2015;Simpson, McGrimmon and Irwin, 2007).
2 In addition to studies focusing directly on prosociality, there is also a vast field experimental literature on discrimination in housing and employment (see Auspurg, Schneck and Hinz, 2019;Zschirnt and Ruedin, 2016).
3 For example, Posner (2004) shows how the activity of political entrepreneurs renders the cultural cleavage between the Chewa and Tumbuka peoples a highly salient ethnic boundary in Malawi, while the same cultural division holds little significance in neighbouring Zambia where a different political calculus prevails.
4 For instance, the passage of the referendum 'against mass immigration' (Eidgenössische Volksinitiative 'Gegen Masseneinwanderung') in 2014 was targeted primarily towards limiting the free movement of EU citizens to Switzerland.
5 However, this effect only appears when the envelopes contained money, such that finders of the lost letters had an incentive to keep the mail. Without financial incentives, the return rates for Muslim and Swedish recipients were similar. Similar (null) results are reported from un-incentivized lost letter experiments in Berlin (Koopmans and Veit, 2014) and Zurich (Diekmann et al., 2014).
6 In addition to the studies listed in Footnote 1, see also Abascal (2015) and Bobo and Hutchings (1996).
7 Specifically, confederates were instructed to alternate between platforms after every intervention. This procedure was designed to ensure that a new trial was not begun on the same platform until the train arrived and the platform was cleared of passengers. Upon entering the platform, confederates initiated the intervention by approaching the first single person they encountered.
8 The name of one of the co-authors was used. The number we showed corresponded to an actual landline at the University of Zurich. However, the phone was physically disconnected in order to avoid registering and recording incoming calls. In order to ensure that commuters understood that the call would be placed to a local number, confederates were explicitly instructed to state their request to borrow a phone only after clarifying that they had a local appointment. This procedure was designed to ensure that commuters would not worry about incurring potentially high costs for a non-local call. This was particularly relevant in trials involving low-status immigrants where commuters could otherwise have been apprehensive about a potentially expensive phone call abroad.
9 Although the vast majority of Swiss residents own mobile phones (Y&R Group, 2017), we cannot exclude the possibility that commuters not holding phones when approached by confederates may genuinely not have a phone with them. This provides an additional reason to control for phone visibility in our analyses. We are grateful to an anonymous reviewer for raising this point.
10 Importantly, we instructed confederates to avoid using either Italian or French accents, as these could be associated with autochthonous language groups from other parts of Switzerland.
11 We employed a version of the common MacArthur Scale of Subjective Social Status, in which respondents were presented with a picture of a 10-step ladder, along with the following text: 'Think of this ladder as representing where people stand in Switzerland. At the top of the ladder are people who are the best off - those who have the most money, the most education and the most respected jobs. At the bottom are the people who are the worst off - who have the least money, least education, and the least respected jobs or no job. The higher up one is on this ladder, the closer they are to the people at the very top; the lower one is, the closer they are to the people at the very bottom. Where would you place [the confederate] on this ladder?'
12 Replication data and code for all analyses reported in the main text and Supplementary Materials are available through the Open Science Framework at https://osf.io/9tnmf/. DOI 10.17605/OSF.IO/9TNMF
13 We adapted this question from the American National Election Survey feeling thermometer. Specifically, respondents were asked to indicate their feelings towards confederates using a thermometer degree measure. Higher scores represent warm or favourable feelings, and lower scores represent cold or unfavourable feelings.
14 To directly examine this possibility, we conducted online pretests in which respondents were exposed to a short, randomly selected sound sample employing either (i) Schweizerdeutsch, (ii) Hochdeutsch, (iii) an Italian accent, (iv) a Spanish accent, (v) an Arabic accent, or (vi) an Eastern-European accent. Respondents were then presented with a list of statements and asked to select the statement corresponding to the sound sample they just heard. Roughly 80-90 per cent of respondents identified the correct statement regardless of the accent employed in the sound sample, and a chi-squared test reveals no significant differences across the six treatment conditions (results not shown). Thus, based on our pretests, we do not believe that commuters would have problems understanding confederates in the field simply because a foreign accent was used.
15 Following Mood (2010), we opt for LPM in the main text to facilitate the presentation of our results. Supplementary Table S14 replicates our main results using logistic regressions.
16 Strictly speaking, lower helping rates obtaining in the 'Phone not visible' condition could also result from the fact that commuters might genuinely not have a phone with them. See footnote 9.
17 In addition to 863 interventions involving native Swiss commuters, we also coded 310 interventions involving non-Swiss commuters, as well as 25 interventions where we could not confidently assess the commuter's background.
18 ELF is commonly interpreted as measuring the probability that two randomly drawn individuals from a population will belong to different ethnic groups (Fearon, 2003).
19 We are grateful to an anonymous reviewer for raising this point.

Supplementary Data
Supplementary data are available at ESR online.