Abstract

We use a natural field experiment to estimate the causal effect of race on discretionary favours in the marketplace. Test customers are randomly assigned to board public buses with no money to purchase a fare, leaving the bus driver to voluntarily decide whether to offer them a free ride. Based on 1,552 transactions, we uncover strong evidence of racial bias: bus drivers were twice as willing to let white testers ride free as black testers (72% vs. 36% of the time). Signals of wealth and patriotism improve minority testers’ outcomes. Our results show that white privilege extends to unregulated daily interactions.

A grocery store worker is not allowed to hand over goods unless the customer has made a formal payment, a transaction that is also the norm in most societies. A bank manager is required to reject all loan applicants below the minimum credit score. Similarly, a bus driver is obliged by company rules to make sure that all boarding passengers have a valid ticket before allowing them onto the bus. And a police officer is required to issue a monetary fine to any motorist exceeding the speed limit. Yet, when they have to make many unmonitored decisions, do such decision makers voluntarily provide favours? And, if so, do they reward and accommodate some citizens relatively more than others?

This study tests for discretionary favours, i.e., private accommodations, in everyday consumer transactions. In our audit study, hired test customers are randomly assigned to board public buses where they present a travel card with a zero monetary balance and subsequently ask the bus driver if they can have a free ride to a bus stop that is an average distance away. While the public bus company’s official rules and policies discourage employees from providing a service free of monetary charge, we find that close to two-thirds of the observed bus drivers (N = 1,552) do actually grant such favours, and predominantly to lighter-skinned customer groups.1

The possibility and presence of such hidden and unregulated discriminatory gifts in the marketplace was noted almost two decades ago by Ian Ayres (2001), when he surmised that ‘retailers’ willingness to make private accommodations may be an important locus of disparate racial or gender treatment’ (p. 8). If so, the degree of discrimination and unequal treatment in society is largely understated. Moreover, it becomes difficult for discriminated citizens to complain about missing out on everyday favours even when that does lead to significant economic and social losses (see Small and Pager, 2020). Society as a whole can decide to adopt norms and regulations that promote equal treatment, but to do so it needs evidence. Therefore, uncovering the extent and determinants of such discretionary accommodations in real-world transactions is important for both scholars and policymakers.

Our tests reveal strong evidence of racial discrimination: bus drivers were twice as willing to let casually dressed white testers ride free as casually dressed black testers (72% vs. 36% of the time). Indian testers were accepted at 51%, while Asian testers were treated similarly to whites, being offered a free ride 73% of the time. Such racial bias against black citizens still exists after we control for a number of other decision variables including the subject’s age, gender and race. Based on the dyadic data, we find no evidence of own-group bias: bus drivers were just as likely to grant free rides to other-race customers as they were to own-race customers. However, black testers were rejected at the highest relative rate by white bus drivers, and the least by black drivers, who were overall race neutral in their decisions.
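The distinction between a gap in percentage points and a relative gap can be made concrete with a short calculation. The following is a minimal sketch that uses only the baseline acceptance rates quoted above; the function name `gap_vs_reference` and the choice of white testers as the reference group are illustrative assumptions, not part of the study's analysis code.

```python
# Baseline free-ride rates for casually dressed testers, as quoted in the text.
baseline_rates = {"white": 0.72, "Asian": 0.73, "Indian": 0.51, "black": 0.36}


def gap_vs_reference(rates, reference="white"):
    """Return {group: (percentage-point gap, rate ratio)} relative to `reference`.

    The helper name and the reference group are illustrative assumptions.
    """
    ref = rates[reference]
    return {
        group: (round((ref - rate) * 100, 1), round(ref / rate, 2))
        for group, rate in rates.items()
        if group != reference
    }


gaps = gap_vs_reference(baseline_rates)
for group, (pp_gap, ratio) in gaps.items():
    print(f"white vs {group}: {pp_gap:+.1f} percentage points, ratio {ratio:.2f}")
```

For the white and black groups this gives a gap of 36 percentage points and a ratio of 2, which is the sense in which white testers were "twice as" likely to ride free.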

Our experimental design is simple and exploits an often-occurring standard situation: 29 trained testers, from four distinct racial and ethnic groups (white, black, Asian, Indian), followed an extremely short script by which they were instructed what to say in a single sentence. They were told to refrain from emotive nonverbal behaviour, with random checks implemented by other experimenters to ensure that these instructions were followed. With around 32,000 bus drivers employed across the studied region in Queensland, Australia; 72.8 million bus trips recorded annually; and 63,000 weekly available bus services, it was possible to conduct a large number of repeated interactions without running the risk of substantially altering the overall decision environment.

A key feature in our field experiment is that the bus drivers have only a few seconds to make a decision regarding a person standing right in front of them. This makes it ideal for examining whether statistical reasoning (Phelps, 1972; Arrow, 1973) is used when making ultimate judgements about others: if racial animus is the main motive, then variations in the attire or displayed status of the test customers should not matter. On the other hand, if statistical reasoning is used, whereby a bus driver uses a customer’s race to proxy other unobservable but relevant characteristics, then such added information could make a notable difference to any observed bias.

Even though no money changes hands, there is still the possibility of bus drivers using racial appearance to make inferences about expected profitability. The driver may believe that some customer groups will pay the full fare if he or she insists, and could therefore estimate the likelihood of this happening based on the customer’s skin colour. This is broadly in line with decision elements present in a number of related studies where actual monetary quotes and payments (Ayres and Siegelman, 1995; List, 2004; Balafoutas et al., 2013; Castillo et al., 2013; Li et al., 2018) or worker evaluations (Booth et al., 2012; Leibbrandt and List, 2014; Hedegaard and Tyran, 2018) are made.

Moreover, bus drivers might rely on a customer’s skin colour to make predictions about other personal characteristics including the latent level of trustworthiness, aggressiveness and overall worthiness. Such individual traits could be of some importance given, for instance, the frequent antisocial behaviour witnessed and reported by bus drivers in this region.2 Bus drivers might then, for example, grant more free rides to white customers if they believe them to be more honest than minorities. The relative level of bias would then derive from the weight given to the social norm to reward honesty versus the weight given to the norm of being blind to race. Recent field evidence suggests that perceived trust plays an important role in discriminatory behaviour. Doleac and Stein (2013) find that respondents to online advertisements from black sellers tend to display lower levels of trust: they are less likely to share their name, less likely to allow delivery by mail and more likely to express concern about making a long-distance payment. Similarly, Zussman (2013) shows ethnic bias to be strongly correlated with the belief that members from a different group are more likely to cheat. And, Cettolin and Suetens (2018) report ethnic minority group members to be less trustworthy than majority citizens in an experimental giving context.

In the same way that a car salesman might gauge a buyer’s likelihood to push for a lower price (Ayres and Siegelman, 1995), bus drivers may use race to predict the likelihood that a rejected customer would retaliate or argue with them. Perhaps they might believe that minorities are used to rejection and will not make a scene if turned away, but that white citizens, and especially those of higher status, would push back and waste more of the driver’s time. Seidel et al. (2000) show that black employees tend to negotiate significantly lower salary increases than white employees. In the present context, it would then be relatively low-cost for bus drivers to reject a black customer, but high-cost to reject a white customer.3

If such statistical concerns are a key reason for discretionary accommodations, we should then observe different acceptance rates once we add positive signals of socio-economic status and trust. Also, we should be able to explain the captured variation in free rides by measuring some of the observable traits inherent in our specific group of hired testers. These signals, either added or observed, should have no real impact on decisions and group outcomes in the presence of bus driver animosity based on pure psychological distaste (Becker, 1957).

In terms of measuring tester appearances, we conducted a survey in which we asked an independent group of observers to rate actual photographs of our testers on levels of attractiveness, trustworthiness and aggressiveness. We then add these subjective ratings to our formal regression equations to isolate their effect from that of race. In terms of direct signals, our audit methodology allows us to manipulate the economic status and patriotism levels of customers. First, to increase perceived wealth, some of the studied interactions involved testers wearing business suits and carrying a briefcase. Second, we had some testers dress in a military costume for a subset of their interactions, which we interpret as signals of patriotism and trustworthiness. The outcomes from each of these treatment variations are then compared to those from the baseline treatment in which testers simply wore plain casual clothing.

Compared to baseline interactions, black testers are granted 85% more free rides when they wear a business suit, and 115% more free rides when they wear a military uniform. However, these signals are not enough to make bus drivers accommodate black customers to the same extent as white customers, with black testers dressed in business or military attire being treated similarly to casually dressed whites. For example, testers wearing army uniforms were allowed to ride free 97% of the time if they were white, but only 77% of the time if they were black. On the other hand, initial disparities between Indian and white testers are completely eliminated following improved signals of wealth and patriotism. These strong treatment effects, coupled with the fact that both black and white drivers prefer white customers, suggest that the observed racial bias against blacks is largely driven by statistical judgements on socio-economic status and worthiness presumed from their skin colour, rather than racial animus generated by their skin colour.
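As a rough consistency check on these treatment effects, the reported relative increases can be applied to the baseline rate for black testers. This is only a back-of-the-envelope sketch: it assumes the 36% baseline and the 85% and 115% uplifts are the rounded figures quoted in the text, so the implied rates are approximate, and the names `implied_rate` and `REPORTED_UPLIFT` are hypothetical.

```python
# Rounded figures quoted in the text; the implied rates are therefore approximate.
BASELINE_BLACK = 0.36                      # casual clothing, black testers
REPORTED_UPLIFT = {"business suit": 0.85,  # "85% more free rides"
                   "military uniform": 1.15}  # "115% more free rides"
WHITE_MILITARY = 0.97                      # white testers in army uniform


def implied_rate(baseline, relative_increase):
    """Acceptance rate implied by a relative increase over the baseline rate."""
    return baseline * (1 + relative_increase)


implied = {
    treatment: implied_rate(BASELINE_BLACK, uplift)
    for treatment, uplift in REPORTED_UPLIFT.items()
}

for treatment, rate in implied.items():
    print(f"black testers, {treatment}: about {rate:.0%}")

# Remaining gap in the military treatment, in percentage points.
remaining_gap_pp = (WHITE_MILITARY - implied["military uniform"]) * 100
print(f"white minus black (military): about {remaining_gap_pp:.0f} points")
```

The implied military-uniform rate of roughly 77% matches the figure reported for black testers in army uniforms, and leaves a gap of about 20 percentage points to white testers in the same attire.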

To shed further light on the psychology behind subject decisions, and quantify potential differences between revealed and stated preferences, in the final part of the study we conduct a complementary survey of random bus drivers at appropriate resting stations across the city. The self-reported survey depicted a hypothetical version of the same interaction as in the field experiment, whereby each bus driver was shown a photograph of a test customer and asked if they would be willing to voluntarily accept the individual onto the bus without any monetary payment. The survey results strongly contradict the actual behaviour observed in the field: more minority testers, namely blacks and Indians, were offered hypothetical free rides than white testers. When subsequently asked which reasons were most important for their decisions, the surveyed bus drivers noted customer honesty, worthiness and the propensity to cause trouble.

Our finding that white customers are up to 50 percentage points (roughly 100%) more likely to be accepted than black customers is markedly greater than the white privilege documented in other markets and public services where disparate treatment is already illegal. For example, Bertrand and Mullainathan (2004) find a 50% gap in callback rates between black and white job applicants in the USA. Using the same fictitious CV approach, Booth et al. (2012) uncover a racial gap of 35% between white Australian and dark-skinned (Indigenous) applicants. Ewens et al. (2014) show that landlords in the US apartment rental market prefer white applicants, over similar blacks, with 16% more positive responses. A recent correspondence study by Giulietti et al. (2019) examines email interactions between fictional citizens and various public service employees across the USA, including local public library and sheriff-office staff. The authors report emails signed by a white name to be 5% more likely to receive a response than emails signed by a distinctively black name.4 Our observed racial gap is also notably wider than the 16% advantage that white citizens realise over blacks in less regulated parts of the modern economy where, for example, smaller landlords such as Airbnb hosts are unlikely to be reached by anti-discrimination laws (Leong and Belzer, 2016; Edelman et al., 2017).

From a methodological point of view, we add to existing field experimental designs that incorporate communication and social cues into bilateral face-to-face interactions by having potential recipients openly ask for kindness (List and Price, 2009; DellaVigna et al., 2012; Andreoni et al., 2017). Such social encounters and traditional day-to-day transactions are emotionally quite different, and strictly less impersonal, than other delayed or written correspondences in which decision makers are able to hide behind a communication device when deciding upon recipient outcomes. The latter includes, for example, the recent work on racial and gender discrimination in transportation network companies such as Uber (e.g., Ge et al., 2016). Our simple audit design, with minimal interaction time, also overcomes and limits some of the common criticisms (e.g., Heckman, 1998) and shortcomings of audit tests into discriminatory behaviour.

The rest of the paper is organised as follows: Section 1 details the experimental design and treatment variations, as well as the key advantages and disadvantages of our audit study approach. Section 2 reports the empirical results, along with a basic valuation of the uncovered white privilege. Section 3 concludes with a discussion of the main findings and relevant policy implications.

1. Field Experiment

1.1. Experimental Design

Our experimental design involved a set of testers who boarded public buses in the city of Brisbane, Australia, in possession of a bus travel card with a preset balance of zero dollars. This travel card was blue in colour, indicating that the customer was an adult over 18 years of age. When an empty card is scanned upon entrance, the ticketing system automatically displays a red flashing signal, along with a loud sound, informing both the bus driver and the customer that the travel card has zero balance and requiring the customer to either pay for the intended trip in cash or otherwise exit the bus.5 At the time of the experiment, the bus fare was 4.50 AUD for travel within a single zone of the city.

Hired testers made a solitary statement to the bus driver after scanning their travel card: ‘I do not have any money, but I need to get to the [X] station’. The X station would refer to a stop that was not within close walking distance for the individual, but around 2 kilometres away.6 This medium-range distance was chosen to avoid bus drivers rejecting testers due to the required travelling distance being either too short or too long. Following this statement, the bus driver is then left with an ‘accept’ or ‘reject’ decision. If the tester is let on (accepted), he or she enters the bus and records this decision, along with a number of other observable driver and field characteristics. Otherwise, the rejected tester disembarks from the bus and records the same set of variables while waiting for the next bus to arrive.7

Public bus stations in the studied region usually have buses arrive every five to 15 minutes and consist of multiple platforms and routes which customers can take, making the waiting time of a rejected tester relatively short and enabling them to record around six to eight observations per hour. After being randomly assigned a starting station, testers were allowed to consider both a ‘sequential’ and a ‘circular’ path when collecting the data. In the first case, they would start at one bus station and sequentially make their way to a terminal station, stepping off and waiting for a different bus after each decision. Alternatively, under the second approach, subject decisions would be observed at bus stops A and B only, where testers who received a favourable treatment at stop A would exit the bus at stop B, then cross the platform and wait to board another bus headed back to station A.

In this field setting, approximately 78.2 million annual bus trips are made by travellers on 63,859 available weekly services (in 2012), operated by the 32,000 registered bus drivers across the state.8 Employed bus drivers are assigned a different starting depot and bus route to follow on a daily basis as part of the internal roster system, adding further to the randomisation of our tester–subject pairings. Natural elements and variables such as weather conditions, traffic flows, driver turnover and absenteeism also combine to increase the randomness of the studied interactions.

In terms of the legal situation, the agreement between bus drivers and their employer discourages drivers from granting free rides.9 This company rule is made clear to all boarding passengers by a written sign near the entry of the bus. However, a bus driver does have the discretion to grant free rides, and is even mandated to do so if the customer is under 18 years of age. The travel card is of a different colour for underage customers than for adult ones. Bus drivers could clearly observe from the specific card that our testers were not underage. Importantly, the ‘No child left behind’ policy (first introduced in 2003) has over time exposed bus drivers to the regular practice of making real-time judgements and granting free rides to customers on a daily basis.10 It comes about as the act of letting schoolchildren ride free has carried over to encounters with older students and adult citizens (starting with the parents of the underage travellers), and then naturally extended to all other customer types.11

Our testers were shown several examples of bus trips and distances between particular stations that were acceptable and consistent with their script. They were then randomly assigned different starting times and stations, with enough variation to capture different parts of the city. To avoid any possible suspicion or repeat encounters with the same bus drivers, testers were also asked to take regular breaks (every 20 to 30 minutes) during which they would leave the bus station entirely. Upon their return, testers were instructed to then randomly choose another platform and bus route number, and attempt to board the next bus that arrives. About 400 unique bus routes are active in the city of Brisbane (ranging from 10 to as many as 50 at most major stations), making each resulting travel path and subject–tester pairing quite arbitrary.12

Because of notable variations in service demand and supply across weekdays and weekends, the field experiment was conducted only during regular weekdays between 8 a.m. and 8 p.m. While public bus services and schedules on any regular weekday are fairly constant, weekend services tend to be much less frequent and attract somewhat different customer groups. On each field day, two testers were randomly assigned into the first session (from 8 a.m. to 2 p.m.), and similarly two different testers were assigned into the second session (from 2 p.m. to 8 p.m.). To make sure that testers did not initially end up near the same starting positions, one tester was first assigned a random starting station somewhere in the city (e.g., north of the city centre). The other tester was then randomly assigned a starting point on the opposite side (e.g., south of the city centre). On average, testers spent up to three hours (with regular breaks) in the field during each session.

Our field experiment was mainly conducted within the first three zones or greater city regions, which cover a radius of approximately 20 km from the city centre. This inner-city geographical area, and its comprising suburbs, is fairly homogeneous in terms of socio-economic profile. Unfortunately, since the used travel cards were never registered (online) prior to data collection, we were later unable to access the detailed travel histories and routes taken by our testers. It would have been useful to incorporate such information into the empirical analysis by, for example, controlling for station and route fixed effects. This was not possible as we only realised the above technical issue near the end.13

1.2. Participants

In total, 29 testers participated in the field experiment between May 2011 and June 2012, consisting of university students from various faculties as well as non-student members of the outside community. The average tester was 23.6 years of age, with the youngest and oldest being 19 and 32 years old, respectively (see Table 1). In terms of racial or ethnic background, six of the testers were white (white Australian, American, European); 12 were Asian (Chinese, Malaysian); six were Indian (subcontinental); and five were identified as black (Indigenous Australian, African, African American, Pacific Islander). There were three females and three males in the white group; six females and six males in the Asian group; two females and four males in the Indian group; and two females and three males in the black group.

In addition to the main decision variable, the hired testers recorded a set of observable bus driver and field characteristics. The former included the gender, perceived age and racial appearance of drivers. The field variables collected were the time of day (day or night), weather conditions (sunny, cloudy or rain), and an indicator for the number of other customers already inside or boarding the same bus. As expected, the bus driving profession is male dominated, with only 16% of subjects being female. There were also slightly more older drivers (59%) than younger ones. In regard to skin colour, 79% of the drivers were white; 10% were Asian; 6% were black; and 5% were Indian.

1.2.1. Tester appearances

Table 1 also contains subjective ratings of tester appearances. Each tester’s photograph was rated on a scale from 1 (very unattractive) to 7 (very attractive) by 40 random raters, balanced by gender.14 Raters were also asked whether or not the shown individual appeared to be aggressive and trustworthy. Overall, the results indicate that random raters perceive our black and white testers to be of similar beauty and trust levels. On the other hand, black testers are judged to be more aggressive than white testers, while Indian testers are seen as the least aggressive group. We also find no evidence of the raters displaying any distaste for testers outside their own racial or ethnic group. White raters find black testers to be on average no less attractive than white testers. At the same time, white raters view blacks as the most aggressive group, and Indians as the least aggressive group. In terms of trustworthiness, we also fail to find any statistical evidence suggesting that whites trust black testers to a lesser degree than testers from other racial groups. These evaluations are strongly consistent across raters, with corresponding calculations of Cronbach’s alpha equal to 0.945, 0.817 and 0.818 (see Online Appendix A).
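For readers unfamiliar with the consistency measure, the sketch below shows one common way to compute Cronbach's alpha from a ratings matrix. It assumes that each rater is treated as an 'item' and each tester as an observation, which is an assumption about the setup rather than a description of the calculation in Online Appendix A. The function name and the toy scores are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a ratings matrix.

    `ratings` holds one row per rated tester and one column per rater.
    Treating raters as 'items' is an assumption made for this sketch.
    """
    k = len(ratings[0])  # number of raters
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two rated testers")
    # Sample variance of each rater's scores across testers.
    rater_variances = [variance(column) for column in zip(*ratings)]
    # Sample variance of each tester's total score summed over raters.
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_variances) / total_variance)


# Hypothetical attractiveness scores (1 to 7) for four testers from three raters.
toy_ratings = [
    [2, 3, 2],
    [4, 4, 5],
    [6, 5, 6],
    [3, 4, 4],
]
alpha = cronbach_alpha(toy_ratings)
print(f"Cronbach's alpha on the toy data: {alpha:.3f}")
```

Values close to 1 indicate that raters order the testers in much the same way, which is the sense in which the reported evaluations are described as consistent.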

Table 1.

Tester, Subject and Field Characteristics (N = 1,552).

| Variable | Description | Mean | SD | Min. | Max. |
| --- | --- | --- | --- | --- | --- |
| Tester | | | | | |
| Age | Years of age | 23.73 | 3.95 | 19 | 32 |
| Gender | = 1 if male | 0.64 | 0.48 | 0 | 1 |
| Race/ethnicity | = % white | 0.24 | | | |
| | % Asian | 0.31 | | | |
| | % Indian | 0.25 | | | |
| | % black | 0.20 | | | |
| Attractiveness | = 1 to 7 scale | 3.94 | 0.68 | 2.80 | 5.35 |
| Aggressiveness | = 1 if aggressive | 0.12 | 0.09 | 0.03 | 0.48 |
| Trustworthiness | = 1 if trustworthy | 0.65 | 0.14 | 0.35 | 0.95 |
| Subject (bus driver) | | | | | |
| Age | = 1 if mature | 0.59 | 0.49 | 0 | 1 |
| Gender | = 1 if male | 0.84 | 0.37 | 0 | 1 |
| Race/ethnicity | = % white | 0.79 | | | |
| | % Asian | 0.10 | | | |
| | % Indian | 0.05 | | | |
| | % black | 0.06 | | | |
| Field variables | | | | | |
| Busy period | = 1 if yes | 0.26 | 0.44 | 0 | 1 |
| Time of day | = 1 if day | 0.89 | 0.31 | 0 | 1 |
| Weather conditions | = % sunny | 0.71 | | | |
| | % cloudy | 0.20 | | | |
| | % raining | 0.09 | | | |

Notes: Total number of observations is N = 1,552. Sample (mean) values represent the portion of transactions that involved each group or category. There were 29 testers in total. Subjects (decision makers) are the bus drivers. Bus driver is defined as young if perceived age < 45; mature if perceived age ≥ 45. Busy period is a dummy for the number of other customers on the bus: 1 if more than half-occupied bus, 0 otherwise. Racial groups are defined by skin colour as: white (white-Australian, American, European); Asian (Chinese, Malaysian); Indian (subcontinental); black (Indigenous Australian, African, African American, Pacific Islander). Each tester’s photograph was rated on a scale from 1 (very unattractive) to 7 (very attractive) by 40 raters, balanced by gender. Raters also stated whether or not (1/0) the shown tester could be perceived as ‘aggressive’ and ‘trustworthy’. Attractiveness, aggressiveness and trustworthiness scores are averaged over raters.

We include each tester’s average appearance rating as an added control in our regression equations, under the assumption that these same judgements are held by the studied bus drivers in the field. We are, however, aware of the possibility that such perceived individual traits might be viewed differently across the two types of decision contexts. That is, perceived trustworthiness and overall cooperation rates could vary when subjects evaluate a still photograph compared to having an actual in-person interaction (e.g., Redcay et al., 2010). Survey respondents (or raters) may be less inclined to discriminate between majority and minority members because of social desirability, or simply because the presented individual in the photograph is not seeking anything from them. The same decision maker, however, may act differently when coming face to face with the same agent openly asking for a favour.

1.2.2. Balance tests

Table B1 presents the distribution of interactions by tester and bus driver race. A chi-squared test rejects the null hypothesis of symmetry, indicating that the realised randomisation across the four tester and subject racial groups was not perfect. Table B2 shows average values for the observed bus driver and field characteristics broken down by tester race. An F-test of the joint hypothesis that all nine means are equal across the four tester types is rejected at the 1% level.
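The kind of balance check described here can be illustrated with a Pearson chi-squared statistic on a cross-tabulation of tester race by driver race. This is a sketch only: it implements the standard test of independence, which may differ from the exact procedure behind Table B1, and the counts below are made-up placeholders rather than the study's data. The function name is hypothetical.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. Every row and column total
    must be positive, otherwise an expected count would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# PLACEHOLDER counts (not the study's data): rows are tester race,
# columns are driver race, both in the order white, Asian, Indian, black.
placeholder_counts = [
    [310, 35, 15, 12],
    [400, 50, 20, 11],
    [320, 40, 15, 13],
    [195, 30, 45, 41],
]
statistic, dof = pearson_chi_squared(placeholder_counts)
print(f"chi-squared = {statistic:.2f} on {dof} degrees of freedom")
```

The statistic would then be compared with the chi-squared distribution on the stated degrees of freedom; a large value relative to that distribution indicates that tester and driver race are not independently distributed across interactions.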

There are three main imbalances in the pre-treatment variables that we need to account for in our formal regression analysis. First, the probability of interacting with a white driver was relatively lower for black testers than for other testers. Specifically, the share of white drivers observed was the lowest for black testers at 63%, compared to about 83% for lighter-skinned tester groups. Second, the proportion of Indian drivers encountered was highest (by about 10 percentage points) for black testers. Third, a similar observation holds for the share of interactions between black drivers and black testers, with the probability of observing a black driver being on average the highest for a black tester.

On the other hand, the age and gender attributes of bus drivers appear to be equally distributed across tester racial groups. Indian testers were observed relatively more during busier periods, while black testers had the lowest relative share of daytime encounters. White testers also appeared in fewer clear-weather interactions than the other tester races.

What do these imbalances in pre-treatment variables imply for our estimates of racial bias? Given that black customers are treated less favourably than whites, the first imbalance above would imply a downward bias in the estimated racial bias against blacks, as black testers could be ‘avoiding’ white bus drivers. For the second and third imbalances above, the finding that Indian drivers accept fewer black customers than do black drivers would lead to respective biases in the opposite directions, i.e., neither a strong upward nor downward bias in the estimated racial disparities.

To account for these asymmetries in the collected data, we include the observed bus driver and field characteristics as right-hand side controls in our regression equations. It should be noted, however, that it is more difficult to achieve balance in pre-treatment variables in ‘live’ audit tests compared to written correspondences, which are primarily aimed at geographically static and hidden decision makers inside private and public organisations. This is because in most audit studies the researcher has much less control over the arrival rates of different subjects on the other side of the transaction. Perhaps this is why almost none of the existing audit studies on discriminatory behaviour report any statistical tests of balance, especially with respect to the race of their subjects (e.g., Ayres and Siegelman, 1995; Neumark et al., 1996; Pager, 2003; List, 2004; Currie et al., 2011; Castillo et al., 2013; Gneezy et al., 2012).15

1.3. Treatments

Given the recognised effect of clothing on first impressions of strangers (Davis and Lennon, 1988), we aimed to manipulate the perceived socio-economic status and patriotism level of customer groups. These group-level outcomes are then compared to the findings in the baseline treatment during which our testers simply wore plain casual clothing. To ensure comparability across treatments, the same testers were used throughout each variation.16

1.3.1. High income

In the high-income treatment, selected testers from each racial group wore a business suit and carried a briefcase, signalling white-collar employment and increased socio-economic status. While there is laboratory evidence suggesting that unselfish subjects tend to mainly help low-income individuals in situations where trust plays little role (e.g., Charness and Rabin, 2002), the bus drivers in our context might be more concerned with whether or not they believe the testers, and whether the interacting customers adhere to mainstream values and norms, both of which are likely to be positively correlated with higher status (Akerlof and Kranton, 2000; Goette et al., 2006; Frijters, 2013). Prior studies by Alesina and La Ferrara (2002) and Glaeser et al. (2000) find minority group membership and low socio-economic status to be associated with lower levels of trust. Similarly, in a city-wide field experiment, Falk and Zehnder (2013) show that people tend to trust strangers of higher status relatively more than those of lower status. Together with the findings by Doleac and Stein (2013), Zussman (2013) and Cettolin and Suetens (2018), which show minority agents to be less trusted in the marketplace, we would then expect the willingness of bus drivers to accommodate black and Indian customers to significantly rise following improved signals of economic status.

1.3.2. Patriotism

In the patriotism treatment, our testers assumed the role of members of the national defence force by wearing a replica of the Australian national army uniform. Such an explicit signal of patriotism could motivate kind behaviour in a number of similar ways.17 First, it can be seen as a symbol of accepting the local culture to the point of being willing to defend it, triggering a reciprocal response from the bus driver if he or she feels part of the protected group. Second, being employed by the national army is an informative signal in that the army does not enrol members with any notable criminal history or mental imbalance.18 The latter two attributes are not strictly bound to citizens of high income or wealth. Finally, military attire may also be associated with a perception of potential threat in case of a rejection. This threat could come from either the customer or onlooking passengers, whereby rejecting a member of the national defence force could lead to condemnation from others. In this case, the bus driver may then be influenced by social desirability, regardless of his or her personal beliefs and social preferences.

1.4. Pros and Cons of the Audit Study Approach

While both audit and correspondence studies have their strengths and weaknesses (see Riach and Rich, 2002; Pager, 2007; Guryan and Charles, 2013), an in-person audit approach provides us with several important elements for the research question at hand.

First, we are primarily interested in everyday face-to-face interactions, which arguably entail a different psychology to the completely impersonal choices made by decision makers when reading a name from a CV, email or rental application. Also, by using skin colour to explicitly signal a customer’s race, we overcome the issue of using names as a proxy, where it has previously been argued that subjects could view common black names as a signal of low socio-economic status rather than race (Fryer and Levitt, 2004). In our audit study, we are able to directly alter and control a customer’s visible economic status.

Second, our audit test allows us to track transactions to their ultimate completion, with testers being either granted or denied a free ride. This is a notable advantage over most correspondence and other audit studies that only capture intermediate or partial market outcomes, such as the number of job interview or rental callbacks received. In such studies, the researcher never observes how many of the fictitious applicants would have been offered a job, or the salary level they would have received (see Riach and Rich, 2002). Our simple design, and transaction of interest, completely avoids such preliminary decisions and incomplete market outcomes.

Third, the script used by our testers is one of the shortest and simplest employed in audit studies on market behaviour, with a duration of less than 10 seconds. This design feature helps to minimise some of the common criticisms of audit testing. One of the main issues relates to effective matching (Heckman and Siegelman, 1993; Heckman, 1998): in the eyes of the bus drivers, pairs of approaching customers should be identical except for their skin colour. Heckman (1998) formalises this concern by focusing on the ‘unobservables’, that is, those tester characteristics that are not observed by the researcher but are somewhat visible and useful to the decision maker. He argues that audit studies are ‘quite fragile’ to such unobservable factors that potentially motivate final decisions and outcomes, as opposed to testers’ race. While Heckman mostly focuses on hiring decisions, he never provides any obvious real-life examples of such unobservable group-level traits or behaviours, other than a formal mathematical derivation, and thus leaves us to speculate about what these residual measures might be.19

Our hired testers were trained to carefully follow and communicate the uniform script using their normal-sounding voice; maintain an even demeanour; and not to argue over an unkind response or any other remarks made by the bus driver. To help ensure consistency and homogeneity across testers, we also employed undercover research assistants to observe a random selection of interactions. The short and simple script largely eliminated the risk of testers making errors or deviating from the intended interaction. We undertook two additional steps to empirically isolate tester race from other observable and unobservable decision factors: (i) by measuring and controlling for a host of different tester, bus driver and field variables including previously unmeasured tester appearances such as trustworthiness; and (ii) by utilising the panel nature of the collected data to account for individual-tester effects, which cannot be observed but remain constant within each tester’s set of interactions.

Providing a short bus ride is also a discrete transaction that requires substantially less knowledge about the customer than when screening potential job candidates, rental tenants or credit borrowers. To this end, our testers did not require as much training as in these other contexts, where hired testers are usually asked to recall and recite extensive biographies and credentials about themselves (e.g., Neumark et al., 1996; Pager, 2003; Currie et al., 2011). This makes the task of matching testers a lot easier, and the overall issue less significant.

While our experimental design made it difficult for the behaviour of testers to diverge, some residual differences surely remained. One such deviation may have been the speed at which our testers entered the bus, or the number of times they blinked their eyes during each encounter. In any case, we need to first consider whether such factors are correlated with skin colour. And, if so, we must then ask: can such tester idiosyncrasies explain the amount of racial bias uncovered? From the results in the next section, we are confident in rejecting any such claims.

Fourth, another common criticism of audit studies is that the ‘auditors know the purpose of the study’, and that ‘auditors are not in fact seeking jobs (or trying to buy a car for themselves) and are therefore more free to let their beliefs affect the bargaining or interview process’ (Bertrand and Duflo, 2017, p. 318). Since our experimental design and studied transaction do not involve any prolonged discussions among market participants, nor any bargaining or interview process (e.g., Ayres and Siegelman, 1995; Neumark et al., 1996; Pager, 2003; List, 2004; Gneezy et al., 2012; Castillo et al., 2013), the latter concerns are largely eliminated.

Moreover, in all carefully designed audit studies, including those cited above, the hired testers are never purposely told the research aims, and are trained to follow a uniform script from which they are instructed not to deviate. Our testers were informed in private meetings, during which they could only observe one or two others, that they would be participating in an experimental study conducted by academic researchers, and were not made aware of the specific research question of interest. However, we can never truly know whether or not the testers managed to guess our intentions, or if they had deviated from the set script.20 These are some of the shortcomings and experimenter errors which we need to balance against the benefits of in-person audit tests, including the possibility to systematically monitor and participate in actual market transactions.21

Finally, our audit study allows us to directly observe and measure the racial appearance of subjects transacting on the other side of the market. This element is especially important for identifying the presence of own-race preferences, and consequently the psychological motives behind any found bias, such as racial animus. Most written correspondence studies are unable to observe the decision maker’s race or skin colour. Such studies are only able to loosely approximate this variable by, for example, using the share of the black population employed or living in the region (e.g., Bertrand and Mullainathan, 2004). The latter indirect approach then potentially induces significant measurement error in a key explanatory variable.

2. Results

2.1. Descriptive Overview

Of the 1,552 tester requests for a free ride, 64% were granted by the studied decision makers, indicating that public bus drivers have a strong willingness to accommodate customers.22

Figure 1 illustrates overall acceptance rates by tester race, across all treatments. White testers were granted a free ride 77% of the time, while black testers received a free ride only 43% of the time. This 34 percentage point (or 80%) difference in acceptance rates is statistically significant at the 1% level, implying a strong bias against black customers. Indian testers were also accepted at a significantly lower rate of 57% relative to whites (p < 0.01, two-sided test). There is, however, no evidence of any acceptance gap between white and Asian testers (0.77 vs. 0.74, p = 0.24).

Fig. 1.

Acceptance Rate by Tester Race.

Notes: This figure shows acceptance rates by tester race, across all treatments. The error bars represent 95% confidence intervals.
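For readers who wish to approximate the error bars, the following sketch computes 95% confidence intervals from the rounded acceptance rates and group sizes reported in Table 2. It assumes a normal-approximation (Wald) interval, since the text does not state which interval formula underlies the figure. The names `TESTER_GROUPS` and `wald_interval` are illustrative.

```python
import math

# Rounded acceptance rates and group sizes by tester race, as reported in Table 2.
TESTER_GROUPS = {
    "white": (0.77, 366),
    "asian": (0.74, 485),
    "indian": (0.57, 385),
    "black": (0.43, 316),
}
Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def wald_interval(rate, n, z=Z_95):
    """Normal-approximation confidence interval for a sample proportion."""
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return rate - half_width, rate + half_width


intervals = {race: wald_interval(rate, n) for race, (rate, n) in TESTER_GROUPS.items()}
for race, (low, high) in intervals.items():
    print(f"{race:>6}: [{low:.3f}, {high:.3f}]")
```

Under this assumption the intervals for white and Asian testers overlap, whereas the interval for black testers lies entirely below that for white testers.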

The middle panel of Table 2 shows that black drivers granted on average a similar portion of free rides as Asian and white bus drivers, by accepting testers in 72% of the interactions compared to 64% by Asian and white drivers. These pairwise differences are not statistically different from zero at conventional levels. On the other hand, bus drivers of Indian appearance were the least likely to grant a free ride, accepting testers in 54% of interactions.

Table 2.

Acceptance Rate by Group.

| | Acceptance rate | Test of difference |
|---|---|---|
| Overall (N = 1,552) | 0.64 (0.48) | |
| Tester | | |
| White (n = 366) | 0.77 (0.42) | |
| Asian (n = 485) | 0.74 (0.44) | 0.24 |
| Indian (n = 385) | 0.57 (0.50) | 0.00 |
| Black (n = 316) | 0.43 (0.50) | 0.00 |
| Male (n = 992) | 0.67 (0.47) | |
| Female (n = 560) | 0.59 (0.49) | 0.00 |
| Young (n = 1,155) | 0.68 (0.47) | |
| Mature (n = 397) | 0.53 (0.50) | 0.00 |
| Attractive (n = 679) | 0.63 (0.48) | |
| Unattractive (n = 873) | 0.65 (0.48) | 0.41 |
| Aggressive (n = 647) | 0.57 (0.49) | |
| Unaggressive (n = 905) | 0.69 (0.46) | 0.00 |
| Trustworthy (n = 759) | 0.69 (0.46) | |
| Untrustworthy (n = 793) | 0.60 (0.49) | 0.00 |
| Subject (bus driver) | | |
| White (n = 1,227) | 0.64 (0.48) | |
| Asian (n = 150) | 0.64 (0.48) | 0.96 |
| Indian (n = 81) | 0.54 (0.50) | 0.07 |
| Black (n = 94) | 0.72 (0.45) | 0.11 |
| Male (n = 1,305) | 0.64 (0.48) | |
| Female (n = 247) | 0.64 (0.48) | 0.94 |
| Young (n = 634) | 0.66 (0.47) | |
| Mature (n = 918) | 0.63 (0.48) | 0.13 |
| Field variables | | |
| Non-busy period (n = 1,154) | 0.67 (0.47) | |
| Busy period (n = 398) | 0.56 (0.50) | 0.00 |
| Day (n = 1,379) | 0.63 (0.48) | |
| Night (n = 173) | 0.73 (0.45) | 0.01 |
| Sunny (n = 1,105) | 0.62 (0.49) | |
| Cloudy (n = 313) | 0.61 (0.49) | 0.67 |
| Raining (n = 134) | 0.87 (0.34) | 0.00 |

Notes: Acceptance rate is the proportion of free rides received (granted) by testers (subjects). Tester is labelled as young if age < 25, mature if age ≥ 25; unattractive if attractiveness < 3.94, attractive if attractiveness ≥ 3.94; unaggressive if aggressiveness < 0.12, aggressive if aggressiveness ≥ 0.12; untrustworthy if trustworthiness < 0.65, trustworthy if trustworthiness ≥ 0.65, with the scores averaged over raters. Subject is labelled as young if perceived age < 45, mature if perceived age ≥ 45. The number of observations per group is given by n. Standard deviations are shown in parentheses (second column). Two-sided test of difference between sample proportions is based on HA: p1 − p2 ≠ 0, where p1 corresponds to the first listed subgroup (in cases when there are more than two subgroups). The resulting p-values are reported in the final column.
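The two-sided test of a difference between sample proportions described in the notes can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used: it assumes a pooled-variance z-test, and it uses the rounded rates shown in the table, so the resulting p-values will not match the reported ones exactly. The function name `two_proportion_test` is hypothetical.

```python
import math


def two_proportion_test(p1, n1, p2, n2):
    """Two-sided z-test of H0: p1 - p2 = 0, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Rounded rates and group sizes from Table 2; white testers are the first listed subgroup.
white = (0.77, 366)
comparisons = {"asian": (0.74, 485), "indian": (0.57, 385), "black": (0.43, 316)}

results = {race: two_proportion_test(*white, rate, n) for race, (rate, n) in comparisons.items()}
for race, (z, p) in results.items():
    print(f"white vs {race}: z = {z:.2f}, p = {p:.3f}")
```

Even with rounded inputs, the qualitative pattern in the final column is reproduced: the white–black and white–Indian differences are significant at the 1% level, while the white–Asian difference is not significant at conventional levels.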


Given the dyadic nature of our data, Table 3 presents average levels of acceptance by subject–tester match. The diagonal entries capture interactions between bus drivers and testers of the same race. Two main patterns emerge. First, across most subject groups, there is a clear bias against black customers relative to white customers. However, black testers were treated significantly better by black drivers than by any other subject group. For example, the black–white gap in acceptance rates is equal to 0.38 (p < 0.01) for decisions made by white drivers; 0.40 (p < 0.01) for decisions made by Asian drivers; 0.32 (p = 0.02) for decisions made by Indian drivers; and 0.16 (p = 0.22) for decisions made by black drivers, with the latter observed difference being statistically insignificant. Second, there is no strong evidence of any own-group bias, with each subject group always accepting another racial group as much as their own. White bus drivers accepted white and Asian testers at similarly high rates (76% vs. 72%, p > 0.30). The same was also true for Asian bus drivers (93% vs. 86%, p > 0.30). And, both Indian and black bus drivers treated all other tester races similar to their own (p > 0.10 for each two-sided pairwise test). Overall, the above findings are inconsistent with most models of racial animus. Moreover, the empirical patterns suggest the observed discrimination to be against black customers rather than in favour of white customers, where for the latter to be apparent a clear own-group bias by white bus drivers (towards white testers) should at least arise.23

Table 3.

Acceptance Rate by Racial Match.

| Tester | Subject (bus driver): White | Asian | Indian | Black |
|---|---|---|---|---|
| White | **0.76 (0.43)**; n = 301 | 0.93 (0.26); n = 28 | 0.68 (0.48); n = 19 | 0.83 (0.38); n = 18 |
| Asian | 0.73 (0.45); n = 407 | **0.86 (0.35)**; n = 36 | 0.73 (0.46); n = 15 | 0.74 (0.45); n = 27 |
| Indian | 0.59 (0.49); n = 320 | 0.39 (0.49); n = 44 | **0.67 (0.50)**; n = 9 | 0.67 (0.49); n = 12 |
| Black | 0.38 (0.49); n = 199 | 0.52 (0.51); n = 42 | 0.37 (0.49); n = 38 | **0.68 (0.48)**; n = 37 |

Notes: Each entry represents the average acceptance rate conditional on tester and subject race. Standard deviations are shown in parentheses. The corresponding number of observations is given by n for each subject–tester pairing. Bold entries (on the main diagonal) capture results for testers and subjects of the same race. Total number of observations is 1,552.


Table 2 also summarises acceptance rates conditional on participant characteristics other than race. There is some evidence of a gender bias, with male testers being accepted 67% of the time compared to 59% for female testers (p < 0.01).24 Younger testers were also favoured over older ones (0.68 vs. 0.53, p < 0.01). This result should be interpreted with some caution given the limited age range of our tester group (19–32 years).25 As expected, testers who were rated as being more aggressive received fewer acceptances on average than less aggressive ones (0.57 vs. 0.69, p < 0.01). Moreover, trustworthy testers were rewarded more than untrustworthy testers (0.69 vs. 0.60, p < 0.01). Surprisingly, tester beauty did not play any significant role in subject decisions.

Based on the raw comparisons, bus drivers were more likely to let customers ride free when there were fewer people on the bus to observe the transaction (p < 0.01). Acceptance rates were also higher on rainy occasions (87%) than on sunny occasions (62%). And, free rides were more common, by about 10 percentage points, during night-time interactions. The latter two findings are consistent with an altruistic model of behaviour: when it rains and when it is dark, the value of the favour to the customer is relatively higher.

2.2. Regression Results

Table 4 presents marginal effect estimates from a linear probability model, across all experimental treatments. These regression equations take into account the panel nature of our data, with each tester taking part in multiple interactions, as well as other information available about each encounter. We first estimate the simple equation:
$$\begin{aligned}
\mathrm{I}\{\text{accept}\}_{ij} ={}& \alpha_{B} \cdot \mathrm{I}\{\text{black tester}\}_{j} + \alpha_{I} \cdot \mathrm{I}\{\text{Indian tester}\}_{j} + \alpha_{A} \cdot \mathrm{I}\{\text{Asian tester}\}_{j} \\
&+ \beta_{B} \cdot \mathrm{I}\{\text{black driver}\}_{i} + \beta_{I} \cdot \mathrm{I}\{\text{Indian driver}\}_{i} + \beta_{A} \cdot \mathrm{I}\{\text{Asian driver}\}_{i} \\
&+ \gamma \cdot \mathrm{I}\{\text{tester \& driver of same race}\}_{ij} + \text{controls}_{ij}\lambda + \mu_{j} + \epsilon_{ij},
\end{aligned}$$
where the dependent variable is a binary indicator for whether driver i granted a free ride to tester j. The coefficients αA, αI, and αB identify the difference in the probability of receiving a free ride between Asian and white testers, Indian and white testers, and black and white testers, respectively. A negative and significant coefficient estimate would imply preferential treatment of white customers, while a coefficient equal to zero would rule out the presence of any racial bias. Similarly, the set of β coefficients quantifies how much more or less likely Asian, Indian and black drivers are than white drivers to grant a free ride. Evidence of own-group bias is captured by the γ coefficient being positive and statistically significant. Other participant and field variables such as age, gender, beauty, aggressiveness and trustworthiness are also included in the model to isolate the effect of tester race on subject decisions. We allow for tester random effects via the term μj, which may capture elements idiosyncratic to the tester not already controlled for by the observable characteristics, such as an appearance of health or a particular facial expression. More complete regression equations, which additionally account for our treatment variations, are also estimated later on.
Table 4.

Effect of Race on the Probability of Acceptance.

| Dependent variable: Accepted (yes/no); LPM marginal effects | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Tester characteristics | | | | |
| Asian tester | 0.073 (0.106) | 0.017 (0.102) | 0.013 (0.100) | 0.035 (0.098) |
| Indian tester | −0.135 (0.109) | −0.142 (0.115) | −0.148 (0.119) | −0.085 (0.124) |
| Black tester | −0.335** (0.131) | −0.465*** (0.120) | −0.471*** (0.120) | −0.438*** (0.122) |
| Age | | 0.016* (0.008) | 0.015* (0.008) | 0.015* (0.008) |
| Male | | −0.010 (0.084) | −0.020 (0.088) | −0.009 (0.084) |
| Attractiveness | | −0.038 (0.053) | −0.045 (0.055) | −0.055 (0.053) |
| Aggressiveness | | 0.446 (0.499) | 0.444 (0.489) | 0.501 (0.463) |
| Trustworthiness | | 0.497* (0.296) | 0.492* (0.298) | 0.566** (0.283) |
| Subject characteristics | | | | |
| Asian driver | | | 0.005 (0.053) | 0.004 (0.057) |
| Indian driver | | | −0.047 (0.046) | −0.039 (0.046) |
| Black driver | | | 0.135** (0.055) | 0.119** (0.052) |
| Age | | | −0.063*** (0.021) | −0.060*** (0.020) |
| Male | | | 0.016 (0.037) | 0.004 (0.037) |
| Same race | | | | 0.060 (0.053) |
| Same gender | | | | 0.015 (0.035) |
| Field variables | | | | |
| Busy period | | | | −0.055 (0.039) |
| Daytime | | | | −0.064 (0.039) |
| Worsening weather | | | | 0.052** (0.022) |
| Constant | 0.682*** (0.092) | 0.156 (0.431) | 0.221 (0.444) | 0.180 (0.456) |
| Tester random effects | | | | |
| Observations | 1,552 | 1,552 | 1,552 | 1,552 |
| Overall R² | 0.07 | 0.10 | 0.11 | 0.13 |

Notes: Linear probability model. Robust standard errors in parentheses, clustered at the tester level. The dependent variable in all regressions is accepted, a binary indicator that takes on a value of 1 if the bus driver granted the tester a free ride, and 0 otherwise. Same race and same gender are indicator variables for subject–tester pairings that are of the same race and same gender, respectively. *, ** and *** indicate statistical significance at the 10%, 5% and 1% levels, respectively.


The first column of Table 4 presents estimated coefficients from a linear probability model in which a set of binary indicators for tester race are the only explanatory variables. The average acceptance rate of whites is then captured by the constant term. In this simple specification, the estimated coefficient on black tester is negative and quite large, implying that black customers are about 34 percentage points less likely than white customers to be granted a free ride. This estimate is also consistent with the descriptive results above. In contrast to the raw statistics, there is no evidence of discrimination against Indian testers relative to white testers, with the estimated coefficient being highly insignificant (p > 0.20). The latter result also holds across the different model specifications in Table 4.
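The logic of this first specification can be illustrated with a small sketch. It assumes pooled ordinary least squares without the tester random effects μj, and it rebuilds approximate individual outcomes from the rounded group rates in Table 2, so it will not reproduce column 1 exactly (in particular, the random-effects constant of 0.682 differs from the raw white acceptance rate). The helper names `build_rows` and `ols` are illustrative. The point is that, with race indicators as the only regressors, the constant equals the white acceptance rate and each coefficient equals a difference in group means.

```python
# Group sizes and rounded acceptance rates by tester race, as reported in Table 2.
GROUPS = {"white": (366, 0.77), "asian": (485, 0.74), "indian": (385, 0.57), "black": (316, 0.43)}
DUMMIES = ["asian", "indian", "black"]  # white testers are the omitted reference group


def build_rows():
    """Rebuild approximate individual outcomes from the rounded group rates."""
    X, y = [], []
    for race, (n, rate) in GROUPS.items():
        accepted = round(n * rate)
        for k in range(n):
            X.append([1.0] + [1.0 if race == d else 0.0 for d in DUMMIES])
            y.append(1.0 if k < accepted else 0.0)
    return X, y


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


X, y = build_rows()
coefs = dict(zip(["constant"] + DUMMIES, ols(X, y)))
for name, value in coefs.items():
    print(f"{name:>8}: {value: .3f}")
```

In this pooled version the coefficient on the black-tester indicator is approximately −0.34, the raw gap in acceptance rates, which is close to the random-effects estimate of −0.335 in column 1.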

Next, we gradually increase the number of control variables to test the robustness of our main estimates. The estimated acceptance rate for black testers is 47 percentage points lower than for white testers (column 2), holding other factors constant. This point estimate is statistically significant at the 1% level. Importantly, the estimated marginal effects of tester race remain largely unchanged as we include more subject and field controls (columns 3–4).26

Black bus drivers are estimated to be 14 percentage points more willing than white drivers to accommodate customers. Moreover, there is no evidence of own-race preferences, with the coefficient on the same-race indicator being small and statistically insignificant (columns 3–4). A similar, and statistically insignificant, coefficient for same-race transactions is estimated if we use tester fixed effects instead of tester random effects (0.054, p > 0.30), and also if we restrict the analysed sample to black and white participants only (0.065, p > 0.12). These regression results are consistent with the cross-race summaries from Table 3.

Also in Table 4, both the age and gender of customers seem relatively unimportant in predicting the allocation of free rides (columns 2–4), with older testers having a 1.5 percentage point higher probability of being accepted. Similarly, customer attractiveness and aggressiveness are statistically insignificant factors in bus driver decisions. On the other hand, customer trustworthiness has a notable effect on the probability of being granted a free ride, with completely trustworthy customers being as much as 57 percentage points more likely to be accommodated than completely untrustworthy ones (p < 0.05, column 4). However, we note that the variation in this measured trait is rather small between testers (see Table A1).

2.3. Average Treatment Effects

Table 5 reports average acceptance rates by experimental treatment. Both the high-income and patriotism treatment variations cause bus drivers to voluntarily accommodate more customer requests. Testers in the high-income treatment experienced a notable increase in acceptances relative to the baseline treatment: 81% vs. 60% (p < 0.01). Similarly, signalling a higher degree of patriotism resulted in testers being offered free rides in about 90% of the recorded interactions.

Table 5.

Acceptance Rate by Experimental Treatment.

| Treatment | Acceptance rate | Test of difference |
|---|---|---|
| Baseline (n = 1,281) | 0.60 (0.49) | |
| High income (n = 160) | 0.81 (0.40) | 0.00 |
| Patriotism (n = 111) | 0.89 (0.31) | 0.00 |

| Treatment, by tester race | White | Asian | Indian | Black |
|---|---|---|---|---|
| Baseline | 0.72 (0.45) | 0.73 (0.44) | 0.51 (0.50) | 0.36 (0.48) |
| | n = 276 | n = 429 | n = 320 | n = 256 |
| High income | 0.93 (0.25) | 0.69 (0.47) | 0.83 (0.38) | 0.67 (0.48) |
| | n = 60 | n = 35 | n = 35 | n = 30 |
| Patriotism | 0.97 (0.18) | 0.90 (0.30) | 0.93 (0.25) | 0.77 (0.43) |
| | n = 30 | n = 21 | n = 30 | n = 30 |
| Non-busy period | 0.77 (0.42) | 0.77 (0.42) | 0.63 (0.49) | 0.44 (0.50) |
| | n = 281 | n = 387 | n = 248 | n = 238 |
| Busy period | 0.79 (0.41) | 0.62 (0.49) | 0.47 (0.50) | 0.40 (0.49) |
| | n = 85 | n = 98 | n = 137 | n = 78 |

Notes: Each entry represents the average acceptance rate. Standard deviations are shown in parentheses. The corresponding number of observations is given by n. The middle panel shows the acceptance rates for each treatment conditional on tester race. High income is a dummy variable for testers wearing a business suit. Patriotism is a dummy variable for testers wearing an army uniform. In the baseline treatment testers wore casual clothing. In the bottom panel, busy period is a dummy for the number of other customers on the bus: 1 if more than half-occupied bus, 0 otherwise. Acceptance rates during busy and non-busy periods are reported in Table 2. p-values from two-sided tests of difference between sample proportions are reported in the top panel.

In the baseline treatment, during which testers simply wore plain casual clothing, white testers were accommodated at exactly twice the rate of black testers (72% vs. 36%, p < 0.01). Indian testers received more free rides than black testers in this treatment, but still at a much lower rate than whites (51% vs. 72%, p < 0.01). On the other hand, Asian testers were treated similarly to white testers (p > 0.60), with an acceptance rate of 73%.
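The baseline black–white comparison can be checked with a two-sided test of difference between sample proportions. The sketch below assumes a pooled-variance z-test with a normal approximation; the text does not state which variant was used, so this is an illustration rather than a replication. Inputs are the rounded baseline entries from Table 5.

```python
import math

def two_proportion_ztest(p1, n1, p2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Baseline treatment, Table 5: white 0.72 (n = 276) vs. black 0.36 (n = 256).
z, p = two_proportion_ztest(0.72, 276, 0.36, 256)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With these inputs the z-statistic is above 8, consistent with the reported p < 0.01.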

Stronger signals of socio-economic status benefited black and Indian testers the most, with respective increases in acceptances of 86% (from 0.36 to 0.67, p < 0.01) and 63% (from 0.51 to 0.83, p < 0.01) in the high-income treatment. White customers perceived as wealthy experienced an increase in acceptances of 29% (from 0.72 to 0.93, p < 0.01). On the other hand, high-status Asian customers were not treated any better than in their baseline interactions (0.73 vs. 0.69, p > 0.50).

The patriotism treatment led to further accommodations by bus drivers. Black testers reported a 114% increase in free rides relative to the baseline treatment (from 0.36 to 0.77, p < 0.01). Similarly, Indian testers experienced a positive increase of 82% in acceptances (from 0.51 to 0.93, p < 0.01). As expected, customers of white and Asian appearance reported smaller increases of 35% and 23%, respectively, due to their already high levels of acceptances in the baseline treatment. In the patriotism treatment, white testers were offered a free ride in almost every interaction (97% of the time).
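The relative increases quoted above follow directly from the acceptance rates in Table 5. A minimal sketch, using the rounded table entries (the helper name `relative_increase` is an assumption introduced here):

```python
# Acceptance rates by treatment and tester race, as printed in Table 5.
baseline = {"white": 0.72, "asian": 0.73, "indian": 0.51, "black": 0.36}
high_income = {"white": 0.93, "asian": 0.69, "indian": 0.83, "black": 0.67}
patriotism = {"white": 0.97, "asian": 0.90, "indian": 0.93, "black": 0.77}

def relative_increase(treated, base):
    """Percentage change in the acceptance rate relative to baseline."""
    return {race: round(100 * (treated[race] / base[race] - 1)) for race in base}

print(relative_increase(high_income, baseline))
# {'white': 29, 'asian': -5, 'indian': 63, 'black': 86}
print(relative_increase(patriotism, baseline))
# {'white': 35, 'asian': 23, 'indian': 82, 'black': 114}
```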

Figure 2 illustrates changes in the racial gap between white and non-white testers by treatment. The black–white acceptance gap narrows from 36 percentage points to 26 percentage points in the high-income treatment, and further down to 20 percentage points in the patriotism treatment (p = 0.02). On the other hand, the initial disparity between white and Indian testers is completely eliminated following the patriotism treatment. The acceptance gap gradually diminishes from 21 percentage points (baseline) to 10 percentage points (high income) and finally down to 3 percentage points (patriotism treatment), with the latter difference being statistically insignificant. Interestingly, high-status Asian testers were accepted at a lower rate than low-status ones, with outcome gaps relative to whites of 24 and −1 percentage points, respectively. Nonetheless, this disparity also disappears in the patriotism treatment (0.06, p > 0.35).27

Fig. 2.

Acceptance Gap by Tester Race and Treatment.

Notes: The acceptance gap in each treatment is calculated as the difference in acceptance rates between white testers and minority testers.
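The gap definition in the note to Figure 2 can be sketched as follows, again using the rounded entries of Table 5. Because the table is rounded to two decimals, the results can differ by about one percentage point from gaps computed on unrounded data (for example, the Indian and Asian gaps in the patriotism treatment).

```python
# Acceptance rates by treatment and tester race, as printed in Table 5.
rates = {
    "baseline": {"white": 0.72, "asian": 0.73, "indian": 0.51, "black": 0.36},
    "high income": {"white": 0.93, "asian": 0.69, "indian": 0.83, "black": 0.67},
    "patriotism": {"white": 0.97, "asian": 0.90, "indian": 0.93, "black": 0.77},
}

def acceptance_gaps(rates):
    """White-minus-minority acceptance gap, in percentage points, per treatment."""
    gaps = {}
    for treatment, by_race in rates.items():
        white = by_race["white"]
        gaps[treatment] = {
            race: round(100 * (white - rate))
            for race, rate in by_race.items()
            if race != "white"
        }
    return gaps

gaps = acceptance_gaps(rates)
for treatment, by_race in gaps.items():
    print(treatment, by_race)
```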

Table 6 corroborates the above findings and shows that the treatment differences are robust to the inclusion of other demographic and field variables.28 Both the high-income and patriotism treatments increase the estimated probability of a free ride, with corresponding marginal effects equal to 0.14 and 0.24 (column 1). High-status customers were on average 14 percentage points more likely than lower-status customers to receive a free ride, independent of their skin colour. Moreover, patriotic appearances increased acceptance rates by about 24 percentage points relative to baseline appearances (p < 0.01). These estimated average treatment effects demonstrate positive adjustments in subject perceptions, implying that our outcome variable measures actual generosity rather than some other motive or random noise.

Table 6.

Effect of Race and Treatment on the Probability of Acceptance.

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Asian tester | −0.003 | 0.003 | 0.007 | 0.043 | 0.047 |
| | (0.085) | (0.092) | (0.089) | (0.097) | (0.101) |
| Indian tester | −0.138 | −0.155 | −0.150 | −0.061 | −0.083 |
| | (0.102) | (0.109) | (0.106) | (0.106) | (0.100) |
| Black tester | −0.478*** | −0.477*** | −0.479*** | −0.443*** | −0.442*** |
| | (0.110) | (0.116) | (0.112) | (0.128) | (0.129) |
| High income | 0.139** | −0.019 | | | 0.001 |
| | (0.060) | (0.016) | | | (0.016) |
| High income × Asian tester | | 0.156*** | | | 0.135*** |
| | | (0.030) | | | (0.030) |
| High income × Indian tester | | 0.165*** | | | 0.265*** |
| | | (0.028) | | | (0.029) |
| High income × black tester | | 0.152*** | | | 0.207*** |
| | | (0.025) | | | (0.028) |
| Patriotism | 0.241*** | | 0.087*** | | 0.118** |
| | (0.055) | | (0.018) | | (0.025) |
| Patriotism × Asian tester | | | 0.061** | | 0.063* |
| | | | (0.028) | | (0.035) |
| Patriotism × Indian tester | | | 0.130*** | | 0.142*** |
| | | | (0.032) | | (0.041) |
| Patriotism × black tester | | | 0.171*** | | 0.210*** |
| | | | (0.014) | | (0.022) |
| Busy period | −0.052 | | | 0.096 | 0.109 |
| | (0.040) | | | (0.071) | (0.069) |
| Busy period × Asian tester | | | | −0.181 | −0.184 |
| | | | | (0.114) | (0.114) |
| Busy period × Indian tester | | | | −0.233** | −0.236** |
| | | | | (0.092) | (0.098) |
| Busy period × black tester | | | | −0.141 | −0.187** |
| | | | | (0.092) | (0.092) |
| Constant | 0.354 | 0.271 | 0.245 | 0.161 | 0.239 |
| | (0.473) | (0.511) | (0.463) | (0.448) | (0.479) |
| Other controls | | | | | |
| Tester random effects | | | | | |
| Observations | 1,552 | 1,552 | 1,552 | 1,552 | 1,552 |
| Overall R² | 0.16 | 0.13 | 0.14 | 0.14 | 0.17 |

Notes: Linear probability model. Robust standard errors in parentheses, clustered at the tester level. The dependent variable in all regressions is accepted, an indicator variable that takes on a value of 1 if the bus driver granted the tester a free ride, and 0 otherwise. High income is a dummy variable for testers wearing a business suit. Patriotism is a dummy variable for testers wearing an army uniform. In the baseline treatment testers were dressed in casual clothing. Busy period is a dummy for the number of other customers on the bus: 1 if more than half-occupied bus, 0 otherwise. Other controls include tester beauty, aggressiveness and trustworthiness; subject race, age and gender; time of day and weather conditions. *, ** and *** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

The first column of Table 6 shows black testers to be 48 percentage points less likely than white testers to receive a free ride. This point estimate is highly significant and robust across various specifications. The positive and statistically significant coefficient on high income × black tester implies that this initial racial gap narrows by 15 to 21 percentage points when black testers are dressed in business attire (columns 2 and 5). Similarly, the interaction estimates in columns (3) and (5) suggest that the above white privilege diminishes by 17 to 21 percentage points for blacks wearing an army uniform, holding other factors constant.
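To make the reading of the interaction terms concrete: in a linear model with a race indicator, a treatment indicator and their product, the black–white gap within a treatment is the sum of the race coefficient and the interaction coefficient. The sketch below applies this to the column (5) point estimates quoted from Table 6; it ignores sampling uncertainty, and the function name `implied_gap` is hypothetical.

```python
# Column (5) point estimates from Table 6.
black_main_effect = -0.442          # black tester (gap in the baseline treatment)
interaction = {
    "high income": 0.207,           # high income x black tester
    "patriotism": 0.210,            # patriotism x black tester
}

def implied_gap(main_effect, interaction_effect):
    """Black-white gap among testers who share the same treatment."""
    return main_effect + interaction_effect

gap_high_income = implied_gap(black_main_effect, interaction["high income"])
gap_patriotism = implied_gap(black_main_effect, interaction["patriotism"])

print(round(gap_high_income, 3))  # -0.235
print(round(gap_patriotism, 3))   # -0.232
```

Under this specification the estimated gap shrinks from about 44 percentage points to roughly 23 percentage points in either treatment, so it narrows but does not close.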

Despite testers of Indian appearance being treated similarly to white testers in the baseline treatment, members from this minority group also experience significant treatment effects. The estimated probability that an Indian customer is granted a free ride (relative to a casually dressed white) increases by about 17 percentage points in the high-income treatment, and by about 13 percentage points in the patriotism treatment (columns 2–3, Table 6). Similarly, Asian testers are treated notably better in the high-income treatment than in the baseline treatment (by 15 percentage points). Asian testers with improved signals of patriotism also experience a positive but smaller increase of 6 percentage points in acceptances.

The final column of Table 6 suggests that the baseline outcome gap between black and white testers widens by 19 percentage points during busy periods, when there are many other passengers on the bus. A similar result is found for Asian and Indian testers, who are treated worse by about 18 and 24 percentage points, respectively, relative to their baseline outcomes. While the estimated coefficients on these interaction terms are not all statistically significant at the 5% level, formal pairwise tests also fail to reject equality between the coefficients (p > 0.58 for each pairwise comparison). These findings have a number of possible interpretations. For instance, bus drivers might be responding to the anticipated reactions of the other passengers: more bystanders on the bus implies a greater risk of complaint about giving away free rides. At the same time, the likelihood that a customer complains may in turn depend on the recipient’s race. This argument would be in line with recent findings on consumer-side discrimination (e.g., Doleac and Stein, 2013; Ayres et al., 2015; Bar and Zussman, 2017). Nonetheless, our experimental design does not focus on this side of the market.

2.4. Complementary Survey

Lastly, we conducted a complementary survey of bus drivers at random resting stations across the city. The survey took place a few months after our field experiment and depicted a hypothetical version of the same favour-seeking scenario as studied in the field. Each surveyed bus driver was shown one randomly selected photograph of a test customer wearing casual clothing (i.e., the same photo as used for tester appearance ratings), and asked if they would be willing to let the individual onto the bus, again making it clear that their travel card had zero balance. Following this decision, bus drivers were then asked a set of questions that aimed to capture the reasons behind an acceptance or rejection (see Online Appendix C).

We were able to collect 108 responses before the survey was boycotted by the bus company.29 In terms of race or ethnicity, the distribution of surveyed bus drivers was similar to that observed in the field experiment: 71% of respondents were white; 12% were black; 9% were Asian; and 7% were Indian. The majority of surveyed drivers were also male (85%).

Overall, 69% of bus drivers stated that they would accommodate the presented customer with a free ride, a result that is slightly higher than the 60% acceptance rate observed in the baseline treatment of our field experiment (p = 0.05). Table 7 presents pairwise comparisons of group acceptance rates across the two methodologies (revealed vs. stated). Not surprisingly, the results show stated acceptances by bus drivers to strongly contradict their actual behaviour revealed in the field.

Table 7.

Revealed versus Stated Acceptance Rates.

| | Revealed | Stated | Test of difference |
|---|---|---|---|
| Overall | 0.60 | 0.69 | 0.05 |
| Tester characteristics | | | |
| White | 0.72 | 0.67 | 0.58 |
| Asian | 0.73 | 0.62 | 0.11 |
| Indian | 0.51 | 0.76 | 0.01 |
| Black | 0.36 | 0.86 | 0.00 |
| Male | 0.60 | 0.67 | 0.31 |
| Female | 0.59 | 0.72 | 0.08 |
| Young | 0.64 | 0.68 | 0.41 |
| Mature | 0.49 | 0.75 | 0.03 |
| Attractive | 0.58 | 0.70 | 0.11 |
| Unattractive | 0.62 | 0.69 | 0.24 |
| Aggressive | 0.56 | 0.73 | 0.02 |
| Unaggressive | 0.63 | 0.67 | 0.59 |
| Trustworthy | 0.65 | 0.74 | 0.18 |
| Untrustworthy | 0.56 | 0.66 | 0.15 |
| Bus driver characteristics | | | |
| White | 0.60 | 0.71 | 0.05 |
| Asian | 0.57 | 0.50 | 0.68 |
| Indian | 0.50 | 0.63 | 0.50 |
| Black | 0.74 | 0.77 | 0.81 |
| Male | 0.60 | 0.72 | 0.03 |
| Female | 0.59 | 0.56 | 0.82 |
| Young | 0.63 | 0.66 | 0.63 |
| Mature | 0.58 | 0.74 | 0.03 |
| Same race | 0.72 | 0.62 | 0.26 |
| Different race | 0.56 | 0.72 | 0.01 |
| Same gender | 0.61 | 0.67 | 0.42 |
| Different gender | 0.58 | 0.73 | 0.05 |

Notes: Revealed acceptance rate is the proportion of free rides granted by bus drivers in the baseline treatment of the field experiment (see Table 5). Stated acceptance rate is the proportion of free rides granted by surveyed bus drivers in the hypothetical survey. Tester is labelled as young if age < 25; mature if age ≥ 25; unattractive if attractiveness < 3.94; attractive if attractiveness ≥ 3.94; unaggressive if aggressiveness < 0.12, aggressive if aggressiveness ≥ 0.12; untrustworthy if trustworthiness < 0.65, trustworthy if trustworthiness ≥ 0.65, with the scores averaged over raters. Bus driver (subject) is labelled as young if perceived age < 45; mature if perceived age ≥ 45. Same race and same gender are indicator variables for tester and subject pairings that are of the same race and same gender, respectively. Two-sided tests of differences between sample proportions are performed (across the two methodologies) for each listed tester and bus driver characteristic. The resulting p-values are reported in the final column. More precise p-values for some of the above tests of difference are as follows: overall (p = 0.0525); Indian tester (p = 0.0144); white bus driver (p = 0.0465); different race (p = 0.0062).

Figure 3 illustrates these differences by recipient race. There is no evidence of any preferential treatment of white customers, with black and Indian customers being offered the highest proportion of free rides: 86% and 76%, respectively. Moreover, black customers were hypothetically accepted at more than twice the rate observed in the baseline treatment of our field experiment: 86% vs. 36%. Despite the low number of survey responses, this difference is statistically significant at the 1% level, and indicates untruthful reporting.

Fig. 3.

Revealed versus Stated Acceptances.

Notes: The dark-shaded bars show acceptance rates in the baseline treatment of the field experiment. The light-shaded bars show the stated acceptances in the hypothetical survey. The error bars represent 95% confidence intervals.

Consistent with subjects being race neutral in hypothetical situations, we find stated acceptances by bus drivers to be specifically aimed at customers from other racial groups. Respondents were willing to help hypothetical customers from other groups at a rate of 72%, compared to only 56% in actual situations (p < 0.01). White bus drivers are also found to be much more accommodating in the survey than in the field (0.71 vs. 0.60, p < 0.05).

In terms of the stated reasons for accepting or rejecting customers: 83% of surveyed bus drivers indicated customer honesty and the propensity to cause trouble as being at least ‘somewhat important’, while 73% of the respondents broadly judged the presented help-seeker on the basis of their worthiness. Around 65% of the surveyed bus drivers seemed to care about the impact of their decisions on other customers, assigning some level of importance to this statement. On the other hand, the ability to relate to the customer was pronounced not to have a major influence on bus driver choices, with 57% of respondents declaring this variable as strictly ‘unimportant’.

Overall, the analysed survey responses appear to be strongly driven by social desirability that masks discriminatory behaviour (e.g., LaPiere, 1934; Tourangeau et al., 2000; Norton et al., 2006). Bus drivers were perhaps not willing to express their true beliefs about the shown customers due to such responses directly revealing the type of person (defined by race or gender) that they would potentially favour or discriminate against.30 More broadly, such contrasting findings between revealed and stated behaviours explicitly illustrate the scientific value of using natural field experiments for studying sensitive societal issues such as race.31

2.5. Quantifying the Uncovered White Privilege

What is the economic value of the discretionary benefits and white privilege observed in the studied consumer transaction? What is the burden on the minority group due to their lack of privilege? We cannot answer these questions exactly, but here we propose a simple back-of-the-envelope calculation based on the economic transfer to the minority group (ETM):
$$\text{ETM} = \text{Number of favours}_{M} \times \text{Value per favour}.$$
The number of favours granted to the minority group requires an estimate of how often a minority customer arrives without a bus ticket, multiplied by the probability of getting a free ride. The probability that a black customer receives a free ride is taken from our empirical results: 0.43 across all treatments. The number of black bus customers can be estimated by using the fact that there are 78.2 million bus rides per year, with 5% of the population classified as black, resulting in an estimated 3.9 million annual bus trips consumed by black customers.32 There are no official records on the percentage of bus customers that present themselves without any monetary balance, so the best we can do is to rely on the casual observations of the research assistants and ourselves, which is that about 3% of customers entering any public bus do so without a valid fare. There are then 117,300 black customers who could potentially be accommodated with a free ride each year (and, similarly, about 1.6 million white customers). Based on our audit test, around 50,439 black customers would get a free ride. If we take the average price of a bus ticket as the transferred value, which is 4.50 AUD, then black customers receive 227,000 AUD per year in discretionary gifts, while white customers receive an annual total of 5.7 million AUD.
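The back-of-the-envelope ETM calculation for black customers can be reproduced step by step. The sketch below uses only the figures stated in the text; the variable names are introduced here for illustration, and the 3% no-fare share is the authors' own casual estimate.

```python
# Inputs as stated in the text.
annual_bus_rides = 78.2e6        # bus rides per year
black_population_share = 0.05    # share of the population classified as black
share_without_fare = 0.03        # casual estimate of customers boarding without a valid fare
black_acceptance_rate = 0.43     # probability of a free ride, all treatments
ticket_price_aud = 4.50          # average price of a bus ticket

black_rides = annual_bus_rides * black_population_share        # about 3.9 million
potential_favours = black_rides * share_without_fare           # 117,300
favours_granted = potential_favours * black_acceptance_rate    # 50,439
etm_aud = favours_granted * ticket_price_aud                   # about 227,000 AUD

print(round(potential_favours), round(favours_granted), round(etm_aud, -3))
```

The same steps apply to white customers once their ridership and acceptance rate are substituted.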

We can now also calculate the additional (forgone) monetary value due to such privilege: black customers would get 179,000 AUD more in transfers if they were accommodated to the same extent as whites, while white majority customers would get 2.5 million AUD less per year if they were treated similarly to blacks. Thus, the value of white privilege is an annual transfer of 2.5 million AUD to the white community, and the relative cost from the lack of privilege is estimated to be worth 179,000 AUD per year to the black community.

These are fairly crude calculations. One could argue that a rejected black customer can simply wait for the next bus and ‘try again’, in which case the economic cost is the expected extra waiting time of a black customer relative to a white customer, multiplied by the value of time. If there is a bus every 15 minutes, the extra wait is about 12 minutes: a black citizen needs to approach an additional 0.79 buses in order to experience the same favourable treatment as a white citizen. Valuing time at ≈30 AUD per hour (median earnings), this equates to 6 AUD.33 This estimate is about 33% higher than the price of a bus ride. Notably, the above calculations also neglect the psychological costs of discrimination which, as Siegelman (1998), Ayres (2001) and Small and Pager (2020) argue, can accumulate to be at least as large as the economic costs.34
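The waiting-time version of the cost can be sketched in the same way. The figure of 0.79 additional buses is taken as given from the text, the value of time is assumed to be expressed per hour (which is what makes the arithmetic yield 6 AUD), and the wait is rounded to whole minutes as in the text.

```python
extra_buses = 0.79                 # additional buses a black customer must approach (from the text)
headway_min = 15                   # one bus every 15 minutes
value_of_time_aud_per_hour = 30    # median earnings, assumed to be an hourly figure
ticket_price_aud = 4.50

extra_wait_min = round(extra_buses * headway_min)               # 11.85, rounded to 12 minutes
cost_aud = extra_wait_min * value_of_time_aud_per_hour / 60     # 6.0 AUD
premium = cost_aud / ticket_price_aud - 1                       # about 0.33

print(extra_wait_min, cost_aud, round(100 * premium))
```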

3. Conclusions

Empirical evidence on disparate treatment in consumer transactions mostly comes from markets in which discrimination is illegal, including employment, housing, credit and other public accommodations. Here, the question is whether the white privilege documented in these regulated environments holds more generally. Our results suggest that white privilege extends into marketplace favours, or private accommodations, that are often hidden and unregulated.

Our natural field experiment measures the willingness of public bus drivers to voluntarily grant randomly assigned customers a free ride. We present causal evidence that white customers are twice as likely as black customers to be extended such a favour, with the black–white acceptance gap being as wide as 100% (or 45 percentage points) in the baseline treatment. Indian customers also receive substantially fewer favours, but that effect becomes marginally insignificant after we control for other observables and tester random effects. On the other hand, Asian customers are just as likely to be rewarded as whites.

We find two pieces of evidence consistent with our results being driven by statistical considerations on the part of the studied decision makers. First, we fail to find any evidence of own-race preferences. Both majority and minority drivers preferred white customers over blacks, on average. Such a result is inconsistent with traditional preference-based models, where large and statistically significant ingroup biases exist (e.g., Antonovics et al., 2005; Fong and Luttmer, 2009; Price and Wolfers, 2010; Shayo and Zussman, 2011; Fisman et al., 2017; West, 2018; Bar and Zussman, 2020). Second, the strong observed reactions to improved signals of socio-economic status and patriotism suggest that bus drivers use a customer’s skin colour to approximate other relevant characteristics which they cannot directly observe but nevertheless find important.

The notion that bus drivers are simply biased against blacks on the basis of perceived levels of criminality or aggressiveness is inconsistent with the fact that we find no gender effects in any of our formal results, where one would expect female customers to be treated differently from male customers if such factors were of concern. This story also fails to fit the large average treatment effect experienced by Indian testers when wearing an army uniform. Members of the latter minority ethnic group come from a sub-population in Australia that has much lower criminal conviction rates than majority whites, meaning that the army costume should not have mattered if aggression was a key driver.

Our results are more in line with statistical channels of perceived trust as noted by earlier studies such as Doleac and Stein (2013) and Zussman (2013). That is, bus drivers are unlikely to dismiss white citizens dressed in business attire as irregular customers who have purposefully allowed their travel balance to reach zero, but are more inclined to do so for lower-status minorities. To this end, bus drivers are likely using race to gauge the likelihood that a customer is telling the truth, with minority black customers being viewed as less honest than their white counterparts. The latter possibility is consistent with the strong increase in acceptances observed for minority black and Indian testers in the high-income treatment. It is also consistent with the large and statistically significant estimated effect of tester trustworthiness on the probability of being granted a free ride.

Despite the above evidence that the observed bias against black customers appears to be mostly driven by beliefs, we cannot completely rule out a role for tastes. In the eyes of the bus drivers, the fact that a black passenger wears a suit or military uniform does not necessarily convey only a signal about the customer’s trustworthiness. Instead, it may also make him or her a ‘different person’. Becker’s (1957) model of taste-based discrimination is mainly about aversion to cross-group contact. But from the bus driver’s perspective there may be different categories of blacks (see Fiske, 1998; Macrae and Bodenhausen, 2000), and aversion to contact with members of the different subgroups may not necessarily be the same.35

Our study also highlights an important methodological issue regarding the study of actions versus reported attitudes. Bus drivers who responded to a hypothetical survey on whether they would accommodate random customers shown in a photograph made very different decisions from those observed in the field, demonstrating pro-black preferences. This result suggests that hypothetical surveys and other self-reported measures, including those elicited in laboratory settings, can easily mask discrimination.

The uncovered evidence that white citizens are extended extra gifts in daily life has a number of important implications. First, a decision needs to be made whether the observed behaviour is a problem, which is a matter of social norms and the democratic process. If it is deemed a social issue, the question then becomes what to do about discretionary gifts in various real-world market contexts. One idea is to standardise and proscribe behaviour, such as in this case strictly prohibiting bus drivers from granting a free ride to any citizen regardless of their skin colour. A less heavy-handed option would be to target norms around behaviours that we expect of ourselves and others in the marketplace by making such acts visible in regular audits of relevant transactions. Ayres (2001) points out that discrimination is more likely in situations where consumers have imperfect information on how comparable others are treated, such as in the one-off accommodations studied above. Establishing a national audit office that would regularly monitor and publicise discrepancies between accepted norms in society and actual behaviour in markets could feed into public awareness and education. Parsons et al. (2011) show that public scrutiny and monitoring play a vital role in shaping when agents in positions of authority engage in racial bias.

Pressure from both minority and majority citizens may also help to limit the degree of white privilege: simply being aware of the presence of this bias could trigger members of the favoured group not to request or accept any such favours in future transactions. Pope et al. (2018) demonstrate that raising awareness of racial bias subsequently eliminates this bias by ultimate decision makers. The situations studied in paired audits might well be replicated in training sessions for different groups of workers and even in schools, with role-model playing used to demonstrate that biases exist and what the desired social behaviour should be.

Overall, the simple audit approach used in this study may be viewed as an effective tool for detecting and monitoring the distribution of discriminatory gifts not only in public services but also in many other everyday consumer transactions, including retail purchases, financial and legal services, education and healthcare provision. Such contexts demand further investigation.

Additional Supporting Information may be found in the online version of this article:

Online Appendix

Replication Package

Notes

The data and codes for this paper are available on the Journal website. They were checked for their ability to reproduce the results presented in the paper.

Many thanks to the Editor, Frederic Vermeulen, and three anonymous referees for very insightful comments that considerably improved this study. We also thank Ian Ayres, Loukas Balafoutas, Anna Bindler, Jerker Denrell, Gigi Foster, Ben Greiner, Jonathan Guryan, David Huffman, Olof Johansson-Stenman, Kenan Kalayci, Rudolf Kerschbamer, Fabian Kosse, Peter Kuhn, Gijs van de Kuilen, Andreas Leibbrandt, Nick Llewellyn, Michel Maréchal, Tigran Melkonyan, Wieland Müller, Anand Murugesan, Andreas Ortmann, Andrew Oswald, Nick Powdthavee, Michael K. Price, Bettina Rockenbach, Zvi Safra, Rupert Sausgruber, Simeon Schudy, Stefan Trautmann, Jean-Robert Tyran, Jeroen van de Ven, Randall Walsh and Valentin Zelenyuk for helpful comments and discussions. Finally, thanks to participants at the Advances with Field Experiments 2017 (University of Chicago), Workshop on the Economics of Discrimination 2015 (Naples), Austrian Experimental Economics Workshop 2017, and the ESA European Meeting 2017, as well as seminar audiences at the University of Gothenburg, Monash University, Tinbergen Institute, Warwick Business School, and WU Vienna for their valuable feedback. A previous version of this paper circulated under the title ‘Still Not Allowed on the Bus: It Matters If You're Black or White!’.

Footnotes

1

In-person audit studies have long been used to measure disparate treatment in a wide range of real-world market contexts including automobile sales (Ayres and Siegelman, 1995), housing (Daniel, 1968; Yinger, 1986), lending (Ross and Yinger, 2002), employment (Neumark et al., 1996; Pager, 2003), healthcare (Currie et al., 2011), sportscard trading (List, 2004) and other consumer transactions (Siegelman, 1998; Ayres, 2001; Gneezy et al., 2012; Balafoutas et al., 2013). See Riach and Rich (2002), Pager and Shepherd (2008), List and Rasul (2011), Guryan and Charles (2013) and Neumark (2018) for excellent surveys.

2

See the Department of Transport and Main Roads: Queensland Bus Driver Safety Review (2017). Based on fairly rough survey data, the most common type of antisocial behaviour reported by bus drivers is ‘verbal aggression’ (>50% of all cases), with ‘physical violence’ being by far the least reported offence (less than 2%). The key apparent triggers of such negative customer reactions are ‘fare conflict’ and ‘alcohol and drugs’.

3

To this end, the studied subjects could base their decisions upon the anticipated reaction of rejected customers. For example, if drivers believe that members of minority racial groups are more likely to respond negatively to a rejection, they may incorporate a ‘safety premium’ and hence accommodate customers whom they perceive to be troublesome at much higher rates than they would prefer. The implication of such behaviour for our findings is simply that the observed racial bias against some minorities, namely blacks, is biased downwards, so the true bias should be even larger.

4

In the context of public services, most studies into racial bias use naturally occurring data, namely documented legal cases and police searches. For evidence of racial bias in judge and jury decisions, see Ayres and Waldfogel (1994), Anwar et al. (2012) and Alesina and La Ferrara (2014). Racial bias in police decisions about which type of motorists or citizens to stop and search is examined, for example, by Knowles et al. (2001), Anwar and Fang (2006), Persico and Todd (2006) and Vomfell and Stewart (2019). West (2018) uses unique data on automobile crash investigations to overcome some of the common endogeneity and selection issues in this literature.

5

The scanning machine’s text screen displayed the error message ‘Insufficient funds’, which both the bus driver and tester could clearly observe. This is important for our design and overall interpretation of the results since it informs the bus driver that the travel card was not invalid or faulty, but rather that it had no money on it. If the card was not functioning properly, as would be the case if someone tried to scan another card containing a magnetic chip (e.g., a credit card), then the machine would signal the error message: ‘Invalid card. Seek assistance’. This was never the case in our field experiment.

6

From casual observation, based on many years of experience and hundreds of bus rides, we know that at least one out of every 30 to 40 boarding passengers present themselves in a similar situation where their travel card has no monetary balance.

7

Following the release of our draft working paper (Mujcic and Frijters, 2013), our general idea and experimental design have also been applied to the market for taxi rides in the UK (Grosskopf and Pearce, 2016).

8

The bus service and passenger turnover information is obtained from official annual reports on public transport services in Queensland, Australia.

9

In the past few years, a bus driver was found not to have collected fares consistently over a period of 12 months. The punishment handed down by the bus company was merely a formal warning (see ‘Dozens of Brisbane City Council bus drivers dismissed since 2016’, Courier Mail, 13 Sept. 2019).

10

While bus drivers are not directly or openly monitored (in real time) by their employer regarding the act of giving away free rides, the bus company does have the ability to later inspect and calculate the number of free rides that a particular driver is giving away. This, somewhat tedious, task could be done via the ticketing system which records details about each transaction. Moreover, a security video camera records all behaviour on the bus.

11

See, for example, the Department of Transport and Main Roads: Queensland Bus Driver Safety Review (2017).

12

When we interviewed each of the 29 testers, only two stated that they had observed the same driver twice by chance. These testers followed our instructions and did not board the approaching buses.

13

In order not to further burden our testers, and allow them to focus on their live interactions, we did not ask them to collect these details, in addition to the other various bus driver and field variables. However, we should have been a bit more careful in pre-testing and checking for such technical issues with the (empty) travel cards.

14

Similar rating techniques are used by Fisman et al. (2008) to study racial preferences in dating, and also by Belot et al. (2012) to examine beauty discrimination among game show contestants.

15

There is also debate about whether such balancing tests on covariates are useful in experimental studies from a formal statistical point of view (e.g., Imai et al., 2008).

16

Another experimental variation that was exogenous to our design constituted a 15% increase in public transport prices as of 1 January 2012. Comparing tester outcomes before and after the recorded price change, we found no significant difference in overall acceptance rates (0.64 vs. 0.65). This result is most likely due to the personal cost of granting a free ride being constant for bus drivers.

17

In a study of online credit markets, Pope and Sydnor (2011) find evidence of favouritism towards borrowers that show signs of military involvement.

18

According to the Australian Bureau of Statistics (2012), criminal offenders born in Nigeria (Africa) have the highest imprisonment rate (1,079 prisoners per 100,000 adult population born in Nigeria). The rate of imprisonment for Aboriginal and Torres Strait Islander prisoners is about 14 times higher than the rate for non-Indigenous prisoners. Moreover, around one-third of all Aboriginal and Torres Strait Islander prisoners were sentenced or charged for acts intended to cause injury. On the other hand, both Indian and Asian ethnic groups have relatively low criminal conviction rates, much lower than whites. Less than 3.5% of all prisoners in Australia are of East Asian or Indian (South Asian) origin, while more than 85% are white.

19

See Ayres (2001), Riach and Rich (2002) and Pager (2007) for similar arguments and discussions.

20

One way of checking and correcting for this would have been to video-record each interaction. We decided against this approach in order to maintain bus driver anonymity.

21

Doleac and Stein (2013) and Ayres et al. (2015) avoid some of the above issues by examining racial bias in online marketplaces such as Craigslist and eBay. These innovative studies and designs use advertisements that feature photographs of sale items (such as iPods and baseball cards) either held by a dark- or light-skinned hand. Such a visual and static approach then removes all potential behavioural idiosyncrasies across seller groups, isolating the role of race to perhaps the greatest degree possible. If the display of a seller’s skin colour is not too uncommon in these online markets, then this method is ideal for studying consumer-side discrimination. A recent study by Bohren et al. (2019) also employs a clever field experiment to shed light on the underlying sources of gender discrimination in a large online forum.

22

Dur and Zoutenbier (2015) show that public sector employees are significantly more altruistic and lazy than observationally equivalent private sector employees. Such heterogeneity in social preferences may partly explain the relatively high level of generosity exhibited by the studied bus drivers.

23

For more focused studies and field tests of discrimination versus favouritism, see Feld et al. (2016) and Sandberg (2018).

24

We find no differences in the willingness to accommodate by sex of the driver. In a post-experiment interview, many of our hired testers did however note that female bus drivers were much more likely to express their anger and disappointment following a rejection, mainly reiterating the fact that all passengers had to purchase a ticket before boarding the bus.

25

The experimental literature on giving behaviour suggests that trustworthiness increases with the age of potential recipients (e.g., Sutter and Kocher, 2007). Thus, it may also be the case that bus drivers are more willing to grant free rides to middle-aged or elderly citizens, via the perceived trust channel noted above. This is perhaps an important consideration, and treatment variation, for future work on this topic. We thank an anonymous referee for this point.

26

Using the list of covariates from model (4) in Table 4, the marginal effect for black tester estimated with a probit model is similar to the estimate from the linear probability model: −0.444 (SE 0.099). The same (probit) marginal effect for black tester is estimated if we include controls for the treatment variations: −0.438 (SE 0.094). These findings suggest that our estimation results are robust to the chosen functional form.

27

The lack of any observed status effects for Asian customers is likely due to the uncorrected data in the raw comparisons. That is, Asian testers wearing business attire may have simultaneously been subject to unlucky draws of other decision factors that reduce the likelihood of acceptance.

28

The estimated treatment effects are also robust to the inclusion of tester appearance scores on attractiveness, aggressiveness and trustworthiness. The corresponding regression equations, without any controls for tester appearances, are reported in Table B3.

29

Not long after our research assistants began conducting the survey, the bus drivers were instructed by management not to participate in the survey any longer and thus rejected all subsequent approaches.

30

As pointed out by a referee, if the surveyed bus drivers had any reason to believe that they were being studied by their own employer, then the recorded disparities between revealed and stated attitudes could be further biased upward. While it was made clear that the anonymous survey was being conducted by academic researchers from a university, such beliefs about employer involvement might still be reflected in the collected responses.

31

In a study on employer attitudes towards hiring ex-offenders, Pager and Quillian (2005) report a similar reversal in racial attitudes after switching from revealed to stated elicitation methods. Alem et al. (2018) provide a more recent example focusing on unethical behaviour.

32

The data on population demographics and average earnings come from the Australian Bureau of Statistics: www.abs.gov.au/earnings-and-work-hours.

33

Data on local bus passenger waiting times are drawn from the Moovit Public Transport Index, a part of the public transportation application Moovit, which has around one million registered users across Australia.

34

We can similarly question whether the bus-going community is the same as the general one, and whether the appropriate incomes should reflect those of regular bus riders. Since no valid statistics are available on such elements, all we can do is mention that we would expect bus customers to be relatively poor (so black customers would be over-represented), and the average earnings of bus customers (and their opportunity cost of time) to be lower; with conflicting effects on the welfare calculations above.

35

We thank an anonymous referee for this point.

References

Akerlof
G.
,
Kranton
R.
(
2000
). ‘
Economics and identity
’,
Quarterly Journal of Economics
, vol.
115
(
3
), pp.
715
53
.

Alem
Y.
,
Eggert
H.
,
Kocher
M.
,
Ruhinduka
R.
(
2018
). ‘
Why (field) experiments on unethical behavior are important: comparing stated and revealed behavior
’,
Journal of Economic Behavior and Organization
, vol.
156
, pp.
71
85
.

Alesina
A.
,
La Ferrara
E.
(
2002
). ‘
Who trusts others?
’,
Journal of Public Economics
, vol.
85
(
2
), pp.
207
34
.

Alesina
A.
,
La Ferrara
E.
(
2014
). ‘
A test of racial bias in capital sentencing
’,
American Economic Review
, vol.
104
(
11
), pp.
3397
433
.

Andreoni
J.
,
Rao
J.
,
Trachtman
H.
(
2017
). ‘
Avoiding the ask: a field experiment on altruism, empathy, and charitable giving
’,
Journal of Political Economy
, vol.
125
(
3
), pp.
625
53
.

Antonovics
K.
,
Arcidiacono
P.
,
Walsh
R.
(
2005
). ‘
Games and discrimination: lessons from the Weakest Link
’,
Journal of Human Resources
, vol.
40
(
4
), pp.
918
47
.

Anwar
S.
,
Bayer
P.
,
Hjalmarsson
R.
(
2012
). ‘
The impact of jury race in criminal trials
’,
Quarterly Journal of Economics
, vol.
127
(
2
), pp.
1017
55
.

Anwar
S.
,
Fang
H.
(
2006
). ‘
An alternative test of racial prejudice in motor vehicle searches: theory and evidence
’,
American Economic Review
, vol.
96
(
1
), pp.
127
51
.

Arrow
K.
(
1973
). ‘
The theory of discrimination
’, in (
Ashenfelter
O.
and
Rees
A.
, eds.),
Discrimination in Labor Markets
, pp.
3
33
.,
Princeton, NJ
:
Princeton University Press
.

Australian Bureau of Statistics
. (
2012
). ‘
Prisoners in Australia
’, ABS cat. no. 4517.0,
Canberra
:
ABS
.

Ayres
I.
(
2001
).
Pervasive Prejudice?: Unconventional Evidence of Race and Gender Discrimination
,
Chicago
:
University of Chicago Press
.

Ayres
I.
,
Banaji
M.
,
Jolls
C.
(
2015
). ‘
Race effects on eBay
’,
RAND Journal of Economics
, vol.
46
(
4
), pp.
891
917
.

Ayres
I.
,
Siegelman
P.
(
1995
). ‘
Race and gender discrimination in bargaining for a new car
’,
American Economic Review
, vol.
85
(
3
), pp.
304
21
.

Ayres
I.
,
Waldfogel
J.
(
1994
). ‘
A market test for race discrimination in bail setting
’,
Stanford Law Review
, vol.
46
(
5
), pp.
987
1047
.

Balafoutas
L.
,
Beck
A.
,
Kerschbamer
R.
,
Sutter
M.
(
2013
). ‘
What drives taxi drivers? A field experiment on fraud in a market for credence goods
’,
Review of Economic Studies
, vol.
80
(
3
), pp.
876
91
.

Bar
R.
,
Zussman
A.
(
2017
). ‘
Customer discrimination: evidence from Israel
’,
Journal of Labor Economics
, vol.
35
(
4
), pp.
1031
59
.

Bar
R.
,
Zussman
A.
(
2020
). ‘
Identity and bias: insights from driving tests
’,
Economic Journal
, vol.
130
(
625
), pp.
1
23
.

Becker
G.S.
(
1957
).
The Economics of Discrimination
,
Chicago
:
University of Chicago Press
.

Belot
M.
,
Bhaskar
V.
,
van de Ven
J.
(
2012
). ‘
Beauty and the sources of discrimination
’,
Journal of Human Resources
, vol.
47
(
3
), pp.
851
72
.

Bertrand
M.
,
Duflo
E.
(
2017
). ‘
Field experiments on discrimination
’, in (
Banerjee
A.V.
and
Duflo
E.
, eds.),
Handbook of Economic Field Experiments
, pp.
309
93
.,
Amsterdam: North Holland
.

Bertrand
M.
,
Mullainathan
S.
(
2004
). ‘
Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination
’,
American Economic Review
, vol.
94
(
4
), pp.
991
1013
.

Bohren
J.A.
,
Imas
A.
,
Rosenberg
M.
(
2019
). ‘
The dynamics of discrimination: theory and evidence
’,
American Economic Review
, vol.
109
(
10
), pp.
3395
436
.

Booth
A.
,
Leigh
A.
,
Varganova
E.
(
2012
). ‘
Does racial and ethnic discrimination vary across minority groups? Evidence from a field experiment
’,
Oxford Bulletin of Economics and Statistics
, vol.
74
(
4
), pp.
547
73
.

Castillo
M.
,
Petrie
R.
,
Torero
M.
,
Vesterlund
L.
(
2013
). ‘
Gender differences in bargaining outcomes: a field experiment on discrimination
’,
Journal of Public Economics
, vol.
99
, pp.
35
48
.

Cettolin
E.
,
Suetens
S.
(
2018
). ‘
Return on trust is lower for immigrants
’,
Economic Journal
, vol.
129
(
621
), pp.
1992
2009
.

Charness
G.
,
Rabin
M.
(
2002
). ‘
Understanding social preferences with simple tests
’,
Quarterly Journal of Economics
, vol.
117
(
3
), pp.
817
69
.

Currie
J.
,
Lin
W.
,
Zhang
W.
(
2011
). ‘
Patient knowledge and antibiotic abuse: evidence from an audit study in China
’,
Journal of Health Economics
, vol.
30
(
5
), pp.
933
49
.

Daniel
W.
(
1968
).
Racial Discrimination in England
,
Middlesex
:
Penguin Books
.

Davis
L.
,
Lennon
S.
(
1988
). ‘
Social cognition and the study of clothing and human behavior
’,
Social Behavior and Personality
, vol.
16
(
2
), pp.
175
86
.

DellaVigna
S.
,
List
J.
,
Malmendier
U.
(
2012
). ‘
Testing for altruism and social pressure in charitable giving
’,
Quarterly Journal of Economics
, vol.
127
(
1
), pp.
1
56
.

Doleac
J.
,
Stein
L.
(
2013
). ‘
The visible hand: race and online market outcomes
’,
Economic Journal
, vol.
123
(
572
), pp.
F469
92
.

Dur
R.
,
Zoutenbier
R.
(
2015
). ‘
Intrinsic motivations of public sector employees: evidence for Germany
’,
German Economic Review
, vol.
16
(
3
), pp.
343
66
.

Edelman
B.
,
Luca
M.
,
Svirsky
D.
(
2017
). ‘
Racial discrimination in the sharing economy: evidence from a field experiment
’,
American Economic Journal: Applied Economics
, vol.
9
(
2
), pp.
1
22
.

Ewens
M.
,
Tomlin
B.
,
Wang
L.
(
2014
). ‘
Statistical discrimination or prejudice? A large sample field experiment
’,
Review of Economics and Statistics
, vol.
96
(
1
), pp.
119
34
.

Falk
A.
,
Zehnder
C.
(
2013
). ‘
A city-wide experiment on trust discrimination
’,
Journal of Public Economics
, vol.
100
, pp.
15
27
.

Feld
J.
,
Salamanca
N.
,
Hamermesh
D.
(
2016
). ‘
Endophilia or exophobia: beyond discrimination
’,
Economic Journal
, vol.
126
(
594
), pp.
1503
27
.

Fiske
S.
(
1998
). ‘
Stereotyping, prejudice, and discrimination
’,
The Handbook of Social Psychology
, vol.
2
(
4
), pp.
357
411
.

Fisman
R.
,
Iyengar
S.
,
Kamenica
E.
,
Simonson
I.
(
2008
). ‘
Racial preferences in dating
’,
Review of Economic Studies
, vol.
75
(
1
), pp.
117
32
.

Fisman
R.
,
Paravisini
D.
,
Vig
V.
(
2017
). ‘
Cultural proximity and loan outcomes
’,
American Economic Review
, vol.
107
(
2
), pp.
457
92
.

Fong
C.
,
Luttmer
E.
(
2009
). ‘
What determines giving to hurricane Katrina victims? Experimental evidence on racial group loyalty
’,
American Economic Journal: Applied Economics
, vol.
1
(
2
), pp.
64
87
.

Frijters
P.
(
2013
).
An Economic Theory of Greed, Love, Groups, and Networks
,
Cambridge
:
Cambridge University Press
.

Fryer
R.G. Jr
,
Levitt
S.D.
(
2004
). ‘
The causes and consequences of distinctively black names
’,
Quarterly Journal of Economics
, vol.
119
(
3
), pp.
767
805
.

Ge
Y.
,
Knittel
C.
,
MacKenzie
D.
,
Zoepf
S.
(
2016
). ‘
Racial and gender discrimination in transportation network companies
’,
Working Paper
,
National Bureau of Economic Research
.

Giulietti
C.
,
Tonin
M.
,
Vlassopoulos
M.
(
2019
). ‘
Racial discrimination in local public services: a field experiment in the United States
’,
Journal of the European Economic Association
, vol.
17
(
1
), pp.
165
204
.

Glaeser
E.
,
Laibson
D.
,
Scheinkman
J.
,
Soutter
C.
(
2000
). ‘
Measuring trust
’,
Quarterly Journal of Economics
, vol.
115
(
3
), pp.
811
46
.

Gneezy
U.
,
List
J.
,
Price
M.
(
2012
). ‘
Toward an understanding of why people discriminate: evidence from a series of natural field experiments
’,
Working Paper
,
National Bureau of Economic Research
.

Goette
L.
,
Huffman
D.
,
Meier
S.
(
2006
). ‘
The impact of group membership on cooperation and norm enforcement: evidence using random assignment to real social groups
’,
American Economic Review
, vol.
96
(
2
), pp.
212
6
.

Grosskopf
B.
,
Pearce
G.
(
2016
). ‘
Do you mind me paying less? Measuring other-regarding preferences in the market for taxis
’, Working Paper.

Guryan
J.
,
Charles
K.
(
2013
). ‘
Taste-based or statistical discrimination: the economics of discrimination returns to its roots
’,
Economic Journal
, vol.
123
(
572
), pp.
F417
32
.

Heckman
J.
(
1998
). ‘
Detecting discrimination
’,
Journal of Economic Perspectives
, vol.
12
(
2
), pp.
101
16
.

Heckman
J.
,
Siegelman
P.
(
1993
). ‘
The Urban Institute audit studies: their methods and findings
’, in (
Fix
M.
,
Struyk
R.J.
, eds.),
Clear and Convincing Evidence: Measurement of Discrimination in America
, pp.
187
258
.,
Washington, DC
:
Urban Institute Press
.

Hedegaard
M.
,
Tyran
J.R.
(
2018
). ‘
The price of prejudice
’,
American Economic Journal: Applied Economics
, vol.
10
(
1
), pp.
40
63
.

Imai
K.
,
King
G.
,
Stuart
E.
(
2008
). ‘
Misunderstandings between experimentalists and observationalists about causal inference
’,
Journal of the Royal Statistical Society: Series A (Statistics in Society)
, vol.
171
(
2
), pp.
481
502
.

Knowles
J.
,
Persico
N.
,
Todd
P.
(
2001
). ‘
Racial bias in motor vehicle searches: theory and evidence
’,
Journal of Political Economy
, vol.
109
(
1
), pp.
203
32
.

LaPiere
R.
(
1934
). ‘
Attitudes vs. actions
’,
Social Forces
, vol.
13
(
2
), pp.
230
7
.

Leibbrandt
A.
,
List
J.
(
2014
). ‘
Do women avoid salary negotiations? Evidence from a large-scale natural field experiment
’,
Management Science
, vol.
61
(
9
), pp.
2016
24
.

Leong
N.
,
Belzer
A.
(
2016
). ‘
The new public accommodations: race discrimination in the platform economy
’,
Georgetown Law Journal
, vol.
105
, pp.
1271
322
.

Li
H.
,
Lang
K.
,
Leong
K.
(
2018
). ‘
Does competition eliminate discrimination? Evidence from the commercial sex market in Singapore
’,
Economic Journal
, vol.
128
(
611
), pp.
1570
608
.

List
J.
(
2004
). ‘
The nature and extent of discrimination in the marketplace: evidence from the field
’,
Quarterly Journal of Economics
, vol.
119
(
1
), pp.
49
89
.

List
J.
,
Price
M.
(
2009
). ‘
The role of social connections in charitable fundraising: evidence from a natural field experiment
’,
Journal of Economic Behavior and Organization
, vol.
69
(
2
), pp.
160
9
.

List
J.
,
Rasul
I.
(
2011
). ‘
Field experiments in labor economics
’, in (
Ashenfelter
O.
and
Card
D.
, eds.),
Handbook of Labor Economics
, pp.
103
228
.,
Amsterdam: North Holland
.

Macrae
C.
,
Bodenhausen
G.
(
2000
). ‘
Social cognition: thinking categorically about others
’,
Annual Review of Psychology
, vol.
51
(
1
), pp.
93
120
.

Mujcic
R.
,
Frijters
P.
(
2013
). ‘
Still not allowed on the bus: it matters if you’re black or white!
’,
Working Paper
,
IZA Institute of Labor Economics
.

Neumark
D.
(
2018
). ‘
Experimental research on labor market discrimination
’,
Journal of Economic Literature
, vol.
56
(
3
), pp.
799
866
.

Neumark
D.
,
Bank
R.
,
Van Nort
K.
(
1996
). ‘
Sex discrimination in restaurant hiring: an audit study
’,
Quarterly Journal of Economics
, vol.
111
(
3
), pp.
915
41
.

Norton
M.
,
Sommers
S.
,
Apfelbaum
E.
,
Pura
N.
,
Ariely
D.
(
2006
). ‘
Color blindness and interracial interaction: playing the political correctness game
’,
Psychological Science
, vol.
17
(
11
), pp.
949
53
.

Pager
D.
(
2003
). ‘
The mark of a criminal record
’,
American Journal of Sociology
, vol.
108
(
5
), pp.
937
75
.

Pager
D.
(
2007
). ‘
The use of field experiments for studies of employment discrimination: contributions, critiques, and directions for the future
’,
Annals of the American Academy of Political and Social Science
, vol.
609
(
1
), pp.
104
33
.

Pager
D.
,
Quillian
L.
(
2005
). ‘
Walking the talk? What employers say versus what they do
’,
American Sociological Review
, vol.
70
(
3
), pp.
355
80
.

Pager
D.
,
Shepherd
H.
(
2008
). ‘
The sociology of discrimination: racial discrimination in employment, housing, credit, and consumer markets
’,
Annual Review of Sociology
, vol.
34
, pp.
181
209
.

Parsons
C.
,
Sulaeman
J.
,
Yates
M.
,
Hamermesh
D.
(
2011
). ‘
Strike three: discrimination, incentives, and evaluation
’,
American Economic Review
, vol.
101
(
4
), pp.
1410
35
.

Persico
N.
,
Todd
P.
(
2006
). ‘
Generalising the hit rates test for racial bias in law enforcement, with an application to vehicle searches in Wichita
’,
Economic Journal
, vol.
116
(
515
), pp.
F351
67
.

Phelps
E.
(
1972
). ‘
The statistical theory of racism and sexism
’,
American Economic Review
, vol.
62
(
4
), pp.
659
61
.

Pope
D.
,
Price
J.
,
Wolfers
J.
(
2018
). ‘
Awareness reduces racial bias
’,
Management Science
, vol.
64
(
11
), pp.
4988
95
.

Pope
D.
,
Sydnor
J.
(
2011
). ‘
What’s in a picture? Evidence of discrimination from Prosper.com
’,
Journal of Human Resources
, vol.
46
(
1
), pp.
53
92
.

Price
J.
,
Wolfers
J.
(
2010
). ‘
Racial discrimination among NBA referees
’,
Quarterly Journal of Economics
, vol.
125
(
4
), pp.
1859
87
.

Riach
P.
,
Rich
J.
(
2002
). ‘
Field experiments of discrimination in the market place
’,
Economic Journal
, vol.
112
(
483
), pp.
F480
518
.

Redcay
E.
,
Dodell-Feder
D.
,
Pearrow
M.J.
,
Mavros
P.L.
,
Kleiner
M.
,
Gabrieli
J.D.
,
Saxe
R.
(
2010
). ‘
Live face-to-face interaction during fMRI: a new tool for social cognitive neuroscience
’,
Neuroimage
, vol.
50
(
4
), pp.
1639
47
.

Ross
S.
,
Yinger
J.
(
2002
).
The Color of Credit: Mortgage Discrimination, Research Methodology, and Fair-Lending Enforcement
,
Cambridge, MA
:
MIT Press
.

Sandberg A. (2018). ‘Competing identities: a field study of in-group bias among professional evaluators’, Economic Journal, vol. 128(613), pp. 2131–59.

Seidel M., Polzer J., Stewart K. (2000). ‘Friends in high places: the effects of social networks on discrimination in salary negotiations’, Administrative Science Quarterly, vol. 45(1), pp. 1–24.

Shayo M., Zussman A. (2011). ‘Judicial ingroup bias in the shadow of terrorism’, Quarterly Journal of Economics, vol. 126(3), pp. 1447–84.

Siegelman P. (1998). ‘Racial discrimination in everyday commercial transactions: what do we know, what do we need to know, and how can we find out?’, in (Fix M. and Turner M.A., eds.), A National Report Card on Discrimination in America: The Role of Testing, pp. 69–98, Washington, DC: Urban Institute Press.

Small M.L., Pager D. (2020). ‘Sociological perspectives on racial discrimination’, Journal of Economic Perspectives, vol. 34(2), pp. 49–67.

Sutter M., Kocher M. (2007). ‘Trust and trustworthiness across different age groups’, Games and Economic Behavior, vol. 59(2), pp. 364–82.

Tourangeau R., Rips L., Rasinski K. (2000). The Psychology of Survey Response, Cambridge: Cambridge University Press.

Vomfell L., Stewart N. (2019). ‘Officer bias in stop and search is exacerbated by deployment decisions’, Working Paper, University of Warwick.

West J. (2018). ‘Racial bias in police investigations’, Working Paper, University of California Santa Cruz.

Yinger J. (1986). ‘Measuring racial discrimination with fair housing audits: caught in the act’, American Economic Review, vol. 76(5), pp. 881–93.

Zussman A. (2013). ‘Ethnic discrimination: lessons from the Israeli online market for used cars’, Economic Journal, vol. 123(572), pp. F433–68.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.