Patrick Kline, Evan K Rose, Christopher R Walters, Systemic Discrimination Among Large U.S. Employers, The Quarterly Journal of Economics, Volume 137, Issue 4, November 2022, Pages 1963–2036, https://doi.org/10.1093/qje/qjac024
Abstract
We study the results of a massive nationwide correspondence experiment sending more than 83,000 fictitious applications with randomized characteristics to geographically dispersed jobs posted by 108 of the largest U.S. employers. Distinctively Black names reduce the probability of employer contact by 2.1 percentage points relative to distinctively white names. The magnitude of this racial gap in contact rates differs substantially across firms, exhibiting a between-company standard deviation of 1.9 percentage points. Despite an insignificant average gap in contact rates between male and female applicants, we find a between-company standard deviation in gender contact gaps of 2.7 percentage points, revealing that some firms favor male applicants and others favor women. Company-specific racial contact gaps are temporally and spatially persistent, and negatively correlated with firm profitability, federal contractor status, and a measure of recruiting centralization. Discrimination exhibits little geographical dispersion, but two-digit industry explains roughly half of the cross-firm variation in both racial and gender contact gaps. Contact gaps are highly concentrated in particular companies, with firms in the top quintile of racial discrimination responsible for nearly half of lost contacts to Black applicants in the experiment. Controlling false discovery rates to the 5% level, 23 companies are found to discriminate against Black applicants. Our findings establish that discrimination against distinctively Black names is concentrated among a select set of large employers, many of which can be identified with high confidence using large-scale inference methods.
I. Introduction
Employment discrimination is a stubbornly persistent social problem. Title VII of the Civil Rights Act of 1964 forbids employment discrimination on the basis of race, sex, color, religion, and national origin. Yet a large social science literature analyzing résumé correspondence experiments finds that these protected characteristics influence employer treatment of job applications (Bertrand and Duflo 2017; Quillian et al. 2017; Baert 2018), with some studies finding that this disparate treatment predicts later hiring decisions (Quillian, Lee, and Oliver 2020). In a reanalysis of several correspondence experiments, Kline and Walters (2021) find that discriminatory biases vary tremendously across job vacancies. Less is known, however, about the extent to which discriminatory jobs are concentrated in particular companies. Is the U.S. labor market characterized by a small faction of severe discriminators adrift in an ocean of unbiased firms, or do most companies exhibit roughly equivalent biases?
The answer to this question has a host of important ramifications. First, as emphasized by Becker (1957), if discrimination is confined to a small minority of firms, workers may be able to avoid prejudice by sorting to nondiscriminatory employers. Second, if the most biased firms also tend to offer the highest wages, the contribution of discrimination to observed disparities will tend to be amplified (Card, Cardoso, and Kline 2016; Gerard et al. 2021). Third, if only a few firms discriminate, and do so heavily, it may be possible for government regulators to target these companies for audits and investigations. For instance, the Office of Federal Contract Compliance Programs (OFCCP) annually audits thousands of federal contractors for compliance with equal employment laws (Maxwell et al. 2013). Likewise, the U.S. Equal Employment Opportunity Commission (EEOC) routinely launches investigations into whether particular companies have engaged in “systemic discrimination,” a term they define as “a pattern or practice, policy and/or class cases where the discrimination has a broad impact on an industry, profession, company or geographic location” (U.S. EEOC 2006b).
This article reports the results of a massive nationwide correspondence experiment designed to measure patterns of discrimination by large U.S. companies. The two goals of our analysis are to quantify the extent to which discriminatory patterns differ across firms and assess the feasibility of using experimental evidence to target firms likely to be engaged in discrimination. To facilitate these goals, our experiment was designed to repeatedly elicit signals of bias from specific companies. Unlike traditional audit studies that passively sample jobs from newspapers or job boards (e.g., Bertrand and Mullainathan 2004), we prospectively applied to entry-level job vacancies hosted on the web portals of 108 Fortune 500 firms. For each company, we sampled up to 125 entry-level jobs in distinct U.S. counties. By sampling a large number of geographically distinct jobs from each company, we are able to average out idiosyncrasies associated with particular geographic areas, establishments, or hiring managers, revealing consistent organization-wide patterns.
Following a large social science literature (Bertrand and Duflo 2017; Baert 2018), our experiment manipulated employer perceptions of race by randomly assigning racially distinctive names to job applications. Each job received four pairs of applications, with one member of each pair assigned a distinctively Black name and the other a distinctively white name. We also randomly varied signals of applicant sex, age, sexual orientation, gender identity, and political leaning. Over 83,000 job applications were sent in total, providing uniquely precise signals of employer conduct.
Overall, 24% of the applications we sent were contacted by employers within 30 days. This contact rate is nearly three times greater than what Bertrand and Mullainathan (2004) found in their seminal experiment, suggesting that our fictitious applicants were viewed as plausible job candidates by employers. We find that distinctively Black names reduce the likelihood of employer contact relative to distinctively white names by 2.1 percentage points, an effect equal to 9% of the Black mean contact rate. Past work has typically found larger proportional effects, which may be attributable to less biased behavior among the extremely large employers we study and the high overall contact rates yielded by our experiment.
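The proportional effect quoted above can be checked against the rounded 30-day contact rates reported in Table I. The snippet below is only a minimal sketch. The variable names are ours, and because the inputs are rounded table entries, the result approximates the figures computed from the unrounded data.

```python
# Rounded 30-day contact rates from Table I, Panel A (all firms).
white_contact_rate = 0.251
black_contact_rate = 0.230

# Contact gap in probability units (white minus Black).
contact_gap = white_contact_rate - black_contact_rate

# Gap expressed as a share of the Black mean contact rate.
proportional_gap = contact_gap / black_contact_rate

print(f"gap = {100 * contact_gap:.1f} percentage points")
print(f"gap / Black mean = {100 * proportional_gap:.0f}%")
```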
A key finding of our analysis is that patterns of discrimination against Black names vary substantially across employers. After adjusting for sampling error, the cross-firm standard deviation of racial contact gaps is 1.9 percentage points, only slightly below the mean contact penalty for Black names. Despite this wide variability, we cannot reject the null hypothesis that all 108 firms in our experiment weakly favor white names. An application of Efron’s (2016) empirical Bayes (EB) deconvolution estimator reveals that although most firms exhibit mild discrimination against Black applicants, a few exhibit very large biases. We estimate that the top quintile of discriminating firms are responsible for nearly half of the lost contacts to Black applicants in our experiment. The Gini coefficient of employer contact gaps is estimated to be approximately 0.4, suggesting that discrimination against Black names is roughly as concentrated among firms in our experiment as income is among U.S. households.
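To make the two concentration measures concrete, the following sketch computes a Gini coefficient and a top-quintile share for a list of nonnegative firm-level gaps. It is an illustration under simplifying assumptions, not the procedure used in the article: the estimates reported here are based on the deconvolved distribution of population contact gaps, whereas this code treats a list of hypothetical gaps as if they were known without sampling error. The function names and example values are ours.

```python
import math

def gini(values):
    """Gini coefficient of a list of nonnegative numbers."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over values sorted in ascending order (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def top_quintile_share(values):
    """Share of the total accounted for by the largest fifth of values."""
    xs = sorted(values, reverse=True)
    k = max(1, math.ceil(len(xs) / 5))
    return sum(xs[:k]) / sum(xs)

# Hypothetical firm-level contact gaps in percentage points (not the paper's data).
example_gaps = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 4.5, 7.0]
print(round(gini(example_gaps), 3), round(top_quintile_share(example_gaps), 3))
```

Applying these functions directly to noisy firm-level estimates would tend to overstate concentration, because sampling error adds spurious dispersion. That is one reason a deconvolution step is needed before computing such summaries.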
Companies vary enormously in their treatment of applicant gender. On average, male and female applicants are equally likely to be contacted, but the standard deviation of gender contact gaps across companies is 2.7 percentage points, with a distribution that is roughly symmetric about zero. This “bidirectional” discrimination result accords with the findings of Kline and Walters (2021), who conclude, using different methods, that some jobs sampled in a correspondence experiment of Mexican employers (Arceo-Gomez and Campos-Vazquez 2014) discriminated against women, while others discriminated against men. Our analysis shows that large U.S. employers exhibit corresponding cross-company patterns of heterogeneity in their average gender contact gaps. Like racial discrimination, gender discrimination is highly concentrated in particular firms, with the top quintile of discriminating firms responsible for nearly 60% of contacts lost to gender discrimination and a Gini concentration coefficient of roughly 0.5.
Although our main focus is on race and gender, we also assess the extent of discrimination on several other dimensions. A modest contact penalty of 0.6 percentage points is found for applicants listing high school graduation dates implying an age over 40. This gap also varies across employers, with a cross-firm standard deviation of 1.1 percentage points. In contrast to race, gender, and age, we find no significant penalty for membership in a lesbian, gay, bisexual, transgender, or queer (LGBTQ) club or evidence of heterogeneity in that penalty across firms. Likewise, we find insignificant effects of listing gender-neutral pronouns next to an applicant’s name, though estimates for LGBTQ clubs and gender-neutral pronouns are less precise than estimates for race, gender, and age.
Surprisingly, geographic variation in race, gender, and age discrimination is relatively muted. We cannot reject the null hypothesis that mean contact gaps for gender and age are equal across all 50 states, and find only marginally significant evidence against this null for racial contact gaps. In contrast, two-digit Standard Industrial Classification (SIC) codes explain roughly half of firm-level variation in contact gaps for both race and gender. Race and gender contact gaps also vary significantly by job title, but this variation is indistinguishable from noise conditional on firm fixed effects. Contact gaps exhibit limited variation across third-party intermediaries that power firms’ hiring websites, suggesting that screening algorithms are unlikely to drive the firm differences we measure.
Consistent with classic models of customer discrimination, both racial and gender contact gaps are estimated to be larger in sectors intensive in jobs requiring social interaction. In line with the predictions of Becker (1957), racial contact gaps are smaller at more profitable firms. Racial contact gaps also tend to be smaller among federal contractors, which is consistent with Miller’s (2017) finding that contracting with the federal government yields sustained increases in Black employment. Finally, we find that firms with more centralized points of contact (i.e., callbacks originating from the same phone numbers) have much smaller contact gaps, suggesting that human resources practices may be an important mediator of organization-wide biases.
The finding of significant employer heterogeneity in discriminatory conduct motivates an investigation of which particular organizations are likely violating the Civil Rights Act. As a first approach to characterizing detection possibilities, we form EB posterior mean estimates of the contact gap at each firm. Firms with posterior mean contact gaps in the top quartile of the distribution are estimated to account for roughly half of the contacts lost to racial discrimination. Discrimination is disproportionately clustered in customer-facing sectors, including the auto services and sales sector and certain forms of retail. We find large posterior mean contact gaps favoring women at apparel stores and slightly less pronounced gaps favoring men in the wholesale durable sector.
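The logic of a posterior mean can be illustrated with the textbook normal-normal shrinkage formula. This is a deliberately simplified stand-in: the article relies on a nonparametric deconvolution estimate of the prior, not a normal prior, so the sketch below conveys only the general idea that noisy firm estimates are pulled toward the cross-firm mean. The function name, the firm estimate, and its standard error are hypothetical. The prior mean and standard deviation are set to the average gap and cross-firm standard deviation reported in the Introduction.

```python
def shrunken_gap(estimate, std_error, prior_mean, prior_sd):
    """Posterior mean of a firm's gap under a normal prior and normal noise."""
    signal_var = prior_sd ** 2
    noise_var = std_error ** 2
    # Weight placed on the firm's own estimate; the rest goes to the prior mean.
    weight = signal_var / (signal_var + noise_var)
    return prior_mean + weight * (estimate - prior_mean)

# Prior set to the reported mean gap (0.021) and cross-firm SD (0.019).
# The firm-level estimate and standard error are hypothetical.
posterior = shrunken_gap(estimate=0.050, std_error=0.019,
                         prior_mean=0.021, prior_sd=0.019)
print(round(posterior, 4))
```

When the standard error equals the prior standard deviation, as in this example, the posterior mean lies halfway between the firm's estimate and the prior mean.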
Although posterior means provide best predictions of the extent of discrimination at each firm, it is also of interest to provide an assessment of which companies are likely to be discriminating at all. Applying large-scale multiple-testing techniques introduced by Storey (2002, 2003), we find that 23 of the firms in our study discriminate against Black applicants with at least 95% posterior certainty (i.e., controlling false discovery rates to no more than 5%). This result implies that at least 22 of these 23 firms should be expected to exhibit nonzero racial contact gaps. These discriminating firms are overrepresented in the auto sector, in general merchandising, and among eating and drinking establishments. In contrast, we find only one firm that can be reliably labeled as discriminating against men, and are unable to detect any firms that discriminate against women when limiting false discovery rates to 5%. Our sharper detection power for racial discrimination stems from the fact that a larger share of firms in the population are estimated to discriminate based on race than on gender, increasing the prior probability of discrimination used to draw inferences about the conduct of individual firms. The single firm identified as discriminating against men is an apparel retailer that also discriminates against Black applicants with high posterior certainty.
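The following sketch shows one common way to compute q-values in the spirit of Storey (2002), using a fixed tuning parameter to estimate the share of true nulls. It is an assumption-laden illustration, not the article's implementation: the function name, the fixed value of `lam`, and the example p-values are ours, and the firm-level test statistics that would generate the p-values are not reproduced here. The final lines redo the arithmetic behind the statement that at least 22 of 23 flagged firms are expected to have nonzero gaps.

```python
def storey_qvalues(p_values, lam=0.5):
    """q-values with a fixed-lambda estimate of the share of true nulls."""
    m = len(p_values)
    # Estimated share of true nulls: p-values above lam are mostly nulls.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q_values[i] = running_min
    return q_values, pi0

# Hypothetical p-values for four firms (not the paper's data).
q_values, pi0 = storey_qvalues([0.001, 0.002, 0.8, 0.9])
flagged = [q <= 0.05 for q in q_values]

# Expected number of true discoveries among 23 firms flagged at a 5% FDR.
expected_true = 23 * (1 - 0.05)
print(flagged, round(expected_true, 2))
```

A larger estimated share of discriminating firms lowers `pi0` and therefore lowers every q-value, which is the mechanism behind the sharper detection power for race than for gender described above.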
In principle, firm-wide contact gaps may be driven by a small share of heavily biased jobs. We develop a simple lower bound on the prevalence of job-level discrimination based on split-sample estimates of the job-level variance of contact gaps. At least 7% of all jobs in our experiment discriminate against distinctively Black names. Among the 23 firms we conclude are likely engaged in racial discrimination, at least 20% of the jobs discriminate against Black names. At the modal firm in this group, this bound implies racial discrimination took place in at least 25 distinct U.S. counties, indicating a nationwide pattern of discrimination against Black names.
We conclude with an economic analysis of optimal auditing strategies meant to mimic the objectives and constraints of regulatory authorities such as the EEOC or OFCCP. Building on the framework introduced in Kline and Walters (2021), a hypothetical auditor seeks to investigate firms with large racial contact gaps. Informational constraints limit the expected yield on audits relative to the first-best investigation rule. We show that auditing strategies controlling the false discovery rate can be justified by a scenario in which the auditor seeks to avoid investigations of nondiscriminators and faces ambiguity regarding the share of discriminatory firms in the population. In practice, we find that making decisions based on false discovery rates rather than posterior means yields little reduction in the expected yield on investigations. The 23 firms we classify as discriminating against Black names are estimated to account for nearly 40% of lost contacts to Black applicants in our experiment.
Congressional oversight committees have questioned the EEOC's choice to prioritize systemic investigations of firms over individual-level claims of discrimination (Kim 2015). Our findings demonstrate that it is possible to target the specific firms responsible for a substantial share of discrimination against Black names while maintaining a tight limit on the expected number of false positives. The evidence of discriminatory patterns uncovered here can, in principle, be used by organizations such as the EEOC or OFCCP to target audits and investigations more effectively. Alternatively, this information can be shared directly with the firms, or even made public, potentially enabling companies to preemptively reform their practices, perhaps by adopting the recruiting policies of their less discriminatory peers.
The rest of the article is organized as follows. Section II provides background on employment discrimination and the law. Section III details the experimental design, Section IV describes the data, and Section V reports basic experimental effects. Section VI documents variation in discrimination across firms, while Section VII examines variation across other groupings of jobs. Section VIII investigates relationships between discrimination and observed employer characteristics. Section IX reports estimates of the full distribution of discrimination across firms. Section X uses this distribution to construct posterior estimates for individual firms and assesses the conclusions that can be drawn about discrimination by specific employers. Section XI considers the consequences of our findings for regulatory auditing decisions. Finally, Section XII concludes with a discussion of implications for antidiscrimination policy and directions for future research.
II. Policy Background
Much of the economics literature has focused on separating the contributions of taste-based and statistical discrimination to observed disparities, an exercise that requires inferring the extent to which employer conduct is motivated by beliefs regarding the productivity of different groups of workers (Becker 1957, 1993; Aigner and Cain 1977; Charles and Guryan 2008; Bohren et al. 2019). Recent empirical and methodological work looks at group differences in the treatment of equally qualified people in bail decisions, motor vehicle searches, probation revocations, and other settings (Arnold, Dobbie, and Yang 2018; Arnold, Dobbie, and Hull 2020; Canay, Mogstad, and Mountjoy 2020; Hull 2021; Rose 2021; Feigenberg and Miller 2022). In the employment context, it is widely understood that taste-based and statistical discrimination typically involve disparate treatment of individuals according to legally protected characteristics, which is prohibited by the Civil Rights Act.1
This article is concerned with measuring such disparate treatment, however motivated. The correspondence experiment we study was designed to manipulate employer perceptions of protected characteristics. Although the legal standing of organizations eliciting evidence of discrimination via “testing” remains unresolved (U.S. EEOC 1996), an employer whose decision to contact a job applicant is influenced by the applicant’s perceived race or sex has nonetheless engaged in disparate treatment and nominally violated the provisions of the Civil Rights Act.2 Although it is unclear whether the statistical evidence provided in an audit study would, on its own, be sufficient to successfully litigate a Title VII disparate treatment claim, such evidence may be helpful in building a case or in targeting investigations that lead to the discovery of additional evidence that eventually proves decisive.3 Conversely, correspondence evidence suggesting equal treatment of workers with different characteristics could, in principle, be used by firms to counter charges of employment discrimination. However, further evidence would likely be required for such a determination, as audit studies may fail to detect biases that manifest only at later stages of the hiring process or among applicants with qualification levels outside those considered in the study.
Although the social science literature has proposed several distinct theories and definitions of systemic discrimination (e.g., Pincus 1996; Reskin 2012), our use of this phrase is motivated by the EEOC’s definition of this term as a “pattern or practice” of discrimination (U.S. EEOC 2006b; Kim 2015). The EEOC’s systemic cases may concern either patterns of disparate treatment on protected characteristics or practices that target nonprotected characteristics but nonetheless have disparate effects on protected groups.4 Key to either sort of case is evidence that the pattern or practice is widespread, affecting a company’s hiring behavior at multiple locations. Although our analysis will not reveal the specific policies or practices giving rise to systemic discrimination, we will be able to assess whether a nationwide pattern of discrimination against protected characteristics is present at particular companies. This information may be of use to the EEOC and to local organizations interested in promoting fair hiring practices.5 Evidence of patterns of discrimination by federal contractors is especially pertinent to the OFCCP, which has broad discretion to audit contractors for compliance with executive orders prohibiting employment discrimination and regularly levies fines and, in some cases, even debars contractors when violations are found (Maxwell et al. 2013).
In deciding whether to launch investigations or audits, federal agencies often rely on analyses of employment data. For instance, the “inexorable zero” standard of Justice Sandra Day O’Connor, which refers to the complete absence of a group from a company’s employees, has been taken as an indicator of discrimination, despite the difficulties of ascertaining whether qualified applicants were actually passed over by the firm (Huang 2004).6 In contrast, the correspondence experiment we study directly manipulated employer perceptions, permitting inferences to be drawn regarding average causal effects of protected characteristics on employer conduct. A finding that such effects are present across a large set of establishments suggests a systemic pattern of discrimination. While these patterns may be driven by official hiring practices, they may also reflect implicit biases on the part of employees with hiring authority. In either case, documentation of nationwide patterns can aid efforts to ensure compliance with the law.
III. Experimental Design
Our study aims to measure the distribution of discrimination across the largest employers in the U.S. Figure I summarizes the sampling frame for the experiment. We began with the Fortune 500, splitting holding companies into brands with separate proprietary hiring websites. Data from InfoGroup and Burning Glass were used to determine the geographic distribution of establishments and vacancies, and each company’s hiring portal was investigated for compatibility with our auditing methods. We determined that 108 companies (i.e., separate brands with distinct hiring websites and systems) had sufficient geographic variation and routinely posted enough entry-level jobs on an easily accessible portal to satisfy our sampling criteria. These 108 large firms, 10 of which are subsidiaries of parent companies in the Fortune 500, employed roughly 15 million workers in 2020 according to Compustat and cover a wide array of industries detailed later on in Table X.
We sampled 125 entry-level job vacancies from each employer, with each vacancy corresponding to an establishment in a different U.S. county. Sampling was organized in a series of five waves, with a target of 25 jobs sampled for each firm in each wave. As shown in Figure I, 72 of the 108 firms were sampled in all waves; some firms were excluded from the first wave due to an interruption caused by the COVID-19 pandemic, and others were excluded in later waves because of new technological barriers in their job portals. We randomly ordered firms at the beginning of each wave and moved sequentially through the list, sampling the most recent job posting in a new county for each firm and randomizing ties. Each sampled job received eight job applications with randomized characteristics. This sampling protocol yields a sample size for each employer of 1,000 applications, spread across the 125 jobs, for a total target of approximately 100,000 applications.
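The sample-size targets in this paragraph follow from simple multiplication, sketched below with variable names of our own choosing. The last quantity is the total that would result if every firm were sampled in every wave and every job received all eight applications, so it is an upper bound on the stated target and not a figure reported in the text.

```python
waves = 5
jobs_per_firm_per_wave = 25
applications_per_job = 8
firms = 108

# Jobs sampled per firm across all waves.
jobs_per_firm = waves * jobs_per_firm_per_wave

# Applications per firm if every sampled job receives the full set.
applications_per_firm = jobs_per_firm * applications_per_job

# Total if all firms were sampled in all waves (an upper bound).
upper_bound_total = firms * applications_per_firm

print(jobs_per_firm, applications_per_firm, upper_bound_total)
```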
Applications were sent to each job in pairs. To minimize the chances of detection by employers, we allowed a gap of one to two days between consecutive pairs.7 Though some vacancies closed while applications were still being sent, 87% of sampled jobs received the full eight applications and 99% of jobs received at least two. As a result of vacancy closures and the exclusion of some firms from some waves, our final sample size amounted to roughly 84,000 applications. As in many previous experiments measuring discrimination (Bertrand and Duflo 2017), we signaled race using racially distinctive names. Our database of distinctive first names started with that of Bertrand and Mullainathan (2004), who used 9 unique names for each race and gender group, and supplemented this list with 10 more names per group from a database of speeding tickets issued in North Carolina between 2006 and 2018. We classified a name as racially distinctive if more than 90% of individuals with that name are of a particular race, and selected the most common distinctive Black and white names for those born between 1974 and 1979. We assembled distinctive last names from the 2010 U.S. Census, selecting names with high race-specific shares among those that occur at least 10,000 times nationally.8 Together with our database of first names, this list generated about 500 unique full names for each race and gender category. One application in each pair was randomly assigned a distinctively white name while the other was randomly assigned a distinctively Black name. We drew names without replacement to ensure that no two applications to the same firm shared a name.
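The 90% rule for classifying first names can be sketched as follows. The function name, the dictionary layout, and the counts are hypothetical and stand in for tabulations from the underlying name database. The threshold is applied as a strict inequality to match the phrase "more than 90%."

```python
def distinctive_race(counts_by_race, threshold=0.90):
    """Return the race a name is distinctive of, or None if no share exceeds the threshold."""
    total = sum(counts_by_race.values())
    if total == 0:
        return None
    # Race with the largest count for this name.
    race, count = max(counts_by_race.items(), key=lambda item: item[1])
    return race if count / total > threshold else None

# Hypothetical counts of individuals by race for three first names.
print(distinctive_race({"Black": 950, "white": 50}))   # share 0.95
print(distinctive_race({"Black": 600, "white": 400}))  # share 0.60
print(distinctive_race({"Black": 90, "white": 10}))    # share exactly 0.90
```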
Our experiment also randomly assigned other legally protected applicant characteristics. Sex was conveyed by applicant names. Fifty percent of names were distinctively female, and the rest distinctively male. Assignment of sex was not stratified; therefore, each job received between zero and eight female applications. Applicants were randomly assigned a date of birth implying an age between 22 and 58 years old, with ages uniformly distributed over this range. Because the Age Discrimination in Employment Act of 1967 prohibits discrimination against people aged 40 or older, we focus on differences between applicants over and under 40.
In Bostock v. Clayton County, Georgia (590 U.S. 1-23, 2020), the U.S. Supreme Court ruled that discrimination based on sexual orientation or gender identity violates Title VII of the Civil Rights Act. We began measuring discrimination on these dimensions starting in wave 2 of the experiment. Sexual orientation was conveyed by randomly assigning 10% of applicants to list LGBTQ high school clubs on their résumés. To distinguish between sexual orientation and general effects of clubs, we randomly assigned an additional 10% of applicants to be members of political or academic clubs. We conveyed gender identity by randomly assigning pronouns to 10% of résumés. Half of résumés with pronouns were assigned gender-typical pronouns (he/him for applicants with male names, she/her for applicants with female names), and the other half received gender-neutral pronouns (they/them). Pronouns were listed on applicants’ PDF résumés below their names.
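The assignment scheme described in the last several paragraphs can be sketched as a small simulation. This is an illustration only, with several assumptions that the text does not settle: ages are drawn as whole years, LGBTQ and political or academic clubs are treated as mutually exclusive, political and academic clubs are equally likely, and pronouns are drawn independently of clubs. The function and field names are ours, and name selection is omitted.

```python
import random

def assign_job_applications(rng, n_pairs=4, wave=2):
    """Draw randomized characteristics for the applications sent to one job."""
    applications = []
    for pair in range(n_pairs):
        races = ["Black", "white"]
        rng.shuffle(races)  # random order within the pair
        for race in races:
            app = {
                "pair": pair,
                "race": race,
                "female": rng.random() < 0.5,  # not stratified within job
                "age": rng.randint(22, 58),    # whole years (assumption)
                "club": None,
                "pronouns": None,
            }
            if wave >= 2:  # club and pronoun signals began in wave 2
                u = rng.random()
                if u < 0.10:
                    app["club"] = "LGBTQ"
                elif u < 0.20:
                    app["club"] = rng.choice(["political", "academic"])
                if rng.random() < 0.10:
                    typical = "she/her" if app["female"] else "he/him"
                    app["pronouns"] = rng.choice([typical, "they/them"])
            applications.append(app)
    return applications

applications = assign_job_applications(random.Random(0))
print(len(applications), [a["race"] for a in applications])
```

Because race is assigned within pairs while sex is drawn independently for each application, every job is balanced on race by construction but can receive anywhere from zero to eight female applications.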
Each fictitious applicant received a large set of additional characteristics. All applicants graduated from high school in the year of their 18th birthday, with school names drawn randomly from a set of public high schools near the target job. Half of applicants received associate degrees. Work histories consisted of two or three jobs with nearby employers providing relevant experience. For example, retail job applicants were assigned employment experience at local restaurants and retailers. In addition to populating fields in the employer’s online job portal, we uploaded a formatted PDF résumé where possible, with résumé templates and formatting drawn from a database of possible layouts. Some example résumés are shown in Online Appendix Figure A1. For employers requiring personality tests or other assessments, we prepopulated all answers to the assessments and randomly assigned responses subject to the constraint that the applicant must pass the assessment. Random assignment of all supplementary characteristics took place automatically, with these characteristics assigned independently of legally protected attributes and each other.
Our primary outcome is whether an employer attempted to contact the fictitious applicant. Phone numbers and e-mail addresses assigned to the fictitious applicants were monitored to determine when employers reached out for an interview. Contact information was assigned to ensure that no two applicants to the same firm shared an e-mail address or phone number. Our analysis focuses on whether the employer tried to contact an applicant by any method within 30 days of applying. We also report results for other follow-up windows and specific contact types. Further details on the experimental design are available in our registered preanalysis plan and in Online Appendix B.9
IV. Summary Statistics
Table I provides summary statistics on two analysis samples. The baseline sample consists of all 108 firms included in at least one wave. As a robustness exercise, we also consider a second sample restricted to the 72 firms sampled in all waves of the experiment.
| | Panel A (all firms): White | Panel A: Black | Panel A: Difference | Panel B (balanced sample): White | Panel B: Black | Panel B: Difference |
|---|---|---|---|---|---|---|
| Résumé characteristics | | | | | | |
| Female | 0.499 | 0.499 | −0.001 | 0.500 | 0.498 | 0.003 |
| Over 40 | 0.535 | 0.535 | 0.000 | 0.534 | 0.533 | 0.002 |
| LGBTQ club member | 0.081 | 0.082 | −0.001 | 0.079 | 0.080 | −0.001 |
| Academic club | 0.040 | 0.042 | −0.002 | 0.039 | 0.042 | −0.003* |
| Political club | 0.042 | 0.042 | 0.001 | 0.042 | 0.041 | 0.001 |
| Gender-neutral pronouns | 0.041 | 0.041 | −0.001 | 0.040 | 0.040 | 0.000 |
| Same-gender pronouns | 0.043 | 0.042 | 0.001 | 0.042 | 0.041 | 0.001 |
| Associate degree | 0.476 | 0.485 | −0.009** | 0.478 | 0.485 | −0.006* |
| Geographic distribution | | | | | | |
| Northeast | 0.150 | 0.150 | −0.000 | 0.152 | 0.152 | −0.000 |
| Midwest | 0.220 | 0.220 | 0.000 | 0.221 | 0.221 | 0.000 |
| South | 0.416 | 0.416 | −0.000 | 0.423 | 0.423 | −0.000 |
| West | 0.214 | 0.214 | 0.000 | 0.204 | 0.204 | −0.000 |
| Wave distribution | | | | | | |
| Wave 1 | 0.174 | 0.174 | 0.000 | 0.189 | 0.189 | 0.000 |
| Wave 2 | 0.206 | 0.206 | 0.000 | 0.210 | 0.210 | 0.000 |
| Wave 3 | 0.215 | 0.215 | −0.000 | 0.204 | 0.204 | −0.000 |
| Wave 4 | 0.205 | 0.205 | −0.000 | 0.198 | 0.198 | −0.000 |
| Wave 5 | 0.200 | 0.200 | −0.000 | 0.199 | 0.199 | −0.000 |
| Contact rates | | | | | | |
| Any contact in 30 days | 0.251 | 0.230 | 0.020*** | 0.256 | 0.234 | 0.022*** |
| Voicemail | 0.178 | 0.159 | 0.019*** | 0.185 | 0.166 | 0.019*** |
| E-mail | 0.040 | 0.039 | 0.002 | 0.043 | 0.042 | 0.002 |
| Text | 0.033 | 0.032 | 0.000 | 0.028 | 0.027 | 0.001 |
| Any contact in 14 days | 0.217 | 0.199 | 0.017*** | 0.222 | 0.203 | 0.019*** |
| Any contact in 15–30 days | 0.034 | 0.031 | 0.003*** | 0.034 | 0.031 | 0.003** |
| N applications | 41,837 | 41,806 | 83,643 (total) | 32,703 | 32,665 | 65,368 (total) |
| N jobs | 11,114 | | | 8,667 | | |
| N firms | 108 | | | 72 | | |
| 1/2/3/4/5 waves | 3/4/14/15/72 | | | | | |
Notes. This table presents summary statistics for the full analysis sample and balanced sample of firms sent applications in all five waves of the experiment. “White” refers to résumés with distinctively white names; “Black” refers to résumés with distinctively Black names. LGBTQ club membership and gender-neutral pronouns were introduced in wave 2. Asterisks indicate significant differences from zero at the following levels: * p < .1, ** p < .05, *** p < .01.
In both samples, roughly half of the applications are assigned distinctively Black names. The slight discrepancy between white and Black sample sizes arises because job vacancies were occasionally taken offline before the second application of a race-balanced pair could be submitted. As expected, other résumé characteristics are balanced across Black and white applications. About half of applications in each group are female. Slightly more than half of applications have high school graduation dates implying ages over 40, a consequence of the fact that the set of applicant birth years was not updated between waves 1 and 2. In subsequent waves we updated birth years to maintain a mean age of 40. By chance, white résumés are slightly less likely than Black résumés to list an associate degree.
On average, roughly 24% of applications were contacted by firms within 30 days. Most of these contact attempts arrived within 14 days. While the most common form of contact was voicemail, a substantial minority of applications were contacted via email or text message. In what follows we pool these forms of contact together and focus on effects of protected characteristics on the probability of any contact.
V. Average Contact Gaps
Employers are significantly less likely to contact applicants with distinctively Black names. The bottom panel of Table I reveals that the contact rate in the 30 days following an application is 2 percentage points (9%) higher for white applications than for Black applications in the pooled sample. The corresponding difference in the balanced sample is 2.2 percentage points (again 9%). These effects are driven primarily by gaps in the probability of contact by voicemail. Online Appendix Figure A2 reports race-specific Kaplan-Meier estimates of contact rates and hazards by days since an application was sent. Thirty days after submission, Black and white contact rates differ by 2 percentage points and contact hazards have equalized across groups. We therefore focus on 30-day contact rates for the remainder of the analysis.
Parent income, education, and other features of family background vary across distinctive names in race and gender groups (Bertrand and Mullainathan 2004; Fryer and Levitt 2004; Gaddis 2017). Online Appendix Figure A3 assesses whether employers respond to this variation by estimating separate contact rates for each first name. We fail to reject that first names have no causal effect on contact probabilities in each race-by-sex category (p ≥ .24). A corresponding analysis of last names, depicted in Online Appendix Figure A4, also fails to reject the absence of a causal effect of names on contact rates in each race category (p ≥ .13). These findings suggest that the primary effect of distinctive names is to convey race and gender to the employer. Of course, differences in employer treatment of distinctively Black and white (or male and female) names may in part reflect stereotypes about average productivity differences between these groups. This possibility notwithstanding, the courts—not to mention potential customers, employees, and corporate shareholders—are likely to view claims that an employer discriminates against applicants with Black (or female) names based on productivity grounds as a pretext for illegal discrimination.
Although the overall contact rate fluctuated during the course of our study, Black applicants faced a consistent contact penalty relative to white applicants. Figure II shows monthly Black and white contact rates (left axis) along with the percentage gap between the rates (right axis). Contact rates fell between October 2019 and February 2020 as hiring for seasonal jobs concluded. We paused the experiment from March to August 2020 because of the COVID-19 pandemic. Contact rates were variable in the months after the experiment resumed and sharply elevated in the final wave of our study as many states eased restrictions in the wake of widespread vaccine distribution. The measured contact rate for white applicants exceeded that for Black applicants in 12 of 13 months of the study, and we cannot reject at the 5% level that either the level or percentage contact gaps between white and Black applicants were constant across the study’s five waves (or 13 months).
Our finding of a contact penalty for Black applicants corroborates a large body of evidence from résumé correspondence studies reviewed in Bertrand and Duflo (2017). The 9% proportional contact gap in our study is somewhat smaller than corresponding estimates from previous work. For example, a meta-analysis by Quillian et al. (2017) concludes that white applicants typically receive 36% more callbacks than Black applicants in recent U.S. correspondence experiments. One potential explanation for the smaller proportional effect in our study is that larger firms exhibit less severe discrimination, as reported in a Canadian correspondence experiment described in Banerjee, Reitz, and Oreopoulos (2018). On the other hand, the 2 percentage point average contact gap between white and Black applicants in our experiment aligns closely with the findings of other recent studies. For example, Nunley et al. (2015) report an average contact gap between white and Black applicants of 2.6 percentage points (17% of the Black mean), while Agan and Starr (2018) report a contact gap of 2.4 percentage points (23% of the Black mean). The lower proportional gap in our experiment is a consequence of the higher overall contact rate for our applications combined with a similar level gap in contact rates.
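The distinction between level and proportional gaps drawn above can be checked with a few lines of arithmetic. The sketch below assumes that each percentage is taken relative to the Black contact rate, as in the parenthetical descriptions of Nunley et al. (2015) and Agan and Starr (2018); the function names are illustrative and do not come from the paper.

```python
def proportional_gap(level_gap, black_rate):
    """Express a level gap (probability units) as a share of the Black contact rate."""
    return level_gap / black_rate


def implied_black_rate(level_gap, proportional):
    """Back out the Black contact rate implied by a level gap and a proportional gap."""
    return level_gap / proportional


# Table I, Panel A: 2.0 point gap on a Black contact rate of 0.230.
ours = proportional_gap(0.020, 0.230)

# Figures quoted in the text for the two comparison studies.
nunley_base = implied_black_rate(0.026, 0.17)  # 2.6 points, 17% of the Black mean
agan_base = implied_black_rate(0.024, 0.23)    # 2.4 points, 23% of the Black mean

print(f"proportional gap in this experiment: {ours:.1%}")
print(f"implied Black contact rate, Nunley et al.: {nunley_base:.3f}")
print(f"implied Black contact rate, Agan and Starr: {agan_base:.3f}")
```

Under these assumptions, the implied base rates in the comparison studies fall well below the 0.230 observed here, which is why similar level gaps translate into a smaller proportional gap in this experiment.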
Our study randomized multiple protected applicant characteristics in addition to race. To summarize the overall effects of all randomized characteristics, Table II reports estimates of simple models of employer contact. Column (1) shows the results of fitting a linear probability model for employer contact as a function of race, sex, age, club membership, and pronouns, controlling for associate degrees, region indicators, and wave indicators. Consistent with the mean differences in Table I, Black applications are contacted 2.1 percentage points less often than white applications, a highly statistically significant difference (|$p < 10^{-32}$|). The corresponding estimate from a logit specification implies that Black applications face roughly 12% lower odds of a callback.
| | Panel A: All firms | | Panel B: Balanced sample | |
|---|---|---|---|---|
| | LPM | Logit | LPM | Logit |
| | (1) | (2) | (3) | (4) |
| Black | −0.0205*** | −0.115*** | −0.0222*** | −0.123*** |
| | (0.00169) | (0.00949) | (0.00193) | (0.0107) |
| Female | 0.000184 | 0.000760 | −0.000249 | −0.00166 |
| | (0.00300) | (0.0168) | (0.00341) | (0.0189) |
| Over 40 | −0.00587** | −0.0332** | −0.00472 | −0.0265 |
| | (0.00299) | (0.0167) | (0.00341) | (0.0189) |
| Political club | −0.00180 | −0.00985 | −0.00316 | −0.0172 |
| | (0.00742) | (0.0406) | (0.00848) | (0.0458) |
| Academic club | 0.00976 | 0.0520 | 0.00550 | 0.0283 |
| | (0.00764) | (0.0407) | (0.00870) | (0.0461) |
| LGBTQ club | −0.00513 | −0.0287 | −0.0000389 | −0.000671 |
| | (0.00545) | (0.0302) | (0.00637) | (0.0342) |
| Same-gender pronouns | −0.0139* | −0.0765* | −0.0126 | −0.0677 |
| | (0.00735) | (0.0412) | (0.00848) | (0.0466) |
| Gender-neutral pronouns | −0.0104 | −0.0572 | −0.0174** | −0.0946** |
| | (0.00755) | (0.0421) | (0.00857) | (0.0477) |
| Associate degree | 0.00119 | 0.00665 | 0.00254 | 0.0139 |
| | (0.00303) | (0.0170) | (0.00345) | (0.0191) |
| Midwest | 0.0631*** | 0.323*** | 0.0454*** | 0.230*** |
| | (0.0120) | (0.0622) | (0.0136) | (0.0692) |
| South | −0.0297*** | −0.170*** | −0.0396*** | −0.221*** |
| | (0.0103) | (0.0577) | (0.0117) | (0.0638) |
| West | −0.0266** | −0.153** | −0.0386*** | −0.216*** |
| | (0.0114) | (0.0650) | (0.0131) | (0.0729) |
| Wave 2 | 0.0535*** | 0.318*** | 0.0510*** | 0.302*** |
| | (0.0106) | (0.0633) | (0.0116) | (0.0691) |
| Wave 3 | 0.0102 | 0.0624 | 0.0167 | 0.102 |
| | (0.0101) | (0.0650) | (0.0115) | (0.0722) |
| Wave 4 | 0.0393*** | 0.238*** | 0.0416*** | 0.249*** |
| | (0.0105) | (0.0640) | (0.0118) | (0.0709) |
| Wave 5 | 0.151*** | 0.798*** | 0.162*** | 0.842*** |
| | (0.0113) | (0.0614) | (0.0127) | (0.0674) |
| Constant | 0.207*** | −1.358*** | 0.219*** | −1.292*** |
| | (0.0113) | (0.0666) | (0.0127) | (0.0728) |
| N | 83,643 | 83,643 | 65,368 | 65,368 |
Notes. This table presents the effects of randomized protected applicant characteristics on the probability of employer contact within 30 days. Panel A includes all firms, while Panel B includes the balanced sample of firms sent applications in every wave of the experiment. Columns (1) and (3) are linear probability models. Columns (2) and (4) are logistic regressions. Standard errors in parentheses are clustered at the job level. Asterisks indicate statistical significance at the following levels: * p < .1, ** p < .05, *** p < .01.
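To relate the logit column of Table II to the statement about callback odds, a logit coefficient can be converted into a proportional change in odds by exponentiating it. The following is a minimal sketch of that standard conversion applied to the column (2) coefficient on Black; the helper name is illustrative and not taken from the paper.

```python
import math


def odds_change(log_odds_coef):
    """Proportional change in odds implied by a logit coefficient."""
    return math.exp(log_odds_coef) - 1.0


black_coef = -0.115  # Table II, column (2)

exact = odds_change(black_coef)  # exact change in the odds of contact
log_points = black_coef          # coefficient read directly as log points

print(f"exact change in odds: {exact:.1%}")
print(f"log-point approximation: {log_points:.1%}")
```

The exact conversion gives a reduction of about 11%, while reading the coefficient directly as log points gives 11.5%. Both are in the neighborhood of the rounded figure quoted in the text.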
In contrast to the effect of race, the estimated average effect of sex is small and statistically insignificant. Table II shows that the difference in contact rates for male and female applicants is almost exactly zero, and we can reject average contact gaps of roughly 0.6 percentage points or larger in absolute value. This result is consistent with previous studies showing mixed or zero average effects of sex on employer callbacks in the United States and elsewhere (Nunley et al. 2015; Baert 2018).
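The bound on the gender gap can be recovered from the Female row of Table II, column (1). The sketch below assumes a conventional 95% Wald interval based on the normal critical value 1.96; the paper does not state the exact construction, so this is an illustration rather than a replication.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Two-sided Wald confidence interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Female coefficient and job-clustered standard error, Table II column (1).
lower, upper = wald_interval(0.000184, 0.00300)

print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
```

Both endpoints lie about 0.6 percentage points from zero, so average gender gaps larger than that in either direction fall outside the interval.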
We find a modest contact penalty for older applicants. The third row in Table II reports a statistically significant gap of 0.6 percentage points between contact rates for applicants under and over age 40. The estimate for the balanced sample is similar in magnitude but statistically insignificant. As shown in Online Appendix Figure A5, the probability of an employer contact declines modestly but monotonically with age, and we can reject the hypothesis that callback rates are constant across quintiles of applicant age at marginal significance levels (p = .052). Our findings for age confirm the result of Neumark, Burn, and Button (2018) that age discrimination is present in the U.S. labor market, though the magnitude of age effects is somewhat smaller in our experiment.
We find limited evidence of effects of sexual orientation and gender identity, though we have less statistical precision to detect effects of these attributes than for race, gender, and age. The estimated effect of LGBTQ clubs is small and statistically insignificant in the full and balanced samples. Gender-typical pronouns (the “Same-gender pronouns” row of Table II) are associated with a marginally significant contact penalty of 1.4 percentage points in the full sample, but this estimate is not significant in the balanced sample. Gender-neutral pronouns are associated with a comparably sized penalty that is statistically insignificant in the full sample but marginally significant in the balanced sample. Standard errors for the effects of LGBTQ club membership and pronouns are roughly three times as large as for race, a consequence of the fact that fewer than 10% of résumés were assigned these characteristics. We can, however, reject the 4.2 percentage point effect of LGBTQ clubs reported by Tilcsik (2011) for an earlier sample of jobs and employers. We also find no effect of listing an associate degree, a null result that is consistent with the findings of Deming et al. (2016) for nonselective jobs.
A large literature emphasizes the “intersectionality” of race and gender discrimination (Crenshaw 1989, 1990). Table III investigates such interactions by comparing the effects of résumé characteristics for white and Black applicants. Female names generate a marginally significant increase in contact rates for white applicants and a marginally significant decrease for Black applicants. The difference between these effects is a statistically significant 1.4 percentage points, implying that the effect of a female name is more positive for whites (or equivalently, that the penalty for a Black name is larger for women). We also find evidence of an interaction between race and LGBTQ club status: whereas white applicants face a contact penalty of 1.6 percentage points for listing membership in an LGBTQ club, Black applicants receive a small, statistically insignificant, contact bonus. This difference is large enough to eliminate the contact penalty for Black names among applications listing LGBTQ club membership. Although we find insignificant differences in effects for several other attributes, a joint test rejects the null hypothesis of no interaction effects across all dimensions in Table III at the 10% level (p = .065), suggesting that the gender and LGBTQ interactions are not an artifact of statistical noise.
| | OLS | | | Logit | | |
|---|---|---|---|---|---|---|
| | White | Black | Difference | White | Black | Difference |
| | (1) | (2) | (3) | (4) | (5) | (6) |
| Female | 0.00716* | −0.00694* | 0.0141** | 0.0388* | −0.0398* | 0.0786** |
| | (0.00423) | (0.00412) | (0.00579) | (0.0229) | (0.0236) | (0.0322) |
| Over 40 | −0.0104** | −0.00125 | −0.00915 | −0.0562** | −0.00711 | −0.0491 |
| | (0.00428) | (0.00413) | (0.00590) | (0.0231) | (0.0236) | (0.0328) |
| Political club | −0.00207 | −0.00229 | 0.000220 | −0.0109 | −0.0126 | 0.00171 |
| | (0.0107) | (0.0105) | (0.0150) | (0.0562) | (0.0587) | (0.0815) |
| Academic club | 0.00341 | 0.0147 | −0.0113 | 0.0173 | 0.0806 | −0.0633 |
| | (0.0111) | (0.0107) | (0.0155) | (0.0576) | (0.0574) | (0.0817) |
| LGBTQ club | −0.0165** | 0.00631 | −0.0228** | −0.0889** | 0.0349 | −0.124** |
| | (0.00787) | (0.00763) | (0.0110) | (0.0431) | (0.0419) | (0.0601) |
| Same-gender pronouns | −0.00971 | −0.0165 | 0.00681 | −0.0515 | −0.0934 | 0.0420 |
| | (0.0106) | (0.0101) | (0.0146) | (0.0571) | (0.0587) | (0.0816) |
| Gender-neutral pronouns | −0.0106 | −0.0103 | −0.000279 | −0.0564 | −0.0578 | 0.00138 |
| | (0.0108) | (0.0105) | (0.0150) | (0.0581) | (0.0598) | (0.0830) |
| Associate degree | 0.00573 | −0.00152 | 0.00724 | 0.0309 | −0.00869 | 0.0396 |
| | (0.00431) | (0.00412) | (0.00584) | (0.0233) | (0.0236) | (0.0325) |
| Constant | 0.201*** | 0.185*** | 0.0160*** | −1.377*** | −1.485*** | 0.108*** |
| | (0.00848) | (0.00820) | (0.00621) | (0.0514) | (0.0538) | (0.0366) |
| N | 41,837 | 41,806 | 83,643 | 41,837 | 41,806 | 83,643 |
| χ² stat for joint significance | | | 14.71 | | | 14.54 |
| p-value | | | .0650 | | | .0687 |
Notes. This table presents the effects of race interacted with other résumé characteristics. Columns (1) and (4) show estimates of models for employer contact among white applicants, columns (2) and (5) display estimates for Black applicants, and columns (3) and (6) show differences in coefficients between white and Black applicants. Columns (1)–(3) use linear probability models, while columns (4)–(6) use logistic regression. All models control for wave indicators. χ² statistics and joint p-values come from tests that all differences in reported coefficients other than the constant term are zero. Standard errors in parentheses are clustered at the job level. Asterisks indicate statistical significance at the following levels: * p < .1, ** p < .05, *** p < .01.
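The Difference columns and the joint tests in Table III can be checked by hand. The sketch below assumes that the joint test has 8 degrees of freedom, one for each reported coefficient other than the constant; the table does not state this count, so it is an inference. It uses the closed-form upper tail of the chi-squared distribution that holds for even degrees of freedom, which avoids any dependency beyond the standard library.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    Uses the identity P(X > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Difference in the Female coefficient, white minus Black (columns 1 and 2).
female_diff = 0.00716 - (-0.00694)

# Joint tests of all eight differences (assumed df = 8).
p_ols = chi2_sf_even_df(14.71, 8)
p_logit = chi2_sf_even_df(14.54, 8)

print(f"Female difference: {female_diff:.4f}")
print(f"joint p-value, OLS: {p_ols:.4f}; logit: {p_logit:.4f}")
```

With 8 degrees of freedom the computed tail probabilities agree with the p-values reported in the table to the displayed precision, which supports the assumed count.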
VI. Variation in Discrimination across Firms
A central objective of our study is to measure heterogeneity across firms in the effects of protected characteristics on contact rates. If all firms have the same expected contact gap, a job seeker will have little scope to evade discrimination by redirecting their search toward less biased employers. Likewise, regulators at the EEOC or OFCCP would have little to learn from the parent company of an establishment about whether that establishment is likely engaged in discrimination.
In what follows, we use a variety of methods to document that racial and gender contact gaps vary widely across employers and are spatially and temporally stable, suggesting that the organizational structure of employment is in fact highly informative about discrimination at particular establishments. Before doing so, we clarify the statistical framework used to analyze and interpret the experimental results.
VI.A. Statistical Framework
Denote the realized contact gap at job j ∈ {1, …, Jf} of firm f by |$\hat{\Delta }_{fj}$|. For most of our analysis |$\hat{\Delta }_{fj}$| measures the difference between white and Black contact rates at job j, though we also study other binary protected characteristics, such as gender. Denote by Δf the average causal effect of race on contact rates at jobs in firm f, and let |$\hat{\Delta }_{f} = \frac{1}{J_f}\sum _{j=1}^{J_{f}}\hat{\Delta }_{fj}$| be the corresponding experimental estimate given by the white/Black difference in mean contact rates at this firm. As explained in Online Appendix D, the population contact gap Δf measures the expected difference in contact rates between white and Black résumés in our experiment when sent to an average job posted by firm f. Loosely speaking, if we had repeated our experiment many times, sampling many more jobs from the same firms, each estimated firm gap |$\hat{\Delta }_{f}$| would tend toward its population gap Δf.
Generalizing this idea, we also report a cross-wave estimator measuring the average covariance between firm-by-wave contact gaps |$\hat{\Delta }_{ft}$| and |$\hat{\Delta }_{ft^{\prime }}$| for all pairs (t ≠ t′) of waves. Because the noise in each wave’s estimated contact gap is independent of the noise in every other wave, this cross-wave covariance estimator will also yield an unbiased estimate of θ, the variance of population contact gaps across firms, if contact gaps are stable across time. Likewise, we report a cross-state estimator that gives the average covariance between firm-by-state contact gaps |$\hat{\Delta }_{fs}$| and |$\hat{\Delta }_{fs^{\prime }}$| for all pairs (s ≠ s′) of U.S. states in which we sampled jobs from firm f. The ratio of the cross-wave estimator to the bias-corrected estimator provides a measure of the temporal persistence of the firm component of discrimination, while the ratio of the cross-state estimator to the bias-corrected estimator provides a measure of the geographic stability of the firm component.
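The cross-wave estimator described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the nested-dictionary input layout, and the equal weighting of wave pairs are assumptions made here, and the estimator detailed in Online Appendix D may weight firms or waves differently. The final line computes the persistence ratio implied by the race standard deviations reported in Table IV, columns (3) and (4).

```python
from itertools import combinations
from statistics import mean


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cross_wave_variance(gaps_by_wave):
    """Average covariance of firm gaps over all pairs of distinct waves.

    gaps_by_wave maps wave -> {firm: estimated gap} (hypothetical layout).
    Each pair of waves uses only the firms observed in both waves.
    """
    pair_covs = []
    for t, u in combinations(sorted(gaps_by_wave), 2):
        firms = sorted(set(gaps_by_wave[t]) & set(gaps_by_wave[u]))
        if len(firms) >= 2:
            pair_covs.append(
                covariance(
                    [gaps_by_wave[t][f] for f in firms],
                    [gaps_by_wave[u][f] for f in firms],
                )
            )
    return mean(pair_covs)


# Ratio of cross-wave to bias-corrected variance for race, obtained by
# squaring the standard deviations 0.0168 and 0.0185 from Table IV.
persistence_ratio = (0.0168 / 0.0185) ** 2
```

Because sampling noise is independent across waves, it drops out of each pairwise covariance in expectation, which is why no explicit standard-error correction appears in this sketch.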
VI.B. Testing for Firm Components
To test formally for the significance of firm-level contact gap variation, we report a Pearson χ2 test of the null hypothesis that all of the population contact gaps are equal across firms. The p-values derived from this test would be exact if each firm’s sample contact gap were normally distributed and centered around its population gap with variance equal to its squared standard error |$s_f^2$|.
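As a rough sketch, the heterogeneity statistic can be computed directly from firm-level estimates and standard errors, following the formula given in the notes to Table IV (squared deviations from the equally weighted mean gap, each scaled by the firm's squared standard error). The function name and the input values in the usage line are hypothetical, and the sketch stops at the statistic itself; it does not compute the reference distribution or p-value.

```python
def heterogeneity_chi2(gaps, std_errors):
    """Sum over firms of ((gap_f - mean gap) / s_f) squared."""
    gap_bar = sum(gaps) / len(gaps)  # equally weighted average gap
    return sum(((g - gap_bar) / s) ** 2 for g, s in zip(gaps, std_errors))


# Hypothetical inputs: three firms with identical standard errors.
stat = heterogeneity_chi2([0.00, 0.02, 0.04], [0.01, 0.01, 0.01])
```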
We are also interested in whether gaps are nonnegative or nonpositive for all firms, which implies a common direction of discrimination. A simple but conservative test of the null hypothesis that contact gaps are weakly positive for all firms would be to compare the minimum z-score (|$\frac{\hat{\Delta }_f}{s_f}$|) across firms to the distribution of the minimum of 108 standard normal random variables. To improve power, we instead employ the high-dimensional moment inequality testing procedure of Bai, Santos, and Shaikh (2022), which drops firms with strongly positive z-scores.
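The simple conservative test mentioned above has a closed form when the z-scores are treated as independent standard normals: the probability that the minimum of n such draws falls at or below z is 1 − (1 − Φ(z))^n. The sketch below implements only that simple benchmark, not the Bai, Santos, and Shaikh procedure; the function names are illustrative and independence across firms is an assumption of the sketch.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def min_z_pvalue(z_scores):
    """P(min of n independent N(0, 1) draws <= observed minimum z-score)."""
    n = len(z_scores)
    z_min = min(z_scores)
    return 1.0 - (1.0 - normal_cdf(z_min)) ** n
```

With 108 firms, a single strongly negative z-score is needed to reject, which illustrates why a procedure that discards firms with strongly positive z-scores can have more power.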
The first two columns of Table IV report the results of these tests. Column (1) shows that the null hypothesis that racial contact gaps are equal across firms is decisively rejected by the χ2 test. Column (2) reveals that the null hypothesis that no firms discriminate against white applicants cannot be rejected and yields a p-value of 1.00, while the null that no firms discriminate against Black applicants is decisively rejected (p < .01). The combination of these results suggests that all firms weakly favor white applicants, but some discriminate against Black applicants more than others.
. | χ2 test of heterogeneity (1) | p-value for no discrim against: (2) | Contact gap SD, bias-corrected (3) | Contact gap SD, cross-wave (4) | Contact gap SD, cross-state (5) |
---|---|---|---|---|---|
Race | 276.5 [.000] | W: 1.00; B: .00 | 0.0185 (0.0031) | 0.0168 (0.0032) | 0.0178 (0.0031) |
Gender | 205.2 [.000] | M: .00; F: .05 | 0.0267 (0.0038) | 0.0287 (0.0035) | 0.0269 (0.0038) |
Over 40 | 144.6 [.011] | Y: .22; O: .02 | 0.0103 (0.0069) | 0.0044 (0.0158) | 0.0086 (0.0082) |
Notes. This table presents estimated standard deviations of firm-level contact rate gaps and tests for heterogeneity in gaps. Column (1) displays χ2 test statistics and associated p-values from tests of the null hypothesis of no heterogeneity in discrimination. The test statistic is |$\sum_{f}{\frac{(\hat{\Delta}_{f}-\bar{\Delta})^{2}}{s_{f}^{2}}}$|, where |$\hat{\Delta }_{f}$| is the contact gap estimate for firm f, sf is the estimate’s standard error, and |$\bar{\Delta }$| is the equally weighted average of contact gaps. Column (2) presents tests for one-sided discrimination against white (W), Black (B), male (M), female (F), aged under 40 (Y), and over 40 (O) applications using the methodology in Bai, Santos, and Shaikh (2021). Column (3) reports estimates of the standard deviation of average contact gaps across firms calculated using firm-specific standard errors to correct for bias due to sampling variation in |$\hat{\Delta }_f$|. Columns (4) and (5) report cross-wave and cross-state estimates based on covariances between firm-by-wave and firm-by-state contact gaps. Details on these estimators appear in Online Appendix D. Standard errors for all variance estimators are produced by job-clustered weighted bootstrap. Estimates include all 108 firms.
Corresponding estimates for gender reveal that the overall zero effect of perceived sex masks a significant firm component to gender discrimination. As can be seen in the second row of Table IV, the χ2 test decisively rejects that gender contact gaps are equal across firms. In conjunction with our earlier finding of no average effect of gender, this result strongly suggests the presence of discrimination against men at some firms and against women at others. Consistent with this idea, column (2) shows that we can reject the null hypothesis of no firms discriminating against men and the null hypothesis of no firms discriminating against women at conventional levels (p ≤ .05). These findings extend and corroborate recent work by Kline and Walters (2021) and Hangartner, Kopp, and Siegenthaler (2021), who conclude that gender discrimination varies bidirectionally across jobs in Mexico and Switzerland, respectively.
The third row of Table IV demonstrates that age discrimination also varies across firms, though less strongly than for race and gender. Column (1) shows that the χ2 test rejects the null hypothesis of constant age discrimination across firms (p = .011). As shown in column (2), we cannot reject the hypothesis that all employers weakly favor younger applicants. By contrast, the null hypothesis that no firms discriminate against older applicants is rejected at conventional levels (p = .02).
VI.C. Variance Component Estimates
The remaining columns of Table IV report estimates of the standard deviation of firm-level contact gaps for race, gender, and age, calculated as the square root of the unbiased variance estimate |$\hat{\theta }$|. The estimates for racial contact gaps reported in the first row imply substantial dispersion in discrimination across firms. As shown in column (3), the bias-corrected estimator yields a precisely estimated standard deviation of racial contact gaps of 1.9 percentage points. This standard deviation is only slightly smaller than the mean effect of 2.1 percentage points reported in Table II. Similarly, the cross-wave and cross-state estimators yield estimated standard deviations of 1.7 and 1.8 percentage points, respectively. The similarity of the bias-corrected, cross-wave, and cross-state estimates implies that the firm component of racial discrimination is both temporally and spatially stable.
Estimates for gender in the second row of Table IV also show large and stable firm-level discrimination components. The bias-corrected estimator reported in column (3) yields a standard deviation of gender contact gaps of 2.7 percentage points. The cross-wave and cross-state estimators produce standard deviations of 2.9 and 2.7 percentage points, again signaling temporal and spatial stability. Consistent with the weaker evidence for firm-level variation in age discrimination reported already, the cross-firm standard deviation in the effect of age over 40 is smaller and equal to 1.0 percentage point. The cross-wave and cross-state estimators produce positive but small estimated firm components, suggesting modest spatial and temporal persistence in age effects. Graphical evidence of the cross-wave stability of race, gender, and age contact gaps is provided in Online Appendix Figure A6, which plots firm contact gaps in each wave against their leave-wave-out means. These plots also reveal that firm contact gaps for race and gender are not significantly correlated with each other.
Online Appendix Table A2 reports corresponding evidence on firm variation in contact gaps in LGBTQ club membership, same-gender pronouns, and gender-neutral pronouns. Our study is less powered to detect firm components along these dimensions than for race, gender, and age. The estimated variance components for the effects of LGBTQ clubs and pronouns are all statistically insignificant. Online Appendix Table A1 shows that patterns for all protected characteristics change little in the sample of firms present in all five waves of the experiment.
VI.D. Effects on Levels versus Proportions
Some of the variation in contact gaps documented in Table IV may stem from overall differences in firm contact rates. To assess this possibility, we fit logit, Poisson, and linear probability models (LPMs) predicting employer contact with an intercept and a Black indicator, separately by firm. We then apply the bias-corrected estimator to estimate the variances of intercept and slope parameters across firms for each model. To determine whether firms with larger contact gaps in levels also exhibit larger proportional gaps, we report bias-corrected estimates of the correlation between LPM and logit or Poisson race coefficients, netting out the portion of the correlation due to sampling error. This exercise omits the five firms with overall contact rates below 3%, for which estimates of odds and ratios are unlikely to be reliable.
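Because each firm-level model contains only an intercept and a Black indicator, all three fits are saturated and their coefficients can be read off the two group contact rates. The sketch below relies on that fact to show how levels, log odds, and log proportions relate; it is an illustration under that assumption, not the authors' estimation code, and the function name and input counts are hypothetical. It requires both contact rates to lie strictly between 0 and 1.

```python
from math import log


def firm_coefficients(white_contacts, white_apps, black_contacts, black_apps):
    """Return (intercept, Black slope) pairs for LPM, logit, and Poisson fits."""
    p_w = white_contacts / white_apps  # white contact rate
    p_b = black_contacts / black_apps  # Black contact rate

    def logit(p):
        return log(p / (1.0 - p))

    return {
        "lpm": (p_w, p_b - p_w),                        # gap in levels
        "logit": (logit(p_w), logit(p_b) - logit(p_w)),  # log odds ratio
        "poisson": (log(p_w), log(p_b / p_w)),           # log proportional gap
    }


# Hypothetical firm: 25 of 100 white and 20 of 100 Black applications contacted.
example = firm_coefficients(25, 100, 20, 100)
```

A firm with a very low overall contact rate makes the logit and Poisson slopes unstable even when the gap in levels is tiny, which is consistent with the exclusion of low-contact firms described above.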
The logit and Poisson estimates establish that our finding of a substantial firm component to racial discrimination is not driven by functional form. As shown in Table V, columns (4) and (6), we find large and statistically significant cross-firm variation in logit and Poisson race coefficients, with estimated standard deviations comparable to the mean effect of race in each case. Moreover, the bottom row of Table V reveals that the logit and Poisson coefficients are very highly correlated with the LPM contact gap, exhibiting bias-corrected correlations of 0.89 and 0.81, respectively. This strong correlation implies that conclusions regarding which firms discriminate most are likely to be very similar when discrimination is measured in levels, odds ratios, or proportions. For the remainder of our analysis, we focus on levels, which have the advantage of providing a transparent measure of total contacts lost to discrimination.
. | LPM intercept (1) | LPM slope (2) | Logit intercept (3) | Logit slope (4) | Poisson intercept (5) | Poisson slope (6) |
---|---|---|---|---|---|---|
Mean | 0.2547 (0.0036) | −0.0187 (0.0018) | −1.2715 (0.0276) | −0.1102 (0.0152) | −1.6046 (0.0238) | −0.0853 (0.0131) |
Std. dev. | 0.1607 (0.0035) | 0.0186 (0.0035) | 0.9755 (0.0385) | 0.1155 (0.0360) | 0.7047 (0.0382) | 0.0837 (0.0341) |
Corr. w/own slope | −0.4010 (0.1098) | 1.000 (–) | 0.0519 (0.2074) | 1.000 (–) | 0.0685 (0.3092) | 1.000 (–) |
Corr. w/LPM slope | −0.4010 (0.1098) | 1.000 (–) | −0.4274 (0.1068) | 0.8944 (0.2095) | −0.5045 (0.1149) | 0.8075 (0.3074) |
Number of firms | 103 | 103 | 103 | 103 | 103 | 103 |
Notes. This table reports estimated means, standard deviations, and correlations of firm-specific intercept and Black slope coefficients from models for employer contact. Columns (1) and (2) show results from linear probability models (LPMs; levels), columns (3) and (4) display results from logit models (log odds), and columns (5) and (6) show results from Poisson regression models (log proportions). Means are averages of firm-specific coefficients. Standard deviations are calculated by subtracting the average squared job-clustered standard error from the sample variance of parameter estimates, then taking the square root. Correlations are computed by subtracting the average job-clustered sampling covariance from the sample covariance of parameter estimates, then dividing by the product of estimated standard deviations. The analysis is restricted to the 103 firms with callback rates above 3%. Standard errors (computed by job-clustered weighted bootstrap) are in parentheses.
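The bias corrections described in these notes can be sketched as follows. This is an illustrative reading of the notes, not the authors' code: the function names are hypothetical, the sample variance and covariance use an n − 1 denominator by assumption, and the per-firm sampling covariances between the two coefficient estimates are taken as given inputs.

```python
from math import sqrt
from statistics import mean, variance


def bias_corrected_sd(estimates, std_errors):
    """Square root of (sample variance minus mean squared standard error)."""
    v = variance(estimates) - mean(se ** 2 for se in std_errors)
    return sqrt(v) if v > 0 else float("nan")  # undefined if estimate is negative


def bias_corrected_corr(est_a, se_a, est_b, se_b, sampling_covs):
    """Noise-adjusted covariance divided by the product of corrected SDs."""
    ma, mb = mean(est_a), mean(est_b)
    sample_cov = sum((a - ma) * (b - mb) for a, b in zip(est_a, est_b)) / (len(est_a) - 1)
    signal_cov = sample_cov - mean(sampling_covs)
    return signal_cov / (bias_corrected_sd(est_a, se_a) * bias_corrected_sd(est_b, se_b))
```

Subtracting the sampling covariance matters when the two coefficients are estimated on the same applications, since their estimation errors are then correlated even if the underlying firm parameters are not.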
VII. Alternative Groupings of Jobs
Taken together, the results of the previous section establish substantial variation across firms in their average contact gaps. In this section, we investigate how the magnitude of this variation compares to other groupings of jobs.
Table VI reports estimates of the dispersion of population contact gaps across several alternate groupings of jobs, some of which are also groupings of firms. To maximize comparability with the firm-level results reported in Table IV, we adjust for imbalance in the number of jobs per firm by weighting the job-level microdata in inverse proportion to the size of each job’s parent firm. As described in Online Appendix D, this weighting ensures that variance components from groupings that nest firms, such as industry or job portal intermediary, can be given an R2 interpretation. In cases where job groupings that do not nest firms have explanatory power, we investigate whether these groupings are significant conditional on firm fixed effects.
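The weighting step can be illustrated with a short sketch. Assuming each job carries a weight equal to one over the number of jobs sampled from its parent firm, every firm contributes the same total weight, so a weighted mean over jobs equals the simple mean of firm means. The function names and toy data below are hypothetical, and the full variance-component calculation in Online Appendix D is not reproduced.

```python
from collections import Counter


def inverse_firm_size_weights(job_firms):
    """Weight each job by 1 / (number of sampled jobs at its parent firm)."""
    counts = Counter(job_firms)
    return [1.0 / counts[f] for f in job_firms]


def weighted_mean(values, weights):
    """Weighted average of job-level values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data: firm A has two sampled jobs, firm B has one.
firms = ["A", "A", "B"]
job_gaps = [0.02, 0.04, 0.00]
weights = inverse_firm_size_weights(firms)
firm_weighted_gap = weighted_mean(job_gaps, weights)
```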
. | Race (1) | Gender (2) | Over 40 (3) |
---|---|---|---|
State | 0.0076 (0.0034) [.038] | – [.668] | – [.583] |
Industry | 0.0141 (0.0021) [.000] | 0.0190 (0.0029) [.000] | 0.0048 (0.0053) [.112] |
Job title SOC-3 code | 0.0136 (0.0025) [.000] | 0.0111 (0.0043) [.007] | 0.0034 (0.0105) [.527] |
Hiring platform intermediary | 0.0059 (0.0025) [.008] | 0.0024 (0.0088) [.049] | 0.0024 (0.0071) [.212] |
Notes. This table presents estimates of heterogeneity in average contact rate gaps across states, industries, job titles, and hiring platform intermediaries, along with the results of tests for no heterogeneity across each set of groups. Estimates are standard deviations of group-level contact rate gaps, computed using the same bias-corrected estimator employed in Table IV, column (3). Group variance components are computed weighting jobs in inverse proportion to the number of jobs sampled from each job’s parent firm, so that groupings that nest firms are weighted by the number of firms in each group. Standard errors, produced by job-clustered weighted bootstrap, are reported in parentheses. Dashes indicate negative variance estimates and hence undefined estimated standard deviations. p-values from χ2 tests of no heterogeneity in group-level contact rates are reported in square brackets. The first panel groups jobs by state, with 51 states (including D.C.) represented in the experiment. The second panel groups firms by the 24 two-digit SIC codes in the data. The third panel groups by the 47 three-digit SOC-3 codes for job titles. The final panel groups by the 11 hiring platform intermediaries observed, with firms that use proprietary platforms included as a single group.
VII.A. State
The first panel of Table VI reports estimates of the dispersion of population contact gaps across U.S. states. In contrast to the firm-level results in Table IV, we are unable to reject the absence of a geographic component to gender or age discrimination at even the 10% level. While geographic variation in racial discrimination can be distinguished from zero at the 5% level, the estimated standard deviation of racial contact gaps across states is only 0.8 percentage points, less than half the magnitude of the between-firm standard deviation reported in Table IV.
Controlling for firm fixed effects reduces the modest state variation in contact gaps even further. Table VII uses the leave-out estimator of Kline, Saggio, and Sølvsten (2020) to decompose job-level contact gaps into components attributable to state and firm fixed effects. For race and gender, the job-weighted standard deviations of firm fixed effects are close to the estimates from Table IV, while the standard deviations of state fixed effects are negligible. The estimated variance of state gender gap fixed effects is actually negative, suggesting that this component is very small or zero. To formally test whether the state fixed effects can be distinguished from noise, we employ the high-dimensional heteroskedasticity-robust testing procedure of Anatolyev and Sølvsten (2020), which yields joint p-values of .19 and .48 for the state race and gender gap fixed effects, respectively. By contrast, the null hypothesis that the firm fixed effects jointly equal zero is decisively rejected for race and gender (p < .001). Together, these results establish that the company-level variation documented in Table IV is not explained by differences in the spatial distribution of firms’ job postings.
. | Race: State | Race: Job title | Gender: State | Gender: Job title | Over 40: State | Over 40: Job title |
---|---|---|---|---|---|---|
SD firm effects | 0.0176 | 0.0150 | 0.0253 | 0.0255 | 0.0096 | 0.0088 |
SD job title / state effects | 0.0003 | – | – | 0.0080 | 0.0004 | – |
Covariance | 0.0000 | 0.0001 | 0.0000 | 0.0002 | 0.0000 | 0.0002 |
N jobs | 11,026 | 11,026 | 10,720 | 10,720 | 10,652 | 10,652 |
N firms | 108 | 108 | 108 | 108 | 108 | 108 |
N job titles / states | 51 | 47 | 51 | 47 | 51 | 47 |
N job titles / states >1 firm | 51 | 43 | 51 | 43 | 51 | 43 |
Mean gap | 0.0196 | 0.0196 | 0.0023 | 0.0023 | 0.0037 | 0.0037 |
p-value firm effects | .000 | .0008 | .000 | .000 | .071 | .040 |
p-value job title / state effects | .186 | .327 | .482 | .237 | .86 | .459 |
Notes. This table presents bias-corrected variance component estimates from two-way fixed effect models estimated using the leave-out procedure of Kline, Saggio, and Sølvsten (2020). Columns labeled “Job title” include fixed effects for the first three digits of each job’s O*Net SOC code. Columns labeled “State” include fixed effects for the job’s state. All variance and covariance estimates are job-weighted. Only jobs in the leave-job-out connected set are included for each estimate. Dashes indicate negative variance estimates and hence undefined estimated standard deviations. “N job titles / states >1 firm” is the number of states or job titles in the connected set observed at two or more firms. The final two rows report p-values from tests of the joint hypothesis that all firm or job title / state fixed effects equal zero, computed using the heteroskedasticity-robust procedure of Anatolyev and Sølvsten (2020).
VII.B. Industry
In contrast to the results for state, the second row of Table VI reveals substantial dispersion in discrimination across industries. Each firm in the experiment was assigned a two-digit SIC code, grouping together industries that only contained a single firm (see Table X for a list). The firm-weighted standard deviation of racial contact gaps across two-digit industries is 1.4 percentage points, and the corresponding standard deviation of gender contact gaps is 1.9 percentage points. Age contact gaps are small and statistically insignificant. Comparing the industry-level and firm-level standard deviations, we conclude that industry effects explain roughly |$(\frac{0.141}{0.185})^{2}\times 100 = 58\%$| of the variation in racial contact gaps and |$(\frac{0.190}{0.267})^{2}\times 100 = 51\%$| of the variation in gender contact gaps across firms.
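The share calculation in this paragraph is a ratio of variances: the squared ratio of the group-level standard deviation to the firm-level standard deviation. A minimal sketch of that arithmetic follows, taking the standard deviations quoted in the text as given; the function name `variance_share` is illustrative and does not come from the paper.

```python
def variance_share(sd_group: float, sd_firm: float) -> float:
    """Share of cross-firm variance attributable to a grouping.

    Both arguments must be expressed in the same units; the share is
    the group-level variance divided by the firm-level variance.
    """
    if sd_firm <= 0:
        raise ValueError("sd_firm must be positive")
    return (sd_group / sd_firm) ** 2


# Standard deviations as quoted in the text for race and gender gaps.
race_share = variance_share(0.141, 0.185)
gender_share = variance_share(0.190, 0.267)
print(f"race: {race_share:.0%}, gender: {gender_share:.0%}")
```

Because only the ratio matters, the result is unchanged if both standard deviations are rescaled by the same factor.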
VII.C. Job Titles
The finding that industry is an important predictor of multiple dimensions of discrimination leads naturally to the question of whether the sorts of jobs posted by firms are an important predictor of contact gaps. To examine this question, job titles for each job sampled in the experiment were standardized and merged to O*Net job titles using methods described in Online Appendix C. To maximize statistical precision, we map the 131 standardized job titles used in our O*Net merge to 47 SOC-3 codes.10
The third row of Table VI reports that the standard deviation of racial contact gaps across SOC-3 codes is 1.4 percentage points and strongly statistically significant. Gender contact gaps also vary significantly across SOC-3 codes, though that variability appears to be somewhat more muted than was the case with industry. Job title heterogeneity in age contact gaps is small and statistically insignificant.
To parse the separate influence of job titles and firms, Table VII reports a decomposition of job-level contact gaps into job title and firm fixed effects. Applying the bias correction of Kline, Saggio, and Sølvsten (2020), the estimated standard deviation of firm effects across jobs is 0.015, while the estimated variance of SOC-3 job title effects is negative. Using the procedure of Anatolyev and Sølvsten (2020) to test that the job title effects are jointly zero yields a p-value of .33, suggesting that job title effects are not a major source of variation in firm contact gaps in our experiment.11 The firm effects, by contrast, are strongly significant (p < .001).
Job titles also explain a limited share of job-level variation in contact rate gaps between male and female names: the estimated standard deviation of firm effects on gender contact gaps is 0.026, and corresponding SOC-3 job title effects exhibit a standard deviation of only 0.008. The estimated covariance between firm effects and average job title effects at the firm is small and positive. As was the case with race, the null hypothesis that firm effects on gender contact gaps are jointly zero is easily rejected (p < .001) while job title effects are jointly insignificant (p = .24).
VII.D. Intermediaries
The hiring websites of many large companies are hosted by third-party providers of online application systems. These intermediaries often tout their ability to promote diverse and inclusive workplaces via automated screening routines (Raghavan et al. 2020). Eighty-three of the 108 firms in our experiment used an intermediary of some sort. We create 11 intermediary categories, one of which corresponds to the 25 firms hosting their own proprietary job portals and another of which groups together intermediaries employed by a single firm.
The bottom panel of Table VI reports that the standard deviation of racial contact gaps across these intermediary codes is only 0.006. However, this component is precisely estimated and easily distinguishable from zero (p < .01). Gender gaps may also vary somewhat across intermediaries, though this component is estimated less precisely (p = .05). As with other groupings, we lack the precision necessary to detect variation in age discrimination across intermediaries. Though intermediaries seem to predict racial contact gaps, they explain only |$(\frac{0.006}{0.185})^{2}\times 100 = 0.1\%$| of the variation across firms. This finding suggests that intermediaries are not an important mediator of employer conduct toward racially distinctive names. In unreported results, we also found no significant difference in contact gaps between firms that required a battery of cognitive and personality tests and those that did not. The platforms themselves therefore do not appear to be an important driver of the between-firm differences we document.
VIII. Job, Establishment, and Firm Predictors
We summarize relationships between discrimination and observed employer characteristics. Although such relationships may not capture the causal effects of employer attributes on discrimination, they nonetheless offer a low-dimensional summary of the sorts of jobs, establishments, and firms where discrimination tends to be more or less severe. Figures III, IV, and V report coefficients from regressions of contact gaps on job, establishment, and firm attributes, with results for white/Black gaps in Panel A and estimates for male/female gaps in Panel B of each figure. Details on the measurement of all covariates appear in Online Appendix C.
VIII.A. Job Characteristics
The analysis of Section VII.C established that contact gaps vary substantially across job titles, but this variation is insignificant conditional on firm effects. Although this finding suggests that variation in discrimination across job titles is mostly attributable to the identity of the parent firm, studying lower-dimensional summaries of job titles may allow detection of more subtle relationships. A large literature (e.g., Deming 2017; Hurst, Rubinstein, and Shimizu 2021) finds that the task content of work provides a useful summary of changes in the occupational structure of wages and employment. Adopting this approach, Figure III projects job-level contact gaps onto measures of the task content of the job title, constructed based on task requirements in the O*Net following Deming (2017).
The contact penalty for Black names is more pronounced among jobs requiring customer interaction (Panel A). This correlation may reflect employer concerns regarding customer discrimination, the quantitative importance of which has proven difficult to establish decisively (Holzer and Ihlanfeldt 1998; Leonard, Levine, and Giuliano 2010; Hurst, Rubinstein, and Shimizu 2021). Jobs requiring manual skills also exhibit larger racial contact gaps. Panel B shows that jobs requiring social or customer interaction are more likely to favor women, whereas jobs requiring manual skills tend to favor men. This pattern may signal discrimination on the basis of gendered stereotypes regarding characteristically female or male tasks (Goldin 2014; Dahl, Kotsadam, and Rooth 2021). Consistent with our earlier analysis of job title effects, including firm fixed effects renders the relationships between racial discrimination and task content jointly insignificant (p = .20). This finding casts doubt on simplistic versions of the customer discrimination hypothesis where all employers discriminate differentially in customer-facing jobs. For gender, the task content variables are marginally significant conditional on firm fixed effects (p = .01), suggesting that at a typical large firm, men face discrimination in customer-facing jobs while women face discrimination at jobs intensive in manual skills.
Online Appendix Figure A7 decomposes the relationship between contact gaps and job task content into within- and between-industry components. Within-industry relationships between racial contact gaps and task content are weak and statistically insignificant, indicating that the task content correlations documented in Figure III are driven primarily by between-industry variation. Contact gaps are especially strongly related to industry average customer interaction scores (p = .001). In contrast, the relationship between gender contact gaps and task content is strong within and between industries. These results suggest that discrimination against Black and male names is more intense in customer-facing sectors, regardless of whether the job itself is customer facing. This finding may indicate that firms in different sectors tend to adopt different corporate cultures and human resources practices affecting all their jobs.
VIII.B. Establishment Characteristics
Moving to establishment-level predictors, we find that racial discrimination is unrelated to county- and block-level racial mix. Figure IV, Panel A shows insignificant relationships between job-level racial contact gaps and county and block racial composition, as measured in the workplace area characteristics (WAC) file derived from the Longitudinal Employer-Household Dynamics (LEHD) database.12 It is worth noting, however, that many jobs in our sample did not specify an exact establishment address; consequently, block-level data are unavailable for roughly half of establishments. Our finding of no relationship between discrimination and local racial mix contrasts with the results of Agan and Starr (2020), who show that neighborhood racial composition predicts contact gaps in a sample of jobs in New York and New Jersey. This difference may be explained by our focus on large employers or the broader set of geographies included in our sample.
Racial discrimination appears to be heightened in geographic locations with more prejudiced populations, as proxied by measures of implicit bias and racially charged web searches. Specifically, counties with average Implicit Association Test (IAT) scores indicating more bias against Black people or women (measured from Harvard’s Project Implicit) tend to have larger racial contact gaps (Figure IV, Panel A, top section). Similarly, contact gaps are elevated in designated market areas (DMAs) where households submit more frequent web searches for racial epithets, a measure of prejudice developed by Stephens-Davidowitz (2014). Estimates by region show that racial contact gaps are also lower in Western states. Despite achieving statistical significance, these geographic correlations are all fairly modest in magnitude, which aligns with our earlier finding in Table VI of a small but statistically significant between-state variance component to racial discrimination.
We see little relationship between racial contact gaps and other establishment characteristics, including log establishment employment and the fraction of managers listed in the Reference USA database that are nonwhite or female. Moreover, the bottom of Figure IV, Panel A shows that including firm fixed effects renders the establishment characteristics jointly insignificant (p = .34). Similar to our analysis of job titles, this finding suggests that the bivariate correlations between establishment characteristics and racial contact gaps are explained by the identity of the parent firm.
Gender contact gaps are less strongly related to workplace covariates than are racial gaps. Consistent with our earlier finding in Table VI of a negligible state component to gender gaps, Figure IV, Panel B shows insignificant relationships between gender contact gaps and local demographics, measures of prejudice, and establishment characteristics. We do see significant negative relationships between the male/female contact gap and the block-level share of female workers as well as the share of managers that are women, suggesting that the gender composition of the establishment predicts gender discrimination. These may be chance findings given the many characteristics examined, however, as the establishment characteristics are jointly insignificant with or without firm fixed effects (p ≥ .34).
VIII.C. Firm Characteristics
Firm characteristics are stronger predictors of discrimination than job or establishment characteristics. Consistent with Becker’s (1957) classic model of discrimination and the empirical findings of Pager (2016), we find that more-profitable firms are less biased against Black applicants. Specifically, the top section of Figure V, Panel A reveals a significant negative correlation between firm-level white/Black contact gaps and firm profits per employee. Racial discrimination is not significantly correlated with other measures of firm performance, including sales and overall firm ratings submitted by employees on the Glassdoor (GD) platform.
Racial contact gaps are smaller at companies that previously faced more regulatory scrutiny for employment practices. As shown in the middle section of Figure V, we see less discrimination against Black applicants at firms with more Department of Labor citations for wage and hour violations and for those subject to more employment discrimination cases. Seventy-two of the 108 firms in our experiment are federal contractors.13 Federal contractors exhibit substantially smaller contact gaps, perhaps reflecting the stronger regulatory standards to which they are held by the U.S. government.
Measures of firm diversity suggest less racial discrimination at firms with more demographic diversity among individuals with decision-making authority, but no factor is individually significant. These relationships are even weaker in a multivariate regression controlling for all of the characteristics in Figure V, indicating that some of the apparent correlation between diversity and discrimination is explained by other firm characteristics.
The strongest negative predictor of racial discrimination in our experiment is “callback centralization,” measured as the number of distinct phone numbers used by the firm to contact applicants divided by the total number of jobs with at least one callback, times −1. As documented in Online Appendix Table C2, centralization is elevated among federal contractors (p = .038), but we cannot reject that it is unrelated to our other firm-level predictors in a multivariate regression. Because this predictor is calculated using the outcome data, we instrument centralization among one-half of each firm’s jobs with centralization computed in the other half, a split-sample IV strategy (Angrist and Krueger 1995) intended to avoid any mechanical relationship between job-level callback propensities and gaps. The negative coefficient estimate suggests that firms at which hiring responsibility is more centralized are less prone to bias, perhaps because rules replace the discretionary judgments of individual workers at firms with more sophisticated human resources practices. Overall, the firm-level variables in Figure V are significant predictors of racial discrimination (joint p < .001).
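The centralization measure described in this paragraph can be illustrated with a short sketch. The record layout (one `(firm, job_id, phone_number)` tuple per employer contact), the function name, and the example data are all hypothetical; the split-sample IV step is not shown.

```python
from collections import defaultdict


def callback_centralization(callbacks):
    """Return {firm: -(distinct phone numbers / jobs with >= 1 callback)}.

    `callbacks` is an iterable of (firm, job_id, phone_number) tuples,
    one per employer contact. This record layout is hypothetical.
    Values closer to zero indicate more centralized callbacks.
    """
    phones = defaultdict(set)
    jobs = defaultdict(set)
    for firm, job_id, phone in callbacks:
        phones[firm].add(phone)
        jobs[firm].add(job_id)
    return {firm: -len(phones[firm]) / len(jobs[firm]) for firm in jobs}


# Made-up example: firm A calls from one number, firm B from three.
records = [
    ("A", 1, "555-0100"), ("A", 2, "555-0100"), ("A", 3, "555-0100"),
    ("B", 1, "555-0101"), ("B", 2, "555-0102"), ("B", 3, "555-0103"),
]
scores = callback_centralization(records)
```

Multiplying by −1 means that a firm contacting applicants for many jobs from a single number receives a higher (less negative) score than a firm using a different number for every job.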
As with establishment characteristics, firm-level characteristics are less correlated with gender contact gaps than with racial gaps, though we do see some evidence of a relationship between firm diversity and gender discrimination. In particular, contact gaps favor women at firms with more female managers. Consistent with the results of Bertrand et al. (2019), we find an insignificant relationship between the gender mix of a company’s corporate board and gender discrimination, though the point estimate suggests a weak negative correlation between board female share and the male/female gap. Again, the most predictive covariate is callback centralization, which is significantly lower at firms that favor male applicants. Though most of the firm predictors of the gender contact gap are not individually significant, the joint null hypothesis that all coefficients are zero is decisively rejected (p < .001).
IX. The Distribution of Discrimination
Figure VI, Panel A displays the deconvolved density of contact gaps between white and Black applicants, while Panel B reports the density of gaps between male and female applicants. The penalization parameter of the first-step maximum likelihood procedure is calibrated to yield a variance matching the bias-corrected estimate in Table IV.14 In Panel A we restrict the support of the density of racial contact gaps to rule out discrimination against whites—a shape constraint we showed earlier cannot be rejected by our data.15 For comparison with the estimated densities, the background of Figure VI also reports histograms of firm contact gap estimates |$\hat{\Delta }_{f}$|. As a result of the noise in these estimates, the contact gap distributions implied by the histograms are substantially more dispersed than the deconvolved distributions. Pointwise confidence intervals on the estimated densities are reported in Online Appendix Figure A10.
The deconvolved density of racial contact gaps reveals a skewed distribution with a thick tail of extreme discriminators that favor white applicants by more than 5 percentage points. This density can be approximated closely by a log-normal distribution with the same mean and variance. Panel B shows that the estimated distribution of population gender gaps is nearly symmetric around zero and heavily leptokurtic. This distribution turns out to be even more strongly peaked about its mode than a Laplace distribution with identical mean and variance, indicating that many companies exhibit very little gender bias, while a small number of severe discriminators are biased in each direction.
The distributional estimates for both race and gender imply that a large share of discrimination is driven by a small group of highly discriminatory firms. Figure VII summarizes the concentration of discrimination by plotting the Lorenz curve implied by the deconvolved density |$\hat{g}_{\Delta }$|. The Lorenz curve for race measures the share of the total contact gap between white and Black applications in the experiment attributable to firms below each percentile of Δf. Since gender discrimination operates in both directions, the gender curve summarizes concentration of the absolute contact gap |Δf|.
The discrimination Lorenz curves are strongly bowed away from the 45-degree line, implying that discrimination is highly concentrated in particular firms. For example, the race Lorenz curve shows that firms in the top quintile of discrimination are responsible for 46% of lost contacts to Black applicants in our study, whereas firms in the bottom quintile are responsible for less than 5% of lost contacts. The gender contact gaps are even more concentrated, with firms in the top quintile responsible for 56% of aggregate absolute gender differences in the experiment.
Twice the area between each Lorenz curve and the 45-degree line gives the Gini coefficient, which ranges from 0 (perfect equality) to 1 (perfect concentration). For race, the Gini coefficient is roughly 0.40, which is nearly as large as estimates of the Gini for modern U.S. income inequality. For gender, the Gini coefficient is 0.54, substantially higher than Gini income estimates in the U.S. and roughly comparable to Brazil’s level of income inequality.16
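As a rough illustration of these concentration measures, the sketch below computes a Lorenz curve and Gini coefficient from a finite list of nonnegative gaps. This is a simplification: the curves in Figure VII are implied by the deconvolved density rather than by a sample of firm estimates, and the function names and example values here are made up for illustration.

```python
def lorenz_curve(values):
    """Return Lorenz points (population share, cumulative share of total).

    `values` are nonnegative firm-level gaps (absolute gaps for gender).
    Firms are ordered from least to most discriminatory.
    """
    xs = sorted(values)
    total = sum(xs)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


def gini(values):
    """Gini coefficient: twice the area between the 45-degree line and
    the Lorenz curve, with the area computed by the trapezoid rule."""
    pts = lorenz_curve(values)
    area_under = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    return 1.0 - 2.0 * area_under


# Illustrative (made-up) gaps with one heavy discriminator.
example_gaps = [0.0, 0.005, 0.01, 0.02, 0.06]
example_gini = gini(example_gaps)
```

With equal gaps at every firm the Lorenz curve coincides with the 45-degree line and the coefficient is zero; as the total gap concentrates in fewer firms the coefficient approaches one.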
X. Firm-Specific Estimates
The finding that discrimination is highly concentrated raises the question of whether it is possible to deduce the contact gaps of particular firms. Firm-specific estimates could, in principle, be shared with company executives, providing them with an assessment of their organization’s biases, or with regulators to help them target audits or other enforcement efforts more effectively. Although the sample contact gaps |$\hat{\Delta }_f$| provide unbiased estimates of the contact gap at each firm, those estimates are often quite noisy. Our analysis of firm-specific discrimination leverages EB methods that “borrow strength” from the full set of firms in the experiment to improve estimates of contact gaps at each specific firm.
X.A. Posterior Mean Estimates
The EB posterior means are highly variable across companies, implying that the experiment contains substantial information about the behavior of individual firms. Online Appendix Figure A11 compares the distributions of observed contact gaps |$\hat{\Delta }_{f}$|, EB posterior means |$\bar{\Delta }_{f}$| and linear predictions |$\tilde{\Delta }_{f}$|, and the estimated prior distribution |$\hat{G}_{\Delta }$|. The distribution of posteriors is more compressed than the observed contact gaps |$\hat{\Delta }_{f}$| or the deconvolved prior distribution |$\hat{G}_\Delta$|, reflecting shrinkage due to the noise in the observed gaps. Unlike the observed contact gaps, the posterior means are strictly positive, inheriting the nonnegativity constraint placed on the prior distribution. In contrast, roughly 12% of the linear shrinkage estimates are negative, a consequence of the symmetric implicit normal prior. The upper tail of the distribution of linear shrinkage estimates is more compressed than is the distribution of empirical Bayes posterior mean estimates, which reflects that the roughly log-normal shape of our estimated prior |$\hat{G}_{\Delta }$| exhibits a fat tail of heavy discriminators. The EB posterior accounts for this fat tail by applying less shrinkage to extreme positive contact gaps. Overall, 46 firms have posterior mean racial contact gaps greater than the average gap of 2 percentage points in the experiment.
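The contrast between the two shrinkage rules can be made concrete for the linear case. The sketch below assumes the standard linear shrinkage form, in which each estimate is pulled toward the prior mean with weight equal to the signal variance over the sum of signal and noise variance; the exact estimator used for the linear predictions is defined outside this section, so the formula and the function name are assumptions. The prior mean and standard deviation are the rounded values quoted in the abstract.

```python
def linear_shrinkage(gap_hat, std_err, prior_mean, prior_var):
    """Shrink a noisy estimate toward the prior mean.

    The weight on the firm's own estimate is signal / (signal + noise),
    so noisier estimates are pulled more strongly toward `prior_mean`.
    """
    if prior_var < 0 or std_err < 0:
        raise ValueError("variances must be nonnegative")
    noise_var = std_err ** 2
    if prior_var + noise_var == 0:
        return gap_hat
    weight = prior_var / (prior_var + noise_var)
    return prior_mean + weight * (gap_hat - prior_mean)


# Mean gap of 2.1 points and between-firm SD of 1.9 points (abstract).
PRIOR_MEAN = 0.021
PRIOR_VAR = 0.019 ** 2

# A hypothetical firm with a negative estimated gap and a small
# standard error: the linear rule leaves the estimate below zero.
shrunk = linear_shrinkage(-0.03, 0.01, PRIOR_MEAN, PRIOR_VAR)
```

Because this rule is symmetric around the prior mean, it can return negative values, whereas a posterior mean computed under a prior supported on nonnegative gaps cannot.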
Online Appendix Figure A13 assesses the out-of-sample predictive power of these posterior means by shrinking contact gaps constructed using only the first three waves of the experiment and comparing these shrunk values to contact gaps in the final two waves of the experiment. For race, we find a correlation between our EB predictions and the latent contact gaps in the last two waves of 0.7, indicating substantial out-of-sample forecasting ability even when working with predictions that discard 40% of our microdata.
The posterior mean racial contact gaps vary systematically across industries. Figure VIII reports mean values of |$\bar{\Delta }_f$| and |$\tilde{\Delta }_f$| by two-digit industry. Racial discrimination is estimated to be particularly severe among firms in customer-facing sectors. The posterior mean contact gap averages 4.0 percentage points among the eight firms in the auto dealers and services sector (SIC 55), 2.7 percentage points for the five firms in the eating and drinking sector (SIC 58), and 2.5 percentage points for the four apparel firms (SIC 56) in the experiment. By contrast, the posterior mean racial contact gap averages only 0.9 percentage points among the two engineering services firms (SIC 87), 1.0 percentage point among the five banking and credit firms (SICs 60–61) and two securities brokerages (SIC 62), and 1.1 percentage points among the four freight and transport firms (SICs 42–47) in the experiment.
Posterior estimates of gender discrimination also vary across industries. Discrimination against men appears concentrated in the apparel sector, where distinctively male names face a severe contact disadvantage of 6.1 percentage points. Discrimination against women appears most pronounced among the two firms in the wholesale durable sector (SIC 50), where distinctively female names face an average contact disadvantage of 3.4 percentage points. In line with the strong peak in the prior distribution around zero reported in Figure VI, Panel B, however, many sectors are estimated to exhibit trivially small gender contact gaps. Indeed, the three firms in the business services sector (SIC 73) exhibit an average posterior mean gender contact gap of zero.
Figure IX plots coefficients from the projection of industry characteristics (normalized to have standard deviation one) on the firm posterior mean contact gaps. Firms estimated to favor white applicants reside in industries with somewhat lower Black employment shares and female employees concentrated in nonmanagement positions, but the relationships are only marginally significant. By contrast, firms estimated to favor male applicants lie in sectors with sharply lower female employment shares, higher unexplained gender wage gaps, and Black employees concentrated in nonmanagement positions. These gender bias correlations align closely with the matched-pair audit evidence reported by Neumark, Bank, and Van Nort (1996), who find that women are discriminated against at upscale restaurants, which tend to pay high wages and to be male dominated, but are weakly preferred at lower-price restaurants that tend to pay lower wages and to be female dominated.
One potential explanation for the divergent correlation patterns uncovered for sex and race in Figure IX is that job seekers know that certain sectors (e.g., women’s apparel) discriminate on the basis of gender, perhaps due to a mix of coworker and customer discrimination. This common knowledge allows workers to sort away from biased jobs, mitigating to some extent the burden of discrimination as in Becker’s (1957) classic model. Industry patterns of racial discrimination, by contrast, may be more difficult to discern, particularly if these patterns are driven by variation in opaque corporate recruiting protocols. When discriminatory patterns are not common knowledge, less pronounced sorting patterns will arise and a larger burden may fall on job seekers when search is costly (Black 1995; Bowlus and Eckstein 2002).
X.B. Guarding against False Discoveries
Although the posterior mean estimates of the previous section provide a best guess of the contact gap at each firm, it is possible that some firms with large posterior mean contact gaps have true population gaps of exactly zero. The question of whether a firm’s contact gap is exactly zero has direct legal relevance because the Civil Rights Act prohibits any discrimination based on protected characteristics. To assess the conclusions that can be drawn about which employers are discriminating at all, we consider a related class of EB methods that aims to limit false discoveries.
For each firm in our experiment, we can assign a p-value |$\hat{p}_f$| to the null hypothesis that the firm’s population contact gap is zero by comparing the firm’s z-score to the appropriate tail of a t-distribution with degrees of freedom equal to the number of jobs at the firm minus one. Histograms of the resulting p-values for the null that firm-specific contact gaps equal zero appear in Figure X. Panel A reports one-tailed tests of the null of no discrimination against Black applicants, while Panel B reports two-tailed tests of the null that racial contact gaps are exactly zero. Panel C reports two-tailed tests that gender contact gaps are zero.
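The p-value construction in this paragraph can be sketched using only the standard library. The tail probability below is obtained by numerically integrating the t density with Simpson's rule, which is an implementation convenience rather than anything taken from the paper; the function names and the sign convention (a positive gap favors white names) are assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), via Simpson's rule on [0, |t|].

    `steps` must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3.0   # P(0 < T < |t|)
    upper = 0.5 - central       # P(T > |t|)
    return upper if t >= 0 else 1.0 - upper


def firm_p_values(mean_gap, std_err, n_jobs):
    """One- and two-tailed p-values for the null of a zero contact gap."""
    if std_err <= 0 or n_jobs < 2:
        raise ValueError("need std_err > 0 and at least two jobs")
    z = mean_gap / std_err
    df = n_jobs - 1
    one_tailed = t_sf(z, df)             # alternative: gap > 0
    two_tailed = 2.0 * t_sf(abs(z), df)  # alternative: gap != 0
    return one_tailed, two_tailed
```

For a firm with a positive estimated gap, the two-tailed p-value is exactly twice the one-tailed value; for a negative estimated gap, the one-tailed p-value exceeds one-half.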
If all firms had racial and gender contact gaps equal to zero, we would expect all three histograms to be uniformly distributed. In practice, we see substantial bunching of the |$\hat{p}_f$| at small values. For example, 31 firms (28.7%) have one-tailed p-values for the null of no racial discrimination below .05, and 14 firms (13.0%) have two-tailed |$\hat{p}_f$| below .05 for the null of no gender discrimination. Applying Tukey’s “higher criticism” criterion (Donoho and Jin 2004), even the modestly elevated share of small p-values for gender discrimination indicates a significant departure from uniformity at the 5% level, as |$\sqrt{108}\times \left(\tfrac{0.13-0.05}{\sqrt{0.05\times 0.95}}\right)\approx 3.81>1.96$|. Clearly some firms are discriminating, but which ones?
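The higher criticism calculation for gender can be reproduced directly. The sketch below simply evaluates the expression in the text; the function name is illustrative.

```python
import math


def higher_criticism(n_firms, share_below, alpha=0.05):
    """Standardized excess share of p-values below `alpha`.

    If the p-values were independent and uniform, the count below
    `alpha` would be Binomial(n_firms, alpha), so this statistic
    would be approximately standard normal.
    """
    scale = math.sqrt(alpha * (1.0 - alpha))
    return math.sqrt(n_firms) * (share_below - alpha) / scale


# Gender: 13% of the 108 firms have two-tailed p-values below .05.
hc_gender = higher_criticism(108, 0.13)

# Race: 31 of the 108 firms have one-tailed p-values below .05.
hc_race = higher_criticism(108, 31 / 108)
```

The same function applied to the one-tailed race p-values gives a much larger statistic, since 31 of 108 firms fall below the .05 threshold.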
X.C. Which Firms Discriminate?
Figure X reports choices of λ and the estimated tail density |$\hat{\pi }_0(\lambda )$| for both one- and two-tailed tests of racial discrimination. As expected, the |$\hat{\pi }_0(\lambda )$| correspond roughly to the right asymptote of the plotted discrete density estimates. Superimposed on Figure X are estimates of the local false discovery rates (LFDRs; Efron et al. 2001) implied by setting |$\pi _0=\hat{\pi }_0(\lambda )$|. LFDRs give posterior estimates of the probability that a null hypothesis is true given its p-value. The mean LFDR below a threshold p-value |$\hat{p}_f$| gives an approximation to |$\hat{q}_f$|.19
For one-tailed tests we estimate that π0 ≤ 0.39; that is, that at least 61% of firms discriminate against Black applicants. Unsurprisingly, allowing for bidirectional racial discrimination dissipates power, leading to an upper bound on π0 of 0.54. Table VIII provides a sensitivity analysis involving a few other estimates of π0. Computing the p-values via randomization inference tends to yield more very small p-values, resulting in a correspondingly smaller estimate of π0.20 Estimating π0 with a cubic spline, as in Storey and Tibshirani (2003), yields slightly larger estimates of π0. The final panel of the table reports the upper limit of a 95% confidence interval on π0. For one-sided tests, as few as 40% of firms may be discriminating against Black applicants, whereas under two-tailed tests the share discriminating may be as low as 30%.
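To make the roles of λ and π0 concrete, the sketch below assumes the standard Storey-type estimator, in which the share of true nulls is estimated from the fraction of p-values above λ, together with the usual step-up construction of q-values. The definitions used in the paper are not reproduced in this section, so these formulas are assumptions, and the bootstrap choice of λ and the smoothed estimator are not shown. The example p-values are made up.

```python
def storey_pi0(p_values, lam):
    """Estimated share of true nulls: the fraction of p-values above
    `lam`, rescaled by the width of the interval (lam, 1]."""
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (len(p_values) * (1.0 - lam)))


def q_values(p_values, pi0):
    """q-value for each p-value: the smallest estimated false discovery
    rate over all rejection thresholds that reject that hypothesis."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        fdr = pi0 * n * p_values[i] / rank
        running_min = min(running_min, fdr)
        q[i] = running_min
    return q


# Made-up p-values for five hypothetical firms.
example_p = [0.001, 0.01, 0.02, 0.6, 0.9]
example_pi0 = storey_pi0(example_p, lam=0.5)
example_q = q_values(example_p, example_pi0)
n_discoveries = sum(1 for q in example_q if q <= 0.05)
```

A smaller estimate of π0 scales every q-value down proportionally, which is why the panels of Table VIII with lower π0 estimates tend to flag more firms at a given threshold.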
. | Race, one-tailed | Race, two-tailed | Gender, two-tailed | Age, two-tailed |
---|---|---|---|---|
Bootstrapped λ | ||||
|$\hat{\pi }_0$| | 0.391 | 0.541 | 0.833 | 0.833 |
# q-values ≤ 0.05 | 23 | 8 | 1 | 0 |
# q-values ≤ 0.1 | 45 | 21 | 5 | 1 |
λ | 0.550 | 0.350 | 0.300 | 0.400 |
Randomization inference p-values | ||||
|$\hat{\pi }_0$| | 0.370 | 0.455 | 0.808 | 0.802 |
# q-values ≤ 0.05 | 35 | 24 | 8 | 1 |
# q-values ≤ 0.1 | 55 | 36 | 10 | 1 |
λ | 0.550 | 0.450 | 0.450 | 0.400 |
Smoothed | ||||
|$\hat{\pi }_0$| | 0.451 | 0.882 | 0.854 | 0.832 |
# q-values ≤ 0.05 | 21 | 4 | 1 | 0 |
# q-values ≤ 0.1 | 40 | 18 | 5 | 1 |
95% upper CI for π0 | ||||
|$\hat{\pi }_0$| | 0.602 | 0.696 | 1.000 | 1.000 |
# q-values ≤ 0.05 | 20 | 4 | 1 | 0 |
# q-values ≤ 0.1 | 31 | 18 | 5 | 1 |
Notes. This table reports the results of estimating firm q-values for discrimination using several strategies. Each panel reports an estimated upper bound on the share of nondiscriminating firms (π0) along with the numbers of firms with q-values at or below 0.05 and 0.1. Estimates are based on p-values taken from a t-test of mean job-level contact rate gaps for each firm, except in the second panel, which uses p-values constructed based on 10,000 simulations permuting race, gender, and age labels. In accordance with how characteristics were stratified in the experiment, race labels are permuted within pairs, while gender and age are permuted unconditionally. The first two panels estimate π0 by choosing the tuning parameter λ based on the bootstrap methodology from Storey et al. (2015). The third panel uses the smoothed estimator from Storey (2003). The final panel reports the upper limit of the 95% confidence interval for π0 constructed using the method of Armstrong (2015).
In our benchmark specification, 23 firms have q-values less than 0.05 (Table VIII, top panel, first column). Table IX lists industry, federal contractor status, contact gap estimates, posterior means and quantiles, and p- and q-values for this set of companies (with firm names suppressed). The largest q-value in this set of firms is 0.047, so we should expect at most 23 × 0.047 = 1.08 false discoveries if these 23 firms are classified as discriminating against Black applicants. Interestingly, the firm with the largest q-value has a posterior mean contact gap of 1.8 percentage points and a posterior 5th percentile gap of 0.75 percentage points, indicating that if the deconvolved distribution |$\hat{G}$| is taken as a prior, one can be confident that a nontrivial amount of discrimination is taking place at this firm.
q-value rank | Industry | Federal contractor? | Contact gap | Std. err. | p-value | q-value | Posterior mean | Posterior 5th pctile | Posterior 95th pctile |
---|---|---|---|---|---|---|---|---|---|
1 | Auto dealers / services | Yes | 0.0952 | 0.0197 | .0000 | 0.0001 | 0.0835 | 0.0450 | 0.1035 |
2 | Auto dealers / services | No | 0.0507 | 0.0143 | .0003 | 0.0061 | 0.0354 | 0.0135 | 0.0673 |
3 | Auto dealers / services | No | 0.0738 | 0.0220 | .0005 | 0.0073 | 0.0489 | 0.0192 | 0.0981 |
4 | Auto dealers / services | No | 0.0787 | 0.0249 | .0010 | 0.0103 | 0.0498 | 0.0202 | 0.1031 |
5 | Apparel stores | No | 0.0733 | 0.0250 | .0022 | 0.0158 | 0.0448 | 0.0187 | 0.0929 |
6 | Other retail | No | 0.0469 | 0.0159 | .0020 | 0.0158 | 0.0286 | 0.0119 | 0.0595 |
7 | Other retail | Yes | 0.0605 | 0.0219 | .0033 | 0.0176 | 0.0365 | 0.0154 | 0.0743 |
8 | General merchandise | Yes | 0.0520 | 0.0187 | .0031 | 0.0176 | 0.0314 | 0.0132 | 0.0641 |
9 | Auto dealers / services | No | 0.0613 | 0.0240 | .0060 | 0.0194 | 0.0370 | 0.0158 | 0.0725 |
10 | Other retail | No | 0.0560 | 0.0214 | .0050 | 0.0194 | 0.0337 | 0.0143 | 0.0669 |
11 | Eating/drinking | No | 0.0560 | 0.0222 | .0064 | 0.0194 | 0.0339 | 0.0144 | 0.0660 |
12 | Auto dealers / services | No | 0.0540 | 0.0215 | .0068 | 0.0194 | 0.0327 | 0.0139 | 0.0634 |
13 | Food stores | Yes | 0.0511 | 0.0204 | .0069 | 0.0194 | 0.0310 | 0.0132 | 0.0599 |
14 | General merchandise | No | 0.0427 | 0.0170 | .0068 | 0.0194 | 0.0259 | 0.0110 | 0.0502 |
15 | Furnishing stores | Yes | 0.0400 | 0.0159 | .0066 | 0.0194 | 0.0242 | 0.0103 | 0.0470 |
16 | Wholesale nondurable | No | 0.0386 | 0.0158 | .0080 | 0.0199 | 0.0235 | 0.0100 | 0.0450 |
17 | Apparel manufacturing | Yes | 0.0350 | 0.0142 | .0078 | 0.0199 | 0.0213 | 0.0090 | 0.0409 |
18 | Building materials | Yes | 0.0373 | 0.0157 | .0093 | 0.0218 | 0.0229 | 0.0097 | 0.0433 |
19 | Health services | Yes | 0.0544 | 0.0240 | .0132 | 0.0292 | 0.0339 | 0.0143 | 0.0627 |
20 | Furnishing stores | No | 0.0400 | 0.0183 | .0152 | 0.0322 | 0.0252 | 0.0106 | 0.0460 |
21 | Eating/drinking | No | 0.0340 | 0.0159 | .0172 | 0.0346 | 0.0217 | 0.0090 | 0.0392 |
22 | General merchandise | No | 0.0423 | 0.0210 | .0229 | 0.0439 | 0.0277 | 0.0114 | 0.0494 |
23 | Insurance / real estate | No | 0.0278 | 0.0140 | .0257 | 0.0472 | 0.0183 | 0.0075 | 0.0325 |
Notes. This table reports estimates of white-Black contact gaps for the 23 firms with q-values less than 0.05. p-values and q-values come from one-sided tests of the null hypothesis that the firm does not discriminate against Black applicants. To ensure that q-values are nondecreasing for nested decision thresholds, we follow Storey (2002, 2003) in estimating |$\hat{q}_f$| as |$\min _{t \ge \hat{p}_f} \widehat{FDR}(t)$|, which implies firms with different p-values may have the same q-value. Posterior means and percentiles are empirical Bayes posteriors constructed using the estimated distribution in Figure VI as the prior.
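The monotonicity step described in these notes can be illustrated with the rounded p-values from Table IX. The sketch below is not the authors' code. It assumes m = 108 firms (the number of employers in the experiment), |$\hat{\pi }_0 = 0.391$| (Table VIII, top panel, first column), and the step-function form of the FDR estimate, |$\widehat{FDR}(t) = \hat{\pi }_0 m t / \#\lbrace \hat{p}_f \le t\rbrace$|. The function name `storey_qvalues` is hypothetical. Only the 23 smallest p-values are listed in the table, so the running minimum is taken over those alone; this should be harmless here because every omitted firm has a q-value of at least 0.05.

```python
import numpy as np

def storey_qvalues(p_sorted, m, pi0):
    """q_i = min over j >= i of pi0 * m * p_(j) / j, for p-values sorted ascending.

    Hypothetical helper: m is the total number of tests and pi0 the assumed
    upper bound on the share of true nulls.
    """
    p_sorted = np.asarray(p_sorted, dtype=float)
    ranks = np.arange(1, len(p_sorted) + 1)
    fdr = pi0 * m * p_sorted / ranks
    # A running minimum taken from the right makes q-values nondecreasing in p.
    return np.minimum.accumulate(fdr[::-1])[::-1]

# Rounded one-sided p-values as printed in Table IX (listed in q-value rank order).
p_table_ix = [.0000, .0003, .0005, .0010, .0022, .0020, .0033, .0031,
              .0060, .0050, .0064, .0068, .0069, .0068, .0066, .0080,
              .0078, .0093, .0132, .0152, .0172, .0229, .0257]

# Assumed inputs: m = 108 firms, pi0 = 0.391.
q = storey_qvalues(sorted(p_table_ix), m=108, pi0=0.391)

# Bound on expected false discoveries: number flagged times the largest q-value.
expected_false_discoveries = len(q) * q[-1]
```

Under these assumptions the largest q-value comes out near 0.047 and the implied bound on false discoveries near 1.08, in line with the text; small differences from the tabulated q-values reflect rounding of the printed p-values. The running minimum is also why several firms with different p-values share the q-value 0.0194.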
Though we expect at most 1 of the 23 firms with q-values below 0.05 to have racial contact gaps equal to zero, the actual number of false discoveries may differ from its expected value. To get a sense of how many false discoveries could potentially arise in an unfavorable scenario, Online Appendix Figure A14 plots the posterior distribution of false discoveries implied by the LFDRs of these 23 firms.21 Reassuringly, the posterior probability mass function of false discoveries is tightly concentrated around its mean, with the posterior chances of three or more of these firms exhibiting contact gaps of zero being less than 2%.
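To make the idea of a posterior distribution of false discoveries concrete, the following sketch treats each flagged firm as a false discovery with probability equal to its LFDR and, as an assumption of this illustration, treats firms as independent given the data. The number of false discoveries then follows a Poisson-binomial distribution. The LFDR values are hypothetical placeholders, not the estimates behind Online Appendix Figure A14, and `false_discovery_pmf` is a hypothetical helper name.

```python
import numpy as np

def false_discovery_pmf(lfdrs):
    """PMF of the number of true nulls among flagged firms.

    Assumes independence across firms, so the count is Poisson-binomial.
    """
    pmf = np.array([1.0])
    for p in lfdrs:
        # Convolve with a Bernoulli(p): this firm is a false discovery or not.
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

# Hypothetical LFDRs for 23 flagged firms (NOT the paper's values).
lfdrs = np.linspace(0.001, 0.09, 23)

pmf = false_discovery_pmf(lfdrs)
expected_count = float(np.dot(np.arange(len(pmf)), pmf))  # equals lfdrs.sum()
prob_three_or_more = float(pmf[3:].sum())
```

The mean of this distribution equals the sum of the LFDRs, which is why the expected number of false discoveries is tied to the average LFDR of the flagged set.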
The lower panels of Table VIII reveal that conclusions regarding the set of firms likely to be discriminating against Black names are remarkably robust to the method used to bound π0. In fact, if we use randomization inference–based p-values to estimate π0, the 23 firms assigned racial discrimination q-values less than 0.05 in our baseline analysis have an average LFDR of only 0.025, suggesting the false discovery rate for this collection of firms may actually be 2.5% or lower. When π0 is set to the upper limit of its 95% confidence interval—an extremely conservative choice—20 firms have q-values below 0.05 (Table VIII, bottom panel, first column). This prior insensitivity arises because many firms have very small p-values, as shown in Table IX.
Consistent with the posterior mean estimates in Figure VIII, we find a clear industry pattern among firms with low q-values for discrimination against Black applicants. As shown in Table X, firms detected as discriminating against Black names are highly concentrated in the auto dealers and services sector, where six of the eight firms in our experiment have q-values below 0.05. The mean LFDR in this sector is 0.13, implying that at least 87% of the firms in this industry discriminate against Black applicants. Other sectors with a high concentration of racial discrimination include other retail (SIC 59), where three of the seven firms have q-values below 0.05, and furnishing stores (SIC 57), where two of four firms have low q-values. Mean LFDRs are substantially higher than 0.05 in these sectors, indicating that the firm-specific p-values remain somewhat dispersed within industry. Notably, 8 of the 23 firms with q-values less than 0.05 are federal contractors, including the firm with the highest posterior mean level of racial discrimination.
SIC | Industry | N firms | Race: W-B post gap | Race: # q-val < 0.05 | Race: mean LFDR | Gender: M-F post gap | Gender: # q-val < 0.05 | Gender: mean LFDR |
---|---|---|---|---|---|---|---|---|
20 | Food products | 2 | 0.015 | 0 | 0.900 | −0.004 | 0 | 0.993 |
23 | Apparel manufacturing | 2 | 0.021 | 1 | 0.170 | 0.007 | 0 | 0.702 |
24–35 | Other manufacturing | 4 | 0.018 | 0 | 0.361 | 0.012 | 0 | 0.669 |
42–47 | Freight / transport | 4 | 0.011 | 0 | 0.822 | 0.001 | 0 | 0.941 |
48 | Communications | 2 | 0.017 | 0 | 0.340 | 0.013 | 0 | 0.972 |
49 | Electric / gas | 3 | 0.015 | 0 | 0.339 | 0.002 | 0 | 0.980 |
50 | Wholesale durable | 2 | 0.017 | 0 | 0.293 | 0.034 | 0 | 0.555 |
51 | Wholesale nondurable | 11 | 0.018 | 1 | 0.456 | 0.005 | 0 | 0.865 |
52 | Building materials | 3 | 0.014 | 1 | 0.544 | 0.012 | 0 | 0.849 |
53 | General merchandise | 12 | 0.023 | 3 | 0.276 | −0.001 | 0 | 0.867 |
54 | Food stores | 5 | 0.025 | 1 | 0.356 | 0.009 | 0 | 0.821 |
55 | Auto dealers / services | 8 | 0.040 | 6 | 0.127 | 0.005 | 0 | 0.882 |
56 | Apparel stores | 4 | 0.025 | 1 | 0.253 | −0.061 | 1 | 0.416 |
57 | Furnishing stores | 4 | 0.022 | 2 | 0.304 | −0.006 | 0 | 0.787 |
58 | Eating / drinking | 5 | 0.027 | 2 | 0.303 | 0.003 | 0 | 0.926 |
59 | Other retail | 7 | 0.022 | 3 | 0.314 | −0.002 | 0 | 0.971 |
60–61 | Banks / credit | 5 | 0.010 | 0 | 0.651 | 0.002 | 0 | 0.778 |
62 | Securities brokers | 2 | 0.010 | 0 | 0.410 | −0.011 | 0 | 0.654 |
63–65 | Insurance / real estate | 8 | 0.013 | 1 | 0.463 | −0.003 | 0 | 0.915 |
70 | Accommodation | 2 | 0.015 | 0 | 0.527 | 0.001 | 0 | 1.000 |
73 | Business services | 3 | 0.012 | 0 | 0.539 | 0.000 | 0 | 0.942 |
75–76 | Auto / repair services | 3 | 0.013 | 0 | 0.474 | 0.015 | 0 | 0.624 |
80 | Health services | 5 | 0.016 | 1 | 0.726 | −0.009 | 0 | 0.909 |
87 | Engineering services | 2 | 0.009 | 0 | 0.348 | −0.001 | 0 | 0.965 |
Notes. This table shows the results of aggregating firm-specific posterior estimates of race and gender discrimination to the industry level. Industries that include only one firm are grouped together with proximate SIC codes. The column “W-B post gap” shows industry averages of posterior mean white/Black contact gaps. The column “M-F post gap” displays industry averages of posterior mean male/female contact gaps. The column “# q-val < 0.05” gives the number of firms in the industry with q-values below 0.05. The column “mean LFDR” reports the mean local false discovery rate (LFDR) among firms in the industry. Firm level q-values and LFDRs were estimated using the procedure of Storey et al. (2015). The distribution of race LFDRs is depicted in Figure X, Panel A. The distribution of gender LFDRs is depicted in Figure X, Panel C.
To further compare results based on posterior means and q-values, Online Appendix Figure A15 plots the posterior mean racial contact gaps (|$\bar{\Delta }_f$|) from the previous section against the |$\hat{q}_f$| from our preferred specification. Bracketing the posterior means are 95% EB credible intervals (EBCIs) connecting each firm’s posterior 2.5th percentile contact gap to its posterior 97.5th percentile. If the prior |$\hat{G}_{\Delta }$| were estimated without error, then 95% of the population contact gaps would be expected to lie within these credible intervals. The lower limit of each EBCI is positive because the estimated prior imposes that racial contact gaps are almost surely positive. By contrast, the q-values were derived under the assumption that 39% of firms have contact gaps of exactly zero. As expected, the posterior mean contact gaps are generally decreasing in |$\hat{q}_f$|, but the relationship between the two measures is not perfectly monotone.
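As a rough illustration of how a posterior mean and a 95% EBCI arise from a prior and a noisy estimate, the sketch below applies Bayes' rule on a discrete grid with a normal likelihood for the estimated gap. The prior used here is a hypothetical exponential-shaped distribution on positive gaps, chosen only for illustration; it is not the deconvolved |$\hat{G}_{\Delta }$|, so the output is not expected to match Table IX. The helper name `eb_posterior` is likewise hypothetical.

```python
import numpy as np

def eb_posterior(gap_hat, std_err, grid, prior_weights):
    """Posterior mean and 2.5th/97.5th percentiles on a discrete grid."""
    # Normal likelihood of the estimated gap at each candidate true gap.
    like = np.exp(-0.5 * ((gap_hat - grid) / std_err) ** 2)
    post = prior_weights * like
    post = post / post.sum()
    mean = float(np.sum(grid * post))
    cdf = np.cumsum(post)
    lo = float(grid[np.searchsorted(cdf, 0.025)])
    hi = float(grid[np.searchsorted(cdf, 0.975)])
    return mean, lo, hi

# Hypothetical prior with strictly positive support (NOT the paper's estimate).
grid = np.linspace(0.001, 0.12, 240)
prior = np.exp(-grid / 0.02)

# Estimate and standard error of the firm ranked 23rd in Table IX.
mean, lo, hi = eb_posterior(0.0278, 0.0140, grid, prior)
```

Because the assumed prior places no mass at or below zero, the lower end of the interval is positive by construction, which mirrors the reasoning given for the EBCIs above.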
As a result of the higher concentration of gender contact gaps near zero, it is more difficult to detect individual firms discriminating on the basis of gender than on the basis of race. Figure X, Panel C shows the distribution of p-values derived from tests that gender contact gaps are zero. Here the Storey et al. (2015) procedure produces an upper bound on π0 of 0.83, implying that at least 17% of firms discriminate on the basis of gender. Moreover, the 95% confidence interval on π0 extends to 1, suggesting that we cannot reject the null that none of the firms discriminate based on gender. This conclusion is clearly at odds with our earlier higher criticism calculation, not to mention the tests presented in Table IV, which decisively rejected the null that gender contact gaps are equal across firms. This discrepancy likely arises because the Armstrong (2015) test is designed to have good power properties in settings where π0 is not close to 1, a condition which seems to be violated here.22 Likewise, the 95% confidence interval for the proportion of firms not discriminating against older applicants also includes 1, which is unsurprising given that the tests reported in Table IV detected only modest firm heterogeneity in age discrimination.
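The logic behind an upper bound on π0 can be sketched with the fixed-λ estimator of Storey (2002): p-values above a cutoff λ are assumed to come almost entirely from true nulls, which are uniformly distributed, so their share divided by 1 − λ bounds π0. The sketch below uses a fixed λ and made-up p-values; it does not implement the bootstrap choice of λ from Storey et al. (2015), and `storey_pi0` is a hypothetical name.

```python
import numpy as np

def storey_pi0(p_values, lam):
    """Fixed-lambda estimate of the share of true nulls, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    # Share of p-values above lam, rescaled by the width of (lam, 1].
    return min(1.0, float(np.mean(p > lam) / (1.0 - lam)))

# Hypothetical data: 40 null firms with evenly spread p-values,
# 60 discriminating firms with very small p-values.
p_null = (np.arange(40) + 0.5) / 40
p_alt = np.full(60, 0.001)
p_all = np.concatenate([p_null, p_alt])

pi0_hat = storey_pi0(p_all, lam=0.55)
```

In this constructed example the estimate recovers the null share of 0.4. When most p-values are spread roughly uniformly, as appears to be the case for gender, the same formula returns a value close to 1.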
These high estimated bounds on π0 lead to low lower bounds on the posterior probabilities of gender discrimination for most firms. Consequently, Table VIII shows that only one firm has a q-value for gender discrimination below 0.05.23 Table X indicates that this company is in the apparel sector. Based on its posterior mean, this apparel store is discriminating against men. Interestingly, the same store also has a q-value below 0.05 for racial discrimination. Although the apparel sector (SIC 56) has a large average posterior mean contact gap favoring women, the mean LFDR in the sector is relatively high, suggesting industry membership is not, in itself, dispositive of gender discrimination.
X.D. Prevalence versus Severity
Having established with high posterior certainty that 23 firms favor white applicants on average, we now examine whether these firms’ racial contact gaps could have been generated by a small minority of discriminating jobs. This distinction between the prevalence and severity of racial discrimination is arguably pertinent to the legal notion of systemic discrimination as a widespread pattern of organizational behavior. Kline and Walters (2021) show that the share of jobs that discriminate is not point identified in audit designs sending a small number of applications to each job. Consequently, we rely on a simple bounding approach to assess the prevalence of discrimination across jobs within firms.
To formalize the notion of job-level discrimination prevalence, it is convenient to again work with a mixture representation. Suppose that a proportion |$1 - \phi _f$| of the jobs at firm f have contact gaps of exactly zero.24 With this notation, the firm-wide mean contact gap can be written |$\Delta _f=\phi _f \dot{\Delta }_{f}$|, where |$\dot{\Delta }_{f}$| gives the average contact gap among discriminating jobs in firm f. Here |$\dot{\Delta }_{f}$| provides a measure of discrimination severity, and |$\phi _f$| indexes the prevalence of discrimination.
An unbiased estimate of the variance of job-level gaps can be computed by taking the covariance between contact gaps for the first and last two application pairs sent to each job. Applying this approach, Online Appendix Table A4 reports that the standard deviation of contact gaps across all jobs in the experiment is 0.073. The mean gap across jobs is 0.020 with associated standard error of 0.002. Consequently, the lower-bound prevalence is estimated to be |$\frac{(0.020)^2 - (0.002)^2}{(0.020)^2- (0.002)^2+(0.073)^2}\approx 0.07$|, indicating that at least 7% of jobs in the experiment as a whole discriminate against Black names.
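The arithmetic above is consistent with a bound of the form |$\phi \ge \Delta ^2 / (\Delta ^2 + \sigma ^2)$|, where σ is the standard deviation of job-level gaps. One way to see such a bound under the mixture representation: the second moment of job-level gaps is |$\Delta ^2 + \sigma ^2$|, and it is at least φ times the squared mean gap among discriminating jobs, |$\phi (\Delta / \phi )^2 = \Delta ^2 / \phi$|. The sketch below reproduces the calculation, subtracting the squared standard error from the squared mean as a bias correction; the function name is hypothetical.

```python
def prevalence_lower_bound(mean_gap, se_mean_gap, sd_gaps):
    """Lower bound on the share of jobs with nonzero contact gaps."""
    # Bias-corrected estimate of the squared mean gap.
    mean_sq = mean_gap ** 2 - se_mean_gap ** 2
    return mean_sq / (mean_sq + sd_gaps ** 2)

# Inputs reported in the text: mean gap 0.020, standard error 0.002,
# standard deviation of job-level gaps 0.073.
bound = prevalence_lower_bound(0.020, 0.002, 0.073)
```

With these inputs the bound is roughly 0.07, matching the figure quoted above. The bound tightens as the mean gap grows relative to the dispersion of job-level gaps.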
We can conduct a corresponding calculation at each firm, using |$\hat{\Delta }_f^2 - s_f^2$| as a bias-corrected estimate of each firm’s |$\Delta _f^2$|. Figure XI illustrates these firm-specific estimates, which are quite noisy, ordered by the firm’s q-value. As expected, firms with lower q-values tend to have higher job-level prevalence bounds. To reduce sampling error, the solid line plots the average bound among jobs at firms with q-values under a threshold level. Firms with q < 0.1, for example, have a lower-bound prevalence of 18%. The 23 firms with |$\hat{q}_f<0.05$| exhibit a prevalence of at least 20%, suggesting that discrimination against Black names is widespread among the establishments that make up these firms.
XI. Detection Possibilities
The EEOC, OFCCP, and several local organizations, such as the New York City Commission on Human Rights, proactively investigate employer discrimination on an ongoing basis. Statistical evidence is a legally recognized basis for such decisions.25 We now consider the stylized decision problem faced by a hypothetical auditor charged with deciding whether to investigate the firms in our study and show how EB posterior means and q-values can be used to derive optimal investigation rules.
XI.A. The Auditor’s Problem
Consider an auditor concerned with racial discrimination who can launch investigations into the conduct of any firm in our experiment at cost |$c \in (0, 1)$|. Let |$\delta _f \in \lbrace 0, 1\rbrace$| be an indicator for the decision to launch an investigation into firm f and |$\mathcal {D}$| the collection of these indicators.
If, based on |$\mathcal {E}$|, an auditor with preferences Ui were to settle on beliefs over contact gaps coinciding with the deconvolved distribution |$\hat{G}_{\Delta }$|, she would investigate all firms with EB posterior means |$\bar{\Delta }_f$| exceeding c. If the auditor instead believes population contact gaps are normally distributed with a variance equal to that reported in Table IV, she will investigate all firms with linear shrinkage estimates |$\tilde{\Delta }_f$| exceeding c.
The decision problem is somewhat trickier for an auditor with preferences Ue who is willing to entertain the possibility that a large share of firms are not discriminating at all. Recall that the probability of nondiscrimination π0 is, in general, only bounded by our experiment (Efron et al. 2001; Kline and Walters 2021). Faced with this ambiguity, an auditor with preferences Ue might reasonably consider the largest value of π0 consistent with the experimental evidence. Optimizing against this least favorable value |$\pi _0^{\dagger }$| of π0 leads the auditor to investigate all firms with |$LFDR_f(\pi _0^{\dagger })<1-c$|. This minimax decision rule coincides with a q-value based threshold, because q-values are running averages of (sorted) LFDRs.
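The link between the LFDR rule and a q-value threshold can be illustrated with a short Python sketch. The LFDR values and the cost below are hypothetical, and the function name is introduced only for this example. The running average of sorted LFDRs stands in for the population FDR; it is not the estimator used for the reported q-values.

```python
import numpy as np

def running_average_fdr(lfdr):
    """For each firm, average the LFDRs of all firms ranked at or above it."""
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(lfdr)  # most suspicious firms first
    running = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    fdr = np.empty_like(running)
    fdr[order] = running  # map back to the original firm order
    return fdr

# Hypothetical LFDRs evaluated at the least favorable share of nondiscriminators.
lfdr = np.array([0.02, 0.40, 0.10, 0.85, 0.05])
cost = 0.5

# Minimax rule: investigate when the posterior probability of discrimination,
# 1 - LFDR, exceeds the investigation cost.
investigate = lfdr < 1 - cost

fdr = running_average_fdr(lfdr)
# The same set of firms is selected by a cutoff on the running-average FDR.
fdr_cutoff = fdr[investigate].max()
print(investigate, np.round(fdr, 4), round(fdr_cutoff, 4))
```

Because the running average is monotone in the LFDR ranking, any LFDR cutoff selects the same firms as some cutoff on the averaged quantity. Here the cutoff equals the mean LFDR among investigated firms.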
A natural question raised by these derivations is how often a minimax auditor concerned with extensive-margin discrimination would dispute the decisions of an EB auditor concerned with the intensive margin of discrimination. In principle, LFDR-based rankings of firm behavior can differ substantially from rankings based on posterior means (Gu and Koenker 2020). Reassuringly, we demonstrate that little would be lost from investigating firms based on q-value thresholds even from the perspective of an auditor with preferences given by Ui and smooth priors given by |$\hat{G}_{\Delta }$|.
XI.B. Detection Possibility Frontiers
Figure XII illustrates the trade-off the auditor faces between the costs of investigating more firms and the benefits of finding additional large contact gaps. Suppose that 1,000 Black applications are sent at random to jobs equally distributed across the firms in our experiment, and contact gaps among these firms follow the estimated distribution |$\hat{G}_{\Delta }$|. The figure reports the contacts expected to be lost due to racial discrimination among investigated firms under various investigation threshold rules. The dotted 45-degree line gives the results of investigating firms at random. Since |$\hat{G}_{\Delta }$| exhibits a mean contact gap of 2.1 percentage points (see Figure VI), investigating all the firms would “save” roughly 20 contacts per 1,000 applications, while investigating half of the firms at random would save 10 contacts.
The solid line illustrates the detection possibilities frontier available to the auditor if she observed the Δf without error. This infeasible frontier is simply a rescaled Lorenz curve for the distribution |$\hat{G}_{\Delta }$|. Reflecting that distribution’s fat tail, the worst 20% of discriminating firms are responsible for roughly half of the lost contacts. The preferences of an auditor with objective Ui can be visualized as indifference lines with slope −1,000c. An optimum occurs at a point of tangency between the indifference line and the detection frontier.
The dash-dotted line illustrates the frontier that arises when the auditor selects firms based on their posterior means |$\bar{\Delta }_{f}$|. The vertical distance between the posterior mean frontier and the true contact gap frontier reflects the cost of ranking firms according to their posterior means rather than their true contact gaps. Because the distribution of posteriors is more compressed than |$\hat{G}_{\Delta }$|, the auditor must investigate roughly a quarter of the firms based on their posterior means to isolate those responsible for half of lost contacts.
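The construction of these frontiers can be sketched as follows. This is a minimal Python illustration under stated assumptions: the ten "true" gaps and the noisy ranking scores are hypothetical numbers chosen so that the mean gap is 2.1 percentage points, and the function name is introduced here for the example only. Applications are spread equally across firms, as in the thought experiment above.

```python
import numpy as np

def detection_frontier(true_gaps, ranking_score, n_apps=1000):
    """Contacts saved per n_apps applications when firms are investigated
    in descending order of ranking_score."""
    true_gaps = np.asarray(true_gaps, dtype=float)
    order = np.argsort(-np.asarray(ranking_score, dtype=float))
    n_firms = true_gaps.size
    share = np.arange(1, n_firms + 1) / n_firms
    # Each firm receives n_apps / n_firms applications.
    saved = n_apps * np.cumsum(true_gaps[order]) / n_firms
    return share, saved

# Hypothetical true gaps (mean 0.021) and noisy scores used to rank firms.
true_gaps = np.array([0.070, 0.040, 0.030, 0.020, 0.015,
                      0.012, 0.010, 0.008, 0.005, 0.000])
noisy_scores = np.array([0.045, 0.028, 0.032, 0.020, 0.022,
                         0.015, 0.012, 0.016, 0.010, 0.008])

share, oracle_saved = detection_frontier(true_gaps, true_gaps)       # infeasible frontier
_, feasible_saved = detection_frontier(true_gaps, noisy_scores)      # ranking on noisy scores
random_saved = share * 1000 * true_gaps.mean()                       # 45-degree benchmark

print(np.round(oracle_saved, 1))
print(np.round(feasible_saved, 1))
```

With these hypothetical numbers, investigating every firm saves 21 contacts per 1,000 applications, and the two firms with the largest gaps account for about 52% of that total. The frontier based on noisy scores lies weakly below the infeasible one at every share of firms investigated.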
Selecting firms using the linear shrinkage estimator |$\tilde{\Delta }_f$| instead of |$\bar{\Delta }_f$| is estimated to entail only a small degradation of the possibilities frontier. This robustness reflects the high degree of rank correlation between the posterior mean and the linear shrinkage estimator (ρ = 0.9). Though the firm rankings are highly correlated across shrinkage methods, an auditor would likely choose to investigate fewer firms based on the linear shrinkage estimator, which predicts that fewer firms are engaged in severe discrimination against Black applicants.
Finally, the dashed line illustrates the frontier that arises when selecting firms based on q-values under the maintained assumption that contact gaps are distributed according to |$\hat{G}_{\Delta }$|. The expected cost of ranking firms based on their q-values, as would be optimal under preference scheme Ue, rather than their posterior means is surprisingly small, though performance degrades somewhat when more than half of the firms are investigated. Notably, the roughly 21% (|$\frac{23}{108}$|) of firms with q-values less than or equal to 0.05 are responsible for approximately 37% of lost contacts. Investigating the same share of firms based on posterior mean rankings would be expected to yield only an additional 4% of lost contacts. Evidently, the price to be paid for control over false discoveries in our setting is fairly small. More generally, these results imply that it is possible to detect individual firms responsible for a substantial share of the contacts lost to racial discrimination while maintaining a tight limit on the expected number of false-positive investigations of nondiscriminators.
XII. Conclusion
Our analysis establishes that many large U.S. employers exhibit nationwide patterns of racial discrimination that are temporally and spatially stable. Racial and gender contact gaps are highly concentrated in particular firms. We estimate that the 20% of firms discriminating most heavily against Black names are responsible for nearly half of the contacts lost to racial discrimination in our experiment. Racial discrimination appears to be widespread among the jobs posted by these firms.
In principle, the concentration of discriminatory behavior in a subpopulation of employers could dampen the economy-wide consequences of discrimination, as workers can sort away from biased firms (Becker 1957). Such a conclusion hinges crucially, however, on whether workers are aware of firm differences in average behavior. The relatively weak correlations between racial contact gaps and local demographics uncovered in our analysis give us reason to question this assumption. Rather, our impression is that the identities of the 23 firms conclusively determined to be discriminating against Black names would come as a surprise to the companies involved and to the public at large. The identities of the companies likely discriminating on the basis of perceived sex are somewhat less surprising, conforming more closely to gendered stereotypes regarding work norms.
The concentration of discrimination among particular employers may amplify group disparities if discriminatory firms tend to offer higher wages. While we found no relationship at the industry level between racial wage gaps and racial contact gaps, industry contact gaps favoring men were found to be predictive of larger gender wage gaps. An interesting topic for future research is to assess the extent to which the firm-level contact gaps uncovered in this experiment correlate with group disparities in firm wage fixed effects such as those studied by Card, Cardoso, and Kline (2016) or Gerard et al. (2021).
The fact that we can confidently identify only 23 firms as engaging in discrimination against Black names when using a massive correspondence experiment reveals the difficulty of the signal extraction problem associated with estimating firm-specific biases from application-level data. As described in Avivi et al. (2021), the firm-wide patterns documented here can potentially be used to design follow-up correspondence experiments aimed at accurately measuring biases at particular jobs, information that may be of interest to both regulators and companies interested in monitoring their own behavior.
The EEOC maintains an internal target for the share of its litigation docket composed of systemic discrimination cases. The appropriate level of this target has been a topic of recurring debate in Congress (Kim 2015). Our finding that discrimination is highly concentrated in particular companies lends some credence to the notion that appropriately targeted systemic investigations have the potential to remedy, and perhaps prevent, discrimination affecting a wide swath of the labor force.
Enforcement actions are inevitably costly and contentious. It is natural to wonder whether bias at the most discriminatory firms can be preemptively reduced or eliminated by modifying organizational hiring practices. A large experimental psychology literature studying behavioral interventions designed to reduce prejudice has failed to produce a “silver bullet” treatment with proven effectiveness.27 One of the strongest (negative) predictors of both racial and gender contact gaps found in our correspondence experiment is callback centralization, which is notably elevated among federal contractors. This finding leads us to suspect that human resources practices play an important role in translating the biased judgements of individuals into biased behavior by organizations. Although centralizing interview decisions might reduce discrimination, such changes may also simply postpone discrimination to a later stage of the hiring process. Determining whether it is possible to improve recruiting practices in a way that promotes equity and productivity remains an important and active area of research (Bergman, Li, and Raymond 2020; Raghavan et al. 2020).
Data Availability
Data and code replicating the tables and figures in this article can be found in Kline, Rose, and Walters (2022) in the Harvard Dataverse, https://doi.org/10.7910/DVN/HLO4XC.
Footnotes
We thank Sendhil Mullainathan, Sapna Raj, and Jenny Yang for helpful conversations; our discussants, Peter Hull and Peter Q. Blair; and seminar participants at UC Berkeley, Chicago Booth, the University of Chicago HCEO Working Group, the University of Wisconsin, CEPR, IZA, the NBER Labor Studies meetings, CESifo, Opportunity Insights, Chicago Harris, the Trans Pacific Labor Seminar, and Cornell University for useful comments. We also thank the J-PAL North America Social Policy Research Initiative for generous funding support. A preanalysis plan for this project can be found under AEA RCT registry number AEARCTR-0004739. Outstanding research assistance was provided by Jake Anderson, Hadar Avivi, Elena Marchetti-Bowick, Ross Chu, Brian Collica, Nicole Gandre, and Ben Scuderi. We are grateful to our determined team of undergraduate volunteers, without whom this project would not have been possible: May Adberg, Jason Chen, Stephanie Cong, Simon Duabis, Daniel Dychala, Samuel Gao, Alexandra Groscost, Victoria Haworth, Camille Hillion, Ben Keltner, Mary Kruberg, Jiaxin Lei, Carol Lee, Collin Lu, Oliver McNeil, Riley Odom, Sarah Phung, Eric Phillips, Stephanie Ross, Marcus Sander, Pat Tagari, Quinghuai Tan, Lydia Wen, Zijun Xu, Xilin Ying, Andy Zhong, Leila Zhua, Yingjia Zhang, and Yiran Zi.
EEOC guidelines clearly state that “an employer may not base hiring decisions on stereotypes and assumptions about a person’s race, color, religion, sex (including gender identity, sexual orientation, and pregnancy), national origin, age (40 or older), disability or genetic information” (see https://www.eeoc.gov/prohibited-employment-policiespractices).
For discussion of the potential legal ramifications of handling fictitious applications based on racial perceptions of names, see U.S. Equal Employment Opportunity Commission v. Target Corp., 460 F.3d 946 (7th Cir. 2006), Onwuachi-Willig and Barnes (2005), and Fryer and Levitt (2004), note 27. In cases where no aggrieved person has claimed standing, the EEOC can file a commissioner’s charge alleging Title VII violations or launch a directed investigation into violations of either the Age Discrimination in Employment Act or the Equal Pay Act. In fiscal years 2016–2019, the EEOC averaged 13 commissioner’s charges and 138 directed investigations per year.
Explicit evidence of intent to discriminate is not required to establish a prima facie case for disparate treatment. The EEOC’s guidance states that “discriminatory motive can be inferred from the fact that there were differences in treatment” (International Brotherhood of Teamsters v. United States, 431 U.S. 324, 1977). In some cases, large statistical disparities alone can also constitute prima facie evidence of intentional discrimination (Hazelwood School Dist. v. United States, 433 U.S. 299 (1977)).
For instance, in 2019, the EEOC brought a systemic lawsuit against Schuster trucking for subjecting job applicants to a physical abilities test that was alleged to have a disparate impact on women (U.S. EEOC 2019). However, another 2019 case, against Sactacular Holdings LLC, an adult retail chain, alleged disparate treatment after a male job applicant was told by employees at two separate stores that the company does not consider men for sales associate positions (U.S. EEOC 2020).
For example, the New York City Commission on Human Rights has a mandate to test for discrimination in housing and labor markets and has assisted in the staging of matched-pairs audits of bias by landlords (Fang, Guess, and Humphreys 2019) and employers (Pager, Bonikowski, and Western 2009).
The EEOC compliance manual references this standard in its guidelines for evaluating systemic discrimination: “a pattern or practice would be established if, despite the fact that Blacks made up 20 percent of a company’s applicants for manufacturing jobs and 22 percent of the available manufacturing workers, not one of the 87 jobs filled during a six year period went to a Black applicant” (U.S. EEOC 2006a). As the Supreme Court notes in Teamsters v. United States, 431 U.S. 324 (1977), “the proof of the pattern or practice supports an inference that any particular employment decision, during the period in which the discriminatory policy was in force, was made in pursuit of that policy.”
Pairs were sent every other day during wave 1, when most applications were submitted by human research assistants, to manage workloads. Beginning in wave 2, when the majority of applications were submitted automatically by software we developed, one pair was sent per day. Some pairs were occasionally sent with longer time lags due to workload or technological constraints, but overall 94% of applications were sent within eight days of the first.
All names used are presented in Online Appendix B along with additional details on experimental design.
The preanalysis plan is stored in the AEA RCT registry with number AEARCTR-0004739.
We suspect little meaningful variation is lost from this aggregation as the bias-corrected variance of racial contact gaps across SOC-3 codes is numerically indistinguishable from the bias-corrected variance across standardized job titles.
Recall, however, that the experiment only sampled entry-level jobs that were easy to audit with our résumé technology. It may be that job titles are an important predictor of discrimination in the broader population of jobs.
The WAC block-level data appear to provide an accurate measure of workplace racial composition. For a small number of the firms in our sample we were able to obtain EEO-1 records documenting the racial mix of establishments with 50 or more workers. Among the 426 establishments for which we have these data, the correlation between the EEO-1 and block-level WAC measures of the fraction of Black workers is 0.79.
The federal contractor status of each firm in our experiment was obtained directly from OFCCP as part of a FOIA request.
As Efron and Tibshirani (1996) note in a closely related context, imposing such moment constraints can provide an attractive balance between local adaptivity and respecting certain global properties of the density.
For race, we set the support of |$G_{\mu }$| to |$[0, \max _f(z_f) + 0.5]$|. The support of |$G_{\Delta }$| is assumed to be |$[0, \max _f(z_f)\max _f(s_f)]$|. For gender, we assume the supports of |$G_{\mu }$| and |$G_{\Delta }$| are |$[\min _f(z_f) - 0.5, \max _f(z_f) + 0.5]$| and |$[\min _f(z_f)\max _f(s_f), \max _f(z_f)\max _f(s_f)]$|, respectively. A deconvolved density of racial contact gaps that does not impose the positive support restriction is reported in Online Appendix Figure A12.
See Storey (2003) and Efron (2016) for more on EB interpretations of false discovery rates. We have implicitly assumed that at least one firm has a |$\hat{p}_f$| less than p.
In practice we follow Storey (2002, 2003) in estimating |$\hat{q}_f$| as |$\min _{t \ge \hat{p}_f} \widehat{FDR}(t)$|, which ensures that q-values are nondecreasing for nested rejection thresholds.
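A minimal Python sketch of this monotonization step follows. It assumes the standard plug-in form of the estimated FDR at threshold t, namely the null share times t divided by the empirical share of p-values at or below t. The null share `pi0` is passed in as a given number rather than estimated, and the function name and p-values are hypothetical.

```python
import numpy as np

def storey_q_values(p_values, pi0=1.0):
    """q-values as the minimum estimated FDR over all thresholds t >= p."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    # Estimated FDR at each observed p-value: pi0 * t / F_hat(t), with F_hat(t) = rank / m.
    fdr_sorted = pi0 * p[order] * m / ranks
    # Reverse running minimum enforces monotonicity across nested thresholds.
    q_sorted = np.minimum.accumulate(fdr_sorted[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)
    q = np.empty(m)
    q[order] = q_sorted  # map back to the original firm order
    return q

# Hypothetical p-values for four firms.
p_values = np.array([0.001, 0.04, 0.03, 0.5])
q_values = storey_q_values(p_values, pi0=1.0)
print(np.round(q_values, 4))
```

In this example the raw FDR estimate at p = 0.03 (0.06) exceeds the estimate at p = 0.04 (about 0.053), so the running minimum assigns both firms the smaller value.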
Letting |$f_{\hat{p}}$| denote the density of observed p-values, we can define |$LFDR\left(p\right)=\frac{\pi _0}{f_{\hat{p}}(p)}$|. It is straightforward to verify that |$FDR\left(p\right)=\frac{\int _0^p f_{\hat{p}}\left(b\right) LFDR\left(b\right)db}{F_{\hat{p}}\left(p\right)}.$| Because we use a kernel smoother to estimate |$f_{\hat{p}}$|, the running average of LFDR estimates does not numerically match |$\hat{q}_f$| in sample.
Randomization-based tests avoid reliance on asymptotics but evaluate the “sharp” null that none of the firm’s contact decisions were influenced by protected characteristics. See Ding (2017) for further discussion of how to interpret such tests.
The number of false discoveries follows a Poisson binomial posterior distribution with probabilities given by the LFDRs of the hypotheses under consideration. See Basu et al. (2021) for discussion.
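The Poisson binomial distribution has no simple closed form, but its probability mass function can be built up by convolving one Bernoulli distribution per hypothesis. The Python sketch below illustrates this. The LFDR values are hypothetical, the function name is introduced for the example, and the convolution treats the hypotheses as independent given the evidence.

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials
    with heterogeneous success probabilities."""
    pmf = np.array([1.0])  # zero trials: zero successes with probability one
    for p in probs:
        # Adding one trial: stay at k with prob 1 - p, move to k + 1 with prob p.
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

# Hypothetical LFDRs of the firms selected for investigation.
lfdr_investigated = [0.02, 0.05, 0.10, 0.40]
pmf = poisson_binomial_pmf(lfdr_investigated)

expected_false = float(np.dot(np.arange(pmf.size), pmf))  # equals the sum of the LFDRs
prob_no_false = float(pmf[0])                             # product of (1 - LFDR)
print(round(expected_false, 4), round(prob_no_false, 5))
```

The expected number of false discoveries equals the sum of the LFDRs, here 0.57, and the probability of no false discoveries is the product of one minus each LFDR, here about 0.503.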
As Armstrong (2015, 2093) notes, his procedure “looks at the larger ordered p-values in order to achieve adaptivity to the smoothness of the distribution of p-values under the alternative in a setting where π may not be close to 1.”
Note that this firm has a q-value below 0.05 even when |$\hat{\pi }_{0}=1$|. This occurs because |$\hat{p}_{f}$| is well below |$\hat{F}_{\hat{p}}(\hat{p}_{f})$|, so |$\hat{q}_{f}$| is small even when plugging in an upper bound on π0 of unity.
One reason that a particular job may not discriminate is that its population contact rate may be zero, for instance, because the job may have already been filled. Consequently, even a firm with a practice of always discriminating in hiring might, by this definition, exhibit a φf < 1.
For example, the U.S. Department of Labor’s Administrative Review Board ruled in Office of Federal Contract Compliance Programs (2016), U.S. Department of Labor v. Bank of America that “the more severe the statistical disparity, the less additional evidence is needed to prove that the reason was race discrimination. Very extreme cases of statistical disparity may permit the trier of fact to conclude intentional race discrimination occurred without needing additional evidence.” See Office of Federal Contract Compliance Programs (2019) for a similar ruling by the Office of Administrative Law Judges.
Both utility functions can be viewed as special cases of the more general preference scheme |$U(\mathcal {D})=\sum _{f=1}^{F} \delta _f(\Delta _f^\frac{1}{p} - c)$|, where p ≥ 1 governs the auditor’s risk aversion. When p = 1, the auditor is risk neutral and U = Ui. As p → ∞, the auditor grows increasingly risk averse and U approaches Ue.
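The two limiting cases can be worked out directly from this general scheme. The derivation below is a sketch that assumes |$\Delta _f \in [0, 1]$|, as under the positive support restriction imposed for race. The closed forms are inferred from the general formula rather than quoted.

```latex
\begin{align*}
p = 1:\quad & U(\mathcal{D}) = \sum_{f=1}^{F} \delta_f \left(\Delta_f - c\right), \\
p \to \infty:\quad & \Delta_f^{1/p} \to \mathbf{1}\{\Delta_f > 0\}
  \;\Longrightarrow\;
  U(\mathcal{D}) \to \sum_{f=1}^{F} \delta_f \left(\mathbf{1}\{\Delta_f > 0\} - c\right).
\end{align*}
```

Taking posterior expectations firm by firm, the first objective is maximized by investigating firm f whenever its posterior mean contact gap exceeds c. The second is maximized by investigating whenever the posterior probability that |$\Delta _f > 0$|, which equals |$1 - LFDR_f$|, exceeds c, that is, whenever |$LFDR_f < 1 - c$|.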
A recent review of this evidence by Paluck et al. (2020) concludes that “a fair assessment of our data on implicit prejudice reduction is that the evidence is thin. Together with the lack of evidence for diversity training, these studies do not justify the enthusiasm with which implicit prejudice reduction trainings have been received in the world over the past decade.”