Discrimination in Multi-Phase Systems: Evidence from Child Protection ∗

We develop empirical tools for studying discrimination in multi-phase systems, and apply them to the setting of foster care placement by child protective services. Leveraging the quasi-random assignment of two sets of decision-makers—initial hotline call screeners and subsequent investigators—we study how unwarranted racial disparities arise and propagate through this system. Using a sample of over 200,000 maltreatment allegations, we find that calls involving Black children are 55% more likely to result in foster care placement than calls involving white children with the same potential for future maltreatment in the home. Call screeners account for up to 19% of this unwarranted disparity, with the remainder due to investigators. Unwarranted disparity is concentrated in cases with potential for future maltreatment, suggesting that white children may be harmed by “under-placement” in high-risk situations. JEL Codes: C26, I31, J13, J15.


I. Introduction
Large racial disparities have been documented in many high-stakes settings, such as employment, healthcare, housing, and criminal justice, raising concerns of discrimination by individual decision-makers. At the same time, there is growing understanding that a focus on individual decisions can yield an incomplete view of discrimination. An extensive theoretical literature shows how discrimination can arise and compound across multiple decision-makers within interconnected systems (e.g., Loury, 1976; Pincus, 1996; Powell, 2008; Small and Pager, 2020). From this "systems-based" perspective, an analysis of individual discrimination by, for example, bail judges may understate the true level of inequity in pretrial release decisions by failing to account for previous discrimination by police officers. Broader analyses of how discrimination arises and perpetuates across such multi-phase systems may be necessary to understand and form appropriate policy responses.
Measuring discrimination in multi-phase systems is challenging for several reasons, however. Raw disparities may either overstate or understate true levels of discrimination because of omitted variables bias (OVB), while conventional regression adjustment may add included variables bias (IVB) by controlling for channels of discrimination. Datasets linking multiple phases are often unavailable and may not include the kinds of exogenous variation that can help address such biases (e.g., Arnold, Dobbie, and Hull, 2022). Interpreting and integrating findings of discrimination across systems may further require new analytic tools (Bohren, Hull, and Imas, 2022).
We develop an empirical framework that overcomes these challenges and apply it to the setting of foster care placement by U.S. child protective services (CPS). CPS aims to prevent child maltreatment by investigating reported cases of abuse or neglect and placing children into foster care when deemed necessary to ensure their safety. In practice, CPS involvement is remarkably common: by age 18, 37% of children experience a maltreatment investigation and 5% spend time in foster care (Wildeman and Emanuel, 2014; Kim et al., 2017). CPS involvement is also racially disparate: the majority of Black children (53%) experience an investigation, compared to 28% of white children, and Black children are twice as likely to spend time in foster care (10%, compared to 5% of white children).
There is enormous interest in these racial disparities and the extent to which they reflect discrimination in the actions of CPS decision-makers. For example, both the United Nations and the American Bar Association recently released reports calling for the U.S. to take all appropriate measures to eliminate racial discrimination in child protection (Kelly, 2022; White and Persson, 2022). Such interest reflects the fact that CPS actions can have a tremendous impact on the lives of children and parents and involve a difficult trade-off. On one hand, leaving children in high-risk situations may lead to subsequent maltreatment, which is associated with decreased educational attainment and future earnings (Currie and Spatz Widom, 2010) and increased criminal activity (Currie and Tekin, 2012). On the other hand, foster care placement is among the most far-reaching government interventions, with large potential effects on a child's educational attainment, earnings, and criminal activity (Bald et al., 2022). Discrimination in foster care placement thus stands to exacerbate inequities in many long-term outcomes.
Disparities in foster care placement can arise at two key phases within the CPS system. In an initial screening phase, incoming calls that allege child maltreatment are routed through a central state-level hotline. Call screeners at the center decide whether to "screen-in" the call: i.e., advance it to a formal investigation. In a subsequent investigation phase, screened-in cases are allocated to a regional investigator who visits the family and decides whether the child should be placed into foster care. Initial discrimination by screeners can thus be perpetuated, mitigated, or compounded by subsequent investigator decisions.¹
We measure discrimination in this multi-phase system by unwarranted disparities (UDs): racial differences in screener and investigator decision rates, conditional on a child's potential for subsequent maltreatment in the home.² This measure builds on Arnold, Dobbie, and Hull (2021, 2022), who study discrimination in bail judge decisions via racial disparities in pretrial release rates conditional on a defendant's potential for pretrial misconduct. As in their context, our UD discrimination measures are natural given clear decision-maker objectives.³ Unwarranted disparities capture classic drivers of inequity in economics such as racial bias (Becker, 1957) and statistical discrimination (Phelps, 1972; Arrow, 1973; Aigner and Cain, 1977), as well as indirect forms of discrimination arising from non-race characteristics.
The key identification challenge in measuring UDs is the selective observability of a child's potential for subsequent maltreatment in the home. Maltreatment potential is directly observed among children not removed from home, but unobserved among those placed into foster care. We address this challenge by leveraging the quasi-random assignment of both screeners and investigators to cases, building on the "identification at infinity" approach in Arnold, Dobbie, and Hull (2022). This approach generates estimates of UD in screener and investigator decisions, which we then combine to estimate overall UD in eventual placement rates. We further decompose the placement UD into the shares attributed to screeners and investigators, via a decomposition that builds on Bohren, Hull, and Imas (2022).
1 While there may also be discrimination among those reporting suspected child maltreatment to the state's hotline, most reporters are not CPS decision-makers. Consequently, this stage of potential discrimination falls outside the purview of CPS.
2 Specifically, we condition on a child's potential for a subsequent maltreatment investigation in the home within six months, a common proxy in the child welfare literature (Antle et al., 2009; Putnam-Hornstein, Prindle, and Hammond, 2021). We show robustness of our main findings to a wide range of alternative proxies.
3 Our measure differs from those in earlier studies of disparities at the investigation phase, which condition on observable traits rather than future maltreatment potential. Such studies include Drake et al. (2011); Font, Berger, and Slack (2012); Putnam-Hornstein et al. (2013).
To build intuition for our approach, consider estimating UDs in screening-in decisions and imagine a randomly assigned screener whose calls exhibit a placement rate of virtually zero (either because they screen-out virtually all calls or because virtually none of their screened-in calls result in placement). By virtue of random assignment and this low placement rate, the subsequent at-home maltreatment rates observed among Black and white children assigned to this screener are close to the average rates among all Black and white children. These race-specific maltreatment rates thus capture the correlation between maltreatment potential and race in the full population of calls, and they can be used to correct for OVB in raw screening-in disparities. Absent such a screener, we estimate these key parameters by extrapolating the at-home maltreatment rates of quasi-randomly assigned screeners with low placement rates. A similar approach can be used to estimate UDs at the investigation phase, focusing on investigators with low placement rates among screened-in children.
We implement this identification strategy using administrative data from Michigan CPS spanning 2008 to 2019 and present three key findings. First, we find significant evidence of unwarranted disparity by both screeners and investigators. Calls involving Black children are screened-in at a 5 percentage point (8%) higher rate than calls involving white children with identical potential for subsequent maltreatment. The estimated UD is roughly 35% larger than a controlled observational disparity of 3.7 percentage points, which points to the importance of accounting for the selectively observed maltreatment potential. At the investigation phase, we find that investigators amplify initial screening disparities, despite observing effectively everything seen by screeners and deliberating over a much longer time frame. Specifically, investigators place screened-in Black children at a 1.7 percentage point (50%) higher rate than screened-in white children with identical potential for future maltreatment. This estimated UD is nearly 70% larger than the corresponding controlled observational disparity.
Our second set of findings links the UDs in screener and investigator decisions via a decomposition of UD in eventual placement rates. Overall, calls involving Black children are 1.1 percentage points (55%) more likely to end up in foster care relative to calls involving white children with identical maltreatment potential. The decompositions show that screener decisions account for between 13% and 19% of overall placement UD, with investigators driving the remainder. The fact that call screeners drive a significant share of eventual UD in foster care placement is somewhat surprising, since only a small share of screened-in investigations result in placement. This finding further illustrates the importance of a systems-based analysis of discrimination in high-stakes settings like CPS: our estimates show that eliminating UD in foster care placement rates may require intervention at both phases of CPS involvement.
In our third set of results, we document a striking form of heterogeneity in the UDs: the placement disparity is concentrated among children with subsequent maltreatment potential in the home, with calls involving Black children placed in foster care at twice the rate of calls involving white children in this subpopulation (8% versus 4%). In contrast, the placement disparity is small and statistically insignificant in the subpopulation of children without maltreatment potential. The finding that unwarranted disparity is concentrated among high-risk cases implies that a higher placement rate may offer relative protection to Black children. Indeed, prior work in our setting finds that both Black and white children at risk of subsequent maltreatment in the home have better outcomes when placed in foster care, including a lower likelihood of subsequent maltreatment and adult criminal justice contact along with better educational outcomes (Baron and Gross, 2022; Gross and Baron, 2022).
These findings add nuance to ongoing policy debates over the reform of CPS, which focus on the possibility that Black children are "over-placed" in foster care. While we do find that calls involving Black children with future maltreatment potential in Michigan are disproportionately placed in foster care, white children may be harmed by "under-placement" in these high-risk situations. A back-of-the-envelope calculation shows that lowering the placement rate of screened-in Black children to equalize placement rates across race would increase the Black-white adult conviction gap by 10%.
Further unpacking this third result, we show that investigators, the primary drivers of placement UD, exhibit a racial concordance effect in cases with maltreatment potential, being significantly less likely to place children of their own race than other children. Since the vast majority of investigators in Michigan are white, this concordance effect yields higher conditional placement rates for Black children. This finding suggests that the leniency afforded by white investigators to white parents may, perhaps counterintuitively, lead to worse outcomes for their children relative to Black children who are placed at higher rates.
Our findings withstand a battery of robustness checks and extensions, including qualitatively similar estimates based on non-parametric UD bounds that relax our baseline identifying assumptions. Importantly, we show that our findings are not unique to Michigan: national (though more limited) data allow us to construct non-parametric bounds on investigator UD for almost all states in the U.S. This analysis reveals that UD in low-risk cases tends to be small nationwide, while UD in high-risk cases is typically as large or larger than in Michigan.
This study contributes to several related literatures. First and foremost, we build on Arnold, Dobbie, and Hull (2022) and Bohren, Hull, and Imas (2022) by developing a practical empirical framework for studying how discrimination perpetuates and compounds across multiple decision-makers in a high-stakes system. As mentioned above, a large theoretical literature emphasizes this possibility and its implications for policy.⁴ But these insights, while potentially valuable in many areas within economics, are often hard to bring to data because of non-random decision-making at one or more phases of a system. We provide a framework for conducting such analyses when the multiple decision-makers are quasi-randomly assigned.⁵
Second, we add to a large literature studying the equity and efficiency of CPS systems. Our analysis of unwarranted disparity leverages a new source of variation (quasi-random screener assignment) to study the decision to launch investigations. While there is a growing literature examining the causal effects of foster care on the outcomes of screened-in children (see Bald et al. (2022) for a review), much less is known about the broader effects of CPS systems.⁶
Third, we add to a recent literature using quasi-experimental variation to estimate various notions of bias and discrimination in high-stakes decisions, such as pretrial release (e.g., Arnold, Dobbie, and Yang (2018), Hull (2021), Arnold, Dobbie, and Hull (2022), Rambachan (2022), and Canay, Mogstad, and Mountjoy (2022)), traffic stops (e.g., Goncalves and Mello (2021) and Feigenberg and Miller (2022)), and lending (e.g., Dobbie et al. (2021)). Our analysis benefits from the fact that CPS systems feature many decision-makers with very low "treatment" rates, allowing for both precise non-parametric inferences on overall UDs and the statistical power to distinguish between disparities among high- and low-risk home situations. Our framework for linking such heterogeneity to welfare considerations and policy responses may be useful in future studies of unwarranted disparity.
The rest of the paper is organized as follows. Section II describes the CPS system in more detail. Section III develops our UD measures at each phase of the system, as well as the UD decomposition and our identification strategies. Section IV describes our analysis samples and presents motivating results. Section V presents our main findings on how UD propagates through CPS. Section VI further unpacks the main source of UD at the investigation phase, examining heterogeneity by investigator characteristics, exploring potential drivers, and summarizing several extensions including the national UD analysis. Section VII concludes.

II. Setting
The CPS system aims to protect children from maltreatment in their home environment. Figure I summarizes the process in Michigan and most states. CPS involvement begins when a call is made to the state's central hotline to report suspected child abuse (e.g., bruises or burns) or neglect (e.g., improper supervision due to parental substance abuse). Anyone can make a report to the hotline, though the most common reporters are educators and law enforcement personnel (Benson, Fitzpatrick, and Bondurant, 2022).
Calls to the Michigan CPS hotline are answered by screeners in two central offices (one in Grand Rapids and one in Detroit), which share a hotline number. Calls typically last about 15 minutes. Screeners have substantial discretion in whether to screen-in a call, though they follow general guidelines in screening-out calls that do not conform with state law and guidance from Michigan's Department of Health and Human Services (MDHHS). Screeners are instructed to screen-in calls to minimize the likelihood of subsequent maltreatment if the call is screened-out.⁷ Screeners play no other role in the process: if a call is screened-in (roughly 60% of all calls), it is sent to the alleged victim's local child welfare office for formal investigation. A screened-out call concludes MDHHS involvement. Screeners do not systematically learn the eventual outcome of a given investigation or screened-out call.
Screened-in calls are assigned to an investigator who has 24 hours to begin an investigation, 72 hours to establish face-to-face contact with the alleged child victim, and 30 days to complete the investigation. The investigator then makes two primary decisions. First, they must decide whether there is enough evidence to substantiate the allegation. This determination is based on interviews with the child maltreatment reporter, family members, police, and potentially medical reports. Around 75% of all screened-in cases in our sample are unsubstantiated; an unsubstantiated finding concludes the investigation. If the investigation is substantiated, the investigator makes a judgement on whether to place the child in foster care.⁸ Under CPS guidelines, the primary justification for placement is child safety: investigators are instructed to place the child in foster care if the child is in "imminent risk" of maltreatment in the home, but to otherwise keep the child with their family.⁹ If the investigator determines that the potential for subsequent maltreatment is high, she requests that her supervisor submit a court petition to place the child in foster care. In practice, it is rare for either the supervisor or the judge to disagree with the investigator's recommendation.
Approximately 3.5% of all screened-in cases in our sample result in foster care placement. In these cases the child is placed with either an unrelated foster family, relatives, or (much less frequently) in a group home, while their custodial parents receive services to support reunification. On average, children spend approximately 17 months in foster care. Following this spell in foster care, roughly 50% of children are reunified with their birth parents, 34% are adopted or have legal guardianship transferred, and 9% exit the system as independent adults upon turning 18; the remaining children fall into less common exit categories such as informal guardianship with relatives (Gross and Baron, 2022).
Regardless of the placement decision, investigators can formally open a CPS case and recommend "targeted services" to support the family in cases that have been substantiated. These services are typically preventative referrals and range from substance abuse to parenting classes, though parents are not typically compelled to use them. In 6.8% of all investigations in our sample, investigators opened a case and recommended targeted services without removing the child from their home. Anecdotally, takeup of such service recommendations is very low.¹⁰
Importantly for our analysis of this system, both screeners and investigators are quasi-randomly assigned. Incoming calls enter a queue, with the hotline system routing each call to the available screener who has been waiting the longest since her last call; this makes screener assignment as-good-as-random conditional on the exact day and shift. Once referred to a local office, screened-in calls are quasi-randomly assigned to the office's investigators. Every county in Michigan has at least one local office, with some larger, more urban counties containing multiple offices. Some offices further split investigators into geographic-based teams. Within teams, the assignment of most cases is rotational: reports cycle through investigators based on who is next up in the rotation, and investigators are not assigned based on their specific characteristics or skill sets. There are two exceptions to this quasi-random assignment: cases of sexual abuse tend to be assigned to more experienced investigators, and repeat reports involving a child who was recently investigated are often re-assigned to the initial investigator. We exclude these cases from our analysis and isolate quasi-random assignment by conditioning on a child's ZIP code and investigation year.
8 [...] and responses to match their priors (Gillingham and Humphreys, 2010; Bosk, 2015). Moreover, the tool itself allows for discretionary adjustments to risk levels. As we show below, there is a strong first-stage relationship between investigator removal tendencies and foster care placement within offices, suggesting that even where investigators are guided by the risk assessment, they maintain significant discretion over placement.
9 For example, the MDHHS Children's Protective Services Policy Manual reads: "placement of children out of their homes should occur only if their well-being cannot be safeguarded with their families" (MDHHS (2020), p.3). It further instructs investigators to recommend placement "in situations where the child is unsafe, or when there is resistance to, or failure to benefit from, CPS intervention and that resistance/failure is causing an imminent risk of harm to the child" (p.5).
10 While our data do not contain information on the takeup of these services, we have reviewed data from MDHHS on the takeup rates of SafeCare, a parental education program commonly offered to families and a program that the state is currently evaluating. Takeup rates for this program typically range from 5% to 10%. Completion rates are significantly lower, on the order of 1% to 2%.

III.A. Unwarranted Disparity Measures and Decompositions
We formalize our empirical approach by considering a population of cases referred to CPS. Each case $i$ involves either a Black or white child, indicated by $R_i \in \{b, w\}$. Each child has a potential for at-home future maltreatment $Y^*_i \in \{0, 1\}$, with $Y^*_i = 1$ indicating future abuse or neglect when the child is not removed from the home. Cases are first handled by call screeners who decide whether to advance the case to investigation. Among screened-in cases, investigators then decide whether to place the child into foster care. If the child is either screened-out or screened-in but not placed, maltreatment potential is realized and observable via their maltreatment outcomes. Otherwise, $Y^*_i$ is not realized and hence not observable.
To develop our discrimination measures and decompositions in this setting, we first imagine that cases are assigned to a single representative screener and, when screened-in, a single representative investigator. Decision-maker heterogeneity will play a central role in our identification strategy, developed below. Here we abstract away from heterogeneity to ease notation, letting $S_i \in \{0, 1\}$ indicate whether case $i$ is screened-in by the representative screener and (when screened-in) $D_i \in \{0, 1\}$ indicate whether the representative investigator chooses foster care placement for the child. The product of these indicators, $P_i = S_i D_i$, then indicates whether incoming case $i$ ultimately results in placement.
We measure discrimination in screening and investigation decisions as unwarranted disparities (UDs): racial differences in decision rates conditional on a child's potential for subsequent maltreatment in the home, $Y^*_i$. This measure builds on Arnold, Dobbie, and Hull (2021, 2022), who study discrimination in bail decisions via racial disparities in pretrial release rates conditional on a defendant's potential for pretrial misconduct. As in their context, our UD discrimination measure is natural given clear decision-maker objectives: under CPS guidelines, the primary justification for foster care placement is a potential for subsequent maltreatment in the home. Arnold, Dobbie, and Hull (2021, 2022) show how such a measure aligns with the legal theory of disparate impact, economic notions of discrimination among equally productive workers, as well as more recent notions of algorithmic discrimination from the computer science literature.¹¹ Importantly, unwarranted disparity can arise from both "direct" discrimination on the basis of race itself and "indirect" discrimination through non-race characteristics that are correlated with race (such as poverty levels).
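To build up to our UD measures, we first define a conditional disparity in screening-in rates among Black and white children without future maltreatment potential:
$$\Delta^S_0 = E[S_i \mid R_i = b, Y^*_i = 0] - E[S_i \mid R_i = w, Y^*_i = 0], \tag{1}$$
along with the corresponding disparity among children with future maltreatment potential:
$$\Delta^S_1 = E[S_i \mid R_i = b, Y^*_i = 1] - E[S_i \mid R_i = w, Y^*_i = 1]. \tag{2}$$
We measure the overall screener UD by averaging these two conditional disparities:
$$\Delta^S = (1 - \mu)\,\Delta^S_0 + \mu\,\Delta^S_1, \tag{3}$$
with weights given by the average future maltreatment risk in the population of all cases, $\mu = E[Y^*_i]$. Thus, $\Delta^S$ captures the expected level of UD when encountering a representative pool of children with unknown future maltreatment potential.
Next, among screened-in cases, we define the investigator's UDs:
$$\Delta^D_0 = E[D_i \mid R_i = b, Y^*_i = 0, S_i = 1] - E[D_i \mid R_i = w, Y^*_i = 0, S_i = 1], \tag{4}$$
$$\Delta^D_1 = E[D_i \mid R_i = b, Y^*_i = 1, S_i = 1] - E[D_i \mid R_i = w, Y^*_i = 1, S_i = 1]. \tag{5}$$
Here, $\Delta^D_0$ gives the investigator's placement rate disparity for screened-in cases without future maltreatment potential, while $\Delta^D_1$ gives the screened-in placement rate disparity for cases with future maltreatment potential. We again measure the overall investigator UD by averaging these two conditional disparities: $\Delta^D = (1 - \mu_{S=1})\,\Delta^D_0 + \mu_{S=1}\,\Delta^D_1$, where now the weights $\mu_{S=1} = E[Y^*_i \mid S_i = 1]$ correspond to the average maltreatment risk of screened-in cases.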
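To make the averaging step concrete, the following minimal Python sketch computes the two conditional disparities and their risk-weighted average for one decision phase. The function name `unwarranted_disparity` and all numeric rates are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
def unwarranted_disparity(rate_b, rate_w, mu):
    """Risk-weighted average of conditional Black-white gaps.

    rate_b, rate_w: dicts mapping y in {0, 1} to the decision rate for
        Black and white children with maltreatment potential Y* = y.
    mu: share of the relevant population with Y* = 1.
    Returns (gap for y = 0, gap for y = 1, weighted average).
    """
    delta0 = rate_b[0] - rate_w[0]
    delta1 = rate_b[1] - rate_w[1]
    return delta0, delta1, (1 - mu) * delta0 + mu * delta1


# Hypothetical screen-in rates by race and maltreatment potential.
screen_b = {0: 0.62, 1: 0.70}
screen_w = {0: 0.58, 1: 0.63}

d0, d1, ud = unwarranted_disparity(screen_b, screen_w, mu=0.25)
print(round(d0, 4), round(d1, 4), round(ud, 4))
```

The same function could be applied to investigator decisions by passing placement rates among screened-in cases and the screened-in risk share in place of `mu`.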
11 Disparate impact is one of two main legal doctrines of discrimination in U.S. case law. It concerns the discriminatory effects of a policy or practice rather than a decision-maker's intent. The disparate impact standard applies to programs and activities receiving federal financial assistance via Title VI of the 1964 Civil Rights Act, including the child protection systems we consider (DHHS, 2016; DOJ, 2016). Both screening and investigation are explicitly required to comply with this standard (DHHS, 2016). See Section I.A of Arnold, Dobbie, and Hull (2022) for more background and discussion of relevant case law.
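Finally, we define placement rate disparities across all cases referred to CPS:
$$\Delta^P_0 = E[P_i \mid R_i = b, Y^*_i = 0] - E[P_i \mid R_i = w, Y^*_i = 0], \tag{6}$$
$$\Delta^P_1 = E[P_i \mid R_i = b, Y^*_i = 1] - E[P_i \mid R_i = w, Y^*_i = 1], \tag{7}$$
with the overall placement UD again given by an average: $\Delta^P = (1 - \mu)\,\Delta^P_0 + \mu\,\Delta^P_1$.
We link the screener, investigator, and placement UDs via a decomposition of placement UDs into components due to the screener and investigator. Specifically, since $P_i = S_i D_i$, we have:
$$\begin{aligned}
\Delta^P_y &= E[S_i \mid R_i = b, Y^*_i = y]\,E[D_i \mid R_i = b, Y^*_i = y, S_i = 1] - E[S_i \mid R_i = w, Y^*_i = y]\,E[D_i \mid R_i = w, Y^*_i = y, S_i = 1] \\
&= \Delta^S_y\,\omega^S_y + \Delta^D_y\,\omega^D_y
\end{aligned} \tag{8}$$
for $y \in \{0, 1\}$, where $\omega^S_y = E[D_i \mid R_i = w, Y^*_i = y, S_i = 1]$ and $\omega^D_y = E[S_i \mid R_i = b, Y^*_i = y]$. Equation (8) decomposes placement UD into two components: one involving screener UD ($\Delta^S_y$) and the other involving investigator UD ($\Delta^D_y$). These UDs are weighted by $\omega^S_y$ (the placement rate of screened-in white children with $Y^*_i = y$) and $\omega^D_y$ (the screened-in rate of Black children with $Y^*_i = y$), respectively. These weights are non-negative, though they do not sum to one in general. The equation is derived similarly to classic Kitagawa-Oaxaca-Blinder (KOB) decompositions, by adding and subtracting $E[S_i \mid R_i = b, Y^*_i = y]\,E[D_i \mid R_i = w, Y^*_i = y, S_i = 1]$ to the first line and rearranging terms. The decomposition of $\Delta^P_0$ and $\Delta^P_1$ can be further averaged with the $\mu$ weights to write $\Delta^P$ in terms of $\Delta^S_y$ and $\Delta^D_y$. These decompositions build on Bohren, Hull, and Imas (2022), who propose a general framework for studying how discrimination perpetuates across multiple connected decisions across time or domains.
As usual with KOB decompositions, an alternative version of Equation (8) comes from changing the "order" of decomposition. Namely, we can also write:
$$\Delta^P_y = \Delta^S_y\,\tilde{\omega}^S_y + \Delta^D_y\,\tilde{\omega}^D_y, \tag{9}$$
where $\tilde{\omega}^S_y = E[D_i \mid R_i = b, Y^*_i = y, S_i = 1]$ and $\tilde{\omega}^D_y = E[S_i \mid R_i = w, Y^*_i = y]$. This version follows from adding and subtracting $E[S_i \mid R_i = w, Y^*_i = y]\,E[D_i \mid R_i = b, Y^*_i = y, S_i = 1]$ to the first line of Equation (8) and again rearranging terms. Again, this alternative decomposition can be averaged with the $\mu$ weights to write $\Delta^P$ in terms of $\Delta^S_y$ and $\Delta^D_y$.
Either decomposition, Equation (8) or (9), can be used to study how UD arises and propagates through the two-phase child protection system. Since placement rates tend to be low, the screener disparity weights $\omega^S_y$ and $\tilde{\omega}^S_y$ are likely to be small; the screener components $\Delta^S_y\,\omega^S_y$ and $\Delta^S_y\,\tilde{\omega}^S_y$ are therefore not likely to account for the majority of placement UD. Nevertheless, to the extent that meaningful unwarranted disparity is found at both the screener and investigator phase, Equations (8) and (9) show how much intervention may be required in each phase to eliminate unwarranted disparity in eventual foster care placement rates.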
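As a numerical check on the two orderings, the sketch below decomposes a placement disparity for a single value of y under both Equation (8) and Equation (9). The cell rates and the helper `decompose` are hypothetical, chosen only to illustrate that both orderings sum to the same placement disparity while assigning different shares to screeners.

```python
# Hypothetical conditional rates for one value of y (illustrative only).
screen = {"b": 0.66, "w": 0.60}    # E[S | R = r, Y* = y]
place = {"b": 0.060, "w": 0.040}   # E[D | R = r, Y* = y, S = 1]


def decompose(screen, place, order="eq8"):
    """Return (screener component, investigator component)."""
    delta_s = screen["b"] - screen["w"]
    delta_d = place["b"] - place["w"]
    if order == "eq8":
        # Weights: white screened-in placement rate, Black screen-in rate.
        w_s, w_d = place["w"], screen["b"]
    else:
        # Alternative ordering: Black placement rate, white screen-in rate.
        w_s, w_d = place["b"], screen["w"]
    return delta_s * w_s, delta_d * w_d


# Placement disparity computed directly from P = S * D.
delta_p = screen["b"] * place["b"] - screen["w"] * place["w"]

for order in ("eq8", "eq9"):
    s_part, d_part = decompose(screen, place, order)
    share = s_part / delta_p
    print(order, round(s_part + d_part, 6), "screener share:", round(share, 3))
```

Reporting the screener share under both orderings gives a range rather than a single number, which is one natural way to summarize results when the decomposition is order-dependent.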
The fundamental challenge in bringing these UD measures and decompositions to data is the selective observability of maltreatment potential. Among children who are not placed into foster care, maltreatment potential is directly revealed by their future maltreatment outcomes. But since future maltreatment in the home is unobserved among children who are placed in foster care, we cannot directly estimate Equations (1)-(9). We next develop our quasi-experimental strategy for addressing this identification challenge.

III.B. Identification Strategies
The first step of our empirical approach is to rewrite Equations (1)-(9) in terms of a set of directly estimable moments and a set of unknown parameters capturing the average at-home maltreatment risk of certain populations of Black and white children. The second is to estimate or bound these key parameters using the quasi-random assignment of screeners and investigators. Here we first develop this approach for estimating the screener UD measures. We then discuss how the approach extends to the investigator UD measures and decompositions of placement UDs; we also discuss an alternative bounding approach.
The screener UD measures (1)-(2) are based on terms that can be rewritten as:
$$E[S_i \mid R_i = r, Y^*_i = 0] = 1 - \frac{E[(1 - S_i)(1 - Y^*_i) \mid R_i = r]}{1 - \mu_r} \tag{10}$$
and
$$E[S_i \mid R_i = r, Y^*_i = 1] = 1 - \frac{E[(1 - S_i)\,Y^*_i \mid R_i = r]}{\mu_r}. \tag{11}$$
Since screening decisions $S_i$ are directly observed, and since the future maltreatment outcomes of screened-out children (with $(1 - S_i) = 1$) directly reveal their at-home maltreatment potential, the numerators in these expressions do not suffer from the selective observability of $Y^*_i$. The challenge of estimating screener UDs therefore reduces to the challenge of estimating the parameters in the denominators, $\mu_r = E[Y^*_i \mid R_i = r]$ for $r \in \{b, w\}$. These parameters reflect the average at-home maltreatment risk of Black and white children in the full population of cases, and they are not directly estimable because of the selective observability of $Y^*_i$. Specifically, we cannot directly estimate each $\mu_r$ because $Y^*_i$ is unobserved when child $i$ is placed into foster care.
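The rewriting above can be checked on simulated data. In this minimal sketch for a single race group, the conditional screen-in rates are recovered from moments that use maltreatment outcomes only for screened-out cases, together with the mean risk parameter. The data-generating values (`MU_R`, the screen-in probabilities) are hypothetical, and the mean risk is computed from the full simulated sample purely for illustration; in real data it is exactly the parameter that cannot be estimated directly.

```python
import random

random.seed(0)
MU_R = 0.30   # hypothetical mean risk for race group r
N = 50_000

cases = []
for _ in range(N):
    y_star = 1 if random.random() < MU_R else 0
    # Hypothetical screener behavior: riskier cases are screened in more often.
    s = 1 if random.random() < (0.8 if y_star else 0.5) else 0
    cases.append((s, y_star))

# Sample analogue of mu_r (infeasible in practice; used here for illustration).
mu_hat = sum(y for _, y in cases) / N

# Numerators: Y* enters only through screened-out cases (s == 0).
num1 = sum(y for s, y in cases if s == 0) / N       # E[(1 - S) Y* | r]
num0 = sum(1 - y for s, y in cases if s == 0) / N   # E[(1 - S)(1 - Y*) | r]

rate1_identity = 1 - num1 / mu_hat
rate0_identity = 1 - num0 / (1 - mu_hat)

# Infeasible benchmarks that condition on Y* directly.
rate1_direct = sum(s for s, y in cases if y == 1) / sum(y for _, y in cases)
rate0_direct = sum(s for s, y in cases if y == 0) / sum(1 - y for _, y in cases)

print(round(rate1_identity, 4), round(rate1_direct, 4))
print(round(rate0_identity, 4), round(rate0_direct, 4))
```

The two routes agree up to floating-point error, so once the mean risk parameter is known the conditional screen-in rates follow from observable moments.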
Our estimation strategy for the key $\mu_r$ parameters builds on Arnold, Dobbie, and Hull (2021, 2022) by leveraging variation across as-good-as-randomly assigned screeners in the observed maltreatment rates of children not placed into foster care. To build intuition for this approach, suppose that in addition to our representative screener we have identified a "supremely negligent" screener whose calls exhibit a placement rate of virtually zero (either because they screen-out virtually all calls or because virtually none of their screened-in calls result in placement). Suppose further that we randomly assign some subset of cases to this supremely negligent screener and then measure the foster care placement rates and subsequent maltreatment outcomes among cases not resulting in foster care placement. By virtue of randomization, the average $Y^*_i$ among Black and white children in the cases handled by the supremely negligent screener would be the same as in the full population; that is, both would have the same race-specific subsequent maltreatment means, $\mu_r$. Moreover, the observed maltreatment outcomes of the handled cases which do not result in foster care placement would be close to these means. This follows because the supremely negligent screener has a placement rate that is close to zero, making the assigned Black and white cases not resulting in foster care placement close to representative of the full sample of Black and white cases.
Absent such a supremely negligent screener, the key mean risk parameters $\mu_r$ can be estimated by extrapolating variation in observed at-home maltreatment rates across quasi-randomly assigned screeners with low placement rates. This approach is conceptually similar to how average potential outcomes at a treatment cutoff can be extrapolated from nearby observations in standard regression discontinuity designs. Here, potential maltreatment risk is extrapolated from quasi-randomly assigned screeners with low placement rates to the hypothetical supremely negligent screener whose placement rate is zero.12
Formalizing this approach to estimating screener UDs requires enriched notation to capture screener heterogeneity and an estimation strategy that allows for the fact that screeners are only conditionally as-good-as-randomly assigned. Consider two linear regressions:
$$S_i = \sum_j \phi^S_{jw} \mathbf{1}[R_i = w] Z_{ij} + \sum_j \phi^S_{jb} \mathbf{1}[R_i = b] Z_{ij} + X_i'\gamma^S + \varepsilon^S_i, \tag{12}$$
estimated in the full population of cases, with $S_i \in \{0, 1\}$ here denoting screen-in status, and:
$$Y^*_i = \sum_j \psi^S_{jw} \mathbf{1}[R_i = w] Z_{ij} + \sum_j \psi^S_{jb} \mathbf{1}[R_i = b] Z_{ij} + X_i'\delta^S + \nu^S_i, \tag{13}$$
estimated only among screened-out cases (with $S_i = 0$). Here, $Z_{ij} = 1$ indicates that case $i$ is assigned to screener $j$ (one of $J$ screeners), and $X_i$ is a vector of day-by-shift fixed effects, conditional on which screener assignment is as-good-as-random. When $X_i$ is de-meaned, the $\phi^S_{jw}$ and $\phi^S_{jb}$ coefficients capture strata-adjusted screen-in rates of screener $j$ for white and Black children, respectively. Likewise, the $\psi^S_{jw}$ and $\psi^S_{jb}$ coefficients capture strata-adjusted maltreatment rates among the white and Black children screened out by screener $j$. Ordinary least squares (OLS) estimates of these coefficients can be used to estimate screener-specific versions of the numerators in Equations (10) and (11).13
For estimating the remaining risk parameters $\mu_r$, consider two analogous regressions:
$$P_i = \sum_j \phi^P_{jw} \mathbf{1}[R_i = w] Z_{ij} + \sum_j \phi^P_{jb} \mathbf{1}[R_i = b] Z_{ij} + X_i'\gamma^P + \varepsilon^P_i, \tag{14}$$
estimated in the full population of cases, with $P_i \in \{0, 1\}$ here denoting placement status, and:
$$Y^*_i = \sum_j \psi^P_{jw} \mathbf{1}[R_i = w] Z_{ij} + \sum_j \psi^P_{jb} \mathbf{1}[R_i = b] Z_{ij} + X_i'\delta^P + \nu^P_i, \tag{15}$$
estimated only among cases not resulting in foster care placement (with $P_i = 0$). Here, the $\phi^P_{jr}$ coefficients can be used to identify screeners with low placement rates among children of race $r$, while the $\psi^P_{jr}$ coefficients capture the subsequent maltreatment rates among children of race $r$ who are not placed into foster care following assignment to screener $j$. Following the above intuition, $\mu_r$ can be estimated by extrapolating estimates of $\psi^P_{jr}$ across screeners with low $\phi^P_{jr}$ estimates; for example, one could estimate $\mu_r$ by the vertical intercept of linear (or more flexible) regressions of the $\psi^P_{jr}$ estimates on the $\phi^P_{jr}$ estimates across screeners $j$.
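The intercept-based extrapolation can be sketched as follows. This is a minimal illustration under simplifying assumptions: the screener-level inputs are hypothetical and noise-free, the fit is linear, and the function name `extrapolate_to_zero` and its optional caseload weights are introduced here only for exposition.

```python
def extrapolate_to_zero(placement_rates, maltreatment_rates, weights=None):
    """Intercept of a (weighted) least-squares line of psi on phi.

    The intercept is the fitted maltreatment rate at a placement rate of zero,
    which serves as the estimate of mean risk for one race group.
    """
    n = len(placement_rates)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, placement_rates)) / total
    y_bar = sum(w * y for w, y in zip(weights, maltreatment_rates)) / total
    cov = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, placement_rates, maltreatment_rates))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, placement_rates))
    slope = cov / var
    return y_bar - slope * x_bar


# Hypothetical screener-level estimates for one race (not from the paper).
phi_hat = [0.005, 0.010, 0.020, 0.030, 0.040]    # strata-adjusted placement rates
psi_hat = [0.14 - 0.5 * phi for phi in phi_hat]  # maltreatment rates among non-placed
mu_hat = extrapolate_to_zero(phi_hat, psi_hat)
print(round(mu_hat, 6))
```

A quadratic or local linear fit would replace the straight line with a more flexible curve while keeping the same idea of reading off the fitted value at a placement rate of zero.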
13 [...], where $S_{ij}$ denotes the potential screening decision of screener $j$ when assigned to case $i$. These expressions follow when screeners are quasi-randomly assigned (so the $Z_{ij}$ are independent of $(R_i, Y^*_i, (S_{ij})_{j=1}^{J})$ given $X_i$) and when regression-adjustment for $X_i$ is sufficient (see Arnold, Dobbie, and Hull (2022)). We show below that our results are nearly identical without regression-adjustment, however. This is because variation in the relevant outcome variables is largely driven by variation within, rather than across, the strata in $X_i$ (Baron and Gross, 2022).
With estimates of $\mu_r$ and the screener-specific numerators in Equations (10)-(11), screener-specific UD measures $\Delta^S_{j0}$ and $\Delta^S_{j1}$ can be estimated following Equations (1) and (2). These can then be aggregated across screeners, weighting by caseloads, to estimate average screener UDs $\Delta^S_0$ and $\Delta^S_1$. The two averages can in turn be combined, weighting by the natural estimate of $\mu = \mu_b Pr(R_i = b) + \mu_w Pr(R_i = w)$, to obtain an estimate of overall average screener UD.
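The aggregation steps can be sketched in a few lines. The sketch makes two assumptions that are not spelled out in the text: the screener-specific numerators are built as the screen-out rate times the maltreatment (or non-maltreatment) rate among screened-out cases, and the overall average places weight (1 - mu) on the Y* = 0 measure and weight mu on the Y* = 1 measure. All inputs and names are hypothetical.

```python
def screen_in_rate_given_risk(phi, psi, mu, y):
    """Screen-in rate among cases with Y* = y, from a screener's screen-in rate (phi),
    the maltreatment rate among the cases they screen out (psi), and mean risk (mu)."""
    screened_out = 1.0 - phi
    if y == 1:
        return 1.0 - screened_out * psi / mu
    return 1.0 - screened_out * (1.0 - psi) / (1.0 - mu)


def average_screener_ud(screeners, mu_b, mu_w, share_black):
    """Caseload-weighted average UDs (Delta_0, Delta_1) and an overall average."""
    total_cases = sum(s["cases"] for s in screeners)
    averages = []
    for y in (0, 1):
        weighted_sum = 0.0
        for s in screeners:
            ud_j = (screen_in_rate_given_risk(s["phi_b"], s["psi_b"], mu_b, y)
                    - screen_in_rate_given_risk(s["phi_w"], s["psi_w"], mu_w, y))
            weighted_sum += s["cases"] * ud_j
        averages.append(weighted_sum / total_cases)
    delta_0, delta_1 = averages
    mu_bar = mu_b * share_black + mu_w * (1.0 - share_black)
    # Assumed weighting: (1 - mu_bar) on Delta_0 and mu_bar on Delta_1.
    overall = (1.0 - mu_bar) * delta_0 + mu_bar * delta_1
    return delta_0, delta_1, overall


# Hypothetical inputs (not estimates from the paper).
screeners = [
    {"cases": 1000, "phi_b": 0.60, "psi_b": 0.10, "phi_w": 0.50, "psi_w": 0.10},
    {"cases": 3000, "phi_b": 0.60, "psi_b": 0.10, "phi_w": 0.50, "psi_w": 0.10},
]
delta_0, delta_1, overall = average_screener_ud(screeners, 0.125, 0.125, 0.35)
print(round(delta_0, 4), round(delta_1, 4), round(overall, 4))
```

In this example mean risk is equal across races, so under the assumed weighting the overall average collapses to the raw screen-in gap of 0.60 - 0.50, which is a useful sanity check.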
The same approach can be used to estimate investigator UDs, placement UDs in the full sample of cases, and the two UD decompositions. For investigator UDs, which condition on screened-in cases, the key parameters $\mu^{S=1}_r = E[Y^*_i \mid R_i = r, S_i = 1]$ capture the race-specific maltreatment risk of screened-in cases; this follows analogously to Equations (10)-(11). These $\mu^{S=1}_r$ can again be estimated by extrapolating estimates from versions of Equations (14)-(15) that condition on $S_i = 1$ (such that $P_i = D_i$, the investigator decision) and which replace the screener assignment dummies $Z_{ij}$ and controls $X_i$ with corresponding investigator assignment dummies and rotation (ZIP code-by-year) fixed effects. These regressions also yield estimates of the remaining moments in the analogous equations to Equations (10)-(11), giving estimates of investigator-specific UD measures $\Delta^D_{j0}$ and $\Delta^D_{j1}$ (following Equations (4) and (5)), average investigator UD estimates that aggregate these across investigators (weighting by caseloads), and an overall measure of average investigator UD that weights by the natural estimate of $\mu^{S=1} = \mu^{S=1}_b Pr(R_i = b \mid S_i = 1) + \mu^{S=1}_w Pr(R_i = w \mid S_i = 1)$. Finally, average placement UDs in the full population of calls (and their decompositions) can be estimated by combining estimates of average screener and investigator UDs with resulting estimates of the decomposition weights: either $(\omega^S_y, \omega^D_y)$ or $(\tilde{\omega}^S_y, \tilde{\omega}^D_y)$.
An alternative strategy of bounding unwarranted disparity, which does not rely on any extrapolation, uses the fact that the selectively observed $Y^*_i$ is binary, so that $\mu_r \in [\mu_{rL}, \mu_{rU}]$ and $\mu^{S=1}_r \in [\mu^{S=1}_{rL}, \mu^{S=1}_{rU}]$. The lower bounds $\mu_{rL}$ and $\mu^{S=1}_{rL}$ come from assuming all children placed into foster care, for whom future at-home maltreatment is unobserved, have $Y^*_i = 0$. The upper bounds $\mu_{rU}$ and $\mu^{S=1}_{rU}$ come from assuming all such children have $Y^*_i = 1$. These bounds are directly estimable from observed placement and maltreatment rates and can be tightened by focusing on screeners or investigators with lower placement rates (Arnold, Dobbie, and Hull, 2022). Bounds on the different UD measures and decompositions can then be formed by combining these $\mu_r$ and $\mu^{S=1}_r$ bounds with the directly estimable moments. Note that the width of these bounds will be equal to either the placement rate in the full population of calls (when bounding $\mu_r$), or the placement rate in the population of screened-in calls (when bounding $\mu^{S=1}_r$). Therefore, these bounds are likely to be informative in our context, given relatively low placement rates. We explore this alternative approach below.
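The construction of these worst-case bounds can be illustrated with a small sketch. The data and the function name `mean_risk_bounds` are hypothetical; the only logic used is the imputation of Y* = 0 (lower bound) or Y* = 1 (upper bound) for placed children.

```python
def mean_risk_bounds(placed, maltreated):
    """Worst-case bounds on mean at-home maltreatment risk for one race group.

    placed[i] is 1 if child i was placed into foster care, else 0.
    maltreated[i] is the observed outcome (0 or 1) for non-placed children
    and None for placed children, whose Y* is unobserved.
    """
    n = len(placed)
    observed = sum(y for p, y in zip(placed, maltreated) if p == 0)
    n_placed = sum(placed)
    lower = observed / n               # impute Y* = 0 for every placed child
    upper = (observed + n_placed) / n  # impute Y* = 1 for every placed child
    return lower, upper


# Hypothetical data for ten children: one placement, two observed maltreatment cases.
placed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
maltreated = [1, 1, 0, 0, 0, 0, 0, 0, 0, None]
lower, upper = mean_risk_bounds(placed, maltreated)
print(lower, upper)
```

The gap between the two bounds equals the placement rate in the sample (here one in ten), which is why low placement rates make the bounds informative.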

III.C. Identifying Assumptions
Our empirical strategy relies on two primary assumptions.
First, we require the as-good-as-random assignment of screeners and investigators who vary in their tendency to place children into foster care. This assumption is consistent with our understanding of the assignment process in Michigan CPS, detailed in Section II, as well as previous studies using such variation to estimate the causal effects of foster care placement. Below, we show a variety of balance tests that indirectly validate as-good-as-random assignment in our data.
The second key assumption is an exclusion restriction: that screener and investigator assignment only systematically impacts subsequent maltreatment potential through the decision to place a child into foster care. If agents differentially affect maltreatment potential in other ways, then our estimates of average maltreatment rates may reflect these additional impacts and may not be representative of the full population. This assumption is consistent with the relatively narrow mandates of screeners and investigators described in Section II. Nevertheless, we develop below a series of extensions that test and relax the exclusion restriction by allowing for different possible direct effects of investigation, such as those from targeted services or unobserved contact effects.
Importantly, although our approach can be understood as leveraging quasi-random screener and investigator assignment as instruments for placement, similar to conventional instrumental variables (IV) studies of foster care effects, it does not require the conventional IV assumption of first-stage monotonicity: i.e., that decision-makers have a common ranking of cases by their appropriateness for foster care placement.14 Intuitively, the extrapolated estimates of $\mu_r$ and $\mu^{S=1}_r$ are valid as long as the average relationship between subsequent maltreatment rates and placement rates across screeners and investigators can be reliably estimated (at least for low placement rates).15 In practice, the large number of screeners and investigators with low placement rates in our setting helps make these extrapolations reliable.
15 To see how reliable extrapolation is possible when first-stage monotonicity fails, consider a simple model of investigators' placement decisions: [...], where $\nu_{ij} \mid \kappa_j, \lambda_j \sim U(0, 1)$ and $(\kappa_j, \lambda_j)$ are random investigator-specific parameters (here we implicitly condition on $S_i = 1$ and a single race $R_i$ throughout). Further assume [...]. This model can violate conventional first-stage monotonicity, since investigators can differ both in their ordering of individuals by the appropriateness of placement ($\nu_{ij}$) and their relative skill at predicting subsequent maltreatment ($\lambda_j$). Nevertheless, when $E[\lambda_j \mid \kappa_j]$ is constant (linear) in $\kappa_j$, average future maltreatment rates are linear (quadratic) in placement rates, such that simple parametric extrapolations identify mean risk $\mu$.

IV.A. Data Sources
Most of our analysis is based on data from MDHHS. We observe all child maltreatment investigations (screened-in calls) in Michigan from January 2008 to December 2019. Additionally, we observe data on screened-out calls from January 2017 to December 2019. From these data we construct two analysis samples: a "screener sample" to estimate screener UDs from 2017 to 2019 and an "investigator sample" to estimate investigator UDs from 2008 to 2019. We combine estimates from these samples to derive and decompose placement UDs.
For each hotline call, we observe child and screener identifiers, the child's age and gender, the child's race and ethnicity (discussed more below), the call's exact date and time, and the allegation and reporter types as coded by the screener (e.g., physical abuse versus neglect, and educational personnel versus law enforcement). For screened-in calls, we also observe the relationship of the child to the alleged perpetrator (as coded by the investigator), the child's ZIP code, and indicators for substantiation and foster care placement. Screened-in calls also include the identifier of the investigator who handled the case. For a subset of these cases (from January 2008 to June 2017), we further see the name of the investigator, which we use to predict the investigator's race, ethnicity, and gender.
Two variables in our analysis merit special focus. The first is child race, $R_i$. Generally, hotline screeners in Michigan do not directly ask callers for the race of the alleged victim.16 Rather, they observe racial information from a centralized database. If a child has had prior interactions with a CPS investigator, either as the alleged victim or as another involved child (e.g., siblings or family members), this database contains the race as previously coded by the CPS investigator who visited the home. Otherwise, the database contains the self-reported race from a state-wide database (called MIBridges) with detailed information on families receiving various state benefits.17 In practice, the vast majority of children reported to the hotline have prior interactions with either CPS or MIBridges such that the database contains a child's race in most cases.18 We use this database to construct $R_i$.19
The second key variable for our analysis is subsequent maltreatment potential $Y^*_i$. Our primary maltreatment measure considers whether a child was re-investigated within six months of the focal investigation. Re-investigation is a common measure of subsequent maltreatment in the child welfare literature and in quality measurement among policy makers (Antle et al., 2009; Putnam-Hornstein and Needell, 2011; Casanueva et al., 2015; Putnam-Hornstein et al., 2015; Putnam-Hornstein, Prindle, and Hammond, 2021). While it serves as an imperfect proxy for actual child maltreatment, as it only accounts for cases reported to CPS, re-investigation is a substantive interaction with authorities that entails a report to CPS, a decision by a hotline screener to screen-in the case (as detailed above, only 60% of all hotline calls are assigned for investigation), and an investigation of the family for up to 30 days. Below we show robustness of our main findings to a large number of other proxies, such as re-investigation over other time horizons, substantiation for subsequent maltreatment, and subsequent placement into foster care. A subsequent investigation is also a natural starting point for the analysis because it is not impacted by decisions of the initial investigator: while subsequent investigations within a few months may be re-assigned to the initial investigator who will again make the substantiation and foster care placement decision, neither the decision to report nor the decision to screen-in a case (the two steps required for re-investigation) involves the initial investigator.20
Importantly, our analysis is not premised on the view that differences in re-investigation rates are unaffected by discrimination in society more broadly. Differences in re-investigation risk could, for example, be driven by the over-reporting of Black children to CPS. Indeed, prior research suggests that Black children may be disproportionately likely to be reported by medical personnel conditional on case severity (Lane et al., 2002). This scenario could cause us to understate UDs, since it would inflate measured maltreatment risk for Black children. Nevertheless, our goal in conditioning on re-investigation risk is to isolate a particular form of UD from the CPS system which may be reliably targeted by policy, holding fixed other forms of discrimination that may be harder to quantify or address through reforms by CPS.
18 Race information is missing in fewer than 10% of all calls. The state believes that the majority of these instances involve Native American children, for whom the screener does not input further race details once their Native American status is established. Nevertheless, to check whether these missing data affect our results, we have confirmed we obtain virtually identical UD estimates when inverse-weighting by an estimate of the probability that the race field is non-missing as a function of observable characteristics of the call (such as the child's age, the reporter type, and the nature of the allegation).
19 One concern with race data in child welfare may be the intentional miscoding of race on the part of investigators, particularly in light of recent evidence that such miscoding is present in the criminal justice system (Luh, 2022). We believe that typical models of miscoding in our context would attenuate the racial gaps that we document; that is, if investigators tried to hide racial gaps in foster care placement, they may code a Black child as white, which would attenuate our estimates of unwarranted disparity.
20 Note that a subsequent investigation for child maltreatment in the home within six months is generally not observable for children placed into foster care. The average stay in foster care is 17 months and very few children return home within six months.

IV.B. Screener Analysis Sample and Balance Checks
The screener analysis sample is constructed from 558,434 unique hotline calls of white and Black children in Michigan between January 2017 and June 2019. We first drop observations with a missing screener identifier (N = 13,885). To minimize noise in our measures of screener and investigator tendencies, we drop calls assigned to screeners with fewer than 100 calls in the sample (N = 413) and screened-in investigations assigned to investigators with fewer than 200 cases (N = 161,494). To ensure that all calls in the sample are quasi-randomly assigned both to screeners and investigators, we drop calls involving sexual abuse as these are handled by a non-random subset of investigators (N = 15,909). We also drop screened-in cases with missing child ZIP code information since quasi-random assignment of investigators is conditional on geography (N = 30,462). Finally, we drop repeat reports within one year since repeat screened-in investigations tend to be assigned to the initial investigator (N = 129,701).
The resulting analysis sample, summarized in columns 1-3 of Table I, consists of 206,570 hotline calls involving 190,776 children and 162 screeners. 65% of children are white and around half are female; the average child is 8.5 years old. 24% of all calls included at least one physical abuse allegation, while 88% included at least one neglect allegation and 1% included an unspecified maltreatment allegation. 57% of calls come from a mandated reporter (e.g., a teacher, police officer, or doctor), while nearly 20% come from a family member.
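The sample restrictions for the screener analysis sample can be tallied with a short script. It assumes, as the reported counts suggest, that the drops are applied sequentially to non-overlapping sets of calls; the dictionary labels are shorthand introduced here.

```python
# Counts as reported in the text for the screener analysis sample.
starting_calls = 558_434
dropped = {
    "missing screener identifier": 13_885,
    "screener with fewer than 100 calls": 413,
    "investigator with fewer than 200 cases": 161_494,
    "sexual abuse allegation": 15_909,
    "missing child ZIP code (screened-in cases)": 30_462,
    "repeat report within one year": 129_701,
}

remaining = starting_calls
for reason, n_dropped in dropped.items():
    remaining -= n_dropped  # apply each restriction in the order described
    print(f"after dropping {reason}: {remaining:,} calls")
```

The final count matches the 206,570 hotline calls reported for the analysis sample.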
The foster care placement rate in the full population of calls is 2.0% (1.7% for white children and 2.6% for Black children). This incorporates a screening-in rate of 63% among Black cases and 58% among white cases. White children are more likely to experience subsequent maltreatment when left at home (either due to being screened-out, or screened-in but not placed in foster care) in the full population of calls: 13.7% of white children experience subsequent maltreatment within six months, compared to 12.6% of Black children.
Table A.I checks an implication of the as-good-as-random assignment of screeners: that observable child and case characteristics are uncorrelated with the screening tendencies of the assigned screener. Separately by race, each column reports point estimates from an OLS regression of an indicator equal to one if the call was screened-in (Columns 1 and 3) and the screener's screening-in tendency (Columns 2 and 4) on child and call characteristics and day-by-shift fixed effects. As expected due to the rotational assignment of screeners, a rich set of characteristics are not jointly predictive of the instrument (p = 0.122 for Black children and p = 0.135 for white children) despite being very predictive of the decision to screen-in (with F-statistics of 244 and 632, respectively).

IV.C. Investigator Analysis Sample and Balance Checks
The investigator analysis sample is constructed similarly, making only the necessary sample restrictions for the investigator-level analysis. We start with the dataset of screened-in calls from Gross and Baron (2022), extended to June 2019. We focus on the 374,776 unique investigations in this dataset that did not involve repeat reports within one year and that were assigned to a primary CPS investigator who handled at least 200 investigations of white and Black children. As in the screener analysis sample, we drop cases that involved sexual abuse (N = 25,907), cases with missing child ZIP code information (N = 46,965), and cases involving children not classified as white or Black (N = 48,523). Finally, we drop cases in rotations with only one investigation (N = 1,830) and cases assigned to investigators who handled either investigations of only white or only Black children (N = 6,975).
The resulting analysis sample, summarized in columns 4-6 of Table I, consists of 244,576 investigations involving 203,438 children and 814 investigators. 69% of children are white and 48% are female; the average child is around seven years old. Nearly 30% of investigations include a physical abuse allegation with roughly 85% including a neglect allegation. The rate of cases including a physical abuse allegation is higher for Black children, while the rate of cases including a neglect allegation is higher for white children. Around 92% of alleged perpetrators include a parent or stepparent.
Overall, 3.4% of investigated (screened-in) children in this sample are placed in foster care (3% of white children and 4.2% of Black children). Around 16% of investigated but non-placed children are re-investigated for child maltreatment within six months. As in the population of all hotline calls, white children are more likely to be re-investigated within six months in the set of screened-in calls. The gap in subsequent maltreatment rates is larger in the investigator sample, however: 16.9% of white children compared to 14.5% of Black children.
Table A.II checks an implication of the as-good-as-random assignment of investigators: that observable child and case characteristics are uncorrelated with the foster care placement tendencies of the assigned investigator. As is the case for screeners, a rich set of characteristics are not jointly predictive of the placement instrument (p = 0.175 for Black children and p = 0.166 for white children) despite being very predictive of the decision to place in foster care (with F-statistics of 169 and 315, respectively). Moreover, in Table A.III, we use the more limited sample period where we observe investigators' demographic information to show that investigators of a given race/gender are not differentially likely to be assigned to same-race/same-gender cases.

IV.D. Descriptive Disparity Analysis
As an initial analysis of screening and placement racial disparities, we estimate descriptive regressions of screening and placement decisions (conditional on the call being screened-in) on an indicator for a child being Black, controlling for a variety of child and call characteristics. Table A.IV presents these estimates. Columns 1 and 3 present estimates from simple bivariate regressions without any controls. The 5.3 percentage point screening disparity and the 1.2 percentage point placement disparity correspond to the gaps previously discussed in Table I.
In columns 2 and 4 of Table A.IV, we additionally control for rotation fixed effects (day-by-shift in the screening decision and ZIP code-by-year in the placement decision) and the observable characteristics listed in Table I. The screening disparity shrinks to a 3.7 percentage point gap (6% of the mean screening rate), while the placement disparity shrinks to a 1 percentage point gap (29% of the mean placement rate).
A significant disparity in screening and placement rates thus remains among observably similar children and cases. But the implications of these controlled disparities for UDs are at this point unclear: we cannot adjust these descriptive regressions for subsequent maltreatment potential. Consequently, the overall disparities in Columns 1 and 3 of Table A.IV may suffer from OVB and either over- or under-state the true level of UD across child welfare screeners and investigators in Michigan. The controlled disparities in Columns 2 and 4 may furthermore suffer from included variables bias (IVB) if the controls include mediators of UD. We next present UD estimates that overcome both challenges, following the Section III approach.

V. Main Results
We present our primary findings in four stages. First, we present UD estimates for screeners (Section V.A.) and investigators (Section V.B.). Second, we present and decompose estimates of placement UD (Section V.C.). Third, we show robustness to a series of relaxations of the exclusion restriction (Section V.D.). Fourth, we document heterogeneity by subsequent maltreatment potential and consider its welfare implications (Section V.E.).

V.A. Unwarranted Disparity in Screener Decisions
As shown in Section III, the identification challenge in estimating UDs in screeners' decisions reduces to the challenge of identifying two key parameters: white and Black mean risk in the full population of calls. We begin by presenting estimates of these two key parameters. We then use these parameters to estimate average screener UD.
Using the screener sample, we extrapolate variation in the race-specific placement and subsequent maltreatment rates across the quasi-randomly assigned screeners. Panel A of Figure II plots this variation, with a binned scatterplot of strata-adjusted estimates of screeners' placement and subsequent maltreatment rates among calls that did not result in foster care placement. The figure reveals that subsequent maltreatment risk is higher for white cases and that a large number of screeners have a placement rate close to zero, suggesting plausible grounds for extrapolation. The intercepts of each pair of specifications yield estimates of the key mean risk parameters, $\mu_b$ and $\mu_w$.21 Given the low treatment rates in this setting, linear, quadratic, and local linear extrapolations all yield very similar estimates of the intercept. We generally use linear extrapolation throughout for simplicity.
Column 1, Panel A of Table II reports estimates of race-specific subsequent maltreatment risk, $\mu_b$ and $\mu_w$, from our linear specification. The average maltreatment rate in the six months following an investigation is 0.129 (SE=0.001) in the population of Black children and 0.141 (SE=0.001) in the population of white children. That is, in the full population of hotline calls, we estimate that white children are 1.2 percentage points more likely to experience subsequent maltreatment in the home relative to Black children.
We combine these race-specific mean risk estimates with estimates of screeners' screen-in and conditional subsequent maltreatment rates to estimate UDs in each screener's decisions. We then average across screeners, weighting by their individual caseloads, to obtain an estimate of the overall average screener UD ($\Delta^S$). We find a weighted average UD of 0.050 (SE=0.001), suggesting that, on average, calls involving Black children are screened-in at a rate 5 percentage points higher than calls involving white children with identical potential for future maltreatment (Column 1, Panel B). This represents a disparity of roughly 8% of the overall screen-in rate of 60%. The estimated UD is about 35% larger than the controlled observational disparity of 3.7 percentage points (Table A.IV), which underscores the importance of accounting for the selectively observed maltreatment potential.
To examine the sensitivity of our estimates of screener UD to different values of race-specific maltreatment risk, we also construct the non-parametric bounds described in Section III. We first compute a range of possible mean maltreatment risk parameters, given by the overall average rates of placement and future maltreatment in the full population of calls. The lower bound in this range is obtained by assuming that no children placed in foster care would have experienced subsequent maltreatment, while the upper bound is obtained by assuming that all children placed in foster care would have experienced maltreatment in their homes. Applying this logic to the values in Table I, we estimate Black mean risk bounds of $\mu_b \in [0.125, 0.151]$ and white mean risk bounds of $\mu_w \in [0.137, 0.154]$. We then estimate the range of overall average screener UDs given all combinations of $(\mu_b, \mu_w)$ in these bounds. Panel A of Figure III shows robust evidence of average screener UD. The range of possible UDs, as measured in the case-weighted average $\Delta^S$, is tightly estimated to be from around 4.95 percentage points to around 5.1 percentage points. The black lines in the figure show that our estimates of average screener UD from linear, quadratic, and local linear extrapolations fall around the middle of the non-parametric bounds.22
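A simple consistency check on these mean risk bounds is that their widths should equal the race-specific placement rates in the full population of calls (2.6% for Black children and 1.7% for white children, as reported for the screener sample). The snippet below uses only figures reported in the text; the variable names are illustrative.

```python
# Mean risk bounds and placement rates as reported in the text.
bounds = {"Black": (0.125, 0.151), "white": (0.137, 0.154)}
placement_rate = {"Black": 0.026, "white": 0.017}

# Width of each bound, rounded to the three decimals reported in the paper.
widths = {race: round(upper - lower, 3) for race, (lower, upper) in bounds.items()}
for race, width in widths.items():
    print(race, width, placement_rate[race])
```

Both widths line up with the corresponding placement rates, as the bounding argument implies.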

V.B. Unwarranted Disparity in Investigator Decisions
We follow a similar approach to estimating unwarranted disparity in investigators' decisions: we first estimate white and Black mean risk in the population of screened-in calls using the investigator sample. We then use these parameters to estimate average investigator UD.
In Panel B of Figure II, we extrapolate variation in the race-specific placement and subsequent maltreatment rates across the quasi-randomly assigned investigators. Column 2, Panel A of Table II reports the resulting estimates of $\mu^{S=1}_b$ and $\mu^{S=1}_w$. The average future maltreatment rate is 0.155 (SE=0.003) in the population of screened-in Black children and 0.175 (SE=0.003) in the population of screened-in white children. As in the population of all hotline calls, screened-in white children are more likely to experience subsequent maltreatment relative to screened-in Black children. However, the difference in maltreatment risk is larger in the screened-in sample (2 percentage points versus 1.2 percentage points among all calls).
Column 2 of Panel B reports the corresponding estimates of average overall investigator UD ($\Delta^D$). On average, investigators place Black children at a 1.7 percentage point higher rate than white children with identical potential for future maltreatment (SE=0.002). This represents a disparity of roughly 50% relative to the placement rate among screened-in calls of 3.4%, and is nearly 70% larger than the controlled observational disparity in Table A.IV.
We also construct non-parametric bounds of average investigator UD, following the same logic we used to bound average screener UD but now focusing only on screened-in cases. We estimate Black mean risk bounds among screened-in calls of $\mu^{S=1}_b \in$ [...]. The range of possible average investigator UDs is estimated to be from around 1.2 percentage points to around 2.1 percentage points.23

V.C. Placement Unwarranted Disparity and Decomposition
Foster care placement ultimately arises via a combination of screeners' decisions to screen-in the call and investigators' decisions to place children into foster care. We next link the UDs in screener and investigator decisions by a decomposition of overall placement UD ($\Delta^P$). Panel C of Table II shows estimates of overall placement UD from Equations (8) and (9).
Calls involving Black children are 1.1 percentage points more likely to end up in foster care than calls involving white children with identical maltreatment potential (SE=0.001). This represents a disparity of roughly 55% relative to the placement rate among all calls of 2%.
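The relative magnitudes quoted for the three UD estimates follow from dividing each estimate by the relevant base rate. The short calculation below reproduces them from the figures reported in the text; the function name and labels are illustrative.

```python
def relative_disparity(ud_pp, base_rate_pp):
    """Unwarranted disparity as a share of the relevant base rate (both in percentage points)."""
    return ud_pp / base_rate_pp


# (UD estimate, base rate), in percentage points, as reported in the text.
reported = {
    "screener UD vs. overall screen-in rate": (5.0, 60.0),
    "investigator UD vs. placement rate among screened-in calls": (1.7, 3.4),
    "placement UD vs. placement rate among all calls": (1.1, 2.0),
}
shares = {label: relative_disparity(*values) for label, values in reported.items()}
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

The three shares are roughly 8%, 50%, and 55%, matching the relative disparities stated for screeners, investigators, and overall placement.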
Our two decompositions in Panel C show that screener decisions account for between 13% and 19% of overall placement UD, with investigators driving the remainder. The fact that call screeners drive a significant share of eventual UD in foster care placement is somewhat surprising, since only a small share of screened-in investigations result in placement. This finding highlights the importance of a systems-based analysis of discrimination in high-stakes settings like CPS: our estimates show that eliminating unwarranted disparity in foster care placement rates may require intervention at both phases of CPS involvement. Policies that are able to reduce unwarranted disparity in only the investigation or screening phase leave behind significant discrimination. At the same time, the fact that investigators are the main contributors to overall placement UD underscores the importance of further unpacking UD in investigators' decisions, which we do in Section VI.

V.D. Relaxing the Exclusion Restriction
The primary threat to the validity of our UD estimates, given quasi-random-assignment, is the possibility of direct effects of screeners and investigators on subsequent at-home maltreatment potential.Here we discuss several tests and extensions of our main estimation procedure to check sensitivity to such exclusion restriction violations.Taken together, these exercises suggest our baseline UD estimates are robust to a wide range of possible violations.
We first examine the exclusion restriction for the screener analysis.Given their narrow mandate and responsibilities, screeners have little scope for affecting future maltreatment potential except through the screening-in decision.However, in principle, this decision could directly affect maltreatment potential in a way that introduces bias in our analysis.Specifically, screening-in a call could see the underlying Y * i changed through the effect of investigation on home conditions or behavior.24Such screening-in effects could bias our estimates of µ r , which come from pooling together screened-out calls with calls that are screened-in but do not result in foster care placement.
We develop a procedure in Online Appendix B to test for such direct screening-in effects.This procedure uses the fact that we can estimate the marginal effect of being screened-in on measured at-home maltreatment, (1 i , via a standard IV approach leveraging quasi-random screener assignment.We can then bound the marginal effect of being screened-in on Y * i using the fact that it is binary and the fact that we can measure the analogous marginal effect on P i .Applying this procedure, we estimate tight bounds on the direct screening-in effect of [−0.006, 0.014] (albeit with a standard error of 0.030 on each bound).Hence, we do not find strong evidence to suggest the act of screening-in meaningfully affects at-home maltreatment potential-supporting the exclusion restriction for our screener analysis.
We next turn to a set of specification checks that examine the plausibility of the exclusion restriction for the investigation phase. Here, the scope for unmeasured direct effects is conceivably larger since investigators can interact with the child's parents over an extended period and may refer the family to targeted services as an alternative to foster care. If such services have meaningful effects on at-home maltreatment potential, or if the extended exposure to investigators causes parents to change their behavior in a manner relevant to at-home maltreatment potential, our estimates of the key screened-in mean risk parameters $\mu_r^{S=1}$, which only consider the placement decision, could be biased.$^{25}$
We develop a series of extensions in Online Appendix B that relax the investigator exclusion restriction by allowing for direct effects of targeted services or investigator contact more broadly. We first show that without any exclusion restriction we can still bound a particular measure of unwarranted disparity: one that conditions on a measure of subsequent maltreatment potential that may be impacted by investigators' differential provision of services or unobserved contact effects. These "exclusion-free" bounds come from a version of the Figure III bounds that only uses aggregate (i.e., state-level) rates of placement and subsequent at-home maltreatment, and thus make no assumptions on investigator assignment or excludability. As shown in Table A.VII, we obtain from this exercise bounds on overall average investigator UD of [0.009, 0.015] with standard errors of around 0.001 on each bound. These are similar to (if somewhat smaller than) our baseline estimates of average investigator UD.$^{26}$
The exclusion-free bounds can be seen to capture a policy-relevant measure of unwarranted disparity: the differential rate at which investigators place Black children into foster care vs. white children of equal maltreatment risk, conditional on any other interventions that might affect this risk. To gauge the importance of such conditioning, we extend this approach in two ways. First, we adjust the exclusion-free bounds to remove any effects of targeted services on maltreatment potential using the observed service assignment rate (of 6.8%) and a priori bounds on the service takeup rate. Intuitively, when assignment and takeup rates are low, such adjustment would have minimal effects on the exclusion-free bounds even for arbitrarily large effects of targeted services on maltreatment potential. Panel A of Table A.VII shows we obtain average investigator UD bounds of [0.008, 0.017] with standard errors of around 0.001 on each bound when assuming a takeup rate of 10%, which is on the high end of our estimates of targeted service takeup in this setting (as discussed in Section II.). For these bounds to include zero, takeup would have to be much higher than this conservative estimate (around 60%), again allowing for arbitrarily large service effects.$^{27}$
Second, we adjust the exclusion-free bounds to remove any unobserved direct effects of investigator contact on subsequent maltreatment potential. Intuitively, such effects would bias our estimates of $\mu_r^{S=1}$ up or down relative to the true race-specific maltreatment risk one would observe in the absence of investigator contact. We thus test how the exclusion-free bounds change as we allow for different ranges of bias in our estimates. Panel B of Table A.VII shows we obtain average investigator UD bounds of [0.007, 0.018] with standard errors of around 0.001 when we allow bias in the range of ±1 percentage point, which is approximately the estimated upper bound of screening-in effects discussed above. For these bounds to include zero, the bias would have to be much higher, in the range of ±5 percentage points. Notably, the endpoints of these bounds arise in these simulations when Black and white children experience opposite-signed effects of investigator contact: the lower bound crosses zero when investigator contact decreases Black maltreatment potential by around 5 percentage points while simultaneously increasing white maltreatment potential by around 5 percentage points.
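To make the logic of bounds built from aggregate rates concrete, the following is a minimal sketch under stated assumptions, not the procedure in Online Appendix B. It assumes worst-case reasoning: because $Y_i^*$ is binary and unobserved only for placed children, mean risk lies between the observed at-home maltreatment rate and that rate plus the placement rate. The input rates are illustrative values chosen to be consistent with the screened-in mean risk bounds reported in the figure notes, the Black share of cases (`black_share`) is hypothetical, and all function names are ours. The sketch is not intended to reproduce the Table A.VII estimates.

```python
def mean_risk_bounds(at_home_rate, placement_rate):
    """Worst-case bounds on mean risk E[Y*] when Y* is unobserved for placed children."""
    return at_home_rate, at_home_rate + placement_rate


def conditional_placement_rates(mu, at_home_rate, placement_rate):
    """Placement rates among children with (Y*=1) and without (Y*=0) maltreatment potential.

    Uses at_home_rate = mu * (1 - d1) and placement_rate = mu * d1 + (1 - mu) * d0.
    """
    d1 = 1.0 - at_home_rate / mu
    d0 = (placement_rate - mu * d1) / (1.0 - mu)
    return d1, d0


def ud_range(black, white, black_share, steps=200):
    """Grid-search the range of average UD over all mean-risk pairs inside the bounds."""
    lo_b, hi_b = mean_risk_bounds(*black)
    lo_w, hi_w = mean_risk_bounds(*white)
    values = []
    for i in range(steps + 1):
        mu_b = lo_b + (hi_b - lo_b) * i / steps
        d1_b, d0_b = conditional_placement_rates(mu_b, *black)
        for k in range(steps + 1):
            mu_w = lo_w + (hi_w - lo_w) * k / steps
            d1_w, d0_w = conditional_placement_rates(mu_w, *white)
            # Assumed weighting: pooled share of children with Y* = 1.
            mu_bar = black_share * mu_b + (1 - black_share) * mu_w
            values.append(mu_bar * (d1_b - d1_w) + (1 - mu_bar) * (d0_b - d0_w))
    return min(values), max(values)


# Illustrative (at-home maltreatment rate, placement rate) pairs; hypothetical inputs.
BLACK = (0.139, 0.042)
WHITE = (0.164, 0.030)

lower, upper = ud_range(BLACK, WHITE, black_share=0.3)
print(f"average UD range: [{lower:.4f}, {upper:.4f}]")
```

The design choice here is to search over every admissible pair of race-specific mean risks rather than only the endpoints, since the implied UD need not be monotone in either mean risk.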

V.E. Heterogeneity by Maltreatment Potential
Recall that our baseline placement UD measure ($\Delta^P$) is a weighted average of two conditional disparities: the disparity among children with potential for subsequent maltreatment ($\Delta^P_1$), and the disparity among children without potential for subsequent maltreatment ($\Delta^P_0$). The same is true for our measures of UDs in screener ($\Delta^S$) and investigator decisions ($\Delta^D$). While the $\Delta$ averages are of significant policy interest, capturing the expected level of UD when the decision-maker encounters a representative pool of children with unknown future maltreatment potential, here we document striking heterogeneity in some of their subcomponents.
To set the stage for these results, consider the interpretation of the two subcomponents of $\Delta^P$. A finding of $\Delta^P_0 > 0$ would suggest that Black children without future maltreatment potential are "over-placed" in foster care relative to white children without future maltreatment potential. This disparity would be unambiguously harmful to Black children, since family separation is costly and they would have been safe remaining in their homes. In contrast, a finding of $\Delta^P_1 > 0$ may suggest that the relatively higher placement rate of Black children is actually protective, since they experience subsequent maltreatment in their homes and previous research in our context shows that the causal effects of foster care are beneficial for both Black and white children, especially in cases with maltreatment potential in the home (Baron and Gross, 2022; Gross and Baron, 2022). Thus, a finding of $\Delta^P_1 > 0$ could be interpreted as "under-placement" of white children relative to equally-risky Black children.
Table III shows significant heterogeneity in the UDs by the unobserved level of maltreatment potential. The average estimate of overall placement UD (1.1 percentage points) is concentrated in the population of children with maltreatment potential, with an estimate of $\Delta^P_1$ of 4 percentage points. This means 7.9% of Black children with maltreatment potential are placed in foster care, twice the placement rate experienced by white children at 3.9% (Table A.VIII). In contrast, there is a much smaller disparity among children without maltreatment potential, with an estimate of $\Delta^P_0$ of around 0.5 percentage points.
The systems-based analysis reveals a further interesting pattern: ultimate placement UDs are largely driven by disparities in the population of calls with subsequent maltreatment potential, despite significant UDs in screener decisions in both subpopulations (of $Y_i^* = 1$ and $Y_i^* = 0$) of 3.8 and 5.2 percentage points, respectively. We thus find that the heterogeneity in placement UDs is due to the actions of investigators, who make the ultimate placement recommendation. Specifically, we find that investigators amplify initial screener UDs in the subpopulation with $Y_i^* = 1$, despite observing the same or more information and deliberating over a much longer time frame. We estimate an average investigator UD of nearly 6 percentage points in this subpopulation. At the same time, investigators mitigate initial screener UDs in the subpopulation with $Y_i^* = 0$. We estimate a statistically insignificant investigator UD of 0.8 percentage points in this subpopulation. Table A.VIII shows these results follow from very different investigator placement rates across these subpopulations. Investigators place 13.4% and 7.6% of Black and white screened-in children with maltreatment potential, respectively, but only 2.9% vs. 2.1% of Black and white screened-in children without maltreatment potential, respectively. This is likely because of investigators' skill at inferring underlying risk.
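The arithmetic linking the conditional and average investigator UDs can be checked with a short sketch. It uses the placement rates quoted above from Table A.VIII and the baseline average investigator UD of 1.7 percentage points cited in Section VI.B. The assumption that the weight on the high-risk component equals the share of screened-in children with maltreatment potential is ours, made for illustration; the variable and function names are hypothetical.

```python
# Investigator placement rates (percent) among screened-in children, by race and Y*,
# as quoted in the text from Table A.VIII.
placement = {
    ("Black", 1): 13.4, ("white", 1): 7.6,
    ("Black", 0): 2.9, ("white", 0): 2.1,
}


def conditional_ud(rates, y_star):
    """Black-white gap in placement rates among children with the given Y*."""
    return rates[("Black", y_star)] - rates[("white", y_star)]


def average_ud(weight_high_risk, ud_high, ud_low):
    """Weighted average of the two conditional disparities."""
    return weight_high_risk * ud_high + (1 - weight_high_risk) * ud_low


ud_1 = conditional_ud(placement, 1)  # disparity among children with Y* = 1
ud_0 = conditional_ud(placement, 0)  # disparity among children with Y* = 0

# Weight on the high-risk group implied by an average investigator UD of 1.7 points.
implied_weight = (1.7 - ud_0) / (ud_1 - ud_0)

print(f"UD | Y*=1: {ud_1:.1f} pp, UD | Y*=0: {ud_0:.1f} pp")
print(f"implied weight on Y*=1: {implied_weight:.2f}")
```

Under this assumed weighting, the implied weight falls inside the screened-in mean risk bounds reported in the figure notes, which is a useful consistency check on the reported numbers.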

Welfare Implications
These results provide a better understanding of racial discrimination in child protection, which may help guide the appropriate policy response. As mentioned above, a question that our findings may speak to is whether Black children are "over-placed" relative to white children or whether white children are "under-placed." Properly evaluating this question requires an understanding of the benefits and costs of placement for marginal cases.
Two pieces of evidence suggest that in our context foster care placement improves the well-being of screened-in children, reducing subsequent maltreatment and improving longer-term outcomes. First, the quasi-experimental variation in Panel B of Figure II shows that the likelihood of subsequent maltreatment is decreasing in investigator placement rates. Maltreatment is known to carry substantial costs for children's short- and long-term outcomes (Currie and Spatz Widom, 2010; Currie and Tekin, 2012; Soares et al., 2021; Doyle and Aizer, 2018), and failing to place high-risk cases means these costs are more likely to be realized.
Second, among marginal cases where investigator assignment matters for foster care placement decisions, prior research in our setting suggests foster care improves longer-term, welfare-relevant outcomes for screened-in children. Gross and Baron (2022) and Baron and Gross (2022) show that foster care placement causes significant declines in later-in-life criminal justice contact as well as improvements in schooling outcomes such as attendance, test scores, high school graduation, and postsecondary enrollment. These benefits are similar for white and Black children; if anything, the estimates are larger for white children in percent terms. Moreover, comparisons across investigator types in Baron and Gross (2022) suggest that the effects are most positive for children assigned to low-placement-rate investigators. This demonstrates the intuitive notion that marginal cases at particularly high risk are more likely to benefit from placement. Our current results show that racial disparities in placement are found precisely in these high-risk cases ($Y^* = 1$).$^{28}$
Thus, our findings suggest that high-risk cases may benefit from placement both in terms of short-run maltreatment risk and longer-run outcomes, such as improvements in educational outcomes and reductions in criminal justice involvement. A full welfare consideration would include a wider set of benefits and costs, as well as the unobserved willingness to pay by parents to avoid child placement in foster care. Nevertheless, in contrast to the motivation for numerous reforms recently proposed in child welfare policy (which are based on the premise that Black children tend to be over-placed relative to white children), our findings seem consistent with white children at high risk of continued maltreatment being relatively under-placed compared to Black children. Consider, for example, a policy that lowers the placement rate of Black children to equalize white and Black placement rates among screened-in children with identical maltreatment potential. A back-of-the-envelope calculation based on our estimates, combined with observed racial disparities in adult criminal justice contact in Michigan and estimates of the causal effects of foster care on adult crime from Baron and Gross (2022), suggests such a policy would increase the Black-white adult conviction gap by 10% (see Online Appendix C for details).

VI. Unpacking Investigator UDs
The rest of this paper further unpacks investigator UDs, given their primary role in shaping overall foster care placement UD. We first study heterogeneity in investigator UDs by investigator characteristics (Section VI.A.) and examine possible drivers (Section VI.B.). We then summarize a variety of additional robustness checks (Section VI.C.) and an extension which probes external validity with national data (Section VI.D.).

VI.A. Heterogeneity by Investigator Characteristics
Table IV reports estimates from OLS regressions of the estimated $\Delta^D_{j1}$ on several investigator characteristics. We focus on investigator UDs among children with subsequent maltreatment potential, where investigator UD is concentrated. Results are similar for the overall $\Delta^D_j$'s.$^{30}$ Column 1 of Table IV documents significant racial concordance effects in investigator decisions: white investigators tend to place white children in high-risk situations at lower rates than Black children in high-risk situations, and vice-versa for Black investigators. Specifically, white investigators are estimated to have an average $\Delta^D_{j1}$ of 8.4 percentage points (the control mean in this specification) while Black investigators are estimated to have an average $\Delta^D_{j1}$ of -5.7 percentage points (the sum of the control mean and the Black investigator coefficient estimate). These estimates suggest that investigators of a given race may give the "benefit of the doubt" to families of their same race in high-risk situations. Because the vast majority of investigators in Michigan are white, however, on average Black children have higher placement rates in cases with maltreatment potential.
The table also shows that other investigator traits are associated with smaller disparities. Female investigators, investigators working in urban counties, and investigators with a caseload featuring an above-median share of Black children have lower levels of UD in this subpopulation.

… 2008)), the effects are likely to be most positive precisely in cases with subsequent maltreatment potential.
$^{29}$ Such a policy has spurred various recently proposed reforms, exemplified by initiatives such as the Minnesota African American Family Preservation Act, Nassau and Kent County's Blind Removals programs, and policy recommendations outlined in the New York State Bar Association's "Resolution addressing systemic racism in the child welfare system of the State of New York."
$^{30}$ As discussed in Section IV., a subset of MDHHS data (from 2008 to 2017) contains investigators' names, which we use to predict demographic characteristics. We focus on the 699 investigators for whom we have such information in this section. Table A.IX shows that our estimates of race-specific maltreatment risk and investigator UD are nearly identical in this sample.

VI.B. Potential Drivers
As noted in Section III., UDs can arise from direct discrimination on the basis of race (either via taste-based discrimination or statistical discrimination), as well as indirect discrimination through non-race characteristics. We first examine the scope for indirect discrimination by testing whether adjusting for child-specific traits (such as age, gender, and family size) as well as investigation-specific traits (such as the nature of the allegations and relationship to the alleged perpetrator) changes the UD estimates. To do this, we residualize the estimated investigator- and race-specific placement and subsequent maltreatment rates by the child and investigation characteristics in Column 4 of Table I. We then recompute mean maltreatment risk and investigator UDs with these adjusted rates.
Table A.X suggests limited scope for indirect discrimination on non-race characteristics. Adjusting for child- and investigation-specific traits leads to very similar estimates of average investigator UD relative to our baseline estimates in Table II (1.6 and 1.7 percentage points, respectively). This finding suggests UD is similar across these non-race characteristics; indeed, we find UD present among every subgroup of children. Table A.XI shows analyses conducted separately for specific subgroups of children. While estimates of UD are larger for female children relative to male children, younger children relative to older children, and investigations that do not involve physical abuse relative to those that do, all estimates are at or above 1 percentage point and statistically significant. In other words, we find meaningful UD across all observed subgroups.
We next study the potential drivers of direct discrimination. Without imposing additional structure on the quasi-experimental variation, it is difficult to disentangle racial bias and statistical discrimination (Hull, 2021). In Online Appendix D, we estimate a structural model of investigator decision-making akin to that in Arnold, Dobbie, and Hull (2022), which parameterizes the quasi-experimental variation via a series of marginal treatment effect frontiers. Estimates from this model suggest that racial bias, either from racial preferences or inaccurate beliefs, is the primary driver of UD in foster care placement decisions. We find little evidence for statistical discrimination: estimates suggest that investigators act on similarly-precise signals of maltreatment potential by race. Furthermore, estimates of mean maltreatment risk by race suggest investigators should, if accurately statistically discriminating, place white children at higher rates than Black children with identical maltreatment potential, as opposed to the lower placement rates that we observe.

VI.C. Additional Robustness Checks
Strata-adjustment and weighting in the extrapolations. Throughout the paper, we account for the fact that investigators are only quasi-randomly assigned conditional on geography and time using a linear adjustment that accounts for strata fixed effects. Column 2 of Table A.XIII shows that our results are nearly identical if we do not adjust for strata fixed effects. Moreover, in our baseline extrapolations, we inversely weight by the variance of estimation error in each investigator's re-investigation rate. Column 3 shows that our results are robust to not weighting the specification.
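As an illustration of inverse-variance weighting in a linear extrapolation, the sketch below fits a weighted least squares line relating investigator-level re-investigation rates to placement rates and reads off the intercept at a placement rate of zero. This is a textbook weighted regression written for exposition, not the authors' code; the function name and the data are hypothetical, and strata adjustment is omitted.

```python
def wls_line(x, y, variances):
    """Weighted least squares fit of y on x with weights 1 / variance.

    Returns (intercept, slope). The intercept is the fitted value at x = 0.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical investigator-level data: placement rates, re-investigation rates,
# and the sampling variance of each re-investigation rate.
placement_rates = [0.02, 0.04, 0.06, 0.08]
reinvestigation_rates = [0.15, 0.14, 0.13, 0.12]
sampling_variances = [0.0004, 0.0001, 0.0002, 0.0008]

intercept, slope = wls_line(placement_rates, reinvestigation_rates, sampling_variances)
print(f"extrapolated rate at zero placement: {intercept:.3f}, slope: {slope:.2f}")
```

Noisier investigator-level rates receive less weight, so the extrapolated intercept leans on investigators whose rates are estimated from more cases.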
Empirical Bayes shrinkage. In principle, estimation error may attenuate the estimated relationship between placement rates and rates of subsequent at-home maltreatment across investigators. In practice, such attenuation is likely to be minimal since we restrict attention to investigators who see a large number of cases. To verify this, we conduct a standard empirical Bayes "shrinkage" correction to the placement rate estimates (separately by race). Column 4 shows that this exercise yields very similar estimates to our main results, consistent with there being minimal bias from the first-step estimation error.
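For readers unfamiliar with this correction, the following is a minimal sketch of one common linear empirical Bayes shrinkage rule, in which each noisy estimate is pulled toward the grand mean in proportion to its estimated reliability. It is an assumption that this simple variant resembles the correction used here; the function name and inputs are hypothetical.

```python
def eb_shrink(estimates, std_errors):
    """Shrink noisy estimates toward their mean using estimated reliabilities.

    Signal variance is estimated as the sample variance of the estimates minus
    the average squared standard error, floored at zero.
    """
    n = len(estimates)
    grand_mean = sum(estimates) / n
    total_var = sum((e - grand_mean) ** 2 for e in estimates) / (n - 1)
    noise_var = sum(se ** 2 for se in std_errors) / n
    signal_var = max(total_var - noise_var, 0.0)
    shrunk = []
    for e, se in zip(estimates, std_errors):
        denom = signal_var + se ** 2
        reliability = signal_var / denom if denom > 0 else 0.0
        shrunk.append(grand_mean + reliability * (e - grand_mean))
    return shrunk


# Hypothetical investigator placement-rate estimates and their standard errors.
raw_rates = [0.02, 0.05, 0.08]
rate_ses = [0.01, 0.01, 0.01]
shrunk_rates = eb_shrink(raw_rates, rate_ses)
print([round(r, 4) for r in shrunk_rates])
```

When standard errors are small relative to the spread of the estimates, as with investigators who handle many cases, the reliabilities are close to one and shrinkage changes little.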
Omitted payoff bias. Our main outcome throughout the study is whether the child was re-investigated for alleged child maltreatment in the home within six months of the focal investigation.$^{31}$ Table A.XIV considers robustness to alternative time frames for re-investigation, finding that the specific period considered has little impact. In Columns 2 through 5 of the table we instead use re-investigation within two, three, four or five months.$^{32}$ Reassuringly, we find very similar estimates of investigator UD across all horizons.
We next examine robustness of our proxy for subsequent maltreatment potential. Table A.XV shows that we obtain very similar results when we instead consider a range of other outcomes, including the potential for a subsequent substantiated investigation and the potential for subsequent foster care placement. While we prefer subsequent re-investigation as $Y_i^*$ because other measures may be endogenously determined through re-assignment to the initial investigator, it is reassuring that we find similar levels of UD by conditioning on more severe (though potentially endogenous) maltreatment proxies.
A related concern is the potential disconnect between actual maltreatment versus reported maltreatment. For example, subsequent investigations could be partially driven by racial biases in the reporting of child maltreatment. Prior research suggests that Black children may be disproportionately likely to be reported conditional on case severity (Lane et al., 2002). Such bias would tend to understate investigator UD in our context by inflating the risk of a subsequent investigation for screened-in Black children left at home. Nevertheless, we explore this concern using the screener sample, which contains the category of the initial maltreatment reporter. When we estimate UDs separately for children referred by mandated reporters (social workers, educational, medical, and law enforcement personnel) and non-mandated reporters (e.g., neighbors or other family members), we find similar estimates of UD across these two reporter types (Table A.XVI). These results suggest that our main estimates are not driven by biases from a particular reporter type.
Maltreatment in foster care. Another possible concern with the interpretation of our baseline UD measures, which condition only on a child's potential for subsequent maltreatment in the home, is that children may also face maltreatment in foster care. If maltreatment in foster care is common, and if investigators select children for foster care placement "on gains" of maltreatment potential inside vs. outside of foster care, then a more appropriate UD measure might condition on both potential outcomes. Specifically, questions of possible "over-" and "under-placement" of Black and white children might be better answered by estimating disparities in foster care placement among Black and white children with the same potential reduction in subsequent maltreatment when placed.
Two key empirical facts suggest that such an analysis is unlikely to affect our main conclusions. First, subsequent maltreatment is rare among children placed into foster care: in our data, only around 3% of children in foster care see a subsequent investigation while in foster care, with fewer than 0.8% seeing a substantiated investigation.$^{33}$ Second, investigators' removal decisions are uncorrelated with the potential for maltreatment in foster care: Figure A.I shows that the slope of the regression relating maltreatment rates in foster care to investigator placement rates is insignificant for both white and Black children. This finding is unsurprising, given the CPS mandate for investigators to focus on maltreatment potential in the home and the fact that investigators have minimal involvement in the decision over where to place a child when removed from home.$^{34}$ Taken together, these facts suggest investigators select "on levels" of maltreatment potential in the home (their stated mandate) and not "on gains" from reduced maltreatment potential in care (which they have limited information and control over). In this case, our baseline estimates also capture disparities among Black and white children with the same potential reduction in subsequent maltreatment when placed into foster care.

… the focal investigation, and we do not observe a disposition date in the data.
$^{33}$ These numbers line up with federal data on substantiated maltreatment in foster care (USDHHS, 2021).

VI.D. External Validity
Because CPS is administered at the state or local level and the decision-making process of CPS investigators may vary across states, a natural question is whether our key findings are generalizable outside of Michigan. To examine this question, we use a nationwide dataset from the National Child Abuse and Neglect Data System (NCANDS, 2023) that has fewer variables, but covers most of the U.S. from 2008 to 2019 (matching our main sample period). This dataset contains information on child maltreatment investigations in most states in the U.S. The data include key variables such as the child's race, subsequent CPS investigations, and whether the investigation resulted in foster care placement. While the data do not contain unique investigator identifiers (which prevents us from using extrapolation methods), we apply the aggregate non-parametric bounds discussed in Section V.D. to estimate state-specific investigator UDs. Online Appendix E details the data and approach, and shows that we can replicate our main findings for Michigan using this more limited dataset.
Non-parametric bounds show that our primary findings are generalizable to most U.S. states. Overall, we estimate nationwide average investigator UD bounds of [0.003, 0.017]. That is, the average state in the NCANDS data places screened-in Black children in foster care at higher rates than screened-in white children facing the same future maltreatment potential. Figure IV further plots the average estimate in state-specific UD bounds, separately for cases with and without maltreatment potential. As in Michigan, investigator UD in cases without maltreatment potential is near zero in most states while UD in cases with maltreatment potential tends to be as large or larger. We estimate a nationwide average investigator UD among cases without maltreatment potential of 0.6 percentage points, while the estimated average investigator UD among cases with maltreatment potential is 3.7 percentage points.

VII. Conclusion
This paper develops an empirical framework for studying how discrimination arises and evolves across multi-phase systems. We apply this framework in the context of child protection, a high-stakes and often-debated system, leveraging the quasi-random assignment of both CPS screeners and investigators. We find substantial unwarranted disparity in both phases of the CPS system we study, with investigators inheriting and amplifying initial disparities by screeners. Overall, we find that screeners account for up to 19% of total unwarranted disparity in foster care placement, with investigators accounting for the remainder. Strikingly, in our context, this UD is concentrated in the population of children with subsequent maltreatment potential, a finding that generalizes to most other U.S. states.
Our results have important policy implications for tackling discrimination in foster care placement. The fact that UD arises from both phases of the CPS system suggests that addressing racial disparities may require systemic intervention: policies that target discrimination by focusing only on one phase of CPS may leave untouched significant unwarranted disparity. Furthermore, the finding that UD is concentrated in high-risk cases is relevant to often-discussed policies focused on raising the threshold to place Black children in foster care. Since foster care appears protective to marginally placed children in this setting, such interventions may disproportionately harm Black children by keeping them in risky home environments. Our empirical framework yields a tractable foundation for the important next step of developing appropriate policy responses.
The methods in this paper may also prove useful in other settings where large and pervasive racial disparities have been documented, such as employment, healthcare, housing, and criminal justice. Like child protection, decisions in these settings usually arise across multiple phases, and it is often challenging to disentangle discrimination from selection. Recent work in such settings has developed a variety of quasi-experimental tools to estimate causal parameters. Bringing our framework to these settings, in combination with these tools, may yield new insights on persistent inequities and help design effective policy interventions.

… estimates of $\mu_w^{S=1}$ and $\mu_b^{S=1}$ … $\mu_b^{S=1} \in [0.139, 0.181]$ and white mean risk bounds of $\mu_w^{S=1} \in [0.164, 0.194]$. We then estimate the range of overall average investigator UDs in Panel B of Figure III, given all combinations of $(\mu_b^{S=1}, \mu_w^{S=1})$ in these bounds. The range of possible UDs, as measured in the case-weighted average $\Delta^D$, is …
$^{22}$ Tighter bounds on average screener UD are obtained by using the placement and future maltreatment rates of screeners with low placement rates. Table A.V shows that restricting to screeners with an estimated (strata-adjusted) placement rate of 0.01 or lower yields mean risk bounds of $\mu_b \in [0.127, 0.137]$ and $\mu_w \in [0.138, 0.148]$. These allow us to construct a tighter bound on average screener UD of [0.050, 0.051].

Figure I: Child Protection in Michigan

Notes. Panel A shows how our estimates of average screener UD change under different estimates of Black and white mean risk in the full population of calls. The mean risk estimates obtained from the linear, quadratic, and local linear extrapolations in Panel A of Figure II are indicated by solid, dashed, and dotted lines, respectively. The ranges of Black and white mean risk reflect the bounds implied by the average placement and subsequent maltreatment rates in the screener sample: $\mu_b \in [0.125, 0.151]$ and $\mu_w \in [0.137, 0.154]$. Panel B repeats this exercise, but for average investigator UD. The ranges of Black and white mean risk in the population of screened-in calls implied by the average placement and subsequent maltreatment rates in the investigator sample are: $\mu_b^{S=1} \in [0.139, 0.181]$ and $\mu_w^{S=1} \in [0.164, 0.194]$.

Figure IV: Nationwide Estimates of Average Investigator UD, by Maltreatment Potential

Table II: Estimates of Mean Maltreatment Risk and Unwarranted Disparity
Notes. The table reports estimates of mean maltreatment risk and unwarranted disparity in both phases of the CPS system. Panel A reports estimates of race-specific average subsequent maltreatment risk, both in the full population of calls and in the population of screened-in calls. These estimates come from the linear extrapolations in Figure II. Panel B reports estimates of average overall screener and investigator UD (weighted by the number of cases assigned to each screener and investigator, respectively). Panel C reports overall placement UD in the full population of cases, as well as the share of placement UD that is due to screeners' decisions according to Equations (8) and (9). Robust standard errors for estimates of maltreatment risk in the population of all calls, as well as average screener UD, are two-way clustered at the child and screener level. Robust standard errors for estimates of maltreatment risk in the population of screened-in calls, as well as average investigator UD, are two-way clustered at the child and investigator level. Robust standard errors for estimates of overall placement UD, as well as the percent of placement UD due to screeners and investigators, reflect uncertainty in the underlying estimates of both average screener and investigator UDs. All standard errors are obtained via a bootstrapping procedure (with 500 replications) and appear in parentheses.

Table III: Estimates of UD by Subsequent Maltreatment Potential
Notes. This table presents estimates of average screener, investigator, and placement UD separately by maltreatment potential. Specifically, the first row presents estimates of $\Delta^S$, $\Delta^S_1$, and $\Delta^S_0$, respectively. We then present estimates of $\Delta^D$, $\Delta^D_1$, and $\Delta^D_0$. Finally, we combine estimates of screener and investigator UDs to estimate overall placement UD ($\Delta^P$, $\Delta^P_1$, and $\Delta^P_0$) according to Equations (8) and (9). Robust standard errors for estimates of screener UDs are two-way clustered at the child and screener level. Robust standard errors for estimates of investigator UDs are two-way clustered at the child and investigator level. Robust standard errors for estimates of placement UD reflect uncertainty in the underlying estimates of both screener and investigator UDs. All standard errors are obtained via a bootstrapping procedure (with 500 replications) and appear in parentheses.