Ben Weidmann, Luke Miratrix, Missing, presumed different: Quantifying the risk of attrition bias in education evaluations, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 184, Issue 2, April 2021, Pages 732–760, https://doi.org/10.1111/rssa.12677
Abstract
We estimate the magnitude of attrition bias for 10 randomized controlled trials (RCTs) in education. We make use of a unique feature of administrative school data in England that allows us to analyse post-test academic outcomes for nearly all students, including those who originally dropped out of the RCTs we analyse. We find that the typical magnitude of attrition bias is 0.015 effect size units (ES), with no estimate greater than 0.034 ES. This suggests that, in practice, the risk of attrition bias is limited. However, this risk should not be ignored as we find some evidence against the common ‘Missing At Random’ assumption. Attrition appears to be more problematic for treated units. We recommend that researchers incorporate uncertainty due to attrition bias, as well as performing sensitivity analyses based on the types of attrition mechanisms that are observed in practice.
1 INTRODUCTION AND BACKGROUND
Attrition has been described as ‘the Achilles Heel of the randomized experiment’ (Shadish et al., 1998 p.3). Attrition looms as a threat because it can undermine group equivalence, eroding the methodological strength at the heart of randomized evaluations. In short, attrition can cause bias.
Attrition bias is the focus of this paper. We define attrition bias as the difference between the expected average treatment effect (ATE) estimate of the final analysis sample, and the ATE of the randomization sample. Our main goal is to quantify and explore the nature of attrition bias in practice. We focus on the context of education research, a field which has seen a large increase in the number of randomized experiments over the past two decades (Connolly et al., 2018).
The threat of attrition bias plays a significant role in assessing the quality of education evaluations. The What Works Clearinghouse (WWC) and the Education Endowment Foundation (EEF)—organizations responsible for setting evidence standards in the United States and the United Kingdom—both have threshold rates of attrition (EEF, 2014; WWC, 2017). Beyond these thresholds, studies officially lose credibility. In the case of the EEF, for example, if attrition is greater than 50% then the results of the evaluation are largely disregarded.
Despite the awareness of attrition as a threat to the quality of education research, remarkably little scholarship has focussed on quantifying the magnitude of attrition bias (Dong & Lipsey, 2011). The reason is simple: estimating attrition bias requires outcome information from pupils who, by definition, are no longer participating in research.
In response to this fundamental empirical challenge, existing literature has largely focussed on simulation studies. These studies demonstrate scenarios for which attrition bias is larger than the typical effect sizes in education interventions (Dong & Lipsey, 2011; Lewis, 2013; Lortie-Forgues & Inglis, 2019; WWC, 2014). Equally, it is well-known that if attrition is unrelated to either treatment status or outcomes, then randomized experiments remain unbiased regardless of the level of attrition (Little & Rubin, 2019).
While theory and simulation studies illustrate the potential for attrition bias to cause problems, they provide practitioners with limited guidance about the risk of attrition bias in practice. Deke and Chiang (2017) take the first step towards providing such guidance. They attempt to sidestep the fundamental challenge of estimating attrition bias by analysing pre-test academic achievement as a proxy for post-test outcomes. Pre-tests are often completed by pupils who stay in the evaluation (responders) as well as those who ultimately drop out (attriters). Using pre-tests, Deke and Chiang estimate attrition bias for four experiments, in each case comparing the estimated sample average treatment effect (SATE) of the whole sample to the estimated SATE of responders.
While Deke and Chiang (2017) represents an important step forward, it has several limitations. First and foremost, attrition may be shaped by events that happen after randomization. For example, during the course of an evaluation, a school may experience a change of leadership. If the new leader decides that implementing a research intervention is a distraction, they may drop out of the study. The resulting attrition could lead to bias if the leadership change coincides with, or causes, a decline in academic attainment. This bias would not be captured in an analysis that used a pre-test as a proxy for post-test outcomes. Second, analysing attrition bias using pre-tests makes it difficult to know whether attrition bias is problematic conditional on predictive covariates as the pre-test—by far the most predictive covariate—is being used as the outcome. As Deke and Chiang note ‘after conditioning on the pre-test, the residual difference in the post-test between respondents and nonrespondents could be completely different from the observed pre-test difference’ (p. 139). Finally, the study only looked at four interventions. This makes it difficult to describe the distribution of attrition bias across studies, and to estimate the typical value of attrition bias. The relative lack of cases also makes it hard to analyse some of the factors that might moderate attrition bias.
We avoid these limitations by utilizing a unique feature of English administrative school data. Specifically, we make use of an archive of randomized controlled trials (RCTs) that can be linked to a census of pupil and school information. Ten of the RCTs in the archive met two crucial conditions: a) the original outcomes were subject to attrition; b) after the intervention, students had sat a compulsory achievement test. For the 10 RCTs in our sample, we were able to obtain post-test academic achievement outcomes for students who exited the original randomized experiments. By comparing these outcomes to the equivalent outcomes of responder students, we can estimate attrition bias.
Attrition rates in our sample of 10 RCTs were typical of education experiments. At the student level, the mean rate of attrition across the 10 studies was 19%. A broader analysis of education RCTs in the United Kingdom also found mean student-level attrition of 19% (n = 79 experiments, see Demack et al., 2020).
Our work connects to a vast literature on missing data. Much of that work explores techniques to impute missing values under different assumptions about the attrition mechanism, for example, data that are Missing At Random or ‘MAR’ (Brunton-Smith et al., 2014; Carpenter & Plewis, 2011; Goldstein et al., 2014; Rubin, 1987; Sterne et al., 2009). Our paper complements this literature by providing an empirical assessment of how far away from MAR attrition mechanisms tend to be in practice, and what the consequences are for estimates of average treatment effects in the context of education. We also examine how the less sophisticated, but widely used, technique of regression adjustment reduces attrition bias.
The paper makes four contributions. First, we present novel estimates of attrition bias for 10 education RCTs, spanning 22 outcomes. This advances empirical scholarship by providing estimates of bias based on post-test outcomes. Using techniques from meta-analysis, we then estimate the typical magnitude of attrition bias across studies and outcomes.
Second, we present a framework for decomposing attrition bias into four components: the rate of attrition in each treatment arm, and the association between attrition and outcomes in each arm. We quantify the magnitude of these components across 22 study-outcome pairs, and report parameter values that define how pernicious attrition mechanisms tend to be in practice.
Third, we examine the plausibility of the ‘Missing At Random’ (MAR) assumption. In most real-world situations, this assumption is untestable. In our context, however, we are able to test MAR at the study-outcome level, as well as providing a global test across all the outcomes in our sample of evaluations. We find evidence against MAR.
Finally, we provide two substantive recommendations for researchers in the field: check whether conclusions are sensitive to ‘worst-observed case’ attrition mechanisms and incorporate uncertainty from attrition bias. For both recommendations we offer simple techniques that can be used in applied research. We illustrate these techniques with the REACH evaluation, an RCT of a reading intervention in England (Sibieta, 2016).
The paper is organized as follows. Section 2 describes the data and the interventions that underpin our analyses. Section 3 defines and decomposes attrition bias. Section 4 illustrates our approach to estimating attrition bias and presents headline estimates across 22 study-outcome pairs. In section 5 we test the MAR assumption and examine some potential predictors of pernicious attrition mechanisms. Section 6 provides researchers with recommendations about dealing with attrition bias, and section 7 concludes.
2 DATA AND INTERVENTIONS
2.1 Data
Our analysis relies on a unique set of linked databases in England. The key data source is an archive of RCTs maintained by the Education Endowment Foundation (EEF). The crucial feature of the archive is that it can be linked to the National Pupil Database (NPD), a census of publicly funded schools and pupils that represents over 90% of English school children (DfE, 2015). The NPD contains standardized achievement measures at multiple year levels, along with information about student demographic characteristics. The outcomes and covariates available for analysis are summarized in Table 1. These data provide an unusual opportunity to study attrition bias. Because the RCT archive is linked to a census that contains achievement data, we can examine post-randomization outcome data for students who would normally be lost to research.
Table 1. Outcomes and covariates available for analysis

| Category | Variable | Detail |
| --- | --- | --- |
| Student achievement (national assessments) | Year 2 (age 7); end of ‘Key Stage 1’ | Maths, Reading |
| | Year 6 (age 11); end of ‘Key Stage 2’ | Maths, Reading, Writing |
| | Year 11 (age 16); end of ‘Key Stage 4’ | Maths, English, Science |
| Student demographics | Age | Months of age |
| | FSM | Free-school-meal status |
| | Female | Binary indicator of gender |

This table describes the outcomes and covariates available for our analyses. For outcomes in Year 6 and Year 11, we use previous student achievement tests as a covariate. Data are described in NPD (2015).
The outcomes of the original RCTs were researcher-administered achievement tests in literacy, mathematics and science. These outcomes were naturally subject to attrition. For each reported outcome we find an analogous outcome in the NPD. For example, the ‘Shared Maths’ RCT used two maths modules of the Interactive Computerised Assessment System as the primary outcome (Lloyd et al., 2015); to estimate attrition bias, we use the total marks on the Key Stage 2 maths assessment (NPD, 2015). In effect, we create almost-complete datasets for the original RCTs—while knowing the attrition status of students in these studies—by changing the outcome measure to one used in a compulsory test.
We sought NPD tests that were administered as soon as possible after the RCT intervention had finished. For many of the RCTs, there was a short delay between the original outcome measure and the NPD outcome. Across our 10 RCTs, the median delay was 7 months.
Finally, we note that despite the possible differences in the timing and content of the tests, the original evaluation outcomes and the outcomes we use in our attrition analyses were strongly correlated across the 22 outcomes. This mean correlation is attenuated by measurement error and would be strictly less than one even if the tests measured exactly the same domain at the same time.
2.2 Interventions
The 10 interventions we analyse represent all the available randomized trials from the EEF archive that met two criteria: a) the original outcome was subject to attrition; and b) pupils subsequently sat a standardized national achievement test.
The interventions were quite diverse. While the overarching purpose of all 10 interventions was to raise academic achievement, programmes pursued this mission in a variety of ways. Some were directly focused on achievement outcomes and targeted at low-achieving pupils. For example, the ‘LIT programme’ provided struggling readers in year 7 with small-group instruction for 3–4 hours each week over a period of 8 months. Other interventions were less direct. ‘Act, Sing, Play’, for example, sought to raise achievement of whole classes by running music workshops for children in year 2. There was also diversity in the ages of the pupils who participated in the 10 interventions, ranging from 6 to 15. Finally, there was diversity in the level of randomization: six of the studies were randomized at the school level, with the other four being randomized within school. Table 2 summarizes the interventions.
Table 2. Summary of the 10 interventions

| Intervention | Brief description of intervention | Pupils (n) | Attrition† | Outcomes (ES‡) | Reference |
| --- | --- | --- | --- | --- | --- |
| Act, Sing, Play (asp) | Music and drama workshops for students in year 2, once a week for 32 weeks | 894 | 7.8 | *Maths* (0.00), English (0.03) | Haywood et al. (2015) |
| Changing Mindsets INSET (cmi) | Professional development course for primary school teachers in how to develop Growth Mindset in pupils | 1,035 | 10.8 | *Maths* (0.01), English (−0.11) | Rienzo et al. (2015) |
| Changing Mindsets Pupil (cmp) | 6-week course of mentoring and workshops for year 5 students, with a focus on developing pupils’ growth mindset | 195 | 8.2 | *Maths* (0.10), English (0.18) | Rienzo et al. (2015) |
| Dialogic Teaching (dt) | Year 5 teachers trained to encourage dialogue, argument and oral explanation | 4,918 | 21.4 | *Maths* (0.09), English (0.15) | Jay et al. (2017) |
| LIT programme (lit) | Targeted literacy intervention for struggling readers in year 7, for 3–4 hours per week over 8 months | 5,286 | 19.0 | *English* (0.09) | Crawford and Skipp (2014) |
| Mind the Gap (mtg) | Teacher training and parent workshops, over a 5-week period, to help year 4 students be more ‘meta-cognitive’ | 1,496 | 60.1 | *Maths and Reading* (−0.14) | Dorsett et al. (2014) |
| ReflectEd (ref) | Weekly lessons for year 5 pupils over a 6-month period, focused on strategies to monitor/manage their own learning | 1,843 | 15.4 | *Maths* (0.30), Reading (−0.15) | Motteram et al. (2016) |
| Shared Maths (sm) | Cross-age peer maths tutoring: older pupils (year 6) work with younger ones (year 4) for 20 mins per week for 2 years | 3,119 | 14.1 | *Maths* (0.02), Reading (§) | Lloyd et al. (2015) |
| Talk of the Town (tott) | Whole-school intervention to help support the development of children's speech, language and communication | 1,512 | 14.8 | *Reading* (−0.03), Maths (§) | Thurston et al. (2016) |
| Texting Parents (tp) | Parents of secondary school pupils sent text messages about homework, upcoming tests etc., over 11 months | 5,026 | 14.4 | *English* (0.03), Maths (0.07), Science (−0.01) | Miller et al. (2016) |

This table summarizes the 10 RCTs analysed in this paper; n = number of pupils, at randomization, with valid pupil identifiers in the National Pupil Database.
† Pupil-level attrition rate (%).
‡ Effect size, defined as the ATE divided by the population standard deviation of the outcome measure; the first outcome listed in the evaluation is highlighted in italics.
§ The reading results for Shared Maths were not reported due to missing data; for Talk of the Town, KS2 maths was a tertiary outcome, not reported in the trial.
2.3 Attrition in the RCTs
Attrition rates varied across studies and treatment arms, with a mean rate of 19% (see Figure 1). This mean rate of attrition is higher than is typical of RCTs in clinical medicine (Crutzen et al., 2013). But mean attrition in our sample of studies exactly mirrors a broader sample of education RCTs in England (Demack et al., 2020, n = 79, mean attrition = 19%) and is only marginally higher than samples of RCTs in public health interventions (Crutzen et al., 2015, n = 60 studies, median attrition = 14%) and economics (Hirshleifer et al., 2019, n = 90 studies, mean attrition = 15%).
Figure 1. Attrition rates. The left panel shows rates of attrition in the treatment and control arms; the right panel illustrates the difference between the rates of treatment attrition and control attrition.
An examination of the 10 original evaluation reports reveals that attrition was a prominent concern in the minds of the researchers. Each of the RCTs includes a ‘padlock rating’, provided by peer reviewers as a measure of study quality. These ratings are based on five criteria: design, power, attrition, balance and other threats to validity. The lowest score across these criteria defines the final rating. In 5 of the 10 evaluations attrition was cited as the limiting factor. Two further trials listed ‘imbalance’ as the limiting factor, with the authors in both cases noting the role of differential attrition in creating imbalance. The prominence of attrition as a concern is a stark reminder that, from the point of view of researchers, attrition is arguably the biggest threat to generating unbiased experimental results (Deke & Chiang, 2017; Greenberg & Barnow, 2014).
3 CONCEPTUAL FRAMEWORK
3.1 Overview
Consider an evaluation with n students in the sample at randomization and let $T_i$ be a randomly assigned binary treatment indicator for student i: $T_i = 1$ for students assigned to treatment, and $T_i = 0$ for students assigned to control. We use potential outcomes notation in which $Y_i(t)$ denotes the post-treatment outcome when $T_i = t$. In our analyses, outcomes are standardized achievement tests across a range of domains including maths, reading and science. The estimand of interest is the finite-sample SATE, denoted by $\tau_{\text{FULL}} = E[Y_i(1) - Y_i(0)]$. The ‘FULL’ subscript indicates that we are interested in the average treatment effect for all the units in our original sample, before any attrition. The $E[\cdot]$ denotes the simple average across the n units in the sample.
Estimates of $\tau_{\text{FULL}}$ can be biased by non-random attrition. To formalize this, let $A_i(t)$ be a potential outcome for attrition under treatment assignment t. $A_i(t)$ can take two values: 1 indicates that a student has attrited; 0 indicates that a student remained in the evaluation. For example, $A_i(1) = 1$ describes a unit who attrited after being assigned to treatment (a ‘Treatment Attriter’). Figure 2 summarizes our setup. Our original treatment and control groups are directly comparable (up to imbalance caused by randomness in treatment assignment). Our evaluation sample consists only of the treatment responders and the control responders.
There are two concerns with an analysis of the evaluation sample. First, the treatment responders and control responders may not be directly comparable to each other if different types of students tended to attrite when exposed to treatment as compared to control. Consider an extreme case in which all struggling schools drop out of a particularly time-intensive treatment arm. The final treatment group would contain only the remaining high performers, while the control group would contain a mix of both. This systematic imbalance between the two groups could bias impact estimates. We call this ‘differential attrition bias’.
There is a second, more subtle source of attrition bias. Even if we generate a correct impact estimate for our final evaluation sample, it may not generalize to the full sample. Consider the case where those who benefit most from treatment are also least likely to drop out. In this case, even in the absence of differential attrition bias, impact estimates from the evaluation sample would be too high. We call this ‘generalizability attrition bias’.
Define the responder contrast $\tau_{\text{RESP}} = E[Y_i(1) \mid A_i(1) = 0] - E[Y_i(0) \mid A_i(0) = 0]$. This is not a causal effect estimand: while these two groups may share some portion of students, they do not necessarily share all of them, so the difference is not a well-defined treatment versus control contrast on the same units. $\tau_{\text{RESP}}$ is, however, what is being estimated by contrasting the outcomes of treated and control units in the responder sample. This is almost always the contrast that applied researchers examine. In a simple difference-in-means analysis, this amounts to estimating $\tau_{\text{RESP}}$.
To better understand attrition bias, we define $\beta = \tau_{\text{RESP}} - \tau_{\text{FULL}}$ and decompose β into four elements: the rate of attrition in each treatment arm ($P_T$, $P_C$) and the extent to which the attrition mechanism is associated with outcomes in each arm ($\Delta_T$, $\Delta_C$). The next sub-section defines these terms and illustrates the decomposition.
3.2 Bias decomposition
Let $P_t$ denote the attrition rate in arm t, and let $\Delta_t = E[Y(t) \mid A(t) = 1] - E[Y(t) \mid A(t) = 0]$ denote the mean outcome difference between attriters and responders in arm t. Attrition bias can then be conceptualized as a function of four parameters:

$$\beta = P_C \Delta_C - P_T \Delta_T. \quad (5)$$

Writing $\bar{P} = (P_T + P_C)/2$ and $\bar{\Delta} = (\Delta_T + \Delta_C)/2$, this is algebraically equivalent to

$$\beta = \bar{P}\,(\Delta_C - \Delta_T) + \bar{\Delta}\,(P_C - P_T). \quad (6)$$

From (6) it is clear that attrition bias stems from asymmetries between the treatment and control side, either in terms of the attrition rates, the differences between responders and nonresponders, or both. We examine each of the elements in (6) empirically: the attrition rates $P_T$ and $P_C$ were briefly discussed in section 2 and the Δ's will be analysed in detail in section 5.
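To make the decomposition concrete, here is a minimal numerical sketch in Python; the parameter values are hypothetical, not estimates from any of the 10 trials:

```python
def attrition_bias(p_t: float, p_c: float, d_t: float, d_c: float) -> float:
    """Attrition bias beta = P_C * Delta_C - P_T * Delta_T, where Delta_t is
    the attriter-minus-responder mean outcome gap in arm t (ES units)."""
    return p_c * d_c - p_t * d_t

# Hypothetical example: 20% treatment attrition with attriters scoring 0.2 SD
# below responders, versus 15% control attrition with a 0.1 SD gap.
print(round(attrition_bias(p_t=0.20, p_c=0.15, d_t=-0.20, d_c=-0.10), 3))  # 0.025
```

In this illustration the responder-only contrast would overstate the SATE by 0.025 ES, because attrition removes the weakest students disproportionately from the treatment arm.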
3.3 Covariate adjustment
Our parameter β represents attrition bias when no attempt is made to account for attrition using measured pre-treatment covariates (X). We next discuss the extent to which covariate adjustment can help reduce attrition bias.
Importantly, regression adjustment can only correct for imbalance between the treatment and control groups in terms of observed covariates (and generally assumes a linear relationship between covariates and outcomes). If the treatment responders and control responders differ in unobserved ways, the evaluation may still suffer from differential attrition bias after adjustment. In other words, the success of adjustment depends on the MAR assumption being true. This illustrates how attrition can potentially undermine the integrity of randomized experiments, by forcing researchers to rely on adjustments and assumptions that are similar to those used in observational studies.
If MAR holds, we have $E[Y(t) \mid A(t) = 1, X] = E[Y(t) \mid A(t) = 0, X]$, giving no remaining bias other than ‘generalizability attrition bias’, that is, the bias due to our evaluation sample not being representative of our full sample (for an overview of the generalizability literature in education, see Tipton & Olsen, 2018). To reduce ‘generalizability’ bias, researchers could estimate attrition weights, akin to sampling weights, and re-weight each unit so that the evaluation sample resembled the full sample (or, indeed, the population of interest). This would correct for generalizability attrition bias, under the assumption that Y(t) and A(t) are independent given X.
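A minimal sketch of this re-weighting idea (which we do not pursue), assuming a logistic model for response propensities; all variable names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def attrition_weights(X, responded):
    """Estimate P(respond | X) with a logistic model and return inverse-
    probability weights that re-weight responders towards the full sample."""
    model = LogisticRegression(max_iter=1000).fit(X, responded)
    p_respond = model.predict_proba(X)[:, 1]
    # Responders get weight 1 / P(respond | X); attriters contribute nothing.
    return np.where(responded == 1, 1.0 / p_respond, 0.0)

# Hypothetical use (Y = outcomes, T = treatment indicator, X = covariates):
# w = attrition_weights(X, responded)
# keep = responded == 1
# est = (np.average(Y[keep & (T == 1)], weights=w[keep & (T == 1)])
#        - np.average(Y[keep & (T == 0)], weights=w[keep & (T == 0)]))
```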
We sidestep weighting-based approaches and adjust using linear regression. Specifically, we estimate the core parameters defined above (β and its covariate-adjusted analogue $\beta^X$) using multilevel regression models that account for the clustered structure of education data, in which students are nested within schools. We now describe our straightforward approach to estimation.
4 ESTIMATES OF ATTRITION BIAS
4.1 Estimating attrition bias
For each intervention and outcome, we fit a multilevel regression of the outcome on treatment (‘Model 1’) to two samples: the ‘responder sample’ (students who provided data for the initial evaluation) and the ‘full sample’ (all units, with valid pupil identifiers, who were recorded as being randomized). The estimand depends on the sample used in fitting the model. For example, when we fit Model 1 using the full evaluation sample, τ is $\tau_{\text{FULL}}$. As such, generating an estimate of attrition bias for a particular study-outcome pair involves the following three steps (sketched in code after the list):

- (i) Fit Model 1 only using units from the ‘responder’ sample (all i s.t. $A_i = 0$). Estimate $\hat{\tau}_{\text{RESP}}$.

- (ii) Refit Model 1 using the full sample (all i). Here, the estimate of τ will be $\hat{\tau}_{\text{FULL}}$.

- (iii) Take the difference: $\hat{\beta} = \hat{\tau}_{\text{RESP}} - \hat{\tau}_{\text{FULL}}$.
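A minimal sketch of these three steps, using a multilevel model with school random intercepts; the column names are hypothetical, and the exact specification of Model 1 follows the description above:

```python
import statsmodels.formula.api as smf

def estimate_attrition_bias(df):
    """Three-step estimate of attrition bias for one study-outcome pair.
    Hypothetical columns: y (NPD outcome), treat (assignment), pretest
    (prior achievement), school (cluster id), attrited (1 = dropped out)."""
    formula = "y ~ treat + pretest"

    # Step (i): fit on responders only, as in the original evaluation.
    resp = df[df["attrited"] == 0]
    tau_resp = smf.mixedlm(formula, resp, groups=resp["school"]).fit().params["treat"]

    # Step (ii): refit on the full randomized sample.
    tau_full = smf.mixedlm(formula, df, groups=df["school"]).fit().params["treat"]

    # Step (iii): the difference is the estimated attrition bias.
    return tau_resp - tau_full
```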
For simplicity we assume throughout our analysis that Model 1 provides unbiased estimates of the ATE, that is, we make the typical assumptions about SUTVA and treatment compliance. Similarly, we assume that the regression model is correctly specified. These assumptions are not strictly required, as our primary focus is to compare how estimates change in the presence of attrition. If one of the assumptions necessary to generate an unbiased estimate of a causal estimand does not strictly hold, the change wrought by attrition is still of interest.
4.2 Attrition bias after covariate adjustment
There are two reasons to examine attrition bias in the context of covariate-adjusted impact estimates. First, we are interested in the magnitude of attrition bias in practice. As applied researchers generally adjust for covariate imbalance—and all 10 evaluations considered here adjusted for covariates—we follow this convention. Second, by estimating attrition bias both with and without covariate adjustment, we are able to examine the extent to which conditioning on covariates repairs attrition bias. As a final note, estimates of attrition bias may also benefit from precision gains if variation in outcomes is captured by covariates. This could improve our ability to detect attrition bias even when there is no differential attrition.
4.3 Estimates of β and β^X
Figure 3 presents initial estimates of β and $\beta^X$. The plot also presents 95% confidence intervals for $\hat{\beta}$. These uncertainty estimates are based on a simulation procedure described in Appendix A. We use simulation-based inference rather than conventional standard errors to account for two dependencies in our data: first, the responder sample is a sub-sample of the full sample; second, within each study, bias estimates across outcomes will be correlated.
Figure 3. Estimates of attrition bias, before conditioning on covariates ($\hat{\beta}$) and after ($\hat{\beta}^X$). Estimates of $\hat{\beta}$ have 95% CIs, derived from simulations described in Appendix A.
There are two things to note about Figure 3. First, it appears as though conditioning on covariates lessens attrition bias: the estimates of $\hat{\beta}^X$ tend to be closer to zero than the estimates of $\hat{\beta}$. Second, 20 of the 22 estimates of $\hat{\beta}$ have confidence intervals that include zero.
Next, we analyse the distribution of attrition bias across interventions and outcomes. The boxplots at the bottom of Figure 3 are a useful starting point in this endeavour. However, these data are over-dispersed due to measurement error. This may create a misleading impression about the typical magnitude of attrition bias. To see why the distributions underlying the boxplots are over-dispersed, consider an attrition mechanism that is completely random, with A(t) independent of potential outcomes. If this mechanism were responsible for deleting data from each of our 10 evaluation samples, estimates of attrition bias would be non-zero even though no bias had been introduced. In other words, the raw estimates of bias presented in the boxplots include both underlying attrition bias and ‘attrition sampling variation’—that is, variation due to which units left the study.
We account for this error using tools from meta-analysis. This approach addresses two overlapping goals: to present estimated distributions for β and βX that are not over-dispersed due to sampling variation, and to estimate the typical degree of attrition bias for our setting. The aim in both cases is to help education researchers and funders understand the typical magnitude of attrition bias in typical RCTs.
We model the observed bias for outcome k in intervention w as

$$b_{wk} = \beta_{wk} + \varepsilon_{wk}, \qquad \beta_{wk} \sim N(v, \sigma^2), \qquad \varepsilon_{wk} \sim N(0, se^2_{wk}),$$

where:

- (i) v = the mean attrition bias across all interventions and outcomes;

- (ii) $\beta_{wk}$ = the underlying attrition bias for outcome k in intervention w. This has a variance of $\sigma^2$, reflecting the fact that not all interventions will have the same attrition bias. $\beta_{wk}$ could change due to context, the nature of the treatment, the outcome, and so on;

- (iii) $b_{wk}$ = observed bias. This deviates from underlying bias with a variance of $se^2_{wk}$, which is largely determined by the level of attrition.
Appendix B provides details of our approach to estimating these parameters, which draws heavily on random effects meta-analysis (Higgins et al., 2009). For each intervention-outcome pair we calculate a constrained empirical Bayes estimate of attrition bias, $\tilde{\beta}_{wk}$ (Weiss et al., 2017). The estimated distributions of $\tilde{\beta}$ and $\tilde{\beta}^X$ are presented in Figure 4.
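Appendix B is not reproduced here, but the following sketch illustrates the general shape of such an analysis, using a method-of-moments (DerSimonian–Laird) estimate of the between-study variance and simple empirical Bayes shrinkage; the constrained estimator of Weiss et al. (2017) differs in its details:

```python
import numpy as np

def random_effects_meta(b, se):
    """b: observed bias estimates; se: their standard errors.
    Returns the estimated mean bias v, between-study variance sigma2,
    and shrunken (empirical Bayes) study-level estimates."""
    b, se = np.asarray(b, float), np.asarray(se, float)
    w = 1.0 / se**2
    v_fixed = np.sum(w * b) / np.sum(w)
    q = np.sum(w * (b - v_fixed) ** 2)               # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    sigma2 = max(0.0, (q - (len(b) - 1)) / c)        # DerSimonian-Laird
    w_re = 1.0 / (se**2 + sigma2)
    v = np.sum(w_re * b) / np.sum(w_re)              # random-effects mean
    shrink = sigma2 / (sigma2 + se**2)               # per-study shrinkage
    b_tilde = v + shrink * (b - v)                   # shrunken estimates
    return v, sigma2, b_tilde
```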
Figure 4. Final estimates of underlying attrition bias. The top panel shows constrained empirical Bayes estimates of $\tilde{\beta}$ for 10 studies and 22 outcomes; the bottom panel is the equivalent plot after conditioning on covariates ($\tilde{\beta}^X$). The estimated mean is shown with a grey dotted line.
The top panel of Figure 4 represents our best guess at the distribution of attrition bias in cases where researchers do not adjust for observed differences between treated and control units. Where we control for covariates, including a pre-test, the estimated distribution of $\tilde{\beta}^X$ is centred even closer to zero. The typical magnitude of underlying attrition bias is 0.015 effect size units (ES, shorthand for the ATE divided by the population standard deviation of the outcome measure), and no estimate has a magnitude greater than 0.034 ES. This suggests that, in practice, the typical magnitude of attrition bias is small, particularly when researchers have access to predictive covariates.
4.4 Contextualizing attrition bias
To put the magnitude of these attrition bias estimates into context, we offer three points of reference. First, note that the What Works Clearinghouse set a threshold for problematic bias at 0.05 (WWC, 2014). The EEF takes a similar position, and views 0.05 as a threshold beyond which bias becomes a substantial concern. None of the estimates of attrition bias presented above are greater than this threshold—including the estimates that do not condition on covariates.
Second, a recent meta-analysis of 14 interventions that were similar to those studied in this paper found that the typical value of selection bias due to non-random assignment was 0.15 without covariates, and 0.03 after controlling for observable characteristics (Weidmann & Miratrix, 2020). The findings show that, in the absence of predictive covariates, typical selection bias due to non-random assignment is roughly six times larger than typical attrition bias. When researchers have access to predictive covariates, typical selection bias is roughly twice as large as typical attrition bias.
Third, we draw readers' attention to initial estimates of external validity bias due to non-random sampling. To our knowledge such estimates only exist for one programme (Reading First), and suggest that the mean bias due to non-randomly selected study samples is 0.1 (Bell et al., 2016). Given the scarcity of evidence, we draw no firm conclusions. However, if an evaluation has the policy-relevant goal of estimating a population average treatment effect, it is plausible that the risk of external validity bias overshadows internal-validity risks. In light of this, we argue that it is an urgent priority to develop a stronger understanding of the risk of external validity bias.
5 DECOMPOSING AND UNDERSTANDING ATTRITION MECHANISMS
5.1 Decomposing the elements of attrition bias
We estimate the Δ parameters by regressing outcomes on treatment status, attrition status and their interaction, both without covariates and with covariates (‘Model 4’). With this setup, $\Delta_C = E[Y(0) \mid A(0) = 1] - E[Y(0) \mid A(0) = 0]$ and $\Delta_T = E[Y(1) \mid A(1) = 1] - E[Y(1) \mid A(1) = 0]$. Values of $\Delta_t$ closer to zero suggest that the attrition mechanism in treatment arm t is less associated with outcomes.
In Model 4, we condition on covariates. The parameter $\Delta_C^X$ represents the mean difference in outcomes between ‘control attriters’ and ‘control responders’ who have the same covariate values, and $\Delta_T^X$ is the equivalent parameter on the treatment side. While estimates of these parameters contain useful information about the nature of attrition mechanisms, our primary focus is on the distribution of the Δ parameters across studies and outcomes. To that end, note that the issue of over-dispersal, discussed above with reference to $\hat{\beta}$ and $\hat{\beta}^X$, also applies to the $\hat{\Delta}$ estimates. We therefore rely on the same meta-analytic tools to model the underlying distributions of the Δ parameters.
For the control arm, for example, we model the observed difference as

$$d_{wk} = \Delta_{wk} + \varepsilon_{wk}, \qquad \Delta_{wk} \sim N(\mu, \sigma^2_\Delta),$$

where:

- (i) μ = the mean difference, across all studies, in the average outcomes of ‘control attriters’ and ‘control responders’;

- (ii) $\Delta_{wk}$ = the underlying difference, for intervention w and outcome k, between the mean outcome of ‘control attriters’ and ‘control responders’. This has a variance of $\sigma^2_\Delta$ across intervention-outcome pairs;

- (iii) $d_{wk}$ = the observed mean difference in the average outcome of ‘control attriters’ and ‘control responders’. This has sampling variance of $se^2_{wk}$.
For analyses in which we condition on covariates, equivalent parameters have an X superscript. For example, $\Delta_T^X$ is the difference in mean outcomes for ‘treatment attriters’ and ‘treatment responders’, after conditioning on covariates with a linear model.
We conduct four separate meta-analyses, one each for $\Delta_T$, $\Delta_C$, $\Delta_T^X$ and $\Delta_C^X$. Once again we compute a set of 22 constrained empirical Bayes estimates for each parameter: $\tilde{\Delta}_T$, $\tilde{\Delta}_C$, $\tilde{\Delta}_T^X$ and $\tilde{\Delta}_C^X$ (Weiss et al., 2017). The results, along with core parameter estimates from the meta-analyses, are presented in Figure 5. Raw estimates of the $\hat{\Delta}$ parameters are provided in Appendix C (see Figure C1).
Figure 5. Estimated distributions of the Δ parameters. Appendix B describes the estimation approach for the meta-analyses; $\hat{\mu}$ is the estimated mean of each distribution and $\hat{\sigma}$ the estimated standard deviation. Due to the small number of interventions we could not use $\hat{\sigma}$ as the basis of a confidence interval (Higgins et al., 2009).
Figure 5 emphasizes two features of our data. First, conditioning on covariates substantially reduces the perniciousness of attrition mechanisms: the distributions of $\tilde{\Delta}_T^X$ and $\tilde{\Delta}_C^X$ are centred much closer to zero than their unadjusted counterparts. Second, it appears as though there is a systematic relationship between attrition and outcomes. Units who leave studies appear to have worse outcomes than responders, even after conditioning on covariates. This is particularly true on the treatment side: almost all the estimates of $\tilde{\Delta}_T$ and $\tilde{\Delta}_T^X$ are negative or essentially zero (<0.005 in magnitude). A similar effect is present on the control side, but it is less pronounced.
To emphasize the potential difference between the treatment and control side, we perform a non-parametric test of the hypothesis that the mean value of Δ is the same for both arms, $H_0\!: E[\Delta_T] = E[\Delta_C]$. To generate a draw under the null, we permute treatment status within each study-outcome pair and calculate mean($\tilde{\Delta}_T$) − mean($\tilde{\Delta}_C$). We compare the observed value of mean($\tilde{\Delta}_T$) − mean($\tilde{\Delta}_C$) with 10,000 draws under the null, and find evidence against the hypothesis that the means are equivalent (p = 0.01). We then test the analogous hypothesis for the covariate-adjusted parameters and similarly find that the difference between the means is significant (p = 0.008). In both cases—with and without conditioning on covariates—attrition appears to be more problematic on the treatment side.
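A minimal sketch of this permutation test, assuming arrays d_t and d_c hold the 22 paired estimates:

```python
import numpy as np

def permutation_test(d_t, d_c, n_draws=10_000, seed=0):
    """Test H0: mean(Delta_T) = mean(Delta_C) by randomly swapping the
    treatment/control labels within each study-outcome pair."""
    rng = np.random.default_rng(seed)
    d_t, d_c = np.asarray(d_t, float), np.asarray(d_c, float)
    observed = d_t.mean() - d_c.mean()
    null = np.empty(n_draws)
    for i in range(n_draws):
        flip = rng.integers(0, 2, size=d_t.size).astype(bool)
        null[i] = np.where(flip, d_c, d_t).mean() - np.where(flip, d_t, d_c).mean()
    return float(np.mean(np.abs(null) >= abs(observed)))  # two-sided p-value
```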
The meta-analyses underlying Figure 5 can also be viewed as tests of two common assumptions: ‘Missing At Random’ (MAR) and ‘Missing Completely At Random’ (MCAR). Specifically, if MCAR holds across our studies then the distributions of $\Delta_T$ and $\Delta_C$ will have a mean of zero. This is clearly not the case: $\hat{\mu}_T$ and $\hat{\mu}_C$ are negative and their 95% confidence intervals do not include zero (both p-values are <0.001).
There is also some evidence that attrition mechanisms do not meet the MAR assumption. In particular, $\hat{\mu}_T^X = -0.107$ is significantly different from zero (p = 0.002). The picture on the control side is slightly less clear: after adjusting for covariates, the mean difference between attriters and responders is negative ($\hat{\mu}_C^X = -0.040$) but not significantly different from zero. Overall, however, these results cast substantial doubt on MAR being a plausible assumption in our context.
We see further evidence against the MAR assumption when we examine the 22 individual study-outcome pairs in our data. For each pair we use a likelihood ratio test to examine the hypothesis that $\Delta_T^X = \Delta_C^X = 0$. When applied to Model 4, this hypothesis test is a test of MAR. After correcting for multiple comparisons (Hochberg, 1988) we find that 5 of the 22 pairs reject MAR. Full results of these hypothesis tests are presented in Table 3. In sum, these findings reinforce the conclusions of the distributional analyses and suggest that, in our context, researchers cannot safely assume that attrition mechanisms will meet the MAR assumption.
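In outline, each test compares an outcome model with and without the attrition terms. The sketch below uses a single-level model fit by maximum likelihood for simplicity (column names hypothetical); the paper's Model 4 is multilevel:

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

def mar_lrt(df):
    """Likelihood ratio test of H0: Delta_T = Delta_C = 0, conditional on a
    pre-test. 'attrited' and its interaction with 'treat' carry the Deltas."""
    restricted = smf.ols("y ~ treat + pretest", df).fit()
    full = smf.ols("y ~ treat + pretest + attrited + attrited:treat", df).fit()
    lr_stat = 2 * (full.llf - restricted.llf)
    return chi2.sf(lr_stat, df=2)  # two additional parameters in the full model
```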
Table 3. Tests of the MCAR and MAR assumptions (p-values)

Missing Completely At Random:

| Project | Maths | Reading | Writing | English | Science |
| --- | --- | --- | --- | --- | --- |
| asp | 0.394 | 0.373 | - | - | - |
| cmi | 0.686 | 0.074 | 0.004 | - | - |
| cmp | 0.069 | 0.514 | 0.062 | - | - |
| dt | **<0.001** | 0.052 | - | - | - |
| lit | - | - | - | **<0.001** | - |
| mtg | 0.024 | 0.337 | - | - | - |
| ref | 0.058 | 0.396 | - | - | - |
| sm | **<0.001** | **<0.001** | - | - | - |
| tott | 0.539 | 0.046 | - | - | - |
| tp | **<0.001** | - | - | **<0.001** | **<0.001** |

Missing At Random:

| Project | Maths | Reading | Writing | English | Science |
| --- | --- | --- | --- | --- | --- |
| asp | 0.294 | 0.457 | - | - | - |
| cmi | 0.314 | 0.326 | 0.168 | - | - |
| cmp | 0.102 | 0.337 | 0.166 | - | - |
| dt | 0.035 | 0.851 | - | - | - |
| lit | - | - | - | **<0.001** | - |
| mtg | 0.018 | 0.441 | - | - | - |
| ref | 0.137 | 0.468 | - | - | - |
| sm | **0.001** | 0.309 | - | - | - |
| tott | 0.968 | 0.112 | - | - | - |
| tp | **<0.001** | - | - | **<0.001** | **<0.001** |

This table presents the p-values for hypothesis tests examining the Missing Completely At Random (MCAR; top panel) and Missing At Random (MAR; bottom panel) assumptions. The project acronyms are listed in Table 2. Cells with ‘-’ represent a domain that was not tested. Bold font indicates that individual null hypothesis tests were rejected at α = 0.05, after a Hochberg (1988) multiple-comparison correction.
5.2 What predicts pernicious attrition?
Are there situations in which problematic attrition mechanisms are more likely? Here we examine three possible predictors of non-random attrition, starting with school year. Seven of the 10 trials focused on outcomes from year 6 (age 11–12), which limits our ability to draw conclusions about the effect of year on attrition mechanisms. However, we note that two of the trials with the largest values of $|\tilde{\Delta}^X|$ were in year 11 (see Figure 6). Moreover, examination of the MAR tests at the study-outcome level shows that four of the five rejections of the MAR hypothesis came from the two trials with outcomes at year 11. As we only have two cases, we are tentative in drawing conclusions. However, we suggest that researchers working with outcomes for later year groups should be especially wary of attrition.
Figure 6. $|\tilde{\Delta}^X|$ by year level. $|\tilde{\Delta}^X|$ is the magnitude of the association between attrition and outcomes, conditional on covariates. Each panel shows results from 10 RCTs and 22 outcomes; the left panel presents estimates from control students, the right panel from treated students.
Second, we examine whether there is any association between the size of a study and the nature of attrition. We find a positive association between $|\tilde{\Delta}^X|$ and trial size, as seen in Figure 7. A simple regression analysis found that an increase of 1,000 students in a trial arm was associated with an increase in $|\tilde{\Delta}_C^X|$ of 0.038 and, similarly, an increase in $|\tilde{\Delta}_T^X|$ of 0.036. The result on the treatment side is much more uncertain, due to greater variation and the strong influence of an outlier. This suggests that there may be a positive association between bigger trials and bias-inducing attrition mechanisms, although our evidence is not strong.
Figure 7. $|\tilde{\Delta}^X|$ by number of pupils. Each panel shows results from 10 RCTs and 22 outcomes (left = control arm, right = treatment arm). The solid line is from a simple linear regression; the numbers reported on the figure are Pearson correlation coefficients and associated 95% CIs.
Finally, we tested the hypothesis that the rate of attrition is associated with its perniciousness. We find no evidence that attrition mechanisms are more problematic in cases where attrition is high, as summarized in Figure 8, which shows the lack of association between the attrition proportion and $|\tilde{\Delta}^X|$ for both treatment and control.
Figure 8. $|\tilde{\Delta}^X|$ by attrition rate. Each panel shows results from 10 RCTs and 22 outcomes (left = control arm, right = treatment arm). The solid line is from a simple linear regression; the numbers reported on the figure are Pearson correlation coefficients and associated 95% CIs.
6 RECOMMENDATIONS FOR APPLIED RESEARCHERS
Evidence presented in section 5 suggests that, in our context, outcome data cannot safely be assumed to be Missing At Random, let alone Missing Completely At Random. We argue that researchers should respond in three ways. First, adjust for covariate differences in estimating treatment effects. This is already common practice, so we do not discuss it further. Second, perform sensitivity analyses to see whether core findings are robust to the types of attrition mechanisms we observe in practice. Third, incorporate ‘attrition bias uncertainty’ into inferences. This section describes the latter two recommendations, culminating in a worked example using the evaluation of the REACH programme (Sibieta, 2016).
6.1 Sensitivity analyses, based on observed attrition mechanisms
Randomized experiments frequently include sensitivity analyses to assess whether results are robust to attrition. This is often done in terms of missing-data imputation. However, given the possibility that MAR is violated, there is a strong argument to go further and explore the potential influence of attrition bias due to unobserved characteristics.
This in itself is not a novel idea and various approaches to bounding unobserved biases have been proposed (e.g. Manski, 1990). The difficulty with sensitivity analyses is that they often yield very wide ranges that far exceed typical effect sizes. In response, we argue that researchers should ground their sensitivity analyses in estimates of how bias-inducing attrition mechanisms tend to be in practice.
Nearly all randomized experiments report both impact estimates and attrition rates ($\hat{\tau}_{\text{RESP}}$, $P_T$ and $P_C$). This means that, if analysts are willing to make assumptions about $\Delta_T$ and $\Delta_C$, equation (7),

$$\tau_{\text{FULL}} = \tau_{\text{RESP}} - (P_C \Delta_C - P_T \Delta_T), \quad (7)$$

can be widely used to assess how sensitive findings are to attrition. Meanwhile, rewriting the bias as in (6),

$$\beta = \bar{P}\,(\Delta_C - \Delta_T) + \bar{\Delta}\,(P_C - P_T), \quad (8)$$

shows that the sensitivity of an impact estimate depends on two factors: differential attrition mechanisms ($\Delta_C - \Delta_T$) and differential attrition rates ($P_C - P_T$). For any given study, the differential attrition rate is known. Consequently, we recommend that researchers take $P_T$ and $P_C$ as given and focus their sensitivity analyses on the potential influence of differential attrition mechanisms ($\Delta_C - \Delta_T$).
One way to do this is to generate a simple sensitivity plot with $\hat{\tau}_{\text{FULL}}$ on the y-axis and the differential attrition mechanism on the x-axis (either $\Delta_C - \Delta_T$ or $\Delta_T - \Delta_C$, depending on whether $\hat{\tau}_{\text{RESP}}$ is positive or negative). We suggest that the x-axis span the range from ‘no differential attrition’ to a ‘worst observed case’. This ‘worst observed case’ can be defined by analyses of the sort we present above.
Under ‘no differential attrition’, $\Delta_T = \Delta_C$. The worst-case values of the differential attrition mechanism will depend on whether the finding is positive or negative. Consider the case of a positive impact estimate, $\hat{\tau}_{\text{RESP}} > 0$. This finding will be undermined by positive values of $\Delta_C - \Delta_T$: a sufficiently large value will send $\hat{\tau}_{\text{FULL}}$ to zero. To find the ‘worst observed case’ we refer researchers to Table C1, which summarizes our 22 estimates of $\tilde{\Delta}_T$ and $\tilde{\Delta}_C$ (and their covariate-adjusted counterparts); for a positive finding, the worst observed case is the largest value of $\Delta_C - \Delta_T$ reported there. Last, to generate the sensitivity analysis, researchers need to select a value of $\bar{\Delta}$ (or $\bar{\Delta}^X$). As a default, we recommend using the median observed value (−0.033). The choice of $\bar{\Delta}$ is unlikely to be consequential relative to the impact of the differential attrition mechanism ($\Delta_C - \Delta_T$). However, researchers who are interested in a more comprehensive sensitivity analysis can simply calculate $\hat{\tau}_{\text{FULL}}$ across a two-dimensional grid of $\Delta_T$ and $\Delta_C$ values. Of course, researchers could choose any parameters they believe are appropriate to include in this grid. But, in keeping with our broader recommendations, we suggest that these values be grounded by attrition mechanisms that have been observed in practice; for example, that the grid be bounded by the maxima and minima of the relevant parameters reported in Table C1.
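A minimal sketch of such a sensitivity plot, using equations (7) and (8); the worst-observed-case bound d_diff_max is a placeholder to be taken from Table C1:

```python
import numpy as np
import matplotlib.pyplot as plt

def sensitivity_plot(tau_resp, p_t, p_c, d_bar=-0.033, d_diff_max=0.2):
    """Implied tau_FULL across differential attrition mechanisms, using
    beta = p_bar*(Delta_C - Delta_T) + d_bar*(P_C - P_T).
    d_diff_max is a placeholder for the worst observed case (Table C1)."""
    p_bar = (p_t + p_c) / 2
    d_diff = np.linspace(0.0, d_diff_max, 100)        # Delta_C - Delta_T
    tau_full = tau_resp - (p_bar * d_diff + d_bar * (p_c - p_t))
    plt.plot(d_diff, tau_full)
    plt.axhline(0.0, linestyle="--")
    plt.xlabel("Differential attrition mechanism (Delta_C - Delta_T)")
    plt.ylabel("Implied tau_FULL")
    plt.show()

# Hypothetical study: positive estimate of 0.10 ES, 20%/15% attrition.
sensitivity_plot(tau_resp=0.10, p_t=0.20, p_c=0.15)
```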
6.2 Incorporating attrition bias into uncertainty estimates
The threat of attrition bias is a source of uncertainty in estimating the SATE for the full sample. This uncertainty should arguably be incorporated into inferences. There are multiple possible approaches, including a fully Bayesian analysis in which the attrition mechanism in each treatment arm is explicitly modelled. For education researchers pursuing this strategy, the results we report here represent a good starting point for priors. However, in keeping with a frequentist framework, we propose augmenting conventional standard errors by the expected magnitude of attrition bias. The result is an inflated ‘rule of thumb standard error’ for the SATE. These adjusted SEs can be thought of as predictions of total error, including the error due to attrition (for an argument for interpreting SEs as an estimate of error in this way, see Sundberg (2003)). This adjustment can be applied to any study, given an initial standard error and attrition rates for treated and control units. The approach can also be viewed as a less conservative sensitivity check: while the prior section investigates the worst case, the following captures a ‘typical case’ (assuming the targeted study is believably similar to our reference studies with regard to bias).
We condition on the attrition rates $\pi_T$ and $\pi_C$, as they are directly observed. The little information we have on the relationship between $\Delta_T$ and $\Delta_C$ suggests a lack of clear association, so we drop the cross term (we fail to reject the null that $\mathrm{corr}(\Delta_T, \Delta_C) = 0$). Thus, (9) becomes $E[B^2] = \pi_T^2 E[\Delta_T^2] + \pi_C^2 E[\Delta_C^2]$, and we can calculate this quantity based on observed attrition and estimates of the squared magnitudes of $\Delta_T$ and $\Delta_C$, where $E[\Delta_z^2] = \mu_z^2 + \tau_z^2$ for arm $z$. Estimates of the $\mu_z$ and $\tau_z^2$ parameters, with and without conditioning on covariates, are presented in Table 4. Finally, we can calculate a revised uncertainty estimate, tailored to the observed level of attrition: $SE_{adj} = \sqrt{SE^2 + \hat{E}[B^2]}$.
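In code, the adjustment is a one-line calculation. The following is a minimal sketch, assuming our reconstruction of the formula above; the dictionary of $\hat\mu_z^2 + \hat\tau_z^2$ values uses the covariate-adjusted entries of Table 4, and the function name is our own.

```python
import math

# Covariate-adjusted estimates of E[Delta_z^2] = mu_z^2 + tau_z^2 (Table 4).
E_DELTA_SQ = {"treatment": 0.023, "control": 0.008}

def attrition_adjusted_se(se, pi_t, pi_c, e_delta_sq=E_DELTA_SQ):
    """Inflate a conventional standard error by the expected squared magnitude
    of attrition bias:
    SE_adj = sqrt(SE^2 + pi_T^2 * E[Delta_T^2] + pi_C^2 * E[Delta_C^2])."""
    expected_b_sq = (pi_t ** 2 * e_delta_sq["treatment"]
                     + pi_c ** 2 * e_delta_sq["control"])
    return math.sqrt(se ** 2 + expected_b_sq)

# Example: with 10% attrition in each arm, a standard error of 0.050
# inflates only slightly, to roughly 0.053.
print(attrition_adjusted_se(0.050, pi_t=0.10, pi_c=0.10))
```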
TABLE 4 Meta-analytic estimates of attrition mechanism parameters by trial arm. The first three columns are estimated without covariates; the last three condition on covariates ($X$)

| | $\hat{E}[\Delta^2]$ | $\hat{\tau}^2$ | $\hat{\mu}$ | $\hat{E}[(\Delta^X)^2]$ | $\hat{\tau}^2_X$ | $\hat{\mu}_X$ |
|---|---|---|---|---|---|---|
| Control | 0.021 | 0.007 | −0.122 | 0.008 | 0.006 | −0.040 |
| Treatment | 0.065 | 0.021 | −0.210 | 0.023 | 0.011 | −0.107 |

Note: parameter estimates come from our sample of 10 RCTs and 22 outcomes, with $\hat{E}[\Delta_z^2] = \hat{\mu}_z^2 + \hat{\tau}_z^2$ for arm $z$. For model equations and parameter definitions, see Section 5.1.
The attrition-adjusted standard error is strictly larger than the unadjusted standard error and can be interpreted as an estimate of the magnitude of our overall error (it is, in fact, closer to an estimate of the expected RMSE). Corresponding confidence intervals take the attrition bias as an unknown, zero-centred variable whose magnitude depends on the proportion of units attrited and on the evidence from prior studies about the link between attrition and outcomes. Studies with lower rates of attrition will have smaller adjustments.
6.3 Example: REACH reading intervention
To make these ideas more concrete, we present example analyses using the evaluation of the REACH reading intervention (Sibieta, 2016). This is a randomized experiment in the EEF archive for which the threat of attrition bias is unknown: the control units were given the treatment straight after the trial concluded as part of a ‘wait-list’ design, so their subsequent administrative outcomes reflect exposure to the intervention. This makes it impossible to use our approach to estimate attrition bias directly.
The evaluation reports an estimated treatment effect of 0.329, with a standard error of approximately 0.099 (a 95% confidence interval of width 0.388). Attrition in the two arms of the trial was above average ($\pi_T$ and $\pi_C$ are reported in Sibieta, 2016). The researchers had access to a pre-test, along with demographic covariates, and conditioned on these variables in estimating the treatment effect. As such, we focus on $\Delta^X_T$ and $\Delta^X_C$, which describe attrition mechanisms after conditioning on covariates.
First, we conduct a sensitivity analysis. Note that as the impact estimate is positive, the finding will be overturned for sufficiently large values of $\Delta^X_C - \Delta^X_T$. Using (8), we produce Figure 9:

Figure 9. Example sensitivity analysis for REACH. Note: the plot uses reported attrition rates from the REACH trial and presents sensitivity analyses according to equation (8), setting $\Delta^X_C = -0.033$ (the median observed value).
Under the ‘worst observed case’ (from the 22 outcomes in our sample) the estimated average treatment effect declines from 0.329 to 0.251. While this is a marked reduction, the point estimate remains positive and the 95% confidence interval does not include zero. Overall, these sensitivity results suggest that the core finding of the REACH evaluation, namely that the intervention had a positive average treatment effect, is quite robust to attrition bias. This is despite the fact that the RCT suffered from a rate of attrition that, according to the EEF rating scale, would have limited the evaluation to a maximum quality rating of 3 out of 5.
Second, we adjust for attrition uncertainty. In this case the adjustment is quite modest: the width of the 95% uncertainty interval increases from 0.388 to 0.436, a 12% increase. This suggests that attrition, relative to other sources of uncertainty, is typically a minor concern.
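For completeness, the following sketch strings the two calculations together for REACH-style inputs, using the `bias_adjusted_estimate` and `attrition_adjusted_se` functions sketched above. The attrition rates below are hypothetical placeholders, not the rates reported by Sibieta (2016), so the outputs only approximate the figures quoted in the text.

```python
# REACH-style worked example (attrition rates are illustrative placeholders).
beta_hat, se = 0.329, 0.099
pi_t, pi_c = 0.28, 0.28  # NOT the reported REACH rates

# Worst observed covariate-adjusted differential mechanism (Table C1): 0.284,
# with delta_c fixed at the median observed value of -0.033.
worst = bias_adjusted_estimate(beta_hat, pi_t, pi_c,
                               delta_t=-0.033 - 0.284, delta_c=-0.033)
se_adj = attrition_adjusted_se(se, pi_t, pi_c)

print(f"worst-case adjusted estimate: {worst:.3f}")          # close to 0.251
print(f"95% CI width: {3.92 * se:.3f} -> {3.92 * se_adj:.3f}")  # close to 0.388 -> 0.436
```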
7 LIMITATIONS
Our study of attrition bias has two main limitations. First, the census data are themselves subject to some missingness and difficulties with matching. For each RCT in the archive, we searched the National Pupil Database for students who were present at randomization. Across the 10 RCTs, an average of 1.2% of randomized students could not be matched to the NPD. A further 2.6% of pupils were missing an outcome measure, which meant they were excluded from our analysis. In total, our analysis included an average of 96.2% of the students who were recorded as being present at randomization. We note that this small level of missingness could bias our estimates of attrition bias ($\hat{B}$ and $\hat{B}_X$) and of the nature of bias mechanisms ($\hat\Delta$ and $\hat\Delta_X$). That said, we found no association between the level of missingness in the National Pupil Database and any of the four aforementioned sets of parameters.
Second, we note that the findings we present here may not generalize easily to other settings. Even within the field of education, we are cautious about generalizing to a broader set of RCTs. The set of interventions examined here is diverse and typical of educational programmes that are evaluated with RCTs. However, these interventions generally affected a small percentage of total instruction time and were relatively short-lived, generally lasting less than a year. This may be particularly relevant given the evidence we present suggesting that the relationship between attrition and outcomes is stronger on the treatment side than the control side. One interpretation of this finding is that the perniciousness of attrition mechanisms could be a function of intervention intensity. This conjecture is something we hope to test formally in future work. More generally, we note that the nature of attrition bias fundamentally depends on the nature of attrition mechanisms, which may differ when moving to a new setting, for example, from school to university contexts.
Raising our sights beyond the field of education, we believe that, at best, our results provide weak priors on the nature of attrition mechanisms. More importantly, we hope that this work provides researchers in other disciplines with a framework and a set of tools to analyse attrition bias in their setting. While data will often be the limiting factor, the increasing prevalence of publicly available administrative datasets may provide opportunities for progress.
8 CONCLUSION
Overall, the analyses presented here suggest that the threat of attrition bias is limited in our context. While attrition is a highly salient risk, other threats—for example, external validity bias due to non-random sampling—may be substantially more problematic in terms of generating practical, useable knowledge from education evaluations.
This is not to say that attrition mechanisms can safely be treated as ‘missing at random’ or ‘missing completely at random’. We find evidence that students who leave studies tend to perform worse than those who remain. This pattern is particularly pronounced for treated students. Moreover, this tendency persists even after conditioning on baseline achievement. That said, we re-emphasize that these associations do not appear to be strong enough to induce large-scale bias.
We suggest that researchers respond to this evidence in two main ways: incorporating ‘attrition bias uncertainty’ into their inferences and completing sensitivity analyses using empirically grounded estimates of attrition mechanisms that have been observed in practice. We also recommend that, consistent with common practice, researchers present treatment effects after conditioning on observed covariates.
As more studies are added to the RCT archive, we intend to present a more detailed picture of attrition mechanisms, including an exploration of why some evaluations seem to suffer from pernicious attrition. In the meantime, it seems sensible to presume that students who are missing differ in unobserved ways from those who stay involved in research, while noting that these differences often lead to relatively minor levels of attrition bias.
ACKNOWLEDGEMENTS
We are grateful to the UK Department for Education for providing access to the National Pupil Database, and to the Education Endowment Foundation for providing access to their archive of randomized controlled trials. We are also grateful to the editors and anonymous reviewers at the JRSS(A) for their valuable suggestions. This work reflects the views of the authors alone, and not those of the aforementioned parties.
REFERENCES
APPENDIX A Simulation-based uncertainty estimates
To generate uncertainty estimates for attrition bias, we conduct a simulation-based procedure. We use simulation-based inference rather than conventional standard errors to account for two dependencies in our data: first, the responder sample is a sub-sample of the full sample; second, within each study bias estimates across outcomes will be correlated.
We perform two related procedures. The first generates uncertainty estimates for $\hat{B}$. Here we simulate a world in which attrition is completely random, that is, a world in which the MCAR assumption is true by design. We condition on several dimensions: the number of pupils at randomization ($N$), the observed attrition rate ($\pi_z$), the observed treatment assignment ($Z$) and observed outcomes ($Y$). For each study, we complete the following two-step process 1000 times:
- (a) Permute the observed binary attrition indicator. We define the ‘null responder’ sample as all units for whom this permuted indicator is equal to zero.
- (b) Generate an estimate of $\hat{B}$ for each outcome by estimating Model 1 twice: once for the full sample, once for the ‘null responder’ sample.
The standard deviation of the difference of our two estimates across the 1000 replicates gives an estimated standard error for the difference, under the null. We can also obtain the correlation structure of the estimated differences for outcomes nested within a given study.
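A minimal sketch of this first procedure is given below, with a simple difference in means standing in for Model 1 (which is defined in section 4 and which we do not reproduce here); the function and variable names are our own.

```python
import numpy as np

def mcar_null_bias_draws(y, z, attrited, n_reps=1000, seed=0):
    """Simulate attrition bias estimates under MCAR by permuting the attrition
    indicator, holding N, the attrition rate, treatment assignment (z) and
    outcomes (y) fixed. Returns one simulated bias estimate per replicate."""
    rng = np.random.default_rng(seed)
    full = y[z == 1].mean() - y[z == 0].mean()  # stand-in for Model 1, full sample
    draws = np.empty(n_reps)
    for r in range(n_reps):
        keep = rng.permutation(attrited) == 0   # the 'null responder' sample
        resp = y[keep & (z == 1)].mean() - y[keep & (z == 0)].mean()
        draws[r] = resp - full                  # bias of the responder-only estimate
    return draws

# The standard deviation of these draws estimates the standard error of the
# bias estimate under the MCAR null.
```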
The second procedure generates uncertainty estimates for $\hat{B}_X$. Here we simulate a world in which attrition is determined by observed covariates $X$. We again condition on several dimensions: $N$, $Z$ and $Y$. For each study, we complete the following three-step process 1000 times:
- (a) Fit two propensity score models for attrition:
  - (i) for treated units, model the probability of responding given covariates, $\hat{p}_T(x) = \Pr(R = 1 \mid X = x, Z = 1)$;
  - (ii) for control units, fit the analogous model, $\hat{p}_C(x) = \Pr(R = 1 \mid X = x, Z = 0)$.
- (b) Define a set of responders, based on each student's observed covariate profile $X_i$:
  - (i) for treated units, draw from $\mathrm{Bernoulli}(\hat{p}_T(X_i))$; if this equals one then the unit is a ‘null treatment responder’;
  - (ii) for control units, draw from $\mathrm{Bernoulli}(\hat{p}_C(X_i))$; if this equals one then the unit is a ‘null control responder’.
- (c) Generate an estimate of $\hat{B}_X$ by estimating Model 2 twice: once for the full sample, once for the ‘null responder’ sample.
Models 1 and 2 are defined in section 4. A sketch of this second, covariate-based procedure is given below.
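In this sketch, logistic regression stands in for the propensity score model (the paper does not specify its functional form, so this is an assumption), and the function name is our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mar_null_responders(X, z, responded, rng):
    """Draw one 'null responder' set in which response depends only on
    observed covariates X, using separate response models per arm."""
    keep = np.zeros(len(z), dtype=bool)
    for arm in (0, 1):
        idx = z == arm
        model = LogisticRegression(max_iter=1000).fit(X[idx], responded[idx])
        p_respond = model.predict_proba(X[idx])[:, 1]  # estimated P(respond | X, arm)
        keep[idx] = rng.binomial(1, p_respond) == 1
    return keep

# Repeating this draw 1000 times, and estimating Model 2 on the full sample and
# on each 'null responder' sample, yields the null distribution of B_X.
```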
APPENDIX B Meta-analysis Details
We use a standard random-effects meta-analytic model for our estimates of attrition bias, reconstructed here from the parameter definitions below:

$$\hat{B}_{wo} = \theta_{wo} + \epsilon_{wo}, \qquad \theta_{wo} \sim N(v, \tau^2), \qquad \epsilon_{wo} \sim N(0, s^2_{wo})$$

where:
- (a) $v$ = the mean attrition bias across all interventions and outcomes;
- (b) $\theta_{wo}$ = the true attrition bias for outcome $o$ in intervention $w$. This has a variance of $\tau^2$, reflecting the fact that attrition bias may vary due to context, the nature of the programme and so on;
- (c) observed bias deviates from underlying bias with a variance of $s^2_{wo}$. This sampling variation largely depends on how many schools participated in intervention $w$.
While individual empirical Bayes estimates of $\theta_{wo}$ minimize RMSE, an empirical distribution based on these estimates will underestimate the variability in bias estimates across studies and outcomes (Weiss et al., 2017). As such, we follow the procedure of Weiss et al. (2017, p.13) and scale our shrunken estimates so that their variance is equal to the estimated value of $\tau^2$.
We follow the same procedure for the other parameters of interest, including $\Delta$ and $\Delta_X$ in each trial arm.
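A minimal sketch of this shrink-then-rescale step follows. The method-of-moments estimator of $\tau^2$ used here is a common stand-in and may differ in detail from the estimator in the paper; the function name is our own.

```python
import numpy as np

def rescaled_eb(b_hat, s2):
    """Empirical Bayes shrinkage followed by the rescaling of Weiss et al.
    (2017): shrink each estimate towards the precision-weighted grand mean,
    then scale so the shrunken estimates have variance tau^2."""
    w = 1.0 / s2
    v_hat = np.average(b_hat, weights=w)  # precision-weighted grand mean
    # Simple method-of-moments estimate of the between-study variance tau^2.
    tau2 = max(0.0, np.average((b_hat - v_hat) ** 2, weights=w) - s2.mean())
    shrunk = v_hat + (tau2 / (tau2 + s2)) * (b_hat - v_hat)  # posterior means
    spread = np.sqrt(np.mean((shrunk - v_hat) ** 2))
    if tau2 > 0 and spread > 0:
        # Rescale so the empirical variance of the estimates equals tau^2.
        shrunk = v_hat + (shrunk - v_hat) * np.sqrt(tau2) / spread
    return shrunk
```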
APPENDIX C Parameter estimates
TABLE C1 Estimated attrition mechanism parameters across 10 RCTs and 22 outcomes. The first three numeric columns are estimated without covariates; the last three condition on covariates

| Outcome | Project | Year group | $\hat\Delta_C$ | $\hat\Delta_T$ | $\hat\Delta_C - \hat\Delta_T$ | $\hat\Delta^X_C$ | $\hat\Delta^X_T$ | $\hat\Delta^X_C - \hat\Delta^X_T$ |
|---|---|---|---|---|---|---|---|---|
| Reading | asp | 2 | −0.091 | −0.183 | 0.092 | −0.048 | −0.102 | 0.054 |
| Maths | asp | 2 | −0.027 | −0.110 | 0.083 | 0.049 | −0.045 | 0.093 |
| Reading | cmi | 6 | −0.101 | −0.290 | 0.189 | 0.009 | −0.140 | 0.149 |
| Writing | cmi | 6 | −0.195 | −0.327 | 0.131 | −0.066 | −0.146 | 0.079 |
| Maths | cmi | 6 | −0.051 | −0.153 | 0.102 | 0.079 | −0.024 | 0.103 |
| Reading | cmp | 6 | −0.125 | −0.086 | −0.038 | −0.011 | −0.031 | 0.020 |
| Writing | cmp | 6 | −0.157 | −0.062 | −0.095 | −0.069 | 0.002 | −0.071 |
| Maths | cmp | 6 | −0.145 | −0.049 | −0.096 | −0.045 | 0.001 | −0.047 |
| Reading | dt | 6 | −0.106 | −0.094 | −0.013 | 0.007 | −0.008 | 0.014 |
| Maths | dt | 6 | −0.266 | −0.189 | −0.077 | −0.125 | −0.076 | −0.049 |
| Maths | mtg | 6 | −0.074 | −0.277 | 0.204 | 0.046 | −0.238 | 0.284 |
| Reading | mtg | 6 | −0.070 | −0.118 | 0.048 | 0.034 | −0.091 | 0.125 |
| Maths | ref | 6 | −0.120 | −0.286 | 0.166 | −0.068 | −0.183 | 0.115 |
| Reading | ref | 6 | −0.010 | −0.160 | 0.151 | 0.081 | −0.005 | 0.086 |
| Maths | sm | 6 | −0.139 | −0.467 | 0.328 | −0.029 | −0.230 | 0.202 |
| Reading | sm | 6 | −0.128 | −0.347 | 0.219 | −0.028 | −0.090 | 0.063 |
| Reading | tott | 6 | −0.102 | −0.267 | 0.165 | −0.037 | −0.181 | 0.145 |
| Maths | tott | 6 | −0.089 | −0.104 | 0.015 | −0.024 | 0.003 | −0.027 |
| English | lit | 11 | 0.023 | −0.213 | 0.236 | −0.112 | −0.175 | 0.064 |
| Science | tp | 11 | −0.263 | −0.613 | 0.350 | −0.175 | −0.434 | 0.259 |
| Maths | tp | 11 | −0.140 | −0.227 | 0.086 | −0.146 | −0.115 | −0.031 |
| English | tp | 11 | −0.302 | 0.000 | −0.301 | −0.209 | −0.046 | −0.163 |
| Min | | | −0.302 | −0.613 | −0.301 | −0.209 | −0.434 | −0.163 |
| Median | | | −0.113 | −0.186 | 0.097 | −0.033 | −0.091 | 0.072 |
| Max | | | 0.023 | 0.000 | 0.350 | 0.081 | 0.003 | 0.284 |

Note: the table presents all values, along with their minimums, medians and maximums, for attrition bias parameters across 10 RCTs and 22 outcomes. ‘No covariates’ indicates that attrition bias parameters have been estimated without controlling for any observed characteristics; ‘covariate adjusted’ estimates condition on observed characteristics. In all cells we present constrained empirical Bayes estimates of parameters, as described in section 4.3 and Appendix A. The project acronyms are defined in Table 1.
Figure 10. Distribution of $\hat\Delta$ and $\hat\Delta_X$ for control arms (top panel) and treatment arms (bottom panel).