Abstract

We estimate the magnitude of attrition bias for 10 randomized controlled trials (RCTs) in education. We make use of a unique feature of administrative school data in England that allows us to analyse post-test academic outcomes for nearly all students, including those who originally dropped out of the RCTs we analyse. We find that the typical magnitude of attrition bias is 0.015 effect size units (ES), with no estimate greater than 0.034 ES. This suggests that, in practice, the risk of attrition bias is limited. However, this risk should not be ignored as we find some evidence against the common ‘Missing At Random’ assumption. Attrition appears to be more problematic for treated units. We recommend that researchers incorporate uncertainty due to attrition bias and perform sensitivity analyses based on the types of attrition mechanisms observed in practice.

1 INTRODUCTION AND BACKGROUND

Attrition has been described as ‘the Achilles Heel of the randomized experiment’ (Shadish et al., 1998, p. 3). Attrition looms as a threat because it can undermine group equivalence, eroding the methodological strength at the heart of randomized evaluations. In short, attrition can cause bias.

Attrition bias is the focus of this paper. We define attrition bias as the difference between the expected average treatment effect (ATE) estimate of the final analysis sample, and the ATE of the randomization sample. Our main goal is to quantify and explore the nature of attrition bias in practice. We focus on the context of education research, a field which has seen a large increase in the number of randomized experiments over the past two decades (Connolly et al., 2018).

The threat of attrition bias plays a significant role in assessing the quality of education evaluations. The What Works Clearinghouse (WWC) and the Education Endowment Foundation (EEF)—organizations responsible for setting evidence standards in the United States and the United Kingdom—both have threshold rates of attrition (EEF, 2014; WWC, 2017). Beyond these thresholds, studies officially lose credibility. In the case of the EEF, for example, if attrition is greater than 50% then the results of the evaluation are largely disregarded.

Despite the awareness of attrition as a threat to the quality of education research, remarkably little scholarship has focussed on quantifying the magnitude of attrition bias (Dong & Lipsey, 2011). The reason is simple: estimating attrition bias requires outcome information from pupils who, by definition, are no longer participating in research.

In response to this fundamental empirical challenge, existing literature has largely focussed on simulation studies. These studies demonstrate scenarios for which attrition bias is larger than the typical effect sizes in education interventions (Dong & Lipsey, 2011; Lewis, 2013; Lortie-Forgues & Inglis, 2019; WWC, 2014). Equally, it is well-known that if attrition is unrelated to either treatment status or outcomes, then randomized experiments remain unbiased regardless of the level of attrition (Little & Rubin, 2019).

While theory and simulation studies illustrate the potential for attrition bias to cause problems, they provide practitioners with limited guidance about the risk of attrition bias in practice. Deke and Chiang (2017) take the first step towards providing such guidance. They attempt to sidestep the fundamental challenge of estimating attrition bias by analysing pre-test academic achievement as a proxy for post-test outcomes. Pre-tests are often completed by pupils who stay in the evaluation (responders) as well as those who ultimately drop out (attriters). Using pre-tests, Deke and Chiang estimate attrition bias for four experiments, in each case comparing the estimated sample average treatment effect (SATE) of the whole sample to the estimated SATE of responders.

While Deke and Chiang (2017) represents an important step forward, it has several limitations. First and foremost, attrition may be shaped by events that happen after randomization. For example, during the course of an evaluation, a school may experience a change of leadership. If the new leader decides that implementing a research intervention is a distraction, they may drop out of the study. The resulting attrition could lead to bias if the leadership change coincides with, or causes, a decline in academic attainment. This bias would not be captured in an analysis that used a pre-test as a proxy for post-test outcomes. Second, analysing attrition bias using pre-tests makes it difficult to know whether attrition bias is problematic conditional on predictive covariates as the pre-test—by far the most predictive covariate—is being used as the outcome. As Deke and Chiang note ‘after conditioning on the pre-test, the residual difference in the post-test between respondents and nonrespondents could be completely different from the observed pre-test difference’ (p. 139). Finally, the study only looked at four interventions. This makes it difficult to describe the distribution of attrition bias across studies, and to estimate the typical value of attrition bias. The relative lack of cases also makes it hard to analyse some of the factors that might moderate attrition bias.

We avoid these limitations by utilizing a unique feature of English administrative school data. Specifically, we make use of an archive of randomized controlled trials (RCTs) that can be linked to a census of pupil and school information. Ten of the RCTs in the archive met two crucial conditions: a) the original outcomes were subject to attrition; b) after the intervention, students had sat a compulsory achievement test. For the 10 RCTs in our sample, we were able to obtain post-test academic achievement outcomes for students who exited the original randomized experiments. By comparing these outcomes to the equivalent outcomes of responder students, we can estimate attrition bias.

Attrition rates in our sample of 10 RCTs were typical of education experiments. At the student level, the mean rate of attrition across the 10 studies was 19%. A broader analysis of education RCTs in the United Kingdom also found mean student-level attrition of 19% (n = 79 experiments, see Demack et al., 2020).

Our work connects to a vast literature on missing data. Much of that work explores techniques to impute missing values under different assumptions about the attrition mechanism, for example, data that are Missing At Random or ‘MAR’ (Brunton-Smith et al., 2014; Carpenter & Plewis, 2011; Goldstein et al., 2014; Rubin, 1987; Sterne et al., 2009). Our paper complements this literature by providing an empirical assessment of how far away from MAR attrition mechanisms tend to be in practice, and what the consequences are for estimates of average treatment effects in the context of education. We also examine how the less sophisticated, but widely used, technique of regression adjustment reduces attrition bias.

The paper makes four contributions. First, we present novel estimates of attrition bias for 10 education RCTs, spanning 22 outcomes. This advances empirical scholarship by providing estimates of bias based on post-test outcomes. Using techniques from meta-analysis, we then estimate the typical magnitude of attrition bias across studies and outcomes.

Second, we present a framework for decomposing attrition bias into four components: the rate of attrition in each treatment arm, and the association between attrition and outcomes in each arm. We quantify the magnitude of these components across 22 study-outcome pairs, and report parameter values that define how pernicious attrition mechanisms tend to be in practice.

Third, we examine the plausibility of the ‘Missing At Random’ (MAR) assumption. In most real-world situations, this assumption is untestable. In our context, however, we are able to test MAR at the study-outcome level, as well as providing a global test across all the outcomes in our sample of evaluations. We find evidence against MAR.

Finally, we provide two substantive recommendations for researchers in the field: check whether conclusions are sensitive to ‘worst-observed case’ attrition mechanisms and incorporate uncertainty from attrition bias. For both recommendations we offer simple techniques that can be used in applied research. We illustrate these techniques with the REACH evaluation, an RCT of a reading intervention in England (Sibieta, 2016).

The paper is organized as follows. Section 2 describes the data and the interventions that underpin our analyses. Section 3 defines and decomposes attrition bias. Section 4 illustrates our approach to estimating attrition bias and presents headline estimates across 22 study-outcome pairs. In section 5 we test the MAR assumption and examine some potential predictors of pernicious attrition mechanisms. Section 6 provides researchers with recommendations about dealing with attrition bias, and section 7 concludes.

2 DATA AND INTERVENTIONS

2.1 Data

Our analysis relies on a unique set of linked databases in England. The key data source is an archive of RCTs maintained by the Education Endowment Foundation (EEF). The crucial feature of the archive is that it can be linked to the National Pupil Database (NPD), a census of publicly funded schools and pupils that represents over 90% of English school children (DfE, 2015). The NPD contains standardized achievement measures at multiple year levels, along with information about student demographic characteristics. The outcomes and covariates available for analysis are summarized in Table 1. These data provide an unusual opportunity to study attrition bias. Because the RCT archive is linked to a census that contains achievement data, we can examine post-randomization outcomes data for students who would normally be lost to research.

TABLE 1

Overview of outcomes and covariates

Student achievement (national assessments):
  Year 2 (age 7); end of ‘Key Stage 1’: Maths, Reading
  Year 6 (age 11); end of ‘Key Stage 2’: Maths, Reading, Writing
  Year 11 (age 16); end of ‘Key Stage 4’: Maths, English, Science
Student demographics:
  Age: Months of age
  FSM: Free-school-meal status
  Female: Binary indicator of gender

This table describes the outcomes and covariates available for our analyses. For outcomes in Year 6 and Year 11, we use previous student achievement tests as a covariate. Data are described in NPD (2015).

The outcomes of the original RCTs were researcher-administered achievement tests in literacy, mathematics and science. These outcomes were naturally subject to attrition. For each reported outcome we find an analogous outcome in the NPD. For example, the ‘Shared Maths’ RCT used two maths modules of the Interactive Computerised Assessment System as the primary outcome (Lloyd et al., 2015); to estimate attrition bias, we use the total marks on the Key Stage 2 maths assessment (NPD, 2015). In effect, we create almost-complete datasets for the original RCTs—while knowing the attrition status of students in these studies—by changing the outcome measure to one used in a compulsory test.

We sought NPD tests that were administered as soon as possible after the RCT intervention had finished. For many of the RCTs, there was a short delay between the original outcome measure and the NPD outcome. Across our 10 RCTs, the median delay was 7 months.

Finally, we note that despite the possible differences in the timing and content of the tests, there were strong correlations between the original evaluation outcomes and the outcomes we use in our attrition analyses. The mean correlation across the 22 outcomes was 0.72. This value is attenuated by measurement error and would be strictly less than one even if the tests measured exactly the same domain at the same time.

2.2 Interventions

The 10 interventions we analyse represent all the available randomized trials from the EEF archive that met two criteria: a) the original outcome was subject to attrition; and b) pupils subsequently sat a standardized national achievement test.

The interventions were quite diverse. While the overarching purpose of all 10 interventions was to raise academic achievement, programmes pursued this mission in a variety of ways. Some were directly focused on achievement outcomes and targeted at low-achieving pupils. For example, the ‘LIT programme’ provided struggling readers in year 7 with small-group instruction for 3–4 hours each week over a period of 8 months. Other interventions were less direct. ‘Act, Sing, Play’, for example, sought to raise achievement of whole classes by running music workshops for children in year 2. There was also diversity in the pupils who participated in the 10 interventions, who ranged in age from 6 to 15. Finally, there was diversity in the level of randomization: six of the studies were randomized at the school level, with the other four being randomized within school. Table 2 summarizes the interventions.

TABLE 2

Summary of interventions

Intervention | Brief description of intervention | Pupils (n) | Attrition (%) | Outcomes (ES) | Reference
Act, Sing, Play (asp) | Music and drama workshops for students in year 2, once a week for 32 weeks | 894 | 7.8 | Maths (0.00σ), English (0.03σ) | Haywood et al. (2015)
Changing Mindsets INSET (cmi) | Professional development course for primary school teachers in how to develop Growth Mindset in pupils | 1,035 | 10.8 | Maths (0.01σ), English (−0.11σ) | Rienzo et al. (2015)
Changing Mindsets Pupil (cmp) | 6-week course of mentoring and workshops for year 5 students, with a focus on developing pupils’ growth mindset | 195 | 8.2 | Maths (0.10σ), English (0.18σ) | Rienzo et al. (2015)
Dialogic Teaching (dt) | Year 5 teachers trained to encourage dialogue, argument and oral explanation | 4,918 | 21.4 | Maths (0.09σ), English (0.15σ) | Jay et al. (2017)
LIT programme (lit) | Targeted literacy intervention for struggling readers in year 7, 3–4 hours per week for 8 months | 5,286 | 19.0 | English (0.09σ) | Crawford and Skipp (2014)
Mind the Gap (mtg) | Teacher training and parent workshops, over a 5-week period, to help year 4 students be more ‘meta-cognitive’ | 1,496 | 60.1 | Maths and Reading (0.14σ) | Dorsett et al. (2014)
ReflectEd (ref) | Weekly lessons for year 5 pupils over a 6-month period, focused on strategies to monitor/manage their own learning | 1,843 | 15.4 | Maths (0.30σ), Reading (−0.15σ) | Motteram et al. (2016)
Shared Maths (sm) | Cross-age peer maths tutoring: older pupils (year 6) work with younger ones (year 4) for 20 mins per week for 2 years | 3,119 | 14.1 | Maths (0.02σ), Reading (§) | Lloyd et al. (2015)
Talk of the Town (tott) | Whole-school intervention to help support the development of children's speech, language and communication | 1,512 | 14.8 | Reading (0.03σ), Maths (§) | Thurston et al. (2016)
Texting Parents (tp) | Parents of secondary school pupils sent text messages about homework, upcoming tests etc., over 11 months | 5,026 | 14.4 | English (0.03σ), Maths (0.07σ), Science (−0.01σ) | Miller et al. (2016)

This table summarizes the 10 RCTs analysed in this paper. Pupils (n) = number of pupils, at randomization, with valid pupil identifiers in the National Pupil Database. Attrition = pupil-level attrition rate. σ is shorthand for effect size units, defined as the ATE divided by the population standard deviation of the outcome measure; the first outcome listed in the evaluation is highlighted in italics. § The reading results for Shared Maths were not reported due to missing data; for Talk of the Town, KS2 maths was a tertiary outcome, not reported in the trial.

2.3 Attrition in the RCTs

Attrition rates varied across studies and treatment arms, with a mean rate of 19% (see Figure 1). This mean rate of attrition is higher than in typical RCTs in clinical medicine (Crutzen et al., 2013). But mean attrition in our sample of studies exactly mirrors a broader sample of education RCTs in England (Demack et al., 2020, n = 79, mean attrition = 19%) and is only marginally higher than in samples of RCTs in public health (Crutzen et al., 2015, n = 60 studies, median attrition = 14%) and economics (Hirshleifer et al., 2019, n = 90 studies, mean attrition = 15%).

FIGURE 1

Attrition rates. Left panel shows rates of attrition in the treatment (blue) and control (red) arms; right panel illustrates the difference between the rates of treatment attrition and control attrition.

An examination of the 10 original evaluation reports reveals that attrition was a prominent concern in the minds of the researchers. Each of the RCTs includes a ‘padlock rating’, provided by peer reviewers as a measure of study quality. These ratings are based on five criteria: design, power, attrition, balance and other threats to validity. The lowest score across these criteria defines the final rating. In 5 of the 10 evaluations, attrition was cited as the limiting factor. Two further trials listed ‘imbalance’ as the limiting factor, with the authors in both cases noting the role of differential attrition in creating imbalance. The prominence of attrition as a concern is a stark reminder that, from the point of view of researchers, attrition is arguably the biggest threat to generating unbiased experimental results (Deke & Chiang, 2017; Greenberg & Barnow, 2014).

3 CONCEPTUAL FRAMEWORK

3.1 Overview

Consider an evaluation with n students in the sample at randomization and let T_i be a randomly assigned binary treatment indicator for student i. In general, T_i = 1 for students assigned to treatment, and T_i = 0 for students assigned to control. We use potential outcomes notation in which Y_i(t) denotes the post-treatment outcome when T_i = t. In our analyses, outcomes are standardized achievement tests across a range of domains including maths, reading and science. The estimand of interest is the finite-sample SATE, denoted by τ_FULL = E[Y(1) − Y(0)]. The ‘FULL’ subscript indicates that we are interested in the average treatment effect for all the units in our original sample, before any attrition. E[·] denotes the simple average across the n units in the sample.

Estimates of τ_FULL can be biased by non-random attrition. To formalize this, let A_i(t) be a potential outcome for attrition under treatment assignment t. A_i(t) can take two values: 1 indicates that a student attrited; 0 indicates that a student remained in the evaluation. For example, A_i(1) = 1 describes a unit who attrited after being assigned to treatment (a ‘Treatment Attriter’). Figure 2 summarizes our setup. Our original treatment and control groups are directly comparable (up to imbalance caused by randomness in treatment assignment). Our evaluation sample consists only of the treatment responders and the control responders.

FIGURE 2

Conceptual overview

There are two concerns with an analysis of the evaluation sample. First, the treatment responders and control responders may not be directly comparable to each other if different types of students tended to attrite under treatment than under control. Consider an extreme case in which all struggling schools drop out of a particularly time-intensive treatment arm. The final treatment group would contain only the remaining high performers, while the control group would contain a mix of both. The resulting systematic imbalance between the two groups could bias impact estimates. We call this ‘differential attrition bias’.

There is a second, more subtle source of attrition bias. Even if we generate a correct impact estimate for our final evaluation sample, it may not generalize to the full sample. Consider the case where those who benefit most from treatment are also least likely to drop out. In this case, even in the absence of differential attrition bias, impact estimates from the evaluation sample would be too high. We call this ‘generalizability attrition bias’.

Attrition bias, as we define it, encompasses both differential and generalizability attrition bias. To state our definition more formally, let τ̃_R be the expected contrast between the ‘treated responders’ and ‘control responders’:

(1) τ̃_R = E[Y(1) | A(1) = 0] − E[Y(0) | A(0) = 0]

This is not a causal effect estimand: while these two groups may share some portion of students, they do not necessarily share all of them, so our difference is not a well-defined treatment versus control contrast on the same units. τ̃_R is, however, what is being estimated by contrasting the outcomes of treated and control units in the responder sample. This is almost always the contrast that applied researchers examine. In a simple difference-in-means analysis, this amounts to estimating E[Y | T = 1, A = 0] − E[Y | T = 0, A = 0].

Using (1), we define ‘attrition bias’ as the difference between the preferred estimand (τ_FULL) and the typical focus of applied researchers (τ̃_R):

(2) β = τ_FULL − τ̃_R

To better understand attrition bias, we decompose β into four elements: the rate of attrition in each treatment arm (PT, PC) and the extent to which the attrition mechanism is associated with outcomes, in each arm (ΔT, ΔC). The next sub-section defines these terms and illustrates the decomposition.

3.2 Bias decomposition

Let β_T be the attrition bias in the treatment arm:

(3) β_T = E[Y(1)] − E[Y(1) | A(1) = 0]

β_T represents the gap, in terms of average outcomes under treatment, between the full sample and the ‘treatment responders’. Defining P_T = P(A(1) = 1), and using the law of total probability, we have:

(4) E[Y(1)] = P_T E[Y(1) | A(1) = 1] + (1 − P_T) E[Y(1) | A(1) = 0]

Next, let Δ_T be the difference in expected outcomes between ‘treatment attriters’ and ‘treatment responders’:

(5) Δ_T = E[Y(1) | A(1) = 1] − E[Y(1) | A(1) = 0]

Combining (4) and (5) we have:

β_T = P_T Δ_T

Following the same logic on the control side, where P_C = P(A(0) = 1) and Δ_C = E[Y(0) | A(0) = 1] − E[Y(0) | A(0) = 0], we have:

β_C = P_C Δ_C

Returning to the definition of attrition bias in (2), we have:

(6) β = β_T − β_C = P_T Δ_T − P_C Δ_C

Equation (6) shows that attrition bias can be conceptualized as a function of four parameters: P_T, P_C, Δ_T and Δ_C. It also makes clear that attrition bias stems from asymmetries between the treatment and control sides, either in terms of the attrition rates, the differences between responders and nonresponders, or both. We examine each of the elements in (6) empirically: the attrition rates P_T and P_C were briefly discussed in section 2, and the Δ's will be analysed in detail in section 5.
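To make the decomposition concrete, the snippet below evaluates equation (6) for a hypothetical study; a minimal sketch in which none of the parameter values are taken from the trials analysed in this paper.

```python
# Illustration of equation (6): beta = P_T * Delta_T - P_C * Delta_C.
# All values are in effect-size units and are hypothetical.

def attrition_bias(p_t: float, p_c: float, delta_t: float, delta_c: float) -> float:
    """Attrition bias implied by the four-parameter decomposition."""
    return p_t * delta_t - p_c * delta_c

# Equal 20% attrition in both arms, but attriters score 0.20 SD below
# responders under treatment versus 0.05 SD below under control:
beta = attrition_bias(p_t=0.20, p_c=0.20, delta_t=-0.20, delta_c=-0.05)
print(beta)  # -0.03: the responder contrast overstates tau_FULL by 0.03 SD
```

Note that equal attrition rates alone do not guarantee zero bias: an asymmetry in the Δ's is enough.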

3.3 Covariate adjustment

Our parameter β represents attrition bias when no attempt is made to account for attrition using measured pre-treatment covariates (X). We next discuss the extent to which covariate adjustment can help reduce attrition bias.

Differential attrition can cause systematic differences in X between the treatment and control groups. Conditioning on X—for example using a regression model—can limit the effect of these differences. Regression adjustment in effect estimates expected outcomes conditional on covariates, and then averages the differences across the shared distribution of covariates in the evaluation sample to obtain an overall adjusted impact estimate. Consider the expected outcomes for responders with a given value of X:

f_t(x) = E[Y(t) | A(t) = 0, X = x]

Our adjusted estimand is then:

τ̃_R^X = E[f_1(X) − f_0(X)]

The expectation is over the full distribution of X in the evaluation sample. Our remaining attrition bias after adjustment is:

β^X = τ_FULL − τ̃_R^X

Importantly, regression adjustment can only correct for imbalance between the treatment and control groups in terms of observed covariates (and generally assumes a linear relationship between covariates and outcomes). If the treatment responders and control responders differ in unobserved ways, the evaluation may still suffer from differential attrition bias after adjustment. In other words, the success of adjustment depends on the MAR assumption being true. This illustrates how attrition can potentially undermine the integrity of randomized experiments, by forcing researchers to rely on adjustments and assumptions that are similar to those used in observational studies.

If MAR holds, we have f_t(x) = E[Y(t) | A(t) = 0, X = x] = E[Y(t) | X = x], giving no remaining bias other than ‘generalizability attrition bias’, that is, the bias due to our evaluation sample not being representative of our full sample (for an overview of the generalizability literature in education, see Tipton & Olsen, 2018). To reduce ‘generalizability’ bias, researchers could estimate attrition weights, akin to sampling weights, and re-weight each unit so that the evaluation sample resembled the full sample (or, indeed, the population of interest). This would correct for generalizability attrition bias, under the assumption that Y(t) and A(t) are independent given X.
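As a concrete illustration of this weighting idea, here is a minimal sketch that estimates response probabilities with a logistic regression and re-weights responders by their inverse. The column names (a, y, t, pretest, fsm, female) are hypothetical placeholders, not NPD variable names.

```python
# Sketch of attrition weighting: model P(respond | X), then weight each
# responder by 1 / P(respond | X) so the responder sample resembles the
# full randomized sample. Assumes Y(t) and A(t) independent given X.
import pandas as pd
import statsmodels.formula.api as smf

def add_attrition_weights(df: pd.DataFrame) -> pd.DataFrame:
    df = df.assign(respond=1 - df["a"])  # a = 1 for attriters
    fit = smf.logit("respond ~ pretest + fsm + female", data=df).fit(disp=False)
    return df.assign(w=1.0 / fit.predict(df))

def weighted_tau(df: pd.DataFrame) -> float:
    """Weighted treated-control contrast over responders only."""
    r = add_attrition_weights(df).query("a == 0")
    def wmean(g: pd.DataFrame) -> float:
        return (g["y"] * g["w"]).sum() / g["w"].sum()
    return wmean(r[r["t"] == 1]) - wmean(r[r["t"] == 0])
```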

We sidestep weighting-based approaches and adjust using linear regression. Specifically, we estimate the core parameters defined above—β,βX,ΔT,ΔC,ΔTX,ΔCX —using multilevel regression models that account for the clustered structure of education data, in which students are nested within schools. We now describe our straightforward approach to estimation.

4 ESTIMATES OF ATTRITION BIAS

4.1 Estimating attrition bias

Consider Model 1, in which Y_ij is the outcome that student i in school j achieves, and T_ij is a binary treatment indicator:

Y_ij = α + τ T_ij + u_j + ε_ij    (Model 1)

where u_j is a school-level random intercept and ε_ij is a student-level error term.

For each intervention and outcome, we fit this model to two samples: the ‘responder sample’ (students who provided data for the initial evaluation) and the ‘full sample’ (all units, with valid pupil identifiers, who were recorded as being randomized). The τ estimand depends on the sample used in fitting the model. For example, when we fit Model 1 using the full sample, τ is τ_FULL. As such, generating an estimate of attrition bias for a particular study-outcome pair involves the following three steps:

  • (i)

    Fit Model 1 using only units from the ‘responder’ sample (all i such that Ai = 0). Estimate τ̂R.

  • (ii)

    Refit Model 1 using the full sample (all i). Here, the estimate of τ will be τ̂FULL.

  • (iii)

    Take the difference: β̂=τ̂FULL-τ̂R.

For simplicity we assume throughout our analysis that τ̂FULL provides unbiased estimates of the ATE, that is, we make the typical assumptions about SUTVA and treatment compliance. Similarly, we assume that the regression model is correctly specified. These assumptions are not strictly required, as our primary focus is to compare how estimates change in the presence of attrition. If one of the assumptions necessary to generate an unbiased estimate of a causal estimand does not strictly hold, the change wrought by attrition is still of interest.
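The three steps map directly onto code. Below is a minimal sketch using Python's statsmodels, with a school random intercept for the clustering; the data frame and column names (y, t, a, school) are illustrative, and this is not the authors' analysis code.

```python
# Sketch of the three-step procedure using a multilevel model (MixedLM).
import pandas as pd
import statsmodels.formula.api as smf

def fit_tau(df: pd.DataFrame) -> float:
    """Fit Model 1 (y ~ treatment, school random intercept); return tau-hat."""
    fit = smf.mixedlm("y ~ t", data=df, groups=df["school"]).fit()
    return fit.params["t"]

def estimate_attrition_bias(df: pd.DataFrame) -> float:
    tau_r = fit_tau(df[df["a"] == 0])  # step (i): responder sample only
    tau_full = fit_tau(df)             # step (ii): full randomized sample
    return tau_full - tau_r            # step (iii): beta-hat
```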

4.2 Attrition bias after covariate adjustment

β^X measures attrition bias after using a linear model to condition on observed covariates X. Conditioning on X using a model aims to address any imbalance in observed characteristics due to attrition (or chance treatment assignment). Estimation of β^X involves the same three-step process as estimation of β. The only change from Model 1 is the inclusion of covariates in the estimation model, represented by X_ij:

Y_ij = α + τ T_ij + λ′X_ij + u_j + ε_ij    (Model 2)

There are two reasons to examine attrition bias in the context of covariate-adjusted impact estimates. First, we are interested in the magnitude of attrition bias in practice. As applied researchers generally adjust for covariate imbalance—and all 10 evaluations considered here adjusted for covariates—we follow this convention. Second, by estimating attrition bias both with and without covariate adjustment, we are able to examine the extent to which conditioning on covariates repairs attrition bias. As a final note, estimates of attrition bias may also benefit from precision gains if variation in outcomes is captured by covariates. This could improve our ability to detect attrition bias even when there is no differential attrition.
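Under the same illustrative setup as before, the adjusted estimate simply swaps Model 2 into the three-step procedure; the covariate names below are placeholders standing in for the Table 1 variables.

```python
# Covariate-adjusted analogue: Model 2 in place of Model 1.
import pandas as pd
import statsmodels.formula.api as smf

def fit_tau_adjusted(df: pd.DataFrame) -> float:
    """Fit Model 2 (Model 1 plus covariates); return tau-hat."""
    fit = smf.mixedlm("y ~ t + pretest + fsm + female + age",
                      data=df, groups=df["school"]).fit()
    return fit.params["t"]
```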

4.3 Estimates of β and βX

Figure 3 presents initial estimates of β and βX. The plot also presents 95% confidence intervals for β̂X. These uncertainty estimates are based on a simulation procedure described in Appendix  A. We use simulation-based inference rather than conventional standard errors to account for two dependencies in our data: first, the responder sample is a sub-sample of the full sample; second, within each study bias estimates across outcomes will be correlated.

FIGURE 3

Estimates of attrition bias, before conditioning (β̂) and after (β̂X)

Note: this figure presents initial point estimates of attrition bias before conditioning on covariates (β̂) and afterwards (β̂X). Estimates of β̂X have 95% CIs, derived from simulations described in Appendix A.

There are two things to note about Figure 3. First, it appears as though conditioning on covariates lessens attrition bias: the estimates of β̂X tend to be closer to zero than the β̂ estimates. Second, 20 of the 22 estimates of β̂X have confidence intervals that include zero.

Next, we analyse the distribution of attrition bias across interventions and outcomes. The boxplots at the bottom of Figure 3 are a useful starting point in this endeavour. However, these data are over-dispersed due to measurement error. This may create a misleading impression about the typical magnitude of attrition bias. To see why the distributions underlying the boxplots are over-dispersed, consider an attrition mechanism A_null that is completely random: A_null ⊥ (Y(1), Y(0), X). If this mechanism were responsible for deleting data from each of our 10 evaluation samples, estimates of attrition bias would be non-zero even though no bias had been introduced. In other words, the raw estimates of bias presented in the boxplots include both underlying attrition bias and ‘attrition sampling variation’—that is, variation due to which units left the study.
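A short simulation makes the point; all parameters here are illustrative (a 0.1σ true effect and 19% completely random attrition), not estimates from our data.

```python
# 'Attrition sampling variation': completely random attrition still
# spreads the bias estimates around zero.
import numpy as np

rng = np.random.default_rng(0)
n, attrition_rate, estimates = 2_000, 0.19, []
for _ in range(1_000):
    t = rng.integers(0, 2, n)                 # random treatment assignment
    y = 0.1 * t + rng.normal(0, 1, n)         # outcomes, true ATE = 0.1
    stay = rng.random(n) > attrition_rate     # attrition ignores y and t
    tau_full = y[t == 1].mean() - y[t == 0].mean()
    tau_r = y[(t == 1) & stay].mean() - y[(t == 0) & stay].mean()
    estimates.append(tau_full - tau_r)
print(np.mean(estimates), np.std(estimates))  # mean near 0, but sd > 0
```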

We account for this error using tools from meta-analysis. This approach addresses two overlapping goals: to present estimated distributions for β and βX that are not over-dispersed due to sampling variation, and to estimate the typical degree of attrition bias for our setting. The aim in both cases is to help education researchers and funders understand the typical magnitude of attrition bias in typical RCTs.

Observed attrition bias estimates β̂_kw are assumed to be made up of several components:

β̂_kw = β_kw + e_kw,  with β_kw ~ N(ν, η²) and e_kw ~ N(0, σ²_kw)

Where:

  • (i)

    ν = the mean attrition bias across all interventions and outcomes

  • (ii)

    βkw = the underlying attrition bias for outcome k in intervention w. This has a variance of η2 reflecting the fact that not all interventions will have the same attrition bias. βkw could change due to context, the nature of the treatment, the outcome, and so on.

  • (iii)

    β̂kw = observed bias. This deviates from underlying bias βkw with a variance of σkw2, which is largely determined by the level of attrition.

Appendix B provides details of our approach to estimating these parameters, which draws heavily on random effects meta-analysis (Higgins et al., 2009). For each intervention-outcome pair we calculate a constrained empirical Bayes estimate of attrition bias, β̃_kw (Weiss et al., 2017). The estimated distributions of β̃ and β̃X are presented in Figure 4.
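For readers who want the mechanics, the sketch below implements a standard method-of-moments (DerSimonian-Laird) random-effects fit and plain empirical Bayes shrinkage; the constrained rescaling step of Weiss et al. (2017), and the refinements described in Appendix B, are omitted.

```python
# Random-effects meta-analysis sketch: estimate the mean (nu) and
# between-study variance (eta^2), then shrink each noisy estimate.
import numpy as np

def dersimonian_laird(b: np.ndarray, v: np.ndarray) -> tuple[float, float]:
    """b: observed bias estimates; v: their sampling variances."""
    w = 1 / v
    nu_fe = np.sum(w * b) / np.sum(w)        # fixed-effect mean
    q = np.sum(w * (b - nu_fe) ** 2)         # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    eta2 = max(0.0, (q - (len(b) - 1)) / c)  # between-study variance
    w_re = 1 / (v + eta2)
    nu = np.sum(w_re * b) / np.sum(w_re)     # random-effects mean
    return nu, eta2

def eb_shrink(b: np.ndarray, v: np.ndarray, nu: float, eta2: float) -> np.ndarray:
    """Shrink each estimate toward nu in proportion to its noisiness."""
    return nu + (eta2 / (eta2 + v)) * (b - nu)
```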

FIGURE 4

Final estimates of underlying attrition bias (β̃, β̃X)

Notes: the top panel shows constrained empirical Bayes estimates of β̃ for 10 studies and 22 outcomes. The bottom panel is the equivalent plot after conditioning on covariates (β̃X). The estimated mean is shown with a grey dotted line.

The top panel of Figure 4 represents our best guess at the distribution of attrition bias in cases where researchers do not adjust for observed differences between treated and control units. The mean of the estimated β̃ distribution is ν̂ = −0.013σ, with a mean absolute value of 0.026σ (σ is shorthand for effect size units, i.e., one standard deviation of the outcome measure). When we control for covariates, including a pre-test, the estimated distribution of β̃X has a mean of ν̂ = −0.004σ and a mean absolute value of 0.015σ. No value of β̃X has a magnitude greater than 0.034σ. This suggests that, in practice, the typical magnitude of attrition bias is small, particularly when researchers have access to predictive covariates.

4.4 Contextualizing attrition bias

To put the magnitude of these attrition bias estimates into context, we offer three points of reference. First, note that the What Works Clearinghouse set a threshold for problematic bias at 0.05σ (WWC, 2014). The EEF takes a similar position, and views 0.05σ as a threshold beyond which bias becomes a substantial concern. None of the estimates of attrition bias presented above are greater than this threshold—including the estimates that do not condition on covariates.

Second, a recent meta-analysis of 14 interventions that were similar to those studied in this paper found that the typical value of selection bias due to non-random assignment was 0.15σ without covariates, and 0.03σ after controlling for observable characteristics (Weidmann & Miratrix, 2020). The findings show that, in the absence of predictive covariates, typical selection bias due to non-random assignment is roughly six times larger than typical attrition bias. When researchers have access to predictive covariates, typical selection bias is roughly twice as large as typical attrition bias.

Third, we draw readers' attention to initial estimates of external validity bias due to non-random sampling. To our knowledge such estimates only exist for one programme (Reading First), and suggest that the mean bias due to non-randomly selected study samples is 0.1σ (Bell et al., 2016). Given the scarcity of evidence, we draw no firm conclusions. However, if an evaluation has the policy-relevant goal of estimating a population average treatment effect, it is plausible that the risk of external validity bias overshadows internal-validity risks. In light of this, we argue that it is an urgent priority to develop a stronger understanding of the risk of external validity bias.

5 DECOMPOSING AND UNDERSTANDING ATTRITION MECHANISMS

5.1 Decomposing the elements of attrition bias

Section 3 shows that attrition bias can be viewed as a function of four parameters: the rate of attrition in each arm (P_T, P_C) and the extent to which attrition is associated with outcomes (Δ_T, Δ_C). These parameters can be estimated using a model that accounts for the clustering of students within schools:

Y_ij = α + τ T_ij + γ A_ij + δ (T_ij × A_ij) + u_j + ε_ij    (Model 3)

With this setup, Δ̂_C = γ̂ and Δ̂_T = γ̂ + δ̂. Values of Δ_a closer to zero suggest that the attrition mechanism in treatment arm a is less associated with outcomes.

Because education researchers frequently have access to predictive covariates—as we do for all 10 of the randomized experiments in our dataset—we also consider versions of the Δ parameters that condition on covariates (Δ_C^X and Δ_T^X). To estimate these quantities we add covariates and their interactions to Model 3:

Y_ij = α_2 + τ_2 T_ij + γ_2 A_ij + δ_2 (T_ij × A_ij) + λ′X_ij + u_j + ε_ij    (Model 4)

In Model 4, Δ̂_C^X = γ̂_2. The parameter Δ_C^X represents the mean difference in outcomes between ‘control responders’ and ‘control attriters’ who have the same covariate values. Δ̂_T^X = γ̂_2 + δ̂_2, and is the equivalent parameter on the treatment side. While estimates of these parameters contain useful information about the nature of attrition mechanisms, our primary focus is on the distribution of the Δ parameters across studies and outcomes. To that end, note that the issue of over-dispersion, discussed above with reference to β̂ and β̂X, also applies to the Δ̂ estimates. We therefore rely on the same meta-analytic tools to model the underlying distributions of the Δ parameters.
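In code, the Δ̂'s fall out of the fitted coefficients on the attrition indicator and its interaction with treatment. As before, this is a sketch with illustrative column names rather than the authors' code.

```python
# Estimating Delta_C and Delta_T from Model 3 via the attrition-by-
# treatment interaction (school random intercept for clustering).
import pandas as pd
import statsmodels.formula.api as smf

def estimate_deltas(df: pd.DataFrame) -> tuple[float, float]:
    fit = smf.mixedlm("y ~ t + a + t:a", data=df, groups=df["school"]).fit()
    delta_c = fit.params["a"]                      # gamma-hat
    delta_t = fit.params["a"] + fit.params["t:a"]  # gamma-hat + delta-hat
    return delta_c, delta_t
```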

As an example of our modelling approach consider Δ̂_C,kw: the difference in the average outcomes between control attriters and control responders, for outcome k in intervention w. We model this as follows:

Δ̂_C,kw = Δ_C,kw + e_kw,  with Δ_C,kw ~ N(ϕ_C, θ²_C) and e_kw ~ N(0, ψ²_kw)

Where:

  • (i)

    ϕC = the mean difference, across all studies, in the average outcomes of ‘control attriters’ and ‘control responders’.

  • (ii)

    ΔC,kw = the underlying difference, for intervention w and outcome k, between the mean outcome of ‘control attriters’ and ‘control responders’. This has a variance of θC2 across intervention-outcome pairs.

  • (iii)

    Δ̂C,kw = the observed mean difference in the average outcome of ‘control attriters’ and ‘control responders’. This has sampling variance of ψkw2.

For analyses in which we condition on covariates, equivalent parameters have an X superscript. For example, ΔT,kwX is the difference in mean outcomes for ‘treatment attriters’ and ‘treatment responders’, after conditioning on covariates with a linear model.

We conduct four separate meta-analyses, one each for ΔC,ΔT,ΔCX and ΔTX. Once again we compute a set of 22 constrained empirical Bayes estimates for each parameter: Δ~C,Δ~T,Δ~CX and Δ~TX (Weiss et al., 2017). The results, along with core parameter estimates from the meta-analyses, are presented in Figure 5. Raw estimates of Δ̂C,Δ̂T,Δ̂CX and Δ̂TX are provided in Appendix  C (see Figure C1).

FIGURE 5

Estimates of Δ̃ distributions

Note: Appendix B describes the estimation approach for the meta-analyses. ϕ̂ is the estimated mean of each distribution; θ̂ is the estimated standard deviation. Due to the small number of interventions, we could not use θ̂ as the basis of a confidence interval (Higgins et al., 2009).

Figure 5 emphasizes two features of our data. First, conditioning on covariates substantially reduces the perniciousness of attrition mechanisms. The distributions of Δ̃_C^X and Δ̃_T^X are centred much closer to zero than their unadjusted counterparts. Second, it appears as though there is a systematic relationship between attrition and outcomes. Units who leave studies appear to have worse outcomes than responders, even after conditioning on covariates. This is particularly true on the treatment side. Almost all the estimates of Δ̃_T and Δ̃_T^X are negative or essentially zero (magnitude <0.005σ). A similar effect is present on the control side, but it is less pronounced.

To emphasize the potential difference between the treatment and control side, we perform a non-parametric test of the hypothesis that the mean value of Δ~ is the same for both arms, H0:ϕC=ϕT. To generate a draw under the null, we permute treatment status within each study-outcome pair and calculate mean(Δ~C)−mean(Δ~T). We compare the observed value of mean(Δ~C)−mean(Δ~T) with 10,000 draws under the null, and find evidence against the hypothesis that the means are equivalent (p = 0.01). We then test the analogous hypothesis for Δ~X and similarly find that the difference between the means is significant (p = 0.008). In both cases—with and without conditioning on covariates—attrition appears to be more problematic on the treatment side.
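The permutation scheme is simple to implement. In the sketch below, the two arm-level Δ̃ values within each study-outcome pair are randomly swapped, which is one way of permuting treatment status under the null.

```python
# Permutation test for H0: phi_C = phi_T (equal means across arms).
import numpy as np

def permutation_pvalue(delta_c: np.ndarray, delta_t: np.ndarray,
                       n_draws: int = 10_000, seed: int = 1) -> float:
    rng = np.random.default_rng(seed)
    observed = delta_c.mean() - delta_t.mean()
    draws = np.empty(n_draws)
    for i in range(n_draws):
        swap = rng.random(delta_c.size) < 0.5  # swap arms within each pair
        dc = np.where(swap, delta_t, delta_c)
        dt = np.where(swap, delta_c, delta_t)
        draws[i] = dc.mean() - dt.mean()
    return float(np.mean(np.abs(draws) >= abs(observed)))  # two-sided p
```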

The meta-analyses underlying Figure 5 can also be viewed as tests of two common assumptions: ‘Missing At Random’ (MAR) and ‘Missing Completely At Random’ (MCAR). Specifically, if MCAR holds across our studies then the distributions of ΔT and ΔC will have a mean of zero. This is clearly not the case. ϕ̂T and ϕ̂C are negative and their 95% confidence intervals do not include zero (both p-values are <0.001).

There is also some evidence that attrition mechanisms do not meet the MAR assumption. In particular, ϕ̂TX=-0.11 and is significantly different from zero (p = 0.002). The picture on the control side is slightly less clear. After adjusting for covariates, the mean difference between attriters and responders is negative (ϕ̂CX=-0.04) but not significantly different from zero (p=0.11). Overall, however, these results cast substantial doubt on MAR being a plausible assumption in our context.

We see further evidence against the MAR assumption when we examine the 22 individual study-outcome pairs in our data. For each pair we use a likelihood ratio test of the hypothesis that the attrition terms are jointly zero; applied to Model 4 (δ_2 = γ_2 = 0), this is a test of MAR. After correcting for multiple comparisons (Hochberg, 1988) we find that 5 of the 22 pairs reject MAR. Full results of these hypothesis tests are presented in Table 3. In sum, these findings reinforce the conclusions of the distributional analyses and suggest that, in our context, researchers cannot safely assume that attrition mechanisms will meet the MAR assumption.
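A minimal sketch of this test is shown below: the likelihood ratio compares Model 4 with and without the attrition terms, and the Hochberg step-up correction is applied across tests (statsmodels calls this method 'simes-hochberg'). Covariate names are placeholders.

```python
# Likelihood ratio test of MAR (gamma_2 = delta_2 = 0 in Model 4),
# with a Hochberg multiple-comparison correction across outcomes.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.multitest import multipletests

def mar_lr_pvalue(df: pd.DataFrame) -> float:
    covs = "pretest + fsm + female"
    full = smf.mixedlm(f"y ~ t + a + t:a + {covs}", data=df,
                       groups=df["school"]).fit(reml=False)
    restricted = smf.mixedlm(f"y ~ t + {covs}", data=df,
                             groups=df["school"]).fit(reml=False)
    lr = 2 * (full.llf - restricted.llf)
    return float(stats.chi2.sf(lr, df=2))  # two restricted coefficients

# Across all study-outcome pairs:
# reject, p_adj, *_ = multipletests(pvals, alpha=0.05, method="simes-hochberg")
```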

TABLE 3

p-values for MCAR and MAR for individual studies

Missing Completely At Random (H0: δ = γ = 0):
Project | Maths | Reading | Writing | English | Science
asp | 0.394 | 0.373 | - | - | -
cmi | 0.686 | 0.074 | 0.004 | - | -
cmp | 0.069 | 0.514 | 0.062 | - | -
dt | <0.001 | 0.052 | - | - | -
lit | - | - | - | <0.001 | -
mtg | 0.024 | 0.337 | - | - | -
ref | 0.058 | 0.396 | - | - | -
sm | <0.001 | <0.001 | - | - | -
tott | 0.539 | 0.046 | - | - | -
tp | <0.001 | - | - | <0.001 | <0.001

Missing At Random (H0: δ₂ = γ₂ = 0):
Project | Maths | Reading | Writing | English | Science
asp | 0.294 | 0.457 | - | - | -
cmi | 0.314 | 0.326 | 0.168 | - | -
cmp | 0.102 | 0.337 | 0.166 | - | -
dt | 0.035 | 0.851 | - | - | -
lit | - | - | - | <0.001 | -
mtg | 0.018 | 0.441 | - | - | -
ref | 0.137 | 0.468 | - | - | -
sm | 0.001 | 0.309 | - | - | -
tott | 0.968 | 0.112 | - | - | -
tp | <0.001 | - | - | <0.001 | <0.001

This table presents the p-values for hypothesis tests examining the Missing Completely At Random (MCAR; top panel) and Missing At Random (MAR; bottom panel) assumptions. The project acronyms are listed in Table 2. Cells with ‘-’ represent a domain that was not tested. Bold font indicates that individual null hypothesis tests were rejected at α = 0.05, after a Hochberg (1988) multiple-comparison correction.

5.2 What predicts pernicious attrition?

Are there situations in which problematic attrition mechanisms are more likely? Here we examine three possible predictors of non-random attrition, starting with school year. Seven of the 10 trials focused on outcomes from year 6 (age 11–12), which limits our ability to draw conclusions about the effect of year on attrition mechanisms. However, we note that two of the trials with the largest values of Δ̃X were in year 11 (see Figure 6). Moreover, examination of the MAR tests at the study-outcome level shows that four of the five rejections of the MAR hypothesis came from the two trials with outcomes at year 11. As we only have two cases, we are tentative in drawing conclusions. However, we suggest that researchers working with outcomes for later year groups should be especially wary of attrition.

FIGURE 6

Δ̃X by year level

Notes: |Δ̃X| is the magnitude of the association between attrition and outcomes, conditional on covariates. Each panel shows results from 10 RCTs and 22 outcomes. The left panel presents estimates from control students, the right panel from treated students.

Second, we examine whether there is any association between the size of a study and the nature of attrition. We find a positive association between |Δ̃X| and trial size, as seen in Figure 7. A simple regression analysis found that an increase of 1,000 students in a trial arm was associated with an increase in |Δ̃_T^X| of 0.038σ and, similarly, an increase in |Δ̃_C^X| of 0.036σ. The result on the treatment side is much more uncertain, due to greater variation and the strong influence of an outlier. This suggests that there may be a positive association between trial size and bias-inducing attrition mechanisms, although our evidence is not strong.

FIGURE 7

Δ̃X by number of pupils

Notes: |Δ̃X| is the magnitude of the association between attrition and outcomes, conditional on covariates. Each panel shows results from 10 RCTs and 22 outcomes (left = control arm, right = treatment arm). The solid line is from a simple linear regression. The numbers reported on the figure are Pearson correlation coefficients and associated 95% CIs.

Finally, we tested the hypothesis that the rate of attrition is associated with its perniciousness. We find no evidence that attrition mechanisms are more problematic in cases where attrition is high, as summarized in Figure 8, which shows the lack of association between attrition proportion and Δ~X for both treatment and control.

FIGURE 8

Δ̃X by attrition rate

Notes: |Δ̃X| is the magnitude of the association between attrition and outcomes, conditional on covariates. Each panel shows results from 10 RCTs and 22 outcomes (left = control arm, right = treatment arm). The solid line is from a simple linear regression. The numbers reported on the figure are Pearson correlation coefficients and associated 95% CIs.

6 RECOMMENDATIONS FOR APPLIED RESEARCHERS

Evidence presented in section 5 suggests that, in our context, outcome data cannot safely be assumed to be Missing At Random, let alone Missing Completely At Random. We argue that researchers should respond in three ways. First, adjust for covariate differences in estimating treatment effects. This is already common practice, so we do not discuss it further. Second, perform sensitivity analyses to see whether core findings are robust to the types of attrition mechanisms we observe in practice. Third, incorporate ‘attrition bias uncertainty’ into inferences. This section describes the latter two recommendations, culminating in a worked example using the evaluation of the REACH programme (Sibieta, 2016).

6.1 Sensitivity analyses, based on observed attrition mechanisms

Randomized experiments frequently include sensitivity analyses to assess whether results are robust to attrition, often via missing-data imputation. However, given the possibility that MAR is violated, there is a strong argument for going further and exploring the potential influence of attrition bias due to unobserved characteristics.

This in itself is not a novel idea and various approaches to bounding unobserved biases have been proposed (e.g. Manski, 1990). The difficulty with sensitivity analyses is that they often yield very wide ranges that far exceed typical effect sizes. In response, we argue that researchers should ground their sensitivity analyses in estimates of how bias-inducing attrition mechanisms tend to be in practice.

To do this, consider τ̂, the estimated SATE, conditional on hypothesized values Δ_T = Δ*_T and Δ_C = Δ*_C:

(7) τ̂ = τ̂_R + P_T Δ*_T − P_C Δ*_C

Writing P̄ = (P_T + P_C)/2 and Δ̄* = (Δ*_T + Δ*_C)/2, this can be rearranged as:

(8) τ̂ = τ̂_R + P̄ (Δ*_T − Δ*_C) + Δ̄* (P_T − P_C)

Nearly all randomized experiments report both impact estimates (τ̂_R) and attrition rates (P_T and P_C). This means that, if analysts are willing to make assumptions about Δ_T and Δ_C, equation (7) can be widely used to assess how sensitive findings are to attrition. Meanwhile, equation (8) shows that the sensitivity of an impact estimate depends on two factors: differential attrition mechanisms (Δ_T − Δ_C) and differential attrition rates (P_T − P_C). For any given study, the differential attrition rate is known. Consequently, we recommend that researchers take P_T and P_C as given and focus their sensitivity analyses on the potential influence of differential attrition mechanisms (Δ_T − Δ_C).

One way to do this is to generate a simple sensitivity plot with τ on the y-axis and the differential attrition mechanism on the x-axis (either ΔT-ΔC or ΔC-ΔT, depending on whether τ̂ is positive or negative). We suggest that the x-axis span the range from ‘no differential attrition’ to a ‘worst observed case’. This ‘worst observed case’ can be defined by analyses of the sort we present above.

Under ‘no differential attrition’, Δ_T = Δ_C. The worst-case estimates of Δ_C − Δ_T will depend on whether the finding is positive or negative. Consider the case of a positive impact estimate: τ̂_R > 0. This finding will be undermined by positive values of Δ_C − Δ_T. A sufficiently large value of Δ_C − Δ_T will send τ to zero. To find the ‘worst observed case’ we refer researchers to Table C1, which summarizes our 22 estimates of Δ̃_C, Δ̃_T, Δ̃_C^X and Δ̃_T^X. Across our set of 22 outcomes, the worst observed case of attrition for a positive finding is max(Δ̃_C − Δ̃_T) = 0.350. In the case when we control for covariates this value is max(Δ̃_C^X − Δ̃_T^X) = 0.284. Last, to generate the sensitivity analysis, researchers need to select a value of Δ_C (or Δ_C^X). As a default, we recommend using the median observed value (−0.033). The choice of Δ_C is unlikely to be consequential relative to the impact of the differential attrition mechanism (Δ_C − Δ_T). However, researchers who are interested in a more comprehensive sensitivity analysis can simply calculate τ across a two-dimensional grid of Δ_T and Δ_C values. Of course, researchers could choose any parameters they believe are appropriate to include in this grid. But, in keeping with our broader recommendations, we suggest that these values be grounded by attrition mechanisms that have been observed in practice. For example, we suggest that the grid be bounded by the maxima and minima of the relevant Δ parameters reported in Table C1.
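The sketch below implements this recommendation: it recomputes τ via equation (7) over a range of differential attrition mechanisms, from ‘no differential attrition’ to the worst observed covariate-adjusted case (0.284, from Table C1). The impact estimate and attrition rates are placeholders for a hypothetical study with a positive finding.

```python
# Sensitivity of tau to differential attrition mechanisms, per equation (7).
import numpy as np

def tau_given_deltas(tau_r: float, p_t: float, p_c: float,
                     delta_t: np.ndarray, delta_c: float) -> np.ndarray:
    """tau = tau_r + P_T * Delta_T - P_C * Delta_C."""
    return tau_r + p_t * delta_t - p_c * delta_c

delta_c = -0.033                                 # median observed value
delta_t = delta_c - np.linspace(0.0, 0.284, 25)  # differential: none to worst
tau = tau_given_deltas(tau_r=0.10, p_t=0.15, p_c=0.15,
                       delta_t=delta_t, delta_c=delta_c)
print(tau.min())  # does the positive finding survive the worst observed case?
```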

6.2 Incorporating attrition bias into uncertainty estimates

The threat of attrition bias is a source of uncertainty in estimating the SATE on the full sample ($\tau_{FULL}$). This uncertainty should arguably be incorporated into inferences. There are multiple possible approaches, including a fully Bayesian analysis in which the attrition mechanism in each treatment arm is explicitly modelled. For education researchers pursuing this strategy, the results we report here represent a good starting point for priors. However, in keeping with a frequentist framework, we propose augmenting conventional standard errors by the expected magnitude of attrition bias. The result is an inflated ‘rule of thumb standard error’ for the SATE. These adjusted SEs can be thought of as predicting total error, including the error of attrition (for an argument for interpreting SEs as an estimate of error in this way, see Sundberg (2003)). This adjustment can be done for any study, given an initial standard error and attrition rates for treated and control units. This approach can also be viewed as a less conservative sensitivity check: while the prior section investigates the worst case, the following captures a ‘typical case’ (assuming our targeted study is believably similar to our reference studies with regard to bias).

We start with the total error in our estimate, which is the sum of estimation error and attrition bias:

$$\hat{\tilde{\tau}}_R - \tau_{FULL} = \underbrace{\left(\hat{\tilde{\tau}}_R - \tilde{\tau}_R\right)}_{e} + \underbrace{\left(\tilde{\tau}_R - \tau_{FULL}\right)}_{\beta}$$

In the above, $e$ is our estimation error when we take $\hat{\tilde{\tau}}_R$ as an estimate of $\tilde{\tau}_R$; we assume it is unbiased, with $E[\hat{\tilde{\tau}}_R] = \tilde{\tau}_R$. To include uncertainty from attrition, we further assume that attrition bias is an unknown random variable independent of estimation error (i.e. we make the simplifying assumption that $\mathrm{cov}(e, \beta) = 0$). This gives a mean squared error (MSE) of:

$$MSE = E\left[(e + \beta)^2\right] = E\left[\beta^2\right] + SE^2$$

We write $E[\beta^2]$ because attrition bias is not necessarily centred at zero. The above is a decomposition of the overall error; the second term captures estimation error, which is distributed as $e \sim N(0, SE^2)$. We estimate $E[\beta^2]$, tailoring it to our study $s$, using our reference distribution of studies and the observed levels of attrition in $s$; in doing so, we assume our new study is drawn from a similar attrition distribution as our reference set. Using (6), we then have the following expression for $\eta_s^2$, the variance of attrition bias across outcomes:

$$\eta_s^2 = E\left[\beta_s^2 \mid P_{T,s}, P_{C,s}\right] = P_{T,s}^2\, E\left[\Delta_T^2\right] + P_{C,s}^2\, E\left[\Delta_C^2\right] - 2\, P_{T,s} P_{C,s}\, E\left[\Delta_T \Delta_C\right] \tag{9}$$

We condition on $P_T$ and $P_C$, as they are directly observed. The little information we have on $E[\Delta_T \Delta_C]$ or $E[\Delta_{TX} \Delta_{CX}]$ suggests a lack of clear association, so we drop the cross term (we fail to reject the null that $\rho_{\Delta_T, \Delta_C} = \rho_{\Delta_{TX}, \Delta_{CX}} = 0$). Thus, (9) becomes $\eta_s^2 = P_{T,s}^2 E[\Delta_T^2] + P_{C,s}^2 E[\Delta_C^2]$, and we can calculate $\eta_s^2$ from observed attrition and estimates of the squared magnitudes of $\Delta_C$ and $\Delta_T$: $\hat{\eta}_s^2 = P_{T,s}^2\, \overline{\Delta_T^2} + P_{C,s}^2\, \overline{\Delta_C^2}$, where $\overline{\Delta_a^2} = \hat{\theta}_a^2 + \hat{\phi}_a^2$ for arm $a$. Estimates of the $\theta$ and $\phi$ parameters, with and without conditioning on covariates, are presented in Table 4. Finally, we can calculate a revised uncertainty estimate, tailored to the observed level of attrition: $\widehat{SE}_{revised,s} = \sqrt{\hat{\eta}_s^2 + \widehat{SE}_s^2}$.
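
The adjustment is straightforward to apply in code. Below is a minimal sketch, using the covariate-adjusted magnitudes from Table 4 as defaults; the function name and default values are ours rather than part of any published tooling.

```python
# Sketch of the 'rule of thumb' standard error adjustment.
import math

# Covariate-adjusted mean squared magnitudes from Table 4.
DELTA2_T, DELTA2_C = 0.023, 0.008

def revised_se(se, P_T, P_C, d2_T=DELTA2_T, d2_C=DELTA2_C):
    """Inflate a reported SE by the expected squared attrition bias."""
    eta2 = P_T**2 * d2_T + P_C**2 * d2_C   # estimate of E[beta^2] for study s
    return math.sqrt(eta2 + se**2)

print(round(revised_se(0.099, P_T=0.278, P_C=0.323), 3))   # ~0.111 (REACH)
```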

TABLE 4

Summary of attrition parameters

             No covariates                                     Covariate adjusted
             Δ̄² (magnitude²)   θ̂² (variance)   φ̂ (mean)      Δ̄² (magnitude²)   θ̂² (variance)   φ̂ (mean)
Control      0.021              0.007           −0.122        0.008              0.006           −0.040
Treatment    0.065              0.021           −0.210        0.023              0.011           −0.107

Note: Estimates come from our sample of 10 RCTs and 22 outcomes. For model equations and parameter definitions, see Section 5.1.


Attrition-adjusted standard errors are strictly larger than the unadjusted standard errors and can be interpreted as an estimate of the magnitude of our overall error (in fact, they are closer to an estimate of the expected RMSE). Corresponding confidence intervals treat the attrition bias as an unknown, zero-centred variable with a magnitude that depends on the proportion of units attrited and on the evidence from prior studies about the link between attrition and outcomes. Studies with lower rates of attrition will receive smaller adjustments.

6.3 Example: REACH reading intervention

To make these ideas more concrete, we present example analyses using the evaluation of the REACH reading intervention (Sibieta, 2016). This is a randomized experiment in the EEF archive for which the threat of attrition bias is unknown: control units received the treatment immediately after the trial concluded, as part of a ‘wait-list’ design, so the administrative post-test outcomes for control pupils no longer reflect an untreated state. This makes it impossible to use our approach to estimate attrition bias.

The evaluation reports an estimated treatment effect of $\hat{\tilde{\tau}}_R = 0.329$ with a standard error of $\widehat{SE} = 0.099$. Attrition in the two arms of the trial was above average: $P_T = 0.278$ and $P_C = 0.323$. The researchers had access to a pre-test, along with demographic covariates, and conditioned on these variables in estimating the treatment effect. As such, we focus on $\Delta_{CX}$ and $\Delta_{TX}$, which describe attrition mechanisms after conditioning on covariates.

First, we conduct a sensitivity analysis. Note that as $\hat{\tilde{\tau}}_R$ is positive, the finding will be overturned for sufficiently large values of $\Delta_{CX} - \Delta_{TX}$. Using (8), we produce Figure 9:

FIGURE 9 Example sensitivity analysis for REACH. Note: the plot uses reported attrition rates from the REACH trial and presents sensitivity analyses according to equation (8), setting $\tilde{\Delta}_{CX} = -0.033$.

Under the ‘worst observed case’ (across the 22 outcomes in our sample), the estimated average treatment effect declines from 0.329σ to 0.251σ. While this is a marked reduction, the point estimate remains positive and the 95% confidence interval does not include zero. Overall, these sensitivity results suggest that the core finding of the REACH evaluation, that the intervention had a positive average treatment effect, is quite robust to attrition bias. This is despite the fact that the RCT suffered from a rate of attrition that, according to the EEF rating scale, would have limited the evaluation to a maximum quality rating of 3 out of 5.

Second, to assess uncertainty under the assumption that this study is in line with our reference sample in terms of attrition bias, we extend our uncertainty estimates as follows:

$$\hat{\eta}_s^2 = P_T^2\, \overline{\Delta_{TX}^2} + P_C^2\, \overline{\Delta_{CX}^2} = 0.278^2 \times 0.023 + 0.323^2 \times 0.008 \approx 0.0026$$

Then, our adjusted error estimate is:

$$\widehat{SE}_{revised} = \sqrt{\hat{\eta}_s^2 + \widehat{SE}^2} = \sqrt{0.0026 + 0.099^2} \approx 0.111$$

In this case, adjusting for attrition uncertainty results in a modest change: a 12% increase in the standard error, with the width of the 95% uncertainty interval extending from 0.388σ to 0.436σ. This pattern is typical, indicating that attrition, relative to other sources of uncertainty, is usually a minor contributor.
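
As an arithmetic check, the short script below reproduces these REACH figures from the published inputs; differences in the final digit reflect rounding of those inputs.

```python
# Reproducing the REACH worked example from the reported inputs.
import math

tau_R, se, P_T, P_C = 0.329, 0.099, 0.278, 0.323

# Worst observed case: Delta_CX - Delta_TX = 0.284, with Delta_CX = -0.033.
d_C = -0.033
d_T = d_C - 0.284
print(round(tau_R + P_T * d_T - P_C * d_C, 3))   # ~0.25, as reported

# Attrition-adjusted SE and 95% interval widths.
eta2 = P_T**2 * 0.023 + P_C**2 * 0.008
se_rev = math.sqrt(eta2 + se**2)
print(round(2 * 1.96 * se, 3), round(2 * 1.96 * se_rev, 3))  # 0.388, ~0.437
```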

7 LIMITATIONS

Our study of attrition bias has two main limitations. First, the census data are themselves subject to some missingness and difficulties with matching. For each RCT in the archive, we searched the National Pupil Database for students who were present at randomization. Across the 10 RCTs, an average of 1.2% of randomized students could not be matched to the NPD. A further 2.6% of pupils were missing an outcome measure, which meant they were excluded from our analysis. In total, our analysis included an average of 96.2% of the students who were recorded as being present at randomization. We note that this small level of missingness could bias our estimates of attrition bias ($\tilde{\beta}$, $\tilde{\beta}_X$) and of the nature of bias mechanisms ($\tilde{\Delta}$, $\tilde{\Delta}_X$). That said, we found no association between the level of missingness in the National Pupil Database and any of the four aforementioned sets of parameters.

Second, we note that the findings we present here may not generalize easily to other settings. Even within the field of education, we are cautious about generalizing to a broad set of RCTs. The interventions examined here are diverse and typical of educational programmes evaluated with RCTs. However, they generally affected a small percentage of total instruction time and were relatively short-lived, typically lasting less than a year. This may be particularly relevant given the evidence we present suggesting that the relationship between attrition and outcomes is stronger on the treatment side than on the control side. One interpretation of this finding is that the perniciousness of attrition mechanisms could be a function of intervention intensity. This conjecture is something we hope to test formally in future work. More generally, we note that the nature of attrition bias fundamentally depends on the nature of attrition mechanisms, which may differ when moving to a new setting, for example, from school to university contexts.

Raising our sights beyond the field of education, we believe that, at best, our results provide weak priors on the nature of attrition mechanisms. More importantly, we hope that this work provides researchers in other disciplines with a framework and a set of tools for analysing attrition bias in their own settings. While data will often be the limiting factor, the increasing prevalence of publicly available administrative datasets may provide opportunities for progress.

8 CONCLUSION

Overall, the analyses presented here suggest that the threat of attrition bias is limited in our context. While attrition is a highly salient risk, other threats—for example, external validity bias due to non-random sampling—may be substantially more problematic in terms of generating practical, useable knowledge from education evaluations.

This is not to say that attrition mechanisms can safely be treated as ‘missing at random’ or ‘missing completely at random’. We find evidence that students who leave studies tend to perform worse than those who remain. This pattern is particularly pronounced for treated students. Moreover, this tendency persists even after conditioning on baseline achievement. That said, we re-emphasize that these associations do not appear to be strong enough to induce large-scale bias.

We suggest that researchers respond to this evidence in two main ways: incorporating ‘attrition bias uncertainty’ into their inferences and completing sensitivity analyses using empirically grounded estimates of attrition mechanisms that have been observed in practice. We also recommend that, consistent with common practice, researchers present treatment effects after conditioning on observed covariates.

As more studies are added to the RCT archive, we intend to present a more detailed picture of attrition mechanisms, including an exploration of why some evaluations seem to suffer from pernicious attrition. In the meantime, it seems sensible to presume that students who are missing differ in unobserved ways from those who stay involved in research, while noting that these differences often lead to relatively minor levels of attrition bias.

ACKNOWLEDGEMENTS

We are grateful to the UK Department for Education for providing access to the National Pupil Database, and to the Education Endowment Foundation for providing access to their archive of randomized controlled trials. We are also grateful to the editors and anonymous reviewers at the JRSS(A) for their valuable suggestions. This work reflects the views of the authors and none of the aforementioned parties.

REFERENCES

Bell, S.H., Olsen, R.B., Orr, L.L. & Stuart, E.A. (2016) Estimates of external validity bias when impact evaluations select sites nonrandomly. Educational Evaluation and Policy Analysis, 38(2), 318–335.

Brunton-Smith, I., Carpenter, J., Kenward, M. & Tarling, R. (2014) Multiple imputation for handling missing data in social research. Social Research Update, 65.

Carpenter, J. & Plewis, I. (2011) Analysing longitudinal studies with non-response: issues and statistical methods. In: Williams, M. & Vogt, W.P. (Eds.) The SAGE Handbook of Innovation in Social Research Methods. SAGE Publications.

Connolly, P., Keenan, C. & Urbanska, K. (2018) The trials of evidence-based practice in education: a systematic review of randomised controlled trials in education research 1980–2016. Educational Research, 60(3), 276–291.

Crawford, C. & Skipp, A. (2014) LIT Programme. IFS and NatCen.

Crutzen, R., Viechtbauer, W., Kotz, D. & Spigt, M. (2013) No differential attrition was found in randomized controlled trials published in general medical journals: a meta-analysis. Journal of Clinical Epidemiology, 66(9), 948–954.

Crutzen, R., Viechtbauer, W., Spigt, M. & Kotz, D. (2015) Differential attrition in health behaviour change trials: a systematic review and meta-analysis. Psychology & Health, 30(1), 122–134.

Deke, J. & Chiang, H. (2017) The WWC attrition standard: sensitivity to assumptions and opportunities for refining and adapting to new contexts. Evaluation Review, 41(2), 130–154.

Demack, S., Maxwell, B., Coldwell, M., Stevens, A., Wolstenholme, C., Reaney-Wood, S. et al. (2020) Review of EEF Reports. Education Endowment Foundation.

DfE (2015) Statistical First Release: Schools, pupils and their characteristics: January 2015. UK Department for Education.

Dong, N. & Lipsey, M.W. (2011) Biases in estimating treatment effects due to attrition in randomized controlled trials and cluster randomized controlled trials. Paper presented at the Society for Research on Educational Effectiveness.

Dorsett, R., Rienzo, C., Rolfe, H., Burns, H., Robertson, B., Thorpe, B. et al. (2014) Mind the Gap. National Institute of Economic and Social Research.

EEF (2014) Classification of the security of findings from EEF evaluations. Education Endowment Foundation. Available at: https://v1.educationendowmentfoundation.org.uk/uploads/pdf/Classifying_the_security_of_EEF_findings_FINAL.pdf

Goldstein, H., Carpenter, J.R. & Browne, W.J. (2014) Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non-linear terms. Journal of the Royal Statistical Society: Series A (Statistics in Society), 553–564.

Greenberg, D. & Barnow, B.S. (2014) Flaws in evaluations of social programs: illustrations from randomized controlled trials. Evaluation Review, 38(5), 359–387.

Haywood, S., Griggs, J., Lloyd, C., Morris, S., Kiss, Z. & Skipp, A. (2015) Creative Futures: Act, Sing, Play. NatCen.

Higgins, J.P., Thompson, S.G. & Spiegelhalter, D.J. (2009) A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 137–159.

Hirshleifer, S., Ortiz Becerra, K. & Ghanem, D. (2019) Testing attrition bias in field experiments. Paper presented at the Agricultural & Applied Economics Association, Atlanta.

Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4), 800–802.

Jay, T., Willis, B., Thomas, P., Taylor, R., Moore, N., Burnett, C. et al. (2017) Dialogic Teaching. Sheffield Hallam University.

Killip, S., Mahfoud, Z. & Pearce, K. (2004) What is an intracluster correlation coefficient? Crucial concepts for primary care researchers. Annals of Family Medicine, 2(3), 204–208.

Lewis, M.S. (2013) A series of sensitivity analyses examining the What Works Clearinghouse's guidelines on attrition bias. PhD dissertation, Ohio University.

Little, R.J. & Rubin, D.B. (2019) Statistical Analysis with Missing Data. John Wiley & Sons.

Lloyd, C., Edovald, T., Morris, S., Skipp, A., Kiss, Z. & Haywood, S. (2015) Durham Shared Maths Project. NatCen Social Research.

Lortie-Forgues, H. & Inglis, M. (2019) Rigorous large-scale educational RCTs are often uninformative: should we be concerned? Educational Researcher, 48(3), 158–166.

Manski, C.F. (1990) Nonparametric bounds on treatment effects. The American Economic Review, 80(2), 319–323.

Miller, S., Davison, J., Yohanis, J., Sloan, S., Gildea, A. & Thurston, A. (2016) Texting Parents. Queen's University Belfast.

Motteram, G., Choudry, S., Kalambouka, A., Hutcheson, G. & Barton, A. (2016) ReflectED. Manchester Institute of Education.

Rienzo, C., Rolfe, H. & Wilkinson, D. (2015) Changing Mindsets. National Institute of Economic and Social Research.

Rubin, D.B. (1987) Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Shadish, W.R., Hu, X., Glaser, R.R., Kownacki, R. & Wong, S. (1998) A method for exploring the effects of attrition in randomized experiments with dichotomous outcomes. Psychological Methods, 3(1), 3–22.

Sibieta, L. (2016) REACH Evaluation Report. Institute for Fiscal Studies.

Sterne, J.A., White, I.R., Carlin, J.B., Spratt, M., Royston, P., Kenward, M.G. et al. (2009) Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 338.

Sundberg, R. (2003) Conditional statistical inference and quantification of relevance. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 299–315.

Thurston, A., Roseth, C., O'Hare, L., Davison, J. & Stark, P. (2016) Talk of the Town. Queen's University Belfast.

Tipton, E. & Olsen, R.B. (2018) A review of statistical methods for generalizing from evaluations of educational interventions. Educational Researcher, 47(8), 516–524.

Weidmann, B. & Miratrix, L. (2020) Lurking inferential monsters? Quantifying selection bias in evaluations of school programs. Journal of Policy Analysis and Management. https://doi.org/10.1002/pam.22236

Weiss, M.J., Bloom, H.S., Verbitsky-Savitz, N., Gupta, H., Vigil, A.E. & Cullinan, D.N. (2017) How much do the effects of education and training programs vary across sites? Journal of Research on Educational Effectiveness, 1–34.

WWC (2014) Assessing Attrition Bias. What Works Clearinghouse.

WWC (2017) Standards Handbook 4.0. What Works Clearinghouse.

APPENDIX A Simulation-based uncertainty estimates

To generate uncertainty estimates for attrition bias, we conduct a simulation-based procedure. We use simulation-based inference rather than conventional standard errors to account for two dependencies in our data: first, the responder sample is a sub-sample of the full sample; second, within each study bias estimates across outcomes will be correlated.

We perform two related procedures. The first generates uncertainty estimates for $\hat{\beta}_{kw}$. Here we simulate a world in which attrition is completely random, that is, a world in which the MCAR assumption is true by design. We condition on several dimensions: the number of pupils at randomization ($N$), the observed attrition rate ($P$), the observed treatment assignment ($T^{obs}$), and observed outcomes ($Y^{obs}$). For each study, we complete the following two-step process 1000 times:

  • (a)

    Permute the observed binary attrition indicator. We define the ‘null responder’ sample as all units for whom this permuted indicator is equal to zero.

  • (b)

Generate an estimate of $\hat{\beta}_{kw}^{Null}$ for each outcome by estimating Model 1 twice: once for the full sample and once for the ‘null responder’ sample.

The standard deviation of the difference between these two estimates across the 1000 replicates gives an estimated standard error for the difference under the null. We can also obtain the correlation structure of the estimated differences for outcomes nested within a given study.
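
A compact sketch of this permutation procedure is given below, assuming a pandas DataFrame `df` with columns `Y` (outcome), `treat` (treatment indicator) and `attrit` (attrition indicator), and using a simple difference in means as a stand-in for Model 1. The column names and `estimate()` helper are ours, not the authors' replication code.

```python
# Permutation-based null distribution for the bias estimate (MCAR by design).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def estimate(d):
    """Difference in means: a stand-in for the paper's Model 1."""
    return d.loc[d['treat'] == 1, 'Y'].mean() - d.loc[d['treat'] == 0, 'Y'].mean()

def null_se(df, reps=1000):
    """SE of the responder-vs-full gap when attrition is random by design."""
    full = estimate(df)
    draws = []
    for _ in range(reps):
        permuted = rng.permutation(df['attrit'].to_numpy())
        null_responders = df[permuted == 0]   # 'null responder' sample
        draws.append(estimate(null_responders) - full)
    return np.std(draws)

# Example with synthetic data:
df = pd.DataFrame({'Y': rng.normal(size=200),
                   'treat': rng.integers(0, 2, size=200),
                   'attrit': rng.integers(0, 2, size=200)})
print(null_se(df))
```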

The second procedure generates uncertainty estimates for $\hat{\beta}_{kw}^X$. Here we simulate a world in which attrition is determined by observed covariates $X$. We again condition on several dimensions: $N$, $P_T$, $P_C$, $T^{obs}$ and $Y^{obs}$. For each study, we complete the following three-step process 1000 times:

  • (a)

    Fit two propensity score models for attrition:

    • (i)

      $P(A_i^{obs} = 1 \mid T_i^{obs} = 1) = \text{logit}^{-1}(\gamma_t X_i)$

    • (ii)

      $P(A_i^{obs} = 1 \mid T_i^{obs} = 0) = \text{logit}^{-1}(\gamma_c X_i)$

  • (b)

Define a set of responders, based on each student's observed covariate profile $X_i$.

    • (i)

      For treated units, draw from $\text{Bern}\left(1 - \hat{P}(A_i^{obs} = 1 \mid T_i^{obs} = 1)\right)$. If this equals one then the unit is a ‘null treatment responder’.

    • (ii)

      For control units, draw from $\text{Bern}\left(1 - \hat{P}(A_i^{obs} = 1 \mid T_i^{obs} = 0)\right)$. If this equals one then the unit is a ‘null control responder’.

  • (c)

Generate an estimate of $\hat{\beta}_{kw}^{X,Null}$ by estimating Model 2 twice: once for the full sample and once for the ‘null responder’ sample.

Models are defined in section 4.
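
For the covariate-driven procedure, a sketch of steps (a) and (b) might look as follows. The use of scikit-learn's `LogisticRegression`, and the column names, are assumptions carried over from the previous sketch, not the authors' implementation.

```python
# Draw a 'null responder' sample in which attrition depends only on X.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def covariate_null_responders(df, x_cols):
    """One replicate of steps (a) and (b): fit per-arm attrition
    propensities, then keep each unit with probability 1 - p_attrit."""
    keep = np.zeros(len(df), dtype=bool)
    for arm in (0, 1):
        mask = (df['treat'] == arm).to_numpy()
        # Step (a): propensity score model for attrition within this arm.
        model = LogisticRegression().fit(df.loc[mask, x_cols],
                                         df.loc[mask, 'attrit'])
        p_attrit = model.predict_proba(df.loc[mask, x_cols])[:, 1]
        # Step (b): responder if a Bern(1 - p_attrit) draw equals one.
        keep[mask] = rng.random(mask.sum()) < (1.0 - p_attrit)
    return df[keep]
```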

APPENDIX B Meta-analysis Details

This appendix presents our approach to generating constrained empirical Bayes estimates of bias. We use a meta-analysis framework to model ‘attrition sampling variation’ (Weidmann & Miratrix, 2020). Our observed bias estimates $\hat{\beta}_{kw}$ are assumed to be made up of several components:

$$\hat{\beta}_{kw} = v + (\beta_{kw} - v) + \epsilon_{kw}, \qquad \beta_{kw} \sim N(v, \eta^2), \quad \epsilon_{kw} \sim N(0, \sigma_{kw}^2)$$

Where:

  • (a)

    v = the mean attrition bias across all interventions and outcomes

  • (b)

$\beta_{kw}$ = the true attrition bias for outcome $k$ in intervention $w$. This has a variance of $\eta^2$, reflecting the fact that attrition bias may vary due to context, the nature of the programme and so on.

  • (c)

Observed bias $\hat{\beta}_{kw}$ deviates from underlying bias $\beta_{kw}$ with a variance of $\sigma_{kw}^2$. This sampling variation largely depends on how many schools participated in intervention $w$.

To estimate the variance of bias, $\widehat{\mathrm{var}}(\beta_{kw}) = \hat{\eta}^2$, we use the method-of-moments approach from Higgins et al. (2009):

$$\hat{\eta}^2 = \max\left\{0,\; \frac{Q - (K - 1)}{\sum_{k,w} \hat{w}_{kw} - \sum_{k,w} \hat{w}_{kw}^2 \big/ \sum_{k,w} \hat{w}_{kw}}\right\}, \qquad Q = \sum_{k,w} \hat{w}_{kw}\left(\hat{\beta}_{kw} - \bar{\beta}\right)^2$$

where $\hat{w}_{kw} = \hat{\sigma}_{kw}^{-2}$ and $\bar{\beta}$ is the inverse-variance weighted mean of the $\hat{\beta}_{kw}$. Estimates of $\hat{\sigma}_{kw}^2$ come from the simulations under the null described in Appendix A: $\hat{\sigma}_{kw}^2 = \mathrm{var}(\hat{\beta}_{kw}^{Null})$. $K$ is the effective sample size and is a function of the total number of outcomes $N$, the mean number of outcomes per study $\bar{k}$ and the estimated intra-class correlation of $\hat{\beta}$ ($\hat{\rho}$), as per Killip et al. (2004):

$$K = \frac{N}{1 + (\bar{k} - 1)\hat{\rho}}$$

The estimate $\hat{\rho}$ comes from a multilevel model in which $\hat{\beta}_{kw} \sim N(\alpha_w, \sigma_e^2)$, $\alpha_w \sim N(\gamma_0, \sigma_a^2)$, and $\hat{\rho} = \hat{\sigma}_a^2 / (\hat{\sigma}_a^2 + \hat{\sigma}_e^2)$. Letting $\hat{\omega}_{kw} = (\hat{\sigma}_{kw}^2 + \hat{\eta}^2)^{-1}$, we estimate the mean attrition bias:

$$\hat{v} = \frac{\sum_{k,w} \hat{\omega}_{kw} \hat{\beta}_{kw}}{\sum_{k,w} \hat{\omega}_{kw}}$$

Next, we generate simple, parametric empirical Bayes estimates of the attrition bias for intervention $w$ and outcome $k$:

$$\tilde{\beta}_{kw} = \hat{\lambda}_{kw}\,\hat{v} + \left(1 - \hat{\lambda}_{kw}\right)\hat{\beta}_{kw}$$

where $\hat{\lambda}_{kw} = \hat{\sigma}_{kw}^2 \big/ \left(\hat{\sigma}_{kw}^2 + \hat{\eta}^2\right)$.

While individual estimates of $\beta_{kw}$ minimize RMSE, an empirical distribution based on these estimates will underestimate the variability in bias across studies and outcomes (Weiss et al., 2017). As such, we follow the procedure of Weiss et al. (2017, p.13) and scale our shrunken estimates so that their variance equals $\hat{\eta}^2$.

We follow the same procedure for the other parameters of interest: $\beta_X$, $\Delta_T$, $\Delta_C$, $\Delta_{TX}$ and $\Delta_{CX}$.
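
A sketch of the shrink-then-rescale step is given below, taking as inputs arrays of raw bias estimates and their null sampling variances from Appendix A, plus an estimate of $\eta^2$. The function and variable names are ours.

```python
# Constrained empirical Bayes: shrink toward the grand mean, then rescale.
import numpy as np

def constrained_eb(b_hat, s2, eta2):
    """Parametric EB shrinkage, rescaled so the variance matches eta^2."""
    b_hat, s2 = np.asarray(b_hat), np.asarray(s2)
    omega = 1.0 / (s2 + eta2)
    v_hat = np.sum(omega * b_hat) / np.sum(omega)   # precision-weighted mean
    lam = s2 / (s2 + eta2)                          # weight on the grand mean
    shrunk = lam * v_hat + (1.0 - lam) * b_hat      # EB estimates
    # Weiss et al. (2017) constraint: restore cross-outcome variability.
    scale = np.sqrt(eta2 / np.var(shrunk, ddof=1))
    return v_hat + scale * (shrunk - v_hat)
```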

APPENDIX C Parameter estimates

TABLE C1

Summary values of Δ̃ parameters

                                 No covariates                     Covariate adjusted
Outcome   Project   Year group   Δ̃_C      Δ̃_T      Δ̃_C−Δ̃_T      Δ̃_CX     Δ̃_TX     Δ̃_CX−Δ̃_TX
Reading   asp       2            −0.091   −0.183   0.092         −0.048   −0.102   0.054
Maths     asp       2            −0.027   −0.110   0.083         0.049    −0.045   0.093
Reading   cmi       6            −0.101   −0.290   0.189         0.009    −0.140   0.149
Writing   cmi       6            −0.195   −0.327   0.131         −0.066   −0.146   0.079
Maths     cmi       6            −0.051   −0.153   0.102         0.079    −0.024   0.103
Reading   cmp       6            −0.125   −0.086   −0.038        −0.011   −0.031   0.020
Writing   cmp       6            −0.157   −0.062   −0.095        −0.069   0.002    −0.071
Maths     cmp       6            −0.145   −0.049   −0.096        −0.045   0.001    −0.047
Reading   dt        6            −0.106   −0.094   −0.013        0.007    −0.008   0.014
Maths     dt        6            −0.266   −0.189   −0.077        −0.125   −0.076   −0.049
Maths     mtg       6            −0.074   −0.277   0.204         0.046    −0.238   0.284
Reading   mtg       6            −0.070   −0.118   0.048         0.034    −0.091   0.125
Maths     ref       6            −0.120   −0.286   0.166         −0.068   −0.183   0.115
Reading   ref       6            −0.010   −0.160   0.151         0.081    −0.005   0.086
Maths     sm        6            −0.139   −0.467   0.328         −0.029   −0.230   0.202
Reading   sm        6            −0.128   −0.347   0.219         −0.028   −0.090   0.063
Reading   tott      6            −0.102   −0.267   0.165         −0.037   −0.181   0.145
Maths     tott      6            −0.089   −0.104   0.015         −0.024   0.003    −0.027
English   lit       11           0.023    −0.213   0.236         −0.112   −0.175   0.064
Science   tp        11           −0.263   −0.613   0.350         −0.175   −0.434   0.259
Maths     tp        11           −0.140   −0.227   0.086         −0.146   −0.115   −0.031
English   tp        11           −0.302   0.000    −0.301        −0.209   −0.046   −0.163
Min                              −0.302   −0.613   −0.301        −0.209   −0.434   −0.163
Median                           −0.113   −0.186   0.097         −0.033   −0.091   0.072
Max                              0.023    0.000    0.350         0.081    0.003    0.284

Note: The table presents all Δ̃ values, along with their minima, medians and maxima, for attrition parameters across 10 RCTs and 22 outcomes. ‘No covariates’ indicates that attrition parameters have been estimated without controlling for any observed characteristics; ‘Covariate adjusted’ estimates condition on observed characteristics. All cells present constrained empirical Bayes estimates of Δ parameters, as described in section 4.3 and Appendix A. The project acronyms are defined in Table 1.


FIGURE C1 Distribution of $\hat{\Delta}$ and $\hat{\Delta}_X$ for control arms (top panel) and treatment arms (bottom panel).
