Decomposing Learning Inequalities in East Africa: How Much Does Sorting Matter?

Economic inequalities reflect inequalities in educational opportunities, which in turn are due to both household and school-related factors. Although these factors plausibly co-vary, few studies have considered the extent to which sorting between schools and households might aggravate educational inequalities. To fill this gap we develop a novel variance decomposition and apply it to data on educational outcomes for over 1 million children from East Africa. Our results indicate that sorting accounts for as much as 8 percent of the test-score variance, a figure similar in magnitude to the contribution of differences in school quality alone. Empirical simulations of steady-state educational inequalities show that policies to mitigate sorting between households and schools could further reduce educational inequalities substantially over the long run, by an amount equivalent to cutting the inter-generational persistence of educational attainment by more than half.


Introduction
Children in low- and middle-income countries face a wide range of educational and developmental challenges (Pritchett, 2013; Black et al., 2017; Lu et al., 2020). Furthermore, average national conditions mask considerable within-country variation, evidenced by large gaps in educational achievement along dimensions typically associated with household income inequalities (e.g., Watkins, 2012; Behrman, 2020). Inequalities associated with educational achievement feed into subsequent labour market inequalities, as well as longer-term differences in social mobility. And while the presence of such educational inequalities represents a call to action, as per the UN's Sustainable Development Goal to 'leave no child behind', their drivers remain only partly understood, particularly in lower-income countries.
In this paper we focus on educational inequality arising from one potentially important yet understudied source: sorting between schools and households. While a vast literature attests that both household and school factors are key determinants of educational achievement (e.g., Coleman, 1966; Björklund and Salvanes, 2011; Hanushek and Rivkin, 2012), these factors are unlikely to be independent. Various processes, such as residential segregation (Hanushek and Yilmaz, 2007) or endogenous selection of teachers into schools (Hanushek et al., 2005), have been connected to differences in the quality of public goods. Sorting between households and schools, defined generically as the covariance in quality-adjusted inputs, may thus accentuate inequalities in educational opportunities. Where material, sorting may not only confound precise estimation of the unique contribution of schools or households to educational outcomes, but it would also justify enhanced attention to how these inputs are distributed.
To date, the bulk of research on educational sorting pertains to richer countries. Various studies have shown that children from disadvantaged families disproportionately attend lower-quality schools (Nechyba, 2006; Hsieh and Urquiola, 2006), in part because teachers in socio-economically deprived areas tend to be less experienced or less qualified (Jackson, 2009; Clotfelter et al., 2011; Sass et al., 2012; OECD, 2018). For the USA, Jackson (2013) shows the quality of the match between teachers and children is economically important, representing around two thirds the explanatory power of teacher effects. Even so, the overall significance of these processes remains unsettled. Kremer (1997) argues that eliminating neighbourhood segregation would decrease long-run educational inequality in the USA by less than two percent. In contrast, Fernández and Rogerson (2001) develop a model where enhanced sorting can have much larger effects on inequality (also Fernandez, 2003); and data from Canada suggest that elimination of sorting either by home language or by parental education could reduce test-score variance by as much as 40 percent in some subjects, at least in locations where school segregation is substantial (Friesen and Krauth, 2007).
In developing countries, there is growing awareness of the need for education policies and systems as a whole to work better to enhance learning (World Bank, 2018). Early research, including on inequality, tended to focus on shortcomings in specific inputs. In Guatemala, for example, McEwan and Trowbridge (2007) decompose the large achievement gap between indigenous and non-indigenous children, arguing it is predominantly attributable to schooling attributes rather than family background; and similar findings have been reported for other Latin American countries (McEwan, 2004). However, evidence from other countries, like Vietnam, of more equitable treatment of disadvantaged children (Glewwe et al., 2017) highlights the importance of systemic policies. More recently it has been noted that some innovative educational policies, such as vouchers, might increase segregation and even harm public schools (Mbiti, 2016). Yet there is still almost no research on sorting in these contexts, which (if material) could substantially undermine the efficacy of educational policies.
One reason for the lack of attention to sorting is the methodological challenges it presents, particularly when the factors of interest are incompletely observed. Our first contribution here, therefore, is to set out a new decomposition procedure that provides estimates of the variance contributions of two fixed-effects (households and schools) plus their covariance (sorting). Our approach has advantages over existing methods, which require highly interconnected data, and thus is more widely applicable in low-income country contexts where data tends to be limited. Furthermore, our approach is quite general and can be easily adapted to other contexts, such as to quantify the degree of sorting of teachers into schools or workers into firms.
Our second contribution consists in applying our procedure to a large dataset on educational achievement, comprising over one million children in three East African countries (Kenya, Tanzania and Uganda). This analysis yields some of the first estimates of the impact of household-school sorting on educational inequalities, as well as the magnitude of the standalone variance contributions of household and school factors. In addition, we simulate the long-term gains for educational equality were the magnitude of sorting to be reduced.
To preview our main findings, the East African data reveal a very substantial degree of spatial segregation in age-adjusted educational outcomes across communities, i.e., location does matter for learning. But even after controlling for this form of spatial clustering, we find the contribution of sorting of children across schools within communities is positive, accounting for around 8 percent of the total test-score variance. This means that almost a fifth of the joint variation in test-scores due to systematic supra-individual circumstances (schools, communities, and households) can be attributed to sorting. Further analysis of heterogeneity suggests the same sorting contribution tends to be larger among families that send their children to private schools, as well as in communities with lower school competition and greater socioeconomic inequality.
Our simulations show that for the average district in the region, the steady-state level of educational inequality would fall by around 15 percent if sorting between households and schools were fully eliminated and up to 30 percent if sorting across communities were also cut to zero. Roughly speaking, this would be equivalent to cutting the magnitude of the inter-generational persistence of education by more than half. Moreover, in a significant number of districts, the reduction in inequality from eliminating all forms of sorting would be over 40 percent of the total variance. Together, our results suggest that if educational policies in lower-income countries can counter the consequences of sorting between schools and households, there is much potential to narrow existing gaps in education achievement.
The rest of the paper is structured as follows: Section 2 provides a brief review of existing literature on the drivers of educational inequality. Section 3 introduces our proposed variance decomposition method. Section 4 describes the data. Section 5 presents the main empirical results from the decomposition analysis as well as the policy simulations. Section 6 summarises and concludes.

Drivers of educational inequality
At the most basic level, drivers of educational inequality might be taken to encompass any factor that produces differences in learning outcomes across children. However, following Roemer (1996) who advocates for a focus on equality of opportunity, being the most universally supported conception of justice in advanced societies, we narrow our attention to inequality of educational opportunity (IEO), defined as those factors beyond the influence of individual children that affect learning outcomes.
Without attempting to cover the huge literature on IEO (for discussion see Reynolds and Teddlie, 2000; Gamoran and Long, 2007; Raffo et al., 2007; Watkins, 2012), as a prelude to the remainder of the paper, Table 1 provides a selective sample of existing studies that shed light on the contribution of supra-individual factors to inequalities in test-scores. The studies were chosen to cover the predominant range of methods and (to the extent possible) consider developing country contexts. As per most accounting procedures, all these studies measure outcome inequality using the variance. This reflects three attractive properties: the variance is ordinally invariant to standardization procedures often used to express test-scores on a comparable scale (Ferreira and Gignoux, 2014); the linear additive nature of the variance makes it straightforward to isolate the contributions of individual factors (Shorrocks, 1982); and it permits sub-group decomposability (Chakravarty, 2001).
The studies in the table are classified according to four distinct methodological approaches. Panel (i) highlights studies that focus on a single main factor, such as households or schools. In most cases, these studies recognise that observed dimensions of the chosen factor (e.g., parental education or books per pupil) are partial, necessitating use of random- (RE) or fixed-effects (FE) procedures that treat the specified factors as unobserved. Panel (ii) highlights two recent studies that take the fixed-effects approach further. Namely, based on a first-stage regression containing a single main unobserved fixed-effect (classrooms), the authors are able to further partition each estimated effect into two or more sub-components, focusing in particular on the 'permanent' contribution of individual teachers. The studies in panel (iii) use mixed-effects estimators, which represent an alternative and popular means to estimate unobserved effects at multiple levels. But this necessitates making strict assumptions about the correlation structure of the unobserved factors, including their mutual independence.
The studies in panel (iv) relax this latter assumption, but do so by combining one unobserved factor (estimated via fixed-effects) and proxying the other factor by observed covariates (OE). In so doing, they are able to estimate the covariance between the two factors (the sorting contribution; see below). For instance, in his analysis of rich test-score data from North Carolina, Mansfield (2015) finds that student background (which includes family and individual characteristics as well as past performance) accounts for 60% of the variation in outcomes, while overall school quality (including teachers) accounts for about 6%. The covariance of these factors is also positive and, at 4% of the variance, similar in magnitude to the school effect.
Our classification hints at some of the methodological challenges plaguing identification of the drivers of IEO. First, treating a factor as unobserved requires multiple observations for each element (index) of each factor. Data limitations thus explain why measures of the contribution of family background to IEO in developing countries have predominantly relied on observed proxies (e.g., Ferreira and Gignoux, 2014). To our knowledge, in such contexts these so-called sibling correlations have only been estimated for income and educational attainment (e.g., Dahan and Gaviria, 2001; Louw et al., 2007; Emran and Shilpi, 2015), not learning outcomes. Second, while scholars have largely converged on value-added models to identify the causal effects of specific inputs within education production functions (see Todd and Wolpin, 2003), these models cannot easily incorporate multiple high-dimensional unobserved factors. Under conventional OLS estimation methods, combining a lagged outcome with fixed-effects introduces bias (Nickell, 1981), which will only be negligible if a large number of observations are available to identify each effect. And in mixed-effects estimators, strict exogeneity of the (initial) unobserved effects must be assumed for value-added specifications to be consistent.
Third, from the perspective of estimating the contribution of sorting, existing studies either rule this term out by assumption or reduce one of the factors to its observed components, implying only a partial view of this factor and its (co)variance can be captured. For this reason, such models have been rarely used either to consider questions of sorting or to estimate the relative importance of different components of IEO. Technical challenges in accurate estimation of multiple unobserved fixed-effects have also limited the development of more flexible models. Early on, Abowd et al. (2002) showed that the joint contribution of two-way fixed-effects can be obtained fairly easily (for implementation in Stata see Correia, 2017). The drawback is that where the individual effects themselves are of direct interest, their pairwise correlation tends to be biased downwards. Andrews et al. (2008) demonstrate this is driven by a quasi-mechanical relation, whereby if one factor (e.g., household-effects) is over-estimated then on average the other factor (e.g., schools) is under-estimated. And, as Jochmans and Weidner (2019) prove, this bias is larger where there is limited mobility or overlap of units across different effects, the limiting case being where their bipartite graph is disconnected.
Development of two-way fixed effects estimators has focussed predominantly on bias-corrections in the context of the contribution of workers and firms to the variance of log-earnings. Bonhomme et al. (2020), for instance, show the variance contribution due to the sorting of workers across firms increases from around zero to 30% when bias-corrections are applied. Even so, not only do these methods remain work-in-progress, but they also demand rich longitudinal data, with a very large connected set of factors. In our case, this would require observations of members of the same household across different schools, including connections that span different geographic locations. This kind of data is rarely available even in high-income countries (for an exception, see Thiemann, 2018), motivating an alternative approach to identify sorting, to which we now turn.
Before doing so, a comment on the findings summarised in Table 1 is warranted. While it is inappropriate to compare effect sizes across studies due to differences in tests, methods and populations, it is nonetheless clear that variation in school (teacher) quality constitutes just one part of overall IEO. As also summarised by Hanushek and Rivkin (2012) for the USA, less than a quarter of the variance in test-scores can be attributed to dispersion in teacher quality. In contrast, as stressed early on by the Coleman Report (1966), the home environment consistently seems to be of great(er) importance, often accounting for more than half of the overall test-score variance (also Björklund and Salvanes, 2011). And this broad pattern appears to extend to lower-income countries, although previous studies have not covered both types of factors simultaneously.

Preliminaries
To fill the methodological gap in the literature, we propose a flexible decomposition procedure. This has the advantage of treating two separate effects as unobserved (here, households and schools); it does not impose any a priori restrictions on their correlation structure; and it does not depend on the (biased) factor covariance identified from a two-way fixed effects estimator. To begin, consider a simple (unconditional) factor model:
$$t_{ijk} = f(h_j, s_k) + e_{ijk} \tag{1}$$
where $t$ is a measure of educational achievement (e.g., test-scores); indexes $i$, $j$ and $k$ refer to individual children, families and schools respectively; factor $h_j$ represents the contribution of all factors shared by children in the same household to test-scores, including common genetic endowments; and $s_k$ represents the contribution of the given school to their learning. In both cases we assume these measures are comprehensive, in the sense that they reflect cumulative inputs until now. Finally, $e_{ijk}$ represents the remaining individual or idiosyncratic variation, which we assume is orthogonal to the household- and school-effects: $E(e_{ijk} \mid s_k, h_j) = 0$.
To make this framework tractable for an empirical analysis of inequality, we must place some structure on the IEO term, $f(\cdot)$. With respect to the levels expression in equation (1), we adopt a simple additive linear model. However, without further assumptions, this does not pin down the (co)variance structure. To make this clear, Table 2 describes four alternative cases, where each row invokes specific assumptions about the level and variance of $t$. Row 1 invokes a zero-covariance assumption, corresponding to the most restrictive random-effects models described earlier (e.g., panel iii, Table 1). However, this effectively rules out a range of sorting processes, including residential segregation, school choice (by parents) and some allocation rules.
Following Solon et al. (2000) and others, a practical way to relax this restriction involves regressing the outcome on a set of household dummies only. As per Row 2 of the table, this will capture both the stand-alone household-effect and the effect of all unobserved variables correlated with this factor (also Raaum et al., 2006; Björklund and Salvanes, 2011). To see this, imagine part of the contribution of schools to learning reflects a direct causal effect of (average) constituent households, such as when households make direct financial or time commitments to school functioning. This kind of mechanism is also suggested by versions of cream-skimming models, where average peer quality in a school (or class) is driven by household characteristics, which in turn directly influences individual achievement (Walsh, 2009). So, assuming $s$ can be partitioned into a component that is oblique or parallel to $h$ plus an orthogonal component $\nu$, with own variance $\sigma_\nu^2$, we have:
$$s_k = \gamma h_j + \nu_k \quad\Rightarrow\quad t_{ijk} = (1+\gamma)h_j + \nu_k + e_{ijk} \tag{2}$$
The corollary is given in Row 3, where household-effects are assumed to be (partial) reflections of given school-effects, plus an orthogonal component ($\omega$). In both these cases, the observed covariance between household- and school-effects is attributed wholly to one of the factors, meaning there is no remaining covariance by construction. These two models thus directly provide upper-bounds on the variance components of interest, while lower-bounds can be calculated in a two-step procedure (e.g., Raaum et al., 2006; Bau and Das, 2020): in the first stage a one-way fixed-effects model is estimated; and the second effect is extracted from the combined residuals, e.g., ν k = 1

Methods
Interpreting upper-bound variance contributions as gross effects (including sorting) and lower-bounds for the same factor as net effects (excluding sorting), a natural approximation to the overall contribution of the factor covariance is their arithmetic difference. This is similar in spirit to approaches used to tighten estimated teacher effect variance estimates (as in McCaffrey et al., 2009; Hoffmann and Oreopoulos, 2009). [1] Formally, Appendix A shows how we can re-express both the household and school lower-bound variance contributions in terms of their 'true' effects: $\sigma_\omega^2 = (1 - \rho_{hs}^2)\sigma_h^2$ and $\sigma_\nu^2 = (1 - \rho_{hs}^2)\sigma_s^2$, where $\rho_{hs}$ is the pairwise factor correlation.
Combining these expressions yields the following statement for the difference in the household upper- and lower-bounds:
$$(\sigma_t^2 - \sigma_e^2 - \sigma_\nu^2) - \sigma_\omega^2 = 2\rho_{hs}\sigma_h\sigma_s + \rho_{hs}^2(\sigma_h^2 + \sigma_s^2) \tag{3}$$
This states that the previously-defined range for the household variance contribution (on the LHS) represents a strict upper bound on the sorting term, $2\rho_{hs}\sigma_h\sigma_s$. The bias, $\rho_{hs}^2(\sigma_h^2 + \sigma_s^2)$, is positive and increasing in the factor correlation, reflecting that the two lower-bound contributions subtracted from the test-score variance only exactly offset the 'true' contributions associated with each individual factor in the special case of zero-covariance. [2] The advantage of this approximation is that it is straightforward. However, further elaboration reveals a complete decomposition is also feasible. Applying previous expressions, rewrite the unrestricted linear model as:
$$\sigma_t^2 - \sigma_e^2 = \sigma_h^2 + \sigma_s^2 + 2\rho_{hs}\sigma_h\sigma_s = \frac{\sigma_\omega^2 + \sigma_\nu^2 + 2\rho_{hs}\sigma_\omega\sigma_\nu}{1 - \rho_{hs}^2} \tag{4}$$
which is just a quadratic in $\rho$. [3] It follows that once three specific quantities are known, namely the two lower-bound (uncorrelated) variance components and the total variance jointly attributable to the two latent factors ($\sigma_t^2 - \sigma_e^2$), then the unknown correlation coefficient can be obtained as the root to equation (4). Where there is doubt as to the relevant sign of the pairwise correlation, the approximate bounds can provide helpful guidance: in principle, the upper- and lower-bound difference could be small or even negative (e.g., for $-1 \le \gamma < 0$). And as summarised in Appendix Table E1, any of the variance decomposition components of interest, including the contribution of sorting as well as the upper- and lower-bound terms, can be calculated from three primitives alone, namely: $\sigma_\omega^2$, $\sigma_\nu^2$, $\rho_{hs}$.
[1] Thanks to an anonymous referee for highlighting this approach.
[2] Due to the equality of the upper-bound household and school models (Rows 2 and 3, Table 2) it follows that the same expression gives the difference in the school upper- and lower-bounds.
[3] Specifically: $(\rho_{hs}^2 - 1)(\sigma_t^2 - \sigma_e^2) + 2\rho_{hs}\sigma_\omega\sigma_\nu + (\sigma_\omega^2 + \sigma_\nu^2) = 0$.
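As a numerical illustration, the following minimal Python sketch maps assumed 'true' values of the household variance, school variance and factor correlation into the three quantities that the decomposition treats as known, and then recovers the correlation as a root of the quadratic in $\rho_{hs}$. The function names and the numerical values are hypothetical, chosen only for illustration, and are not estimates from the paper.

```python
import math


def observables(var_h, var_s, rho):
    """Map assumed 'true' variances and correlation into observable quantities."""
    var_omega = (1.0 - rho ** 2) * var_h  # household lower bound
    var_nu = (1.0 - rho ** 2) * var_s     # school lower bound
    # Variance jointly attributable to both factors (sigma_t^2 - sigma_e^2).
    joint = var_h + var_s + 2.0 * rho * math.sqrt(var_h * var_s)
    return var_omega, var_nu, joint


def correlation_roots(var_omega, var_nu, joint):
    """Real roots in [-1, 1] of
    (rho^2 - 1) * joint + 2 * rho * sd_omega * sd_nu + var_omega + var_nu = 0.
    Assumes joint > 0."""
    a = joint
    b = 2.0 * math.sqrt(var_omega * var_nu)
    c = var_omega + var_nu - joint
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    return [r for r in roots if -1.0 <= r <= 1.0]


# Illustrative (hypothetical) values: var_h = 0.5, var_s = 0.1, rho = 0.3.
var_omega, var_nu, joint = observables(var_h=0.5, var_s=0.1, rho=0.3)
approx_sorting = joint - var_omega - var_nu      # upper- minus lower-bound
true_sorting = 2.0 * 0.3 * math.sqrt(0.5 * 0.1)  # 2 * rho * sd_h * sd_s
rho_hat = max(correlation_roots(var_omega, var_nu, joint))
```

In this example the quadratic has one positive and one negative root; the positive upper- minus lower-bound difference points to the positive root, and the approximation exceeds the sorting term by $\rho_{hs}^2(\sigma_h^2 + \sigma_s^2)$.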

Implementation
The ultimate objective of the variance decomposition is to estimate the three primitives. As already suggested, however, upper-and lower-bound estimates of household and school effects can be derived using separate one-way fixed-effect models. These directly permit calculation of the sorting approximation (equation 3); and the correlation parameter can be derived in an additional step. Details of this procedure, which we refer to as indirect fixed-effects (FEi), are set out in Appendix B.
A drawback of approaches using high-dimensional fixed-effects is that estimates of latent factors generally include measurement error, meaning the corresponding raw-variance shares will be upwards biased (Koedel et al., 2015). To address this, we use empirical Bayes shrinkage, which involves adjusting each estimated effect toward a common prior by a factor proportional to the estimated noise-to-signal ratio in the original estimates. Following Stanek et al. (1999), we shrink each estimated fixed effect (e.g., $\hat{h}_j$) toward a global mean as follows:
$$\tilde{h}_j = \bar{h} + \psi_j(\hat{h}_j - \bar{h}) \tag{5}$$
with shrinkage factor $0 < \psi_j = \sigma_h^2 / (\sigma_h^2 + \sigma^2 / N_j) < 1$. Here, $N_j$ is the effective degrees of freedom available to estimate each of the $j$ effects; $\sigma_h^2$ is the raw-variance of the estimated household fixed effect; $\sigma^2$ is the estimated overall residual variance; and $\bar{h}$ is the sample fixed-effect mean, typically zero under conventional normalization restrictions.
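The shrinkage step can be sketched in a few lines of Python. This is a minimal illustration that assumes the common form of the shrinkage factor, $\psi_j = \sigma_h^2 / (\sigma_h^2 + \sigma^2 / N_j)$; the function name `shrink_effects` and the toy inputs are hypothetical and do not reproduce the implementation described in Appendix B.

```python
import numpy as np


def shrink_effects(effects, n_obs, resid_var):
    """Empirical Bayes shrinkage of estimated fixed effects toward their mean.

    effects   : estimated fixed effects (one per household or school)
    n_obs     : effective number of observations behind each estimate
    resid_var : estimated overall residual variance
    """
    effects = np.asarray(effects, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    raw_var = effects.var()        # raw variance of the estimated effects
    grand_mean = effects.mean()    # typically zero after normalization
    # Assumed shrinkage factor: close to 1 when an effect is precisely estimated.
    psi = raw_var / (raw_var + resid_var / n_obs)
    shrunk = grand_mean + psi * (effects - grand_mean)
    return shrunk, psi


shrunk, psi = shrink_effects([-1.0, 0.0, 1.0], [2, 2, 100], resid_var=1.0)
```

Effects estimated from few observations are pulled strongly toward the mean, whereas those based on many observations are left almost unchanged.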
A complementary approach to dealing with the presence of sorting is suggested by Altonji and Mansfield (2018) (hereafter, AM18). Motivated by a concern that sorting on unobserved variables may bias estimates of the effects of group-level inputs, such as schools, these authors show that where households with different characteristics also have differing effective preferences (willingness-to-pay) for underlying school or community amenities, group averages of observed household or pupil characteristics can be employed as control functions. Inclusion of these generated variables, henceforth referred to as sorting proxies, in a regression model thus permits identification of a lower-bound on the unique (variance) contribution of school factors. Here the specification of interest becomes: where $h_{O,j}$ are observed household-level covariates; $\bar{h}_{O,k}$ are the set of sorting proxies derived directly from the former. Treating the terms in brackets as (uncorrelated) household and school random-effects, the specification can be estimated using a linear mixed-effects estimator (denoted MLM; as used by AM18).
An advantage of this procedure, which is useful for triangulation, is that it relies on a single estimation equation. A drawback is that the random-effects are assumed to be orthogonal to the set of included covariates, implying equation (6) may remain mis-specified. [4] Furthermore, while the variances of the random effects are estimated as model parameters, their associated best-linear-unbiased predictors (BLUPs or conditional modes) typically do not share the same variance-covariance structure as the estimated population-level moments (see Morris, 2002). Consequently, the BLUPs may not be reliable for the purposes of investigating systematic patterns in the random effects (e.g., subgroup heterogeneity). [5] Therefore, as an additional validation procedure, we extend the AM18 approach to incorporate fixed- as opposed to random-effects terms. As described in Appendix B, this involves modifying the indirect estimation procedure whereby the sorting proxies are used directly to account for variation in test-scores before estimation of the (one-way) household and school fixed-effects. This procedure is denoted FEd.
It merits emphasis that under all three methods we seek to directly estimate the (upper-) and lower-bound variance components, not the 'true' components ($\sigma_h^2$, $\sigma_s^2$). We further note that in the absence of longitudinal data, both households and schools are only encountered in the same locations and, thus, may share a common community (neighbourhood) component. To capture this explicitly, we further elaborate the household and school effects as follows: where $z$ represents the joint contribution of schools and households in community $l$; and $c$ is the location-specific effect. Admittedly, the nature and magnitude of these latter effects remains controversial (e.g., Oreopoulos, 2003). However, there is growing evidence that the quality of local community environments can have a material influence on child development trajectories (Chetty et al., 2016; Chetty and Hendren, 2018; Black et al., 2017; Richter et al., 2017). Furthermore, ignoring the contribution of the latter term would mean it is simply absorbed by the upper-bound household or school effect estimates, in turn muddying the interpretation of the variance components. Consequently, in the empirical implementation we treat community effects as a separate factor. In doing so, we estimate community effects first, in keeping with upper-bound fixed-effects models, in turn imposing the assumption that the estimated household/school-effects are orthogonal to the community effects (see Appendix B for details). As such our estimates of sorting should be net of any selection of households or schools into different communities, meaning the estimated contribution of sorting (defined above) should be interpreted in a narrow sense, i.e., it refers to processes of sorting between schools within a specific location. [6] Similarly, the household and school variance components will capture the average contribution (variation) of these factors within communities.

Data
Since 2010, the Uwezo initiative has undertaken large-scale household-based surveys of academic achievement in Kenya, mainland Tanzania and Uganda. [7] The surveys target children aged between the official school starting age and 16 who reside in households, and are representative at both national and district levels. Excluding the initial surveys, five rounds of the Uwezo surveys are publicly available (2011 to 2015) and used here. For each household, the surveys collected information covering general characteristics, as well as the demographic and educational details of resident children (e.g., age, gender, whether or not attending school, etc.). Also, all children of school age were individually administered a set of basic oral literacy and numeracy tests, which were tailored to each country and varied by survey round based on a common template to reflect competencies stipulated in the national curriculum at the grade 2 level.
Importantly, the Uwezo tests are not adapted to the children's ages or their completed level of schooling. Given that they focus on basic competencies, there are strong age-related differences, which affect both the level and variance of scores between age cohorts. Consequently, we exclude children above 14 years old since they tend to perform at the upper end of the tests (and show much lower variance). Our analytical sample also excludes observations that can be perfectly predicted using either household or school fixed-effects, i.e., singletons have been removed. Appendix Table E2 summarises alternative metrics of learning for this sample, pooling data from all countries and rounds. The first part shows the raw literacy and numeracy test-scores, described in detail in Jones et al. (2014). The literacy tests refer to the national languages of instruction in which pupils are tested at the end of primary school, i.e., English and Kiswahili in Kenya and Tanzania; and just English in Uganda. For consistency, we then take the main language as the overall literacy score, which is combined with the score in maths to give the overall mean.
[6] Note, however, that unlike in many advanced countries, migration driven by school choice is not common in the East Africa region, at least at young ages. Not only are barriers to internal migration generally recognised as high (Hirvonen, 2016), but the drivers of migration are predominantly of an economic nature and migrants are typically younger adults (without children) (see Beegle et al., 2011).
[7] The approach adopted by Uwezo has been inspired by exercises carried out in India by the Assessment Survey Evaluation Research Centre (ASER). For further details and comparison to other regional assessments see Uwezo (2012)
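The removal of singletons can be sketched as follows. This is a minimal Python illustration that assumes the pruning is applied repeatedly, since dropping a singleton school observation can leave a sibling as a new household singleton; whether the original procedure iterates in this way is an assumption, and the column names `household_id` and `school_id` are hypothetical.

```python
import pandas as pd


def drop_singletons(df, keys=("household_id", "school_id")):
    """Iteratively drop rows whose household or school group has one member."""
    out = df.copy()
    while True:
        keep = pd.Series(True, index=out.index)
        for key in keys:
            # Group size for each row; groups of size one are singletons.
            keep &= out.groupby(key)[key].transform("count") > 1
        if keep.all():
            return out
        out = out[keep]


children = pd.DataFrame({
    "household_id": ["A", "A", "B", "B"],
    "school_id": [1, 1, 1, 2],
})
pruned = drop_singletons(children)
```

In the toy data, the child in school 2 is dropped first; this leaves household B with a single child, who is removed in the next pass.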
Simple averages of scores on different tests may provide poor guidance to underlying levels of ability, particularly where tests are composed of questions of variable difficulty (Bond and Lang, 2013). Thus, we use two other approaches to combine the information on maths and literacy into an overall test-score. First, we take the first principal component of the suite of tests answered by each child (denoted PCA); second, in line with the literature, we apply a graded response IRT model to the same tests, where the empirical Bayes mean of the latent trait is taken as the metric of achievement (ability). [8] Since neither the PCA nor the IRT scores have a natural scale, for each country and survey round we standardize them to have a mean of zero and standard deviation of one. These scores are shown in the second part of Table E2. Notably, they are extremely highly correlated with one another and with the raw mean (correlation > 0.90).
Since the same Uwezo test forms are administered to children of all ages in each household, there is likely to be remaining between-cohort variation in test-scores. From the present perspective, however, this can be considered unwanted noise (see Mazumder, 2008). Thus, to ensure the score standard deviations are constant across cohorts over time, we further standardize the overall scores by age group within each country and survey year. These scores are shown in the final part of the table. They continue to be highly mutually correlated and, reflecting their attractive theoretical properties and general usage, we focus on the IRT scores henceforth. [9] Appendix Table E3 further reports regional means and standard deviations of the raw and IRT test-scores for the analytical sample. The first (column I) reports the simple means of the raw competency tests (ordinal scores); column II reports the IRT scores standardized by country and round, but not age; and column III reports the final measures, including age standardization. As can be seen, movement from the second to the third metric constitutes a simple monotone transformation. Also, as per the methodological discussion of Section 3.2, the rank position of each region according to its test-score variance is largely preserved, regardless of the transformation applied.
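The two standardization steps can be illustrated with a short Python sketch using pandas. The column names (`country`, `round`, `age`, `irt`) and the toy values are hypothetical, and age is used directly as the age group for simplicity.

```python
import pandas as pd


def standardize_within(df, score_col, group_cols):
    """Z-score `score_col` within each cell defined by `group_cols`."""
    grouped = df.groupby(group_cols)[score_col]
    mean = grouped.transform("mean")
    sd = grouped.transform(lambda x: x.std(ddof=0))
    return (df[score_col] - mean) / sd


scores = pd.DataFrame({
    "country": ["KE"] * 4 + ["TZ"] * 4,
    "round": [2011] * 8,
    "age": [7, 7, 8, 8, 7, 7, 8, 8],
    "irt": [1.0, 3.0, 2.0, 6.0, 0.0, 4.0, 5.0, 9.0],
})
# Step 1: standardize by country and survey round.
scores["irt_std"] = standardize_within(scores, "irt", ["country", "round"])
# Step 2: standardize again by age group within country and round.
scores["irt_age_std"] = standardize_within(
    scores, "irt_std", ["country", "round", "age"]
)
```

After the second step every country-round-age cell has mean zero and unit standard deviation, so differences in level and spread between age cohorts no longer contribute to the overall variance.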
To implement the decomposition procedures, household and school indexes must be defined. The former is trivial: unique indexes are ascribed to all siblings in the same household (in each year). 10 The school-effects are less straightforward. In the present data, limited information about schools is provided. Nonetheless, we know the kind of school attended (none, public or private) and whether or not children attend the specified main public school in their local community (catchment area). Consequently, for each enumeration area, we categorise children into one of four school categories: (1) those not attending school; (2) those attending the main local public school, which we denote as 'matched'; (3) those attending other public schools; and (4) those attending private schools.
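The four-way classification can be written as a small mapping. The sketch below is a hedged illustration: the function name, argument names and string codes are hypothetical and do not correspond to variable names in the Uwezo data.

```python
def school_category(school_type, attends_main_local_public=False):
    """Map a child's schooling status to one of the four categories.

    school_type: "none", "public" or "private" (hypothetical coding).
    attends_main_local_public: True if the child attends the specified main
    public school of the enumeration area (the 'matched' school).
    """
    if school_type == "none":
        return 1  # not attending school
    if school_type == "public":
        # 2 = matched main local public school, 3 = other public school
        return 2 if attends_main_local_public else 3
    if school_type == "private":
        return 4
    raise ValueError(f"unknown school type: {school_type!r}")


# Within an enumeration area, the school index is then the pair
# (enumeration area, category); the area codes here are made up.
children = [("EA01", "none", False), ("EA01", "public", True),
            ("EA01", "public", False), ("EA02", "private", False)]
school_index = [(ea, school_category(kind, matched))
                for ea, kind, matched in children]
print(school_index)
```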
The advantage of our definition of school effects is that, within each household, children can be mapped to different school-effects -i.e., the school and household-effects are crossed. The main downside is that for the last two school categories we do not identify which specific school they attend; and the first category is composed of a mix of children that either never enrolled or dropped-out. Recognising that estimates of the school effects for these three groups are likely to be noisy averages of multiple underlying effects, unless otherwise indicated we present our main results for the matched group only. In doing so, however, we first run estimations on the full sample (e.g., to ensure household effects are identified correctly), but then report variance decompositions for the chosen sub-group. 11 Full results are also presented for comparative purposes.
Further descriptive statistics for the dataset are reported in Appendix Table E4. This shows the number of unique children (i), households (j) and schools (k) covered in the dataset. Additionally, the table reports summary statistics, including average child characteristics and schooling status indicators (those out of school, the shares attending the specific matched public schools, and those attending private schools). Overall, these indicate the sample is comprehensive and balanced (by age and gender). 12 It also reveals there are systematic differences in schooling among countries as well as across regions within each country; for instance, in all countries enrolment rates differ substantially across regions (e.g., from 80.0 to 96.5 percent in Kenya) as do the shares of children attending private school.

Variance bounds
Motivated by our interest in sorting, Appendix D provides a preliminary review of the overall degree of spatial clustering or segregation. This confirms that clustering is indeed significant and that household-, school- and community-level factors all merit attention. For instance, using separate one-way fixed effects models, around 50 percent of the variation in educational achievement can be attributed to households (equal to the sibling correlation), but 30 percent of the variation in educational achievement can be attributed to factors shared by all children in the same community. Table 3 moves on to the variance decomposition, presenting results from the upper-bound models for households (HUB) and schools (SUB), estimated using separate one-way fixed effects estimators (the FEi approach) as described in Appendix B. Panel (a) focuses on the sub-sample of individuals attending the matched public school; and panel (b) refers to the full sample. To account for observed individual-level covariates, such as gender, grade-for-age, birth order and whether they have ever attended school, these variables are added to the factor models; and their combined variance contribution is calculated manually from the corresponding estimates, denoted the 'individual' effect. Given both these and the community-level effects are estimated before extracting the separate contributions of the unobserved factors, their variance shares are fixed across the two models - i.e., following Section 3, the difference between the household and school models only concerns where the factor covariance is attributed. Both absolute and relative variance contributions are reported in the table, where the former are reported in standard deviation units (×100). Standard errors are calculated via a clustered bootstrap procedure.
Three immediate findings merit comment. First, our models are able to account for about half of the variation in test-scores. This is broadly consistent with studies elsewhere (see below), but confirms most variation in outcomes occurs between children in the same household. Second, consistent with Appendix D, while the community effect is sizeable, both the household and school upper-bound variance contributions are large. For the full sample these represent 32% and 20% of the total variance, respectively; however, in the matched sample, the school effect contribution drops to 13%. This gap implies almost half of the variation associated with schooling relates to differences across types of schools, including not being enrolled, a point to which we return below. Even so, the variation between matched public schools remains sizeable -a one standard deviation increase in school quality corresponds to a 0.34 standard deviation improvement in test-scores. Lastly, the (upper-bound) approximation to the sorting component is highly material. For the matched sample this equals 13% of the variance, or 16% for the full sample. These magnitudes are far from trivial, being just marginally smaller than the upper-bound estimates for the school effects or about half the upper-bound household effect contribution. According to the Uwezo data, an additional year of schooling is associated with an approximate 0.18 standard deviation increase in a child's test-score ceteris paribus. Against this benchmark, a shift of one standard deviation in either the quality of (public) schools or degree of sorting would be roughly equivalent to about two years of schooling. However, since this estimate of the sorting term is expected to be biased upwards, we now turn to results from the full decomposition.
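The schooling benchmark can be checked directly from the two figures quoted above, namely the 0.34 standard deviation effect of a one standard deviation change in matched public school quality and the 0.18 standard deviation gain associated with an additional year of schooling:

```latex
\[
\frac{0.34}{0.18} \approx 1.9 \ \text{years of schooling}
\]
```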

Complete decomposition
Using these lower-bound estimates, Appendix Table E5 reports estimates for the three primitives required to identify the within-community sorting component. It also sets out the corresponding estimates from the two other empirical methods, based on AM18 (see Section 3.2; Appendix B) -namely, the mixed-effects model (MLM) and direct fixed-effects (FEd) approach. 13 Together these form the basis for the complete decomposition, the results of which are reported in Table 4 for both the matched and full samples. The relative contributions of the different constitutive components to total IEO, defined as the overall variance contribution of systematic components above the level of the individual, are further illustrated in Appendix Figure F1.
As can be seen from the primitives, the three decomposition methods yield highly consistent results and, hence, the different methods provide similar estimates for the various final variance components. Despite some differences in precise magnitudes, discussed further below, this suggests the present results are robust to the method chosen (e.g., mixed- versus fixed-effects) and, more specifically, that our simple indirect approach (FEi) performs very adequately. In the full sample, we find a positive correlation between the household- and school-effects, ranging between 0.32 and 0.43. Consequently, the derived within-community sorting component, shown in Table 4, remains substantial and is often similar in magnitude to the stand-alone contribution of schools. And although the estimates for this component are all lower than the earlier approximation (as expected), they nonetheless confirm that within-community sorting makes a non-trivial contribution to the total variance in test-scores, equal to around 8%.
Five further points stand out. First, as before, the primary difference between the matched and full sample estimates refers to the contribution of schools. And, as this variance component feeds into the sorting component, the latter also remains somewhat smaller in the matched sample. The main implication is that sorting of children from distinct types of households across different types of schools, as well as at the extensive margin of schooling, is a material aspect of the overall covariance between household and school effects.
Second, differences between the three empirical methods appear to derive from two main sources. On the one hand, some of the differences for the correlation and sorting components reflect the sensitivity of calculations based on equation (4). Recall we estimate $\rho_{hs}$ from the household- and school-effect lower-bounds. Thus, even minor differences in either of these components, such as estimates for school effects under the FEi procedure, influence the estimates for the correlation term.
On the other hand, the mixed-and fixed-effects approaches treat the community effects in a slightly different fashion. In the latter case, community effects are removed from the joint fixed effect (households plus schools) in the first of the series of orthogonal projections (see Appendix B). As such, and in similar fashion to the upper-bound models discussed in Section 3, the estimated community variance contribution will capture both the direct effect of shared community-level factors on outcomes (e.g., via environmental conditions) plus any covariance with either household or school factors. Thus, any possible sorting of households (or teachers) across different communities will be absorbed by the community effect. 14 In the mixed-effects approach, all terms are estimated simultaneously, meaning that a strict distinction between within-and between-community components is not enforced ex ante. Although we implement a correction for this to obtain the uncorrelated school and household components (see above), this methodological difference is likely to explain some of the disparities in our estimates of the community effect and the uncorrelated variance components. However, overall, the qualitative differences in the results from the different approaches are trivial.
Third, in absolute and relative terms, the stand-alone contribution of schools might be deemed moderate, representing no more than 10 percent of the total variance and even less for the matched sample (7 percent). However, these magnitudes are not small when compared with school effects estimates in the literature (see Table 1). Also, AM18 estimate that both school and community effects jointly explain around 5 percent of the variation in high school graduation rates (lower-bounds) in the USA. Here, schools alone account for roughly the same amount of variation in achievement; and combined with community effects, the joint contribution is nearly 20 percent (Table 4). Similarly, the joint contribution of schools and communities taken from the school upper-bound estimates (Table 3), which are generally more directly comparable to estimates found elsewhere (e.g., Freeman and Viarengo, 2014) are substantial. Our estimates for East Africa place this joint effect at around 25 percent, which is similar to the average school-effect variance contribution calculated by Pritchett and Viarengo (2015) for a range of countries. Thus, while the present results show some similarities with findings elsewhere, the material contribution of residential location within the joint school/community effect stands out.
Fourth, as partly indicated, while all sources of inequality are substantial, systematic environmental and institutional factors that lie at levels above the household clearly merit consideration in the assessment of IEO. As shown in Figure F1, the combination of schools, communities and sorting is typically greater than the household contribution alone, accounting for around half of the total IEO. Not only does this reinforce our interest in quantifying multiple sources of IEO simultaneously but, assuming that household-based inequalities are generally persistent, it also suggests the scope for policy measures to reduce such inequalities is substantial.
Last, we recognise the residual component remains large, accounting for roughly 50% of the test-score variance. On the one hand, this may well reflect the limitations of our explanatory variables. For instance, we do not measure the extent to which children within the same household receive different parental inputs, nor do we capture differences in the quality of teachers within each school. These factors will effectively remain in the residual, thereby implying we may be under-estimating the magnitude of IEO, which underlines the importance of the research agenda on measuring teacher effects (e.g., Buhl-Wiggers et al., 2018;Bau and Das, 2020) and variable household inputs (e.g., Das et al., 2013). On the other hand, a large residual is not inconsistent with substantial differences in ability and effort, as well as other idiosyncratic factors at the individual-level. Indeed, studies elsewhere suggest these effects are likely to be highly material (see Table 1

Heterogeneity
Averages can often hide substantial heterogeneity. Thus, to look behind the pooled results, we take advantage of the properties of the variance and calculate the decomposition for specific subgroups. These results are presented in Table 5, where we stratify individuals along various dimensions - gender, age group, schooling effect category and household SES tercile. For simplicity, and given the broad consistency across the empirical procedures, hereafter we only present results for the direct fixed-effects method. This is chosen as it gives variance component estimates that lie mostly in-between those of the other two methods. 15 Focusing on the main insights, there are no large differences in the variance components over the sub-groups. 16 Nonetheless, the four explained variance components above the level of the individual are all generally larger for younger children as well as among children who are not in school. Recall that the school effects treat all children within each community who are not enrolled as a separate category. Thus, the variance decomposition for the various categories of enrolled children captures variation between schools within each category, while results for the subgroup of non-enrolled children capture variation in environmental effects outside of the schooling system. In this sense, as the correlation coefficient and the school effects are largest in the non-enrolled category, this suggests schools do play a role in reducing inequality of opportunity. At the same time, among the three school type categories, the contribution of variation between schools is moderately larger in the private sector. While this may reflect measurement error, it may also reflect differences between lower- and higher-cost private schools, where these segments serve children of quite different backgrounds.
A related form of heterogeneity runs along spatial lines, which is particularly relevant from the perspective of pursuing geographically-targeted interventions. Here, an advantage of the Uwezo data is that we can run the same analysis at the district level. Appendix Figure F2 plots the cumulative empirical district-level distributions of the relative variance shares for the four main IEO components taken from the FEd estimates, by country. These show substantial variations across all components within each country, but country-specific differences are also evident. The structure of (relative) IEO appears most distinct in Tanzania, where the contribution of communities, sorting and households tends to be much larger on average. Even so, within countries, the variance contributions of the different components all display a wide range in comparison to the mean.
15 For reference, Appendix Table E6 presents the identical subgroup decomposition in absolute magnitudes; and Tables E7 and E8 present the same relative and absolute results based on the mixed-effects method, which provides the largest estimates of the sorting component among the three methods.
16 Statistically speaking, based on a non-parametric medians test, across all variance components the differences between groups are generally significant at the 1 percent level, except for between boys and girls (details on request); however, the relative magnitude of these differences is generally within ±10% of the overall effect.

Leveraging this granular data, Table 6 presents results from a seemingly-unrelated regression of the five absolute variance components (for the full sample) on a set of explanatory variables, which reflect the composition of the sample in each district. Recognising that lower-level components of IEO are likely to influence higher-level components, we structure the system recursively. Five results stand out. First, while we make no claims these results are causal, it is evident that the variance components do vary systematically across districts - i.e., around half of the variation in component magnitudes or more can be attributed to the included covariates. Second, gender differences are notable. Higher female shares are strongly associated with lower contributions of both individual and sorting effects, suggesting educational investments for male children are the typical focus of active choice.
Third, school status matters. The regression estimates employ a baseline category of children attending a public school in Kenya. That higher shares of children not attending school (either never enrolled or dropped out) are associated with much larger school effects implies that attending school significantly compensates for other location-based differences in learning. A greater share of children attending private school is also associated with a moderately larger school effect, indicative of significant differences in quality between public and private schools.
Fourth, a simple metric of school competition, defined as one minus the normalized Herfindahl index calculated from the share of children attending each of the three different school type categories, is relevant. In particular, greater school competition is associated with larger household and schooling effects, but lower sorting. This may reflect the point that districts with greater diversity in household types demand wider school choice and (naturally) a greater variation in schools is associated with larger dispersion in school quality. Even so, at a minimum, this reinforces the conclusion that schools do matter. However, the negative association with sorting would point to other mechanisms at play - e.g., contrary to a standard cream-skimming hypothesis, greater competition would allow those households with preferences for higher quality education to choose accordingly, regardless of their own 'type'. So, competition may not be inequality-enhancing.
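As an illustration of the competition metric, the Python sketch below computes one minus a normalized Herfindahl index from counts of children in each school type category. The exact normalization is not spelled out in the text, so the common form $(H - 1/N)/(1 - 1/N)$ is assumed here; the function name and example counts are hypothetical.

```python
def school_competition(counts):
    """One minus the normalized Herfindahl index of school-type shares.

    counts: number of children in each school type category
    (e.g., matched public, other public, private).
    Assumes the normalization (H - 1/N) / (1 - 1/N), with N categories.
    """
    n_categories = len(counts)
    total = sum(counts)
    if n_categories < 2 or total <= 0:
        raise ValueError("need at least two categories and a positive total")
    shares = [c / total for c in counts]
    herfindahl = sum(s * s for s in shares)
    normalized = (herfindahl - 1 / n_categories) / (1 - 1 / n_categories)
    return 1 - normalized


# Hypothetical district counts for the three school types.
print(school_competition([10, 10, 10]))  # children spread evenly
print(school_competition([30, 0, 0]))    # all children in one category
print(school_competition([20, 10, 0]))   # an intermediate case
```

Under this assumed normalization the index runs from zero, when all children attend a single school type, to one, when they are spread evenly across the three types.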
Fifth, socio-economic inequality within each district, as measured by the standard deviation of the SES variable, remains material to IEO. Not only is there a clear positive association between the magnitude of the household effect and the extent of local inequality (as expected), but also districts with greater SES inequality appear to display larger school and sorting contributions. So, holding other conditions fixed, less economically equal districts show greater variation in school quality and more acute sorting. This is consistent with the idea that, even among public schools, poorer communities generally have access to poorer quality schools.

Long-run inequalities
As a final exercise we follow the spirit of Kremer (1997) and consider the implications of sorting for long-run (steady-state) educational inequalities. The key idea here is that in the presence of positive educational sorting, children from more advantaged families receive a double benefit that can be expected to exacerbate inequalities over time. Such children benefit both from family circumstances and from higher-quality schooling, which has a multiplicative effect across generations.
In order to quantify the potential magnitude of these effects, we move to a dynamic or intergenerational setting. Furthermore, we adopt the simple formulation that household effects ($h$) are directly proportional to parental achievement, reflecting both parental capacity to support learning and other characteristics that flow from their educational level. Indexing generations by $g$, and using previous notation, these assumptions imply $h_{jg} \approx h_{j0} + \delta t_{j,g-1}$, which in turn means: where $\tilde{\delta} = \sigma^2_{\omega_g}/\sigma^2_{t_{g-1}}$ is the sorting-invariant or raw inter-generational persistence parameter. From this, we can then solve for the long-run steady-state level of educational inequality: where $\sigma^2_c$ is the upper-bound variance contribution associated with residential location, which we treat as exogenous for simplicity (see Appendix A for derivation).
To simulate this model, a value for the raw persistence parameter needs to be chosen. To do so, we opt for a data-driven approach and make the assumption that currently-observed inequality is approximately at the steady-state (i.e., the observed $\sigma^2_t$ equals its steady-state value), allowing us to solve for $\tilde{\delta}$ using equation (9) based on previously-estimated values of the primitives. This gives us the magnitude of the persistence parameter that would maintain achievement inequality at current levels. Fixing $\tilde{\delta}$ at this value also means that simulations based on changes to (other) primitives in the model can be interpreted directly with reference to the current magnitude of achievement inequality. Appendix Figure F3 plots the resulting district-level cumulative distributions of the inequality-constant persistence parameter, which range from 0.34 to 0.60; and the corresponding pooled overall coefficient is 0.48. These magnitudes are broadly consistent with inter-generational regression coefficients in educational attainment found elsewhere. 17 In other words, our baseline assumption is likely to be conservative.
17 For instance, Hertz et al. (2007) estimate an average persistence parameter of 0.42 for a range of mostly developed countries. However, existing estimates for (low income) developing countries typically point to somewhat larger values. For instance, Emran and Shilpi (2015) find a persistence parameter of around 0.50 in India; and in a sample of African countries, Azomahou and Yitbarek (2016) find an average persistence of 0.66. These values are generally larger than our own estimates, implying a higher steady-state level of educational inequality than observed today.
With the constant-inequality persistence parameter in hand, we now use equation (9) to simulate the steady-state level of inequality under alternative assumptions for other inputs. We consider three scenarios: (I) we set the within-community sorting correlation coefficient $\rho$ to zero, under which the previous equation simplifies to $\sigma^2_t = (\sigma^2_\nu + \sigma^2_c + \sigma^2_e)/(1 - \tilde{\delta}^2)$; (II) we remove the variance contribution due to community effects, setting it to zero; and (III) we combine the assumptions of scenarios (I) and (II) simultaneously. Looking across all districts, Appendix Table E13 summarises the results and Appendix Figure F4 shows them visually. On average, eliminating within-community sorting leads steady-state inequality to fall by about 15 percent relative to the baseline. As per equation (9), this underlines that eliminating educational sorting would lead to a substantially larger reduction in long-run inequality than might be suggested by the relative variance contribution of sorting alone (e.g., Figure F2). Indeed, the relative variance contribution due to sorting is roughly only half the expected proportional decline in inequality if sorting were to be reduced.
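The scenario (I) expression can be evaluated directly. The Python sketch below is a hedged illustration of that closed form only: the function name and all numerical inputs are hypothetical placeholders rather than estimates from the paper, and the general case with non-zero sorting is not reproduced.

```python
def steady_state_variance_no_sorting(var_nu, var_c, var_e, delta):
    """Steady-state test-score variance when the sorting correlation is zero.

    Implements (var_nu + var_c + var_e) / (1 - delta**2), the scenario (I)
    expression quoted in the text. var_nu, var_c and var_e are the school,
    community and residual variance components; delta is the raw
    persistence parameter.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1) for a finite steady state")
    return (var_nu + var_c + var_e) / (1.0 - delta ** 2)


# Hypothetical variance components (not the paper's estimates).
components = {"var_nu": 0.10, "var_c": 0.15, "var_e": 0.50}

baseline = steady_state_variance_no_sorting(delta=0.48, **components)
lower_persistence = steady_state_variance_no_sorting(delta=0.20, **components)
print(f"delta=0.48: {baseline:.3f}; delta=0.20: {lower_persistence:.3f}")
```

Because the denominator is $1 - \tilde{\delta}^2$, any given set of variance components maps into a larger steady-state variance as persistence rises.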
Eliminating within-community sorting does not address segregation of either households or teachers (schools) by residential location, as captured by the community effect. Our earlier results indicated this term was material, potentially reflecting significant barriers to internal mobility in the East African context. Exogenously cutting out this component, as per scenario (II), generates a similar reduction in inequality compared to the first scenario (14 percent on average), but with more dispersion. Combining these two assumptions, as in scenario (III), equates to a fall in long-run inequality by about one third, on average. To put this magnitude of reduction in context, it is equivalent to a very substantial fall in the intergenerational persistence of educational achievement. The final column of Table E13 (also Figure F4, panel IV) reports the percentage reduction in the persistence parameter required to match the effect of scenario (III) -namely, eliminating both within-community sorting and between-community segregation. On average, the equivalent reduction is around 60 percent, implying a mean persistence parameter of 0.20 (versus 0.48 in the baseline).
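The two pooled persistence values quoted above are consistent with the stated reduction. As a simple check on the rounded figures (the reported average is taken across districts, so it need not match exactly):

```latex
\[
\frac{0.48 - 0.20}{0.48} \approx 0.58
\]
```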

Conclusions
There is continued debate regarding the extent to which sorting exacerbates inequalities, including in educational attainment and achievement. We contribute to this debate by setting out and implementing a novel variance decomposition, identifying sorting as the contribution of the covariance between household- and school-effects to the variation in educational outcomes. We apply this procedure to quantify the contribution of sorting to learning inequalities among over 1,000,000 children in three East African countries: Kenya, mainland Tanzania and Uganda.
To estimate the contribution of sorting we first propose a simple bounding procedure, based on separate fixed-effects estimators, where sorting is given by the difference between the upper- and lower-bound contributions of households (or schools). Second, we show how the same estimates can be combined to back out the factor correlation coefficient, from which the covariance is derived straightforwardly. To triangulate our results based on fixed-effects estimators, we further adapt the approach of AM18 (Altonji and Mansfield, 2018), who show how group means of observed factors can absorb confounding effects of sorting on unobserved characteristics. This helps isolate the independent contributions of the factors of interest, from which lower-bound variance contributions can be estimated using mixed-effects estimators. Third, we extend the AM18 approach to allow for fixed- as opposed to random-effects, which is both computationally more tractable and better suited to explore sub-group heterogeneity.
Empirically, all three estimation procedures for the complete decomposition indicate positive sorting of pupils across schools. We estimate that sorting accounts for around 8 percent of the total test-score variance and almost a fifth of the joint variation in test-scores due to schools, communities and households. The sorting contribution tends to be larger among families with children not enrolled in school, among those sending their children to private schools and in locations where there is less school competition or greater socio-economic inequality. We further find the stand-alone contribution of schools and households to the variance is around 10% and 20%, respectively, and where the former is in keeping with the order of magnitude of institutional-level effects found elsewhere on the continent.
To explore the implications of these results, we conduct simulations of how learning inequality evolves over time. These show that for the average district in the region, the steady-state level of educational inequality would fall by around 15 percent if within-community sorting between schools and households were to be fully eliminated. Moreover, in a number of districts, the reduction in inequality from eliminating this form of sorting would be over 20 percent. But if between-community sorting were also cut to zero, this would have an effect equivalent to cutting the inter-generational persistence of educational attainment by more than half. Overall, this suggests that policies that take sorting into account, and even actively counteract it by allocating additional resources (e.g., better teachers) to schools in disadvantaged locations, merit serious consideration.

A.1 Correlated and uncorrelated components
The uncorrelated (lower-bound) school variance share can be expressed as: where $\sigma^2_t$ is the total variance to be decomposed taken from Row 2 of Table 2. The second line draws on the unrestricted linear model in Row 4 of Table 2, which specifies the sorting component separately; the third line restates the covariance term using the pairwise factor correlation ($\rho$); and movement from the third to the fourth line derives from the definition of the covariance term from equation (2), as: $\mathrm{Cov}(h_j, s_k) \equiv \Sigma_{sh} = \gamma \mathrm{Var}(h_j)$. Last, the final line uses the fact that $\gamma$ defines the slope of the (approximate) linear relation between $h$ and $s$: $\gamma = \rho_{hs}(\sigma_s/\sigma_h)$.
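As a hedged illustration of the final step (not a reproduction of the displayed derivation), suppose the school effect is written as a linear projection on the household effect, $s_k = \gamma h_j + \nu_k$ with $\mathrm{Cov}(h_j, \nu_k) = 0$. This exact form, and the use of $\nu_k$ for the projection residual, are assumptions made here for exposition. Substituting the slope $\gamma = \rho_{hs}(\sigma_s/\sigma_h)$ then gives the variance of the uncorrelated school component:

```latex
\begin{align*}
\mathrm{Var}(\nu_k) &= \mathrm{Var}(s_k) - \gamma^2 \mathrm{Var}(h_j) \\
                    &= \sigma^2_s - \rho^2_{hs} \frac{\sigma^2_s}{\sigma^2_h} \sigma^2_h \\
                    &= \sigma^2_s \left(1 - \rho^2_{hs}\right)
\end{align*}
```

Under these assumptions, the corresponding share of the total variance would be $\sigma^2_s(1 - \rho^2_{hs})/\sigma^2_t$.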

A.2 Steady-state inequality
Based on the unrestricted linear model of Table 2, combined with equations (7) and (8), education inequality evolves as follows: In long-run equilibrium it must hold that $\forall n > 0: \sigma^2_{t_{g+n}} = \sigma^2_{t_g} = \sigma^2_t$, thus:

B Empirical steps
Following Section 3.2, we recognise that estimates of separate fixed-effects (e.g., household and schools) in a two-way setting are expected to be biased by a mechanical negative pairwise covariance. However, setting aside the distinct issue of measurement error, there is no reason to suspect this concern affects estimates of their joint contribution. In light of this, to undertake the variance decompositions we begin by running a full model with multiple fixed effects. Then we apply a sequence of orthogonal projections to identify the uncorrelated components of the estimated joint effect.
Concretely, for the indirect fixed-effects procedure (FEi) we proceed as follows:
1. use the fully-specified model to obtain the joint contribution of the latent factors ($z_{jkl}$);
2. project $\hat{z}_{jkl}$ on the location indexes to obtain the (upper-bound) location-specific effect;
3. separately project the household- and school-effect indexes on the fitted residual joint factor from Step 2, ($\hat{z}_{jkl} - \hat{c}_l$), in each case obtaining their corresponding upper-bound contributions; and
4. separately project the household- and school-effect indexes on the relevant residuals from Step 3 to obtain their lower-bound contributions (e.g., the school lower-bound is estimated from the residual from the household upper-bound: $\hat{z}_{jkl} - \hat{c}_l - \hat{h}^*_k$, where $h^*_k = h_j + \gamma h_{jk}$).
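Steps 2 to 4 can be sketched in a few lines of Python, since projecting on a set of group indexes amounts to taking group means. This is a simplified illustration, not the estimation code used for the results: it takes the joint factor from Step 1 as given, ignores covariates, weighting and standard errors, and all function and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def project_on_groups(values, groups):
    """Least-squares projection on group indicators, i.e. group means."""
    members = defaultdict(list)
    for value, group in zip(values, groups):
        members[group].append(value)
    group_means = {group: mean(vals) for group, vals in members.items()}
    return [group_means[group] for group in groups]


def subtract(a, b):
    """Element-wise difference of two equal-length lists."""
    return [x - y for x, y in zip(a, b)]


def fei_variance_components(z_hat, location, household, school):
    """Variances of the fitted components from Steps 2-4 (absolute units)."""
    c_hat = project_on_groups(z_hat, location)               # Step 2
    resid = subtract(z_hat, c_hat)
    h_ub = project_on_groups(resid, household)               # Step 3
    s_ub = project_on_groups(resid, school)
    s_lb = project_on_groups(subtract(resid, h_ub), school)  # Step 4
    h_lb = project_on_groups(subtract(resid, s_ub), household)
    fitted = {"community": c_hat, "household_ub": h_ub, "school_ub": s_ub,
              "household_lb": h_lb, "school_lb": s_lb}
    return {name: pvariance(vals) for name, vals in fitted.items()}


# Hypothetical toy data: six children, three households, two schools.
z_hat = [2.0, 1.0, 0.5, -0.5, -1.0, -2.0]
location = ["L1", "L1", "L1", "L1", "L1", "L1"]
household = ["h1", "h1", "h2", "h2", "h3", "h3"]
school = ["A", "B", "A", "B", "B", "B"]

components = fei_variance_components(z_hat, location, household, school)
for name, value in components.items():
    print(f"{name}: {value:.3f}")
```

In the terminology of the bounding procedure, the gap between each upper and lower bound is the part of the variance attributed to sorting.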
For the extension of the AM18 approach to allow for fixed-effects (FEd), we proceed as follows:
1. estimate a fully-specified model, including two-way fixed-effects, to obtain the joint contribution of the latent factors ($z_{jkl}$, see equation 7);
2. project $\hat{z}_{jkl}$ on the location indexes to obtain the (upper-bound) location-specific effect ($\hat{c}_l$);
3. project the residual from Step 2 ($\hat{z}_{jkl} - \hat{c}_l$) on the set of sorting proxies ($h_{O,k}$); 18
4. project the school indexes (only) on the residual component of the joint factor obtained from Step 3 ($\hat{z}_{jkl} - \hat{c}_l - h_{O,k}\hat{\beta}$) to obtain the uncorrelated school contribution;
5. project the household indexes (only) on the final residual taken from Step 4 to obtain the uncorrelated household lower-bound contribution ($\omega_k$).
In both the above approaches, the sequential orthogonal projections ensure that the primitives of interest are not only mutually uncorrelated but also uncorrelated with any observed variables that enter the model, including the sorting proxies. Furthermore, any such additional control variables can be entered in a straightforward manner. For instance, individual-specific observed variables would enter the fully-specified model in Step 1 of each procedure. Also, observed household (school) characteristics are added in Step 3 of each procedure, thereby tightening the one-way estimates of the school (household) effects (see Raaum et al., 2006).

D Spatial clustering
To estimate the extent of clustering at different levels of aggregation, we compare the correlation in educational outcomes between individuals within different units, equivalent to estimating the proportion of the variance attributable to the between-group structure of the data (for similar exercises see Fryer and Levitt, 2004; Friesen and Krauth, 2007, 2010; Lindahl, 2011; Emran and Shilpi, 2015). The magnitude of the correlation across members of the same group, and how quickly this declines as we move to higher levels of aggregation, indicates the extent to which variables are spatially segregated (clustered). For instance, if local communities only contained households with exactly the same socio-economic status, then the proportion of variance in socio-economic status accounted for by communities would be equal to that accounted for by households, indicating a very high degree of clustering.
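A minimal sketch of this clustering measure follows, computed as the adjusted $R^2$ from a one-way fixed-effects model without covariates (the statistic described in the note to Table D1). The function name `adj_r2_one_way` and all simulated variables are hypothetical; the example only reproduces the logic of the limiting case described above, not the paper's estimates.

```python
import numpy as np

def adj_r2_one_way(y, groups):
    """Adjusted R-squared from regressing y on a full set of group dummies."""
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(np.asarray(groups).ravel(), return_inverse=True)
    idx = idx.ravel()
    n, g = y.size, int(idx.max()) + 1
    means = np.bincount(idx, weights=y) / np.bincount(idx)
    ss_within = np.sum((y - means[idx]) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - (ss_within / (n - g)) / (ss_total / (n - 1))

# Simulated nesting: 50 communities, 20 households each, 3 children each.
rng = np.random.default_rng(2)
household = np.repeat(np.arange(1000), 3)
community = household // 20
comm_val = rng.normal(0.0, 1.0, 50)
hh_val = rng.normal(0.0, 0.7, 1000)

# Limiting case: every household in a community has the same SES.
ses = comm_val[community]
# A variable that is approximately random within households (e.g., gender).
gender = rng.integers(0, 2, household.size).astype(float)
# An outcome with community, household and child-level components.
score = 0.7 * comm_val[community] + hh_val[household] + rng.normal(0.0, 1.0, household.size)

for name, y in [("ses", ses), ("gender", gender), ("score", score)]:
    print(name,
          "households:", round(adj_r2_one_way(y, household), 3),
          "communities:", round(adj_r2_one_way(y, community), 3))
```

In the limiting case the household and community shares coincide, whereas for the simulated score the share declines when moving from households to communities, which is the pattern used to gauge clustering.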
Results from this exercise are reported in Table D1, covering a range of variables and using the full sample of children. As might be expected, assuming child gender is approximately random, the share of the variance in gender accounted for by households is extremely low. For the remaining variables, however, clustering by households, schools or communities is much higher. For instance, more than two thirds of the variation in access to clean water is accounted for by schools, and likewise by communities. Overall, a little more than half of the total variation in aggregate socio-economic status (SES) in the region is attributable to (average) differences between distinct communities, implying substantial levels of residential clustering or economic segregation.
Turning to the educational outcomes in the bottom of the table, we observe somewhat lower but nonetheless substantial magnitudes of clustering across communities and schools, in part reflecting variation in schooling status between children within households.$^{19}$ In terms of achievement on the Uwezo tests as measured by the age-adjusted IRT scores, the correlation between siblings is almost 50 percent, which is highly comparable to magnitudes found in a range of other countries (Section 3). More notably, however, the correlation between pupils attending the same schools, as well as between children in the same communities, is only moderately smaller. Nearly 40 percent of the overall variation in achievement is accounted for by differences between schools, and 30 percent by differences between distinct residential locations. And although the definition of communities is not equivalent across studies, these latter magnitudes appear to be a factor larger than encountered in developed countries. That is, the degree of clustering of both economic and educational outcomes in East Africa appears to be large.

Note to Table D1: cells in columns (a)-(e) report the correlation between children within the same grouping unit (e.g., households, schools, etc.), which is the adjusted $R^2$ from a one-way fixed-effects model, without covariates; for highest grade, this correlation is calculated conditioning on child age; final two columns report ratios.

E Additional tables
Note: the table shows how the various variance bounds can be calculated directly from the underlying primitives (cf. Table E5).

Note: in the first three columns the cells report summary statistics of the percentage reduction in steady-state inequality of educational attainment associated with the indicated scenario; in the final column we report the fall in the intergenerational persistence parameter equivalent to the results under scenario (III); simulations are undertaken at the district level.

Note: bars indicate the percentage contribution of each factor to inequality of educational opportunity (IEO), which is the variance accounted for by systematic supra-individual factors (households, schools, communities plus their covariance); abbreviations for methods and models are as per

[Figure labels: KE, TZ, UG]

Note: lines indicate the cumulative distribution of the relative contributions of each factor to total test-score variance; estimates taken from the direct fixed-effects variance decomposition procedure, based on the unrestricted linear model representation; KE is Kenya; TZ is Tanzania (mainland); and UG is Uganda.

Figure F3: Distribution of estimates of raw persistence parameter, by districts

Note: panels (I)-(III) plot the ratio of long-run inequality to present inequality under different assumptions; in (I) we assume sorting is eliminated; in (II) we cut the variance contribution due to community effects by 2/3; and in (III) we combine the latter two; panel (IV) indicates the relative reduction in the intergenerational persistence parameter required to achieve the fall in long-run inequality indicated in panel (III) (e.g., 0.1 implies a 10 percent reduction); vertical blue line is the pooled correlation coefficient, given on the x-axis.