Income Segregation and Intergenerational Mobility Across Colleges in the United States∗

We construct publicly available statistics on parents’ incomes and students’ earnings outcomes for each college in the U.S. using de-identified data from tax records. These statistics reveal that the degree of parental income segregation across colleges is very high, similar to that across neighborhoods. Differences in post-college earnings between children from low- and high-income families are much smaller among students who attend the same college than across colleges. Colleges with the best earnings outcomes predominantly enroll students from high-income families, although a few mid-tier public colleges have both low parent income levels and high student earnings. Linking these income data to SAT and ACT scores, we simulate how changes in the allocation of students to colleges affect segregation and intergenerational mobility. Equalizing application, admission, and matriculation rates across parental income groups conditional on test scores would reduce segregation substantially, primarily by increasing the representation of middle-class students at more selective colleges. However, it would have little impact on the fraction of low-income students at elite private colleges because there are relatively few students from low-income families with sufficiently high SAT/ACT scores. Differences in parental income distributions across colleges could be eliminated by giving low- and middle-income students a sliding-scale preference in the application and admissions process similar to that implicitly given to legacy students at elite private colleges. Assuming that 80% of observational differences in students’ earnings conditional on test scores, race, and parental income are due to colleges’ causal effects (a strong assumption, but one consistent with prior work), such changes could reduce intergenerational income persistence among college students by about 25%.
We conclude that changing how students are allocated to colleges could substantially reduce segregation and increase intergenerational mobility, even without changing colleges’ educational programs.
∗The opinions expressed in this paper are those of the authors alone and do not necessarily reflect the views of the Internal Revenue Service, the U.S. Treasury Department, the Federal Reserve Board of Governors, or the College Board. This paper combines results reported in two NBER working papers: “Mobility Report Cards: The Role of Colleges in Intergenerational Mobility” and “The Determinants of Income Segregation and Intergenerational Mobility Across Colleges: Using Test Scores to Measure Undermatching.” This work was conducted under IRS contract TIRNO-16-E-00013 and reviewed by the Office of Tax Analysis at the U.S. Treasury. We thank Joseph Altonji, David Deming, Eric Hanushek, Jess Howell, Michael Hurwitz, Lawrence Katz, David Lee, Richard Levin, Sean Reardon, anonymous referees, and numerous seminar participants for helpful comments; Trevor Bakker, Kaveh Danesh, Katie Donnelly Moran, Niklas Flamang, Robert Fluegge, Jamie Fogel, Benjamin Goldman, Clancy Green, Sam Karlin, Carl McPherson, Daniel Reuter, Benjamin Scuderi, Priyanka Shende, Jesse Silbert, Mandie Wahlers, and our other pre-doctoral fellows for outstanding research assistance; and especially Adam Looney for supporting this project. Chetty, Friedman, Saez, and Yagan acknowledge funding from the Russell Sage Foundation, the Bill & Melinda Gates Foundation, the Robert Wood Johnson Foundation, the Schmidt Futures Foundation, the Center for Equitable Growth at UC-Berkeley, the Washington Center for Equitable Growth, the UC Davis Center for Poverty Research, the Alfred P. Sloan Foundation, the Laura and John Arnold Foundation, the Chan-Zuckerberg Initiative, the Overdeck Foundation, and Bloomberg Philanthropies.


I Introduction
How does the higher education system shape intergenerational income mobility in the United States? Many view college as a pathway to upward income mobility, but if children from higher-income families attend better colleges on average, the higher education system as a whole may not promote mobility and could even amplify the persistence of income across generations.
In this paper, we analyze how changes in the colleges that students attend could affect segregation across colleges by parental income and rates of intergenerational mobility in the U.S. 1 To do so, we first estimate three sets of parameters: (1) parental income distributions by college, (2) students' earnings outcomes conditional on parent income by college, and (3) the portion of the variation in students' earnings outcomes that is due to colleges' causal effects. We construct publicly available statistics on the first two elements using data on all college students in the U.S. from 1999-2013. We then combine these statistics with data on SAT and ACT scores and estimates of colleges' causal effects consistent with the prior literature to simulate how changes in the allocation of students to colleges affect income segregation and intergenerational mobility.
We use a de-identified dataset constructed by linking data from federal income tax returns, the Department of Education, the College Board, and ACT to obtain information on the colleges that students attend, their earnings in their early thirties, their parents' household incomes, and their SAT/ACT scores. 2 In our baseline analysis, we focus on children born between 1980 and 1982 (the oldest children whom we can reliably link to parents) and assign children to colleges based on the college they attend most frequently between the ages of 19 and 22.
We divide our analysis into three parts. First, we estimate parental income distributions by college to characterize the degree of income segregation across colleges. Among "Ivy-Plus" colleges (the eight Ivy League colleges plus Duke, MIT, Stanford, and the University of Chicago), more students come from families in the top 1% of the income distribution (14.5%; annual family income above $532,000 in 2015 dollars) than from the bottom half (13.5%). Only 3.8% of students at Ivy-Plus colleges come from the bottom quintile of the income distribution (families with annual incomes below $25,000 in 2015 dollars). As a result, children from families in the top 1% are 77 times more likely to attend an Ivy-Plus college than children from families in the bottom quintile. By contrast, 14.6% of students at community colleges are from families in the bottom quintile, and only 0.5% are from the top 1%. We find substantial segregation by parental income not just across selectivity tiers but also across colleges within the same tier: two-thirds of the variation in bottom-quintile shares is within college quality tiers.
1 An alternative approach to amplifying the impacts of the higher education system on intergenerational mobility is to increase colleges' value-added for low-income students through changes in their educational programs. Our goal here is to assess how far one may get through feasible changes in the allocation of students to colleges, holding their value-added fixed.
2 We measure children's earnings between the ages of 32 and 34; we show that children's percentile ranks in the earnings distribution stabilize by age 32 at all types of colleges.
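The 77-fold figure is a ratio of attendance rates. Because parent-income groups are defined by percentile ranks, the top 1% and the bottom quintile make up 1% and 20% of each cohort by construction, so the ratio can be approximately recovered from the published shares alone. A minimal sketch (the function name is ours, not the authors'):

```python
def attendance_ratio(share_a, pop_a, share_b, pop_b):
    """Ratio of group A's attendance rate to group B's at one college tier.

    share_a, share_b: fraction of the tier's students who come from each group
    pop_a, pop_b:     fraction of all children in the cohort from each group
    Rate of group g = share_g * N_tier / (pop_g * N_cohort); the tier's
    enrollment N_tier and the cohort size N_cohort cancel in the ratio.
    """
    return (share_a / pop_a) / (share_b / pop_b)


# Ivy-Plus shares quoted in the text: 14.5% from the top 1%, 3.8% from the bottom quintile.
ratio = attendance_ratio(share_a=0.145, pop_a=0.01, share_b=0.038, pop_b=0.20)
print(round(ratio, 1))  # 76.3
```

The result (about 76) differs slightly from the reported 77, presumably because the published shares are rounded.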
The degree of income segregation across colleges is as large as the degree of segregation across the neighborhoods in which children grow up. For example, among children with parents in the bottom quintile, 11.8% of their college peers come from the top quintile, while 11.5% of their peers in the ZIP code where they lived before college come from the top quintile. At the other end of the spectrum, students from high-income families at Ivy-Plus colleges have fewer low-income peers in college than in their childhood neighborhoods. Colleges remain highly segregated even when we adjust for geographic differences in the distribution of parent income shares, as in Hoxby and Turner (2019). These findings suggest that efforts to increase interaction across socioeconomic groups may be just as valuable at the college level as at the neighborhood level (and may even be somewhat easier to implement, since many colleges, unlike neighborhoods, have an admissions process).
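Both peer statistics above are exposure measures: the share of top-quintile peers in a student's college (or ZIP code), averaged over bottom-quintile students. A sketch of that calculation, assuming counts of students by parent-income quintile are available for each college; the data below are hypothetical, and for simplicity a student is not excluded from her own peer group.

```python
def exposure(units, group_from, group_to):
    """Average share of peers in `group_to` among students in `group_from`.

    units: list of dicts mapping income group -> number of students, one dict
           per college (or per ZIP code for the neighborhood comparison).
    Each unit's share of `group_to` is weighted by its count of `group_from`.
    """
    num, den = 0.0, 0.0
    for counts in units:
        total = sum(counts.values())
        n_from = counts.get(group_from, 0)
        if total == 0 or n_from == 0:
            continue  # unit contributes nothing to group_from's average
        share_to = counts.get(group_to, 0) / total
        num += n_from * share_to
        den += n_from
    return num / den


# Hypothetical counts by parent-income quintile for two colleges.
toy = [
    {"Q1": 30, "Q2": 30, "Q3": 20, "Q4": 10, "Q5": 10},
    {"Q1": 10, "Q2": 10, "Q3": 20, "Q4": 20, "Q5": 40},
]
print(exposure(toy, "Q1", "Q5"))  # about 0.175
```

Running the same function on ZIP-code counts instead of college counts gives the neighborhood benchmark.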
In the second part of the paper, we examine the earnings outcomes of students who attend each college, conditional on parental income. In the nation as a whole, children from the highest-income families end up 29 percentiles higher in the earnings distribution on average than those from the lowest-income families. Controlling for college fixed effects, the gap between students from the highest- and lowest-income families falls to 11 percentiles, 38% of the national gradient. Hence, much of the gap in outcomes between children from low- vs. high-income families can be explained by differences between rather than within colleges, raising the possibility that reallocating students across colleges could increase intergenerational mobility substantially.
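Comparing the national gradient with the within-college gradient amounts to estimating the parent-child rank slope with and without college fixed effects. The sketch below does this on simulated data (all numbers hypothetical), absorbing the fixed effects by demeaning within each college (the Frisch-Waugh-Lovell result). It illustrates the technique, not the authors' estimation code.

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within(values, groups):
    """Subtract each group's mean: the within transformation for fixed effects."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_slope(x, y, groups):
    """Slope of y on x controlling for group (college) fixed effects."""
    return ols_slope(demean_within(x, groups), demean_within(y, groups))


# Simulated data: higher-income children sort into colleges with higher mean
# outcomes, while the true within-college parent-child slope is 0.1.
rng = random.Random(0)
college = [rng.randrange(5) for _ in range(5000)]
parent_rank = [20 * c + rng.uniform(0, 20) for c in college]
child_rank = [0.1 * p + 8 * c + rng.gauss(0, 5) for p, c in zip(parent_rank, college)]

raw = ols_slope(parent_rank, child_rank)
within = fe_slope(parent_rank, child_rank, college)
print(raw, within, within / raw)  # within / raw plays the role of the 38% figure
```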
Children from high-income families tend to segregate into colleges at which students from all parent income levels have high average earnings outcomes: the (enrollment-weighted) cross-college correlation between mean parent income rank and mean student earnings rank of bottom-quintile students is 0.70. However, some colleges buck this pattern and have both a large share of students from low-income families and relatively good earnings outcomes, resulting in a high "mobility rate" of students from the bottom to the top of the income distribution. Examples of such high-mobility-rate colleges include mid-tier public institutions such as the City University of New York (CUNY), certain campuses of the California State University system, and several campuses in the University of Texas system. The colleges with the highest mobility rates must be particularly good either at enrolling low-income students with high earnings potential or at adding substantial value for students from low-income families. In either case, they are an interesting set of institutions to study in future work on reducing income segregation or increasing mobility more broadly. These colleges do not differ substantially from other colleges on institutional characteristics like public-versus-private status, instructional expenditures, or endowments. This similarity in observable characteristics between high- and low-mobility colleges does not hold if we focus on upper-tail mobility, the fraction of students who come from bottom-quintile families and reach the top 1% of the earnings distribution (earnings > $182,000 at ages 32-34). The highest upper-tail mobility rates are concentrated at highly selective private colleges with large endowments, such as Ivy-Plus colleges.
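The "mobility rate" combines access (the share of a college's students from bottom-quintile families) with outcomes. A sketch, assuming the mobility rate is the share of a college's students who both come from the bottom quintile and reach the top earnings quintile; that share factors exactly into access times the success rate P(reach top | bottom-quintile parents). Passing a top-1% indicator instead gives the upper-tail version. The records are hypothetical.

```python
def mobility_stats(parent_q, reached_top):
    """Access, success rate, and mobility rate for one college (sketch).

    parent_q:    parent-income quintile (1-5) of each student
    reached_top: True if the student reaches the target earnings group
                 (top quintile, or top 1% for upper-tail mobility)
    """
    n = len(parent_q)
    low = [top for q, top in zip(parent_q, reached_top) if q == 1]
    access = len(low) / n                                   # share of students from Q1
    success = sum(low) / len(low) if low else float("nan")  # P(reach top | Q1)
    mobility = sum(low) / n                                 # P(Q1 and reach top) = access * success
    return access, success, mobility


parents = [1, 1, 1, 1, 2, 3, 4, 5, 5, 5]
outcomes = [True, False, True, False, True, False, True, True, False, True]
print(mobility_stats(parents, outcomes))  # about (0.4, 0.5, 0.2)
```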
In the third part of the paper, we simulate how income segregation across colleges and intergenerational mobility would change if students were allocated to colleges differently. We begin by evaluating the extent to which differences in parental income distributions across colleges can be explained by differences in academic preparation before students apply to college, as proxied by SAT or ACT scores. 3 We find that at any given level of SAT/ACT scores, children from higher-income families attend more selective colleges, suggesting that low- and middle-income students "undermatch" to colleges (Bowen, Chingos and McPherson 2009). To quantify the degree of undermatching, we construct an "income-neutral" student allocation process, in which we fill each college's slot for a current student who has test score s with a random draw from the population of college students with test score s who come from the same state and are of the same race. In this scenario, colleges continue to enroll students based on both academic and non-academic credentials but eliminate variation in enrollment rates by parental income (whether due to differences in application, admissions, or matriculation) among students with comparable academic credentials, preserving the racial and geographic composition and the total size of each college. This counterfactual thus provides a natural benchmark to gauge the extent to which student bodies are representative of the underlying population of academically qualified students. 4 Income segregation across colleges would fall substantially if students enrolled at colleges in an income-neutral manner conditional on their test scores. The degree of under-representation of students from the bottom parental income quintile at selective (Barron's Tier 6 or higher) colleges would fall by 38% relative to a benchmark in which all colleges have the same fraction of bottom-quintile students as in the current population of college-goers.
This is because top-quintile students are currently 34% more likely to attend selective colleges than their bottom-quintile peers with the same test scores. The income-neutral allocation would also substantially increase the representation of middle-income students (the second, third, and fourth quintiles) at selective colleges.
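The income-neutral counterfactual can be sketched as a resampling step within test score × state × race cells. This version assumes exact score cells and implements the "random draw" as a random permutation of parental income within each cell (sampling without replacement), which holds each cell's income distribution and each college's size fixed. Field names are hypothetical.

```python
import random
from collections import defaultdict


def income_neutral_allocation(students, seed=0):
    """Refill every current seat with a random draw from its score-state-race cell.

    students: list of dicts with keys 'college', 'score', 'state', 'race', 'parent_q'.
    Returns (college, parent_q) pairs for the counterfactual student bodies.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)  # (score, state, race) -> indices of current seats
    for i, s in enumerate(students):
        cells[(s["score"], s["state"], s["race"])].append(i)
    new_q = [None] * len(students)
    for idx in cells.values():
        draws = [students[i]["parent_q"] for i in idx]
        rng.shuffle(draws)  # parental income no longer predicts college within a cell
        for i, q in zip(idx, draws):
            new_q[i] = q
    return [(s["college"], q) for s, q in zip(students, new_q)]
```

Because only parental income is reshuffled, any remaining income differences across colleges in the counterfactual reflect differences in test scores, state, and race.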
The picture is somewhat different at the most selective elite private (Ivy-Plus) colleges. There, the fraction of students from the middle class (the second, third, and fourth quintiles) would rise substantially under income-neutral allocations, from 28% to 38%. But there would be little absolute change (from 3.8% to 4.4%) in the fraction of students from the bottom income quintile, reducing under-representation relative to the benchmark in which all colleges have the same fraction of bottom-quintile students by only 9%. These findings show that it is in fact middle-income students who attend Ivy-Plus colleges at the lowest rates conditional on test scores, a pattern many have referred to as the "missing middle" at elite private colleges. 5 Our results imply much less undermatching of high-achieving low-income students at such colleges than found by Hoxby and Avery (2013) because there are few children from low-income families who have sufficiently high SAT/ACT scores. For instance, only 3.7% of children who score above 1300 on the SAT come from families in the bottom income quintile. 6 High-scoring students from low-income families are scarce in substantial part because of disparities in schools, neighborhoods, and other environmental factors that cumulate from birth (Heckman and Krueger 2005, Fryer and Levitt 2013, Chetty and Hendren 2018, Reardon 2019). These pre-college disparities limit the scope to increase the number of students from the lowest-income families at elite colleges purely by recruiting more applications.
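The 38% and 9% figures measure how much of the gap between a college's current share of bottom-quintile students and a common benchmark share is closed by the counterfactual. A sketch of that arithmetic with purely illustrative inputs (the benchmark share of bottom-quintile students among all college-goers is not reported in this section):

```python
def underrep_reduction(current, counterfactual, benchmark):
    """Fraction of the under-representation gap closed by a counterfactual.

    current, counterfactual: the group's share of the college's students (0-1)
    benchmark:               the share the college would have if every college
                             matched the overall population of college-goers
    Equivalent to 1 - gap_after / gap_before.
    """
    gap_before = benchmark - current
    if gap_before == 0:
        raise ValueError("no under-representation to reduce")
    return (counterfactual - current) / gap_before


# Illustrative numbers only: a 5% share rising to 8% against a 15% benchmark.
print(underrep_reduction(0.05, 0.08, 0.15))  # about 0.3
```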
Further increasing the fraction of low-income students at selective colleges would require policies that induce low-income students to attend highly selective colleges at higher rates than higher-income students with currently comparable SAT scores. If low-income (bottom-quintile) students attended colleges comparable to those attended by high-income (top-quintile) students with SAT scores 160 points higher, the higher education system would be fully desegregated, in the sense that parental income distributions would be very similar across all colleges. 7 To benchmark the magnitude of this change, a 160-point SAT increment would be equivalent to increasing Ivy-Plus attendance rates from 7.3% to 25.8% for low-income students with an SAT score of 1400. This increment is very similar in magnitude to the implicit admissions preference given to various groups at elite colleges, such as legacy students, recruited athletes, and underrepresented minorities, who are admitted at substantially higher rates than other students with similar qualifications (Espenshade, Chung and Walling 2004, Arcidiacono, Kinsler and Ransom 2019). 8
4 … factors unrelated to test scores in practice, we believe this counterfactual provides a more plausible benchmark for understanding the extent to which differences in test scores can explain income segregation across colleges.
5 The term "missing middle" has been used to describe the relative under-representation of middle-class students at elite private institutions since at least Todd (1976). More recently, Caroline Hoxby and Sarah Turner document results consistent with these findings, as reported in Rampell (2019).
6 We find many fewer high-achieving students from low-income families than Hoxby and Avery estimate. This difference arises because we measure parental income at the individual level rather than using geographic imputations, and because of differences in the thresholds used to define quantiles of the income distribution; see Section V.A for details.
How would such changes in segregation affect intergenerational mobility? To answer this question, we need an estimate of the fraction of the earnings premium at each college (conditional on parental income, race, and SAT/ACT scores) that is due to the causal effect of attending that college. Naturally, our simulated impacts on intergenerational mobility are highly sensitive to this parameter: if differences in earnings across colleges are driven purely by selection rather than causal effects, reallocating students across colleges would have no impact on mobility. To gauge what fraction of the difference in earnings across colleges is due to causal effects, we regress students' earnings on our estimates of mean earnings premia (conditional on race, parental income, and test scores), controlling for other observable characteristics such as gender, high-school GPA, and high-school fixed effects. We then follow Dale and Krueger (2002) and additionally control for the set of colleges to which a student applied to capture selection on unobservables.
Including such controls yields a coefficient between 0.8 and 1, suggesting that at least 80% of the difference in earnings premia across colleges (conditional on parental income, race, and test scores) reflects causal effects.
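The logic of this regression can be illustrated on simulated data. In the sketch below (all data hypothetical), the application set is the only source of selection, so adding application-set fixed effects, in the spirit of Dale and Krueger (2002), recovers the true causal coefficient of 0.8 on the premium, while the uncontrolled regression overstates it. In real data, that the application set absorbs selection is precisely the assumption being made.

```python
import random


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def within(values, groups):
    """Demean within groups, i.e. absorb group fixed effects."""
    means = {}
    for g in set(groups):
        vals = [v for v, h in zip(values, groups) if h == g]
        means[g] = sum(vals) / len(vals)
    return [v - means[g] for v, g in zip(values, groups)]


rng = random.Random(1)
app_set = [rng.randrange(10) for _ in range(5000)]        # hypothetical application-set IDs
premium = [0.5 * g + rng.uniform(0, 4) for g in app_set]  # stronger applicants reach high-premium colleges
earnings = [0.8 * p + 2.0 * g + rng.gauss(0, 1) for p, g in zip(premium, app_set)]

naive = slope(premium, earnings)                                     # confounded by selection
controlled = slope(within(premium, app_set), within(earnings, app_set))
print(naive, controlled)  # controlled is close to the causal share 0.8
```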
We therefore assume that 80% of the earnings premium at each college is driven by a causal effect in our baseline analysis. We also assume that student reallocations do not change colleges' causal effects, even though the composition of the student body might change substantially.
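Under these two assumptions, moving a student between colleges shifts her expected earnings rank by 80% of the difference in the colleges' premia. A simplified sketch of that step (hypothetical function and argument names; not necessarily the exact procedure used in Section V):

```python
def counterfactual_rank(observed_rank, premium_old, premium_new, causal_share=0.8):
    """Predicted earnings rank after a student is moved to a different college.

    premium_old, premium_new: earnings premia of the original and new colleges
        (conditional on parental income, race, and test scores), in rank points.
    causal_share: fraction of each premium assumed to be causal (0.8 in the baseline).
    The premia are held fixed, i.e. reallocation is assumed not to change
    colleges' causal effects.
    """
    return observed_rank + causal_share * (premium_new - premium_old)


print(counterfactual_rank(50.0, premium_old=-2.0, premium_new=8.0))  # 58.0
```

Setting `causal_share=0` reproduces the pure-selection case in which reallocation leaves outcomes unchanged.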
We measure intergenerational mobility as the difference in the chance that college students from low- vs. high-income families reach the top earnings quintile, a simple measure of relative mobility (Chetty et al. 2014). Empirically, this difference is 22 percentage points for children in the 1980-82 birth cohorts. The income-neutral benchmark would narrow the gap by 15%, while need-affirmative admissions would narrow it by 27%. These are substantial effects given that children's outcomes in adulthood are shaped by a cumulation of environmental factors from birth until the point they enter the labor market, and most people spend at most 25% of their pre-labor-market years in college. The precise magnitudes that result from these simulations must of course be interpreted with caution because they hinge on strong assumptions, in particular about the causal effects of colleges. Nevertheless, they suggest that changing which colleges students attend (i.e., reducing segregation without making any efforts to increase colleges' value-added or reduce disparities that emerge before students apply to college) could increase economic mobility substantially.
7 Phasing out this increment roughly linearly from 160 SAT points for students in the bottom quintile down to 0 for students in the top quintile leads to equal representation of students from all parental income levels across colleges. Note that we use the SAT here simply as a convenient metric to quantify the degree of need-affirmative preference needed to desegregate colleges; in practice, one could implement such policies using a variety of other metrics and approaches.
8 Our results do not speak to the debate about whether standardized tests provide comparable measures of aptitude for students from low- vs. high-income families. We simply use test scores to quantify the gap between students from low- vs. high-income families in end-of-high-school academic qualifications. Whether that gap can be closed through changes in K-12 education, test design or preparation, or the college application or admissions process is a question left for future work.
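Footnote 7 describes the need-affirmative preference as an SAT bonus that phases out roughly linearly from 160 points in the bottom quintile to 0 in the top quintile. A sketch, assuming the phase-out is exactly linear in the parent-income quintile (the footnote says only "roughly linearly", so this schedule is our assumption and the function names are hypothetical):

```python
def need_affirmative_bonus(parent_quintile, max_bonus=160):
    """SAT-point preference: max_bonus for Q1, falling linearly to 0 for Q5."""
    if parent_quintile not in (1, 2, 3, 4, 5):
        raise ValueError("parent_quintile must be an integer from 1 to 5")
    return max_bonus * (5 - parent_quintile) / 4


def effective_score(sat, parent_quintile, max_bonus=160):
    """Score used to match a student to colleges under the sliding-scale preference."""
    return sat + need_affirmative_bonus(parent_quintile, max_bonus)


print([need_affirmative_bonus(q) for q in range(1, 6)])  # [160.0, 120.0, 80.0, 40.0, 0.0]
```

Under this schedule, a bottom-quintile student with a 1240 is matched like a top-quintile student with a 1400.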
Related Literature. The three parts of our analysis reconcile conflicting findings in prior work.
First, several papers have studied income segregation in higher education by selectivity tier or at selected colleges (e.g., Avery et al. 2006, Goodman 2008, Deming and Dynarski 2010, Hoxby and Turner 2013, Marx and Turner 2015, Andrews, Imberman and Lovenheim 2016, Manoli and Turner 2018). These studies find a wide range of estimates using small samples; for instance, the estimated fraction of students from bottom-quartile families at elite colleges ranges from 3% (Carnevale and Strohl 2010) to 11% (Bowen, Kurzweil and Tobin 2006, Chapter 7) across studies.
Our new statistics provide more definitive estimates of the degree of segregation across college tiers, shed light on segregation across colleges within selectivity tiers, and offer the first statistics on top-income shares by college.
Second, a smaller literature has measured the returns to attending certain colleges using quasi-experimental methods (e.g., Black and Smith 2004, Hoekstra 2009, Hastings, Neilson and Zimmerman 2013, Zimmerman 2014, Kirkeboen, Leuven and Mogstad 2016, Cellini and Turner 2019). Our analysis complements these studies by providing information on earnings distributions for all colleges. These data allow us to characterize how students' earnings distributions vary with parental income within each college and to identify "outlier" colleges in terms of students' outcomes whose admissions policies or educational practices could be studied in future quasi-experimental work. Finally, our counterfactual analysis follows prior work examining how alternative admissions rules would affect the composition of colleges by selectivity tier (e.g., Arcidiacono 2005, Bowen, Kurzweil and Tobin 2006, Epple, Romano and Sieg 2006, Krueger, Rothstein and Turner 2006, Howell 2010). This work has again reached conflicting conclusions on the degree of undermatching and the consequences of alternative admissions regimes (Carnevale and Rose 2004, Hill and Winston 2006, Carnevale and Strohl 2010, Bastedo and Jaquette 2011, Hoxby and Avery 2013). In addition to reconciling these findings, we contribute to this literature by (1) analyzing counterfactuals across all colleges rather than by college tier, which proves to be quantitatively important, and (2) showing impacts not just on the composition of the student body but also on rates of intergenerational mobility.
The paper is organized as follows. Section II describes the data. Section III presents results on parent income segregation. Section IV examines students' earnings outcomes. Section V presents results on the relationship between SAT/ACT scores and parent income (undermatching) and discusses the counterfactual simulations. Section VI concludes. College-level statistics and replication code can be downloaded from the project website.

II Data
In this section, we describe how we construct our analysis sample, define the key variables we use in our analysis, and present summary statistics.

II.A Sample Definition
Our primary sample of children consists of all individuals in the U.S. who (1) have a valid Social Security Number (SSN) or Individual Taxpayer Identification Number (ITIN), (2) were born between 1980 and 1991, and (3) can be linked to parents with non-negative income in the tax data (see Online Appendix A for more details). 9 There are approximately 48.1 million people in this sample.
We identify a child's parents as the most recent tax filers to claim the child as a child dependent during the period when the child is 12-17 years old. If the child is claimed by a single filer, the child is defined as having a single parent. We assign each child a parent (or parents) permanently using this algorithm, regardless of any changes in parents' marital status or dependent claiming.
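The linking rule can be sketched as a filter on dependent claims. This assumes the child's age in a tax year is approximated by the tax year minus the birth year, and that claims arrive as (year, filer IDs) pairs; both the data layout and the function name are hypothetical.

```python
def link_parents(claims, birth_year):
    """Assign a child's parents from the returns that claim the child as a dependent.

    claims: list of (tax_year, filer_ids) tuples; filer_ids holds one ID (single
            filer) or two IDs (joint filers).
    Returns filer_ids from the most recent claim made while the child is aged
    12-17, or None if the child is never claimed in that window. The assignment
    is permanent: later marital changes or claims are ignored.
    """
    window = [(year, ids) for year, ids in claims if 12 <= year - birth_year <= 17]
    if not window:
        return None  # cannot be linked; excluded from the analysis sample
    return max(window, key=lambda claim: claim[0])[1]


claims = [(1994, ("A", "B")), (1999, ("A",)), (2002, ("C",))]
print(link_parents(claims, 1982))  # ('A',): a single parent, claimed at age 17
```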
Children who are never claimed as dependents on a tax return cannot be linked to their parents and are excluded from our analysis. However, almost all parents file a tax return at some point when their child is between the ages of 12 and 17, either because their incomes lie above the filing threshold or because they are eligible for a tax refund (Cilke 1998). Thus, the number of children for whom we identify parents exceeds 98% of children born in the U.S. between 1980 and 1991 (Online Appendix …).
9 Because we limit the sample to children who can be linked to parents in the U.S. (based on dependent claiming on tax returns), our sample excludes college students from foreign countries. We limit the sample to parents with non-negative income (averaged over five years as described below in Section II.C) because parents with negative income typically have large business losses, which are a proxy for having significant wealth despite the negative reported income. The non-negative income restriction excludes 0.95% of children.

II.B College Attendance
Data Sources. We obtain information on college attendance from two administrative data sources: federal tax records and Department of Education records spanning 1999-2013. 11 We identify students attending each college in the administrative records primarily using Form 1098-T, an information return filed by colleges on behalf of each of their students to report tuition payments.
All institutions qualifying for federal financial aid under Title IV of the Higher Education Act of 1965 must file a 1098-T form in each calendar year for any student who pays tuition. Because the 1098-T data do not always cover students who pay no tuition (typically low-income students receiving financial aid), we supplement the 1098-T data with Pell grant records from the Department of Education's National Student Loan Data System (NSLDS). See Online Appendix B for details on these two data sources and how we assign students to colleges.
Because neither of our data sources relies on voluntary reporting or tax filing, our data provide a near-complete roster of college attendance at all Title IV accredited institutions of higher education in the U.S. Aggregate college enrollment counts in our data are well aligned with aggregate enrollments from the Current Population Survey and college-specific enrollment counts from IPEDS (Online Appendix Table I, Online Appendix B). 12
10 The fraction of children linked to parents drops sharply prior to the 1980 birth cohort because our data begin in 1996 and many children begin to leave the household starting at age 17 (Chetty et al. 2014). Hence, the 1980 birth cohort is the earliest cohort we analyze.
11 Information on college attendance is not available in tax records prior to 1999, and the latest complete information on attendance available from the Department of Education at the time of this analysis was for 2013.
12 Students at some multi-campus systems cannot be assigned to a specific campus and are therefore aggregated into a single cluster. There are 85 such clusters, comprising 17.5% of students and 3.9% of colleges in our data. Separately, 1.8% of student-year observations are assigned to a "colleges with incomplete or insufficient data" category due to incomplete 1098-T data.
Definition of College Attendance. Our goal is to construct statistics for the set of degree-seeking undergraduate students at each college. Since we cannot directly separate degree seekers from other students (summer school students, extension school students, etc.) in our data, we proceed in two steps in our baseline definition of college attendance. First, we define a student as attending a given college in a given calendar year if she appears in either the 1098-T or NSLDS data. We then assign each student the college she attends for the most years over the four calendar years in which she turns 19, 20, 21, and 22. If a student attends two or more colleges for the same number of years (which occurs for 9% of children), we define the student's college as the first college she attended. 13 Since we do not observe degree completion, students who do not graduate are included in all of the statistics we report.
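The baseline assignment rule, including the tie-breaks described here and in footnote 13, can be sketched as follows. It assumes "first attended" is judged within the ages 19-22 window, and the input layout (calendar year mapped to the set of college IDs observed that year) is hypothetical.

```python
import random


def assign_college(attendance, birth_year, seed=0):
    """Baseline college assignment (sketch).

    attendance: dict mapping calendar year -> set of college IDs in which the
                student appears in the 1098-T or NSLDS data that year
    Picks the college attended in the most of the calendar years in which the
    student turns 19-22; ties go to the college attended first; if several tied
    colleges were first attended in the same year, one is drawn at random.
    """
    window = [birth_year + age for age in (19, 20, 21, 22)]
    years_attended = {}  # college -> number of window years attended
    first_year = {}      # college -> earliest window year attended
    for year in window:
        for college in attendance.get(year, ()):
            years_attended[college] = years_attended.get(college, 0) + 1
            first_year.setdefault(college, year)
    if not years_attended:
        return None  # no attendance between ages 19 and 22
    most = max(years_attended.values())
    tied = [c for c, n in years_attended.items() if n == most]
    earliest = min(first_year[c] for c in tied)
    candidates = sorted(c for c in tied if first_year[c] == earliest)
    return random.Random(seed).choice(candidates)


history = {1999: {"A"}, 2000: {"A"}, 2001: {"B"}, 2002: {"B"}}
print(assign_college(history, 1980))  # 'A': tied at two years each, attended first
```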
To evaluate the robustness of our results, we also consider two alternative attendance measures: age-20 college (the college a student attends in the calendar year in which she turns 20) and first-attended college (the college a student attends first between the calendar years in which she turns 19 and 28).

II.C Incomes
We obtain data on children's and parents' incomes from federal income tax records spanning 1996-2014. We use data from both income tax returns (1040 forms) and third-party information returns (e.g., W-2 forms), which contain information on the earnings of those who do not file tax returns.
We measure income in 2015 dollars, adjusting for inflation using the consumer price index (CPI-U).
Parent Income. We measure parent income as total pre-tax income at the household level. In years where a parent files a tax return, we define family income as Adjusted Gross Income (as reported on the 1040 tax return). This income measure includes both labor earnings and capital income. In years where a parent does not file a tax return, we define family income as the sum of wage earnings (reported on form W-2) and unemployment benefits (reported on form 1099-G).
In years where parents have no tax return and no information returns, family income is coded as zero. Importantly, the income distribution in the tax data is very similar to that in the American Community Survey (ACS) when one uses the same income definitions (Online Appendix C, Online Appendix Table II).
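The annual income rule above reduces to a small branch. A sketch, assuming amounts are already deflated to 2015 dollars with the CPI-U; the function and argument names are hypothetical.

```python
def annual_family_income(filed, agi=0.0, w2_wages=0.0, ui_benefits=0.0):
    """One year of parent family income (sketch of the rule in the text).

    filed=True:  Adjusted Gross Income from the 1040 (labor plus capital income)
    filed=False: W-2 wage earnings plus 1099-G unemployment benefits; a parent
                 with no return and no information returns keeps the defaults,
                 so income is coded as zero
    """
    if filed:
        return agi
    return w2_wages + ui_benefits


print(annual_family_income(False, w2_wages=20000.0, ui_benefits=3000.0))  # 23000.0
```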
We average parents' family income over the five years when the child is aged 15-19 to smooth transitory fluctuations (Solon 1992) and to obtain a measure of resources available at the time when most college attendance decisions are made. 14 We then assign parents income percentiles by ranking …
13 If the student attended multiple "most attended" colleges in the first year, which occurs for 1.6% of students, then a college is chosen at random from that set.
14 Following Chetty et al. (2014), we define mean family income as the mother's family income plus the father's family income in each year from 1996 to 2000 divided by 10 (or divided by 5 if we only identify a single parent). For parents who do not change marital status, this is simply mean family income over the 5-year period. For parents who are married initially and then divorce, this measure tracks the mean family incomes of the two divorced parents over time. For parents who are single initially and then get married, this measure tracks individual income prior to marriage and total family income (including the new spouse's income) after marriage. We exclude years in which a parent does not file when computing mean parent income prior to 1999 because information returns are available starting only in 1999.
We measure children's incomes in 2014, the most recent year in which we observe earnings, to minimize the degree of lifecycle bias that arises from measuring children's earnings at too early an age. We assign children income percentiles by ranking them based on their individual earnings relative to other children in the same birth cohort. We show in Online Appendix D that the earnings ranks of children in our analysis sample stabilize by 2014.
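The averaging in footnote 14 and the within-cohort percentile ranking can be sketched together. Two simplifications are our own assumptions: the exclusion of pre-1999 non-filing years is omitted, and tied incomes receive the midpoint of their percentile range (this section does not say how ties are handled). Function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def mean_parent_income(mother_family_income, father_family_income=None):
    """Footnote-14 style five-year average (sketch).

    Each argument lists that parent's family income in each of the five years.
    Two parents: (mother's total + father's total) / 10, which equals mean joint
    income for a couple that stays married. One parent identified: total / 5.
    """
    if father_family_income is None:
        return sum(mother_family_income) / 5
    return (sum(mother_family_income) + sum(father_family_income)) / 10


def percentile_ranks(values):
    """Percentile ranks on a 0-100 scale; ties get the midpoint of their range."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)
        equal = bisect_right(ordered, v) - below
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def rank_within_cohort(incomes, cohorts):
    """Rank each person's income relative to others in the same birth cohort."""
    members = defaultdict(list)
    for i, cohort in enumerate(cohorts):
        members[cohort].append(i)
    out = [None] * len(incomes)
    for idx in members.values():
        for i, r in zip(idx, percentile_ranks([incomes[j] for j in idx])):
            out[i] = r
    return out


print(percentile_ranks([10, 20, 20, 40]))  # [12.5, 50.0, 50.0, 87.5]
```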
We also consider two alternative measures of child income in sensitivity analyses: household income, defined in the same way as parents' household income, and household earnings, the sum of individual earnings (defined as above) for the child and his or her spouse. Household income includes capital income, whereas household earnings does not.

II.D Pre-College Neighborhoods
To measure segregation across neighborhoods, we assign the students in our sample a childhood neighborhood (ZIP code) as follows. We first identify the primary tax filer on the 1040 that claimed the child when assigning the child to parents. We then assign each child to the ZIP code on the primary filer's 1040 income tax return in the year when the child was age 17 or, if the primary filer did not file a tax return that year, to the most common ZIP code across the primary filer's information returns (e.g., W-2 forms) that year. If no ZIP code was found in the year when the child was age 17, we search for the primary filer's ZIP code when the child was age 16, then 18, then 15, then 19, then 14, then 20 until a ZIP code is found. Over 99.9% of children are assigned ZIP codes using this algorithm; the remaining children are grouped into a separate ZIP code.
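The fallback search over ages can be sketched as follows. This is a minimal illustration assuming the primary filer's records are already available as dicts keyed by tax year; the dict layout and the `UNASSIGNED` label for the residual group are hypothetical.

```python
from collections import Counter

# Order of child ages searched for a ZIP code, as described in the text.
SEARCH_AGES = (17, 16, 18, 15, 19, 14, 20)
UNASSIGNED = "UNASSIGNED"  # hypothetical label for the residual "separate ZIP code" group


def assign_childhood_zip(birth_year, zip_on_1040, zips_on_info_returns):
    """Return the childhood ZIP code for one child.

    zip_on_1040: tax year -> ZIP on the primary filer's Form 1040 (if filed).
    zips_on_info_returns: tax year -> list of ZIPs on the primary filer's
    information returns (e.g., W-2s) in that year.
    """
    for age in SEARCH_AGES:
        year = birth_year + age
        if year in zip_on_1040:
            return zip_on_1040[year]
        info_zips = zips_on_info_returns.get(year)
        if info_zips:
            # Most common ZIP across that year's information returns.
            return Counter(info_zips).most_common(1)[0][0]
    return UNASSIGNED


# No records at age 17 or 16, so the age-18 (tax year 1998) W-2 ZIPs are used.
print(assign_childhood_zip(1980, {1999: "10001"}, {1998: ["02138", "02138", "94720"]}))  # 02138
```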

II.E Test Scores and Race
We obtained records from the College Board and ACT on standardized college entrance exam scores and race/ethnicity for children in our analysis sample. Our data cover high school graduating cohorts 1996-2004 for the SAT and 1995-2007 for the ACT.
We focus on individuals' SAT composite score (ranging from 400 to 1600), defined as the mathematics score plus the critical reading score, and the composite ACT score (ranging from 1 to 36). We map ACT scores into equivalent SAT scores using existing concordance tables, we prioritize the SAT if it is available, and we use an individual's maximum composite score if she has taken the same test more than once (see Online Appendix E for details). We use five race/ethnicity categories (referred to hereafter as race): Black, Asian, non-Hispanic white, Hispanic, and other.
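The three harmonization rules (convert ACT to the SAT scale, prefer the SAT, keep the maximum) can be sketched as below. The concordance is passed in by the caller; the toy values are made up and are not taken from any real concordance table.

```python
def harmonized_sat(sat_composites, act_composites, act_to_sat):
    """Single SAT-scale score for one student, following the rules in the text.

    sat_composites: SAT math + critical reading totals (400-1600) across sittings.
    act_composites: ACT composite scores (1-36) across sittings.
    act_to_sat: concordance mapping ACT composite -> equivalent SAT score,
    supplied by the caller (no real concordance values are included here).
    """
    if sat_composites:
        return max(sat_composites)  # SAT takes priority; keep the best sitting
    if act_composites:
        # Concordance tables are monotone, so converting the best ACT equals the best converted score.
        return act_to_sat[max(act_composites)]
    return None  # no score observed; such students are imputed in Section V


toy_concordance = {28: 1250, 31: 1370}  # made-up values, for illustration only
print(harmonized_sat([1180, 1240], [31], toy_concordance))  # 1240: SAT preferred
print(harmonized_sat([], [28, 31], toy_concordance))  # 1370
```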
SAT/ACT coverage rates (and therefore race coverage rates) are very high at selective colleges where standardized tests are typically required for admission; for instance, we observe a score for 98.5% of Ivy-Plus attendees. We use SAT/ACT scores and race primarily in our counterfactual analysis in Section V. 15 In that section, we describe and validate a procedure to impute SAT/ACT scores and race for the 26.2% of students for whom we do not observe a test score and race.

II.F College-Level Statistics
We construct publicly available college-level statistics on children's and parents' income distributions using data for children in the 1980-82 birth cohorts. 16 These children's incomes can be measured at age 32 or older in 2014, the age at which children's income ranks stabilize at all colleges (Online Appendix D).
To construct college-level statistics, we first exclude colleges that have fewer than 100 students on average across the 1980-1991 birth cohorts (in years where we have data for that college), all college-cohort observations with fewer than 50 students, and college-cohort observations that have incomplete data for two or more of the four years when students are aged 19-22. These colleges are added to a separate "colleges with incomplete or insufficient data" group. We then construct enrollment-weighted means by college of each statistic for the 1980-82 cohorts, imputing values from the 1983-84 cohorts for any missing college-by-cohort observations in the 1980-82 sample (see Online Appendix B for details). There are 2,199 colleges for which we release statistics, of which 397 use data exclusively from the 1983-84 cohorts. We report blurred statistics for each college rather than exact values following established disclosure standards (see Online Appendix F); the blurred estimates are generally very accurate and using the exact values yields virtually identical results.
15 Due to confidentiality restrictions governing the test score data, we are unable to disclose statistics that make use of test score data and/or race data by college and hence cannot report estimates of earnings conditional on test scores, race, or other related measures in this study.
16 We focus on the 1980-82 birth cohorts in this paper, but also provide longitudinal statistics by college for the 1980-1991 birth cohorts in our Online Data Tables. Our statistics expand upon those released in the U.S. Department of Education's College Scorecard (2015) by including all students (not just those receiving federal student aid) and fully characterizing the joint distribution of parent and child income.
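A minimal sketch of the sample restrictions and the enrollment-weighted mean follows. The thresholds (100 students on average, 50 per cohort, two or more incomplete years) come from the text; the imputation rule in `college_statistic` is a hypothetical stand-in, since the actual procedure is only described in Online Appendix B.

```python
from statistics import mean

BASE_COHORTS = (1980, 1981, 1982)
BACKUP_COHORTS = (1983, 1984)


def usable_cohorts(sizes, incomplete_ages):
    """Apply the sample restrictions described in the text to one college.

    sizes: birth cohort (1980-1991, years with data) -> number of students.
    incomplete_ages: cohort -> how many of the four ages 19-22 have incomplete data.
    Returns None if the whole college goes to the "incomplete or insufficient
    data" group, else the set of cohorts that survive.
    """
    if mean(sizes.values()) < 100:
        return None
    return {c for c, n in sizes.items() if n >= 50 and incomplete_ages.get(c, 0) < 2}


def college_statistic(stat, sizes, usable):
    """Enrollment-weighted mean of `stat` over the 1980-82 cohorts.

    Hypothetical imputation rule (not the authors'): each unusable 1980-82
    cohort is replaced by the enrollment-weighted mean of the usable 1983-84
    cohorts, weighted by their average cohort size.
    """
    backup = [c for c in BACKUP_COHORTS if c in usable]
    backup_n = sum(sizes[c] for c in backup)
    fill_value = sum(stat[c] * sizes[c] for c in backup) / backup_n if backup_n else None
    fill_weight = backup_n / len(backup) if backup else 0.0
    num = den = 0.0
    for c in BASE_COHORTS:
        if c in usable:
            num += stat[c] * sizes[c]
            den += sizes[c]
        elif fill_value is not None:
            num += fill_value * fill_weight
            den += fill_weight
    return num / den if den else None


sizes = {1980: 200, 1981: 40, 1982: 120, 1983: 100, 1984: 100}
stat = {1980: 0.5, 1981: 0.9, 1982: 0.1, 1983: 0.1, 1984: 0.3}
usable = usable_cohorts(sizes, {1982: 2})  # 1981 too small, 1982 incomplete
print(round(college_statistic(stat, sizes, usable), 4))  # 0.35
```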
For certain analyses, we report statistics for groups of colleges rather than individual colleges. 17 We classify colleges as "4-year" or "2-year" based on the highest degree they offer using IPEDS data. 18 Following prior work (e.g., Deming et al. 2015), we use data from the Barron

III Parental Income Segregation Across Colleges
In this section, we construct statistics on parents' income at each college. This is the first of the three key factors that matter for the role of colleges in intergenerational mobility. Simply put, if a given college has very few children from low-income families, it cannot be helping move children up the income ladder. Understanding the extent of income segregation across the spectrum of colleges is therefore a key first step in assessing how the higher education system affects intergenerational mobility. Moreover, the degree of income segregation is of interest in its own right given growing concerns about the political and social consequences of segregation.

III.A Baseline Statistics
We begin by analyzing parental income distributions across colleges using our analysis sample (the 1980-82 birth cohorts).
As a reference, Figure Ia
This highly skewed parental income distribution is representative of other elite private colleges. Figure Ic shows the distribution of parent income at the twelve Ivy-Plus colleges (the Ivy League plus Stanford, MIT, Chicago, and Duke). Each of the 100 dots represents the fraction of students at those colleges with parents in a specific income percentile. There are more students who come from families in the top 1% (14.5%) than from the bottom half of the parent income distribution (13.5%). Only 3.8% of students at these colleges come from families in the bottom quintile, implying that children from families in the top 1% are 77 times more likely to attend an Ivy-Plus college than children from the bottom quintile. This degree of income concentration at elite colleges is substantially greater than that implied by their internal data (Bowen, Kurzweil and Tobin 2006, Chapter 7).
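The "77 times" figure follows from Bayes' rule: a group's attendance probability is proportional to its share of students divided by its share of the population, and the overall attendance rate cancels in the ratio. A quick check with the rounded shares quoted above gives about 76; the 77 in the text presumably reflects unrounded shares.

```python
def relative_attendance_likelihood(share_a, pop_a, share_b, pop_b):
    """Ratio P(attend | group a) / P(attend | group b).

    By Bayes' rule, P(attend | g) = P(g | attend) * P(attend) / P(g), and the
    common factor P(attend) cancels in the ratio.
    share_*: group's share of the college's students; pop_*: group's population share.
    """
    return (share_a / pop_a) / (share_b / pop_b)


# Top 1% (14.5% of Ivy-Plus students) vs. bottom quintile (3.8%), using rounded shares.
ratio = relative_attendance_likelihood(0.145, 0.01, 0.038, 0.20)
print(round(ratio, 1))  # 76.3
```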
Returning to Figure Ib, now consider UC-Berkeley. A smaller share of students at Berkeley, one of the most selective public colleges in the U.S., are from high-income families than at Harvard.
As parental income falls, the likelihood that a child attends Berkeley rather than Harvard rises monotonically. This finding is representative of a more general fact: students from the lowest-income families are less likely to attend the nation's most selective private colleges than its most selective public colleges. Since students from the lowest-income families pay very little tuition to attend elite private colleges, this result suggests that tuition costs are not the primary explanation for the under-representation of low- and middle-income students at elite private colleges.
Even at Berkeley, more than 50% of students come from the top quintile, as compared with only 8.8% from the bottom quintile. The other colleges in Figure Ib have many more students from low-income families. SUNY-Stony Brook, a public second-tier (between rank 78 and 176) institution according to the Barron's rankings, has a much more even distribution of parental incomes, though there are still significantly more students from the top quintile (30.1%) than the bottom quintile (16.4%). Glendale Community College has a monotonically declining fraction of students across the income quintiles, with 32.4% of students coming from the bottom quintile and only 13.6% from the top quintile.
19 These percentile cutoffs are computed using the household income distribution for parents of children in the 1980 birth cohort when their children were between the ages of 15-19.
These four examples are more broadly illustrative of the large differences in parental income distributions across colleges with different levels of selectivity. We present statistics on the parental income distribution (and other key statistics analyzed in the following sections) by college tier in Table II. 20 We classify colleges into twelve tiers based on their selectivity (as defined by Barron's 2009 Index; see Section II.F for details), public vs. private status, and whether they offer two-year vs. four-year degrees. The fraction of students from families in the bottom quintile rises as one moves down selectivity tiers, ranging from 3.8% at Ivy-Plus colleges to 7.1% at "Selective Private" colleges to 21% at for-profit colleges. Conversely, the fraction of students coming from the top 1% falls from 14.5% to 2.4% and 0.4% across these tiers.
Our estimates of the degree of income segregation across selectivity tiers are broadly aligned with estimates using Department of Education survey data (Carnevale and Strohl 2010, Bastedo and Jaquette 2011). However, our college-level data reveal that there is considerable segregation by parental income even across colleges within these tiers. Regressing bottom-quintile parental income shares on tier fixed effects, we find that 66.8% of the variation in bottom-quintile shares lies within tiers. For example, within the Selective Public tier, the fraction of students from the bottom quintile ranges from 3.7% at the 10th percentile to 15.3% at the 90th percentile of colleges (enrollment-weighted). Hence, studies that analyze differences across tiers significantly understate the degree of income segregation in the higher education system.
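The within-tier share comes from a fixed-effects regression: regressing a college-level statistic on tier dummies fits each tier's mean, so the share of variance left within tiers is one minus the R-squared. A minimal sketch follows; the optional weights (for instance, enrollment) are an assumption, since the text does not state how the regression is weighted, and the toy shares are made up.

```python
from collections import defaultdict


def within_group_share(values, groups, weights=None):
    """Share of the (weighted) variance of `values` that lies within groups.

    Regressing values on group fixed effects fits each group's mean, so the
    within-group share equals 1 - R^2 of that regression.
    """
    if weights is None:
        weights = [1.0] * len(values)
    total_w = sum(weights)
    grand = sum(w * v for v, w in zip(values, weights)) / total_w
    sums, wsums = defaultdict(float), defaultdict(float)
    for v, g, w in zip(values, groups, weights):
        sums[g] += w * v
        wsums[g] += w
    group_mean = {g: sums[g] / wsums[g] for g in sums}
    total_ss = sum(w * (v - grand) ** 2 for v, w in zip(values, weights))
    within_ss = sum(w * (v - group_mean[g]) ** 2 for v, g, w in zip(values, groups, weights))
    return within_ss / total_ss


# Toy bottom-quintile shares (in percent) for four colleges in two tiers.
print(within_group_share([1, 3, 5, 7], ["A", "A", "B", "B"]))  # 0.2
```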
The analysis above focuses exclusively on students who attend college before age 22. Children from low-income families tend to attend college at later ages than children from higher-income families (Online Appendix Figure I). To evaluate whether these differences in age of attendance affect our estimates, we reconstruct all of the statistics above defining college attendance based on the first college a child attends up through age 28. As an additional robustness check, we also construct estimates based on the college that students attend at age 20. We find very similar estimates of parental income distributions using these alternative definitions of college attendance, with correlations of 0.99 of the bottom-quintile share across colleges using the three measures (Online Appendix Table IV). More generally, none of the results reported below is sensitive to the way in which we assign students to colleges.

III.B Comparison to Pre-College Neighborhood Segregation
There is much interest and discussion about ways to foster greater interaction across class lines (e.g., Putnam 2016). Most such efforts focus on reducing residential segregation across neighborhoods.
Here, we explore how the degree of segregation across colleges compares to the degree of segregation across neighborhoods. The goal of this analysis is to provide information that may be useful in targeting policies: if colleges are as segregated as neighborhoods, it might be valuable to devote as much attention to reducing segregation in the higher education system as across neighborhoods.
We focus on answering the following simple question: when students get to college, do they find themselves with a more diverse peer group in terms of parental income than in the neighborhood in which they grew up? We measure segregation using exposure indices, asking what fraction of a child's peers in their childhood neighborhood or college come from parent quintile q, conditional on their own parents' income quintile. The degree of residential segregation depends upon the geographic unit one uses: larger geographic units will generally yield smaller estimates of segregation. To discipline our comparisons, we look for a tractable geographic unit whose size (in terms of number of people) is similar to the size of colleges. ZIP codes are a convenient unit that satisfies this property: the average number of children in a ZIP code is 1,860, as compared with an average of 2,351 students per college. 21 We therefore define an individual's childhood neighborhood as the ZIP code in which she or he was claimed as a dependent before attending college (see Section II.D for details). When measuring segregation across colleges, we treat those who do not attend any college as if they all attended a single distinct college. Table V).
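An exposure index of this kind can be computed by averaging, over children from one parent quintile, the share of their unit's members who come from another quintile. The sketch below is a simplified illustration: it counts each child among its own peers, and the unit labels are hypothetical. Non-attenders would all receive one shared label, mirroring the single "no college" unit described above.

```python
from collections import Counter


def exposure_index(units, quintiles, own_q, peer_q):
    """Average share of peers from parent quintile `peer_q` in the units
    (ZIP codes or colleges) of children from parent quintile `own_q`.

    Simplification: each child is counted as one of its own peers.
    """
    unit_size = Counter(units)
    peer_count = Counter(u for u, q in zip(units, quintiles) if q == peer_q)
    shares = [peer_count[u] / unit_size[u] for u, q in zip(units, quintiles) if q == own_q]
    return sum(shares) / len(shares)


units = ["zip_A", "zip_A", "zip_A", "zip_B", "zip_B"]
quintiles = [1, 5, 5, 1, 1]
# Q1 children: one in zip_A (2/3 of peers from Q5), two in zip_B (none from Q5).
print(round(exposure_index(units, quintiles, own_q=1, peer_q=5), 4))  # 0.2222
```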
We reach similar conclusions when examining segregation within specific subsets of colleges. For example, Figure IIc replicates Figure IIb for the subset of students who attend Ivy-Plus colleges (similar statistics are reported separately for each college in our Online Data Tables). We saw above that most students at Ivy-Plus colleges come from very affluent families; Figure IIc shows two additional results about the backgrounds of students at these colleges.
First, comparing Figure IIc with Figure IIb, we see that children who attend Ivy-Plus colleges tend to grow up in areas with a larger fraction of high-income peers than the average child, controlling for their own parents' incomes. For example, among children with parents in the top quintile, 34.5% of childhood peers come from the top quintile on average, compared with 48.5% for those who went on to attend Ivy-Plus colleges. This pattern is consistent with Chetty et al.'s (2018) finding that children growing up in more affluent neighborhoods tend to have better outcomes on average. 22 Second, Figure IIc shows that even though children from high-income families who attend Ivy-Plus colleges grow up in especially segregated neighborhoods, they are even less exposed to low-income peers in college. For example, among those with parents in the top quintile, 68.7% of their college peers are from the top quintile as well, higher than the 48.5% rate in their childhood neighborhoods.
Naturally, when we examine children from low-income families who attend elite colleges, we see the opposite pattern: these children are much more exposed to higher-income peers in college than in their childhood neighborhoods, because Ivy-Plus colleges predominantly have students from high-income families (Online Appendix Table VI). This pattern holds more generally when we focus on all college students. Excluding those who do not attend college, for children with parents in the bottom quintile, 13.7% of their childhood peers are from the top quintile, compared with 22.5% of their college peers (Online Appendix Table VII). This is again because college attendance rates rise sharply with parental income, as shown in Figure Ia. Since most college-goers are from higher-income families, low-income children who go to college must be more exposed to higher-income peers in college than in their childhood neighborhoods.
In short, college leads to greater exposure to higher-income peers for the relatively few children from low-income families who attend college, especially elite colleges. For children from high-income families, we see less exposure to low-income peers in college than in childhood. Overall, pooling all children (including those who do not attend college), we find that on average, children are exposed to the same types of peer groups at age 20 as they are in their childhood neighborhoods.
22 Another possibility is that household-level incomes are mismeasured, giving neighborhood-level measures of income more predictive power. Our baseline estimates average parental incomes over a 5-year period to capture permanent incomes; we find that using even longer time averages generally does not affect the results appreciably.
The similarity between our measures of segregation across colleges and residential segregation could partly be due to the fact that many colleges draw from a local pool of students, as most students stay at or near their childhood home when attending college. Put differently, parental income distributions across colleges could differ simply because of differences in local income distributions rather than differences in admissions or application policies. To assess the importance of this issue, we follow Hoxby and Turner (2019) and construct an alternative set of "locally normed" statistics that adjust for differences in the income distribution of the pool of students applying to each college. We assume that private elite colleges (i.e., private colleges in the top two selectivity tiers) draw students from a nationwide pool, the remaining selective colleges (i.e., public colleges in the top two tiers and all colleges in the next four tiers) draw students from a state-specific pool, and unselective colleges (i.e., tiers 7-12) draw students from their local Commuting Zone. 23 We construct locally normed measures by first dividing each college's parent income quintile shares by the parent income quintile shares of its potential pool of students. For each college, we then divide these five values by the sum of the five values so that the final normed shares sum to 1. The resulting statistics, which are reported by college in our Online Data Tables, can be interpreted as the parental income distributions that would arise at each college if every college had the same (national) pool of potential applicants.
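The two-step norming (divide by the pool's quintile shares, then rescale to sum to 1) is short enough to write out directly. In this sketch the pool shares are whatever national, state, or Commuting Zone distribution applies to the college's tier; the numbers are made up.

```python
def locally_normed_shares(college_shares, pool_shares):
    """Normalize a college's parent-income quintile shares by those of its applicant pool.

    Divide each quintile share by the pool's share, then rescale so the five
    ratios sum to 1.
    """
    ratios = [c / p for c, p in zip(college_shares, pool_shares)]
    total = sum(ratios)
    return [r / total for r in ratios]


# A college that exactly mirrors a poor local pool looks perfectly balanced once normed.
pool = [0.30, 0.25, 0.20, 0.15, 0.10]
print([round(s, 3) for s in locally_normed_shares(pool, pool)])  # [0.2, 0.2, 0.2, 0.2, 0.2]
```

The example illustrates the intent of the adjustment: a college whose students simply reflect its local applicant pool receives uniform normed shares, so any remaining skew reflects who enrolls from that pool rather than where the college is located.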
We find that raw bottom-quintile shares are highly correlated with the normed bottom-quintile shares (enrollment-weighted correlation = 0.77). For example, the normed statistics imply that 14.9% of the college peers of children from families in the bottom quintile come from the bottom quintile themselves (Online Appendix Table VIIc), very similar to the 15.7% estimate based on the raw statistics in Online Appendix Table VIIb. 24 Hence, most of the parental income segregation across colleges in the U.S. is not driven by differences in the state- or CZ-wide pools from which they draw. Intuitively, there is much greater income heterogeneity within most CZs than between CZs, implying that the sharp differences in parental income distributions across colleges cannot be driven purely by cross-CZ income differences. 25
23 Commuting Zones (CZs) are aggregations of counties that approximate local labor markets and collectively span the entire United States.
24 We focus on segregation measures within the subset of college students here because the normed statistics are ill-defined for students who do not attend college.
25 Some of the differences across colleges -especially unselective colleges -may be due to more local income differences within CZs, as students at less selective colleges tend to come from nearby neighborhoods.

IV Students' Earnings Outcomes
In this section, we study children's earnings outcomes (conditional on parental income) at each college, the second of the three key factors that matter for the role of colleges in intergenerational mobility. We begin by examining the intergenerational persistence of income within colleges and then analyze how students' earnings outcomes and rates of intergenerational mobility vary across colleges.

IV.A Heterogeneity in Earnings Outcomes within Colleges
As a reference, the series in circles in Figure
In this subsection, we analyze how much of the unconditional gradient in Figure IIIa can be explained (in an accounting sense) by the colleges that children attend. Answering this question, along with parent income segregation and value-added estimates, is useful for understanding the role of higher education in intergenerational mobility. If the degree of intergenerational income persistence within colleges were the same as in the population as a whole, reallocating students across colleges would not affect mobility. If, on the other hand, children from low- and high-income families who attend the same college have similar earnings outcomes, changes in the colleges that students attend could potentially have larger effects on mobility. 28 Empirically, we find that the rank-rank relationship is much flatter within colleges than in the nation as a whole. To illustrate, Figure IIIb shows the rank-rank relationship among students at three of the colleges examined above in Figure Ib: UC-Berkeley, SUNY-Stony Brook, and Glendale Community College. 29 To increase precision, we plot the mean rank of children in each college by parent ventile (5 pp bins) rather than percentile. The rank-rank slopes at each of these colleges, estimated using OLS regressions on the plotted points, are less than or equal to 0.06, one-fifth as large as the national slope of 0.29. Figure IIIc shows that this result holds more generally across all colleges. It plots the relationship between children's ranks and parents' ranks conditional on which college a child attends for colleges in three tiers: elite four-year (Barron's Tier 1), all other four-year, and two-year colleges (see Table II for estimates for each of the twelve tiers).
To construct each series in this figure, we restrict the sample to those who attended a given college tier and then regress children's ranks on parent ventile indicators and college fixed effects and plot the coefficients on the twenty ventile indicators. The slopes are estimated using OLS regressions of children's ranks on their parents' ranks in the microdata, with college fixed effects. Among elite colleges, the rank-rank slope within each college is 0.065 on average. The average slope is higher for colleges in lower tiers (0.095 for other four-year colleges and 0.11 for two-year colleges) but is still only about one-third as large as the national rank-rank slope. 30 The steeper slope could potentially arise because colleges in lower tiers are less selective and hence admit a broader spectrum of students in terms of abilities, or because there is substantial heterogeneity in completion rates at lower-tier colleges, which may correlate with parent income.
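A rank-rank slope with college fixed effects can be computed with the within estimator: demean both ranks inside each college and run OLS on the demeaned data, which gives the same slope as including a dummy for every college. The sketch below uses made-up data in which the slope is 0.1 inside each college but the colleges differ sharply in both parent and child ranks; passing a single college label for everyone reproduces the pooled slope without fixed effects.

```python
from collections import defaultdict
from statistics import mean


def rank_rank_slope_with_fe(child_ranks, parent_ranks, colleges):
    """OLS slope of child rank on parent rank with college fixed effects.

    Fixed effects are absorbed by demeaning both ranks within each college
    (the within estimator), which yields the same slope as including a dummy
    for every college.
    """
    by_college = defaultdict(list)
    for c, p, g in zip(child_ranks, parent_ranks, colleges):
        by_college[g].append((p, c))
    sxy = sxx = 0.0
    for obs in by_college.values():
        p_bar = mean(p for p, _ in obs)
        c_bar = mean(c for _, c in obs)
        for p, c in obs:
            sxy += (p - p_bar) * (c - c_bar)
            sxx += (p - p_bar) ** 2
    return sxy / sxx


parents = [10, 20, 30, 70, 80, 90]
children = [51, 52, 53, 87, 88, 89]  # slope 0.1 within each college
colleges = ["X", "X", "X", "Y", "Y", "Y"]
print(round(rank_rank_slope_with_fe(children, parents, colleges), 3))  # 0.1
print(round(rank_rank_slope_with_fe(children, parents, ["all"] * 6), 3))  # 0.566
```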
Children from low- and high-income families at a given college not only have relatively similar mean rank outcomes but also a relatively similar distribution of earnings outcomes across all percentiles. Online Appendix Figure
Sensitivity Analysis. In Table III, we explore the robustness of these results using alternative income definitions and subsamples. Each cell of the table reports an estimate from a separate regression of children's outcomes on parents' ranks, with standard errors reported in parentheses.
29 We omit Harvard from this figure because the very small fraction of low-income students at Harvard makes estimates of the conditional rank for children from low-income families very noisy; the estimated rank-rank slope for Harvard is 0.112 (s.e. = 0.018). For the same reason, we combine the Ivy-Plus category with other elite colleges in Figure IIIc below.
30 These findings are consistent with prior research using survey data showing that the association between children's and parents' incomes or occupational status is much weaker among college graduates (Hout 1988; Torche 2011). Our data show that conditioning on the specific college a child attends further reduces the correlation between children's and parents' incomes, and that this holds true even at elite colleges, where concerns about mismatch of low-income students are most acute.
The first column of the table reports estimates from the baseline specification discussed above.
The first row replicates the slope reported in Figure IIIa, the unconditional rank-rank slope pooling all children. The next row adds college fixed effects, including those who did not attend a college in a separate "no college" category. Including college fixed effects reduces the 0.288 unconditional slope by half to 0.139, as shown in the series in triangles in Figure IIIa.
The third row shows that controlling additionally for SAT/ACT scores (interacted with the college fixed effects) does not change the relationship between parent and child income within colleges significantly. The series in squares in Figure IIIa shows this result graphically. Hence, the differences in outcomes between children from low-and high-income families who attend the same college are not explained by differences in academic ability or preparation, as proxied for by test scores at the point of college application.
If we restrict the sample to those who attend college, the rank-rank slope with college fixed effects falls further to 0.10, as shown in row 4, because the rank-rank slope is larger for students who do not attend college. The remaining rows show that we obtain similar rank-rank slopes within specific college tiers, with flatter slopes at more elite colleges as discussed above.
Columns 2-8 of Table III present variants of the specifications in Column 1 to assess the sensitivity of the preceding conclusions to various factors. Column 2 deflates both parents' and children's incomes by local costs of living. This adjustment makes little difference because children tend to reside as adults near where they grew up, so cost-of-living adjustments tend to move parent and child ranks either both up or both down, thereby preserving their correlation.
In Columns 3-5, we assess whether the observed intergenerational persistence of income might be low, especially within elite colleges, because children from high-income families at such colleges choose not to work (e.g., because they marry a high-earning college classmate). In practice, children from high-income families are slightly more likely to work, even within elite colleges, as shown in Column 3, which replaces the children's individual earnings rank outcome with an indicator for whether the child works. Even for men, for whom the hours of work margin is likely to be less important, the rank-rank slope is 0.09 within elite colleges, much lower than the national slope of 0.33 (Column 4). These results suggest that differences in labor force participation rates do not mask latent differences in the earnings potentials of children from low- vs. high-income families within elite colleges.
The degree of intergenerational persistence in income is substantially larger when measuring income at the household level (Column 6) than at the individual level because children from richer families are much more likely to be married, even conditional on college attendance (Column 7). Finally, Column 8 shows that adding capital income to household earnings yields very similar results.

IV.B Heterogeneity in Earnings Outcomes Across Colleges
The relatively small within-college rank-rank slopes estimated above imply that most of the intergenerational persistence of income at the national level must be accounted for by differences in earnings outcomes across the types of colleges that children from low- vs. high-income families attend. Indeed, we find that children from low-income families tend to segregate into colleges at which students have lower earnings outcomes. The enrollment-weighted correlation between mean parent income rank and mean student earnings rank is 0.78 across colleges. Likewise, the correlation between mean parent income rank and the mean student earnings rank of bottom-quintile students is 0.70.
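The enrollment-weighted correlations used throughout this section can be sketched as a weighted Pearson correlation, with each college's observation weighted by its enrollment. The exact weighting formula is not spelled out in the text, so treat this as an assumption; the toy numbers show how weights change the result.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Weighted Pearson correlation, e.g. across colleges weighted by enrollment."""
    total = sum(w)
    mx = sum(wi * xi for xi, wi in zip(x, w)) / total
    my = sum(wi * yi for yi, wi in zip(y, w)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w)) / total
    vx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w)) / total
    vy = sum(wi * (yi - my) ** 2 for yi, wi in zip(y, w)) / total
    return cov / sqrt(vx * vy)


# Equal weights vs. giving the third (toy) college zero enrollment.
print(round(weighted_corr([0, 1, 2], [0, 2, 1], [1, 1, 1]), 3))  # 0.5
print(round(weighted_corr([0, 1, 2], [0, 2, 1], [1, 1, 0]), 3))  # 1.0
```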
In light of the importance of between-college heterogeneity in accounting for the intergenerational persistence of income, in this subsection we examine how earnings outcomes and mobility rates vary across colleges in greater detail. We do so by focusing on two statistics: the fraction of students from low-income families and the fraction of such students who reach the top quintile (earnings above $58,000 for children in the 1980 cohort). The product of these two statistics is the college's upward mobility rate, the fraction of its students who come from the bottom quintile (Q1) of the parent income distribution and end up in the top quintile (Q5) of the child earnings distribution:
$$\text{Mobility Rate} = P(\text{Parent in Q1 and Child in Q5}) = P(\text{Parent in Q1}) \times P(\text{Child in Q5} \mid \text{Parent in Q1})$$
Importantly, mobility rates reflect a combination of selection effects (the types of students admitted) and causal effects (the value-added of colleges). In this subsection, we simply document how mobility rates vary across colleges without distinguishing between these two factors; we separate these two components in Section V below when analyzing counterfactuals.
Figure IVa plots the fraction of low-income students who reach the top quintile (P(Child in Q5 | Parent in Q1)) against the fraction of students from low-income families (P(Parent in Q1)).
Consistent with the findings above, colleges with higher fractions of low-income students tend to have fewer students who reach the top earnings quintile on average. However, because the correlation between fraction low-income and top quintile outcome rate is -0.50 (and not -1), there is still considerable heterogeneity in mobility rates across colleges. To illustrate this heterogeneity, we plot isoquants representing the set of colleges that have mobility rates at the 10th percentile (0.9%), median (1.6%), and 90th percentile (3.5%) of the enrollment-weighted distribution across colleges. This variation is substantial given that the plausible range for mobility rates in the economy as a whole is from 0% (perfect immobility) to 4% (perfect mobility, where children's earnings are independent of their parents' incomes and 4% of children transition from the bottom to top quintile).
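Because the mobility rate is a product, each isoquant in Figure IVa is a rectangular hyperbola: a college with fraction low-income x lies on the isoquant for mobility rate m exactly when its top-quintile outcome rate equals m / x. The same identity gives the 4% perfect-mobility benchmark, since nationally 20% of children have bottom-quintile parents and, under independence, 20% of them reach the top quintile. A small sketch:

```python
def mobility_rate(frac_low_income, top_quintile_rate):
    """P(Parent in Q1 and Child in Q5) = P(Parent in Q1) * P(Child in Q5 | Parent in Q1)."""
    return frac_low_income * top_quintile_rate


def isoquant_top_quintile_rate(target_mobility, frac_low_income):
    """Top-quintile outcome rate a college needs, given its fraction low-income,
    to lie on the isoquant with mobility rate `target_mobility`."""
    return target_mobility / frac_low_income


# Median isoquant (1.6%): a college with 5% low-income students needs 32% of them to reach Q5.
print(round(isoquant_top_quintile_rate(0.016, 0.05), 3))  # 0.32
# Perfect mobility: 20% of children have Q1 parents and 20% of those reach Q5.
print(round(mobility_rate(0.20, 0.20), 3))  # 0.04
```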
Which colleges have the highest mobility rates? Table IVa shows that the colleges with the highest mobility rates tend to be mid-tier public colleges that combine moderate top-quintile outcome rates with a large fraction of low-income students. In contrast, the twelve Ivy-Plus colleges, highlighted in large blue circles in Figure IVa, have a mean top-quintile outcome rate of 58% but a mean fraction low-income of 3.8%, leading to a mean mobility rate of 2.2%, slightly above the national median. Flagship public universities such as UC-Berkeley and the University of Michigan-Ann Arbor, highlighted in large red triangles in Figure IVa, have a somewhat higher mean fraction low-income (5.2%) but a considerably lower mean top-quintile outcome rate (33.4%), so that their average mobility rate is lower than that of the Ivy-Plus group. 32 At the other end of the spectrum, the colleges with the lowest mobility rates consist primarily of certain non-selective colleges at which a very small share of students reach the top quintile. For example, several community colleges in North Carolina have top-quintile outcome rates below 4% and mobility rates below 0.5%. Notably, the top-quintile outcome rates at these colleges are below that of children who do not attend college between the ages of 19 and 22 (4.1%).
31 When broken out separately by campus, six of the CUNY campuses are ranked among the top 10 colleges in terms of mobility rates.
32 As discussed in Section II.B, in some cases (e.g., the University of Illinois) we cannot separate the flagship campus (Urbana) from other campuses. We exclude such institutions for these calculations.
There is substantial heterogeneity in mobility rates even among colleges with similar observable characteristics. In fact, 98.4% of the variation in mobility rates lies within selectivity tiers. To take a specific example, consider the University of California-Los Angeles (UCLA) and the University of Southern California (USC). Both colleges are in Los Angeles, were tied for the #21 U.S. research university in U.S. News and World Report's 2018 rankings, and have 54.6% of their low-income students reach the top earnings quintile. However, UCLA has a 10.2% fraction low-income compared to USC's 7.2% and therefore has a 42% higher mobility rate than USC.
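The UCLA/USC comparison, and the Ivy-Plus versus flagship comparison in the preceding paragraph, follow directly from multiplying the two statistics. When top-quintile outcome rates are equal, the ratio of mobility rates is just the ratio of low-income fractions (10.2 / 7.2 ≈ 1.42). For the group comparisons, note that a mean of products need not equal the product of group means, so multiplying group means only approximates the reported averages.

```python
def mobility_rate(frac_low_income, top_quintile_rate):
    """Fraction of students with Q1 parents who reach Q5 as adults."""
    return frac_low_income * top_quintile_rate


ucla = mobility_rate(0.102, 0.546)
usc = mobility_rate(0.072, 0.546)
print(f"UCLA vs. USC: {ucla / usc - 1:.0%} higher")  # UCLA vs. USC: 42% higher

# Group means multiplied together (an approximation to the mean mobility rate).
ivy_plus = mobility_rate(0.038, 0.58)
flagship = mobility_rate(0.052, 0.334)
print(f"Ivy-Plus {ivy_plus:.1%}, flagships {flagship:.1%}")  # Ivy-Plus 2.2%, flagships 1.7%
```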
Hoxby and Turner (2019) suggest using locally normed statistics for lists like that in Table IV when comparing colleges' mobility rates to adjust for differences in the pool of students they draw from. We present such normed measures of mobility rates in our Online Data Tables, adjusting parental income distributions as described in Section III.B. 33 These measures paint a broadly similar picture of differences in mobility rates (though the estimates change for certain colleges); for instance, 5 of the 10 highest mobility rate colleges in Table IVa remain in the top 10 using the normed measures. 34 This is because most of the variation in mobility rates is within local areas: the standard deviation of mobility rates falls only from 1.3% to 1.0% when controlling for a college's Commuting Zone (Online Appendix H). 35 In sum, although children from low-income families tend to attend colleges with relatively poor earnings outcomes, potentially amplifying the intergenerational persistence of income, there are several colleges that buck this pattern and have high mobility rates. These colleges must either enroll particularly high-ability students from low-income families or have especially positive treatment effects on such students. We now explore whether these colleges have certain systematic characteristics as a first step toward understanding their educational models.
Characteristics of High-Mobility-Rate Colleges. Table Va reports correlations between various college characteristics and the fraction of low-income students, the fraction of those students who reach the top quintile, and mobility rates. Correlations with fractions low-income and mobility rates are weighted by enrollment; correlations with top-quintile outcome rates are weighted by low-income enrollment.
33 We focus on the raw statistics as our baseline measures, both for simplicity and because whether and how to norm the raw statistics is open to debate. To help readers construct their own preferred measures, we also report estimates of local income distributions for our analysis sample in our Online Data Tables.
34 Eight of the top 10 colleges remain in the top 22 using the normed measures. South Texas College is located in America's third-poorest CZ and falls to rank 318.
35 Online Appendix H also shows that we obtain similar results when using household income instead of individual income to estimate mobility rates, allaying concerns that the differences are driven by variation in labor force participation rates among secondary earners.
The first ten rows present univariate correlations with non-demographic characteristics of colleges, including the college's STEM (science, technology, engineering, and mathematics) major share, an indicator for public control, net costs to low-income students, and instructional expenditure per student. Each of these variables is significantly negatively correlated with the fraction low-income and significantly positively correlated with the top-quintile outcome rate, except public control, which carries the opposite signs. These opposite-signed correlations result in modest and typically insignificant correlations with mobility rates. For example, the STEM share has a modest positive correlation of 0.12 with mobility rates, indicating that high-mobility-rate colleges do not have systematically different fields of study (Online Appendix Figure V). Colleges with higher STEM shares have significantly higher earnings outcomes but also significantly fewer low-income students; as a result of these offsetting forces, mobility rates are only weakly correlated with the distribution of majors. Similarly, the public institution indicator has an insignificant correlation of 0.04 with mobility rates. Although public colleges dominate the top ten list in Table IVa, many public colleges have much lower top-quintile outcome rates, and hence much lower mobility rates, than private colleges.
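The offsetting-forces argument can be illustrated with weighted correlations on synthetic data. In the sketch below, a hypothetical characteristic (think "STEM share") lowers the low-income fraction and raises the success rate by exactly offsetting multiplicative factors, so it is strongly correlated with each component but nearly uncorrelated with their product, the mobility rate. The weighting follows the scheme described above (enrollment for fractions low-income and mobility rates, low-income enrollment for success rates); the `weighted_corr` helper, the data-generating process, and all parameter values are assumptions, not estimates from the paper.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with nonnegative weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))


# Synthetic, hypothetical colleges.
rng = np.random.default_rng(0)
n = 2000
enrollment = rng.integers(500, 30_000, size=n)
stem = rng.uniform(0.05, 0.6, size=n)
# Offsetting effects: exp(-1.5 * stem) in one component, exp(+1.5 * stem) in the other.
frac_low_income = 0.10 * np.exp(-1.5 * stem + rng.normal(0, 0.25, n))
success_rate = np.minimum(0.15 * np.exp(1.5 * stem + rng.normal(0, 0.25, n)), 0.99)
mobility = frac_low_income * success_rate

c_fli = weighted_corr(stem, frac_low_income, enrollment)
c_sr = weighted_corr(stem, success_rate, enrollment * frac_low_income)
c_mob = weighted_corr(stem, mobility, enrollment)
print(f"corr with fraction low-income: {c_fli:+.2f}")
print(f"corr with success rate:        {c_sr:+.2f}")
print(f"corr with mobility rate:       {c_mob:+.2f}")
```

With equal weights, `weighted_corr` reduces to the ordinary Pearson correlation.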
We find much stronger correlations between mobility rates and the demographic characteristics of the undergraduate student body at each college. The share of Asian undergraduates has a correlation of 0.53 with mobility rates, as the Asian share is highly positively correlated with the top-quintile outcome rate but uncorrelated with the fraction of low-income students. The shares of Hispanic and Black undergraduates are also positively correlated with mobility rates, but through the converse channel: they are strongly positively correlated with the fraction of low-income students rather than with top-quintile outcome rates. Using a simple bounding exercise in Online Appendix I, we show that only a small fraction of these ecological (group-level) correlations can be driven by individual-level differences in incomes across racial and ethnic groups. Hence, non-Asian students at colleges with larger Asian shares must also have higher top-quintile outcome rates.
We also find a correlation of 0.26 between mobility rates and average Commuting Zone income, perhaps because children who attend college in high-income CZs (such as New York) tend to stay nearby and obtain higher-paying jobs after college.

IV.C Upper-Tail Mobility
The measure of mobility analyzed above, moving from the bottom to the top quintile, is one of many potential ways to define upward mobility. Alternative measures that define mobility more broadly (such as moving from the bottom quintile to the top two quintiles, moving from the bottom 40% to the top 40%, or moving up two quintiles relative to one's parents) exhibit very similar patterns across colleges. All of these measures have enrollment-weighted correlations with our baseline measures exceeding 0.8 (Online Appendix Table VIII).
There is, however, one measure of mobility that exhibits very distinct patterns: upper-tail mobility, i.e., reaching the top 1% of the earnings distribution ($182,000 at ages 32-34). Unlike with top-quintile outcome rates, no college with a higher fraction of low-income students has a top-percentile outcome rate comparable to those of the Ivy-Plus colleges.
Because their students are so much more likely to reach the top 1%, many Ivy-Plus colleges rank among the top ten colleges in terms of upper-tail mobility rates despite having relatively few students from low-income families (Table IVb). Interestingly, none of the colleges that appear on the top ten list in terms of bottom-to-top quintile mobility in Table IVa appear on the top ten list in terms of upper-tail mobility in Table IVb. Hence, the educational models associated with broadly defined upward mobility are distinct from those associated with upper-tail mobility.
Unlike with bottom-to-top quintile mobility, Table Vb shows that observable characteristics are very strongly correlated with upper-tail mobility. Colleges with higher upper-tail mobility rates tend to be smaller and to have larger endowments, higher completion rates, and greater STEM shares.
The colleges with the highest upper-tail mobility rates are all highly selective, high-expenditure, elite colleges. This uniform description of high upper-tail mobility rate colleges contrasts with the relatively diverse set of educational models associated with higher top-quintile mobility rate colleges. In this sense, the institutional model of higher education associated with the selection and/or production of "superstars" is distinct from and much more homogeneous than the variety of institutional models associated with upward mobility defined more broadly.

V How Would Changes in the Allocation of Students to Colleges Affect Segregation and Intergenerational Mobility?
In this section, we use our estimates to simulate how income segregation across colleges and intergenerational mobility would change if students were allocated to colleges differently, using data on SAT and ACT scores as a proxy for students' academic qualifications at the point of application. We first show how SAT/ACT scores vary with parental income, a relationship that is central for understanding the results we establish below. We then simulate how alternative allocations of students to colleges would change the degree of income segregation across colleges and the rate of intergenerational mobility in the economy.
The reallocations we propose keep total national spending on higher education constant, since the number of seats at each college is held fixed. However, they would require a change in the allocation of funding across families and colleges, as some colleges would have larger shares of low-income students and thus lower net tuition revenue given the financial aid packages they currently offer. Hence, the counterfactual allocations we simulate below should not be thought of as policy proposals, but rather as benchmarks that shed light on the drivers of segregation across colleges and on the potential impacts on economic mobility of changing which students attend which colleges.

V.A Undermatching: SAT/ACT Scores by Parent Income
The relationship between test scores on college entrance exams and parental income is important for understanding the types of policies that could mitigate segregation in higher education. If a large fraction of high-achieving (high-scoring) students come from low- and middle-income families relative to their representation at highly selective colleges, one could potentially reduce segregation at elite colleges by recruiting and admitting high-achieving, low-income applicants at higher rates.
If, in contrast, low-income students have much lower SAT/ACT scores than high-income students, other approaches, such as need-affirmative admissions, may be required to reduce segregation across colleges.
Several studies in the literature on "undermatching" have analyzed how SAT/ACT scores vary with parental income, but they have reached conflicting conclusions. Some studies (e.g., Carnevale and Strohl 2010, Hoxby and Avery 2013) find that there are many high-achieving, low-income students, but others (e.g., Carnevale and Rose 2004, Hill and Winston 2006, Bastedo and Jaquette 2011) find relatively few such students. Our data permit a more precise analysis of the degree of undermatching than prior work by combining administrative data on parental income, college attendance, and SAT/ACT scores. However, like many prior studies, we do not observe test scores for a significant share (26.2%) of college students, presumably because the college they attended did not require a standardized entrance exam. We impute an SAT score to each of these students using the SAT/ACT score of the college student from the same parent income quintile, state, and college selectivity tier who has the closest level of earnings in adulthood. 36 This imputation methodology relies on the assumption that the joint distribution of college, parent income quintile, state, and imputed test scores matches what one would observe if all students were to take the SAT or ACT. This assumption would be violated if the latent scores of non-SAT/ACT-takers differed systematically from those of SAT/ACT-takers. We evaluate the validity of this assumption using data from five states where the SAT or ACT is administered to nearly all students: Louisiana, Connecticut, Maine, North Dakota, and Tennessee. We run our imputation algorithm as above, but ignoring state and pretending that we do not observe SAT or ACT scores for anyone in these five states.
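A minimal sketch of the nearest-earnings imputation described above, assuming simple in-memory records. The `Student` fields, the tie-breaking rule (the lower-earnings donor wins), and leaving the score missing when a cell has no donor are illustrative assumptions; the text does not say how ties or empty cells are handled. Race, which footnote 36 says is imputed the same way, could be filled in by the same routine.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Student:
    parent_quintile: int
    state: str
    tier: str               # college selectivity tier
    earnings: float         # adult earnings
    score: Optional[int]    # None when no SAT/ACT score is observed


def impute_scores(students: List[Student]) -> List[Optional[int]]:
    # Donor pools: students with observed scores, keyed by (parent quintile, state, tier)
    # and sorted by earnings so the nearest neighbor can be found by binary search.
    pools = defaultdict(list)
    for s in students:
        if s.score is not None:
            pools[(s.parent_quintile, s.state, s.tier)].append((s.earnings, s.score))
    for pool in pools.values():
        pool.sort()
    earnings_by_cell = {cell: [e for e, _ in pool] for cell, pool in pools.items()}

    result: List[Optional[int]] = []
    for s in students:
        if s.score is not None:
            result.append(s.score)
            continue
        cell = (s.parent_quintile, s.state, s.tier)
        if cell not in pools:
            result.append(None)  # assumption: leave missing when the cell has no donor
            continue
        earnings = earnings_by_cell[cell]
        i = bisect_left(earnings, s.earnings)
        # The nearest donor is at position i - 1 or i; min() keeps i - 1 on ties.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(earnings)]
        best = min(candidates, key=lambda j: abs(earnings[j] - s.earnings))
        result.append(pools[cell][best][1])
    return result


students = [
    Student(1, "TX", "selective", 40_000.0, 1150),
    Student(1, "TX", "selective", 70_000.0, 1300),
    Student(1, "TX", "selective", 62_000.0, None),  # nearest donor earns 70,000
    Student(5, "TX", "selective", 62_000.0, None),  # no donor in this cell
]
print(impute_scores(students))  # [1150, 1300, 1300, None]
```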
We then compare the distribution of imputed scores to the distribution of actual scores.
Table IX shows the full joint distribution of test scores and parent income ranks among all college students. We find that students from low-income families have substantially lower test scores on average and that there are very few high-achieving students from low-income families. 38 For example, 3.7% of college goers with an SAT/ACT score of at least 1300 come from families in the bottom quintile, while 53.7% come from the top quintile. If we limit the sample to the 73.8% of college goers whose test scores are not imputed, we find even fewer high-scoring, low-income students (e.g., a 3.1% bottom-quintile share among those with scores above 1300), because low-income college goers are less likely to take the SAT or ACT (Online Appendix Table X). As an additional robustness check, we replicate this analysis using data from the National Postsecondary Student Aid Study (NPSAS), which has student-reported family income data. The NPSAS-based estimate of the bottom-quintile share of 1300+ scorers is 4.0% (Online Appendix Table XI).
36 All students missing a test score are also missing race, since we obtain race information from the SAT/ACT data. We impute race to these students using exactly the same procedure as for test scores.
37 Furthermore, we find that running our imputation procedure purely using SAT scores (pretending we do not have ACT data) yields very similar results.
38 One should not infer from this result that SAT/ACT scores simply serve as a proxy for parent income: parental income ranks actually explain only 8.6% of the variance in SAT/ACT scores in our analysis sample. Though students from lower-income families have lower SAT/ACT scores on average, there are many students from middle- and high-income families who do not have high SAT/ACT scores.
Our estimates of the fraction of high-achieving students who come from low-income families are broadly similar to those reported by Carnevale and Rose (2004), Hill and Winston (2006), and Bastedo and Jaquette (2011), but are substantially smaller than those estimated in the influential study of Hoxby and Avery (2013). 39 Hoxby and Avery estimate that 17% of graduating high school seniors with an SAT score or ACT equivalent of at least 1300 have parents in the bottom quartile of the income distribution. 40 By contrast, our estimate of this statistic is 5.0%. Similarly, Hoxby and Avery estimate that 39% of students with SAT/ACT scores above 1300 come from families below the median, compared with 16.6% in our data.
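The following paragraph attributes part of this gap to the use of Census tract-level mean incomes. The simulation below is a hypothetical illustration of that mechanism, not a replication: when test scores depend on individual parental income within tracts, classifying children by their tract's mean income places some higher-income, high-scoring children in the bottom quartile, pushing the measured bottom-quartile share among high scorers toward 25%. The tract structure, the income distributions, and the score model (which, unlike real SAT scores, is unbounded) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_tracts, per_tract = 2_000, 100
tract_id = np.repeat(np.arange(n_tracts), per_tract)

# Hypothetical log parental income: a tract component plus within-tract variation.
tract_level = rng.normal(0.0, 0.5, n_tracts)
log_income = tract_level[tract_id] + rng.normal(0.0, 0.7, tract_id.size)

# Hypothetical scores that depend on the child's own parental income.
z = (log_income - log_income.mean()) / log_income.std()
score = 1050 + 100 * z + 150 * rng.normal(size=z.size)
high = score >= 1300


def bottom_quartile(x):
    """True for observations below the 25th percentile of x."""
    return x < np.quantile(x, 0.25)


# Proxy: give every child the sample mean log income of their tract.
tract_mean = np.bincount(tract_id, weights=log_income) / np.bincount(tract_id)

share_individual = bottom_quartile(log_income)[high].mean()
share_tract_proxy = bottom_quartile(tract_mean[tract_id])[high].mean()
print(f"bottom-quartile share of high scorers, individual income: {share_individual:.1%}")
print(f"bottom-quartile share of high scorers, tract-mean proxy:  {share_tract_proxy:.1%}")
```

Because the tract mean is a noisier predictor of scores than individual income, the proxy-based share lands closer to the 25% base rate.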
One reason for this discrepancy may be that Hoxby and Avery impute family income using Census tract-level means rather than individual-level measures, a natural approach given that parental income is frequently missing and potentially noisy in their self-reported data. However, our individual-level data show that, within small geographies, higher-income children tend to have higher SAT/ACT scores. As a result, using tract-level means overestimates the number of students from low-income families who have high test scores. A second reason may be that Hoxby and Avery define the 25th percentile of the income distribution based on family income data from the American Community Survey (ACS), but measure parental incomes based on information drawn from financial aid forms. Because household units and income definitions differ across these sources, Hoxby and Avery's approach may classify more than 25% of parents as falling in the bottom 25% of the distribution. 41 By contrast, because we compute percentile thresholds and measure parental incomes using the same data, 25% of parents fall in the bottom 25% in our analysis by construction.
39 Carnevale and Rose use the National Educational Longitudinal Study of 1988 to find that 3% of those with an SAT score or ACT equivalent above 1300 have bottom-SES-quartile parents, where SES is the NELS-provided socioeconomic-status composite of parent income, education, and occupation. Hill and Winston use population-level SAT and ACT data to find that 4.8% of those with at least a 1300 have bottom-quintile parents, based on student-reported incomes and American Community Survey thresholds. Bastedo and Jaquette report means and standard deviations from the Educational Longitudinal Study of 2002 that, under normality, imply that 4.1% of those with an SAT score or ACT equivalent above 1300 have bottom-SES-quartile parents.
40 Hoxby and Avery also require a self-reported grade point average of A- or higher, but they note that the GPA restriction matters very little once they apply the SAT/ACT restriction.
41 Hoxby and Avery classify a child as falling in the bottom quartile if the child's estimated family income lies below $41,472, the 25th percentile of family income in the 2008 American Community Survey. The income data they use in their analysis are based on College Scholarship Service (CSS) Profile family income data reported by the student, which in turn come from parents' tax returns and supplementary information. In the tax data, however, the 25th percentile of the Adjusted Gross Income distribution is about $25,000, well below the ACS threshold. In Online Appendix C, we show that the differences between the tax data and the ACS are entirely due to differences in the definition of household units and incomes.
Having established the relationship between test scores and parental income, we now analyze how alternative allocations of students to colleges would affect income segregation and intergenerational mobility.

V.B Income Segregation
We begin by evaluating the extent to which income segregation across colleges can be explained by differences in academic credentials when students apply to college (as proxied by SAT or ACT scores), holding fixed each college's current racial composition and the geographic origins of its students. We impose the geographic and racial constraints to better approximate feasible reallocations, recognizing that many institutions (e.g., public state institutions, local community colleges, or Historically Black Colleges and Universities) effectively face geographic or racial constraints in practice. 42 This analysis provides a natural benchmark to gauge the extent to which colleges' student bodies are representative of the underlying population of academically qualified students from which they seek to draw. For example, are the parental incomes of Ivy League students representative of all students with similar test scores who come from the same states and racial groups?
To conduct this analysis, we first record the actual vector of SAT/ACT test scores $\vec{s}_g$ at each college-by-state-by-race group $g$. We then allocate students by filling each college-state-race group's slot for a student with test score $s$ with a random draw from that state-race group's population of college students with test score $s$. In this "income-neutral" student allocation regime, colleges continue to enroll students based on both test scores and other credentials (e.g., recommendations, extracurriculars), but variation in enrollment rates by parental income (whether due to differences in application, admissions, or matriculation) is eliminated among students with comparable test scores in the same state and racial group.
42 The impacts of our counterfactuals on aggregate segregation and mobility actually turn out to be quite similar if we permit reallocations without any racial or geographic constraints (Online Appendix Table XIV).
segregation among high-income students by plotting the fraction of top-quintile peers for students from the top quintile (see Online Appendix Table XIII for additional statistics). In each case, we plot three statistics: the actual rates in the data, the rates under the income-neutral allocation counterfactual, and the rates under need-affirmative student allocations (which we discuss below).
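The income-neutral allocation can be sketched as a random permutation of college seats among students who share a state, race, and test score. This keeps each college-state-race group's score distribution fixed while breaking any link between parental income and college within a cell. Implementing the "random draw" as a shuffle (drawing without replacement, so every current student is placed exactly once) is an assumption on our part, and the record fields below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def income_neutral_allocation(students, seed=0):
    """Randomly permute college seats among students in the same (state, race, score) cell.

    `students` is a list of dicts with hypothetical keys 'college', 'state', 'race',
    'score', and 'parent_q'. Returns new dicts with reassigned 'college' values.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for idx, s in enumerate(students):
        cells[(s["state"], s["race"], s["score"])].append(idx)

    new_college = [None] * len(students)
    for idxs in cells.values():
        seats = [students[i]["college"] for i in idxs]  # seats currently held in this cell
        rng.shuffle(seats)                               # hand them out at random
        for i, seat in zip(idxs, seats):
            new_college[i] = seat
    return [dict(s, college=c) for s, c in zip(students, new_college)]


def seat_key(s):
    return (s["college"], s["state"], s["race"], s["score"])


# Usage on synthetic records: each college keeps its (state, race, score) seat counts.
gen = random.Random(1)
students = [
    {"college": gen.choice(["Elite", "Flagship", "Regional"]),
     "state": gen.choice(["CA", "TX"]),
     "race": gen.choice(["A", "B"]),
     "score": gen.choice([1100, 1200, 1300, 1400]),
     "parent_q": gen.randint(1, 5)}
    for _ in range(1000)
]
reallocated = income_neutral_allocation(students)
print(Counter(map(seat_key, students)) == Counter(map(seat_key, reallocated)))  # True
```

Segregation statistics such as the top-quintile peer share of low-income students can then be recomputed on the reallocated records.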
Segregation across colleges would fall substantially if college enrollment were income neutral conditional on test scores: for example, the top-quintile peer share of students from low-income families would rise from 22.5% to 27.8%. Since 30.8% of college students come from the top quintile (shown by the horizontal line on the figure), a random allocation of students to colleges among the current pool of college students would yield a top-quintile peer share of 30.8%. Hence, income-neutral allocations would close 63.9% of the gap between the current degree of exposure that students from low-income families have to high-income students and the exposure they would have if colleges were perfectly integrated by income (conditional on the set of students who currently attend college). Put differently, only 36.1% of the income segregation across colleges can be attributed to differences in students' test scores, racial backgrounds, or geographic origins. The remaining 63.9% is driven by a combination of differences in student application choices, college admissions, and matriculation decisions by parental income conditional on these factors.
Although the income-neutral allocation reduces segregation overall, it largely reshuffles students within selectivity tiers and thus has smaller impacts on parental income distributions at more selective colleges (see Table VI for statistics for each tier separately). The bottom-quintile share of students at selective colleges overall rises from 7.3% to 8.6%, closing 38% of the gap in their underrepresentation relative to their 10.7% share of college goers overall. This 38% reduction in segregation at selective colleges is substantial, but it is much smaller than the 64% reduction overall.
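The gap-closing percentages in the two preceding paragraphs follow from simple arithmetic: the share of the distance between the actual statistic and the perfectly integrated benchmark that the income-neutral counterfactual covers. A short check using only the figures quoted in the text (the function name is illustrative):

```python
def share_of_gap_closed(actual, counterfactual, benchmark):
    """Fraction of the distance from `actual` to `benchmark` covered by `counterfactual`."""
    return (counterfactual - actual) / (benchmark - actual)


# Top-quintile peer share of bottom-quintile students (%): actual, income-neutral, random.
overall = share_of_gap_closed(22.5, 27.8, 30.8)
# Bottom-quintile share at selective colleges (%): actual, income-neutral, all college goers.
selective = share_of_gap_closed(7.3, 8.6, 10.7)

print(f"overall: {overall:.1%} of the gap closed, {1 - overall:.1%} remains")
print(f"selective colleges: {selective:.1%} of the gap closed")
```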
Impacts at Ivy-Plus Colleges. The impacts of income-neutral allocations at the most selective colleges differ from those in the broader population. At Ivy-Plus colleges, the fraction of students from the bottom quintile remains essentially unchanged under income-neutral allocations in absolute terms (rising from 3.8% to 4.4%), but the fraction of students from the middle class (the second, third, and fourth income quintiles) rises sharply, from 27.8% to 37.9%, as shown in Table VI. Figure Va shows why we see the biggest impacts on the representation of the middle class by plotting the parental income distribution of high SAT/ACT scorers (1300 or above) alongside the parental income distribution of actual Ivy-Plus enrollees. Children from the bottom quintile are represented at nearly the rate one would expect given their test scores; children from the middle class are under-represented at these colleges; and those from the top quintile are over-represented.
Figure Vb presents a more granular depiction of the degree of over- and under-representation by parental income. It plots the share of students with an SAT/ACT score of 1400 (the modal and median test score among actual Ivy-Plus students) who attend an Ivy-Plus college. Rather than a flat line, which would indicate that 1400-scorers from each parent income bin attend an Ivy-Plus college at the same rate, we observe an asymmetric U-shape, with higher attendance rates in the tails. In particular, 1400-scorers with parents in the top and bottom quintiles attend Ivy-Plus colleges at 2.4 and 1.6 times the rate of middle-quintile children with comparable test scores, respectively. We find similar patterns at other test score levels; see Online Appendix Table XII.
The upshot of this analysis is that there is a "missing middle" at Ivy-Plus institutions: an under-representation of students with high test scores from middle-class families relative to students from low-income and especially high-income families. Changes in application or admissions policies that eliminate existing differences in attendance rates conditional on test scores across parental income groups could therefore significantly increase the representation of middle-class (though not low-income) students at the nation's most selective private colleges. 43 Of course, test scores are an imperfect proxy for academic credentials, and colleges weigh many factors (e.g., extracurriculars, overall fit) beyond academic qualifications in admissions decisions. Therefore, one cannot interpret the counterfactual estimates as representing income segregation under a "meritocracy." Nevertheless, we view this counterfactual as a natural benchmark to gauge the extent to which student bodies are representative of the underlying population of academically qualified students. If one's objective is to have income-neutral enrollment conditional on merit, deviations from this benchmark can be justified at current selectivity levels only if other non-test-score determinants of merit are correlated with parent income. 44
43 This conclusion differs from that of Carnevale et al. (2019), who report that high-socioeconomic-status (a composite of parent income, education, and occupation prestige) shares at highly selective colleges would barely change under a system in which students with the highest test scores are admitted to the most selective colleges, without regard to other credentials. This is because the students with the very highest SAT/ACT scores tend to come from the highest-income families. Although Carnevale et al.'s pure test-score-admissions counterfactual also achieves income neutrality conditional on test scores, it increases the selectivity of elite colleges, because elite colleges currently admit many students who have SAT scores well below 1600. Our point is that shifting to a system that is income-neutral conditional on the current distribution of test scores at elite colleges (thereby preserving current levels of selectivity) would substantially reduce top-income shares.
44 It may be useful to consider an analogy to the principle of "disparate impact" in anti-discrimination law. Any hiring practice (e.g., requiring candidates to excel at squash) that has a disparate (differential) impact by gender or race is prima facie evidence of unlawful discrimination and shifts the burden of proof to the employer to show that the practice is consistent with business necessity and has no practical and more neutral alternative. Disparate impact by parental income is not a legal concern, but would be of analogous interest to those seeking a system of college admissions that is income-neutral conditional on merit.
Need-Affirmative Student Allocations. Although a system of applications and admissions that is income-neutral conditional on academic credentials would reduce income segregation significantly, the fraction of students from the bottom income quintile would remain about 50 percent higher at unselective colleges than at selective colleges. We therefore ask how much of a preference one would need to give children from lower-income backgrounds in the student allocation process (or, equivalently, how much lower-income students' test scores would have to rise) to fully eliminate segregation across colleges.
We simulate need-affirmative student allocations by adding $\Delta s_q$ points to the SAT/ACT scores of children with parents from income quintile $q < 5$. We vary the values of $\{\Delta s_q\}$, leaving SAT/ACT scores for children from the top quintile unchanged ($\Delta s_5 = 0$), in order to identify a profile of test-score increases that results in a constant parental income distribution across all college selectivity tiers. We then re-norm test scores to match the actual distribution and replicate the income-neutral allocation above with these adjusted scores (see Online Appendix J for details).
Iterating over linearly declining profiles of $\{\Delta s_q\}$, we find that adding 160 SAT points for those from the bottom quintile ($\Delta s_1 = 160$) and $\Delta s_q = \left(1 - \frac{q-1}{5}\right) \cdot 160$ for $q = 2, 3, 4$ (i.e., increments of 80%, 60%, and 40% of the bottom-quintile increment) produces roughly equal parental income shares across tiers. 45 To understand the practical implications of such an increment, note that 7.3% of children from the bottom parental income quintile with an SAT score of 1400 attend an Ivy-Plus college in our data. Such students would attend Ivy-Plus colleges at a rate of 25.8% in our need-affirmative 160-point SAT increment scenario. More generally, among students with SAT scores above 1300, the 160-point increment increases the likelihood that a bottom-quintile student attends an Ivy-Plus college, conditional on their SAT score, by a factor of 3.54 on average.
It is instructive to gauge the magnitude of these increments in SAT scores and attendance rates for low-income students by comparing them to admissions preferences currently granted to other groups. Espenshade, Chung and Walling (2004) use admissions data from three elite private colleges to evaluate the extent to which legacies, athletes, and underrepresented minorities are more likely to be admitted, controlling for their credentials at the point of application. They find that the increase in admissions probability for these groups is roughly equivalent to the effect of a 160-point increase in SAT scores. 46 Similarly, Arcidiacono, Kinsler and Ransom (2019) use data from Harvard to estimate that students who are recruited athletes, legacies, on the Dean's interest list, or children of faculty and staff (ALDCs) have admissions rates 3.4 times higher than non-ALDC students with otherwise similar characteristics. 47 Hence, one way to implement our need-affirmative counterfactual could be to grant lower-income students an admissions preference similar to that currently given to these other groups. Another approach may be to increase application or matriculation rates for lower-income students relative to high-income students by an equivalent amount.
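A sketch of the need-affirmative score adjustment described above, assuming that the re-norming step (detailed in Online Appendix J, not reproduced here) is a rank-preserving mapping back onto the actual score distribution. The function names are hypothetical; the adjusted scores would then feed the same income-neutral allocation used earlier.

```python
import numpy as np


def increment_profile(bottom=160):
    """Linearly declining boosts: (1 - (q - 1) / 5) * bottom for q = 1..4, zero for q = 5."""
    return {q: bottom * (6 - q) / 5 if q < 5 else 0.0 for q in range(1, 6)}


def need_affirmative_scores(scores, quintiles, bottom=160):
    """Add quintile-specific boosts, then re-norm by rank to the actual score distribution."""
    scores = np.asarray(scores, dtype=float)
    boosts = increment_profile(bottom)
    adjusted = scores + np.array([boosts[q] for q in quintiles])
    # The k-th lowest adjusted score receives the k-th lowest actual score, so the overall
    # score distribution is unchanged and only the ordering of students shifts.
    order = np.argsort(adjusted, kind="stable")
    renormed = np.empty_like(scores)
    renormed[order] = np.sort(scores)
    return renormed


print(increment_profile())  # {1: 160.0, 2: 128.0, 3: 96.0, 4: 64.0, 5: 0.0}
# A top-quintile 1400-scorer and a bottom-quintile 1300-scorer swap places after the boost.
print(need_affirmative_scores([1400, 1300, 1200], [5, 1, 2]))  # [1300. 1400. 1200.]
```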

V.C Intergenerational Mobility
Estimating Colleges' Value-Added. To quantify how changes in the allocation of students to colleges would affect intergenerational mobility, we first need estimates of how children's earnings outcomes would change if they were to attend different colleges (i.e., colleges' causal effects or "value-added").
46 More precisely, Espenshade, Chung and Walling estimate that legacy status is equivalent to 160 SAT/ACT points, recruited athlete status 200 points, African-American status 230 points, and Hispanic status 185 points. Hurwitz (2011) also finds large observed admissions advantages for legacy applicants.
47 Table 10 of Arcidiacono, Kinsler and Ransom (2019) reports counterfactual admissions rates for admitted ALDC students, removing the ALDC preferences, separately for students of each race. Averaging these counterfactual admissions rates across racial groups using the number of admitted ALDCs from each race (reported in the same table) yields 29.4%, implying admissions rates that are 1 / 29.4% = 3.4 times higher for ALDCs than for otherwise similar non-ALDCs.
48 We present results with alternative increments to SAT/ACT scores in Online Appendix Figure VIII.
49 Bowen, Kurzweil and Tobin (2006, Chapter 7) also examine the effects of need-affirmative allocations on parental income distributions at 18 elite colleges. Our findings are qualitatively consistent with their results at these 18 colleges, although our quantitative results differ because their self-reported parent income measures yield low-income shares at elite colleges that are twice as large as ours.
Directly estimating each college's value-added would require a source of quasi-experimental variation at each college and is outside the scope of this paper. Instead, we build on the prior literature and use estimates that are consistent with that work as an input into our simulations.
We begin from our estimates of children's mean earnings ranks conditional on their parental income, race, and SAT/ACT scores, estimated above. 50 We then estimate the fraction λ of these conditional earnings differences across colleges that is due to causal effects rather than selection, by controlling for observable characteristics and, following Dale and Krueger (2002), for the set of colleges to which a student applied to capture selection on unobservables.
Formally, consider the regression model
$$y_{iqc} = X_{iqc}\beta + f(S_i) + f(p_q) + \theta_r + \delta_c + \varepsilon_{iqc},$$
where $y_{iqc}$ is the earnings rank of student $i$ from parent income rank $q$ who attended college $c$; $X_{iqc}$ is a vector of observed student-specific characteristics; $f(S_i)$ is a quintic in the student's SAT or ACT-equivalent score, together with an indicator for taking the SAT and an indicator for taking the ACT (note that some students took both tests); $f(p_q)$ is a quintic in the student's parent income rank; $\theta_r$ is a race fixed effect; and $\delta_c$ is a college fixed effect. We can estimate the vector of college fixed effects $\Delta_c = \{\delta_c\}$ using a variety of control vectors $X_{iqc}$. First consider estimates where $X_{iqc}$ is empty, so that the only controls are SAT/ACT scores, parent income, and race; denote these estimates by $\Delta_c^{S,p,r}$. We can then assess the relationship between these test-score-, parent-income-, and race-controlled estimates of colleges' effects and estimates that include additional controls, $\Delta_c^{S,p,r,X}$, by running the regression
$$\Delta_c^{S,p,r,X} = \alpha + \lambda \Delta_c^{S,p,r} + \nu_c.$$
The parameter λ gives an estimate of the fraction of the baseline test-score-, parent-income-, and race-controlled difference between any two colleges that would remain, on average, after adding further controls. If latent student quality is uncorrelated with the college a student attends conditional on the observed characteristics $X$, λ can be interpreted as the fraction of the differences between colleges' earnings estimates $\Delta_c^{S,p,r}$ that reflects their causal effects (value-added).
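The logic of λ can be illustrated with a small simulation, assuming linear controls in place of the quintics and race effects of the actual specification. Students sort into colleges partly on a characteristic X that also raises earnings; omitting X inflates the spread of the estimated college effects, and the slope of the X-controlled effects on the baseline effects then falls below one. Every parameter, including the true college effects `delta`, is hypothetical.

```python
import numpy as np


def college_effects(y, controls, college, n_colleges):
    """OLS of y on controls plus a full set of college dummies; returns demeaned college FEs."""
    dummies = np.eye(n_colleges)[college]        # one column per college (spans the intercept)
    design = np.column_stack([controls, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fe = coef[controls.shape[1]:]
    return fe - fe.mean()


rng = np.random.default_rng(1)
n, n_colleges = 20_000, 40
S = rng.normal(size=n)       # test score (standardized)
p = rng.uniform(size=n)      # parent income rank in [0, 1]
X = rng.normal(size=n)       # characteristic omitted from the baseline controls
latent = S + X + rng.normal(size=n)
college = np.argsort(np.argsort(latent)) * n_colleges // n   # 40 equal-size tiers
delta = np.linspace(-0.5, 0.5, n_colleges)                    # true (hypothetical) effects
y = delta[college] + 0.5 * S + 0.3 * p + 0.4 * X + rng.normal(size=n)

base = college_effects(y, np.column_stack([S, p]), college, n_colleges)     # analogue of Delta^{S,p,r}
full = college_effects(y, np.column_stack([S, p, X]), college, n_colleges)  # analogue of Delta^{S,p,r,X}
lam = np.polyfit(base, full, 1)[0]   # slope of the controlled effects on the baseline effects
print(f"estimated lambda: {lam:.2f}")
```

Here the controlled estimates recover the true effects because X is the only source of selection; in real data, residual selection on unobservables is what the application-set controls below aim to absorb.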
Table VII reports estimates of λ using a range of control vectors $X$. 51 Columns 1-3 control successively for the following observable student characteristics: interactions between gender, race, and the test score quintic; high school fixed effects; and high school fixed effects interacted with race. These specifications all yield estimates of λ > 0.9; i.e., more than 90% of the baseline earnings variation (conditional on parental income, race, and test scores) reflects causal effects if these observables capture selection.
To assess whether selection on unobservable dimensions might confound our estimates, we use the set of colleges to which students apply as a control for their latent ability, as in Dale and Krueger (2002, 2014).[52] In Column 4 of Table VII, we follow Dale and Krueger (2014) and control for the mean SAT score of the colleges to which students send their SAT/ACT scores (a proxy for the set of colleges to which they apply) and the total number of colleges to which they send their scores, in addition to the observable characteristics used in Column 2. Column 5 adds high school fixed effects interacted with race to Column 4, while Column 6 limits the sample to students with parents in the bottom quintile of the income distribution.[53] These specifications all yield point estimates of λ ≥ 0.85, with a lower bound on the 95% confidence interval of around 0.82.[54] Given these estimates, in our baseline simulations we assume that λ = 80% of the conditional earnings differences observed across colleges are due to causal effects (value-added) and that the remaining 20% are due to selection.[55] Importantly, we also assume that these causal effects do not change under our counterfactual student reallocations; in particular, we ignore potential changes in value-added that may arise from having a different group of students (peer effects).

[51] We exclude students who do not attend any college and omit students with imputed test scores from these regressions.
[52] Controlling for the set of colleges to which students apply is what Dale and Krueger (2002) call a "self-revelation" approach to adjusting for selection; they show that this approach yields estimates very similar to specifications that control for the set of colleges to which students are admitted. In light of this result, Dale and Krueger (2014) control for the application set rather than the admission set to maximize power, and we follow that approach here (since we do not have data on admissions).

[53] As the estimate in Column 6 indicates, we do not find significant heterogeneity in λ across parental income groups. However, the baseline conditional earnings differences from attending a more selective college are larger for students from low-income families. In particular, in Online Appendix Table XV we replicate Dale and Krueger's result that the return to attending a college with higher average SAT scores is small on average but larger for low-income students.

[54] In their College and Beyond sample, Dale and Krueger find that controlling for the application set substantially reduces the coefficient on mean SAT scores, even after controlling for students' own SAT scores and other observables. We believe our findings differ because we have more precise controls for student background (e.g., a precise measure of parental income rather than a proxy) and because students' own SAT scores may be a stronger predictor of outcomes today than for students who attended college in the 1970s and 1980s.

[55] To further validate this approach, we compare our estimates to the regression discontinuity estimates of Zimmerman (2014), who essentially estimates the causal effect of attending Florida International University vs. Miami Dade College. Our estimates based on the approach outlined above are similar to Zimmerman's quasi-experimental estimates.
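The two application-set controls used in Columns 4-6 of Table VII can be computed as in the minimal sketch below. It assumes that each student's list of score-send destinations and a lookup of each college's mean SAT score are available. Counting repeated sends to the same college once is an assumption, and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def application_set_controls(
    sent_to: Iterable[str], college_mean_sat: Dict[str, float]
) -> Tuple[Optional[float], int]:
    """Return (mean SAT of colleges a student sent scores to, number of such colleges).

    Repeated sends to the same college are counted once (hypothetical convention).
    """
    colleges = set(sent_to)
    if not colleges:
        return None, 0
    return mean(college_mean_sat[c] for c in colleges), len(colleges)
```

For example, a student who sent scores to colleges with mean SATs of 1200 and 1400 (one of them twice) gets controls of (1300, 2).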
Income-Neutral Student Allocations. We construct a counterfactual earnings distribution for children at each college based on the observed distribution of earnings for children in each parent income quintile, SAT/ACT score level, race, and college. Mechanically, with 80% probability each child is assigned the earnings of a randomly chosen child who is observed attending the first child's counterfactually assigned college and who has the same parent income quintile, race, and SAT/ACT score; with 20% probability the child keeps his or her actual earnings (reflecting our 80% causal effect assumption). Because this reallocation changes the aggregate distribution of children's earnings in adulthood, we then recompute the quintile earnings thresholds based on the new aggregate earnings distribution when computing mobility rates.[56] […] (Table VIIIb), a gap that is 14.6% smaller than the empirically observed gap. The gap in children's chances of reaching the top 1% between children from low-income and high-income families falls from 2.8pp to 2.3pp, a similar reduction in percentage terms (Table VIII). Likewise, the correlation between parents' and children's income ranks among college students falls by 15% under the counterfactual. In sum, the intergenerational persistence of income would fall by about 15% if students were allocated to colleges based purely on their qualifications at the point of application (as proxied by SAT/ACT scores).
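The reallocation step can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' implementation: inputs are parallel sequences, and a student whose counterfactual cell contains no observed attendees keeps his or her actual earnings (the text does not say how such cells are handled). The helper names are hypothetical.

```python
import numpy as np


def counterfactual_earnings(actual, cf_college, parent_q, race, score, college,
                            causal_share=0.8, rng=None):
    """Draw counterfactual earnings for every student (hypothetical helper).

    With probability `causal_share`, a student receives the earnings of a randomly
    chosen student who actually attended the student's counterfactual college and
    has the same parent income quintile, race, and SAT/ACT score; otherwise the
    student keeps his or her actual earnings. Students whose counterfactual cell
    has no observed attendees also keep their actual earnings (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Index actual attendees by (college, parent quintile, race, score) cell.
    cells = {}
    for j, key in enumerate(zip(college, parent_q, race, score)):
        cells.setdefault(key, []).append(j)

    out = np.asarray(actual, dtype=float).copy()
    for i in range(len(out)):
        donors = cells.get((cf_college[i], parent_q[i], race[i], score[i]))
        if donors and rng.random() < causal_share:
            # Donor earnings always come from the original, unmodified data.
            out[i] = actual[rng.choice(donors)]
    return out


def quintiles(earnings):
    """Recompute quintile cutoffs on the new distribution; label 0 (bottom) to 4 (top)."""
    cuts = np.quantile(earnings, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, earnings, side="right")
```

Mobility rates would then be computed from `quintiles(out)` rather than from the original cutoffs, mirroring the threshold recomputation described above. Setting `causal_share` to 1.0 or 0.0 gives the pure-causal and pure-selection extremes.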
Need-Affirmative Student Allocations. To compute students' earnings distributions under need-affirmative allocations, we follow the same approach as above, using students' actual SAT/ACT scores (rather than their adjusted SAT/ACT scores) in the earnings rank reallocation. Under this approach, the test score increment granted in the admissions process affects students' earnings outcomes only through the college they attend.
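To make the distinction concrete, here is a minimal sketch under stated assumptions. The sliding-scale increments are the ones listed in the figure notes describing the counterfactuals (160, 128, 96, and 64 points for the bottom four parent income quintiles, none for the top quintile). Quintiles are coded 1 (bottom) to 5 (top), and the function names are hypothetical. The boosted score is used only to allocate students to colleges; the key used to draw counterfactual earnings keeps the actual score.

```python
# Sliding-scale increments (SAT points) by parent income quintile, 1 = bottom.
BOOST = {1: 160, 2: 128, 3: 96, 4: 64, 5: 0}


def admission_score(actual_score: int, parent_quintile: int) -> int:
    """Score used only to allocate students to colleges (hypothetical helper)."""
    return actual_score + BOOST[parent_quintile]


def earnings_match_key(actual_score: int, parent_quintile: int, race: str,
                       cf_college: str) -> tuple:
    """Cell used to draw counterfactual earnings: keeps the ACTUAL score."""
    return (cf_college, parent_quintile, race, actual_score)
```

A bottom-quintile student with a 1300 score competes for admission as if scoring 1460, but her counterfactual earnings are drawn from 1300-scoring peers at the college she is assigned to.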
Under need-affirmative allocations, the chance of reaching the top quintile ranges from 20.8% to 37.0% across parent income quintiles (Table VIIIc), a gap that is 26.5% smaller than the empirically observed gap (Figure VIc). The correlation between parents' and children's income ranks falls by 25%.
The gap in children's chances of reaching the top 1% between children from low-income and high-income families falls from 2.8pp to 1.9pp, a 32.6% reduction. The impact on children's chances of reaching the upper tail is particularly large because need-affirmative allocations sharply change the distribution of parental incomes at the most selective private colleges, whose students are especially likely to reach the upper tail, as shown in Section IV.
Need-affirmative reallocation has nearly twice as large an effect on mobility rates as income-neutral reallocation because it enables low-income students to attend the highest value-added colleges. Under need-affirmative allocations, the value-added of the colleges attended by students from low- vs. high-income families is essentially equalized: the difference in the value-added of the colleges attended by students from the top vs. bottom parent income quintile (estimated as described above) falls by 89% relative to the empirically observed difference of 4.5 percentiles.
By contrast, income-neutral allocations reduce the gap in college value-added by parental income by only 47% relative to the empirically observed difference. Intuitively, income-neutral allocations tend to reshuffle low-income students across colleges within the same selectivity tier, as shown above, whereas need-affirmative allocations enable low-income students to attend higher value-added colleges in higher selectivity tiers.
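As a rough check on magnitudes, the percentage reductions above imply the following residual gaps in college value-added between the top and bottom parent income quintiles. These levels are simple arithmetic on the reported 4.5-percentile baseline, not figures reported in the text:

$$\text{need-affirmative: } 4.5 \times (1 - 0.89) \approx 0.5, \qquad \text{income-neutral: } 4.5 \times (1 - 0.47) \approx 2.4 \text{ percentiles.}$$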
Alternative Assumptions About Causal Effects. In Online Appendix Table XVI, we vary our assumption about the fraction of the difference in earnings across colleges (conditional on parental income, race, and SAT/ACT scores) that is due to causal effects, from λ = 100% (pure causal effects, no selection) to λ = 0% (pure selection, no causal effects). At the upper bound (λ = 100%), need-affirmative allocations would reduce the intergenerational persistence of income by 33%. The simulated impact mechanically falls to 0% at the lower bound of λ = 0%. Assuming that λ > 50% (roughly the lower bound of the 95% confidence interval implied by comparing Zimmerman's (2014) estimate to ours), one could reduce the intergenerational persistence of income by at least 17% among children who attend college purely by changing the allocation of students to colleges, without attempting to change any college's production function.[57] These are substantial effects given that gaps in intergenerational mobility emerge from an accumulation of exposure to different environments and schools throughout childhood. Since colleges account for less than a quarter of the time most children spend in formal education, one would not expect impacts on mobility much larger than 25% purely from changes in higher education.

[57] An alternative possibility is that the ratio of selection effects to causal effects varies by parent income, with larger causal effects of attending an elite college for children from lower-income families. In Online Appendix Table XVII, we consider a scenario in which causal effects are 0% for reallocations within selective colleges (the top six tiers) for students with parents in the top four quintiles, 40% for reallocations within selective colleges for students with parents in the bottom quintile, and 80% for all other reallocations. In this scenario, need-affirmative allocations would reduce the intergenerational persistence of income by 21.3%.
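The reported endpoints also allow a rough consistency check. Suppose, purely as an approximation (the text does not state that the relationship is linear), that the simulated reduction in persistence R scales proportionally with the assumed causal share λ, anchored at 33% when λ = 100% and 0% when λ = 0%:

$$R(\lambda) \approx 0.33\,\lambda, \qquad R(0.8) \approx 0.26, \qquad R(0.5) \approx 0.165.$$

These values sit close to the roughly 25% baseline and the "at least 17%" figure reported above, which is consistent with the simulated impact being roughly proportional to the assumed causal share.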

VI Conclusion
Using data covering nearly all college students in the U.S. from 1999 to 2013, we constructed new college-level statistics on two key inputs necessary for understanding how the allocation of students to colleges affects intergenerational mobility: (1) parental income distributions and (2) children's earnings in adulthood conditional on parent income. We used these statistics to establish two sets of empirical results. First, parental income segregation across colleges is approximately as large as parental income segregation across the neighborhoods in which children grow up. Second, children of low- and high-income parents who attend the same college have relatively similar earnings outcomes, but children from high-income families are much more likely to attend colleges with high earnings outcomes.
Combining these college-level statistics with data on students' SAT and ACT scores, we find that allocating students to colleges in an income-neutral manner conditional on their test scores would substantially increase the representation of students from low- and middle-income families at selective colleges, holding fixed the racial composition and geographic origins of their students.
At the most selective (Ivy-Plus) colleges, the fraction of students from the middle class would rise substantially, although there would be little absolute change in the fraction of students from the bottom income quintile because so few of them currently have sufficiently high SAT/ACT scores.
Under the assumption that 80% of the differences in earnings premia across colleges (conditional on parental income, race, and SAT/ACT scores) are causal, our simulations imply that income-neutral allocations of students to colleges (conditional on test scores) would by themselves reduce the intergenerational persistence of income by 15%.
To go further, we simulate the consequences of raising lower-income students' test scores or granting them a preference in the admissions process similar to that currently given to legacy or minority students at elite private colleges. Such a change would essentially eliminate income segregation across all college tiers and reduce the intergenerational persistence of income by about 25%. We conclude that feasible changes in the allocation of students to colleges could increase intergenerational mobility substantially without any changes to existing educational programs, suggesting value in further efforts to enable students from low- and middle-income families to attend colleges that offer better earnings outcomes.
Harvard University and NBER; Brown University and NBER; UC-Berkeley and NBER; Federal Reserve Board; UC-Berkeley and NBER

Notes: The table presents summary statistics for the analysis sample defined in Section II.F. Column 1 includes all children in the 1980-82 birth cohorts. Column 2 limits this sample to students who attend a college (between ages 19 and 22) that is included in the public data release, using imputed data from the 1983-84 birth cohorts for colleges with insufficient data in the 1980-82 birth cohorts (see Section II, Section II.F, and Online Appendix B for details). This is the sample used for most of our analyses. Column 3 includes children in the 1980-82 birth cohorts who did not attend college between ages 19 and 22. Children are assigned to colleges using the college that they attended for the most years between ages 19 and 22, breaking ties by choosing the college the child attended first. Ivy-Plus colleges are defined as the eight Ivy League colleges as well as the University of Chicago, Stanford University, MIT, and Duke University. Elite colleges are defined as those in categories 1 or 2 in Barron's Profiles of American Colleges (2009). 4-year colleges are defined using the highest degree offered by the institution as recorded in IPEDS (2013). Parent income is defined as mean pre-tax Adjusted Gross Income during the five-year period when the child was aged 15-19. Parent income percentiles are constructed by ranking parents relative to other parents with children in the same birth cohort. Children's earnings are measured as the sum of individual wage earnings and self-employment income in the year 2014. At each age, children are assigned percentile ranks based on their rank relative to other children born in the same birth cohort. Children are defined as employed if they have positive earnings. In Column 2, the number of children is computed as the average number of children in the cohorts available for a given college multiplied by 3.
Medians are not reported in Column 2 because the imputations are implemented at the college level rather than the individual level. We report dollar values corresponding to other key quantiles in Column 1 because those are the thresholds used to define the income groups in our analysis (bottom 20%, top 20%, etc.). All monetary values are measured in 2015 dollars. Statistics in Column 1 are constructed based on Online Data Tables 6 and 9; in Column 2, based on Online Data Table 2; and in Column 3, based on Online Data Table 6, with the exception of median income and earnings, which are constructed directly from the individual-level microdata.

Notes: This table presents statistics on parental income segregation and children's earnings outcomes by college tier; see Section II.F and Online Appendix G for definitions of these tiers. All statistics reported are for children in the 1980-82 birth cohorts. All distributional statistics are enrollment-weighted means of the exact values for each college, except for median parent income and child earnings, which are the mean incomes in the percentile of the overall income or earnings distribution that contains the within-tier median. For example, the median Ivy-Plus parent falls in the 92nd percentile of the overall income distribution, and the mean income of Ivy-Plus parents in the 92nd percentile of the overall distribution is $171,000. The exact fraction of students at less-than-two-year colleges with parents in the top 1% is not available due to small sample sizes in the publicly available data. The trend statistics are coefficients from enrollment-weighted univariate regressions of the share of parents from the bottom 20% or 60% on student cohort, multiplied by 11; they can therefore be interpreted as the trend change in lower-parent-income shares over the 1980-1991 cohorts.
Rank-rank slopes are coefficients from a regression of child income rank on parent income rank with college fixed effects, as in Panels E-G of

Notes: This table presents estimates from OLS regressions of children's ranks on parents' ranks using data for children in the 1980-1982 birth cohorts. Each cell reports the coefficient on parent rank from a separate regression, with standard errors in parentheses. Panel A uses the full population of children. Panels B and C also use the full population but include college fixed effects and college-by-20-point-SAT/ACT-bin fixed effects, respectively. Panel D restricts to children who attend college (between ages 19 and 22) and includes fixed effects for the college the child attended. Panels E, F, and G replicate the specification in Panel D, restricting the sample to children who attended particular types of colleges: Elite (Barron's Tier 1) colleges, all other 4-year colleges, and 2-year colleges. In all specifications, the independent variable is the parents' household income rank, calculated by ranking parents relative to other parents with children in the same birth cohort based on their mean pre-tax Adjusted Gross Income during the five-year period when the child was aged 15-19. Column 1 uses the child's individual earnings rank in 2014 as the dependent variable. Column 2 adjusts both the dependent and independent variables for cost of living: we deflate both parents' and children's incomes (based on where they live when we measure their incomes) using a Commuting-Zone-level price index constructed from local house prices and retail prices as in Chetty et al. (2014, Appendix A). In Column 3, the dependent variable is an indicator for whether the child is working (defined as having positive earnings) in the year 2014. Columns 4 and 5 replicate Column 1, restricting the sample to male and female children, respectively. Column 6 uses children's ranks based on their household adjusted gross income instead of their individual earnings as the dependent variable. Column 7 uses an indicator for whether the child is married as the dependent variable.
Column 8 uses children's ranks based on their household wage earnings plus self-employment income as the dependent variable. Columns 6-8 all use the full sample of children. See notes to Table I.

Notes: The bottom-to-top-quintile mobility rate is the fraction of students whose parents were in the bottom quintile of the parent household income distribution (when the students were aged 15-19) and whose own earnings (at ages 32-34) place them in the top quintile of the children's income distribution. The mobility rate equals the product of the fraction of children at a college with parents in the bottom quintile of the income distribution ("Fraction Low-income") and the fraction of children with parents in the bottom quintile who reach the top quintile of the income distribution ("Top-Quintile Outcome Rate"). The upper-tail mobility rate is defined analogously, measuring the fraction of students who reach the top 1% instead of the top 20%. Parent income ranks, child income ranks, and college assignment are described in the notes to […].

Notes: This table reports estimates of the fraction of the differences in mean earnings observed across colleges, conditional on parental income, race, and SAT/ACT scores, that are due to causal effects, corresponding to the parameter λ in equation (2). The sample comprises all college-goers in our 1980-1982 cohorts who are matched to College Board or ACT data. Each column presents coefficients from univariate OLS regressions run at the college level, weighted by child count, following equation (2). The independent variable in all columns is the college fixed effect obtained from a regression of child earnings rank on college fixed effects, a quintic in parent income percentile, a quintic in SAT/ACT score, an indicator for taking the SAT, an indicator for taking the ACT (as some students took both tests), and race/ethnicity indicators, as in equation (1).
The dependent variable in each column is the child's college's fixed effect from the same regression with additional controls. In Column 1, we add a gender indicator and fully interact the race indicators, the gender indicator, and the SAT/ACT quintic. Column 2 adds fixed effects for the child's high school. Column 3 interacts the high school and race indicators. Column 4 replicates Column 2 and controls for the mean SAT score of the colleges to which students sent scores and the total number of colleges to which students sent scores, as in Dale and Krueger (2014). Column 5 replicates Column 3, adding the same controls as in Column 4. Column 6 replicates Column 5, restricting attention to children with parents in the bottom quintile. Statistics in this table are constructed directly from the individual-level microdata.

Notes: Panel A shows the actual intergenerational income transition matrix for college students in our analysis sample (1980-1982 birth cohorts). Each cell of Panel A reports the percentage of college-goers in the analysis sample with earnings outcomes in the quintile given by the column, conditional on having parents with income in the quintile given by the row. Panels B and C repeat Panel A under the income-neutral and need-affirmative student allocation counterfactuals, defined in the notes to Table VI. Panels B and C assume that 80% of children's earnings differences across colleges reflect causal effects conditional on SAT/ACT scores, race, and parental income. Mechanically, each child is first matched to a randomly chosen child who is observed attending the first child's counterfactually assigned college and who has the same parent income quintile, race, and SAT/ACT score. The child is then assigned that matched child's earnings with 80% probability and his or her own actual earnings with 20% probability. See Online Appendix J for details.
Notes: Panel A plots the fraction of children in our analysis sample (1980-82 birth cohorts) who attend college at any time during the years in which they turn 19-22, by parental income percentile. Panel B plots the percentage of students with parents in each quintile of the income distribution at Harvard University, University of California at Berkeley, State University of New York at Stony Brook, and Glendale Community College in the analysis sample. The percentage of each college's students with parents in the top income percentile is also shown. Panel C plots the percentage of students in the analysis sample with parents in each income percentile, pooling all 12 Ivy-Plus colleges, which include the eight Ivy League colleges as well as the University of Chicago, Stanford University, MIT, and Duke University. Parent income is defined as mean pre-tax Adjusted Gross Income (in 2015 dollars) during the five-year period when the child was aged 15-19. Parent income percentiles are constructed by ranking parents relative to other parents with children in the same birth cohort. Children are assigned to colleges using the college that they attended for the most years between ages 19 and 22, breaking ties by choosing the college the child attended first. Panel A is constructed directly from the individual-level microdata; Panel B, from Online Data Table 2; and Panel C, from Online Data Table 6.

Notes: This figure shows the relationship between children's income ranks and parents' income ranks for children in the 1980-82 birth cohorts. The series in circles in Panel A plots the mean child rank for each parent income percentile, pooling all children in our analysis sample.
The series in triangles in Panel A repeats the series in circles after including college fixed effects, constructed by demeaning both child and parent ranks within each college, computing an enrollment-weighted average of the resulting series across colleges, and adding back the national means of child and parent rank (50). Children who do not attend college are grouped into a single category for this purpose. The series in squares in Panel A repeats the series in triangles but interacts the college indicators with 20-point SAT/ACT bins. The reported slopes are from a linear regression fit on the plotted points. Panel B plots the mean child rank in each parent income ventile (5-percentile-point bin) vs. the mean parent rank in that ventile for students at the University of California at Berkeley, State University of New York at Stony Brook, and Glendale Community College. The figure also plots the mean child rank vs. parent income percentile in the nation as a whole (including non-college-goers) as a reference. We report rank-rank slopes for each college, estimated using an OLS regression on the twenty plotted points, weighting by the number of observations in the microdata in each parent ventile. To construct the series for each college group plotted in Panel C, we first run an enrollment-weighted OLS regression of children's ranks on indicators for parents' income ventile and college fixed effects. We then plot the coefficients on the parent income ventiles, normalizing them so that the mean across the twenty coefficients matches the unconditional mean rank in the relevant group. The rank-rank slope in each group is obtained from an OLS regression of child rank on parent rank including college fixed effects in the microdata. Children's incomes are measured in 2014, and children are assigned percentiles based on their rank relative to other children from the same birth cohort in 2014.
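The within-college demeaning described in these notes can be sketched as follows. By the Frisch-Waugh-Lovell theorem, regressing demeaned child ranks on demeaned parent ranks yields the same slope as a regression that includes college fixed effects. This minimal Python sketch assumes individual-level arrays and optional enrollment weights; the function name and the toy data are hypothetical.

```python
import numpy as np


def within_college_rank_rank(child_rank, parent_rank, college, weights=None):
    """Rank-rank slope with college fixed effects, via within-college demeaning.

    Returns the slope plus demeaned parent and child ranks with the national
    mean (50) added back, which re-centers the fixed-effects series for plotting.
    """
    y = np.asarray(child_rank, dtype=float)
    x = np.asarray(parent_rank, dtype=float)
    c = np.asarray(college)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)

    y_dm, x_dm = y.copy(), x.copy()
    for g in np.unique(c):
        m = c == g
        # Remove each college's (weighted) mean child and parent rank.
        y_dm[m] -= np.average(y[m], weights=w[m])
        x_dm[m] -= np.average(x[m], weights=w[m])

    # Weighted OLS slope on demeaned data equals the fixed-effects slope (FWL).
    slope = np.sum(w * x_dm * y_dm) / np.sum(w * x_dm ** 2)
    return slope, x_dm + 50.0, y_dm + 50.0
```

In the toy check below, two colleges have very different mean child ranks but the same within-college gradient of 0.3, which the function recovers even though a pooled regression would report a much steeper slope. Non-college-goers can be passed with a single shared label, matching the grouping described in the notes.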
See the notes to Figure […].

Notes: Panel A plots the percentage of children who reach the top quintile of the earnings distribution in 2014 conditional on having parents in the bottom income quintile (termed the "top-quintile outcome rate") vs. the percentage of students with bottom-quintile parents (termed "fraction low-income"), with one observation per college. Children's ranks are constructed by comparing their earnings in 2014 to others in the same birth cohort. Parent income percentiles are constructed by ranking parents relative to other parents with children in the same birth cohort. Multiplying a college's top-quintile outcome rate by its fraction of low-income students yields the college's "mobility rate," the probability that a student at that college has parents in the bottom parent income quintile and reaches the top quintile of the child income distribution. The curves plot isoquants representing the 10th, 50th, and 90th percentiles of the distribution of mobility rates across colleges. Ivy-Plus and public flagship colleges are highlighted. Ivy-Plus colleges are defined in the notes to Figure I. Public flagships are defined using the College Board Annual Survey of Colleges (2016). Public flagships that are part of a super-OPEID cluster containing multiple schools are omitted. We report the mean mobility rate for these two sets of colleges and the standard deviation (SD) of mobility rates across all colleges. Panel B repeats Panel A but uses the fraction of students who reach the top 1% of the earnings distribution on the y-axis (instead of the top 20%). All estimates use the analysis sample, and all statistics reported are weighted by enrollment. See notes to Figure […].

Notes: Panel A plots two series: the parent income distribution of college students nationwide with an SAT/ACT score of at least 1300 (the 93rd percentile), and the parent income distribution of students attending an Ivy-Plus college.
See Online Appendix Table IX for analogous statistics at other SAT/ACT thresholds, and see Table VI for the parent income distributions of tiers other than the Ivy-Plus. Panel B plots Ivy-Plus college attendance rates by parental income percentile for students with a 1400 SAT/ACT score, the modal and median test score among Ivy-Plus students. The plotted line is an unweighted lowess curve fit through the 100 plotted data points. The dashed horizontal line is the average Ivy-Plus attendance rate for college students with a 1400 SAT/ACT score. See Online Appendix Table XII and Online Appendix Figure VII for analogous statistics on attendance rates at other test score thresholds. SAT scores for 47.6% of college-goers are obtained directly from the College Board; composite test scores for another 26.2% of college-goers are obtained from ACT and converted to an SAT score. We impute an SAT/ACT score for the remaining 26.2% of college-goers using the SAT/ACT score of the student from the same parent income quintile and college tier with the nearest child earnings. See Figure I for the definition of Ivy-Plus colleges. This figure is constructed directly from the individual-level microdata.

Notes: This figure shows how the income-neutral and need-affirmative student allocation counterfactuals affect income segregation across colleges and intergenerational mobility. The income-neutral counterfactual allocates students to colleges randomly based on their SAT/ACT scores while holding fixed the distribution of SAT/ACT scores, race, and pre-college states to match the empirical distribution at each college. The need-affirmative counterfactual replicates the income-neutral counterfactual after adding 160 points to the SAT/ACT scores of all college-goers from the bottom parent income quintile, 128 points for second-quintile college-goers, 96 points for third-quintile college-goers, and 64 points for fourth-quintile college-goers.
See Section V.B for details on these counterfactuals. Panel A plots the fraction of college peers from the top quintile among college students with parents in the bottom quintile (left triplet of bars) and the top quintile (right triplet of bars), in actuality and under the two counterfactuals. These statistics are based on the subset of students who attend college in our analysis sample (i.e., excluding those who do not attend college). The dashed horizontal line shows the fraction of college students who come from the top quintile, which is the fraction of top-quintile peers one would observe if students were randomly allocated to colleges. See Online Appendix Table XIII for additional statistics on peer exposure across colleges. Panel B plots the fraction of students from the bottom parental income quintile, in actuality and under the two counterfactuals, at Ivy-Plus colleges, all Selective colleges, and all Unselective colleges. Selective tiers comprise the top six tiers listed in Table II, while Unselective tiers comprise the remaining six tiers. Panel C plots the gap (percentage-point difference) in the fraction of children who reach the top quintile between top-parent-income-quintile college-goers and bottom-parent-income-quintile college-goers, in actuality and under the two counterfactuals. Brackets denote the share of the gap narrowed under each counterfactual. The calculations in Panel C assume that 80% of children's earnings differences across colleges reflect causal effects conditional on SAT/ACT scores, parental income, and race; see Section V.C for details. In Online Appendix Table XVI, we report results under alternative assumptions about the causal share. This figure is constructed directly from the individual-level microdata.
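The peer-exposure statistic in Panel A can be sketched in Python as below. Two details are assumptions rather than statements from the notes: a student is excluded from his or her own peer group, and the statistic is a simple average over students in the given parent income quintile (quintiles coded 1 to 5, with 5 the top). Under random allocation, the expected share approaches the overall fraction of college students from the top quintile in large colleges, which is the dashed benchmark described above. The function names are hypothetical.

```python
from collections import Counter


def top_quintile_peer_share(college, parent_q, own_q):
    """Average share of top-quintile (q == 5) peers among students in quintile own_q.

    Peers are the other students at the same college (the student is excluded);
    colleges with a single student are skipped. Both conventions are assumptions.
    """
    n_at = Counter(college)
    top_at = Counter(c for c, q in zip(college, parent_q) if q == 5)
    shares = []
    for c, q in zip(college, parent_q):
        if q != own_q or n_at[c] < 2:
            continue
        top_peers = top_at[c] - (1 if q == 5 else 0)
        shares.append(top_peers / (n_at[c] - 1))
    return sum(shares) / len(shares) if shares else float("nan")


def random_allocation_benchmark(parent_q):
    """Fraction of all college students whose parents are in the top quintile."""
    return sum(q == 5 for q in parent_q) / len(parent_q)
```

Comparing `top_quintile_peer_share(..., own_q=1)` with `own_q=5` under the actual and counterfactual allocations reproduces the logic of the two bar triplets in Panel A.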