Cohort Profile: the Office for National Statistics Longitudinal Study (The LS)

Nicola Shelton, Chris E Marshall, Rachel Stuchbury, Emily Grundy, Adam Dennett, Jo Tomlinson, Oliver Duke-Williams, ONS Staff and Wei Xun
Centre for Longitudinal Study Information and User Support (CeLSIUS), Department of Epidemiology and Public Health, University College London, London, UK, Institute for Social and Economic Research, University of Essex, Colchester, UK, Centre for Advanced Spatial Analysis, Department of Information Studies, University College London, London, UK and Longitudinal Study Development Team, Office for National Statistics, Titchfield, UK


Why was the cohort set up?
Two factors were particularly important in the decision to set up the [Office for Population Censuses and Surveys (OPCS), now Office for National Statistics (ONS)] Longitudinal Study (the LS) in 1974. 1 These were concern over the limitations of the occupational data collected at death registration which were used to calculate occupational mortality rates, and a need for more information on fertility patterns, particularly changes in birth spacing. It was recognized that existing data sources were inadequate for analysis of mortality differentials, particularly by occupation, due to bias resulting from the fact that denominator (population) data came from the Census and numerator (deaths) data from vital registration. Employment profiles that reflected lifetime experiences rather than last job from the death certificate were required for detailed occupational mortality analyses. More information on birth spacing and social and family influences on fertility patterns was also needed, the early 1970s representing the 'birth dearth' period when policy makers were very concerned about fertility patterns. The then Office for Population Censuses and Surveys (OPCS) had developed considerable experience in record linkage studies following up particular occupation groups (and members of specific studies), a process facilitated by development of the National Health Service Central Register (NHSCR). In the early 1970s, OPCS decided to make better use of existing resources by establishing a longitudinal study based on linked census and vital registration data (births, deaths, cancer registrations). The usefulness of the LS for migration and sociodemographic studies was also anticipated.
The initial sample was drawn from the 1971 Census on the basis of birthday, in order to facilitate linkage. All those born on four undisclosed birthdays per year were included, giving a sample amounting to just over 1% of the population of England and Wales. The study has been maintained as a continuous multi-cohort through the addition of new births and immigrants with the same birth date, and includes individual-level data from five Censuses (1971, 1981, 1991, 2001, 2011) as well as linked information on births, deaths and cancer registrations. Access to anonymized data for research purposes is permitted under strict access conditions, which include usage of microdata in an ONS secure data laboratory.
The LS is representative of the whole population of England and Wales, including those in non-private households and all age groups; and it includes census information about other people in sample members' households at each census, which provides additional opportunities for examining intergenerational continuities and changes. The 'width' of the sample in terms of size means it is possible to study relatively small groups, such as members of particular ethnic minority groups or older people resident in institutional settings (a group excluded altogether from most surveys). The 'depth' of the study over time makes it increasingly valuable for research including a life course or intergenerational perspective. A further strong advantage of the LS is minimal bias due to non-response or attrition, as census coverage is good and rates of linkage high.

Who is in the cohort?
This is a 1% dynamic sample of all persons of any age or gender, identified as having an LS date of birth (one of four dates, spread through the year) and usually resident in England and Wales, who completed a census form and have joined through birth or immigration, since 1971 (Tables 1-5).

How often have they been followed up?
From the 1971 Census onwards, the LS has been maintained in the following manner. Life events are also linked to LS members, as follows. The LS life events tables are generally updated once per year. Late notifications (e.g. of deaths abroad) mean that counts for some years already available will increase with each update. The data on LS members are enhanced (no data are ever deleted) by the addition of new data at 10-yearly intervals as information from the decennial censuses becomes available. The total number of LS members following the 2011 Census is now more than one million; this includes those who have died in the intervening period (5000-8000 per year since 1971, offset by a roughly similar number of births). The number present at any one time point has risen slightly with each census, but ranges between 524 000 and 581 000. The LS is maintained and updated by ONS and makes secondary use of data collected for other purposes. Consent is not required as this work is carried out as part of ONS's statutory functions as laid out in the Statistics and Registration Service Act 2007.

What has been measured?
The census gathers a large amount of sociodemographic data every 10 years. The data available consist of the responses to the census questions and some other variables (e.g. social class) derived from relevant census variables. The baseline sample will be followed up, but as the LS is a dynamic sample, the subsequent 'follow-up' sample also includes new members at each census.  The 1971 Census asked ever-married women then aged 16-59 how many children they had and the year of the children's births, giving a baseline idea of fertility. The 1971 Census asked for address 5 years ago, 1 year ago and the present address. Subsequent censuses only asked for current address and address 1 year ago. Migration studies therefore have 11 possible residential locations for people identified at 1971, alive in 1966 and still alive in 2011. Additionally, 10-year migration indicators have been derived by comparing address, or postcode district, of members in successive censuses. In the early years of the LS (1971-74), data from moves between Family Practice areas were also recorded. Table 6 shows key variables included in the 1971 and subsequent censuses.
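The count of 11 possible residential locations follows directly from the address questions described above. A minimal sketch of that arithmetic in Python, where the function and constant names are illustrative only and are not LS variable names:

```python
# Census years included in the LS, as listed in the text.
CENSUS_YEARS = [1971, 1981, 1991, 2001, 2011]


def address_reference_years(census_years=CENSUS_YEARS):
    """Return the years for which a residential location could be recorded.

    The 1971 Census asked for the address 5 years ago, 1 year ago and at
    census; later censuses asked only for the address 1 year ago and at census.
    """
    years = []
    for year in census_years:
        offsets = [5, 1, 0] if year == 1971 else [1, 0]
        years.extend(year - offset for offset in offsets)
    return sorted(years)


locations = address_reference_years()
print(len(locations), locations)
```

The list runs from 1966 to 2011, which is why the full set of locations applies only to members identified in 1971 who were alive in 1966 and still alive in 2011.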

What has it found? Key findings and publications
The LS has provided evidence with academic and non-academic impact for social policy issues such as:
• inequalities in health, employment, education and geography;
• equal opportunities for women, ethnic groups and the long-term sick;
• social exclusion, including long-term outcomes of education and employment status;
• economic integration of migrant groups;
• housing and geographical mobility;
• family policy, including early/late parenthood, different childbearing patterns of advantaged and less advantaged groups, and cohabitation.
The LS has been used to provide unique information to support a series of major reports for government on health and mortality: Inequalities in Health, 1980 (the Black Report); 2 The Health Divide: Inequalities in Health in the 1980s, 1987 (the Whitehead Report); 3 Independent Inquiry Into Inequalities in Health Report, 1998 (the Acheson Report); 4 and the Strategic Review of Health Inequalities in England Post-2010: Fair Society, Healthy Lives (the Marmot Review). 5 The LS has also been used for analysis of work on pensions. The first report, Pensions: Challenges and Choices, in 2004, 6 was followed in 2005 by the Turner Report: A New Pension Settlement for the Twenty-first Century. 7 Both reports include information on trends in life expectancy at 65, by social class. Subsequently, research from the LS fed into the state pension age review in 2017. 8 The Dilnot Report: Fairer Care Funding was published in 2011. 9 The size of the population in long-term residential and nursing home care at any one point in time depends on rates of admission and length of stay. The submission used data from the LS on the survival of older people who in the 2001 Census were recorded as residents of residential care homes, nursing homes or other types of communal establishment, and examined differentials in the survival of this population by characteristics including: broad type of establishment (residential, nursing or other); gender; and marital status in 2001. It also used information on place of death, to assess the assumption that residents in communal establishments of various types in 2001 remained in institutional care throughout the follow-up period (from the 2001 Census to the end of 2008). 10
Social mobility continues to be of significant political concern; a report for the Joseph Rowntree Foundation was published in 2005, which traced patterns of intergenerational social mobility for children born between the late 1950s and mid-1970s from different ethnic groups in England and Wales. Key findings included: the children of parents in higher social classes were more likely to end up in higher social classes themselves; and most minority ethnic groups showed high levels of children moving into a higher class than their parents. The stability of couple partnerships is also of interest to policy makers. The paper 'Do partnerships last? Comparing marriage and cohabitation using longitudinal census data' was published in 2010. 11 The research used a sample of adults who were in a partnership (married or cohabiting) in the 1991 Census of England and Wales, and then explored whether these individuals were living with the same partner in 2001.
Main findings include: 82% of married adults aged between 16 and 54 in 1991 were still living with the same partner in 2001, compared with 61% of cohabiting adults; adults were less likely to remain with the same partner if, in 1991, they were younger, had no dependent children living in the household, had a limiting long-term illness, had previous experience of partnership dissolution, had no higher qualifications or were unemployed. This paper is now cited in the A-level Sociology syllabus.
Academic impact is a key feature of LS research. There are many highly cited papers, especially within epidemiology and the social sciences. Examples include sex differences in developmental reading disability, 12 selective migration and health 13 and limiting long-term illness and mortality among non-migrant people, 14 fertility history and health in later life, 15 socioeconomic status and ischaemic heart disease mortality, 16 sociodemographic variations in moves to institutional care, 17 living arrangements and place of death, 18 accumulated labour market disadvantage and limiting long-term illness, 19 population change and migration, 20 and cancer and proximity to power lines. 21 Recent work drawing significant media attention includes trends in life expectancy at birth and at age 65 by socioeconomic position based on the National Statistics Socioeconomic Classification, England and Wales: 1982-86 to 2007-11, produced by ONS. Headline results that the most advantaged men were living longer than the least advantaged women for the first time were published in many national newspapers. 22-25 A paper on impacts of in utero exposure to air pollution using LS data was featured in the Telegraph 26,27 and a paper on chronic health effects of air pollution was widely featured in the press. 28 A full set of publications is available at the Census & Administrative Data Longitudinal Studies Hub. 29

Main strengths and weaknesses
The main strengths of the LS are its large sample size (total N > 1 000 000) and the length of follow-up available (40 years, 1971-2011 for main census data), with life events for LS members available until about 2 years before the current year of analysis. This is by far the largest nationally representative longitudinal dataset in the UK; it allows analysis of small areas (well below local authority level), particular ethnic groups and specific occupational groups. Such analyses are not possible with any other longitudinal dataset because of insufficient numbers. In addition to information on LS members, there is information on all persons in their household at any time point. This means that missing information, for example the social class of a child, can be recovered by looking at the social class of their parents.
With the long period of follow-up, survival analysis can be performed looking at differences between subjects with far more parameters than just age at death and sex: industry, social class, education and location are all variables that could be entered into the analysis.
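As an illustration of the kind of survival comparison described above, the following is a minimal sketch of a Kaplan-Meier estimator in plain Python, applied separately to two groups. The follow-up times, event flags and group labels are entirely hypothetical and are not LS data; a real analysis would use an established statistical package inside the secure setting and would typically fit a regression model to handle several covariates at once.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject (e.g. years since census).
    events: 1 if the subject died at that time, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # deaths plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # everyone leaving at t drops out of the risk set
    return curve


# Hypothetical follow-up data for two illustrative groups (not LS data).
groups = {
    "group_a": ([5, 10, 10, 20], [1, 0, 1, 1]),
    "group_b": ([8, 12, 25, 30], [1, 1, 0, 1]),
}

for label, (durations, events) in groups.items():
    print(label, kaplan_meier(durations, events))
```

Stratifying the estimate by a covariate such as social class or industry is the simplest way to compare survival between groups before moving to a multivariable model.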
Geography (where people live) is consistent at all time points to the geographical identity in England and Wales in 1974. However, researchers will need guidance as to which variables to use as other geographies are in force in the LS in 1991, 2001 and 2011. The lowest geography at which a researcher may generally report results is Local Authority, of which there are just under 350 in England and Wales. Lower level geographies are available for attaching the researcher's own external data, but the small area geographical variables are removed before the dataset is made available to the researcher. Unusually, the data include persons in communal establishments, so groups such as students and older adults are represented.
Since the LS comprises all persons born on 4 days of the year, the sampling fraction is approximately 1.1% and sampling bias is almost nil. The high tracing rates contribute to the high linkage rate of LS members from census to census (88% 2001 to 2011). 30 Response rates to the 2011 Census were very high relative to other national censuses, sample surveys and cohort and panel studies, at 94%. 31 There are changes in study population over time, but this offers the opportunity to look at both a closed cohort and a representative sample of the national population. Table 7 shows the tracing rates for each of the five censuses included in the LS.
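The sampling fraction quoted above can be checked with a short calculation. This is a sketch only: it assumes births are spread evenly across the calendar and uses an average year length of 365.25 days, neither of which is stated in the text.

```python
BIRTHDAYS_SAMPLED = 4    # LS birth dates per year, as stated in the text
DAYS_PER_YEAR = 365.25   # assumption: average year length including leap years


def sampling_fraction(birthdays=BIRTHDAYS_SAMPLED, days_per_year=DAYS_PER_YEAR):
    """Expected share of the population born on one of the sampled dates,
    assuming births are uniformly distributed over the year."""
    return birthdays / days_per_year


print(f"{sampling_fraction():.2%}")
```

Under these assumptions the fraction comes out at roughly 1.1%, in line with the figure given in the text.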
Comparative analyses of UK data are also possible using e-DataSHIELD for the periods 1991, 2001 and 2011. 32 Sister cohorts exist for Northern Ireland 33 and Scotland 34 and these can be analysed indirectly in any of the three Research Support Units in London, Edinburgh or Belfast, drawing on the strengths of the support teams in all three units and the e-DataSHIELD software. A considerable amount of meta-data is available for the LS, including a data dictionary with sample sizes and variable similarity scores over time. 35 Relative to cohort and panel studies there is a limited set of questions asked, and there are changes in definitions and questions asked for several variables over time. The main weakness of the LS is the lack of behavioural data. Also, the census is held only every 10 years, so updates are limited, but there are some questions that offer retrospective information such as year left last job and address 1 year ago. Although the data are anonymized, LS members do not know they are part of the study, so extreme care has to be taken when reporting results: no cell count less than 10 may be published unless the researcher can demonstrate that a lower cell count is not disclosive and that it is vital to the findings of the research project; the onus is on the researcher to prove this.
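The cell-count rule described above can be sketched as a simple check applied to a table of counts before release. This is an illustrative example only, not ONS's disclosure control procedure: the function name, the table layout and the treatment of zero counts as suppressible are all assumptions.

```python
MIN_CELL_COUNT = 10  # threshold stated in the text


def suppress_small_cells(table, threshold=MIN_CELL_COUNT):
    """Return a copy of a {cell_label: count} table in which every count
    below the threshold is replaced by None (i.e. withheld from output)."""
    return {
        label: (count if count >= threshold else None)
        for label, count in table.items()
    }


# Hypothetical counts, for illustration only.
example = {"area_1": 152, "area_2": 9, "area_3": 10, "area_4": 0}
print(suppress_small_cells(example))
```

In practice a lower count could still be released if the researcher demonstrated that it was not disclosive and was vital to the findings, so a check like this would be a first pass rather than a final decision.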

Can I get hold of the data? Where can I find out more?
The LS data are available to anyone in the UK who can fulfil the requirements of ONS's Approved Researcher Scheme. 36 The data can be accessed through the Secure Research Service (SRS) safe setting rooms at ONS offices in London (Pimlico), Hampshire (Titchfield) and South Wales (Newport), or remotely by sending syntax to user support officers to run, and receiving output by return. The Centre for Longitudinal Study Information and User Support (CeLSIUS) provides support for UK-based researchers from the academic, public and third sectors. The LS Development Team at ONS provides support for all other researchers.
The application process is fully detailed on the CeLSIUS website at [www.ucl.ac.uk/celsius] where all the necessary forms can be found under the 'Using the ONS Longitudinal Study' section. Significant user support is provided by CeLSIUS and ONS. A synthetic training dataset with a limited range of variables and transitions from 2001-11 is freely available to download under Open Government Licence for testing syntax and sample size estimations. 37,38 Synthpop, the process for offering
CeLSIUS is supported by the Economic and Social Research Council (Award Ref: ES/R00823X/1) and therefore their service is free to academic and public sector researchers in most circumstances. This work contains statistical data from ONS which is Crown Copyright. The use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research datasets which may not exactly reproduce National Statistics aggregates.
for more than 580,000 study members.
• The data are socio-economic and demographic, with self-reported health measurements since 1991 and linkage to mortality and cancer registration from 1971.