An investigation of the consistency of parental occupational information between UK birth records and a national social survey

In the United Kingdom, new sources of administrative social science data are unfolding rapidly but the quality of these new forms of data for sociological research is yet to be established. We investigate the quality and consistency of the parental occupational information that is officially recorded on administrative birth records by undertaking a comparison with information collected from the same parents in the UK Millennium Cohort Study (MCS). We detect a large amount of missing information in the birth records and a range of inconsistencies. We present an empirical analysis of MCS data using parental social class measures derived both from the birth records and the survey to assess the effects of these discrepancies. We conclude that parental occupational information from administrative birth records should not be assumed, a priori, to be suitable for sociological analyses and that further research should be undertaken into their consistency and accuracy.


Introduction
The explosion in the availability of new sources of data in the early part of the 21st century is set to revolutionize research possibilities within sociology. The emergence of 'big data' and other forms of 'digital data' offers new opportunities to study individuals and societies (see for example Manovich, 2011; Burrows and Savage, 2014; Kitchin, 2014; Schroeder, 2014). Simultaneously, advances in e-research and computer science provide increasingly effective solutions for linking large data sets (see Goerge and Lee, 2001; Halfpenny and Procter, 2015).
Administrative social science data resources contain information which originates from the operation of administrative systems, typically those associated with public sector agencies (Elias, 2014; Woollard, 2014). These data sets offer new opportunities for empirical sociological research. Researchers in the Nordic nations have benefited from unparalleled access to administrative social science data (see United Nations, 2007), and their national registers have provided the basis for a strong data infrastructure. By contrast, in most other nations sociological analyses of administrative data have been far less widespread and are far from routine. The increased research potential that improved access to administrative data would offer has recently been recognized in the United Kingdom, and major infrastructural investment has been made to support the analysis of administrative data 1 (see Administrative Data Taskforce, 2012).
The new sources of administrative social science data in countries like the UK are unfolding rapidly and haphazardly, and are not supported by the framework of a national population register. The quality of these new forms of data for sociological research is yet to be established. This article contributes an original analysis of the consistency between administrative data and survey data collected from the same individuals. Its specific focus is the consistency of parental occupational information in UK birth records.
Within sociology there is a longstanding recognition that in industrialized societies occupations are often the most powerful single indicator of levels of material reward, social standing, and life chances (Parkin, 1971; Rose et al., 2005). Occupations remain a key element of contemporary social life, and occupation-based indicators are a cornerstone of sociological research. Measures of parental socio-economic position are essential to analyses of inequalities in a wide range of areas, for example social stratification, education, health, and well-being (see for example Graham, 2007; Bukodi and Goldthorpe, 2012; Sullivan et al., 2013; Grätz, 2015).
In the UK, birth records are the only source of administrative data on parental occupations that is recorded at the same age for all children. 2 Parental occupational information is also available in parental marriage records. These records are of limited use, however, since an increasing number of children are born outside marriage, and the gap between parents' marriage and their children's births varies substantially. UK census records provide another potential source of information on parental occupations. The utility of this source is also questionable: because the UK census is conducted only once every ten years, parental occupations are observed at widely differing ages of the children.
In the UK the systems for collecting birth registrations vary slightly between territories, 3 but each territory collects information on the name, date, and place of birth of the child, the father's name and occupation, and the mother's name and occupation. We are in the methodologically fortunate and unusual position of having access to linked data on parental occupations reported on administrative birth records and, a short time later, in a social survey interview conducted as part of the UK Millennium Cohort Study (MCS). Our analyses investigate the consistency of reports of parental occupations between these two data resources.
The MCS data are collected specifically for the purposes of research, and the data collection has been designed to maximize the validity and reliability of the data. The MCS survey is administered by a professional data collection agency, and interviews are carried out by trained interviewers. In large-scale nationally representative social surveys, extensive cross-checking and validation work is carried out to maximize data quality, and data quality is clearly documented.
By contrast, the administrative birth records are not collected for the purposes of research. Goerge and Lee (2001), for example, note that the original motivation for collecting administrative data should be questioned when assessing its quality. Researchers should consider whether the information they are interested in is central to the purposes for which it was collected. If certain measures are not required for the operation of an administrative system, they may not be collected conscientiously (Goerge et al., 1992; Goerge and Lee, 2001). The collection of parental occupations on the birth records is not directly required for the operation of any administrative system or the delivery of a service. Therefore, the accurate collection of these data is not of immediate importance to the frontline worker collecting the information. The influence which frontline workers can have on administrative systems is highlighted clearly in Lipsky (1979).
It is also important to note that administrative data resources can take many forms. In this case, the data collected in birth records are based on the information provided by an individual, in much the same way as they would provide information in a social survey interview. In that respect, our comparisons between these data sets investigate differences in the recording of occupations by a registrar compared with a social survey interviewer, and not the differences between survey and administrative data in general. This form of administrative data collection is not unusual in the UK. The UK does not have a national register, and individuals do not have a unique identification number; therefore, information is most commonly provided to different administrative systems by the individuals themselves. In some cases, administrative data will be produced through more objective processes, such as records of the amount of tax paid, the educational qualifications attained in national examinations, or the model and colour of a vehicle registered to a motorist. The characteristics and accuracy of administrative data will vary according to their source and the manner in which they are collected. Goerge and Lee (2001) emphasize that the degree of error varies between administrative data systems, and they encourage researchers to assess each new administrative social science data set individually for every new research question. Following the prescription from Goerge and Lee (2001), the central aim of this article is to undertake an evaluation of UK administrative birth records.
Whilst both the survey data and administrative data will contain inaccuracies, we strongly believe that the purposes and processes involved in the production of the survey data will usually render these data more suitable for social research than administrative birth records. Therefore, we consider the comparison of parental occupations on birth records with available social survey data to be a valid and meaningful assessment of the quality of this administrative data resource.
This article will address three main questions:
1. How consistent are maternal and paternal occupations reported on the survey and in the birth records?
2. Are parental characteristics associated with patterns of agreement and missingness of occupational information on the survey and the birth records?
3. What potential impact do disagreements have on empirical sociological analyses?
There are some previous studies from the United States that have investigated the accuracy of the occupational information provided on birth records. These have generally been from within the field of epidemiology, and they have been motivated by the need to identify occupational risk factors for maternal and child health. Carucci and Prasad (1979) studied birth records in upstate New York. This study found a lack of detail in reports of mothers' occupations, which precluded the use of full occupational codes, 4 and this in turn would be a major impediment to the development of occupation-based socio-economic measures. Carucci and Prasad (1979) also encountered a high degree of missing maternal occupational information. Mothers were required to give details of their last employment, and 65 per cent were described as 'housewives' on the birth record. Their survey identified that over half of mothers described as 'housewives' on the birth record did have a previous occupation. Shaw et al. (1990) found more promising results when assessing parental occupations on Californian birth records: for 71 per cent of mothers and 80 per cent of fathers, the occupation on the birth record was the same as the occupation reported in an interview. Brender et al. (2008) studied parental occupations on birth records in Texas and found that mothers were frequently misclassified as 'homemakers' or unemployed when they did have previous employment. 5 Paternal occupations were missing in 22 per cent of cases. For those parents with occupational information available, 77 per cent of maternal occupations and 63 per cent of paternal occupations matched between the birth records and the interview.
We can only speculate on the reasons for the higher rate of missing maternal occupational information on administrative birth records. We conjecture that the following three factors may be implicated. First, mothers may consider their occupation to be 'housewife', even though this is not officially recognized as an occupation. Second, registrars may not fully explain that they are asking for the last occupation, and not what the mother considers her current activity. Third, the registration takes place shortly after the baby's birth; if the father attends the registration on his own, it is plausible that he may provide less detailed information on his partner's occupation. An observational study of registrations, which included suitable follow-up interviews, would be required to establish the reasons for the under-reporting of maternal occupations comprehensively.

Data and Methods
The data that are investigated in this analysis are drawn from the UK Millennium Cohort Study (MCS) (for more details, see Connelly and Platt, 2014). The MCS is a sample of children born between the 1st of September 2000 and the 11th of January 2002 throughout England, Wales, Scotland, and Northern Ireland. The MCS currently comprises five survey waves. We use information from the first wave of data collection, when the children were around 9 months of age (SN4683, UCL Institute of Education, 2012). Data from birth records were linked to the MCS survey by statistical agencies in each of the constituent UK territories (i.e. the Office for National Statistics in England and Wales and the General Register Office in Scotland). These data are held in the 'Millennium Cohort Study Birth Registration and Maternity Hospital Episode Dataset' (SN5614, UCL Institute of Education, 2008). Full details of the data linkage process are available in Hockley et al. (2007).
There are 16,629 6 families included in the first MCS survey (excluding families in Northern Ireland), and 15,013 of these families were successfully linked to the birth records data, a 90 per cent linkage rate (see Tables 1 and 2). Our analyses exclude birth registrations from Northern Ireland, as the occupational information in these cases was provided in the form of Standard Occupational Classification 90 (SOC90) codes which are different to the occupational information from other territories which is provided in the form of SOC2000 codes. There is no direct conversion between the older SOC90 and the more recent SOC2000. Therefore, to avoid the possibility of introducing additional inconsistencies into the analyses, we have excluded Northern Ireland.
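As a quick arithmetic check, the linkage rate is the number of linked families divided by the number of families in the first survey sweep (excluding Northern Ireland). The short Python sketch below uses only the two counts reported above; the function name `linkage_rate` is our own illustrative choice.

```python
# Counts reported in the text (England, Wales and Scotland only).
surveyed_families = 16_629
linked_families = 15_013


def linkage_rate(linked: int, total: int) -> float:
    """Return the share of surveyed families linked to a birth record, in per cent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * linked / total


rate = linkage_rate(linked_families, surveyed_families)
print(f"Linkage rate: {rate:.1f} per cent")  # Linkage rate: 90.3 per cent
```

The unrounded value (about 90.3 per cent) is consistent with the 90 per cent figure quoted in the text.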
The MCS data are collected through a face-to-face interview, conducted in the family's home. Information is collected from main respondents (usually the child's mother) and partner respondents (usually the child's father). We identify and include only natural mothers and natural fathers in our sample, as these are the parents whose details were recorded on the birth record. Registration of a birth is made in person at a Registry Office, and an official registrar records the information. In Scotland, births must be registered within 21 days of the birth, and in England and Wales, births must be registered within 42 days. Births can be registered by either parent if they are married, or by the mother if the parents are unmarried.
The MCS has a complex sample design which should be appropriately represented in statistical analyses (see Plewis et al., 2004). When making descriptive comparisons between the two data sources, we present unadjusted results, as we are interested specifically in comparing the information available for the same families in these two different data sources. When undertaking multivariate analyses, however, we take account of the complex survey design. The full unadjusted results of all models are provided in the supplementary materials, and the substantive conclusions generally remain consistent in the adjusted and unadjusted models. In this analysis, we have used the standard weights that are deposited with the data (see Ketende and Jones, 2011) because they provide general and robust adjustments. In other analyses, it might be desirable to construct bespoke weights with the aim of making specialized adjustments.
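To illustrate the difference between unadjusted descriptive comparisons and weighted estimates, the Python sketch below computes a weighted proportion. It is a simplified sketch under stated assumptions: the data and weights are hypothetical and are not the MCS weight variables, and only the point estimate is adjusted. A full design-based analysis would also use the clustering and stratification of the MCS sample when estimating standard errors.

```python
from typing import Sequence


def weighted_proportion(values: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted share of cases with value 1 (e.g. 1 = occupation missing).

    Applies survey weights to the point estimate only; it ignores the
    clustering and stratification that matter for variance estimation.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for v, w in zip(values, weights)) / total_weight


# Hypothetical toy data: four families, 1 = maternal occupation missing on the birth record.
missing = [1, 0, 0, 1]
weights = [2.0, 1.0, 1.0, 0.5]
print(sum(missing) / len(missing))            # unweighted: 0.5
print(weighted_proportion(missing, weights))  # weighted: 2.5 / 4.5, about 0.556
```

Setting every weight to 1 reproduces the unadjusted proportion, which is the kind of estimate used in the descriptive comparisons.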

Occupational Information
Maternal and paternal occupational information is collected in the MCS survey using the following questions. If the respondent is currently working, has a paid job but is on leave, or has worked in the past but is not currently working, they are asked, 'What is your main job?' The respondents are then asked, 'What do you mainly do in your job?' These questions are asked of all respondents who have previously stated that they have worked, even if they are not currently working. All parents who have held a job at some point in their lives should therefore report occupational information. The interviewer collects the occupational details, and these are recorded as free text within a computer system.
We have gained an understanding of the practical process by which registrars collect occupational information for the birth records through email correspondence with the relevant national statistical agencies and through meeting and discussing the data collection process with a registrar. In contrast to the standardized questions used in the survey, registrars do not use a standard set of questions to collect occupational information. Registrars ask for the mother's and father's occupations, in an attempt to collect information on the present or last known occupation. If an individual is unemployed or retired, they are asked for details of their last job. If 'housewife' is given as an occupation, registrars are told to inform the parent that this is not an occupation in the sense of a profession, employment, business, or calling, and they are encouraged to probe for a previous occupation. Registrars are allowed to enter the term 'housewife' or 'house person' as an occupation if the parent insists. The occupational details which registrars collect are entered as free text into a computer system. For both the survey and the birth records, the occupational information is coded to a standard occupational classification by a third person (i.e. not by the survey interviewer or the registrar). The occupational information collected in the survey was coded to SOC2000 (Office for National Statistics, 2000) after collection using the Computer Assisted Structured Coding Tool (CASCOT; Elias et al., 1993; Jones, 2004). This tool suggests occupational codes based on the text of a job title, but a coder must decide whether the suggested code is suitable and select a more suitable code if one is required.
There is a small element of interpretation involved in the process of coding occupations, but it is largely formulaic. The occupational information on the birth records was also coded to SOC2000 using computer-assisted programmes. In some cases, this means that the coder has to adjudicate and decide on the most suitable occupational code for the information available. To date, we are not aware of any side-by-side calibration tests of CASCOT and the government occupational coding programmes. Both the survey data coders and the birth records data coders employed verification checks in which a proportion of the coding was checked by an additional coder.
In this article we consider the consistency of occupations based on the four-digit SOC2000 codes available in the social survey and the birth records. For most research purposes, detailed occupational codes will be converted into an occupation-based measure (see Connelly et al., 2016). Therefore, we also consider the agreement between the occupations coded to the eight-class version of the UK National Statistics Socio-Economic Classification (NS-SEC, Office for National Statistics, 2010). Ideally, NS-SEC is produced using standard occupational codes and information on employment status 7 (Rose et al., 2005). Employment status information is collected in the birth records; however, in the data set, employment status information is only available for Scottish births. 8 The Scottish employment status information is not presented in a standardized form which would permit its use in coding NS-SEC in the officially prescribed manner. Therefore, we have coded NS-SEC using only occupational information, allocating occupations to NS-SEC categories without reference to employment status; this is known as the simplified method (see Rose et al., 2005). To ensure comparability and to maintain clarity, we also use the simplified method when coding NS-SEC from the survey data, although suitable employment status information is available in the MCS. 9
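The simplified method amounts to a lookup from a four-digit SOC2000 code to an NS-SEC class, with no reference to employment status. The Python sketch below shows the idea with a hypothetical four-entry excerpt; the class assignments follow the teacher and optician examples discussed in this article, and a real analysis would load the full official derivation table instead.

```python
from typing import Optional

# Hypothetical excerpt of a SOC2000 -> NS-SEC (eight-class) lookup for the
# simplified method. Only four codes are shown; a real analysis would load
# the complete official derivation table.
SIMPLIFIED_NSSEC = {
    "2214": 2,  # ophthalmic opticians
    "2314": 2,  # secondary education teaching professionals
    "2315": 2,  # primary and nursery education teaching professionals
    "3216": 3,  # dispensing opticians
}


def simplified_nssec(soc2000: Optional[str]) -> Optional[int]:
    """Map a four-digit SOC2000 code to an NS-SEC class, ignoring employment status.

    Returns None when the code is missing or absent from the lookup, so that
    missing occupations stay missing rather than being assigned a class.
    """
    if soc2000 is None:
        return None
    return SIMPLIFIED_NSSEC.get(soc2000.strip())


print(simplified_nssec("2314"))  # 2
print(simplified_nssec(None))    # None
```

Applying the same lookup to both the survey and the birth-record codes keeps the two NS-SEC measures comparable, which is the rationale given above for using the simplified method in both sources.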

Analysis
Question 1: How consistent are maternal and paternal occupations reported on the survey and in the birth records?

Missing Occupational Information
The percentage of valid and missing occupational information on the birth record for mothers and fathers in our analytical sample is reported in Table 3. Overall, 90 per cent of mothers and 73 per cent of fathers had valid SOC2000 codes in the survey. In the birth record, 62 per cent of mothers and 86 per cent of fathers had valid SOC2000 codes. In five cases for mothers and 11 cases for fathers, an occupational code was given on the birth record that was not a valid SOC2000 code; we recoded these cases as missing ('other'). In line with the findings of Carucci and Prasad (1979) and Brender et al. (2008), there is a large amount of missing occupational information for mothers on the birth record. Fourteen per cent of fathers had missing occupational information on the birth record in our sample, whereas 38 per cent of mothers had missing occupational information. Of those mothers with missing occupational information, 15 per cent were recorded as undertaking 'full time care of home/relative' and a further 20 per cent were recorded as 'occupation not stated'. Table 4 shows the percentage of valid occupational information available in both the survey and the birth record. In our sample, 68 per cent of families have valid occupational information for fathers on both the birth record and the survey, and 61 per cent of families have valid occupational information for mothers on both the birth record and the survey. In the survey, 22 per cent of families have valid occupational information only for the mother, and only 5 per cent have valid occupational information only for the father. In the birth record the situation is reversed: 6 per cent have valid occupational information only for the mother, and 30 per cent only for the father.
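The valid-code patterns in Table 4 can be derived by classifying each family, within one source, according to which parents have a valid SOC2000 code. The reported shares are internally consistent: in the survey, 90 - 22 = 68 per cent of families have valid codes for both parents, leaving 73 - 68 = 5 per cent with a valid code for the father only; in the birth record, 62 - 6 = 56 per cent have both, leaving 86 - 56 = 30 per cent for the father only. The Python sketch below is illustrative only: the records are hypothetical, invalid codes are assumed to have already been recoded to `None` (as was done for the invalid birth-record codes), and the function names are our own.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def validity_pattern(mother_soc: Optional[str], father_soc: Optional[str]) -> str:
    """Classify one family, within a single source, by which parents have a valid code."""
    mother_valid = mother_soc is not None
    father_valid = father_soc is not None
    if mother_valid and father_valid:
        return "both parents"
    if mother_valid:
        return "mother only"
    if father_valid:
        return "father only"
    return "neither"


def pattern_shares(records: Iterable[Tuple[Optional[str], Optional[str]]]) -> Dict[str, float]:
    """Return the percentage of families falling into each validity pattern."""
    counts = Counter(validity_pattern(mother, father) for mother, father in records)
    n = sum(counts.values())
    if n == 0:
        return {}
    return {pattern: 100 * k / n for pattern, k in counts.items()}


# Hypothetical birth-record extract: (mother SOC2000, father SOC2000); None = missing.
birth_record = [("2314", "5241"), (None, "8211"), (None, "2132"), ("7111", None)]
shares = pattern_shares(birth_record)
print(shares)
```

Running the same function separately on the survey and birth-record extracts gives the two sides of a comparison like Table 4.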
In 29 per cent of cases, a valid occupation was reported for the mother in the survey when her occupational information was missing on the birth record; this occurred in only 4 per cent of cases for fathers' occupational information. This suggests that there may be under-reporting of valid maternal occupations on the birth record. The high degree of missingness for maternal occupational information on the birth record is in line with the findings from the aforementioned studies from the United States (see Carucci and Prasad, 1979; Brender et al., 2008).
The higher degree of paternal missingness on the survey compared with the birth records may arise partly because 15 per cent of MCS children were born to parents who were not in a co-residential partnership, and non-resident parents did not take part in the survey (Kiernan, 2006). For resident parents, there was also a higher degree of missingness for partner interviews (mainly undertaken by fathers) than for main interviews (Dex and Joshi, 2004). Fathers' information may be more likely to be included on the birth record, as this can be used to gain parental rights, and can also be used as evidence of paternity in claims for child maintenance payments. 10 There may also be social stigma attached to not including a child's father's details on the birth record (see for example Maldonado, 2011). Overall, there are far stronger incentives for a father's details to be entered on a child's birth record than for a father to take part in the MCS survey.

Agreement between the Survey and the Birth Records
We now investigate the agreement between the SOC2000 codes reported in the birth records and the survey. Overall, 36 per cent of maternal occupational information and 37 per cent of paternal occupational information is the same in the two data sources (Table 5).
When we consider only those cases where valid occupational information is available in both data sources, 59 per cent of maternal occupations and 54 per cent of paternal occupations match (Table 6). The percentage agreement between sources is a measure of consistency. We also present estimates of Cohen's kappa (Table 7), a measure of inter-rater reliability that corrects for the agreement expected by chance (Cohen, 1960). Although interpretations of the magnitude of kappa should be treated with caution (see Bakeman et al., 1997), Landis and Koch (1977) suggest that kappa values over 0.61 should be considered substantial, and Fleiss et al. (2013) suggest that values over 0.75 should be considered excellent. Table 7 shows the kappa statistics for agreement on SOC2000 codes and on NS-SEC classes. The kappa values are calculated for cases without missing occupational information, and show a moderate, but not overwhelming, level of reliability between sources.
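Percentage agreement and Cohen's kappa can both be computed from paired codes for complete cases. The Python sketch below uses the textbook definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected if the two sources assigned categories independently, each with its own marginal distribution. The six pairs of NS-SEC classes are hypothetical, and the sketch is not necessarily the routine used to produce Table 7.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of cases (in per cent) where both sources give the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two sources coding the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the two marginal shares, summed over categories.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical NS-SEC classes for six parents with valid codes in both sources.
survey = [2, 2, 3, 5, 6, 7]
birth_record = [2, 3, 3, 5, 5, 7]
print(percent_agreement(survey, birth_record))  # 4 of 6 pairs agree
print(cohens_kappa(survey, birth_record))       # 17/29, about 0.586
```

The example shows why kappa is lower than raw agreement: part of the observed agreement (here p_e = 7/36) would be expected by chance alone.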

Error in Practice and Error in Principle
There are disagreements between the SOC2000 codes reported in the birth records and the survey, and we theorize that these disagreements take two forms. The first we term 'error in principle', and the second we term 'error in practice'. An error in principle occurs when the SOC2000 codes do not match but this does not impact the position of the individual when the occupation is coded to a socioeconomic measure (e.g. NS-SEC). For example, a secondary school teacher (SOC2314) who is recorded as a primary school teacher (SOC2315) would have a different SOC2000 code but both occupational codes would be included in NS-SEC 2 (lower managerial, administrative, and professional occupations). In 'principle', this is an error but in 'practice' it would have no effect in an analysis that used NS-SEC as an explanatory variable.
By contrast, an error in practice occurs when the SOC2000 codes do not match and also lead to a discrepancy in the socio-economic position which would be allocated to an individual. For example, a dispensing optician (SOC3216) who is recorded as an ophthalmic optician (SOC2214) would be coded to NS-SEC 2 (lower managerial, administrative, and professional occupations) instead of NS-SEC 3 (intermediate occupations). In 'practice', this disagreement could have an effect on an analysis that used NS-SEC as an explanatory variable. We reiterate that our analysis allows us to consider the consistency between the two data sources and not whether either data set is error free; however, the consideration of 'error in principle' and 'error in practice' provides additional insight into the nature of the disagreement between the survey and administrative data.
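The distinction between the two forms of error can be expressed as a small classification rule applied to each pair of valid codes. The Python sketch below is hypothetical: its lookup holds only the four SOC2000 codes from the teacher and optician examples above, with the NS-SEC classes stated there, and it raises a `KeyError` for any code outside that excerpt.

```python
# Hypothetical excerpt of a simplified SOC2000 -> NS-SEC (eight-class) lookup,
# limited to the four codes used in the teacher and optician examples.
NSSEC_LOOKUP = {"2214": 2, "2314": 2, "2315": 2, "3216": 3}


def classify_pair(survey_soc: str, birth_soc: str) -> str:
    """Label a pair of valid SOC2000 codes from the survey and the birth record."""
    if survey_soc == birth_soc:
        return "match"
    # Raises KeyError for codes outside this toy lookup.
    if NSSEC_LOOKUP[survey_soc] == NSSEC_LOOKUP[birth_soc]:
        return "error in principle"  # codes differ, NS-SEC class unchanged
    return "error in practice"  # codes differ and NS-SEC class changes


print(classify_pair("2314", "2315"))  # error in principle
print(classify_pair("3216", "2214"))  # error in practice
```

Because the rule depends only on the lookup, swapping in a coarser or finer occupation-based measure changes which mismatches count as errors in practice.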
The degree of 'error in practice' and 'error in principle' will depend on the occupation-based measure that is derived from the detailed occupational codes (e.g. …). For example, there would be less 'error in practice' with a three-category social class scheme (e.g. the three-class version of NS-SEC) than with the more common eight-category NS-SEC scheme. When SOC2000 is used to construct finer-grained measures such as social stratification scales (e.g. CAMSIS or SIOPS, see Treiman, 1977; Prandy, 1999), more 'errors in practice' are likely to occur than when a categorical socio-economic measure with a limited number of categories is derived. We demonstrate these two forms of 'error' using the eight-class version of the NS-SEC. Table 8 reports the degree of 'error in practice' and 'error in principle' for cases where there is a valid SOC2000 code on both the survey and the birth record. An 'error in practice' occurs for 60 per cent of cases where mothers' occupations do not match, and 71 per cent of cases where fathers' occupations do not match. If we consider all cases with valid SOC2000 codes on the survey and birth record, regardless of whether they match, we can determine a total rate of 'error in practice': 25 per cent for mothers and 33 per cent for fathers. These rates indicate that there is a notable, and consequential, degree of disagreement between these two data sources.
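The total 'error in practice' rate is the share of complete cases whose codes do not match, multiplied by the share of those mismatches that change NS-SEC class. The Python sketch below reproduces the reported totals from the rounded percentages in the text (59 and 54 per cent of codes matching; 60 and 71 per cent of mismatches being errors in practice), so small rounding differences from the published tables are to be expected.

```python
def total_error_in_practice(match_rate: float, practice_share_of_mismatches: float) -> float:
    """Overall 'error in practice' rate among cases valid in both sources.

    match_rate: proportion of complete cases whose SOC2000 codes agree.
    practice_share_of_mismatches: proportion of mismatches that change NS-SEC class.
    """
    mismatch_rate = 1 - match_rate
    return mismatch_rate * practice_share_of_mismatches


mothers = total_error_in_practice(0.59, 0.60)  # 0.41 * 0.60
fathers = total_error_in_practice(0.54, 0.71)  # 0.46 * 0.71
print(f"mothers: {mothers:.0%}, fathers: {fathers:.0%}")  # mothers: 25%, fathers: 33%
```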
Tables 9 and 10 show the cross-tabulation of the two NS-SEC measures, one coded using occupations on the birth record and the other from the survey (for mothers and fathers). The shaded cells show the percentage of mothers or fathers who would be coded to the same NS-SEC category in both data sources. For mothers, only 53 per cent of those identified as belonging to NS-SEC 5 using the birth records, for example, were also coded to this category using the survey data. Twenty-two per cent of these mothers were coded to NS-SEC 6 using the survey data. For fathers, 61 per cent of those coded to NS-SEC 3 using the birth records, for example, were also coded to this category using the survey data. Using the survey data, 12 per cent of these fathers would be in NS-SEC 2.
Note: NS-SEC is coded using the simplified method. The base n is the number of cases where there is a valid SOC2000 code on both the survey and the birth record (total n = 9,168).
Question 2: Are parental characteristics associated with patterns of agreement and missingness of occupational information on the survey and the birth records?
We now investigate what factors are associated with patterns of consistency and missingness in the occupational information between the two data sources. In the first stage of this analysis, we estimate a series of logistic regression models to investigate missingness; we then estimate multinomial logistic regression models to investigate different patterns of missingness and agreement in the two data sources. In the regression analyses, we use additional information about the parents taken from the MCS survey 11 as our explanatory variables (see Table 11).
Estimating regression models to investigate the extent to which missingness and agreement are associated with socio-demographic factors has proved effective (see for example Plewis, 2007). When analysing administrative social science data, however, there is often only a limited number of explanatory variables that could be used in techniques such as multiple imputation. Through this enquiry we seek to deepen our understanding of the nature of occupational data available on UK birth records by assessing the extent to which patterns of missingness in these data are associated with parental characteristics, and thereby to identify any key biases in the data source. Previous studies of non-response in social surveys have documented that those who do not respond are likely to be younger, less educated, and from ethnic minorities (Dex et al., 2008). In previous United States studies of the agreement of occupations between birth records and an interview, younger mothers (<25 years), mothers with lower levels of education, and those of Black ethnicity were more likely to have mismatched occupations between data sources (Brender et al., 2008).
The results of the logistic regression models of missingness are summarized in Table 12. Separate models are estimated for mothers and fathers, and for the survey and the birth record. In relation to age, families with older mothers are less likely to have missing information on the survey and the birth record for both mothers and fathers. It is plausible that younger mothers and fathers are less likely to have had a prior occupation. There are complex patterns of association between mothers' ethnicity and missingness. For mothers' occupational information on the birth record, cases with mothers from Pakistani and Bangladeshi backgrounds are more likely to be missing than other groups; however, this pattern is less clear for the survey data. For fathers there are no clear relationships with ethnicity, although cases with mothers from Black African and Black Caribbean backgrounds are more likely to have missing fathers' occupational information on both the survey and the birth record than those from White backgrounds. In terms of education, there is a fairly clear pattern for mothers' occupational information: families in which parents have higher education levels are less likely to have missing information on both the survey and the birth record. There is also a clear pattern of missingness related to education level for fathers' occupations.
Note: NS-SEC is coded using the simplified method. The base n is the number of cases where there is a valid SOC2000 code on both the survey and the birth record (total n = 10,241).
Note: This sample is formed of cases which contain complete information on the four additional variables. In this sample, 185 cases are dropped, as they contain missing information on two variables, and one case is dropped, as it contains missing information on three variables. The final analytical sample for the regression analyses is 14,827. The data are adjusted to reflect the MCS survey design.
Overall, there are no notable differences between the patterns of missing occupational information on the survey and on the birth records, and both data sources show clear patterns of non-random missingness.
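As a rough illustration of the kind of missingness model summarized in Table 12, the following Python sketch fits a binary logistic regression of a missingness indicator on maternal age and education by Newton-Raphson. It is a minimal sketch under stated assumptions, not the models estimated here: the data are synthetic, the variable names (`age`, `degree`, `missing`) and the helper `fit_logit` are hypothetical, and the MCS survey design adjustment is not applied. Negative coefficients (odds ratios below 1) correspond to a lower likelihood of missing occupational information.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson (IRLS).

    Hypothetical helper: X must already contain an intercept column.
    Survey design weights are NOT applied (simplifying assumption).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probability of missingness
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:         # stop once the update is negligible
            break
    return beta


# Synthetic data: older and more educated mothers are less likely to be missing.
rng = np.random.default_rng(0)
n = 5000
age = rng.normal(29, 6, n)                     # mother's age in years (synthetic)
degree = rng.integers(0, 2, n)                 # 1 = higher education (synthetic)
true_logit = 1.0 - 0.08 * (age - 29) - 0.7 * degree
missing = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), age - 29, degree])
beta = fit_logit(X, missing)
odds_ratios = np.exp(beta[1:])                 # odds ratios for age (per year) and degree
print(np.round(beta, 3), np.round(odds_ratios, 3))
```

Centring age at 29 only makes the intercept interpretable; it does not change the age coefficient.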
Tables 13 and 14 show the multinomial logistic regression models for mothers and fathers, respectively. The distribution of mothers and fathers across the five outcome categories is summarized in Table 5. Only 1 per cent of mothers (n = 97) are in the 'missing on survey only' category; estimates for this category should therefore be treated with caution. Families with older mothers and those from the White ethnic group are less likely to have any combination of missing occupational information for mothers or fathers, compared with having matching occupational information. The patterns of association with education level are less clear. There is no obvious consistent pattern of association between the variables considered here and whether occupations match or do not match on the birth record and survey, for either mothers or fathers.
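The five outcome categories used in the multinomial models can be sketched as a simple classification of each parent's pair of occupational codes. In this Python sketch the category labels are assumptions inferred from the text ('match', 'no match', and the three combinations of missing information), `agreement_category` is a hypothetical helper, and the four-digit codes are illustrative placeholders rather than real MCS records.

```python
from collections import Counter
from typing import Optional


def agreement_category(survey_soc: Optional[str], birth_soc: Optional[str]) -> str:
    """Assign one parent to one of five outcome categories (labels assumed).

    None stands for a missing occupation in that data source.
    """
    if survey_soc is None and birth_soc is None:
        return "missing on both"
    if survey_soc is None:
        return "missing on survey only"
    if birth_soc is None:
        return "missing on birth record only"
    # Both sources present: compare the occupational codes directly.
    return "match" if survey_soc == birth_soc else "no match"


# Hypothetical (survey, birth record) code pairs for six parents.
pairs = [
    ("2314", "2314"),
    ("7111", "9233"),
    (None, "2314"),
    ("7111", None),
    (None, None),
    ("2314", "2314"),
]
counts = Counter(agreement_category(s, b) for s, b in pairs)
print(counts)
```

In a multinomial model, 'match' would naturally serve as the reference outcome, which is the comparison the text describes.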

Genuine Occupational Change
A weakness of the comparison between these two data sources is that the survey interview occurred around 9 months after the birth registration. It is therefore possible that mothers and fathers genuinely changed occupations between the two data collections. In the multinomial logistic regression models presented in Tables 13 and 14, we include the child's age in days at the survey interview as a measure of the length of time between the birth of the child and the survey interview. The mean age of the child at the time of the survey interview was 295 days (s.d. = 14), around 9.5 months of age (min = 244 days, max = 382 days). We have no details of when, within the stipulated period of 21 days (Scotland) or 42 days (England and Wales), the child's birth was registered. The child's age at the survey interview therefore represents the maximum possible time difference between the recording of the parents' occupations on the birth record and the recording of their occupations in the survey. Including the age of the child at interview in the regression models allows us to investigate whether more occupational change is observed when more time has passed between the child's birth and the survey interview. The multinomial logistic regressions indicate that the age of the child at interview had only very small effects on the likelihood of a mismatch between the data resources. We investigate this issue further by comparing the degree of occupational mismatch observed between the birth records and the survey with the degree of occupational change that might be expected for adults at this stage of the life course. Longhi and Brynin (2010) investigate the degree of occupational change over a year in the population using the British Household Panel Survey (BHPS).
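Because only the child's age at interview and the stipulated registration periods are known, the time between registration and interview can only be bounded. The Python sketch below assumes that every birth was registered within the stipulated period (late registrations are ignored); the helper `gap_bounds` and the dictionary name are hypothetical.

```python
from typing import Dict, Tuple

# Stipulated registration periods in days, as given in the text.
# Compliance with these periods is an assumption of this sketch.
REGISTRATION_WINDOW_DAYS: Dict[str, int] = {"Scotland": 21, "England and Wales": 42}


def gap_bounds(child_age_days: int, country: str) -> Tuple[int, int]:
    """Return (minimum, maximum) days between birth registration and interview.

    Hypothetical helper: registration is assumed to fall between the day of
    birth and the last day of the stipulated period.
    """
    window = REGISTRATION_WINDOW_DAYS[country]
    shortest_gap = child_age_days - window  # registered on the last permitted day
    longest_gap = child_age_days            # registered on the day of birth
    return shortest_gap, longest_gap


print(gap_bounds(295, "England and Wales"))  # mean child age: (253, 295)
print(gap_bounds(244, "Scotland"))           # minimum child age: (223, 244)
```

Even at the lower bound, the gap for a child of mean age is well over 8 months, which is why the comparison with annual occupational change rates below is a reasonable benchmark.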
Taking into account changes in occupational codes and also reported changes in jobs, Longhi and Brynin find that around 8.1 per cent of men and women of working age change occupations in a year. We replicate Longhi and Brynin's methodology using the BHPS data (SN5151, Institute for Social and Economic Research, 2010). We examine changes in jobs and occupations between Waves 10 and 11 of the BHPS, which coincide with the period of the birth of the MCS cohort members. We look at the occupational change of women and men within 2 standard deviations of the mean age of the MCS mothers and fathers, and also at the occupational change of only those women and men who had a baby between Waves 10 and 11 of the survey (see Table 15). For those men within the same age range as the MCS parents, the change in occupations is approximately 4 per cent. Among those who had a baby, less than 1 per cent of mothers changed occupations, whereas a greater percentage of fathers (9 per cent) changed occupations over this period (n.b. sample sizes become very small for this subsample, and these estimates should be treated with suitable caution). The amount of change found here is far less than the degree of discrepancy observed between the occupations reported on birth records and the survey (see Table 6). It is therefore unlikely that the high level of disagreement between the two data sources is due to genuine changes in occupations over the 9-month period between the registration of the birth and the survey interview.

Note: Based on the methodology described in Longhi and Brynin (2010), which defined occupational change as a change in occupational code and also a change in job between survey sweeps. The data are adjusted to reflect the BHPS survey design.
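The occupational change measure described above, following Longhi and Brynin (2010), counts a person as changing occupation only if both the occupational code and the job change between waves. The Python sketch below applies that rule within a mean ± 2 s.d. age window. The `PanelPerson` record, its field names, the single design weight, and the example codes are hypothetical and do not reproduce the BHPS variable names or weighting scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PanelPerson:
    """Hypothetical record for one respondent observed at Waves 10 and 11."""
    age: float           # age at Wave 10 (years)
    soc_w10: str         # occupational code at Wave 10
    soc_w11: str         # occupational code at Wave 11
    changed_job: bool    # reported a change of job between waves
    weight: float = 1.0  # design weight (a single weight is assumed)


def occupational_change_rate(people: List[PanelPerson], mean_age: float, sd_age: float) -> float:
    """Weighted percentage changing occupation within mean_age +/- 2 * sd_age.

    Change requires BOTH a different occupational code AND a reported job change.
    """
    lo, hi = mean_age - 2 * sd_age, mean_age + 2 * sd_age
    in_window = [p for p in people if lo <= p.age <= hi]
    total = sum(p.weight for p in in_window)
    if total == 0:
        return float("nan")
    changed = sum(
        p.weight for p in in_window if p.soc_w10 != p.soc_w11 and p.changed_job
    )
    return 100.0 * changed / total


sample = [
    PanelPerson(25, "2314", "3211", True),   # code and job change: counted
    PanelPerson(31, "2314", "3211", False),  # code change only (e.g. recoding): not counted
    PanelPerson(35, "7111", "7111", True),   # job change, same occupation: not counted
    PanelPerson(55, "2314", "3211", True),   # outside the age window: excluded
]
rate = occupational_change_rate(sample, mean_age=30, sd_age=5)
print(round(rate, 1))  # 33.3
```

Requiring both conditions guards against counting pure coding noise (a different code for the same job) as genuine occupational change.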
Question 3: What potential impact do disagreements have on empirical sociological analyses?
Socio-economic inequalities in children's cognitive test scores are strong and well documented, with children from less advantaged groups performing less well on these tests (see for example Feinstein, 2003; Sullivan et al., 2013; Dickerson and Popli, 2016). Performance on cognitive tests in childhood is important because it is widely found to be associated with later educational attainment and with occupational positions in adulthood (see Mascie-Taylor and Gibson, 1978; Jencks, 1979; Jensen, 1998; MacKintosh, 1998; Tittle and Rotolo, 2000; Sternberg et al., 2001; Bartels et al., 2002; Nettle, 2003; Schmidt and Hunter, 2004; Deary et al., 2007; Connelly, 2012). We conduct a concise sensitivity analysis to compare the substantive conclusions that would be drawn in an analysis of social class inequalities in cognitive test scores if parental occupation-based measures were taken from the survey or from the birth records.
We consider two cognitive tests taken at different sweeps of the MCS. At age 5, we use the 'Naming Vocabulary' test, and at age 11, we use the 'Verbal Similarities' test. These are both subscales of the British Ability Scales, second edition (Elliott et al., 1996). We use standardized test scores that are adjusted for the child's age, and the range of items which they have completed (see Connelly, 2013).
We run eight separate ordinary least squares (OLS) models with the cognitive test score at age 5 or 11 years as the outcome. Each model contains an NS-SEC measure based on either the mother's or the father's occupational information, derived from either the survey or the birth record. The models also control for the child's gender. To allow for comparison, the analytical sample comprises those sample members who completed the cognitive test and have occupational information available in both the survey and the birth record. Due to attrition, the MCS sample size decreases between the first sweep and the third (age 5) and fifth (age 11) sweeps of the survey (see Platt, 2014). The final analytical sample sizes are 8552 and 7600 for the models comparing fathers' and mothers' measures, respectively, at age 5, and 7614 and 6710 for the models comparing fathers' and mothers' measures, respectively, at age 11. The coefficients and 95 per cent quasi-variance-based comparison intervals for the NS-SEC variables are presented in Figures 1 and 2 (full models are available in the supplementary materials). This sensitivity analysis indicates that the same substantive conclusions would be reached for these samples irrespective of whether the occupations reported in the birth records or in the survey are used. This is an encouraging finding; however, it should be noted that these models compare only those cohort members with mothers' and fathers' information available in both data resources, and they do not take into account the other patterns of missing data described above.

Figure 1. OLS regression models of naming vocabulary test scores at age 5. Note: UK Millennium Cohort Study (SN4683 & SN5614). Models also contain gender. Models are adjusted for survey design. Models are run separately with mothers' and fathers' variables and include families with valid occupational information available for both the survey and birth record, n (fathers) = 8522, n (mothers) = 7600.
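The logic of the sensitivity analysis can be sketched by fitting the same OLS specification twice, once with the NS-SEC measure from each source, and comparing the class coefficients. This Python sketch uses synthetic data, a hypothetical five-class NS-SEC coding with class 1 as the reference category, and plain unweighted least squares; it does not compute the quasi-variance comparison intervals or the survey design adjustment used for Figures 1 and 2, and the helper `ols_nssec` is hypothetical.

```python
import numpy as np


def ols_nssec(score, nssec, girl, n_classes=5):
    """OLS of test score on NS-SEC dummies (class 1 = reference) and gender.

    Returns only the NS-SEC contrasts (classes 2..n_classes versus class 1).
    Unweighted: survey design is ignored in this sketch.
    """
    dummies = np.column_stack(
        [(nssec == k).astype(float) for k in range(2, n_classes + 1)]
    )
    X = np.column_stack([np.ones(len(score)), dummies, girl])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    return beta[1:n_classes]


rng = np.random.default_rng(1)
n = 6000
nssec_survey = rng.integers(1, 6, n)                 # synthetic classes 1..5
# Synthetic birth-record measure: about 20% of cases are recoded at random.
disagree = rng.random(n) < 0.2
nssec_birth = np.where(disagree, rng.integers(1, 6, n), nssec_survey)
girl = rng.integers(0, 2, n).astype(float)
score = 55 - 2.5 * (nssec_survey - 1) + 1.0 * girl + rng.normal(0, 10, n)

b_survey = ols_nssec(score, nssec_survey, girl)
b_birth = ols_nssec(score, nssec_birth, girl)
print(np.round(b_survey, 2))
print(np.round(b_birth, 2))
```

In this synthetic setting, random disagreement between sources pulls the birth-record coefficients towards zero but leaves their ordering unchanged, so both fits would support the same substantive gradient, which is the kind of comparison Figures 1 and 2 make.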

Conclusions
New forms of administrative social science data are emerging rapidly and are likely to increase the scope and scale of empirical sociological inquiries. New infrastructural resources in the UK aim to be instrumental in improving access to these data. In countries such as the UK, however, administrative data are haphazard, and there are no national population registers against which to organize data. The empirical work undertaken in this article is original because it assesses the consistency of administrative birth records data using survey data collected from the same individuals. Measures derived from information on parental occupations are central to a wide spectrum of sociological analyses, and the occupational information on UK birth records will provide a central measure of social origins for sociological research on inequalities. A clear message from this work is that there are inconsistencies between the occupations reported in the birth records and the information collected by professional interviewers shortly afterwards in a social survey. These findings are similar to those of US studies that have also examined occupational information on administrative birth records (see Carucci and Prasad, 1979; Brender et al., 2008). This finding warns against the naïve or uncritical use of UK birth records data for sociological research.
It is fortuitous that birth records data were available for the participants in the MCS. Ordinarily, researchers using administrative social science data will not have access to data that can act as a comparative source. In these circumstances, researchers might reasonably be concerned about the quality of the administrative data. In one illustrative empirical example, we have shown that the inconsistencies in the birth records data have no appreciable influence on substantive conclusions. We strongly assert, however, that this is not necessarily a general finding and must not be assumed a priori.
We advocate that further research be undertaken into the consistency and accuracy of UK birth records data. One potential strategy would be to compare parental occupational information within the birth records with official data collected for taxation (in the case of the UK, National Insurance information might provide a potential benchmark). Another potential strategy would be to compare parental occupational information on birth records with data collected from parents in a large-scale longitudinal study (for example, the UK Household Longitudinal Study). These analyses should also be extended to examine the quality of other sources of administrative social science data. A final comment is that, in the changing climate of administrative social science data analysis, organizations engaged in collecting and curating information should be encouraged to place more emphasis on providing researchers with clear information on the provenance of the data that they collect.

Figure 2. OLS regression models of verbal similarities test scores at age 11. Note: UK Millennium Cohort Study (SN4683 & SN5614). Models also contain gender. Models are adjusted for survey design. Models are run separately with mothers' and fathers' variables and include families with valid occupational information available for both the survey and birth record, n (fathers) = 7614, n (mothers) = 6710.

Notes
1 See: www.adrn.ac.uk.
2 Despite various official data collection and registration exercises that parents routinely have to engage with (for example, relating to children's health and enrolment at school), there is no single organized national activity that collects detailed information on parental occupations. The UK does not have national registers, identification numbers, or identification cards.
3 Northern Ireland is excluded from this analysis, as the Northern Irish data available to us were stored in an older standardized occupational classification than the data for the other territories. More details of this analytical decision are provided in later sections of the article.
4 Standardized occupational codes organize job-related information (e.g. job titles) into a list of occupations. Examples include the UK Standard Occupational Classification (SOC) and the International Labour Organization's International Standard Classification of Occupations (ISCO).
5 The proportion of women misclassified as housewives is presented by occupation in this article. Of those mothers who report having a job in the interview, the percentage misclassified as housewives on the birth record varies from 0 per cent for those mothers working as 'health diagnosing and treating practitioners' (n = 14 in interview) to 65 per cent of those working in 'food preparation and serving occupations' (n = 14 in interview).
6 There was an overall achieved response rate of 68 per cent in the UK Millennium Cohort Study (Dex and Joshi, 2004).
7 Employment status defines whether an individual is an employer, self-employed, or an employee; whether they are a supervisor; and the number of employees at their workplace. For more details of this measure see: http://webarchive.nationalarchives.gov.uk/20160105160709/http://www.ons.gov.uk/ons/guide-method/classifications/current-standard-classifications/soc2010/soc2010-volume-3-ns-sec-rebased-on-soc2010-user-manual/index.html.
8 The data deposited in the UK Data Archive and available to us as researchers are a subset of all of the data which are collected in administrative birth records.
9 To investigate the differences in NS-SEC classification when the full (i.e. with employment status) and simplified (i.e. without employment status) derivation methods are used, we have coded the MCS mothers' and fathers' occupational information using both methods. For mothers, there was 86 per cent agreement between the two measures (κ = 0.83, r = 0.96, P = 0.001). For fathers, there was 78 per cent agreement between the two measures (κ = 0.75, r = 0.92, P = 0.001).
10 For Scotland see: https://www.citizensadvice.org.uk/scotland/relationships/birth-certificates-and-changing-your-name-s/birth-certificates-s/ [accessed 01/06/2016]. For England and Wales see: http://www.oneplusone.org.uk/content_topic/married-or-not/children/ [accessed 01/06/2016].
11 We use mother's information only for the age and ethnicity variables to reduce the amount of missingness.
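The agreement statistics reported in note 9 can be illustrated with a short Python sketch of percentage agreement and Cohen's kappa for two categorical codings of the same cases, κ = (p_o − p_e) / (1 − p_e). It assumes the unweighted form of kappa, which may differ from the statistic actually used; the helper name and the ten example class codes are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Return (percentage agreement, unweighted Cohen's kappa) for two codings.

    Hypothetical helper: a and b hold class codes for the same cases
    (e.g. full versus simplified NS-SEC derivations).
    """
    if len(a) != len(b) or not a:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two marginal proportions, summed over classes.
    p_expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return 100 * p_observed, kappa


full = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]        # hypothetical full-method classes
simplified = [1, 1, 2, 2, 3, 3, 4, 5, 5, 5]  # hypothetical simplified-method classes
pct, kappa = agreement_and_kappa(full, simplified)
print(round(pct, 1), round(kappa, 3))
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance given each coding's class distribution.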