Socioeconomic position, bronchiolitis and asthma in children: counterfactual disparity measures from a national birth cohort study

Abstract
Background: The debated link between severe respiratory syncytial virus (RSV) infection in early life and asthma has yet to be investigated through a social inequity lens. We estimated the magnitude of socioeconomic disparity in childhood asthma which would remain if no child were admitted to hospital for bronchiolitis, commonly due to RSV, during infancy.
Methods: The cohort, constructed from national administrative health datasets, comprised 83 853 children born in Scotland between 1 January 2007 and 31 June 2008. The Scottish Index of Multiple Deprivation (SIMD) was used to capture socioeconomic position. Emergency admissions for bronchiolitis before age 1 year were identified from hospital records. Yearly indicators of asthma/wheeze from ages 2 to 9 years were created using dispensing data and hospital admission records.
Results: Using latent class growth analysis, we identified four trajectories of asthma/wheeze: early-transient (2.2% of the cohort), early-persistent (2.0%), intermediate-onset (1.8%) and no asthma/wheeze (94.0%). The estimated marginal risks of chronic asthma (combining early-persistent and intermediate-onset groups) varied by SIMD, with risk differences for the medium and high deprivation groups, relative to the low deprivation group, of 7.0% (95% confidence interval: 3.7–10.3) and 13.0% (9.6–16.4), respectively. Using counterfactual disparity measures, we estimated that the elimination of bronchiolitis requiring hospital admission could reduce these risk differences by 21.2% (4.9–37.5) and 17.9% (10.4–25.4), respectively.
Conclusions: The majority of disparity in chronic asthma prevalence by deprivation level remains unexplained. Our paper offers a guide to using causal inference methods to study other plausible pathways to inequities in asthma using complex, linked administrative data.


Introduction
Asthma is the most common chronic respiratory condition in children worldwide. 1 In 2018, an estimated 8% of children younger than 15 years in Scotland had a current doctor diagnosis of asthma. 2 Severe symptoms of respiratory syncytial virus (RSV) infection in infancy, including hospital admissions due to bronchiolitis, have been frequently associated with an increased risk of wheeze and asthma in later childhood, 3 offering a possible pathway for early intervention. However, it remains unclear whether this association represents a true causal mechanism (i.e. early infection impairs pulmonary function, thereby directly influencing the development of asthma) or whether it is due to other shared influences, such as environmental exposures and/or genetic predisposition to respiratory ill health. [3][4][5] The research to date has not looked at this question through a social inequity lens, despite socioeconomic deprivation being a prevailing facet of both conditions. 6,7 Identifying the extent to which severe bronchiolitis (i.e. that which requires hospital admission) may mediate the pathway between socioeconomic deprivation and asthma will offer guidance on the planning and potential impact of future preventive measures, including RSV vaccines and extended half-life monoclonal antibodies, which will become available within the next 5 years. 8,9
One challenge for studies of the link between RSV infection and asthma is that 'asthma' is not one condition but a heterogeneous group of respiratory disorders with multiple aetiologies. 3 In the early years in particular, recurrent symptoms of wheeze are common and are not necessarily an indicator of chronic asthma. 10 To better investigate the heterogeneous entities (and aetiologies) that likely comprise asthma, 11 statistical models to identify typical subgroups of asthma symptoms have been applied to longitudinal cohort studies. [12][13][14] However, a 2020 systematic review and meta-analysis of childhood wheeze trajectory-specific risk factors identified high risk of information bias in all of the 13 included cohort studies. 14 Reliance on self-reports rather than clinical validation of symptoms, limited sample sizes and loss to follow-up are noted problems that lead to low power to detect associations with risk factors. Overcoming many of these previous shortcomings, Sbihi et al. used administrative health data to identify different asthma trajectories in a sample of more than 65 000 children in British Columbia. 15 Results of Sbihi et al.'s study demonstrate the unique opportunity administrative data offer to study asthma trajectories, overcoming the logistic and financial constraints and self-report biases inherent in purposefully designed longitudinal cohort studies based on questionnaires.
We aimed to estimate how prevention of severe infant bronchiolitis might reduce the socioeconomic patterning of wheeze/asthma trajectories through childhood. First, we modelled typical trajectories of asthma among children from age 2 to 9 years, inclusive. Second, having observed socioeconomic disparities in prevalence of these trajectories, we established the extent to which these disparities would remain if hospital admissions for bronchiolitis during infancy could be prevented. We used national, linked administrative data from children born in Scotland to meet our aims.

Data sources
Several Scotland-wide administrative databases, provided by Public Health Scotland (PHS), comprised the longitudinal dataset for this study. National Records of Scotland birth registrations (the spine of the cohort) were supplemented with the Scottish Birth Record, an electronic record of neonatal care, and the mother's delivery record (Scottish Morbidity Record 02; SMR02). [16][17][18] Subsequent hospital admissions of cohort members were retrieved from SMR01, a dataset of all inpatients and day cases discharged from hospitals in Scotland. 18 Information about deaths was added from National Records of Scotland death registrations. 19 Data were also extracted from the Scottish national Prescribing Information System, which contains information on medicines dispensed in a community setting. 20 PHS linked records belonging to the same child across databases using their unique Community Health Index (CHI) number. 21 The CHI database includes information on deregistrations from Scottish general practices, which was used to estimate emigration. 22 We received these datasets from PHS with a unique pseudo-anonymized identifier for each child and mother, and with patients' names, CHI numbers and full addresses removed. Child and maternal datasets were linked by PHS using deterministic and probabilistic methods described elsewhere. 23

Study population
The initial study population included all live births between 1 January 2007 and 31 June 2008 in Scotland. Follow-up began at birth and continued until the 10th birthday. We excluded children who died or emigrated out of Scotland by age 2, to ensure that all children in the cohort could have had a recording of asthma/wheeze on at least one time point. Children born to non-resident mothers (as recorded on birth registrations) were excluded, to prevent potential systematic loss to follow-up. One child from each non-singleton birth was randomly selected to be included in the study.
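The selection rules above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the field names (`child_id`, `delivery_id`, `mother_resident`, `exit_age`) are hypothetical, `exit_age` is assumed to hold the age in years at death or emigration (or `None`), "by age 2" is read as before the second birthday, and the random choice within a multiple birth is assumed to happen after the other exclusions.

```python
import random


def select_cohort(births, seed=0):
    """Apply the study-population rules to a list of birth records.

    Each record is a dict with hypothetical keys:
      child_id, delivery_id, mother_resident (bool),
      exit_age (age in years at death/emigration, or None).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Keep children of resident mothers who were still alive and in
    # Scotland at age 2 (assumed reading of "by age 2").
    eligible = [
        b for b in births
        if b["mother_resident"]
        and (b["exit_age"] is None or b["exit_age"] >= 2)
    ]

    # Group siblings from the same delivery, then keep one at random.
    by_delivery = {}
    for b in eligible:
        by_delivery.setdefault(b["delivery_id"], []).append(b)
    return [rng.choice(siblings) for siblings in by_delivery.values()]
```

Fixing the seed is a design choice for reproducibility; the source does not say how the random selection was seeded.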

Exposure, mediator and outcome measures
We used the Scottish Index of Multiple Deprivation (SIMD) 2006 to proxy individual-level socioeconomic position in this study. 24 We used SIMD version 2006 as it was closest in date to cohort members' years of birth. SIMD is a relative measure based on data zone-level (small areas of 500-1000 residents) deprivation across seven domains: income, employment, health, education, access to services, crime and housing. 24 SIMD deciles, based on residential address at birth, were retrieved from birth registration files and supplemented from Scottish Birth Record/SMR02 where missing. SIMD deciles were split into groups for analyses indicating high (top 30% rank of SIMD scores), medium (middle 40% rank) and low (bottom 30%) socioeconomic deprivation.
The mediator was defined as having ≥1 hospital admission with a primary or secondary diagnosis of bronchiolitis during the first year of life, identified in SMR01 records by the International Classification of Diseases 10th Revision (ICD-10) code J21 for acute bronchiolitis. We included all J21 acute bronchiolitis subgroups in the mediator (J21.0 due to RSV, J21.1 due to human metapneumovirus, J21.8 due to other specified organisms, J21.9 unspecified) because specific bronchiolitis diagnoses are poorly coded in Scottish hospital admission data. 25 It is estimated that almost 80% of admissions with a primary diagnosis of bronchiolitis among infants can be attributed to RSV. 26 The outcome consisted of trajectory groups of asthma/wheeze (see Supplementary Material Part 1, available as Supplementary data at IJE online for an explanation of this term), determined through the latent growth curve modelling described below. To model trajectories, we first defined instances of asthma/wheeze at specific time points.
For each year of age between 2 and 9 inclusive, children were defined as having asthma and/or wheeze if they had: ≥1 hospital admission in SMR01 with a main diagnosis, as defined by ICD-10 codes J45 for asthma, J46X for status asthmaticus or R06.2 for wheezing, and/or ≥4 dispensed prescriptions for any specified asthma medication (see Supplementary Table S1, available as Supplementary data at IJE online).
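The yearly outcome definition above can be expressed as a small function. The sketch below is illustrative rather than the study's implementation: the input shapes are hypothetical, ICD-10 codes are assumed to arrive as strings with or without a dot (so `R06.2` and `R062` are treated alike), and every entry in `dispensings` is assumed to already be one of the specified asthma medications.

```python
# ICD-10 code prefixes from the text: J45 (asthma), J46X (status
# asthmaticus, matched here on the J46 prefix) and R06.2 (wheezing).
ASTHMA_WHEEZE_PREFIXES = ("J45", "J46", "R062")


def normalise_icd10(code):
    """Upper-case the code and drop the dot, e.g. 'r06.2' -> 'R062'."""
    return code.replace(".", "").upper()


def is_asthma_wheeze_code(code):
    return normalise_icd10(code).startswith(ASTHMA_WHEEZE_PREFIXES)


def yearly_indicators(admissions, dispensings, ages=range(2, 10)):
    """Return {age: 0/1} for one child.

    admissions:  list of (age_in_years, main_diagnosis_code) tuples
    dispensings: list of age_in_years, one entry per dispensed
                 asthma prescription
    """
    indicators = {}
    for age in ages:
        admitted = any(
            a == age and is_asthma_wheeze_code(code)
            for a, code in admissions
        )
        n_dispensed = sum(1 for a in dispensings if a == age)
        # Positive if >=1 qualifying admission and/or >=4 dispensings.
        indicators[age] = int(admitted or n_dispensed >= 4)
    return indicators
```

For example, a child with one J45 admission at age 3 and four dispensings at age 4 would be positive at ages 3 and 4 only.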

Covariates
Identification of the disparity measures described below invokes the assumption of no unmeasured confounding of the mediator-outcome relationship. We used an evidence synthesis approach, as proposed by Ferguson et al., 27 to construct our directed acyclic graph (DAG). See Supplementary Material Part 2 (available as Supplementary data at IJE online) for implementation of this approach. A simplified version of the derived DAG is shown in Figure 1 where we list the identified confounding variables, highlighting those not available in our study. Table 1 and Supplementary Material Part 3 (available as Supplementary data at IJE online) outline the included confounding variables.
We also included additional variables related to missingness in confounders, to be used in the imputation models outlined below: National Statistics Socio-Economic Classification (an occupation-based measure of social class), child's postnatal intensive care stay, preterm-related complications and an indicator for birth hospitals with high levels of missing values (full description in Supplementary Material Part 3).

Statistical analysis
First, we described cohort characteristics and missing data. The association between variables and the probability of missingness in at least one study variable was explored using multivariable logistic regression. There was a relatively high level of missing data across the cohort (10.3% of children had ≥1 missing value) and, prominently, we found that missingness was associated with observed data. Therefore, we imputed the missing values assuming missingness was at random, conditional on the covariates listed above (as well as the confounders, exposure, mediator and outcome). 28
We used latent class growth analysis (LCGA), a statistical approach to detecting classes of individuals with a similar developmental trajectory, to model asthma/wheeze groups. 29 Trajectory groups are not necessarily true phenotypic groups, but represent descriptions of the variation in individual trajectories, where the prevalence of each group is based on estimated posterior probabilities. 30 The dichotomous outcomes of asthma/wheeze presence between ages 2 and 9 years were modelled using a mixture of logistic distributions, with age as the only explanatory variable. Based on previous research, 14 we fitted LCGA models with three groups initially and added more groups in a stepwise manner to select the best fitting model (using criteria described in Supplementary Material Part 4, available as Supplementary data at IJE online). Due to the low frequencies of some of the identified asthma/wheeze groups, risk estimates and comparative disparity measures were calculated after recoding these groups into 'chronic asthma' versus 'no/non-chronic asthma' (further details in Results section). This classification is justified by the clinical difference between the non-chronic and chronic trajectories, whereby the former are more likely linked to early wheeze than to asthma. 
10 Disparities in chronic asthma by SIMD group (per 1000 children) were defined as marginal risk differences and estimated by inverse probability weighting of a binomial regression model, with weights expressed as a function of year of birth, maternal country of birth and area of residence. To capture the proportion of socioeconomic disparity in asthma that would remain if hospitalization for bronchiolitis were prevented, we used counterfactual disparity measures (CDM). 31 By setting the mediator to a predefined value (m = 0 denoting absence of hospitalization for bronchiolitis), CDM captures the disparity in the outcome risk due to the exposure that would remain if the mediator were intervened upon and set to m, without intervening on the exposure. The formal definition and key assumptions to this approach are [...] We used inverse probability weighting of marginal structural models to estimate CDM(m = 0), 32 with weights expressed as a function of the confounders detailed in Table 1. Results are displayed as the remaining risk differences in chronic asthma between high and medium SIMD versus low SIMD (per 1000 children) if no child had a hospital admission for bronchiolitis during infancy. Using the estimated marginal risk differences in chronic asthma, we also derived the proportion of disparity reduction attributable to elimination of bronchiolitis admissions as: disparity reduction (%) = (risk difference − CDM(m = 0)) / risk difference × 100. To calculate confidence intervals for the estimates of CDM(m = 0), risk difference and proportion of disparity eliminated, while addressing the presence of missing data in our dataset, we used single stochastic imputation using chained equations with 10 burn-in iterations as outlined by Micali et al. 33,34 Imputation was directly followed by the CDM analysis, and both processes were repeated on 1000 bootstrap samples to estimate standard errors. 
To meet the missing at random assumption, we conditioned on all covariates from the substantive model, alongside the additional variables listed above (further details in Supplementary Material Part 5).
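To make the weighting idea concrete, the following Python sketch estimates a CDM(m = 0) on toy data and converts it to a percentage disparity reduction with the formula given above. It is a simplified illustration under stated assumptions, not the study's Stata analysis: there is a single discrete confounder `c`, the weights 1 / P(M = 0 | X, C) are estimated non-parametrically by stratum rather than from a regression model, the risk difference is crude rather than weighted for baseline variables, and imputation and bootstrapping are omitted. The record keys (`x`, `c`, `m`, `y`) and the toy counts are invented for the example.

```python
from collections import defaultdict


def risk_difference(records, exposed, reference):
    """Crude risk difference in y between two exposure groups."""
    def risk(group):
        ys = [r["y"] for r in records if r["x"] == group]
        return sum(ys) / len(ys)
    return risk(exposed) - risk(reference)


def cdm_m0(records, exposed, reference):
    """Inverse-probability-weighted risk difference with mediator set to 0."""
    # Step 1: estimate P(M = 0 | X, C) within each (x, c) stratum.
    n_all = defaultdict(int)
    n_m0 = defaultdict(int)
    for r in records:
        key = (r["x"], r["c"])
        n_all[key] += 1
        if r["m"] == 0:
            n_m0[key] += 1

    # Step 2: among records with M = 0, weight by 1 / P(M = 0 | X, C)
    # and take the weighted mean of y per exposure group.
    num = defaultdict(float)
    den = defaultdict(float)
    for r in records:
        if r["m"] != 0:
            continue
        key = (r["x"], r["c"])
        weight = n_all[key] / n_m0[key]
        num[r["x"]] += weight * r["y"]
        den[r["x"]] += weight
    return num[exposed] / den[exposed] - num[reference] / den[reference]


def disparity_reduction_pct(rd, cdm):
    """(risk difference - CDM(m = 0)) / risk difference * 100."""
    if rd == 0:
        raise ValueError("risk difference is zero; reduction is undefined")
    return (rd - cdm) / rd * 100.0


def expand(x, c, m, n, cases):
    """Build n toy records, the first `cases` of which have y = 1."""
    return [{"x": x, "c": c, "m": m, "y": int(i < cases)} for i in range(n)]


# Invented toy counts (not study data).
toy = (
    expand("high", 0, 0, 40, 4) + expand("high", 0, 1, 10, 3)
    + expand("high", 1, 0, 30, 6) + expand("high", 1, 1, 20, 8)
    + expand("low", 0, 0, 45, 3) + expand("low", 0, 1, 5, 1)
    + expand("low", 1, 0, 40, 4) + expand("low", 1, 1, 10, 3)
)
rd = risk_difference(toy, "high", "low")
cdm = cdm_m0(toy, "high", "low")
print(round(rd, 4), round(cdm, 4), round(disparity_reduction_pct(rd, cdm), 1))
```

In practice the weights would come from a fitted model for the mediator given exposure and the full confounder set, and the whole procedure would sit inside the imputation and bootstrap loop described above.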
We used Mplus version 8.2 to implement LCGA and Stata 15 for descriptive data analysis and mediation analysis. 35,36 The code for LCGA and mediation analyses can be found at [https://github.com/UCL-CHIG/SEPbronchiolitis-asthma-study]. DAGitty was used to determine minimally sufficient adjustment sets for the identification of CDMs. 37
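For readers unfamiliar with the structure of the latent class growth model described in this section, the Python sketch below shows how posterior class membership probabilities follow from class prevalences and class-specific logistic trajectories. It does not fit a model and is not a substitute for the Mplus code linked above: the coefficients and prevalences are hypothetical inputs, a cubic in age is assumed as one possible polynomial form, and centring time at age 2 is an illustrative choice.

```python
import math

AGES = list(range(2, 10))  # yearly indicators at ages 2 to 9 inclusive


def class_probability(coefs, age):
    """P(asthma/wheeze at `age`) in one class: logistic of a cubic in time."""
    b0, b1, b2, b3 = coefs
    t = age - AGES[0]  # time since age 2 (illustrative centring)
    eta = b0 + b1 * t + b2 * t ** 2 + b3 * t ** 3
    return 1.0 / (1.0 + math.exp(-eta))


def posterior_class_probabilities(indicators, classes):
    """Posterior P(class | observed 0/1 sequence) for one child.

    indicators: list of 0/1 values, one per age in AGES
    classes:    {name: (prevalence, (b0, b1, b2, b3))}, hypothetical values
    """
    if len(indicators) != len(AGES):
        raise ValueError("expected one indicator per year of age")

    # Log of prevalence times the Bernoulli likelihood of the sequence.
    log_joint = {}
    for name, (prevalence, coefs) in classes.items():
        total = math.log(prevalence)
        for age, y in zip(AGES, indicators):
            p = class_probability(coefs, age)
            total += math.log(p if y == 1 else 1.0 - p)
        log_joint[name] = total

    # Normalise on the log scale for numerical stability.
    peak = max(log_joint.values())
    norm = sum(math.exp(v - peak) for v in log_joint.values())
    return {name: math.exp(v - peak) / norm for name, v in log_joint.items()}
```

A child would typically be assigned to the class with the highest posterior probability, while class prevalences are based on the posterior probabilities across the cohort.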

Cohort characteristics
Of the 83 853 children in the cohort after exclusions were applied (Figure 2, Table 2), 48.9% were female and 40.3% were resident in Edinburgh or Glasgow at birth. Of these, 5140 infants were born before 37 weeks' gestation and 7442 were classified as small for gestational age (6.5% and 9.4% of those with available data, respectively). A total of 20 095 infants (25.3% of those with a delivery method recorded) were born via caesarean section, and 16 600 mothers reported smoking traditional cigarettes during their pregnancy (21.9% of those with available data). Further, 3.2% of children (2675) were admitted to hospital at least once with a diagnosis of bronchiolitis in the first year of life and 9.3% (7819) met the study definition for asthma/wheeze at least once during follow-up. The mother's delivery record was linked for 79 500 (94.8%) children, and the mean follow-up time from birth per child was 8.84 years (standard deviation 0.92). Of the cohort children, 2325 (2.8%) either emigrated or died by their ninth birthday (Supplementary Table S4, available as Supplementary data at IJE online).
Of cohort members, 8632 (10.3%) had missing data for at least one variable. Missingness was driven by infants without linked maternal records (4353, 5.2% of the cohort) and by smoking status, which was missing for an additional 3807 of the infants who were linked to mother's records (9.7% missing smoking status in total). As shown in Table 2, missingness in any variable was more common in children with: postal areas in South Scotland, particularly Glasgow; earlier birth years; births between March and May; mothers born outside the UK; and intensive care stay after birth.

Growth modelling
The LCGA model that identified four groups, with a cubic polynomial function for the association with age, was selected as the most suitable overall according to fit indices (see Supplementary Table S5, available as Supplementary data at IJE online) [...] Figure 3. The groups can be summarized as: no asthma/wheeze, 94.0% of the cohort had none or (for a minority of the group) one instance of asthma/wheeze from ages 2 to 9 years; early-transient, 2.2% of the cohort had asthma/wheeze that had begun by age 2, had a peak prevalence at age 3 and dissipated by age 7; intermediate-onset, 2.0% of the cohort had asthma/wheeze that had begun by age 4 and had a peak prevalence at age 8, a year before follow-up ended; and early-persistent, 1.8% of the cohort had asthma/wheeze that had begun by age 2, had a peak prevalence at age 6, with ongoing symptoms at the end of follow-up. The distributions of all relevant variables by asthma/wheeze trajectories (Table 3) [...]

Disparity analysis
We reclassified cohort members assigned to the early-persistent or intermediate-onset asthma/wheeze groups as 'chronic asthma' (3.8% of the cohort), and those assigned to no or early-transient asthma/wheeze as 'non-chronic asthma' (96.2%). The chronic asthma group experienced greater frequencies of high socioeconomic deprivation (41.9% vs 35.7%) and hospital admissions for bronchiolitis (7.2% vs 3.0%).
As shown in Table 4, the associational marginal risk of chronic asthma by age 9 is estimated as 30.8 per 1000 children (95% CI 28.

Discussion
Using administrative data, this study was able to follow up more than 80 000 children until their 10th birthday to identify asthma/wheeze trajectories with minimal selection bias. We identified four different trajectories: early-transient, early-persistent, intermediate-onset and no asthma/wheeze.

Figure 3 Estimated probabilities (and 95% confidence intervals) of experiencing asthma/wheeze at each time point for latent trajectory groups, derived from the posterior probability distributions of asthma/wheeze prevalence, estimated for the four-class latent class growth analysis model with cubic growth for age. The 'no asthma/wheeze' group has a consistent near 0 probability of reporting asthma/wheeze

We used a range of clinical definitions to describe asthma/wheeze in our study with input from a clinician, in contrast to self- or parent-reported measures used elsewhere. 14 This approach, paired with a cautious methodology to define symptoms, means there are likely to be fewer false-positives captured in this study. 38 On the other hand, some children (with milder symptoms) will likely have been misclassified as not having had either bronchiolitis or asthma/wheeze in this study. Presentation and admission to hospital can be influenced by factors other than the severity of illness, including the availability of primary care, affordability of transport and child care, clinical decision making and (for paediatric patients) parental expectations. [39][40][41] The social patterning of these influences may have introduced bias into the relationship between socioeconomic deprivation and bronchiolitis, although the direction of the bias needs to be determined. The addition of primary care and emergency department records in future administrative data studies may improve identification of all relevant cases of bronchiolitis and asthma/wheeze. Other studies have included measures of atopy in modelling of asthma trajectories. 
12,42 However, overlap between medication used to treat asthma and atopic conditions meant that the current dataset would not have allowed for this nuanced differentiation. In addition, although we used predetermined criteria to determine trajectories, these methods still have a degree of subjectivity. 12 Although the class sizes were proportionally smaller than those found in other studies, there are similarities in the shape and relative size of asthma/wheeze trajectory groups modelled using other datasets. 13,15 We had to take several pragmatic steps in this study, based on constraints of the methods and datasets used. We used an area-level measure of socioeconomic deprivation as a proxy for an individual-level indicator, potentially leading to an underestimation of the true individual-level socioeconomic effects. 43 We encountered missing data, which was likely due to variation in recording practices by hospital, over time and by clinical need, but we dealt with this bias using imputation. We restricted the sample to children who were alive at the start of asthma/wheeze measurement, meaning that the findings are contingent on survival until the age of 2. This may have had the effect of biasing the CDM, most likely by underestimating it since the risk of mortality is higher among children with underlying illnesses and from the most socioeconomically deprived backgrounds. 44 In addition, we were unable to separately examine the four asthma/wheeze trajectory groups in the CDM analysis because we encountered problems of stability of the estimates due to the small numbers in some subgroups. In the future, cohort studies from linked Scottish administrative data with births spanning several years will enable longer follow-up periods and a greater sample size for more nuanced analyses.
Two systematic reviews of the evidence published in 2020 commented on the high risk of confounding bias in observational studies looking at the relationship between RSV and subsequent chronic wheezing. 5, 45 We have used causal inference methods for a very targeted estimand, the CDM, with explicit discussions of the assumptions invoked to obtain estimates using our observational study. The estimation of CDM calls for a hypothetical intervention that reduces hospitalization due to bronchiolitis for all children to levels among those with low socioeconomic deprivation. This could be achieved by greater prevention efforts directed at children in poorer areas, for example ensuring high uptake of future RSV vaccines or monoclonal antibodies. There were several confounders of the mediator-outcome association identified in previous research which could not be included in this study. However, this uncontrolled confounding effect may be partially captured by the other variables that were included in the study. For example, gestational diabetes, pre-eclampsia and breech presentation birth are thought to, at least partially, influence offspring respiratory outcomes through delivery method and preterm birth. 46,47 In addition, the extent of the residual confounding induced by these factors may have been moderated by controlling for socioeconomic position.

Table 4 footnotes:
a Risk and absolute risk difference after accounting for differences in year of birth, maternal country of birth and area of residence between groups using inverse probability weighting.
b 95% CIs calculated using the bootstrap with 1000 replications.
c Disparity reduction (%) was estimated as (risk difference − CDM(m = 0)) / risk difference × 100.

Conclusion
This is the first time the pathways between socioeconomic position, bronchiolitis admissions and asthma have been explored using counterfactual methods. We estimate that about 20% of the disparity between socioeconomic groups may be attributable to an admission to hospital with bronchiolitis during infancy. Our work also highlights that at least 80% of the association between socioeconomic position and chronic asthma cannot be explained by hospital admission for bronchiolitis in infancy. This underscores the need to further investigate the causes of inequities in bronchiolitis admissions and wheeze/asthma. In the future, studies using administrative data could be enhanced by linkage to other datasets, for example the Department for Environment, Food and Rural Affairs' modelled background pollution data or prospectively designed cohort studies, 48,49 which will enable more risk factors to be measured and further reduce bias in the estimates. This paper offers a guide to implementing causal inference methods to carry out further counterfactual disparities analysis using these complex, linked health datasets.

Ethics approval
The use of linked administrative health data from Scotland was approved by the Public Benefit and Privacy Panel, reference 1617-0224, and the South East Scotland Research Ethics Committee 02, reference 18/SS/0117.

Data availability
This work uses data provided by patients and collected by the National Health Service (NHS) as part of their care and support. Authors do not have permission to share patient-level data. Data are available from Public Health Scotland [phs.edris@phs.scot] for researchers who meet the criteria for access to confidential data.