Defining vulnerability subgroups among pregnant women using pre-pregnancy information: a latent class analysis

Abstract

Background: Early detection of vulnerability during or before pregnancy can contribute to optimizing the first 1000 days, a crucial period for children’s development and health. We aimed to identify classes of vulnerability among pregnant women in the Netherlands using pre-pregnancy data on a wide range of social risk and protective factors, and validate these classes against the risk of adverse outcomes.

Methods: We conducted a latent class analysis based on 42 variables derived from nationwide observational data sources and self-reported data. Variables included individual, socioeconomic, lifestyle, psychosocial and household characteristics, self-reported health, healthcare utilization, life-events and living conditions. We compared classes in relation to adverse outcomes using logistic regression analyses.

Results: In the study population of 4172 women, we identified five latent classes. The largest ‘healthy and socioeconomically stable’-class [n = 2040 (48.9%)] mostly shared protective factors, such as paid work and positively perceived health. The classes ‘high care utilization’ [n = 485 (11.6%)], ‘socioeconomic vulnerability’ [n = 395 (9.5%)] and ‘psychosocial vulnerability’ [n = 1005 (24.0%)] were characterized by risk factors limited to one specific domain and protective factors in others. Women classified into the ‘multidimensional vulnerability’-class [n = 250 (6.0%)] shared multiple risk factors in different domains (psychosocial, medical and socioeconomic risk factors). Multidimensional vulnerability was associated with adverse outcomes, such as premature birth and caesarean section.

Conclusions: Co-existence of multiple risk factors in various domains is associated with adverse outcomes for mother and child. Early detection of vulnerability and strategies to improve parental health and well-being might benefit from focussing on different domains and combining medical and social care and support.


Introduction
The first 1000 days of life, from preconception to the child's second birthday, are crucial to children's further physical, mental and social development. This critical and sensitive period is an important determinant of health and well-being in adulthood, as supported by the well-evidenced Developmental Origins of Health and Disease (DOHaD) concept. 1,2 The DOHaD concept explains how experiences and exposures during early life, such as stress and nutrition, influence susceptibility to disease in later life and across generations, arguably through epigenetic mechanisms of foetal programming. 1,2 Because of this intergenerational aspect, parents are the central focus to improve child health and advance health equity. 3 To indicate subgroups of parents and their unborn or newborn children who are at higher risk of poor health or have lower access to healthcare, the concept of vulnerability is often used. 4-6 Vulnerability reflects a complex and dynamic process. Simplified, various stressors at individual or contextual level (e.g. unemployment or living in a deprived neighbourhood) can act as risk factors to vulnerability, while protective factors (e.g. a stable social network) might reduce or prevent vulnerability. 4,5,7,8 Whether the presence of risk factors increases vulnerability and thereby hinders achieving one's optimal health potential depends on the balance and interaction between risk and protective factors. 4,8 While research on perinatal health has traditionally focussed on risk factors of a medical nature, there is now indisputable evidence for direct and indirect influences of social factors as well. 9-14 The social, economic, cultural and environmental living conditions (i.e. social determinants of health) that shape parents' and children's daily experiences and thereby influence their health and development, are embedded in larger systems and structures, such as policies and laws. 3,15
There is a growing international professional and political focus on early detection of vulnerability during the first 1000 days and on the development of effective strategies to improve parental health and well-being. 3,16 For instance, in the Netherlands the government launched a nationwide 'Solid Start'-programme in 2018 with the aim of providing each child the best start in life by strengthening collaboration between medical and social services, with a specific focus on families in vulnerable situations. 16 Detecting vulnerability during pregnancy with the preventive purpose of countering suboptimal child health is challenging and can benefit from in-depth knowledge of vulnerability.
However, little is currently known about how different risk and protective factors to vulnerability combine and how such combinations influence health outcomes. There seem to be few studies that consider protective factors to vulnerability, and there is limited insight into clustering and underlying interactions, although it is recognized that especially the co-existence of risk factors can lead to adverse birth outcomes. 11,17,18 Previous studies frequently explored the association between a limited number of predetermined, single risk factors and adverse birth outcomes, but neglected the co-existence of both protective and risk factors that can influence outcomes. 12,18,19 The aim of this study was to identify classes of vulnerability among pregnant women based on a wide range of social risk and protective factors in a latent class analysis (LCA). We conducted the LCA using Dutch observational nationwide data sources and self-reported data collected prior to pregnancy. In addition, we validated these classes by studying the association between latent class membership and various maternal and perinatal health outcomes and care utilization.

Methods

Data sources
This study used data from the nationwide population-based data infrastructure DIAPER (acronym for Data-InfrAstructure for ParEnts and ChildRen). DIAPER integrates routinely collected observational data from three Dutch nationwide data sources (Perined, Vektis and Statistics Netherlands) at the individual level. The Dutch Perinatal Registry 'Perined' collects routine care data on pregnancy after 22 weeks of gestation, birth and the first 28 days after birth, as supplied by midwives, gynaecologists and paediatricians. 20 Healthcare information centre 'Vektis' collects claims data under the Dutch Healthcare Insurance Act and provides data on healthcare utilization and spending. 21 'Statistics Netherlands' collects and publishes data on societal matters and provides access to data through their System of Social Statistical Datasets (SSD). 22,23 These linkable SSD data cover nearly 20 themes, including health, welfare, income, education and labour.
We enriched DIAPER with self-reported data on health, wellbeing and lifestyle of the Public Health Monitor 2016 (PHM-2016). 24 This is a health survey among a varying sample of the Dutch population aged 19 years and older, carried out every 4 years by the Community Health Services, Statistics Netherlands and the National Institute for Public Health and the Environment. The PHM-2016 had 457 153 participants and was mainly conducted from September-December 2016. Supplementary appendix S1 provides more information about the data sources.

Study population
To ensure that information was not influenced by pregnancy itself, women were eligible for inclusion if they met the following criteria: (i) they participated in the PHM-2016 (pre-pregnancy), (ii) they gave birth (livebirth or stillbirth) or had a termination of pregnancy before 1 January 2019 and (iii) pregnancy data in 2017 or 2018 were recorded within Perined. If women had multiple pregnancies or births during the study period, only data on the first observation were included, to avoid duplication of women's characteristics.
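The selection logic above can be sketched in a few lines of Python. This is only an illustration of the three criteria and the "first observation" rule: the record layout and every field name are hypothetical and do not correspond to actual DIAPER, Perined or PHM-2016 variables.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PregnancyRecord:
    # All field names are hypothetical, chosen for illustration only.
    woman_id: str
    outcome_date: date            # date of birth or termination of pregnancy
    in_phm_2016: bool             # participated in the PHM-2016 survey
    perined_year: Optional[int]   # year of the Perined pregnancy record, if any


def select_study_population(records):
    """Apply criteria (i)-(iii), then keep each woman's first observation."""
    cutoff = date(2019, 1, 1)
    eligible = [
        r for r in records
        if r.in_phm_2016                      # criterion (i)
        and r.outcome_date < cutoff           # criterion (ii)
        and r.perined_year in (2017, 2018)    # criterion (iii)
    ]
    first_by_woman = {}
    for r in sorted(eligible, key=lambda rec: rec.outcome_date):
        # setdefault keeps the earliest record and ignores later ones
        first_by_woman.setdefault(r.woman_id, r)
    return list(first_by_woman.values())
```

Sorting by outcome date before deduplicating is one possible reading of "first observation"; the study may have ordered records differently.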

Variables
The selection of variables for the LCA started with compiling a list of all possible risk and protective factors to vulnerability based on the framework of the National Academies of Sciences and Medicine, 3 other scientific studies and definitions of vulnerability 4,5,8 and the expertise of the research team. Of the factors on this list, 42 variables were available in our data sources and were selected. These were divided into nine themes: individual characteristics, socioeconomic characteristics, lifestyle factors, household characteristics, self-reported health, healthcare expenditures and utilization, psychosocial characteristics, life-events and living conditions. The timing of the PHM-2016 was decisive in the choice of 1 October 2016 as the baseline date for including information. If data were available only on a yearly basis, we included data from 2016. To increase interpretability, variables were categorized into two or three categories, with the first category representing the risk factor to vulnerability. Supplementary appendix S1 provides a detailed overview of the variables, including definitions, categories and sources.

Outcomes
We studied the association between latent class membership and perinatal and maternal health outcomes and care utilization to validate classes. Perinatal health outcomes comprised: preterm birth (<37 weeks), small for gestational age (SGA, <10th percentile corrected for gestational age and foetal sex), preterm birth and/or SGA and admission to a neonatal intensive-care unit (NICU) after birth. Maternal health outcomes comprised: primary and secondary caesarean section, pre-eclampsia/hypertension and postpartum haemorrhage (1000 ml). Outcomes regarding healthcare utilization included: not having the first antenatal care appointment (i.e. booking visit) before the 10th week of pregnancy and not receiving postpartum care (at home) after birth. Supplementary appendix S1 provides more information.

Latent class analysis
LCA is a data-driven analysis technique that aims to structure heterogeneity in a population by classifying individuals into unobserved (or latent) homogeneous classes. 25 Structuring is based on included variables. Each class is denoted by conditional probabilities for each variable to take on a certain response value (e.g. 1 or 0), with the objective to categorize individuals into the smallest possible set of distinct and interpretable latent classes.
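For readers who prefer a formula, the standard latent class model can be written as below. The symbols are illustrative shorthand and are not taken from the study: \pi_c is the proportion of class c, \rho_{jk|c} is the conditional probability that variable j takes category k in class c, and y_{ij} is the observed category of woman i on variable j (here J = 42). The formulation assumes, as LCA conventionally does, that variables are independent within a class.

```latex
P(\mathbf{y}_i) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \prod_{k=1}^{K_j}
  \rho_{jk \mid c}^{\,\mathbb{1}[y_{ij} = k]},
\qquad
P(c \mid \mathbf{y}_i) \;=\;
  \frac{\pi_c \prod_{j=1}^{J} \prod_{k=1}^{K_j} \rho_{jk \mid c}^{\,\mathbb{1}[y_{ij} = k]}}
       {P(\mathbf{y}_i)}
```

The second expression is the posterior probability of class membership, which follows from the first by Bayes' rule.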
Using R version 3.6.2 (package poLCA), we estimated latent class models using all 42 variables with no prior assumptions about the optimal number of classes. 26 Missing data were imputed through Multiple Imputation using Chained Equations (MICE) (Supplementary appendix S2). We started with a one-class model and increased stepwise to a 15-class model. Parameters of the latent class models were estimated by maximum likelihood. We considered statistical fit as well as parsimony and interpretability to select the optimal model. 25 To compare the competing models' relative fit, we used the Akaike Information Criterion (AIC) 27 and the sample-size adjusted Bayesian Information Criterion (aBIC). 28 Lower values indicate better fit of the model to the data. We also considered the fit indices' relative decrease, as done in previous studies, 29 because a continuous decrease in the AIC is common with large sample sizes and the aBIC may also point towards a model with more classes than is useful. 30 We additionally reviewed the models' entropy, which reflects how clearly the classes can be distinguished, with scores ranging from 0 to 1 (optimum). 31 We selected three preferred models based on their fit statistics and compared their item-response probabilities. The final model was selected based on parsimony and interpretability, and women were classified into one of the identified classes based on predicted class membership (largest posterior probability). Further, to evaluate the LCA's robustness, we performed two additional analyses. First, to unravel the impact of previous pregnancies, we excluded nullipara and conducted an LCA that additionally included previous perinatal and pregnancy outcomes. Second, to evaluate whether similar vulnerability classes can be distinguished among women across the entire reproductive age range, we repeated the LCA in a different study population consisting of all women between 19 and 44 years old.
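The fit criteria and the classification rule described above can be sketched in Python as follows. This is not the study's R code: it assumes the usual definitions (the aBIC with the sample-size term ln((n + 2) / 24), and a relative entropy scaled to the 0-1 range), and all function names are hypothetical.

```python
import math


def n_free_parameters(n_classes, levels_per_variable):
    """(C - 1) class sizes plus C * sum(levels - 1) response probabilities."""
    per_class = sum(levels - 1 for levels in levels_per_variable)
    return (n_classes - 1) + n_classes * per_class


def aic(log_lik, n_params):
    return -2.0 * log_lik + 2.0 * n_params


def abic(log_lik, n_params, n_obs):
    """Sample-size adjusted BIC: ln(n) is replaced by ln((n + 2) / 24)."""
    return -2.0 * log_lik + n_params * math.log((n_obs + 2) / 24.0)


def relative_entropy(posteriors):
    """1 = perfectly separated classes, 0 = no separation (needs >= 2 classes)."""
    n_obs = len(posteriors)
    n_classes = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
        total -= sum(p * math.log(p) for p in row if p > 0.0)
    return 1.0 - total / (n_obs * math.log(n_classes))


def modal_class(posterior_row):
    """Index (0-based) of the class with the largest posterior probability."""
    return max(range(len(posterior_row)), key=lambda c: posterior_row[c])


def relative_decrease(index_values):
    """Fractional drop of a fit index between successive models."""
    return [
        (prev - curr) / prev
        for prev, curr in zip(index_values, index_values[1:])
    ]
```

Applied to a sequence of AIC or aBIC values for models with an increasing number of classes, `relative_decrease` gives the quantities one would inspect for an elbow; `modal_class` mirrors the assignment by largest posterior probability.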

Regression analysis
We studied the association between class membership and adverse outcomes by means of unadjusted logistic regression analysis. Results are reported as odds ratios (ORs) with 95% confidence interval (CI).
A p-value of <0.05 was considered statistically significant.
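As a minimal illustration of what such an unadjusted analysis yields: for a single categorical predictor, the logistic-regression OR for one class versus the reference class equals the cross-product ratio of the corresponding 2x2 table, and a Wald CI follows from the standard error of the log OR. The sketch below uses made-up counts and a hypothetical function name; it is not the study's code and the numbers are not study results.

```python
import math


def odds_ratio_wald(events_class, nonevents_class, events_ref, nonevents_ref,
                    z_crit=1.96):
    """Unadjusted OR for one class versus the reference class, with a Wald CI."""
    odds_ratio = (events_class * nonevents_ref) / (nonevents_class * events_ref)
    se_log_or = math.sqrt(
        1 / events_class + 1 / nonevents_class
        + 1 / events_ref + 1 / nonevents_ref
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z_crit * se_log_or)
    upper = math.exp(log_or + z_crit * se_log_or)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(log_or / se_log_or) / math.sqrt(2.0))
    return odds_ratio, lower, upper, p_value


# Made-up counts for illustration only; they are not results from the study.
or_, lo, hi, p = odds_ratio_wald(20, 80, 10, 90)
```

With these illustrative counts the interval includes 1 and the p-value exceeds 0.05, so the two criteria agree, as they should for a Wald test at the 5% level.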

Results
The study population consisted of 4172 women, of whom 1129 had missing data (table 1). A five-class model was considered best (see Supplementary appendix S3 for fit indices). The aBIC reached its minimum at the 12-class model, but the relative fit showed no considerable improvement beyond seven classes (elbow shape). The AIC decreased continuously, as expected. Entropy values were best for models with two to five classes. We compared the interpretation of models with four, five and six classes and chose the five-class model because its classes were the most interpretable and distinct.
The five-class model divided the study population into one class characterized by vulnerability in various domains, three classes characterized by vulnerability predominantly in one specific domain and one class with mainly protective factors (see table 2 for all class proportions and characteristics). Figure 1 provides a visual representation.
Class 1 (n = 250; 6.0%) was characterized by high proportions of almost all risk factors to vulnerability. Women in this class were likely to receive social benefits or to have no income (proportion of 0.62) and to live in a rented house (0.65). Related to health, Class 1 was characterized by high GP healthcare expenditures (0.67), long-term illness (0.68) and negative perceptions of health (0.70). These women had a high probability of feeling lonely (0.87) and a moderate to high risk of depression or anxiety (0.87). Considering the vulnerabilities in different areas (including psychosocial, medical and socioeconomic risk factors), Class 1 was named 'multidimensional vulnerability'.
Class 2 (n = 485; 11.6%) was characterized by high healthcare expenditures. All women classified in this class had total healthcare expenditures in the highest quintile. Also, they frequently experienced high hospital care expenditures (0.69). Simultaneously, women in this class were likely to have protective factors including a healthy BMI (0.68), positive perception of health (0.87), high educational level (0.65), paid work (0.96), low probability of feeling lonely (0.78) and an owner-occupied house (0.90). Based on the dominant features, Class 2 was named 'high care utilization'.
Class 3 (n = 395; 9.5%) was characterized in particular by high proportions of socioeconomic risk factors. Women in this class were likely to receive social benefits or have no income prior to pregnancy (0.87). They frequently lived in a rented house (0.58), had a non-Dutch background (0.56) and a low (0.30) or moderate (0.39) educational level. The probability of living in a neighbourhood with a low liveability score was highest in this class (0.22). When considering protective factors, these women were often married (0.70), had a positive perception of health (0.90) and low healthcare expenditures (0.83). Class 3 was named 'socioeconomic vulnerability'.
Class 4 (n = 1005; 24.0%) was characterized by psychosocial health issues. The majority had a moderate to high risk of depression or anxiety disorders prior to pregnancy (0.71). These women were likely to feel lonely (0.57) and nullipara were overrepresented (0.55). Regarding protective factors, the majority had a full-time contract (0.69), an owner-occupied house (0.64) and no high healthcare expenditures (0.95). Class 4 was named 'psychosocial vulnerability'.
Class 5 (n = 2040; 48.9%) was characterized by women with low probabilities of all risk factors to vulnerability before pregnancy. Instead, in general, these women had a positively perceived health (1.00), did not feel lonely (0.86), had a high educational level (0.70) and paid work (0.98). Women in Class 5 had the highest probability to experience high control over life (0.37). Class 5 was named 'healthy and socioeconomically stable'.
The analyses in the two additional study populations (women who had given birth before and all women aged 19-44 years) showed similar results. The five-class model was preferred and classes could be interpreted similarly.
Figure 2 shows associations between classes and adverse outcomes. Class 5 (healthy and socioeconomically stable) was the reference category. Women classified in Class 1 (multidimensional vulnerability) were more likely to have babies who were born prematurely, SGA or admitted to a NICU. These women were also more likely to have a caesarean section. No significant associations were found for other maternal health outcomes, including hypertension/pre-eclampsia and postpartum haemorrhage. Compared to Class 5 (healthy and socioeconomically stable), all other classes except Class 4 (psychosocial vulnerability) were more likely to not receive postpartum care (at home) and to not receive antenatal care on time. Adverse outcomes were quite similar in Class 3 (socioeconomic vulnerability) and Class 5 (healthy and socioeconomically stable), except for the odds of planned caesarean section. Supplementary appendix S4 shows prevalences of outcomes for each class.

Discussion
This study aimed to identify classes of vulnerability among pregnant women and to validate these classes by studying the association with adverse perinatal and maternal health outcomes and care utilization. The LCA procedure identified five classes with different combinations of risk and protective factors to vulnerability. Most women were classified into the 'healthy and socioeconomically stable'-class with mainly protective factors. Women classified in the classes 'high care utilization', 'socioeconomic vulnerability' or 'psychosocial vulnerability' shared risk factors to vulnerability in one specific domain and protective factors in others. Women classified into the 'multidimensional vulnerability'-class shared multiple risk factors in several domains (e.g. psychosocial, medical and socioeconomic) and were more likely to develop poor health outcomes, such as premature birth, SGA, caesarean section and NICU admission.
Our study showed that multidimensional vulnerability was associated with worse outcomes than vulnerability in a single domain or no vulnerability. This indicates the importance of co-existence or clustering of multiple risk factors (such as no income, high healthcare expenditures and feelings of loneliness) in increasing the probability of adverse outcomes for mother and child. Our findings strengthen results from previous studies that aimed to explain differences in adverse outcomes by interrelated individual or contextual risk factors. 10,11,17 Previous LCA studies also led to classes of pregnant women with different health behaviours, psychosocial or socioeconomic characteristics that show differences in outcomes, although these studies included fewer factors and domains, and other populations, than our study. 17,32,33 The findings do not inform us on how risk factors interplay and lead to adverse health outcomes. The syndemic model provides a perspective on this interplay by describing how co-occurring health adversities are fuelled by different social and contextual factors that interact and increase the health burden of both mental and physical illness. 34 This suggests the need to combine social and medical care and support, instead of focussing on the separate domains, to combat multidimensional vulnerability.
Table notes: a: Following guidelines of Statistics Netherlands, the data of some variables were rounded (parity) or not shown (having been detained) to prevent disclosure of information about individuals. b: Detailed definitions of variables and categories are provided in Supplementary appendix S1. Missing data are shown in italic.
We found that women with socioeconomic vulnerability generally did not experience worse outcomes. This finding is not in line with previous research indicating that adverse perinatal health outcomes are more prevalent among women with a low socioeconomic status (SES). 9,10,14 Previous studies often focussed on a limited number of risk factors or domains, or used more traditional (regression) techniques to study the relation between SES and outcomes. However, as the impact of risk factors can depend on other factors, it is important to move away from the traditional assumption of independent, 'ceteris paribus' linear effects of social determinants. Therefore, we used LCA as an analytical approach that considers the combination of both risk and protective factors, allowing a more comprehensive study of vulnerability. Protective factors (e.g. social support) can act as positive exposures or buffering mechanisms that promote resilience and improve health. 3,8,35,36 This indicates the importance of acknowledging both strengths and challenges in families to create a supportive environment for early development. 37 Additionally, low SES may not necessarily be a risk factor for adverse outcomes unless it coincides with other hardships. The relation between SES and health can be described by processes such as social causation (adverse conditions of poverty impact health through, for example, stress and food insecurity) and health selection (people with worse physical or mental health outcomes fall into poverty through, for example, stigma, health expenditures and lower productivity). 38 This increases the importance for healthcare professionals of understanding different domains of vulnerability and tailoring support to the individual's needs. 39,40
Our findings reveal a difference in care utilization patterns. The 'healthy and socioeconomically stable'-class was most likely to receive early antenatal care and postpartum care (at home). This corresponds to findings of Grabovschi et al. 6 in their scoping review into vulnerability. People with higher vulnerability levels (i.e. multiple vulnerability aspects) have higher healthcare needs, but less access to services and lower quality of healthcare. This raises questions about whether current support meets parents' needs.
The main strength of this study is that we linked routinely collected nationwide observational data sources to self-reported data on health, well-being and lifestyle. This offered the opportunity to include data on a wide range of medical and social factors for a large group of pregnant women to better understand vulnerability. While previous studies often had a unidimensional perspective on vulnerability (focussing on single risk factors such as individual SES, or neighbourhood SES at an aggregated level), our extensive dataset allowed us to unravel the difference between unidimensional and multidimensional types of vulnerability. Another strength is that we included protective factors, while most studies focus primarily on factors that increase the risk of adverse outcomes and less on protective factors that might counteract these effects. 18,19 Unfortunately, data on topics such as nutrition, stress, health literacy, preconception care and adverse childhood experiences were not available, while these factors could provide additional insights into vulnerability. Next, using the largest posterior probability to assign women to classes is a limitation, because not all women are fully representative of one class only. Our study was moreover limited by not including the father or woman's partner, despite growing evidence of their importance in promoting healthy pregnancy, childbirth and child outcomes. Another limitation relates to the representativeness of the study population due to using the PHM-2016. Compared to all other pregnant women in 2017/2018, women in our study less often had a low income (5% vs. 8%), low educational level (8% vs. 12%) and migration background (18% vs. 32%). Because people with higher vulnerability generally participate in research less often, we assume that the size of the 'multidimensional vulnerability'-class is underestimated. Still, because we could identify classes of vulnerability and differentiate between single and multidimensional vulnerability, we expect that their characteristics are also applicable beyond the study population. Similar results from our additional analyses strengthen this expectation.
Table notes: Proportions of risk factors (first category) >0.6 are shown in bold to indicate the higher occurrence of certain risk factors per class. For each category, the class with the highest proportion is shown in italic. Totals may not add up to 1.0 because of rounding. Following guidelines of Statistics Netherlands, the observed numbers in each category were rounded to five before calculating proportions in order to prevent the disclosure of information about individuals.
Nevertheless, our approach and findings should be validated in other cohorts and countries and until then be interpreted with caution.
Our findings can have several implications for practice and research. We believe that screening instruments for vulnerability before and during pregnancy could benefit from including a balanced set of both risk and protective factors. In refining screening instruments, we have to consider the various criteria for responsible screening, such as the availability of associated care or support strategies. 41 Greater awareness among healthcare providers regarding the complexity of vulnerability in terms of risk and protective factors and personal perceptions could enhance the provision of person-centred care and support. 6,40,42 Multiple studies argue that future strategies should also pay attention to underlying root causes of vulnerability in policies, laws and governance. 3,15,43 Advancing health equity requires both individual-level interventions targeted at vulnerable individuals and systemic-level change. 3,15,43 Factors related to housing, education and social security, for example, frequently lie upstream of individual lifestyle and behavioural factors modifiable through individual-level interventions. Findings of our study can be input for longitudinal monitoring of vulnerability at population level. Future research is needed to determine whether vulnerability classes can be identified using solely routinely collected population data, without using self-reported data. Additionally, more research is necessary regarding the role of the father or woman's partner in relation to vulnerability.
Figure 1 A visual representation of the five latent classes, described across the nine themes that summarize all 42 factors related to vulnerability. The vertical axis displays for each theme the average proportion of women within the categories that represent the risk factors (each first category in table 2). A higher score means that a higher proportion of women in a class have risk factors to vulnerability. An example: the theme 'self-reported health' consists of three factors: perceived health, long-term illness and restriction by health. For Class 1 (multidimensional vulnerability), the average proportion of women with a negative perceived health (0.70), long-term illness (0.68) and feelings of being restricted by health (0.76) is 0.71.
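The worked example in the Figure 1 caption is a plain arithmetic mean of the risk-category proportions within a theme. A short Python check of that calculation, using only the three proportions quoted in the caption (the function and variable names are hypothetical):

```python
def theme_average(risk_proportions):
    """Mean proportion of women in the risk category across a theme's factors."""
    return sum(risk_proportions) / len(risk_proportions)


# Class 1 proportions for the theme 'self-reported health', as quoted in the caption.
self_reported_health_class1 = {
    "negative perceived health": 0.70,
    "long-term illness": 0.68,
    "restricted by health": 0.76,
}

average = theme_average(list(self_reported_health_class1.values()))
print(round(average, 2))  # 0.71
```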
In conclusion, there is growing attention to early detection of vulnerability and to implementing effective strategies to improve the health and well-being of current and next generations. Results of this data-driven study suggest that several vulnerability classes can be distinguished among pregnant women in the Netherlands. The co-existence of risk factors in multiple domains is associated with more adverse outcomes for mother and child. Effective strategies, starting preconceptionally, should include both medical and social care and support.

Supplementary data
Supplementary data are available at EURPUB online.