A prediction model for neonatal mortality in low- and middle-income countries: an analysis of data from population surveillance sites in India, Nepal and Bangladesh

Abstract
Background: In poor settings, where many births and neonatal deaths occur at home, prediction models of neonatal mortality in the general population can aid public-health policy-making. No such models are available in the international literature. We developed and validated a prediction model for neonatal mortality in the general population in India, Nepal and Bangladesh.
Methods: Using data (49 632 live births, 1742 neonatal deaths) from rural and urban surveillance sites in South Asia, we developed regression models to predict the risk of neonatal death with characteristics known at (i) the start of pregnancy, (ii) start of delivery and (iii) 5 minutes post partum. We assessed the models’ discriminative ability by the area under the receiver operating characteristic curve (AUC), using cross-validation between sites.
Results: At the start of pregnancy, predictive ability was moderate {AUC 0.59 [95% confidence interval (CI) 0.58–0.61]} and predictors of neonatal death were low maternal education and economic status, short birth interval, primigravida, and young and advanced maternal age. At the start of delivery, predictive ability was considerably better [AUC 0.73 (95% CI 0.70–0.76)] and prematurity and multiple pregnancy were strong predictors of death. At 5 minutes post partum, predictive ability was good [AUC 0.85 (95% CI 0.80–0.89)]; very strong predictors were multiple birth, prematurity and a poor condition of the infant at 5 minutes.
Conclusions: We developed well-performing prediction models for neonatal mortality. Neonatal deaths are highly concentrated in a small group of high-risk infants, even in poor settings in South Asia. Risk assessment, as supported by our models, can be used as a basis for improving community- and facility-based newborn care and prevention strategies in poor settings.


Introduction
Worldwide, every year, nearly 3 million infants do not survive the first 28 days of life. 1 Nearly all (99%) of these deaths occur in low- and middle-income countries. 2 In poorer parts of India and Bangladesh, 35-65 babies in 1000 live births die in the neonatal period. 3 For public-health policy-making and management of pregnancy, delivery and the newborn period, including proper risk selection and institution of selective care pathways for high-risk pregnancies, it is important to be able to predict which infants are at a high risk of neonatal death.
Prediction models of neonatal mortality are largely restricted to high-income countries, which account for only 1% of neonatal deaths. These models focus on infants in neonatal intensive-care units. [4][5][6] Existing models for low- and middle-income countries are few and again focus on neonatal intensive-care patients. 7 In poor settings, where many neonatal deaths occur at home, 8 prediction models of neonatal mortality in the general population, rather than for selective high-risk patients only, can aid public-health policy-making and decision-making by family members and community health workers (e.g. through early recognition of potential problems). To our knowledge, no such models for neonatal mortality have been published in English-language international peer-reviewed journals.
Whereas prediction models for neonatal mortality are scarce, there is quite a good understanding of the causes of and risk factors for neonatal death in low- and middle-income countries. Preterm birth, neonatal infections and birth asphyxia account for around 80% of neonatal deaths. 1,2 Direct risk factors include young and relatively advanced maternal age, maternal under-nutrition, primiparity and high parity, short pregnancy interval, multiple pregnancy, maternal health problems during pregnancy, malpresentation, problems during delivery, male infant sex (with exceptions in settings with strong son preference), low birth weight and exposure to infections. 2,9,10 Low socio-economic position of the mother is an important underlying risk factor for neonatal death. 11 The advantage of prediction models is that they formally combine risk factors, allowing more accurate risk estimation. 4 Yet, as many births in poor settings occur at home without skilled care, good data on neonatal mortality and its risk factors remain scarce. Demographic surveillance sites in South Asia, in which the full population is followed up and all women were interviewed post partum, do provide such data, offering a unique opportunity to develop a prediction model for neonatal mortality in poor settings.
We aimed to develop and validate a prediction model for neonatal mortality in the general population in low- and middle-income countries, with specific reference to South Asia, using data from four surveillance sites.

Methods
We used prospectively collected data from surveillance sites in rural Nepal (Makwanpur district, surveillance population of 170 000) and Bangladesh (Moulvibazar, Bogra and Faridpur district, 500 000) and rural (five districts in the states of Odisha and Jharkhand, 228 000) and urban (informal slum settlements in Mumbai, 283 000) India. [12][13][14][15][16] The surveillance systems and data-collection tools were comparable across the sites. At each site, the full population (in the Nepal site, a closed cohort of women) in a geographically defined area was followed up on a continuous basis, and all births and birth outcomes were recorded. Local key informants, typically covering around 250 households, were responsible for reporting all births, birth outcomes and deaths to women of reproductive age to a salaried interviewer who met with the key informant on a monthly or fortnightly basis. The interviewer verified all reported events and paid the key informant a small financial incentive (more or less $1, depending on the site) for each correct identification. In the Nepal site, local female enumerators visited all cohort members in their area every month to record menstrual status. In each site, all women who had given birth, or a family member if the woman had died, were interviewed at around 6 weeks post partum, and detailed information about the mother and the pregnancy, delivery and newborn period was recorded. The questionnaires were similar across the sites, with some adaptations to the local context, e.g. in the way household assets were measured (see footnote to Table 1). The sites were set up for randomized-controlled trials of community-based interventions with participatory women's groups. We only included data from the control arms of the trials. We included data from all South Asian sites of which the women's group trial results have been published.
Our outcome of interest was neonatal death, i.e. death in the first 28 days of life among live-born infants.
All characteristics known to influence neonatal mortality as reported in the Lancet Neonatal Survival series, 2,17 when available in our dataset, were included as predictors in our initial models. We also included season of birth, a predictor of neonatal death in at least one of our sites. 18 All variables were based on the mother's report or the report of a family member in the event of her death. Included characteristics at the start of pregnancy were: maternal age, maternal education (no school, primary, secondary, BSc/MSc) and literacy (can read, cannot read), household economic status (wealth tertiles, based on Principal Component Analysis) 19 and pregnancy interval (using birth interval as proxy, categorized as <15, 15-26, 27-68, >68 months or primigravida). 10 We included the following characteristics known at the start of delivery: at least one antenatal care (ANC) visit (y/n), at least four ANC visits (y/n), tetanus vaccination during pregnancy (y/n), premature birth (y/n, defined as gestational age of 8 months; gestational age in weeks not available), season of birth (warm-dry, rainy, cold) and pregnancy complications (y/n). Pregnancy complications were defined as any one of: reduced/no fetal movement, jaundice, fits/seizures/convulsions/lost consciousness. These complications were identified as the strongest independent predictors of neonatal mortality in a preliminary logistic regression analysis that also included: excessive vomiting, felt weak/tired, swollen feet/legs/face, severe stomach pain, looked pale, malaria, severe headache/dizziness/fainting, breathless when doing household tasks, blurred vision/spots before eyes, anaemia. Multiple birth (y/n) may or may not have been known at the start of delivery, depending on the quality of the ANC.
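As an illustration of how the birth-interval predictor described above might be coded, the following Python sketch maps an interval in whole months to the categories named in the text. It is not the authors' code (the analysis was done in R); the function name, the handling of missing values and the assumption that intervals are recorded in whole months are all hypothetical.

```python
from typing import Optional


def birth_interval_category(months: Optional[int], primigravida: bool) -> str:
    """Return the birth-interval category used as a predictor.

    Assumes intervals are recorded in whole months (an assumption; the
    text does not say how fractional months were treated).
    """
    if primigravida:
        # First pregnancies have no preceding birth, so they form their own level.
        return "primigravida"
    if months is None:
        # Hypothetical label; missing values would be imputed before modelling.
        return "missing"
    if months < 15:
        return "<15"
    if months <= 26:
        return "15-26"
    if months <= 68:
        return "27-68"
    return ">68"
```

Keeping primigravida as a separate level, rather than as a missing interval, lets one categorical variable carry both the interval and the first-pregnancy information.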
The following characteristics known 5 minutes post partum were included: presentation/mode of delivery [normal, breech, caesarean section (C-section)], place of delivery (home, facility), labour duration (≤24 or >24 hrs), delivery complications (y/n), maternal death (y/n), sex of baby, size of baby at birth (small, normal, large), looking abnormal (y/n), breathing/crying immediately after birth (y/n), condition of arms and legs of baby after birth (normal, floppy, stiff) and condition of baby at 5 minutes ('crying well, breathing well, pink and active', 'poor or no cry, poor breathing, blue limbs or body, poorly active/no movement'). Delivery complications were defined as any one of the following: fever within 3 days prior to labour, retained placenta and haemorrhage ('vaginal bleeding so much that you thought you were going to die'). Looking abnormal was mostly based on the question: 'How did the baby look at birth, normal/abnormal?' Most predictors were available for over 90% of deliveries (Table 1). Some variables were not available or had many missing values for the Mumbai (India) site (presentation; condition at 5 minutes; condition arms and legs) and rural Nepal (birth interval; tetanus vaccination; pregnancy complications; breathed and cried immediately; condition at 5 minutes; condition arms and legs). Because each variable was available for a considerable number of births, we used an advanced multiple imputation of missing values strategy (method of chained equations) to make efficient use of the available data. 20 To maximize the use of all the available information, we included all potential predictors of neonatal mortality, as well as the site and the outcome, in the model for imputation of missing values. We used the R package 'mice' for multiple imputation. 21
Footnote to Table 1. Household-wealth indicators included in the Principal Components Analysis were as follows: Bangladesh (electricity, radio/tape recorder, fan, television, telephone, generator, bicycle, fridge), Jharkhand/Odisha (India) (electricity, radio/tape recorder, fan, television, generator, bicycle, fridge), Mumbai (India) (electricity, radio/tape recorder, fan, television, telephone, bicycle, fridge). For Nepal, the wealth measure was based on predefined asset levels in the surveillance questionnaire, based on household ownership of one or more of the items on the list. These items were as follows: least poor (bus, truck, motorcycle, television, motor tractor, fridge, hand tractor, sewing machine/cassette player/fan/radio/camera/bicycle), middle (wall clock/iron), poorest (none of the above).
We developed three logistic regression models to predict the risk of death in the first 28 days of life at the individual level, based on characteristics known at (i) the start of pregnancy, (ii) the start of delivery and (iii) 5 minutes post partum.
We modelled possible non-linearity of the association between mother's age and the risk of neonatal death with restricted cubic splines. 22 We expressed the strength of the association between predictors and neonatal death by crude and adjusted odds ratios. We evaluated the contribution of each predictor by the difference in Akaike's information criterion (ΔAIC) between multivariable models with and without the predictive factor, balancing the improvement in goodness of fit of a model with its increased complexity. 22 We deleted variables with negligible predictive contribution, i.e. when the χ² test statistic minus twice the degrees of freedom was relatively small (below 10).
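The selection rule above can be written out as a short calculation. The Python sketch below is illustrative only and is not the authors' R code: it assumes the chi-square statistic is a likelihood-ratio statistic computed from the log-likelihoods of the models with and without the predictor, in which case the difference in Akaike's information criterion equals the chi-square statistic minus twice the degrees of freedom. The function names and the example log-likelihoods are hypothetical.

```python
def delta_aic(loglik_with: float, loglik_without: float, df: int) -> float:
    """Difference in AIC between the model without and with a predictor.

    AIC = -2 * loglik + 2 * (number of parameters), so
    AIC_without - AIC_with = chi2 - 2 * df, where chi2 is the
    likelihood-ratio statistic and df is the number of parameters the
    predictor adds (assumption: a likelihood-ratio chi-square is used).
    """
    chi2 = 2.0 * (loglik_with - loglik_without)
    return chi2 - 2.0 * df


def keep_predictor(loglik_with: float, loglik_without: float, df: int,
                   threshold: float = 10.0) -> bool:
    """Keep the predictor unless its contribution is below the threshold of 10."""
    return delta_aic(loglik_with, loglik_without, df) >= threshold


# Hypothetical log-likelihoods, for illustration only.
print(delta_aic(-1000.0, -1010.0, df=3))      # chi2 = 20, minus 2 * 3
print(keep_predictor(-1000.0, -1003.0, df=1))  # chi2 = 6, minus 2 * 1
```

Subtracting twice the degrees of freedom penalizes predictors with many categories, so a multi-level factor must improve the fit more than a binary one to be retained.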
We assessed the discriminative ability of each model by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The AUC can be interpreted as the probability that the risk prediction of a randomly chosen neonatal death is higher than the risk prediction of a randomly chosen neonatal survivor. We determined the AUC of the models within each of the four sites ('apparent AUC'). We also used a cross-validation approach between sites to obtain a more realistic presentation of the AUC in independent settings ('cross-validated AUC'). Cross-validation means that the model is consecutively fitted in three of the four sites and validated, with the AUC, in the site that was left out when fitting the model. To obtain overall AUCs, both apparent and cross-validated, we used random-effects meta-analyses of the four site-specific AUCs. 23 For calculation of an individual's probability of neonatal death, we present the prediction models with nomograms. 22,24 For regression analysis and construction of nomograms, we used the R package 'rms'. 21
Approval for the trials of which we used the data for our secondary analysis was received from the Research Ethics Committee at the UCL Institute of Child Health and appropriate national Ethics Committees. [13][14][15][16]

Results
Across the sites, 1742 neonatal deaths occurred in 49 632 live births, with the neonatal mortality rate (NMR) varying from 58.8/1000 in rural Jharkhand/Odisha (India) to 36.9/1000 in Nepal, 34.6/1000 in Bangladesh and 8.6/1000 in informal settlements in Mumbai (India) (Table 1).
The following characteristics were very strongly associated with neonatal death [univariable odds ratios (ORs), Supplementary Table 1, available as Supplementary data at IJE online]: breech delivery, premature birth, mother died, multiple birth, small size at birth, looking abnormal, not immediately crying or breathing, poor condition at 5 minutes, and infant had floppy or stiff arms and legs. The other included characteristics were also associated, though less strongly, with neonatal death in most sites. Table 2 presents the prediction models. At the start of pregnancy, a high educational attainment was associated with a lower odds of death and low economic status was associated with a higher odds of death (Table 2; Supplementary Table 2, available as Supplementary data at IJE online). Also, a very short birth interval and births to primigravid, younger (especially <18 years) and older (35+) women were associated with a higher odds of death. Socio-economic (ΔAIC education: 35; economic status: 12) and demographic characteristics (ΔAIC birth interval: 31; maternal age: 14) were equally strong predictors of neonatal death. At the start of pregnancy, the predictive ability of the model was moderate {apparent AUC: 0.59 [95% confidence interval (CI) 0.58-0.61]; cross-validated AUC 0.58 [95% CI 0.56-0.59]}.
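The apparent and cross-validated AUCs reported here follow the definitions given in the Methods: the AUC is the probability that a randomly chosen neonatal death has a higher predicted risk than a randomly chosen survivor, and cross-validation fits the model in three sites and validates it in the fourth. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' R code; the function names are hypothetical, tied predictions are counted as half a concordant pair (a common convention, assumed here), and the predicted risks are made up.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def auc_pairwise(risk_deaths: Sequence[float],
                 risk_survivors: Sequence[float]) -> float:
    """AUC as the share of (death, survivor) pairs ranked correctly."""
    if not risk_deaths or not risk_survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for death, survivor in product(risk_deaths, risk_survivors):
        if death > survivor:
            score += 1.0   # concordant pair
        elif death == survivor:
            score += 0.5   # tie counted as half (assumption)
    return score / (len(risk_deaths) * len(risk_survivors))


def leave_one_site_out(sites: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training sites, held-out site) for each site in turn."""
    for held_out in sites:
        yield [s for s in sites if s != held_out], held_out


# Made-up predicted risks, for illustration only.
print(auc_pairwise([0.9, 0.4], [0.1, 0.4, 0.2]))

for training, held_out in leave_one_site_out(
        ["Nepal", "Bangladesh", "Jharkhand/Odisha", "Mumbai"]):
    print(f"fit on {training}, validate on {held_out}")
```

The pairwise loop is quadratic in the number of births and is meant for clarity; a rank-based formula gives the same value more efficiently on large datasets.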
At the start of delivery, prematurity was a very strong predictor of neonatal death [ΔAIC: 1658; OR 11.11 (95% CI 9.89-12.47)]. Less strong, but still predictive, were health problems during pregnancy and delivery in the cold season. Low maternal socio-economic position and short birth interval were also important predictors. Predictive ability at the start of delivery was considerably better than at the start of pregnancy  who presented normally (1.7 points), but was born prematurely (4.4 points) in the cold season (0.7 points) in Jharkhand/Odisha (India) (3.7 points), to a primigravid (0.5 points) mother with no schooling (2.2 points) had an estimated mortality risk of 384/1000 if the infant was in good condition at 5 minutes, with arms/legs in normal condition. If the same infant was in a poor condition at 5 minutes (5 points), but with arms/legs in normal condition, the mortality risk amounted to 863/1000.
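The two risks in this worked example can be related to the nomogram points with a short calculation. The Python sketch below is illustrative only and is not taken from the authors' analysis: it assumes, as is usual for nomograms of logistic models, that points are linear in the log-odds, and it uses only the points and risks quoted in the example. Because points read from a nomogram are rounded, the results are approximate.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# Values quoted in the worked example.
risk_good_condition = 384 / 1000   # good condition at 5 minutes
risk_poor_condition = 863 / 1000   # poor condition at 5 minutes
points_poor_condition = 5.0        # extra points for poor condition

# Points listed in the surviving part of the example; further predictors
# may contribute points that are not quoted there.
listed_points = [1.7, 4.4, 0.7, 3.7, 0.5, 2.2]
subtotal_points = sum(listed_points)

# Shift in log-odds between the two scenarios and the odds ratio it implies.
log_odds_shift = logit(risk_poor_condition) - logit(risk_good_condition)
implied_odds_ratio = math.exp(log_odds_shift)
log_odds_per_point = log_odds_shift / points_poor_condition

print(f"subtotal of listed points: {subtotal_points:.1f}")
print(f"implied odds ratio for poor condition: {implied_odds_ratio:.1f}")
print(f"log-odds per nomogram point: {log_odds_per_point:.2f}")
```

Under these assumptions, the 5 points for a poor condition at 5 minutes correspond to roughly a tenfold increase in the odds of neonatal death for this infant.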

Discussion
We developed and validated prognostic models for neonatal mortality in the general population in low- and middle-income countries, with specific reference to South Asia, on the basis of risk factors known at (i) the start of pregnancy, (ii) the start of delivery and (iii) 5 minutes post partum. At the start of pregnancy, prediction of neonatal death was difficult, although infants born to women of lower socio-economic position and to women with certain demographic characteristics (young or advanced age, very short birth interval, primigravida) were at a higher risk of neonatal death. Predictive ability improved at the start of delivery, where multiple pregnancy and a premature start of delivery were highly predictive of neonatal death.
Predictive ability was high at 5 minutes post partum, where prematurity, multiple birth and a poor condition of the infant were strong predictors of death. The models can be used to inform population-based prevention and more narrowly targeted interventions for high-risk infants.

Methodological issues
Our models are based on large datasets from sites in which the full population was prospectively followed up and detailed information on predictors of neonatal death was collected, allowing precise prediction. Yet, recall bias is a potential problem, as information was based on the mother's report at approximately 6 weeks post partum. Whereas we reduced this problem by using broad categories for variables such as size at birth, random error may remain substantial for such variables. Furthermore, the mother's report may have been biased by the outcome (death/survival), with worse conditions reported for neonatal deaths, leading to inflated ORs for characteristics that mothers associate with death (e.g. infant condition at 5 minutes). Yet, for other predictors, such as multiple birth, such recall bias is probably minimal. Finally, whereas the high number of missing values in some predictors in particular sites may be considered a limitation, we were able to develop our models based on multiple imputation of missing values using the substantial amount of available data. Nevertheless, this may have led to an under-estimation of the discriminative ability of the models. Despite these problems, we arguably used some of the best data available for general populations in poor settings (i.e. prospectively collected data from some of the largest networks of linked demographic surveillance sites in South Asia) where home births without skilled care are common and reliable vital registration systems are non-existent. Our models are arguably generalizable to rural and poor urban South Asia. Our study sites ranged from informal settlements in megacity Mumbai, with a comparatively low NMR, to tribal areas in some of the poorest states in India, with a high NMR. The discriminative ability of the models, measured by the apparent and cross-validated AUC, was stable across sites, implying that the models are generally applicable across our study population.
Our models are possibly less applicable to the top layer of South Asian society with a different cause-of-death pattern. Furthermore, their wider generalizability to other world regions needs further examination.

Comparison with the literature and implications
To our knowledge, our study is the first to formally combine known risk factors for neonatal mortality into a prediction model for the general population in low- and middle-income countries. We developed models for three time points, i.e. onset of pregnancy, onset of delivery and immediately after birth, something we rarely encountered in the literature.
We found that three risk factors (preterm birth, multiple birth and poor condition at 5 minutes post partum) were associated with a very high risk of neonatal death. A substantial proportion of deaths was associated with these risk factors. Secondary prevention (improving outcomes among infants with these risk factors, rather than reducing risk-factor prevalence) can play an important role in preventing these deaths. Facility-based interventions to improve management of high-risk infants exist for poor settings. 25,26 Whereas timely access to skilled care can be critical, it is often problematic in poor rural areas. Health-system strengthening to improve the quality and availability of care and demand-side interventions (e.g. conditional cash transfers) to reduce care-seeking delays are therefore important. Interventions also exist for community settings, including participatory women's groups and home-based neonatal care by village health workers. 26,27 Community-based management requires that care-givers are aware of important risk factors and react pro-actively to danger signs. 28 This means anticipating potential problems in women with a multiple pregnancy and/or premature start of delivery where there is still time to travel to a facility, and early recognition and home management of problems among preterm infants and babies in a poor condition (e.g. bag-and-mask ventilation, kangaroo care, delayed bathing). 29,30 Raising awareness about the importance of the above risk factors within community-based interventions and empowering families and communities to address these problems are therefore recommended. Similarly, these strategies can be used for the other described risk factors, including breech delivery (timely recognition and care-seeking) and delivery in the cold season (thermal care). Also, whereas infants are at the highest risk of death on the day of birth, 1 these strategies are equally important for the late neonatal period (comprising 20-50% of deaths in our sites). So, rather than being competing strategies, population-level interventions to raise awareness and empower communities to act are a prerequisite for effective secondary prevention in settings where home births without professional care are common.
Figure 1. Nomogram of the prediction of neonatal mortality at start of pregnancy. To estimate an infant's probability of neonatal death, first determine all of its risk-factor characteristics [educational attainment of its mother, (estimated) birth interval, etc.]. Second, read the risk points associated with each risk factor by drawing a line up from the predictor value to the 'Points' axis. Third, add up the points for all risk factors to obtain the total points for that infant. The probability of neonatal death can be read by moving vertically from the 'Total Points' axis to the 'nnd' axis. The predictor 'site' can be used to take regional differences in NMR into account. When using the nomograms outside of our study populations, readers are advised to use the site with an NMR closest to their own study population.
Combining the above strategies with population-level primary prevention to reduce the incidence of risk factors, e.g. by improving maternal nutrition, reducing indoor pollution and increased use of family planning, will help to further reduce neonatal mortality. 1,31 Similarly, measures to improve living conditions and hygienic practices are important. Forty per cent of deaths in our sites occurred among infants without the three main risk factors; infections may have played an important role in these deaths, as well as in the death of high-risk infants. 1

Conclusions
We developed well-performing prediction models for neonatal mortality in the general population in South Asia. We conclude that neonatal deaths are highly concentrated in a small group of high-risk infants, even in poor settings in South Asia. These high-risk infants can be identified based on characteristics available before or shortly after birth. Our models suggest that improved management of high-risk infants can substantially reduce neonatal mortality. Where health systems are weak, a high-risk approach should arguably include population-level strategies to raise awareness about important risk factors and empower community-based care-givers to take action. This should arguably be complemented with health-system strengthening to improve the uptake of facility-based care and quality of maternity and newborn care and action on the social determinants of health to reduce mortality in low-risk, as well as high-risk, infants.

Supplementary Data
Supplementary data are available at IJE online.