Can existing improvements of water, sanitation, and hygiene (WASH) in urban slums reduce the burden of typhoid fever in these settings?

Sustained investment in water, sanitation and hygiene (WASH) has lagged in resource-poor settings; incremental WASH improvements may, nonetheless, prevent diseases such as typhoid in disease-endemic populations. Using prospective data from a large cohort in urban Kolkata, India, we evaluated whether baseline WASH variables predicted typhoid risk in a training subpopulation (n=28470). We applied a machine learning algorithm to the training subset to create a composite, dichotomous (“good”, “not good”) WASH variable based on four variables and evaluated sensitivity and specificity of this variable in a validation subset (n=28470). We evaluated in Cox regression models whether residents of “good” WASH households experienced lower typhoid risk after controlling for potential confounders. We constructed virtual clusters (radius 50m) surrounding each household to evaluate whether “good” WASH prevalence modified typhoid risk in central household members. In this typhoid-endemic setting, natural variation in household WASH was associated with typhoid risk. If replicated elsewhere, these findings suggest that WASH improvements short of major infrastructural investments may enhance typhoid control.


Introduction
Major infrastructural improvements in clean municipal water and sanitary treatment and disposal of human waste have eliminated typhoid fever, an infection caused by Salmonella enterica serovar Typhi (S. Typhi), as a significant public health problem in affluent countries. However, typhoid continues to affect populations of low- and middle-income countries (LMICs) (1)(2)(3), and worldwide, there are approximately 10.9 million cases and 116,800 deaths annually (4). Because resources in LMICs are generally too limited to enable, in the foreseeable future, the major infrastructural improvements achieved by more affluent countries, it may be questioned whether incremental improvements in water quality, hygiene and sanitation that are feasible and affordable in these settings will be sufficient to have a measurable impact on typhoid burden. One way of approaching this problem is to evaluate whether natural variations in the level of household water, sanitation, and hygiene (WASH) that are already present in poor, typhoid-endemic settings, and thus are affordable and feasible, have a measurable impact on the risk of typhoid fever in those settings.
In this paper, we reanalyze longitudinal data from a Kolkata slum population under comprehensive surveillance for typhoid to address this question.

The Vi Polysaccharide trial in Kolkata
A cluster randomized trial of the effectiveness of typhoid Vi polysaccharide (ViPS) vaccine in a poor slum area of Kolkata, where typhoid is endemic, was conducted in individuals 2 years of age or older between 2004 and 2006 (5). Two years prior to vaccine dosing for the trial, surveillance for enteric fever was initiated in the trial area and continued for two years following vaccination. All households and individuals within the study area were enumerated through a baseline census conducted at the beginning of surveillance activities. Periodic census updates were conducted throughout the study period, including immediately prior to vaccination, 1-year post-vaccination, and at the completion of the surveillance period. Census methods included the enumeration of individuals and collection of geolocation, demographic, socio-economic, and WASH information of the household and its residents. WASH information was collected during the baseline census only.
Passive surveillance for typhoid was conducted at 5 study clinics, where residents of the study area presenting with a history of fever for at least 3 days received blood cultures which were evaluated for S. Typhi as described elsewhere (5). Visits for fever in which the date of onset of fever was within 14 days of the date of discharge for the previous visit were grouped together into febrile episodes. All subjects or their guardians provided written informed consent. Typhoid fever was defined as a febrile episode in which at least one blood culture yielded S. Typhi. The overall protective effect of the Vi vaccine was 61% (95% CI: 41-75) when measured against a hepatitis A vaccine control (5).

Selection of WASH variables for analysis
We analyzed whether variables characterizing the level of household WASH, ascertained in the baseline census, predicted the risk of typhoid fever in the baseline closed cohort over the ensuing four years of clinical surveillance for typhoid fever. In the census there were 5 non-binary household variables characterizing household source of drinking water, treatment of water for daily use, site of defecation, handwashing practice after defecation, and waste disposal location. We grouped the categories of each of these variables into two classifications reflecting "good" or "not good" WASH based on substantive judgements and the distribution of the population according to the different categories, but without prior knowledge of the rates of typhoid associated with each component category. We then divided the baseline population at random into two equal sized, mutually exclusive subsets, a "training" population and a "validation" population, and selected the four binary WASH variables that were associated with the hazard of typhoid fever in Cox regression models at p<0.1, after first ascertaining whether the variables met the proportional hazards assumption. These variables

Development of the WASH decision tree to predict typhoid in the training subpopulation
To create an overall binary composite variable for household WASH predicting typhoid in household members, we used a machine learning algorithm to develop a decision tree constructed with recursive partitioning (6). The decision tree was designed to predict development of typhoid fever among baseline household members over 4 years of follow-up, using 4 dichotomous household WASH variables independently associated with typhoid hazard. We initially ran the algorithm for the training population, specifying a default loss function of 1:1 for the ratio of costs of false negative and false positive classifications, and requiring that each terminal node have at least 300 observations. For cross-validation, the training population was randomly partitioned into 10 parts; 1 part was used in estimating cross-validation error. To find the optimal typhoid prediction tree in the training population, the model was pruned by the minimal complexity parameter (CP) corresponding to a minimum error with at least 2 terminal nodes in the tree. A Receiver Operating Characteristic (ROC) Curve was constructed for the selected rule, and the area under the ROC curve (AUC) was estimated (7,8). We used maximization of the Youden index to select the cutoff probability for the ROC demarcating the overall binary WASH variable into "good" versus "not good" household WASH in relation to the risk of typhoid in inhabitants of the household.
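As a rough illustration of the cutoff selection described above, the sketch below picks the probability threshold that maximizes the Youden index, J = sensitivity + specificity - 1. It is not the authors' code (the analysis was run in R with the rpart and pROC packages); the function name `youden_cutoff` and the toy probabilities and outcomes are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def youden_cutoff(probs, outcomes):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    probs    -- predicted probability of typhoid for each person (e.g. from a tree leaf)
    outcomes -- 1 if the person developed typhoid, 0 otherwise
    A person is classified as high risk when prob >= cutoff.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(probs)):
        # True positives: cases at or above the cutoff.
        tp = sum(1 for p, y in zip(probs, outcomes) if p >= cutoff and y == 1)
        # True negatives: non-cases below the cutoff.
        tn = sum(1 for p, y in zip(probs, outcomes) if p < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data (hypothetical): three tree leaves with different predicted risks.
toy_probs = [0.002] * 6 + [0.005] * 6 + [0.009] * 4
toy_outcomes = [0, 0, 0, 0, 0, 1,
                0, 0, 0, 0, 1, 1,
                0, 1, 1, 1]
cutoff, j = youden_cutoff(toy_probs, toy_outcomes)
```

Because a decision tree assigns one predicted probability per terminal node, only a handful of distinct cutoffs need to be compared, which is why the sketch simply loops over the unique probabilities.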

Further analyses
Downloaded from https://academic.oup.com/cid/advance-article/doi/10.1093/cid/ciaa1429/5910137 by guest on 27 September 2020
Because analysis of predictive rules developed in a training population may overestimate the performance of the rule if analyzed in the same subpopulation, we evaluated the rule in a separate validation population. Procedures for analysis of the validation subpopulation were the same as for the training subpopulation. After confirming that the predictive rule for "good" WASH performed similarly in the training and validation populations with respect to sensitivity (the proportion of individuals developing typhoid who lived in households with "not good" WASH) and specificity (the proportion of individuals not developing typhoid who lived in households with "good" WASH) in predicting typhoid, we undertook further analyses in the total population. First, because the first 2 years of surveillance occurred in a population that had not been vaccinated with ViPS, whereas the second 2 years occurred during the ViPS trial, we evaluated whether the predictive rule performed similarly with respect to sensitivity and specificity before and during the ViPS trial. Second, we evaluated in the total population the protection against typhoid associated with "good" WASH in the household. To measure the protective association between "good" WASH in the household and the subsequent risk of typhoid fever in household inhabitants, we used Cox proportional hazard regression models after evaluating proportionality assumptions for independent variables. To evaluate independent variables not fulfilling proportionality assumptions, we included the variables as interaction variables with time of follow-up.
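One way to write the kind of model just described, offered as a sketch rather than the authors' exact specification: let $w$ indicate residence in a "good" WASH household and let $z$ be a covariate that does not satisfy the proportional hazards assumption. The symbols $w$, $z$, $\beta$, $\gamma$, $\delta$ and the linear form of the interaction with follow-up time $t$ are assumptions made here for illustration.

```latex
h(t \mid w, z) = h_0(t)\,\exp\bigl(\beta\, w + \gamma\, z + \delta\, z\, t\bigr)
```

Under this form the hazard ratio per unit of $z$ is $\exp(\gamma + \delta t)$, so it is allowed to change over follow-up, while the hazard ratio for "good" versus "not good" WASH is the constant $\exp(\beta)$.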
Models were adjusted for age in years with time interaction, number of household members with time interaction, and monthly household expenditure (in Indian rupees), all of which were associated at p<0.05 with the hazard of typhoid in bivariate Cox models. The hazard ratio (HR) for culture-positive typhoid fever was estimated by exponentiation of the coefficient of living in a household with "good" WASH in the model, and the 95% confidence interval (CI) for the HR was estimated using a robust sandwich method in which each household was treated as a cluster. We also used Cox models to evaluate the protective association between the percentage of "good" WASH households ascertained by GIS in the baseline census as located within a 50m radius of the central household under analysis, on the one hand, and the rate of typhoid in the total population during the ensuing four years, on the other. The analysis was performed using the "rpart" package for decision tree modeling, the "rpart.plot" package for tree plotting, the "pROC" package for ROC curves, the "survival" package for Cox models and the "dplyr" package for data management under RStudio (9)(10)(11). In all statistical analyses, p<0.05 (2-tailed) was taken as the threshold for statistical significance.
Table 1). There was some evidence for association with always hand washing with soap after defecation (HR=0.74, 95% CI: 0.52-1.05, p=0.097) and no evidence for association with waste disposal practice (p=0.958).
In the training population, the 4 WASH variables associated with risk of typhoid (p<0.10) were included in a decision tree model for binary classification of household-level WASH. The optimal cutoff probability maximized the Youden Index (0.004343) applied to an ROC curve that had an AUC=58% (95% CI: 54-61) (Figure 2). The resulting decision tree delineated 2 rules for "not

Evaluation of the protective association between WASH and the rate of typhoid
In the total population, in comparison with residence in a "not good" WASH household, residence in a "good" WASH household was associated with a 59% crude reduction in the hazard of culture-confirmed typhoid overall (HR=0.41, 95% CI: 0.27-0.63, p<0.001) and a 43% hazard reduction in the adjusted model (HR=0.57, 95% CI: 0.37-0.90, p=0.015) (Table 2). When stratified by age at baseline, both the younger and older age groups residing in "good" WASH households experienced a reduction in typhoid hazard, 39% (HR=0.61, 95% CI: 0.27-1.38, p=0.235) and 44% (HR=0.56, 95% CI: 0.34-0.93, p=0.005), respectively; the association was statistically significant only in the older age group, likely due to the lower number of typhoid cases in the younger age group. We next stratified the population into high versus low levels of "good" WASH in households surrounding the central household within 50m virtual clusters, taking the cut point (34%) as the prevalence of "good" WASH at which there was a noticeable decrease in the incidence of typhoid in the central households (Figure 4). Our data failed to show that the level of "good" WASH coverage in the community surrounding a central household under analysis modified the protective association between the level of household WASH and the risk of typhoid among members in the central household (Table 3). We next measured the association between the level of surrounding households with "good" WASH and the risk of typhoid in members of the central household in a model that controlled for potentially confounding variables (age, household size, and household expenditure). In this analysis, we found that the association was significant (HR=0.988, 95% CI: 0.979-0.996, p=0.004, for each percent increase of coverage).
However, it was not possible to evaluate the independent protective associations with typhoid fever for household WASH on the one hand and level of neighborhood WASH on the other hand due to collinearity between these two variables.
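To make the neighborhood analysis above more concrete, the sketch below computes, for each household, the percentage of surrounding households within a 50m radius that have "good" WASH, and shows how a hazard ratio expressed per percent of coverage scales to a larger change. This is an illustration under stated assumptions, not the authors' GIS workflow: the function names, the toy coordinates, the use of planar coordinates in metres, and the exclusion of the central household from its own cluster are all assumptions.

```python
import math


def neighbourhood_coverage(households, radius_m=50.0):
    """Percent of surrounding households with "good" WASH, per household.

    households -- list of (x_m, y_m, is_good) with planar coordinates in metres
    Returns one percentage per household, or None when no other household
    lies within the radius.
    """
    coverage = []
    for i, (xi, yi, _) in enumerate(households):
        good = total = 0
        for j, (xj, yj, is_good) in enumerate(households):
            if i == j:
                continue  # assumption: the central household is not counted
            if math.hypot(xi - xj, yi - yj) <= radius_m:
                total += 1
                good += 1 if is_good else 0
        coverage.append(100.0 * good / total if total else None)
    return coverage


def hazard_ratio_for_increase(hr_per_percent, delta_percent):
    """Scale a per-percent hazard ratio to a larger increase in coverage.

    In a Cox model the log hazard is linear in the covariate, so the
    hazard ratio compounds multiplicatively.
    """
    return hr_per_percent ** delta_percent


# Toy layout (hypothetical coordinates, in metres).
toy = [(0, 0, True), (30, 0, False), (0, 35, True), (200, 200, False)]
cov = neighbourhood_coverage(toy)

# HR=0.988 per percent increase of coverage is the estimate reported in the text.
hr_10 = hazard_ratio_for_increase(0.988, 10)
```

With the reported HR of 0.988 per percent, a 10-point increase in surrounding "good" WASH coverage would correspond to a hazard ratio of roughly 0.89, provided the log-linear relationship holds over that range. The pairwise loop is quadratic in the number of households; a spatial index would be the usual choice at census scale.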

Discussion
We used recursive partitioning to develop a dichotomous classification rule for household-level "good" WASH based on 4 independent non-binary WASH variables collected at baseline. When applied to the study population, residence in a "good" WASH household was associated with a moderate decrease in risk for contracting typhoid during the study period. When analyzed separately by age group, the protective association was significant for the older age group and suggestive for the younger age group. The neighborhood level of "good" WASH coverage appeared to be associated with the risk of typhoid among members of central households; however, it did not modify the association between the level of WASH in the central household and the risk of typhoid in household members.
The findings of this study must be considered in light of several important limitations. Firstly, our construction of a composite WASH variable to classify households relied on very simply ascertained WASH variables that were collected at baseline only as potential confounders for analysis of the vaccine trial. More in-depth characterization of WASH variables might have resulted in a more powerfully predictive WASH composite variable and created the opportunity for further assessment of independent contributions of WASH component variables. For example, although it is likely that drinking water from a household's private tap, well, or pump was microbiologically superior to that from other sources, we do not have direct data on this issue. Secondly, WASH variables were ascertained at baseline and the population was followed for 4 years. Although the population was quite stable, it is possible that the WASH status of households changed over time. However, such misclassification bias, if random, would have made our analyses conservative. Thirdly, we used a machine learning algorithm that created a dichotomous composite WASH variable. Although useful, particularly in classifying the level of WASH coverage in surrounding households, this dichotomization may have resulted in some loss of information about the variables, again making our analyses conservative. Fourthly, our analysis of the protective association of household WASH and the risk of typhoid by age was limited by sparse data in the younger age stratum. Fifthly, we partitioned the training and validation sets by individuals rather than households, with the result that there were some shared households in the training and validation sets.
However, sensitivity and specificity values for the composite variable were nearly identical in the training and validation subsets when we excluded households that were shared by the two subpopulations. Finally, the period of follow-up for our study included periods before and during a ViPS vaccine trial. However, as indicated in the results, separate analyses of the predictive rule during these periods revealed consistent results.
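The shared-household issue and the exclusion check described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the seed, and the toy cohort are hypothetical, and `household_of` is an assumed mapping from person identifier to household identifier.

```python
import random


def split_individuals(person_ids, seed=0):
    """Randomly split individuals into two mutually exclusive halves
    (equal sized when the number of individuals is even)."""
    ids = list(person_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def shared_households(training, validation, household_of):
    """Households with at least one member in each subset."""
    train_hh = {household_of[p] for p in training}
    valid_hh = {household_of[p] for p in validation}
    return train_hh & valid_hh


def drop_shared(subset, household_of, shared):
    """Keep only individuals whose household is not split across subsets."""
    return {p for p in subset if household_of[p] not in shared}


# Toy cohort (hypothetical): 8 people living in 4 households of 2.
household_of = {p: p // 2 for p in range(8)}
training, validation = split_individuals(household_of, seed=1)
shared = shared_households(training, validation, household_of)
training_clean = drop_shared(training, household_of, shared)
validation_clean = drop_shared(validation, household_of, shared)
```

Recomputing sensitivity and specificity on `training_clean` and `validation_clean` is the kind of check the text refers to; splitting by household identifier in the first place would avoid the overlap altogether, at the cost of slightly unequal subset sizes.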
Despite these potential limitations, several factors strengthened our findings. First, the analyses utilized data prospectively collected in the context of a clinical trial for which population information was collected systematically and typhoid ascertainment was done in a comprehensive manner and was confirmed by blood culture positivity. Secondly, the population was relatively stable over time, and thus patterns of migration were unlikely to affect our assessments. And lastly, our dataset was large enough to both train and validate our model in mutually exclusive populations to minimize overfitting.
These results indicate that improvements to WASH short of major municipal infrastructure developments might have an appreciable effect on the risk of typhoid in urban settings of South Asia.
The results reinforce the notion that household transmission was an important contributor to disease transmission in this urban setting, and moreover, that household source of drinking water appears to be a dominant determinant of typhoid risk, although the effect of drinking water source can be modified by a variety of other WASH variables. This evaluation also suggests that the effect of WASH in the household was not modified by the level of WASH in the surrounding neighborhood.
The ability of our analysis to demonstrate that household WASH may influence the risk of typhoid in household residents encourages future studies of this topic. We recommend that future studies characterize WASH in a more comprehensive fashion, including WASH behaviors, and that cohort studies of this topic monitor and analyze changes in household WASH that occur during follow-up. As well, because of the theoretical likelihood that protection conferred by newer generation typhoid vaccines may be modified by the intensity of the typhoid inoculum to which vaccinees are exposed, it will be of great interest to examine the interactions between levels of household WASH and typhoid vaccine protection in future vaccine evaluations.

Conflict of interest
The authors state no commercial or other association that might pose a conflict of interest with regard to the findings presented in this manuscript.