Spatiotemporal Patterns and Diffusion of the 1918 Influenza Pandemic in British India

Abstract The factors that drive spatial heterogeneity and diffusion of pandemic influenza remain debated. We characterized the spatiotemporal mortality patterns of the 1918 influenza pandemic in British India and studied the role of demographic factors, environmental variables, and mobility processes on the observed patterns of spread. Fever-related and all-cause excess mortality data across 206 districts in India from January 1916 to December 1920 were analyzed while controlling for variation in seasonality particular to India. Aspects of the 1918 autumn wave in India matched signature features of influenza pandemics, with high disease burden among young adults, (moderate) spatial heterogeneity in burden, and highly synchronized outbreaks across the country deviating from annual seasonality. Importantly, we found population density and rainfall explained the spatial variation in excess mortality, and long-distance travel via railroad was predictive of the observed spatial diffusion of disease. A spatiotemporal analysis of mortality patterns during the 1918 influenza pandemic in India was integrated in this study with data on underlying factors and processes to reveal transmission mechanisms in a large, intensely connected setting with significant climatic variability. The characterization of such heterogeneity during historical pandemics is crucial to prepare for future pandemics.

Keywords: diffusion; environment; influenza; mobility; pandemic; spatial heterogeneity; transportation; tropics

The 1918 influenza pandemic has left an indelible mark on human history. Significant increases in the number of deaths associated with respiratory conditions and fever were first observed in the United States in March 1918. By autumn 1918, the H1N1 influenza had spread to the rest of the world, facilitated in that newly globalized era by steamship travel and the intense movement of troops during World War I (1,2). Although pandemic death estimates remain disputed, the global toll was placed at 50 million deaths in 1 analysis (3).
The 1918 pandemic has been well described in the United States and Europe (4)(5)(6)(7) and has been characterized for other parts of the Americas (8)(9)(10)(11). This work of the past 2 decades has established the following signature features of influenza pandemics: a shift in the virus subtype, an age shift in deaths to young adults, successive pandemic waves, high transmissibility, and spatial heterogeneity in burden (12). However, our understanding of historical pandemics in Asia remains limited and focuses primarily on the estimation of the impact of deaths (13)(14)(15)(16)(17)(18)(19), with few exceptions (20). Characterizing the environmental, sociodemographic, and evolutionary factors underlying epidemics is crucial to our ability to develop public health countermeasures and implement effective mitigation plans, and requires diverse case studies across climatic and socioeconomic strata. Here, we contribute a case study of the 1918 pandemic in India, a nation that was made up of a largely rural but intensely connected population spread over diverse climatic regions.
Of the 50 million pandemic-associated deaths, 8 million were thought at the time to have occurred in British India (21); however, that number has since been estimated to be closer to 14 million (17,22). This means that 1 in every 23 Indians died during 1918-1919 and that 1 in every 3.5 global pandemic deaths was an Indian; both proportions are underestimates because they only include the areas of India under British rule. It is understood that influenza first hit the province of Bombay in September 1918 and proceeded to spread north and east in a wave-like pattern that slowed and attenuated in severity as it traveled further from its origin (20,22,23).
The study of sociodemographic factors underlying influenza pandemics has received attention in past work, with the most focus on age-specific death risk. According to many studies, young adults experienced a disproportionately high death risk during the 1918 pandemic, whereas older adults had a relatively decreased risk (4,8,9,12,22,24-26). Studies of other demographic features, however, have been few; these include the work of Chowell et al. (5) on the role of urbanization in predicting death burden, with both high population density and very low population density being associated with high death rates.
Studies of the impact of environmental factors on historical pandemics include work on prediction of pandemic emergence based on El Niño cycles (27) and the association between disease and temperature (28) or latitude as a proxy for climate (11). In addition, in recent studies on seasonal outbreaks, the importance of environmental processes has been highlighted by inferring a U-shaped effect of absolute humidity on seasonal influenza prevalence, mediated by temperature (29,30). That is, both low humidity/low temperature environments (as found in temperate regions of the world in winter months) and high humidity/high temperature environments (as found in tropical regions of the world) are hypothesized to increase influenza risk. Also, rainfall has been associated with influenza epidemics in the tropics (31)(32)(33)(34). Some hypotheses supporting this pattern include increased indoor crowding facilitating airborne and droplet transmission, and decreased vitamin D intake depressing innate immune responses (35). These mechanisms may support a contemporaneous or asynchronous association between precipitation and influenza burden (29).
In addition to factors that drive individual-level transmission, population-level processes such as mobility are also crucial to the spatial dynamics of disease. The movement of human hosts provides the scaffolding over which pathogens traverse great spatial distances and has been implicated from the diffusion of plague in pre-industrial Europe along silk trade routes (36) to the spread of Ebola virus via regional connectivity (37) and Zika virus via air travel (38). In rare explanatory studies about spatial diffusion during the 1918 pandemic, Palmer et al. (39) examined the impact of boat and rail traffic on the spread of influenza in Newfoundland through qualitative methods, and Eggo et al. (7) tested whether assorted mobility models predict pandemic dynamics in the United Kingdom and United States. Valleron et al. (40) even implicate surface travel in the spread of the 1889 influenza pandemic through Europe, though they are unable to test this hypothesis because data are unavailable. In India, the railway network began carrying unprecedented numbers of people farther, faster, and more frequently, leading up to the twentieth century (with annual passenger numbers growing from 0.5 million in 1854 to 176 million by 1900 (41)). Simultaneously, infectious disease outbreaks of cholera, plague, malaria, and smallpox were ravaging the country, and Indian rail travel entered the global debate linking human travel to public health. During the year 1918, more than 459 million passengers traveled the Indian railway, and the Sanitary Commissioner of India believed that this played an important role in pandemic influenza spread (20,42).
Here, we characterize the spatial dynamics of excess mortality during the 1918 influenza pandemic in British India with respect to spatial age structure, heterogeneity, and synchrony.
Using key covariate data, we also analyze the underlying environmental factors and social processes that may have contributed to the observed spatial variation and diffusion. In our study of the 1918 influenza pandemic in India, we 1) characterize the spatiotemporal patterns and age structure of excess mortality; 2) explain the spatial variation in excess mortality patterns with demographic and environmental factors; and 3) understand the role of short- and long-distance travel on spatial diffusion during the outbreak. We suggest that the characterization of such heterogeneity during historical pandemics is crucial to our ability to prepare against future pandemics.

Defining pandemic mortality
Death data were obtained from sanitary reports published annually for 206 districts in the provinces or presidencies of Assam, Bengal, Bihar and Orissa, Bombay, Central Province and Berar, Madras, Northwest Frontier Province, Punjab, and the United Provinces. The numbers of monthly fever-related deaths were compiled for all years between 1916 and 1920 at the district level and covered areas of India under British rule, representing approximately 70% of the total population of 318 million in India (93). We also compiled fever-associated, respiratory-associated, and (age-specific and total) all-cause mortality data at the province level for 1916-1920 (43)(44)(45)(46)(47). The primary source indicated that pandemic deaths were preferentially coded as resulting from fever rather than respiratory causes. (In the Web Appendix, we compare these cause-specific and all-cause mortality data; see Web Figure 1, available at https://academic.oup.com/aje.) Consequently, we estimated monthly excess fever-related mortality (above a seasonal baseline) to identify the number of deaths attributed to influenza, using a seasonal regression model in which differences in regional seasonality were controlled for. To provide a finer temporal resolution, we also resampled and interpolated the monthly excess fever-related mortality to produce weekly excess fever-related mortality time series. We used district-specific, weekly excess fever-related mortality for most of our analysis, with 2 exceptions: For our analysis on age-specific death patterns, we used total all-cause mortality because data on age-specific fever-related deaths were not available; for our analysis on environmental drivers of disease burden, we used district-level, monthly excess fever-related mortality because environmental variables were only available at the monthly level. More details on our data and these procedures can be found in Web Appendix 1.
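The idea of excess mortality above a seasonal baseline can be illustrated with a minimal sketch. It is not the seasonal regression model used in the study (which is described in Web Appendix 1 and not reproduced here); as a stand-in, the baseline below is simply the mean of each calendar month over nonpandemic months. The function names, the flooring of excess at zero, and the example series are assumptions made for illustration only; the pandemic window (August 1918 through March 1919) follows the period definition used later in the text.

```python
from collections import defaultdict
from statistics import mean


def monthly_baseline(deaths, pandemic_months):
    """Mean deaths per calendar month, using only nonpandemic months.

    deaths: dict mapping (year, month) -> death count
    pandemic_months: set of (year, month) keys excluded from the baseline
    """
    by_month = defaultdict(list)
    for (year, month), count in deaths.items():
        if (year, month) not in pandemic_months:
            by_month[month].append(count)
    return {month: mean(counts) for month, counts in by_month.items()}


def excess_deaths(deaths, pandemic_months):
    """Observed minus baseline deaths per month (assumption: floored at zero)."""
    baseline = monthly_baseline(deaths, pandemic_months)
    return {
        key: max(0.0, count - baseline[key[1]])
        for key, count in deaths.items()
    }


# Hypothetical district: 100 fever deaths every month except October 1918.
deaths = {(y, m): 100 for y in range(1916, 1921) for m in range(1, 13)}
deaths[(1918, 10)] = 450
pandemic = {(1918, m) for m in range(8, 13)} | {(1919, m) for m in range(1, 4)}
excess = excess_deaths(deaths, pandemic)
```

A regression baseline with harmonic terms would additionally capture trends and smooth seasonal shape; the calendar-month mean is used here only because it keeps the sketch short.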

Defining covariates
Population size data were collected from the 1911 (decennial) Census of India (94) for each district in our data set. Monthly rainfall and monthly minimum temperature data were compiled for 25 districts across all 9 provinces for 1916-1920 from the Sanitary Commissioner's annual report (43)(44)(45)(46)(47).
Data were collected on the number of passengers traveling annually on 59 local railway lines in India (42). On the basis of these data, we constructed a railway travel network, where nodes were districts and an edge existed between 2 districts if 1 or more railway lines connected them. Only railway line origins and final destinations were available. Travel was assumed to be bidirectional on each railway line, and each edge was weighted with the number of annual passengers traveling on the line, if available. After eliminating nodes with no disease data, the railway network consisted of 52 nodes and 41 edges (with edge weight data available on 16 edges).
We also constructed a local travel network where nodes were districts and an edge existed between 2 districts if they shared a physical border. This network represents unobserved travel via roads or waterways. After eliminating nodes with missing disease data, the local travel network consists of 197 nodes and 382 edges.
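The network construction described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the record format, the function name, and the choice to sum passengers when several lines join the same pair of districts are assumptions, and the district labels are placeholders.

```python
def build_network(lines, districts_with_data):
    """Return an undirected edge map {frozenset({a, b}): weight or None}.

    lines: iterable of (origin, destination, annual_passengers or None)
    districts_with_data: set of districts that have mortality data
    """
    edges = {}
    for origin, destination, passengers in lines:
        if origin == destination:
            continue  # a line within one district adds no edge
        if origin not in districts_with_data or destination not in districts_with_data:
            continue  # drop edges touching districts without disease data
        key = frozenset((origin, destination))  # unordered pair: bidirectional travel
        if passengers is None:
            edges.setdefault(key, None)  # edge exists, weight unknown
        else:
            # Assumption: passengers on parallel lines are summed.
            edges[key] = (edges.get(key) or 0) + passengers
    return edges


# Placeholder records: (origin district, final destination, annual passengers).
lines = [("A", "B", 1000), ("B", "A", 500), ("B", "C", None), ("C", "D", 200)]
network = build_network(lines, districts_with_data={"A", "B", "C"})
nodes = set().union(*network) if network else set()
```

The same edge-map structure would serve for the local travel network, with one unweighted edge per pair of districts sharing a border.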

Measuring spatial heterogeneity and synchrony
We examined spatial heterogeneity of excess fever-related mortality with the Lorenz curve, which is used to compare the cumulative distribution of excess deaths to the cumulative population size among districts (ranked smallest to largest) (95). The farther the Lorenz curve is from our expectation (the main diagonal), the greater the spatial heterogeneity in death rates. We further quantified this through the Gini coefficient, which is close to 1 when there is high spatial heterogeneity in death rates and close to zero when excess deaths are directly proportional to population size (i.e., death rates are uniform across districts).
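A minimal sketch of the Lorenz curve and Gini coefficient as described above, under stated assumptions: districts are ranked by population (smallest to largest), the area under the curve is computed with the trapezoid rule, and the coefficient is taken as the absolute value of 1 minus twice that area. The exact formula used in the study is given in the cited reference and Web Appendix 1, so this should be read as one common convention rather than the authors' implementation.

```python
def lorenz_points(populations, excess_deaths):
    """Cumulative (population share, excess-death share) points, from (0, 0) to (1, 1)."""
    pairs = sorted(zip(populations, excess_deaths), key=lambda p: p[0])
    total_pop = sum(p for p, _ in pairs)
    total_deaths = sum(d for _, d in pairs)
    points = [(0.0, 0.0)]
    cum_pop = cum_deaths = 0.0
    for pop, deaths in pairs:
        cum_pop += pop
        cum_deaths += deaths
        points.append((cum_pop / total_pop, cum_deaths / total_deaths))
    return points


def gini(populations, excess_deaths):
    """Twice the (absolute) area between the Lorenz curve and the diagonal."""
    points = lorenz_points(populations, excess_deaths)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return abs(1.0 - 2.0 * area)


# Uniform death rates (deaths proportional to population): no heterogeneity.
uniform = gini([100, 200, 300], [10, 20, 30])
# All deaths in one of two equally sized districts.
concentrated = gini([1, 1], [0, 10])
```

With uniform death rates the curve coincides with the diagonal and the coefficient is zero; concentrating all deaths in one of two equal districts gives 0.5 under this convention.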
To estimate the seasonality of pandemic and nonpandemic seasons, we detected the timing of epidemic peaks in each district by performing a continuous wavelet transformation on the time series of excess fever-related mortality (96,97). Details on these methods can be found in Web Appendix 1.

Examining environmental drivers of disease burden
Among the 25 districts for which rainfall and temperature data were collected, we used 2 time-series generalized linear mixed models to examine the association between excess death rates and environmental factors for months before and during the pandemic period (January 1916 through July 1918 and August 1918 through March 1919, respectively). Excess death rates were transformed as the log of the excess death rates plus 1; environmental data were centered and standardized; district was included as a group (random) effect. For the 2 periods, we compared models where the rainfall predictor had 0-2 month lags to examine synchronous and asynchronous relationships between disease and rainfall.
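The data preparation described above (log transformation of excess death rates, centering and standardizing the environmental predictors, and lagging rainfall by 0 to 2 months) can be sketched as below. The mixed-model fit itself is not shown. Function names are hypothetical, and the use of the population standard deviation for standardization is an assumption, since the text does not specify it.

```python
import math
from statistics import mean, pstdev


def log1p_rates(rates):
    """Transform each excess death rate r to log(r + 1)."""
    return [math.log1p(r) for r in rates]


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def lag(values, k):
    """Shift a monthly series by k months; the first k entries become None."""
    return [None] * k + list(values[:len(values) - k])


def design_rows(excess_rates, rainfall, min_temp, rain_lag):
    """Rows of (response, lagged rainfall, temperature) for one district.

    Months without a lagged rainfall value are dropped.
    """
    y = log1p_rates(excess_rates)
    rain = lag(standardize(rainfall), rain_lag)
    temp = standardize(min_temp)
    return [(yi, ri, ti) for yi, ri, ti in zip(y, rain, temp) if ri is not None]


# Hypothetical 4-month series for a single district.
rows = design_rows(
    excess_rates=[0.0, 0.5, 2.0, 1.0],
    rainfall=[10.0, 20.0, 30.0, 40.0],
    min_temp=[15.0, 17.0, 19.0, 21.0],
    rain_lag=2,
)
```

Rows built this way for each district and lag could then be passed to a mixed-model routine with district as the grouping (random) effect.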

Explaining spatial diffusion of disease
To test hypotheses about spatial diffusion, we considered associations between observed travel networks and the observed infection data using a likelihood-based approach (98). We applied this method to alternative travel networks assuming infection timing for each district coincided with pandemic onset. We defined a pandemic onset date for each district by using weekly excess death data and specifying onset as the first week when the excess death rate was greater than 1 per 1,000 population. Using this pandemic onset date for each district and 3 network hypotheses (i.e., the local travel network and the railway travel network, unadjusted or weighted by passenger fluxes; each described in the previous section "Defining covariates"), we used a likelihood approach to estimate the predictive power of each empirical network to explain the observed patterns of pandemic spatial spread (98). We inferred transmission parameters for network spread and non-network spread, and measured predictive power by comparing each empirical network with a set of null networks. Additional details on this methodology are given in Web Appendix 1.
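The onset definition above (the first week in which the excess death rate exceeds 1 per 1,000 population) can be written as a short function. The function name, argument names, and example numbers are illustrative assumptions; the likelihood-based network comparison itself is not reproduced here.

```python
def onset_week(weekly_excess_deaths, population, threshold_per_1000=1.0):
    """Index of the first week whose excess death rate exceeds the threshold.

    Returns None if the rate never exceeds the threshold.
    """
    for week, deaths in enumerate(weekly_excess_deaths):
        rate_per_1000 = 1000.0 * deaths / population
        if rate_per_1000 > threshold_per_1000:
            return week
    return None


# Hypothetical district of 100,000 people: weekly rates are 0.1, 0.5, 1.5, 3.0 per 1,000.
first_week = onset_week([10, 50, 150, 300], population=100_000)
# A district that never crosses the threshold has no onset date.
no_onset = onset_week([0, 20, 100], population=100_000)
```

Because the comparison is strict, a week at exactly 1 per 1,000 does not count as onset in this sketch.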

Spatial dynamics and age structure
We focused on analyzing the spatial and temporal dynamics of the autumn wave of the 1918 pandemic as measured by the number of fever-related deaths in 206 districts of India. The autumn wave of the pandemic in India started during the first week of September 1918, with shipping traffic into the Bombay port seeding infection (20,23), and lasted through March 1919. Although this wave was concentrated, there was significant heterogeneity among our data on the temporal dynamics of the disease (Figure 1A). All districts had pandemic onset by November 1918, and cases lasted in each district from 2 to 13 weeks. The northern and central parts of the country (particularly parts of the Central Province and Berar, and the Northwest Frontier) experienced the highest death burden, whereas the southern and eastern districts had less pronounced death waves (Figure 1B). The spatial diffusion of the pandemic resembled a wave, starting from the western coast and spreading eastward, as demonstrated previously (20).
In Figure 2A, we compared all-cause mortality across age groups during the autumn wave of the pandemic relative to the influenza-related excess mortality during 1917. Death rates were higher for all age groups compared with those occurring during a recent seasonal outbreak. In particular, the pandemic affected the young, with death rates in individuals aged 20-30 years being 4- to 5-fold the seasonal death rates in the western, central, and northern provinces; the pattern is weak but still detectable in the eastern provinces (including Madras, Bengal, Assam, and Bihar and Orissa), where burden was low. (See also Web Figure 2A in the Web Appendix for relative comparison.) In addition, we highlight that India largely did not experience the elderly sparing observed in other settings (Web Figure 2B).

Spatial heterogeneity and synchrony
Our spatially resolved data set gave us an opportunity to further consider heterogeneity in the patterns of the 1918 influenza pandemic. In particular, we considered the spatial distribution and the temporal dynamics of the outbreak. In Figure 2B, we illustrate the Lorenz curve, which highlights that larger populations were disproportionately responsible for disease burden. The Gini coefficient for the district-level data is moderate (0.27).
In Figure 2C, we considered the seasonality of influenza during the pandemic and during nonpandemic seasons, using a wavelet analysis. Nonpandemic influenza-related excess mortality was characterized by 2 different seasonality profiles (with peaks occurring during the summer or winter) for different geographic locations. On the other hand, the 1918 pandemic was highly synchronous across the country, regardless of geography and nonpandemic seasonality.

Environmental drivers of spatial variation in disease burden
We examined associations between environmental drivers and peak influenza activity. First, we validated existing hypotheses about interpandemic seasons in which it is suggested that high rainfall is associated with high excess mortality burden in tropical regions and low temperature (correlated with low humidity) is associated with high excess mortality burden in temperate regions (99). Next, we explored possible environmental associations with pandemic influenza (27,100,101). Because the mechanisms behind these hypotheses lead to a synchronous or asynchronous association with rainfall, we examined 6 models that had 1 of 0- to 2-month lags for the rainfall predictor (but no lag for the minimum temperature predictor) for the pandemic and nonpandemic periods.
During the nonpandemic period, rainfall was positively predictive of spatial variation in excess mortality burden at all lags, whereas minimum temperature was largely insignificant or demonstrated a small (negative) effect size (Table 1). In contrast, influenza burden during the pandemic period was negatively predicted by rainfall at no lag or a lag of 1 month (Table 2). The 2-month lag model provided the best fit for the nonpandemic period, whereas the no-lag model provided the best fit for the pandemic; however, all models had comparable Akaike Information Criterion values. We note that all models suffered from heteroscedasticity, despite log transformation of the response data.

Human mobility and spatial diffusion of disease
Our analysis of the spatiotemporal patterns of the 1918 pandemic in India led us to 2 hypotheses about the spatial diffusion of influenza during this outbreak: 1) The wave-like pattern observed in Figure 1A and Figure 3A, as well as noted by Chandra and Kassens-Noor (20), supports spread via local travel (e.g., road, waterway); and 2) the spatial heterogeneity and spatial synchrony seen in Figure 2C instead support spread via long-distance travel (e.g., railway).
We tested these hypotheses through a likelihood-based approach (96) by testing the ability of each travel network (local (Figure 3B), rail (Figure 3C), and weighted rail (Web Figure 3) networks) to predict the observed spatial progression of disease through the country (Figure 3A). All 3 networks were significant in explaining the observed disease patterns when compared against the null (Table 3), conditional on the network transmission parameter, β, and non-network transmission parameter, ε. In Web Appendix 2, we tested the sensitivity of these results to our assumptions and found they were robust (Web Table 1).

DISCUSSION
We have presented an analysis of the spatiotemporal spread of the autumn wave of the 1918 influenza pandemic among districts of British India. According to our findings, the spread of the 1918 H1N1 influenza virus was rapid and synchronous across the country but resulted in varying disease burden across regions along an east-west gradient. We show that the spatial variation in infection burden is explained by environmental drivers and that spatial diffusion of disease is predicted by long-distance mobility patterns.
The historical death data presented in this study are subject to limitations, notably in the coding for cause of death. Although our use of fever-related deaths is validated by the primary source, fever-related deaths also include other seasonal infectious disease causes such as malaria. The seasonality of malaria has been identified as May-September during that era, so this would be a confounding factor for those provinces in our study that have monsoon influenza seasonality. Of these, a few are known to be hyperendemic areas (e.g., parts of Central Province and Berar, Bihar and Orissa, Assam) (102). However, given our focus on the autumn wave of the 1918 pandemic and because low malaria burden during 1918 was reported in historical records, we expect this to have limited impact on our findings (21).
Influenza seasonality remains poorly characterized, particularly in low-income countries and in the tropics (29,103-105). The distinct seasonality we observed in India during nonpandemic excess mortality activity in 1916-1920 coincides with the climate zones of India, based on the Köppen classification (106), with the northeastern region classified as "humid semitropical" and distinct from surrounding regions. Our seasonality findings are also largely consistent with those of studies of contemporary influenza seasonality in India (107)(108)(109) and in other countries with mixed climates (104). However, we note that our methods are unable to disentangle excess mortality caused by influenza from deaths due to other pathogens with similar symptoms (e.g., malaria), and this inability may affect our understanding of nonpandemic seasonality.
We observed the signature "W" pattern of 1918 age-specific death among the provinces of India, with the highest death rates among infants, followed by older adults and adults. This pattern is similar to what has been found in other countries outside Europe and North America during the 1918 pandemic, including Colombia (10), Mexico (8), and Brazil (110). (Comparison of age-specific death data from other countries is available in Web Figure 2.) High-income countries have reported relatively low death rates among the elderly, but this was not observed in Indian populations, which suggests the elderly in the Indian population may not have been exposed to the 1830s global pandemic virus or its descendants (4,26). We also observed similarity in the age-specific death curves among provinces with the same seasonal influenza patterns (following the east-west gradient), where western districts with temperate-region seasonal influenza patterns tended to have higher death rates. Our findings may be limited by our use of all-cause mortality data.
Although we could not examine absolute and relative humidity (30), we considered the role of rainfall in predicting death burden during the pandemic compared with nonpandemic years. India experiences complex seasonal influenza dynamics due to its size and climatic diversity. During nonpandemic periods, we found distinct seasonal patterns according to regional climatic profiles, and rainfall was positively associated with the magnitude of excess mortality, thus providing evidence for the increased crowding or decreased micronutrient hypothesis (35). On the other hand, the 1918 pandemic in India had highly synchronous dynamics that supplanted distinct seasonality in different regions of the country. We thus hypothesize that sociodemographic and immunological factors may have dominated environmental ones to synchronize the timing of the autumn pandemic wave, as has been observed for the recent 2009 H1N1 pandemic (111). The magnitude of the autumn pandemic wave, however, was still modulated by environmental factors, with pandemic disease burden being inversely proportional to rainfall. We speculate that this surprising result can be explained by the link between environment and nutrition. The year 1918 brought one of the most severe droughts of the twentieth century to India, except in the northeastern region, which received excess rain during the monsoon season (June to August) (112,113). These dry conditions, although beneficial for depressing other infectious diseases such as malaria and plague, led to food and milk shortages, thus increasing susceptibility to infection (21,114,115). Our findings support this hypothesis in the models with an asynchronous association between rainfall and influenza burden. Our results also show a synchronous inverse relationship between precipitation and disease (in the lag-0 pandemic models), suggesting a correlation between rainfall and absolute humidity, but such a relationship would require additional data to test.
Figure 2. Spatial age structure, heterogeneity, and synchrony of the 1918 influenza pandemic in British India. A) Age-specific, all-cause mortality during the 1918 pandemic autumn wave, by province. For comparison, we considered the age-specific, all-cause deaths during the 1917 seasonal influenza epidemic for all of India. Age groups are defined as follows: young children, <5 years; school children, 5-20 years; young adults, 20-30 years; adults, 30-50 years; older adults, ≥50 years. B) The Lorenz curve illustrates the distribution of the total number of excess deaths for the 1918-1919 pandemic wave as a function of cumulative (ascending) population size at the district level. The blue circles represent the empirical data that demonstrate moderate heterogeneity (with a Gini coefficient of 0.27); the gray dashed line shows the null, which represents no heterogeneity in death rates. C) A wavelet analysis illustrates the synchrony of peak time of influenza activity during pandemic (1918-1919) and nonpandemic (1916-1917 and 1917-1918) periods. Each violin plot shows the occurrence of excess mortality peaks across districts within a given province between June of the first year and May of the subsequent year. Central P, Central Province.
Beyond environmental factors, we also sought to identify demographic and social processes that could explain the observed spatial dynamics. First, by constructing a Lorenz curve, we identified spatial heterogeneity in the burden of the pandemic in British India and found this burden was nonlinearly associated with population size. Our finding of a Gini coefficient of 0.27 is comparable to those reported for rural areas of England and Wales for the 1918 pandemic (5). Second, we focused on the impact of host-movement dynamics on the spatial spread of disease.
Past studies have identified 2 classes of spatial dynamics for influenza: 1) a local and radially diffusive wave of spread as observed by Gog et al. (111); and 2) hierarchical spread starting at populous centers (connected by long-distance travel) with subsequent spread to smaller areas (7,116,117). Disentangling the hypotheses of wave-like versus hierarchical spread is key to our understanding of transmission mechanisms and to targeting control measures. Our descriptive spatiotemporal analysis and past work on the Indian pandemic offer preliminary support for both hypotheses; we therefore used a data-driven statistical approach to test them. Our findings provide significant evidence for long-distance travel (via the rail network) and for short-distance travel (via a local travel network), thus supporting the hierarchical-spread hypothesis. Other modes of transportation (e.g., shipping traffic, troop movements during World War I) may have also contributed to host mobility and infection seeding (particularly in the port cities of Madras and Calcutta). However, our findings provide a parsimonious explanation for the observed spread without these alternative modes.
Although the intense connectivity provided by rail travel may have been a key player in the propagation of the pandemic, the railways were also a focus of public health monitoring and biosecurity in India. Devastating plague outbreaks motivated an extensive and coordinated entry and exit screening medical surveillance system, of which the railways were a part, starting in 1897 (41). In addition, railway carriages were disinfected intensely. Modern outbreaks of severe acute respiratory syndrome and the 2009 H1N1 influenza have brought into focus the limited impact of travel restrictions and travel surveillance, given that the reductions necessary to significantly affect spatial spread are not feasible in practice (118,119). Our findings about the role of travel in the pandemic spread (particularly the support of the railway network weighted by passengers) either confirm travel surveillance was also not very effective in 1918 or suggest the pandemic burden would have been worse in the absence of these public health efforts.
We limited our current study to the autumn wave of 1918 because it was the largest wave in India. Recent work has highlighted the importance of the "herald waves" that have been documented in North America and Europe ahead of the autumn wave (120). Our data show limited evidence of high influenza-related death rates during April to May 1918, particularly in the districts of the United Provinces of India (Figure 1). Although this epidemic may have been a herald wave, perhaps explaining the United Provinces' relatively mild autumn pandemic wave, the United Provinces' year-round excess mortality activity makes it difficult to distinguish from a seasonal influenza outbreak.
This study contributes to our understanding of spatial variation and diffusion during the 1918 influenza pandemic. India of 1918 represents a unique case study with a highly rural population in a climatically diverse setting, intraconnected via railways and interconnected with the rest of the globe through shipping traffic. The lack of elderly sparing and a largely missing herald wave place the 1918 pandemic in India with other rural and isolated populations, although the early, large, and fast autumn wave makes it similar to the pandemic dynamics of large, connected locations. Our findings provide a parsimonious explanation of the spatial dynamics of the pandemic in India via environmental and social processes. In particular, our work highlights the role of rainfall in emerging infectious disease dynamics. As our society moves into an increasingly water-stressed future, we advocate that pandemic planning should better integrate an understanding of environmental extremes and how they feed into social, agricultural, and economic processes affecting disease transmission.
We thank Sarah Kramer for data collection and analysis early in this study and Angela Wong for assistance with covariate data digitization. We also thank Dr. Patrick Manning and the Collaborative for Historical Information and Analysis at the University of Pittsburgh for collaboration on data collection.
Conflict of interest: none declared.