ABSTRACT

Background

Disparities in cross-regional coronavirus disease 2019 (Covid-19) mortality remain poorly understood. The association between pre-epidemic health and epidemic mortality can inform a policy response to future outbreaks.

Method

We conducted an ecological study of the association between the cumulative deaths attributed to the Covid-19 epidemic in the 20 Italian regions and nine determinants of population health derived from a systematic review of the literature. We used multiple least squares regression to predict the cross-regional variation in mortality observed from the onset of the epidemic to 23 September 2020.

Results

Four independent variables best explained the cross-regional differences in the number of deaths attributed to Covid-19: the force of infection, population density, number of elderly living in assisted facilities and the standard rate of diabetes. The semi-partial correlation coefficients suggest that the force of infection and the number of elderly residents in nursing homes were the dominant predictors of the number of deaths attributed to Covid-19. Statistical controls and validation confirmed the generalizability of the predictive model.

Conclusions

Our findings indicate that a substantial reduction of social contacts in the main metropolitan areas and the timely isolation of elderly and diabetic residents could significantly reduce the death toll of the next wave of Covid-19 infection in Italy.

Introduction

The uncertainty of the second wave of coronavirus disease 2019 (Covid-19) infections hangs over a society already physically and mentally exhausted by lockdowns and social distancing. Future scenarios for the next Covid-19 infection vary from localized outbreaks to a new, stronger epidemic.1 Policymakers seem to rely on targeted lockdowns as a trade-off between reducing the risk of further outbreaks and allowing for some degree of social life. There is still no consensus, though, on a public policy to respond to a new wave of Covid-19.

The Covid-19 outbreak found most countries unprepared to face a fast-spreading threat to public health.2 The first Italian case of Covid-19 secondary transmission was identified in Codogno, a town close to Milan, on 18 February 2020. Only 200 days later, on 23 September 2020, 302 537 confirmed cases and 35 758 deaths were attributable to Covid-19 infection. The 20 Italian regions paid very different tolls to Covid-19. Mortality rates (deaths per 100 000 residents) and fatality rates (deaths per infected patient) observed in the Italian regions vary widely around the national mean. Mortality rates ranged from 168 deaths per 100 000 residents in Lombardy to about five in Sicily, Calabria and Basilicata. Fatality rates show the same variability, from 0.16 in Lombardy to 0.04 in Umbria, Molise, Campania, Basilicata and Sardinia (Fig. 1).3

Fig. 1

Cumulative mortality and fatality rates stratified by Italian regions from the onset of the epidemic to 23 September 2020 (data source: Italian MinSan).

Disparities in cross-regional Covid-19 epidemic mortality remain poorly understood. This ecological study uses cumulative regional data on the observed number of deaths attributed to Covid-19 to explore the determinants of epidemic mortality. The variables of the model (the ‘predictors’) could inform a data-driven response to the next wave of Covid-19.

Method

The data underlying this article are available in the article and its online supplementary material.

Study design

We performed an ecological study of associations between the cumulative number of deaths attributed to Covid-19 and the determinants of population health derived from a systematic review of the literature. The number of regionally stratified deaths attributable to Covid-19 was derived from the civil protection database and referred to the epidemic period between 1 January 2020 and 23 September 2020. We treated the Covid-19 deaths as a ‘mortality space’ in which each of the 20 Italian regions can be located by its values along the dimensions of that space. We then fitted an equation whose form was adequate to predict the number of Covid-19 deaths by region with the smallest margin of error. The ‘mortality space’ is thus defined by the predictors and the coefficients of this equation.4

This analysis is divided into four main methodological steps: literature review, variable selection, regression analysis and model validation. Previously published models of Covid-19 mortality have been shown to carry three critical risks of bias: the selection of predictors, the method of analysis and, most importantly, the lack of validation of the model.5 To mitigate these known sources of bias a priori, we first performed a systematic review of the literature to inform the selection of predictors and the method of analysis.

The systematic review was conducted in adherence to the PRISMA guidelines,6 and the search protocol was published in the International Prospective Register of Systematic Reviews (PROSPERO).7 We identified and critically appraised 56 studies reporting predictors of mortality attributable to Covid-19. We extracted 12 potential predictors of Covid-19 mortality from the models included in the review, listed below in descending order of frequency (in parentheses): age (28), population tested/swabs administered (20), pre-existing medical conditions (19), the severity of the Covid-19 outbreak (15), gender (13), exposure to air pollution (12), hospital resources/health spending (11), Gross Domestic Product (GDP) per capita/income inequality/deprivation (11), ethnicity (7), population density (6), climate (4) and number of elderly residents in assisted living facilities (1).

Table 1

Selection of predicting variables

Dependent variable: deaths attributable to COVID-19

| Independent variable | Correlation coefficient r | Significance level* | 95% confidence interval of r |
| --- | --- | --- | --- |
| Elderly in assisted living facilities | 0.8993 | P < 0.0001 | 0.7588 to 0.9599 |
| Total number of swabs administered | 0.8381 | P < 0.0001 | 0.6288 to 0.9342 |
| Population density | 0.7084 | P = 0.0005 | 0.3873 to 0.8762 |
| GINI index | −0.1259 | P = 0.5968 | −0.5384 to 0.3353 |
| Exposure to pollution | 0.7407 | P = 0.0002 | 0.4436 to 0.8911 |
| Attack rate | 0.6815 | P = 0.0009 | 0.3421 to 0.8636 |
| Healthcare public expenditure per capita | 0.01704 | P = 0.9431 | −0.4287 to 0.4561 |
| Population > 75 as % of total population | 0.0721 | P = 0.7626 | −0.3826 to 0.4987 |
| Males as % of population > 75 | −0.3808 | P = 0.0976 | −0.7046 to 0.07423 |
| GDP per capita | 0.6269 | P = 0.0031 | 0.2552 to 0.8372 |
| African and Asian residents | 0.8463 | P < 0.0001 | 0.6455 to 0.9377 |
| Difference of maximum temperature from mean (°C), February 2020 (source: CNR data) | 0.1131 | P = 0.6349 | −0.3468 to 0.5291 |
| Std mortality rate | −0.05728 | P = 0.8104 | −0.4875 to 0.3953 |
| Co-morbidity rate | −0.4189 | P = 0.0660 | −0.7267 to 0.02898 |
| Std rate diabetes | −0.4625 | P = 0.0401 | −0.7512 to −0.02505 |
| Std rate hypertension | −0.1890 | P = 0.4249 | −0.5828 to 0.2767 |
| Std rate obesity | −0.5094 | P = 0.0218 | −0.7768 to −0.08638 |

*Variables with P ≤ 0.05 were considered acceptable for inclusion in the model (highlighted with green boxes in the original table).

Our second methodological step was to transform the 12 predictors identified by the systematic review into inputs to the predictive model. This required adapting the predictors to the granularity of the data available at Italian regional level. The process generated 17 independent variables grouped into five main domains: five demographic variables (age, gender, population density, ethnicity and elderly living in assisted living facilities), three economic variables (GDP per capita, income inequality and public expenditure on healthcare), two variables related to Covid-19 infection (force of infection and number of swabs carried out), five variables describing the population’s pre-existing medical conditions (standard rates of mortality, diabetes, hypertension, obesity and comorbidity) and two environmental variables (air pollution and climate). For each predictor considered, data for the 20 Italian regions were extracted from primary national sources (see supplementary material Table 4).

Thirdly, a univariate correlation analysis was performed for each independent variable to evaluate its association with the cumulative deaths attributed to Covid-19. The variables showing a significant association (P ≤ 0.05) with the cumulative deaths attributable to Covid-19 were included in the multiple regression. Finally, predicted and actual values were compared to validate the predictive accuracy of the model.

Statistical analysis

A univariate correlation analysis was performed to test the statistical association between the number of deaths attributed to Covid-19 in each Region and the 17 independent variables identified by the literature review. Nine independent variables, whose correlation coefficient showed a significance level of P ≤ 0.05, were included in the final regression model (Table 1).
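As an illustration of this univariate step, the sketch below computes a Pearson correlation coefficient and a 95% confidence interval through the Fisher z-transformation. The Fisher method is an assumption, since the text does not name the interval method, although it appears to reproduce the interval reported in Table 1 for the standard rate of diabetes to about three decimals. The function names `pearson_r` and `fisher_ci` are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval of r via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)
    # Standard error of z is 1 / sqrt(n - 3); scale it by the normal quantile.
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Standard rate of diabetes in Table 1: r = -0.4625 across n = 20 regions.
low, high = fisher_ci(-0.4625, 20)
print(round(low, 3), round(high, 3))
```

With only 20 regions the intervals are wide, which is why several variables with moderate coefficients in Table 1 still have intervals that include zero.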

We used multiple least squares regression to estimate the coefficients of the predictive model. Since the variables’ values spanned nine orders of magnitude (from millions to decimals), we transformed all inputs to the model into their natural logarithms.8 A forward stepwise selection was adopted to add the relevant variables to the model (Pearson correlation coefficient with P ≤ 0.05). The selected variables were added one at a time, beginning with the predictor with the highest correlation with the dependent variable. If an added variable did not improve the goodness-of-fit of the regression, it was excluded from the regression model.9
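The selection procedure can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes NumPy, uses adjusted R² as the goodness-of-fit criterion (the text does not state which measure was used), and the function names are hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least squares fit of y on X (intercept added)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = float(np.sum((y - A @ beta) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(X, y):
    """Return the column indices kept by forward selection on log-transformed data."""
    log_X, log_y = np.log(X), np.log(y)
    # Rank candidates by absolute correlation with the log dependent variable.
    corr = [abs(np.corrcoef(log_X[:, j], log_y)[0, 1]) for j in range(log_X.shape[1])]
    order = np.argsort(corr)[::-1]
    selected, best = [], -np.inf
    for j in order:
        trial = selected + [int(j)]
        score = adjusted_r2(log_X[:, trial], log_y)
        if score > best:  # keep the variable only if the fit improves
            selected, best = trial, score
    return selected
```

All inputs must be strictly positive for the logarithm to be defined, which holds for counts and rates such as those used as inputs to the model.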

Finally, statistical controls and a validation process were used to test the generalizability of our model. We calculated the zero-order simple regression coefficients and then reported the statistical controls related to the least-squares multiple regression. The accuracy of the predictive model was measured as the mean absolute percentage error (MAPE), that is, the mean of the absolute differences between the actual and the predicted values of the dependent variable, expressed as a percentage of the actual values.10 We used the following scale for the comparison and interpretation of MAPE values: MAPE < 10, highly accurate forecasting; 10 ≤ MAPE ≤ 20, good forecasting; 20 < MAPE ≤ 50, reasonable forecasting; and MAPE > 50, inaccurate forecasting.11
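A minimal sketch of the accuracy measure and of the interpretation scale is given below. The function names are hypothetical, and the handling of the boundary values (10 and 20) is an assumption, since the scale as stated leaves them ambiguous.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def lewis_rating(value):
    """Map a MAPE value onto the four-level interpretation scale."""
    if value < 10:
        return "highly accurate"
    if value <= 20:
        return "good"
    if value <= 50:
        return "reasonable"
    return "inaccurate"


# Two hypothetical observations, each predicted with a 10% error.
print(lewis_rating(mape([100, 200], [110, 180])))
```

The per-region MAPE values in Table 3 appear to be computed on the natural-log values of actual and predicted deaths rather than on the raw counts, so the same function would be applied to the log-transformed columns to obtain comparable figures.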

The predictive performance was tested through a validation process aimed at avoiding the ‘ecological fallacy’, which occurs when associations that exist for groups are assumed to also be true for individuals.12 We validated the selection of variables included in the model using randomly generated ‘training sets’. The accuracy of the model was assessed by comparing the MAPE of the original regression with the mean MAPE obtained from the training sets.

Results

Univariate association of all the independent variables identified by the review of the literature

The univariate correlation values confirmed a strong correlation between nine predictors and the number of deaths attributed to Covid-19, as reported in Table 1. The univariate analysis seemed to question the predictive validity of some variables frequently used to model Covid-19 mortality. Population ageing, gender and the rate of comorbidity were poorly or negatively correlated with the number of deaths attributable to Covid-19.

Table 2

Inputs to the predictive model (non-transformed values)

Dependent variable: deaths attributed to COVID-19 from onset to 23/09/2020. All other columns are the independent variables included in the predictive model.

| Italian region | Deaths attributed to COVID-19 from onset to 23/09/2020 | Population density (residents per km²) | Number of African and Asian residents | Elderly in assisted living facilities | GDP per capita (Euro) | Attack rate (COVID-19 confirmed cases per 100 000 residents) | Cumulative number of swabs from onset to 06/06/2020 | Standard rate of diabetes | Standard rate of obesity | Exposure to air pollution (number of 2019 days over the limit times residents exposed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Piedmont | 4157 | 172 | 665 780 | 36 279 | 30 300 | 793.7 | 682 282 | 4.9 | 39.3 | 158 856 813 |
| Valle d’Aosta | 146 | 39 | 3665 | 992 | 35 200 | 1021.8 | 27 977 | 4.9 | 39.9 | 1 156 272 |
| Liguria | 1594 | 286 | 79 795 | 11 085 | 29 678 | 823.5 | 291 936 | 4.4 | 41.3 | 26 727 702 |
| Lombardy | 16 925 | 422 | 711 779 | 78 306 | 38 200 | 1045.9 | 1 990 912 | 4.7 | 39.3 | 234 554 214 |
| Trentino Alto Adige | 697 | 79 | 34 174 | 8326 | 39 200 | 853.2 | 383 677 | 3.5 | 38.3 | 5 441 248 |
| Veneto | 2167 | 267 | 180 449 | 37 073 | 33 100 | 533.1 | 1 820 101 | 4.4 | 43.2 | 107 437 139 |
| Friuli Venezia Giulia | 350 | 153 | 34 709 | 11 343 | 31 000 | 367.1 | 388 810 | 5.0 | 40.2 | 4 470 953 |
| Emilia Romagna | 4479 | 199 | 268 097 | 28 991 | 35 300 | 776.1 | 1 110 287 | 5.2 | 43.4 | 137 369 149 |
| Tuscany | 1153 | 162 | 192 466 | 17 864 | 30 500 | 377.0 | 694 204 | 5.2 | 43.2 | 17 941 648 |
| Umbria | 85 | 104 | 34 520 | 2516 | 24 300 | 256.9 | 192 524 | 5.3 | 42.8 | 5 205 203 |
| Marche | 989 | 162 | 61 899 | 7067 | 26 600 | 511.5 | 236 514 | 4.7 | 39.1 | 1 162 392 |
| Lazio | 902 | 341 | 303 854 | 15 442 | 32 900 | 254.7 | 810 809 | 6.0 | 44.7 | 80 247 263 |
| Abruzzo | 477 | 121 | 27 699 | 7721 | 24 400 | 324.0 | 189 052 | 5.4 | 47.8 | 6 335 168 |
| Molise | 23 | 69 | 6716 | 1735 | 19 500 | 203.8 | 40 516 | 5.6 | 51.1 | 1 398 397 |
| Campania | 457 | 424 | 134 193 | 3328 | 18 200 | 188.0 | 552 231 | 6.8 | 51.5 | 40 938 370 |
| Puglia | 583 | 206 | 56 066 | 8052 | 18 000 | 179.5 | 385 490 | 6.3 | 50.2 | 7 868 080 |
| Basilicata | 28 | 56 | 8755 | 1197 | 20 800 | 118.9 | 68 081 | 6.6 | 48.9 | 3 338 450 |
| Calabria | 98 | 128 | 47 464 | 3910 | 17 100 | 96.9 | 190 031 | 7.3 | 48.0 | 1 412 670 |
| Sicily | 303 | 194 | 108 285 | 14 856 | 17 400 | 124.7 | 448 412 | 5.9 | 47.4 | 1 350 200 |
| Sardinia | 145 | 68 | 27 742 | 4966 | 20 300 | 207.7 | 175 829 | 5.3 | 42.1 | 37 069 268 |

Table 3

Model predictors and regression results by Italian region

Columns 2 to 10 report the model predictions (natural log values, MAPE, and actual, predicted and relative values); the last four columns report the actual values of the predictors.

| Italian region | ln deaths attributed to COVID-19 | ln deaths predicted by the model | Mean Absolute Percent Error (MAPE) | Deaths attributed to COVID-19 | Deaths predicted by the model | Deaths attributed to COVID-19 (per 100 000 residents) | Deaths predicted by the model (per 100 000 residents) | Mortality rate (deaths/Covid-19 cases) | Mortality rate predicted by the model (deaths/Covid-19 cases) | Attack rate (Covid-19 cases per 100 000 residents) | Elderly living in RSA | Population density (residents per km²) | Incidence of diabetes (standard rate) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lombardy | 9.74 | 9.78 | 0.46 | 16 925 | 17 694 | 168 | 176 | 0.16 | 0.17 | 1046 | 78 306 | 422 | 4.70 |
| Valle d’Aosta | 4.98 | 4.99 | 0.07 | 146 | 147 | 116 | 117 | 0.11 | 0.11 | 1022 | 992 | 39 | 4.90 |
| Emilia Romagna | 8.41 | 8.38 | 0.35 | 4479 | 4348 | 100 | 97 | 0.13 | 0.13 | 776 | 28 991 | 199 | 5.20 |
| Piedmont | 8.33 | 8.37 | 0.45 | 4157 | 4318 | 95 | 99 | 0.12 | 0.12 | 794 | 36 279 | 172 | 4.90 |
| Liguria | 7.37 | 7.57 | 2.62 | 1594 | 1934 | 103 | 125 | 0.12 | 0.15 | 823 | 11 085 | 286 | 4.40 |
| Trentino Alto Adige | 6.55 | 6.14 | 6.22 | 697 | 464 | 65 | 43 | 0.08 | 0.05 | 853 | 8326 | 79 | 3.50 |
| Veneto | 7.68 | 7.91 | 2.98 | 2167 | 2724 | 44 | 56 | 0.08 | 0.10 | 533 | 37 073 | 267 | 4.40 |
| Friuli Venezia Giulia | 5.86 | 6.43 | 9.81 | 350 | 622 | 29 | 51 | 0.08 | 0.14 | 367 | 11 343 | 153 | 5.00 |
| Northern regions | | | 2.87 | 30 515 | 32 250 | 110 | 116 | 0.13 | 0.14 | 823 | 212 395 | 231 | 4.71 |
| Marche | 6.90 | 6.40 | 7.26 | 989 | 599 | 65 | 39 | 0.13 | 0.08 | 511 | 7067 | 162 | 4.70 |
| Abruzzo | 6.17 | 5.99 | 2.88 | 477 | 399 | 36 | 30 | 0.11 | 0.09 | 324 | 7721 | 121 | 5.40 |
| Tuscany | 7.05 | 6.94 | 1.57 | 1153 | 1032 | 31 | 28 | 0.08 | 0.07 | 377 | 17 864 | 162 | 5.20 |
| Lazio | 6.80 | 7.06 | 3.79 | 902 | 1168 | 15 | 20 | 0.06 | 0.08 | 255 | 15 442 | 341 | 6.00 |
| Puglia | 6.37 | 5.90 | 7.36 | 583 | 365 | 14 | 9 | 0.08 | 0.05 | 179 | 8052 | 206 | 6.30 |
| Campania | 6.12 | 5.86 | 4.35 | 457 | 350 | 8 | 6 | 0.04 | 0.03 | 188 | 3328 | 424 | 6.80 |
| Sardinia | 4.98 | 4.68 | 5.88 | 145 | 108 | 9 | 7 | 0.04 | 0.03 | 208 | 4966 | 68 | 5.30 |
| Umbria | 4.44 | 4.68 | 5.36 | 85 | 108 | 10 | 12 | 0.04 | 0.05 | 257 | 2516 | 104 | 5.30 |
| Sicily | 5.71 | 5.73 | 0.36 | 303 | 309 | 6 | 6 | 0.05 | 0.05 | 125 | 14 856 | 194 | 5.90 |
| Molise | 3.14 | 3.96 | 26.28 | 23 | 52 | 8 | 17 | 0.04 | 0.08 | 204 | 1735 | 69 | 5.60 |
| Calabria | 4.58 | 4.56 | 0.58 | 98 | 95 | 5 | 5 | 0.05 | 0.05 | 97 | 3910 | 128 | 7.30 |
| Basilicata | 3.33 | 3.19 | 4.30 | 28 | 24 | 5 | 4 | 0.04 | 0.04 | 119 | 1197 | 56 | 6.60 |
| Southern regions | | | 5.83 | 5243 | 4611 | 16 | 14 | 0.07 | 0.06 | 228 | 88 654 | 179 | 6.02 |
| Total tally | | | 4.65 | 35 758 | 36 861 | 59 | 61 | 0.12 | 0.12 | 501 | 301 049 | 200 | 5.42 |

Multiple least squares regression

The nine predictors included in the model were regressed against the cumulative number of deaths attributable to Covid-19 by each of the 20 Italian regions. Dependent and independent variables’ values are reported in Table 2.

Four independent variables best predicted the number of deaths attributed to Covid-19: the force of infection (attack rate), the number of elderly living in assisted facilities, the population density and the standard rate of diabetes.

The predictive model equation was:

ln(y) = −14.9165 + 1.2950 ln(attack rate) + 0.7841 ln(elderly in RSA) + 0.5985 ln(population density) + 2.0941 ln(standard rate of diabetes)    (1)

where y is the cumulative number of deaths attributed to Covid-19 in a region.
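As a worked check, the equation can be evaluated for a single region. The sketch below plugs in the values reported for Lombardy in Table 2 (attack rate 1045.9, 78 306 elderly in RSA, 422 residents per km², standard rate of diabetes 4.7); the function and dictionary names are hypothetical.

```python
import math

# Coefficients as reported in the predictive model equation.
COEFFICIENTS = {
    "intercept": -14.9165,
    "attack_rate": 1.2950,
    "elderly_in_rsa": 0.7841,
    "population_density": 0.5985,
    "diabetes_rate": 2.0941,
}


def predict_deaths(attack_rate, elderly_in_rsa, population_density, diabetes_rate):
    """Predicted cumulative deaths for one region from the log-log equation."""
    ln_y = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["attack_rate"] * math.log(attack_rate)
        + COEFFICIENTS["elderly_in_rsa"] * math.log(elderly_in_rsa)
        + COEFFICIENTS["population_density"] * math.log(population_density)
        + COEFFICIENTS["diabetes_rate"] * math.log(diabetes_rate)
    )
    return math.exp(ln_y)  # back-transform from the natural-log scale


lombardy = predict_deaths(1045.9, 78306, 422, 4.7)
print(round(lombardy))  # close to the 17 694 deaths predicted for Lombardy in Table 3
```

Small differences from the tabulated prediction are expected because the coefficients are rounded to four decimals. Since both sides of the equation are in natural logarithms, each coefficient can be read as an elasticity: a 1% increase in the attack rate corresponds to roughly a 1.3% increase in predicted deaths, the other predictors being held constant.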

The signs of the regression coefficients were as expected, with all four predictors positively associated with the dependent variable.

Predictors and regression results are reported in Table 3. Compared to the actual number of deaths attributed to Covid-19 in each of the 20 Italian regions, the values predicted by the model showed a MAPE = 4.65 (standard deviation = 0.33; standard error of the mean = 0.07). The Lewis scale11 rated the predictive model as ‘highly accurate’. Only Molise, one of the smallest Italian regions, showed a MAPE value above 10.

Statistical controls

The regression statistics confirmed that the model selected a parsimonious number of independent variables (n = 4), significantly associated with the dependent variable (coefficient P-values < 0.05) and only modestly collinear with one another (variance inflation factor, VIF, values < 4). The predictive model showed a high coefficient of determination (adjusted R² = 0.95) and a high level of significance (P < 0.0001). Residuals were approximately normally distributed (P = 0.3967).
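For readers unfamiliar with the VIF, the sketch below shows how it is conventionally computed: each predictor is regressed on the remaining predictors, and VIF = 1 / (1 − R²) of that auxiliary regression. This is an illustration assuming NumPy, not the software used for the study, and the function name `vif` is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (one value per predictor)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of predictor j on the other predictors.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        ss_res = float(np.sum((target - A @ beta) ** 2))
        ss_tot = float(np.sum((target - target.mean()) ** 2))
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return factors


# Two uncorrelated predictors give a VIF of 1 for each column.
print(vif([[1, 1], [1, -1], [-1, 1], [-1, -1]]))
```

A VIF of 1 indicates a predictor that is uncorrelated with the others, and the value grows without bound as a predictor approaches a linear combination of the rest.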

The semi-partial correlation coefficients provided an indication of the share of the dependent variable’s variance explained by each independent variable. The force of infection and the number of elderly residents in hospices and retirement homes were the dominant predictors of deaths attributed to Covid-19 (r semi-partial = 0.31), followed by the population density (r semi-partial = 0.14) and the standard rate of diabetes (r semi-partial = 0.11).

Validation

We validated the selection of the predictive variables in the regression equation: attack rate, elderly living in assisted homes, population density and standard rate of diabetes. We randomly allocated 70% of the available dataset (16 regions) to the ‘training set’ and used the remainder of the dataset (4 regions) as a validation set. We repeated the random allocation of regions until each region had been included in the validation set at least once. The mean MAPE value obtained from the nine random validation tests (10.99; resampled CI 8.5–14.3) was higher than the error of the predictive model (MAPE = 4.65). This was expected since we reduced the regressions’ degrees of freedom from 19 to 14. The mean MAPE from the nine random tests, though, confirmed a high level of accuracy, with only four out of nine tests reporting a MAPE value exceeding the average.
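The repeated random split described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes NumPy, takes inputs that are already log-transformed, holds out four regions per repetition, and stops once every region has appeared in a validation set. All function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Predicted values from coefficients returned by fit_ols."""
    return beta[0] + X @ beta[1:]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    return 100.0 * float(np.mean(np.abs((actual - predicted) / actual)))


def repeated_holdout(X, y, n_validation=4, seed=0):
    """Refit on random training sets until each row has been validated once."""
    rng = np.random.default_rng(seed)
    n = len(y)
    validated = np.zeros(n, dtype=bool)
    errors = []
    while not validated.all():
        order = rng.permutation(n)
        val, train = order[:n_validation], order[n_validation:]
        beta = fit_ols(X[train], y[train])
        errors.append(mape(y[val], predict(beta, X[val])))
        validated[val] = True
    return errors
```

The mean of the returned errors is the quantity to compare with the MAPE of the model fitted on all regions. The number of repetitions is not fixed in advance, because it depends on how quickly the random draws cover every region.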

In conclusion, statistical controls and validation confirmed the robustness, accuracy and generalizability of our predictive model.

Discussion

Main findings of the study

Our analysis focused on the role of pre-existing determinants of public health in explaining the wide variation in cross-regional mortality attributable to Covid-19 in Italy. The focus was motivated by the outcomes of a review of the recent literature on Covid-19 infection modelling. The regression model showed that four predictors (force of infection, number of elderly living in assisted facilities, population density and standard rate of diabetes) could explain over 95% of the differences in cross-regional mortality observed in Italy from the onset of the epidemic to 23 September 2020.

What is already known on the determinants of Covid-19 mortality

Earlier reports suggest that elderly patients, patients with comorbidities (chronic obstructive pulmonary disease, cardiovascular disease, hypertension) and patients presenting with dyspnoea are vulnerable to more severe morbidity and mortality after Covid-19 infection.5 In the systematic review of prediction models related to Covid-19 mortality, we identified and critically appraised 56 studies to extract 12 candidate predictors. The majority of studies developed new models, but only a few reported information on the selection of the independent variables or validated the predictive model with an external data sample or training sets.

What this study adds

We found that Covid-19 is an unequal killer: when its force increases, frail individuals living in highly populated areas are the most vulnerable to death. These results correspond with previously published studies on the association between Covid-19 mortality and pre-existing determinants of public health. The positive correlation between the force of Covid-19 infection and the number of deaths confirmed the findings of transmission models included in our systematic review.13 Residents living in areas with high population density have a higher probability of coming into close contact with others and, consequently, any contagious disease is expected to spread more rapidly.14 Italy’s rural landscape can be classified into four types, according to the intensity of energy inputs used in the agricultural process and to socioeconomic and environmental features. Italians living in underpopulated, rural areas are less exposed to Covid-19, despite a high number of elderly residents. The lowest energy-intensive landscapes have an average of 49 infected per square kilometre and 28 per 10 000 inhabitants, compared to 134 per square kilometre and 37 per 10 000 inhabitants in more energy-intensive zones.15 Mortality data related to the first wave of Covid-19 infection (from 1 February to 12 May 2020) confirm excess mortality of 61% in the main cities of Northern Italy. Excess mortality was calculated as the difference between the observed mortality in the period and the 5-year mean. Covid-19, as the cause of death, explained 80% of the difference. Excess mortality increased with age: from +37% in the 65–74 year range to +59% in the 75–84 range and up to +75% for the elderly over 85.16
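For readers who wish to reproduce an excess mortality percentage, the short Python sketch below expresses the difference between observed deaths and the 5-year mean as a share of that mean. Expressing the difference relative to the baseline is an assumption about how the reported percentages were obtained, and the counts are hypothetical, not the surveillance data.

```python
def excess_mortality_pct(observed, baseline_mean):
    """Excess deaths as a percentage of the baseline (e.g. the 5-year mean)."""
    if baseline_mean <= 0:
        raise ValueError("baseline_mean must be positive")
    return 100 * (observed - baseline_mean) / baseline_mean


# Hypothetical counts: 1610 observed deaths against a 5-year mean of 1000.
example = excess_mortality_pct(observed=1610, baseline_mean=1000)
```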

The third variable of the regression analysis offers a further insight into the association between age and Covid-19 mortality. The number of elderly living in hospices and retirement homes was one of the dominant predictors of deaths attributed to Covid-19 (r semi-partial = 0.31). Anecdotal evidence confirmed an abnormal number of deaths among elderly living in assisted facilities during the peak of the epidemic outbreak.17,18 A survey of a significant sample of Italian nursing homes (1356 out of 3417) was conducted from 1 February to 5 May 2020.19 A total of 97 521 elderly were living in nursing homes, 75 710 (78%) of them in the North of Italy and 26 981 (28%) in Lombardy, the region with the highest number of deaths attributable to Covid-19. During the observation period, 3092 deaths were attributable to Covid-19 infection, 1807 (48%) of which were reported in Lombardy alone. During the same period, 5292 elderly residents had to be hospitalized: 965 (18%) of them were Covid-19 positive, while 2021 (38%) reported symptoms consistent with a Covid-19 infection. Among residents who tested positive for Covid-19, only 48% could be isolated in a single room, 47% remained in rooms with multiple beds, and only 5% were transferred to a dedicated structure. Relatives and visitors accessed the premises of most of the nursing homes without any precaution until the end of February 2020. Most of the respondents complained about the lack of personal protective equipment (PPE) and of clear procedures to contain the Covid-19 infection. Consequently, a striking 21.1% of nursing home staff tested positive for Covid-19.
Infected patients in the ‘post-acute’ stage were discharged from hospitals to nursing homes for their rehabilitation, to make room for more severe patients.20 From 8 March 2020, an undisclosed number of Covid-19 patients in Lombardy were transferred to local nursing home facilities to ease the pressure on the intensive care units of the ‘hubs’, the hospitals designated to treat severe Covid-19 patients in the region.

Diabetes was not recognized at the onset of the Covid-19 epidemic as a determinant of mortality. Early observations from the countries most affected by the Covid-19 epidemic, including China, the USA and Italy, seemed to indicate that the prevalence of diabetes among patients affected by Covid-19 was not higher than that observed in the general population, thus suggesting that diabetes was not a risk factor for Covid-19 infection. However, a large body of evidence demonstrated that diabetes was a risk factor for disease progression towards critical illness, development of acute respiratory distress syndrome, need for mechanical ventilation or admission to an intensive care unit and ultimately death.21 Patients with diabetes should be regarded as a particularly vulnerable group for which specific strategies must be implemented, including extensive serological screening and early containment measures.22

Lastly, the aggregation of predicted values into two clusters (Northern and Southern regions) raises a fundamental question about the effectiveness of the lockdown imposed on the population most affected by the first wave of Covid-19 infection in Italy. The model seems to accurately predict the cross-regional differences in mortality in the Northern regions (MAPE = 2.87). Hence the four independent variables are highly associated with the number of deaths attributable to Covid-19. Table 3 shows the aggregate values of each predictor for the Northern and Southern regions. The mean value of the Covid-19 attack rate is about 3.6 times higher in the North (823 versus 228), while the total number of elderly residents in RSA is more than double in the North compared to the South (212 395 versus 88 654). The population density is ~30% higher in the North, while the standard rate of diabetes tips the scale in favour of the South. Keeping in mind that association does not imply causality, was the successful isolation of the Southern regions the outcome of the lockdown? Alternatively, did the lockdown create a deadly inequality, by failing to isolate vulnerable individuals in the highly populated ‘red zones’, where the force of Covid-19 infection continued to grow?

Limitations of the study

The main limitation of our study is related to the risk of bias of the unpublished studies included in the systematic review. Two factors mitigate the risk of bias: extraction methods and time of the review. Firstly, the objective of our systematic review was limited to creating a comprehensive repository of variables potentially relevant to the number of deaths attributable to the Covid-19 epidemic. No data from these studies were used to inform our analysis. Secondly, since the first outbreak of Covid-19 was disclosed in January 2020, most of the models relevant to Covid-19 were still going through a peer-review process at the time we performed the systematic review.

Conclusions

Understanding the relationship between pre-epidemic health and epidemic mortality can provide data-driven inputs to inform a policy response aimed at containing the death toll of the next outbreak of Covid-19 infection.

The enforcement of facial protection, social distancing and targeted lockdowns in highly populated areas, where the probability of contagion is highest, can significantly reduce the number of deaths attributable to Covid-19 infection. Adherence to lockdown can be extremely difficult for underprivileged individuals, consequently increasing the overall mortality of Covid-19 infection. Welfare support of 600 euros for self-employed individuals was approved by the Italian Government in late March 2020.23 Still, the payout was delayed by red tape until the beginning of June for most of the entitled individuals. A furlough scheme for employed workers met the same fate, and its payout was delayed by months. Many employers advanced the payout to their employees, using company and private financial resources to ease the economic hardship of employees and their families. An agile welfare scheme, promptly accessible to underprivileged residents, would significantly improve the effective implementation of total or partial social isolation.

Vulnerable individuals should be closely monitored and safely isolated to shield them from exposure to Covid-19. Elderly living in assisted living facilities and diabetic patients should be continuously monitored by qualified medical and nursing staff, provided with adequate PPE.

Our findings indicate that a significant reduction of social contacts in the main metropolitan areas and the timely isolation of elderly and diabetic individuals could significantly reduce the death toll of the next wave of Covid-19 infection in Italy.

Conflict of interest

The authors declare that they have no competing interests.

Cristina Oliva, Doctoral student

Francesco Di Maddaloni, Senior Lecturer

Andrea Marcellusi, Senior Lecturer

Giampiero Favato, Professor

References

1

Osterholm MT, The CIDRAP viewpoint working group. COVID-19: the CIDRAP Viewpoint. Part 1: the future of the COVID-19 pandemic: lessons learned from pandemic influenza. https://www.cidrap.umn.edu/sites/default/files/public/downloads/cidrap-covid19-viewpoint-part1_0.pdf (30 October 2020, date last accessed).

2

World Health Organization (WHO). Thematic paper on the status of country preparedness capacities background report commissioned by the global preparedness monitoring board (GPMB). https://apps.who.int/gpmb/assets/thematic_papers/tr-2.pdf (1 November 2020, date last accessed).

3

Dipartimento della Protezione Civile. Aggiornamento casi COVID-19: September 23rd, 2020. http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1 (30 October 2020, date last accessed).

4

Anson J. Model mortality patterns: a parametric evaluation. Popul Stud 1991;45(1):137–53.

5

Wynants L, Van Calster B, Collins GS et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ 2020;369:m1328.

6

Moher D, Liberati A, Tetzlaff J et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6(7):e1000097.

7

Favato G, Oliva C, Di Maddaloni F. Factors associated with COVID-19 case-fatality and mortality rates: a systematic review. PROSPERO 2020 CRD42020188240. Available from: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020188240.

8

Liang D, Shi L, Zhao J et al. Urban air pollution may enhance COVID-19 case-fatality and mortality rates in the United States. medRxiv [Internet]. 2020 January 1st;2020.05.04.20090746. Available from: http://medrxiv.org/content/early/2020/05/07/2020.05.04.20090746.abstract (21 December 2020, date last accessed).

9

Blanchet FG, Legendre P, Borcard D. Forward selection of explanatory variables. Ecology 2008;89(9):2623–32.

10

Mayers JH, Forgy EW. The development of numerical credit evaluation systems. J Am Stat Assoc 1993;58(303):799–806.

11

Lewis CD. Industrial and business forecasting methods: a practical guide to exponential smoothing and curve fitting. London; Boston: Butterworth Scientific, 1982.

12

Steyerberg EW. Clinical prediction models: a practical approach to development, validation, and updating. Springer Nature Switzerland AG, 2019.

13

Ciminelli G, Garcia-Mandicó S. Mitigation policies and emergency care management in Europe’s ground zero for COVID-19. medRxiv [Internet]. 2020 January 1st;2020.05.19.20106575. Available at: http://medrxiv.org/content/early/2020/05/26/2020.05.19.20106575.abstract (21 December 2020, date last accessed).

14

Kadi N, Khelfaoui M. Population density, a factor in the spread of COVID-19 in Algeria: statistic study. Bull Natl Res Cent 2020;44(1):138.

15

Agnoletti M, Manganelli S, Piras F. Covid-19 and rural landscape: the case of Italy. Landscape and Urban Planning 2020;204.

16

Italian Ministry of Health. Andamento della mortalità giornaliera (SiSMG) nelle città italiane in relazione all’epidemia di Covid-19. Final report: September 1st – May 12th, 2020. Available at: http://deplazio.net/images/stories/SISMG/SISMG_COVID19.pdf (30 October 2020, date last accessed).

17

Comas-Herrera A, Zalakain J. Mortality associated with COVID-19 outbreaks in care homes: early international evidence. 2020. Available at: https://ltccovid.org/2020/04/12/mortality-associated-with-COVID-19-outbreaks-in-care-homes-early-international-evidence/ (15 October 2020, date last accessed).

18

Lorenz-Dant K. Report on the COVID-19 long-term care situation in Germany. 2020. Available at: https://ltccovid.org/2020/04/15/report-on-the-COVID-19-long-term-care-situation-in-germany/ (30 October 2020, date last accessed).

19

ISS. Survey nazionale sul contagio COVID-19 nelle strutture residenziali e sociosanitarie, Istituto Superiore di Sanità. Epidemia COVID-19, Final report: May 5th, 2020. Available at: https://www.epicentro.iss.it/coronavirus/pdf/sars-cov-2-survey-rsa-rapporto-finale.pdf (1 November 2020, date last accessed).

21

Pugliese G, Vitale M, Resi V, Orsi E. Is diabetes mellitus a risk factor for COronaVIrus disease 19 (COVID-19)? Acta Diabetol 2020 Nov;57(11):1275–85. Epub 2020 Aug 31. PMID: 32865671; PMCID: PMC7456750.

22

Apicella M, Campopiano MC, Mantuano M et al. Guida pratica alla prevenzione e gestione dell’infezione da COVID-19 nelle persone con diabete. L’Endocrinologo 2020 Oct 23;1–5. Italian. Epub ahead of print. PMCID: PMC7582423.

23

DPCM. Decree 17 marzo 2020, n. 18 Misure di potenziamento del Servizio sanitario nazionale e di sostegno economico per famiglie, lavoratori e imprese connesse all'emergenza epidemiologica da COVID-19. (20G00034) (GU Serie Generale n.70 del 17-03-2020). Available at: https://www.gazzettaufficiale.it/eli/id/2020/03/17/20G00034/sg

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data