Redistribution of garbage codes to underlying causes of death: a systematic analysis on Italy and a comparison with most populous Western European countries based on the Global Burden of Disease Study 2019

Abstract
Background: The proportion of reported causes of death (CoDs) that are not underlying causes can be substantial even in high-income countries and can seriously affect health planning. The Global Burden of Disease (GBD) study identifies these ‘garbage codes’ (GCs) and redistributes them to underlying causes using evidence-based algorithms. As a result, planners relying on vital registration data will find discrepancies with GBD estimates. We analyse these discrepancies through the analysis of GCs and their redistribution.
Methods: We explored the case of Italy, at national and regional level, and compared it with nine other Western European countries with similar population sizes. We analysed differences between official data and GBD 2019 estimates for 1990–2017, the period for which vital registration data were available for most of the selected countries.
Results: In Italy, in 2017, 33 000 deaths were attributed to unspecified type of stroke and 15 000 to unspecified type of diabetes, together accounting for a quarter of all GCs. There is significant heterogeneity in the overall proportion of GCs, in their type (unspecified or impossible underlying causes) and in the size of specific GCs, both among Italian regions and among the selected countries. We found no pattern linking the overall level of GCs to the relevance of specific GCs. Even locations performing below average show lower levels for certain GCs than better-performing countries.
Conclusions: This systematic analysis suggests that the heterogeneity in GC levels and causes, combined with a more detailed analysis of local practices, strengths and weaknesses, could support a strategy for reducing GCs in Italy.


Introduction
Despite constant improvements in the reporting of causes of death (CoDs) and the availability of high-quality CoD data from vital registration (VR) systems, a non-negligible share of deaths remains poorly classified. The process of identifying and reporting the correct underlying cause of death (UCoD) can be challenging. 1 One of the main problems encountered is that of the so-called garbage codes (GCs). 2 GCs are a set of International Classification of Diseases (ICD) codes that define poorly specified diagnoses that do not clearly identify a UCoD. GCs limit the utility of death statistics, undermining their role as a primary source of information for planning and assessing health policies and interventions. 3 Problems in UCoD reporting can arise at any step, from certification by physicians to coding. 4 In high-income countries, GCs are more often related to poor certification. In Italy, coding is done centrally by the National Institute of Statistics (ISTAT). ISTAT applies WHO standards that minimize the number of deaths assigned to ill-defined and trivial causes. Moreover, dedicated international software automatically codes about 80% of death certificates, reducing potential errors introduced by manual coding and facilitating international comparisons. 5 Regarding certification, Italy still relies on paper death certificates. The quality of certification can be affected by (i) lack of information for identifying the UCoD, (ii) the low importance attributed to death certification or (iii) lack of training of physicians in this specific task. Lack of information should not be a relevant factor in more affluent countries such as Italy. 6 Different approaches have been used to classify GCs and reduce their impact, generally involving the redistribution of GCs to plausible UCoDs. 1,7–9
The Global Burden of Disease (GBD) Study aims to generate comparable cause-specific mortality estimates from a collection of imperfect, heterogeneous data 10–12 by redistributing GCs to UCoDs. Accounting for deaths assigned to GCs is one of the key data processing steps in creating cause-specific mortality estimates that are comparable by time, age group, sex and location.
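To make the idea of redistribution concrete, the sketch below shows one of the simplest possible schemes, proportional redistribution: the deaths assigned to a GC are spread over a set of candidate target UCoDs in proportion to the deaths those causes already have. This is an illustrative assumption, not the GBD algorithm; GBD uses GC-specific target lists and more elaborate methods, 13 and the function name, cause labels and counts below are hypothetical.

```python
def redistribute_proportionally(
    cause_deaths: dict[str, float],
    gc_deaths: float,
    targets: list[str],
) -> dict[str, float]:
    """Add gc_deaths to the target causes, pro rata to their current deaths.

    Hypothetical helper for illustration; it is not part of any GBD code base.
    """
    base = sum(cause_deaths[cause] for cause in targets)
    if base <= 0:
        raise ValueError("target causes have no deaths to define proportions")
    redistributed = dict(cause_deaths)  # leave the input untouched
    for cause in targets:
        # Each target's share of the GC deaths equals its share of target deaths.
        redistributed[cause] += gc_deaths * cause_deaths[cause] / base
    return redistributed


# Hypothetical counts: 1000 'unspecified type of stroke' deaths are spread over
# stroke subtypes only, mimicking a narrow (Class 4-like) target list.
observed = {
    "ischemic stroke": 6000.0,
    "intracerebral haemorrhage": 3000.0,
    "subarachnoid haemorrhage": 1000.0,
    "other causes": 50000.0,
}
stroke_subtypes = ["ischemic stroke", "intracerebral haemorrhage", "subarachnoid haemorrhage"]
after = redistribute_proportionally(observed, 1000.0, stroke_subtypes)
print(after["ischemic stroke"], after["intracerebral haemorrhage"])  # 6600.0 3300.0
```

Total deaths are preserved: all 1000 GC deaths end up in the three stroke subtypes, while 'other causes' is left unchanged because it is not on the target list.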
The issue of GCs was addressed for Italy by the Italian GBD Initiative, a network now comprising more than 100 GBD collaborators from over 25 research institutions, including the National Institute of Health (Istituto Superiore di Sanità). During the last round of estimate revisions, several discrepancies were found between official data and GBD estimates. These discrepancies might induce scepticism when users of CoD statistics need to reconcile official VR data with GBD estimates. Most of them stemmed from a substantial difference in scope between official data and GBD estimates: in GBD, CoDs identified as GCs are redistributed to UCoDs. 2,10,13 The objective of the present paper is to describe the GCs identified by the GBD 2019 study and the effects of their redistribution on UCoD estimates for Italy, and to discuss how GBD estimates ultimately reconcile with official statistics. Using estimates from the GBD 2019 cycle, we describe temporal changes and make comparisons between Italy and other Western European countries with more than 10 million inhabitants, as well as among the 19 Italian Regions and two Autonomous Provinces.
It is beyond the scope of this paper to describe national reporting systems and the solutions adopted to reduce the burden of GCs. However, the proposed approach can help clear up misunderstandings about the differences between VR data and GBD estimates, can lead to similar analyses in other Western European countries, and can provide useful information for building a common platform for discussion and intervention.

Overview on GC definitions and redistribution methods
Each ICD code is assigned to a single GBD CoD; this mapping is constantly updated, and details are available elsewhere. 10,14 According to the GBD Study, reported CoDs that do not identify specific UCoDs are defined as GCs, and can be classified into two main categories:
• CoDs that cannot be considered UCoDs, because they are either intermediate causes (e.g. sepsis or heart failure) or immediate causes (e.g. cardiac arrest or respiratory failure). 2
• Generic causes that do not identify a specific UCoD, e.g. unspecified type of diabetes or cancers of unknown primary (CUP).
GCs are grouped into four classes, 13,15 according to their importance in terms of policy implications and to the Levels of the GBD cause list across which they can be redistributed:
• Class 1 comprises GCs that could be redistributed to any UCoD in any of the three Level 1 cause groups (communicable diseases, non-communicable diseases and injuries), e.g. sepsis. These are the most difficult to redistribute and require the most modelling.
• Class 2 comprises GCs that could be redistributed to any UCoD in one or two of the three Level 1 cause groups (e.g. unspecified and undetermined intent of injury).
• Class 3 includes GCs for which the UCoD is likely to be part of the same ICD chapter (e.g. CUP).
• Class 4 has the fewest policy implications and comprises GCs for which the UCoD is likely to be a single disease or injury (e.g. unspecified type of stroke).
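The four classes, and the two aggregates used later in this paper (Classes 1 and 2 vs. Classes 3 and 4), can be encoded as a small data structure. The sketch below is a hypothetical illustration: the enum names, the helper `class_group_shares` and the counts are ours, not GBD's.

```python
from enum import IntEnum


class GCClass(IntEnum):
    """GBD garbage-code classes, from the broadest (1) to the narrowest (4)."""

    CLASS_1 = 1  # any UCoD in any of the three Level 1 groups, e.g. sepsis
    CLASS_2 = 2  # UCoDs in one or two Level 1 groups, e.g. injury of undetermined intent
    CLASS_3 = 3  # UCoD likely within the same ICD chapter, e.g. CUP
    CLASS_4 = 4  # UCoD likely a single disease or injury, e.g. unspecified stroke


def class_group_shares(gc_deaths_by_class: dict[GCClass, float]) -> dict[str, float]:
    """Split GC deaths into the 'Classes 1 and 2' and 'Classes 3 and 4' aggregates."""
    total = sum(gc_deaths_by_class.values())
    # IntEnum members compare like integers, so Classes 1 and 2 are those <= 2.
    broad = sum(d for cls, d in gc_deaths_by_class.items() if cls <= GCClass.CLASS_2)
    return {"classes_1_2": broad / total, "classes_3_4": (total - broad) / total}


# Hypothetical GC death counts by class, for illustration only.
shares = class_group_shares({
    GCClass.CLASS_1: 200.0,
    GCClass.CLASS_2: 300.0,
    GCClass.CLASS_3: 250.0,
    GCClass.CLASS_4: 250.0,
})
print(shares)  # {'classes_1_2': 0.5, 'classes_3_4': 0.5}
```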
Details on the methods and algorithms developed for redistributing GCs to UCoDs are described elsewhere. 13

Overview of present analysis
Our analysis aimed to identify the main GCs requiring redistribution to UCoDs and how they were redistributed. We also analysed which GCs contributed the additional deaths received by the UCoDs most affected by redistribution. We based our analysis on GBD 2019 estimates and focused on the years for which both official CoD data for Italy (from ISTAT) and GBD estimates were available, i.e. from 1990 to 2017.
The GBD 2019 study complies with the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER) statement. 16 The analysis was conducted for both sexes combined and for all ages together. Depending on the analysis and comparison, we used either all-age or age-standardized rates. Within a largely homogeneous context such as Western Europe, we chose to compare Italy with similar countries, since the complexity of a national health system depends greatly on the size of a country's population. We thus compared Italy with the Western European countries with more than 10 million inhabitants, namely Belgium, France, Germany, Greece, the Netherlands, Portugal, Spain, Sweden and the UK. For some countries (France, Belgium and Greece), the analysis could only extend to 2016, because country VR data for 2017 were unavailable. Between 1996 (the Netherlands) and 2014 (Greece), all countries shifted from the ICD-9 to the ICD-10 coding system. In addition, ICD-9 CoD data for some countries were only available in the aggregated Basic Tabulation
For the Western European countries and the Italian subnational locations, we compared the age-standardized percentage of GCs over the total number of reported deaths, for all GCs and separately for Classes 1 and 2 and for Classes 3 and 4. For the same countries and locations, we then ranked the top 10 GCs in the last available year (2017 or 2016) by the percentage of deaths attributed to them over the total number of deaths. The rationale was to understand whether differences in VR systems could lead to particular patterns in the relative weight of GCs. Differences could also partly reflect the different epidemiological profiles of countries and locations.
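The age-standardized percentage of GCs can be sketched as the ratio of two directly age-standardized rates (GC deaths and all deaths) computed with the same standard population. This is our assumption about how such a percentage can be built; the three age groups, the standard weights and all counts below are hypothetical (GBD uses its own world standard population). The crude all-age percentage is computed alongside to show how age structure alone separates the two measures.

```python
def age_standardized_rate(deaths, population, std_weights):
    """Direct age standardization: weighted sum of age-specific rates, per 100k."""
    if abs(sum(std_weights) - 1.0) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    return 100_000 * sum(d / p * w for d, p, w in zip(deaths, population, std_weights))


def gc_percentages(gc_deaths, all_deaths, population, std_weights):
    """Return (crude %, age-standardized %) of deaths assigned to GCs."""
    crude = 100 * sum(gc_deaths) / sum(all_deaths)
    standardized = 100 * (
        age_standardized_rate(gc_deaths, population, std_weights)
        / age_standardized_rate(all_deaths, population, std_weights)
    )
    return crude, standardized


# Hypothetical data for three broad age groups (young, middle-aged, old).
population = [1_000_000, 500_000, 100_000]
all_deaths = [1_000, 5_000, 10_000]
gc_deaths = [200, 1_000, 4_000]
std_weights = [0.5, 0.3, 0.2]  # hypothetical standard population shares

crude, standardized = gc_percentages(gc_deaths, all_deaths, population, std_weights)
print(f"crude {crude:.1f}%, age-standardized {standardized:.1f}%")
# crude 32.5%, age-standardized 37.0%
```

In this made-up example the standardized percentage exceeds the crude one because the standard population weights the deaths of the oldest group, where the GC share is highest (40%), more heavily than the observed population does.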
To study how the ranking of the main GCs for Italy evolved, we selected the top 15 GCs in 1990 and in 2017, which together covered more than 90% of all GCs in each year. The same analysis was conducted on age-standardized GC rates per 100k population.
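Selecting the top GCs and checking that they cover more than 90% of all GC deaths amounts to sorting and computing a cumulative share; the same ranking gives the positions compared between 1990 and 2017. The sketch below is hypothetical: it uses five GCs, made-up counts and a cut-off of 3 instead of 15, and the helper names are ours.

```python
def top_n_with_coverage(gc_deaths: dict[str, float], n: int) -> tuple[list[str], float]:
    """Return the n GCs with most deaths and the share of all GC deaths they cover."""
    ranked = sorted(gc_deaths, key=gc_deaths.get, reverse=True)
    top = ranked[:n]
    coverage = sum(gc_deaths[gc] for gc in top) / sum(gc_deaths.values())
    return top, coverage


def rank_of(gc: str, gc_deaths: dict[str, float]) -> int:
    """1-based position of a GC when ranked by number of deaths."""
    return sorted(gc_deaths, key=gc_deaths.get, reverse=True).index(gc) + 1


# Hypothetical GC death counts (not real data).
deaths = {
    "unspecified type of stroke": 50.0,
    "unspecified type of diabetes": 30.0,
    "unspecified heart disease": 12.0,
    "sepsis": 5.0,
    "cancer of unknown primary": 3.0,
}
top, coverage = top_n_with_coverage(deaths, 3)
print(top, f"{coverage:.0%}")  # the three leading GCs cover 92% of GC deaths
print(rank_of("sepsis", deaths))  # 4
```

Running `rank_of` on the 1990 and 2017 counts separately would give the kind of rank shifts reported for Italy (e.g. a GC moving from 24th to 6th).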
For Italy, and for each subnational location, we analysed the top 15 UCoDs affected by redistribution. Except in the Province of Trento, Marche and Sardegna, these 15 causes together received over 90% of all deaths requiring redistribution.
At the national level for Italy, for the year 2017, we selected the top 15 GCs by number of deaths and showed to which UCoDs they were redistributed.
Finally, for Italy, for the year 2017, we analysed how the top 15 Level 4 UCoDs were affected by redistribution, by reporting the GCs from which they mainly received additional deaths. For each UCoD, we show the top 10 source GCs, which always covered more than 90% of the redistributed deaths.
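Both views, from each GC to its UCoDs and from each UCoD back to its source GCs, can be read off a single table of redistribution flows. The sketch below assumes, hypothetically, that the flows are stored as a mapping from (GC, UCoD) pairs to redistributed deaths; the helper names and numbers are illustrative only. Truncating to the top k rows mirrors the choice of reporting at most 10 entries per GC or per UCoD.

```python
Flows = dict[tuple[str, str], float]  # (GC, UCoD) -> redistributed deaths


def targets_of_gc(flows: Flows, gc: str, k: int = 10) -> list[tuple[str, float]]:
    """UCoDs receiving deaths from one GC, largest first, truncated to k."""
    rows = [(ucod, d) for (src, ucod), d in flows.items() if src == gc]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]


def sources_of_ucod(flows: Flows, ucod: str, k: int = 10) -> list[tuple[str, float]]:
    """GCs feeding one UCoD, as shares of all deaths it received, truncated to k."""
    rows = [(gc, d) for (gc, dst), d in flows.items() if dst == ucod]
    total = sum(d for _, d in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)[:k]
    return [(gc, d / total) for gc, d in ranked]


# Hypothetical flows, for illustration only.
flows: Flows = {
    ("unspecified type of stroke", "ischemic stroke"): 700.0,
    ("unspecified type of stroke", "intracerebral haemorrhage"): 300.0,
    ("unspecified cardiovascular disease", "ischemic stroke"): 200.0,
    ("hypertension", "ischemic stroke"): 100.0,
}
print(targets_of_gc(flows, "unspecified type of stroke"))
print(sources_of_ucod(flows, "ischemic stroke"))
```

Shares are computed before truncation, so when more than k GCs feed a UCoD the reported shares sum to less than 1, which is how a coverage threshold such as 90% can be checked.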

Results
In 1990, in Italy, the age-standardized percentage of GCs over total deaths was 31% (table 1). Sweden had the lowest percentage (23%), while Greece and Portugal had the highest (47% and 51%, respectively).
Supplementary table S1 shows the crude all-age percentage of GCs over total deaths, with differences from table 1 attributable to the different age structure of the countries considered.
From 1990 to 2017, the percentage of GCs decreased for Italy as well as for the other countries considered, with the exception of Sweden, which nonetheless had the lowest level in 1990 and the second lowest in 2017. In Portugal and Greece, the percentage of GCs declined dramatically but still remained above 30%.
Considering the breakdown of GCs by class, for Italy and most countries the decrease was mostly due to Classes 3 and 4 GCs (Supplementary tables S2 and S3). For Classes 3 and 4, all countries except Belgium, which showed a 4.5% increase, had reductions of more than 25%. Classes 1 and 2, instead, increased in five countries (Sweden, the UK, Italy, the Netherlands and France), but decreased markedly (by more than 30%) in Portugal, Greece and Spain. However, the split of GCs between Classes 1 and 2 and Classes 3 and 4 does not seem to be associated with the overall percentage of GCs over total deaths (Supplementary figure S1).
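The text does not state whether changes such as 'more than 25% reductions' or a '4.5% increase' are relative changes or differences in percentage points. The sketch below, with hypothetical values and a function name of our own, shows how far apart the two readings can be.

```python
def change_between_years(pct_start: float, pct_end: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %)."""
    absolute = pct_end - pct_start
    relative = 100 * absolute / pct_start
    return absolute, relative


# Hypothetical: deaths assigned to Classes 3 and 4 GCs fall from 20% to 14% of all deaths.
points, percent = change_between_years(20.0, 14.0)
print(f"{points:+.1f} percentage points, {percent:+.1f}% relative")
# -6.0 percentage points, -30.0% relative
```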
The analysis of the main GCs in terms of percentage over total reported deaths shows heterogeneity among the Western European countries considered (Supplementary table S4). Unspecified type of stroke is the leading GC requiring redistribution for Italy, representing 5% of all CoDs. Percentages vary among countries, ranging from 2.5% in Spain to 8.4% in Greece. Unspecified type of diabetes is the second most frequent GC for Italy (2.3%), the highest value among the countries considered. It is followed by unspecified heart disease, unspecified lower respiratory infections, exposure to unspecified factor (ICD code: X59), sepsis and CUP, each accounting for more than 1% of total deaths.
As for the selected Western European countries, we compared Italian subnational locations in terms of the age-standardized percentage of GCs over total deaths (table 2), looking at the time trend from 1990 to 2017. This analysis was conducted for all GCs, and separately for Classes 1 and 2 and for Classes 3 and 4 (Supplementary tables S5 and S6). We observed some heterogeneity among Italian locations in the age-standardized percentage of GCs, as well as an overall improvement from 1990 to 2017, which was again more evident for Classes 3 and 4. For all GCs (table 2), the best-performing location in 2017 was Bolzano, followed by Valle d'Aosta, Sardegna and Marche, while the Regions with the highest percentages of GCs were Campania, Calabria and Sicilia.
When the 2017 ranking of the main GCs for Italy is compared with that of 1990 (figure 1), the top three GCs are unchanged: two Class 4 GCs (unspecified type of stroke and unspecified type of diabetes) and one Class 3 GC (unspecified heart disease). In 2017, the fourth GC was unspecified lower respiratory infections (Class 4), which ranked seventh in 1990. Exposure to unspecified factor (ICD code: X59), fifth in 2017, has grown slightly since 1990 in both share and ranking. Sepsis (excluding maternal and neonatal; Class 1) grew substantially, rising from 24th to 6th position and from 0.44% of all GCs identified in 1990 to 4.24% in 2017. Atherosclerosis (Class 2) dropped from fourth to 34th position, and from 6.15% of all GCs in 1990 to 0.56% in 2017. Two other GCs saw important drops: unspecified gastrointestinal cancer (Class 3) and unspecified cardiomyopathy (Class 4). The same analysis was also performed for each of the selected Western European countries (Supplementary figures S2a–S2i).
(Table caption: Comparison among most Western European countries with more than 10 million inhabitants.)
Regarding the ranking of the main GCs in 2017 by subnational location in Italy (Supplementary table S7), we observed high heterogeneity in the ill-defined causes and their ranking. However, unspecified type of stroke was the leading GC in all locations.
For Italy, the redistribution of GCs mainly affected ischemic stroke, which absorbed 27.9% of all redistributed GC deaths (table 3). The second UCoD affected by redistribution was Type 2 diabetes mellitus, which received 13.8% of GC deaths. Together with chronic ischemic heart disease (11.3%), these three UCoDs received more than half of all GC deaths for Italy in 2017 (27.9% + 13.8% + 11.3% = 53.0%). The same analysis was carried out for individual subnational locations and is shown in Supplementary tables S8a–S8u.
Supplementary table S9 shows how the 15 main GCs (by number of attributed deaths) were redistributed to UCoDs in 2017. When a GC was redistributed to 10 or fewer UCoDs, all of them are reported; otherwise, only the top 10 by number of redistributed deaths are reported. The list of UCoDs is short for Class 4 GCs such as 'diabetes unspecified type', and long for Class 1 GCs such as 'sepsis' or 'shock, cardiac attack, coma'.
The 15 UCoDs most affected by redistribution (Supplementary table S10) give an idea of how ill-defined causes can affect reporting; as expected, this table mirrors Supplementary table S9. Ischemic stroke is the underlying cause most affected by redistribution, with 84% of the deaths redistributed to it coming from 'unspecified type of stroke'. Diabetes mellitus Type 2, the second most affected underlying cause, receives 91% of its redistributed deaths from 'unspecified type of diabetes'. The third cause, ischemic heart disease, receives 35% from 'unspecified heart disease', but also 9% from 'unspecified left or right heart failure' and 'unspecified cardiovascular disease'. Intracerebral haemorrhage, the fourth most affected underlying cause, receives 66% of its redistributed deaths from 'unspecified type of stroke' and 12% from 'hypertension'.

Discussion
The present analysis provides new insights into why official VR data can differ from GBD estimates. The comparison between Italy and other Western European countries, and among Italian subnational locations, reveals strengths and weaknesses in current reporting systems. While countries and locations with lower overall percentages of GCs can set an achievable standard, less well-performing ones still show lower percentages for specific GCs and should be studied carefully. GCs affect all countries and VR systems, albeit to different extents. All countries considered in the present analysis, except Sweden (which had the lowest level of GCs in 1990), saw a reduction in the percentage of GCs over total deaths. However, none of the countries achieved percentages lower than 25% for all ages or 22% for age-standardized rates.
The range of the overall percentage of GCs found in the comparison by country is almost the same as that found among Italian subnational locations. Across the 10 countries considered, in 2016/2017 the age-standardized proportion of GCs ranged from 22% in the UK to 33% in Greece. In Italy, it ranged from 20% in Bolzano to 32% in Campania. This implies that the margins for improvement are potentially wide. Whatever the differences in national reporting systems, it is important to note that all Italian regions follow the same system, so differences among them cannot be attributed to this factor.
Wide heterogeneity exists in the ranking of individual GCs across countries and across Italian subnational locations. As mentioned, even less well-performing countries and subnational locations perform well on certain GCs, meaning that each could learn from the others. Greece has the lowest proportion of 'unspecified heart disease' and 'unspecified cardiovascular disease' among the countries considered (Supplementary table S4), while Portugal has the lowest proportion of 'undetermined intent poisoning by multiple or unspecified drug', which is the ninth GC in the UK. The same occurs in Italy, with Sicily having the lowest proportion of 'sepsis' and Calabria the lowest proportion of 'unspecified non-follicular lymphoma' (Supplementary table S7).
The lack of a consistent relationship between the class composition of GCs (Classes 1 and 2 vs. Classes 3 and 4) and their overall proportion, across countries and locations, suggests that, although they are relatively easier to reduce, Classes 3 and 4 GCs have not been systematically tackled even in better-performing countries and locations. The lack of a significant decrease in Classes 1 and 2 GCs highlights a serious difficulty in dealing with this type of GC, but also a lack of systematic action to reduce these GCs, which are the most difficult to redistribute.
Heterogeneity in GCs across countries and locations could be partly explained by different epidemiological profiles. However, comparing the GBD 2019 age-standardized death rates for stroke in Italy (34.9 per 100k) and Spain (29.1 per 100k) with the proportion of the GC 'unspecified type of stroke' (5.1% for Italy and 2.5% for Spain; Supplementary table S4) leads us to believe that there is significant room for improvement for Italy. This is corroborated by the fact that the Province of Trento, at 2.7%, has already almost reached Spain's level (Supplementary table S7).
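The argument can be made explicit with the figures quoted above, using only simple ratios (ASDR denotes the GBD 2019 age-standardized death rate for stroke, per 100k; "share" is the proportion of all CoDs coded as unspecified type of stroke):

```latex
\frac{\mathrm{ASDR}^{\text{stroke}}_{\text{Italy}}}{\mathrm{ASDR}^{\text{stroke}}_{\text{Spain}}}
  = \frac{34.9}{29.1} \approx 1.20,
\qquad
\frac{\text{share}^{\text{unspec. stroke}}_{\text{Italy}}}{\text{share}^{\text{unspec. stroke}}_{\text{Spain}}}
  = \frac{5.1\%}{2.5\%} \approx 2.04
```

Italy's stroke death rate is about 20% higher than Spain's, whereas its share of deaths coded as unspecified stroke is about twice as high, so a different epidemiological profile alone is unlikely to account for the gap.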
Regarding 'unspecified type of diabetes', the second GC for Italy (2.3%), we could identify no objective difficulty in determining the actual type of diabetes as a CoD.
Despite the measures adopted to improve coding according to international standard rules, a share of GCs remains, as a consequence of errors in medical certification rather than in the coding process. The quality of mortality statistics in Italy is deemed very high and, according to the WHO, which uses definitions and grouping processes that differ from those of GBD, Italy's proportion of GCs was the lowest among high-income countries. 17 However, this proportion is not negligible. As Italy has one of the top-rated health systems, we find it hard to justify that every year more than 30 000 deaths are attributed to unspecified stroke (Class 4), 15 000 to unspecified type of diabetes (Class 4), 10 000 to unspecified heart disease (Class 3), 8000 to X59 (exposure to unspecified factor causing fracture or other unspecified injury; Class 2) and 7000 to sepsis (Class 1).
Death certification is a complex task, but systems can be improved. Local training of medical doctors and constant review of records at hospital and local level, focusing on the most common GCs and on less well-performing regions, would be needed. Electronic CoD registration systems can help physicians reduce errors and increase precision in data entry, 4,18 while increasingly accurate redistribution algorithms could improve GBD estimates.
In the current pandemic context, with the urgent need to speed up the acquisition of CoD data, the Italian Ministries of Economy and Finance, Health, and Internal Affairs drafted a decree introducing digital reporting performed directly by the certifying doctors. 19 The decree is currently under the scrutiny of the Data Protection Authority, the Regions and the National Association of Italian Municipalities (ANCI).
Although the GBD identifies GCs and defines redistributions by sex and age, the main limitation of our analysis is that it does not consider sex and age differences. A detailed analysis of national CoD registration systems would also have been of interest. However, we were concerned that including these dimensions would have made the analysis too dispersed. The different timing of the implementation of ICD-10 updates among countries has likely affected comparability across countries and, within a given country, over time. In this regard, in Italy, the adoption of the 2016 version of ICD-10 and of a new automated coding system (Iris software) caused some issues of comparability over time. 20 This thorough analysis of GCs for Italy, with a comparison with selected Western European countries, should be considered a first step towards structured actions to improve CoD classification in Italy, with all regions aligning towards the best achievable standards.