Abstract

The Global Trigger Tool (GTT) of the Institute for Healthcare Improvement (IHI) has been used as a measurement strategy for patient safety by several institutions and national programs. Although the greater ability of the GTT to identify adverse events (AEs) compared to other methods has already been demonstrated, there are few data on its accuracy, and studies suggest lower sensitivity for minor AEs. This study aimed to assess the accuracy of the GTT for identifying AEs in adult inpatients for all AEs and for the subgroup of AEs with greater harm to the patient, classified as F–I on the IHI-GTT adapted version of the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) Index for Categorizing Errors. In this diagnostic test study, the GTT is the index test and identification of AEs (yes/no) represents the condition of interest. Due to the lack of a gold standard test, a composite reference standard method was developed. The reference standard method combined real-time (during hospitalizations) and retrospective search of medical records and administrative data for screening criteria and AEs. Both tests were applied to a random sample of 211 hospitalizations of adult inpatients during October–November 2016 in a large public hospital in Belo Horizonte, Brazil. The accuracy of the GTT was evaluated using sensitivity, specificity, and global accuracy. A total of 176 AEs were identified in 67 admissions using the reference standard method and 129 AEs in 76 admissions using the GTT, resulting in rates of 126 and 93 AEs/1000 patient-days, respectively. Sensitivity, specificity, and global accuracy of the GTT for the identification of individual AEs were, respectively, 0.41 (95% confidence interval [CI] 0.34; 0.49), 0.68 (95% CI 0.60; 0.74), and 0.54 (95% CI 0.49; 0.60) for all AEs, regardless of the harm categorization, and 0.85 (95% CI 0.72; 0.93), 0.88 (95% CI 0.82; 0.92), and 0.87 (95% CI 0.82; 0.91) for the subgroup of AEs categorized as harm F–I.
Among the main AEs missed by the GTT are those related to nursing care, such as AEs involving peripheral venous access and gastric/enteric catheters. The GTT proved to be a valid method for identifying AEs in adult inpatients, and its accuracy increases when minor harm AEs are not counted. Because AEs related to nursing care are among the main events it misses, the GTT should be used in conjunction with other measurement strategies to achieve results that are representative of the quality profile of the care provided and, thus, guide the best improvement strategies.

Introduction

One of the major challenges health-care systems face today is providing safe and quality care in complex, pressured, and fast-moving environments [1]. A reliable, valid, and feasible measurement strategy is required to determine whether efforts to enhance safety result in overall improvements [2, 3].

The Global Trigger Tool (GTT) of the Institute for Healthcare Improvement (IHI) is a simple, inexpensive, and easy-to-execute method for estimating the occurrence of adverse events (AEs) in adult hospitalized patients [4, 5]. Several institutions and national programs have used it as a patient safety measure [2–12]. Although its greater ability to identify AEs compared to other methods has already been demonstrated [6–10], there are no clear data on its accuracy [3, 11, 12]. Moreover, studies suggest that minor harm AEs are more difficult for the GTT to identify and that excluding these events could increase the method’s reliability and validity [2, 5].

If we understand a method for identifying AEs as a diagnostic test, its accuracy could be assessed by comparing its results with the best available test [13]. The scarcity of studies addressing the accuracy of the GTT is justified by the difficulty in establishing a gold standard or an acceptable reference standard test [2–4, 11, 12]. Classen et al., in one of the few studies that calculated the accuracy of the GTT, compared the results obtained through this method with those obtained through a detailed retrospective review of medical records and administrative data. They found a sensitivity of 94.9% and a specificity of 100% for identifying admissions with AEs [7]. However, the reference standard test used has similar limitations to the GTT—data obtained exclusively from medical records, which may have resulted in overestimated accuracy data.

Some authors discuss methodological alternatives for diagnostic accuracy studies when a gold standard does not exist [14–17]. When there are no clear data on the accuracy of the available tests and none is indicated as the preferred reference standard, one approach is to use a composite reference standard, in which the results of several imperfect tests are combined [17]. In the context of patient safety, this seems particularly applicable, as previous studies have shown that different methods identify different AEs and recommend combining different AE identification approaches to understand the patient safety issues occurring within an organization [6–9].

The aim of this study was to assess the accuracy of the GTT for identifying AEs in adult inpatients for all AEs and for the subgroup of AEs with greater patient harm. For this purpose, a composite reference standard method (RSM) was built. The results are expected to support health-care systems in choosing the most appropriate method, or a combination of methods, for managing care risk and building strategies that support actions to improve the safety and quality of care.

Methods

Study design, population, and sample

This is a diagnostic test study in which the GTT is the index test and identification of AEs (yes/no) represents the condition of interest. It was performed in a 500-bed, general, public, university hospital in Belo Horizonte, Brazil, offering tertiary and quaternary care. The hospital uses a hybrid medical record, with both electronic and paper-based documentation.

The cross-section of the study was defined as the period from October 2 to November 4, 2016, and is referred to as the “study period.” Patients aged 18 years or older who were hospitalized for >24 hours during the study period were eligible. Patients whose admission records were unavailable or incomplete, i.e., lacked critical elements such as a discharge summary, prescriptions, or significant portions of the medical and nursing progress notes, were excluded.

Sample size calculation considered a confidence level of 95%, margin of error of 5%, population size of 1500 (historical monthly average of hospitalizations), and minimum expected proportion of AEs in the population of 20%, which was based on previous studies showing AE rates between 7.2% and 27.0% of hospitalizations [3]. This resulted in a sample of 212 patient admissions. An intentional random oversampling of about 25% was added due to potential losses, leading to a final sample of 268 admissions.
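The figure of 212 admissions can be reproduced with the standard sample-size formula for a proportion followed by a finite population correction. The text does not state which formula was used, so the sketch below is an assumption; the function name and its parameters are illustrative only.

```python
import math
from statistics import NormalDist


def sample_size_finite(population, proportion, margin, confidence=0.95):
    """Sample size for estimating a proportion in a finite population.

    Assumption: Cochran's formula plus the usual finite population
    correction; the source does not name the formula it used.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Shrink it to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


n = sample_size_finite(population=1500, proportion=0.20, margin=0.05)
print(n)  # 212
```

Under these assumptions, the final sample of 268 admissions is about 26% larger than the calculated 212, in line with the stated oversampling of about 25%.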

Separate research teams independently applied the GTT and the constructed RSM. Each test followed its own methodological characteristics: the GTT was exclusively retrospective, based on the analysis of medical records, whereas the RSM combined strategies that included information collection during the study period and retrospective analysis of clinical and administrative data. The study design is represented in Fig. 1.

Figure 1. Diagnostic test study: assessment of accuracy of the Global Trigger Tool to identify adverse events.

TP, true positive; FP, false positive; FN, false negative; and TN, true negative.

GTT test

At the time of the study, there was no official version of the GTT in Portuguese. The tool translation and adaptation were based on the original GTT white paper and a Portuguese version translated by the IHI Latin American team. No substantial changes were made to the general content of the original version. This process was described elsewhere [18].

The review team consisted of a pair of primary reviewers, who were fourth- and fifth-year medical students, and two senior physicians specializing in Internal Medicine, who alternated, depending on availability, to validate the findings of the primary reviewers. All reviewers were familiar with the medical record and the model of care offered by the institution. Reviewer qualification included individual reading of the GTT white paper, followed by a 12-hour theoretical–practical training focused on understanding the AE concept, the meaning of each trigger, and harm categorization. All reviewers evaluated 10 medical records, comprising five annotated IHI examples and five medical records from the institution.

The GTT was applied following the IHI protocol. In the first step, each primary reviewer was allowed a maximum of 20 minutes per record to search for triggers, identify possible AEs, classify them according to the harm category, and record the findings. Subsequently, the primary reviewers discussed individual findings and recorded the consensus. In the second step, a medical reviewer, based on the findings of the primary reviewers, confirmed the AEs and corrected harm categories when necessary. Medical reviewers could consult medical records for clarification and had the support of a group of specialist physicians, who were consulted in complex situations in which the attribution of harm to health care was not evident.

Reference standard test

The choice of which tests would compose the RSM was based on previous publications describing the methods for identifying AEs, their strengths and limitations, and the feasibility of applying each in the research context [19, 20]. The following were included: systematic real-time data collection (interviews with professionals and review of prescriptions); the voluntary institutional incident reporting system; analysis of existing and routinely collected data (information technology and electronic medical records, such as laboratory test results, orders to start antibiotics, and operative notes for surgical and obstetric procedures, and administrative data, such as reports of incidents related to transfusion and of infections associated with health care); and analysis of deaths and early hospital readmissions. AEs identified by the RSM research group during the collection of other data were also included. Methods that used a retrospective review of medical records, such as the classic Harvard Method or its variations [21], were not included due to the risk of incorporation bias.

The methods selected to compose the RSM were included in phase 1. They were understood as a set of different sources of information to identify screening criteria for AEs. Subsequently, in phase 2, a physician reviewed the charts of patient admissions that were positive for at least one screening criterion, looking for additional information to validate the occurrence of harm to the patient related to health care. Once an AE was confirmed, it was assigned a harm category. The RSM team of physicians comprised two Internal Medicine specialists with 5–6 years of experience. They were trained on the concepts and classifications used in the study and could consult each other or a medical specialist. The RSM is described in detail in Supplementary Material 1.

Definitions

AE was defined as “an unintended physical injury resulting from or contributed to by medical care which requires additional monitoring, treatment or hospitalization, or which results in death” [5]. AEs resulting from acts of omission clearly related to the occurrence of harm to the patient were counted. These events include delays in providing health services, such as carrying out diagnoses or treatments, due to organizational issues.

All AEs were classified according to nature and harm. The IHI-GTT adapted version of the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) Index for Categorizing Errors was used to classify AE-related harm into five categories: (E) temporary harm and required intervention, (F) temporary harm and required initial or prolonged hospitalization, (G) permanent harm, (H) intervention required to sustain life, and (I) death [5, 22].

Although the GTT allows the identification of AEs during the entire period of hospitalization, the RSM analysis was restricted to the study period. For this reason, only two groups of AEs were counted: those that occurred in this period, regardless of when the hospitalization began or ended, and those that occurred before it, provided they were directly related to the current admission, such as those that caused temporary or permanent harm requiring new treatments or interventions.

To reduce the subjectivity of the medical reviewer’s judgment in determining the occurrence of an AE in both methods (GTT and RSM), a four-point scale (0–3) was used to determine the level of confidence that the harm could be attributed to health care, rather than the patient’s disease process. Only those classified as 2–3 (moderate to certain evidence that the harm could be attributed to health care) were considered AEs. Questions adapted from Baker et al. [23] and Mendes et al. [24] were used to support the medical reviewer’s judgment.

Analysis and statistics

The sample was characterized using absolute and relative frequencies for qualitative variables and measures of central tendency and dispersion for quantitative variables. The variable of interest was the occurrence of AEs. The frequency of occurrence of AEs was presented as the number of AEs per 1000 patient-days. McNemar’s test was used to compare the performance of the GTT and the RSM to identify AEs of different harm categories. The test results were considered dichotomous, and the classic measures of accuracy—sensitivity, specificity, and global accuracy—were calculated using 2 × 2 tables [14, 15, 25].
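For reference, with TP, FP, FN, and TN denoting the cells of the 2 × 2 table that crosses the GTT result with the RSM result, the three measures follow their standard definitions:

```latex
\begin{align}
\text{Sensitivity} &= \frac{TP}{TP + FN}, \\
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{Global accuracy} &= \frac{TP + TN}{TP + FP + FN + TN}.
\end{align}
```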

The results were demonstrated using two different study units: (i) per patient admission, in which the occurrence of one or more AEs in a hospitalization counted as a single positive case, and (ii) per AE, in which each AE identified in an admission counted as a positive case. In both, patient admissions in which no AE was detected were counted as a negative case each. Accuracy of the GTT was evaluated for AEs in general (E–I) and for the subgroup of events of greater harm, classified as F, G, H, or I (F–I).
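The two counting rules can be sketched as follows. The input layout (a mapping from each admission to the sets of AE identifiers found by each method), the function name, and the toy data are hypothetical. The sketch also assumes that an admission is a negative case only when neither method detected an AE in it, and that, per admission, a case is a true positive when both methods found at least one AE, even if not the same one.

```python
from collections import Counter


def two_by_two(admissions, unit):
    """Build the 2x2 table (TP, FP, FN, TN) of the GTT against the RSM.

    admissions: dict mapping an admission id to a pair
                (set of AE ids found by the RSM, set of AE ids found by the GTT).
    unit: "admission" or "ae".
    """
    if unit not in ("admission", "ae"):
        raise ValueError("unit must be 'admission' or 'ae'")
    cells = Counter(TP=0, FP=0, FN=0, TN=0)
    for rsm, gtt in admissions.values():
        if not rsm and not gtt:
            # No AE by either method: a single negative case.
            cells["TN"] += 1
        elif unit == "admission":
            # One case per admission, whatever the number of AEs.
            if rsm and gtt:
                cells["TP"] += 1
            elif rsm:
                cells["FN"] += 1
            else:
                cells["FP"] += 1
        else:
            # One case per distinct AE found in the admission.
            cells["TP"] += len(rsm & gtt)
            cells["FN"] += len(rsm - gtt)
            cells["FP"] += len(gtt - rsm)
    return dict(cells)


toy = {
    "adm1": ({"a", "b"}, {"a"}),  # one AE found by both, one by the RSM only
    "adm2": (set(), {"c"}),       # one AE found by the GTT only
    "adm3": (set(), set()),       # no AE by either method
}
print(two_by_two(toy, "admission"))  # {'TP': 1, 'FP': 1, 'FN': 0, 'TN': 1}
print(two_by_two(toy, "ae"))         # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

In the toy data, the first admission is a single true positive per admission but contributes one true positive and one false negative per AE, which is why the per-AE unit is the stricter of the two.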

Results

Preliminarily, a total of 1172 admissions were considered eligible. Of the 268 records selected, 49 could not be accessed due to remote storage of large records and their use for other purposes, such as medical care, audits, or billing. Of the 219 available records, 20 were deemed not to meet the eligibility criteria. An extra round of randomization selected 12 additional admissions, leading to a final sample of 211 admissions. Table 1 shows the sample description.

Table 1. Characteristics of the patient admissions.

| Variable | Sample (n = 211) |
|---|---|
| Gender | |
| Women | 134 (63.5%) |
| Men | 77 (36.5%) |
| Age group | |
| <60 years old | 148 (70.1%) |
| ≥60 years old | 63 (29.9%) |
| Type of hospital admission | |
| Urgent | 172 (81.5%) |
| Elective | 39 (18.5%) |
| Charlson Comorbidity Index | |
| 0 | 86 (40.8%) |
| 1–2 | 78 (36.9%) |
| 3–4 | 31 (14.7%) |
| ≥5 | 16 (7.6%) |
| Reason for admission | |
| Surgical | 90 (42.7%) |
| Clinical | 76 (36.0%) |
| Obstetric | 45 (21.3%) |
| Mean length of stay (days) | 12.2 (SD 18.6) |

SD, standard deviation.

The composite RSM identified 627 occurrences matching screening criteria, of which 274 (43.7%) were confirmed as AEs. One AE could be related to one or more screening criteria. The most frequent sources of information were the interviews with health-care professionals (n = 357), followed by requests to start antibiotics (n = 85). Table 2 describes, by type of source, the frequency of the screening criteria identified using the RSM and their positivity for AEs.

Table 2. Frequency of occurrence of the RSM screening criteria by type of source and percentage of them that were confirmed as adverse event.

| Type of source on the screening criteria | Total | Confirmed as adverse events (%) |
|---|---|---|
| Interviews with health-care professionals | 357 | 165 (46.2) |
| Request to start antibiotics | 85 | 29 (34.1) |
| Results of laboratory tests | 49 | 11 (22.5) |
| Review of operative notes | 34 | 9 (26.5) |
| Review of prescriptions | 33 | 15 (45.5) |
| Findings in medical records during the collection or analysis of other data | 21 | 21 (100.0) |
| Voluntary reporting | 21 | 11 (52.4) |
| Early hospital readmission | 11 | 3 (27.3) |
| Review of obstetric notes | 7 | 2 (28.6) |
| Report on health care–associated infections | 6 | 6 (100.0) |
| Transfusion Agency Reports | 2 | 2 (100.0) |
| Review of the cause of death | 1 | 0 (0.0) |
| Total | 627 | 274 (43.7) |

A total of 176 AEs were identified in 67 admissions using the RSM and 129 AEs in 76 admissions using the GTT, resulting in rates of 126 and 93 AEs/1000 patient-days, respectively. Seventy-two AEs were identified by both methods. There was no significant difference between the methods in the identification of AEs in general (P = 0.10). However, when analyzing subgroups, more AEs categorized as harm “E” were identified using the RSM (P < 0.005), while the GTT was superior in identifying AEs of greater harm (F–I) (P = 0.02) (Table 3). No AE resulted in patient death.

Table 3. Adverse events by method of identification and category of harm.

| Adverse events by category of harm (a) | Reference standard | GTT | P-value (b) |
|---|---|---|---|
| E | 124 | 64 | <0.005 |
| F | 42 | 49 | 0.11 |
| G | 3 | 8 | |
| H | 7 | 8 | |
| I | 0 | 0 | |
| All category of harm (E–I) | 176 | 129 | 0.10 |
| Greater harm (F–I) | 52 | 65 | 0.02 |

(a) Category of harm by the NCC MERP-IHI adapted version: (E) temporary harm and required intervention, (F) temporary harm and required initial or prolonged hospitalization, (G) permanent harm, (H) intervention required to sustain life, and (I) death.

(b) McNemar Test.

Table 4 shows the frequency of AEs identified by nature using the two methods, considering all AEs and the subgroup of AEs with greater harm (F–I). In general, the most frequent AEs were those related to peripheral venous access, medication, surgical/anesthetics, infections, and delays in providing health services. None of the 16 AEs related to gastric/enteric catheters were identified through the GTT, and only two of the 67 (3%) AEs related to peripheral venous access were evidenced by the index test. All AEs categorized in these two types were classified as harm “E” and mostly referred to accidental removal.

Table 4. Frequency of occurrence of adverse events by nature and by method of identification in absolute number and in percentage considering all category of harm (E–I) and the subgroup of events of greater harm (F–I).

| Adverse events by nature | All (E–I): Reference standard (%) | All (E–I): GTT-IHI (%) | Greater harm (F–I): Reference standard (%) | Greater harm (F–I): GTT-IHI (%) |
|---|---|---|---|---|
| Peripheral venous accesses | 66 (37.5) | 2 (1.6) | 0 (0.0) | 0 (0.0) |
| Medication | 29 (16.5) | 46 (35.7) | 10 (19.2) | 12 (18.5) |
| Surgical/anesthetics | 19 (10.8) | 27 (20.9) | 13 (25.0) | 21 (32.3) |
| Infections | 19 (10.8) | 22 (17.1) | 13 (25.0) | 13 (20.0) |
| Delays in the provision of health services | 12 (6.8) | 12 (9.3) | 12 (23.1) | 12 (18.5) |
| Gastric/enteric catheters | 16 (9.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Transfusion of blood products | 2 (1.1) | 6 (4.7) | 0 (0.0) | 1 (1.5) |
| Phlebitis | 3 (1.7) | 2 (1.6) | 0 (0.0) | 0 (0.0) |
| Pressure injury | 1 (0.6) | 3 (2.3) | 0 (0.0) | 0 (0.0) |
| Radiotherapy | 1 (0.6) | 3 (2.3) | 1 (1.9) | 3 (4.6) |
| Airway | 2 (1.1) | 0 (0.0) | 2 (3.9) | 0 (0.0) |
| Central vascular accesses | 2 (1.1) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Bladder catheter | 0 (0.0) | 2 (1.6) | 0 (0.0) | 2 (3.1) |
| Obstetric care | 1 (0.6) | 1 (0.8) | 1 (1.9) | 1 (1.5) |
| Fall | 1 (0.6) | 1 (0.8) | 0 (0.0) | 0 (0.0) |
| Transplants | 1 (0.6) | 1 (0.8) | 0 (0.0) | 0 (0.0) |
| Dialysis therapy | 0 (0.0) | 1 (0.8) | 0 (0.0) | 0 (0.0) |
| Skin injury from mechanical restraint | 1 (0.6) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Total | 176 | 129 | 52 | 65 |

Although the number of AEs classified as harm “E” is not large enough to allow a subgroup analysis, the GTT tends to be equal or slightly superior to the RSM for the identification of minor AEs of other natures. For example, 34 medication AEs were identified through the GTT, while only 19 were identified through the RSM (12 of them were identified by both); among infections and surgical/anesthetics, nine and six AEs were identified by each of the methods, respectively, with three of them identified by both in each of these natures.

The results of the accuracy of the GTT compared to the RSM are shown in Table 5. Considering only the subgroup of AEs of greater harm (F–I), the sensitivity, specificity, and global accuracy were, respectively, 0.90 (95% confidence interval [CI] 0.77; 0.97), 0.90 (95% CI 0.84; 0.94), and 0.90 (95% CI 0.85; 0.94) for the study unit “per patient admission” and 0.85 (95% CI 0.72; 0.93), 0.88 (95% CI 0.82; 0.92), and 0.87 (95% CI 0.82; 0.91) for the study unit “per AE.” When all AEs were evaluated (E–I), the validity estimates were significantly lower for the study unit “per AE,” with a sensitivity of 0.41 (95% CI 0.34; 0.49), specificity of 0.68 (95% CI 0.60; 0.74), and global accuracy of 0.54 (95% CI 0.49; 0.60).
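As a consistency check, the “per AE” estimates for all AEs (E–I) can be reproduced from totals reported in this section: 176 AEs by the RSM, 129 by the GTT, 72 identified by both, and n = 352 units in Table 5. The sketch assumes that the 72 shared AEs are the true positives; the cell counts derived from them are a reconstruction, not figures published by the authors. The Wilson interval is likewise an assumption, since the method used for the confidence intervals is not stated, so its bounds may differ slightly from the reported ones.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Totals reported for the "per AE" unit, all harm categories (E-I).
rsm_total, gtt_total, both, n_units = 176, 129, 72, 352

tp = both                     # identified by both methods
fn = rsm_total - both         # identified by the RSM only
fp = gtt_total - both         # identified by the GTT only
tn = n_units - tp - fn - fp   # remaining units: no AE by either method

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n_units

print(round(sensitivity, 2), round(specificity, 2), round(accuracy, 2))
# 0.41 0.68 0.54
print(wilson_ci(tp, tp + fn))  # approximate 95% interval for sensitivity
```

The reconstructed table (TP = 72, FN = 104, FP = 57, TN = 119) returns the three point estimates reported for this study unit.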

Table 5. Sensitivity, specificity, and global accuracy of the GTT in relation to the RSM for the identification of adverse events considering all categories of harm (E–I) and the subgroup of events of greater harm (F–I).

| Measure (a) | “Per patient admission” study unit: All (E–I), n = 211 | “Per patient admission” study unit: Greater harm (F–I), n = 211 | “Per adverse event” study unit: All (E–I), n = 352 | “Per adverse event” study unit: Greater harm (F–I), n = 226 |
|---|---|---|---|---|
| Sensitivity | 0.76 (0.64; 0.86) | 0.90 (0.77; 0.97) | 0.41 (0.34; 0.49) | 0.85 (0.72; 0.93) |
| Specificity | 0.83 (0.75; 0.88) | 0.90 (0.84; 0.94) | 0.68 (0.60; 0.74) | 0.88 (0.82; 0.92) |
| Global accuracy | 0.81 (0.75; 0.86) | 0.90 (0.85; 0.94) | 0.54 (0.49; 0.60) | 0.87 (0.82; 0.91) |

(a) Values in parentheses are 95% confidence intervals.

Discussion

Statement of principal findings

The GTT showed satisfactory sensitivity, specificity, and global accuracy for the detection of AEs in inpatients when compared to a composite RSM. The GTT accuracy was higher when AEs with minor harm were not counted. The main AEs categorized as harm “E” that the GTT misses are related to nursing care, such as peripheral venous accesses and gastric/enteric catheters.

Interpretation within the context of the wider literature

To our knowledge, there is no other study that has evaluated the accuracy of the GTT by comparing it with a composite reference standard. A concern about the accuracy of the GTT and other methods that are based on retrospective review of medical records refers to their dependence on the quality of notes and the possible lack of recording of AEs, especially those that caused minor harm to the patient [26]. To assess the supposed impact of this on the GTT accuracy, it was essential that the RSM overcome this limitation and, for this reason, the interview with professionals was included as a search strategy [20]. The screening criteria identified through this source led to the confirmation of 127 of the 176 AEs of the RSM (72.2%), and they were the only source for the identification of 103 AEs (87 categorized as harm “E”).

Considering the study unit “per AE,” there is a significant difference in the GTT accuracy measures between all AEs and the subgroup of AEs with greater harm. Of the 124 AEs categorized as “E” identified by the RSM, 96 were not identified through the GTT. For 85 of these, notes were found in the medical records, 71 of them in nursing notes, a part of the record not prioritized in the GTT review process [5, 27, 28]. This loss of sensitivity for minor AEs has already been reported by other researchers and can be explained by issues inherent to the GTT methodology, which restricts the time for reviewing medical records and recommends that the record not be read “from the first page to the last page” [2, 5], more than by the low quality of the records [20].

This study was conducted in a university hospital, which has been striving to achieve quality standards in care. These characteristics may have influenced the quality of the medical records and, consequently, the GTT accuracy measures. In addition, during the study period, professionals were approached on the topic of patient safety and this may have improved the quality of notes regarding AEs.

Analyzing only the events identified through the GTT, as in other studies [6–9, 11], the most common natures of AEs were those related to medication, surgery/anesthetics, and infections. The failure to identify a large number of AEs related to nursing care through the GTT method, including those related to peripheral venous access and gastric/enteric catheters, corroborates the impression reported by experienced reviewers that the GTT is primarily focused on harm related to care performed by physicians [27].

Strengths and limitations

The combination of strategies for identifying AEs in the RSM was the strength of the research and aimed to overcome the individual weaknesses of each method, especially regarding underreporting, lack of adequate registration in medical records, and divergences in administrative data [20]. Despite the efforts made by the researchers to create an “almost perfect” method, the reference standard test used in this study is suboptimal. Situations were identified in which the GTT was correct and the RSM was wrong regarding the occurrence of AEs. This bias led to an undue reduction in agreement in the 2 × 2 tables and resulted in underestimated accuracy measures [13–15].
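The direction of this bias can be illustrated with a short calculation. The sketch below is not the study's analysis. It assumes that the errors of the index test and of the reference are independent given the true AE status, which may not hold here because the RSM also draws on the medical record, and every numeric value in it is hypothetical. The function name `apparent_accuracy` is likewise an assumption introduced for illustration.

```python
def apparent_accuracy(prev, se_index, sp_index, se_ref, sp_ref):
    """Sensitivity and specificity of an index test as measured against an
    imperfect reference, assuming independent errors given true status."""
    # Probability that the reference classifies a case as positive / negative
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    ref_neg = prev * (1 - se_ref) + (1 - prev) * sp_ref
    # Probability that index test and reference are both positive / both negative
    both_pos = (prev * se_index * se_ref
                + (1 - prev) * (1 - sp_index) * (1 - sp_ref))
    both_neg = (prev * (1 - se_index) * (1 - se_ref)
                + (1 - prev) * sp_index * sp_ref)
    return both_pos / ref_pos, both_neg / ref_neg

# Hypothetical values, chosen only to show the direction of the bias
true_se, true_sp = 0.80, 0.90
app_se, app_sp = apparent_accuracy(0.30, true_se, true_sp, 0.95, 0.95)
perfect_se, perfect_sp = apparent_accuracy(0.30, true_se, true_sp, 1.0, 1.0)

print(f"apparent sensitivity {app_se:.3f} vs true {true_se:.2f}")
print(f"apparent specificity {app_sp:.3f} vs true {true_sp:.2f}")
```

Under these assumptions, the measured sensitivity and specificity fall below the true values whenever the reference makes errors, and they match the true values only when the reference is perfect.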

Precautions were taken to reduce the chance of bias. The index test and the RSM were applied independently by groups of exclusive reviewers, ensuring blinding of the results, and both used the same definitions and classifications. Although the RSM includes data from medical records as a source of information, it did not use a systematic search for triggers as in the GTT.

Although primary reviewers with little experience with the GTT were employed, we do not consider this a limitation. Studies have shown that reviewers’ training and experience increase inter-rater reliability [29, 30], which can also affect the validity of the tool. However, the inter-examiner reliability of this pair of primary reviewers was previously described, with substantial reliability for the identification of hospitalizations with AEs in relation to a pair of experienced nurses [18]. These results are comparable to the findings of other studies that used experienced reviewers [2, 28, 29]. Another concern was that the use of medical students as primary reviewers could have lowered the accuracy of the GTT for nursing care–related AEs. This is unlikely to have occurred because, among the 93 records evaluated by the two pairs of primary reviewers and included in this study, nurses and medical students identified, respectively, one and two AEs related to peripheral venous access (versus 38 through the RSM) and no AEs related to gastric/enteric catheters (versus nine through the RSM).

Although the selection of the methods that composed the RSM was based on the literature, a recent systematic review of all existing methods was not conducted. Feasibility criteria in the local context and the researchers’ view of good opportunities to identify AEs played an important role in this choice. The arbitrary choice of some information sources and screening criteria that were included in the RSM can be considered a limitation of the method.

Another limitation of the methods used to identify AEs, both the GTT and most of those that composed the RSM, is their dependence on professional judgment: a reviewer must decide whether an unfavorable outcome for the patient, usually identified by a trigger or screening criterion, is related to the natural evolution of the disease or to health care. To reduce this subjectivity, clear definitions and training strategies were used, in addition to a scale to support the decision of the medical reviewers of both groups. However, a reproducibility analysis was not performed between the medical reviewers of the GTT and the RSM.

Finally, the interviews with the professionals, one of the methods that compose the RSM, were carried out by undergraduate students. The interviews were semi-structured and followed a systematic script (see Supplementary Material 1). The students were trained and supervised by the main researchers. However, we did not assess their ability to obtain information from professionals during interviews or to probe further into what was reported to them.

Implications for policy, practice, and research

The results of this study reinforce the validity of the GTT for the identification of AEs with greater harm, but, on the other hand, they emphasize its inability to identify some natures (or types) of minor harm AEs. In the present study, AEs related to peripheral venous accesses and gastric/enteric catheters corresponded to 35.6% of all AEs identified. These events, although individually causing less impact to the patient, become relevant due to their frequency, consuming significant staff time and costly resources [31, 32]. One study estimated that the annual cost of patient-initiated removal of medical devices in a 42-bed intensive care unit was over US$250 000; 88% of these events involved gastrointestinal tubes and vascular catheters [33]. Therefore, these natures of AEs should be seen as indicators of poor quality care and as targets of improvement actions, which include appropriate measurement strategies [31–34].

Although it was not the objective of the study, the analysis of the AEs identified by different sources in the RSM allows inferences about possible complementary methods to identify certain groups of AEs that are usually missed by the GTT, such as the semi-structured interview with the professionals who provide direct patient care. This method was used as a strategy to identify AEs related to feeding tube complications [34] and can be an alternative to direct observation, which is not suitable for global assessment of AEs, with possible gains in safety culture through greater involvement of direct care professionals [20]. However, further studies are needed on the potential of this method to identify different types of AEs, sampling strategies, possible biases, and costs involved.

Conclusions

The GTT proved to be a valid method for identifying AEs in hospitalized adult patients. Its accuracy increases when minor harm AEs are not counted. Among the main AEs missed by the GTT are those related to nursing care. Therefore, it should be used in combination with other measurement strategies to achieve results that are representative of the quality profile of the care provided and, thus, guide the best improvement strategies.

Supplementary data

Supplementary data is available at INTQHC Journal online.

Data availability statement

The data underlying this article will be shared on reasonable request to the corresponding author.

References

1. James JT. A new, evidence-based estimate of patient harms associated with hospital care. J Patient Saf 2013;9:122–8.

2. Mattsson TO, Knudsen JL, Lauritsen J et al. Assessment of the global trigger tool to measure, monitor and evaluate patient safety in cancer patients: reliability concerns are raised. BMJ Qual Saf 2013;22:571–9.

3. Hanskamp-Sebregts M, Zegers M, Vincent C et al. Measurement of patient safety: a systematic review of the reliability and validity of adverse event detection with record review. BMJ Open 2016;6:e011078.

4. Classen DC, Lloyd RC, Provost L et al. Development and evaluation of the Institute for Healthcare Improvement Global Trigger Tool. J Patient Saf 2008;4:169–77.

5. Griffin FA, Resar RK. IHI Global Trigger Tool for Measuring Adverse Events. IHI Innovation Series, 2nd ed. Cambridge: Institute for Healthcare Improvement, 2009.

6. Naessens JM, Campbell CR, Huddleston JM et al. A comparison of hospital adverse events identified by three widely used detection methods. Int J Qual Health Care 2009;21:301–7.

7. Classen DC, Resar R, Griffin F et al. ‘Global Trigger Tool’ shows that adverse events in hospitals may be ten times greater than previously measured. Health Aff (Millwood) 2011;30:581–9. Erratum in: Health Aff (Millwood) 2011;30:1217.

8. Kennerly DA, Kudyakov R, da Graca B et al. Characterization of adverse events detected in a large health care delivery system using an enhanced Global Trigger Tool over a five-year interval. Health Serv Res 2014;49:1407–25.

9. Rutberg H, Borgstedt Risberg M, Sjodahl R et al. Characterisations of adverse events detected in a university hospital: a 4-year study using the Global Trigger Tool method. BMJ Open 2014;4:e004879.

10. Mull HJ, Brennan CW, Folkes T et al. Identifying previously undetected harm: piloting the Institute for Healthcare Improvement’s Global Trigger Tool in the Veterans Health Administration. Qual Manag Health Care 2015;24:140–6.

11. Hibbert PD, Molloy CJ, Hooper TD et al. The application of the Global Trigger Tool: a systematic review. Int J Qual Health Care 2016;28:640–9.

12. Klein DO, Rennenberg RJMW, Koopmans RP et al. A Systematic Review of Methods for Medical Record Analysis to Detect Adverse Events in Hospitalized Patients. J Patient Saf 2021;17:e1234–40.

13. Hulley SB, Cummings SR, Browner WS et al. Designing Clinical Research, 4th Edition. Philadelphia, PA: Lippincott Williams & Wilkins, 2013, 367.

14. Rutjes AWS, Reitsma JB, Coomarasamy A et al. Evaluation of diagnostic tests when there is no gold standard. A review of methods. Health Technol Assess 2007;11:iii, ix–51.

15. Trikalinos TA, Balion CM. Chapter 9: options for summarizing medical test performance in the absence of a “gold standard”. J Gen Intern Med 2012;27:S67–75.

16. Umemneku Chikere CM, Wilson K, Graziadio S et al. Diagnostic test evaluation methodology: a systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard. An update. PLoS One 2019;14:e0223832.

17. Reitsma JB, Rutjes AWS, Khan KS, Coomarasamy A et al. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 2009;62:797–806.

18. Moraes SM, Ferrari TCA, Figueiredo NMP et al. Assessment of the reliability of the IHI Global Trigger Tool: new perspectives from a Brazilian study. Int J Qual Health Care 2021;33:mzab039.

19. World Health Organization (WHO). Patient safety: rapid assessment methods for assessing hazards. Report of the WHO working group meeting. Geneva, 2002.

20. Michel P. Strengths and weaknesses of available methods for assessing the nature and scale of harm caused by the health system: literature review. Geneva: WHO, 2004.

21. Leape LL, Brennan TA, Laird N et al. The nature of adverse events in hospitalized patients. Results of the Harvard Medical Practice Study II. N Engl J Med 1991;324:377–84.

22. National Coordinating Council for Medication Error Reporting and Prevention. NCC MERP index for categorizing medication errors algorithm; 1996 [revised February 20, 2001]. NCC MERP, 2001.

23. Baker GR, Norton PG, Flintoft V et al. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ 2004;170:1678–86.

24. Mendes W, Martins M, Rozenfeld S et al. The assessment of adverse events in hospitals in Brazil. Int J Qual Health Care 2009;21:279–84.

25. Knottnerus JA, Muris JW. Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol 2003;56:1118–28.

26. Wilson RM, Michel P, Olsen S et al. WHO Patient Safety EMRO/AFRO Working Group. Patient safety in developing countries: retrospective estimation of scale and nature of harm to patients in hospital. BMJ 2012;344:e832.

27. Schildmeijer K, Nilsson L, Perk J et al. Strengths and weaknesses of working with the Global Trigger Tool method for retrospective record review: focus group interviews with team members. BMJ Open 2013;3:e003131.

28. Sharek PJ, Parry G, Goldmann D et al. Performance characteristics of a methodology to quantify adverse events over time in hospitalized patients. Health Serv Res 2011;46:654–78.

29. Schildmeijer K, Nilsson L, Arestedt K et al. Assessment of adverse events in medical care: lack of consistency between experienced teams using the Global Trigger Tool. BMJ Qual Saf 2012;21:307–14.

30. Naessens JM, O’Byrne TJ, Johnson MG et al. Measuring hospital adverse events: assessing inter-rater reliability and trigger performance of the Global Trigger Tool. Int J Qual Health Care 2010;22:266–74.

31. Galazzi A, Adamini I, Consonni D et al. Accidental removal of devices in intensive care unit: an eight-year observational study. Intensive Crit Care Nurs 2019;54:34–8.

32. Marsh N, Webster J, Mihala G et al. Devices and dressings to secure peripheral venous catheters to prevent complications. Cochrane Database Syst Rev 2015;12:CD011070.

33. Fraser GL, Riker RR, Prato BS et al. The frequency and cost of patient-initiated device removal in the ICU. Pharmacotherapy 2001;21:1–6.

34. Gimenes FRE, Baracioli FFLR, Medeiros AP et al. Factors associated with mechanical device-related complications in tube fed patients: a multicenter prospective cohort study. PLoS One 2020;15:e0241849.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
