A scoping review of the methodological approaches used in retrospective chart reviews to validate adverse event rates in administrative data

Abstract
Patient safety is a key quality issue for health systems. Healthcare acquired adverse events (AEs) compromise safety and quality; therefore, their reporting and monitoring is a patient safety priority. Although administrative datasets are potentially efficient tools for monitoring rates of AEs, concerns remain over the accuracy of their data. Chart review validation studies are required to explore the potential of administrative data to inform research and health policy. This review aims to present an overview of the methodological approaches and strategies used to validate rates of AEs in administrative data through chart review. This review was conducted in line with the Joanna Briggs Institute methodological framework for scoping reviews. Through database searches, 1054 sources were identified, imported into Covidence, and screened against the inclusion criteria. Articles that validated rates of AEs in administrative data through chart review were included. Data were extracted, exported to Microsoft Excel, arranged into a charting table, and presented in a tabular and descriptive format. Fifty-six studies were included. Most sources reported on surgical AEs; however, other medical specialties were also explored. Chart reviews were used in all studies; however, few agreed on terminology for the study design. Various methodological approaches and sampling strategies were used. Some studies used the Global Trigger Tool, a two-stage chart review method, whilst others used alternative single-stage, two-stage, or unclear approaches. The sources used samples of flagged charts (n = 24), flagged and random charts (n = 11), and random charts (n = 21). Most studies reported poor or moderate accuracy of AE rates. Some studies reported good accuracy of AE recording, which highlights the potential of using administrative data for research purposes.
This review highlights the potential for administrative data to provide information on AE rates and improve patient safety and healthcare quality. Nonetheless, further work is warranted to ensure that administrative data are accurate. The variation of methodological approaches taken, and sampling techniques used demonstrate a lack of consensus on best practice; therefore, further clarity and consensus are necessary to develop a more systematic approach to chart reviewing.


Introduction
Patient safety is a priority for health systems, with 1 in 10 patients experiencing an in-hospital adverse event (AE) in high-income countries annually. The cost of patient harm is estimated at US$1-2 trillion per year [1], arising from additional resources, increased care, and prolonged hospital stays [2].
Accurate AE data are fundamental to addressing these costs and enabling patient safety and healthcare quality advancements [1]. Administrative healthcare data are a valuable source of information on AEs and a potentially efficient tool for monitoring patient safety [3]. Translating chart data into alphanumeric data [4] provides a coded summary of the patient and their encounter with the health system [5]. The International Classification of Diseases (ICD), used for this purpose by all World Health Organization member states, provides a standardized method of reporting and monitoring health-related issues across hospitals, regions, and countries [6].
Medical knowledge advancements have resulted in many ICD revisions [7]. Various derived classifications of ICD and ICD modifications have been developed to monitor conditions and complications in specific areas or settings [8] and to address country-specific needs [9].
Administrative data-based AE detection tools have been developed to screen for potential AEs and assess their preventability [10]. The Agency for Healthcare Research and Quality (AHRQ) Patient Safety Indicators (PSIs) demonstrate the potential use of administrative data for benchmarking rates of AEs across the USA [11]. Similarly, the Classification of Hospital-acquired Diagnoses (CHADx) in Australia [12] generates patient safety data for monitoring AEs. The increasing popularity of administrative data-based safety indicators has coincided with an increase in validation studies [10].
Inaccurate and poor recording of AEs [13,14] warrants validation of administrative datasets and ICD coding [15]. A systematic and standardized approach to measuring AEs in administrative data would improve dataset accuracy and enable benchmarking across hospitals [16] for comparison and improvement [17].
Whether a single gold standard for the measurement of harms exists is disputed [18,19]; however, medical charts are often considered a 'gold standard' source of patient information [20-22]. Despite their limitations, chart reviews are commonly used to assess healthcare quality and inpatient care [23] and to identify whether administrative data accurately reflect the events of a patient's hospitalization [24]. They utilize readily available data and are more feasible than observational studies, which are time-consuming, expensive, and complex in terms of confidentiality and bias [25]. The Harvard Medical Practice Study (HMPS) and the Global Trigger Tool (GTT) are examples of two-stage chart review approaches [26]. Nonetheless, validated administrative data are potentially another valuable source of AE data [3, 27-29]. The need to understand how chart reviews have been conducted, in order to identify optimal practices for measuring administrative data accuracy, provides the rationale for this review.

Methods
This review was conducted in line with the Joanna Briggs Institute (JBI) methodology for scoping reviews [30]. A protocol for this study has previously been published [31]. Searches in PubMed (MEDLINE), CINAHL, the JBI Evidence Synthesis, and The Cochrane Database of Systematic Reviews did not identify any similar scoping or systematic reviews.

Objectives
1. To present an overview of chart reviewing approaches and tools used to validate rates of AEs in administrative datasets.
2. To collate and map evidence of chart reviewing to measure the reliability of these datasets.

Inclusion and exclusion criteria
The inclusion and exclusion criteria are outlined in Table 1.

Changes since protocol
The exclusion of review articles is a deviation from the study protocol, which initially stated that the types of evidence sources would be left open. All available articles were nonetheless captured by searching the reference lists of any identified reviews.

Search strategy
Searches in the PubMed (MEDLINE) and CINAHL databases, conducted with a university librarian, yielded titles and abstracts from which text words were extracted and used to develop a search strategy (Supplementary material). Using this strategy, PubMed (MEDLINE) and CINAHL were searched on 4 April 2023, and Web of Science and Scopus were searched on 24 April 2023. The reference lists of included sources were searched, and a search for grey literature was conducted [32].
The HMPS was the first large-scale medical record review conducted over 30 years ago [33]; therefore, literature published in English between 1991 and 2023 was eligible for inclusion.

Source of evidence screening and selection
Two researchers independently screened study titles and abstracts. Full-text versions of potentially relevant sources were assessed against the inclusion criteria. Excluded sources were assigned an exclusion reason (Fig. 1). Conflicts were resolved through discussion, and included sources were confirmed. A third researcher was consulted on certain sources.

Data extraction
A data extraction template was developed and piloted. Key variables included: study title, name of author(s), year of publication, country, study aim, study design, methods, study population, specialty associated with the AE reported on, and key findings relating to the accuracy of the administrative data and administrative data-based detection tools.

Analysis and presentation of results
The extracted data were exported and arranged into a charting table in Microsoft Excel that was continuously updated. A descriptive numerical summary and a qualitative content analysis of the included studies were conducted.
The accuracy of the administrative data or administrative data-based detection tools was determined based on the specificity, sensitivity, positive predictive value (PPV), or negative predictive value (NPV), where reported, and on statements from the authors of each study relating to the validity of the datasets or detection tools and the extent to which they can be used to accurately measure rates of AEs.
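As an illustrative sketch (the counts below are hypothetical and not drawn from any included study), these four metrics follow directly from a 2x2 comparison of administrative data against chart review:

```python
# Illustrative only: validation metrics from a 2x2 comparison in which
# chart review is treated as the reference standard.
def validation_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV as a dict.

    tp: AEs flagged in administrative data and confirmed by chart review
    fp: flagged charts where chart review found no AE
    fn: AEs found on chart review but missed by administrative data
    tn: unflagged charts confirmed AE-free by chart review
    """
    return {
        "sensitivity": tp / (tp + fn),  # proportion of true AEs detected
        "specificity": tn / (tn + fp),  # proportion of non-AE charts left unflagged
        "ppv": tp / (tp + fp),          # proportion of flags confirmed as AEs
        "npv": tn / (tn + fn),          # proportion of non-flags confirmed AE-free
    }

# Hypothetical counts for a 1000-chart validation sample.
m = validation_metrics(tp=80, fp=20, fn=40, tn=860)
```

A tool with a high PPV but low sensitivity, for example, would flag mostly genuine AEs yet still miss many, which is why several of the included studies report more than one of these metrics.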

Search results
Following duplicate removal and screening of the search results as outlined in the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) flowchart (Fig. 1), 56 sources were included in the review.

Demographics and study characteristics
Studies by country, publication year, specialty, and study design are outlined in Table 2. All 56 studies were published between 1997 and 2022 and were conducted in 12 countries, with 31 (55%) in the USA. Various specialties were explored; however, 19 studies reported on AEs in surgical specialties and 14 studies validated AEs in all specialties. All studies included a retrospective chart review; however, little consistency in relation to the methods and study design was identified. Of the included studies, 32 measured the accuracy of the administrative data coding and 24 reported on the accuracy of the administrative data-based AE detection tools (Table 2).
Other notable study features include the coding classification system, administrative data-based detection tool, type of medical record system used, and number of medical records reviewed (Table 3). The accuracy of the AHRQ PSIs was measured in 16 studies. Excluding five studies in which the coding classification system was unclear, all other administrative data-based AE detection tools used versions of ICD.
The medical record system used was unclear in 29 studies, as the authors did not specify the format of the charts but referenced using a manual chart review or conducting medical record reviews to validate rates of AEs in administrative data. Paper-based charts, electronic medical records, and a mix of both were used in the remaining studies (Table 3).

Methodological approaches to chart review
The GTT was used in three studies. A further seven studies used an unspecified two-stage review approach, and 39 studies used a single-stage review process. The methodological approach used in the remaining seven studies was unclear (Table 2).

Sampling strategy
The sampling strategies used to identify charts for review are outlined in Table 2. Screening programmes were used in 24 studies to select flagged charts, i.e. charts that had been assigned specific codes, indicating that the patient had experienced an AE.A further 11 studies included both flagged charts and non-flagged charts and the remaining 21 studies selected a random sample of non-flagged charts.

Accuracy
The sensitivity, specificity, PPV, and NPV of the administrative data-based identification of AEs validated against medical records, where found, in addition to the number of AEs detected by chart review and in the administrative data, are presented in Table 4. Administrative data were reported to be accurate for detecting AEs in 11 studies; however, two studies [34,35] suggested that chart review is a poor method of detecting AEs. The administrative data and detection tools did not accurately represent rates of AEs in 31 studies. Four studies reported that administrative data were useful for monitoring patient safety and identifying potential patient safety incidents, but not for public reporting or measuring performance. A further 10 studies reported varied or moderate accuracy and suggested that administrative data have potential for monitoring AEs.

Statement of principal findings
This review highlights trends in relation to the characteristics of validation studies and presents an overview of the methodological approaches and strategies used to conduct retrospective chart reviews. It also explores the accuracy of AE rates in administrative data.

Strengths and limitations
The focus on breadth rather than depth of knowledge is an inherent limitation of scoping reviews [36]; however, this methodology was deemed suitable as it is appropriate for addressing broad research questions and mapping evidence from a variety of sources [37].
Interpretation within the context of the wider literature
The geographical location of the included studies is notable. Previous researchers similarly identified the USA as a leading location where validation studies are being conducted [10,21,38]. The proportion of AHRQ PSI studies included may have contributed to this finding, as these indicators are unique to the USA. Collection of administrative data in the USA for billing and insurance claims purposes may be an incentive for improving data accuracy and contribute to greater numbers of validation studies. Further global exploration of validating AE rates in administrative data through chart review may allow for broader comparison between health systems.
The upward trend in relation to year of publication demonstrates increasing recognition of administrative data as a source of information on AEs and a momentum towards enhanced accuracy. Retrospective chart review is commonly used to assess quality of care and collect data in clinical epidemiological research [39]. Therefore, it is unsurprising that chart reviews have been used in response to calls for validating administrative data.
The potential of the chart review methodology to validate rates of AEs in various specialties in administrative data was highlighted by this review. Surgical AEs such as wound infections, post-operative complications, and accidental punctures or lacerations were most commonly validated and are among the most reported in-hospital AEs [40,41]. Although reporting surgical AEs may be associated with litigation and repercussions regarding reputation [42], their disclosure decreases the chances of a patient filing a lawsuit [43-46]. Given the informed consent process and the acknowledgement of such AEs as possible complications, few surgical AEs should be considered truly unanticipated [47]; therefore, they are more likely to be readily acknowledged than AEs resulting from negligence or medical error. As surgical AEs are more visible, their detection may be more common than AEs in other specialties, with 97% of surgeons reporting that they would definitely disclose such an AE to a patient [48]. The PSIs are largely composed of surgical indicators [49]; therefore, their inclusion may have contributed to the focus on surgical AEs. Given the burden of surgical outcomes in relation to reputation and litigation, accurate data and a standardized chart review method would be beneficial to surgeons for monitoring AEs.
The various named study designs identified reiterate the lack of standards for reporting chart review study designs and the limited availability of literature on chart review methodologies [50]. Evidently, there has been little progress towards developing a more systematic and standardized review approach. This further highlights the urgency for consensus on best practice for conducting chart reviews.
AEs detected through incident reporting systems have recently been compared with those detected using the GTT [57]. This review explores all methodological chart review approaches used to validate rates of AEs in administrative data. In line with previous research [26], the variation of single-stage and two-stage chart review processes demonstrates a lack of consensus on how to measure AEs. A two-stage review process has been identified as a valid and reliable method of conducting chart reviews [58]. The HMPS and GTT have frequently been compared, with some researchers identifying higher sensitivity and specificity and better AE detection using the GTT [59] and others identifying more AEs using the HMPS [60]. This further demonstrates the lack of consensus in relation to guidelines for best practice.
The strategy for obtaining a chart review sample is a key consideration [23]. Random sampling is the gold standard, as each chart has an equal chance of being selected. This technique reduces bias and increases the generalizability of results [23] and should be considered when developing a more systematic chart review approach. Reviewing a random sample of charts allows for detection of all potential AEs and inclusion of false-negatives missed by detection tools if administrative data are inaccurate. A limitation of this method is the large number of medical records required, which may not be possible in all chart reviews [23].
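The contrast between these sampling strategies can be sketched as follows (the chart identifiers, counts, and function names are hypothetical illustrations, not drawn from any included study):

```python
import random

def random_sample(chart_ids, n, seed=0):
    """Simple random sample: every chart has an equal chance of selection."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(chart_ids, n)

def flagged_plus_random_sample(flagged_ids, unflagged_ids, n_random, seed=0):
    """All flagged charts plus a random draw of unflagged charts,
    so false-negatives missed by a detection tool can still surface."""
    rng = random.Random(seed)
    return list(flagged_ids) + rng.sample(unflagged_ids, n_random)

# Hypothetical population of 1000 charts, of which a screening
# programme has flagged the first 94 as containing a potential AE.
charts = [f"chart-{i:04d}" for i in range(1000)]
flagged = charts[:94]
unflagged = charts[94:]

pure_random = random_sample(charts, 230)
combined = flagged_plus_random_sample(flagged, unflagged, n_random=230)
```

In the purely random draw, flagged and unflagged charts are selected in proportion to their prevalence; the combined strategy guarantees that every flagged chart is reviewed while retaining a random slice of unflagged charts for false-negative detection.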
A random process for selecting records is critical when using the GTT [61]. The included GTT studies used random sampling; however, the use of flagged charts was widespread across other methodological approaches. This further underscores the need for consensus in relation to the chart review method and the strategies used to identify samples. Internal and external validity may be compromised if studies fail to use a random sample or if a sample is selected from an atypical population [62].
Convenience sampling allows charts identified as having an AE by administrative data-based detection tools to be selected. This validates known cases of AEs but may exclude miscoded or unassigned AEs missed by detection tools and underrepresent true AE rates. This method presents limitations in relation to generalizability but is useful when reviewing rare AEs or smaller sample sizes [23].
Reviewing only flagged charts was identified by several authors as a limitation [63-68], as excluding non-flagged charts may have underrepresented AE incidence by omitting false-negatives [63,68]. The value of these studies is worth considering: rather than validating administrative data, they explored the validity of the indicators used to flag the sample of charts as measures of patient safety [16]. It may be more valuable to assess the validity of the coding to ensure that the data being screened are accurate.
Including both flagged and non-flagged charts allows true-negatives and false-negatives to be detected and ensures that AE rates are not underestimated [63]. Furthermore, flagged and random samples are advantageous in research investigating rare conditions, as an entirely random sample may be inefficient and resource demanding [69].
The range of ICD revisions and variants used contributes to the complexity of comparing coded data and clinical contexts internationally [70]. The studies that reported on the accuracy of administrative data-based detection tools relied on different modifications and revisions of ICD to create their patient safety data. Therefore, the range of ICD revisions and systems used must be considered as a potential source of result variation. Comparability of AEs internationally is threatened by country-specific modifications of ICD; therefore, a standardized ICD classification that is universally accepted may facilitate benchmarking of AEs using administrative data [9]. Furthermore, the medical record system used and the number of charts reviewed are important features of the studies in this review. Parallel use of both paper-based and electronic medical records may result in inconsistencies between record systems [71] and may have contributed to variation in results. Consequently, researchers may wish to consider the type of medical record system used in future validation studies. Standardized processes for entering data and limiting variation in medical records increase the accessibility and usability of chart data. Furthermore, the development of consistent data collection procedures and more systematic approaches is critical to rigour in chart review studies [72].
Sample size is a key sampling consideration in chart review studies. Greater power is associated with studies that have larger sample sizes [23]. The number of charts reviewed may be valuable to researchers planning future studies, as chart reviewing can be time-consuming and the feasibility of locating charts and extracting data may require consideration [73].
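As an illustrative planning aid, the standard sample-size formula for estimating a proportion, n = z^2 p(1 - p) / d^2, can be applied to chart reviews; the expected AE proportion and margin of error below are hypothetical, not taken from any included study:

```python
import math

def charts_needed(p, d, z=1.96):
    """Minimum charts needed to estimate an AE proportion p to within
    a margin of error d, at the confidence level implied by z
    (z = 1.96 corresponds to 95% confidence)."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Hypothetical scenario: ~10% of charts expected to contain an AE,
# estimated to within +/-3 percentage points at 95% confidence.
n = charts_needed(p=0.10, d=0.03)  # 385 charts
```

A narrower margin of error or a proportion closer to 0.5 drives the required number of charts up sharply, which illustrates why the feasibility of locating and reviewing charts constrains study design.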
Although various levels of accuracy were identified, administrative data can potentially provide accurate information. In line with previous research [28], while some studies indicated that these datasets can be reliable, others identified inaccuracies when comparing administrative data with chart data. The studies that identified rates of AEs recorded in administrative data with good accuracy [69, 74-83] strengthen the case for using administrative data to inform health research and policy. Although further work is warranted, these findings are in agreement with previous claims that, with consensus on rigorous methodology for validating and improving existing algorithms, administrative data can provide valuable information on AE rates and patient safety incidents [28].

Implications for policy, practice, and research
This review highlights the potential for administrative data to improve patient safety and healthcare quality and to reduce healthcare costs by providing valuable data on AE rates; however, further research is warranted to ensure that administrative data are robust. The lack of consensus on best practice for conducting chart reviews is highlighted. The different methodological approaches and sampling strategies used demonstrate the potential for these studies to differ significantly in relation to the interpretation of their results and their credibility. These inconsistencies mirror previously identified common pitfalls [23] and may devalue the chart review method of validating rates of AEs in administrative data. The development of a standardized protocol with clear guidelines could enhance the methodological rigour of the retrospective chart review and allow the accuracy of administrative data to be improved.

Conclusions
Accurate administrative data will enable researchers to compare hospital environments in the context of patient safety and facilitate benchmarking of AE rates across hospitals, regions, and countries. Further clarity and consensus on chart review methods are necessary to develop a more systematic chart review approach for improving patient safety and healthcare quality.

Figure 1
Figure 1 PRISMA flowchart of the source selection process.

Table 1 .
Inclusion and exclusion criteria.

Table 2 .
Distribution of included sources according to country, year of publication, clinical specialty associated with the adverse event(s), named study design, methodological approach, and sampling strategy.

Table 3 .
Key study features of included studies-author, coding classification system, administrative data-based detection tool, type of medical record system used, and number of medical records reviewed.

Table 4 .
Included studies-author, no. of AEs identified by chart review, no. of AEs identified by administrative data, sensitivity, specificity, PPV, NPV, verbatim statement by authors on validity/usability of data, summarized validation outcome.