Measurement of clinical documentation burden among physicians and nurses using electronic health records: a scoping review

Abstract
Objective: Electronic health records (EHRs) are linked with documentation burden resulting in clinician burnout. While clear classifications and validated measures of burnout exist, documentation burden remains ill-defined and inconsistently measured. We aimed to conduct a scoping review focused on identifying approaches to documentation burden measurement and their characteristics.
Materials and Methods: Based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Extension for Scoping Reviews (ScR) guidelines, we conducted a scoping review assessing MEDLINE, Embase, Web of Science, and CINAHL from inception to April 2020 for studies investigating documentation burden among physicians and nurses in ambulatory or inpatient settings. Two reviewers evaluated each potentially relevant study against inclusion/exclusion criteria.
Results: Of the 3482 articles retrieved, 35 studies met inclusion criteria. We identified 15 measurement characteristics, including 7 effort constructs: EHR usage and workload, clinical documentation/review, EHR work after hours and remotely, administrative tasks, cognitively cumbersome work, fragmentation of workflow, and patient interaction. We uncovered 4 time constructs (average time, proportion of time, timeliness of completion, and activity rate) and 11 units of analysis. Only 45.0% of studies assessed the impact of EHRs on clinicians and/or patients, and 40.0% mentioned clinician burnout.
Discussion: Standard and validated measures of documentation burden are lacking. While time and effort were the core concepts measured, there appears to be no consensus on the best approach or degree of rigor to study documentation burden.
Conclusion: Further research is needed to reliably operationalize the concept of documentation burden, explore best practices for measurement, and standardize its use.


INTRODUCTION
Rapid adoption of electronic health records (EHRs) following the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act has led to advances in both individual- and population-level health. 1 HITECH has improved healthcare quality, patient safety, and diagnostic accuracy through enhanced data management and timely reuse; interoperable systems have facilitated care continuity and monitoring of compliance metrics. 2-5 EHR-facilitated, guideline-based care has been associated with reduced redundancies 6,7 and streamlined billing administration. 8 Yet EHR implementation, still largely in its infancy, has also had unintended consequences for clinical practice and healthcare systems, including significant increases in clinician documentation time. 9-13 Extended work hours, time constraints, clerical workload, and disruptions to the patient-provider encounter have led to rising discontent with existing documentation methods in EHR systems. 6,14,15 This documentation burden has been linked to increases in medical errors, 3,9,16 threats to patient safety, 3,9,16 inferior documentation quality, 17,18 job attrition, and, ultimately, burnout among nurses and physicians. 3,9-11,14,16-22 In concert with Affordable Care Act (ACA) reimbursement models, Meaningful Use (MU) mandates, and a regulatory-rich environment, EHRs have drastically altered clinical documentation workflow and communication in routine healthcare. 13,15,23 Physicians have reported a willingness to remain out of compliance with EHR incentive programs (eg, MU and the Physician Quality Reporting System 24 ) in favor of mitigating documentation burden (hereinafter referred to interchangeably as "burden"). 15,25 Still, studies consistently demonstrate that physicians spend twice as much time on electronic documentation and clerical tasks as on direct patient care. 14,26-30 Similarly, nurses devote more than half of their shift time to EHR data entry and retrieval 19,20 and report reduced direct patient contact. 31,32
While researchers have discussed the challenges of burden and its implications for EHR-related clinician burnout over the past decade, 5,15,33 limited attention has been paid to distinguishing the antecedent concept of burden (defined as a duty, responsibility, etc, that causes worry, difficulty, or hard work) 34 from burnout (defined as a long-term work-related stress reaction marked by emotional exhaustion, depersonalization, and a lack of a sense of personal accomplishment). 35,36 Clinician burnout has been well documented and widely quantified using surveys and psychological measurements throughout the peer-reviewed literature. 37-40 Yet, to the best of our knowledge, there is no consensus on approaches to measuring burden. 15,37,41-45 While EHR dissatisfaction has been extensively studied and some clinician activity metrics have been proposed, 46 few empirically based, readily available solutions to reduce burden exist. 11 Interventions to assuage burden have ranged from the use of scribes and remote transcription services 27 to text summarization and dictation software. 16,47 In March 2020, the Department of Health and Human Services (HHS) released a report outlining 3 primary goals to reduce EHR-related clinician burdens that influence care: reduce the time and effort clinicians require to document health information, reduce the effort required to meet regulatory requirements, and improve EHR ease of use. 48 Evaluating the impact of interventions that target these goals will require standardized, quantitative measurements.

OBJECTIVE
The purpose of this scoping review is to assess the state of the science, identify gaps in knowledge, and synthesize characteristics of documentation burden measurement among physicians and nurses using EHRs.

MATERIALS AND METHODS
We conducted a scoping review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension for Scoping Reviews (ScR) guidelines. 49 A scoping review suited our objective of describing the breadth of methods used to measure documentation burden. 49

Search strategy and selection criteria
We systematically searched the MEDLINE, Embase, Web of Science, and CINAHL databases from inception to April 20, 2020, for all English-language studies published in peer-reviewed journals and conference proceedings that investigated documentation burden among physicians and/or nurses in ambulatory and/or inpatient settings. We also evaluated relevant studies identified through the reference lists of eligible studies. Burden is not specifically represented in Medical Subject Headings (MeSH); therefore, we explored both keyword and MeSH terms for the 2 burden-related concepts outlined in the HHS report Strategy on Reducing Burden Relating to the Use of Health IT and EHRs: 48 documentation (a) effort and (b) time. We also focused our search on (a) the EHR and (b) physicians or nurses. The finalized search strategy is summarized in Table 1.

Table 1. Search strategy (excerpt):
... OR ("time*") OR ("Measure*") OR ("measurement") OR ("quantify") OR ("quanti*") OR ("metric")
AND documentation effort: ("Documentation*"[Mesh]) OR ("documentation*") OR ("note*") OR ("unstructured data") OR ("narrative") OR ("Burnout, Professional" ...
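The Boolean structure of a concept-based search (synonyms within a concept joined with OR, concepts joined with AND, as the AND in the Table 1 excerpt suggests) can be sketched in a few lines. This is a minimal illustration, not the authors' full strategy: only the quoted time and effort terms come from the Table 1 excerpt, their assignment to concept groups is an assumption, and the EHR and clinician terms and the `build_query` helper are hypothetical.

```python
# Hypothetical concept groups. The documentation time/effort terms are copied from the
# Table 1 excerpt (grouping assumed); the EHR and clinician terms are placeholders.
CONCEPTS: dict[str, list[str]] = {
    "documentation time": ['("time*")', '("measurement")', '("metric")'],
    "documentation effort": ['("Documentation*"[Mesh])', '("documentation*")', '("note*")'],
    "EHR": ['("Electronic Health Records"[Mesh])', '("EHR*")'],
    "clinicians": ['("physician*")', '("nurse*")'],
}


def build_query(concepts: dict[str, list[str]]) -> str:
    """OR together the synonyms within each concept, then AND the concept blocks."""
    blocks = ["(" + " OR ".join(terms) + ")" for terms in concepts.values()]
    return " AND ".join(blocks)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Keeping each concept as its own OR-block makes it easy to add synonyms without changing how concepts are intersected.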

Study selection and selection criteria
We defined inclusion and exclusion criteria a priori and iteratively modified them to exclude studies involving niche clinical systems and those strictly comparing EHR with paper-based documentation (Table 2). We included all peer-reviewed primary studies that focused on EHR utilization and reported an objective time or effort measure 48 (eg, EHR usage logs, which record time-stamped documentation events). The term "physicians" encompassed attending physicians, fellows, resident physicians, and interns; "nurses" referred to registered nurses. We focused on physicians and nurses given our aim of identifying interprofessional measurements of documentation burden. We excluded studies comparing EHR documentation with paper-based systems if they focused on evaluating EHR implementation rather than on measuring burden.
After removing duplicates, 2 reviewers (AJM and JMS, AJM and RC, AJM and SS, or AJM and EL) independently screened article titles and abstracts for relevance using Covidence. 50 Two authors (with a third serving as a tiebreaker) then reviewed the full text of each potentially relevant article against the eligibility criteria. Full-text articles with concordant inclusion decisions from both reviewers were included in the final analysis; for discordant decisions, all reviewers reexamined and adjudicated the article until consensus was reached.
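The dual-review rule described above (concordant decisions are final; discordant ones go to adjudication) amounts to a simple partition of records. The sketch assumes each reviewer's decisions are stored as a mapping from a hypothetical record ID to "include" or "exclude"; it is illustrative only and does not describe how Covidence works internally.

```python
from typing import Literal

Decision = Literal["include", "exclude"]


def route_decisions(first: dict[str, Decision],
                    second: dict[str, Decision]) -> dict[str, list[str]]:
    """Partition records screened by both reviewers into agreed includes,
    agreed excludes, and discordant records that need adjudication.
    Records seen by only one reviewer are skipped."""
    routed: dict[str, list[str]] = {"include": [], "exclude": [], "adjudicate": []}
    for record_id in sorted(first.keys() & second.keys()):
        a, b = first[record_id], second[record_id]
        routed[a if a == b else "adjudicate"].append(record_id)
    return routed
```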

Data extraction and analysis
One author (AJM) charted data for all articles meeting full-text inclusion criteria (see online Supplementary Table); all authors reviewed and discussed the charted data. We extracted the following information: publication year, geographic location, time source, unit(s) of analysis, activity, sample size, sample characteristics, EHR system, provider role/specialty, clinical setting, study design and objectives, study type (eg, quantitative or mixed-methods), site type (eg, single or multisite), exposure and outcome measures, analytical and statistical methods, study limitations/bias, and major findings. We reported study limitations and biases, such as threats to internal and external validity, to appraise rigor. We used the HHS concepts to organize our reporting of measurement characteristics. 48 Because HHS does not further define (a) time, (b) effort, or (c) outcomes assessed, 48 we conducted a purposeful thematic analysis to identify proxies for and synthesize these 3 recurring concepts. 51 We iteratively combined themes until we reached consensus.

Sources of evidence
Our search strategy yielded 3482 potentially relevant manuscripts from MEDLINE (n = 507), Embase (n = 1143), Web of Science (n = 1007), and CINAHL (n = 825). Seven additional manuscripts were identified through in-text references. After eliminating duplicates, 1946 titles/abstracts were screened; of those, 166 were eligible for full-text review. Consensus was achieved for all disagreements concerning the inclusion of full-text articles. Thirty-five studies meeting criteria were summarized in the final analysis (Figure 1).
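The flow counts above can be checked with simple arithmetic. The sketch uses only numbers stated in the text; the two derived rates are plain ratios computed for illustration, not figures the review reports.

```python
# Counts as reported in the text.
database_hits = {"MEDLINE": 507, "Embase": 1143, "Web of Science": 1007, "CINAHL": 825}
screened_after_dedup = 1946
full_text_reviewed = 166
included = 35

total_retrieved = sum(database_hits.values())
full_text_rate = full_text_reviewed / screened_after_dedup  # screened records sent to full text
inclusion_rate = included / full_text_reviewed              # full texts ultimately included

print(total_retrieved)          # 3482, matching the reported total
print(f"{full_text_rate:.1%}")  # 8.5%
print(f"{inclusion_rate:.1%}")  # 21.1%
```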

Limitations and biases reported
Two limitations were ubiquitous across included studies [...] 53,54,64 Six studies cited selection bias arising from self-selection and voluntary participation among high-performing subjects 27,28 or from low response rates. 22,56,64,75 Eleven studies noted a lack of data triangulation (eg, combining log data with direct observation, encounter information, or qualitative data) that would provide context on the type of EHR interface used (eg, remote, inpatient, outpatient) for login timestamps, direct patient care, and other data. 13,14,52-56,60,63,66,76 Twelve studies identified the presence of potential confounding. 13

DISCUSSION
In this scoping review, we identified 35 studies that explored the measurement of documentation burden among physicians and nurses, underscoring the overall paucity of research in this domain. As might be expected, all 35 studies were published after the HITECH Act. We uncovered 7 effort constructs, 4 time constructs, and 11 units of analysis. Except for workflow fragmentation and cognitively cumbersome work (eg, multitasking), our effort constructs largely align with the "proposed core EHR use measures (for practice efficiency)" published by Sinsky and colleagues, which suggests that burden may be quantified through existing metrics. 46 Generated with expert stakeholders, Sinsky's core measures include total EHR time, work outside of work, time on documentation, time on prescriptions, inbox time, teamwork for orders, and undivided attention to patients. 46 Further efforts should examine these measures for validity and reliability. Fewer than half (n = 16) of the studies investigated the impact of burden on clinicians and/or patients. Methodologies varied across study designs, suggesting there is no current consensus regarding the best approach or standard for studying burden, although an ensemble of methods coupled with the triangulation of multiple data sources may emerge as a best practice. Historically, time-motion (TM) studies have been considered the gold standard for quantifying the effects of computer systems on task-based clinical workflow and duration. 82,83 Despite yielding valid results, 70,84,85 TM studies are costly and time-consuming to perform 83 and engage only a handful of participants per study.
In addition to concerns regarding the generalizability of TM studies, prior research has identified widespread methodological inconsistencies in their design, conduct, quantitative analyses, and reporting of results, making it difficult to synthesize findings across studies. 70,86 Because EHR usage logs are readily accessible, scalable, and less subject to the Hawthorne effect, analyzing them may be a more feasible alternative, and these data were used in the overwhelming majority of included studies (80.0%). Nevertheless, research on the use of EHR usage logs to evaluate clinical activity has revealed a dearth of validation, cross-study analyses, and, most critically, defined terminology (eg, access log, audit log) and measures. 46,87 These inconsistencies parallel those found in TM studies. TM studies provide valuable contextual information on the timing and sequence of activities, which can be triangulated with EHR usage logs to better understand burden in the context of clinical workflows. Recognizing that all methods have strengths and weaknesses, we anticipate that future work will identify the measurement methods and data triangulation approaches that best align with different research objectives related to burden.
One major finding of this review was the absence of quantitative studies assessing the reliability and validity of time and effort measures. Of the 35 studies included, only 1 study set out to develop a measure of burden (ie, EHR usage outside shift), 13 while 2 studies each employed TM methods to empirically validate proposed measures of workflow and the use of EHR usage log data in characterizing workload. 27,70 Interobserver reliability was reported in only 2 studies. 28,68 Previous studies quantifying physician EHR activity through EHR usage logs have noted similar challenges. 87 The lack of studies developing and validating burden measures indicates that limited effort has been dedicated to formally and objectively quantifying burden, despite increasing references to it in public policy and lay literature. Researchers have often relied on unstandardized proxies to quantify burden, which helps explain why no standardized objective measure exists. 6,13,14,25 Compounding the absence of empirical validation studies, there is no agreed-upon definition of burden but rather a plethora of definitions throughout the literature. 6,13,14,25,28,43,47,52,88-90 We found that many related but distinct concepts were used in studies quantifying time and effort, such as workload, 27,78 workflow, 13,74 work disruption, 75 efficiency, 22,52 cognitive burden, 56 usability, 74 and productivity, among others. 69 In contrast, burnout is represented in controlled vocabularies, including the International Classification of Diseases (ICD), the Diagnostic and Statistical Manual of Mental Disorders (DSM), and MeSH. 91,92 Furthermore, validated measures of burnout, such as the Maslach Burnout Inventory and the Mini Z burnout survey, are often applied, 69,73 whereas no known analog for burden is currently available.
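Interobserver reliability for categorical TM coding is commonly summarized with Cohen's kappa, which corrects the raw agreement between 2 observers for the agreement expected by chance. The review does not say which statistic the 2 studies used, so this is a generic sketch; the activity labels in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(obs_a: list[str], obs_b: list[str]) -> float:
    """Chance-corrected agreement between two observers coding the same intervals."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("need two equal-length, non-empty code sequences")
    n = len(obs_a)
    p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    freq_a, freq_b = Counter(obs_a), Counter(obs_b)
    # Expected agreement if each observer assigned codes independently at their own rates.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_expected == 1:
        return 1.0  # both observers used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    a = ["doc", "doc", "review", "patient", "doc", "review"]
    b = ["doc", "review", "review", "patient", "doc", "review"]
    print(round(cohens_kappa(a, b), 3))  # 0.739
```

Here the observers agree on 5 of 6 intervals (83%), but kappa is lower because some of that agreement would be expected by chance.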
Likewise, in a literature review on the impact of EHRs on documentation time among physicians and nurses, Poissant and colleagues suggested that the lack of research evaluating EHR time efficiency is likely associated with the scarcity of rigorous methods for accurately capturing time. 12 We found that generalizability and measurement error issues were partially driven by the use of distinct EHR systems, in some instances with proprietary and opaque vendor-defined time metrics for shift and active EHR time. 73 Time capture among EHR usage log studies was also imprecise. As reported elsewhere in the literature, EHR usage logs have shown unreliable accuracy for both the clinician activities and the time durations they capture. 87 Intended for troubleshooting technical problems and for HIPAA compliance, EHR usage logs originate from many interconnected information systems and sources (eg, devices). 93 Vendor-defined time metrics may not be generalizable between, or within, institutions, nor provide precise estimates in real-world settings. Therefore, given the value of measuring clinician EHR time, researchers should explore novel algorithmic methods to validate these metrics and EHR usage log data. For example, Dziorny and colleagues developed an automated algorithm to quantify shift duration among physicians in an inpatient setting and internally validated it against scheduled shift time. 83 Likewise, DiAngi et al proposed the "calculated EHR time outside of clinic" (CLOC) metric to measure EHR time after clinic hours in ambulatory settings using EHR usage logs, and were able to correlate their findings with self-reported time spent in the EHR after clinic hours. 22 The HHS report Strategy on Reducing Burden Relating to the Use of Health IT and EHRs aims to evaluate the clinical impact of burden (ie, time and effort) on clinicians and/or patients; 48 however, fewer than half of the studies reviewed investigated an outcome of interest (n = 16).
Of those studies (note: outcomes were not mutually exclusive), the majority examined clinician satisfaction and burnout (n = 12), while only half examined clinical process measures as an end goal, and half evaluated patient satisfaction and health indicators. Research questions and study objectives varied widely across included studies.
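To make the usage-log timing issues discussed above concrete, the sketch below estimates active EHR time from one clinician's log timestamps by crediting each gap between consecutive events with at most an idle timeout, then splits the total into time during and outside scheduled clinic hours. It is loosely in the spirit of an outside-clinic metric such as CLOC but is not DiAngi et al's definition or Dziorny and colleagues' algorithm; the 5-minute timeout, the 8:00-17:00 schedule, and the `active_time` helper are all assumptions.

```python
from datetime import datetime, time, timedelta

# Assumed parameters; real studies choose and justify their own.
IDLE_TIMEOUT = timedelta(minutes=5)
CLINIC_START, CLINIC_END = time(8, 0), time(17, 0)


def active_time(events: list[datetime],
                idle_timeout: timedelta = IDLE_TIMEOUT) -> dict[str, timedelta]:
    """Sum the gaps between consecutive log events, capping each gap at
    idle_timeout, and bucket each gap by the clock time of its starting event."""
    totals = {"in_clinic": timedelta(0), "outside_clinic": timedelta(0)}
    ordered = sorted(events)
    for prev, nxt in zip(ordered, ordered[1:]):
        credited = min(nxt - prev, idle_timeout)
        in_clinic = CLINIC_START <= prev.time() < CLINIC_END
        totals["in_clinic" if in_clinic else "outside_clinic"] += credited
    return totals


if __name__ == "__main__":
    day = datetime(2020, 1, 6)
    log = [day.replace(hour=h, minute=m) for h, m in
           [(9, 0), (9, 2), (9, 30), (20, 0), (20, 4), (20, 10)]]
    print(active_time(log))  # 12 minutes in clinic, 9 minutes outside
```

The timeout is exactly where idle and active time can be misclassified: a longer timeout credits more reading and thinking time but also more genuine idle time. This simplified version also ignores weekends and attributes a gap that straddles the end of clinic entirely to the period of its starting event.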
In this review, scribes represented 1 of 3 areas of study concerning proposed interventions to mitigate burden (n = 3); 14,52,53 however, associated costs and high turnover rates among scribes suggest that this solution may not be broadly feasible or sustainable. 47 In the context of reducing documentation burden, implementing scribes and measuring their impact does not solve the higher-level information processing issues that informatics research should be investigating (eg, reduction in data entry requirements, improvement of system usability) and may divert resources away from more sophisticated biomedical informatics approaches. Other identified interventions, such as training on EHR use (n = 2) 22,56 and documentation redesign (n = 3), 58,75,78 also have their strengths and weaknesses. Training may be a lower-cost method of mitigating burden than scribes, while documentation redesign may be more costly but is likely more effective at addressing information processing and usability concerns. Moreover, the lack of standardized measures precludes comparative effectiveness studies of design modifications within EHR systems 15 or across distinct burden-alleviating interventions.
In summary, our findings identified distinct, but not necessarily comprehensive, characteristics of measuring burden: 7 effort constructs, 4 time constructs, and 11 units of analysis (see Table 3).
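The 4 time constructs can be illustrated on toy data. The operational definitions used here are assumptions chosen for illustration, because the review does not fix them: average time as mean documentation minutes per note, proportion of time as documentation minutes over shift minutes, timeliness of completion as the share of notes signed within 24 hours of the encounter, and activity rate as data entries per shift hour. The `Note` record and `time_constructs` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Note:
    encounter_end: datetime
    signed_at: datetime
    documentation_minutes: float  # active EHR time attributed to this note


def time_constructs(notes: list[Note], shift_minutes: float, data_entries: int,
                    deadline: timedelta = timedelta(hours=24)) -> dict[str, float]:
    """Compute one illustrative version of each time construct for one shift."""
    if not notes or shift_minutes <= 0:
        raise ValueError("need at least one note and a positive shift length")
    doc_minutes = [n.documentation_minutes for n in notes]
    on_time = sum(n.signed_at - n.encounter_end <= deadline for n in notes)
    return {
        "average_minutes_per_note": sum(doc_minutes) / len(notes),  # average time
        "proportion_of_shift": sum(doc_minutes) / shift_minutes,     # proportion of time
        "share_signed_on_time": on_time / len(notes),                # timeliness of completion
        "entries_per_hour": data_entries / (shift_minutes / 60),     # activity rate
    }


if __name__ == "__main__":
    d = datetime(2020, 1, 6)
    notes = [
        Note(d.replace(hour=10), d.replace(hour=10, minute=30), 12.0),
        Note(d.replace(hour=11), d + timedelta(days=1, hours=14), 18.0),  # signed 27 h later
        Note(d.replace(hour=15), d.replace(hour=17), 6.0),
    ]
    print(time_constructs(notes, shift_minutes=480, data_entries=120))
```

Swapping the unit of analysis (per note, per patient, per shift) changes the denominators, which is one reason results from different studies are hard to compare.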

Limitations
While this study sought to investigate literature on the operationalization of documentation burden and the development and/or validation of quantitative burden measures, research in this domain has not yet matured. Despite our broad search terms and queries, the majority of the literature retrieved did not detail how to conceptualize and/or measure burden. We retrieved manuscripts using keywords because existing MeSH terms were unable to capture the phenomenon of interest; in fact, no MeSH term exists for burden in this specific context. It is conceivable that some articles were not captured because (a) our keywords were limited and/or (b) our queries were not sufficiently broad or narrow.

Future directions
Future research should build upon existing burden evidence, focusing on strengthening objectivity and generalizability. Proposed quantitative measures of burden, such as the after-scheduled-duty-hours measure described by Cox and colleagues, should undergo rigorous testing and validation across settings and specialties. 13 Additionally, HHS links time and effort concepts to clinical impact; 48 therefore, research should directly connect measurement of these concepts with specific outcome measures in order to accurately evaluate documentation burden over time. This remains a difficult undertaking, as studies have shown that neither burden nor task value in the clinical context is identical across all EHR interactions or across different roles and specialties. 10,20,25 Examining tradeoffs between specific tasks within the EHR, Rao and colleagues discovered that EHR functionalities are not equally burdensome. 25 They also found that settings are not equally burdensome, reporting that shift-based work may be associated with less burden and that ambulatory clinical documentation is rated as equally valuable and burdensome. 25 Perceptions of distinct documentation types among nurses have also been studied, yet no objective criteria have been established to evaluate value. 19 We found that only 1 study investigating EHR work after hours (ie, "pajama time") included nurses. 77 While "pajama time" connotes remotely accessing the EHR from home to document, few inpatient nurses do so given the immediacy of their documentation. Thus, data entry rates may be more suitable for measuring nurse burden. 76 Because physicians working in general medicine were the most represented in our findings, future work should be dedicated to characterizing and measuring burden among understudied professions and settings (eg, nurses and subspecialties).
Promisingly, however, the burden measures identified were not strictly unique to individual professions and workflows, supporting the opportunity to define interprofessional measures of burden in future work. We propose that burden be examined as a global composite measure, indicative of magnitude and directionality, consistent with the characteristics uncovered in this review. This would require (a) developing a universally agreed-upon inventory of key EHR tasks and activities weighted for relative value according to burden (ie, a taxonomy) that could be linked to clinical outcomes such as "quality, financial or professional satisfaction" 15,27,28 ; and (b) quantifying the relationship between "pain points" and specific features in the EHR with more granularity. This makes the examination of task value, as indicated by a task's relationship with burden, a high-priority area for future research. Such research would allow the identification of tasks that are high burden but low value so that EHR design and intervention efforts may target the elimination or mitigation of these tasks.
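As a rough illustration of such a composite, the sketch below weights measured task minutes by a relative burden weight and flags tasks whose burden is high but whose value is low. The task names, weights, values, and 0.5 cut-offs are invented for the example; the agreed-upon taxonomy with validated weights proposed above does not yet exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    minutes: float        # measured time spent on the task
    burden_weight: float  # hypothetical relative burden, 0 to 1
    value: float          # hypothetical relative value, 0 to 1


def composite_burden(tasks: list[Task]) -> float:
    """Burden-weighted minutes summed across tasks (one possible composite)."""
    return sum(t.minutes * t.burden_weight for t in tasks)


def high_burden_low_value(tasks: list[Task], burden_cut: float = 0.5,
                          value_cut: float = 0.5) -> list[str]:
    """Names of tasks at or above the burden cut-off and below the value cut-off."""
    return [t.name for t in tasks if t.burden_weight >= burden_cut and t.value < value_cut]


if __name__ == "__main__":
    shift = [
        Task("inbox management", 60, 0.8, 0.3),
        Task("progress note", 90, 0.6, 0.9),
        Task("order entry", 30, 0.4, 0.7),
    ]
    print(composite_burden(shift))       # 114.0
    print(high_burden_low_value(shift))  # ['inbox management']
```

Computing the composite for the same clinicians before and after an intervention yields a signed change, which is one possible reading of the "magnitude and directionality" described above.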

CONCLUSION
Documentation burden among interprofessional clinical roles remains understudied and under-measured in both inpatient and ambulatory settings. This review suggests that concrete, validated measures of burden are lacking in research, in stark contrast to the burnout literature. 36 Moreover, this review demonstrates that the existing evidence is imprecise and fragmentary. While there is a multitude of measures for both effort and time among the included studies, the majority lack generalizability across study setting, patient population, EHR system, activity type, role, and subspecialty. In the absence of standardization, these studies additionally risk measurement error, including misclassification of idle and active time, incomplete task measurement, and imprecise time capture. Hence, it would be prudent to further explore easily accessible, scalable alternatives, such as EHR usage log data. Targeting burden to evaluate the impact of quality improvement strategies and interventions requires quantifiable measures that are comparable and consistent across time, settings, professions, and contexts. We propose that burden be examined as a global composite measure based on task value, consistent with the burden measurement characteristics uncovered in this review. Further research is needed to reliably operationalize and standardize the concept of burden and to explore how it is best measured across clinical settings.

FUNDING
This study was supported by the US National Library of Medicine of the National Institutes of Health (NIH) under the training fellowship award 5T15LM007079 and the National Institute for Nursing Research (NINR) under grant numbers 1R01NR016941 and 5T32NR007969.

AUTHOR CONTRIBUTIONS
AJM and SCR conceptualized the scope of this review. AJM, JMS, RC, SS, and EL conducted the initial and full-text screenings. AJM drafted the manuscript with significant revisions and feedback from JMS, RC, KDC, and SCR.

SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.

DATA AVAILABILITY STATEMENT
The data underlying this article are available in the article and in its online supplementary material.