Thomas J Reese, Guilherme Del Fiol, Joseph E Tonna, Kensaku Kawamoto, Noa Segall, Charlene Weir, Brekk C Macpherson, Polina Kukhareva, Melanie C Wright, Impact of integrated graphical display on expert and novice diagnostic performance in critical care, Journal of the American Medical Informatics Association, Volume 27, Issue 8, August 2020, Pages 1287–1292, https://doi.org/10.1093/jamia/ocaa086
Abstract
Objective: To determine the impact of a graphical information display on diagnosing circulatory shock.
Materials and Methods: This was an experimental study comparing integrated and conventional information displays. Participants were intensivists or critical care fellows (experts) and first-year medical residents (novices).
Results: The integrated display was associated with higher performance (87% vs 76%; P < .001), less time (2.9 vs 3.5 min; P = .008), and more accurate etiology (67% vs 54%; P = .048) compared with the conventional display. When stratified by experience, novice physicians using the integrated display had higher performance (87% vs 69%; P < .001), less time (2.9 vs 3.7 min; P = .03), and more accurate etiology (65% vs 42%; P = .02); expert physicians using the integrated display had nonsignificantly improved performance (87% vs 82%; P = .09), time (2.9 vs 3.3 min; P = .28), and etiology (69% vs 67%; P = .81).
Conclusions: The integrated display appeared to support efficient information processing, which resulted in more rapid and accurate circulatory shock diagnosis. Evidence more strongly supported a difference for novices, suggesting that graphical displays may help reduce expert–novice performance gaps.
INTRODUCTION
Despite widespread electronic health record (EHR) adoption in the United States and mounting patient data, most information displays remain analogous to paper-based charts.1,2 Conventional EHR displays are primarily tabular with discrete values over time and typically have limited graphics to support processing patient data.3,4 Clinicians practicing in data-intense settings, such as critical care, are underserved by conventional EHR displays that fail to support efficient patient information processing and pattern recognition.5,6
Substantial research has been done on critical care information displays. Two systematic reviews conducted by our group identified 50 displays evaluated in simulated settings and 17 in patient care settings.7,8 We found that integrated displays, with patient information organized into clinically relevant and meaningful groups and combined with graphical trend information, best supported clinician performance.7 Moreover, a common approach among critical care information displays was designing for a specific purpose, such as detecting adverse respiratory and cardiovascular events.9–11 Of these displays, however, only a small fraction was disseminated for routine patient care. We developed a method of information display consisting of modular and reusable graphical components (widgets) that can be integrated for a variety of users and are scalable as patient care evolves.9 Widgets are small trend graphics (eg, a line graph or bar graph) designed to optimally display individual data elements, such as blood pressure and respiratory rate.
In the present study, we sought to test a widget-based integrated display with a common yet complex critical care task—diagnosing circulatory shock. Circulatory shock affects approximately 1 in 3 critical care patients and can increase mortality up to 80%.12–14 Since hemodynamic instability and correlating abnormal vital sign trends are frequently the first signs, vigilant clinicians search for these subtle cues and learned patterns to effectively anticipate, identify, and monitor patients for circulatory shock.15–24 We tested 2 null hypotheses. Primary null hypothesis: There is no difference between a graphically integrated widget display and a tabular display in physician diagnostic performance. Secondary null hypothesis: There is no difference between a graphically integrated widget display and a tabular display in expert and novice physician diagnostic performance.
MATERIALS AND METHODS
Study design
We used a within-subject experimental study design in a simulation environment to evaluate physician performance using 2 different information displays: Integrated and Conventional. The Integrated display depicted trend information with multiple widgets integrated into a clinically meaningful group for identifying circulatory shock. The Conventional display depicted trend information in a table of discrete values (ie, tabular). We recruited an equal number of experienced intensivists (Experts) and recently graduated physicians (Novices). Each participant was given the option to receive $100 compensation for participation. The study was approved by the Institutional Review Board at the University of Utah and conducted in 2019.
Displays
Integrated
The Integrated display was graphical, consisting of optimized widgets that we previously designed with an iterative user-centered approach. We extracted 33 data elements relevant to circulatory shock from the primary literature and point-of-care medical resources.3 Two critical care providers ranked the priority and utility of each data element for identifying and classifying circulatory shock. The final data element set comprised select vital sign measures, laboratory results, intravenous interventions, and urine output. The Integrated (Figure 1, Top) and Conventional displays each consisted of 12 data elements.

Figure 1. Top: Integrated display composed of multiple widgets showing trend information in a rolling 24-hour time frame. Shaded gray boxes show the normal reference range. Labels show the highest and lowest measures, and abnormal values are highlighted in red. Group A widgets included Heart, Respiratory, Mean Arterial Pressure (MAP), Temperature (Temp), Central Venous Pressure (CVP), and Central Venous Oxygen Saturation/Mixed Venous Oxygen Saturation (ScvO2/SvO2) data elements, which are routinely available for hospitalized patients and frequently used to identify circulatory shock. Group B widgets included Systemic Vascular Resistance (SVR), Cardiac Index (CI), and Pulmonary Artery Wedge Pressure (PAWP), which are commonly used to classify shock etiology but are not routinely available, as a pulmonary artery catheter is needed to obtain measures. Group C widgets included interventions, such as sodium chloride and norepinephrine. Group D included pH, Lactate, and Urine, which are displayed with vertical bar graphs because of the source and frequency of measures. We vertically aligned widgets in column 1 to help participants assess correlating trends and abnormal values at specific times relevant to identifying shock onset. Respiratory and Temp were placed in column 2 since correlating their trends is a lower priority for identifying shock onset. Bottom: Conventional display of numeric measures in tabular format.
Conventional
We designed the Conventional display to represent a typical EHR flow sheet. Flow sheet displays are prevalent in EHRs, and similar tabular displays have been used as a baseline for novel information display comparisons.4,10,25 As in the Integrated display, abnormal values were shown in red (Figure 1, Bottom).
Procedure
We used E-Prime 3.0 software (https://pstnet.com/) with a 17” laptop to conduct study experiments and track participant responses. Each participant completed all 6 patient scenarios, with the scenario sequence randomly assigned for each participant. Display type was block randomized, so each participant used the Integrated and Conventional displays equally. Further, display type was counterbalanced among scenarios, resulting in equal use of the Integrated and Conventional displays for each scenario. We provided participants with an overview of the displays and tasks. Additionally, we provided a circulatory shock review and hemodynamics table from UpToDate; these references were only available during the study introduction. Participants were asked to complete the study in 1 sitting, before or after a shift, and were allotted up to 1 hour.
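For illustration, a minimal R sketch of the participant-level randomization described above (variable names are illustrative; the additional balancing of display type across scenarios operated at the study level and is not shown):

```r
set.seed(7)  # for a reproducible illustration

# Each participant reviews all 6 scenarios in a randomly assigned sequence
scenario_order <- sample(1:6)

# Display type is block randomized: every participant uses the Integrated
# and Conventional displays for 3 scenarios each
display_type <- sample(rep(c("Integrated", "Conventional"), each = 3))

data.frame(scenario = scenario_order, display = display_type)
```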
Scenarios
Scenarios were based on deidentified data collected from patients who had experienced circulatory shock at the University of Utah hospital. Each participant completed the same 6 patient scenarios: 5 test scenarios and 1 control. Scenarios were equally balanced by display type and participant experience. Shock etiologies in the test scenarios included Cardiogenic (1), Distributive (2), Hypovolemic (1), and Mixed (1). At each point in time during a test scenario, the patient was in 1 of 2 states: no-shock or shock. The shock state was conveyed by 1 or more covarying data trends and threshold values based on clinical guidelines. For example, cardiogenic shock was characterized by an increase in heart rate, pulmonary capillary wedge pressure, systemic vascular resistance, and central venous pressure; and a decrease in mean arterial pressure, cardiac output, and mixed venous oxygen saturation. The control scenario included normal and abnormal measures (eg, hypotension and elevated lactic acid), but the data did not cross a shock threshold. Two critical care providers, who were not among the participants, reviewed and refined each scenario to delineate patient states and shock etiology. While the onset of shock varied for each test scenario, all scenarios had 60 hours (frames) of patient data.
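As a hypothetical illustration of how covarying trends and threshold values can define patient state, consider the sketch below; the thresholds and the 2-signal rule are illustrative assumptions, not the study's actual guideline-based criteria:

```r
# Illustrative rule only: flag an hour as "shock" when at least 2 covarying
# signals cross threshold values simultaneously (the study's criteria were
# guideline based and refined by intensivists)
label_state <- function(map, hr, lactate) {
  crossings <- (map < 65) + (hr > 110) + (lactate > 2.0)
  ifelse(crossings >= 2, "shock", "no-shock")
}

label_state(map = c(80, 62, 58), hr = c(90, 118, 125), lactate = c(1.1, 1.4, 3.2))
# [1] "no-shock" "shock"    "shock"
```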
Tasks
Each of the 6 scenarios required participants to review up to 60 frames of patient data to identify shock. Each frame contained 24 hours of patient data and advanced in 1-hour increments (eg, 0600–0500 to 0700–0600). With each 1-hour progression (ie, frame), participants indicated whether the patient was in shock and, for patients in shock, the shock etiology. Shock and shock etiology options were available on the screen throughout the scenarios.
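One plausible indexing of the rolling 24-hour frame, assuming scenario data stored 1 row per hour (a sketch; the actual presentation software may have indexed frames differently):

```r
# 60 hours of toy scenario data, 1 row per hour
scenario <- data.frame(hour = 1:60, map = round(rnorm(60, mean = 75, sd = 8)))

# Frame t shows the trailing 24-hour window ending at hour t; advancing one
# frame drops the oldest hour and adds the newest
frame <- function(t) scenario[scenario$hour > t - 24 & scenario$hour <= t, ]

nrow(frame(24))  # 24 rows: hours 1-24
nrow(frame(25))  # 24 rows: hours 2-25
```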
Outcomes
Study outcomes included Performance, Latency, Time, and Etiology for scenarios. Performance was a composite measure of how accurately physicians could distinguish no-shock and shock states. The composite score was calculated as the average of shock and no-shock accuracy for each frame; thus, it equally weighted physician responses for each state in a scenario. Latency was the delay in identifying shock onset, in terms of the number of frames between shock onset and when the participant first selected shock. Time was measured as the interval from the first to the last frame of a scenario, a proxy for how quickly physicians processed the patient information. Etiology was whether participants correctly classified the cause of shock during the shock state (Supplementary Material, Figure 1).
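A sketch of the Performance and Latency computations as defined above, assuming per-frame state vectors for one scenario (the vector and function names are assumptions):

```r
# truth and response: one entry per frame, each "shock" or "no-shock"

# Performance: the average of per-state accuracies, weighting the shock and
# no-shock states equally regardless of how many frames each state spans
performance <- function(truth, response) {
  shock_acc   <- mean(response[truth == "shock"] == "shock")
  noshock_acc <- mean(response[truth == "no-shock"] == "no-shock")
  100 * mean(c(shock_acc, noshock_acc))
}

# Latency: frames between true shock onset and the first frame at which the
# participant selected shock
latency <- function(truth, response) {
  match("shock", response) - match("shock", truth)
}

truth    <- c(rep("no-shock", 10), rep("shock", 14))
response <- c(rep("no-shock", 12), rep("shock", 12))
performance(truth, response)  # 92.9: mean of 12/14 shock and 10/10 no-shock accuracy
latency(truth, response)      # 2 frames
```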
Analysis
We completed 2 analyses examining the relationship between display type and study outcomes. The primary analysis included all physicians, and the subgroup analysis was stratified by physician experience (ie, Expert and Novice). Descriptive statistics included the mean percent and standard error for Performance, the mean frames and standard error for Latency, and the mean minutes and standard error for Time. Etiology accuracy was calculated as a percent: the number of correctly classified scenarios divided by the number of scenarios. Statistical significance was defined as P ≤ .05. All analyses were completed using R 3.5.3 (The R Foundation for Statistical Computing), and mixed-effects models were fitted using the R package lme4 1.1.
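A brief sketch of the descriptive statistics in R; the long-format data frame d and its column names are assumptions made so the example runs end to end (toy values, not study data):

```r
set.seed(1)

# Toy long-format data: one row per completed scenario
# (the study had 32 physicians x 6 scenarios = 192 rows)
d <- expand.grid(subject = factor(1:32), scenario = factor(1:6))
d$experience <- ifelse(as.integer(d$subject) <= 16, "Expert", "Novice")
d$display <- sample(rep(c("Integrated", "Conventional"), nrow(d) / 2))
d$performance <- rnorm(nrow(d), mean = 80, sd = 10)   # percent
d$etiology_correct <- rbinom(nrow(d), 1, 0.6)         # 1 = correctly classified

se <- function(x) sd(x) / sqrt(length(x))

# Mean percent and standard error for Performance by display type
aggregate(performance ~ display, data = d,
          FUN = function(x) c(mean = mean(x), se = se(x)))

# Etiology accuracy: correctly classified scenarios / scenarios, as a percent
100 * tapply(d$etiology_correct, d$display, mean)
```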
Primary and subgroup analyses
The primary analysis examined whether study outcomes were significantly different between the Integrated and Conventional displays. Performance, Latency, and Time were analyzed using linear mixed-effects models, and Etiology was analyzed using generalized linear mixed-effects models with the logit link. Display type and physician experience were fixed effects, without an interaction, and scenario and subject were random effects. Likelihood ratio tests were used to obtain P values. The mixed-effects models accounted for repeated measures and scenario variability. The subgroup analysis examined whether study outcomes were significantly different between display types for Expert and Novice physicians separately; it was identical to the primary analysis, except that experience was removed from the fixed effects. We estimated that 30 participants were needed to observe a significant performance improvement between displays in the primary analysis with a moderate effect size (0.3–0.5) and 80% power.26
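Continuing with the toy data frame d from the sketch above, the models described here can be expressed in lme4 roughly as follows (a sketch under assumed column names, not the authors' exact code):

```r
library(lme4)

# Continuous outcomes (Performance, Latency, Time): linear mixed-effects
# model with display and experience as fixed effects (no interaction) and
# subject and scenario as random intercepts; ML fit enables model comparison
m_full <- lmer(performance ~ display + experience +
                 (1 | subject) + (1 | scenario), data = d, REML = FALSE)

# Likelihood ratio test against the model without display yields the P value
m_null <- update(m_full, . ~ . - display)
anova(m_null, m_full)

# Binary Etiology outcome: generalized linear mixed-effects model, logit link
g_full <- glmer(etiology_correct ~ display + experience +
                  (1 | subject) + (1 | scenario),
                data = d, family = binomial(link = "logit"))

# Subgroup analysis: the same model within one experience stratum, with
# experience dropped from the fixed effects
m_novice <- lmer(performance ~ display + (1 | subject) + (1 | scenario),
                 data = subset(d, experience == "Novice"), REML = FALSE)
```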
RESULTS
A total of 32 physicians from the University of Utah participated in the study. Experts included attending physicians with a primary appointment in an intensive care unit (eg, cardiac and surgical) and fellows in a pulmonary and critical care medicine or surgical critical care fellowship. Novices were first-year resident physicians with less than 1 year of postgraduate experience. Most participants dedicated greater than 80% of their effort to patient care (Supplementary Material, Table 1).
Table 1. Primary analysis of display type (Integrated vs Conventional)

| Outcome | N^ | Integrated, Mean (SE) | Conventional, Mean (SE) | β (SE) | P value |
| --- | --- | --- | --- | --- | --- |
| Performance (%) | 192 | 87 (1.6) | 76 (2.6) | −0.1 (0.0) | <.001* |
| Latency (frames) | 160 | 1.7 (0.2) | 2.1 (0.2) | 0.4 (0.3) | .13 |
| Time (minutes) | 192 | 2.9 (0.2) | 3.5 (0.2) | 0.6 (0.2) | .008* |
| Etiology (% correct) | 192 | 67 | 54 | −0.7 (0.3) | .048* |
Notes: Primary analysis of display type (Integrated and Conventional) association with physician performance, latency, time, and etiology in mixed-effects models.
β, estimate; *, P < .05; ^, number of scenarios (fewer for Latency due to control scenarios); SE, standard error.
Primary analysis
Table 1 shows that physicians using the Integrated versus Conventional display had significantly improved Performance (β [SE]: −0.1 [0.0]; P < .001), Time (β [SE]: 0.6 [0.2]; P = .008), and Etiology (β [SE]: −0.7 [0.3]; P = .048); Latency did not differ significantly (P = .13). Time to complete scenarios was on average 0.6 minutes (17%) faster when physicians used the Integrated display, and physicians correctly classified shock Etiology in 13% more scenarios using the Integrated display.
Subgroup analysis
Table 2 shows that Novices using the Integrated display had significantly improved Performance (β [SE]: −0.2 [0.0]; P < .001), Time (β [SE]: 0.8 [0.4]; P = .03), and Etiology (β [SE]: −1.2 [0.5]; P = .02). On average, Novices had 18% higher Performance, took 22% less time, and correctly classified shock Etiology in 23% more scenarios using the Integrated display. Experts had nonsignificantly improved Performance (β [SE]: −0.1 [0.0]; P = .09), Latency (β [SE]: 0.2 [0.3]; P = .61), Time (β [SE]: 0.4 [0.4]; P = .28), and Etiology (β [SE]: −0.1 [0.5]; P = .81) using the Integrated display.
Table 2. Subgroup analysis of display type stratified by Expert and Novice physicians
Expert (n = 16)

| Outcome | N^ | Integrated, Mean (SE) | Conventional, Mean (SE) | β (SE) | P value |
| --- | --- | --- | --- | --- | --- |
| Performance (%) | 96 | 87 (2.1) | 82 (2.8) | −0.1 (0.0) | .09 |
| Latency (frames) | 80 | 1.8 (0.2) | 2.0 (0.3) | 0.2 (0.3) | .61 |
| Time (minutes) | 96 | 2.9 (0.3) | 3.3 (0.3) | 0.4 (0.4) | .28 |
| Etiology (% correct) | 96 | 69 | 67 | −0.1 (0.5) | .81 |

Novice (n = 16)

| Outcome | N^ | Integrated, Mean (SE) | Conventional, Mean (SE) | β (SE) | P value |
| --- | --- | --- | --- | --- | --- |
| Performance (%) | 96 | 87 (2.4) | 69 (4.2) | −0.2 (0.0) | <.001* |
| Latency (frames) | 80 | 1.6 (0.2) | 2.1 (0.3) | 0.6 (0.4) | .12 |
| Time (minutes) | 96 | 2.9 (0.3) | 3.7 (0.4) | 0.8 (0.4) | .03* |
| Etiology (% correct) | 96 | 65 | 42 | −1.2 (0.5) | .02* |
Notes: Subgroup analysis of display type (Integrated and Conventional) association with physician performance, latency, time, and etiology by experience (Expert and Novice) in mixed-effects models.
β, estimate; *, P < .05; ^, number of scenarios (fewer for Latency due to control scenarios); SE, standard error.
DISCUSSION
In this experimental study comparing an Integrated display with a Conventional display, we observed improved diagnostic performance when physicians used the Integrated display. The primary analysis showed that the 32 physicians had improved shock classification, patient information processing, and shock etiology accuracy when using the Integrated display compared with the Conventional display. The subgroup analysis revealed similar statistically significant improvements for the 16 Novices using the Integrated display, whereas the analysis of the 16 Experts alone revealed no significant differences. Although experience appears to moderate the improvement between displays, these results support the concept that an information display consisting of reusable widgets, integrated for patient condition recognition, can improve physician diagnostic performance.
Through recent literature reviews of critical care information displays, we identified several graphical displays that integrated patient data and were evaluated with a complex task.7,8 For example, Blike et al compared a novel display with a conventional numeric patient monitor to help anesthesiologists identify and classify shock.27 Scenarios were presented on a single screen and participants were asked to identify the patient state (ie, shock or no-shock). The authors observed improved response times and accuracy using the novel display. Further, Agutter et al compared a novel display with a conventional multiparameter monitor to help anesthesiologists identify cardiovascular events.11 The authors observed faster response times by physicians using the novel display. Overall, several studies have demonstrated that a graphical display with integrated patient data can improve response time and diagnostic accuracy, which is similar to our observations with the Integrated (widget) display.
In contrast with our widget display, however, several displays, such as those by Blike et al and Agutter et al, were designed as Object displays. Object displays make values visually apparent by integrating data into a polygon object (eg, rectangle).28 Aspects of Object displays may not be practical for a diverse and evolving healthcare environment. First, Object displays emphasize current patient state with limited or no trend information.29 Second, Object displays are frequently designed for specific tasks. For instance, customized Object displays for cardiovascular and pulmonary events are unlikely to be interchangeable.10,11 Alternatively, widget displays provide trend information and are modular. Widgets can be substituted, grouped, and organized for a variety of tasks and settings, which may allow organizations and users to adapt displays to their needs.
Our secondary hypothesis was based on the visual information processing gap between experts and novices. Visual information processing is often measured by linking eye movement patterns with faster and more accurate responses.30 Wood et al illustrated the gap between experts and novices with electrocardiogram interpretation.31 The physicians were given a waveform line graph and asked to diagnose the patient. The authors found experts took less time to make more accurate diagnoses compared with novices. The amount of time before initial fixations was a primary contributor to a faster diagnosis, and the experts were able to achieve a general impression without an extensive search. Therefore, integrated graphical widgets likely minimize novice search time by placing correlated data elements close together for pattern recognition, whereas experts are more adept at grouping data elements and identifying trends in tabular data. Moreover, an optimized widget display may lower the cognitive load associated with concurrently searching for and processing information, which can translate to higher performance in novices.32,33
This study has limitations. First, we compared a graphical display to a tabular display. While tabular displays are prevalent in today’s EHR systems, it is unknown how the Integrated display would impact performance compared with other graphical display approaches. Second, our power calculation was based on Performance in the primary analysis and a relatively large effect size, which may have led to an insufficient sample size for detecting differences in Latency among all physicians and in all outcomes in the Expert subgroup analysis. Third, scenarios were based on patient data and modified by expert intensivists; however, testing with patient data in a live environment is needed. Finally, participants were primed to identify circulatory shock; in actual patient care, shock may be 1 of several concurrent problems. Future work should examine graphical displays with widgets and patient data in a patient care setting.
CONCLUSION
This study demonstrated that physicians could diagnose circulatory shock faster and more accurately using a widget display. When stratified by experience, the effects remained significant only for the subgroup of novice physicians. These findings suggest the performance gap between experts and novices may be reduced by optimized graphical information displays.
FUNDING
This work was supported by the National Institutes of Health grant numbers: R56LM011925, T15LM007124, and 5UL1TR001067.
AUTHOR CONTRIBUTIONS
Study concept and design: TR, MCW, JT, GDF, BCM. Acquisition, analysis, and interpretation of data: MCW, TR, GDF, NS, JT, BCM, KK, CW, PK. Drafting and revising the manuscript: MCW, TR, GDF, NS, JT, BCM, KK, CW, PK.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.
CONFLICT OF INTEREST STATEMENT
None declared.
REFERENCES