Methods used to evaluate usability of mobile clinical decision support systems for healthcare emergencies: a systematic review and qualitative synthesis

Abstract

Objective: The aim of this study was to determine the methods and metrics used to evaluate the usability of mobile application Clinical Decision Support Systems (CDSSs) used in healthcare emergencies. Secondary aims were to describe the characteristics and usability of evaluated CDSSs.

Materials and Methods: A systematic literature review was conducted using Pubmed/Medline, Embase, Scopus, and IEEE Xplore databases. Quantitative data were descriptively analyzed, and qualitative data were described and synthesized using inductive thematic analysis.

Results: Twenty-three studies were included in the analysis. The usability metrics most frequently evaluated were efficiency and usefulness, followed by user errors, satisfaction, learnability, effectiveness, and memorability. Methods used to assess usability included questionnaires in 20 (87%) studies, user trials in 17 (74%), interviews in 6 (26%), and heuristic evaluations in 3 (13%). Most CDSS inputs consisted of manual input (18, 78%) rather than automatic input (2, 9%). Most CDSS outputs comprised a recommendation (18, 78%), with a minority advising a specific treatment (6, 26%), or a score, risk level or likelihood of diagnosis (6, 26%). Interviews and heuristic evaluations identified more usability-related barriers and facilitators to adoption than did questionnaires and user testing studies.

Discussion: A wide range of metrics and methods are used to evaluate the usability of mobile CDSSs in medical emergencies. Input of information into CDSSs was predominantly manual, impeding usability. Studies employing both qualitative and quantitative methods to evaluate usability yielded more thorough results.

Conclusion: When planning CDSS projects, developers should consider multiple methods to comprehensively evaluate usability.


Introduction
Clinical decision support systems (CDSSs) have been developed as potentially powerful diagnostic adjuncts in many clinical situations. 1 A CDSS is a form of technology designed to provide information to clinicians at the time of a decision to improve clinical judgment. [1][2][3][4] In order for a CDSS to be implemented and adopted into clinical practice, it must be considered usable and useful by the end users of the technology. 5,6 A systematic review of CDSSs found little evidence that these systems improved clinician diagnostic performance, and suggested that 1 method to address this issue is to better understand and improve human-computer interaction prior to CDSS implementation. 7 For this reason, early evaluation of the usability and usefulness of CDSSs is important to increase the likelihood of successful implementation and adoption. However, for CDSSs designed for clinicians treating patients with medical emergencies, few usability studies exist to guide the development of these technologies.
Usability is defined as a "quality attribute that assesses how easy interfaces are to use", which has several components: learnability, efficiency, memorability, errors, and satisfaction. 8 The ISO (International Organization for Standardization) Standard 9241-11:2018 defines usability more specifically as "the extent to which a product can be used to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use". 9 A recent systematic review showed that almost half of studies also described usefulness as a usability metric. 10 Usefulness refers to the degree to which using a technology will enhance job performance. 11 Mobile health (mHealth) refers to applications (apps) which are developed for handheld devices (such as smartphones or tablets) for use in healthcare, either by healthcare professionals, patients, or carers. 12 The potential benefits of mHealth to healthcare systems include time saving, reduced error rates, and cost savings. 13,14 Types of app uses include diagnostics and decision-making, behavior change intervention, digital therapeutics, and disease-related education. 14 There are numerous apps tailored to specific professions, specialties, patient groups, or clinical situations, including healthcare emergencies. 15,16 Some CDSSs have been designed for use in healthcare emergencies. A healthcare emergency can be defined as any situation where a person requires immediate medical attention in order to preserve life or prevent catastrophic loss of function. There are multiple clinical situations which could be considered healthcare emergencies, and many healthcare professionals who may care for these patients. Examples include problems with the patient's airway (eg, airway obstruction), breathing (eg, pulmonary embolism), circulation (eg, heart attack or stroke), or multi-system conditions such as injury or burns. 17,18 These scenarios are time-critical, requiring timely decision-making and action.

Study motivation
The design of mobile CDSSs used in healthcare emergencies is critical: such systems must be easy to use, useful, and fit seamlessly into the clinical workflow. Input must be minimal and ideally automatic, while outputs must be simple, intuitive, and immediately applicable in order to avoid workflow disruption. [19][20][21] Usability of CDSSs designed for emergencies is therefore arguably more important than for CDSSs designed for nonemergency (ie, elective) clinical settings.
There are multiple methods of usability testing. Though systematic reviews have been published which address usability methods used for CDSS evaluation, 10,[22][23][24][25] none have focused on mobile CDSSs designed or used in healthcare emergencies. For stakeholders designing mobile CDSSs for use in healthcare emergencies, including academics, clinicians, healthcare managers, and information technologists, the methods for testing usability and the associated standards must be understood in this unique context.

OBJECTIVE
This study answers the question: "What methods are employed to assess the usability of mobile clinical decision support systems designed for clinicians treating patients experiencing medical emergencies?" Our primary aim was to determine the methods of usability evaluation used by researchers of mobile healthcare decision support in clinical emergencies. Our secondary aims were to determine the characteristics of healthcare decision support in emergencies which underwent usability evaluations; and to determine the quantitative and qualitative standards and results achieved, utilizing descriptive quantitative and qualitative evidence synthesis (Supplementary Table S1).

MATERIALS AND METHODS
This systematic review was conducted according to the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) guidelines (Supplementary Table S2), 26 and it was prospectively registered with the PROSPERO database, ID number CRD42021292014. 27

Search strategy
Relevant publications were identified by an electronic search of the Pubmed/Medline, Embase, Scopus, and IEEE Xplore databases using combinations of the following keywords and their synonyms: "usability", "assessment", "mobile", "application", "decision support", "healthcare", and "emergency". The full search strategy is available in Supplementary Table S3. Searches were limited to Title and Abstract, and English-language only (Supplementary Table S4). The search was performed on December 9, 2021. The search results were uploaded to Endnote X9.3.3 (Clarivate Analytics, Philadelphia, PA, USA) in order to identify and delete duplicates, conference abstracts, and book chapters. Two authors (JW and EP) independently screened individual citations against the inclusion criteria using Rayyan software (Rayyan Systems Inc, Cambridge, MA, USA). 28 Two authors then independently assessed the full text of all identified citations for eligibility. Disagreements were resolved by a third independent reviewer (EK). Reasons for excluding studies were recorded (Figure 1). The reference lists of included articles, as well as excluded systematic reviews, were searched to identify additional publications.

Eligibility criteria and study designs/settings
Inclusion and exclusion criteria are listed in Table 1. The study eligibility criteria used the PECOS (population, exposure, comparator/control, outcomes, study designs/settings) framework. The population was any study testing/evaluating usability using human participants. The exposure was any study which tested usability of a healthcare-related mobile application which provided clinical decision support to clinicians. There was no comparator/control used. The outcomes included studies which provided empirical results from an evaluation of a system's usability (either quantitative, qualitative, or both). The setting was studies which evaluated a CDSS which was designed for use by clinicians in healthcare emergencies.

Quality of studies assessment
The methodological quality of included studies was assessed using a modified Downs and Black (D&B) checklist by 1 study author (JW). 29 The D&B checklist was developed to evaluate the quality of both randomized and nonrandomized studies of healthcare interventions on the same scale. 29 We omitted questions 5, 9, 12, 14, 17, 25, and 26 of the 27, because they were deemed not appropriate for assessing the included papers' methods of usability assessment (Supplementary Table S5). 10 We did not exclude articles due to poor quality. Quality of Studies (QOS) was classified according to the proportion of modified D&B categories present per paper, as low (<50%), medium (50-74%), or high (≥75%) quality.
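This classification reduces to a simple threshold rule over the 20 retained checklist items (27 questions minus the 7 omitted); the sketch below is illustrative only, with the function name and interface our own rather than part of the review's protocol:

```python
def classify_qos(items_met: int, items_assessed: int = 20) -> str:
    """Classify Quality of Studies (QOS) by the proportion of modified
    Downs & Black checklist items present per paper:
    low (<50%), medium (50-74%), or high (>=75%)."""
    proportion = items_met / items_assessed
    if proportion >= 0.75:
        return "high"
    if proportion >= 0.50:
        return "medium"
    return "low"
```

For example, a paper meeting 15 of the 20 retained items (75%) would be classified as high quality, while one meeting 10 (50%) would be medium.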

Data extraction
Data were extracted and tabulated in Microsoft Excel (Microsoft, Redmond, WA, USA), according to the study aims (Supplementary Table S1). Demographic data were collected by JW. Two authors (JW and EP) independently extracted data relating to the study aims, using a standardized proforma, which were combined for analysis. Any discrepancies were resolved by consensus. The following data were extracted from each study: Study demographics (citation details, country of study conduct, type of study); Aim (1) method of usability evaluation, including usability definition, metrics and methods used to evaluate usability, number and characteristics of participants, and quantitative and qualitative results reported; Aim (2) characteristics of the CDSS, including type and number of medical specialties targeted, number and type of conditions targeted, CDSS input (number, type, method, and description), CDSS computation (complexity, method, and description), CDSS output (number, type, and description), device used, guideline on which the CDSS is based, stage of CDSS (Development, Feasibility, Evaluation, Implementation), 30 and CDSS name and description (Supplementary Table S1). Supplemental material was sought if available. Any links in the paper to external information (app website, web calculator, etc.), or articles cited which contain missing information (such as a published article describing app development), were sought. Missing or unclear information was discussed between JW and EP, and if uncertainty remained, study authors were contacted. Missing data were not included in quantitative or qualitative analysis for individual study metrics.

Table 1. Inclusion and exclusion criteria

Inclusion criteria
1. The paper tests/evaluates usability
2. The paper is focused on a healthcare-related technology/application/software/system including mobile, smartphone, tablet, digital, electronic, handheld/portable device, or website
3. The paper provides empirical results (quantitative or qualitative)
4. The system provides decision support/aid/tool, or risk prediction, or prognosis or diagnosis for decision-making
5. The system is designed for use in healthcare emergencies

Exclusion criteria
1. Not written in English
2. Not testing usability, or does not describe the methods adequately
3. Not mobile clinical decision support
4. Not designed for or tested in clinical emergencies
5. Not targeting clinicians as users
6. Not human participants
7. Not an empirical study (is a theory or review paper)
8. Study protocol only
9. Full text is not available

Strategy for data synthesis
Data synthesis was descriptive only for quantitative data addressing the primary and secondary outcomes. Results from individual studies were summarized and reported individually, with no meta-analysis planned or performed.
To describe the qualitative standards and results achieved of assessing usability of CDSSs in medical emergencies, qualitative evidence synthesis methods were used. The PerSPecTIF (perspective, setting, phenomenon of interest, environment, comparison, timing, and findings) question formulation framework was used to define the context and basis for qualitative evidence synthesis (Supplementary Table S6). 31 Inductive thematic analysis of qualitative results in included studies was undertaken to identify usability-related barriers and facilitators to adoption of mobile CDSS in healthcare emergencies, using a 6-step inductive thematic analysis method: (1) familiarization with the data, (2) generating initial codes, (3) searching for themes, (4) reviewing themes, (5) defining and naming themes, and (6) producing the report/manuscript. 32 For qualitative evidence synthesis, our research questions were "what were the themes of usability-related barriers to, and facilitators of adoption of mobile CDSS in emergency settings, and what is the relationship between these themes and the method used to assess usability?" Qualitative data were extracted from individual studies and imported into NVIVO software version 12.0 (QSR International, Melbourne, Australia).

RESULTS

Study inclusion
The systematic search identified 974 studies. Of 505 unique full-text studies, 67 appeared to meet inclusion criteria from screening, and 23 were included in the analysis after full-text review (Figure 1). For 7 studies, there was disagreement between the 2 reviewers after full-text review as to whether the papers met inclusion criteria. A third reviewer (EK) included 4 of these and excluded 3: 1 because it did not evaluate usability, 33 1 because it did not test a mobile CDSS, 34 and 1 because it did not address a healthcare emergency. 35 Overall, key reasons for exclusion (n = 50) were that the paper did not evaluate usability (n = 16), did not report mobile clinical decision support (n = 22), was not a healthcare emergency (n = 8), did not assess clinicians (n = 3), or full text was unavailable (n = 1) (Figure 1).
Studies used a number of validated tools to assess usability: the System Usability Scale (SUS 43 ) and the Technology Acceptance Model (TAM 6 ) were each included in 5 (22%) studies, Nielsen's Heuristics 70 in 3 (13%) studies, the NASA Task Load Index (TLX) in 2 (9%) studies, the Technology Readiness Index (TRI) in 2 (9%) studies, and the Post-Study System Usability Questionnaire (PSSUQ) in 2 (9%) studies; 8 other validated methods were each used in 1 included study (Table 2). Five (22%) studies used no validated method. All studies included clinician participants, while 3 studies also included data managers, 36,37,68 1 study included usability engineers, 71 and 1 study had information scientists as participants. 72
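Several of these instruments condense Likert responses into a single score. As general background (not a procedure described by the included studies), Brooke's standard SUS scoring converts ten 1-5 responses into a 0-100 score; the function below is our own illustrative sketch of that scoring rule:

```python
def sus_score(responses: list[int]) -> float:
    """Compute a System Usability Scale (SUS) score from ten 1-5 Likert
    responses using Brooke's standard scoring: odd-numbered (positively
    worded) items contribute (response - 1), even-numbered (negatively
    worded) items contribute (5 - response), and the summed contributions
    are multiplied by 2.5 to yield a 0-100 score."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses, each in the range 1-5")
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)
                     for i, r in enumerate(responses)]
    return sum(contributions) * 2.5
```

For example, a neutral response of 3 to every item yields a score of 50, well below the acceptability threshold (>67) applied to SUS results later in this review.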

Quality of studies
Results for the modified Downs and Black (D&B) quality assessment of included studies (QOS) showed that overall, only 3 studies (13%) had high QOS, 14 (61%) had medium QOS, and 6 (26%) had low QOS (Figure 2). Studies which employed more methods to evaluate usability did not have a substantial difference in risk of bias (Figure 3). There was, however, lower risk of bias overall in studies which used mixed methods (both qualitative and quantitative), rather than only quantitative or only qualitative methods of usability evaluation (Figure 3). A median of 29 (IQR 12-51) participants were recruited for questionnaire-based studies, 28 (IQR 9-44) participants for user trials, 26 (IQR 11-43) participants for interview-based studies, and 4 (IQR 4-8) participants for heuristics studies.

Definition of usability in included studies
Of the 23 included studies, 13 (57%) did not define usability. Of the 10 which provided a definition, 3 (30%) used the definition provided by the ISO (ISO 9241-11), 9 which is the "extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use". 51,52,57 Two (20%) defined usability as "the design factors that affect the user experience of operating the application's device and navigating the application for its intended purpose". 46,50 Other definitions of usability included: • Differentiating "content usability" (data completeness and reassurance of medical needs), from "efficiency improvement" (quicker and easier evaluation), and "overall usefulness of systems" 41 • "ease of use, confidence in input, preference in an emergency setting, speed, accuracy, ease of calculation, and ease of shading" 39 • "efficiency, perspicuity, dependability" 38 • "functionality, convenience, triage accuracy, and accessibility." 69

Usability evaluation metrics used
Though not all studies defined usability explicitly, all studies reported how usability was evaluated. The most frequent evaluation metrics were Efficiency and Usefulness, measured in 15 (65%) studies. User Errors were measured in 14 (61%), Satisfaction in 13 (57%), Learnability in 11 (48%), Effectiveness in 9 (39%), and Memorability in 2 (9%) studies. The frequency of usability evaluation metrics was similar between studies utilizing questionnaire, user testing, and interview methods, though studies using heuristics only measured Usefulness, Efficiency, and user Errors (Figure 4).

Description of quantitative results
Descriptive quantitative results from included studies are summarized in Supplementary Table S9. The 5 studies which used the SUS all achieved acceptable usability scores (>67). The 5 studies which used the TAM achieved mixed results, with 1 study demonstrating worse usability than the existing system, 40 and another study having different usability depending on user group (physicians vs nurses). 41 Both studies which used the NASA TLX to measure mental effort found it was acceptably low, with 1 study stating that perceived workload was comparable whether the app was used or not. 38 Of the 2 studies which employed the TRI, 1 found no difference based on demographics, and 1 found that younger users were more ready for the technology. 60 Of the 3 studies which employed Nielsen's Heuristics, 2 identified usability issues in each of the 10 design heuristics categories. 71,72

Qualitative results synthesis
Themes of usability-related barriers to adoption included: external issues, hardware issues, input problems, output problems, poor software navigation, poor user interface design, user barriers, and user emotion or experience (Table 3). A higher proportion of codes (of barriers and facilitators to adoption) were generated by interview and heuristic evaluation methods than by questionnaire or user testing methods (Table 3). Themes of usability-related facilitators of adoption included: automaticity, user interface design, efficiency, feasibility, learnability, patient benefit, trustworthiness, ease of use, usefulness, and user experience (Table 4). A more complete identification of themes (of barriers and facilitators to adoption) occurred when included studies used interviews and heuristic evaluation, compared to user testing or questionnaires (Table 4).

DISCUSSION
The standardized framework for defining usability (ISO) was established in 1998 and updated in 2018 (ISO 9241-11:2018). 9 Despite this, the majority of papers included in this review deviated from this definition of usability. Importantly, the standard does not describe specific methods of design, development, or evaluation of usability. Nevertheless, differing definitions of usability likely contributed to the evidence generated from this systematic review, which revealed that a wide range of metrics and methods are used to assess the usability of mobile CDSSs. Researchers favored evaluation metrics including efficiency, user errors, usefulness, and satisfaction over measures such as effectiveness, learnability, and memorability. Qualitative evidence synthesis including thematic analysis identified that more codes and themes were generated from studies utilizing interview and heuristic evaluation than from studies which employed user testing or questionnaires to assess the usability of CDSSs. Synthesis of quantitative results was not attempted, due to the multiple different methods (validated and nonvalidated) used to measure usability quantitatively across included studies.

Implications
There are 5 main implications of this study. Firstly, the study reveals that a plethora of approaches is evident, which suggests that comparison of usability metrics between different CDSSs is inherently difficult and could contribute to confusion and misunderstanding when attempting to understand the value of these tools to practitioners, patients, and health systems. The lack of consistency in evaluating the usability of CDSSs is a material problem for the field. In particular, the quantitative approaches used by included studies were so diverse that no meaningful data synthesis could be made. There is a dire need for a standard approach to quantitative analysis of the usability of CDSSs. There are multiple validated methodologies in current use. 73 The best solution likely involves a combination or amalgamation of commonly used methodologies, focusing on those with few items and high reliability. 73 Secondly, nearly half of included studies evaluated usability using a purely quantitative approach, even though a mixed methods approach may reduce bias. 10 A mixed methods approach might elicit more complete and useful information when evaluating the usability of CDSSs. 10 However, like quantitative approaches, a plethora of methodological approaches to qualitative analysis exists for the evaluation of usability of CDSSs, which makes between-study comparison challenging. 74 Identifying consistent and shared themes across studies would be more achievable if the description and approach of qualitative methodology were explicitly stated. 74

Table 3. Qualitative evidence synthesis of included studies (n = 13/23): usability-related themes and codes of barriers to adoption, by usability method category (Q = questionnaire, U = user testing, I = interview, H = heuristic evaluation)

Theme (Q, U, I, H)               Code (Q, U, I, H)
External issues (0, 0, 3, 1)     External issues (0, 0, 3, 1)
Hardware issues (0, 3, 5, 1)     Hardware issues (0, 3, 5, 1)
Input problems (4, 6, 37, 24)    Difficult tasks (0, 0, 2, 2)
                                 Inaccurate results (0, 1, 3, 0)
                                 Instructions unclear (1, 1, 11, 9)
                                 Mismatch with reality (0, 1, 2, 2)
                                 Not automated (1, 0, 0, 1)
                                 Not efficient (1, 3, 7, 1)
                                 Not enough information (1, 0, 2, …)

Thirdly, many CDSSs were designed in ways which hamper their usability. A universal problem with the design of CDSSs for mobile use is any reliance on user input, which may be a fatal flaw for healthcare emergencies. Though studies evaluated mobile CDSSs designed for different conditions in multiple emergency settings, most required information to be input manually. Manual information input is a known barrier to usability, and is likely to be particularly burdensome to the end user during clinical emergencies. [19][20][21] This study identified that only a minority of included studies demonstrated any form of automatic data entry for mobile CDSS, with most utilizing manual checkbox inputs. Automation of CDSS has been associated with improved clinical outcomes. 75 Ideally, a CDSS inputs data automatically in real time, reducing disruption to clinician workflow and allowing timely CDSS output. 76,77 Physicians make better decisions when they do not have to input the information first, but only integrate available information. 78 Therefore, automation of data entry should be a focus for future CDSSs if they are to improve their likelihood of implementation and use in emergency settings.
Fourthly, we found divergence with regard to output, with the majority of tools offering a recommendation or specific treatment to clinicians, and a minority providing risk information. The benefit of CDSSs which provide clear recommendations is that they may be easier for clinicians to action than risk information, and may therefore increase uptake of CDSSs. 77 One study demonstrated that CDSSs which provided a recommendation rather than simply an assessment improved clinical outcomes. 75 However, some treatment decisions may be based on factors which cannot be accounted for by the CDSS. Thus, by providing a recommendation, the CDSS is in danger of "overstepping" its bounds, into the realm of decision-making instead of decision support. This is a contentious area, which may also have medico-legal implications if patients come to harm after a clinician provides treatment based on an inaccurate or inappropriate CDSS recommendation. These medico-legal issues become more pertinent for recommendations which are more directive, 2,3,79,80 though this remains a topic of keen interest and debate. 3,81 Fifthly, studies which evaluated CDSSs designed for nonemergency settings, rather than healthcare emergencies, used similar usability methods but different usability metrics. Usability methods were similar between studies included in a recent systematic review (primarily nonemergency settings) and studies included in our review (emergency settings), including questionnaires (78% in nonemergency settings vs 87% in emergency settings), user testing (86% vs 74%), interviews (20% vs 26%), and heuristics evaluations (14% vs 13%). 10 Conversely, the proportion of studies evaluating usability metrics differed depending on setting, including usefulness (39% in nonemergency settings vs 65% in emergency settings), user errors (31% vs 61%), learnability (24% vs 48%), and memorability (2% vs 9%).
More studies evaluated satisfaction (75% vs 57%) and effectiveness (61% vs 39%) of CDSS in the nonemergency setting compared to the emergency setting, and a similar proportion evaluated efficiency (63% vs 65%). 10 That researchers evaluated different metrics may denote differences in end-user priorities based on setting. For a CDSS to be used in emergencies, it must be useful compared to other competing priorities, 6 have a low propensity for user errors given the user's cognitive load, 82 and be easy to use, learn, and remember. 6,82 Automatic data entry may reduce user errors, [75][76][77][78] and more directive recommendations may be easier to apply cognitively than risk percentages alone. 75,77 In clinical emergencies, clinicians are focused on the patient's immediate care needs. Consequently, using a CDSS in this setting may be more prone to user error than in the elective setting. While measuring user errors in the evaluation stage is important, ensuring CDSS design and development follows best principles of user interface design is key to reducing the propensity for user errors in the first place. However, the heterogeneity of usability metrics evaluated in studies provides an impetus for a more standardized approach so that studies can be meaningfully compared, regardless of setting.
Similar literature exists which corroborates our findings regarding user errors, effectiveness, and efficiency. A user error is defined as either a slip (unintended action with correct goal; ie, misspelling an email address), or a mistake (intended action with incorrect goal; ie, clicking on an un-clickable heading), and can highlight interface problems. 83 Effectiveness (or "success") is defined as the number of successfully completed tasks or the percentage of correct responses; while efficiency is the time taken, or number of clicks required, to complete a task. 10 In the same systematic review as above, focusing on usability metrics within usability evaluation studies, 31% of studies measured user errors. 10 These included 23 different user error measurement techniques, while the number of user errors or percentage of user errors were most frequently reported. Conversely, effectiveness was measured in 61% of studies, and efficiency measured in 63% of studies. The study concluded that there are multiple methods to evaluate usability, each with benefits and deficiencies. To mitigate these and provide the most complete usability evaluation, a combination of multiple methods is advised.

Limitations
There are several limitations to this review. First, while we provided a synthesis of the qualitative results provided by included studies, it was impossible to synthesize the quantitative data in a meaningful way due to their heterogeneity. Further, while the qualitative analysis was conducted using a robust method 32 and framework, 31 synthesizing qualitative results from studies with heterogeneous designs may produce unreliable results. Second, while it is recognized that a description which weighted usability methods to determine which methods are better would be desirable, this was not our aim. Rather, we provided a descriptive summary of quantitative outcomes achieved, and a synthesis of qualitative results, to highlight the relative benefits of different methodological approaches to usability evaluation, with regard to the ability of each method to identify barriers and facilitators to CDSS adoption. Structural differences in study methodology will have impacted results: questionnaire and user testing studies often did not allow open responses to elicit additional user input, resulting in comparatively more qualitative information from interview and heuristic evaluation studies. Third, the narrow search criteria did not account for recent technical developments, including the rapid pace of CDSSs utilizing machine learning and artificial intelligence. Accordingly, though the review protocol included a goal to determine trends over time in healthcare decision support in emergencies, including how statistical or computational complexity and devices have changed over time, our search yielded studies which demonstrated little variation in either of these parameters. This question may be best answered by a scoping review or narrative literature review. The authors considered Google Scholar as a search engine in order to broaden the review's inclusion, but decided against it due to evidence reporting its imprecision as a systematic search engine. 84,85 Fourth, studies were not excluded based on assessed quality, and 5 did not use validated methods to assess usability. However, the authors preferred a "real-world" evaluation of available literature. Fifth, this paper evaluates methods and metrics of usability of CDSSs which were largely in the development and feasibility stages, with only a small minority in the evaluation or implementation stages. Therefore, results may be less generalizable to studies which evaluate usability of CDSSs in later stages, including implementation and adoption.

CONCLUSION
Usability evaluation of mobile CDSSs in medical emergencies is heterogeneous. Studies evaluated multiple aspects of usability in a variety of study designs. More questionnaire and user testing studies were conducted than interviews and heuristics evaluations. However, interviews and heuristic evaluations identified a greater proportion of usability issues than did questionnaire and user testing studies. The findings have implications for future research on both the design of CDSSs and the evaluation of their usability. Developers should acknowledge that automatic data input into a CDSS may improve its usability, and that outputs which provide a clinical recommendation may be controversial. When planning CDSS usability evaluation studies, developers should consider multiple methods to comprehensively evaluate usability, including qualitative and quantitative approaches. Researchers should apply a more standardized approach to usability evaluation of mobile CDSSs while considering context and workflow.

AUTHOR CONTRIBUTIONS
JMW had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept, design, and drafting the manuscript: JMW and EP. Critical revision of the manuscript for important intellectual content, acquisition, analysis, or interpretation of data, and final approval: JMW, EP, EK, RSS, WM, ZBP, and NRMT. Statistical analysis: JMW. Supervision: WM, ZBP, and NRMT.

SUPPLEMENTARY MATERIAL
Supplementary material is available at JAMIA Open online.

DATA AVAILABILITY
Template data collection forms, data extracted from included studies, data used for all analyses, and qualitative synthesis are all available upon request from the authors.