Diagnostic Accuracy of the 4AT for delirium detection: systematic review and meta-analysis.

Objective: Detection of delirium in hospitalised older adults is recommended in national and international guidelines. The 4 'A's Test (4AT) is a short (<2 min) instrument for delirium detection that is used internationally as a standard tool in clinical practice. We performed a systematic review and meta-analysis of diagnostic test accuracy of the 4AT for delirium detection. Methods: We searched MEDLINE, EMBASE, PsycINFO, CINAHL, clinicaltrials.gov and the Cochrane Central Register of Controlled Trials, from 2011 (year of 4AT release on the website www.the4AT.com) until 21 December 2019. Inclusion criteria were: older adults (≥65 years); diagnostic accuracy study of the 4AT index test compared with a delirium reference standard (standard diagnostic criteria or a validated tool). Methodological quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Pooled estimates of sensitivity and specificity were generated from a bivariate random effects model. Results: Seventeen studies (3721 observations) were included. Settings were acute medicine, surgery, a care home, and the emergency department. Three studies assessed performance of the 4AT in stroke. The overall prevalence of delirium was 24.2% (95% CI 17.8-32%; range 10.5-61.9%). The pooled sensitivity was 0.88 (95% CI 0.80-0.93) and the pooled specificity was 0.88 (95% CI 0.82-0.92). Excluding the stroke studies, the pooled sensitivity was 0.86 (95% CI 0.77-0.92) and the pooled specificity was 0.89 (95% CI 0.83-0.93). The methodological quality of studies varied but was moderate to good. Conclusions: The 4AT shows good diagnostic test accuracy for delirium in the 17 available studies. These findings support its use in routine clinical practice in delirium detection.


INTRODUCTION
Delirium is a serious acute neuropsychiatric disorder of consciousness, attention and cognition triggered by general medical conditions, drugs, surgery, or a combination of causes. It manifests through acute and fluctuating cognitive, psychomotor and perceptual disturbances which develop over hours to days. Delirium is common in hospitalised older adults, with a recent meta-analysis of 33 studies of medical inpatients finding an overall delirium occurrence of 23% (95% CI 19-26%) [1]. It is also common in surgical patients and in care home and palliative care settings [2]. Delirium is associated with significant adverse outcomes, including functional decline, mortality, and patient and carer distress [3,4].
Detecting delirium at the earliest possible time point is important for several reasons: it prompts the search for acute triggers, gives access to recommended treatment pathways, helps in managing delirium-associated risks such as falls, enables distress to be identified and treated, provides prognostic information, and allows the diagnosis to be communicated to patients and carers. Detection has been recommended in multiple guidelines, including the Scottish Intercollegiate Guidelines Network (SIGN) guideline on delirium [5]. More than 30 delirium assessment tools exist, though these vary considerably in purpose and clinical applicability [6,7]. Categories of tools include: tools intended for use at first presentation or at other points when delirium is suspected; tools for regular use (that is, daily or more frequently) in monitoring inpatients for new-onset delirium; 'ultra-brief' screening tools; intensive care unit tools; tools measuring delirium severity; informant-based tools; and tools for detailed phenomenological assessment.
The 4 'A's Test or 4AT was developed as a short delirium assessment tool intended for clinical use in general settings at first presentation and when delirium is suspected. It was initially published on a dedicated website in 2011 [8]. It consists of four items: an item assessing level of alertness, a test of orientation (the Abbreviated Mental Test-4, comprising 4 orientation

METHODS
The methods and search strategy were documented in advance and published in the PROSPERO database (available at http://www.crd.york.ac.uk/PROSPERO/ with registration number CRD42019133702). The review and meta-analysis were conducted in accordance with the principles in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [15], and reported using the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines [16]. Studies were included if they met the following criteria: (1) participants aged ≥65 years; (2) the study examined the diagnostic accuracy of the 4AT for detection of delirium; (3) the reference standard assessment of delirium was made using standardised diagnostic criteria or a validated tool; and (4) cross-sectional, retrospective or prospective cohort design. If identified studies included adults both younger and older than the threshold age, the study authors were contacted to ask whether data on the older adults alone could be accessed. Studies in patients with delirium tremens were excluded.
All rights reserved. No reuse allowed without permission.

Search strategy and selection criteria
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted June 12, 2020.

Data extraction
Titles and abstracts were screened for inclusion by two pairs of review authors, with the two members of each pair working independently (C.B. and R.G., and Z.T. and A.A.). Full-text screens were carried out independently by two review authors (Z.T. and A.M.). The reviewer pairs performed data extraction independently, resolving disagreements by discussion or, where necessary, by involving a fifth review author (S.S.).
Data were extracted on: type of study; setting; study population; patient demographics; prevalence of delirium; co-morbid illness or illness severity, if reported; details of 4AT administration (timing, assessors, etc.) and of the reference standard; statistics used, including any adjustments made; and study conclusions. Test accuracy data were extracted into a two-by-two table (numbers of true positives, false positives, true negatives and false negatives for the 4AT).
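Per-study accuracy follows directly from the extracted two-by-two table. The Python sketch below computes sensitivity and specificity with 95% confidence intervals. The counts are hypothetical and are not taken from any included study. The Wilson score interval is our assumption, because the text does not state how per-study intervals were calculated. The class and function names are also illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # 4AT positive, reference standard positive
    fp: int  # 4AT positive, reference standard negative
    fn: int  # 4AT negative, reference standard positive
    tn: int  # 4AT negative, reference standard negative


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def accuracy(t: TwoByTwo) -> dict:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "sensitivity_ci": wilson_ci(t.tp, t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "specificity_ci": wilson_ci(t.tn, t.tn + t.fp),
    }


# Hypothetical counts for illustration only
example = TwoByTwo(tp=44, fp=18, fn=6, tn=132)
result = accuracy(example)
lo, hi = result["sensitivity_ci"]
print(f"Sensitivity {result['sensitivity']:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The Wilson interval is shown instead of the simple normal approximation because it stays within 0-1 when accuracy is close to 1, as it is for some 4AT studies.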
Study authors were contacted for further information on index and reference test results if insufficient data were provided to perform statistical analyses.

Risk of bias assessment
Studies were assessed for methodological quality by two independent review authors (R.G. and Z.T.) using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Narrative summaries were generated describing risk of bias (high, low or unclear) and concerns regarding applicability. When tailoring the QUADAS-2 tool, we omitted the item on the threshold used, because the design of the 4AT pre-specifies the threshold for delirium detection (cut-off ≥4). For the item on the appropriate interval between index test and reference standard, the maximum interval was set at three hours (Appendix 3: Assessment of methodological quality with the QUADAS-2 tool).
interest was the identification of delirium (presented as a dichotomous yes/no variable) by a reference standard (e.g. the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria) or by a validated diagnostic tool such as the Confusion Assessment Method (CAM) [19]. Summary estimates of sensitivity and specificity with 95% confidence intervals (CI) were calculated using a bivariate random effects model. Receiver operating characteristic (ROC) curves were used to plot the summary estimates of sensitivity and specificity.
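The review used a bivariate random effects model. That model fits logit sensitivity and logit specificity jointly and allows for their correlation, and it is normally fitted with dedicated statistical software. The Python sketch below does not reproduce it. It shows only a simplified univariate analogue: DerSimonian-Laird random-effects pooling of a single proportion, such as sensitivity, on the logit scale. This is not the review's analysis. The per-study counts and the 0.5 continuity correction are assumptions for illustration.

```python
import math
from typing import List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_logit_dl(events: List[int], totals: List[int],
                  z: float = 1.96) -> Tuple[float, float, float, float]:
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau^2).
    """
    if len(events) != len(totals) or not events:
        raise ValueError("need matching, non-empty lists")
    ys, vs = [], []
    for x, n in zip(events, totals):
        x, n = float(x), float(n)
        # Assumed continuity correction: add 0.5 to each cell if one is zero
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        ys.append(logit(x / n))
        vs.append(1 / x + 1 / (n - x))  # within-study variance of logit(p)
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2


# Hypothetical studies: true positives and all reference-positive patients
tp = [44, 30, 18, 60]
diseased = [50, 36, 20, 70]
sens, lo, hi, tau2 = pool_logit_dl(tp, diseased)
print(f"Pooled sensitivity {sens:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

Pooling sensitivity and specificity separately in this way ignores the correlation between them that the bivariate model captures. It is therefore useful only as a rough check, not as a substitute for the published estimates.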
A sensitivity analysis was performed that included only those studies deemed to have an overall low risk of bias (that is, high study quality). Pre-planned subgroup analyses were also conducted to investigate clinical heterogeneity across studies: (i) excluding studies in patients with stroke, because aphasia may influence performance on the test [20,21], to assess the test accuracy of the 4AT in non-stroke populations; and (ii) analysing separately studies using (a) a clinical reference standard (e.g. DSM) or (b) a validated assessment tool (e.g. the CAM). A post-hoc subgroup analysis was conducted to compare the diagnostic accuracy of the English 4AT with that of the translated versions.

Study identification
We identified 853 records from our initial search and 3 records from conference abstracts (Figure 1). A total of 780 records remained after initial deduplication. Following title and abstract screening, 21 records underwent full-text review, and 16 articles reporting 17 different studies were included [9,10,22-35]. The main reason for excluding articles was that the studies were not designed as diagnostic accuracy studies of the 4AT and/or did not report data from which diagnostic test accuracy could be derived. One conference proceeding reported two separate studies [25]. Two authors provided study data on subgroups of older patients [23,27,36].
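The screening counts reported above can be reconciled with simple arithmetic. The Python sketch below assumes that no further duplicates were removed after the initial deduplication step, which the text does not state explicitly. The variable names are illustrative.

```python
# Counts as reported in the text
database_records = 853
conference_records = 3
after_deduplication = 780
full_text_reviewed = 21
articles_included = 16
studies_included = 17  # one conference proceeding reported two studies

identified = database_records + conference_records
duplicates_removed = identified - after_deduplication
# Assumes no records were removed as duplicates after the initial step
excluded_at_title_abstract = after_deduplication - full_text_reviewed
excluded_at_full_text = full_text_reviewed - articles_included
extra_studies = studies_included - articles_included

print(identified, duplicates_removed, excluded_at_title_abstract,
      excluded_at_full_text, extra_studies)
# 856 76 759 5 1
```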

Study characteristics
A summary of the characteristics of the included studies is provided in Table  this modification does not affect the threshold scoring for delirium versus no delirium in the tool.
Studies were conducted in general medical or geriatric medical inpatient wards, acute stroke units, emergency departments, post-operative care units and nursing homes in eleven countries. In one study in Australia, 39% of participants were non-English speakers and required an interpreter during the assessment [28].

Study quality
The methodological quality of studies varied but was moderate to good overall. Potential for bias was generally low. Where present, it arose from three sources. The first was the selection of participants: patients unable to give consent or those with dementia were excluded (n=2). The second was the timing between the reference standard assessment and the 4AT, which was either not reported (n=6) or exceeded the maximum interval of 3 hours (n=2). The third was the blinding of assessments: raters were unblinded (n=2) or blinding status was unclear (n=3) (Table 1 and Figure 2). Seven papers were of higher concern (rated high or unclear risk of bias across three areas). Nine studies were considered low risk overall.
(Table 2; results of the other subgroup analyses are presented in Appendix 4). Three studies reported findings in a subset of patients with known dementia, with

Statement of principal findings
This systematic review identified 17 studies, involving 3721 patients, that evaluated the diagnostic test accuracy of the 4AT for detection of delirium in older patients (≥65 years). The studies spanned eleven countries, a variety of care settings and multiple languages. The overall prevalence of delirium was 24.2% (N=945; range 10.5-61.9%). Pooled sensitivity and specificity were both 0.88, indicating good accuracy. Notably, sensitivity and specificity were balanced. Similar estimates were obtained in subgroup analyses based on study quality and population type.
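To show what pooled sensitivity and specificity of 0.88 mean at the bedside, the Python sketch below applies Bayes' theorem at the overall prevalence of 24.2%. It derives predictive values and likelihood ratios. These are illustrative calculations from the pooled point estimates, not results reported by the review. The function name is hypothetical.

```python
def predictive_values(sens: float, spec: float, prev: float) -> dict:
    """Predictive values and likelihood ratios implied by sensitivity,
    specificity and prevalence (Bayes' theorem)."""
    tp = sens * prev              # expected true-positive fraction
    fp = (1 - spec) * (1 - prev)  # expected false-positive fraction
    fn = (1 - sens) * prev        # expected false-negative fraction
    tn = spec * (1 - prev)        # expected true-negative fraction
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


v = predictive_values(0.88, 0.88, 0.242)
print(f"PPV {v['ppv']:.2f}, NPV {v['npv']:.2f}, "
      f"LR+ {v['lr_pos']:.2f}, LR- {v['lr_neg']:.2f}")
# PPV 0.70, NPV 0.96, LR+ 7.33, LR- 0.14

# Predictive values depend on prevalence; evaluate PPV at the extremes
# of the range reported across studies
for prev in (0.105, 0.619):
    print(prev, round(predictive_values(0.88, 0.88, prev)["ppv"], 2))
```

Likelihood ratios do not depend on prevalence, but predictive values do. A positive 4AT therefore carries different post-test probabilities across the 10.5-61.9% prevalence range reported.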

Results in the context of the current literature
specificities of 0.84-1.0 reported. There is limited published information on its performance in routine clinical care. One recent large study found a sensitivity of 0.27 [39], though the CAM was scored without the recommended preceding interview and cognitive testing. Alternative tools include the 3D-CAM, a 20-item variant of the CAM that takes 2-5 minutes to complete (median 3 minutes) [40], and the bCAM, a 2-minute, 4-item variant of the CAM designed and validated for use in the emergency department [41]. Both tools show generally good performance in published diagnostic test accuracy studies. Reported 3D-CAM sensitivities are 0.85-1.0 and specificities 0.88-0.97 [40,45-47]. Reported bCAM sensitivities are 0.65-0.84 and specificities 0.87-0.97 [35,41-44]. To our knowledge, there are currently no published clinical implementation data for these tools.
Our review provides evidence that the 4AT has good diagnostic test accuracy for identification of delirium, with a body of validation data comparable to that of the CAM. The 4AT has some advantages over the CAM and 3D-CAM: it is shorter and simpler, and it does not require special training. Notably, the 4AT had a higher sensitivity than the CAM and a similar specificity in a recent STARD-compliant randomised controlled trial [10]. However, the 4AT currently lacks diagnostic accuracy data in palliative care and community settings. The number of studies examining its performance in patients with known dementia is relatively small; the three such studies in this review found lower specificity in delirium superimposed on dementia [9,28,34]. As with other delirium tools, studies of the clinical implementation of the 4AT are relatively lacking. Such studies might reveal training needs or other implementation challenges, such as lower sensitivity when the tool is used in routine practice.

Strengths and weaknesses of the study
This is the first meta-analysis of 4AT diagnostic test accuracy studies. Our findings were broadly consistent across care settings and languages. We published the protocol in advance and used systematic and robust methods, including a comprehensive search strategy and independent reviewers to identify, select, appraise and synthesise relevant studies. The selected studies originated from nine countries, and eight were conducted with a translated version of the tool; the findings of the review therefore demonstrate good generalisability. The methodological quality of the studies was moderate to good, despite some uncertainty about the conduct of the 4AT in four studies. The two studies showing low sensitivities both had a high risk of bias. The included studies did not report enough data to perform sensitivity analyses of the effect of the time interval between tests, and this should be the subject of further studies. In one study [30] we were unable to analyse data from those aged 65 or older separately. However, the number of observations in patients younger than 65 was likely low and would have little potential impact on the overall results. Finally, the Cochrane guidelines recommend using a single reference standard to prevent bias or ambiguity, but to maximise comprehensiveness we included studies using DSM-IV, DSM-5 or the CAM as the reference standard.
medRxiv preprint doi: https://doi.org/10.1101/2020.06.11.20128280

Areas for further research
Future validation studies should address the methodological deficiencies identified in this review, namely the timing of the reference standard relative to the 4AT and the lack of adherence to the STARD guidelines. Studies evaluating the 4AT in other settings and in patients with dementia, preferably taking into account the severity of dementia, are also required.
Clinical implementation studies evaluating 4AT performance including completion rates as well as diagnostic accuracy in routine clinical practice are also needed.

Conclusion
This meta-analysis quantifies the diagnostic accuracy of the 4AT. Its psychometric performance is good, and together with its simplicity and brevity, the present findings support the ongoing adoption and evaluation of the 4AT in routine clinical practice.
Table 2. Summary estimates of sensitivity and specificity.
The 4AT is a screening instrument designed for rapid initial assessment of delirium and cognitive impairment. A score of 4 or more suggests delirium but is not diagnostic: more detailed assessment of mental status may be required to reach a diagnosis. A score of 1-3 suggests cognitive impairment, and more detailed cognitive testing and informant history-taking are required. A score of 0 does not definitively exclude delirium or cognitive impairment: more detailed testing may be required depending on the clinical context.
Items 1-3 are rated solely on observation of the patient at the time of assessment. Item 4 requires information from one or more sources, e.g. your own knowledge of the patient, other staff who know the patient (e.g. ward nurses), GP letter, case notes, carers. The tester should take account of communication difficulties (hearing impairment, dysphasia, lack of a common language) when carrying out the test and interpreting the score.
Alertness: Altered level of alertness is very likely to be delirium in general hospital settings. If the patient shows significantly altered alertness during the bedside assessment, score 4 for this item.
AMT4 (Abbreviated Mental Test-4): This score can be extracted from items in the AMT10 if the latter is done immediately before.
Acute Change or Fluctuating Course: Fluctuation can occur without delirium in some cases of dementia, but marked fluctuation usually indicates delirium. To help elicit any hallucinations and/or paranoid thoughts, ask the patient questions such as "Are you concerned about anything going on here?"; "Do you feel frightened by anything or anyone?"; "Have you been seeing or hearing anything unusual?"
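The guidance above gives the score bands but only part of the item scoring. The Python sketch below tallies a total score and maps it to those bands. The item points for AMT4 (0, 1 or 2) and for the attention item (months of the year backwards: 0, 1 or 2) are taken from the published 4AT form as we understand it. They are assumptions to check against the official form at www.the4AT.com. The function names are hypothetical. This is an illustration, not a validated clinical implementation.

```python
from typing import Optional


def score_4at(alertness_abnormal: bool,
              amt4_mistakes: Optional[int],
              months_backwards_correct: Optional[int],
              acute_change: bool) -> int:
    """Total 4AT score (0-12). None means the item was untestable."""
    # Item 1: clearly altered alertness scores 4
    alert = 4 if alertness_abnormal else 0
    # Item 2 (assumed scoring): 0 mistakes = 0, 1 mistake = 1,
    # 2 or more mistakes or untestable = 2
    if amt4_mistakes is not None and not 0 <= amt4_mistakes <= 4:
        raise ValueError("AMT4 has four questions")
    amt4 = 2 if amt4_mistakes is None or amt4_mistakes >= 2 else amt4_mistakes
    # Item 3 (assumed scoring): >=7 months correct = 0,
    # started but <7 or refused = 1, untestable = 2
    if months_backwards_correct is None:
        attention = 2
    elif months_backwards_correct >= 7:
        attention = 0
    else:
        attention = 1
    # Item 4: acute change or fluctuating course scores 4
    change = 4 if acute_change else 0
    return alert + amt4 + attention + change


def interpret_4at(total: int) -> str:
    """Map a total score to the bands described in the guidance above."""
    if not 0 <= total <= 12:
        raise ValueError("4AT totals range from 0 to 12")
    if total >= 4:
        return "possible delirium +/- cognitive impairment"
    if total >= 1:
        return "possible cognitive impairment"
    return "delirium or cognitive impairment unlikely (not excluded)"


total = score_4at(alertness_abnormal=False, amt4_mistakes=1,
                  months_backwards_correct=5, acute_change=False)
print(total, interpret_4at(total))
```

Because items 1 and 4 each contribute 4 points, either one alone reaches the ≥4 threshold. This is consistent with the fixed cut-off that made the QUADAS-2 threshold item unnecessary.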

Supplementary