What gets recorded, counts: dementia recording in primary care compared with a specialist database

Abstract
Background: Databases of electronic health records are powerful tools for dementia research, but data can be influenced by incomplete recording. We examined whether people with dementia recorded in a specialist database (from a mental health and dementia care service) differ from those recorded in primary care.
Methods: A retrospective cohort study of the population covered by Lambeth DataNet (primary care electronic records) between 2007 and 2019. Documentation of dementia diagnosis in primary care coded data and linked records in a specialist database (Clinical Records Interactive Search) were compared.
Results: 3,859 people had dementia documented in primary care codes and 4,266 in the specialist database, with 2,886/5,239 (55%) documented in both sources. Overall, 55% were labelled as having Alzheimer’s dementia and 29% were prescribed dementia medication, but these proportions were significantly higher in those documented in both sources. The cohort identified from the specialist database were less likely to live in a care home (prevalence ratio 0.73, 95% confidence interval 0.63–0.85), have multimorbidity (0.87, 0.77–0.98) or consult frequently (0.91, 0.88–0.95) than those identified through primary care codes, although mortality did not differ (0.98, 0.91–1.06).
Discussion: There is under-recording of dementia diagnoses in both primary care and specialist databases. This has implications for clinical care and for generalizability of research. Our results suggest that using a mental health database may under-represent those patients who have more frailty, reflecting differential referral to mental health services, and demonstrating how the patient pathways are an important consideration when undertaking database studies.


Introduction
The complexity and heterogeneity of dementia mean that many of the remaining research questions relating to dementia and dementia care cannot be answered by conventional means, such as randomised controlled trials. Healthcare database studies offer excellent opportunities to address such questions. Many utilise electronic health records (EHR) to capture key demographic and clinical data, such as diagnostic codes, referrals and prescribing [1][2][3]. However, findings in healthcare databases may be influenced by where in the patient care pathway the data are collected [4,5]. Databases drawn from a single point in the pathway may not be representative of the full population of people living with dementia, particularly in terms of clinical features. This can affect the generalisability of the findings or even the results themselves, for instance extrapolation of prevalence or absolute risk [6,7].
In the UK, the main pathway to a dementia diagnosis, and treatment with acetylcholinesterase inhibitors for those with Alzheimer's type dementia, is assessment in primary care followed by referral to a specialist dementia diagnostic service, often either in community mental health services or memory clinics provided by mental health services [8]. Brayne and Davis's [6] review of sources of data for research in dementia suggests that, compared with data from primary care, data from specialist services (mental health and memory clinic providers) will tend to over-represent those who have 'memory problems' but are otherwise 'relatively fit'. This study takes advantage of linked primary care EHR and specialist EHR databases to explore the degree of overlap between the cohorts of people with recorded dementia in each data source, and thus the extent of under-documentation. We explore whether the characteristics of patients in those cohorts reflect the patient pathway such that, compared with primary care, those with a dementia diagnosis in the specialist database are (i) less likely to have markers of frailty and complexity and (ii) more likely to have Alzheimer's-type dementia and be prescribed dementia medication.

Methods
This was a retrospective cohort study of patients registered with a Lambeth GP at any time between 2007 and 2019, utilising linkage to a specialist database.

Databases
Lambeth DataNet (LDN) provided data from primary care. LDN collects structured data from the EHR of all GP surgeries in the borough of Lambeth [9]. A person with a record in LDN will have had some contact with a Lambeth GP practice, which does not require residence in Lambeth.
South London and Maudsley NHS Foundation Trust (SLaM) provides specialist mental health and dementia care services for four London boroughs (Lambeth, Southwark, Lewisham and Croydon) [10]. Data from SLaM feed into a bespoke database of de-identified records through the infrastructure and oversight arrangements of the Clinical Records Interactive Search (CRIS), which can then be linked to other local and national data sources [10]. This allows the opportunity of utilising data from detailed assessments, such as those provided in memory clinics, alongside important outcomes recorded elsewhere, such as admission to general hospitals and death [11][12][13].
The CRIS/LDN linkage is conducted by the CRIS datalinkage service [10]. CRIS, including linkage to Lambeth DataNet, has received ethical approval as an anonymized data resource (Oxford Research Ethics Committee C, reference 18/SC/0372). This project was approved by the CRIS oversight committee. Code lists used are in Appendix 1 (ST1-6).

Cohort
Our population was the 1.2 million people with an LDN health record between 2007 and 2019. This means we only included people who were registered with a Lambeth GP, and we included them whether or not they had a record in secondary care. We defined dementia documentation from the structured fields of the respective databases. CRIS contained ICD-10 diagnostic codes, from which we selected codes referring to dementia from the mental and behavioural disorder chapter (F00-03); LDN had Read codes and SNOMED clinical terminologies following the recent national change in preferred ontology [14]. The Read code list ascertaining dementia replicated that from the SAIL-Dementia eCohort [15] and SNOMED codes were derived from those lists using the NHS mapping file [16]. By review of the English terms attached to the Read and SNOMED Concept terms we allocated them into 'high specificity' (e.g. 'Unspecified dementia'), which were sufficient on their own to indicate a diagnosis of dementia, and 'low specificity' (e.g. 'Delirium superimposed on dementia'), which required supporting codes (Appendix 1, ST1).
Inclusion criteria for our main cohort were: (

Cohort characteristics
We extracted year of birth, gender and ethnicity from LDN. When describing the denominator of people aged above 65 in LDN, we included all those aged 65 or above at the median diagnosis date of those in LDN with a diagnosis of dementia (24/05/2013). Ethnicity was assigned within LDN as 16 classes, from which we used White British unchanged as the reference class, and condensed ethnicities that may be subject to disadvantage into White non-British, Black (Black and Black British), Asian (Asian and Asian British), Mixed and Other (Chinese and Any other). LDN gives last known address at the level of Lower Super Output Areas (LSOA, a standard geographic unit with an average population of 1,700), which allowed us to calculate a neighbourhood measure of deprivation (Index of Multiple Deprivation, IMD) using publicly available data tables [17]. For sensitivity analyses, we also ascertained whether a patient lived in Lambeth and whether they had at least one consultation documented in LDN on or prior to the date of the first documentation of dementia, which we term 'prior consultation'. Selected health indices were extracted from LDN for dates prior to the first documentation of dementia: number of GP consultations in the previous 2 years (Appendix 1:3), smoking status (Appendix 1:4) and comorbidity score. The comorbidity score was a modified Charlson comorbidity index that used SNOMED codes for chronic conditions adapted from Read code lists developed for the CALIBER project [18], converted using the NHS mapping file [16] and summed with weights from Quan et al. [19] (excluding dementia, Appendix 1:5). Care home residence was indicated by any consultation with a care home visit as the consultation type in the 2 years before diagnosis.
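As a rough illustration of the weighted-sum scoring described above, the following minimal Python sketch sums weights over a patient's distinct recorded conditions while leaving dementia out. The condition names, the function name and, in particular, the weights are hypothetical placeholders; they are not the code lists in Appendix 1:5 or the published weights from Quan et al. [19].

```python
# Placeholder weights for illustration only; NOT the values used in the study.
ILLUSTRATIVE_WEIGHTS = {
    "congestive_heart_failure": 2,
    "chronic_pulmonary_disease": 1,
    "renal_disease": 1,
    "dementia": 2,
}


def comorbidity_score(conditions, weights=ILLUSTRATIVE_WEIGHTS, exclude=("dementia",)):
    """Sum the weights of distinct recorded conditions, skipping excluded ones.

    Conditions without a weight contribute nothing, which mirrors treating
    a lack of documentation as absence of the condition.
    """
    return sum(weights.get(c, 0) for c in set(conditions) if c not in exclude)


# A hypothetical patient with a duplicated renal code and a dementia code.
example_score = comorbidity_score(
    ["congestive_heart_failure", "renal_disease", "dementia", "renal_disease"]
)
```

Using a set means a condition coded on several dates is counted once, and the `exclude` argument keeps dementia out of the score because it defines the cohort.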
The subtype of dementia was determined from CRIS, where possible, taking the most recent ICD-10 dementia diagnosis. Where dementia was identified in LDN only, Read/SNOMED codes that represented specific dementia subtypes were extracted from LDN (Appendix 1:2) and the most frequent subtype was allocated. Unspecified subtype was allocated where dementia was categorised as unspecified in CRIS, or no subtype codes were used in LDN. Dementia medication was defined as acetylcholinesterase inhibitors or memantine (Appendix 1:6) prescribed at least once in LDN.
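The subtype allocation rule can be sketched in Python as follows. This is an illustrative reading of the rule rather than the study's actual code: the function name, the input shapes (date and subtype pairs for CRIS, a list of subtype labels for LDN) and the tie-breaking behaviour are all assumptions.

```python
from collections import Counter


def allocate_subtype(cris_diagnoses, ldn_subtype_codes):
    """Allocate a dementia subtype for one patient.

    cris_diagnoses: list of (iso_date, subtype) tuples from the specialist database.
    ldn_subtype_codes: list of subtype labels from primary care codes.
    """
    if cris_diagnoses:
        # Most recent specialist diagnosis wins; ISO dates sort chronologically.
        return max(cris_diagnoses, key=lambda d: d[0])[1]
    if ldn_subtype_codes:
        # Otherwise take the most frequent primary care subtype.
        # Ties go to the label seen first (an assumption, not from the source).
        return Counter(ldn_subtype_codes).most_common(1)[0][0]
    return "unspecified"


from_cris = allocate_subtype(
    [("2012-03-01", "vascular"), ("2015-06-10", "alzheimers")], ["vascular"]
)
from_ldn = allocate_subtype([], ["vascular", "alzheimers", "vascular"])
no_subtype = allocate_subtype([], [])
```

A patient whose most recent CRIS diagnosis is itself unspecified would simply carry that label through the first branch.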

Analysis
Prevalence and patterns of missing data were explored. Descriptive statistics were calculated in MS Excel and R version 3.5.1. Confidence intervals are given at the 95% level (using Wilson's method for proportions and the binomial method for prevalence ratios). Proportions are given to the nearest percentage point unless <10%. Chi-squared tests were used to compare characteristics where we had specific hypotheses.
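For readers unfamiliar with these intervals, the minimal Python sketch below shows a Wilson score interval for a proportion and one common way of putting a confidence interval around a prevalence ratio. The analysis itself was run in Excel and R, so this is not the authors' code. The prevalence ratio interval uses a log-scale (Katz) approximation for two independent samples, which is an assumption and may differ from the binomial method used in the study; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def prevalence_ratio(a, n1, b, n2, z=Z_95):
    """Prevalence ratio (a/n1) / (b/n2) with a log-scale (Katz) interval.

    Assumes two independent samples, which is a simplification.
    """
    ratio = (a / n1) / (b / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Proportion documented in both sources, using the counts reported in this paper.
overlap_low, overlap_high = wilson_interval(2886, 5239)

# Hypothetical counts, for illustration only (not study data).
pr, pr_low, pr_high = prevalence_ratio(30, 100, 40, 100)
```

For the reported overlap of 2,886/5,239, the Wilson interval is narrow (roughly 0.54 to 0.56) because the denominator is large.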

Results
Of patients with an LDN record between 2007 and 2019 aged 65 or over, 3,859 had dementia codes in primary care, with a median of two different codes from the list in Appendix 1 (interquartile range 1-7 different codes). Meanwhile 4,266 had dementia documented in the specialist care database. Combining the two sources of documentation found 5,239 unique patients with documented dementia in either source, making up 0.45% of all adult LDN patients or 5.4% of those over 65. This is our main cohort for analysis. Fifty-five percent of people identified with dementia were identified by both primary care codes and the specialist database (2,886/5,239), as shown in Figure 1. Of those identified by primary care codes, 75% were also identified by the specialist database, and 68% of those identified by the specialist database were also identified by primary care codes. Of those identified, 84% resided in Lambeth and 85% had a prior GP consultation. Figure 1 shows the effect on overlap of restricting to these subpopulations and with a date restriction allowing for longer follow-up. Restricting the sample by residence or prior consultation modestly increased the percentage overlap in documentation from 55% to 57% (by residence, see also Appendix 2) or 60% (by prior consultation). Appendix 2 shows that both the number of cases per year and the proportion of primary care recording were highest in the years 2011-2015.
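The overlap figures above follow from simple inclusion-exclusion arithmetic on the three reported counts. The short Python sketch below reproduces them as a worked check; the variable and function names are illustrative only.

```python
# Counts reported in the text.
primary_care = 3859   # dementia codes in primary care (LDN)
specialist = 4266     # dementia documented in the specialist database (CRIS)
both = 2886           # documented in both sources

# Inclusion-exclusion gives the number of unique patients in either source.
union = primary_care + specialist - both

# Exclusive groups follow by subtraction.
primary_only = primary_care - both
specialist_only = specialist - both


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole point."""
    return round(100 * numerator / denominator)


overlap_of_all = pct(both, union)              # share of the main cohort in both sources
overlap_of_primary = pct(both, primary_care)   # primary care cases also in specialist data
overlap_of_specialist = pct(both, specialist)  # specialist cases also in primary care
```

The union comes to 5,239, and the three percentages come to 55, 75 and 68, matching the reported values; the exclusive groups are 973 (primary care only) and 1,380 (specialist only).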
Characteristics of the main cohort are shown in Appendix 3. Three variables from LDN were found to contain missing data: ethnicity (744/5,239, 14%), smoking status (652/5,239, 12%) and LSOA/address (111/5,239, 2%). Restricting to those living in Lambeth made little difference, but restricting to those with a prior consultation reduced the proportion of missing data. Dividing the cohort into exclusive groups of those identified by both primary care codes and the specialist database ('both', n = 2,886), those identified by the specialist database only (n = 1,380) and those with primary care codes only (n = 973), levels of missing data were higher for people in the specialist only group. Tables 1 and 2 show proportions excluding missing data, while Appendix 4 shows the equivalent with missing data or restricting by prior consultation. Table 1 shows the demographic features of the three documentation groups. The three documentation groups had similar age, sex and deprivation distribution, but ethnicity differed, with under-representation of documented Black ethnicity in the specialist only group. Table 2 displays the outcome of tests on the hypothesis that there was a difference between the groups on measures of frailty or complexity. A significant difference was found in all three-way comparisons (P < 0.001). The specialist only group had a lower Charlson comorbidity index, lower numbers of prior consultations and fewer care home consultations. The primary care only group had the highest mortality. Restricting to people who had consulted primary care in the 2 years prior to diagnosis (Appendix 4) reduced but did not abolish the differences. Table 2 also shows that those with documentation in both databases were more commonly recorded with Alzheimer's type dementia and less commonly documented as having 'unspecified' or vascular dementia than those with only one type of documentation.
29% (1,505/5,239) of patients were prescribed dementia medications in primary care, and this varied from 41% in those documented in both sources to 8% in the specialist only group. Among those prescribed dementia medication, 93% had primary care codes for dementia.
Note: Lambeth addresses = last known address in Lambeth; 1+ primary care consultations prior = one face-to-face or telephone encounter in primary care in the 2 years before first specialist or primary care dementia code.
Comparing the overlapping samples of the LDN cohort that could have been generated from the specialist database (combining 'specialist only' and 'both' from Table 1, n = 4,266) and primary care codes (combining 'primary care only' and 'both', n = 3,859), Table 3 shows the specialist database sample had significantly lower proportions of White British ethnicity, lower consultation rates, lower multimorbidity and fewer in care homes, but with fairly small effect sizes (prevalence ratios 0.94, 0.91, 0.85, 0.73, respectively). There was no difference in mortality (prevalence ratio 0.98, 0.91-1.06). The specialist database sample was also less likely to have been prescribed dementia medication (prevalence ratio 0.85, 0.80-0.91), explored further in Appendix 5, which shows the largest discrepancy in being prescribed medication was in those with Alzheimer's type dementia.

Discussion
We investigated the likely generalisability of findings made from databases of routinely recorded healthcare data by assessing patient characteristics associated with cohorts derived from two methods of ascertaining dementia cases in a defined population: structured diagnosis in a specialist mental health dementia service and coded documentation in primary care. We identified 5,239 patients with eligible dementia documentation, 55% of whom were documented in both data sources, 26% only in specialist care and 19% only in primary care. Those with dementia documented in the specialist database were less likely to live in a care home, consulted the GP less frequently and had fewer comorbidities than those with dementia documented in the primary care codes. It therefore seems likely that the specialist database under-represents frail and complex patients. Perhaps surprisingly, those in the specialist database were not more likely to have Alzheimer's dementia and they were less likely to be prescribed dementia medication. Both NICE guidelines and the primary care services contract emphasise the need for full memory clinic assessment in most cases when dementia is suspected, and that the clinic will assess for suitability for medication [8,20], which led to our hypothesis that we would see over-representation of those prescribed dementia medication in the specialist database. However, 93% of those prescribed dementia medication had primary care coding, compared with 66% of those not prescribed dementia medication. This may be due to reverse causation: those patients prescribed dementia medications by their GP subsequently get coded with dementia. Those without a diagnosis in the specialist database, some of whom were prescribed dementia medications, might reflect diagnosis in other places such as clinics for the care of older people (not included in our data source), which may be deemed more appropriate if patients had a mixture of physical and cognitive difficulties.
Of the people with dementia documented in the specialist database, 32% did not have this formally documented in primary care, despite pressure on GPs to recognise possible dementia, refer and document diagnosis [20]. Our work is consistent with others in that primary care documentation increased in 2011-2015 when specific funding was available for dementia case finding [21][22][23], but that gaps in documentation remain. For example, comparisons of general hospital statistics with primary care codes have shown that the proportion of cases with a dementia diagnosis in their hospital data that was not recorded in primary care was 44% in an English sample [24] and 39% in Wales [15]. Severity is thought to be a predictor of documentation in primary care [13,25], to which we can add prescription of dementia medication. Our results suggest that White British people are more likely to have primary care codes than those of Black ethnicity, although our study was not designed to examine this, and so the finding should be regarded as tentative. Some under-ascertainment may occur when people move in and out of areas (for example to enter a care home), as GP practices in the UK each have their own electronic records that may not move with the patient or integrate with other IT systems. Under-documentation is a barrier to good clinical care [12,26]. Initiatives are consequently being developed to integrate care records to ensure clinicians have the information they need wherever the patient presents [27,28].
Under-documentation will have obvious repercussions on estimating the prevalence of diagnosed dementia, but a lack of sensitivity has wider consequences for research [29]. Unless a source of dementia diagnosis is near-complete, identification of people with dementia using this documentation will reflect patient and system factors that influenced the documentation, with risk of misclassification in the study. Our findings indicate that when patients with dementia are selected using single-agency data the cohort may not be fully representative in either demographics or clinical characteristics. Alternatively, these findings may indicate that the patient pathways themselves are not delivering equity of access.

Strengths and limitations
To our knowledge, this is the first study to compare dementia recorded in primary and specialist care in the UK. While the exact findings may not be generalisable elsewhere (especially due to the populations served in this catchment [10]), we expect the observations about under-documentation will be widely applicable. We used previously applied code lists to maximise the applicability of findings; however, limiting to coded data may have under-ascertained dementia documented as free-text. For comorbidities, we took lack of documentation to mean absence of the condition, but comorbidities will be subject to the same under-documentation biases as we describe for dementia. For prescribing, we assumed that specialist services always asked primary care to prescribe dementia medications (as was the policy), but there may have been patients who received them directly. Our inclusion criteria included people who were registered with a GP practice in Lambeth for only part of the date window, which may have accounted for another portion of under-ascertainment. Including more data sources in our search (such as from general hospitals in the area) may have increased the number of individuals we identified, and would be likely to show more under-documentation.
Any documentation of dementia that met our criteria was taken to represent a true positive case of dementia, but the code lists and our algorithm have not been externally validated against a clinical assessment. Given the relatively high prevalence of dementia in older adults and the known problem of under-documentation [5], we assume that false negatives are more likely than false positives as a cause for lack of overlap, but it is likely that there are also cases of mistakes in documentation. We are also conscious that the documentation gap we have demonstrated is related to, but separate from, the diagnosis gap. To fully understand the under-documentation for people with dementia, we would need to include a cohort screened for dementia to identify those without diagnosis.

Conclusions and implications
Documentation in EHR is important both for clinical care and for secondary use in database research studies. We found that two EHR databases for the same population sample identified broadly similar numbers of people documented as living with dementia, with substantial but incomplete overlap in the people identified. This incomplete documentation may suggest some inequality of access, which deserves further investigation. Researchers and clinicians using healthcare databases should be aware that databases covering only some of the real-life patient pathways may miss a proportion of people with dementia, and should take this into account when choosing databases and interpreting the results. Opportunities for data linkage drawing from multiple databases will improve the generalisability of findings.