Validity of estimated prevalence of decreased kidney function and renal replacement therapy from primary care electronic health records compared with national survey and registry data in the United Kingdom

Background Anonymous primary care records are an important resource for observational studies. However, their external validity is unknown in identifying the prevalence of decreased kidney function and renal replacement therapy (RRT). We thus compared the prevalence of decreased kidney function and RRT in the Clinical Practice Research Datalink (CPRD) with a nationally representative survey and national registry. Methods Among all people ≥25 years of age registered in the CPRD for ≥1 year on 31 March 2014, we identified patients with an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m2, according to their most recent serum creatinine in the past 5 years using the Chronic Kidney Disease Epidemiology Collaboration equation and patients with recorded diagnoses of RRT. Denominators were the entire population in each age–sex band irrespective of creatinine measurement. The prevalence of eGFR <60 mL/min/1.73 m2 was compared with that in the Health Survey for England (HSE) 2009/2010 and the prevalence of RRT was compared with that in the UK Renal Registry (UKRR) 2014. Results We analysed 2 761 755 people in CPRD [mean age 53 (SD 17) years, men 49%], of whom 189 581 (6.86%) had an eGFR <60 mL/min/1.73 m2 and 3293 (0.12%) were on RRT. The prevalence of eGFR <60 mL/min/1.73 m2 in CPRD was similar to that in the HSE and the prevalence of RRT was close to that in the UKRR across all age groups in men and women, although the small number of younger patients with an eGFR <60 mL/min/1.73 m2 in the HSE might have hampered precise comparison. Conclusions UK primary care data have good external validity for the prevalence of decreased kidney function and RRT.

it problematic to determine the community prevalence of CKD [7,8]. For example, people who have kidney function measured routinely by serum creatinine may not represent the general population and serum creatinine assays may not be uniformly standardized.
Data derived from routine patient care, such as the anonymous primary care records held in the UK Clinical Practice Research Datalink (CPRD) [9], are an important resource for observational studies [10]. Because the CPRD broadly represents the UK population in terms of demographics [11], it can be a useful source for estimating disease prevalence in the UK. However, using routine electronic records to investigate renal disease is only possible if general practitioners (GPs) appropriately test, identify and record everyone in the population who has kidney disease. Reliable measures of renal disease in electronic health records would allow a more robust use of primary care data to investigate renal disease epidemiology; for example, researchers would be able to investigate the association between kidney diseases and other comorbidities or medications recorded in primary care data. To date, a number of definitions for diseases or specific conditions have been validated in the CPRD at the individual or population level [12,13]. However, to our knowledge, there has been no external validation study for the prevalence of decreased kidney function and RRT in the CPRD. The best available methods to identify CKD and RRT in the CPRD are to use serum creatinine records measured by GPs and recorded diagnoses of RRT, respectively, yet the validity and appropriateness of these methods are unknown.
The Health Survey for England (HSE), a nationally representative survey of health condition, included measurement of kidney function in 2009 and 2010 [14]. Every consenting participant had kidney function measured, giving representative statistics for the prevalence of decreased kidney function in the general population. Meanwhile, the UK Renal Registry (UKRR), which records information regarding all people on RRT in the UK, provides annual reports of the prevalence of RRT [15]. Referring to these two nationally representative sources of data, we aimed to evaluate the external validity of the prevalence of decreased kidney function and RRT in the CPRD.

Details of the CPRD and study population
In the UK, the primary care system acts as a gatekeeper to health care: patients need to be registered with a primary care doctor to access National Health Service (NHS) nonemergency care. Health care is free at the point of access. Primary care practices have used computerized electronic health records since the early 1990s. There are only a limited number of suppliers of GP electronic health record software. The CPRD uses data from the VISION software system (In Practice Systems, London, UK) and has evolved as an observational data and interventional research service provided by the NHS. Currently >650 GP practices contribute data meeting quality control standards to the CPRD, covering and representing nearly 7% of the UK population [11]. Previous studies have suggested that the distributions of age, sex, ethnicity, practice location deprivation and other health indicators such as smoking and morbidities are similar to those of external UK-based sources [11,16-19]. The database includes patient demographics, coded diagnoses and outpatient laboratory test results. The Secretary of State waived informed consent for CPRD data because data are anonymized and there is an overall benefit for research. Ethical approval for this study was obtained from the Independent Scientific Advisory Committee, which oversees research on CPRD data (protocol no. 16_055), as well as the London School of Hygiene and Tropical Medicine Ethics Committee (reference: 9196).
The study population was all people ≥25 years of age who were alive and registered in the CPRD for at least 1 year on 31 March 2014. The choice of age 25 years as a lower limit was made for the best comparability between the CPRD and HSE or UKRR: the HSE and UKRR collected data of people <25 years of age differently (the HSE grouped people 16-24 years of age, while the UKRR grouped people 18-24 years of age). One-year registration was considered necessary for GPs to record a history of RRT for newly registered patients or to test their kidney function if they had a key CKD risk factor such as diabetes [5].
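The eligibility rule above can be sketched as follows. This is a minimal illustration, not the authors' extraction code: the function names and the use of full birth, registration and death dates are assumptions (the CPRD may expose these fields differently), and only the index date, the age limit and the 1-year registration requirement come from the text.

```python
from datetime import date

INDEX_DATE = date(2014, 3, 31)  # prevalence date stated in the text


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end (birthday-style counting)."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def is_eligible(birth_date, registration_date, death_date=None, index_date=INDEX_DATE):
    """Hypothetical helper: alive, aged >=25 and registered >=1 year on the index date."""
    alive = death_date is None or death_date > index_date
    old_enough = years_between(birth_date, index_date) >= 25
    registered_long_enough = years_between(registration_date, index_date) >= 1
    return alive and old_enough and registered_long_enough
```

For example, a person born on 31 March 1989 and registered on 31 March 2013 would just meet both thresholds on the index date under this counting rule.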

Details of external data
For the prevalence of decreased kidney function, we compared the data from the CPRD with those from the HSE 2009 and 2010 (combined) [14]. Briefly, the HSE 2009/2010 included a cross-sectional study of kidney disease among people selected using a multistage stratified random probability sampling method. Blood samples were taken from nearly 6000 consenting participants, accounting for 77% of men and 73% of women among all HSE participants. Data were weighted for non-response to reduce response bias. Creatinine was measured by an internationally standardized enzymatic method, which is traceable to isotope dilution mass spectrometry (IDMS) [20]. Estimated glomerular filtration rate (eGFR) was calculated from the serum creatinine value using the Modification of Diet in Renal Disease Study equation in the original HSE report [14], whereas a post hoc analysis was conducted using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [21]. The prevalence of people with a single eGFR <60 mL/min/1.73 m2 was reported according to age (every 10 years) and sex.
For RRT prevalence, we referred to the data from the UKRR 2014 [15]. The UKRR 2014 collected data from all 71 renal centres in the UK. The prevalence of RRT in 2013 was estimated by dividing the number of patients on RRT by the 2013 UK population, according to age (every 10 years), sex and RRT modality: haemodialysis, peritoneal dialysis or kidney transplantation.
Definition of decreased kidney function and RRT in the CPRD
We identified patients with an eGFR <60 mL/min/1.73 m2 according to their most recent single serum creatinine measured by a GP in the past 5 years (i.e. the period between 1 April 2009 and 31 March 2014) using the CKD-EPI equation [22]. We used a single eGFR to define decreased kidney function in the main analysis because the HSE (reference data in this study), as well as previous large epidemiological studies [23,24], have used this definition. For the main analysis, we made the following assumptions: (i) all the UK laboratories reported IDMS-traceable creatinine; (ii) people with a missing record of ethnicity in the CPRD had non-black ethnicity and (iii) people without any creatinine measurement for the past 5 years did not have decreased kidney function.
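As an illustration of the eGFR definition above, the following sketch implements the 2009 CKD-EPI creatinine equation together with main-analysis assumptions (ii) and (iii). The function names are hypothetical and the conversion from µmol/L to mg/dL (division by 88.4) is an assumption about how creatinine is stored; the sketch is not the authors' Stata code.

```python
from typing import Optional


def ckd_epi_2009(creatinine_umol_l: float, age_years: float,
                 female: bool, black: bool = False) -> float:
    """eGFR (mL/min/1.73 m2) from IDMS-traceable serum creatinine.

    black defaults to False, mirroring assumption (ii): missing ethnicity
    is treated as non-black.
    """
    scr_mg_dl = creatinine_umol_l / 88.4          # unit conversion (assumed storage unit)
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def has_decreased_function(latest_egfr: Optional[float], threshold: float = 60.0) -> bool:
    """Assumption (iii): no creatinine in the window means not a case."""
    return latest_egfr is not None and latest_egfr < threshold
```

Under this sketch, a 70-year-old man with a creatinine of 150 µmol/L would have an eGFR of roughly 40 mL/min/1.73 m2 and would count as a case, whereas a person with no creatinine record would not.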
We identified patients on RRT based on the diagnoses recorded in the CPRD at any time from the date of their registration to 31 March 2014. The list of diagnosis codes (Read codes) indicative of RRT was determined by using a recommended strategy [25] and agreed upon among the authors (Supplementary data, Table S1). In addition, in order to examine the validity of diagnoses of different RRT modalities in the CPRD, we classified patients with RRT into those with haemodialysis, peritoneal dialysis or kidney transplantation. We used the most recent recorded diagnosis, as this is the best available approach to estimate the prevalence of the current RRT modality in the CPRD.
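The "most recent recorded diagnosis" rule can be sketched as below. The record layout (a date paired with a modality label) and the function name are hypothetical; in practice the modality would be derived from the Read codes listed in Supplementary Table S1, which are not reproduced here.

```python
from datetime import date

INDEX_DATE = date(2014, 3, 31)  # end of the look-back period stated in the text


def current_modality(records, index_date=INDEX_DATE):
    """Return the modality of the latest RRT record on or before the index date.

    records: iterable of (event_date, modality) tuples, where modality is an
    illustrative label such as "haemodialysis", "peritoneal dialysis" or
    "transplant". Returns None if the patient has no RRT record in the window.
    """
    in_window = [r for r in records if r[0] <= index_date]
    if not in_window:
        return None
    return max(in_window, key=lambda r: r[0])[1]
```

A patient whose GP record has not been updated after a switch of modality would keep the earlier label under this rule, which is the kind of misclassification the Discussion considers.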

Data analysis
We calculated the prevalence [95% confidence interval (CI)] of eGFR <60 mL/min/1.73 m2 according to age (every 10 years) and sex in the CPRD and HSE, respectively, using the CKD-EPI equation. Denominators in the CPRD were the entire population in each age-sex band irrespective of creatinine measurement in the past 5 years. Patients ≥75 years of age were grouped in the CPRD to be consistent with the HSE. We calculated the difference (95% CI) in the prevalence of eGFR <60 mL/min/1.73 m2 between the CPRD and HSE. We also reported the proportion of patients with at least one creatinine measurement in the past 5 years in the CPRD.
Similarly, we calculated the prevalence of RRT in the CPRD and UKRR, respectively, and then the difference between the CPRD and UKRR, in 10-year age bands by sex. We also reported results by RRT modality.
All statistical analyses were conducted using Stata 14 software (StataCorp, College Station, TX, USA).
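The prevalence and difference calculations described in this section can be illustrated with a normal-approximation (Wald) interval. The text does not state which interval method was used in Stata, so the Wald form is an assumption, and the sketch ignores the survey weights applied in the HSE; the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def prevalence_ci(cases: int, n: int, z: float = Z95):
    """Prevalence with a Wald 95% CI, truncated to [0, 1]."""
    p = cases / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


def difference_ci(cases_a: int, n_a: int, cases_b: int, n_b: int, z: float = Z95):
    """Difference in prevalence (a minus b) with a Wald 95% CI."""
    p_a, p_b = cases_a / n_a, cases_b / n_b
    se = math.sqrt(p_a * (1.0 - p_a) / n_a + p_b * (1.0 - p_b) / n_b)
    diff = p_a - p_b
    return diff, diff - z * se, diff + z * se
```

Applied to the overall CPRD counts reported in the Results, `prevalence_ci(189581, 2761755)` reproduces the 6.86% point estimate. A Wald interval collapses to zero width when a stratum has no cases, so small strata may call for an exact or Wilson interval instead.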

Sensitivity analyses
We repeated our analyses using a number of alternative eGFR definitions and restricted study populations in order to determine the impact of the definition for decreased kidney function that we used. We defined decreased kidney function as follows: (i) we assumed that all the UK laboratories reported non-IDMS-traceable creatinine, and therefore multiplied the recorded creatinine value by 0.95 to use the CKD-EPI equation for IDMS-traceable creatinine [26]; (ii) we conducted a complete case analysis for ethnicity (restricting the analysis to people with recorded ethnicity in the CPRD); (iii) we used the participants' most recent creatinine in the past 2 years, instead of 5 years; (iv) we restricted the region to England, by excluding data from Scotland, Wales and Northern Ireland; (v) we additionally required a measure of chronicity to define decreased kidney function [27]: two or more eGFR results <60 mL/min/1.73 m2 needed to be recorded consecutively ≥3 months apart in the past 5 years; and (vi) we conducted a complete case analysis for creatinine by restricting the analysis to people with at least one creatinine measurement in the past 5 years.
We also compared the prevalence of eGFR <45 mL/min/1.73 m2 (calculated from the most recent creatinine in the past 5 years) between the CPRD and HSE, which may be a more robust indicator of decreased kidney function with prognostic implications [28,29].
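Two of the alternative definitions above, the IDMS adjustment in (i) and the chronicity requirement in (v), can be sketched as follows. The reading of "consecutively ≥3 months apart" as a run of uninterrupted low results spanning at least 90 days is one possible interpretation and an assumption of this sketch, as are the function names; measurements are assumed to be pre-filtered to the 5-year window.

```python
from datetime import date


def idms_adjust(creatinine: float, factor: float = 0.95) -> float:
    """Sensitivity analysis (i): rescale non-IDMS-traceable creatinine."""
    return creatinine * factor


def meets_chronicity(measurements, threshold: float = 60.0, min_gap_days: int = 90) -> bool:
    """Sensitivity analysis (v), under the interpretation stated above.

    measurements: list of (test_date, egfr) tuples within the look-back window.
    Returns True if a run of consecutive results below the threshold, with no
    intervening result at or above it, spans at least min_gap_days.
    """
    run_start = None
    for when, egfr in sorted(measurements):
        if egfr < threshold:
            if run_start is None:
                run_start = when
            elif (when - run_start).days >= min_gap_days:
                return True
        else:
            run_start = None  # a normal result breaks the run
    return False
```

Under this reading, a single low result, or two low results separated by a normal one, would not count as decreased kidney function, which is consistent with the slightly lower prevalence reported for this definition.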

RESULTS
From 685 GP practices, we identified 2 761 755 people [mean age 53 (SD 17) years, men 49%] who were alive and registered in the CPRD for ≥1 year on 31 March 2014. Their age-sex distribution was broadly similar to that of the UK Census 2013 (Supplementary data, Table S2). Of those identified, 189 581 patients (6.86%) had an eGFR <60 mL/min/1.73 m2 and 3293 patients (0.12%) were on RRT.
The prevalence of eGFR <60 mL/min/1.73 m2 increased steeply with age (Table 1 and Figure 1). There was no evidence that the prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD was different from that in the HSE across age groups, both in men and women, except for the group of men 25-34 years of age, in which no one had an eGFR <60 mL/min/1.73 m2 in the HSE. The proportion of people who had a recorded measurement of creatinine increased with age, from 26% of men and 46% of women 25-34 years of age with tests in the past 5 years, up to 92% (both men and women) among people ≥75 years of age.
The prevalence of RRT gradually increased according to age (Table 2 and Figure 2). The difference between the CPRD and the UKRR was small across all age groups, both in men and women. Table 3 shows the subgroup analysis by RRT modality. The prevalence of patients with haemodialysis in the CPRD was slightly lower than that in the UKRR across all age groups, while the prevalence of those with peritoneal dialysis and kidney transplantation in the CPRD was similar to or slightly higher than that in the UKRR.
Table 4 shows the results of sensitivity analyses. By assuming all creatinine results were non-IDMS traceable, the prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD decreased predominantly among older people, and overall prevalence decreased from 6.86 to 5.35%. Restricting to people with recorded ethnicity in the CPRD, using a serum creatinine value in the past 2 years and restricting to English data produced similar results to the main analysis. By defining decreased kidney function including a measure of chronicity, the prevalence decreased slightly in each age group, and overall prevalence decreased from 6.86 to 6.27%. Finally, in a complete case analysis (using as the denominator only those with serum creatinine tests) the prevalence of eGFR <60 mL/min/1.73 m2 increased substantially compared to that in the main analysis.
The overall prevalence of eGFR <45 mL/min/1.73 m2 was 2.33% (64 425/2 761 755) in the CPRD. The number of people with an eGFR <45 mL/min/1.73 m2 was small and CIs of the prevalence estimates were wide in the HSE (Table 5). The proportion of people with an eGFR <45 mL/min/1.73 m2 in the age group ≥75 years in the CPRD was significantly higher than that of the HSE, both in men and women.

DISCUSSION
In this study, we examined the external validity of the prevalence of decreased kidney function (based on serum creatinine measured by GPs) and RRT (based on recorded diagnoses) in the CPRD by comparing them with results from two nationally representative sources (the HSE and UKRR). Across all ages for men and women, the prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD was similar to that in the HSE, although the small number of younger patients with an eGFR <60 mL/min/1.73 m2 in the HSE might have hampered precise comparison. The prevalence of RRT in the CPRD was broadly similar to that obtained from the UKRR, although there were differences in the RRT modality-specific prevalence between the CPRD and UKRR.
Figure 1: Prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD and HSE.
Routinely collected primary care data can be a useful resource for epidemiological studies, particularly in the UK, where >98% of citizens are registered with NHS GPs [11]. Although the prevalence or incidence of various diseases in the CPRD has good comparability with other UK-based data sources [12,13], the external validity of the prevalence of decreased kidney function and RRT has not been studied. Concerns specific to kidney diseases include that GPs do not test every registered patient's kidney function, which could lead to underestimation of the true prevalence of decreased kidney function. In our study, the proportion of people with creatinine measurement was small among young and middle-aged people, especially men. However, using the entire practice population as a denominator, the prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD was close to that in the HSE across all age groups, both in men and women. A possible explanation would be that, in line with the current National Institute for Health and Care Excellence (NICE) guidance for CKD [5], GPs are efficiently testing kidney function for people with CKD risk factors, including hypertension, diabetes, cardiovascular diseases and hereditary kidney disease (e.g. autosomal dominant polycystic kidney disease). In addition, the Quality and Outcomes Framework (QOF) incentivizes GPs to register and manage patients with CKD [30]. Since the launch of the QOF for CKD in 2006/7, the identification and management of patients with CKD have been improving in the UK [31], although there are delays in coding patients with CKD in the system [32]. In older age groups, very high proportions had undergone testing of kidney function, and it is likely that those not tested are healthier, with a lower risk of CKD.
In sensitivity analyses, we examined to what extent the prevalence estimates for decreased kidney function changed under different assumptions related to uncertainties in the CPRD. First, the estimates changed considerably depending on whether the UK laboratories were assumed to report creatinine traceable to IDMS. We expect that most of the UK laboratories reported IDMS-traceable creatinine during the study period, yet if a few laboratories reported non-IDMS-traceable creatinine, the true prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD would be lower than our estimate in the main analysis. Standardization of serum creatinine assays is thus important in studies regarding CKD epidemiology. Second, the assumption of non-black ethnicity for people with missing ethnicity data in the CPRD affected the prevalence estimates only slightly. This is probably because the proportion of people with black ethnicity is small in the UK, at ∼3% [18]. Third, using creatinine records for the past 2 instead of 5 years made little change to prevalence estimates for decreased kidney function. This may relate to recommendations for regular testing in line with the QOF and the current NICE guidance for CKD [5]. Fourth, in the CPRD the prevalence of eGFR <60 mL/min/1.73 m2 in England was similar to that in the whole UK, ensuring comparability between the HSE and CPRD in our study. Fifth, the prevalence estimates slightly decreased by using the CKD criteria including chronicity. This may suggest that some patients with a single eGFR <60 mL/min/1.73 m2 had transient kidney dysfunction, probably because serum creatinine was measured at the time of acute illness when they may have developed acute kidney injury. Finally, the prevalence of decreased kidney function was likely to be overestimated by restricting the denominator to only people with creatinine measurement. This suggests that GPs selectively test people at high risk of CKD, especially among younger people.
The prevalence of RRT was also similar between the CPRD and UKRR across all age groups in men and women. Patients receiving RRT are in frequent contact with kidney units, so GPs do not provide comprehensive routine care for these individuals. However, patients on RRT remain registered with their GPs and therefore we would anticipate that GPs update patient records to reflect commencement of RRT. Our results suggest that the estimated prevalence of RRT based on recorded diagnoses in the CPRD was broadly valid when compared against comprehensive UKRR. However, using the most recent diagnosis indicating RRT modality, the prevalence of haemodialysis was underestimated in the CPRD, while those of peritoneal dialysis and kidney transplantation were similar, or somewhat overestimated, especially among older people. This may be because patients with peritoneal dialysis and kidney transplantation are often healthier and have more regular contact with their GPs compared with those on haemodialysis. In addition, for patients with a change in their RRT modality (e.g. from peritoneal dialysis to haemodialysis) there may be a delay in updating the modality in the GP record. Therefore, some patients currently on haemodialysis might be misclassified into the group of peritoneal dialysis or kidney transplantation because their previous diagnoses (i.e. peritoneal dialysis or kidney transplantation) are not yet updated. Another possibility is that patients commencing haemodialysis died before this was recorded in the CPRD, given the high early mortality rates of these patients [33].
There are several limitations to our study. First, this is a cross-sectional study examining the validity of prevalence of decreased kidney function and RRT. Our results do not ensure that UK primary care data are reliable for identifying the incidence of CKD and RRT. Second, our comparison of data between the CPRD and HSE or UKRR was only at the population rather than the individual level. Our analyses did not allow us to calculate the sensitivity or specificity of RRT diagnoses. In the absence of linked data, it is possible that there was a similar extent of misclassification between cases and non-cases, resulting in an overall agreement of the prevalence estimates in the CPRD with those in the HSE and UKRR. Third, the prevalence of decreased kidney function in the HSE was the best available estimate, but not a perfect reference standard. The survey did not include people who were temporarily hospitalized for acute illness or were in residential care [14]. In addition, people with poor health might be reluctant to give a blood sample, and the existing adjustment for nonresponse in the HSE may not have fully dealt with this bias. This may explain the finding in our sensitivity analysis that the proportion of people with an eGFR <45 mL/min/1.73 m2 in the oldest age group in the CPRD was significantly higher than that of the HSE. Blood sampling was conducted on only one occasion in the HSE. Accordingly, we defined decreased kidney function in the CPRD using one serum creatinine measurement in our main analysis. However, some patients might have had their kidney function checked as a result of acute illness, and therefore their decreased kidney function might have been transient. Previous research has shown that creatinine fluctuation can affect the CKD prevalence estimates in routinely collected data [34], although the influence was not large in our study.
At ∼6000, the sample size in the HSE was not small, yet the relatively wide 95% CIs for the prevalence estimates in each age-sex group hampered more precise comparisons. In particular, the number of patients with an eGFR <60 mL/min/1.73 m2 was small among younger age groups. We could not compare the prevalence of more severe kidney dysfunction, because patients with an eGFR <30 mL/min/1.73 m2 were rare, even among older people in the HSE [14]. Meanwhile, testing of albuminuria is known to be incomplete in UK primary care electronic health records [32], which prevented us from comparing the prevalence of albuminuria, or CKD stages 1 and 2, between the CPRD and HSE. Because albuminuria is an important prognostic factor in people with and without low eGFR [35], the unknown validity of albuminuria in UK primary care remains an obstacle to the study of CKD using the CPRD. Finally, our findings may not be generalizable to other GP practices in the UK if GP practices contributing to the CPRD were more likely to measure kidney function and record the diagnoses of RRT. Generalizability to primary care electronic health records in other European countries is also uncertain, because the frequency of practices such as blood testing, chronic disease monitoring, recording of diagnoses, incentives and access to public primary care clinics differ.
In the era of a rising global prevalence of ESRD [4], high-quality epidemiological research on kidney diseases is becoming more important. Routinely collected electronic health record data can play an important role in kidney research, because most patients with CKD are diagnosed and managed in primary care. Accurate identification of CKD and RRT in the CPRD would allow investigation of the association between kidney diseases and other comorbidities or medications. It is also possible to investigate equity of care (e.g. referral to nephrologists), given that the database is less biased for ascertaining advanced CKD than population surveys and disease registries. In this study, we demonstrated that identifying the prevalence of CKD and RRT is valid at the population level in the CPRD. Although further validation of individual-level data is needed, our findings support the use of UK primary care data for research into kidney disease.

CONCLUSIONS
We examined the external validity of the prevalence of decreased kidney function and RRT in the CPRD. The prevalence of eGFR <60 mL/min/1.73 m2 in the CPRD was similar to that in a national sampling survey (HSE 2009/2010), and the prevalence of RRT in the CPRD was close to that obtained from a national disease registry (UKRR 2014) across all age groups, in both men and women. These findings suggest that UK primary care data can be used to identify the prevalence of decreased kidney function and RRT in future studies.

SUPPLEMENTARY DATA
Supplementary data are available online at http://ndt.oxfordjournals.org.