Deep learning algorithms to detect diabetic kidney disease from retinal photographs in multiethnic populations with diabetes

Abstract Objective To develop a deep learning algorithm (DLA) to detect diabetic kidney disease (DKD) from retinal photographs of patients with diabetes, and to evaluate its performance in multiethnic populations. Materials and methods We trained 3 models: (1) an image-only model; (2) a risk factor (RF)-only multivariable logistic regression (LR) model adjusted for age, sex, ethnicity, diabetes duration, HbA1c, and systolic blood pressure; and (3) a hybrid multivariable LR model combining RF data and standardized z-scores from the image-only model. Data from the Singapore Integrated Diabetic Retinopathy Program (SiDRP) were used to develop (6066 participants with diabetes, primary-care-based) and internally validate (5-fold cross-validation) the models. External testing was performed on 2 independent datasets in Singapore: (1) the Singapore Epidemiology of Eye Diseases (SEED) study (1885 participants with diabetes, population-based) and (2) the Singapore Macroangiopathy and Microvascular Reactivity in Type 2 Diabetes (SMART2D) study (439 participants with diabetes, cross-sectional). Supplementary external testing was performed on 2 predominantly Caucasian cohorts: (3) the Australian Eye and Heart Study (AHES) (460 participants with diabetes, cross-sectional) and (4) the Northern Ireland Cohort for the Longitudinal Study of Ageing (NICOLA) (265 participants with diabetes, cross-sectional). Results In SiDRP validation, the area under the curve (AUC) was 0.826 (95% CI 0.818-0.833) for the image-only model, 0.847 (0.840-0.854) for the RF-only model, and 0.866 (0.859-0.872) for the hybrid model. In SEED, AUCs were 0.764 (0.743-0.785) for image-only, 0.802 (0.783-0.822) for RF-only, and 0.828 (0.810-0.846) for hybrid. In SMART2D, AUCs were 0.726 (0.686-0.765) for image-only, 0.701 (0.660-0.741) for RF-only, and 0.761 (0.724-0.797) for hybrid. Discussion and conclusion A DLA using retinal images has potential as a screening adjunct for DKD among individuals with diabetes. It could add value to existing DLA systems that diagnose diabetic retinopathy from retinal images, facilitating primary screening for DKD.


Introduction
Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. 1 Early detection of DKD would allow prompt preventive action to limit excess morbidity and mortality in patients with diabetes mellitus (DM). 2 Current guidelines recommend yearly blood tests to calculate estimated glomerular filtration rate (eGFR) from serum creatinine, or spot urine sampling for the urine albumin/creatinine ratio (UACR). 3,4 In individuals with DM, more frequent monitoring may be appropriate because of their increased risk of progressive kidney disease relative to individuals without diabetes. 3,4 Furthermore, population screening for chronic kidney disease (CKD) in the DM subgroup has been found to be acceptably cost-effective. 5 However, studies in the United States, 6 Australia, 7 and Asia 8 have reported suboptimal utilization of, or poor adherence to, screening, as well as underdiagnosis of DKD. This remains a crucial barrier to early intervention.
Retinal photography is noninvasive and convenient, and it is commonly used in primary care settings to screen for eye pathologies, particularly diabetic retinopathy. Because the retina and other end organs, including the kidneys, share similar structural, physiological (renin-angiotensin-aldosterone system), and pathogenic (inflammation, oxidative stress, endothelial dysfunction, microangiopathy) properties, the retinal vessels can serve as an indirect representation of the renal microvasculature. 9 Clinically appreciable retinal microvascular changes have been associated with DKD, 10,11 suggesting that retinal images may carry substantial information about kidney function. As the prevalence of diabetes grows worldwide, 12 a noninvasive tool to screen for DKD would complement existing deep learning algorithm (DLA) systems that diagnose diabetic retinopathy from retinal images, 13 facilitating primary care screening for complications of DM. Our group has previously developed and validated a DLA to detect CKD from retinal photographs in the general population in Singapore (RetiKid), which showed good performance in external datasets from Singapore and China. 14 In this study, we developed and validated a DLA for detecting DKD (RetiKid-Diab) using retinal images from a clinic-based diabetic population. This model was compared with 2 other models, one using clinical risk factor (RF) data and a hybrid model combining a retinal imaging score with RF data, to assess whether combining these inputs improves DKD prediction compared with an image-only model.

Methods
This study was approved by the Centralized Institutional Review Board (CIRB) of SingHealth, Singapore, and was conducted in accordance with the Declaration of Helsinki. The CIRB waived the requirement for patients' informed consent for the use of deidentified health information and retinal images. We performed a conventional development, validation, and external testing study of 3 models (retinal images only; RF only; hybrid) using retinal images and clinical data collected from 3 studies. We developed and internally validated the models using data from the Singapore Integrated Diabetic Retinopathy Program (SiDRP). 15 External testing was performed on 2 independent datasets in Singapore: (1) the Singapore Epidemiology of Eye Diseases (SEED) study 16 and (2) the Singapore Macroangiopathy and Microvascular Reactivity in Type 2 Diabetes (SMART2D) study.

Definition of DKD
eGFR was calculated from serum creatinine using the CKD Epidemiology Collaboration (CKD-EPI) creatinine equation. 17 Because SiDRP has provided annual retinopathy screening for individuals with diabetes since 2010, the presence or absence of DKD was assessed at all visits for which serum creatinine data were available. We included individuals with 4 or more screening visits. In the SiDRP development cohort, we defined DKD (cases) as eGFR <60 mL/min/1.73 m² on ≥2 consecutive visits 3 months to 2 years apart; in the external test cohorts, DKD was defined from a single visit. We defined the absence of DKD (controls) as eGFR ≥60 mL/min/1.73 m² at all visits. Apart from the number of visits required, the definitions of DKD in the external test datasets (SEED and SMART2D) were the same as in SiDRP.
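As an illustration of the eGFR step above, the CKD-EPI creatinine equation can be written as a small Python function. This sketch assumes the 2009 CKD-EPI equation (the text does not say which version was used), serum creatinine in mg/dL (values reported in µmol/L are divided by 88.4), and the race coefficient switched off by default; the function names are hypothetical.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool = False) -> float:
    """eGFR in mL/min/1.73 m^2 from the 2009 CKD-EPI creatinine equation (assumed version)."""
    kappa = 0.7 if female else 0.9        # sex-specific creatinine knot (mg/dL)
    alpha = -0.329 if female else -0.411  # exponent applied below the knot
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def umol_per_l_to_mg_dl(scr_umol_l: float) -> float:
    """Convert serum creatinine from µmol/L to mg/dL."""
    return scr_umol_l / 88.4


# A 65-year-old man with a serum creatinine of 150 µmol/L:
egfr = egfr_ckd_epi_2009(umol_per_l_to_mg_dl(150), age=65, female=False)
print(round(egfr, 1))  # ≈ 41.5, i.e. below the 60 mL/min/1.73 m^2 cut-off
```

Because eGFR falls as creatinine rises, the 60 mL/min/1.73 m² cut-off in the definition above corresponds to a creatinine level that depends on age and sex.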
Training dataset for the DLA
For development, data and retinal images were obtained from patients with DM who participated in SiDRP (2010-2019), a national, telemedicine-based program established in 2010 to optimize eye screening for a general urban diabetes population. 15 For each patient, 2 retinal photographs (optic disc-centered and macula-centered) were taken of each eye after pupil dilation according to the Early Treatment Diabetic Retinopathy Study (ETDRS) protocol using a digital retinal camera (TRC-NW200, Topcon, Japan). Figure 1 is a flowchart detailing the breakdown of participants and images included from SiDRP. A total of 187 563 visits from 79 511 unique individuals were recruited and assessed for eligibility. After excluding samples with missing creatinine or age data (needed for eGFR calculation), unstable CKD status, or poor image quality, 5356 cases (DKD-positive visits) and 7928 controls (DKD-negative visits) from SiDRP (6066 unique participants in total) were used for training and validation of the algorithm.
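The case, control, and "unstable CKD status" categories follow from the SiDRP definition of DKD: 2 consecutive eGFR values below 60 mL/min/1.73 m² taken 3 months to 2 years apart make a case, all values at or above 60 make a control, and anything else is treated as unstable and excluded. A minimal per-participant sketch is shown below. The day counts used for "3 months" and "2 years", labelling at the participant level rather than the visit level, and the name `dkd_label` are assumptions for illustration; the ≥4-visit inclusion criterion is not applied here.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

EGFR_CUTOFF = 60.0   # mL/min/1.73 m^2
MIN_GAP_DAYS = 91    # "3 months" (assumed day count)
MAX_GAP_DAYS = 730   # "2 years" (assumed day count)


def dkd_label(visits: Sequence[Tuple[date, float]]) -> Optional[int]:
    """Label one participant from (visit_date, eGFR) pairs.

    Returns 1 for a DKD case, 0 for a control, and None for unstable or
    indeterminate status (excluded from training).
    """
    if not visits:
        return None
    ordered = sorted(visits)  # chronological order
    if all(egfr >= EGFR_CUTOFF for _, egfr in ordered):
        return 0  # eGFR >= 60 at every visit
    for (d1, e1), (d2, e2) in zip(ordered, ordered[1:]):
        gap = (d2 - d1).days
        if e1 < EGFR_CUTOFF and e2 < EGFR_CUTOFF and MIN_GAP_DAYS <= gap <= MAX_GAP_DAYS:
            return 1  # two consecutive low-eGFR visits within the allowed interval
    return None  # low eGFR not confirmed on a consecutive visit


print(dkd_label([(date(2015, 1, 5), 55.0), (date(2015, 8, 1), 52.0)]))  # 1
print(dkd_label([(date(2015, 1, 5), 55.0), (date(2015, 8, 1), 71.0)]))  # None
```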

External testing datasets for the DLA
Two datasets were used for external testing: SEED and SMART2D. SEED is an ongoing population-based study of Chinese, Malay, and Indian participants aged ≥40 years at baseline. Detailed methods of SEED have been published. 18,19 After excluding participants with missing eGFR data or poor-quality images, data and images from 1885 participants in SEED (798 cases, 1171 controls) were used for external testing, providing a total of 3938 fundus photographs (Table 1). SMART2D was a cross-sectional study conducted between 2011 and 2014 that included 2057 adults aged 21-90 years with type 2 DM. Detailed methods of SMART2D have been published. 20 Of the 1163 SMART2D participants with eye screening visits who were recruited for this study, 439 (227 cases, 485 controls) were eventually included, totaling 1424 fundus photographs.

Risk factors
We used 6 classic RFs (age, sex, ethnicity, duration of diabetes, HbA1c, and systolic blood pressure [SBP]) as predictors in the RF model. Age, sex, ethnicity, and duration of diabetes were self-reported in all cohorts. HbA1c and SBP were obtained from laboratory tests and physical examination, respectively, in all datasets.

Algorithm architecture and development
The image-only DLA was trained on 26 568 retinal images from 6066 SiDRP participants. The DLA models were based on the ResNet18 21 neural network architecture (Figure 2). The input layer takes 2 standardized macula-centered images (1 image per eye per participant) at a resolution of 512 × 512 pixels. The output layer was a binary classifier with a single node predicting the presence of DKD. During training, network parameters were initialized with weights pretrained on a large-scale diabetic retinopathy dataset (https://www.kaggle.com/competitions/diabetic-retinopathy-detection/data) to improve generalizability. For each image, the prediction given by the neural network was compared with its ground truth label, and the parameters were updated via backpropagation to reduce prediction error. We used 5-fold cross-validation to evaluate model performance, preserving the ratio of DKD cases to controls from the original dataset in each fold. The validation set had no overlap with the training set. The performance of the trained DLA was evaluated on the validation sets by calculating the AUC over the pooled scores from all 5 folds. Heatmaps were generated to identify the regions of a retinal image that contributed most to the DLA classification decision.
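The stratified 5-fold scheme described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: indices are dealt round-robin within each class so that every fold keeps roughly the original case:control ratio, folds are disjoint so no validation sample is seen in training, and the helper names are hypothetical. Because each sample lands in exactly one validation fold, the out-of-fold scores from the 5 models can be concatenated and a single AUC computed on the pooled scores. Since SiDRP contributes several visits per participant, a split that keeps all visits of one participant in the same fold would additionally be needed to avoid leakage between visits; that refinement is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k disjoint folds with (near-)equal class ratios."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)  # deal each class round-robin across folds
    return folds


def cross_validation_splits(labels: Sequence[int], k: int = 5,
                            seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices); each sample is validated exactly once."""
    for val in stratified_folds(labels, k, seed):
        held_out = set(val)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(val)


labels = [1] * 40 + [0] * 60  # toy data with ~40% cases, similar to SiDRP
for train, val in cross_validation_splits(labels):
    print(len(train), len(val), sum(labels[i] for i in val))  # 80 20 8 for every fold
```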

Statistical analysis
We presented participant characteristics as number (%), mean (standard deviation, SD), or median [interquartile range, IQR], as appropriate for each variable. Data from SiDRP were compared with each external test set using Pearson's chi-squared test, Fisher's exact test, Student's t-test, or the Mann-Whitney U-test, as appropriate (significance level P < .05). We developed 3 models: (1) an image-only model using retinal images; (2) an RF-only model using multivariable logistic regression (LR) adjusted for age, sex, ethnicity, duration of diabetes, HbA1c, and SBP; and (3) a hybrid model using multivariable LR adjusted for age, sex, ethnicity, duration of diabetes, HbA1c, SBP, and standardized z-scores from the image-only model. The primary analysis evaluated the performance of the 3 models in internal validation and external testing by calculating the area under the receiver operating characteristic curve (AUC), as well as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at the optimal threshold defined by Youden's J index.
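For reference, the primary metrics listed above can be computed from model scores and binary labels as follows. This is an illustrative sketch, not the authors' code: AUC is computed as the Mann-Whitney probability that a random case scores higher than a random control (ties count one half), Youden's J (sensitivity + specificity - 1) is maximized over the observed scores used as candidate thresholds, and a score at or above the threshold is treated as positive. The quadratic AUC loop is fine for toy data; a rank-based formula would be preferable for tens of thousands of images.

```python
from typing import Dict, Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(threshold: float, scores: Sequence[float],
               labels: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when score >= threshold is called positive."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        positive = s >= threshold
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t: Optional[float] = None
    best_j = -1.0
    for t in sorted(set(scores)):
        m = metrics_at(t, scores, labels)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:  # keep the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))               # 0.75
print(youden_threshold(scores, labels))  # (0.35, 0.5)
```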

Supplementary analysis
We performed several supplementary analyses. (1) Because misclassification of DKD is likely to be more common among individuals with eGFR just above or below the diagnostic threshold (55-60 mL/min/1.73 m²), we tested all 3 models using an alternative definition of DKD (eGFR <45 mL/min/1.73 m²; stage G3b or worse). (2) We recalculated model performance with sensitivity or specificity fixed at 80%. (3) We performed a subgroup analysis of model accuracy for individuals in different CKD severity stages. (4) We externally tested the DLA in 2 predominantly Caucasian cohorts: a cohort at high risk of coronary artery disease (CAD) from the Australian Eye and Heart Study (AHES), 22 and an older cohort from the Northern Ireland Cohort for the Longitudinal Study of Ageing (NICOLA). 23 We performed these as supplementary analyses because the sample sizes and numbers of events were low. AHES is a cross-sectional study of 1680 participants (460 included in this study) presenting to Westmead Hospital in Sydney for assessment of suspected CAD between 2009 and 2012. 22 NICOLA is a cross-sectional study that collected health and lifestyle data from 8452 participants aged ≥50 years in Northern Ireland (265 included in this study), with baseline data collected between 2013 and 2016. For the RF-only and hybrid models in AHES and NICOLA, we did not adjust for ethnicity because most participants were Caucasian; in AHES, we also did not adjust for duration of diabetes because this information was unavailable.
For missing values of duration of diabetes (in NICOLA), SBP, and HbA1c, we used mean/mode imputation, as was done for the SMART2D cohort. (5) Finally, we performed an error analysis of false-positive (FP) and false-negative (FN) samples by key characteristics, including albuminuria and the presence of ocular diseases, to gain insight into misclassification by the DLA.
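The mean/mode imputation mentioned above can be sketched as follows: missing continuous values (such as SBP or HbA1c) are replaced by the observed mean, and missing categorical values by the most frequent category. Computing the fill value within the cohort being imputed, and the function name `impute`, are assumptions; the text does not say which reference data supplied the means or modes.

```python
import statistics
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def impute(column: Sequence[Optional[Value]], kind: str) -> List[Value]:
    """Replace None with the observed mean ("continuous") or mode ("categorical")."""
    observed = [v for v in column if v is not None]
    if kind == "continuous":
        fill: Value = statistics.fmean(observed)
    elif kind == "categorical":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return [fill if v is None else v for v in column]


print(impute([130.0, None, 150.0], "continuous"))        # [130.0, 140.0, 150.0]
print(impute(["yes", None, "yes", "no"], "categorical"))  # ['yes', 'yes', 'yes', 'no']
```

Single-value imputation like this keeps every participant in the model but understates uncertainty, which is one reason it suits a small supplementary analysis better than a primary one.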

Results
Participant characteristics of each dataset are shown in Table 1. Characteristics of participants in SiDRP and SEED were similar, except for higher mean SBP in SEED (SiDRP: …).
Figure 3 shows the receiver operating characteristic (ROC) curves for the image-only DLA, the RF-only multivariable LR model, and the hybrid model. In SiDRP validation, AUC was 0.826 (95% CI 0.818-0.833) for image-only, 0.847 (0.840-0.854) for RF-only, and 0.866 (0.859-0.872) for hybrid. In external testing with SEED, AUC was 0.764 (0.743-0.785) for image-only, improving to 0.802 (0.783-0.822) for RF-only and 0.828 (0.810-0.846) for hybrid. In SMART2D, AUC was 0.726 (0.686-0.765) for image-only, decreasing to 0.701 (0.660-0.741) for RF-only and improving to 0.761 (0.724-0.797) for hybrid. Table 2 provides additional performance metrics comparing the 3 models in internal validation and external test sets at the optimal cutoff defined by Youden's J index. In SiDRP, the image-only model had a sensitivity of 76% and a specificity of 75%; the RF-only and hybrid models had sensitivities of 78%-79% and specificities of 76%-77%. In SEED, the image-only model had a sensitivity of 70% and a specificity of 71%; the RF-only and hybrid models had higher sensitivities (79% and 76%, respectively), with specificities of 67% and 74%, respectively. In SMART2D, the image-only model had a sensitivity of 64% and a specificity of 71%; the RF-only and hybrid models had higher sensitivities (75% and 71%, respectively), with specificities of 61% and 72%, respectively. NPV for the image-only model was 82% in SiDRP, 78% in SEED, and 81% in SMART2D. NPVs for the RF-only and hybrid models were generally higher, ranging from 82% to 86% across all datasets. PPVs were consistently lower than NPVs for all models in all datasets.
We also calculated model performance with sensitivity or specificity fixed at 80% (Table S1). With sensitivity fixed at 80%, the image-only model had a specificity of 70% in SiDRP, 57% in SEED, and 51% in SMART2D, and specificities of the RF-only and hybrid models ranged from 51% to 77%. With specificity fixed at 80%, the image-only model had a sensitivity of 70% in SiDRP, 58% in SEED, and 53% in SMART2D, and sensitivities of the RF-only and hybrid models ranged from 40% to 77%. Next, we performed a subgroup analysis for individuals in different eGFR categories. With sensitivity fixed at 80%, the image-only model detected 81%-82% of cases in stage G3b CKD, 87%-93% of cases in stage G4 CKD, and 82%-96% of cases in stage G5 CKD.
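Fixing sensitivity at 80%, as in the analysis above, amounts to lowering the decision threshold on the model score until at least 80% of cases are flagged, then reading off specificity among controls at that threshold. The sketch below assumes that higher scores indicate DKD and that a score at or above the threshold counts as positive; the helper names and toy scores are hypothetical. Fixing specificity works the same way, with the threshold chosen from the control scores instead.

```python
import math
from typing import Sequence


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float = 0.80) -> float:
    """Highest threshold at which 'score >= threshold' flags at least `target` of cases."""
    case_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    n_needed = math.ceil(target * len(case_scores) - 1e-9)  # guard against float round-up
    return case_scores[n_needed - 1]


def sensitivity_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of cases scoring at or above the threshold."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in cases if s >= threshold) / len(cases)


def specificity_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of controls scoring below the threshold."""
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in controls if s < threshold) / len(controls)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.1, 0.2, 0.65, 0.3, 0.4]  # toy scores
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
t = threshold_for_sensitivity(scores, labels)
print(t, sensitivity_at(t, scores, labels), specificity_at(t, scores, labels))  # 0.6 0.8 0.8
```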
In our error analysis of the image-only model, images from controls (non-DKD) were more likely to be labeled positive (FP) if they were taken from patients who were older (66.2 [7.3] vs 56.2 [8.1] years), of Chinese ethnicity, had a longer duration of diabetes (6.0 [3.0, 9.0] vs 4.0 [2.0, 8.0] years), or had higher SBP (129.9 [15.8] …).
Finally, in our supplementary analysis, characteristics of AHES and NICOLA participants are provided in Table S2. In AHES, the image-only model achieved an AUC of 0.670 (0.612-0.729), slightly lower than its performance in the 3 main datasets, while the RF-only and hybrid models performed similarly, with AUCs of 0.685 (0.626-0.745) and 0.695 (0.640-0.751), respectively (Figure S1). In NICOLA, the image-only model achieved an AUC of 0.638 (0.562-0.714), lower than its performance in the 3 main datasets, whereas the RF-only and hybrid models performed better, with AUCs of 0.721 (0.652-0.790) and 0.710 (0.640-0.779), respectively (Figure S1).

Discussion
We developed and validated a DLA for detecting DKD from retinal images (RetiKid-Diab) to determine whether such models are sufficiently robust to screen individuals with diabetes in the primary care setting. To our knowledge, this is the first study to predict DKD from retinal images in a population with diabetes, extending existing reports of CKD detection from fundus images in the general population. 14,24 Our models performed reasonably, faring well in internal validation (image-only AUC = 0.826 [0.818-0.833]) and moderately in external testing (image-only AUC = 0.764 [0.743-0.785] in SEED; 0.726 [0.686-0.765] in SMART2D). The image-only model performed comparably to the RF-only model across datasets. That said, the hybrid models, which combine retinal images and RFs, outperformed the image-only and RF-only versions in all 3 main datasets. Taken together, these results suggest that a DLA using retinal images has potential as a screening adjunct for DKD among individuals with diabetes, alongside standard screening methods. Including common, readily acquired RFs may add further value to the algorithm's performance if it were translated into the primary care setting.
An important consideration for any artificial intelligence (AI)-based screening system is its clinical relevance. DKD has a long asymptomatic phase, and early detection is critical for optimal management. Medical guidelines widely recommend that people with diabetes be evaluated regularly for DKD, typically with an annual urine test for albuminuria and a blood test for serum creatinine to estimate GFR. 25 However, current screening rates are suboptimal 26 ; in a systematic review of screening rates for diabetes-related complications among individuals with diabetes, de Jong et al 27 reported that two-thirds of studies described nephropathy screening rates below 70%. In the Korean National Health and Nutrition Examination Survey, only 40.5% of patients with diabetes had been screened for diabetic nephropathy during the previous year, even though they knew that they had diabetes. 28 While an AI-based retinal image screening system for DKD may not replace current screening methods in the near future, it could serve as a screening adjunct to improve screening rates worldwide. First, telemedicine-based diabetic retinopathy screening for people with diabetes has remained strongly cost-effective compared with in-person office screening. 29 Second, AI-based diabetic retinopathy screening programs have begun real-world implementation 30 ; a noninvasive, low-cost, point-of-care DKD screening tool that uses the same input (retinal fundus photographs) offers the opportunity to screen simultaneously for 2 major microvascular complications of diabetes (diabetic retinopathy [DR] and DKD) at the population level. For example, in Singapore, patients on follow-up for diabetes are routinely screened for referable DR using a nationally implemented deep learning software, the Singapore Eye Lesions Analyzer (SELENA). 31 Using the same retinal images, these patients could also be screened for DKD.
Table 2. Performance of the 3 models in internal and external test sets at the optimal thresholds by Youden's J Index.
Our results showed several notable trends. Performance of the image-only model was reduced in SMART2D external testing, with particularly lower specificity and PPV; however, this reduction was also observed with the RF-only and hybrid models. The proportion of DKD cases in SMART2D (31.9%) was lower than in SiDRP (40.3%) or SEED (40.5%), which could explain the lower PPV. For comparison, the reported real-world prevalence of DKD among patients with diabetes varies from approximately 20% to 40%, 32,33 depending on the study population and the presence of cardiovascular comorbidities. To increase PPV in populations with a low prevalence of DKD, our DLA could be applied to higher-risk individuals, such as individuals with diabetes who are older, have poorer glycemic control, or have multiple cardiovascular comorbidities. We also suggest applying this tool as part of a 2-stage screening process, in which a noninvasive DLA set at a higher sensitivity is applied first, and individuals who screen positive are recalled for further testing with serum creatinine to exclude false positives. There were 2 other notable trends for the image-only model. First, its performance improved in all datasets when the stricter definition of DKD (eGFR <45 mL/min/1.73 m²) was used. Second, with sensitivity fixed at 80%, its accuracy increased with the severity of DKD. This would be beneficial for a community screening adjunct, as moderate and severe cases are less likely to be missed. Next, in our error analysis of the image-only model, FP labels occurred more often in patients who were older, had a longer duration of diabetes, or had higher SBP. Conversely, FN labels occurred more often in patients who were younger and had lower SBP. This trend suggests that a positive label is associated with a higher cardiovascular risk profile (and vice versa), which is not unexpected. FP labels were more likely in patients with lower HbA1c, while FN labels were more likely in patients with higher HbA1c; this association of better glycemic control with a positive label is counterintuitive. FP labels were more likely in patients with cataracts (and vice versa), which could be related to the known association between CKD and cataracts. 34 Cataracts can also reduce the quality of fundus images and, in turn, the performance of the DLA. Patients with albuminuria were more likely to have FP labels, suggesting that the DLA may be identifying signals from those with early renal impairment (for example, stage G1 and G2 DKD, which we classified as controls). Finally, in the supplementary external testing in AHES and NICOLA, we observed slight reductions in AUC for the image-only model. The small sample sizes of these datasets (n = 460 in AHES, n = 265 in NICOLA) could explain this modest reduction in image-only model performance. With further training on more ethnically diverse datasets, our model may generalize well to populations of varying ethnic composition, which is advantageous for any AI-based community screening device.
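The explanation that a lower proportion of cases lowers PPV can be checked with Bayes' rule. The sketch below holds the image-only model's SMART2D sensitivity (64%) and specificity (71%) fixed and varies only the case proportion between SMART2D (31.9%) and SiDRP (40.3%). PPV falls from about 0.60 at 40.3% to about 0.51 at 31.9%, while NPV rises from about 0.75 to about 0.81; the latter is in line with the 81% NPV reported for SMART2D. Treating sensitivity and specificity as unchanged across settings is a simplifying assumption, and `screen_positive` (the share of people who would be recalled for a serum creatinine test in the 2-stage scheme suggested above) is an illustrative addition.

```python
from typing import Dict


def predictive_values(sens: float, spec: float, prevalence: float) -> Dict[str, float]:
    """PPV, NPV, and fraction screening positive for a test applied at a given prevalence."""
    tp = sens * prevalence              # true positives per person screened
    fp = (1 - spec) * (1 - prevalence)  # false positives
    fn = (1 - sens) * prevalence        # false negatives
    tn = spec * (1 - prevalence)        # true negatives
    return {"ppv": tp / (tp + fp), "npv": tn / (tn + fn), "screen_positive": tp + fp}


for name, prev in [("SMART2D", 0.319), ("SiDRP", 0.403)]:
    pv = predictive_values(sens=0.64, spec=0.71, prevalence=prev)
    print(name, {k: round(v, 3) for k, v in pv.items()})
```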
Several image-based DLAs have been developed to screen for CKD, 14,24,35,36 but none for DKD (Table 4). The RetiKid-Diab algorithm described in this article is a "sister" algorithm to RetiKid, 14 developed by our group in 2020. In our previous report, 14 RetiKid was also tested on a subgroup of individuals with diabetes (ie, predicting DKD), achieving better AUCs than RetiKid-Diab in internal validation (RetiKid hybrid AUC: 0.925; RetiKid-Diab hybrid AUC: 0.866). The algorithms differ in architecture: RetiKid is a neural network, able to exploit nonlinear associations and interaction terms. The RetiKid-Diab image-only model is also a neural network; however, the RetiKid-Diab RF-only and hybrid models are multivariable LR models, which cannot exploit interaction or higher-order terms unless these are specified explicitly. We note that a direct comparison of AUCs does not provide a full picture of model performance, and we suggest further evaluation of existing retina-based CKD screening tools in diabetic cohorts.
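The additive structure of the hybrid LR model can be sketched in plain Python. This is a minimal, illustrative sketch, not the authors' implementation: it fits an unpenalized logistic regression by batch gradient descent (the text does not state which fitting routine was used), uses one hypothetical standardized risk factor instead of the 6 RFs, and runs on synthetic data from the hypothetical `simulate_cohort` helper. It shows that the hybrid model's log-odds are a weighted sum of the RF and the image z-score, so an interaction between them only enters the model if a product column is added explicitly.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def sigmoid(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def standardize(scores: Sequence[float], mean: float, sd: float) -> List[float]:
    """Turn raw image-model scores into z-scores using a reference mean and SD."""
    return [(s - mean) / sd for s in scores]


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Unpenalized logistic regression by batch gradient descent on mean log-loss.

    Returns [intercept, b_1, ..., b_k]; the log-odds are additive in the columns of X.
    """
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of DKD for one feature vector."""
    return sigmoid(w[0] + sum(b * v for b, v in zip(w[1:], x)))


def simulate_cohort(n: int = 300, seed: int = 1) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical synthetic data: one standardized RF plus an image-model z-score."""
    rng = random.Random(seed)
    rf = [rng.gauss(0, 1) for _ in range(n)]
    raw = [rng.gauss(0, 1) for _ in range(n)]
    z = standardize(raw, statistics.fmean(raw), statistics.stdev(raw))
    y = [1 if rng.random() < sigmoid(-0.5 + 0.8 * r + 1.2 * s) else 0
         for r, s in zip(rf, raw)]
    return [[r, zi] for r, zi in zip(rf, z)], y


X, y = simulate_cohort()
hybrid = fit_logistic(X, y)                   # [intercept, b_rf, b_z]
X_int = [[r, z, r * z] for r, z in X]         # explicit RF x image-score interaction column
with_interaction = fit_logistic(X_int, y)     # [intercept, b_rf, b_z, b_rf_z]
print([round(b, 2) for b in hybrid])
print([round(b, 2) for b in with_interaction])
```

Without the product column, a 1-SD change in the image z-score shifts the log-odds by the same amount at every level of the risk factor; a neural network can learn such dependencies without them being specified in advance.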
The strengths of our study include validation of our DLA on independent cohorts with similar imaging and DKD diagnostic protocols. We also used a robust ground truth in the development cohort, where DKD was defined by at least 2 eGFR measurements on consecutive visits, potentially reducing misclassification of DKD cases. In addition, we explored the utility of hybrid models incorporating clinical RF data, which showed improvement relative to image-only models. Nevertheless, our study has several limitations. First, because our study was based in an Asian population (Chinese, Indian, and Malay), our DLA may be most relevant in Asian countries with a high burden of diabetes and DKD. Further evaluation of its generalizability in non-Asian populations and in more diverse demographic cohorts may improve its clinical utility and diagnostic accuracy. We explored this in our supplementary analyses of the predominantly Caucasian AHES (Australia) and NICOLA (Northern Ireland) cohorts, which showed modest performance. Second, albuminuria data were not available for all participants, so we could not incorporate albuminuria levels into the prediction. Third, stage G5 cases were underrepresented in the SiDRP training set but overrepresented in SMART2D. Fourth, although heatmaps (Figure 4) highlighted microvascular changes characteristic of retinopathy, it is unclear which specific features the DLA used to identify DKD, a problem shared by most existing image-based DLAs. A multistep algorithm that first detects characteristic microvascular changes and then uses these features to predict DKD is possible, although this might overcomplicate the prediction process without a substantial increase in performance. Fifth, our current model underperforms relative to our prior work in a general population. 14 However, this may be because our prior work was trained on a SEED research dataset with less noisy labels, whereas our current model was trained on a real-world dataset. We also tried other machine learning classifiers (such as random forest and support vector machine), but their overall results were inferior to those of LR. We also tried vision transformers (ViT) as the deep learning image model; performance was comparable, but ViT was more computationally intensive and took longer to train.
In conclusion, our study shows the potential of a retinal image DLA to screen for DKD among individuals with diabetes. As access to digital retinal photography increases at the primary care level, a retinal image-based DLA, if adopted, could improve DKD screening rates worldwide as an adjunct to existing laboratory methods. Next steps might include validating the algorithm in non-Asian populations and among young patients with diabetes (ie, T1DM patients), and performing implementation and cost-effectiveness studies.
technicians, managers, and receptionists. The Atlantic Philanthropies, the Economic and Social Research Council, the UKCRC Centre of Excellence for Public Health Northern Ireland, the Centre for Ageing Research and Development in Ireland, the Office of the First Minister and Deputy First Minister, the Health and Social Care Research and Development Division of the Public Health Agency, the Wellcome Trust/Wolfson Foundation, and Queen's University Belfast provide core financial support for NICOLA. We also acknowledge staff and participants of SiDRP, SEED, SMART2D, and AHES. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Figure 1. Flowchart detailing the inclusion and exclusion of participants and images from SiDRP.

Figure 2. Convolutional neural network architecture for predicting diabetic kidney disease from retinal images.

Figure 3. ROC curves for prediction of diabetic kidney disease in image-only, RF-only, and hybrid models.

Table 1. Baseline characteristics of participants.

Table 4. Performance of deep-learning algorithms related to chronic kidney disease in current literature.
Abbreviations: AUC, area under the receiver operating characteristic curve; BES, Beijing Eye Study; CC-FII, China Consortium of Fundus Image Investigation; CGMH, Chang Gung Memorial Hospital; CKD, chronic kidney disease; CMUH, China Medical University Hospital in Taiwan; CNN, convolutional neural network; COACS, China Suboptimal Health Cohort Study; DKD, diabetic kidney disease; eGFR, estimated glomerular filtration rate; RF, risk factor; SiDRP, Singapore Integrated Diabetic Retinopathy Program; SEED, Singapore Epidemiology of Eye Diseases; SMART2D, Singapore Macroangiopathy and Microvascular Reactivity in Type 2 Diabetes; SP2, Singapore Prospective Study Program.
ᵃThe current study is the only one conducted for DKD in a diabetic population; all other published studies listed here were conducted in general populations.