Amélie Jacq, Georges Tarris, Adrien Jaugey, Michel Paindavoine, Elise Maréchal, Patrick Bard, Jean-Michel Rebibou, Manon Ansart, Doris Calmo, Jamal Bamoulid, Claire Tinel, Didier Ducloux, Thomas Crepin, Melchior Chabannes, Mathilde Funes de la Vega, Sophie Felix, Laurent Martin, Mathieu Legendre, Automated evaluation with deep learning of total interstitial inflammation and peritubular capillaritis on kidney biopsies, Nephrology Dialysis Transplantation, Volume 38, Issue 12, December 2023, Pages 2786–2798, https://doi.org/10.1093/ndt/gfad094
ABSTRACT
Interstitial inflammation and peritubular capillaritis are observed in many diseases on native and transplant kidney biopsies. A precise and automated evaluation of these histological criteria could help stratify patients’ kidney prognoses and facilitate therapeutic management.
We used a convolutional neural network to evaluate these criteria on kidney biopsies. A total of 423 kidney samples from various diseases were included: 83 were used to train the neural network, 106 to compare manual annotations of limited areas with automated predictions, and 234 to compare automated and visual gradings.
The precision, recall and F-score for leukocyte detection were 81%, 71% and 76%, respectively. For peritubular capillary detection, the precision, recall and F-score were 82%, 83% and 82%, respectively. There was a strong correlation between the predicted and observed grading of total inflammation, as there was for the grading of capillaritis (r = 0.89 and r = 0.82, respectively, all P < .0001). The areas under the receiver operating characteristic curves for the prediction of pathologists’ Banff total inflammation (ti) and peritubular capillaritis (ptc) scores were all at least 0.94 and 0.86, respectively. The kappa coefficients between the visual and the neural network scores were 0.74, 0.78 and 0.68 for ti ≥1, ti ≥2 and ti ≥3, and 0.62, 0.64 and 0.79 for ptc ≥1, ptc ≥2 and ptc ≥3, respectively. In a subgroup of patients with immunoglobulin A nephropathy, the severity of inflammation was strongly correlated with kidney function at biopsy in univariate and multivariate analyses.
We developed a tool using deep learning that scores the total inflammation and capillaritis, demonstrating the potential of artificial intelligence in kidney pathology.

What was known:
Total interstitial inflammation and peritubular capillaritis are observed in many native and transplant kidney diseases.
These lesions are frequently evaluated for diagnostic, severity and prognostic purposes, but these histological evaluations suffer from a lack of precision and reproducibility between pathologists.
This work aimed at automating and standardizing the grading of total interstitial inflammation and peritubular capillaritis with a convolutional neural network.
This study adds:
We developed and evaluated a tool that effectively segments leukocytes and peritubular capillaries on Masson's trichrome–stained kidney samples.
The convolutional neural network's predictions of the total inflammation and peritubular capillaritis scores were close to those of trained kidney pathologists.
This deep learning tool can also provide more precise measurements, such as leukocyte density, which was closely related to kidney function in patients with immunoglobulin A nephropathy.
Potential impact:
A more homogenized evaluation of total inflammation and capillaritis could help stratify patients’ kidney prognoses and guide therapeutic management.
If the prognostic impact of these automatic evaluations is confirmed, dedicated histological classifications for native and transplant kidney biopsies could emerge.
The interstitial leukocyte density, which can be routinely calculated with the tool, might limit the impact of fibrosis and edema on the inflammation assessment and might be a stronger marker of interstitial inflammation.
INTRODUCTION
Interstitial inflammation is defined by leukocyte infiltration of the kidney interstitium, sometimes associated with peritubular capillaritis and/or tubulitis [1]. It is the main lesion in tubulo-interstitial nephritis (TIN) but can also be observed in other settings such as graft rejection and glomerulonephritis [2–5]. These lesions can lead to tubular dysfunction, fibrosis and kidney failure [1, 2, 6]. For example, the grade of interstitial inflammation in immunoglobulin A (IgA) nephropathy (IgAN) is linked to the risk of disease progression [7–10]. Thus, beyond its diagnostic role, interstitial inflammation is frequently evaluated in many diseases for prognostic purposes [11, 12].
Total interstitial inflammation can be graded as the percentage of cortical area affected, but because this measure is poorly reproducible, a semi-quantitative grading is frequently used instead [8, 13–15]. The total inflammation (ti) and peritubular capillaritis (ptc) scores of the Banff classification are currently among the main standardized grading methods [5, 16–18]. Nevertheless, these semi-quantitative evaluations still suffer from poor to moderate interrater reliability [14, 19, 20]. A more precise and reproducible evaluation could help identify patients at risk of disease progression and guide therapeutic management [19].
Artificial intelligence has led to many advances in kidney pathology. Our team and others have previously shown that convolutional neural networks can automate the measurement of several quantitative histological criteria including interstitial fibrosis, tubular atrophy and mean glomerular density [21, 22]. Thanks to its high reproducibility, deep learning limits inter-observer variability and allows exhaustive and precise segmentations [14, 15, 23–26]. This high precision also allows us to refine the quantification of histological abnormalities and to measure histologic criteria that are virtually impossible to assess routinely by a pathologist. This work aims at automating the grading of total interstitial inflammation and peritubular capillaritis with a convolutional neural network in Masson's trichrome–stained kidney samples.
MATERIALS AND METHODS
Population
Kidney samples were obtained from the university hospitals of Dijon between January 2009 and January 2023, and from Besançon between January 2016 and January 2020. Several types of kidney samples were included:
kidney biopsies with either acute or chronic TIN, primary IgAN, IgA vasculitis or minimal change disease;
transplant kidney biopsies at the Dijon center or protocol transplant biopsies performed within the first year of transplantation at the Besançon center;
non-tumor kidney samples from total nephrectomies for cancer.
Biopsies with tumor lesions were excluded. Patients had to be 14 years of age or older and give oral consent for research purposes. This work was approved by the local ethics committee and was conducted in accordance with the Declaration of Helsinki.
Clinico-biological data on the day of the biopsy were collected retrospectively: age, sex, history of diabetes, hypertension, use of renin–angiotensin system inhibitors, serum creatinine level and proteinuria. The estimated glomerular filtration rate (eGFR) was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula. To evaluate clinical correlations in more homogeneous populations, patients with IgAN or a transplant were subsequently evaluated in secondary analyses.
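The eGFR computation can be sketched with the 2009 CKD-EPI creatinine equation. The text does not state which CKD-EPI version was used, so the 2009 coefficients, the optional race term and the function name `ckd_epi_2009` are assumptions of this sketch; serum creatinine is taken in mg/dL, as in Table 1.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool = False) -> float:
    """eGFR in mL/min/1.73 m2 from the 2009 CKD-EPI creatinine equation (assumed version)."""
    kappa = 0.7 if female else 0.9          # sex-specific creatinine knot (mg/dL)
    alpha = -0.329 if female else -0.411    # exponent below the knot
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209     # exponent above the knot
            * 0.993 ** age_years)           # age decline per year
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159                       # race coefficient of the 2009 equation
    return egfr
```

For a 50-year-old man with a serum creatinine of 1.0 mg/dL, this sketch returns roughly 87 mL/min/1.73 m2.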
Training, test and application cohorts
Kidney samples from 423 patients were divided into three independent groups (Fig. 1).

Training, test and application cohorts. MCD, minimal change disease.
The training cohort was used to train the neural network to recognize leukocytes and peritubular capillaries in limited areas. It consisted of 83 kidney samples from Dijon: 43 transplant biopsies, 16 IgAN, 9 IgA vasculitis, 7 TIN, 5 nephrectomies and 3 minimal change disease biopsies.
The test cohort's purpose was to validate the neural network detection performances in limited areas. It compared manual annotations with the network’s predictions. This group consisted of 36 samples from Dijon (17 IgAN, 5 TIN, 5 minimal change diseases, 5 nephrectomies and 4 IgA vasculitis), and 70 samples from Besançon (60 protocol transplant biopsies and 10 IgAN) for external validation.
The application cohort was used to compare automated and visual gradings on whole biopsies and nephrectomy samples. It consisted of 234 kidney samples from Dijon and Besançon, including 89 IgAN, 46 TIN, 20 minimal change disease and 20 IgA vasculitis biopsies, 19 nephrectomies, and 40 transplant biopsies (32 graft rejection biopsies and 8 normal biopsies).
Histological analyses
Biopsies were formalin-fixed, paraffin-embedded, cut into 2-μm sections and stained with blue or green Masson's trichrome. Transplant biopsies from the application cohort were also stained with Sirius Red and examined under unpolarized light; the resulting visual fibrosis assessments were compared with the neural network's assessment of fibrosis. Slides were read, analyzed and annotated blind to the patients’ medical histories. Slides were digitized with a Hamamatsu scanner (model C9600-12) using a 200× lens, at a resolution of 454 nm/pixel. Inference was performed at ×25 magnification for biopsies and at ×100 for nephrectomy samples. Images were analyzed and manually annotated by two trained nephropathologists using the ASAP annotation software (ASAP, Netherlands). The gold standard was defined as the mean of the two pathologists’ evaluations.
Algorithms
Training and evaluations were carried out on a PC equipped with a Titan RTX graphics card (Nvidia, CA, USA; 24 GB VRAM). The convolutional neural network used was Mask R-CNN with an Inception ResNet V2 backbone, implemented in Python with TensorFlow and Keras. The implementation is based on the existing Mask R-CNN GitHub repository and is available on GitHub [27]. We previously developed two preliminary trained networks with this architecture [21]. The first network isolated the cortical area from the capsule, the medulla and the background. Within the cortical area, the second network recognized the following structures: sclerotic and non-sclerotic glomeruli, healthy and atrophic tubules, arteries and veins.
In the current study, after this pre-processing step, we used images from which the cortical structures detected by the second network had been removed, so that the areas of interest contained virtually only interstitium. This approach may improve detection accuracy by limiting the number of histological structures the network encounters. The third algorithm segmented leukocytes and peritubular capillaries. In the application cohort, the three algorithms were executed automatically and sequentially.
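The removal of already-detected cortical structures can be pictured with the following NumPy sketch. It assumes that the first two networks yield one boolean mask for the cortex and one boolean mask per detected structure; the function name `interstitial_only` and the white fill value are hypothetical and are not taken from the authors' repository [27].

```python
import numpy as np


def interstitial_only(image, cortex_mask, structure_masks, fill_value=255):
    """Blank out everything except cortical pixels not covered by a detected structure.

    image:           H x W x C array (e.g. an 8-bit RGB tile)
    cortex_mask:     H x W boolean mask (cortex vs. capsule, medulla, background)
    structure_masks: iterable of H x W boolean masks (glomeruli, tubules, arteries, veins)
    fill_value:      hypothetical background value (255 = white on an 8-bit image)
    """
    keep = np.asarray(cortex_mask, dtype=bool).copy()
    for mask in structure_masks:
        keep &= ~np.asarray(mask, dtype=bool)  # drop pixels of each detected structure
    cleaned = image.copy()
    cleaned[~keep] = fill_value                # a 2-D boolean index blanks all channels
    return cleaned, keep                       # keep is the interstitial-area mask
```

Returning the `keep` mask alongside the image is a design choice of the sketch: its pixel count gives the interstitial area needed later for densities.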
Neural network training and testing
No biopsy used to train the previous algorithms was used in the test or application cohorts. Several regions were randomly selected for training (at ×200 magnification) and for testing (at ×400 magnification). A preprocessing phase used the annotations produced by the previous algorithms. Each image was then sliced into 1024 × 1024-pixel vignettes with at least 33% overlap between adjacent vignettes. This yielded 902 vignettes (95 different regions) for training the neural network, in which a total of 20 840 leukocytes and 7962 peritubular capillaries were manually annotated. To augment the training data, the images were randomly rotated by 90° at each epoch (one complete pass over the training data). The neural network was trained for 600 epochs. The test cohort comprised 116 different regions and 718 vignettes, in which 3599 leukocytes and 2310 peritubular capillaries were manually annotated.
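A minimal sketch of the tiling and rotation augmentation described above, assuming images and masks are NumPy arrays. The helper names are hypothetical. "At least 33% overlap" is implemented here as a fixed stride of 67% of the tile width plus a final tile aligned to the image edge, and the random rotation is assumed to be a random multiple of 90° applied identically to a vignette and its annotation masks.

```python
import numpy as np


def tile_origins(length, tile=1024, min_overlap=0.33):
    """Start offsets along one axis so that adjacent tiles overlap by >= min_overlap."""
    if length <= tile:
        return [0]
    stride = int(tile * (1 - min_overlap))   # 686 px for 1024-px tiles
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)         # last tile flush with the edge (extra overlap)
    return starts


def vignettes(image, tile=1024, min_overlap=0.33):
    """Yield ((y, x), vignette) pairs that cover the whole image."""
    height, width = image.shape[:2]
    for y in tile_origins(height, tile, min_overlap):
        for x in tile_origins(width, tile, min_overlap):
            yield (y, x), image[y:y + tile, x:x + tile]


def random_quarter_turn(vignette, masks, rng):
    """Rotate a vignette and its masks by the same random multiple of 90 degrees."""
    k = int(rng.integers(0, 4))              # 0, 1, 2 or 3 quarter turns
    rotated = np.rot90(vignette, k, axes=(0, 1))
    return rotated, [np.rot90(m, k, axes=(0, 1)) for m in masks]
```

For example, a 2048-pixel-high region gets vignette origins at 0, 686 and 1024 along that axis, so adjacent vignettes overlap by 338 pixels (about 33%) or more.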
Lesions grading
Cortical area not annotated by the second algorithm was considered interstitial area. The number of annotations and the area of each category were obtained. The percentage of total inflammation was the ratio of total leukocyte area to cortical area. The number of peritubular capillaries containing leukocytes and the number of leukocytes within them were also determined. Total inflammation, capillaritis and interstitial fibrosis were classified according to the ti, ptc and interstitial fibrosis (ci) scores of the latest version of the Banff classification (Supplementary Methods) [5]. The percentages of total inflammation and interstitial fibrosis were visually evaluated in 5% increments. The predicted percentage of interstitial fibrosis was the ratio of the area not annotated by the second neural network to the total cortical area [21]. Interstitial leukocyte density was estimated with methods adapted from glomerular density assessment (Supplementary Methods) [28] and was evaluated on the interstitial area only. For patients with primary IgAN, the MEST-C score was assessed by the pathologists [29, 30].
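The grading step can be illustrated with a small sketch that turns segmentation outputs into Banff-style grades. The ti and ci cut-offs below are the standard Banff percentage bands. The ptc rule (at least 10% of capillaries involved, graded by the leukocyte count in the most affected capillary) is a simplified reading of Banff. The density is a plain count divided by interstitial area at the 454 nm/pixel scan resolution, not the glomerular-density-based method of the Supplementary Methods. All function names are hypothetical.

```python
PIXEL_SIZE_UM = 0.454  # scan resolution given in the text (454 nm/pixel)


def percent_of_cortex(area_px, cortical_area_px):
    """Area of one category (e.g. all leukocytes) as a percentage of cortical area."""
    return 100.0 * area_px / cortical_area_px


def grade_ti(percent, cutoffs=(10.0, 25.0, 50.0)):
    """Banff ti: ti0 <10%, ti1 10-25%, ti2 26-50%, ti3 >50% of the cortex."""
    ti1_min, ti1_max, ti2_max = cutoffs
    if percent < ti1_min:
        return 0
    if percent <= ti1_max:
        return 1
    if percent <= ti2_max:
        return 2
    return 3


def grade_ci(percent):
    """Banff ci: ci0 <=5%, ci1 6-25%, ci2 26-50%, ci3 >50% of the cortex."""
    if percent <= 5:
        return 0
    if percent <= 25:
        return 1
    if percent <= 50:
        return 2
    return 3


def grade_ptc(leukocytes_per_capillary):
    """Simplified Banff ptc from per-capillary luminal leukocyte counts."""
    if not leukocytes_per_capillary:
        return 0
    involved = sum(c > 0 for c in leukocytes_per_capillary) / len(leukocytes_per_capillary)
    worst = max(leukocytes_per_capillary)    # most affected capillary
    if involved < 0.10 or worst < 3:
        return 0
    if worst <= 4:
        return 1
    if worst <= 10:
        return 2
    return 3


def leukocyte_density_per_mm2(n_leukocytes, interstitial_area_px, pixel_size_um=PIXEL_SIZE_UM):
    """Leukocytes per mm2 of interstitium (plain count / area)."""
    area_mm2 = interstitial_area_px * (pixel_size_um / 1000.0) ** 2
    return n_leukocytes / area_mm2
```

The `cutoffs` parameter of `grade_ti` lets the Banff bands be replaced by data-driven thresholds, such as the Youden-derived cut-offs reported in the Results.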
Statistical analysis
Quantitative data were expressed as mean and standard deviation, and semi-quantitative data as numbers and percentages. Correlations between two quantitative variables were assessed with Spearman's rank correlation coefficient. Multiple linear regression was used to assess the effect of several variables on a target variable (eGFR). Quantitative variables were compared between two groups with Student's t-test or the Mann–Whitney U test, depending on whether their distribution was normal. Neural network performance was evaluated by precision, recall, F-score and intersection over union (IOU) (Supplementary Methods). Inter-observer agreement was assessed with Cohen's kappa (κ). A κ <0.40 was considered poor, 0.40–0.59 moderate, 0.60–0.79 substantial and ≥0.80 major. Receiver operating characteristic (ROC) curves were constructed for the prediction of ti and ptc scores. As the range of predicted inflammation was narrower than the observed range, Youden's index was used to determine the thresholds with the best sensitivity and specificity for predicted ti scores. Statistical analyses were performed using GraphPad PRISM 6.01 (GraphPad Software, La Jolla, CA, USA) and IBM SPSS 23 (IBM, Chicago, IL, USA).
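The detection and agreement metrics can be written out as a plain-Python sketch. How predicted and annotated objects were matched into true positives, false positives and false negatives is described in the Supplementary Methods and is not reproduced here, so these hypothetical helpers simply take those counts (or pixel sets, for IOU) as inputs. Cohen's κ is shown unweighted, for two raters grading the same samples.

```python
def precision_recall(tp, fp, fn):
    """Object-level precision and recall from matched detection counts."""
    return tp / (tp + fp), tp / (tp + fn)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def iou(pred_pixels, true_pixels):
    """Intersection over union of two sets of (row, col) pixel coordinates."""
    union = pred_pixels | true_pixels
    return len(pred_pixels & true_pixels) / len(union) if union else 1.0


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical grades."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # chance agreement from each rater's marginal frequencies
    expected = sum(rater_a.count(c) / n * rater_b.count(c) / n for c in categories)
    if expected == 1.0:
        return 1.0  # convention: both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

For instance, `f_score(0.81, 0.71)` gives about 0.76, consistent with the leukocyte F-score reported in the Results.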
RESULTS
Population
The clinical, biological and histological data of the patients in the three cohorts are described in Table 1. Of the 423 samples included, 251 (59%) were native kidney biopsies, 144 (34%) were transplant biopsies and 29 (7%) were nephrectomy samples. Most native kidney biopsies involved IgAN (n = 138/252, 55%). Among the 58 patients with interstitial nephritis, the etiology was unknown in 22 (38%), autoimmune in 13 (22%), drug-induced in 13 (22%), toxic in 9 (16%) and Hantavirus-related in 1 (2%). The transplant biopsies comprised 79 (55%) normal biopsies, 19 (13%) acute antibody-mediated rejections, 19 (13%) acute T-cell-mediated rejections, 11 (8%) chronic T-cell-mediated rejections, 5 (3%) borderline changes, 6 (4%) viral nephritis, 3 (2%) mixed acute rejections and 2 (1%) chronic mixed rejections.
The clinical, biological and histological data of the patients from the three cohorts.
Data | All patients (n = 423) | Training cohort (n = 83) | Test cohort (n = 106) | Application cohort (n = 234) |
---|---|---|---|---|
Age (years) | 53 ± 18 | 54 ± 17 | 55 ± 15 | 52 ± 19 |
Male sex | 281 (66) | 58 (70) | 72 (68) | 151 (65) |
Diabetes mellitus | 84 (20) | 14 (17) | 24 (23) | 46 (20) |
Hypertension | 300 (71) | 62 (75) | 91 (86) | 147 (63) |
Native kidney biopsy | 251 (59) | 35 (42) | 41 (39) | 175 (75) |
TIN | 58 (14) | 7 (8) | 5 (5) | 46 (19) |
IgAN | 132 (31) | 16 (19) | 27 (25) | 89 (38) |
IgA vasculitis | 33 (8) | 9 (11) | 4 (4) | 20 (9) |
Minimal change disease | 28 (7) | 3 (4) | 5 (5) | 20 (8) |
Kidney transplant | 144 (34) | 43 (52) | 60 (56) | 40 (17) |
Antibody-mediated rejection [a] | 24 (6) | 3 (4) | 6 (6) | 15 (6) |
T-cell-mediated/borderline rejection [a] | 40 (9) | 10 (12) | 10 (9) | 20 (9) |
Viral nephritis | 6 (2) | 3 (4) | 3 (3) | 0 (0) |
Nephrectomy | 29 (7) | 5 (6) | 5 (5) | 19 (8) |
Serum creatinine at biopsy (mg/dL) | 2.1 ± 1.8 | 2.0 ± 1.4 | 1.9 ± 1.4 | 2.2 ± 2.1 |
eGFR at biopsy (mL/min/1.73 m²) | 53 ± 34 | 52 ± 33 | 51 ± 30 | 55 ± 37 |
Proteinuria at biopsy (g/day) | 1.9 ± 2.6 | 1.3 ± 1.5 | 1.2 ± 1.8 | 2.4 ± 3.0 |
Mean interstitial fibrosis (%) [b] | 22 ± 17 | 20 ± 17 | 19 ± 15 | 25 ± 17 |
ci1 [b] | 144 (34) | 41 (49) | 64 (60) | 101 (43) |
ci2 [b] | 101 (24) | 13 (16) | 13 (12) | 75 (32) |
ci3 [b] | 36 (9) | 7 (8) | 8 (8) | 21 (9) |
Mean total inflammation (%) [b] | 27 ± 22 | 24 ± 22 | 24 ± 20 | 30 ± 22 |
ti1 [b] | 99 (23) | 19 (23) | 29 (27) | 51 (22) |
ti2 [b] | 114 (27) | 19 (23) | 23 (22) | 72 (31) |
ti3 [b] | 72 (17) | 13 (16) | 16 (15) | 43 (18) |
ptc1 [b] | 97 (23) | 22 (27) | 33 (31) | 42 (18) |
ptc2 [b] | 124 (29) | 19 (23) | 26 (24) | 79 (34) |
ptc3 [b] | 82 (19) | 8 (10) | 8 (8) | 66 (28) |
Quantitative data are expressed as mean ± standard deviation and categorical or semi-quantitative data as number (%).
[a] Including mixed rejections.
[b] Data from the region of interest trained and/or analyzed.
Detection accuracy
In limited cortical areas of the 106 kidney samples in the test cohort, the neural network's predictions were compared with the pathologists’ segmentations. For leukocyte detection, the precision, recall, F-score and IOU were 81%, 71%, 76% and 52%, respectively. For peritubular capillary detection, they were 82%, 83%, 82% and 70%, respectively. The most common errors were missed leukocytes in the most densely inflamed areas and in granulomas. Endothelial cells and fibroblasts were sometimes labeled as leukocytes, and some biopsy borders were mistaken for capillaries (Supplementary data, Fig. S1).
Evaluation of total interstitial inflammation and capillaritis
In the application cohort, the neural networks' lesion grading was compared with that of the pathologists on 215 kidney biopsies and 19 nephrectomy samples (Fig. 2). The mean percentages of total inflammation and fibrosis were 30 ± 22% and 25 ± 17%, respectively. A ti0, ti1, ti2 and ti3 score was found in 68 (29%), 51 (22%), 72 (31%) and 43 (18%) samples, and a ptc0, ptc1, ptc2 and ptc3 score in 47 (20%), 42 (18%), 79 (34%) and 66 (28%) samples, respectively.

Masson's trichrome–stained kidney biopsy from a patient with tubulointerstitial nephritis before and after neural network inference. (A) Biopsy at ×25 magnification. (B) Biopsy after the pre-processing steps: the image has been cleaned of the cortical structures detected by the first two networks. (C–F) Biopsy after inference by the third network at ×25 (C), ×100 (D), ×200 (E) and ×400 (F). Leukocytes are artificially colored red and peritubular capillaries green. Scale bars: 50 µm.
The mean total inference time of the three neural networks was 44 ± 28 min per biopsy. The tool detected a mean of 2665 ± 1647 peritubular capillaries, 7138 ± 6060 leukocytes in the interstitial area and 1026 ± 773 leukocytes inside peritubular capillaries. The mean interstitial leukocyte density was 26 161 ± 9477 leukocytes/mm². There was a strong correlation between the predicted and observed percentages of total inflammation (r = 0.89, P < .0001) (Fig. 3), and between the predicted and observed degrees of capillaritis (r = 0.82, P < .0001). The predicted percentage of inflammation was also correlated with both the observed and predicted percentages of fibrosis (r = 0.77 and r = 0.89, respectively, both P < .0001). Neural network predictions according to pathologists’ scores and kidney disease are shown in Fig. 4.

Correlation between the predicted and observed total inflammation. The observed total cortical inflammation is the mean of the two pathologists’ evaluations.

Neural network predictions of total cortical inflammation and leukocyte infiltration of capillaries according to pathologists’ scores (A, B) and kidney disease etiology (C, D). Scores from pathologist 1 (orange) and pathologist 2 (yellow) are shown in (A) and (B). Tukey box plots.
The areas under the ROC curves for the prediction of pathologists’ ti ≥1, ti ≥2 and ti3 scores with the neural networks’ percentage of inflammation were 0.96 [95% confidence interval (CI) 0.94–0.98], 0.95 (95% CI 0.92–0.97) and 0.94 (95% CI 0.91–0.97), respectively (all P < .0001). The areas under the ROC curves for the prediction of pathologists’ ptc ≥1, ptc ≥2 and ptc3 scores with the neural networks’ leukocyte count in the most affected capillary were 0.92 (95% CI 0.88–0.97), 0.86 (95% CI 0.82–0.92) and 0.96 (95% CI 0.94–0.99), respectively (all P < .0001) (Fig. 5). The κ coefficients between the visual and the neural network scores were 0.74, 0.78 and 0.68 for ti ≥1, ti ≥2 and ti ≥3, and 0.62, 0.64 and 0.79 for ptc ≥1, ptc ≥2 and ptc ≥3, respectively (all P < .0001). The κ coefficients between the two pathologists’ scores were 0.84, 0.83 and 0.82 for ti ≥1, ti ≥2 and ti ≥3, and 0.57, 0.61 and 0.51 for ptc ≥1, ptc ≥2 and ptc ≥3, respectively (all P < .0001).

ROC curves for the prediction of pathologists’ ti (A–C) and ptc (D–F) scores using the predicted percentage of cortical inflammation and the leukocyte count in the most affected capillary, respectively. Optimal cut-off values for ti scores were obtained with Youden's index. The optimal cut-offs of predicted cortical inflammation were 16.5% (sensitivity 83%, specificity 98%) for ti1, 20.1% (sensitivity 89%, specificity 93%) for ti2 and 30.4% (sensitivity 91%, specificity 89%) for ti3. AUC, area under the curve.
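The ROC analysis and Youden cut-off selection can be sketched as follows, treating a dichotomized pathologist score (e.g. ti ≥1) as the label and the predicted percentage of inflammation as the score. This is an illustrative plain-Python implementation with hypothetical function names and made-up example data, not the GraphPad or SPSS procedures used by the authors; the AUC is computed through its equivalence with the Mann–Whitney statistic.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for every cut-off of the form 'score >= t'."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((t, tp / n_pos, 1 - fp / n_neg))
    return points


def auc(scores, labels):
    """AUC = probability that a positive case outscores a negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)
```

With predicted inflammation percentages `[5, 12, 18, 22, 35, 40]` and labels `[0, 0, 0, 1, 1, 1]`, `youden_cutoff` selects 22% with sensitivity and specificity of 1.0; on real data the selected threshold trades one against the other.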
Patients with IgAN
Patients in the application cohort with primary IgAN were analyzed to assess the association of predicted histological data with baseline clinico-biological characteristics. Patients’ characteristics at biopsy are described in Table 2.
Data | Patients (N = 89) |
---|---|
Age (years) | 49 ± 19 |
Male sex | 70 (79) |
Diabetes mellitus | 11 (12) |
Hypertension | 59 (66) |
Renin–angiotensin–aldosterone blockers | 56 (63) |
Serum creatinine level (mg/dL) | 2.3 ± 2.4 |
eGFR at biopsy (mL/min/1.73 m²) | 59 ± 39 |
Proteinuria (g/day) | 2.8 ± 3.0 |
M1 | 31 (35) |
E1 | 46 (51) |
S1 | 59 (66) |
T1 | 34 (38) |
T2 | 8 (9) |
C1 | 4 (4) |
C2 | 16 (18) |
Interstitial fibrosis predicted (%) | 34 ± 9 |
Total cortical inflammation predicted (%) | 27 ± 13 |
ti1 (predicted) | 8 (9) |
ti2 (predicted) | 30 (34) |
ti3 (predicted) | 19 (21) |
Leukocytes in the most affected capillary (cells) | 7 ± 3 |
ptc1 (predicted) | 9 (10) |
ptc2 (predicted) | 59 (66) |
ptc3 (predicted) | 15 (17) |
Leukocyte density predicted (cells/mm²) | 26 374 ± 8288 |
Quantitative data are expressed as mean ± standard deviation and categorical or semi-quantitative data as number (%).
M, mesangial hypercellularity score; E, endocapillary hypercellularity score; S, segmental glomerulosclerosis score; T, tubular atrophy/interstitial fibrosis score; C, crescent score.
In univariate analysis, the predicted percentages of total inflammation and cortical fibrosis, the predicted capillaritis score and the mean leukocyte density were all associated with baseline eGFR (Fig. 6). Other factors associated with eGFR were age, hypertension, proteinuria, and the M, S and C scores. In multiple linear regression, only interstitial leukocyte density, age and the percentage of interstitial fibrosis were associated with baseline eGFR, with β coefficients of –0.36 (95% CI –0.71, –0.01, P = .042), –0.45 (95% CI –0.63, –0.27, P < .0001) and –0.47 (95% CI –0.86, –0.08, P = .019), respectively (Table 3).
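A multiple linear regression of the kind summarized in Table 3 can be sketched with NumPy's least-squares solver. This section does not state whether the reported β values are standardized; the sketch assumes they are (predictors and eGFR are z-scored before fitting). It omits confidence intervals and P-values, which an ordinary least squares routine such as the one in statsmodels would provide, and the function name is hypothetical.

```python
import numpy as np


def standardized_betas(X, y):
    """OLS on z-scored predictors and outcome; returns one beta per column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each predictor
    yz = (y - y.mean()) / y.std(ddof=1)                 # z-score the outcome (eGFR)
    design = np.column_stack([np.ones(len(yz)), Xz])    # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, yz, rcond=None)
    return coef[1:]                                     # drop the intercept
```

Because all variables share the same scale after z-scoring, the resulting coefficients can be compared across predictors measured in different units (years, %, cells/mm²).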

Correlation between eGFR at biopsy and neural networks predictions in IgAN application cohort patients.
Data (N = 89) | Univariate [a]: r (95% CI) | P-value | Model 1 [b]: Beta (95% CI) | P-value | Model 2 [c]: Beta (95% CI) | P-value | Model 3 [d]: Beta (95% CI) | P-value |
---|---|---|---|---|---|---|---|---|
Age (per year) | –0.64 (–0.75, –0.49) | <.001 | –0.45 (–0.63, –0.26) | <.001 | –0.45 (–0.63, –0.27) | <.001 | –0.45 (–0.63, –0.27) | <.001 |
Male sex | –0.03 (–0.25, 0.18) | .882 | | | | | | |
Hypertension | –0.38 (–0.55, –0.18) | <.001 | –0.07 (–0.24, 0.10) | .419 | –0.09 (–0.25, 0.08) | .299 | –0.09 (–0.26, 0.08) | .292 |
Diabetes mellitus | –0.14 (–0.34, 0.08) | .118 | | | | | | |
Renin–angiotensin–aldosterone blockers | –0.10 (–0.31, 0.12) | .343 | | | | | | |
Proteinuria (per 0.1 g/day) | –0.36 (–0.53, –0.15) | <.001 | –0.03 (–0.17, 0.11) | .668 | –0.02 (–0.16, 0.12) | .770 | –0.01 (–0.15, 0.14) | .906 |
M1 | –0.34 (–0.51, –0.13) | .001 | –0.07 (–0.21, 0.09) | .287 | –0.07 (–0.22, 0.08) | .344 | –0.08 (–0.23, 0.07) | .270 |
E1 | –0.16 (–0.36, 0.06) | .131 | | | | | | |
S1 | –0.23 (–0.42, –0.01) | .035 | 0.03 (–0.12, 0.18) | .653 | 0.03 (–0.11, 0.18) | .655 | 0.03 (–0.12, 0.18) | .679 |
Predicted cortical fibrosis (per %) | –0.68 (–0.78, –0.54) | <.001 | –0.18 (–0.46, 0.09) | .197 | –0.32 (–0.49, –0.15) | <.001 | –0.47 (–0.86, –0.08) | .019 |
C >1 | –0.19 (–0.38, 0.02) | .080 | | | | | | |
Predicted total cortical inflammation (per %) | –0.67 (–0.77, –0.53) | <.001 | –0.28 (–0.55, 0.00) | .054 | | | 0.24 (–0.33, 0.81) | .396 |
Predicted ptc >2 | –0.21 (–0.41, –0.01) | .046 | 0.10 (–0.05, 0.25) | .207 | 0.14 (–0.01, 0.30) | .072 | 0.16 (–0.00, 0.32) | .055 |
Predicted leukocyte density (per cell/mm²) | –0.47 (–0.62, –0.28) | <.001 | | | –0.23 (–0.40, –0.06) | .007 | –0.36 (–0.71, –0.01) | .042 |
ᵃ Spearman test.
ᵇ Linear regression Model 1 with age, hypertension, proteinuria, M1, S1, predicted cortical fibrosis, predicted total cortical inflammation and predicted ptc >2.
ᶜ Linear regression Model 2 with age, hypertension, proteinuria, M1, S1, predicted cortical fibrosis, predicted ptc >2 and predicted leukocyte density.
ᵈ Linear regression Model 3 with age, hypertension, proteinuria, M1, S1, predicted cortical fibrosis, predicted total cortical inflammation, predicted ptc >2 and predicted leukocyte density.
M, mesangial hypercellularity score; E, endocapillary hypercellularity score; S, sclerosis score; C, crescent score.
P-values of the factors statistically associated with eGFR are shown in bold.
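The univariate column of Table 3 reports Spearman correlations with 95% CIs, but the article does not describe how the intervals were obtained. The sketch below computes Spearman's ρ as the Pearson correlation of tie-averaged ranks, and an approximate CI with Fisher's z-transformation (standard error 1/√(n − 3)). The CI method is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def fisher_ci(r, n, level=0.95):
    """Approximate CI for a correlation via Fisher's z (SE = 1/sqrt(n - 3))."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For r = –0.64 and N = 89, `fisher_ci` returns approximately (–0.75, –0.50), close to but not identical with the interval reported for age, so the authors may have used a different method.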
Transplant biopsies
Patients from the application cohort with transplant biopsies were analyzed to assess the ability of the neural networks to predict transplant inflammation and fibrosis. Patients' characteristics are described in Table 4. The area under the ROC curve of the network-predicted percentage of total inflammation for predicting T-cell-mediated (cellular) rejection was 0.84 (95% CI 0.71–0.96, P < .0001). The area under the ROC curve of the network-predicted leukocyte count in the most affected capillary for predicting antibody-mediated (humoral) rejection was 0.70 (95% CI 0.54–0.86, P = .036). To evaluate the accuracy of fibrosis detection by the second neural network, we compared its results with the visual evaluation of Sirius Red–stained biopsies (Supplementary data, Fig. S2). In the 30 biopsies with enough material for Sirius Red staining, the mean fibrosis percentage was 34 ± 24% with Sirius Red and 32 ± 12% with the neural networks. The κ coefficients between the Sirius Red and the neural network scores were 0.81, 0.61 and 0.60 for ci ≥1, ci ≥2 and ci ≥3, respectively (all P < .0001).
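Both AUCs above measure how well a continuous network output separates biopsies with and without a given rejection diagnosis. A minimal sketch, assuming the usual Mann–Whitney formulation (the probability that a randomly chosen rejection case scores higher than a randomly chosen non-rejection case, with ties counted as one half) and the Hanley and McNeil (1982) standard error for the CI. The article does not name its CI method, and the function names are hypothetical.

```python
import math

def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as P(positive case scores higher than negative case), ties = 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to the valid AUC range [0, 1].
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

A call would look like `auc_mann_whitney(pct_with_rejection, pct_without_rejection)`, where both lists are hypothetical per-biopsy predicted percentages of total inflammation.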
Characteristics of patients from the application cohort with a transplant biopsy.
Data | Patients (N = 40) |
---|---|
Age (years) | 51 ± 14 |
Male sex | 23 (58) |
Diabetes mellitus | 14 (35) |
Hypertension | 30 (75) |
Serum creatinine level (µmol/L) | 253 ± 149 |
eGFR at biopsy (mL/min/1.73 m²) | 31 ± 19 |
Proteinuria (g/day) | 1.4 ± 2.1 |
Graft rejection at biopsy | 32 (80) |
Antibody-mediated rejection | 12 (30) |
T-cell-mediated rejection | 16 (40) |
Mixed rejection | 3 (8) |
Borderline rejection | 1 (3) |
Percentage of non-globally sclerotic glomeruli (%) | 89 ± 19 |
Interstitial fibrosis (%) | 34 ± 21 |
Inflammation outside fibrosis area (%) | 32 ± 24 |
Inflammation in fibrosis area (%) | 42 ± 26 |
Total cortical inflammation (%) | 38 ± 23 |
ti1 | 3 (8) |
ti2 | 18 (45) |
ti3 | 11 (28) |
ptc1 | 3 (8) |
ptc2 | 9 (23) |
ptc3 | 19 (48) |
Total cortical inflammation predicted (%) | 38 ± 18 |
Interstitial fibrosis predicted (%) | 32 ± 13 |
ti1 (predicted) | 0 |
ti2 (predicted) | 8 (20) |
ti3 (predicted) | 20 (50) |
Leukocytes in the most affected capillary (cell) | 9 ± 7 |
ptc1 (predicted) | 4 (10) |
ptc2 (predicted) | 10 (25) |
ptc3 (predicted) | 18 (45) |
Categorical data are expressed as number (%) and continuous data as mean ± standard deviation.
DISCUSSION
We developed and evaluated a tool that automatically grades total inflammation and peritubular capillaritis. The tool detected interstitial leukocytes and peritubular capillaries accurately (F-scores of 76% and 82%, respectively). Agreement between pathologists and the network was substantial for both ti and ptc scores. The tool's predictions were associated with baseline eGFR in patients with IgAN and with the diagnosis of rejection in transplant biopsies.
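The agreement figures summarized above are κ coefficients computed after dichotomizing the Banff scores (ti ≥1, ti ≥2, ti ≥3, and likewise for ptc). A minimal sketch, assuming unweighted Cohen's κ and the Landis and Koch (1977) verbal bands, in which 0.61 to 0.80 is labeled "substantial". The article does not state which convention it follows, and the function names are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two raters' paired labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    keys = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in keys) / n ** 2
    if expected == 1:
        return float("nan")  # undefined when both raters use a single label
    return (observed - expected) / (1 - expected)

def kappa_at_threshold(scores_a, scores_b, threshold):
    """Kappa after dichotomizing ordinal Banff scores as score >= threshold."""
    return cohen_kappa([s >= threshold for s in scores_a],
                       [s >= threshold for s in scores_b])

def landis_koch(kappa):
    """Verbal agreement label following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, `kappa_at_threshold(pathologist_ti, network_ti, 2)` would give the ti ≥2 agreement for two hypothetical per-biopsy score lists. All six κ values reported in the abstract (0.62 to 0.79) fall within the 0.61 to 0.80 band.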
A total of 423 samples of various diseases from two different centers were included. This sample diversity and the external validation may reduce the impact of image variations linked to differences in tissue processing, staining protocols and digitization. The included biopsies contained near-normal tissue (minimal change disease, nephrectomy samples), highly inflammatory tissue (T-cell-mediated rejection, TIN) and tissue with an intermediate level of inflammation (IgAN, borderline rejection) [7, 8]. This heterogeneity in inflammation severity could improve the generalizability of the tool across many kidney diseases.
Few studies have focused on automated quantification of the interstitial area with conventional, non-immunolabeled stains. Hermsen et al. developed a tool that estimates the interstitial surface [22]. Although this surface correlated with pathologists' ti scores (r = 0.71), this correlation only reflected the statistical association between interstitial fibrosis/edema and total inflammation, which we also observed in the current study (r = 0.77). Our tool measured total inflammation more reliably, by directly segmenting leukocytes. Yi et al. developed a similar leukocyte detection algorithm in a kidney transplant study, but its correlation with the total inflammation score was limited by the low number of biopsies with significant inflammation [31]. Our tool assessed ti scores well, with all areas under the ROC curve >0.94 and a strong correlation between predicted and observed percentages of inflammation (r = 0.89). This accuracy could partly be explained by the high number of leukocytes annotated for training, which was higher than in most nephropathology studies using deep learning [22, 32–34]. We used the same deep learning network as Yi et al. and as in our previous work [21, 31]. This convolutional neural network is well suited to segmenting small objects because it was pre-trained to recognize cell nuclei [35].
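The network outputs a percentage of inflamed cortex, whereas the ROC analysis above targets ordinal ti scores. One way to derive a predicted ti score is to apply the Banff ti thresholds (ti0 <10%, ti1 10–25%, ti2 26–50%, ti3 >50% of the cortex) to the predicted percentage. The sketch below assumes exactly that discretization; the function name is hypothetical and the article may handle boundary values differently.

```python
def banff_ti(percent_inflamed_cortex):
    """Map the % of cortex with inflammation to a Banff ti grade (0-3).

    Banff ranges are ti0 <10%, ti1 10-25%, ti2 26-50%, ti3 >50%; values
    strictly between 25 and 26 are assigned to ti2 here (assumption).
    """
    p = percent_inflamed_cortex
    if not 0 <= p <= 100:
        raise ValueError("percentage must lie in [0, 100]")
    if p < 10:
        return 0
    if p <= 25:
        return 1
    if p <= 50:
        return 2
    return 3
```

Under this rule, `banff_ti(27)` returns 2, so the mean predicted total cortical inflammation of the IgAN cohort (27%) would correspond to ti2.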
In the study by Jayapandian et al., periodic acid–Schiff was found to be the best stain for peritubular capillary segmentation because it better delineated vascular membranes [32]. Their F-score of 81% was close to ours with Masson's trichrome (82%). Our pretreatment step, which removed tubular, vascular and glomerular structures, may explain these similar scores despite a slightly lower number of training capillaries [32]. An evaluation with periodic acid–Schiff stain would probably have helped to generalize our results, as it is likely the most commonly used stain for kidney histological analysis. We used Masson's trichrome in our study because our previously published neural networks for evaluating fibrosis were only trained and evaluated on this stain [21, 36]. In many institutions, Masson's trichrome is the preferred stain for quantifying the matrix [37–39]. It allows screening for interstitial fibrosis and inflammation, and helps recognize interstitial edema, which appears pale green or blue [40]. However, the evaluation of fibrosis with Masson's trichrome varies between pathology centers because the dyes are sensitive to the duration of formalin fixation [37]. Other stains, such as unpolarized Sirius Red and collagen III immunohistochemistry, are believed to be more specific for interstitial fibrosis as they bind to collagen fibers [37, 39]. However, these techniques are time-consuming, expensive and less widely available. Moreover, the assessment of fibrosis by our neural networks does not depend on staining color intensity: it corresponds to the cortical area not recognized as tubular, glomerular or vascular structures. Similar training based on instance segmentation with these stains would therefore probably not improve fibrosis recognition.
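The fibrosis measure described above is defined by exclusion: whatever cortex is not recognized as tubule, glomerulus or vessel counts as interstitium. A minimal sketch, assuming each segmentation output is available as a set of (row, column) pixel coordinates and that the whole cortex is the denominator; both the mask format and the function name are hypothetical. Because only masks enter the calculation, stain color intensity plays no role.

```python
def fibrosis_percentage(cortex, tubules, glomeruli, vessels):
    """Percent of cortical pixels not covered by tubule, glomerulus or vessel masks.

    Each mask is a set of (row, col) pixel coordinates (hypothetical format).
    """
    if not cortex:
        raise ValueError("cortex mask is empty")
    structures = tubules | glomeruli | vessels
    # Set difference: structure pixels lying outside the cortex are ignored.
    interstitium = cortex - structures
    return 100.0 * len(interstitium) / len(cortex)
```

Representing masks as sets keeps the sketch dependency-free; with image-sized arrays, the same logic would be written with boolean array operations.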
Among the transplant biopsies, we observed good agreement between our neural network and unpolarized Sirius Red (κ 0.60 to 0.81).
To our knowledge, capillaritis had never been evaluated with deep learning. Diagnostic accuracy was substantial. Most errors were due to endothelial cells identified as leukocytes and to missed detections in the most inflammatory areas. A larger training set including more highly inflammatory TIN biopsies could improve these results. Nonetheless, agreement between the network and pathologists on ptc scores was higher than the agreement between trained pathologists observed in this study and in another study [14].
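The predicted ptc grades in Table 4 are reported alongside the leukocyte count in the most affected capillary. A simplified reading of the Banff ptc definitions (fewer than 10% of cortical capillaries involved: ptc0; otherwise graded by the most affected capillary: 3 to 4 cells ptc1, 5 to 10 cells ptc2, more than 10 cells ptc3) can be sketched as follows. Whether the tool applies the 10% extent criterion is not stated here, and treating a maximum of 1 to 2 cells as ptc1 is an assumption; the function name is hypothetical.

```python
def banff_ptc(leukocytes_per_capillary):
    """Simplified Banff ptc grade (0-3) from luminal leukocyte counts.

    leukocytes_per_capillary: one count per detected cortical peritubular
    capillary (hypothetical input format).
    """
    n = len(leukocytes_per_capillary)
    if n == 0:
        return 0
    involved = sum(1 for count in leukocytes_per_capillary if count >= 1)
    if involved / n < 0.10:
        return 0  # fewer than 10% of capillaries contain any leukocyte
    worst = max(leukocytes_per_capillary)  # most affected capillary
    if worst > 10:
        return 3
    if worst >= 5:
        return 2
    return 1  # assumption: a maximum of 1-4 cells is graded ptc1
```

Because the grade hinges on the single most affected capillary, one false-positive cluster (for example, endothelial cells counted as leukocytes) can raise the grade, which is consistent with the error pattern described above.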
As previously described, baseline kidney function in IgAN was linked to interstitial inflammation severity [7–10]. CD20-positive B cells are thought to be the main cell population infiltrating the interstitium in IgAN [41]. These B lymphocytes promote fibrosis, inflammation and kidney destruction through their secretion of cytokines and chemokines [42]. In its initial definition, the MEST-C classification did not retain the evaluation of the interstitial infiltrate because of its low reproducibility among pathologists [43]. In univariate analysis, the percentage of total inflammation was strongly correlated with eGFR. This suggests that inflammation must be measured precisely to accurately reflect disease severity. Unlike pathologists' evaluations, which vary between observers, deep learning predictions are highly reproducible [19]. If the prognostic impact of interstitial inflammation in IgAN is confirmed, this evaluation could be standardized by deep learning within a dedicated classification. Of note, in multivariate analysis, apart from fibrosis, the MEST-C components were no longer associated with kidney function at biopsy. We also evaluated a new interstitial inflammation criterion, the interstitial leukocyte density. This measure might limit the impact of fibrosis and edema on the inflammation assessment. It cannot be calculated routinely by a pathologist, as it requires a comprehensive assessment of cortical, glomerular, vascular and leukocyte areas. In multivariate analysis, leukocyte density was more strongly associated with eGFR than the percentage of total inflammation. Thus, this precise evaluation may be a stronger marker of interstitial inflammation than the inflamed surface expressed relative to the cortical area.
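Leukocyte density, as described above, normalizes the leukocyte count by interstitial area instead of expressing inflamed area as a fraction of the cortex. A minimal sketch, assuming the interstitium is the cortex minus the tubular, glomerular and vascular pixels and that the slide resolution in micrometres per pixel is known. The definition details, parameter names and function names are hypothetical, since the article does not give the exact formula.

```python
def pixels_to_mm2(n_pixels, microns_per_pixel):
    """Convert a pixel count to mm^2 (1 mm^2 = 1e6 square micrometres)."""
    return n_pixels * microns_per_pixel ** 2 / 1e6

def interstitial_leukocyte_density(n_leukocytes, cortex_px, structures_px,
                                   microns_per_pixel):
    """Leukocytes per mm^2 of interstitium.

    Hypothetical definition: interstitium = cortex pixels minus the pixels of
    tubules, glomeruli and vessels (structures_px, each pixel counted once).
    """
    interstitium_mm2 = pixels_to_mm2(cortex_px - structures_px, microns_per_pixel)
    if interstitium_mm2 <= 0:
        raise ValueError("no interstitial area left after removing structures")
    return n_leukocytes / interstitium_mm2
```

For example, 500 leukocytes in 1,000,000 interstitial pixels at 0.5 µm per pixel (0.25 mm²) give 2000 cells/mm².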
Nonetheless, this work did not assess whether leukocyte density was more closely related to kidney prognosis than the percentage of total inflammation. As the purpose of this study was to automate the grading of histological markers, we did not study patients' kidney function over time. These prognostic performances need to be evaluated in another study with a larger number of IgAN patients.
The percentage of predicted cortical inflammation and the leukocyte count in the most affected capillary were associated with T-cell-mediated and antibody-mediated rejection diagnoses, respectively. However, the number of normal biopsies in this kidney transplant subgroup was low, and rejection diagnoses do not depend solely on these histological lesions (they also involve, among others, C4d staining, glomerulitis, tubulitis, vascular lesions and donor-specific antibodies); the predictive values should therefore be interpreted with caution [5]. Since our tool is not a rejection classifier, unlike the one in the Kers et al. study, its diagnostic capacity is also necessarily lower [44].
This study had some limitations. First, the application cohort was designed to cover a wide range of inflammatory lesions, which led to a higher proportion of biopsy samples with inflammatory and fibrotic lesions than in the other groups, and to fewer normal transplant biopsies. Second, our tool only provides a total inflammation score and therefore cannot separately evaluate inflammation inside and outside fibrotic areas, as required for the i-IFTA and i scores [5, 45]. Even though the ti score seems to better reflect the patient's kidney prognosis, another study is needed to distinguish these areas as well as tubulitis lesions [18, 46, 47].
While our tool could reduce costs and save time, it cannot distinguish leukocyte subclasses. Even polymorphonuclear leukocytes were not separated from the others, as labeling uncertainty was too high at this image resolution. Hermsen et al. previously carried out a deep learning study on transplant biopsies using multiplex immunohistochemistry to classify leukocyte subclasses [33]. This method allows multiple immunostainings to be performed sequentially on the same section. However, labeling techniques are time-consuming and expensive. Although a comparison with immunohistochemistry could have provided additional validation, this assessment would have been carried out on a different section from the Masson's trichrome one, so immunohistochemistry would not have ensured that the recognized cells were indeed leukocytes rather than fibroblasts or endothelial cells.
In conclusion, we developed a deep learning tool that scores total inflammation and capillaritis, demonstrating the potential of artificial intelligence in kidney pathology.
FUNDING
This work was funded by the NEPHRIN-APJ2019 (Appel d'offre jeunes chercheurs) GIRCI EST (47755 euros) (Mathieu Legendre).
AUTHORS’ CONTRIBUTIONS
A.Jacq, G.T., M.P., A.Jaugey, E.M., L.M., J.-M.R. and M.L. were responsible for conception, analysis and interpretation of data. A.Jacq, E.M., A.Jaugey and M.L. drafted the article. A.Jacq, G.T., L.M. and M.L. were responsible for histological digitization and/or analysis. Deep learning algorithm programming and evaluation were carried out by A.Jaugey, M.P., M.A. and P.B. M.A., P.B., D.C., J.B., D.D., T.C., M.C., M.F.V. and S.F. helped with data acquisition and analysis. C.T. provided intellectual content of critical importance to the work described. All authors gave final approval of the version to be published.
DATA AVAILABILITY STATEMENT
The data underlying this article will be shared on reasonable request to the corresponding author. The neural networks are freely available (https://github.com/AdrienJaugey/Mask-R-CNN-Inference-Tool).
CONFLICT OF INTEREST STATEMENT
None declared.