Comparison of Endoscopic and Artificial Intelligence Diagnoses for Predicting the Histological Healing of Ulcerative Colitis in a Real-World Clinical Setting

Abstract
Background: Artificial intelligence (AI)-assisted colonoscopy systems with contact microscopy capabilities have been reported previously; however, no studies of the clinical use of a commercially available system in patients with ulcerative colitis (UC) have been reported. In this study, the diagnostic performance of an AI-assisted ultra-magnifying colonoscopy system for histological healing was compared with that of conventional light non-magnifying endoscopic evaluation in patients with UC.
Methods: The data of 52 patients with UC were retrospectively analyzed. The Mayo endoscopic score (MES) was determined by 3 endoscopists. Using the AI system, healing of the same spot assessed via MES was defined as a predicted Geboes score (GS) < 3.1. The GS was then determined using pathology specimens from the same site.
Results: A total of 191 sites were evaluated, including 159 with a GS < 3.1. The MES diagnosis identified 130 sites as MES0. The AI system classified 120 sites as healed. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of MES0 for the diagnosis of GS < 3.1 were 79.2%, 90.6%, 97.7%, 46.8%, and 81.2%, respectively. The AI system performed similarly to MES for the diagnosis of GS < 3.1: sensitivity, 74.2%; specificity, 93.8%; PPV, 98.3%; NPV, 42.3%; and accuracy, 77.5%. Among MES1 sites, an AI Healing decision was also significantly associated with a GS < 3.1 (P = .0169).
Conclusions: The histological diagnostic yields of the MES- and AI-assisted diagnoses were comparable. Healing decisions using AI may avoid the need for histological examinations.


Introduction
Ulcerative colitis (UC) is a refractory intestinal disorder caused by a combination of mechanisms, including immunological mechanisms.[1] The number of patients with UC in Japan is increasing.[2] Several patients with UC have recurrent or chronic persistent intestinal inflammation.[5] Achieving endoscopic mucosal remission is a therapeutic goal for UC that can help avoid these complications. The Selecting Therapeutic Targets in Inflammatory Bowel Disease-II initiative, proposed by the International Organization for the Study of Inflammatory Bowel Disease, recently recommended that endoscopic remission be a therapeutic goal to achieve the higher goals of improved quality of life and disappearance of disability.[6] The Mayo endoscopic score (MES) is used in the endoscopic evaluation of UC.[7,8] Endoscopic mucosal remission is often defined by an MES of 0 or 1.[9,10] However, endoscopic assessment involves a subjective component and is prone to variability.[8,11,12] Histologic healing has been reported as a more advanced therapeutic goal for UC [13][14][15][16] and has the potential to demonstrate mucosal healing more objectively than an endoscopic evaluation. However, the determination of histologic healing requires invasive biopsies.
The EndoBRAIN-UC system (Cybernet Systems) is a fully automated diagnostic system with artificial intelligence (AI) that uses endocytoscopy to identify the presence of histologic inflammation associated with UC. The AI system analyzes features such as invisibility, dilation, and hyperplasia of capillaries in the colonic mucosa via ultra-magnification endoscopic observation using narrow band imaging (NBI) [17] and enables the histological evaluation of UC via the determination of a Geboes score (GS).[18] An AI-assisted diagnosis of histological healing, based on a GS < 3.1, may reduce unnecessary biopsies; however, there are no published reports regarding the clinical use of the AI system in patients with UC. This study compared the diagnostic performance for histological healing of the AI-assisted EndoBRAIN-UC system with that of conventional light non-magnifying endoscopic evaluations in patients with UC.

Materials and Methods
This retrospective study was conducted at Tokyo Women's Medical University from June to November 2021. Consecutive patients who met the diagnostic criteria for UC in Japan [1] and underwent a total colonoscopy in a laboratory equipped with an ultra-magnifying endoscope were included in this study. Therefore, patients in the non-remitting phase were included. However, patients with severe symptomatic UC were excluded due to the potential physical burden of total colonoscopic observation and the extended examination time. Patients who did not undergo a biopsy at the time of AI-assisted diagnosis were also excluded from the study. Nonmagnified observations using white light were used to determine an MES diagnosis. During the same examination, the AI system was used to diagnose the same site at which the MES diagnosis was obtained, and a biopsy was also performed. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of the MES- and AI-assisted diagnoses were calculated using the biopsy pathology results as the gold standard. The results of the AI-assisted diagnosis for each MES value (0, 1, 2, and 3) were compared with the biopsy pathology results.

MES Diagnoses
All patients underwent colonoscopy using the usual preparation of bowel-cleansing agents and an Olympus CF-H290ECI colonoscope (Olympus). The colonoscopies and MES diagnoses were performed by 3 physicians with at least 10 years of endoscopic experience in patients with UC. The MES was agreed upon by the endoscopists using images of the biopsy site after colonoscopy. Discrepancies were resolved via discussion. MES0 was defined as mucosa in endoscopic remission.

AI-Assisted Diagnoses
Each AI-assisted diagnosis was performed at the same site as the MES diagnosis using conventional endoscopy. The AI system was connected directly to an endoscopic system (EVIS LUCERA ELITE, Olympus). The latest version of the AI system (EndoBRAIN-UC) was recently approved in Japan, and each AI-assisted diagnosis was performed using the ultramagnification function of the Olympus CF-H290ECI, a scope with a 12.8-mm-diameter tip, which provides a maximum magnification of 520×.
The NBI observation mode was used for the colonic mucosa in which the inflammatory activity was evaluated. The endoscope was then set to the maximum magnification (520×), and an ultra-magnified endoscopic image was acquired. When the image was captured, the program automatically analyzed the image, and the results of the analysis were displayed on a computer screen. The predicted output of the

Pathological Diagnoses
Each pathological diagnosis was made using biopsies of the same mucosal surface imaged for the MES- and AI-assisted diagnoses. The pathological diagnosis was made by a single pathologist using prepared hematoxylin and eosin-stained specimens, with a final agreement with a second pathologist. The GS was used for pathological diagnoses (Supplementary Table S1). The GS subdivides grades according to morphological changes in the mucosal tissue and inflammatory cell infiltration. In this study, histological healing was defined as a GS < 3.1 with no histological erosions, ulcers, or crypt neutrophil infiltration.

Statistical Analyses
All data are expressed as median and interquartile range (IQR). The sensitivity, specificity, PPV, NPV, accuracy, and precision of the diagnostic methods were determined from a 2 × 2 table, and proportions were compared using Fisher's exact test. JMP statistical analysis software (version 16; SAS) was used for all analyses.
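As an illustration of the 2 × 2 comparison described above, the following is a minimal Python sketch of a two-sided Fisher's exact test. It is not the JMP procedure used in the study; the function name `fisher_exact_two_sided` and the example counts are hypothetical and are not study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (NOT study data):
# rows = AI decision (Healing / Active), columns = GS < 3.1 (yes / no)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided P = {p_value:.4f}")
```

With real data, the four arguments would be the per-site counts for each combination of AI decision and histological result.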

Ethical Considerations
The study protocol was approved by the Institutional Ethics Review Committee of our hospital on January 7, 2023 (IRB number 2022-0123). Based on the retrospective nature of the study, all patients were offered the opportunity to refuse treatment. A public announcement was posted on our website on January 7, 2023, as approved by the ethics committee.

Diagnostic Yields of MES and AI for Pathology
MES had a sensitivity of 79.2%, specificity of 90.6%, PPV of 97.7%, NPV of 46.8%, and accuracy of 81.2% for the diagnosis of GS < 3.1. The AI system had a sensitivity of 74.2%, specificity of 93.8%, PPV of 98.3%, NPV of 42.3%, and accuracy of 77.5% for the diagnosis of GS < 3.1 (Tables 3 and 4).
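The AI-system metrics above follow from a 2 × 2 table. The Python sketch below shows the arithmetic under a stated assumption: the cell counts (118 true positives, 2 false positives, 41 false negatives, 30 true negatives) are not given in the text but are back-calculated from the reported totals (191 sites, 159 with GS < 3.1, 120 AI Healing decisions) and the reported PPV. The function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, NPV, and accuracy as percentages."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),  # healed sites the test calls healed
        "specificity": 100 * tn / (tn + fp),  # active sites the test calls active
        "ppv": 100 * tp / (tp + fp),          # "healed" calls that are truly healed
        "npv": 100 * tn / (tn + fn),          # "active" calls that are truly active
        "accuracy": 100 * (tp + tn) / total,
    }


# Back-calculated counts (an assumption, not stated in the text):
# positive = AI Healing decision, reference standard = GS < 3.1 on biopsy
ai = diagnostic_metrics(tp=118, fp=2, fn=41, tn=30)
for name, value in ai.items():
    print(f"{name}: {value:.1f}%")
```

Rounded to one decimal place, these values agree with the percentages reported for the AI system.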

Comparison of AI-Assisted Diagnosis and Pathology for Each MES Value
For all MES values, there were both "Healing" and "Active" decisions based on AI-assisted diagnosis. Among the MES0 lesions, the AI system classified 83.7% as Healing. In MES0, there was no significant difference in the percentage of GS < 3.1 between the Healing and the Active decisions based on AI-assisted diagnosis. Among the MES2 lesions, the AI system classified 92.9% as Active. In MES2, there was also no significant difference in the percentage of GS < 3.1 regardless of the result of AI-assisted diagnosis. Among the MES1 lesions, 29.4% were classified as Healing and 70.6% as Active. In MES1, the Healing decision with AI-assisted diagnosis identified significantly more GS < 3.1 than did the Active decision with AI-assisted diagnosis (P = .0169) (Table 5 and Figure 1).

Discussion
The EndoBRAIN-UC system became commercially available following a study by Maeda et al. [18] regarding the use of AI in the management of UC. This system predicts the GS, a histological measure of UC activity, from endoscopy capable of ultra-magnified observation. Maeda et al. reported different relapse rates at 1 year for lesions reported as active or healing by the AI system.[19] However, the previous study was conducted in a research and development facility; no reports of the real-world clinical utility of ultra-magnified endoscopic observation of UC using commercially available AI systems have been published.
Histological evaluations based on the capillary structure of the mucosa obtained using automated evaluation systems have been reported,[20] though no studies have demonstrated the usefulness of the system in actual clinical practice.[23] This study compares the diagnostic yield of the commercially available EndoBRAIN-UC system with that of MES in a real-world clinical setting.
In this study, the diagnostic yield for a GS < 3.1 was similar between MES (using white-light observation) and the AI. The diagnostic yield of the AI-assisted diagnosis was equivalent to that in the report by Maeda et al. [18] with the exception of the NPV, confirming the high reproducibility of the AI-assisted diagnosis in clinical settings.
The NPV in the current study was lower than previously reported NPVs because this investigation was conducted in a real-world setting and fewer MES ≥ 2 specimens were endoscopically classified as inflammatory. Additionally, the results of the AI system used in this study do not fully reflect the pathological results because the GS is based on various factors, including inflammatory cell infiltration and crypt destruction.[13]
Among the present 52 cases, 83.7% of the MES0 cases were judged to be Healing by the AI-assisted diagnosis, and 97.7% of the MES0 cases were GS < 3.1 by histological diagnosis. On the other hand, 92.9% of the MES2 cases were judged as Active in the AI-assisted diagnosis, and only 32.1% of the MES2 cases were judged as GS < 3.1 by histological diagnosis. Therefore, an AI-assisted diagnosis may not be necessary when MES0 or MES2 can be clearly determined using unmagnified white light observation. Unnecessary ultramagnification should be avoided as it is a more technical procedure that may increase the examination time compared to conventional endoscopy.[19] Although an endoscopic diagnosis involves subjectivity and may lead to divided judgment, MES0 and MES2 are unlikely to be confused.[8,11,12] In addition, an AI system that provides MES classifications using unmagnified white light may be commercially available in the future.[22,23]
Different prognoses for subsequent relapses have been reported for MES0 and MES1.[9,24,25] However, MES1 lesions often do not relapse. Although the presence of histological inflammation plays a role in relapse,[13][14][15][16] there may be intervening histological differences in the mucosa classified as MES1 (Figure 2). In the present study, the AI system reported a GS < 3.1 in MES1 lesions as healing, suggesting that this AI-based diagnosis may help determine differences in histological inflammation and the subsequent risk of relapse.
Histological determinations are conducted to evaluate inflammation and to diagnose neoplastic lesions. However, in cases where the purpose of biopsy is to evaluate inflammation, tissue sampling can be reduced if inactive mucosa can be identified without biopsy. The use of AI-assisted diagnosis may reduce unnecessary biopsies for the diagnosis of MES1, although prospective studies with relapse as an outcome are required to test this possibility.
This study has several limitations. This was a single-center, retrospective analysis of a small number of patients. This approach reveals the degree of inflammation but does not contribute to the detection of dysplasia. It is also unclear whether treatment interventions affect AI-assisted diagnoses, and further studies are required. In addition, ultra-magnification requires technical proficiency, which may have affected the results. However, the AI-assisted diagnoses with ultramagnification were all performed by the same experienced endoscopists. The AI-assisted diagnostic protocol also included obtaining multiple images from the same site, and the most reproducible results were used. This reduces the technical influence as much as possible.

Conclusion
In conclusion, MES and AI-assisted diagnoses have similar diagnostic yields for a GS < 3.1. An AI-based diagnosis of MES1 may reduce the need for biopsies for histologic examination.

Figure 1. Comparison of Geboes score based on artificial intelligence (AI) diagnosis and Mayo endoscopic score (MES). The percentage of lesions with a Geboes score (GS) < 3.1 was significantly higher when the AI-assisted diagnosis was healing in MES1 lesions. In MES0 and MES2, the percentage of GS < 3.1 was not significantly different between the AI-assisted diagnoses.

Table 3. Comparison of Geboes score based on AI diagnosis and MES diagnosis.

Table 5. AI diagnostic performance and proportion of Geboes <3.1 in each endoscopic assessment score.
Data are presented as the number (%) of biopsy points. Abbreviations: AI, artificial intelligence; GS, Geboes score; MES, Mayo endoscopic score.