A novel fully automated MRI-based deep-learning method for classification of 1p/19q co-deletion status in brain gliomas

Abstract Background One of the most important recent discoveries in brain glioma biology has been the identification of the isocitrate dehydrogenase (IDH) mutation and 1p/19q co-deletion status as markers for therapy and prognosis. 1p/19q co-deletion is the defining genomic marker for oligodendrogliomas and confers a better prognosis and treatment response than gliomas without it. Our group has previously developed a highly accurate deep-learning network for determining IDH mutation status using T2-weighted (T2w) MRI only. The purpose of this study was to develop a similar 1p/19q deep-learning classification network. Methods Multiparametric brain MRI and corresponding genomic information were obtained for 368 subjects from The Cancer Imaging Archive and The Cancer Genome Atlas. 1p/19 co-deletions were present in 130 subjects. Two-hundred and thirty-eight subjects were non-co-deleted. A T2w image-only network (1p/19q-net) was developed to perform 1p/19q co-deletion status classification and simultaneous single-label tumor segmentation using 3D-Dense-UNets. Three-fold cross-validation was performed to generalize the network performance. Receiver operating characteristic analysis was also performed. Dice scores were computed to determine tumor segmentation accuracy. Results 1p/19q-net demonstrated a mean cross-validation accuracy of 93.46% across the 3 folds (93.4%, 94.35%, and 92.62%, SD = 0.8) in predicting 1p/19q co-deletion status with a sensitivity and specificity of 0.90 ± 0.003 and 0.95 ± 0.01, respectively and a mean area under the curve of 0.95 ± 0.01. The whole tumor segmentation mean Dice score was 0.80 ± 0.007. Conclusion We demonstrate high 1p/19q co-deletion classification accuracy using only T2w MR images. This represents an important milestone toward using MRI to predict glioma histology, prognosis, and response to treatment.

Genetic profiling and molecular subtyping of glial neoplasms have revolutionized our ability to optimize therapeutic strategies and enhance prognostic accuracy. Perhaps, the most compelling evidence supporting this paradigm is the 2016 revision of the World Health Organization's (WHO) classification of gliomas which now includes genetic analysis. The impact of glioma reclassification based on molecular profiling has subsequently been studied and 3 genetic alterations have been extensively validated: O-6-methylguanine-DNA methyltransferase (MGMT), isocitrate dehydrogenase (IDH), and 1p/19q co-deletion status. 1 MGMT is a DNA repair enzyme that protects normal and glioma cells from alkylating chemotherapeutic agents. Mutations that result in methylation of the MGMT promoter result in loss of function of the enzyme and its protective effect. Mutations of IDH alter the function of the enzyme to produce D-2-hydroxyglutarate instead of α-ketoglutarate. This altered function results in increased sensitivity of the glioma to radiation and chemotherapy. Gliomas that are IDH mutated can be further divided into gliomas with or without a 1p/19q co-deletion. The 1p/19q co-deletion is defined as the combined loss of the short arm of chromosome 1 (1p) and the long arm of chromosome 19 (19q). According to the 2016 WHO classification of gliomas, an IDH-mutated glioma with a 1p/19q co-deletion is classified as an oligodendroglioma, whereas an IDHmutated glioma without a 1p/19q co-deletion is classified as a diffuse astrocytoma. Oligodendrogliomas have a better prognosis when compared to diffuse astrocytomas. Additionally, even patients with an IDH-mutated anaplastic oligodendroglioma (WHO grade III) have a longer median overall survival than IDH-wild type, 1p/19q non-co-deleted, WHO grade II astrocytomas and are more responsive to chemotherapy. 2 Therefore, determination of 1p/19q status in IDH-mutated gliomas is critical for guiding therapy and predicting prognosis. Currently, the only reliable way to determine 1p/19q mutation status requires analysis of glioma tissue obtained either via an invasive brain biopsy or following open surgical resection. These diagnostic procedures carry the burden of implicit risk. Therefore, considerable attention has been dedicated to developing noninvasive, image-based diagnostic methods.
Recent advances in deep learning have led to a significant interest in advancing techniques for noninvasive, image-based molecular profiling of gliomas. Our group has previously demonstrated a highly accurate, MRI-based, voxel-wise deep-learning IDH-classification network using only T2-weighted (T2w) MR images. 3 T2w images facilitate clinical translation because they are routinely acquired, they can be obtained within 2 min, and high-quality T2w images can even be obtained in the presence of active patient motion. Because the current standard of care for IDH-mutated gliomas is heavily influenced by 1p/19q co-deletion status, the purpose of this study was to develop a highly accurate, fully automated deep-learning 3D network for 1p/19 co-deletion classification using T2w images only.

Data and Preprocessing
Multiparametric brain MRI data of glioma patients were obtained from The Cancer Imaging Archive (TCIA) database. 4,5 Genomic information was provided from both the TCIA and TCGA (The Cancer Genome Atlas) databases. [4][5][6] Only preoperative studies were used. Studies were screened for the availability of 1p/19q status and T2w image series. The final dataset of 368 subjects included 268 low-grade glioma (LGG, 130 co-deleted, 138 non-co-deleted) and 100 high-grade glioma (HGG, all nonco-deleted) subjects. TCGA subject IDs, 1p/19q co-deletion status, and tumor grade are listed in Supplementary Table  1. This study utilized data from the TCIA database including their diagnoses and designations of "low-grade" versus "high-grade" gliomas.
Tumor masks for 209 subjects were available through previous expert segmentation. 3,7,8 Tumor masks for the remaining 159 subjects were generated by the 3D-IDH network 3 and validated by in-house neuro-radiologists. The tumor masks were used as ground truth for tumor segmentation

Importance of the Study
One of the most important recent discoveries in brain glioma biology has been the identification of the isocitrate dehydrogenase mutation and 1p/19q co-deletion status as markers for therapy and prognosis. 1p/19q co-deletion is the defining genomic marker for oligodendrogliomas and confers a better prognosis and treatment response than gliomas without it. Currently, the only reliable way to determine 1p/19q mutation status requires analysis of glioma tissue obtained either via an invasive brain biopsy or following open surgical resection. The ability to noninvasively determine 1p/19q co-deletion status has significant implications in determining therapy and predicting prognosis. We developed a highly accurate, deep-learning network that utilizes only T2-weighted MR images and outperforms previously published image-based methods. The high classification accuracy of our T2w image-only network (1p/19q-net) in predicting 1p/19q co-deletion status marks an important step toward image-based stratification of brain gliomas. Imminent clinical translation is feasible because T2-weighted MR imaging is widely available and routinely performed in the assessment of gliomas.

Network Details
Transfer learning was performed with the previously trained 3D-IDH network for 1p/19q classification. 3 The decoder part of the network was fine-tuned for a voxel-wise dual-class segmentation of the whole tumor with Classes 1 and 2 representing 1p/19q co-deleted and 1p/19q non-co-deleted types, respectively. The schematics for the network architecture are shown in Figure 2B. A detailed description of the network is given in Figure 1 of the Supplementary Material section.

Network Implementation and Cross-validation
To generalize the reliability of the networks, a 3-fold crossvalidation was performed on the 368 subjects by randomly shuffling the dataset and distributing it into 3 groups (approximately 122 subjects for each group). During each fold of the cross-validation procedure, the 3 groups were alternated between training, in-training validation, and held-out testing. Group 1 had 122 subjects (43 co-deleted, 79 non-codeleted), Group 2 had 124 subjects (44 co-deleted, 80 nonco-deleted), and Group 3 had 122 subjects (43 co-deleted, 79 non-co-deleted). An in-training validation dataset helps the network improve its performance during training. Each fold of the cross-validation is a new training phase based on a unique combination of the 3 groups. However, network performance is only reported on the held-out testing group for each fold as it is never seen by the network. The group membership for each cross-validation fold is listed in Supplementary Table 1.
Seventy-five percent overlapping patches were extracted from the training and in-training validation subjects. No patch from the same subject was mixed with the training, in-training validation, or testing datasets in order to avoid the data leakage problem. 16,17 The data augmentation steps included vertical flipping, horizontal flipping, translation rotation, random rotation, addition of Gaussian noise, addition of salt and pepper noise, and projective transformation. Additionally, all images were down-sampled by 50% and 25% (reducing the voxel resolution to 2mm × 2mm × 2mm and 4mm × 4mm × 4mm) and added to the training and validation sets. Data augmentation provided a total of approximately 300 000 patches for training and 300 000 patches for in-training validation for each fold. Networks were implemented using Keras 18 and TensorFlow 19 with an Adaptive Moment Estimation optimizer (Adam). 20 The initial learning rate was set to 10 −5 with a batch size of 15 and maximal iterations of 100. Initial parameters were chosen based on previous work with Dense-UNets using brain imaging data and semantic segmentation. 3,21,22 1p/19q-net yields 2 segmentation volumes. Volume 1 provides the voxel-wise prediction of 1p/19q co-deleted tumor and Volume 2 identifies the predicted 1p/19q nonco-deleted tumor voxels. A single tumor segmentation map is obtained by fusing the 2 volumes and obtaining the largest connected component using a 3D connected component algorithm in MATLAB. Majority voting over the voxel-wise classes of co-deleted type or non-codeleted type provided a single 1p/19q classification for each subject. Networks were implemented on Tesla V100s, P100, P40, and K80 NVIDIA-GPUs. The 1p/19q classification process developed is fully automated, and a tumor segmentation map is a natural output of the voxel-wise classification approach.

Statistical Analysis
MATLAB and R were used for statistical analysis of the network's performance. Majority voting (ie, voxel-wise cutoff of 50%) was used to evaluate the accuracy of the network. The accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the model for each fold of the cross-validation procedure were calculated using this threshold. A receiver operating characteristic (ROC) curve was also generated for each fold. A Dice score was used to evaluate the performance of the networks for tumor segmentation. The Dice score calculates the amount of spatial overlap between the ground truth segmentation and the network segmentation.

ROC Analysis
The ROC curves for each cross-validation fold for the network are shown in Figure 3. The network demonstrated very good performance curves with high sensitivities and specificities.

Voxel-Wise Classification
Since the network is a voxel-wise classifier, it performs a simultaneous tumor segmentation. Figure 4A and B shows examples of the voxel-wise classification for a co-deleted type and non-co-deleted type, respectively, using the network. The volume fusion procedure was effective in removing false positives to increase accuracy. This procedure improved the Dice scores by approximately 4% for the network. We also computed the voxel-wise accuracy for the network. The mean voxel-wise accuracies were 85.86% ± 0.01 for non-co-deleted type and 80.51% ± 0.01 for co-deleted type.

Training and Segmentation Times
It took approximately 1 week to fine-tune the decoder portion of the network. The trained network took approximately 3 min to segment the whole tumor and predict the 1p/19qco-deletion status for each subject.

Discussion
We developed a fully automated, highly accurate, deeplearning network that outperforms previously reported 1p/19q co-deletion status classification algorithms. [23][24][25][26] When comparing our T2-network with previous work, our results suggest that algorithm accuracy can be improved by using T2w images only. Clinical translation becomes much simpler using only T2w images because these images are routinely acquired and are robust to motion. When compared to previously published algorithms, our methodology is fully automated. The time required for 1p/19q-net to segment the whole tumor and predict the 1p/19q co-deletion status for one subject is approximately 3 min on a K80, P40, P100, or V100s NVIDIA-GPU. The higher performance achieved by our network when compared to previous work is likely due to several factors. Similar to our IDH classification network, we employed 3D networks whereas prior attempts at 1p/19q co-deletion status classification have relied on 2D networks. 23 The dense connections in a 3D network architecture are advantageous because they carry information from all the previous layers to the following layers. 21 Additionally, 3D networks are easier to train and can reduce over-fitting. 27 As we previously reported, the dual volume fusion (DVF) postprocessing step helps in effectively eliminating false positives while improving the segmentation accuracy by excluding extraneous voxels not connected to the tumor. DVF improved the Dice scores by approximately 4% for the network. The 3D networks interpolate between slices to maintain inter-slice information more accurately. The network does not require extraction of pre-engineered features from the images or histopathological data. 28 Our approach also uses voxel-wise classifiers and provides a classification for each voxel in the image. This provides a simultaneous single-label tumor segmentation. Another factor that may explain the higher performance achieved by our network is that previous approaches required multi-contrast input which can be compromised due to patient motion from lengthier examination times, and the need for gadolinium contrast. High-quality T2w images are almost universally acquired during clinical brain tumor diagnostic

Neuro-Oncology Advances
evaluation. Clinically, T2w images are typically acquired within 2 min at the beginning of the exam and are relatively resistant to the effects of patient motion. Several of the previous 1p/19q deep-learning studies were trained and tested on only LGGs achieving accuracies ranging from 65.9% to 87.7%. [23][24][25] Our algorithm was trained and evaluated on the multi-institutional TCIA database 4 with a mix of HGG and LGG, which is a better representative of algorithm robustness, real-world performance, and potential clinical utilization.
In the clinical setting, histologic evaluation remains the gold standard for genetic profiling of gliomas. Several different methods to detect 1p/19q co-deletion have been employed: fluorescence in-situ hybridization (FISH), array comparative genomic hybridization, multiplex ligation-dependent probe amplification, and PCR-based loss of heterozygosity analysis. 29 FISH is the most routinely performed method. 30 FISH relies on fluorescentlabeled DNA probes to directly detect chromosomal abnormalities on a tissue slide in interphase nuclei. 31 The fraction of nuclei that demonstrate a deletion or relative deletion (in cases with polysomy) is summed and a percentage is calculated. 32 When the percentage of "deleted" nuclei exceeds a predetermined cutoff, the tumor is classified as 1p/19q co-deleted. 32 A drawback of FISH is that it lacks standardized criteria for analysis of 1p/19q co-deletion status. 30 For example, there is no consensus on what cutoff level to use when classifying co-deletion status. As a result, variability in institutional-based cutoff values can span from 20% to 70% and can affect accurate diagnosis. 32 This limitation affects the sensitivity, specificity, PPV, and NPV of 1p/19q detection by FISH based on the cutoff value selected. 32 There are interesting parallel considerations when studying our deep-learning method of 1p/19q determination. Our network is a voxel-wise classifier and as a result some portions within each glioma are classified as 1p/19 co-deleted while other areas are 1p/19q non-codeleted. The overall determination of 1p/19q co-deletion status is based on the majority of voxels in the tumor. Given the variability in the cutoff values for FISH detection of 1p/19q co-deletion, we performed a Youden's statistical index analysis to determine if the optimal cutoff for our deep-learning algorithm was different than majority voting (>50%). The analysis demonstrated that maximum accuracy, sensitivity, specificity, PPV, and NPV were obtained at an optimal cutoff of 50%, the same as majority voting.
The algorithm misclassified 24 cases: 12 subjects were misclassified as non-co-deleted and 12 as co-deleted. These 12 cases were subsequently reviewed but no common features resulting in misclassification were identified. Despite these misclassifications, our network achieved a mean cross-validation testing accuracy of 93.46% which is similar to what is reported for FISH. 32 However, our sensitivity, specificity, PPV, and NPV were significantly better than when compared to FISH. 30 While FISH requires tissue to be obtained from an invasive procedure and subsequent tissue processing for at least 48 h, our deep-learning algorithm can segment the entire glioma and provide a 1p/19q co-deletion status in 3 min. The deep-learning algorithm can also be fine-tuned to variations in institutional MRI scanners, while FISH analysis currently lacks standardization as mentioned above.
The limitations of our study are that deep-learning studies require large amounts of data and the relative number of subjects with 1p/19q co-deletions is small. Additionally, acquisition parameters and imaging vendor platforms vary across imaging centers that contribute data. Despite these caveats our algorithm demonstrated high 1p/19q co-deletion classification accuracy.

Conclusion
We demonstrate high 1p/19q co-deletion classification accuracy using only T2w MR images. This represents an important milestone toward using MRI to predict glioma histology, prognosis, and response to treatment.

Supplementary Data
Supplementary data are available at Neuro-Oncology Advances online.

Keywords
deep learning | glioma | 1p/19 co-deletion Funding Support for this research was provided by the National Institutes of Health/National Cancer Institute (U01CA207091 to A.J.M. and J.A.M.).