CT-based radiomic signatures for prediction of pathologic complete response in esophageal squamous cell carcinoma after neoadjuvant chemoradiotherapy

ABSTRACT
The objective of this study was to build models to predict pathologic complete response (pCR) after neoadjuvant chemoradiotherapy (nCRT) in esophageal squamous cell carcinoma (ESCC) patients using radiomic features. A total of 55 consecutive patients pathologically diagnosed as having ESCC were included in this study. Patients were divided into a training cohort (44 patients) and a testing cohort (11 patients). Logistic regression analysis using likelihood ratio forward selection was performed to select the predictive clinical parameters for pCR, and the least absolute shrinkage and selection operator (LASSO) with logistic regression was used to select radiomic predictors in the training cohort. Model performance in the training and testing cohorts was evaluated using the area under the receiver operating characteristic curve (AUC). The multivariate logistic regression analysis identified no clinical predictors for pCR. Thus, only radiomic features selected by LASSO were used to build prediction models. Three logistic regression models for pCR prediction were developed in the training cohort, and they were able to predict pCR well in both the training (AUC, 0.84–0.86) and the testing cohorts (AUC, 0.71–0.79). There were no significant differences between these AUCs. We developed three predictive models for pCR after nCRT using radiomic parameters, and they demonstrated good model performance.


INTRODUCTION
Esophageal carcinoma (EC) is one of the most common malignancies, and it ranks sixth as a cause of cancer-related mortality globally [1]. Approximately 80% of new cases occur in less developed regions of the world, and 60% of these cases occur in China [2]. Despite advances in multimodality treatment, it remains a devastating disease, with a 5-year overall survival rate of 15-25%.
One of the main reasons for the poor prognosis of EC is that patients are often diagnosed at an advanced stage. To reduce the difficulty of operating and to improve outcomes, applying concurrent chemoradiotherapy before surgery is now the first-line option for locoregional disease. Patients having a pathologic complete response (pCR) after neoadjuvant chemoradiotherapy (nCRT) appear to have superior overall survival [3,4]. On the other hand, non-responders to nCRT may receive no benefit from this therapy [5]. We urgently need to be able to identify the EC patients who will benefit from nCRT so that the best treatment can be given to each individual patient.
Radiomics is an emerging method that extracts quantitative features from computed tomography (CT) images, magnetic resonance (MR) images, positron emission tomography (PET) images, etc. Radiomics could be applied to identify tumor phenotype characteristics [6,7], and to discover prognostic or predictive biomarkers for cancers [8,9].
In addition, radiomics can be applied to the prediction of treatment response, e.g. pCR in EC after nCRT. Yip et al. studied the possibility of using CT-based texture features to predict tumor regression grade (TRG, 1-3 vs 4-5, Mandard et al. [10]) after nCRT in a cohort of 31 EC patients, only 9 of whom had esophageal squamous cell carcinoma (ESCC) [11]. The pre- and post-treatment standard deviation of the histogram was significantly associated with TRG. However, those researchers analyzed only a small number (n = 6) of histogram-based texture features. Hou et al. analyzed 214 CT-based radiomic features to predict pCR in 49 ESCC patients. Prediction models developed by support vector machine (SVM) or artificial neural network (ANN) algorithms could discriminate non-responders from responders [12]. The area under the receiver operating characteristic curve (AUC) was 0.818 for the SVM and 0.927 for the ANN algorithm. More studies have linked radiomic features from 18F-fluorodeoxyglucose (18F-FDG) PET scans to pCR in EC patients after nCRT. Combining clinical factors and 18F-FDG PET-based radiomic features improved the prediction ability [13]. Roelof et al. developed a prediction model combining clinical T-stage and restaging after nCRT, which provided high discriminatory accuracy in predicting pCR (AUC, 0.81) [14].
However, most radiomic studies [11,14-18] included both adenocarcinoma and squamous cell carcinoma patients. In contrast to patients with adenocarcinoma, patients with ESCC may have a higher pCR rate after nCRT [19,20]. In pCR prediction analysis, grouping these two histological types of EC together may reduce the accuracy of the results. Here, we used quantitative features from radiomics as prognostic biomarkers to build models to predict pCR for patients with ESCC after nCRT. These models may help doctors to make the best clinical decision at the beginning of treatment.

Patients
A total of 55 consecutive patients pathologically diagnosed as having ESCC were included in this retrospective study. nCRT was administered to all patients between May 2012 and August 2016 at the Cancer Hospital of Shantou University Medical College. Informed consent was obtained from all individual participants included in the study. All patients received a pre-treatment CT scan for radiotherapy planning.
Clinical staging of ESCC was performed according to the 7th edition of the American Joint Committee on Cancer (AJCC) staging system using CT and endoscopic ultrasonography. More information about the patients is listed in Table 1.

Neoadjuvant chemoradiotherapy
Patients were treated with 3D conformal radiotherapy (3D-CRT) or intensity-modulated radiotherapy (IMRT) using 6 MV X-rays. The gross tumor volume (GTV) was identified using both diagnostic and radiotherapy planning CT images and barium esophagography, and it included the primary tumor (GTVp) and grossly involved regional lymph nodes (GTVn). The clinical target volume (CTV) was defined as the GTVp with a margin of 1.0 cm laterally and 3.0 cm in the superior and inferior dimensions, plus the GTVn with a 0.5 cm to 1.5 cm expansion. The planning target volume (PTV) was determined by adding 0.5 cm radially to the CTV. A total prescription dose of 40-64 Gy (median, 50 Gy) was delivered at 2 Gy per fraction, 5 days a week.
A concurrent 3-weekly schedule of platinum-based nCRT was administered to all patients. Twenty-three patients received NP (vinorelbine + cisplatin) chemotherapy, which consists of cisplatin (75 mg/m² on Day 1) plus vinorelbine (25 mg/m² on Days 1 and 8). Twenty-five patients received the PF (cisplatin + fluorouracil) regimen, which consists of cisplatin (75 mg/m² on Day 1) and fluorouracil (750 mg/m²/24 h on Days 1 to 4). Seven patients received the TP (paclitaxel + cisplatin) regimen, which consists of paclitaxel (135-180 mg/m² on Day 1) and cisplatin (75 mg/m² on Day 1). In cases of severe hematologic toxicity, the dose was adjusted in the second chemotherapy cycle.

Surgery
All patients underwent clinical re-examination 4 weeks after nCRT, including a barium esophagography test and thoracoabdominal CT. A transthoracic esophagectomy with two-field or three-field lymphadenectomy was performed 5-6 weeks after the neoadjuvant treatment. A pCR patient was defined as a patient with no residual, viable tumor cells in the surgical specimen.

CT image radiomic feature collection
For all patients, CT scans (CT scanner: Philips Brilliance CT Big Bore Oncology Configuration, Cleveland, OH) were performed in the supine position with intravenous contrast. A standard clinical acquisition protocol (tube voltage, 120 kVp; rotation time, 0.75 seconds; pitch, 0.938; matrix, 512 × 512; field of view, 350 mm × 350 mm; pixel size, 1.46 mm; slice thickness, 5 mm; reconstruction kernel, standard) was applied for each patient in this cohort. The voxel size of the CT images was not resampled. The bit depth of the CT images was 12, and the number of gray levels was 4096. The GTVs of ESCC were delineated on the non-enhanced planning CT scan by experienced radiation oncologists, with barium radiography of the esophagus or contrast-enhanced CT as a reference. 3D Slicer (version 4.8.1, stable release) with its Radiomics extension was used to extract the radiomic features from the pre-treatment CT [21]. Any pixel with an attenuation of less than −50 HU was excluded to remove intra-luminal air from the GTVs. In image pre-processing, Laplacian of Gaussian or wavelet filters were used. Five values of the Laplacian of Gaussian spatial band-pass filter (0, no filtration; 1.0, fine textures; 1.5 and 2.0, medium textures; 2.5, coarse textures) for image smoothing and a fixed number (32, 64 or 128) of discrete bins for image resampling were applied. In the wavelet filter, the Coiflet 1 mother wavelet was used, and a high-pass or low-pass filter was applied in each of the x, y and z directions. In all, 624 wavelet features (Supplemental Table 1) and 406 non-wavelet features (Supplemental Table 2
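The two intensity pre-processing steps described in this section, excluding voxels below −50 HU and resampling to a fixed number of bins, can be sketched in a few lines of Python. This is a generic equal-width scheme written for illustration only. It is not the code of the 3D Slicer Radiomics extension, whose exact discretization formula may differ, and the function names and voxel values are hypothetical.

```python
from typing import List

AIR_THRESHOLD_HU = -50.0  # voxels below this attenuation are excluded (value from the text)


def mask_air(hu_values: List[float], threshold: float = AIR_THRESHOLD_HU) -> List[float]:
    """Keep only voxels at or above the threshold, dropping intra-luminal air."""
    return [v for v in hu_values if v >= threshold]


def discretize_fixed_bin_count(values: List[float], n_bins: int) -> List[int]:
    """Map intensities to integer bins 1..n_bins that span [min, max] of the region."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat region has no contrast, so every voxel falls in the first bin.
        return [1 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum intensity would land in bin n_bins + 1, so it is clamped to n_bins.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


# Hypothetical voxel values inside a delineated GTV (not patient data).
gtv_hu = [-300.0, -60.0, -50.0, 10.0, 35.0, 42.0, 78.0]
tissue = mask_air(gtv_hu)
for n_bins in (32, 64, 128):  # the three bin numbers used in the study
    print(n_bins, discretize_fixed_bin_count(tissue, n_bins))
```

With a fixed number of bins, the bin width adapts to the intensity range of each tumor, so a larger bin number preserves finer gray-level differences at the cost of sparser texture matrices.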

Statistical analysis
The 55 patients were divided into two groups (a training group of 44 patients and a testing group of 11 patients). We performed multivariate logistic regression analysis using likelihood ratio forward selection in the training group to select the most predictive clinical factors for pCR. All the radiomic features were normalized using Z-score normalization. Three groups of radiomic data were analyzed separately: Group 1, non-wavelet features with bin number = 32 and all the wavelet radiomic features; Group 2, non-wavelet features with bin number = 64 and all the wavelet radiomic features; Group 3, non-wavelet features with bin number = 128 and all the wavelet radiomic features. The least absolute shrinkage and selection operator (LASSO) with logistic regression was applied to select optimal predictors in the training group. LASSO with 10-fold cross-validation was performed using the glmnet package [22,23]. In the logistic regression model, P is the probability of the event occurring and $S = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m$, where $x_1, x_2, \dots, x_m$ are the input parameters, $\beta_0$ is the constant for S, and $\beta_1, \dots, \beta_m$ are the logistic regression coefficients of the corresponding input parameters. In this paper, S is the function for the radiomic signature, b is the coefficient for S, and c is the constant in the logistic regression. Model performance was evaluated by the AUC using the pROC package [24] in R software in both the training and testing groups. The AUCs were compared using the method suggested by DeLong et al. [25] through pROC. The Chi-squared test or Fisher's exact test was used to determine whether there was a significant difference in the categorical variables between these groups. A P-value of < 0.05 was considered statistically significant.
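The modeling steps in this section (Z-score normalization, a linear radiomic signature S, and a logistic model with coefficient b and constant c) can be sketched as follows. This is a minimal Python illustration, not the R/glmnet code used in the study. The standard logistic form P = 1 / (1 + exp(−(bS + c))) is an assumption inferred from the symbol definitions, normalizing with training-cohort statistics and the sample standard deviation is also an assumption, and all function names, feature names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def z_score(train_values: List[float], value: float) -> float:
    """Standardize a value with the training-cohort mean and sample standard deviation."""
    return (value - mean(train_values)) / stdev(train_values)


def signature(intercept: float, coefs: Dict[str, float], features: Dict[str, float]) -> float:
    """S = beta_0 + sum of beta_i * x_i over the features with non-zero coefficients."""
    return intercept + sum(beta * features[name] for name, beta in coefs.items())


def pcr_probability(s: float, b: float, c: float) -> float:
    """Assumed standard logistic link: P = 1 / (1 + exp(-(b * S + c)))."""
    return 1.0 / (1.0 + math.exp(-(b * s + c)))


# Hypothetical values for one feature in the training cohort and one new patient.
train_feature = [0.8, 1.1, 1.4, 0.9, 1.3]
x = z_score(train_feature, 1.2)
s = signature(-0.2, {"feature_a": 0.6}, {"feature_a": x})
print(round(pcr_probability(s, b=1.0, c=0.0), 3))
```

LASSO sets most coefficients to exactly zero, so in practice `coefs` would contain only the handful of features that survive the penalty chosen by cross-validation.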

RESULTS
The pCR rate after nCRT in this study was 42% (23/55), 43% (19/44) and 36% (4/11) in the whole, training and testing cohorts, respectively. No clinical differences were found between the training and testing groups (Table 1). No clinical factors were identified as predictive factors for pCR by logistic regression analysis in either the training or testing groups (Table 1). Three groups of radiomic features were analyzed by LASSO separately to build three radiomic signatures, and the results are presented in Table 2.
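To illustrate the Fisher's exact test named in the statistical analysis, the pCR counts reported above can be arranged as a 2 × 2 table (training: 19 pCR and 25 non-pCR; testing: 4 pCR and 7 non-pCR). The sketch below is a plain-Python two-sided test written for illustration. It is not the software used in the study, the function name is hypothetical, and this particular comparison is shown only as an example of the method, not as a result reported by the authors.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, col2 = a + b, a + c, b + d
    n = a + b + c + d

    def weight(k: int) -> int:
        # Unnormalized hypergeometric weight of a table with k in the top-left cell.
        return comb(col1, k) * comb(col2, row1 - k)

    k_min, k_max = max(0, row1 - col2), min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables that are no more likely than the observed one.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Rows: training and testing cohorts; columns: pCR and non-pCR (counts from the text).
p_value = fisher_exact_two_sided(19, 25, 4, 7)
print(f"two-sided P = {p_value:.3f}")
```

The weights are compared as exact integers, which avoids the floating-point tie problems that can occur when table probabilities are compared directly.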
Three logistic regression models for pCR prediction were developed based on these three signatures separately (Table 3). The AUCs of Model 1, Model 2 and Model 3 ranged from 0.84 to 0.86 in the training group and from 0.71 to 0.79 in the testing group (Table 3). The receiver operating characteristic (ROC) curves are shown in Fig. 1.
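For readers less familiar with the AUC, it equals the probability that a randomly chosen pCR patient receives a higher model score than a randomly chosen non-pCR patient, with ties counted as one half. The sketch below computes it directly from that definition in plain Python. It is an illustration only, not the pROC implementation used in the study, and the scores and labels are made up.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes (1 = pCR, 0 = non-pCR).
example_scores = [0.9, 0.4, 0.6, 0.2]
example_labels = [1, 1, 0, 0]
print(auc(example_scores, example_labels))
```

Under this definition, a testing cohort with 4 pCR and 7 non-pCR patients yields only 4 × 7 = 28 pairs, so its AUC can change only in steps of 1/28 (about 0.036), which is one reason small differences between testing AUCs should be read with caution.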

DISCUSSION
Early identification of pCR prior to nCRT might avoid unnecessary chemoradiation-associated morbidity. However, there is still no tool with sufficient ability to predict pCR after nCRT [26,27]. For example, using only 18F-FDG PET to predict pCR after nCRT in EC is not recommended [26], nor is applying the combination of 18F-FDG PET and endoscopic biopsy [27]. Radiomics may become a better method for predicting pCR. In previous studies that used radiomic data to predict pCR, the AUCs were between 0.71 and 0.93 [12,14-18]. In the study that also analyzed CT-based radiomic features [12], the AUCs were the highest among these studies (0.818 in SVM and 0.927 in ANN modeling). However, model overfitting has likely occurred as a result of the small sample size (49 patients) and the large number of predictors (214 radiomic features) included in the modeling. Although the number of patients in the present study was also small (n = 55) and more radiomic features (n = 1030) were analyzed, LASSO analysis was applied to limit overfitting in logistic regression, and the AUCs were 0.84 to 0.86 in the training cohort and 0.71 to 0.79 in the testing cohort. Certainly, because of the small sample size and retrospective nature of our study, our models must be validated in an independent cohort before any clinical use.
In CT images, different scales of smoothing using Laplacian of Gaussian spatial band-pass filters are important; they can reduce image noise and highlight different anatomical spatial scales (from fine to medium to coarse texture) within the tumor [28,29]. We applied five filter values (0, no filtration, and 1.0 to 2.5 in steps of 0.5) to obtain the best radiomic features for pCR prediction, but only radiomic features with sigma = 2.5 were selected by LASSO. Therefore, smoothing with Laplacian of Gaussian spatial band-pass filters at suitable scales might be important in unenhanced CT images. Hatt et al. [30] found that significant texture details were lost when using a quantization of <32 bins. Based on this experience with PET-CT [30], we chose three different bin numbers (32, 64 or 128) to resample the CT images. We found a similar prediction ability for the three radiomic signatures with the different bin numbers. This finding is in line with previous findings that ≥32 discrete values are recommended in order to properly quantify tumor heterogeneity [31], and that textural features computed with resampling values >64 may not provide additional prognostic information compared with the tumor volume [30]. However, although the three models performed similarly and their selected features appear to have been stable, the radiomic predictors differed between the three models. This suggests that it may be necessary to resample the CT images with different bin numbers and to compare the performance of the resulting models. Of the three models in our study, Model 1 had the highest AUC (although the AUCs were not significantly different between the three models) in both the training and the testing cohorts, and it had the fewest predictors. Thus, when readers try to apply our models to their patients' imaging, Model 1 might be the first choice.
Female sex, age, poor differentiation grade, tumor length, and low cT-stage have been identified as predictors for pCR after nCRT [20,32,33]. In our study, clinical factors were not found to be significantly related to pCR after nCRT, and this may be due to the relatively small number of ESCC patients. Although only radiomic features were selected, the three radiomic signatures that we built could predict pCR well in our cohort (AUCs, 0.71 to 0.80). Tumor volume is considered a predictor for pCR [13,30,34]. However, tumor volume failed to predict pCR (P = 0.157; AUC, 0.61; 95% CI, 0.46 to 0.76) among all patients in our cohort. The SurfaceVolumeRatio (bin = 32) (surface area to volume ratio, a lower value indicates a more compact shape; P = 0.04; AUC, 0.66; 95% CI, 0.52 to 0.81, for all patients) might provide [30]. However, the radiomic signatures provided better prediction performance in the case of only one patient with a tumor >10 cm³ in our cohort. Soufi et al. [35] tested wavelet radiomic features from different mother wavelets in survival prediction of non-small cell lung carcinoma patients; they found that Symlet and Biorthogonal mother wavelets yielded the best performance. The radiomic tool that we used in this study provides wavelet radiomic features only from the Coiflet 1 mother wavelet. Thus, the wavelet radiomic features in our study might not be the optimal ones. In the studies using texture features to predict pCR for EC [11,14-18], both adenocarcinoma and squamous cell carcinoma patients were included, and the majority of these patients had adenocarcinoma. The AUCs in these 'mixed' studies were between 0.71 and 0.89. Compared with these 'mixed' studies, the AUCs in our study and that of Hou et al. [12], which both included only ESCC patients, seem to be higher (0.84-0.97). Hence, analyzing patients separately by histological type might improve model performance in predicting pCR. The rationale behind this might be the differences in the pCR rate (49% in ESCC and 23% in adenocarcinoma) [19] and in genomic characterization [36]. Thus, including only one type of EC in the analysis may be more appropriate, and we designed this study to explore the radiomic predictors for ESCC.

CONCLUSION
We developed three CT-based radiomic models for predicting the pCR in ESCC patients after nCRT. These predictive models demonstrated good model performance in predicting pCR and might help physicians identify candidates for nCRT.