A multivariable normal tissue complication probability model for predicting radiation-induced hypothyroidism in nasopharyngeal carcinoma patients in the modern radiotherapy era

Abstract Radiation-induced hypothyroidism (RHT) is a common long-term complication for nasopharyngeal carcinoma (NPC) survivors. A model using clinical and dosimetric factors for predicting risk of RHT could suggest a proper dose–volume parameters for the treatment planning in an individual level. We aim to develop a multivariable normal tissue complication probability (NTCP) model for RHT in NPC patients after intensity-modulated radiotherapy or volumetric modulated arc therapy. The model was developed using retrospective clinical data and dose–volume data of the thyroid and pituitary gland based on a standard backward stepwise multivariable logistic regression analysis and was then internally validated using 10-fold cross-validation. The final NTCP model consisted of age, pretreatment thyroid-stimulating hormone and mean thyroid dose. The model performance was good with an area under the receiver operating characteristic curve of 0.749 on an internal (200 patients) and 0.812 on an external (25 patients) validation. The mean thyroid dose at ≤45 Gy was suggested for treatment plan, owing to an RHT incidence of 2% versus 61% in the >45 Gy group.

Despite these modern RT techniques, normal tissues in the head and neck region are inevitably exposed to radiation.One of the most common late complications in NPC survivors is radiation-induced hypothyroidism (RHT) that causes various degree of symptoms, including fatigue, mood changes, daytime sleepiness, weight gain, cold intolerance, constipation, dyspnea on exertion, edema, dry skin, difficulty concentrating and increased risk of cardiovascular disease.Therefore, lifelong thyroid hormone supplementation is recommended for symptomatic and asymptomatic high-risk patients but it results in an inconvenience, poor compliance and frequent hospital visits and investigations [6][7][8].
Therefore, this study aims to develop a new NTCP model that integrates clinical and dosimetric data from a uniform VMAT or IMRT technique in Thai population.Furthermore, we compared our new model with the previously published models.

MATERIALS AND METHODS
We retrospectively collected data on NPC patients treated with definitive radiotherapy using the VMAT or IMRT technique 66-70 Gy in 30-35 fractions with or without CMT, between January 2010 and April 2019.Inclusion criteria included pathologically proven NPC, age ≥18 years, normal baseline TSH and free thyroxine (FT4) and at least 2 years of follow-up or sooner if RHT developed.Exclusion criteria were history of preexisting thyroid disease, abnormal or no baseline TSH and FT4, history of thyroid surgery and history of I-131 or radiation at the neck.This study was approved by the Institutional Review Board (IRB No. 836/62).

Outcome measurements
The primary endpoint was RHT, defined in three ways; subclinical hypothyroidism (high TSH with normal FT4), primary hypothyroidism (high TSH with low FT4) and central hypothyroidism (low or normal TSH with low FT4), according to our institutional reference range (Supplementary 1), regardless of symptoms.

RT and dosimetric analyses
Two or three planning target volumes (PTVs) were used in the radiotherapy planning.For the three PTVs plan, PTV-high risk (PTV-HR) was defined as gross tumor and gross pathologic lymph nodes (LNs) that received doses of 66-70 Gy.PTV-intermediate risk was defined as the subclinical disease that received prophylactic doses of 60-66 Gy.PTV-low risk was defined as the elective LN region (bilateral cervical LN Level II-V, VII) that received doses of 50-56 Gy.For the two PTVs plan, subclinical disease was included in PTV-HR.The prescribed doses were delivered in 30-35 fractions at the rate of 5 fractions per week.A simultaneous integrated boost (SIB) or sequential cone-down boost of 20 Gy to the PTV HR was selected by physician's decision.
The six MV photon beams were used with typical nine beam angles of the IMRT plan (220 • , 260 • , 300 • , 340 • , 20 • , 60 • , 100 • , 140 • and 180 • ) and three arcs for VMAT plans.RT was planned using the Eclipse treatment planning system (Eclipse version 6.5-15.0,Varian, Palo Alto, CA, USA).Radiation was delivered using Varian linear accelerator (Varian Medical Systems Inc., Palo Alto, CA, USA) with dynamic 80-120 Leaf multi-leaf collimators.Treatment verification included daily electronic portal images and at least weekly cone-beam CT.The criteria for dose optimization were 95% of PTV volumes receiving the prescribed dose and maximum dose receiving lower than 107%.Dose constraints for the organs at risk were according to institutional protocol.In practice, we did not apply specific dose constraints to the thyroid and pituitary gland.Thyroid and pituitary gland (pituitary fossa) were manually re-contoured in CT non-contrast images for all patients.Dosimetric data were obtained from a treatment planning system consisting of: thyroid volume, the maximum, mean and minimum dose to thyroid gland (D max , D mean , D min ), the percentage of thyroid volume receiving at least 25, 30, 35, 40, 45, 50, 55 and 60 Gy (V 25 , V 30 , V 35 , V 40 , V 45 , V 50 , V 55 and V 60 ), the absolute thyroid volume spared from 45, 50, 55 and 60 Gy (VS 45 , VS 50 , VS 55 and VS 60 ), which can also be calculated as VS x = (100−Vx (%)) 100 × thyroid volume (cm 3 ), as well as the maximum, mean and minimum dose to pituitary gland (P max , P mean and P min ).

Chemotherapy
Concurrent platinum-based CMT (cisplatin or carboplatin) was administered during radiation in Stage II-IV patients.Neoadjuvant or adjuvant CMT regimens were based on physician discretion, including platinum-based CMT (cisplatin or carboplatin), infusion fluorouracil (5FU), paclitaxel, or gemcitabine, given at 3-or 4-week intervals for a maximum of three cycles.

Statistical analyses
Clinical and dosimetric features between patients with or without RHT were compared using a χ 2 test or t-test.We used three steps in developing our NTCP model.First, univariate analysis was initially performed for all variables, followed by selecting significant variables at P-value of <0.10 to be considered in the multivariate analysis.A standard backward stepwise logistic regression method was used to obtain the NTCP model: where β 0 is a constant and β 1 , . . ., β n are the logistic regression coefficients of the variable x 1 , x 2 , . . ., x n , respectively.Pearson's correlations were also used to investigate the relationship between the variables.
Second, our model performance was investigated by calculating the area under the curve of receiver operating characteristic (AUROC) curves.An expected AUROC ≥ 0.7 was required to have a good predictive model.The Brier score was also used to evaluate the performance of our model that predicts a binary outcome.Finally, model calibration using the Hosmer-Lemeshow goodness-of-fit test was used, where a non-significant P-value >0.05 indicated good prediction for our model compared with the observed cases.
Third, internal model validation was performed with a k-fold crossvalidation.We used k = 10, with 9-fold as a training dataset to fit the model, and the remaining 1-fold as a test dataset to internally validate the model.Eventually, the external validation was performed in consecutive patients who were treated after the model development period.Statistical analyses were done using STATA 15.1 (StataCorp LLC, College Station, TX, USA).

RESULTS
Of the total 498 NPC patients, 298 were excluded due to less than a 2-year follow-up (148), no pretreatment (99) or an abnormal pretreatment thyroid function test (37), a preexisting thyroid disease (13) or a previous neck radiation (1).The remaining 200 patients (144 males and 56 females) with a mean age of 48.7 years (with a range 18-83 years) were analyzed.The RHT group tended to have younger ages, earlier T stages (T1-T2), more advance N stages (N2-3) and higher pretreatment TSH.The patient and treatment characteristics are shown in Table 1.
The mean dose, D min , D max and V x of the thyroid gland in RHT group were significantly higher than non-RHT group, while the VS 45 , VS 50 and VS 55 were significantly lower.The maximum correctly classified cutoff for the mean thyroid dose to differentiate between patients with no RHT and RHT group was 45 Gy.The incidence of RHT was 2 and 61% in the thyroid D mean of ≤45 Gy and >45 Gy subgroups, respectively.
Other dosimetric parameters, such as thyroid volume and pituitary exposure dose, were not found to be significant in this study.The mean thyroid volume was 14.8 cm 3 (15.1 vs 14.6 cm 3 , with a non-significant P-value = 0.64) and the mean pituitary dose was 53.6 Gy (55.5 vs 52.4 Gy, with a non-significant P-value = 0.22).In subgroup analysis, patients with central RHT received a significantly higher pituitary dose that those without central RHT (a median of P mean 72.3 Gy vs 57.4 Gy with a significant P-value = 0.009, and a median of P max 74.7 Gy vs 63.5 Gy with a significant P-value = 0.019) as shown in Supplementary 2.

NTCP model development
In the univariate logistic regression analyses, age, T and N stage, pretreatment TSH, D mean , D min , D max , V 35 , V 40 , V 45 , V 50 , V 55 , V 60 , VS 45 , VS 50 , and VS 55 of the thyroid gland were significantly associated with RHT at any time point.In the multivariate logistic regression analysis, age, pretreatment TSH and D mean were the only significant parameters, as shown in Table 2.
It was noted that the negative coefficient for age (−0.03) indicated that patients with an older age had a reduced risk of RHT.However, the positive coefficients for pretreatment TSH (0.91) and D mean (0.12) indicated that these two variables were positively associated with a risk of RHT.Pearson's correlation tests showed no significant correlations between age and D mean , but there were high correlations between the dose-volume variables of thyroid gland as shown in Supplementary 3.
T and N stages were significant predictors of RHT in univariate analysis, but not in multivariate analysis.The supraclavicular LN involvement was not significantly different between the non-RHT and RHT groups (6.8 vs 14.3% with a non-significant P-value = 0.107).Additionally, there was no difference of the presence of PTV 70 Gy in the supraclavicular region between the non-RHT and RHT groups (21.6 vs 32.5% with a non-significant P-value = 0.099).

Model performance and validation
Our NTCP model for predicting RHT had a good performance with an AUROC of 0.749 [95% confidence interval (CI) 0.677-0.821]and good calibration of expected and observed event rates (Fig. 1).The Hosmer-Lemeshow goodness-of-fit test was used to assess the performance of our NTCP model.There was a good degree of fit with a non-significant P-value = 0.837.Internal validation was performed with a 10-fold cross-validation and resulted in a mean AUROC of 0.764 (95% CI 0.634-0.795).The model performance on external validation dataset (n = 25) was excellent with an AUROC of 0.812.We compared our model to two previously reported NTCP models for predicting   D mean = mean dose of the thyroid (Gy), P mean = mean dose of the pituitary (Gy), V x = volume of thyroid gland receiving x Gy or more (%), VS x = absolute thyroid volume spared from x Gy or more (cm 3 ).a The multivariate results were obtained using a requirement of a P-value <0.10 from the univariate and backward stepwise logistic regression analyses to select the final model.
RHT in NPC patients, Luo [12] and Shen [16], and revealed the low AUROC in both training and external validation dataset (Table 3).

DISCUSSION
We developed a new NTCP model as a useful tool to estimate the risk of hypothyroidism after radiation in NPC patients treated with IMRT or VMAT using a standard multivariate logistic regression analysis.The final model with three significant variables comprised of age, pretreatment TSH, and D mean of thyroid gland.Although there were high correlations between dose-volume variables of the thyroid gland, our model selected D mean thyroid because it represented the best predictor as shown in the multivariate analysis and was easily obtained from the treatment planning system.The model performance of our NTCP model was good in both training and validation dataset.
Our results confirmed that thyroid exposure dose was clearly associated with the risk of RHT, which was consistent with the results of previous studies [9][10][11][12][13][14][15][16][17][18].The volume of thyroid that spared from high dose was a recent indicator.Chyan et al. indicated that if thyroid VS 45 ≥3 cm 3 , then RHT rates at 3 years post-RT was reduced (38 vs 55%) [19].Lertbutsayanukul et al. discovered that VS 60 ≥10 cm 3 was associated with lower risk of RHT (2-year incidence of 68.4 vs 40.4%) and recommended for treatment planning [11].In this study, we also found the association of VS 45 and VS 60 with the risk of RHT in a univariate analysis; however, they had strong correlation with D mean and were not included in the final model.Vogelius et al. revealed a 50% risk of RHT at a dose of >45 Gy [20], and Kim et al. suggested a V 45 of 50% as a possible dose-volumetric threshold of RHT [21].Lin et al. reported that >80% of the RHT patients received D mean of over 43 Gy [22].Similarly, our study also concluded that a D mean >45 Gy was associated with an increased risk of developing RHT, thus we recommended using this constraint for the treatment planning.
Age had a negative effect on the risk of RHT in our patients, which suggested that younger patients were more susceptible to develop RHT than older patients.The result agreed with previous reports [9,10] and the finding in Sommat et al. who showed that the odds of RHT decreased by a factor of 0.94 with each additional year of age [23].This implied that radiosensitivity of the thyroid gland decreased with age.
TSH is one of the vital endocrine hormones in the hypothalamicpituitary-thyroid axis and plays an important role to maintain thyroid homeostasis.However, the association between pretreatment TSH level and RHT is scarce.Ronjom et al. indicated that increasing baseline TSH was a significant risk factor for developing RHT in a univariate analysis, but not in a multivariate analysis [17,18].Lertbutsayanukul et al. demonstrated that pretreatment TSH levels of ≥1.55 μU/ml was associated with smaller thyroid volume and significantly increased risk of developing RHT by 2.5 times [11].It was postulated that the small thyroid gland might have less functional subunit to produce thyroid hormone and send negative feedback by increasing TSH level from the pituitary gland.Our finding was in line with these studies and included the pretreatment TSH in the final NTCP model.However, it was worth noting that the absolute TSH level was different according to the laboratory units, so the model needed to be calibrated prior to application in another population.
There were two studies from China developing the NTCP models for predicting RHT in NPC patients.Luo et al. reported the incidence of RHT in 39/174 patients (22.4%) and 81.6% of them were treated with IMRT technique [12].The predictors in NTCP model were female gender, use of CMT, radiation dose to thyroid gland (V 50 ) and pituitary gland (P max ).Another model developed by Shen et al. from 69 IMRT-treated NPC patients consisted of two parameters, pretreatment thyroid volume and D mean thyroid [16].In comparison to their studies, our patient population were also in an endemic area of NPC and had similar proportion of locally advanced disease.However, there was a difference in treatment approach.For example, neoadjuvant CMT was routinely administered in China but it is selected in very high-risk patients in Thailand.In addition, CMT regimens in Luo et al. consisted of neoadjuvant cisplatin and docetaxel combined with concurrent and adjuvant cisplatin and 5-fluorouracil [12], whereas concurrent cisplatin combined with adjuvant cisplatin and 5-fluorouracil was mostly administered in our patients.Hence, in the external validation process of those two models [12,16] using our data, the established NTCP models did not perform well, as shown by low AUROC values.The reduction in the performance of the previous two NTCP models is probably due to the differences in treatment strategies.
Higher N stage was a significant factor in univariate analysis but not in multivariate analysis.We hypothesized that patients with advanced nodal disease had larger nodes or had lower neck involvement (N3b).Thus, they received higher radiation doses to the supraclavicular region that was adjacent to the thyroid gland.However, we found no difference in supraclavicular LN involvement or PTV 70 Gy in the supraclavicular region between the non-RHT and RHT groups.This could be due to the small number of cases to demonstrate the difference.
In our study, seven patients had central RHT and one of them was initially diagnosed with subclinical RHT and later progressed to central RHT.The median time to develop central RHT was 40 months (with a range 18-97 months).Pituitary dose was significantly higher in central RHT group.This finding was contradicted to Luo et al. who reported that a higher pituitary dose was a protective factor of biochemical RHT by preventing TSH from rising [12].But it should be noted that the median follow-up time in their study was 24 months which was in the period of a decrease of TSH level in NPC patients who were treated with radiotherapy [24].The incidence of central RHT might increase after a longer period of observation.Consequently, we suggested to constrain pituitary doses as low as possible while not compromising target dose to achieve optimal disease control.
Several strengths of our study included the uniform treatment including radiotherapy technique and regimen as well as CMT use, the relatively large number of patients with longer follow-up periods and our comprehensive analyses of dose-volume variables.Our patients were treated with IMRT or VMAT techniques with either SIB technique or sequential/two-step boost.Both techniques resulted in an inhomogeneous radiation dose to the tissues; therefore, our NTCP model could be applied to all IMRT patients.
The retrospective design of our study resulted in missing data, which could possibly lead to a selection bias.However, we attempted to reduce the bias by recruiting all consecutive patients in the study period and clearly defining inclusion and exclusion criteria.It should also be noted that the data used for our study were gathered exclusively from a single institution.Therefore, larger dataset and external validation of

CONCLUSION
RHT is a dose-limiting late toxicity in NPC survivors and can be predicted using our multivariable NTCP model.We demonstrated that our three-variable model, comprising of age, pre-treatment TSH, mean thyroid dose, was effective in predicting RHT in both internal and external validation cohort.Furthermore, we recommended keeping the mean thyroid dose below 45 Gy to minimize the risk of toxicity.
WHO = World Health Organization, RT = radiotherapy, TFT = thyroid function test, D min = minimal dose of the thyroid (Gy), D mean = mean dose of the thyroid (Gy), D max = maximum dose of the thyroid (Gy), P min = minimal dose of the pituitary (Gy), P mean = mean dose of the pituitary (Gy), P max = maximum dose of the pituitary (Gy), V x = volume of thyroid gland receiving x Gy or more (%), VS x = absolute thyroid volume spared from x Gy or more (cm 3 )

Fig. 1 .
Fig. 1.The area under the receiver operating characteristic (AUROC) curves (A) and calibration plot (B) of our NTCP model for predicting RHT on nasopharyngeal cancer patients (n = 200).

Table 1 .
Patient characteristics

Table 2 .
Univariate and multivariate analyses for RHT

Table 3 .
Model performance in our training dataset and external validation dataset our model with NPC patients from other institutions are future work that would be useful.