Evaluation of deep learning-based deliverable VMAT plan generated by prototype software for automated planning for prostate cancer patients

Abstract This study aimed to evaluate the dosimetric accuracy of deep learning (DL)-based deliverable volumetric modulated arc therapy (VMAT) plans generated using a DL-based automated planning assistant system (AIVOT, prototype version) for patients with prostate cancer. The clinical VMAT plans (cliDose) of 68 patients with prostate cancer treated with VMAT (70–74 Gy/28–37 fr) at our hospital were used (n = 55 for training and n = 13 for testing). First, an HD-U-net-based 3D dose prediction model implemented in AIVOT was customized using the VMAT data, and this model was used to generate a predicted 3D dose distribution (preDose). Second, deliverable VMAT plans (deliDose) were created using AIVOT, the radiation treatment planning system Eclipse (version 15.6) and vendor-supplied objective functions. Finally, we compared the two DL-based dose distributions, preDose and deliDose, with cliDose. The average absolute dose difference of all DVH parameters for the target between cliDose and deliDose across all patients was 1.32 ± 1.35% (range: 0.04–6.21%), while that for all the organs at risk was 2.08 ± 2.79% (range: 0.00–15.4%). The deliDose was superior to the cliDose in all DVH parameters for the bladder and rectum. The blinded plan scores of deliDose and cliDose were 4.54 ± 0.50 and 5.0 ± 0.0, respectively (all plans scored ≥4 points; P = 0.03). This study demonstrated that DL-based deliverable plans for prostate cancer achieved a clinically acceptable level. Thus, the AIVOT software showed potential for automated planning with no intervention for patients with prostate cancer.


INTRODUCTION
Highly conformal radiation therapies, such as intensity-modulated radiotherapy (IMRT) and volumetric-modulated arc radiotherapy (VMAT), can improve the conformity of the dose distribution to the planning target volume (PTV) and reduce the dose to the organs at risk (OARs) [1,2]. These technologies provide complex dose distributions with sharp gradients. IMRT/VMAT plans require an inverse planning process to search for treatment machine parameters that yield clinically acceptable dose distributions [3]. Inverse planning involves multiple parameter adjustments and dose calculations. Owing to this iterative nature, even experienced planners can spend hours tuning the optimization parameters and running the optimization algorithm multiple times to create a clinically acceptable plan. In addition, this skill-based planning technique causes variations in plan quality among planners [4].
A typical DL dose prediction technique uses a convolutional neural network model that receives a 2D or 3D input in the form of PTV/OAR contours, with or without the planning computed tomography (CT) images, and produces a voxel-level predicted dose distribution (preDose) as its output [8-10].
Nguyen et al. assessed the feasibility of 2D-U-net-based 3D voxel-level dose prediction for patients with prostate cancer. The prediction error of OARs was <5% [11]. Recently, Gronberg et al. proposed a 3D densely connected U-net with dilated convolutions and reported that the prediction error of PTV was within 3% for patients with head and neck cancer [12]. These studies demonstrated that DL-based 3D dose prediction exhibited reasonable prediction accuracy for various clinical sites. However, it remained unclear whether the preDose could be reproduced in a radiotherapy treatment planning system (TPS), i.e. whether it corresponds to a deliverable plan. To solve this issue, several methods were proposed. Fan et al. performed automatic treatment planning using preDose with a simple objective function, applied uniformly to all voxels, implemented in an open source cross-platform radiation treatment planning toolkit (matRad) [13]. As they used an open source TPS to create the RT plan, the plan could not be delivered on an actual RT machine [13]. Xia et al. proposed using the DVH metrics generated from the preDose as objective functions for automatic RT planning [14]. They used a commercial TPS (Pinnacle) and relied on DVH metrics for inverse planning [14], reporting that optimization may not be fully achieved using preDose. Miki et al. proposed automatic RT planning using dose-based structures created from preDose [7]. This method can be applied to the inverse planning methods of all commercially available TPSs. However, creating multiple dummy regions of interest (i.e. dose-based structures) and tuning the objective functions for inverse planning is laborious.
Recently, a prototype of a commercial AI-based auto planning assistant system (AIVOT, AiRato.Inc., Yokohama, Japan) was released. This system can efficiently generate multiple dose-based structures from preDose. Furthermore, it can generate a deliverable plan that reproduces the preDose using vendor-supplied objective functions. Thus, this study aimed to clarify the dosimetric accuracy of a DL-based deliverable plan created using AIVOT for patients with prostate cancer.

Patient characteristics
This retrospective study was approved by our institutional review board (2022-1-220) and included 68 patients who received radiotherapy with VMAT in our hospital (training, n = 55; test, n = 13). All patients were treated in our hospital from 2018 to 2022. The total dose was 70-74 Gy/28-37 fractions at the discretion of the radiation oncologists. Planning CT images were acquired using a SOMATOM Definition AS+ scanner (Siemens, Forchheim, Germany) with a matrix size of 512 × 512, slice thickness of 2.0 mm and pixel size of 1.27 mm.
All contours were delineated by radiation oncologists. The clinical target volume (CTV) was delineated to include the prostate with/without the base of the seminal vesicles. The PTV was created by adding a 6-mm margin to the CTV in all directions except the posterior direction, where the margin was 5 mm. The PTV was divided into two parts: PTV1 was the volume obtained by excluding the rectum from the PTV, while PTV2 was the overlap volume of the PTV and rectum. The dose was prescribed so that 95% of PTV1 received the prescribed dose (i.e. D95@PTV1). The rectum was contoured on the CT images at the level of the PTV and up to 6 mm outside of the PTV.
The bladder was entirely contoured. The beam arrangement of one full arc VMAT was the same in all the patients.
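The split of the PTV into PTV1 and PTV2 described above amounts to two set operations on the contoured volumes. The following is a minimal sketch, assuming each structure is represented as a set of voxel indices; the function name `split_ptv` and the toy geometry are hypothetical and are not taken from the TPS or from AIVOT.

```python
# Hypothetical sketch: each structure is a set of (x, y, z) voxel indices.

def split_ptv(ptv, rectum):
    """Return (PTV1, PTV2) as defined in the text.

    PTV1: PTV with the rectum excluded.
    PTV2: overlap volume of the PTV and the rectum.
    """
    ptv1 = ptv - rectum  # voxels in the PTV but not in the rectum
    ptv2 = ptv & rectum  # voxels shared by the PTV and the rectum
    return ptv1, ptv2


# Toy single-slice geometry (illustrative values only).
ptv = {(x, y, 0) for x in range(4) for y in range(4)}        # 16 voxels
rectum = {(x, y, 0) for x in range(3, 6) for y in range(4)}  # 12 voxels
ptv1, ptv2 = split_ptv(ptv, rectum)
print(len(ptv1), len(ptv2))
```

By construction, PTV1 and PTV2 are disjoint and their union is the original PTV, which is what allows separate objectives to be assigned to each part.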

Treatment planning
For each patient, a VMAT plan with one full arc was created using the Eclipse commercial treatment planning system version 15.6 (Varian Medical Systems, Palo Alto, USA). All plans were calculated using the Acuros XB algorithm. The dose grid size was 2 mm. The final contours and treatment plans were carefully reviewed and approved by our radiotherapy team, comprising experienced radiation oncologists and physicists. The dose constraints for each plan are summarized in Table 1.
The clinically approved VMAT plans (cliDose) were used as 'ground truth' plans for training and testing. It should be noted that the training data were checked again by experienced medical physicists, and treatment plans that met the clinical protocol but were not of high quality were replanned by experienced physicists. That is, in practice, some clinical plans, although fit for clinical use, could be fine-tuned further in certain DVH parameters (such as those of the rectum, bladder and PTV) toward the best achievable values (the limits of such fine-tuning are not fully known, rendering it impossible to predict them in advance). Sasaki et al. showed that plan quality may be improved using commercial software that presents this limit point [15]. In addition, the clinical plans whose normalization differed from the clinical protocol were renormalized (from D95@PTV to D95@PTV1) to match the protocol without majorly changing the plan quality. The dosimetric data in the training and test clinical plans are shown in Table 2.
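The renormalization from D95@PTV to D95@PTV1 can be illustrated with a short sketch. It assumes that the dose in a structure is available as a flat list of voxel doses in Gy and that renormalization is a global rescaling of the dose; the function names and the sample values are hypothetical, and the actual TPS operation may differ.

```python
import math


def dose_at_volume(doses, volume_percent):
    """Minimum dose received by the hottest `volume_percent` of voxels (e.g. D95)."""
    ordered = sorted(doses, reverse=True)
    n_covered = max(1, math.ceil(len(ordered) * volume_percent / 100.0))
    return ordered[n_covered - 1]


def renormalize(doses, reference_doses, prescription, volume_percent=95):
    """Scale `doses` so that D<volume_percent> of the reference structure equals the prescription."""
    factor = prescription / dose_at_volume(reference_doses, volume_percent)
    return [d * factor for d in doses], factor


# Illustrative voxel doses (Gy) inside PTV1; not patient data.
ptv1_doses = [72.0 + 0.02 * i for i in range(100)]
d95_before = dose_at_volume(ptv1_doses, 95)
scaled, factor = renormalize(ptv1_doses, ptv1_doses, prescription=74.0)
d95_after = dose_at_volume(scaled, 95)
print(round(d95_before, 2), round(d95_after, 2), round(factor, 4))
```

The same scaling factor would be applied to the whole dose grid, so every DVH parameter shifts proportionally with the normalization point.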

Creation of preDose
An overview of the auto planning workflow using AIVOT is shown in Fig. 1. The workflow comprised six steps: step 1, importing the planning CT and the target/OAR structures from the TPS; step 2, setting a DL model and linking the imported structure names to the preset structure names; step 3, creating preDose via DL; step 4, creating preDose-based structures and exporting these dose-based structures to the TPS; step 5, inverse planning (optimization) using vendor-supplied objective functions; and step 6, obtaining deliDose following the optimization. In step 1, the planning CT and the target/OAR structures were imported in DICOM format. In step 2, it was necessary to associate each contour name preset in AIVOT with the corresponding contour name in the input structure set. The PTV, rectum, bladder, small bowel, right femoral head, left femoral head and body can be used for DL model setting. Subsequently, a prescription dose was set in the DL model (i.e. D95 of PTV1). The DL model architecture of AIVOT was based on the Hierarchically Densely Connected U-net (HD U-net) proposed by Nguyen et al. [6]. Overall, 55 cases were used in the learning process. After completing the learning process, the 13 held-out cases were used in the testing process. Additional details are held confidential by the vendor and are not accessible. After setting the DL model, DL was performed to obtain the preDose (step 3).
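The name association in step 2 can be pictured as a lookup from preset contour names to the names found in the imported structure set. The sketch below is purely illustrative: the preset list is taken from the text, but the function `associate`, the alias table and the example structure names are hypothetical and do not represent the AIVOT interface, whose details are not public.

```python
# Preset contour names listed in the text; everything else is hypothetical.
PRESET_NAMES = [
    "PTV", "Rectum", "Bladder", "Small bowel",
    "Right femoral head", "Left femoral head", "Body",
]


def associate(preset_names, structure_set_names, aliases):
    """Map each preset name to a contour in the structure set, or None if absent."""
    available = {name.lower(): name for name in structure_set_names}
    mapping = {}
    for preset in preset_names:
        candidates = [preset] + aliases.get(preset, [])
        mapping[preset] = next(
            (available[c.lower()] for c in candidates if c.lower() in available),
            None,
        )
    return mapping


# Example structure set as it might be named in a clinic (assumed names).
structure_set = ["PTV", "rectum", "BLADDER", "Femur_R", "Femur_L", "BODY"]
aliases = {"Right femoral head": ["Femur_R"], "Left femoral head": ["Femur_L"]}
mapping = associate(PRESET_NAMES, structure_set, aliases)
missing = [name for name, match in mapping.items() if match is None]
print(mapping)
print("unmatched:", missing)
```

Reporting unmatched presets explicitly makes it clear which inputs the model would have to run without.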

Inverse planning using preDose-based structures with vendor-supplied objective functions
Inverse planning was performed using the dose-based structures created from preDose and exported from AIVOT, together with the vendor-supplied objective functions for the target/OARs. The optimization was performed only once. The beam settings were the same as in the training/validation data (i.e. one full arc VMAT). The vendor-supplied objective functions are shown in Table 3. In addition, a normal tissue objective function (priority: 80; distance from the target border: 0 cm; start dose: 100%; end dose: 85%; fall-off: 1) was used. DeliDose was created using the same objective functions in all the patients.
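The normal tissue objective settings quoted above can be read as a dose limit that decreases with distance from the target border. The sketch below assumes the exponential fall-off form commonly described for this type of objective; the source does not state the formula, so the functional form, the function name `nto_dose_limit` and the assumption that the fall-off parameter is expressed per unit of the distance argument are all assumptions for illustration only.

```python
import math


def nto_dose_limit(distance, start_dose=100.0, end_dose=85.0,
                   fall_off=1.0, start_distance=0.0):
    """Assumed dose limit (% of prescription) at `distance` from the target border.

    Inside `start_distance` the limit is the start dose; beyond it the limit
    decays exponentially from the start dose toward the end dose.
    Units of `distance` and `fall_off` are assumed to be mutually consistent.
    """
    if distance < start_distance:
        return start_dose
    decay = math.exp(-fall_off * (distance - start_distance))
    return start_dose * decay + end_dose * (1.0 - decay)


# Evaluate the assumed limit at a few distances using the settings from the text.
profile = [(d, round(nto_dose_limit(d), 2)) for d in (0.0, 0.5, 1.0, 2.0, 5.0)]
print(profile)
```

Under this assumed form, a start dose of 100% at 0 cm and an end dose of 85% mean the limit begins at the prescription level on the target border and approaches 85% farther away.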

Evaluation
The DVH parameters used for dose constraints in clinical practice (Table 1) were calculated for preDose and deliDose. For the quantitative evaluation of dose distribution, we calculated the isodose volume Dice similarity coefficient (DSC) of the dose distribution (dose interval: 20%), the conformity index (CI) and the gradient index (GI) [16,17]. CI is defined as CI = (TV_RI)^2 / (TV × V_RI), where TV is the target volume, TV_RI is the target volume covered by the prescription isodose and V_RI is the total volume covered by the prescription isodose. GI is defined as the ratio of the volume enclosed by the 50% isodose line to the volume enclosed by the prescription isodose line. A blinded scoring of the deliDose and cliDose plans was performed by an expert radiation oncologist. Plan rating was performed using an assessment form.
In addition, using a 3D diode array detector (ArcCHECK, Sun Nuclear, Melbourne, FL, USA), we performed patient-specific QA by evaluating the global gamma passing rate (%/mm) for the absolute dose at the set criteria of 3%/2 mm. The dose threshold was 10%. To ensure the positional accuracy of the setup, a 10 × 10 cm² field defined by the collimator jaws was irradiated for all measurements. We assessed the VMAT plan complexity using modulation complexity scores (MCSv) to evaluate the multileaf collimator movement during the delivery of VMAT plans [18,19]. Statistical differences between the plans were determined via the Wilcoxon signed-rank test using JMP version 16.0.0 (SAS Institute, Cary, USA). Differences with a P-value of <0.05 were considered significant.
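The three dose distribution metrics can be sketched on a voxelized dose as follows. This is a minimal illustration under stated assumptions: CI is taken in the Paddick form, which matches the three volumes TV, TV_RI and V_RI defined in the text; the isodose DSC is assumed to compare voxels whose dose falls in the same 20% band in the two plans; and the dose arrays, the function names and the toy values are hypothetical.

```python
def conformity_index(dose, target_mask, prescription):
    """Paddick-type CI = TV_RI^2 / (TV * V_RI), counted in voxels."""
    tv = sum(1 for t in target_mask if t)
    v_ri = sum(1 for d in dose if d >= prescription)
    tv_ri = sum(1 for d, t in zip(dose, target_mask) if t and d >= prescription)
    return tv_ri ** 2 / (tv * v_ri)


def gradient_index(dose, prescription):
    """GI = volume receiving >= 50% of prescription / volume receiving >= prescription."""
    v50 = sum(1 for d in dose if d >= 0.5 * prescription)
    v100 = sum(1 for d in dose if d >= prescription)
    return v50 / v100


def isodose_dsc(dose_a, dose_b, prescription, low_pct, high_pct):
    """Dice coefficient of the voxels lying in the [low_pct, high_pct) dose band."""
    lo = prescription * low_pct / 100.0
    hi = prescription * high_pct / 100.0
    a = {i for i, d in enumerate(dose_a) if lo <= d < hi}
    b = {i for i, d in enumerate(dose_b) if lo <= d < hi}
    if not a and not b:
        return 1.0  # both bands empty: treated as perfect agreement (assumption)
    return 2 * len(a & b) / (len(a) + len(b))


# Toy flattened dose grids in % of prescription (illustrative values only).
cli = [100, 100, 100, 100, 60, 60, 30, 10]
deli = [100, 100, 100, 90, 60, 40, 30, 10]
target = [1, 1, 1, 0, 0, 0, 0, 0]

ci = conformity_index(cli, target, prescription=100)
gi = gradient_index(cli, prescription=100)
dsc_60_80 = isodose_dsc(cli, deli, 100, 60, 80)
print(ci, gi, round(dsc_60_80, 3))
```

With equal-sized voxels, counting voxels is equivalent to using volumes, since the voxel volume cancels in every ratio.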

DISCUSSION
We evaluated the dosimetric accuracy of DL-based deliverable plans created using AIVOT for patients with prostate cancer. AI-based predicted and deliverable plans were within 6% of dose difference in all the DVH parameters of the target and OARs in all the patients compared with the clinical plan. The AIVOT software exhibited a great potential for automated planning with no intervention for patients with prostate cancer.
In all patients, the absolute dose differences of the preDose and deliDose VMAT plans from the clinical plan were within 6% for all the DVH parameters of the target and OARs. Regarding the preDose, Nguyen et al. evaluated the prediction accuracy of preDose for patients with head and neck cancer using the same DL model (i.e. HD-U-net) as AIVOT. They reported that the prediction error of the OARs was within 6.3% [6]. Our result is consistent with theirs (our data: 5.43%). Miki et al. evaluated deliDose prepared in a similar manner in patients with head and neck cancer; the difference in the mean dose of the OARs between deliDose and cliDose was approximately within 4-5 Gy. Our results showed better accuracy than theirs. This may be due to differences in the clinical site (e.g. prostate vs head and neck cancer), the consistency of treatment plan quality and the DL architecture (e.g. U-net vs HD-U-net).
Regarding the plan quality of deliDose, the deliDose was superior to the cliDose in all DVH parameters for the bladder and rectum (Table 4). These results indicated that DL-based automated planning may produce a better plan than a clinically accepted plan in terms of OAR sparing. This may be due to the data cleansing for the DL model, in which the treatment plans that met the clinical protocol but were not of high quality were replanned. The blinded physician plan score was >3 across all the treated patients. A physician plan score of >3 is the criterion for clinical use for the deliverable and clinical VMAT plans. However, the average physician plan score for the clinical VMAT plan was significantly higher than that for the deliverable plan (5.0 vs 4.54). In case b (Fig. 3), which is a representative case wherein deliDose is inferior to cliDose, there was no major difference in DVH parameters (within 3%); however, there was a slight difference in GI (deliDose: 3.6 vs cliDose: 3.89). The isodose DSC also showed only moderate agreement in the dose ranges <60% (i.e. 0-20%: 0.56; 20-40%: 0.62; 40-60%: 0.53; 60-80%: 0.60; 80-100%: 0.76; and 100-120%: 0.94). The lower physician plan score of the deliverable plan may be due to several factors, such as (i) a slight increase in hot spots (dose >105%) in the PTV (although deliDose had the steeper dose gradient) and (ii) the deviation in the dose distribution (deviations observed at <60%). The VMAT plan complexity and reproducibility with the linear accelerator were similar for the deliverable and clinical plans; MCSv and the gamma passing rate were similar. Plans created using AI tend to be complicated. Kubo et al. evaluated the plan complexity (MCSv) using RapidPlan (RP) (Varian Medical Systems, a knowledge-based planning system that uses machine learning) [20]. They reported that MCSv for RP was 0.25 ± 0.02 and that for the clinical plan was 0.35 ± 0.03, indicating that RP may produce more complex plans owing to an increase in smaller segments. This is not consistent with our result. This discrepancy may be due to differences in the optimization process employed by AIVOT and RP. The optimization calculation in AIVOT probably converges more easily because of the preDose-based structure inputs in the TPS. Therefore, this simple method may lead to lower complexity than RP.
Taken together, our results indicate that the deliDose VMAT planning module based on AIVOT DL can be introduced for clinical application as a DL-based automated VMAT planning module to treat prostate cancer. To use this system widely, it may be necessary to adapt it to various planning strategies. Despite the existence of nationally accepted practice guidelines, actual clinical practice is rarely defined in black and white [21]. Kandalan et al. evaluated a method for adapting the DL model to different treatment planning practices with minimal input data. To utilize this system in more facilities, it may be necessary to develop a way to adapt a standard AI model built with large-scale data to different strategies using a small amount of data.
The limitations of this study are as follows. First, the feasibility of the automatic planning workflow was assessed for one TPS (i.e. Eclipse). Second, the impact of SpaceOAR (Boston Scientific, Marlborough, USA) on AIVOT was not evaluated. Because SpaceOAR is not administered in our hospital to patients with prostate cancer who are treated with conventionally fractionated radiotherapy, patients with SpaceOAR were not evaluated. Third, significant differences were observed between some DVH parameters in the training and test data, probably because replanning of the training cases improved their plan quality. Fourth, RP is already in clinical use as an AI-based optimization calculation system [22,23]. Currently, RP supports only Eclipse, but AIVOT can, in principle, support any TPS. In the future, we plan to further verify this point by comparing the performance of RP and AIVOT. Fifth, the parameters of the preDose-based structures were set based on vendor recommendations. As these settings affect the quality of the deliverable plans, their impact on plan quality should be evaluated in future studies.

Table 2. Dosimetric data for training and test data. PTV1 = PTV without rectum, PTV2 = overlap volume between PTV and rectum.

Table 3. List of objective functions for optimization. PTV1 = PTV without rectum, PTV2 = overlap volume between PTV and rectum.

Table 5. Summary of relative and absolute differences in DVH parameters between preDose or deliDose and cliDose. PTV1 = PTV without rectum, PTV2 = overlap volume between PTV and rectum, preDose = predicted dose distribution, deliDose = deliverable dose distribution, cliDose = clinical dose distribution.
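The relative and absolute differences summarized in this table can be computed per DVH parameter and then averaged. The sketch below assumes that the relative difference is expressed as a percentage of the clinical value and that the absolute difference is its magnitude, reported as mean ± sample standard deviation; these definitions, the function names and the sample values are assumptions for illustration and are not data from the study.

```python
import statistics


def dvh_differences(test_values, clinical_values):
    """Return (relative, absolute) differences in % of the clinical value."""
    relative = [(t - c) / c * 100.0 for t, c in zip(test_values, clinical_values)]
    absolute = [abs(r) for r in relative]
    return relative, absolute


def mean_sd(values):
    """Mean and sample standard deviation (assumed convention)."""
    return statistics.mean(values), statistics.stdev(values)


# Illustrative DVH parameter values (Gy) for one structure; not patient data.
deli_values = [74.0, 30.0, 20.0]
cli_values = [74.0, 32.0, 19.0]

relative, absolute = dvh_differences(deli_values, cli_values)
mean_abs, sd_abs = mean_sd(absolute)
print([round(r, 2) for r in relative])
print(f"{mean_abs:.2f} ± {sd_abs:.2f} %")
```

Keeping the signed relative difference alongside the absolute one shows whether the deliverable plan tends to be higher or lower than the clinical plan, which the magnitude alone cannot show.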