Postoperative outcomes in oesophagectomy with trainee involvement

Abstract
Background: The complexity of oesophageal surgery and the significant risk of morbidity necessitate that oesophagectomy is predominantly performed by a consultant surgeon, or a senior trainee under their supervision. The aim of this study was to determine the impact of trainee involvement in oesophagectomy on postoperative outcomes in an international multicentre setting.
Methods: Data from the multicentre Oesophago-Gastric Anastomosis Study Group (OGAA) cohort study were analysed, which comprised prospectively collected data from patients undergoing oesophagectomy for oesophageal cancer between April 2018 and December 2018. Procedures were grouped by the level of trainee involvement, and univariable and multivariable analyses were performed to compare patient outcomes across groups.
Results: Of 2232 oesophagectomies from 137 centres in 41 countries, trainees were involved in 29.1 per cent (n = 650), performing only the abdominal phase in 230, only the chest and/or neck phases in 105, and all phases in 315 procedures. For procedures with a chest anastomosis, those with trainee involvement had similar 90-day mortality, complication and reoperation rates to consultant-performed oesophagectomies (P = 0.451, P = 0.318, and P = 0.382, respectively), while anastomotic leak rates were significantly lower in the trainee groups (P = 0.030). Procedures with a neck anastomosis had equivalent complication, anastomotic leak, and reoperation rates (P = 0.150, P = 0.430, and P = 0.632, respectively) in trainee-involved versus consultant-performed oesophagectomies, with significantly lower 90-day mortality in the trainee groups (P = 0.005).
Conclusion: Trainee involvement was not found to be associated with significantly inferior postoperative outcomes for selected patients undergoing oesophagectomy. The results support continued supervised trainee involvement in oesophageal cancer surgery.


Introduction
Oesophagectomy is associated with significant postoperative morbidity and mortality, with over 60 per cent of patients experiencing a postoperative complication, and reported 90-day mortality rates of almost 5 per cent [1-3]. The complexity of oesophageal surgery and the significant risk of negative outcomes necessitate that oesophagectomy is predominantly performed by a consultant surgeon, or a senior trainee under direct supervision.
Current evidence on the impact of trainee involvement in oesophagectomy is predominantly limited to single-centre, small-volume retrospective series, and analyses of the American College of Surgeons' National Surgical Quality Improvement Program (NSQIP) database. These studies have suggested that, within structured supervised training, trainee input does not negatively impact on outcomes [4-9]. However, despite these findings, concerns remain around involving trainees in oesophagectomy, as other evidence from a variety of complex procedures from different surgical specialties has suggested increased morbidity with trainee involvement. For example, trainee involvement in major lower limb amputation is associated with increased major morbidity, increased operative time, and an increased need for intraoperative transfusions [10]. Similar findings have also been reported for appendicectomy, cholecystectomy, and bariatric procedures, with studies evaluating combined trainee-performed and trainee-supervised procedures suggesting that trainee involvement increased morbidity, operative time, and length of hospital stay [11,12]. In light of these findings, greater evaluation of the impact of trainee involvement in oesophagectomy is required, to determine the effect of training on patient outcomes.
Some countries publish surgeon-specific outcome data that are freely available to the public [13,14]. Although this leads to greater accountability, and the ability to compare outcomes across units, it could potentially create an environment where training opportunities are diminished due to fears that this could negatively impact on published outcome data [15,16]. As such, it is important to identify whether trainee involvement in oesophagectomy impacts patient outcome to dispel these fears, and ensure that the next generation of surgeons receives adequate training opportunities, in order to provide continued high-quality oesophageal surgery in the future.
The Oesophago-Gastric Anastomosis Audit (OGAA) was an international multicentre cohort study, investigating perioperative outcomes for patients undergoing oesophagectomy for oesophageal cancer [1,17,18]. The aim of this present study was to use the data from the OGAA

Outcome measures
The primary aim of the study was to assess the impact of trainee involvement on postoperative mortality, defined as death within 90 days of surgery. Secondary outcomes included the rates and grades of either anastomotic leak or conduit necrosis, complication rates, length of stay, need for reoperation, and 30-day mortality. Complications were defined by the Esophageal Complications Consensus Group (ECCG) framework [19], and were classified based on the Clavien-Dindo grade; the overall complication and major (grade III-V) complication rates were analysed as separate outcomes. All outcomes were analysed separately by the anastomosis location, as both operative difficulty and patient outcomes are known to differ between procedures with chest and neck anastomoses [1,20].
Tumour staging was performed in accordance with the TNM eighth edition [21]. Positive longitudinal and circumferential tumour margins in the OGAA were defined as tumour identifiable 1 mm or less from the resection margin, in accordance with the Royal College of Pathologists guidance [22].
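As an illustration of how the outcome definitions above could be operationalized, the following minimal Python sketch derives 30-day and 90-day mortality flags and a major-complication flag. It is not the study's code: the function names, the inclusive day count, the handling of Clavien-Dindo sub-grades, and the patient record are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# Clavien-Dindo grades treated as "major" (grade III-V); sub-grades are
# included here as an assumption about how grades might be recorded.
MAJOR_GRADES = {"III", "IIIa", "IIIb", "IV", "IVa", "IVb", "V"}


def died_within(surgery: date, death: Optional[date], days: int) -> bool:
    """True if death occurred within `days` days of surgery (inclusive, assumed)."""
    if death is None:
        return False
    return 0 <= (death - surgery).days <= days


def is_major_complication(grade: Optional[str]) -> bool:
    """True for Clavien-Dindo grade III-V; None means no complication recorded."""
    return grade in MAJOR_GRADES


# Hypothetical patient record, for illustration only.
surgery_date = date(2018, 5, 1)
death_date = date(2018, 6, 15)  # 45 days after surgery

print(died_within(surgery_date, death_date, 30))  # False
print(died_within(surgery_date, death_date, 90))  # True
print(is_major_complication("IIIb"))              # True
print(is_major_complication("II"))                # False
```

Keeping the time window as a parameter lets the same function serve both the primary (90-day) and secondary (30-day) mortality outcomes.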
Ethical approval and data sharing for OGAA
Ethical approval was dependent on local protocols and was country specific. It was the responsibility of the local principal investigator of the enrolled unit to ensure appropriate ethical or audit approval was gained prior to commencement of the study. In the UK, the study was registered at each site as either a clinical audit or service evaluation, as it was an observational study designed to collect routine, anonymized data, with no change to the clinical care pathway.

Statistical methods
Initially, cohort characteristics and outcomes were compared across the four groups of trainee involvement. Continuous variables were analysed using Kruskal-Wallis tests, and reported as mean (s.d.) if approximately normally distributed, with median and interquartile range (i.q.r.) used otherwise. Ordinal variables were also assessed using Kruskal-Wallis tests, with χ² tests used for nominal variables.
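For readers unfamiliar with the Kruskal-Wallis test, the comparison of a continuous variable across four groups can be sketched as follows. This is a standard-library Python illustration, not the SPSS procedure used in the study; the function names and the length-of-stay values are hypothetical, and the P-value uses the usual χ² approximation with 3 degrees of freedom (four groups).

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def kruskal_wallis_4(groups):
    """Tie-corrected Kruskal-Wallis H and approximate P-value for four groups."""
    assert len(groups) == 4, "the closed-form P-value here assumes df = 3"
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction; undefined (division by zero) if every value is identical.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h = max(h / (1 - ties / (n ** 3 - n)), 0.0)
    return h, chi2_sf_df3(h)


# Hypothetical length-of-stay values (days) for the four involvement groups.
stay = [
    [12, 14, 11, 15, 13],
    [11, 10, 12, 9, 13],
    [10, 11, 9, 12, 10],
    [11, 12, 10, 13, 11],
]
h_stat, p_value = kruskal_wallis_4(stay)
print(f"H = {h_stat:.3f}, P = {p_value:.3f}")
```

The χ² approximation is least reliable for very small groups, which is one reason to treat P-values from small subgroups with caution.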
For the primary outcomes, comparisons across the groups were then repeated using a generalized estimating equation approach, in order to account for potential non-independence of outcomes for patients treated at the same centre. As such, the centre was set as the subject variable, and the patient ID was the within-subject variable, with an exchangeable correlation structure used. All outcomes considered in this analysis were dichotomous; hence, a binary logistic model was used. Initially, univariable models were produced for each outcome, with the trainee involvement being the only independent variable. Multivariable models were then produced, to adjust for other potentially confounding factors. These used a backwards stepwise approach (removal at P > 0.1) to select other patient-, tumour-, and treatment-related factors for inclusion in the model. The goodness-of-fit of continuous variables was assessed graphically prior to producing the final model, with variables being divided into categories and treated as nominal where poor fit was detected. Where non-convergence of the model occurred owing to small within-group sample sizes, the offending variables were identified, and had categories combined to increase within-group sample sizes, where possible. Where this could not be meaningfully performed, patients from the affected category were excluded. The performance of the final multivariable models was quantified using the area under the receiver operating characteristic curve (AUROC), and Hosmer-Lemeshow tests.
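The two performance measures named above can be illustrated with a short standard-library Python sketch. It is a simplified illustration under stated assumptions, not the SPSS output used in the study: the function names are hypothetical, the Hosmer-Lemeshow test uses ten equal-sized risk-ordered groups (so 8 degrees of freedom), and the predicted risks and outcomes are simulated rather than taken from the OGAA data.

```python
import math
import random


def auroc(y_true, y_prob):
    """AUROC as the probability that a random event outranks a random non-event."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; this closed form holds for even df only."""
    assert df > 0 and df % 2 == 0
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow statistic and P-value over risk-ordered groups.

    Assumes at least n_groups patients and predicted risks strictly
    between 0 and 1.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / len(chunk)))
    return stat, chi2_sf_even_df(stat, n_groups - 2)


# Simulated predicted risks and outcomes, for illustration only.
random.seed(0)
risks = [random.uniform(0.05, 0.4) for _ in range(500)]
events = [1 if random.random() < r else 0 for r in risks]

print(f"AUROC = {auroc(events, risks):.2f}")
hl_stat, hl_p = hosmer_lemeshow(events, risks)
print(f"Hosmer-Lemeshow statistic = {hl_stat:.2f}, P = {hl_p:.3f}")
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, while a Hosmer-Lemeshow P-value above 0.05 indicates no detected lack of calibration.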
All analyses were performed using SPSS 22 (IBM, Armonk, NY, USA), with P < 0.05 deemed to be indicative of statistical significance throughout.

Cohort characteristics
Data were available for a total of 2247 oesophagectomies from 137 centres, of which 106, 30, and one were from high-, medium-, and low-income countries, respectively. Contributing centres had a median of three (i.q.r. 2-4) surgeons, 700 (i.q.r. 350-1020) total hospital beds, and 24 (i.q.r. 14-36) ICU beds. Seventy-one per cent of centres had a 24-hour on-call oesophageal surgery specialist, and 68.7 per cent had a 24-hour on-call interventional radiology specialist. Of the procedures recorded, 15 were excluded, either because no anastomosis was performed (four procedures), no anastomosis site was recorded (five procedures), or details of trainee involvement were not recorded (six procedures). As such, a total of 2232 procedures were included in the final analysis. Of these, 650 (29.1 per cent) had trainee involvement, with a trainee performing only the 'abdominal' phase in 230 procedures (T abdomen, 10.3 per cent), only the 'chest and/or neck' phase in 105 procedures (T chest, 4.7 per cent), and 'both' phases in 315 procedures (T abdomen+chest, 14.1 per cent).
The proportion of procedures with trainee involvement was found to differ significantly across centres (P < 0.001, Fig. 1). For the 41 centres that contributed more than 20 procedures to the analysis, the proportion of trainee-involved procedures ranged from 0 per cent (in 10 centres) to 100 per cent (in one centre). There was no evidence of a significant correlation between the centre volume and the proportion of trainee-involved oesophagectomies (Spearman's rho: -0.046 (P = 0.775), Fig. 1b). However, trainee involvement rates were found to differ significantly by continent (P < 0.001; Fig. S1), with the lowest rates in Africa (17.7 per cent) and Europe (23.8 per cent), and the highest rates in Asia (62.5 per cent) and North America (75.2 per cent).
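The rank correlation between centre volume and the proportion of trainee-involved procedures can be sketched as below. This is a minimal standard-library Python illustration, not the study's analysis: the function names are hypothetical and the centre-level values are invented for demonstration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical centres: procedures contributed and percentage with a trainee.
centre_volume = [21, 25, 30, 34, 42, 55, 61, 78]
trainee_percent = [40.0, 0.0, 100.0, 12.5, 0.0, 33.0, 60.0, 20.0]
print(f"rho = {spearman_rho(centre_volume, trainee_percent):.3f}")
```

Because the calculation uses ranks rather than raw values, it is not distorted by the centres at the extremes of 0 and 100 per cent trainee involvement.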
For subsequent analysis, procedures were divided by the site of the anastomosis, with 1722 (77.2 per cent) being located in the chest and 510 (22.8 per cent) in the neck.

Baseline characteristics
In procedures with an anastomosis in the chest, a trainee performed only the abdominal phase in 175 procedures (T abdomen, 10.2 per cent), only the chest phase in 93 procedures (T chest, 5.4 per cent), both phases in 198 procedures (T abdomen+chest, 11.5 per cent), and neither phase in 1256 procedures (T neither, 72.9 per cent). No significant differences in the distributions of age, sex, BMI, or ASA grade were detected between these four groups (Table 1). However, significant differences in rates of cardiovascular disease, current smokers, and squamous cell carcinoma histology were observed, all of which were more frequent in the T neither group. In addition, a significant difference in Eastern Cooperative Oncology Group (ECOG) status was observed, being lower in the T chest group, while the Charlson Comorbidity Index (CCI) was significantly lower in the T abdomen+chest group.
Comparison of the approach to treatment and surgery across the groups found several significant differences, including the use of pre- and postoperative nutritional support, neoadjuvant therapy, anastomotic technique, operative approach, and gastric tube size (Table 2).

Postoperative outcomes
On univariable analysis, 90-day mortality rates were found to be similar across the four groups (P = 0.451; Table 3). Secondary outcomes, including 30-day mortality (P = 0.587), major complication rates (P = 0.933), and the proportion of cases requiring return to theatre (P = 0.382), were also not found to differ significantly between groups. However, a significant difference in the composite rate of anastomotic leak or conduit necrosis was observed, which was higher in procedures without trainee involvement than those where trainees completed at least one phase (14.0 per cent versus 6.3-11.6 per cent; P = 0.030). Duration of surgery also differed significantly between groups (P < 0.001), being longer when the trainee was involved in any phase of the procedure. The overall length of stay also differed significantly between groups (P = 0.010), tending to be shorter in procedures with trainee involvement (median 11 versus 12 days), while ICU stay tended to be longer in the T chest and T abdomen+chest groups (median 4 days versus 3 days in other groups; P = 0.009). The total number of lymph nodes removed and the rate of positive margins were not significantly different between groups (P = 0.261 and P = 0.129, respectively).
Rates of 90-day mortality, anastomotic leaks or conduit necrosis, and Clavien-Dindo Grade III-V complications were then assessed using multivariable analysis (Table 4; Tables S1 and S2). It was not possible to produce a reliable multivariable analysis of 90-day mortality, in light of the low event rate. For the other outcomes assessed, the multivariable models had reasonable performance, with AUROCs of 0.60-0.65, and P > 0.05 on Hosmer-Lemeshow tests. Overall rates of anastomotic leaks or conduit necrosis were significantly reduced in trainee-involved procedures on multivariable analysis (P = 0.043), with the adjusted

Baseline characteristics
The analysis was then repeated for the subgroup of 510 procedures with anastomoses in the neck. Of these, a trainee performed only the abdominal phase in 55 procedures (T abdomen, 10.8 per cent), the chest and/or neck phase only in 12 procedures (T chest, 2.4 per cent), both phases in 117 procedures (T abdomen+chest, 22.9 per cent), and neither phase in 326 procedures (T neither, 63.9 per cent). Comparison of cohort characteristics across these groups found significant differences in age, ECOG status, CCI, smoking status, and tumour location (Table 5).
In addition, differences in a range of factors relating to the operative approach were observed (Table 6).
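The group sizes reported for the chest and neck subgroups can be cross-checked against the overall cohort with simple arithmetic. The short Python sketch below uses only counts stated in the text; the variable names are illustrative.

```python
# Counts per trainee-involvement group, as reported for each anastomosis site.
chest = {"T abdomen": 175, "T chest": 93, "T abdomen+chest": 198, "T neither": 1256}
neck = {"T abdomen": 55, "T chest": 12, "T abdomen+chest": 117, "T neither": 326}

# Overall count per group is the sum of the two anastomosis sites.
overall = {group: chest[group] + neck[group] for group in chest}
total = sum(overall.values())
involved = total - overall["T neither"]

print(sum(chest.values()), sum(neck.values()))     # 1722 510
print(total)                                       # 2232
print(involved, round(100 * involved / total, 1))  # 650 29.1
for group, n in overall.items():
    print(group, n, round(100 * n / total, 1))
```

The summed counts reproduce the 2232 included procedures and the 650 (29.1 per cent) with trainee involvement, with 230, 105, and 315 procedures in the abdominal-only, chest and/or neck-only, and both-phase groups, respectively.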

Postoperative outcomes
Univariable analysis found significant differences in both 90-day mortality rates (P = 0.005) and Clavien-Dindo Grade III-V complication rates (P = 0.028) between the four groups, both of which were lower in the groups with trainee involvement (Table 7).
There was no significant difference in the rate of anastomotic leak/conduit necrosis (P = 0.430). Duration of surgery was not significantly different between groups (P = 0.133). The overall length of stay and ICU stay were significantly shorter in procedures with trainee involvement (P = 0.013 and P = 0.033, respectively). The number of lymph nodes removed did not significantly differ between groups (P = 0.220); however, procedures without trainee involvement were significantly more likely to have positive margins (P = 0.041). Multivariable analysis was not possible for the outcome of 90-day mortality, on account of the small number of events. After adjustment for other confounding factors, the difference between groups in the rate of Clavien-Dindo grade III-V complications became non-significant (P = 0.185; Table 8 and Tables S1 and S2).

[Table footnote] Univariable analyses are from generalized estimating equation models, accounting for correlations between procedures from the same centre. Multivariable analyses extend these models to additionally adjust for all factors in Tables 1 and 2; full details of the multivariable models are reported in Tables S1 and S2. Bold P-values are significant at < 0.05. OR, odds ratio; c.i., confidence interval. *It was not possible to produce a multivariable model of 90-day mortality, due to the low event rate.
[Table footnote] Continuous data are reported as median (i.q.r.), with P-values obtained from Kruskal-Wallis tests. Categorical data are reported as n (column %), with P-values obtained from χ² tests. Bold P-values are significant at < 0.05.

Discussion
This analysis of an international multicentre cohort found no evidence to suggest that trainee involvement in oesophagectomy negatively impacts on postoperative outcome. Postoperative mortality, anastomotic leak rate, and complications were not significantly inferior when a trainee performed all or part of an oesophagectomy. Importantly, some key postoperative outcome measures, including the anastomotic leak rate and length of stay in patients with an anastomosis in the chest, were found to be superior in procedures with trainee involvement. Patients undergoing oesophagectomy with trainee involvement were found to be significantly less comorbid and had different treatment approaches, as compared to consultant-performed oesophagectomy, suggesting that appropriate patient selection for training procedures occurred, which may have helped ensure safe patient outcomes.
Concerns exist about the safety of trainee involvement in complex surgery. For example, trainee-performed hepatectomy and pancreatectomy have been shown to be associated with increased complication rates and operative times [23]. In the case of [...] performed by a trainee. They also found no significant differences in patient comorbidities or preoperative treatment, or in postoperative morbidity or mortality between procedures with or without trainee involvement. Finally, Baron et al. reported on a similar case mix, including 241 open thoracoabdominal two- and three-stage oesophagectomies, 35 per cent of which were performed by a trainee. However, they found trainee-performed oesophagectomies to have significantly higher anastomotic leak rates (consultant 7 per cent versus trainee 20 per cent), although this did not lead to a significant difference in postoperative mortality or survival between the two groups [4]. Saliba et al., using NSQIP data, reported outcomes from nine different surgical specialties on 1 349 684 patients, appraising trainee and patient outcomes. Procedures with trainee involvement were performed on younger, more functionally independent patients with a lower BMI but higher ASA grades. Subsequent postoperative morbidity was comparable, operative duration was longer, and overall mortality was lower when trainees were involved [24]. A similar analysis of NSQIP data by Ferraris et al. in 266 411 procedures used propensity matching to control for baseline differences between groups, and showed that, although procedures with trainee involvement were associated with increased morbidity, mortality rates were comparable [25].
Patients with similar levels of complications were more likely to suffer 'failure to rescue' in consultant-performed cases, suggesting that trainee involvement may be protective for patients. In a NSQIP analysis by Cobb et al., propensity-matched patients undergoing oesophagectomy had a significantly lower mortality in trainee-performed cases [8]. Khoushhal et al. evaluated outcomes for 5142 oesophagectomies from the NSQIP database and found that neither surgical specialty (cardiothoracic, general surgery) nor trainee involvement influenced mortality [9]. However, a major limitation when evaluating trainee involvement in procedures using NSQIP data is that the NSQIP definition of trainee involvement is trainee 'in' or 'not in' the operating room, and does not define the degree to which the trainee is involved (assisting or performing). In contrast, the present analysis of the OGAA cohort recorded which phases of the procedure were performed by the trainee.
There remains a lack of data analysing the effect of training in minimally invasive oesophagectomy (MIE). The OGAA study provides postoperative outcomes on 2232 oesophagectomies; trainees were involved in 28.4 per cent of open versus 26.4 per cent of hybrid and 32.9 per cent of MIE procedures, demonstrating that modern trainees are receiving similar levels of exposure to both open and minimally invasive oesophageal surgery. The learning curve for MIE is associated with a significant increase in postoperative morbidity, including an increase in anastomotic leak rates of an additional 10 per cent [26,27]. In a retrospective study of 2121 consultant-performed Ivor Lewis MIEs, Claassen et al. showed that the length of the learning curve for textbook outcome was 46 cases, after which a plateau was reached, with 44.0 per cent achieving a textbook outcome [28]. Evidence from novel robotic oesophagectomy training programmes shows that safety can be maintained while reducing the learning curve to 22 cases [29].
The current study has some limitations. There was considerable variability in rates of trainee involvement between centres, which varied from 0 to 100 per cent. This may have contributed to the significant differences between groups in the factors relating to the operative approach, which is generally centre related, and so may have introduced bias, particularly if higher-quality centres were more likely to engage in training. In an attempt to negate such confounding, multivariable models were used to adjust for within-centre correlation of outcomes, and for baseline differences between groups. However, these were limited by the small within-group sample sizes and the low event rates for some outcomes, hence residual confounding may remain. The small within-group sample sizes, particularly for the subgroup of neck anastomoses, will also have reduced the statistical power of the comparisons across the groups of trainee involvement. This will have increased the minimal detectable effect sizes, resulting in an increased false-negative rate. When defining the groups, surgeons were classified as either 'consultant' or 'trainee'. However, training grade is known to convey differing levels of autonomy and ability, depending on the level of experience. Owing to the variability in nomenclature and training structures across countries, it was not possible to ascertain further details about the grade or level of experience of trainees; hence, it was not possible to assess variability in outcome within subgroups of trainees [30]. It was also not possible to identify cases where the primary surgeon was a consultant who had taken over from the trainee due to operative difficulty; such cases may be at higher risk of negative outcomes, owing to the increased operative difficulty.
[Table footnote] Univariable analyses are from generalised estimating equation models, accounting for correlations between procedures from the same centre. Multivariable analyses extend these models to additionally adjust for all factors in Tables 1 and 2; full details of the multivariable models are reported in Tables S1 and S2. Bold P-values are significant at < 0.05. OR, odds ratio; *OR represents a comparison of any trainee involvement versus no trainee involvement, due to the within-group sample sizes being insufficient to produce a reliable model comparing across four groups. †It was not possible to produce a multivariable model of 90-day mortality, due to the low event rate.