Mark D Danese, John S Groundland, Effect of chemotherapy and surgery timing on mortality in upper and lower extremity osteosarcoma, JNCI: Journal of the National Cancer Institute, Volume 117, Issue 4, April 2025, Pages 611–618, https://doi.org/10.1093/jnci/djae229
Abstract
Surgery with neoadjuvant and adjuvant chemotherapy represents the standard of care for extremity osteosarcoma despite a lack of high-quality evidence for its use, and trial evidence that suggests upfront surgery may result in better outcomes. This study estimated the difference in overall survival for the standard of care (neoadjuvant first) vs upfront surgery first followed by adjuvant chemotherapy (surgery first).
Using Surveillance, Epidemiology, and End Results data, we identified patients aged 5-29 years diagnosed with a primary cancer of upper or lower extremity osteosarcoma between 2007 and 2019 who received surgery and chemotherapy. Our primary endpoint was the 5-year survival difference between the surgery first and neoadjuvant first groups.
Adjusted 5-year survival was 74% for surgery first patients and 67% for neoadjuvant first patients, with a survival difference of 6.9% (95% confidence interval [CI] = −4.2% to 16.1%). In sensitivity analyses of 5-year survival, the results were consistent, showing a 6.8%-13.7% higher 5-year survival in surgery first patients. Statistically significant mortality risk factors included older age, larger tumor size, the type of resection (salvage vs amputation), and stage III-IV disease (vs stage I-II disease).
The evidence supporting neoadjuvant therapy in osteosarcoma care is weak. However, there is evidence that pausing chemotherapy in the perisurgical period might affect outcomes. Consequently, this study, and its consistency with the results from the only randomized trial to address this question, suggests that there is reason to revisit a prospective, randomized trial of osteosarcoma treatment regarding the timing of surgery and chemotherapy.
The treatment paradigm for osteosarcoma changed in the mid- to late 1970s with the advent of multiagent, cytotoxic chemotherapy (1,2). Prior to chemotherapy, even with aggressive, ablative surgical care, 5-year survival was only 10%-20% (3-5). With the development of multiagent chemotherapy, 5-year survival rates quickly rose to 60%-70% (6-10).
During the early years of chemotherapy development, the various regimens were assessed by monitoring the response of the primary tumor to each combination of medicine (11-13). Effectiveness was judged through a variety of measures, including imaging response of the tumor during treatment, pathology assessment of tumor viability described at time of surgical resection, event-free survival, and overall survival.
Shortly thereafter, in 1986, Simon et al. (14) demonstrated equivalent survival between amputation and limb-salvage patients when chemotherapy was part of the treatment regimen. This set the stage for a comprehensive treatment regimen that encompassed chemotherapy and limb-sparing surgery. Because of the Simon et al. (14) results, endoprostheses became popular for surgical reconstruction following en bloc resections of osteosarcoma. However, metallic endoprostheses were not readily available for off-the-shelf implantation in the 1980s (15-18). Rather, custom implants were manufactured on a case-by-case basis or the early modular implants required production and shipment to treatment centers. This took time, and the parallel evolutions of chemotherapy and limb-salvage surgeries created a paradigm, out of necessity, of starting chemotherapy prior to surgical resection and reconstruction. Once this treatment algorithm was established, it quickly became the standard of care for high-grade osteosarcoma, remaining unchanged to this day: neoadjuvant multiagent, cytotoxic chemotherapy followed by limb-salvage surgery followed by adjuvant chemotherapy (19,20).
The National Comprehensive Cancer Network (NCCN) has adopted this treatment strategy as a category 1 recommendation, signifying a recommendation “based upon high-level evidence” and that “there is uniform NCCN consensus that the intervention is appropriate” (20). Despite the near universal adoption of this treatment algorithm, clear, high-level evidence to suggest that a course of neoadjuvant chemotherapy provides better overall survival or improved surgical outcomes compared with upfront surgery is lacking.
In 2003, Goorin et al. (21) published a prospective, randomized, multi-institutional trial comparing the clinical effectiveness of a course of neoadjuvant chemotherapy with upfront surgery in the treatment of high-grade osteosarcomas with no evidence of metastasis on presentation. The total chemotherapy dose and length of treatment (44 weeks) were the same between the groups, allowing a direct comparison of surgery at week 0 vs surgery at week 10 after diagnosis. Although some other studies had investigated timing of surgery relative to chemotherapy as a peripheral variable in the treatment of osteosarcoma, the Goorin et al. (21) study is the only prospective study that compared the timing of surgery while maintaining equivalent overall doses of chemotherapy.
Their original study design specified a 215-person study with 80% power to detect a 15% difference in event-free survival. Because of difficulty in enrollment, in part because of already established treatment preferences, Goorin et al. eventually enrolled only 100 patients and found a 5-year event-free survival of 69% in the upfront surgery group vs 61% in the neoadjuvant group. With their limited enrollment, this difference was not statistically significant. Despite the clearly noninferior performance and point estimates in favor of the upfront surgery group, there have been no further studies comparing these strategies in terms of their effects on overall survival in high-grade extremity osteosarcoma. As such, the literature remains incomplete.
The purpose of this study is to investigate the 5-year survival of patients presenting with extremity osteosarcoma, comparing patients who received upfront surgical resection followed by chemotherapy (surgery first) with patients who received neoadjuvant chemotherapy prior to surgical resection and followed by adjuvant chemotherapy (neoadjuvant first). Data from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program were used to maximize the number of patients available for analysis.
Methods
Data source
We identified patients from the SEER Program of the NCI. The SEER program collects detailed information about cancer location, stage, grade, histology, initial course of cancer directed therapy, and mortality. Currently SEER registries cover approximately 48% of the US population, but this varies based on the number of registries included and the time period of the data used for analysis. For the current study, we used the most recent data from the SEER 17 cancer registries, which covered patients diagnosed from 2000 through 2020, with follow-up data on survival available through the end of 2020 (22). Data were accessed through NCI SEER*Stat software. Because SEER data are a limited dataset requiring a data use agreement, the study was exempt from ethics committee approval.
The use of chemotherapy was based on a restricted-use SEER variable that is coded “yes” or “no or unknown” by SEER. Analyses by NCI showed that “yes” has a positive predictive value of at least 85%, but the remaining unknown treatment patients reflect a mixture of treated and untreated patients and cannot be reliably characterized (23). Patients with unknown treatment were not used in this study.
Cohort creation
To be included, patients must have been diagnosed with a primary osteosarcoma (histology 9180-9187, 9195) between 2007 and 2019. This date range was chosen because patients diagnosed prior to 2007 do not have data on chemotherapy and surgery sequencing, and we wanted to allow for at least 1 year of mortality follow-up. Osteosarcoma must have been microscopically confirmed and not initially diagnosed at the time of death. Patients must have had complete follow-up data. These initial selection criteria yielded 3552 patients. Our final cohort was limited to patients meeting the following additional selection criteria: patients aged 5-29 years at diagnosis, diagnosis of primary upper or lower extremity conventional or chondroblastic osteosarcoma (accounting for 92% of all tumors in this age range in the SEER data), and treatment with surgery and chemotherapy. Other histologic subdivisions, such as telangiectatic, fibroblastic, periosteal, and nonosteogenic osteosarcomas, were excluded because they were likely to be associated with mortality and their rarity was expected to make matching or adjustment virtually impossible. Patients were also excluded if they had missing staging, tumor size, treatment sequencing data, or time to treatment. Patients must have received initial treatment within 3 months of diagnosis.
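The additional selection criteria above can be sketched as a single eligibility check. This is only an illustration in Python, not the authors' code (the study used SEER*Stat and R). Every field name in the `Patient` record is hypothetical and does not correspond to an actual SEER variable name, and treating "within 3 months" as less than or equal to 3 months is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical; they are not SEER variable names.
    age_at_diagnosis: int
    diagnosis_year: int
    site: str                             # e.g. "upper extremity", "lower extremity"
    histology: str                        # e.g. "conventional", "chondroblastic"
    had_surgery: bool
    had_chemotherapy: bool
    stage: Optional[str]                  # None when staging is missing
    tumor_size_cm: Optional[float]        # None when size is missing
    sequence: Optional[str]               # "surgery_first" or "neoadjuvant_first"
    months_to_treatment: Optional[float]  # None when time to treatment is missing


def in_final_cohort(p: Patient) -> bool:
    """Return True if the patient meets the additional selection criteria."""
    # Exclude patients with missing staging, size, sequencing, or time to treatment.
    required = (p.stage, p.tumor_size_cm, p.sequence, p.months_to_treatment)
    if any(value is None for value in required):
        return False
    return (
        5 <= p.age_at_diagnosis <= 29
        and 2007 <= p.diagnosis_year <= 2019
        and p.site in {"upper extremity", "lower extremity"}
        and p.histology in {"conventional", "chondroblastic"}
        and p.had_surgery
        and p.had_chemotherapy
        and p.months_to_treatment <= 3  # assumption: "within 3 months" is inclusive
    )
```

Expressing the criteria as one predicate makes it straightforward to count how many patients each condition removes, which is the kind of attrition detail a supplementary table would report.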
In these analyses, the goal was to estimate the per protocol treatment effect for surgery first vs neoadjuvant first therapy. Per protocol was defined as patients who initiated chemotherapy after surgery (ie, patients who planned to initiate chemotherapy after surgery but did not were not included). The time of chemotherapy initiation after surgery was defined as the index date, which defined the start of time at risk for death. Because only time from diagnosis to the first treatment in the sequence was available, we made the following assumptions, in keeping with typical clinical practice. Surgery first patients were assumed to begin postsurgical chemotherapy 1 month after surgery, and neoadjuvant first patients were assumed to begin postsurgical chemotherapy 4 months after initial chemotherapy (ie, allowing 3 months for neoadjuvant chemotherapy and 1 month for surgery).
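The index date assumptions above amount to adding a fixed offset to the time from diagnosis to first treatment. A minimal Python sketch follows; the function and constant names are hypothetical, and only the 1-month and 4-month offsets come from the text.

```python
# Assumed months from the first treatment in the sequence to the start of
# postsurgical chemotherapy, as stated in the text.
MONTHS_TO_POSTSURGICAL_CHEMO = {
    "surgery_first": 1,      # chemotherapy assumed to start 1 month after surgery
    "neoadjuvant_first": 4,  # 3 months neoadjuvant chemotherapy + 1 month for surgery
}


def index_month(months_dx_to_first_treatment: float, sequence: str) -> float:
    """Months from diagnosis to the index date (start of time at risk)."""
    return months_dx_to_first_treatment + MONTHS_TO_POSTSURGICAL_CHEMO[sequence]
```

For example, a patient whose first treatment began 1 month after diagnosis would have an index date 2 months after diagnosis in the surgery first group and 5 months after diagnosis in the neoadjuvant first group.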
Variables and follow-up
Surgery first and neoadjuvant first patients were identified using SEER variables for the systemic therapy and surgery sequence. Variables to address confounding between treatment and mortality were selected based on clinical knowledge and the availability of data. These included age (continuous), sex (male, female), race (non-Hispanic Black, non-Hispanic White, Hispanic, all other), tumor size (continuous), histological subtype (conventional, chondroblastic), metropolitan area (≥1 million, <1 million), diagnosis year (2007-2010, 2011-2014, 2015-2019), surgery type (salvage, amputation), and stage (I-II, III-IV). Age and tumor size were split into categories to enable matching in the sensitivity analyses.
Patient follow-up was censored at the end of the SEER administrative reporting date (December 31, 2020) for these data.
Statistical analysis
Our primary endpoint was the 5-year survival difference between the surgery first and neoadjuvant first groups. SEER does not capture progression or relapse as part of its data collection; therefore, event-free survival could not be estimated. Because the sample size for surgery first patients was small and potentially highly selected, we included several sensitivity analyses of the treatment effect.
In our primary analysis, we estimated the adjusted 5-year survival difference on the basis of the estimated average treatment effect (or the treatment effect for all patients in the study) using flexible parametric survival models and g-computation (24,25). We tested the proportional hazards, proportional odds, and normal distribution models with 0-5 internal knots by evaluating the model Akaike information criterion. We selected the proportional odds formulation with 1 internal knot because it had the lowest Akaike information criterion, suggesting that other options did not yield statistically better fitting models. The 5-year risk differences were estimated on the basis of marginal effects using g-computation. Confidence intervals (CIs) were estimated using 2000 bootstrap samples.
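The standardization step of g-computation and the bootstrap can be illustrated with a short Python sketch. It is not the authors' implementation, which used spline-based flexible parametric models in R. Here the outcome model is abstracted as a hypothetical `fit` function that returns a predictor of 5-year survival, `toy_fit` is a made-up stand-in with arbitrary coefficients, and the percentile method for the interval is an assumption, because the text does not state which bootstrap interval was used.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Predictor = Callable[[Row], float]  # returns predicted 5-year survival for one row


def standardized_difference(rows: List[Row], predict: Predictor) -> Tuple[float, float, float]:
    """G-computation: predict for everyone under each treatment, then average."""
    s_surgery = mean([predict({**r, "surgery_first": 1}) for r in rows])
    s_neoadj = mean([predict({**r, "surgery_first": 0}) for r in rows])
    return s_surgery, s_neoadj, s_surgery - s_neoadj


def bootstrap_ci(rows: List[Row], fit: Callable[[List[Row]], Predictor],
                 n_boot: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap: resample patients, refit, and re-standardize."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))  # resample with replacement
        diffs.append(standardized_difference(sample, fit(sample))[2])
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[min(n_boot - 1, int(0.975 * n_boot))]
    return lower, upper


def toy_fit(rows: List[Row]) -> Predictor:
    """Hypothetical stand-in for a fitted survival model (arbitrary coefficients)."""
    def predict(r: Row) -> float:
        return 0.70 + 0.07 * r["surgery_first"] - 0.01 * (r["age"] - 14)
    return predict
```

Because the whole cohort is assigned to each treatment in turn, the resulting difference is a marginal effect averaged over the covariate distribution of all patients, which is what distinguishes the average treatment effect from a conditional coefficient in the model.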
We also conducted 2 sensitivity analyses using matching before g-computation. First, we used full matching (with a nonfixed ratio of matches) using the same variables used in the main analyses and estimated the average treatment effect for the treated (or the treatment effect for the surgery first patients). Second, we used 2:1 cardinality matching in place of full matching. In both matching analyses, all surgery first patients were used. In both analyses, we used g-computation after matching, adjusted for the same variables as in the matched analysis, with standard errors estimated using 2000 bootstraps of the matching and g-computation process.
We also conducted exploratory analyses of factors associated with the use of surgery first (as part of creating propensity scores), unadjusted overall survival based on the Kaplan–Meier estimator, and factors associated with survival using Cox proportional hazards regression.
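For readers unfamiliar with the Kaplan–Meier estimator used for the unadjusted curves, the following Python sketch shows the product-limit calculation. It is illustrative only; the function names are hypothetical and the follow-up values in the usage note are made up, not study data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) pairs at each distinct time with at least one death.

    times  : follow-up in months from the index date
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who died or were censored at the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: survival at time t (1.0 before the first death)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

With made-up follow-up times of 6, 12, 12, 24, and 30 months, where the deaths occur at 6, 12, and 24 months, the estimated survival steps down to 0.8, 0.6, and 0.3. Censored patients contribute to the number at risk until their censoring time but never trigger a step.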
All analyses were conducted using R (version 4.4.1). Survival analyses were conducted using the flexsurv package for spline-based flexible parametric survival models. Matching was performed using the MatchIt package.
Results
Patient characteristics and follow-up time
Our study had 792 patients: 67 (8.5%) in the surgery first group and 725 (91.5%) in the neoadjuvant first group (Table 1; see Supplementary Table 1, available online, for the cohort attrition details). Surgery first patients were slightly older (aged 16.9 vs 14.0 years), slightly more likely to have chondroblastic histology (21% vs 16%), and slightly less likely to receive amputation (21% vs 24%). Other clinical baseline characteristics were similar between the surgery first and neoadjuvant first groups including male sex (55% vs 58%), stage I-II (78% vs 77%), lower extremity osteosarcoma (84% for both), and tumor size (11 cm for both).
Characteristics | All (n = 792), No. (%) | Surgery first (n = 67), No. (%) | Neoadjuvant first (n = 725), No. (%) |
---|---|---|---|
Age at diagnosis, y | |||
Age, mean (SD) | 14.3 (4.8) | 16.9 (5.5) | 14.0 (4.7) |
5-9 | 128 (16.2) | 6 (9.0) | 122 (16.8) |
10-14 | 298 (37.6) | 17 (25.4) | 281 (38.8) |
15-19 | 265 (33.5) | 28 (41.8) | 237 (32.7) |
20-24 | 75 (9.5) | 10 (14.9) | 65 (9.0) |
25-29 | 26 (3.3) | 6 (9.0) | 20 (2.8) |
Sex and race and ethnicity | |||
Male | 459 (58.0) | 37 (55.2) | 422 (58.2) |
Hispanic | 259 (32.7) | 26 (38.8) | 233 (32.1) |
Non-Hispanic Black | 107 (13.5) | 8 (11.9) | 99 (13.7) |
Non-Hispanic White | 328 (41.4) | 22 (32.8) | 306 (42.2) |
Other race and ethnicity | 98 (12.4) | 11 (16.4) | 87 (12.0) |
Stage at diagnosis | |||
Stage I-II | 607 (76.6) | 52 (77.6) | 555 (76.6) |
Stage III-IV | 185 (23.4) | 15 (22.4) | 170 (23.4) |
Year of diagnosis | |||
2007-2010 | 187 (23.6) | 18 (26.9) | 169 (23.3) |
2011-2014 | 243 (30.7) | 20 (29.9) | 223 (30.8) |
2015-2019 | 362 (45.7) | 29 (43.3) | 333 (45.9) |
Metropolitan area | |||
>1 million vs smaller or rural areas | 503 (63.5) | 46 (68.7) | 457 (63.0) |
Site | |||
Lower extremity | 668 (84.3) | 56 (83.6) | 612 (84.4) |
Upper extremity | 124 (15.7) | 11 (16.4) | 113 (15.6) |
Size, cm | |||
Size, mean (SD) | 11.2 (5.4) | 11.1 (5.1) | 11.2 (5.4) |
0.1-3.9 | 23 (2.9) | 3 (4.5) | 20 (2.8) |
4.0-7.9 | 191 (24.1) | 16 (23.9) | 175 (24.1) |
8.0-11.9 | 287 (36.2) | 25 (37.3) | 262 (36.1) |
12.0-15.9 | 166 (21.0) | 12 (17.9) | 154 (21.2) |
16.0-19.9 | 73 (9.2) | 6 (9.0) | 67 (9.2) |
≥20.0 | 52 (6.6) | 5 (7.5) | 47 (6.5) |
Histology | |||
Conventional | 661 (83.5) | 53 (79.1) | 608 (83.9) |
Chondroblastic | 131 (16.5) | 14 (20.9) | 117 (16.1) |
Type of surgery | |||
Amputation | 185 (23.4) | 14 (20.9) | 171 (23.6) |
Salvage | 607 (76.6) | 53 (79.1) | 554 (76.4) |
Study time, mo, mean (SD) | |||
Time to treatment | 1.0 (0.6) | 1.1 (0.7) | 1.0 (0.6) |
Follow-up time | 60.3 (44.0) | 64.9 (42.4) | 59.9 (44.1) |
Of patients in the Other race and ethnicity category, 88% are classified as Asian or Pacific Islander. Results are presented as No. (%) for categorical variables and mean (SD) for continuous variables.
Mean follow-up time was 65 months (median = 53.5, interquartile range [IQR] = 30.5-92.5 months) for the surgery first group and 60 months (median = 47.5, IQR = 21.5-91.5 months) for the neoadjuvant first group. At 5 years, 30.5% of surgery first patients and 32.8% of neoadjuvant first patients had been administratively censored.
Primary analyses
In our primary analysis (Table 2), expected 5-year survival was 74% for surgery first patients and 67% for neoadjuvant first patients, with a survival difference of 6.9% (95% CI = -4.2% to 16.1%). Figure 1 shows the survival difference results through 120 months (10 years) of follow-up. In sensitivity analyses of 5-year survival, the results were consistent, showing a 6.8%-13.7% higher 5-year survival in surgery first patients, depending on the analyses (Table 2). Additional results for covariate balance are provided in Supplementary Figures 1 and 2 (available online). Figure 2 shows the overall survival curves for the primary adjusted analysis (solid lines) and the unadjusted analysis (dashed lines). Figure 3 shows how the risk of death (ie, the hazard) changes over time in the primary analysis.

Figure 1. Difference in overall survival over time (surgery first minus neoadjuvant first). The difference in overall survival over time based on the adjusted model. Values higher than 0 favor surgery first. The shaded area represents the 95% confidence interval.

Figure 2. Adjusted and unadjusted overall survival by treatment group. Adjusted survival shown with a solid line and shaded 95% confidence intervals. Unadjusted overall survival shown with the dashed line.

Figure 3. Mortality rate over time. The instantaneous risk of death (hazard) over time on the basis of the adjusted model.
Analysis | Surgery first 5-year survival | Neoadjuvant first 5-year survival | Survival difference |
---|---|---|---|
G-computation, primary analysis | 0.741 (0.629 to 0.824) | 0.672 (0.637 to 0.700) | 0.069 (−0.049 to 0.166) |
Full match plus g-computation, sensitivity analysis | 0.738 (0.626 to 0.847) | 0.670 (0.561 to 0.754) | 0.068 (−0.050 to 0.205) |
Cardinality match plus g-computation, sensitivity analysis | 0.730 (0.631 to 0.840) | 0.594 (0.559 to 0.738) | 0.137 (−0.043 to 0.213) |
Unadjusted, referent | 0.760 (0.661 to 0.874) | 0.678 (0.641 to 0.717) | 0.083 (−0.030 to 0.196) |
Results reflect the proportion alive at 5 years starting from the initiation of chemotherapy after surgery.
Exploratory analyses
The only statistically significant factor associated with the choice of treatment was age, with older patients more likely to be in the surgery first group than younger patients (Supplementary Table 2, available online).
In the Cox proportional hazards model, statistically significant hazard ratios (HRs; Table 3) for factors associated with overall survival included older age (per year; HR = 1.04, 95% CI = 1.01 to 1.07), larger tumor size (per centimeter; HR = 1.03, 95% CI = 1.00 to 1.05), the use of salvage (vs amputation; HR = 0.69, 95% CI = 0.52 to 0.92), and stage III-IV disease (vs stage I-II disease; HR = 3.09, 95% CI = 2.36 to 4.04). Given the small sample size, we note that mortality was numerically higher in non-Hispanic Black patients (vs non-Hispanic White; HR = 1.38, 95% CI = 0.95 to 2.00; P = .09). The point estimate for the surgery first group corresponded to a 30% lower hazard of death compared with the neoadjuvant first group (HR = 0.70, 95% CI = 0.43 to 1.14; P = .16).
We used the same Cox proportional hazards model (excluding stage) within each stage subgroup to explore whether the survival results might be stronger in one stage group. The hazard ratio for surgery first vs neoadjuvant first was 0.68 (95% CI = 0.36 to 1.28) in the stage I-II group and 0.60 (95% CI = 0.27 to 1.33) in the stage III-IV group.
Table 3. Cox proportional hazards model of factors associated with overall survival
Variable | Hazard ratio (95% CI) | SE | P |
---|---|---|---|
Surgery first vs neoadjuvant | 0.702 (0.431 to 1.143) | 0.249 | .155 |
Age, per year | 1.041 (1.013 to 1.071) | 0.014 | .005 |
Female, vs male | 0.904 (0.695 to 1.177) | 0.135 | .454 |
Non-Hispanic Black vs non-Hispanic White | 1.381 (0.952 to 2.002) | 0.190 | .089 |
Hispanic vs non-Hispanic White | 0.961 (0.706 to 1.308) | 0.157 | .799 |
All other vs non-Hispanic White | 0.863 (0.556 to 1.340) | 0.225 | .511 |
Size, per cm | 1.027 (1.004 to 1.050) | 0.011 | .020 |
Conventional vs chondroblastic | 0.959 (0.685 to 1.344) | 0.172 | .808 |
Salvage vs amputation | 0.690 (0.518 to 0.920) | 0.146 | .011 |
Metro area >1 million vs smaller metro area or rural | 0.868 (0.664 to 1.133) | 0.136 | .297 |
Stage III-IV vs stage I-II | 3.090 (2.361 to 4.044) | 0.137 | <.001 |
Predictors of survival after initiation of postsurgical chemotherapy from a multivariable Cox proportional hazards model. Hazard ratios more than 1 are associated with a higher risk of death. CI = confidence interval; SE = standard error.
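The relationship among the hazard ratio, standard error, confidence interval, and P value in Table 3 can be checked with a few lines of Python. The sketch assumes that the SE column is the standard error of the log hazard ratio and that the intervals are Wald intervals; neither is stated in the text, although the tabulated values are consistent with both. The function name is hypothetical.

```python
import math
from typing import Tuple


def wald_summary(hr: float, se_log_hr: float, z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Return (lower, upper, two-sided P) for a hazard ratio.

    Assumes se_log_hr is the standard error of log(HR) and a Wald (normal) test.
    """
    log_hr = math.log(hr)
    lower = math.exp(log_hr - z_crit * se_log_hr)
    upper = math.exp(log_hr + z_crit * se_log_hr)
    z = log_hr / se_log_hr
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return lower, upper, p_value


# Surgery first vs neoadjuvant row of Table 3: HR = 0.702, SE = 0.249.
lower, upper, p_value = wald_summary(0.702, 0.249)
print(round(lower, 3), round(upper, 3), round(p_value, 3))
```

Under these assumptions the surgery first row is reproduced to within rounding of the tabulated interval (0.431 to 1.143) and P value (.155). The "30% lower hazard" in the text is simply 1 − 0.70, and the interval crossing 1 is why that estimate is not statistically significant.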
Discussion
To our knowledge, this is the first article to directly compare adjuvant only (surgery first) vs neoadjuvant plus adjuvant chemotherapy (neoadjuvant first) of upper and lower limb osteosarcoma since the trial results by Goorin et al. (21). Our results are consistent with those of the trial and show that a course of neoadjuvant therapy is no better, and likely worse, than upfront surgery. It is interesting to note that the peak risk of death and the largest difference in the mortality rates between the 2 treatment groups are approximately 12-24 months after initiating chemotherapy (Figure 3). This suggests that if treatment can be improved in these patients, differences in mortality should be observable within the first 3 years, an important consideration for any future clinical trials.
The continued rationale for neoadjuvant first strategy is based on 3 presumed advantages: 1) the potential benefit of immediate treatment of the cancer and any micrometastatic disease with chemotherapy, without delay because of surgery; 2) improved limb-salvage surgical options secondary to tumor regression as a response to the induction course of chemotherapy; and 3) the prognostic value obtained in the identification of good vs poor responders to the chemotherapy regimen through assessment of the resected specimen at time of local control (26-28). None of these reasons, however, have been validated through investigation.
Regarding the immediate treatment advantage, this has not been demonstrated by analysis of circulating tumor cell burden nor has it has been borne out by studies demonstrating improved survival. Assessment of micrometastatic disease through laboratory testing remains in its infancy at the time of this writing, and clear data of micrometastatic disease burden between a course of neoadjuvant chemotherapy vs upfront surgery remain unknown (29-31).
Likewise, disease-free and overall survival data at any posttreatment interval have not been shown to be better for those with neoadjuvant chemotherapy vs upfront surgery, when both groups receive a complete course of chemotherapy (21).
The second presumed advantage, that a course of neoadjuvant chemotherapy allows more limb-salvage surgeries (avoiding amputation) because of regression or consolidation of the primary tumor, is unsupported by the literature (26,32-35). Jones et al. (27) published an investigation in which surgeons and radiologists assessed the resectability of distal femur osteosarcomas using magnetic resonance imaging before and after a course of neoadjuvant chemotherapy. When blinded to the timing of the magnetic resonance imaging relative to the chemotherapy treatment, the study team found that more tumors were deemed appropriate for limb salvage in the prechemotherapy group than the same tumors in the postchemotherapy group. This result was found despite the surgeons’ reported assessment that they thought the more resectable tumors would be in the postchemotherapy group, confirming their blinded status and their intrinsic bias toward neoadjuvant chemotherapy.
Finally, much has been written about the benefits of measuring the necrosis rates found in the resected primary tumor after neoadjuvant therapy (36-42). Patients are found to be either “good” or “bad” responders to the chemotherapy regimen, based on how much viable tumor is left in the resected specimen. Bielack et al. (37) reported that patients with more than 90% tumor necrosis noted at time of surgical resection after neoadjuvant chemotherapy had a 5-year overall survival of 73% vs 47% for those with less than 90% necrosis. However, although prognosis may change with these necrosis rates, this does not translate into any usable, actionable information. Changing chemotherapy regimens because of a proven poor response has not led to any improvements in outcomes (20,43-45). According to the NCCN guidelines, “Patients whose disease has a poor response could be considered for chemo with a different regimen (category 3 recommendation). However, attempts to improve the outcome of poor responders by modifying the adjuvant chemo remain unsuccessful” (20).
Several limitations in the current study warrant discussion. The analyses are retrospective and, in terms of measured characteristics at baseline that were associated with outcomes, age was the only characteristic that was notably different, with upfront surgery patients being slightly older. Hence, although confounding does not appear to be an issue in this study on the basis of the available information, the results may be affected by selection bias through unmeasured confounding. In particular, there is no explanation as to why patients had upfront surgery followed by chemotherapy. That only 8.5% of patients received upfront surgery reinforces the strong preference of sarcoma specialty providers to treat osteosarcoma with the sequence of neoadjuvant chemotherapy, surgical resection, and adjuvant chemotherapy. It may be reasonable to assume that deviation from the gold standard protocol would happen because of factors that would impart a poorer presentation (eg, pathologic fracture, diagnostic conundrum, patient noncompliance), but there is no information available. Also, there were no data available regarding the specifics of the treatment regimen or completion. Such information would be useful for clarifying the role of dose delays and duration of therapy.
The limited nature of SEER chemotherapy and chemotherapy sequencing data also affected the index date of patients in the study. Because we attempted to align the initiation of time at risk with the initiation of chemotherapy after surgery for both groups, neoadjuvant first patients began their observation time approximately 3 months later after diagnosis than surgery first patients. This would tend to bias the results in favor of neoadjuvant first patients because those who never resumed chemotherapy after surgery were excluded.
The relatively small sample size in the surgery first group raises the question of whether there is a sufficient sample to estimate a meaningful 5-year survival probability and whether differential censoring might affect the estimate of survival difference. In this study, slightly more than 30% of patients in each group had been censored by 5 years. This suggests that differential censoring did not bias the results, which is consistent with an administrative censoring mechanism caused by SEER reporting procedures. Of course, the small sample size in the surgery first group still leads to imprecision, which is reflected in the confidence intervals.
Because neoadjuvant first is the current standard of care, it is instructive to consider how likely our results, and those of Goorin et al. (21), would be assuming that neoadjuvant first is superior to surgery first. If we assume the treatment benefit from neoadjuvant first is a 5% improvement in survival at 5 years (less than the 15% hypothesized at 2 years in the power calculations of Goorin et al.) and if we assume a standard error of 7% for both study estimates (based on the Goorin et al. study, which is slightly larger than seen in our study), we can calculate the following. A 3% or greater survival difference in favor of surgery first as seen in the Goorin et al. study would occur only 12.7% of the time, and a 6.9% or greater survival difference in favor of surgery first as seen in our study would occur only 4.5% of the time. The likelihood of these 2 independent studies providing such estimates in favor of surgery first despite an actual 5% improvement from neoadjuvant first is approximately 0.6%. However, we should be clear that these calculations do not mean that surgery first is superior; they simply suggest that a 5% or greater benefit from neoadjuvant first treatment is unlikely given these data.
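The probabilities in the preceding paragraph can be reproduced with a short sketch. It assumes (beyond the 5% benefit and 7% standard error stated above) that each study's estimated survival difference is normally distributed around the true difference and that the 2 studies are independent; the variable and function names are hypothetical.

```python
import math


def normal_sf(z: float) -> float:
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Differences are surgery first minus neoadjuvant first, in percentage points.
ASSUMED_TRUE_DIFF = -5.0  # neoadjuvant first assumed 5 points better
ASSUMED_SE = 7.0          # standard error assumed for both studies


def prob_at_least(observed_diff: float) -> float:
    """P(estimated difference >= observed_diff) under the assumed truth."""
    return normal_sf((observed_diff - ASSUMED_TRUE_DIFF) / ASSUMED_SE)


p_goorin = prob_at_least(3.0)  # difference seen in the Goorin et al. trial
p_seer = prob_at_least(6.9)    # difference seen in this study
p_both = p_goorin * p_seer     # joint probability under independence

print(f"Goorin et al.: {p_goorin:.1%}")
print(f"This study:    {p_seer:.1%}")
print(f"Both:          {p_both:.1%}")
```

The printed values correspond to the approximately 12.7%, 4.5%, and 0.6% quoted above. Changing `ASSUMED_TRUE_DIFF` or `ASSUMED_SE` shows how sensitive the joint probability is to those assumptions.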
In conclusion, the favorable outcomes in the surgery first group for extremity osteosarcoma in the SEER data suggest that improving survival may simply require an alteration of timing of treatments that are already available and accepted. An improvement of 6.9% in survival from established practice has not been realized in the past 40 years, so even a modest improvement would be welcome. This study, and its consistency with the results from the only relevant randomized trial, suggests that there is reason to revisit a prospective, randomized trial of osteosarcoma treatment with regard to timing of surgery and chemotherapy.
Data availability
The SEER Research Plus data used in this study cannot be shared publicly but are available from the National Cancer Institute. Information about requesting data access is available at https://seer.cancer.gov/data/access.html.
Author contributions
Mark Danese, MHS, PhD (Conceptualization; Data curation; Formal analysis; Methodology; Writing – original draft; Writing – review & editing) and John S. Groundland, MD, MS (Conceptualization; Supervision; Writing – original draft; Writing – review & editing).
Funding
This work was completely self-funded.
Conflicts of interest
Neither author reports any conflicts of interest.