Evaluation of Predict, a prognostic risk tool, after diagnosis of a second breast cancer

Abstract
Background
The UK National Health Service’s Predict is a clinical tool widely used to estimate the prognosis of early-stage breast cancer. The performance of Predict for a second primary breast cancer is unknown.
Methods
Women 18 years of age or older diagnosed with a first or second invasive breast cancer between 2000 and 2013 and followed for at least 5 years were identified from the US Surveillance, Epidemiology, and End Results (SEER) database. Model calibration of Predict was evaluated by comparing predicted and observed 5-year breast cancer–specific mortality separately by estrogen receptor status for first vs second breast cancer. Receiver operating characteristic curves and areas under the curve were used to assess model discrimination. Model performance was also evaluated for various races and ethnicities.
Results
The study population included 6729 women diagnosed with a second breast cancer and 357 204 women with a first breast cancer. Overall, Predict demonstrated good discrimination for first and second breast cancers (areas under the curve ranging from 0.73 to 0.82). Predict statistically significantly underestimated 5-year breast cancer mortality for second estrogen receptor–positive breast cancers (predicted-observed = ‒6.24%, 95% CI = ‒6.96% to ‒5.49%). Among women with a first estrogen receptor–positive cancer, model calibration was good (predicted-observed = ‒0.22%, 95% CI = ‒0.29% to ‒0.15%), except in non-Hispanic Black women (predicted-observed = ‒2.33%, 95% CI = ‒2.65% to ‒2.01%) and women 80 years of age or older (predicted-observed = ‒3.75%, 95% CI = ‒4.12% to ‒3.41%). Predict performed well for second estrogen receptor–negative cancers overall (predicted-observed = ‒1.69%, 95% CI = ‒3.99% to 0.16%) but underestimated mortality among those who had previously received chemotherapy or had a first cancer with more aggressive tumor characteristics. In contrast, Predict overestimated mortality for first estrogen receptor–negative cancers (predicted-observed = 4.54%, 95% CI = 4.27% to 4.86%).
Conclusion
The Predict tool underestimated 5-year mortality after a second estrogen receptor–positive breast cancer and in certain subgroups of women with a second estrogen receptor–negative breast cancer.

Effective screening and treatment strategies have led to a reduction in breast cancer mortality. As a consequence, many patients with breast cancer can survive long enough to develop a second cancer. It has been estimated that approximately 20% of breast cancer survivors will develop a second primary cancer within 25 years of diagnosis, of which 40% will be breast cancers (1,2). Prior studies by our group have shown that breast cancer survivors who developed a second breast cancer experienced statistically significantly higher mortality than women with a first breast cancer, and the mortality is highest among non-Hispanic Black survivors (3,4).
The UK National Health Service's Predict is an online prognostic tool that can generate individualized prediction of breast cancer-specific and all-cause mortality for women diagnosed with early-stage invasive breast cancer (5,6). It is used to help oncology clinicians provide survival information to patients and make treatment decisions. The Predict online tool has been widely used across the world, with 20 000 visits per month (6). It has been validated for primary breast cancers in multiple female populations from Europe, Canada, and Southeast Asia (7-11).
There are no specific prognostic tools to assist clinician-patient decision making after a second breast cancer, however, and the clinical accuracy of existing tools such as Predict is unknown.
In this study, we used data from the Surveillance, Epidemiology, and End Results (SEER) program to evaluate the performance of the Predict tool in calculating 5-year breast cancer-specific mortality among women diagnosed with a second breast cancer. We compared the tool's predictive accuracy in these women with its accuracy among women diagnosed with a first breast cancer during the same time period.

Study population
The study population was from the SEER 18 registry (2000-2018) (12). SEER is a US program that collects information about patients with cancer from cancer registries across the country. SEER 18 covers 18 cancer registries, representing 28% of the US population. No approval was needed for this study because the data were publicly available from the SEER program. SEER collects information on multiple primary cancers and distinguishes them from recurrences based on a series of rigorous criteria, such as the cancer site (International Classification of Diseases for Oncology, 3rd Edition topography codes), time since initial cancer diagnosis, tumor histology, tumor behavior, and laterality for paired organs (13).
From the SEER 18 database, we identified 2 cohorts of women 18 years of age or older diagnosed with invasive breast cancer. The first cohort included women diagnosed with 1 incident local or regional breast cancer treated by surgery between January 1, 2000, and December 31, 2013. The second cohort included women diagnosed with both an incident breast cancer of any stage and a subsequent local or regional breast cancer treated by surgery during the same time period. To minimize misclassification between synchronous and subsequent cancers, second breast cancer was defined as a breast cancer diagnosed at least 1 year after the first breast cancer (14). Women in both groups were followed until December 31, 2018, to ensure at least 5 years of follow-up. We excluded women who developed more than 2 cancers during the study period.
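The cohort rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tumor` record, its field names, and the month-based reading of "at least 1 year" are hypothetical and are not taken from the SEER data dictionary or from the authors' code.

```python
from dataclasses import dataclass
from datetime import date

STUDY_START = date(2000, 1, 1)
STUDY_END = date(2013, 12, 31)


@dataclass
class Tumor:
    # Hypothetical record layout, not the SEER variable names.
    diagnosis_date: date
    age_at_diagnosis: int
    stage: str  # "local", "regional", or "distant"
    had_surgery: bool


def in_window(t: Tumor) -> bool:
    return STUDY_START <= t.diagnosis_date <= STUDY_END


def is_evaluable(t: Tumor) -> bool:
    """Local or regional cancer, treated by surgery, in an adult, within the window."""
    return (
        t.age_at_diagnosis >= 18
        and t.stage in {"local", "regional"}
        and t.had_surgery
        and in_window(t)
    )


def months_between(earlier: date, later: date) -> int:
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def assign_cohort(tumors: list[Tumor]) -> str:
    """Return 'first', 'second', or 'excluded' for one woman's breast cancers."""
    ordered = sorted(tumors, key=lambda t: t.diagnosis_date)
    if len(ordered) == 1:
        return "first" if is_evaluable(ordered[0]) else "excluded"
    if len(ordered) == 2:
        first, second = ordered
        gap = months_between(first.diagnosis_date, second.diagnosis_date)
        # The first cancer may be of any stage; the second must be evaluable
        # and diagnosed at least 12 months later (assumed reading of "1 year").
        if first.age_at_diagnosis >= 18 and in_window(first) and is_evaluable(second) and gap >= 12:
            return "second"
    # More than 2 cancers, synchronous pairs, and ineligible tumors are dropped.
    return "excluded"
```

A woman with tumors diagnosed 10 months apart would be excluded as potentially synchronous, whereas a 36-month gap would place her in the second cancer cohort.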

Model covariates
The Predict model incorporates information about age at diagnosis (year), tumor size (mm), tumor grade (1: well differentiated, 2: moderately differentiated, 3: poorly or undifferentiated), number of positive lymph nodes, estrogen receptor status (positive or negative), HER2 status (positive, negative, or unknown), KI-67 status (positive, negative, or unknown), mode of detection (screen or clinical), hormone therapy (yes or no), chemotherapy (no, second generation, third generation), trastuzumab use (yes or no), and bisphosphonate use (yes or no).
In SEER data, HER2 status was available only after 2010, and KI-67 status was not available. Therefore, missing values for HER2 and KI-67 status were entered as "unknown," as allowed by the Predict model. For mode of detection, we assumed that 70% of breast cancers were detected by screening, based on previous studies among US women (15,16). Of note, mode of detection influenced the mortality prediction of estrogen receptor-positive cancers only, not estrogen receptor-negative cancers.
Details of the chemotherapy drugs given to patients were not available in SEER. Therefore, we classified "second generation" as chemotherapy treatments reported before 2005 and "third generation" as chemotherapy treatments reported during 2005 or later. The SEER chemotherapy variable was categorized as "yes" (patient had chemotherapy) and "no/unknown" (no evidence of chemotherapy was found in the medical records examined) to reflect the fact that the cancer registries were unable to capture complete treatment information based on medical records. Chemotherapy is less likely to have been used for early-stage estrogen receptor-positive disease because hormone therapy is often the primary treatment (17). Because the majority of women with estrogen receptor-negative disease are treated with chemotherapy, we restricted our analysis to the 70% who received chemotherapy. The following treatment assumptions were made in our primary analysis: 1) all women with estrogen receptor-positive disease received hormone therapy, 2) 70% of women with HER2-positive disease received trastuzumab (18), and 3) no patient received bisphosphonate.
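To make the deterministic input assumptions concrete, the following Python sketch maps a registry record to Predict-style inputs. It is a hedged illustration only: the function names, the dictionary keys, and the "none" category label are hypothetical, and the sketch deliberately leaves out the 70% screening and 70% trastuzumab assumptions because the text does not state how those proportions were applied to individual patients. Setting hormone therapy to "no" for estrogen receptor-negative disease is likewise an assumption of the sketch.

```python
def chemo_generation(received_chemo: bool, year_reported: int) -> str:
    """Map the SEER yes vs no/unknown chemotherapy flag to a chemotherapy category."""
    if not received_chemo:
        return "none"
    # Before 2005 is treated as second generation, 2005 or later as third generation.
    return "second" if year_reported < 2005 else "third"


def status_or_unknown(value) -> str:
    """Pass through a recorded status; anything missing becomes 'unknown'."""
    return value if value in ("positive", "negative") else "unknown"


def build_predict_inputs(record: dict) -> dict:
    """Assemble model inputs from a hypothetical registry record."""
    return {
        "age": record["age"],
        "er_status": record["er_status"],
        "her2": status_or_unknown(record.get("her2_status")),
        "ki67": "unknown",  # KI-67 is not available in SEER
        "chemotherapy": chemo_generation(record["received_chemo"], record["year_reported"]),
        # Primary-analysis assumption: all ER-positive women received hormone therapy.
        "hormone_therapy": record["er_status"] == "positive",
        # Primary-analysis assumption: no patient received bisphosphonate.
        "bisphosphonate": False,
    }
```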
Variables in SEER that were not inputs of the Predict model included race and ethnicity (non-Hispanic White, non-Hispanic Black, non-Hispanic American Indian or Alaska Native, non-Hispanic Asian or Pacific Islander, and Hispanic), tumor histology, and radiation therapy (yes, no/unknown).

Outcome ascertainment
Death from breast cancer was identified based on the SEER variable "COD to site recode." Vital status, survival time, and cause of death were ascertained from the National Center for Health Statistics. Women with missing survival time or cause of death (1% of the study population) were excluded.

Statistical analysis
Demographic and clinical characteristics of first and second breast cancers were summarized by reporting the proportions for categorical variables and mean (SD) for continuous variables. Predict, version 2.1 (19) was used to generate predicted 5-year breast cancer-specific mortality risk for each patient. For the first cancer cohort, mortality estimates were calculated from the first cancer diagnosis. For the second cancer cohort, mortality estimates were generated from the second cancer diagnosis.
The performance of Predict was assessed by examining both model discrimination and calibration. All analyses were conducted separately for the first and second breast cancer cohorts and stratified by estrogen receptor status, given differences in their outcome. The model discrimination was assessed by plotting the receiver operating characteristic curve and calculating the area under the curve (AUC), with further stratification by race and ethnicity. Model calibration was evaluated by comparing the predicted 5-year breast cancer mortality to the observed mortality over the same period. The number of predicted deaths from breast cancer was calculated by summing the predicted scores of each patient generated from the Predict model. Breast cancer mortality was calculated as follows:

\[
\text{Breast cancer mortality (\%)} = \frac{\text{Number of breast cancer deaths}}{\text{Number of patients with breast cancer}} \times 100
\]

We plotted the observed mortality with 95% confidence intervals (CIs) against deciles of the predicted mortality. These plotted estimates were visually compared with the perfect agreement line (y = x). We also quantified the difference between predicted and observed mortality. The relative ratio (predicted mortality divided by the observed mortality) as well as the absolute difference (predicted mortality - observed mortality) were calculated. Bootstrap methods were used to generate 95% confidence intervals for relative and absolute differences. A similar analysis was conducted for multiple subgroups based on demographic and clinical characteristics. Further, among women with a second cancer, the difference between predicted and observed mortality for the second cancer was evaluated in subgroups defined by characteristics of their first breast cancer, including age at diagnosis of first cancer, tumor grade, tumor stage, lymph node status, estrogen receptor status, treatments (chemotherapy and radiation therapy), and time between first and second cancer (<5, 5-10, ≥10 years). For the second cancer cohort, we also generated the Predict score from their first cancer diagnosis (only for local or regional-stage cancer treated by surgery) and conducted the analysis stratified by quartiles of this score. Sensitivity analyses were conducted to check the robustness of some assumptions made by evaluating model performance under alternative situations. The type of chemotherapy was set to second or third generation for all calendar years instead of second generation before 2005 and third generation in 2005 and later.
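The calibration quantities described in this section (predicted and observed mortality, their absolute difference and relative ratio, bootstrap confidence intervals, and decile grouping) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R code: the function names, the percentile bootstrap, and the nearest-rank percentile rule are assumptions made for the example.

```python
import random


def calibration_summary(predicted, died):
    """predicted: per-patient 5-year risks in [0, 1]; died: 0/1 breast cancer deaths."""
    n = len(predicted)
    predicted_pct = 100.0 * sum(predicted) / n  # predicted deaths / patients x 100
    observed_pct = 100.0 * sum(died) / n        # observed deaths / patients x 100
    ratio = predicted_pct / observed_pct if observed_pct > 0 else float("inf")
    return {
        "predicted_pct": predicted_pct,
        "observed_pct": observed_pct,
        "absolute_difference": predicted_pct - observed_pct,
        "relative_ratio": ratio,
    }


def bootstrap_ci(predicted, died, key, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for one entry of calibration_summary."""
    rng = random.Random(seed)
    n = len(predicted)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        s = calibration_summary([predicted[i] for i in idx], [died[i] for i in idx])
        stats.append(s[key])
    stats.sort()
    lower = stats[int(round(0.025 * (n_boot - 1)))]
    upper = stats[int(round(0.975 * (n_boot - 1)))]
    return lower, upper


def observed_by_decile(predicted, died, n_groups=10):
    """Mean predicted and observed mortality (%) within groups of increasing predicted risk."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    rows = []
    for g in range(n_groups):
        members = order[g * len(order) // n_groups:(g + 1) * len(order) // n_groups]
        if members:
            mean_pred = 100.0 * sum(predicted[i] for i in members) / len(members)
            mean_obs = 100.0 * sum(died[i] for i in members) / len(members)
            rows.append((mean_pred, mean_obs))
    return rows
```

In this sketch a resample with no observed deaths yields an infinite relative ratio, which is one way an unbounded upper confidence limit can arise in small subgroups.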

Demographic and clinical characteristics
The study population included 6729 women diagnosed with a second breast cancer (5606 women with estrogen receptor-positive second cancer and 1123 women with estrogen receptor-negative second cancer) as well as 357 204 women with a first breast cancer (303 837 women with estrogen receptor-positive cancer and 53 367 women with estrogen receptor-negative cancer). Table 1 summarizes the baseline characteristics of the study population, stratified by first or second breast cancer and estrogen receptor status. The majority of breast cancers were local and node negative, with a tumor size under 20 mm. As expected, women with a second breast cancer were older than women with a first breast cancer for both estrogen receptor-positive cancers (65 vs 59.7 years of age) and estrogen receptor-negative cancers (56.8 vs 53.2 years of age). Among women with 2 breast cancers, more than 40% of the second tumors developed within 5 years of the first cancer. Women with a second cancer were more likely to have estrogen receptor-positive than estrogen receptor-negative disease. Close to 23% of women with a second estrogen receptor-negative cancer were non-Hispanic Black compared with 9.4% among women with a second estrogen receptor-positive cancer. For women with a second estrogen receptor-positive cancer, 72% of their first cancers were also estrogen receptor positive; for women with a second estrogen receptor-negative cancer, 45.7% of their first cancers were estrogen receptor negative.

Estrogen receptor-positive diseases
In women with first estrogen receptor-positive breast cancers, the observed mortality agreed with the predicted mortality across all deciles, as shown in the calibration plot displayed in Figure 1, A1. The overall absolute mortality difference was -0.22% (95% CI = -0.29% to -0.15%), and the relative ratio was 0.95 (95% CI = 0.94 to 0.97) (Table 2). The model calibration remained good in most subgroup analyses, as shown in Table 2. The model did, however, underestimate breast cancer mortality in non-Hispanic Black women (absolute difference = -2.33%, 95% CI = -2.65% to -2.01%) and women 80 years of age or older (absolute difference = -3.75%, 95% CI = -4.12% to -3.41%). The overall model discrimination was excellent, with an AUC of 0.819 (Figure 1, A2). The AUC by race and ethnicity ranged from 0.806 to 0.831 (Supplementary Table 1, available online).
The model performance was not as good for women diagnosed with a second estrogen receptor-positive cancer. Predict underestimated mortality across all deciles, with the greatest difference in performance among the higher deciles (Figure 1, B1). The overall absolute mortality difference was -6.24% (95% CI = -6.96% to -5.49%), meaning that the tool underestimated breast cancer-specific mortality by 6.24 percentage points. The relative ratio between predicted and observed mortality was 0.37 (95% CI = 0.34 to 0.4). Poor calibration was observed in all subgroups, with an absolute mortality difference ranging from -15.83% to -4% and a relative ratio ranging from 0.19 to 0.55 (Table 2). The underestimation was more pronounced among non-Hispanic Black (absolute difference = -9.3%, 95% CI = -12.25% to -6.65%) and Hispanic women (absolute difference = -9.24%, 95% CI = -12.36% to -6.12%). The model also underestimated breast cancer mortality among women who had a first cancer with more aggressive tumor characteristics, including tumor stage, size, grade, lymph node status, and Predict score. In addition, a shorter time interval between first and second cancers was associated with greater underestimation of mortality (Table 4). The AUC was lower at 0.745 in women with a second estrogen receptor-positive cancer overall (Figure 1, B2). Worse model discrimination was observed among non-Hispanic White women (AUC = 0.727) as well as non-Hispanic Asian and Pacific Islander and non-Hispanic American Indian and Alaska Native women (AUC = 0.697) (Supplementary Table 1, available online).

Estrogen receptor-negative diseases
An overestimation of cancer-specific mortality was found for women with first estrogen receptor-negative breast cancers who received chemotherapy (absolute difference = 4.54%, 95% CI = 4.27% to 4.86%; relative ratio = 1.26, 95% CI = 1.24 to 1.28) (Table 3). The Predict tool overestimated mortality across all deciles (Figure 2, A1). In subgroup analyses, there was an overestimation of mortality in all subgroups except for women 80 years of age or older, where an underestimation was observed (absolute difference = -3.91%, 95% CI = -7.05% to -0.79%), and women with a grade 1 tumor, where predicted and observed mortalities were similar (absolute difference = -0.99%, 95% CI = -2.83% to 0.81%) (Table 3). When stratified by race, the absolute difference among non-Hispanic Black women was much smaller than among all other races. The model discrimination was good overall (AUC = 0.753) and by race and ethnicity (AUC ranged from 0.744 to 0.754) (Figure 2, A2 and Supplementary Table 1, available online).
Table 1 footnotes: b No evidence of chemotherapy or radiation therapy was found in the medical records. It could be that patients did not have the treatment or that patients had the treatment but it was not recorded in medical records. c This was calculated at time of initial presentation of local or regional stage breast cancer treated by surgery. Data were also available on tumor grade, lymph node status, tumor size, and estrogen receptor status.
For women diagnosed with a second estrogen receptor-negative cancer, the predicted and observed mortality were similar across deciles, as shown in Figure 2, B1. The absolute difference between the overall predicted and observed mortality was -1.69% (95% CI = -3.99% to 0.16%), and the relative ratio was 0.92 (95% CI = 0.83 to 1.01) (Table 3). Mortality was underestimated in several subgroups, however, including women diagnosed between 2005 and 2009, women younger than 50 years of age, women with a tumor of regional stage, women with a grade 1 tumor, women with positive lymph nodes, and women with ductal carcinoma (Table 3). Similar to estrogen receptor-positive cancer, the model underestimated breast cancer mortality among women who had a first cancer with more aggressive tumor characteristics, including tumor stage, size, grade, lymph node status, and Predict score (Table 4). A shorter time interval between first and second cancers was also associated with greater underestimation of mortality (Table 4). Overall model discrimination was comparable to that for women with first estrogen receptor-negative cancer (AUC = 0.735), with similar AUCs across race and ethnicity (Figure 2, B2 and Supplementary Table 1, available online).
Sensitivity analyses that changed the assumptions about the type of chemotherapy received (ie, second vs third generation) and the cancer detection mode yielded similar overall results (Supplementary Tables 2 and 3, available online).
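Because the text reports only the absolute difference (predicted - observed) and the relative ratio (predicted / observed), the two underlying percentages can be roughly back-calculated from those two numbers. The Python sketch below does this arithmetic; the function name is hypothetical, and because the published values are rounded (the ratio to 2 decimals), the results are approximations rather than figures reported by the study.

```python
def implied_mortality(absolute_difference_pct: float, relative_ratio: float):
    """Solve P - O = d and P / O = r for predicted (P) and observed (O) mortality in %."""
    if relative_ratio == 1.0:
        raise ValueError("predicted equals observed; the two equations do not identify O")
    observed = absolute_difference_pct / (relative_ratio - 1.0)
    predicted = relative_ratio * observed
    return predicted, observed


# Overall estimates quoted in the Results, as (absolute difference %, relative ratio).
reported = {
    "first ER-positive": (-0.22, 0.95),
    "second ER-positive": (-6.24, 0.37),
    "first ER-negative": (4.54, 1.26),
    "second ER-negative": (-1.69, 0.92),
}

for cohort, (diff, ratio) in reported.items():
    predicted, observed = implied_mortality(diff, ratio)
    print(f"{cohort}: predicted ~{predicted:.1f}%, observed ~{observed:.1f}%")
```

Under these rounded inputs, the second estrogen receptor-positive cohort works out to an observed 5-year mortality of roughly 10% against a predicted value of under 4%, which is the gap the relative ratio of 0.37 expresses.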

Discussion
To our knowledge, this study is the first to evaluate model calibration and discrimination of the Predict tool among a cohort of women diagnosed with a second early-stage breast cancer. Based on a diverse cohort of women with either a first or second breast cancer, Predict performed best among those with a first estrogen receptor-positive cancer. Predict underestimated 5-year mortality in women with a second breast cancer, particularly those with estrogen receptor-positive disease. Greater underestimations in mortality were observed among non-Hispanic Black and Hispanic women, women whose first cancer had more aggressive tumor characteristics, and women with a shorter time interval between the 2 cancers. In women with estrogen receptor-negative disease, Predict overestimated mortality for a first cancer but performed well for a second cancer overall.
We observed an underestimation for a second estrogen receptor-negative cancer among women who previously received chemotherapy, had a first cancer with more aggressive tumor characteristics, or had a shorter interval between first and second cancer. Predict showed good model discrimination for women with either a first or second breast cancer, irrespective of estrogen receptor status. In most clinical scenarios, the model provided a more conservative estimate, underestimating rather than overestimating 5-year mortality for second cancers.
Consistent with validation studies of women diagnosed with a first breast cancer conducted mainly in White populations, Predict accurately evaluated the 5-year breast cancer mortality for a diverse population of women diagnosed with a first estrogen receptor-positive cancer but overestimated mortality for women with a first estrogen receptor-negative cancer. In a study among 45 789 cases from the Scottish Cancer Registry, Gray et al. (8) found that among women with estrogen receptor-positive disease, the predicted 5-year mortality was only 0.5% greater than the observed one, and the AUC was 0.76. Among women diagnosed with estrogen receptor-negative breast cancer, however, a 16.6% overestimation of the 5-year mortality was observed, although the model discrimination remained excellent (AUC = 0.74). Another independent validation study using data from the Netherlands Cancer Registry reported a reliable prediction of 5-year overall survival for patients with estrogen receptor-positive cancer but an underestimation of overall survival (ie, overestimation of mortality) for patients with estrogen receptor-negative cancer (7). A recent study in a New Zealand cohort also showed an overestimation of mortality among patients with estrogen receptor-negative cancer (20). Interestingly, despite possible differences in cancer screening; demographics, including race and ethnicity; and use of treatments that may affect 5-year breast cancer mortality, all these studies, including ours, reported an overestimation of mortality for estrogen receptor-negative cancers, suggesting that other unmeasured factors may also play a role.
The number of studies that have evaluated the Predict tool in a diverse population is limited. In a study among patients with early-stage breast cancer treated at the University of Texas MD Anderson Cancer Center, Wu et al. (21) found that the Predict tool underestimated the 5-year overall mortality for African American women but overestimated mortality for Hispanic and White women. This study, however, was based on patients from a single hospital, which limits its generalizability. Our study, which included states across the United States, showed a slight underestimation of the 5-year cancer mortality for non-Hispanic Black women with first estrogen receptor-positive cancers and greater underestimation for non-Hispanic Black and Hispanic women with second estrogen receptor-positive cancers. Predict overestimated mortality for first estrogen receptor-negative cancers across all racial and ethnic groups but to a lesser degree in non-Hispanic Black women. The tool performed well for second estrogen receptor-negative cancers among non-Hispanic White women but underestimated mortality for other racial and ethnic groups. As Predict was not developed to discriminate differences in outcomes by race and ethnicity, our findings by race and ethnicity should be considered hypothesis generating.
The underestimation of breast cancer mortality we observed for both second estrogen receptor-positive and estrogen receptor-negative cancers is consistent with our prior observation that women with a second breast cancer have a higher risk of death than women with a first breast cancer, even after considering tumor characteristics of the initial and the second cancer and the time interval between the 2 cancers (3,22-25). There are multiple plausible explanations for the high mortality among women with a second breast cancer. Second cancers initiated soon after therapy may be more therapy resistant (22). Our previous study also suggested that prior chemotherapy is associated with greater mortality after a second cancer (3). Persistent risk factors and comorbidities among survivors may continue to affect disease progression (26,27). Women may receive inadequate care the second time around due to various factors, such as experiencing toxicities from the first treatment, being less compliant with treatment, or facing financial stressors that limit their access to health care (28).
In this study, Predict showed reasonable model discrimination in predicting cancer mortality among women with a second cancer. These results indicate that the Predict tool could still be useful to identify groups of patients who had a higher vs lower risk of death from a second cancer, although it may not provide an accurate estimation of the absolute risk, particularly in women diagnosed with a second estrogen receptor-positive cancer.
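The distinction drawn here, that a model can rank patients well while misstating absolute risk, can be illustrated with a small Python example. The AUC below is computed with the standard pairwise (Mann-Whitney) definition; the risks and outcomes are made-up toy values, and the 0.37 scaling factor is borrowed from the reported relative ratio purely for illustration.

```python
def auc(scores, labels):
    """Probability that a randomly chosen death has a higher score than a survivor."""
    deaths = [s for s, y in zip(scores, labels) if y == 1]
    survivors = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5  # ties count as half
    return wins / (len(deaths) * len(survivors))


# Toy data (hypothetical): well-ranked risks and the corresponding outcomes.
risks = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80]
died = [0, 0, 0, 1, 0, 1, 1, 1]

# Shrinking every risk by the same factor keeps the ranking but not the level.
shrunk = [0.37 * r for r in risks]

auc_original = auc(risks, died)
auc_shrunk = auc(shrunk, died)
mean_original = 100.0 * sum(risks) / len(risks)
mean_shrunk = 100.0 * sum(shrunk) / len(shrunk)
observed = 100.0 * sum(died) / len(died)
```

Because the AUC depends only on the ordering of the scores, `auc_original` and `auc_shrunk` are identical, while the mean predicted mortality of the shrunken scores falls well below the observed mortality. Discrimination is preserved and calibration is lost.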
Although we did not have detailed information about chemotherapy regimens, it is important to note that sensitivity analyses evaluating the impact of varying numbers of women who received third- vs second-generation adjuvant chemotherapy regimens did not statistically significantly alter the results. Of note, cyclin-dependent kinase inhibitors are not part of Predict.
The strengths of this study include the large number of second breast cancers; rigorous ascertainment of second cancers in the SEER registry; high-quality data on vital status and cause of death; and a diverse population that represents 28% of the US population, which allows for analysis of those from racial and ethnic minority groups. This study has several limitations, as well. First, KI-67 status and HER2 status before 2010 were not available in the SEER database. Therefore, "unknown" was entered into the model for these missing values. Second, assumptions were made for mode of detection, generation of chemotherapy, and uptake of trastuzumab. Sensitivity analyses to test different assumptions, however, yielded results consistent with the main analyses, suggesting that the results are robust. Third, chemotherapy was underreported in SEER (29). To eliminate the potential bias this issue may cause, we limited our analyses to women who received chemotherapy for estrogen receptor-negative cancers, which account for 70% of all patients with estrogen receptor-negative cancers. Another limitation is that we could not determine whether death was due to the first or second breast cancer. Therefore, it is plausible that a small number of deaths were attributed incorrectly.
In summary, this US SEER study showed that Predict largely underestimated the 5-year cancer-specific mortality in women diagnosed with a second estrogen receptor-positive breast cancer and in some subgroups of women diagnosed with a second estrogen receptor-negative breast cancer. Our findings suggest that clinicians should be cautious when applying estimates from Predict among women with a second breast cancer.

Figure 1. Calibration plot (A1 and B1) and receiver operating characteristic curve (A2 and B2) for estrogen receptor-positive breast cancer. A1 and A2 are for first breast cancers. B1 and B2 are for second breast cancers. The dotted line in A1 and B1 indicates perfect agreement between predicted and observed mortality. The black dots and bars in A1 and B1 represent the observed mortality with 95% confidence interval against deciles of the predicted mortality. AUC = area under the curve.

Figure 2. Calibration plot (A1 and B1) and receiver operating characteristic curve (A2 and B2) for estrogen receptor-negative breast cancer. A1 and A2 are for first breast cancers. B1 and B2 are for second breast cancers. The dotted line in A1 and B1 indicates perfect agreement between predicted and observed mortality. The black dots and bars in A1 and B1 represent the observed mortality with 95% confidence interval against deciles of the predicted mortality. AUC = area under the curve.

Table 1. Baseline characteristics of first and second breast cancers, by estrogen receptor status

Characteristics of the initial breast cancer for patients with a second breast cancer
In another analysis, we assumed that fewer breast cancers (ie, 50%) were detected by screening. All analyses were performed in R, version 4.1.2 (R Foundation for Statistical Computing, Vienna, Austria).

Table 1. (continued)
a "Other" included non-Hispanic Asian and Pacific Islander and non-Hispanic American Indian and Alaska Native.

Table 2. Difference between predicted and observed mortality among women with estrogen receptor-positive breast cancer
a Absolute mortality difference (%) = predicted mortality - observed mortality.
b Relative mortality ratio = predicted mortality/observed mortality.
c "Other" included non-Hispanic Asian and Pacific Islander and non-Hispanic American Indian and Alaska Native.

Table 3. Difference between predicted and observed mortality among women with estrogen receptor-negative breast cancer who received chemotherapy
"Other" included non-Hispanic Asian and Pacific Islander and non-Hispanic American Indian and Alaska Native.

Table 4. Difference between predicted and observed mortality among women with second breast cancer, by characteristics of their first breast cancer

First breast cancer characteristics Estrogen receptor-positive second breast cancer Estrogen receptor-negative second breast cancer Predicted No. of deaths Observed No. of deaths
a Absolute mortality difference = predicted - observed.
b Relative mortality ratio = predicted/observed.
c The upper confidence interval was infinite due to the small number of deaths.
d This was available only for local or regional stage initial breast cancers treated by surgery and had data on tumor grade, lymph node status, and estrogen receptor status.