Bridging the Age Gap: a prognostic model that predicts survival and aids in primary treatment decisions for older women with oestrogen receptor-positive early breast cancer

Background: A prognostic model was developed and validated using cancer registry data. This underpins an online decision support tool, informing primary treatment choice for women aged 70 years or older with hormone receptor-positive early breast cancer. Methods: Data from women diagnosed between 2002 and 2010 in the English Northern and Yorkshire and West Midlands regions were used to develop the model. Primary treatment options of surgery with adjuvant endocrine therapy or primary endocrine therapy were compared. Models predicting the hazard of breast cancer-specific mortality and hazard of other-cause mortality were combined to derive survival probabilities. The model was validated externally using data from the Eastern Cancer Registration and Information Centre. Results: The model was developed using data from 23842 women, and validated externally on a data set from 14526 patients. The overall model calibration was good. At 2 and 5 years, predicted mortality from breast cancer and other causes differed from the observed rate by less than 1 per cent. At 5 years, there were slight overpredictions in breast cancer mortality (2629 predicted versus 2556 observed deaths; P = 0⋅142) and mortality from all causes (6399 versus 6320 respectively; P = 0⋅583). The discrepancy varied between subgroups. Model discrimination was 0⋅75 or above for all mortality measures. Conclusion: A prognostic model for older women with oestrogen receptor-positive early breast cancer was developed and validated in the present study. This forms a basis for an online decision support tool (https://agegap.shef.ac.uk/).


Introduction
Breast cancer is the most common cancer to affect women, with 54 896 patients diagnosed in the UK in 2016 1 . Around one-third occur in women aged over 70 years. The standard of care for early breast cancer is surgical removal of the primary cancer, axillary surgery, and adjuvant therapies that may include chemotherapy, anti-human epidermal growth factor receptor 2 (HER2) therapy, bisphosphonates, antioestrogens and radiotherapy. However, age-related practice varies widely; older women are less likely to receive adjuvant therapy than their younger counterparts 2 . The National Institute for Health and Care Excellence 3 recommends that women with early breast cancer, irrespective of age, are treated with surgery and systemic therapy rather than endocrine therapy alone, unless significant co-morbidity precludes surgery. However, up to 40 per cent of older women with oestrogen receptor (ER)-positive cancer in the UK have historically received primary endocrine therapy.
Treatment with primary endocrine therapy is justified in some instances. It was shown to be effective in several trials in the 1980s, with no survival disadvantage relative to surgery, although rates of local control were suboptimal 4 . For some older women, surgery is associated with significant risks 5 and others may prefer minimal treatment 6 . Tang and colleagues 5 reported unacceptable rates of morbidity and mortality in surgically treated US nursing home residents. Older and less fit women are more likely to die from competing risks and may experience a significant decline in quality of life after surgery. Selecting the best treatment for an individual woman is complex and currently no tools exist to support the decision. Models to inform clinician and patient decisions about adjuvant therapy after surgery already exist. They are based on clinical prognosticators for recurrence risk and breast cancer death. These include the Nottingham Prognostic Index 7 , PREDICT 8 , Adjuvant! Online 9 , OPTIONS and CancerMaths 10 . PREDICT is widely used in the UK, and is based on clinicopathological factors including tumour size, tumour grade, lymph node status, ER status, HER2 status and mode of detection. These models, however, consider the impact of adjuvant treatment after surgery rather than the initial decision regarding surgery itself. They might give an estimate of the expected outcome for an older woman having surgery, but not for the alternative of primary endocrine therapy. There is also evidence that PREDICT may be less accurate in 10-year outcome prediction in women aged over 75 years 11 . Existing prognostic models do not explicitly consider age-related factors such as co-morbidities and frailty. Co-morbidity is a strong predictor of competing mortality 12 and should be included when modelling an elderly population. Variations in functional and cognitive status and physiological reserves in older populations should also be considered 12 .
In addition, deprivation level is usually linked to a higher burden of co-morbidity and frailty, and possibly also to undertreatment.
The aim of this study was to develop and validate a new prognostic model to inform the primary treatment choice (surgery or primary endocrine therapy) for women aged at least 70 years with ER-positive early breast cancer.

Methods
A prognostic model was developed comprising two submodels: breast cancer mortality and other-cause mortality. Model parameters (Table 1) were selected based on literature review, exploratory investigation and expert advice. The hazard of breast cancer mortality was modelled as a function of patient age, co-morbidity score (Charlson Co-morbidity Index (CCI) without the age component) and deprivation level; tumour detection pathway; tumour diameter, grade and nodal status; and treatment choice. Registry data do not contain any information about concomitant co-morbidities and data on these were derived from linked Hospital Episode Statistics (HES) data collected during inpatient spells for patients with cancer. The hazard of other-cause mortality was modelled as a function of patient age, co-morbidity score and frailty. The subsequent model development procedure, including technical details of how these co-variables were incorporated into the model, is detailed in Appendix S1 (supporting information).
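Because the model comprises two cause-specific hazards, a probability of death by a given time has to account for both causes acting together. The following is a minimal sketch of one standard competing-risks calculation, not necessarily the transformation described in Appendix S1. The function name, the numerical integration scheme and the constant hazards in the usage note are illustrative assumptions only.

```python
import math


def death_probabilities(h_bc, h_oc, horizon, steps=10_000):
    """Cumulative probability of death by `horizon` (years), split by cause.

    h_bc, h_oc: functions of time returning the breast cancer and
    other-cause hazards (per year). Both are hypothetical inputs here.
    Returns (breast cancer, other cause, all cause) probabilities.
    """
    dt = horizon / steps
    cum_hazard = 0.0  # integrated all-cause hazard up to the interval start
    p_bc = 0.0
    p_oc = 0.0
    for i in range(steps):
        t_mid = (i + 0.5) * dt
        hb = h_bc(t_mid)
        ho = h_oc(t_mid)
        # Probability of still being alive at the midpoint of this interval.
        surv_mid = math.exp(-(cum_hazard + 0.5 * (hb + ho) * dt))
        # Midpoint rule: a death from each cause requires survival so far.
        p_bc += hb * surv_mid * dt
        p_oc += ho * surv_mid * dt
        cum_hazard += (hb + ho) * dt
    return p_bc, p_oc, p_bc + p_oc
```

For example, `death_probabilities(lambda t: 0.03, lambda t: 0.05, 5.0)` uses made-up constant hazards; in that special case the all-cause result should agree with the closed form 1 - exp(-0.08 × 5), which gives a quick sanity check on the integration.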
The resulting breast cancer mortality model was a Royston-Parmar (RP) restricted cubic spline model with eight co-variables. RP models allow relaxation of the proportional hazards assumption associated with well known Cox models 13 and, being parametric, facilitate extrapolation of survival predictions as required for prognostic modelling. During the model building process, evidence was found for non-proportional hazards for some co-variables (Table 1). Interaction terms were also found to be statistically significant between treatment and three other co-variables: tumour grade, size and nodal status. The coefficients of the breast cancer submodel were estimated using the flexsurv library in the open source R software 14 .
The three co-variables in the other-cause mortality submodel (age, co-morbidity and frailty) were modelled using the proportional hazards assumption. Frailty was approximated by a version of the activities of daily living (ADL) score 15 , represented by an integer value ranging from 0 to 5; 0 means no difficulties, and 5 means complete difficulty in the components eat, toilet, dress, transfer, bathe and walk. This variable was not recorded in the registry data, so a Markov chain Monte Carlo approach was adapted from Koissi and Högnäs 16 . This process inferred frailty weights, that is the probability of being at each ADL level for each patient, as well as model parameters that are the hazard ratios of other-cause mortality for each level (Appendix S1, supporting information). This estimation was carried out using the open source WinBUGS package 17 and the R2WinBUGS interface 18 .
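To illustrate how per-patient frailty weights and per-level hazard ratios can feed a single prediction, the sketch below averages level-specific mortality probabilities over the six ADL levels under proportional hazards. It is an assumption-laden illustration rather than the procedure of Appendix S1: the function name, the baseline cumulative hazard and every number in the usage note are hypothetical, and breast cancer death as a competing risk is ignored for simplicity.

```python
import math


def other_cause_death_prob(baseline_cum_hazard, hazard_ratios, weights):
    """Frailty-weighted probability of other-cause death.

    baseline_cum_hazard: cumulative hazard at the time horizon for the
        reference ADL level (hypothetical input).
    hazard_ratios: one hazard ratio per ADL level (0 to 5).
    weights: probability that the patient is at each ADL level.
    """
    if len(hazard_ratios) != len(weights):
        raise ValueError("need one weight per ADL level")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("frailty weights must sum to 1")
    # Proportional hazards: survival at level k is exp(-H0 * HR_k).
    survival = sum(
        w * math.exp(-baseline_cum_hazard * hr)
        for w, hr in zip(weights, hazard_ratios)
    )
    return 1.0 - survival
```

A call such as `other_cause_death_prob(0.2, [1.0, 1.5, 2.0, 2.5, 3.0, 4.0], [0.5, 0.2, 0.1, 0.1, 0.05, 0.05])` uses invented hazard ratios and weights. Putting all the weight on level 0 recovers the unweighted reference prediction, 1 - exp(-0.2).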
Hazards predicted by the submodels were combined and transformed appropriately (Appendix S1, supporting information) to derive probabilities of death by 2 and 5 years (breast cancer-specific, other cause and all cause).
[...] distribution, deprivation pattern, tumour stage and biological subtype distributions (Tables 2 and 3). The data from these regions were also of the highest available quality in terms of accuracy and completeness in comparison with wider UK data. Survival data were derived from death certificates from the Office for National Statistics, with a mean follow-up of 5⋅2 years and a censoring date of 17 January 2017. Details of data preprocessing have been reported elsewhere 19 and are summarized in Appendix S1 and Table S1 (supporting information).
For external validation, Eastern Cancer Registration and Information Centre (ECRIC) data were obtained on all first diagnoses of invasive breast cancer in women aged 70 years or more between 2002 and 2012, with a mean follow-up of 4⋅8 years. The majority of administrative censoring for the validation data occurred in January 2016 (Appendix S1, supporting information).
Cancer registry data in the UK do not record co-morbidities or frailty. Co-morbidity was derived from linked records in the HES data set. HES records were searched from 18 months before the date of diagnosis and linkage was made using National Health Service (NHS) number, date of birth, sex and postcode.
Both data sets contained variables with a non-negligible number of missing values (Tables 2 and 3). Whether a variable is missing often depends on patient characteristics, so excluding patients with incomplete records could introduce bias. Multiple imputation was therefore used to create complete versions of both data sets without excluding any patients. The distribution of these variables in patients with similar characteristics was used to impute values for the missing variables. To account for the uncertainty in this process, 15 imputations of both data sets were created. The analysis was done on each imputation, and results combined using Rubin's rules 20 . Details of the imputation process have been published previously 19 and are summarized in Appendix S1 and Table S2 (supporting information).
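Rubin's rules pool an estimate across imputed data sets by averaging the per-imputation estimates and inflating the variance to reflect between-imputation disagreement. The sketch below shows the standard formulae for a single scalar quantity; the function name and the numbers in the usage note are illustrative assumptions, not values from this study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate over m imputed data sets (Rubin's rules).

    estimates: the estimate obtained from each imputed data set.
    variances: the squared standard error from each imputed data set.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total
```

With hypothetical inputs `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])`, the pooled estimate is the plain mean of the three estimates, and the total variance exceeds the within-imputation variance because the imputations disagree.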
Validation comprised assessment of calibration and discrimination. Calibration tests whether the model predicts the correct number of deaths over a given interval. The time points chosen for validation were 2 and 5 years, owing to the age and typical frailty of the women who will use the tool. For 2-year calibration, for instance, all women with complete 2-year follow-up (excluding those lost to follow-up before 2 years) were selected and the sum of 2-year all-cause mortality predictions was compared with the number of observed deaths. Discrimination measures whether patients with a higher predicted probability of death are, on average, those who die more frequently. Discrimination is calculated as the area under the receiver operating characteristic (ROC) curve (AUC); an AUC of 0⋅5 represents the equivalent of randomly allocated mortality probabilities, and an area of 1 represents perfect concordance between probabilities and outcomes 8,21 . All validation results were produced by averaging over the individual results for the 15 imputed data sets and also by weighted averaging of mortality predictions using the ADL level frailty weights inferred for each patient.

Results
[...] Tables 2 and 3. The cohorts were similar, except for a higher proportion of screened patients (9⋅2 versus 5⋅6 per cent) and a lower proportion of patients in the high-deprivation group (7⋅8 versus 22⋅2 per cent) in the external versus training data sets. After preprocessing, 18 727 (78⋅5 per cent) patients in the training data set were classified as ER-positive; of these, 10 085 women (53⋅9 per cent) had surgery and 8642 (46⋅1 per cent) received primary endocrine therapy.

Hazard ratios
Hazard ratios for surgery versus primary endocrine therapy groups were patient-dependent. Values for a range of subgroups are shown in Table S3 (supporting information).

Internal validation
Overall calibration of the model was good (Table 4). At 2 and 5 years, predicted breast cancer and other-cause mortality differed from observed rates by less than 1 per cent in all instances. At 5 years, there was a slight overprediction of breast cancer mortality (2629 predicted versus 2556 observed deaths; P = 0⋅142) and all-cause mortality (6399 versus 6320; P = 0⋅583). The AUC representing discrimination was 0⋅75 or above for all mortality measures (Table 4). Performance was similar to that of PREDICT (AUC 0⋅76-0⋅78) 8 . Fig. S1 (supporting information) shows predicted and observed 5-year all-cause mortality by deciles of observed mortality. Calibration was good for the intermediate deciles, but less good for low-risk (overprediction) and moderately high-risk (underprediction) deciles.
Calibration plot: patients are divided into ten groups using deciles of mortality predictions. For each group, the average 5-year all-cause mortality prediction is plotted against the average observed mortality. A straight line of unit gradient represents perfect calibration.
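The two validation measures reported here can be sketched in a few lines. Calibration compares the sum of predicted death probabilities with the observed death count, and the AUC is the probability that a randomly chosen patient who died was given a higher predicted risk than a randomly chosen survivor. The function names and toy data below are assumptions for illustration; the pairwise AUC loop is quadratic in cohort size and would be replaced by a rank-based routine for registry-scale data.

```python
def calibration_totals(pred_probs, died):
    """Return (expected deaths, observed deaths) for one time horizon.

    pred_probs: predicted probability of death for each patient.
    died: 1 if the patient died within the horizon, else 0.
    """
    return sum(pred_probs), sum(died)


def auc(pred_probs, died):
    """Area under the ROC curve via pairwise comparison of predictions."""
    dead = [p for p, d in zip(pred_probs, died) if d]
    alive = [p for p, d in zip(pred_probs, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p in dead:
        for q in alive:
            if p > q:
                score += 1.0   # concordant pair
            elif p == q:
                score += 0.5   # tied predictions count as half
    return score / (len(dead) * len(alive))
```

For the toy cohort `pred_probs = [0.1, 0.4, 0.35, 0.8]` and `died = [0, 0, 1, 1]`, three of the four death-survivor pairs are ranked correctly, so `auc` returns 0.75, and the expected death count of 1.65 can be compared with the 2 observed deaths.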
The degree of underestimation or overestimation of all-cause mortality varied between subgroups (Table 5). The 5-year relative mortality difference exceeded ±10 per cent for the surgery subgroup, the screened population subgroup and two of the tumour size subgroups (less than 10 mm and 10-19 mm). The 2-year relative mortality difference exceeded ±10 per cent for the surgery, node-negative, co-morbidity score ≥ 3 and screen-detected subgroups. All-cause mortality was also overpredicted for patients with smaller tumours and underpredicted for those with the largest tumours (over 50 mm). At both 2 and 5 years, mortality was overpredicted in the surgery subgroup and underpredicted in the primary endocrine therapy subgroup. These subgroup differences were driven by the other-cause mortality estimates (Table S4, supporting information), with 13⋅4 and 17⋅3 per cent underprediction for patients who had primary endocrine therapy at 2 and 5 years respectively.
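As a worked illustration of a relative mortality difference, the snippet below applies one plausible definition, (predicted - observed) / observed, to the overall 5-year death counts reported for internal validation. The definition is an assumption, since the exact formula behind Table 5 is not restated in the text, and the function name is hypothetical.

```python
def relative_difference_pct(predicted, observed):
    """Relative difference between predicted and observed deaths (per cent).

    Positive values indicate overprediction, negative values
    underprediction. Assumed definition, see the note above.
    """
    if observed == 0:
        raise ValueError("observed deaths must be non-zero")
    return 100.0 * (predicted - observed) / observed


# Overall 5-year totals quoted in the internal validation results.
breast_cancer = relative_difference_pct(2629, 2556)  # about 2.86
all_cause = relative_difference_pct(6399, 6320)      # 1.25
```

Under this definition both overall figures sit well inside the ±10 per cent band, which is consistent with good overall calibration alongside larger discrepancies in individual subgroups.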

External validation
Overall calibration was also good in the external data set (Table S5, supporting information). Five-year predicted breast cancer mortality exceeded the observed rate in the training data by 0⋅5 per cent, and the difference for all-cause mortality was 0⋅1 per cent. Two-year predicted all-cause mortality exceeded the observed rate by 1⋅2 per cent, whereas at 5 years there was a small underprediction (-0⋅9 per cent). Fig. 1 shows predicted and observed 5-year all-cause mortality by deciles of observed mortality. Calibration was good for the higher-risk deciles, but there was some overprediction for the low-risk deciles. Discrimination results for 2- and 5-year breast cancer mortality and all-cause mortality in the external validation data set had AUC values in the range 0⋅75-0⋅80 (Table S5, supporting information). The results of external subgroup validation are shown in Table S6 (supporting information).

Discussion
A prognostic model for women aged 70 years or older with ER-positive early breast cancer was developed and validated. This model was targeted specifically at supporting the decision of whether to undergo surgery or opt for primary endocrine therapy. The model was developed using a large cohort of patients from the West Midlands (WMCIU) and the Northern and Yorkshire (NYCRIS) cancer registries. These together cover around 25 per cent of the population in England, and are regarded as being broadly representative of the UK demographic distribution. The model was validated using data from the Eastern Cancer Registry (ECRIC) in England. The model was shown to be well calibrated for all-cause mortality and to have good discrimination, with similar performance to other prognostic models in early breast cancer, such as PREDICT. The model additionally performed well on the external validation data set. The Age Gap prognostic model provides outputs for both breast cancer mortality and other-cause mortality. This is an important issue in older patients, who have an increased risk of dying from other causes. By age 85 years, around three-quarters of deaths are from causes other than breast cancer.
The findings of this analysis are not directly comparable to those of other predictive models in breast cancer. PREDICT and other commonly used models focus on decisions around adjuvant therapy following breast cancer surgery, rather than whether or not a patient will benefit from surgery itself. These models are typically trained on a mixed-age population, and not designed to deal with the greater other-cause mortality rates among patients aged 70 years or more. In contrast, the Age Gap model is targeted at older women for whom the benefits versus harms of surgery are more complex, owing to co-morbidities and frailty.
In this model, other-cause mortality is underpredicted for patients treated with primary endocrine therapy (absolute mortality difference -6⋅8 per cent) (Table S4, supporting information). The other-cause mortality model has taken the first step to incorporating the impact of co-morbidities and frailty on individual-patient predictions of mortality, but the prediction is currently subject to limitations. Cancer registry data in the UK do not record co-morbidities or frailty. Here, co-morbidity was derived from linked records in the HES data set. Data were available from HES only if a patient had a hospital inpatient or day-case admission preceding their cancer diagnosis. This methodology relies heavily on the accuracy of coding within HES. HES data are likely to under-record co-morbidities in patients who have chronic co-morbidities managed in the community or outpatient setting, such as diabetes or dementia. On average, women are more likely to receive primary endocrine therapy if they are older and have chronic co-morbidities 22,23 . Under-reporting of co-morbidities in patients undergoing primary endocrine therapy may, at least in part, contribute to the underprediction of the other-cause mortality for such patients.
Validation of the model was conducted at 2 and 5 years after diagnosis as sufficient 10-year uncensored follow-up data were not available. This also reflects the decision that these prognostic timescales are the most appropriate for the majority of women in this older age group, for many of whom a 10-year prediction may not give a positive message. Further validation will be carried out when full data are available, and presentation of 10-year predictions will be reviewed in the light of user and patient input.
No measure of frailty is included in cancer registry data sets. Frailty was approximated by a version of the ADL score as used by Stineman and colleagues 15 . Values from this US study were used to form priors for the parameters in the developed model. The mean age of participants, 77⋅4 years, was similar to that in the UK population. However, the distributions of ADL level in these study participants may not fully reflect the UK population of older women with breast cancer. Further validation of the current methodology will be possible using data collected in the prospective Bridging the Age Gap cohort study, which includes co-morbidities and frailty data, once sufficiently mature survival data become available.
The present analysis benefits from the large retrospective data set of patients with breast cancer. This reduces the bias introduced by the exclusion criteria of many RCTs, which often exclude patients on the basis of older age and complex co-morbidity. Use of routinely collected data, however, resulted in a high proportion of missing values, especially for patients in the non-surgical group. For instance, tumour size and clinical node status based on imaging were not clearly recorded in these patients. No perfect method exists for obtaining these missing data. Multiple imputation, however, is less prone to bias than complete-case analysis and/or treating 'missing' as a category. It also allows the propagation of uncertainty owing to missing data into the estimates of co-variable effects.
The prognostic model is part of a decision support intervention (DESI) including an online tool (https://agegap.shef.ac.uk/) with two other patient-facing decision aids 24 , tailored to the information needs and preferences of older women 25,26 in decision-making between surgery and primary endocrine therapy. This DESI has been developed for women whose choice between surgery and primary endocrine therapy is not clear cut. The online tool predictions are generated by the model described in the present study. The co-morbidity score is determined by entering a history of patient co-morbidities via a tick-box list. The frailty score is similarly generated by entering the level of difficulty experienced in six functional domains. In the current version, for reasons of patient sensitivity, the deprivation level is not entered and a medium level is assumed for all patients. The other model inputs are straightforward.
A prospective cohort study, part of the Bridging the Age Gap in Breast Cancer project, will finally offer a more detailed data set to further assess the effects of surgery and primary endocrine therapy on survival in older women.