Abstract

Background: Previous reports identifying discordance between multiparameter tests at the individual patient level have been largely attributed to methodological shortcomings of multiple in silico studies. Comparisons between tests, when performed using actual diagnostic assays, have been predicted to demonstrate high degrees of concordance. OPTIMA prelim compared predicted risk stratification and subtype classification of different multiparameter tests performed directly on the same population.

Methods: Three hundred thirteen women with early breast cancer were randomized to standard (chemotherapy and endocrine therapy) or test-directed (chemotherapy if Oncotype DX recurrence score >25) treatment. Risk stratification was also determined with Prosigna (PAM50), MammaPrint, MammaTyper, NexCourse Breast (IHC4-AQUA), and conventional IHC4 (IHC4). Subtype classification was provided by BluePrint, MammaTyper, and Prosigna.

Results: Oncotype DX predicted a higher proportion of tumors as low/intermediate risk (82.1%, 95% confidence interval [CI] = 77.8% to 86.4%) than were predicted low/intermediate risk using Prosigna (65.5%, 95% CI = 60.1% to 70.9%), IHC4 (72.0%, 95% CI = 66.5% to 77.5%), MammaPrint (61.4%, 95% CI = 55.9% to 66.9%), or NexCourse Breast (61.6%, 95% CI = 55.8% to 67.4%). Strikingly, the five tests showed only modest agreement when dichotomizing results between high vs low/intermediate risk. Only 119 (39.4%) tumors were classified uniformly as either low/intermediate risk or high risk, and 183 (60.6%) were assigned to different risk categories by different tests, although 94 (31.1%) showed agreement between four of five tests. All three subtype tests assigned 59.5% to 62.4% of tumors to luminal A subtype, but only 121 (40.1%) were classified as luminal A by all three tests and only 58 (19.2%) were uniformly assigned as nonluminal A. Discordant subtyping was observed in 123 (40.7%) tumors.

Conclusions: Existing evidence on the comparative prognostic information provided by different tests suggests that current multiparameter tests provide broadly equivalent risk information for the population of women with estrogen receptor (ER)–positive breast cancers. However, for the individual patient, tests may provide differing risk categorization and subtype information.

For over 40 years ( 1–3 ) the impact of tumor molecular markers on patient outcome and treatment response has been central to breast cancer management. Gene expression profiling ( 4 , 5 ) to describe the intrinsic subtypes of breast cancer was followed by the independent development, in 2004, of the first multiparameter molecular diagnostic assay stratifying breast cancer patients with estrogen receptor (ER)–positive disease based on risk of relapse following treatment ( 6 ). The past decade saw a rapid expansion in the number of such multiparameter molecular residual risk tests for breast cancer patients ( 7 ). These tests herald an era of more personalized medicine because of their potential to inform rational treatment decisions on a patient-by-patient basis. The initial goal was to identify patients who, despite “favorable” clinico-pathological characteristics, have a poor outcome following conventional endocrine treatment and to advise aggressive therapy, which may reduce relapse risk. Over time, interest has also grown in the potential for multiparameter assays to predict chemo-sensitivity ( 8 , 9 ). An estimate of the intrinsic chemotherapy sensitivity of tumors would reduce the importance of stage information. Some women gain little from chemotherapy, whereas others have clinically relevant gains. There is therefore a rationale for using stratified medicine to identify patients who may safely avoid toxicities associated with chemotherapy.

The OPTIMA trial ( 7 ) is designed as a prospective test of the effectiveness of multiparameter testing in identifying the subgroup of women with breast cancer (among those who would be routinely offered adjuvant chemotherapy based on conventional criteria) whose tumors are intrinsically insensitive to chemotherapy and for whom such treatment offers only toxicity and delay in starting more effective adjuvant endocrine therapy and radiotherapy without any clinically meaningful additional benefit. A key objective of “OPTIMA prelim,” the in-built feasibility phase of OPTIMA, was to evaluate the performance of alternative multiparameter tests, in order to aid selection of a test for the main study that would ensure the results of such a trial be robust and broadly applicable to the patient population, both now and in the future. Critical to this decision was the ability to compare test performance at both the population and individual patient level. Existing data directly comparing individual test performance is limited. A series of studies performing statistical comparisons between tests suggest that, at a population level, four tests (IHC4, PAM50, BCI, and Oncotype DX) provided broadly equivalent prognostic information on the risk of relapse up to five years post-treatment ( 10–12 ). Further studies, based largely on in silico reconstruction of existing tests from publicly available gene expression datasets, suggest a statistically significant degree of discordance between signatures at the individual patient level ( 13–17 ). These observations are predominantly attributed to methodological differences due to the in silico reconstruction of signatures ( 15 , 17 ). This thesis has not, to date, been robustly tested using actual test methodologies. Limited data shows that concordance between different tests in assigning patients to similar risk groups is low ( 10 ). This is consistent with the marked differences in genes measured by different tests (see Supplementary Table 1 , available online) and with the relatively modest predictive value, in terms of recurrence, offered by these tests at the individual patient level. Here we report the direct patient-level comparison of multiple commercial residual risk profiles in the OPTIMA prelim study, performed to gather information on their performance.

Methods

Recruitment and Patient Samples

Optimal Personalised Treatment of early breast cancer using Multiparameter Analysis preliminary study (OPTIMA prelim, ISRCTN42400492) ( 18 ) is a multicenter study that randomly assigned women aged 40 years or older with ER-positive, human epidermal growth factor receptor 2 (HER2)–negative early breast cancer and either one to nine involved axillary nodes or tumor size of 30 mm or greater (if node-negative) between standard treatment (chemotherapy followed by endocrine therapy) and test-directed therapy ( 7 ). In the test-directed arm, an Oncotype DX test was performed; patients with recurrence scores (RSs) greater than 25 (“high” risk) were assigned chemotherapy followed by endocrine therapy; those with RSs of 25 or lower (“intermediate/low” risk) received endocrine therapy alone. Chemotherapy, selected from regimens commonly used in the UK NHS, was specified at patient registration. The study was partially blinded so that neither patients nor referring centers were aware of whether chemotherapy was assigned on the basis of Oncotype DX RS or by random assignment to the standard treatment arm. Central retesting of ER and HER2 status was performed on all patients. Following confirmation of eligibility, samples were sent to Genomic Health for Oncotype DX assays to be performed with funding from the OPTIMA prelim study. No patient outcome data is available for this analysis. All patients gave written informed consent to participate in the study. The study was approved by the South East Coast - Surrey Research Ethics Committee.

To facilitate the comparison of alternative tests, a number of test vendors were approached for support ( Supplementary Table 2 , available online). Ultimately, five tests in addition to Oncotype DX were included in the OPTIMA prelim study: MammaPrint/BluePrint, Prosigna (PAM50), MammaTyper, NexCourse Breast by AQUA (IHC4-AQUA), and IHC4 by conventional immunohistochemistry. Multiparameter assays were performed irrespective of patient random assignment. Vendors that did not participate expressed concerns about transposing specific tests into novel applications.

Residual tumor samples from patients were collected at a central good clinical laboratory practice pathology repository (Edinburgh, UK). Tissue microarrays (TMAs) were constructed as previously described ( 19 ) using triplicate 0.6 mm cores. TMA sections, tissue sections, or extracted mRNA were provided either to the Ontario Institute for Cancer Research (Prosigna; IHC4: ER, PgR, and Ki67 by quantitative image analysis [Ariol] using standard immunohistochemistry [IHC] with HER2 testing by in situ hybridization at UCL Advanced Diagnostics) or to Genoptix (IHC4-AQUA), Agendia (MammaPrint/BluePrint), and Stratifyer (MammaTyper). Results from individual tests were collated at the Warwick Clinical Trials Unit (CTU) for analysis.

Statistical Analysis

OPTIMA prelim was designed to recruit 300 patients to enable the kappa value for agreement between tests to be estimated with good accuracy. Assuming 70% of patients would be assigned to no chemotherapy by the test and the true kappa value was 0.8 ( 14 ), this would provide a lower 95% confidence limit of 0.73. These numbers were also sufficient to allow for the assumed proportion of patients assigned to no chemotherapy to vary from 55% to 80% (lower confidence limit for kappa varied from 0.74 to 0.72, respectively).
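
The sample size reasoning above can be approximated numerically. The following is a minimal sketch, assuming that both tests assign the same proportion of patients to no chemotherapy and that the simple large-sample standard error for kappa applies; the source does not state which variance formula was used, so the function name `kappa_lower_limit` and the approximation are illustrative only, and the output need only be close to the limits quoted.

```python
import math

def kappa_lower_limit(n, p_no_chemo, kappa, z=1.96):
    """Approximate lower 95% confidence limit for Cohen's kappa.

    Assumes both tests assign the same proportion (p_no_chemo) to the
    no-chemotherapy group and uses the simple large-sample standard error.
    """
    # Chance agreement for two tests with equal marginal proportions.
    p_e = p_no_chemo ** 2 + (1 - p_no_chemo) ** 2
    # Observed agreement implied by the assumed true kappa.
    p_o = p_e + kappa * (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa - z * se

for p in (0.55, 0.70, 0.80):
    print(f"p = {p:.2f}: lower limit ~ {kappa_lower_limit(300, p, 0.8):.3f}")
```

Under these assumptions the lower limit lies between roughly 0.71 and 0.73 across the range of proportions considered, close to the values quoted above; small differences are expected if a different variance formula was used in the original calculation.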

The proportion of tumors assigned to risk groups and/or subtypes was determined. The kappa coefficient and associated 95% confidence interval (CI) were used to assess agreement between tests. The predicted benefits of endocrine therapy with or without chemotherapy, individualized to patients, were estimated using two nomograms, Adjuvant! (version 8, without correction for HER2 status) ( 20 ) and PREDICT ( 21–23 ). A multivariable logistic regression model using stepwise elimination was fitted to determine factors predicting discordant cases. To explore the post hoc hypothesis that individual tests were more likely to agree at the extremes of their ranges, two-by-two scatterplots for the tests that provide risk scores and agreement charts for the categorization of tumors were constructed ( 24 ). Statistical analyses were performed using the SAS statistical package (version 9.3; SAS Institute Inc., Cary, NC) and R version 3.0.3 ( 25 ). All statistical tests were two-sided, and a P value of less than .05 was considered statistically significant.
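
As an illustration of the agreement measure, the sketch below computes Cohen's kappa with an approximate 95% CI from a cross-tabulation of two tests. It is not the study's SAS or R code: the function name `cohen_kappa`, the simple large-sample standard error, and the example counts are all assumptions made for illustration.

```python
import math

def cohen_kappa(table, z=1.96):
    """Cohen's kappa and an approximate 95% CI for a square agreement table.

    table[i][j] = number of tumors placed in category i by test 1 and
    category j by test 2.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of tumors on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(table[i]) / n for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the marginal proportions.
    p_e = sum(r * c for r, c in zip(row_tot, col_tot))
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se

# Hypothetical counts (not from the study): rows = test 1, columns = test 2,
# categories ordered as [low/intermediate, high].
example = [[180, 30],
           [40, 50]]
kappa, lower, upper = cohen_kappa(example)
print(f"kappa = {kappa:.2f} (95% CI = {lower:.2f} to {upper:.2f})")
```

Kappa corrects the raw proportion of matching calls for the agreement expected by chance, which matters here because most tumors fall in the low/intermediate group for every test.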

Results

Patients

Between October 2012 and June 2014, 313 patients were randomly assigned from 35 UK hospitals (see the Notes), of whom 302 had samples available for multiparameter testing ( Table 1 ). Eleven patients were excluded from multiparameter testing: four withdrew consent, one was ineligible, and samples for six patients were insufficient for testing ( Supplementary Figure 1 , available online).

Table 1.

Characteristics of the 302 patients

| Characteristic | Total |
| --- | --- |
| Age, median (range), y | 58 (40–78) |
| Menopausal status of participant, No. (%) | |
|   Pre/perimenopausal | 97 (32.1) |
|   Postmenopausal | 205 (67.9) |
| Number of involved nodes, No. (%) | |
|   None | 57 (18.9) |
|   1-3 | 192 (63.6) |
|   4-9 | 42 (13.9) |
|   Positive sentinel node biopsy without clearance surgery | 11 (3.6) |
| Histological grade, No. (%) | |
|   1 | 19 (6.3) |
|   2 | 201 (66.6) |
|   3 | 82 (27.1) |
| Largest tumor size, median (range), mm | 28 (2–170) |
|   ≤30, No. (%) | 172 (57.0) |
|   >30, No. (%) | 130 (43.0) |
| Lymphovascular invasion reported, No. (%) | |
|   No | 169 (56.0) |
|   Yes | 122 (40.4) |
|   Not known | 11 (3.6) |
| Tumor type, No. (%) | |
|   Ductal | 214 (70.9) |
|   Lobular | 65 (21.5) |
|   Tubular/cribriform | 2 (0.7) |
|   Mucinous | 4 (1.3) |
|   Micropapillary | 1 (0.3) |
|   Mixed | 16 (5.3) |

y = year

Results From Predictive Nomograms

The majority of patients recruited were either at intermediate (74.8%) or high (21.2%) risk using the Nottingham Prognostic Index (NPI) ( 26 ). All 12 patients with lower-risk NPI scores (≤3.4) had tumors 3.0 cm or larger in size. The median 10-year overall survival estimated by PREDICT ( 21–23 ) or Adjuvant! ( 20 ) differed by 6.2% to 8.4%, reflecting expected differences between the risk estimate provided by these tools ( Table 2 ).
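
For readers unfamiliar with the NPI, the sketch below shows how a score and its risk band could be derived. It assumes the standard published NPI definition (0.2 × tumor size in cm, plus a node score of 1 to 3, plus histological grade of 1 to 3), which is not spelled out in this paper; the function names and the example patient are hypothetical, and the bands follow the cutpoints shown in Table 2.

```python
def nottingham_prognostic_index(size_cm, node_score, grade):
    """NPI = 0.2 x tumor size (cm) + node score (1-3) + histological grade (1-3).

    Node score (standard NPI definition, an assumption here):
    1 = no involved nodes, 2 = 1-3 involved nodes, 3 = 4 or more involved nodes.
    """
    # Rounded to avoid floating-point artifacts at the band boundaries.
    return round(0.2 * size_cm + node_score + grade, 2)

def npi_group(npi):
    """Risk bands using the cutpoints reported in Table 2."""
    if npi <= 3.4:
        return "lower"
    if npi <= 5.4:
        return "intermediate"
    return "high"

# Hypothetical patient: 4.0 cm, node-negative, grade 1 tumor.
npi = nottingham_prognostic_index(4.0, 1, 1)
print(npi, npi_group(npi))
```

Because size contributes only 0.2 points per centimeter, a node-negative, grade 1 tumor stays in the lower-risk band even when it is large enough to meet the trial's size criterion, which fits the observation that all lower-risk patients had tumors of 3.0 cm or more.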

Table 2.

Clinical risk of patients (n = 302)

| Risk tool | Total |
| --- | --- |
| Nottingham Prognostic Index, median (range) | 4.6 (2.8–8.2) |
|   ≤3.4, No. (%) | 12 (4.0) |
|   >3.4–≤5.4, No. (%) | 226 (74.8) |
|   >5.4, No. (%) | 64 (21.2) |
| PREDICT 10-year overall survival, median (range), % | |
|   Endocrine therapy only | 77.0 (25.1–94.6) |
|   Chemotherapy and endocrine therapy | 82.6 (39.8–95.9) |
|   Additional benefit of chemotherapy | 5.5 (1.2–25.8) |
| Adjuvant! 10-year risk overall survival, median (range), % | |
|   Endocrine therapy only | 68.6 (25.4–90.4) |
|   Chemotherapy and endocrine therapy | 76.4 (31.0–93.6) |
|   Additional benefit of chemotherapy | 6.8 (1.2–25.8) |
| Adjuvant! 10-year relapse-free survival, median (range), % | |
|   Endocrine therapy only | 60.5 (22.0–82.1) |
|   Chemotherapy and endocrine therapy | 72.9 (29.1–89.4) |
|   Additional benefit of chemotherapy | 10.5 (2.7–33.3) |

Multiparameter Tests

Results from all tests were available for 236 (78.1%) patients. One patient on the standard arm had insufficient invasive tumor for Oncotype DX testing but sufficient for alternative testing. Test results were unobtainable from Prosigna for three patients, from MammaTyper for four patients, from MammaPrint for four patients, and from BluePrint for seven patients. IHC4 and IHC4-AQUA could not be determined for 45 (14.9%) and 31 (10.3%) patients, respectively, reflecting use of TMAs for this assessment.

Risk Scores

Five tests provided quantitative or semi-quantitative risk scores and a predefined categorized risk assessment (low, intermediate, high). For OPTIMA prelim, Oncotype DX RS was dichotomized around 25, separating “low/intermediate” from “high” risk cases as only patients with a high risk of recurrence were allocated chemotherapy ( Table 3 ). Using this approach for all tests ( Supplementary Methods , available online), the proportion of tumors classified as low/intermediate risk was 82.1% (95% CI = 77.8% to 86.4%) for Oncotype DX, 72.0% (95% CI = 66.5% to 77.5%) for IHC4, 65.6% (95% CI = 60.1% to 70.9%) using Prosigna risk of recurrence score including proliferation and tumor size, 61.6% (95% CI = 55.8% to 67.4%) for IHC4-AQUA, and 61.4% (95% CI = 55.9% to 66.9%) for MammaPrint ( Table 3 ).
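
The proportions and intervals above can be checked against the counts in Table 3. This is a minimal sketch, assuming a normal-approximation (Wald) interval; the source does not state which interval method was used, and the function name `wald_ci` is illustrative.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# (low + intermediate count, tumors with a result), taken from Table 3.
table3 = {
    "Oncotype DX": (163 + 84, 301),
    "IHC4": (62 + 123, 257),
    "Prosigna": (108 + 88, 299),
    "IHC4-AQUA": (87 + 80, 271),
    "MammaPrint": (183, 298),  # two risk groups only, so low risk alone
}

for test, (count, n) in table3.items():
    p, lower, upper = wald_ci(count, n)
    print(f"{test}: {100 * p:.1f}% (95% CI = {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

Under this assumption the computed intervals agree with the reported ones to within about 0.1 percentage points, which suggests, but does not confirm, that an interval of this form was used.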

Table 3.

Risk categorization by each test

| Risk group | Oncotype DX,* No. (%) | MammaPrint,† No. (%) | Prosigna, No. (%) | IHC4, No. (%) | IHC4-AQUA,‡ No. (%) |
| --- | --- | --- | --- | --- | --- |
| No. of patients (%) | 301 (99.7) | 298 (98.9) | 299 (99.0) | 257 (85.1) | 271 (89.7) |
| Low risk | 163 (54.2) | 183 (61.4) | 108 (36.1) | 62 (24.1) | 87 (32.1) |
| Intermediate risk | 84 (27.9) | | 88 (29.4) | 123 (47.9) | 80 (29.5) |
| Mid risk | | | | | 55 (20.3) |
| High risk | 54 (17.9) | 115 (38.6) | 103 (34.5) | 72 (28.0) | 49 (18.1) |

*Oncotype DX is divided into three risk groups, with intermediate defined as recurrence score 18-25 for the current analysis.

†MammaPrint divides tumors into two risk groups only.

‡IHC4-AQUA divides tumors into four risk groups: low, low-mid (here called intermediate), mid and high (combined as high risk).

Agreement between tests when patients were subdivided into combined low/intermediate vs high-risk groups using predefined cutpoints was modest; kappa values ranged from 0.33 (95% CI = 0.21 to 0.44) between MammaPrint and IHC4 to 0.60 (95% CI = 0.50 to 0.70) between IHC4 and IHC4-AQUA ( Table 4 ; Supplementary Table 3 , available online). Only 119 (39.4%) tumors were uniformly classified as either low/intermediate or high risk by all five tests: 30.8% (n = 93) of tumors were classified as low/intermediate risk by all tests, and a further 8.6% (n = 26) as high risk by all tests. The majority (60.6%, n = 183) of tumors gave no consensus result across all five tests. However, for 31.1% of tumors (n = 94), agreement was observed in four of the five tests. There were also no clear differences between tests in terms of their agreement with other tests ( Table 5 ). No statistically significant differences in clinico-pathological features between concordant and discordant tumors were observed ( Supplementary Table 4 , available online). There was no evidence from the scatterplots of risk scores that individual tests were more likely to agree at the extremes of their ranges ( Supplementary Figure 2 , available online). Disagreement by one risk category (eg, low risk vs intermediate risk) was common, and disagreement by two categories (ie, low risk vs high risk) was not infrequent ( Figure 1 ; Supplementary Table 5 , available online). An exploratory analysis using a categorization of low vs intermediate/high risk to more closely reflect current test usage was performed ( Supplementary Tables 6-7 , available online); again, modest agreement between tests was observed.
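
The tumor-level tallies above (all five tests agreeing, four of five agreeing, and so on) can be produced by counting, for each tumor, the size of the largest group of tests giving the same dichotomized call. The sketch below is illustrative only: the function name, the labels, and the three example tumors are hypothetical, and it assumes every test returned a result, whereas the source does not describe how missing results were handled.

```python
from collections import Counter

def agreement_summary(calls_per_tumor):
    """Summarize agreement across tests for dichotomized risk calls.

    calls_per_tumor: list of dicts mapping test name -> "high" or "low/int".
    Returns a Counter keyed by the size of the largest agreeing group of tests.
    """
    summary = Counter()
    for calls in calls_per_tumor:
        counts = Counter(calls.values())
        summary[max(counts.values())] += 1
    return summary

# Hypothetical calls for three tumors (illustrative only, not study data).
tumors = [
    {"Oncotype DX": "low/int", "MammaPrint": "low/int", "Prosigna": "low/int",
     "IHC4": "low/int", "IHC4-AQUA": "low/int"},   # all five agree
    {"Oncotype DX": "low/int", "MammaPrint": "high", "Prosigna": "high",
     "IHC4": "high", "IHC4-AQUA": "high"},          # four of five agree
    {"Oncotype DX": "low/int", "MammaPrint": "high", "Prosigna": "high",
     "IHC4": "low/int", "IHC4-AQUA": "low/int"},   # three vs two split
]
summary = agreement_summary(tumors)
print(dict(summary))
```

With five binary calls the largest agreeing group can only be of size 3, 4, or 5, so "no consensus across all five tests" corresponds to the first two of these.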

Figure 1.

Agreement charts for two-by-two comparison of tests according to risk groups. A) Prosigna against Oncotype DX. B) IHC4 against Oncotype DX. C) IHC4-AQUA against Oncotype DX. D) IHC4 against Prosigna. E) IHC4-AQUA against Prosigna. F) IHC4 against IHC4-AQUA. Only tests that provide three risk categories are included in this analysis. The Oncotype DX intermediate-risk group is defined as RS 18-25. The IHC4-AQUA mid-risk group was combined with the high-risk group. Rectangles are drawn for each level of the test outcomes, ie, low, intermediate, and high risk, based on the row and column cumulative totals. Thus, for the low-risk rectangle of the test 1 vs test 2 comparison, all tumors categorized as low risk by either test are included. The boundaries of the rectangles along both axes represent the number of tumors that were categorized as that outcome for each test. Black squares within the rectangles represent exact agreement between the levels of the two tests, eg, both low scores, and are of size based on the cell frequencies and located according to the cumulative totals of the previous levels. Gray rectangles represent partial agreement, where the scores from one test are within one level of those from the other test, ie, a low score on one test but intermediate on the other test. White areas within the rectangle reflect disagreement by more than one level, ie, low scores on one test and high scores on the other test.

Table 4.

Kappa statistics for tests providing risk predictions*

| Test | MammaPrint (low), Kappa statistic (95% CI) | Prosigna (low/intermediate), Kappa statistic (95% CI) | IHC4 (low/intermediate), Kappa statistic (95% CI) | IHC4-AQUA (low/low-mid),† Kappa statistic (95% CI) |
| --- | --- | --- | --- | --- |
| Oncotype DX (recurrence score ≤25) | 0.40 (0.30 to 0.49) | 0.44 (0.33 to 0.54) | 0.53 (0.41 to 0.65) | 0.40 (0.30 to 0.51) |
| MammaPrint | | 0.53 (0.43 to 0.63) | 0.33 (0.21 to 0.44) | 0.42 (0.30 to 0.53) |
| Prosigna (low/intermediate) | | | 0.39 (0.27 to 0.50) | 0.43 (0.31 to 0.54) |
| IHC4 (low/intermediate) | | | | 0.60 (0.50 to 0.70) |

*Kappa statistics are for agreement between categorization into combined low and intermediate risk vs high risk. CI = confidence interval.

†IHC4-AQUA mid risk and high risk are combined for this analysis.

Table 5.

Number of tests agreeing with each test

| No. of other tests agreed with test | Oncotype DX, No. (%) | Prosigna, No. (%) | MammaPrint, No. (%) | IHC4, No. (%) | IHC4-AQUA, No. (%) |
| --- | --- | --- | --- | --- | --- |
| 4 | 119 (39.4) | 119 (39.4) | 119 (39.4) | 119 (39.4) | 119 (39.4) |
| 3 | 84 (27.8) | 77 (25.5) | 73 (24.2) | 67 (22.2) | 75 (24.8) |
| 2 | 54 (17.9) | 52 (17.2) | 47 (15.6) | 36 (11.9) | 33 (10.9) |
| 1 | 31 (10.3) | 33 (10.9) | 34 (11.2) | 25 (8.3) | 27 (9.0) |
| 0 | 13 (4.3) | 18 (6.0) | 25 (8.3) | 10 (3.3) | 17 (5.6) |
| Missing | 1 (0.3) | 3 (1.0) | 4 (1.3) | 45 (14.9) | 31 (10.3) |

Intrinsic Subtypes

The three tests that provide subtype information categorized similar proportions of patients as having “luminal A” tumors (BluePrint: 60.7%, 95% CI = 55.2% to 66.3%; Prosigna: 59.5%, 95% CI = 53.9% to 65.1%; and MammaTyper [combined luminal A and low-risk luminal B]: 62.4%, 95% CI = 56.9% to 67.9%). Thirteen (4.3%) patients were classified as having HER2-enriched/positive tumors by at least one test. Two (0.7%) patients had basal-like tumors using Prosigna subtyping, one of whom also had a basal-like tumor using BluePrint but triple-negative breast cancer using MammaTyper. All these patients were classified as ER-positive and HER2-negative on central review. Agreement between all three tests providing subtype assignment was obtained for 179 (59.3%) patients; 121 (40.1%) tumors were classified as luminal A, 58 (19.2%) as all other subtypes. Discordant results across these tests were seen in 123 (40.7%) patients. Moderate agreement between tests was confirmed by Kappa statistics of 0.39 (95% CI = 0.29 to 0.50) between BluePrint and MammaTyper, 0.44 (95% CI = 0.34 to 0.54) between Prosigna and MammaTyper, and 0.55 (95% CI = 0.45 to 0.64) between BluePrint and Prosigna subtype.

Assessing Relationship Between the Prosigna Subtyping and Risk of Recurrence Score

Prosigna is unique among the multiparameter assays evaluated in providing both a subtype and a continuous risk of recurrence score (ROR), with predefined risk categories derived from an identical set of genes. All 178 tumors classified as luminal A had an ROR score below the predefined high-risk cutpoint, and none of the 113 luminal B tumors was classified as low risk ( Table 6 ). Eight tumors, all of which were centrally confirmed as ER-positive/HER2-negative, were categorized into either the basal-like (n = 2) or HER2-like (n = 6) subtypes, and these were either intermediate or high risk by ROR score, respectively.

Table 6. Relationship between Prosigna subtyping and the continuous risk of recurrence score*

| Prosigna test result | Luminal A, No. (%) | Luminal B, No. (%) | Basal-like, No. (%) | HER2-enriched, No. (%) |
|---|---|---|---|---|
| No. of patients (%) | 178 (59.5) | 113 (37.8) | 2 (0.7) | 6 (2.0) |
| ROR, median (IQR) | 37 (28–44) | 70 (63–78) | 53 (47–58) | 76 (72–78) |
| ROR, range | 5–59 | 43–96 | 47–58 | 64–84 |
| Risk group: low risk | 108 (60.7) | 0 | 0 | 0 |
| Risk group: intermediate risk | 70 (39.3) | 16 (14.2) | 2 (100) | 0 |
| Risk group: high risk | 0 | 97 (85.8) | 0 | 6 (100) |

*HER2 = human epidermal growth factor receptor 2; IQR = interquartile range; ROR = risk of recurrence.
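
The percentages in Table 6 follow directly from the counts and can be rechecked with a few lines of Python. The sketch below is illustrative only: the dictionary layout and the helper names (`pct`, `wald_ci`) are assumptions, and the normal-approximation (Wald) interval is one common choice. The text does not say which interval method was used, so small differences from the reported intervals are to be expected.

```python
import math
from typing import Tuple

# Counts transcribed from Table 6 (Prosigna subtype by ROR risk group).
TABLE6 = {
    "Luminal A": {"low": 108, "intermediate": 70, "high": 0},
    "Luminal B": {"low": 0, "intermediate": 16, "high": 97},
    "Basal-like": {"low": 0, "intermediate": 2, "high": 0},
    "HER2-enriched": {"low": 0, "intermediate": 0, "high": 6},
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)


def wald_ci(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% CI for a proportion (an assumed method)."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Column totals and the overall number of tumors with a Prosigna result.
subtype_totals = {name: sum(groups.values()) for name, groups in TABLE6.items()}
n_total = sum(subtype_totals.values())

for name, groups in TABLE6.items():
    n_subtype = subtype_totals[name]
    # Risk-group shares are expressed within each subtype column.
    shares = {group: pct(count, n_subtype) for group, count in groups.items()}
    print(f"{name}: n={n_subtype} ({pct(n_subtype, n_total)}% of {n_total}), {shares}")

low, high = wald_ci(subtype_totals["Luminal A"], n_total)
print(f"Luminal A share, Wald 95% CI: {100 * low:.1f}% to {100 * high:.1f}%")
```

For the luminal A share (178 of 299), this Wald interval comes out within about 0.1 percentage point of the interval reported for the Prosigna luminal A proportion in the Results.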

Discussion

The evaluation of candidate multiparameter tests within OPTIMA prelim to determine the best assessment of risk stratification for the main OPTIMA study presented an interesting challenge given 1) evidence that these tests provide broadly similar prognostic information at the population level (26), 2) the use of markedly different gene panels to estimate the same endpoint, and 3) the use of different technologies, including immunohistochemistry, polymerase chain reaction (PCR), and quantitative and semi-quantitative array-based technologies.

Previous in silico comparisons of multiple gene signatures have identified statistically significant discordance between different “diagnostic tests” (13, 15–17). However, to date, this has been attributed to suboptimal comparisons because, in the majority of studies, genomic prediction scores have been estimated from published expression profiles. It has been argued that, in any direct comparison of validated diagnostic genomic assays, a high level of concordance could and should be obtained (14). In the current study, we performed such a direct comparison: each commercial assay was performed as prescribed by the relevant manufacturer (although the AQUA-IHC4 assay used TMAs for convenience). What is striking is that, among five tests with robust independent technical and clinical validation as predictors of residual risk (MammaPrint, Oncotype DX, Prosigna, IHC4, and IHC4-AQUA) and three that measure a recognized risk factor (molecular subtype), there is marked disagreement across all tests. Indeed, for all tests the level of agreement was “moderate” as defined by Prat et al., reaching only level 3 reproducibility (κ = 0.40–0.59) (14). This suggests that agreement for risk classification between different molecular tests applied to the same patient sample is comparable to that for pathological assessment of tumor grade.

The observed disagreement in risk categorization for 60.6% of tumors raises questions as to how patient management may be impacted by the choice of test used for risk stratification. Interestingly, there does not seem to be better correlation between tests at the extremes of their ranges (the very low- and high-risk tumors in our cohort) than in the mid-range. It was less common, although not infrequent, for tumors placed into the lowest risk group by one test to be assigned into the highest risk group by another.

Each test is independently validated and adopted for prediction of risk of recurrence, so what should we do when they disagree? Paradoxically, the result of this study can be viewed as either predictable or unexpected, depending on perspective. From a purely biological and technical perspective, it is entirely predictable that tests that measure fundamentally different genes using different technologies give dissimilar results even when each individual assay remains technically valid. For example, MammaPrint and Prosigna, despite measuring the broadest range of genes (70 and 50, respectively), have only three genes in common and use different technical approaches (27, 28). Even those tests measuring the same genes (IHC4, IHC4-AQUA, and MammaTyper) use different technologies (PCR versus IHC) or different antibodies, detection, and quantification methods.

From a clinical perspective, the disagreement between multiple tests, each assessing residual risk, is highly perplexing. The disagreement extends to an inability to demonstrate strong agreement on molecular subtyping between tests, which again seems counterintuitive. However, it is less surprising that disagreement between molecular subtyping, in this context predominantly between luminal A and luminal B, should exist in the absence of any clinical or molecular agreement as to the true boundary between a “luminal A” and “luminal B” cancer (16). Again, the Prosigna and BluePrint tests for subtyping have minimal gene overlap, with only seven genes in common.

What about risk prediction? The prediction of disease recurrence based on clinico-pathological and molecular features of a cancer is notoriously challenging within populations and even more so at the individual patient level. Biologically and clinically aggressive cancers, which, if left untreated, are destined to progress, may be “cured” by surgery, radiotherapy, chemotherapy, or endocrine therapy. Tests predicting risk therefore face an important challenge in that they seek to measure the risk of recurrence based on the biology of tumors and must function within a clinical setting where biology may reflect risk that is not realized because of medical intervention. What, then, can we learn from comparisons between validated assays that seek to stratify patients by risk of recurrence, if indeed we can learn anything? We argue that there is value in such comparisons, even in the absence of outcome data. Each test applied in this study is externally validated and adopted or available for adoption in multiple clinical jurisdictions (6, 27–32). However, none is, or claims to be, the ultimate discriminator of risk for patients. This study suggests there is more than one way of predicting residual risk.

All studies have limitations. This study cannot determine subtle differences in the performance of the tests within this population; we also recognize that existing data, both from the original studies validating individual tests and from population-level comparisons of test performance in a single population (10–12), cannot clearly discriminate between them. No outcome data from OPTIMA prelim were available at the time of analysis. Because the sample size is comparatively small, it is highly unlikely that it will prove possible to compare the ability of the tests studied here to predict patient outcome.

In conclusion, in the widest and most comprehensive patient-level direct diagnostic comparisons to date between multiparametric tests of “residual risk” (after local treatment and endocrine therapy), we present further data that the proportions of patients identified as low, intermediate, or high risk are broadly similar irrespective of which test is employed. However, both with respect to risk stratification and molecular subtyping, marked differences were observed when categorization of individual patients was considered. Such data, when considered with existing data on efficacy comparisons between different tests, support the conclusion that many current risk stratification tools are broadly equivalent and that further improvements in both prediction of relapse risk and therapeutic targeting would be of clinically significant value for patients at high risk of disease relapse (14).

Funding

This work was supported by the National Institute for Health Research Health Technology Assessment programme (grant number 10/34/01) and will be published in full in the Health Technology Assessment Journal Volume 20, Issue 10. Further information available at: http://www.nets.nihr.ac.uk/projects/hta/103401 . This publication presents independent research commissioned by the National Institute for Health Research. The views and opinions expressed by authors in this publication are those of the authors and do not necessarily reflect those of the National Health Service; National Institute for Health Research; Medical Research Council; Central Commissioning Facility; NIHR Evaluation, Trials and Studies Coordinating Centre; Health Technology Assessment programme; or Department of Health. Research at the Ontario Institute for Cancer Research is funded by the Government of Ontario. Agendia Inc., NanoString Technologies, Stratifyer/BioNTech Diagnostics, and Genoptix Medical Laboratories supported testing by provision of reagents and test results (as appropriate) at no financial cost to the current study. RCS was supported by the National Institute for Health Research University College London Hospitals Biomedical Research Centre.

Notes

Acknowledgments: Trial Management Group: John M. S. Bartlett (Program Director & Hon. Professor, Ontario Institute of Cancer Research, Canada); David A. Cameron (Professor of Oncology & Head of Cancer Services, University of Edinburgh, UK); Amy Campbell (Clinical Trial Manager, Warwick Clinical Trials Unit, University of Warwick, UK); Peter Canney (Consultant Oncologist, retired); Jenny Donovan (Professor of Social Medicine, University of Bristol, UK); Janet A. Dunn (Professor of Clinical Trials, University of Warwick, UK); Helena M. Earl (Reader in Clinical Cancer Medicine, University of Cambridge Department of Oncology and NIHR Cambridge Biomedical Research Centre, UK); Mary Falzon (Consultant Histopathologist, UCL Hospitals, London, UK); Adele Francis (Consultant Breast Surgeon, University Hospital Birmingham, UK); Peter S. Hall (Senior Lecturer and Consultant Medical Oncologist, University of Edinburgh & Visiting Health Economist, AUHE, University of Leeds, UK); Victoria Harmer (Breast Care Nurse, Imperial College NHS Healthcare Trust, London, UK); Helen Higgins (Senior Project Manager, Warwick Clinical Trials Unit, University of Warwick, UK); Luke Hughes-Davies (Consultant Oncologist, Addenbrookes Hospital, Cambridge, UK); Claire Hulme (Director Academic Unit of Health Economics, Leeds Institute of Health Sciences, Leeds, UK); Iain R. Macpherson (Clinical Senior Lecturer in Medical Oncology, University of Glasgow, Beatson West of Scotland Cancer Centre, UK); Andrea Marshall (Principal Research Fellow in Medical Statistics, University of Warwick, UK); Andreas Makris (Consultant Clinical Oncologist, Mount Vernon Cancer Centre, Northwood, UK); Christopher McCabe (Professor of Health Economics, University of Alberta, Canada); Adrienne Morgan (Patient Advocate & Chair of Independent Cancer Patients’ Voice Trustees); Sarah E. Pinder (Professor of Breast Pathology, Kings College London, Guy’s Hospital, UK); Christopher J. Poole (Professor of Medical Oncology, University Hospitals Coventry & Warwickshire NHS Trust, UK); Daniel W. Rea (Senior Lecturer in Medical Oncology, University of Birmingham, UK); Leila Rooshenas (Research Associate, University of Bristol, UK); Nigel Stallard (Professor of Medical Statistics, University of Warwick, UK); Robert C. Stein (Consultant Medical Oncologist & Hon. Senior Lecturer, UCL Hospitals, London, UK).

Participating centers: The following centers and Principal Investigators contributed patients to the trial: Addenbrooke’s Hospital, Cambridge, Dr. Luke Hughes-Davies; Alexandra Hospital, Redditch, Dr. Denise Hrouda; University Hospital Ayr, Ayr, Dr. Graeme Lumsden; Barnet Hospital, London, Dr. Rob Stein; Beatson West of Scotland Cancer Centre, Glasgow, Dr. Iain Macpherson; Bedford Hospital (Primrose Oncology Unit), Bedford, Dr. Sarah Smith; Bristol Haematology and Oncology Centre, Bristol, Dr. Jeremey Braybrooke; City Hospital, Birmingham, Dr. Daniel Rea; Dumfries & Galloway Royal Infirmary, Dumfries, Dr. Tamsin Evans; Forth Valley Royal Hospital, Larbert, Dr. Judith Fraser; Hairmyres Hospital, Lanarkshire, Dr. Grainne Dunn; Inverclyde Royal Hospital, Greenock, Dr. Abdulla Alhasso; Luton & Dunstable University Hospital, Luton, Dr. Mei-Lin Ah-See; Mount Vernon Hospital, Northwood, Dr. Andreas Makris; Musgrove Park Hospital, Taunton, Dr. John Graham; Norfolk and Norwich University Hospital, Norwich, Dr. Adrian Hartnett; Northwick Park Hospital, Harrow, Dr. Andreas Makris; Peterborough City Hospital, Peterborough, Dr. Karen McAdam; Queen Elizabeth Hospital, Birmingham, Dr. Daniel Rea; Queen Elizabeth Hospital, King’s Lynn, Dr. Margaret Daly; Royal Alexandra Hospital, Paisley, Dr. Abdulla Alhasso; Royal Devon & Exeter Hospital, Exeter, Dr. David Hwang; Royal Glamorgan Hospital, Llantrisant, Dr. Jacinta Abraham; Royal United Hospital Bath, Bath, Dr. Mark Beresford; St Bartholomew’s Hospital, London, Dr. Rebecca Roylance; The Christie, Manchester, Dr. Anne Armstrong; The Woodlands Centre, Hinchingbrooke, Dr. Cheryl Palmer; Torbay Hospital, Torbay, Dr. Andrew Goodman; University Hospital Coventry, Coventry, Professor Christopher Poole; University Hospital Crosshouse, Kilmarnock, Dr. Graeme Lumsden; Velindre Cancer Centre, Cardiff, Dr. Annabel Borley; Western General Hospital, Edinburgh, Dr. Angela Bowman; Wishaw General Hospital, Lanarkshire, Dr. Jonathan Hicks; Yeovil District Hospital, Yeovil, Dr. Urmila Barthakur; York District Hospital, York, Dr. Andrew Proctor.

Author contributions: John M. S. Bartlett* (Program Director & Hon. Professor, Ontario Institute of Cancer Research, Canada) was the translational research lead for the trial. He contributed to study design and managed tissue banking, the establishment of commercial relationships for undertaking multiparameter assays, the performance of laboratory assays, and data analysis. He was responsible for drafting all sections of the paper and had final editorial responsibility. Jane Bayani* (Research Scientist, Ontario Institute of Cancer Research, Canada) was responsible for RNA extraction, Prosigna, and IHC4 analysis and contributed to manuscript writing. Andrea Marshall (Principal Research Fellow in Medical Statistics, University of Warwick, UK) is the trial statistician. She contributed to the statistical analysis plan, conducted the statistical analysis of the data, and contributed to manuscript writing. Janet A. Dunn (Professor of Clinical Trials, University of Warwick, UK) was the CTU lead and senior statistician for the study. She substantially contributed to the trial design, conduct including day-to-day management and monitoring, and the statistical analysis plan. Amy Campbell (Trial Manager, Warwick Clinical Trials Unit, University of Warwick, UK) was responsible for the day-to-day management of the trial and monitored data collection, sample collection, and analysis. Carrie Cunningham (Edinburgh Cancer Research Centre, University of Edinburgh, UK) was responsible for all aspects of sample collection, management, checking pathology quality, TMA construction, and sample shipping to various laboratories. Monika S. Sobol (Edinburgh Cancer Research Centre, University of Edinburgh, UK) was responsible for all aspects of sample collection, management, checking pathology quality, TMA construction, and sample shipping to various laboratories. Peter S. Hall (Senior Lecturer and Consultant Medical Oncologist, University of Edinburgh & Visiting Health Economist, AUHE, University of Leeds, UK) contributed to the health economics aspects of the study design and its overall conduct. Christopher J. Poole (Professor of Medical Oncology, University Hospitals Coventry and Warwickshire NHS Trust, UK) contributed to the study design and its overall conduct and advised on the clinical aspects of the trial. David A. Cameron (Professor of Oncology & Head of Cancer Services, University of Edinburgh, UK) contributed to the study design and its overall conduct and advised on the clinical aspects of the trial. Helena M. Earl (Reader in Clinical Cancer Medicine, University of Cambridge Department of Oncology and NIHR Cambridge Biomedical Research Centre, UK) contributed to the study design and its overall conduct and advised on the clinical aspects of the trial. Daniel W. Rea (Senior Lecturer in Medical Oncology, University of Birmingham, UK) contributed to the study design and its overall conduct and advised on the clinical aspects of the trial. Iain R. Macpherson (Clinical Senior Lecturer in Medical Oncology, Beatson West of Scotland Cancer Centre, University of Glasgow, UK) contributed to the overall conduct of the study, advised on the clinical aspects of the trial, and contributed to manuscript writing. Peter Canney (Consultant Oncologist, Beatson West of Scotland Cancer Centre, Glasgow, UK, retired) contributed to the study concept and design and advised on the clinical aspects of the trial. Adele Francis (Consultant Breast Surgeon, University Hospital Birmingham, UK) contributed to the study design and its overall conduct and advised on the surgical aspects of the trial. Christopher McCabe (Professor of Health Economics, University of Alberta, Canada) contributed to the health economics aspects of study design. Sarah E. Pinder (Professor of Breast Pathology, Kings College London, UK) contributed to the trial design and advised on pathology aspects of trial conduct. Luke Hughes-Davies (Consultant Oncologist, Addenbrookes Hospital, Cambridge, UK) is co-chief investigator. He contributed to the concept and design of the study and its day-to-day management and overall conduct. Andreas Makris (Consultant Clinical Oncologist, Mount Vernon Hospital, Northwood, UK) is co-chief investigator. He contributed to the concept and design of the study and its day-to-day management and overall conduct. Robert C. Stein (Consultant Medical Oncologist & Hon. Senior Lecturer, UCL Hospitals, London, UK) is chief investigator and lead of clinical aspects of the trial. He made substantial contributions to the concept and design of the study, its day-to-day management and overall conduct, and data analysis and contributed to manuscript writing.

On behalf of the OPTIMA TMG.

*Authors contributed equally to this work.

Role of study sponsor: The sponsors of this study had no role in the study design; the data collection, analysis, or interpretation; the writing of the report; or the decision to publish. The authors had full access to the data and are responsible for the content of this manuscript.

References

1. McGuire WL. Estrogen receptors in human breast cancer. J Clin Invest. 1973;52(1):73–77.

2. McGuire WL, Chamness GC, Costlow ME, Shepherd RE. Hormone dependence in breast cancer. Metabolism. 1974;23(1):75–100.

3. Slamon DJ, Clark GM, Wong SG. Human breast cancer: Correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science. 1987;235(4785).

4. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature. 2000;406(6797):747–752.

5. Perou CM, Jeffrey SS, Van de Rijn M, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A. 1999;96(16):9212–9217.

6. Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–2826.

7. Bartlett J, Canney P, Campbell A, et al. Selecting breast cancer patients for chemotherapy: the opening of the UK OPTIMA trial. Clin Oncol (R Coll Radiol). 2013;25(2):109–116.

8. Paik S, Tang G, Shak S, et al. Gene Expression and Benefit of Chemotherapy in Women With Node-Negative, Estrogen Receptor-Positive Breast Cancer. J Clin Oncol. 2006;24(23):3726–3734.

9. Albain KS, Barlow WE, Shak S, et al. Prognostic and predictive value of the 21-gene recurrence score assay in postmenopausal women with node-positive, oestrogen-receptor-positive breast cancer on chemotherapy: a retrospective analysis of a randomised trial. Lancet Oncol. 2010;11(1):55–65.

10. Dowsett M, Sestak I, Lopez-Knowles E, et al. Comparison of PAM50 risk of recurrence score with oncotype DX and IHC4 for predicting risk of distant recurrence after endocrine therapy. J Clin Oncol. 2013;31(22):2783–2790.

11. Sgroi DC, Sestak I, Cuzick J, et al. Prediction of late distant recurrence in patients with oestrogen-receptor-positive breast cancer: a prospective comparison of the breast-cancer index (BCI) assay, 21-gene recurrence score, and IHC4 in the TransATAC study population. Lancet Oncol. 2013;14(11):1067–1076.

12. Cuzick J, Dowsett M, Pineda S, et al. Prognostic Value of a Combined Estrogen Receptor, Progesterone Receptor, Ki-67, and Human Epidermal Growth Factor Receptor 2 Immunohistochemical Score and Comparison With the Genomic Health Recurrence Score in Early Breast Cancer. J Clin Oncol. 2011;29(32):4273–4278.

13. Fan C, Oh DS, Wessels L, et al. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med. 2006;355(6):560–569.

14. Prat A, Ellis MJ, Perou CM. Practical implications of gene-expression-based assays for breast oncologists. Nat Rev Clin Oncol. 2012;9(1):48–57.

15. Kelly CM, Bernard PS, Krishnamurthy S, et al. Agreement in risk prediction between the 21-gene recurrence score assay (Oncotype DX(R)) and the PAM50 breast cancer intrinsic Classifier in early-stage estrogen receptor-positive breast cancer. Oncologist. 2012;17(4):492–498.

16. Mackay A, Weigelt B, Grigoriadis A, et al. Microarray-based class discovery for molecular classification of breast cancer: analysis of interobserver agreement. J Natl Cancer Inst. 2011;103(8):662–673.

17. Weigelt B, Mackay A, A'Hern R, et al. Breast cancer molecular profiling with single sample predictors: a retrospective analysis. Lancet Oncol. 2010;11(4):339–349.

18. Stein RC, Dunn JA, Bartlett JMS, et al. OPTIMA: a randomised feasibility study of personalised care in the treatment of women with early breast cancer. Health Technol Assess. 2016;20(10):1–202.

19. Bartlett JMS, Brookes CL, Robson T, et al. Estrogen Receptor and Progesterone Receptor As Predictive Biomarkers of Response to Endocrine Therapy: A Prospectively Powered Pathology Study in the Tamoxifen and Exemestane Adjuvant Multinational Trial. J Clin Oncol. 2011;29(12):1531–1538.

20. Ravdin PM, Siminoff LA, Davis GJ, et al. Computer program to assist in making decisions about adjuvant therapy for women with early breast cancer. J Clin Oncol. 2001;19(4):980–991.

21. Wishart GC, Bajdik CD, Azzato EM, et al. A population-based validation of the prognostic model PREDICT for early breast cancer. Eur J Surg Oncol. 2011;37(5):411–417.

22. Wishart GC, Bajdik CD, Dicks E, et al. PREDICT Plus: development and validation of a prognostic model for early breast cancer that includes HER2. Br J Cancer. 2012;107(5):800–807.

23. Wishart GC, Azzato EM, Greenberg DC, et al. PREDICT: a new UK prognostic model that predicts survival following surgery for invasive breast cancer. Breast Cancer Res. 2010;12(1):R1.

24. Bangdiwala SJ, Shankar V. The Agreement Chart. BMC Med Res Methodol. 2013;13(97).

25. R Core Team. A language and environment for statistical computing. 2014.

26. Galea M, Blamey R, Elston C, Ellis I. The Nottingham prognostic index in primary breast cancer. Breast Cancer Res Treat. 1992;22(3):207–219.

27. Parker JS, Mullins M, Cheang MCU, et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes. J Clin Oncol. 2009;27(8):1160–1167.

28. Chang JC, Makris A, Gutierrez MC, et al. Gene expression patterns in formalin-fixed, paraffin-embedded core biopsies predict docetaxel chemosensitivity in breast cancer patients. Breast Cancer Res Treat. 2008;108(2):233–240.

29. Cuzick J, Dowsett M, Wale C, et al. Prognostic Value of a Combined ER, PgR, Ki67, HER2 Immunohistochemical (IHC4) Score and Comparison with the GHI Recurrence Score - Results from TransATAC. Cancer Res. 2009;69(24):503S.

30. Dowsett M, Cuzick J, Wale C, et al. Prediction of risk of distant recurrence using the 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study. J Clin Oncol. 2010;28(11):1829–1834.

31. Nielsen TO, Parker JS, Leung S, et al. A Comparison of PAM50 Intrinsic Subtyping with Immunohistochemistry and Clinical Prognostic Factors in Tamoxifen-Treated Estrogen Receptor-Positive Breast Cancer. Clin Cancer Res. 2010;16(21):5222–5232.

32. Chia SK, Bramwell VH, Tu D, et al. A 50-Gene Intrinsic Subtype Classifier for Prognosis and Prediction of Benefit from Adjuvant Tamoxifen. Clin Cancer Res. 2012;18(16):4465–4472.
