Trade-Offs Between Harms and Benefits of Different Breast Cancer Screening Intervals Among Low-Risk Women

Abstract
Background: A paucity of research addresses breast cancer screening strategies for women at lower-than-average breast cancer risk. The aim of this study was to examine screening harms and benefits among women aged 50-74 years at lower-than-average breast cancer risk by breast density.
Methods: Three well-established, validated Cancer Intervention and Surveillance Network models were used to estimate the lifetime benefits and harms of different screening scenarios, varying by screening interval (biennial, triennial). Breast cancer deaths averted, life-years and quality-adjusted life-years gained, false-positives, benign biopsies, and overdiagnosis were assessed by relative risk (RR) level (0.6, 0.7, 0.85, 1 [average risk]) and breast density category, for US women born in 1970.
Results: Screening benefits decreased proportionally with decreasing risk and with lower breast density. False-positives, unnecessary biopsies, and the percentage of overdiagnosis also varied substantially by breast density category; false-positives and unnecessary biopsies were highest in the heterogeneously dense category. For women with fatty or scattered fibroglandular breast density and a relative risk of no more than 0.85, the additional deaths averted and life-years gained were small with biennial vs triennial screening. For these groups, undergoing 4 additional screens (screening biennially [13 screens] vs triennially [9 screens]) averted no more than 1 additional breast cancer death and gained no more than 16 life-years and no more than 10 quality-adjusted life-years per 1000 women but resulted in up to 232 more false-positives per 1000 women.
Conclusion: Triennial screening from age 50 to 74 years may be a reasonable screening strategy for women with lower-than-average breast cancer risk and fatty or scattered fibroglandular breast density.

be considered. Few studies have assessed the harms and benefits for women at decreased risk (lower than average, ie, a relative risk [RR] < 1). Women with lower-than-average risk of breast cancer are expected to have a less favorable harm to benefit ratio from untargeted screening, suggesting that less intense screening strategies than biennial screening might be appropriate for this group.
The proportion of women at low risk in the population is substantial; for example, 34% of US women aged 40-74 years have a 5-year risk of developing breast cancer below 1.00% based on the Breast Cancer Surveillance Consortium (BCSC) risk model (8,9). Established factors that are associated with substantially decreased risk for breast cancer include fatty breasts, young age at first birth (younger than 20 years), and young age at menopause (younger than 40 years) with relative risks of 0.6-0.7 (10-12); these factors apply to 8%, 12%, and 13% of US women, respectively (10,11). Factors associated with a more modest decrease in risk, such as 3 or 4 full pregnancies (RR = 0.84) and age at menopause between 45 and 49 years (RR = 0.86) (12,13), are even more common, with 39% and 24% of US women aged 50-79 years reporting those factors, respectively (11).
Breast density has also received attention as an important factor that influences risk of developing breast cancer, as well as affecting the balance between benefits and harms of screening, because low breast density not only leads to a reduced risk for developing disease but also increases the sensitivity of mammography (9,10,14). The aim of this study was to assess the benefits and harms of screening by breast cancer risk, breast density, and screening interval among women aged 50-74 years with lower-than-average risk levels using collaborative modeling. Study results are intended to inform discussions about risk-based screening guidelines and practice.

Model Overview
We used 3 well-established microsimulation models developed independently as part of the National Cancer Institute-funded Cancer Intervention and Surveillance Modeling Network consortium: model E (15), model GE (16), and model W (17) (see Table 1). These models have been validated previously (19), have been shown to replicate US population trends in breast cancer incidence and mortality (20,21), and have been used extensively to estimate the impact of different screening scenarios (22-25). The models and common inputs have been described in detail previously (15-18,26) (Supplementary Methods, available online).

Model Inputs
A cohort of US women born in 1970 was simulated using previously described inputs (6,18), such as breast cancer incidence (27), adjuvant therapy (28), and data from the BCSC (http://www.bcsc-research.org) for sensitivity, specificity, and benign biopsy rate of digital mammography by age, breast density, and screening round (first vs subsequent). We modeled 16 subgroups of women, defined on the basis of combinations of 4 risk levels (RR = 0.6, 0.7, 0.85, and 1; see Table 2 for examples of risk factors associated with decreased risk) and 4 breast density categories (Breast Imaging Reporting and Data System categories almost entirely fatty [a], scattered fibroglandular densities [b], heterogeneously dense [c], or extremely dense [d]). The risk level (relative risk) influenced the onset of breast cancer and was assumed to be constant over age. Breast density category was assigned at age 50 years and could decrease by 1 level or remain the same at age 65 years, based on the observed age-specific prevalence in the BCSC. Density affected mammography performance (32), whereas mammography performance was assumed to be unaffected by risk. Risk associated with density (Table 3) was combined multiplicatively with the risk of the different risk levels (relative risks). In this way, density and other risk factors were assumed to be independent determinants of breast cancer risk, consistent with observed data. Thus, a 50-year-old woman with heterogeneously dense breasts and a relative risk of 0.7 had a combined relative risk of 0.875 (0.7 × 1.25). In each simulation, women were followed until death or a model-specific upper age of 100 or 120 years. To evaluate the efficacy of different screening scenarios, we assumed 100% uptake of screening and treatment. Combined output from all 3 models was analyzed using SAS version 9.4 (Cary, NC).
We modeled biennial screening between ages 50 and 74 years (13 screens) and triennial screening between ages 50 and 74 years (9 screens).
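The subgroup construction and screening schedules described in this section can be illustrated with a short Python sketch. It is not part of the simulation models themselves: the function names are hypothetical, and the density relative risk of 1.25 is the single value quoted in the text for a 50-year-old woman with heterogeneously dense breasts (the remaining density-specific risks are given in Table 3 and are not reproduced here).

```python
from itertools import product

RISK_LEVELS = (0.6, 0.7, 0.85, 1.0)        # relative risk levels modeled
DENSITY_CATEGORIES = ("a", "b", "c", "d")  # BI-RADS density categories


def screening_ages(interval_years, start_age=50, stop_age=74):
    """Ages at which a screen takes place for a fixed screening interval."""
    return list(range(start_age, stop_age + 1, interval_years))


def combined_relative_risk(risk_level, density_rr):
    """Multiply the two risks, treating density and other factors as independent."""
    return risk_level * density_rr


# All risk-by-density combinations (4 x 4 = 16 subgroups).
subgroups = list(product(RISK_LEVELS, DENSITY_CATEGORIES))

biennial = screening_ages(2)   # 50, 52, ..., 74
triennial = screening_ages(3)  # 50, 53, ..., 74

print(len(subgroups), len(biennial), len(triennial))
print(combined_relative_risk(0.7, 1.25))
```

Under these assumptions the sketch yields 16 subgroups, 13 biennial screens, and 9 triennial screens, matching the counts stated above.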

Screening Outcomes
For all screening scenarios, we estimated outcomes per 1000 women alive at age 50 years, including the number of ductal carcinoma in situ (DCIS) and invasive breast cancers detected. Benefits included breast cancer deaths averted, life-years gained, and quality-adjusted life-years (QALYs) gained. To calculate QALYs, we applied health-related quality-of-life utilities by age (33), and we applied quality-of-life decrements by attaching weights to specific health states for women undergoing a mammogram and diagnostics (34) and life-years with breast cancer by stage of disease at diagnosis (35). Harms included overdiagnosis, false-positives, and benign biopsies. Overdiagnosis was defined as screen-detected cancer that would not have been diagnosed in a woman's lifetime in the absence of screening. In addition, harm to benefit ratios (false-positives per life-year gained and overdiagnosis per breast cancer death averted) were calculated.
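As a minimal illustration of the two harm to benefit ratios named above, the following Python sketch divides a harm count by a benefit count, both expressed per 1000 women. The function name and all input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not study results.

```python
def harm_benefit_ratio(harm_per_1000, benefit_per_1000):
    """Harm per unit of benefit; None when there is no benefit to divide by."""
    if benefit_per_1000 <= 0:
        return None
    return harm_per_1000 / benefit_per_1000


# Hypothetical lifetime outcomes per 1000 women (illustrative values only).
outcomes = {
    "false_positives": 900.0,
    "life_years_gained": 100.0,
    "overdiagnosed_cases": 12.0,
    "deaths_averted": 6.0,
}

fp_per_life_year = harm_benefit_ratio(
    outcomes["false_positives"], outcomes["life_years_gained"]
)
overdx_per_death_averted = harm_benefit_ratio(
    outcomes["overdiagnosed_cases"], outcomes["deaths_averted"]
)

print(fp_per_life_year, overdx_per_death_averted)
```

Returning None for a zero benefit is a design choice of this sketch so that an undefined ratio is not silently reported as a number.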

Analysis
We presented all outcomes by subgroups of risk and density for each strategy using the median (minimum, maximum) of the 3 models. Each outcome was compared with a reference value, defined as the model-specific results for biennial screening from age 50 to 74 years, all densities combined (thus, with representative population frequencies of breast density categories), and average risk (RR = 1). We evaluated the differences between screening scenarios by assessing the incremental benefits and incremental harms and by dividing the incremental harms by the incremental benefits.
We performed sensitivity analyses on varying utility values for undergoing screening and additional workup and on varying specificity by risk (9) (Supplementary Methods, available online).
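The summary statistics described in this section can be sketched as follows. The sketch assumes that the incremental ratio is computed within each model first and then summarized across models; whether the study used this order or the ratio of cross-model medians is not stated here, so this is an assumption. The function names and all numeric inputs are hypothetical.

```python
from statistics import median


def summarize_across_models(values_by_model):
    """Return (median, minimum, maximum) of one outcome across models."""
    values = list(values_by_model.values())
    return median(values), min(values), max(values)


def incremental_ratio(harm_a, harm_b, benefit_a, benefit_b):
    """Additional harm per additional benefit of strategy A vs strategy B."""
    extra_benefit = benefit_a - benefit_b
    if extra_benefit <= 0:
        return None
    return (harm_a - harm_b) / extra_benefit


# Hypothetical (false-positives, life-years gained) per 1000 women, by model.
biennial = {"E": (1000, 100), "GE": (1100, 120), "W": (960, 90)}
triennial = {"E": (800, 90), "GE": (870, 110), "W": (750, 80)}

ratios = {
    model: incremental_ratio(
        biennial[model][0], triennial[model][0],
        biennial[model][1], triennial[model][1],
    )
    for model in biennial
}

print(summarize_across_models(ratios))
```

With these illustrative inputs the per-model ratios are 20, 23, and 21 additional false-positives per additional life-year gained, which would be reported as 21 (range = 20-23).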

Screening Outcomes
Among 1000 women aged 50 years followed over their lifetimes, the number of invasive breast cancers detected when screening biennially between ages 50 and 74 years varied substantially by subgroup; the highest number of invasive breast cancers was a median of 150 (range across models = 150-177) detected in the average-risk (RR = 1) extremely dense group and decreased with decreasing risk and density in all 3 models to 39 (range = 33-52) in the lowest risk-density category (ie, RR = 0.6 and almost entirely fatty breasts) (Table 4). The trends in lifetime benefits and harms are shown for 1 exemplar model (Figure 1).

Benefits
The absolute numbers of lifetime benefits decreased with decreasing risk and with decreasing density in all 3 models. For women with lower-than-average risk and fatty breasts, screening led to fewer benefits (breast cancer deaths averted and life-years gained) than for women at average risk and/or with denser breasts (Table 4). The finding that benefits decreased with decreasing risk (approximately linearly) was consistent across models, screening scenarios, density categories, and outcomes (breast cancer deaths averted, life-years gained, QALYs gained). Absolute benefits also increased with increasing density consistently across models, screening scenarios, and risk groups, although the increase was not linear and showed a leveling off for the highest density category (Table 4; Supplementary Table 1, available online). Biennial screening scenarios resulted in more benefits than triennial screening scenarios in all models and for all risk and density subgroups (Figure 1).

Harms
The number of false-positives was relatively stable over risk given our model assumptions (Table 4; Supplementary Table 2, available online), whereas the number of overdiagnoses decreased with decreasing risk (Figure 2). The number of false-positives was highest in breast density category c (heterogeneously dense) (Figure 1). The same trend was found for the number of benign biopsies (Figure 1). The relationship between overdiagnosis and density varied across models: in model E, overdiagnosis increased with increasing density; in model W, overdiagnosis was highest in the 2 middle categories; and in model GE, overdiagnosis slightly decreased with increasing density (Figure 2). When overdiagnosis was expressed as a percentage of all breast cancers detected, the percentage decreased consistently in all models with increasing density from 22.7% (range = 12.1%-31.9%) to 11.6% (range = 10.6%-12.5%) for a relative risk of 1 and did not vary by risk.
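The distinction between the number of overdiagnoses and the percentage overdiagnosed can be made concrete with a small Python sketch. It assumes, as a simplification that is not taken from the models, that both detected and overdiagnosed cancers scale in proportion to the relative risk; under that assumption the count falls with risk while the percentage stays the same. The function name and the input counts are hypothetical.

```python
def overdiagnosis_percentage(overdiagnosed, detected):
    """Overdiagnosed cancers as a percentage of all cancers detected."""
    if detected <= 0:
        raise ValueError("detected must be positive")
    return 100.0 * overdiagnosed / detected


# Hypothetical counts per 1000 women at average risk (RR = 1).
detected_avg, overdiagnosed_avg = 100.0, 15.0

for rr in (1.0, 0.85, 0.7, 0.6):
    # Simplifying assumption: both counts scale proportionally with risk.
    detected = detected_avg * rr
    overdiagnosed = overdiagnosed_avg * rr
    pct = overdiagnosis_percentage(overdiagnosed, detected)
    print(rr, round(overdiagnosed, 2), round(pct, 1))
```

In this illustration the overdiagnosed count drops from 15 to 9 per 1000 women as the relative risk falls from 1 to 0.6, whereas the percentage remains 15%.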

Harm to Benefit Ratios
The ratio between harms and benefits showed diversity across models and measures (Supplementary Table 3

Screening Scenarios (Biennial vs Triennial)
Biennial vs triennial screening had fewer incremental benefits for the low-risk and low-density subgroups than for average-risk women (Table 5). With biennial vs triennial screening, the additional number of breast cancer deaths averted per 1000 women was 0.4 (range = 0.3-0.6) in women at lowest risk (RR = 0.6) with fatty breasts and 0.6 (range = 0.5-0.7) in women at lowest risk (RR = 0.6) with scattered fibroglandular densities. For women with fatty or scattered fibroglandular breast density and a relative risk of 0.6, 0.7, or 0.85, screening biennially (13 screens) vs screening triennially (9 screens) averted fewer than 1 additional breast cancer death and gained at most 16 life-years and 10 QALYs. For average-risk women with extremely dense breasts, there were 1.5 (range = 1.2-1.5) additional deaths averted, 28 life-years gained, and 19 QALYs gained with biennial vs triennial screening (Table 5).
The number of additional false-positives was highest for the heterogeneously dense category, lowest for the almost entirely fatty category, and did not vary much by risk. For women with fatty or scattered fibroglandular breast density and a relative risk of no more than 0.85, there were up to 232 additional false-positives per 1000 women (Table 5). There were more additional false-positives per additional life-year gained among the low-risk groups, and this ratio decreased with increasing risk in all models (Table 5). The number of additional overdiagnoses per breast cancer death averted decreased in 2 of the 3 models by risk and density (Table 5). The number of additional screens per additional life-year gained when going from triennial to biennial screening increased with decreasing risk and density consistently across models. In average-risk women (RR = 1) with extremely dense breasts, models predicted that 120 (range = 120-145) additional screens were needed to gain 1 life-year when going from triennial to biennial screening, whereas in women at lowest risk (RR = 0.6) with fatty breasts, models predicted a substantially higher number of additional screens needed to gain 1 life-year: 409 (range = 373-644) (Table 5).

Sensitivity Analysis
Varying utility values for undergoing screening and additional workup or varying specificity by risk did not substantially change the ranking and differences between subgroups (Supplementary Tables 4 and 5, available online).

Discussion
This is the first collaborative modeling study of breast cancer screening strategies for women at lower-than-average risk, while considering breast density in this assessment. The results indicate that triennial screening from age 50 to 74 years should be considered for women at lower-than-average risk with low density, because this strategy reduces harms while maintaining a large part of the benefits. This conclusion was robust across models and assumptions about disutility associated with screening and variations in specificity by risk. Our findings are largely in line with previous studies. A previous modeling study, including the same 3 models, focusing on women at increased risk, found that average-risk women with low breast density undergoing triennial screening will maintain a similar or better balance of benefits and harms than average-risk women receiving biennial screening (6). Another modeling study using combined risk-based strategies also found that triennial screening from age 50 to 74 years was optimal for low-risk and medium-low-risk Spanish women (7) and even investigated less intense strategies (quinquennial screening). Moreover, triennial screening is the currently employed screening frequency in the United Kingdom and has been predicted to lead to a substantial mortality reduction (36). Also, the Canadian Task Force recommends screening with mammography every 2-3 years for women aged 50-69 years (37).
Our results show that for a subgroup of women with a combination of fatty or scattered fibroglandular breast density and low risk (RR = 0.6, 0.7, 0.85), incremental benefits (deaths averted, life-years gained, and QALYs gained) are small for biennial screening from age 50 to 74 years compared with triennial screening. This is reflected in the higher ratio between additional false-positives and additional life-years gained in the low-risk and low-density subgroups when going from triennial to biennial screening than in the average-risk population, indicating that there are more harms relative to benefits in these subgroups than in the average-risk population.
The models consistently found that the benefits of screening decrease with decreasing risk, whereas the numbers of false-positives and unnecessary biopsies are mostly stable over categories of low risk. The latter was due to our assumption that mammography performance was unaffected by risk. The benefits also decreased with decreasing density, although the decrease in benefits was not so steep when comparing the highest density category to the next category, indicating that elevated risk among women with high density is a more important determinant of absolute screening benefits than high breast density. With regard to harms, false-positives and unnecessary biopsies were highest in the heterogeneously dense category, whereas the trends in overdiagnosis across density categories varied across models.
These results are useful for informing guidelines and for clinical practice. Because the conditions that result in lower-than-average risk are common, primary care providers could use these results in shared decision-making discussions with women. Most risk factors that lead to a decreased risk are not easily modifiable, but they are relatively straightforward to ascertain. If a subgroup of women can be identified to be at low risk, these women can relatively safely decrease their screening intensity from biennial to triennial.
We acknowledge that breast density is not known in women who have never been screened and is therefore difficult to use to tailor the interval of screening among low-risk women. However, it is possible to tailor the screening interval after a first mammogram based on density, especially because mandated standard reporting of breast density to women after a mammogram has become increasingly common in the United States. Importantly, the measurement of breast density has become more reliable with automated density measures and has similar accuracy in predicting breast cancers (38-40).
Strengths of this study include consideration of breast density; evaluation of a comprehensive set of outcomes for benefits and harms; and the use of 3 well-established, validated models (19). One of the strengths of collaborative modeling is that the combined results from the different independent modeling groups constitute a sensitivity analysis on model structure. Each model was developed using common data from multiple sources and an elaborate calibration process varying multiple parameters to match population-level breast cancer incidence and mortality data (from Surveillance, Epidemiology, and End Results [SEER]). If models were to include alternative values for standard parameters, they would no longer be calibrated to SEER data, and the resulting predictions could not be viewed as reliable. A strength of our analysis is that each model incorporates different structural assumptions about unobservable natural breast cancer history, including varying assumptions regarding the percent of cancers (invasive and/or DCIS) that do not progress, and sojourn times, which inherently provide a sensitivity analysis on screening benefit. Taken collectively, the cross-model results provide stronger evidence than would any single model varying each parameter individually. In addition, most trends and the ranking of scenarios were very similar across models, except for the overdiagnosis results. In particular, the trends in overdiagnosis across density categories varied across the models: in model E, the number of overdiagnosed women increased with increasing density, reflecting the higher risk associated with density, whereas in model GE, the number of overdiagnosed women decreased, reflecting the lower sensitivity associated with density, and in model W, overdiagnosis was highest in the 2 middle categories as a result of the 2 opposing causes of higher risk and lower sensitivity.
The variation across models reflects uncertainty around overdiagnosis in general and uncertainty around overdiagnosis by density in particular.
Our study also had some limitations. Most importantly, we assumed that the relative risk only influenced the onset of breast cancer and was constant over age. Thus, our models assumed that the age distribution of cancers was similar to the average population reported in SEER and was just proportionately lower. We also assumed that the screening performance and the distribution of tumors in terms of estrogen receptors and HER2 are the same for lower-than-average risk women as those for average-risk women. It would be useful to reassess our results when there are additional data on disease biology and screening performance by risk level. Second, we modeled digital mammography screening. Several studies have suggested that the introduction of tomosynthesis in the United States has led to a reduction in recall rates (41,42), so that the number of false-positives might be reduced if tomosynthesis is widely used. However, the reductions in recall rate are relatively small in the United States (approximately 1%), and the effect of tomosynthesis on other harms, such as overdiagnosis, is still uncertain. In addition, our sensitivity analysis showed that even when quality-of-life effects due to false-positives are not taken into account, the ranking and differences between subgroups were largely unchanged. Third, our analysis focuses on screening scenarios starting at age 50 years, and results will be different for older starting ages (eg, age 60 years). The absolute risk (for a woman with relative risk of 0.6) is higher at age 60 years than at age 50 years, and therefore more benefits (breast cancer deaths averted) are expected. However, for 60-year-old women, there are fewer life-years to be gained, and overdiagnosis increases by age. Future work might focus on the balance of benefits and harms for starting screening in (low-risk) older women.
Finally, the models incorporate different structural assumptions about unobservable natural history, including the following 4 factors. First, the percent of invasive breast cancers that do not progress: model W includes a fraction of tumors with limited malignant potential, whereas models E and GE do not include a subset of invasive cancers that do not progress. Second, the models include a range of nonprogressive DCIS, resulting in a wide range of predicted overdiagnosis of DCIS from 34% to 62% (43). Third, the models assume that the benefit of screening arises from either detection at a smaller tumor size or at an earlier stage, and at a younger age. There is a range between these 3 models in predicted mortality reductions of 25%-32% for biennial screening in ages 50-74 years (22). Finally, for sojourn times, model GE includes an age-dependent sojourn time ranging from 2 to 4 years, whereas models E and W simulate continuous tumor growth with certain distributions, resulting in a wide range of distribution of sojourn times, including a subset of tumors with very short sojourn times as well as very long sojourn times. Estimates of mean sojourn times may be biased if they are based on a model that does not allow for nonprogressive (overdiagnosed) cancer (44).
Despite the substantial differences between models on these key assumptions, models come to the same conclusion regarding the incremental benefits and harms of biennial vs triennial screening in low-risk women.
Overall, our collaborative modeling study showed that triennial screening from ages 50 to 74 years can be considered for women who have fatty or scattered fibroglandular breast density and average or low risk of developing breast cancer and for women with very low risk at any density level. By undergoing more intense screening, these women are subjected to more harms, with only small added benefits. The results contribute to the growing body of evidence that tailored screening has many advantages over age-based guidelines for average populations (7,45). It will be important to translate our findings, and other results, into clinical practice and test the most effective methods for communication of breast cancer risk and breast density to enhance shared decision making about breast cancer screening.

Funding
This work was supported by the National Cancer Institute at the National Institutes of Health (P01 CA154292, U01 CA199218, U01 CA152958, P30 CA014520, and P30 CA023108). Data collection for model inputs from the Breast Cancer Surveillance Consortium (BCSC) was supported by the National Cancer Institute grant P01 CA154292 and grant U54 CA163303. The collection of BCSC cancer and vital status data used in this study was supported in part by several state public health departments and cancer registries throughout the United States. For a full description of these sources, please see https://www.bcsc-research.org/about/work-acknowledgement.

Notes
Role of the funders: The funders had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; or the decision to submit the manuscript for publication.
Disclosures: Dr Kerlikowske reports unpaid consulting with Grail on the STRIVE study. The other authors have no disclosures.