Jean M Seely, Peter R Eby, Martin J Yaffe, The Fundamental Flaws of the CNBSS Trials: A Scientific Review, Journal of Breast Imaging, Volume 4, Issue 2, March/April 2022, Pages 108–119, https://doi.org/10.1093/jbi/wbab099
Abstract
Although the two Canadian National Breast Screening Study (CNBSS) trials were performed 40 years ago, their negative findings continue to heavily influence screening policies around the world. These policies, based on underestimates of the mortality reduction attributable to mammography particularly for women in the 40–49-year age range, contribute to increased mortality and morbidity from breast cancer. This review summarizes principles of a randomized controlled trial (RCT) and evaluates the compliance of the CNBSS1 and CNBSS2 RCTs in the context of these principles. We describe the fundamental flaws of the CNBSS trials, which failed to demonstrate mortality benefit of screening mammography and contribute to their being the only two outlier studies of eight screening mammography RCTs. The most significant flaws of the trials are (1) inadequate power to detect significant differences in breast cancer mortality; (2) very poor quality mammography with low sensitivity and cancer detection rates; (3) inclusion of women with symptoms of breast cancer; and (4) study design that allowed for violation of the randomization of the allocation process. Finally, we demonstrate that the conditions of the screening intervention in the CNBSS do not reflect the environment of modern population-based screening mammography programs.
Among eight international randomized controlled trials of screening mammography, the fundamental flaws in the Canadian National Breast Screening Study (CNBSS) trials are responsible for them being the only ones not to find a reduction in breast cancer mortality with screening mammography.
The conditions of the screening intervention in the CNBSS do not reflect the environment of current population-based screening mammography programs.
Given the numerous and pervasive problems with the CNBSS, they are not reliable trials of the efficacy of screening mammography and should not be used to make decisions regarding access, onset, and frequency of breast cancer screening programs.
Introduction
Randomized controlled trials (RCTs) remain the gold standard for assessing the efficacy of medical interventions. Most recently, numerous RCTs confirmed the efficacy of vaccines to reduce the morbidity and mortality of COVID-19 (1–3). Between the 1960s and the 1980s, RCTs around the globe tested the efficacy of screening mammography (4–9). Nearly all the trials confirmed significant disease-specific reduction of mortality from breast cancer. The two Canadian National Breast Screening Studies (CNBSS), however, did not (8,9). Some champion the CNBSS as the only properly designed and executed RCTs and have cited their results when advocating against the use of screening and for the discontinuation of organized programs (10,11). Others consider the CNBSS to be outliers with major flaws (12–16). This review article will summarize multiple weaknesses in the execution of the CNBSS trials, which are likely to have impacted their results.
Historical Context
In 1963, the Health Insurance Plan (HIP) of Greater New York initiated the Breast Screening Trial, an RCT of 62 000 women aged 40–64 (17). The HIP Trial was motivated by what its lead investigator described as a rising incidence of breast cancer, a stable breast cancer mortality rate for women 25 years and older of 40/100 000 in the face of declines of other-cause mortality (18,19), and the observation that mammography could detect some asymptomatic cancers. Women in the intervention group received four annual examinations consisting of two-view film (craniocaudal [CC] and lateral) mammography and clinical breast examination (CBE). In 1971, the HIP Trial reported an overall reduction in breast cancer deaths of 40% for those women randomized to receive the screening intervention. Of the screen-detected cancers, 33% were found by mammography alone and 70% were axillary node negative compared to 45% in the control group. It is notable, however, that there was a great deal of resistance to mammography, with many fears about the risks of radiation, and one-third of women invited in the HIP study declined to be screened.
Although the HIP trial demonstrated an impressive mortality benefit, the findings left several questions unanswered. The HIP study was not designed to provide information on age subsets of the cohort, and there were suggestions that the main benefit was for women over age 50. At the time, many referring surgeons and physicians did not believe that screening mammography was beneficial (20). Furthermore, women in the screened group received both mammography and clinical examination so that the specific contribution of each could not be differentiated. Based on the positive results of the HIP study, the Breast Cancer Detection Demonstration Project (BCDDP) was conducted by the American Cancer Society in conjunction with the National Cancer Institute (21). It was a large observational trial called the “Laboratory of Life.” By 1975, it had enrolled 280 000 women, and for the 4240 women with invasive cancers at 11-year follow-up, 5- and 8-year survival of 87% and 81% in the screened women was much higher than the 74% and 65%, respectively, demonstrated in the United States Surveillance, Epidemiology, and End Results (SEER) Program database (21). These survival benefits were shown to be the same in women of all ages, including those under 50. However, as it was not an RCT, the BCDDP was unable to provide definitive answers regarding the benefits of screening.
Although the HIP trial included women aged 40–64, it did not have adequate power for subgroup analysis of the efficacy of screening women in their 40s, and no other RCT at that time had addressed this question. The working group to review the National Cancer Institute–American Cancer Society BCDDP recommended that such a trial be conducted (22). Two trials, later collectively known as the Canadian National Breast Screening Study (CNBSS), were designed in Canada in 1979 with the intention of resolving questions remaining after the HIP study (8).
Design of CNBSS 1 and 2
In CNBSS1 (8), the research question was, “Does screening in the 40–49 age group contribute to reduced mortality from breast cancer?” Fifty thousand women 40–49 years of age were to be randomized to receive either five annual rounds of mammography plus CBE by a trained nurse (the MP trial arm) or an initial CBE followed by “usual care” (the UC arm) over the subsequent 4 years. In CNBSS2 (9), the research question was “Does the addition of mammography screening increase the mortality reduction beyond that provided by CBE alone?” Forty thousand women aged 50–59 years were to be randomized to five annual examinations, which were either CBE alone (PO arm) or CBE plus mammography (MP arm). Death due or probably due to breast cancer was the main end point for analyses.
The two trials were reported separately in 1992 and 1993 (8,9) and then again in 2000 and 2002 (23,24). In spite of their different research questions, different methodologies, and different age groups, the populations and results of both CNBSS were, inappropriately, combined for the 25-year follow-up publication in 2014 (25). This decision was just one of many that deviated from the standard elements and criteria of high-quality RCTs.
Fundamental Features of High-quality RCTs
In an extension of the outline of fundamental features of high-quality RCTs proposed by Houle et al (26), eight questions to gauge the quality of RCTs will be posed and addressed:
Were the definitions and mechanisms for inviting and recruiting the study population (ages, number needed for statistical power) properly and clearly developed?
Were the inclusion and exclusion criteria appropriately defined? Were the staff informed properly?
Did staff properly apply the inclusion and exclusion study criteria to select the study sample? Did patients have symptoms or exam findings that would disqualify them from screening as we know it?
Were the staff blinded to randomization? Were the randomized groups equal at the outset?
What was the control plan of care? Was it followed and consistent?
Did the intervention meet the standard of care and state-of-the-art? Was it consistently applied?
To these we add: Did the trial appropriately address the research question(s): does screening for asymptomatic women 40–49 years old (CNBSS1) and the addition of mammography to CBE for asymptomatic women in their 50s (CNBSS2) contribute to reduced mortality from breast cancer?
Although it will have no bearing on the evaluation of a historical study, if the results of that study are to be applied to a current task—for example, the development of health care policy—it is reasonable to ask an additional question:
Do the intervention and control conditions of that study reasonably reflect the environment under which the new health policy is being considered? Conversely, have enough factors changed over time that the relevance of the historical study has become diminished?
1. Were the Definitions and Mechanisms for Inviting and Recruiting the Study Population (Ages, Number Needed for Statistical Power) Properly and Clearly Developed?
The CNBSS studies had low power, high contamination rates, and high noncompliance rates. The trials were designed to recruit 50 000 women into CNBSS1 and 40 000 women into CNBSS2. Under ideal circumstances this would provide, for each trial, 80% power to detect a 40% reduction in breast cancer mortality if it occurred. For several reasons, both the expected effect size and the power were overly optimistic (25,26). For example, although the HIP trial found a 40% reduction in breast cancer deaths, that trial compared the combination of mammography and CBE versus no screening for women 40–64 years of age. Breast cancer incidence figures drawn from the HIP population would therefore have overestimated the incidence to be expected in CNBSS1, which was designed for women aged 40–49 years. It is also reasonable to expect the effect size to be lower in CNBSS2, where the control group was not unscreened but received CBE. There were few deaths from breast cancer reported in the first 7 years of follow-up: only 66 in CNBSS1 (38 in the screening arm and 28 in the UC arm) (8) and 77 in CNBSS2 (38 in the screening arm and 39 in the PO arm) (9). Any effect or lack of effect would, as Freedman et al noted, “only be demonstrated with poor precision” (27).
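To make the dependence of power on the assumed effect size concrete, the following is a minimal sketch using a normal approximation for comparing two proportions. The baseline mortality value and the function name `power_two_arm` are illustrative assumptions; they are not taken from the CNBSS design documents, and the trial's actual sample size calculation may have used a different method.

```python
from statistics import NormalDist


def power_two_arm(p_control, rel_reduction, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with unpooled variance; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    p_screen = p_control * (1.0 - rel_reduction)
    variance = (p_control * (1 - p_control) + p_screen * (1 - p_screen)) / n_per_arm
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = (p_control - p_screen) / variance ** 0.5
    return NormalDist().cdf(z_effect - z_crit)


# ASSUMPTION: a 0.4% cumulative breast cancer mortality in the control arm is
# a made-up illustrative value, not a figure reported by the CNBSS.
baseline = 0.004
for reduction in (0.40, 0.30, 0.20):
    # 25 000 per arm corresponds to the 50 000 women planned for CNBSS1
    print(reduction, round(power_two_arm(baseline, reduction, 25_000), 2))
```

Under these assumed inputs, power falls steeply as the detectable reduction shrinks from 40% toward 20%, which is the sense in which an optimistic effect size translates into an underpowered trial.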
Like most RCTs, CNBSS was analyzed on an “intent to treat” basis (ie, outcomes were ascribed to the arm to which a woman was allocated whether or not she had received the screening intervention). Lack of compliance (women assigned to screening not actually having the examination) or contamination (women assigned to the control group being exposed to screening outside the trial) would reduce the size of any measured effect of the intervention. Anticipating such effects, it is customary in designing RCTs to compensate by increasing the required number of participants to be recruited to maintain statistical power. In establishing sample sizes for the two trials, the investigators did not allow for these effects (28,29).
In the combined CNBSS1 and 2, of the 23 261 women randomly allocated to undergo annual mammography and physical examination, 120 (0.5%) refused mammography, and another 40 did not undergo the first mammographic screening because of procedural errors (30). In CNBSS1, overall noncompliance rates for screening increased from 10.6% at screen 2 to 14.4% at screen 5. In addition, 1.7% to 2.9% of the women in the screened group accepted the CBE but refused mammography (8). Noncompliance rates were similar in CNBSS2 (9).
The contamination rate was documented as 26% in CNBSS1 and 17% in CNBSS2 (31). An independent survey at one of the sites (Winnipeg) based on billing records estimated contamination rates of 17%–20% (32). The relatively high contamination rates further diminish the power of the studies. Given other problems, discussed later, such as poor image quality, which are likely to have reduced the sensitivity of the mammography, the size of the effect in CNBSS and consequently the power for observing a mortality reduction would be further reduced. To preserve statistical power in light of high contamination, poor image quality, and less than full compliance, a greater number of study participants was required.
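The dilution of an intent-to-treat estimate by noncompliance and contamination can be sketched with simple arithmetic. The model below is a deliberate simplification and an assumption of this illustration: women who are actually screened (compliers in the screening arm, contaminated women in the control arm) receive the full benefit, and everyone else carries the unscreened risk. The function name `diluted_reduction` is hypothetical.

```python
def diluted_reduction(true_reduction, compliance, contamination):
    """Relative mortality reduction observed under intent-to-treat analysis.

    Risks are expressed relative to the unscreened risk (set to 1.0).
    """
    risk_screen_arm = 1.0 - compliance * true_reduction
    risk_control_arm = 1.0 - contamination * true_reduction
    return 1.0 - risk_screen_arm / risk_control_arm


# Rates quoted in the text for CNBSS1: noncompliance up to 14.4%,
# contamination 26%. The 40% true reduction is the design assumption.
observed = diluted_reduction(true_reduction=0.40,
                             compliance=1 - 0.144,
                             contamination=0.26)
print(f"{observed:.1%}")  # prints 26.6%
```

Under these assumptions a true 40% reduction would appear as roughly a 27% reduction, and a smaller observed effect requires more participants to detect at the same power.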
2. Were the Inclusion and Exclusion Criteria Appropriately Defined? Were the Staff Informed Properly?
The inclusion criteria were not appropriately defined because they allowed women with symptoms of breast cancer to be included. The inclusion criteria listed for the CNBSS trials were women 40–59 years of age who did not have a previous history of breast cancer, had no mammograms in the past 12 months, and were not pregnant (33). The protocol did not explicitly exclude women with breast cancer symptoms (ie, changes observed by the woman herself or her primary health care provider that were suggestive of breast cancer). As a study of breast cancer screening, which implies the preclinical detection of cancer, women with symptoms of breast cancer should have been excluded. Staff should have been trained to apply these criteria rigorously. Data obtained from the CNBSS staff confirmed that many symptomatic patients were included in the screening arms of the study (34,35).
3. Did Staff Properly Apply the Inclusion and Exclusion Study Criteria to Select the Study Sample? Did Patients Have Symptoms or Exam Findings That Would Disqualify Them From Screening as We Know It?
Inclusion and exclusion criteria were not uniformly applied. The CNBSS trials differed from the other trials of screening mammography by recruiting volunteers instead of inviting age-eligible women to participate based on population lists. Many women who volunteered for CNBSS were already symptomatic with palpable lumps (ie, they were candidates for diagnostic mammography, not screening). Some of these women would certainly have had breast cancer, detectable either at entry or in subsequent years, and subsequent deaths in both arms of the study would dilute the ability to detect any beneficial effect of mammography screening. The high fraction of clinically palpable cancers found at the initial screening of both studies, 58.1% (68/117) in CNBSS1 and 49.0% (77/157) in CNBSS2, might be partially explained by the fact that this was a group of women who had never previously been screened; however, it is also consistent with women with self-identified symptoms not being excluded from the studies. Similarly high proportions (67%–74%) of clinically palpable cancers were found in the same period in the two other RCTs that included CBE (5,36).
In addition, in at least one of the screening sites, participants were recruited from a breast surgical clinic in that hospital. Presumably, these individuals already had well-established symptoms. It has been documented that at other sites, staff observed the recruitment of symptomatic women to the CNBSS (34,35). The inclusion of many women with breast cancer symptoms converts the CNBSS from a trial of pure screening to a trial of screening and diagnostic mammography.
The authors noted that there was no attempt to exclude women with clinical symptoms of breast cancer (37). Part of this was due to a challenge in recruiting an adequate number of participants. The addition of more women who had palpable lumps was felt to increase the power of the study (38). It has been argued by the authors of the CNBSS1 that if a bias due to adding women with palpable lumps to the screening arm had occurred, it would have been reflected in a different distribution of risk factors, such as a positive family history, in the two arms. However, this is unlikely given that most (80%) breast cancers occur in women without a family history.
The CNBSS protocol excluded women with breast cancer from enrolling. However, an analysis correlating independent previous public health insurance claim data with assignment to trial arms at one of the screening sites (Winnipeg) (39) found 9 participants with prior health claims for breast cancer who should have been excluded. Eight of those 9 were registered in the mammography arms; 4 of 4 were assigned to the mammography arm in CNBSS1.
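As a rough illustration of how unusual such a split would be under proper 1:1 randomization, the exact binomial tail probability can be computed directly. This is a sketch under the assumption that each of the 9 women had an independent 50% chance of either arm; it is not a statistic reported in the cited analysis, and the helper name `upper_tail` is hypothetical.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 8 or more of 9 ineligible women landing in the mammography arms by chance
print(upper_tail(8, 9))  # prints 0.01953125
```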
4. Were the Staff Blinded to Randomization? Were the Randomized Groups Equal at the Outset?
There is direct and indirect evidence that nonrandom assignment occurred in the CNBSS studies. The CNBSS were not designed as blinded trials. The allocation to trial arms was pre-established for participants. The randomization was stratified by 5-year age groups and by center (33). Lists were provided to the centers by the central epidemiology unit and were kept by the study coordinators. However, in practice, randomization did not adhere to RCT standards. Group assignments were made by local center coordinators using lists with preprinted identification numbers and group designations. In all but one of the 15 screening sites, women were registered into their allocation after having received a CBE by a trained nurse examiner (or by physicians in Quebec). Unlike modern practice where registration takes place at arm’s length, in CNBSS registration occurred individually at each center using an “open book” method with trial arm identification given for each entry row. After each subject had been examined, but before randomization allocation took place, the coordinator knew the group to which the next subject would be allocated (28). As Tarone wrote in 1995, such nonblinded randomization could have allowed some women to be assigned preferentially to the mammography group based on adverse signs discovered during the physical examination (40). Boyd and colleagues in 1993 noted that the randomization procedure used in the CNBSS was not standard for multicenter trials. The local randomization allocation in CNBSS did not rely on a central source of allocation by telephone, a better-established practice (41).
Evidence that randomization protocol violation occurred in the CNBSS1 is seen indirectly in the excess of advanced cancers (defined as four or more positive lymph nodes) in the screening arm: 17 women with physical signs of advanced breast cancer compared with five in the UC arm. Such an imbalance has a probability of occurring due to chance of 0.0033 (28). This would likely explain the excess of deaths in the CNBSS1 screening arm (40). Other indirect evidence of randomization protocol violation is evident from the findings by Cohen and associates (39) described above. There is now direct evidence that this occurred at one trial site, in the form of an eyewitness statement describing randomization protocol violation involving at least 20 women with physical signs of breast cancer who were allocated to the screening arm in CNBSS (34,35). In summary, some degree of corruption of randomization occurred in the CNBSS studies. Because it appears that this took place for women who had poor-prognosis cancers, only a small number of such women shifted into the mammography screening group could markedly move the trial result toward showing no mortality benefit (34). As Boyd indicated in 1997, lessons about management of randomization should be learned from CNBSS (28) to prevent tampering.
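The contrast between an “open book” list and arm's-length registration can be made concrete with a small sketch of centrally concealed, stratified permuted-block allocation. Everything here is hypothetical and for illustration only: the class `CentralAllocator`, its parameters, and the block size are assumptions, and the code describes neither the CNBSS procedure nor any specific modern trial system. The point is only that the next assignment is not visible to local staff and is revealed after the participant has been irreversibly registered.

```python
import random
from collections import defaultdict


class CentralAllocator:
    """Hypothetical central allocation service using permuted blocks per stratum."""

    def __init__(self, arms=("MP", "UC"), block_size=4, seed=0):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self._arms = arms
        self._block_size = block_size
        self._rng = random.Random(seed)
        self._pending = defaultdict(list)  # stratum -> assignments not yet revealed
        self._assigned = {}                # participant_id -> arm

    def _new_block(self):
        # Equal numbers of each arm, in random order
        block = list(self._arms) * (self._block_size // len(self._arms))
        self._rng.shuffle(block)
        return block

    def register(self, participant_id, center, age):
        """Record an eligible participant first, then reveal her arm."""
        if participant_id in self._assigned:
            raise ValueError("participant already allocated")
        stratum = (center, 5 * (age // 5))  # center and 5-year age band
        if not self._pending[stratum]:
            self._pending[stratum] = self._new_block()
        arm = self._pending[stratum].pop()
        self._assigned[participant_id] = arm
        return arm


allocator = CentralAllocator(seed=42)
print(allocator.register("P-0001", center="site-A", age=43))
```

Because the pending assignments are private to the allocator, a local coordinator cannot know the arm of the next entry before deciding whom to register, which is the safeguard the open-book method lacked.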
5. What Was the Control Plan of Care? Was It Followed and Consistent?
The control plan of care was not always followed and was not consistent. The CNBSS control arm varied according to local clinical practices of each center. The control arm of the CNBSS studies required that the clinical findings identified by the nurse examiners be reviewed by the study surgeons before further diagnostic investigation was performed. This was done by a weekly review of the patients by the study surgeons at a review clinic in the study center. If the surgeon agreed with the clinical findings, subsequent procedures in the control arm varied according to each review clinic. In some centers, patients were referred for diagnostic mammography; in others, they were referred for surgical biopsy and excision by community surgeons; and other findings could be dismissed as false positives. The surgeons generally referred women for diagnostic mammography if the findings were felt to be suspicious. Baines published the results of the role of the physical examiner in the CNBSS in 1989 (42). Of the 19 965 women in CNBSS2 who were eligible to receive CBE in the control arm, 2358 (12%) were referred to the review center after CBE findings, and of these, 2289 (97%) were not referred for mammography or biopsy because their findings were deemed to be false positives at CBE (42). (Note that the number of women in this report differs from that given in ref. 4.)
The structure of the CNBSS in the context of the health care system was that all work-up procedures were performed in the community and were not part of the trial. Therefore, it was not possible to enforce a standardized plan of care, especially across six provinces with different health systems. Confirming the lack of a standardized care plan, in CNBSS1, tissue aspiration or biopsy was recommended in 16.6/1000 women in the MP arm and was performed in only 13.4/1000. Needle localization biopsy was recommended in 16.2/1000 women in the MP arm but performed in only 12.4/1000 (8). In CNBSS2, the pattern was similar; in the MP arm, 11.6/1000 needle biopsies and aspirations were recommended and 7.9/1000 were performed, and 23.8/1000 needle localization biopsies were recommended and yet only 17.7/1000 were performed (9). Some review clinics would not refer lesions detected only by mammography for a surgical diagnosis because of the inability to localize the lesions, the technique of needle localization being brand new. Therefore, the presence of a mammographically detected lesion could be ignored or not treated, leaving the plan of care uncontrolled and some women untreated until the abnormality became palpable. The study authors acknowledged that although the CNBSS was able to provide standard diagnostic recommendations, these were not uniformly adopted by the study surgeons or patients themselves (42).
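The gap between recommended and performed procedures can be expressed as a simple ratio of the per-1000 rates quoted above. This is a back-of-the-envelope sketch: it assumes the performed procedures are a subset of those recommended, which the aggregate rates alone cannot confirm, and the helper name `completion_fraction` is hypothetical.

```python
# (recommended, performed) rates per 1000 women in the MP arm, as quoted in the text
rates = {
    "CNBSS1 aspiration or biopsy": (16.6, 13.4),
    "CNBSS1 needle localization biopsy": (16.2, 12.4),
    "CNBSS2 needle biopsy or aspiration": (11.6, 7.9),
    "CNBSS2 needle localization biopsy": (23.8, 17.7),
}


def completion_fraction(recommended, performed):
    """Approximate share of recommended procedures that were carried out."""
    return performed / recommended


for label, (recommended, performed) in rates.items():
    share = completion_fraction(recommended, performed)
    print(f"{label}: {share:.0%} of recommended procedures performed")
```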
6. Did the Intervention Meet the Standard of Care and State-of-the-art? Was It Consistently Applied?
The quality of the mammography intervention was very poor. In a controlled trial of an intervention, it is imperative that the quality of both the intervention and the control treatment are reflective of their actual capability. Otherwise, the true difference of their efficacies will be either underestimated or overestimated. In the case of the CNBSS, those performing the CBEs were highly trained, whereas mammography was treated as a generic commodity. In fact, several respected external reviewers judged the quality of the mammography to be extremely poor. The first two radiologist reviewers for the study, Dr Wende Logan and Dr Stephen Feig, resigned due to their unheeded recommendations to improve the quality (29). Their specific concerns were of sharpness, contrast, and overall quality of the mammograms. The radiation dose was kept so low that there was inadequate contrast of the mammograms. Further, grids were used only in the latter part of the study (43). They also noted that lack of training of the technologists resulted in inadequate positioning. One center in Vancouver used an 11-year-old mammography unit from the start of the trial in 1983 until 1987 (13). Another unit in London, Ontario, was retrieved from a storage garage and had not been used for 2 years, already out-of-date when installed at their center for the studies. A participating radiologist from Ottawa documented the poor quality in a letter to the editor (44). Five of 15 centers did not use phototimers (automatic exposure control) (43). The study protocol stipulated the 90° mediolateral view for the first 5 years and switched to the mediolateral oblique (MLO) view only in the last 3 years, thereby missing tissue in the upper outer quadrant, where more breast cancers occur, for the majority of the trial.
External Review of the CNBSS
The quality of the mammography in the CNBSS was audited by an external review committee. The results were published in 1990 by Baines and Miller (43), the principal investigators of the CNBSS, along with experts in mammography physics and quality measurement. Three independent expert radiologists rated the quality of the mammography for 10 randomly selected mammograms from each of the 15 centers for each year of the study. Each mammogram was rated on a 4-point scale for each view: 0 (poor), 1 (fair), 2 (satisfactory), 3 (good). The best possible score for a four-view mammogram was 12 (4 × 3). Participants were initially recruited into the CNBSS from 1980 to 1985. Original performance results are summarized in Table 1 (43). In the first 5 years of the CNBSS (1980–1984), less than 40% of the mediolateral or MLO views were satisfactory, as judged by the 1988 standard for MLO view. Alternatively stated, more than 60% of the mediolateral mammograms received scores of only 0 (poor) or 1 (fair) for the first 5 years. For the first 6 years, coinciding with the entire initial recruitment period, image quality was satisfactory only 49%–68% of the time (43). It was only toward the end of the study that the quality improved to a sufficient level.
Table 1. Percentage of Mammograms With Satisfactory Scores (≥2) for Each Criterion by Calendar Year
Criteria | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 |
---|---|---|---|---|---|---|---|---|
Craniocaudal view | 98 | 92 | 93 | 93 | 96 | 95 | 98 | 98 |
Mediolateral oblique view^a | 35 | 28 | 28 | 37 | 39 | 61 | 85 | 90 |
Contrast and density | 63 | 56 | 66 | 66 | 72 | 75 | 89 | 89 |
Image quality | 50 | 49 | 55 | 60 | 64 | 68 | 85 | 85 |
Adapted from (43).
^a Included mediolateral or mediolateral oblique positioning and evaluated according to the 1988 standard for a mediolateral oblique view.
Comparison of Mammography Sensitivity With Other RCTs
In 2002, it was noted in the World Health Organization Handbook on Breast Cancer Screening that the sensitivity of mammography in CNBSS was only 69% (178/259) with CBE sensitivity of 59.5% (154/259). This was much lower than the UK Edinburgh trial sensitivity of 94.5% (171/181) for mammography and 71.5% for CBE (130/181) (8,45,46). In the Swedish Two-County trial, mammography sensitivity was 95% (47). In the first year (prevalence) of the UK trial, the sensitivity was 98% compared with CBE 72%, and in the following (incident) years the relative sensitivity advantage increased (ie, 90% vs 45% for CBE) (45). By contrast, the sensitivity of mammography in CNBSS1 in the prevalence year was only 54.7% (64/117), whereas CBE had a sensitivity of 58.1% (68/117), and in the subsequent incidence years of CNBSS1, the sensitivity of mammography decreased to 48.9% (114/233), whereas CBE sensitivity was 36.9% (86/233) (Table 2) (8). Similarly, in CNBSS2, in year 1, the sensitivity of mammography was 68.8% (108/157), whereas CBE sensitivity was 49.0% (77/157), and in the incidence years, the sensitivity of mammography decreased to 64.1% (161/251), whereas CBE sensitivity was 30.3% (76/251) (Table 3) (9). The low sensitivity of the mammography in both of the CNBSS compared to these other trials confirms the poor-quality mammography, even for that period.
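The CNBSS sensitivities quoted above are simple proportions of cancers detected by each modality, so they can be recomputed from the counts in the text. The sketch below does only that; the dictionary layout and the function name `sensitivity` are illustrative choices.

```python
# Counts quoted in the text: total cancers and cancers detected by each modality
counts = {
    ("CNBSS1", "prevalence"): {"cancers": 117, "MG": 64, "CBE": 68},
    ("CNBSS1", "incidence"): {"cancers": 233, "MG": 114, "CBE": 86},
    ("CNBSS2", "prevalence"): {"cancers": 157, "MG": 108, "CBE": 77},
    ("CNBSS2", "incidence"): {"cancers": 251, "MG": 161, "CBE": 76},
}


def sensitivity(detected, cancers):
    """Fraction of all cancers in the period that the modality detected."""
    return detected / cancers


for (trial, period), row in counts.items():
    mg = sensitivity(row["MG"], row["cancers"])
    cbe = sensitivity(row["CBE"], row["cancers"])
    print(f"{trial} {period}: mammography {mg:.1%}, CBE {cbe:.1%}")
```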
Table 2. Cancer Detection Rates (CDRs) (Includes In Situ Cancers) for Mammography and Clinical Breast Examination in Screening Arm (MP) vs Usual Care (UC) in CNBSS1 (8) Compared With Population-based Screening Mammography Programs
CDR in Screening Programs^a (/1000) | MP: Screen Year | MP: Exams/Patients (N) | MP: CDR (/1000) | MP: CDR for MG Alone (/1000) | MP: CDR (MG Alone or Both MG and CBE) (/1000) | MP: Cancers Detected by MG Alone (N) | MP: Cancers Detected by CBE Alone (N) | MP: Cancers Detected by Both MG and CBE (N) | MP: Interval + Incident Cancers^b (N) | MP: ICR Including Interval and Incident Cancers (/1000) | UC: Round of Study | UC: Patients (N) | UC: Cancers Detected With CBE (N) | UC: CDR (/1000) | UC: Interval or Incident Cancers (N) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2.8 | 1 | 25 214 | 3.89 | 1.19 | 2.54 | 30 | 34 | 34 | 19 | 0.75 | 1 | 25 216 | 62 | 2.46 | 28 |
1.6 | 2 | 22 424 | 1.74 | 0.76 | 1.11 | 17 | 14 | 8 | 16 + 2 = 18 | 0.80 | 2 | 25 092 |  |  | 38 |
1.6 | 3 | 22 066 | 1.99 | 0.77 | 1.45 | 17 | 12 | 15 | 8 + 3 = 11 | 0.50 | 3 | 25 033 |  |  | 42 |
1.6 | 4 | 21 839 | 2.38 | 1.37 | 1.88 | 30 | 11 | 11 | 10 + 9 = 19 | 0.87 | 4 | 24 954 |  |  | 46 |
1.6 | 5 | 14 146 | 1.84 | 0.78 | 1.13 | 11 | 10 | 5 | 9 + 15 = 24 |  | 5 | 24 883 |  |  | 39 |
Total |  |  |  |  |  | 105 | 81 | 73 | 91 |  |  |  | 62 |  | 193 |
Percentage of total cancers |  |  |  |  |  | 30% (105/350) | 23.1% (81/350) | 20.9% (73/350) | 26% (91/350) |  |  |  | 24.3% (62/255) |  | 75.7% |
ICR of screening programs^a |  |  |  |  |  |  |  |  |  | 0.65 |  |  |  |  |  |
Adapted from Table 5 in (8).
Abbreviations: CBE, clinical breast examination; CNBSS, Canadian National Breast Screening Study; ICR, interval cancer rate; MG, mammography.
^a Data from service screening programs (48,49). Average reported rates for CDR in Canada 2009–2010 for women 40–49 years for invasive cancers (2/1000 and 1.1/1000) and in situ cancer detection rates (0.8/1000 and 0.5/1000) in prevalent and incident years, combined.
^b In this column, the number of interval cancers and incident cancers are combined. In the study, interval cancer was defined as a cancer diagnosed less than 12 months after a normal screen, and incident cancer was defined as a cancer diagnosed after 12 months of a normal screen (48). Interval cancers were reported for rounds 1–5 for the screening arm and round 1 for the UC arm, and incident cancers were reported for rounds 2–5 for both screening and UC arms. The study interval and incident cancer rates include in situ cancers and might be slightly higher than service screening programs, which are based on only invasive cancer rates.
Table 3. Cancer Detection Rates (CDRs) (Includes In Situ Cancers) for Mammography and Clinical Breast Examination in Screening Arm (MP) vs Physical Examination Only (PO) in CNBSS2 (9) Compared With Population-based Screening Mammography Programs
CDR of MG screening programs (/1000) (a) | MP arm: screen year | MP arm: exams/patients (N) | MP arm: CDR (/1000) | MP arm: CDR for MG alone (/1000) | MP arm: CDR for MG alone or both MG and CBE (/1000) | MP arm: cancers by MG alone (N) | MP arm: cancers by MG and CBE (N) | MP arm: cancers by CBE alone (N) | MP arm: interval + incident cancers (b) (N) | MP arm: ICR (/1000) | PO arm: round of study | PO arm: exams/patients (N) | PO arm: cancers (N) | PO arm: CDR (/1000) | PO arm: interval + incident cancers (N)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
4.9 | 1 | 19 711 | 7.2 | 3.3 | 5.48 | 65 | 43 | 34 | 15 | 0.76 | 1 | 19 694 | 69 | 3.50 | 16
3.4 | 2 | 17 699 | 3.73 | 1.81 | 2.88 | 32 | 19 | 15 | 10 + 6 = 16 | 0.90 | 2 | 17 453 | 34 | 1.95 | 16 + 4 = 20
3.4 | 3 | 17 347 | 2.48 | 1.38 | 1.96 | 24 | 10 | 9 | 8 + 4 = 12 | 0.69 | 3 | 17 143 | 22 | 1.28 | 26 + 9 = 35
3.4 | 4 | 17 193 | 3.14 | 2.27 | 2.91 | 39 | 11 | 4 | 9 + 3 = 12 | 0.70 | 4 | 16 918 | 15 | 0.89 | 16 + 7 = 23
3.4 | 5 | 9876 | 2.84 | 2.03 | 2.64 | 20 | 6 | 2 | 5 + 15 = 20 | 2.03 | 5 | 9755 | 16 | 1.64 | 16 + 24 = 40
Total cancers |  |  |  |  |  | 180 | 89 | 64 | 75 |  |  |  | 156 |  | 134
Percentage of total cancers |  |  |  |  |  | 44.1% (180/408) | 21.8% (89/408) | 15.7% (64/408) | 18.4% (75/408) |  |  |  | 53.8% (156/290) |  | 46.2% (134/290)
ICR of screening programs (c) |  |  |  |  |  |  |  |  |  | 0.74 |  |  |  |  | 
Adapted from Table 5 in (9).
Abbreviations: CBE, clinical breast examination; CNBSS, Canadian National Breast Screening Study; ICR, interval cancer rate; MG, mammography.
(a) Data from service screening programs (48,49). Average reported rates for CDR in Canada 2009–2010 for women 50–59 years for invasive cancers (3.8/1000 and 2.7/1000) and in situ cancer detection rates (1.1/1000 and 0.7/1000) in prevalent and incident years, combined.
(b) In this column, the number of interval cancers and incident cancers are combined. In the study, interval cancer was defined as a cancer (in situ and invasive) diagnosed less than 12 months after a normal screen, and incident cancer was defined as a cancer diagnosed after 12 months of a normal screen. Interval cancers were reported for rounds 1–5 for the MP and PO arms, and incident cancers were reported for rounds 2–5 for both MP and PO arms. For the incident year after the 5th screening round, there were 19 159 women in the MP arm and 19 273 women in the PO arm. The study interval and incident cancer rates include in situ cancers and might be slightly higher than those of service screening programs, which are based on only invasive cancer rates.
(c) Defined as post-screen invasive carcinoma within 12 months of a normal screening mammogram from modern screening programs (48).
Training and Experience of Radiologists in CNBSS
The goal of cancer screening is to achieve high sensitivity for detecting cancers with the highest specificity compatible with that sensitivity. As mentioned, many of the radiologists in CNBSS had very limited training and experience in mammography. Several radiologists reported that they learned mammography interpretation only during the study, having had less than 1–2 hours of training and, in some cases, none before starting work in the study (13,35,44). This deficiency in training was highlighted by Warren Burhenne in 1993, who noted that the mammographic false negative rate of 2.5 per 1000 for CNBSS was far higher than the rate of 0.92 per 1000 for the established British Columbia screening mammography program or the BCCDP study false negative rate of 1.93 per 1000 (13,21), both of which employed trained radiologists. At certain centers a local policy was established that any positive mammography finding would be reviewed by a second (senior) radiologist (33). In normal practice this would occur before recommendation of a biopsy. In the centers where this second opinion was required, the second radiologist would often override the first radiologist’s opinion and negate the abnormal recall, thus preventing cases from being referred to the review clinic. Second reading is widely used for mammography screening, particularly in Europe. It is essential, however, that both readers are highly skilled, so that one radiologist does not erroneously reject early signs of cancer found by the other. Due to the poor training of many of the CNBSS radiologists at the time, many abnormal mammograms were not acted on (35).
7. Did the Trial Appropriately Address the Research Question(s): Does Screening for Asymptomatic Women 40–49 Years Old (CNBSS1) and the Addition of Mammography to CBE for Asymptomatic Women in Their 50s (CNBSS2) Contribute to Reduced Mortality From Breast Cancer?
Symptomatic women were included in the study. The CNBSS trials were intended to evaluate the efficacy of screening women for breast cancer. However, women were not required to be asymptomatic, and there is now evidence that at multiple centers women with symptoms were actively recruited into the study (35). The CNBSS were, therefore, not trials of screening, but of some hybrid of screening and diagnostic mammography where the proportions of the two types of breast imaging are not clearly defined. Any deaths in these women in either arm of the trial would be less informative in measuring the benefit of a screening intervention for preclinical cancer.
8. Do the Conditions Under Which the CNBSS Trials Were Conducted and the Performance of the Tests Used in the Intervention and the Control Reasonably Reflect Modern Screening Practice, Work-up, and Therapy?
The CNBSS trials were conducted such that their performance does not reflect modern screening practice and work-up, as shown by the low mammography cancer detection rates (CDR) and high interval cancer rates (ICR). Although most modern screening programs no longer include CBE in screening practices, in Canada in 2009, one of the three provinces that included women 40–49 years of age also performed CBE as part of the screening programs (50). For women 50–59 years, 3 of 12 Canadian jurisdictions included CBE in their screening programs, one of which (Ontario) performed 35% of all screening mammograms (50). The CDRs from CNBSS1 and CNBSS2 by method of detection (mammography, CBE, or both), as well as the ICRs, are compared with the corresponding rates in service screening programs mainly using film-screen mammography in Table 2 for CNBSS1 and Table 3 for CNBSS2. For CNBSS1, where imaging took place in the early 1980s, the overall CDRs for the screening arm (MP), 3.89/1000 in the prevalence year and 1.74–2.38/1000 in the subsequent years, are all higher than expected rates for established screening programs in Canada in 2009–2010 (this was the last year for which data for screening women in their 40s were reported) (48,50). For women ages 40–49, the CDR for Canadian mammography screening programs in 2009–2010 was 2.8/1000 in the first year and 1.6/1000 for subsequent screening years (48). The high CDR in the MP arm of CNBSS1 may reflect inclusion of symptomatic women, leading to higher rates of cancer.
By comparison, the CDR of mammography (MG) in the screening arm in CNBSS1, including both cancers detected by MG only and cancers detected on both MG and CBE, was 2.54/1000 in the initial year, and 1.11, 1.45, 1.88, and 1.13/1000 for subsequent years (Table 2). Except in year 4, these are much lower than the corresponding CDRs for established screening programs for the same age group (given above), as illustrated in Figure 1. This suggests that the sensitivity of the mammography in CNBSS1 was considerably lower than in modern screening.
Figure 1. Performance in the CNBSS1 screening arm versus service screening programs (48) for cancer detection rate (CDR) and interval cancer rate (ICR) with mammography (MG) alone or with clinical breast examination (CBE). Abbreviation: CNBSS, Canadian National Breast Screening Study.
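As a rough arithmetic check on these mammography detection rates, the following Python sketch recomputes the per-round CDR for cancers found by MG alone or by both MG and CBE from the CNBSS1 screening-arm counts in Table 2, and compares each with the service-screening benchmark quoted above. The variable and function names are illustrative and do not come from the study; small last-digit differences from rates quoted in the prose can arise from rounding.

```python
# Counts transcribed from Table 2 (CNBSS1, MP arm), rounds 1-5.
# All names here are illustrative, not taken from the study.
EXAMS = [25214, 22424, 22066, 21839, 14146]
MG_ALONE = [30, 17, 17, 30, 11]     # cancers detected by MG alone
MG_AND_CBE = [34, 8, 15, 11, 5]     # cancers detected by both MG and CBE

# Service-screening benchmark CDRs (/1000) for women 40-49 years, as quoted in the text:
# 2.8 for the first screen, 1.6 for subsequent screens.
PROGRAM_CDR = [2.8, 1.6, 1.6, 1.6, 1.6]


def rate_per_1000(cases: int, exams: int) -> float:
    """Return the number of cases per 1000 exams."""
    return 1000 * cases / exams


# CDR attributable to mammography (MG alone, or MG together with CBE) for each round.
mg_cdr = [
    rate_per_1000(alone + both, exams)
    for alone, both, exams in zip(MG_ALONE, MG_AND_CBE, EXAMS)
]

# Rounds (1-based) in which the trial's MG CDR fell below the program benchmark.
rounds_below_benchmark = [
    i + 1 for i, (trial, program) in enumerate(zip(mg_cdr, PROGRAM_CDR)) if trial < program
]

for round_no, (trial, program) in enumerate(zip(mg_cdr, PROGRAM_CDR), start=1):
    print(f"round {round_no}: CNBSS1 MG CDR {trial:.2f}/1000 vs program {program:.1f}/1000")
print("rounds below benchmark:", rounds_below_benchmark)
```

On these counts, only round 4 reaches the benchmark, which matches the exception noted in the text.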
Similar results were seen in CNBSS2, where for women aged 50–59, the total CDR for the MP arm was 7.2/1000 in the first year of screening and 3.73/1000 in the second year, both of which were higher than reported CDRs for more modern screening programs in 2009–2010, which were 4.9/1000 in the first year and 3.4/1000 for subsequent years for women aged 50–59 (48). The CDR of mammography alone or in combination with CBE in CNBSS2 in the incidence years was 2.88, 1.96, 2.91, and 2.63/1000, all of which are lower than the 3.4/1000 reported for incidence screening in Canadian screening programs for the same age group, as shown in Figure 2 (42). In both studies, the CDR of MG is lower than that of the established screening programs even when cancers also detected by CBE are included, which indicates the poor quality of the mammography and subsequent diagnosis.
Figure 2. Performance in the CNBSS2 screening arm versus service screening programs (48) for cancer detection rate (CDR) and interval cancer rate (ICR) with mammography (MG) alone or with clinical breast examination (CBE). Abbreviation: CNBSS, Canadian National Breast Screening Study.
This is consistent with the lower sensitivity of mammography over all 5 years (50.9% in CNBSS1 and 65.9% in CNBSS2) compared with the 84% sensitivity of established mammography screening programs (48). By contrast, CBE sensitivity was high: 44% in CNBSS1 and 37.5% in CNBSS2. Advances in mammography technology, such as improved x-ray tubes, automatic exposure control, digital mammography, and digital breast tomosynthesis, have occurred since these trials were performed. The disparity between MG performance observed in the CNBSS and that of current screening programs is, therefore, likely to be even larger.
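The sensitivities quoted above appear to be recoverable from the totals rows of Tables 2 and 3 if sensitivity is taken as the share of all cancers in the screening arm (screen-detected plus interval and incident cancers) that a given modality detected. That definition is an assumption made here for illustration, not one stated in the text, and the names in this Python sketch are hypothetical.

```python
# Total cancers in the MP (screening) arm by detection mode, from the totals rows
# of Table 2 (CNBSS1) and Table 3 (CNBSS2). Names are illustrative.
CNBSS1 = {"mg_alone": 105, "cbe_alone": 81, "both": 73, "interval_incident": 91}
CNBSS2 = {"mg_alone": 180, "cbe_alone": 64, "both": 89, "interval_incident": 75}


def share_pct(detected: int, counts: dict) -> float:
    """Percentage of all cancers in the arm that were detected by the modality.

    Assumption: the denominator is every cancer in the arm, including interval
    and incident cancers that no screening test found.
    """
    return 100 * detected / sum(counts.values())


def mg_sensitivity(counts: dict) -> float:
    """Cancers seen on mammography, alone or together with CBE."""
    return share_pct(counts["mg_alone"] + counts["both"], counts)


def cbe_sensitivity(counts: dict) -> float:
    """Cancers found on CBE, alone or together with mammography."""
    return share_pct(counts["cbe_alone"] + counts["both"], counts)


for name, counts in (("CNBSS1", CNBSS1), ("CNBSS2", CNBSS2)):
    print(f"{name}: MG {mg_sensitivity(counts):.1f}%, CBE {cbe_sensitivity(counts):.1f}%")
```

Under this assumption the sketch gives 50.9% and 65.9% for mammography and 44.0% and 37.5% for CBE, in line with the figures in the text.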
Another parameter for evaluating screening mammography performance is the ICR. Interval cancers are those diagnosed between scheduled screens. In Canada, ICRs of <0.6/1000 within 1 year of a normal screen or 1.2/1000 within 2 years are generally achieved in provincial and territorial screening programs, consistent with internationally accepted targets of <1/1000 per year (49). In Canadian screening programs (2009–2010), the one-year ICRs were 0.65/1000 for 40–49 years and 0.68/1000 for 50–69 years (48). In CNBSS1, 91 interval cancers occurred among women in the MP arm, with an average one-year ICR of 0.89/1000 for incidence screens (Table 2). In CNBSS2, 75 interval cancers occurred in the MP arm in 19 711 women, with an average one-year ICR of 0.97/1000 for incidence screens (Table 3). The high ICRs in both CNBSS studies indicate much lower performance of screening mammography in the CNBSS than in current programs.
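The averaging method behind the 0.89/1000 and 0.97/1000 figures is not spelled out. One reading that reproduces both values is to pool the combined interval and incident cancers over incidence rounds 2–5 and divide by the pooled number of exams in those rounds, rather than averaging the per-round rates. The Python sketch below works through that reading; it is an assumption, the counts are transcribed from Tables 2 and 3, and the names are illustrative.

```python
# (exams, interval + incident cancers) for incidence rounds 2-5 of the MP arm,
# transcribed from Table 2 (CNBSS1) and Table 3 (CNBSS2). Names are illustrative.
CNBSS1_INCIDENCE = [(22424, 18), (22066, 11), (21839, 19), (14146, 24)]
CNBSS2_INCIDENCE = [(17699, 16), (17347, 12), (17193, 12), (9876, 20)]


def pooled_rate_per_1000(rounds):
    """Pooled rate: total cancers across rounds divided by total exams, per 1000."""
    total_exams = sum(exams for exams, _ in rounds)
    total_cancers = sum(cancers for _, cancers in rounds)
    return 1000 * total_cancers / total_exams


def mean_of_round_rates_per_1000(rounds):
    """Unweighted mean of the per-round rates, shown only for comparison."""
    return sum(1000 * cancers / exams for exams, cancers in rounds) / len(rounds)


for name, rounds in (("CNBSS1", CNBSS1_INCIDENCE), ("CNBSS2", CNBSS2_INCIDENCE)):
    print(
        f"{name}: pooled {pooled_rate_per_1000(rounds):.2f}/1000, "
        f"mean of round rates {mean_of_round_rates_per_1000(rounds):.2f}/1000"
    )
```

The pooled calculation weights each round by its number of exams, so the smaller fifth round, which has the highest rate, counts for less than it would in an unweighted mean.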
Inappropriate Pooling
The CNBSS studies were designed differently but were inappropriately pooled. Although not part of the conduct of the CNBSS itself, there were also anomalies in reporting the studies. For example, in 2014, the CNBSS investigators published 25-year follow-up results but inexplicably pooled data from the two studies (23). Although this gave the impression of a much larger study, it was inappropriate to combine the data because the two studies applied to different populations, tested different hypotheses, and compared different intervention and control conditions.
Discussion
The CNBSS trials have fueled decades of debate and influenced national policies in Canada, the United States, and beyond that continue to lead to preventable deaths of women from breast cancer. Of the eight RCTs that tested the efficacy of screening in reducing mortality, six found that breast cancer mortality was reduced. The relative risk of breast cancer mortality for women invited to screening mammography ranged from 0.68 (Swedish Two-County trial) to 1.02 (CNBSS2), for an overall relative risk reduction of 20% (51,52). Among the eight RCTs that included women aged 40–49 years, after 10.5–18 years of follow-up the relative risk of breast cancer mortality among women invited to screening ranged from 0.56 (Gothenburg study) to 0.97 (CNBSS1), and a meta-analysis showed a combined statistically significant 29% reduction in breast cancer mortality after 12.7 years of follow-up (53). The CNBSS studies were the only two trials to demonstrate no benefit from screening mammography for women 40–59 years of age. The investigators concluded that the negative results of CNBSS were “an example of the effect of good therapy, the 13-year survival for the breast cancers diagnosed in the physical examination alone screening arm in CNBSS2 was 83%, identical to those in the CNBSS2 mammography arm and for comparably aged women in the ASP (screened) arm in the Swedish Two-County trial—all superior to the 75% survival in the women with breast cancer in the PSP (control) arm in the Two-County Trial” (54). In CNBSS2, the 7-year survival rates were high and similar in the mammography and non-mammography arms, 90.2% and 89.9%, respectively (9). At the time, however, for the general public in Canada and the United States, 5-year survival for all breast cancers was only 75%–80% (38). The CNBSS patients had a much higher survival from breast cancer than most women at the time, equivalent to current 5-year survival rates in Canada of 90% (55).
It is unlikely that the CNBSS patients were representative of the general population in 1989 because of their unusually high breast cancer survival rates.
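The trial-level values quoted in the Discussion (0.68, 1.02, 0.56, 0.97) are relative risks, whereas the pooled results are stated as percentage reductions. The two are linked by the standard relation: relative risk reduction = 1 − relative risk. The Python sketch below applies that relation to the quoted values; the function name is illustrative.

```python
def percent_mortality_reduction(relative_risk: float) -> float:
    """Convert a relative risk into a percentage reduction (negative means an increase)."""
    return round(100 * (1 - relative_risk), 1)


# Relative risks of breast cancer mortality quoted in the text.
QUOTED_RELATIVE_RISKS = {
    "Swedish Two-County (all ages)": 0.68,
    "CNBSS2 (all ages)": 1.02,
    "Gothenburg (40-49 years)": 0.56,
    "CNBSS1 (40-49 years)": 0.97,
}

for trial, rr in QUOTED_RELATIVE_RISKS.items():
    print(f"{trial}: RR {rr:.2f} -> {percent_mortality_reduction(rr):+.1f}% reduction")
```

Read this way, a relative risk of 0.68 corresponds to a 32% reduction in breast cancer mortality, while 1.02 corresponds to a 2% increase.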
For CNBSS1, the proportion of deaths resulting from cancers detected at the first screen was markedly higher in the screening arm (11/98, 11%) than in the usual care arm (5/62, 8.1%) (8). However, in the screening arm, the proportion of deaths from cancers that were found by mammography alone (5/105, 4.8%) was markedly lower than that from cancers found with physical examination, with or without mammography (17/154, 11%) (8).
The CNBSS fell severely short of meeting most of the major criteria for design and conduct of an RCT. This almost certainly explains the failure of the CNBSS to find mortality reductions from screening women in their 40s with mammography and from adding mammography to CBE for screening women in their 50s. The good that came from CNBSS and from the criticism that it received was the marked improvement in the quality of mammography in Canada. In the 1990s, fewer than 50% of Canadian mammography units met the Canadian Association of Radiologists Mammography Accreditation Program (CARMAP) standards for mammography; as of 2021, the rate is above 90% (56). It has been noted that the establishment of CARMAP was a direct result of the CNBSS trials.
Conclusion
Although the intent of the CNBSS was laudable and testing the efficacy of screening mammography in the 40–49-year-old age group was desperately needed, these RCTs did not achieve their intended goals. The CNBSS failed to meet almost every fundamental requirement of high-quality RCTs, either in the design or in day-to-day operations. The most significant errors were that the trials (1) were inadequately powered to detect significant differences in breast cancer mortality; (2) had very poor quality mammography with low sensitivity, low CDRs, and high ICRs; (3) included many women with symptoms of breast cancer; and (4) were designed such that women were allocated to either arm only after CBE was performed, allowing violation of the random allocation process, as evidenced by a significantly higher number of advanced breast cancers in the screening arm of CNBSS1. Given the numerous and pervasive problems with the CNBSS, they are not scientifically valid trials of the efficacy of screening mammography. Health policy makers should understand the shortcomings and dismiss the results of the CNBSS when making decisions regarding access, onset, and frequency of public screening programs. This will reduce the risk of preventable deaths of women from breast cancer as well as morbidity associated with delayed detection.
Funding
None declared.
Conflict of Interest Statement
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Research collaboration between Dr Yaffe’s institution, Sunnybrook Research Institute, and GE Healthcare on breast tomosynthesis and contrast-enhanced mammography. Dr Yaffe has received consulting fees from BHR (pharma) and IGAN for unrelated activities. Dr Yaffe holds shares in Volpara Health Technologies and is the co-principal of Mammographic Physics, Inc., a company that provides consulting on image quality and radiation safety issues in breast cancer imaging. Drs Seely, Eby, and Yaffe serve on the Editorial Board of the Journal of Breast Imaging and as such were not involved in the review and decision process of this article.
References
International Agency for Research on
Canadian Partnership Against Cancer. Breast Cancer Screening in Canada: Monitoring and Evaluation of Quality Indicators—Results Report, January 2009–December 2010. Toronto, Canada: Canadian Partnership Against Cancer; 2015. Available at: https://s22457.pcdn.co/wp-content/uploads/2019/01/Breast-Cancer-Screen-Monitor-Perform-2010-EN.pdf. Accessed January 5, 2022.
Canadian Partnership Against Cancer. Breast Cancer Screening in Canada: Monitoring and Evaluation of Quality Indicators - Results Report, January 2011 to December 2012. Toronto, Canada: Canadian Partnership Against Cancer; 2017. Available at: https://s22457.pcdn.co/wp-content/uploads/2019/01/Breast-Cancer-Screen-Quality-Indicators-Report-2012-EN.pdf. Accessed January 25, 2022.