Talía Malagón, Time to change the paradigm for primary endpoints in cancer screening trials?, JNCI: Journal of the National Cancer Institute, Volume 116, Issue 8, August 2024, Pages 1187–1189, https://doi.org/10.1093/jnci/djae088
The article by Katki et al. (1) is an important contribution to a fierce debate on how to measure the benefits of screening with multicancer detection (MCD) tests (2,3). Current standard-of-care cancer screening tests only detect cancer or its precursors at a single anatomic site. Substantial research efforts are aiming to assess whether we could leverage several biomarkers such as circulating tumor cells and proteins in the blood to create tests that could be used for screening and early detection of cancer across multiple anatomic sites, including many cancers for which no current screening test exists (4-7). Although these tests have the potential to be game-changing technologies, the question of whether they are effective as screening tests remains to be answered. While many tests are currently under development or being improved, the evidence has reached the point where a large MCD screening randomized controlled trial is underway in the UK in partnership with the National Health Service (NHS), the NHS-Galleri trial (8), and the National Cancer Institute has launched a Cancer Screening Research Network to also start conducting rigorous, multicenter cancer screening trials in the United States (9). The time for randomized controlled trials has arrived, and with it the thorny methodological questions that must be answered in order to ensure their success and prevent an expensive flop (2). Issues of power and sample size, and of which primary outcome to use to measure efficacy, are far from academic in the context of cancer screening studies, where typically hundreds of thousands of participants need to be followed for over a decade to show screening efficacy (10,11).
Much ink has been spilled over the primary outcome that should be used in screening randomized controlled trials of multicancer early detection tests (2,12-15). Typically, the primary endpoint should be the one that demonstrates the clinical benefit of an intervention. I share the opinion of many others that cancer mortality should be the primary endpoint used in cancer screening trials, rather than a stage shift or other intermediate endpoint (2,13). Earlier detection of cancer is not a benefit per se; the benefit of screening comes from the potentially averted morbidity and mortality through earlier treatment. As many have pointed out, using an endpoint related to early detection may lead to erroneous conclusions regarding the benefit of screening with MCD tests, as stage-specific cancer incidence may not reliably predict mortality reductions (12), could be affected by overdiagnosis (16), or may simply not translate into a clinical benefit if screen-detected early-stage cancers are not those that respond to treatment (17). For these reasons, mortality is generally considered to be the gold standard primary outcome in cancer screening trials. Despite this, the NHS-Galleri trial has opted to use incidence of late-stage cancers as their primary outcome, relegating cancer mortality to a secondary outcome (8). This decision has been critiqued (12,13,15), but it stems from a desire to reach conclusions on the efficacy of these tests sooner using an intermediate endpoint requiring a lower sample size than mortality.
Previously, trials have been designed to show benefits for screening tests with cancer mortality as the primary endpoint, including colorectal and lung cancer screening (10,11). Clearly, it is not impossible to design a successful screening trial with cancer mortality as the primary endpoint for screening with MCD tests. Why, then, are some balking at the use of mortality as the primary endpoint in MCD trials? There is perhaps the sense that the accelerating progress in health technologies is starting to outstrip our ability to assess their clinical value on human timescales, and an increased appetite for generating evidence quickly (3,18). A machine learning algorithm can be trained on biomarker data to improve its predictions much faster than we can assess the clinical benefit of that prediction in a randomized controlled trial, especially when it could take many years for that prediction to have a health impact. There is a not unreasonable possibility that a multiyear trial would risk becoming outdated by the time its results became available, with the version of the MCD test under trial becoming obsolete. There is also the fear that the errors of the past will be repeated. When prostate-specific antigen (PSA) tests were first introduced in the late 1980s, they had not been approved for cancer screening and had not undergone randomized controlled trials to test their screening efficacy. However, this did not prevent many from using them as screening tests, leading to substantial overdiagnosis of prostate cancer (19). Although randomized controlled trials were eventually set up to measure the efficacy of PSA tests for screening, the metaphorical horse had already bolted from the barn, with trial validity being threatened due to high control group contamination (20).
Retrospective analyses concluded that the harms from overdiagnosis and overtreatment through screening with PSA tests significantly outweighed any potential benefits of earlier detection of prostate cancer (21). As two MCD tests are already commercially available in the United States despite not having regulatory approval as screening tests (22), there is a very real possibility that history could repeat itself. How can we reconcile the issue of speed of progress with the undisputed necessity to perform randomized controlled trials to evaluate the benefits and harms of screening with MCD tests?
This is why there is value in finding more efficient ways to design MCD trials in particular, and it is the motivating factor for the study by Katki et al. (1). In their paper, they outline the power and sample size requirements for a nested analysis looking at the effect size in screen-test positives, and they show that this analysis (which they call the intended effect analysis) can lead to important reductions in required sample size compared with the more standard intention-to-treat analysis. Although the idea of measuring the benefits of MCD screening in only test-positives had already been proposed (14), it is necessary to have studies like this to formalize the power and sample size calculations for this analysis to be used in randomized controlled trials. The authors are understandably cautious in their conclusions, indicating that they believe it may yet be premature to commit to using an intended effect analysis as the analysis for the primary endpoint in MCD trials given the lack of experience with this design in cancer screening trials. However, it certainly raises the question of what role the intended effect analysis could play in these trials.
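To give a rough sense of why restricting the comparison to test-positives can reduce the required sample size, the following is a minimal sketch using the textbook normal-approximation formula for comparing two proportions. It is not the calculation presented by Katki et al. (1); every input (the test-positive fraction, the mortality risks, the function name `n_per_arm`) is a hypothetical value chosen only for illustration, and the sketch assumes that screening changes mortality in test-positives only.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_screen, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided comparison of two proportions
    (standard normal approximation; hypothetical helper for illustration)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_screen * (1 - p_screen)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_screen) ** 2)


# Hypothetical inputs, NOT taken from any trial or from Katki et al.
frac_positive = 0.01      # share of participants who would test positive
risk_pos_control = 0.40   # cancer mortality risk in test-positives, control arm
risk_pos_screen = 0.30    # cancer mortality risk in test-positives, screening arm
risk_negative = 0.005     # cancer mortality risk in test-negatives, both arms

# Intention-to-treat: the effect in test-positives is diluted by the many
# test-negatives, who contribute deaths but (by assumption) no effect.
itt_control = frac_positive * risk_pos_control + (1 - frac_positive) * risk_negative
itt_screen = frac_positive * risk_pos_screen + (1 - frac_positive) * risk_negative
n_itt = n_per_arm(itt_control, itt_screen)

# Analysis restricted to test-positives: first the number of test-positives
# needed per arm, then the enrolment needed to observe that many.
n_positives = n_per_arm(risk_pos_control, risk_pos_screen)
n_enrolled_for_positives = ceil(n_positives / frac_positive)

print(f"ITT analysis, participants per arm:            {n_itt}")
print(f"Test-positive analysis, positives per arm:     {n_positives}")
print(f"Test-positive analysis, participants per arm:  {n_enrolled_for_positives}")
```

Under these made-up inputs the restricted analysis needs fewer enrolled participants per arm than the intention-to-treat analysis, because deaths among test-negatives add variance without adding signal. The size of the gain depends entirely on the assumed inputs, and the sketch ignores the validity conditions discussed by the authors.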
While not mentioned by Katki et al., the NHS-Galleri trial includes in its analysis plan the intention to use a retrospective nested analysis of mortality outcomes (8), which is to say they plan to do an intended effect analysis for their secondary endpoint. Although the trial protocol did not include power calculations for this secondary analysis, it notes the increased statistical power of using this nested analysis for mortality when compared to the standard analysis. Consequently, although the NHS-Galleri trial is underpowered to do a direct comparison of cancer-specific mortality rates between arms, it will possibly still have sufficient statistical power to detect an effect on mortality in test-positives. Even if cancer mortality is not the primary endpoint of the trial, those of us who strongly believe in the value of cancer mortality will be looking at this secondary analysis closely to see if any benefits in prevention of late-stage cancer incidence are supported by prevention of cancer mortality in screen-positives.
Looking toward the design of future MCD test trials, one potential application of the intended effect analysis might be for cancer mortality as a co-primary endpoint. Multiple co-primary endpoints are used in trials when it is necessary to demonstrate intervention effects on all of the endpoints to establish clinical benefit (23). The case can be made that MCD test screening trials are good candidates for requiring co-primary efficacy endpoints, with one primary endpoint being an intermediate endpoint such as stage-specific cancer incidence and the second primary endpoint being cancer mortality. The advantage of this approach would be to potentially combine the benefits of reaching earlier conclusions based on the intermediate endpoint, while still requiring the study to be sufficiently powered to see a benefit against cancer mortality. Although the use of co-primary endpoints in trials increases the required sample size to maintain the same type II error rate (24), it is possible that the efficiency gains from an intended effect analysis for mortality as co-primary endpoint would still be sufficient to lead to a lower sample size than a standard intention-to-treat analysis of mortality as sole primary endpoint. This would also help to mitigate the concerns the authors raise regarding potential threats to the validity of the intended effect analysis; the analysis plan could prespecify that the co-primary endpoint of mortality will be discarded in the worst-case scenario if the conditions for validity do not hold, such as differential dropout rates by arm, evidence of effect in test-negatives, and loss-of-signal in stored specimens used for the analysis (1).
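The sample size cost of requiring success on two co-primary endpoints can be illustrated with a simple sketch. It assumes, purely for illustration, that the two endpoints are statistically independent, so that the probability of detecting both effects is the product of the per-endpoint powers; positively correlated endpoints would typically require less inflation. The function name and the 80% power target are assumptions, not values from reference (24) or from any trial protocol.

```python
from statistics import NormalDist


def sample_size_inflation(overall_power=0.80, n_endpoints=2, alpha=0.05):
    """Approximate factor by which sample size grows when every one of
    `n_endpoints` independent co-primary endpoints must be significant.
    Hypothetical helper; uses the normal approximation, under which
    sample size is proportional to (z_alpha + z_beta) ** 2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    # Each endpoint needs higher power so that the product still reaches
    # the overall target (independence assumption).
    per_endpoint_power = overall_power ** (1 / n_endpoints)
    single = (z_alpha + z(overall_power)) ** 2
    joint = (z_alpha + z(per_endpoint_power)) ** 2
    return per_endpoint_power, joint / single


per_endpoint_power, inflation = sample_size_inflation()
print(f"Power needed on each endpoint: {per_endpoint_power:.3f}")
print(f"Sample size inflation factor:  {inflation:.2f}")
```

Because no multiplicity adjustment of the significance level is needed when all co-primary endpoints must succeed, the inflation here comes only from the higher per-endpoint power. Whether an intended effect analysis would offset this inflation depends on the efficiency gain achievable in a given trial.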
Although these are exciting times in cancer screening research, it is important to emphasize the need for randomized controlled trials of screening with MCD tests, and the need for research into methods such as these to make these trials as efficient and rigorous as possible to enable evidence-based decision-making.
Data availability
No new data were generated or analyzed for this editorial.
Author contributions
Talía Malagón, PhD (Conceptualization; Writing—original draft; Writing—review & editing).
Funding
No funding was used for this editorial.
Conflicts of interest
Talía Malagón has no disclosures.