Ping Hu, Jon A Steingrimsson, Elodia Cole, Jean Cormack, Barbara K Dunn, Constantine Gatsonis, Cecilia Lee, Ni Li, Etta D Pisano, Jie He, Barnett S Kramer, Design considerations and challenges in the CHinA National CancEr Screening (CHANCES) trial and Tomosynthesis Mammographic Imaging Screening Trial (TMIST), JNCI Monographs, Volume 2025, Issue 68, March 2025, Pages 42–48, https://doi.org/10.1093/jncimonographs/lgae049
Abstract
This paper explores the design considerations and hurdles encountered by the CHinA National CancEr Screening (CHANCES) Trial and the Tomosynthesis Mammographic Imaging Screening Trial (TMIST), both aimed at advancing cancer screening research. Before population-based cancer screening programs are launched, it is important to have confidence that the potential benefits of the screening process and resulting interventions outweigh harms, an ethical imperative because the people actively invited into the programs are relatively healthy. Large randomized screening trials provide the strongest, direct evidence regarding the balance of benefits and harms. The implementation of cancer screening programs involves a series of steps, with outcomes influenced by factors such as the prevalence of the disease, availability of effective treatment within the health-care system, and acceptance by the target population—all of which may vary considerably from country to country. This paper examines how these factors shaped the design and statistical approach of the CHANCES Trial for lung and colorectal cancers and the TMIST trial for breast cancer. We discuss the rationale, objectives, endpoint definitions, trial designs, and sample size considerations, highlighting both the challenges and opportunities presented in different settings. Ultimately, the goal is to foster collaboration and develop screening strategies that are scientifically robust and practically effective for diverse populations worldwide.
Introduction
Cancer remains one of the leading causes of death worldwide.1 Effective early detection tests coupled with prompt intervention have been shown to reduce death rates from some cancers.2 Cancer screening aims to identify cancers in asymptomatic individuals, allowing for treatment initiation before the disease progresses to late stages. The effectiveness of these programs is influenced by factors such as the biology of the specific cancer, the accuracy of the screening test, the prevalence of the disease in the screened population, successful program implementation and subsequent treatment, and acceptance by the target population.3 Given the considerable differences in these factors across nations and regions, combined with disparities in population genetics, health-care infrastructure, cultural norms, and economic resources, the complexities of cancer screening are evident. Therefore, it is important to approach cancer screening strategies with a global view, ensuring they are effective, flexible, and applicable to the specific setting.
The modern approach to cancer screening is grounded in findings from pivotal trials that have shaped practices worldwide. Taking lung cancer as an example, the landmark National Lung Screening Trial (NLST) was launched in 2002.2,4 This randomized multicenter study assessed the efficacy of low-dose helical computed tomography (LDCT) versus chest radiography for reduction in lung cancer mortality in current and former heavy smokers aged 55 to 74. It concluded that LDCT led to a 15%-20% relative reduction in lung cancer mortality. The Prostate, Lung, Colorectal, and Ovarian Cancer (PLCO) Trial, initiated in 1992, which investigated potential mortality reduction from screening for these 4 cancers irrespective of smoking history, expanded our understanding of regular screenings for several common cancers, including assessment of lung cancer screening in both men and women with annual posterior-anterior chest radiographs. Finally, the Dutch-Belgian Randomized Lung Cancer Screening Trial, known as the NELSON trial, confirmed a reduction in lung cancer-specific mortality using LDCT in a European, predominantly male population.5
Despite notable advancements, cancer screening is replete with challenges, limitations, and controversy. The risks include false positives and negatives as well as overdiagnosis. Overdiagnosis presents a significant challenge and can result in invasive procedures, side effects, and psychological distress for conditions that are not life-threatening. Additionally, the intuitive appeal of screening and widespread enthusiasm for it have often driven adoption without solid, conclusive evidence from rigorously conducted randomized trials.2 This deviation from evidence-based practice can bring unintended consequences to optimal patient care and has ethical implications because the targeted individuals are often healthy and asymptomatic. Moreover, the medical community often has divergent opinions on screening approaches, highlighting variances in data interpretation and emphasizing the urgent need for evidence-driven guidelines suited to the specific setting. It is also worth noting that not all screening tests are beneficial, but all have associated harms and costs. They may also divert resources from more effective public health and medical interventions, indirectly harming the health of the community. The benefits, harms, and costs vary across countries.
In light of these complexities, this paper describes international perspectives in the design of cancer screening trials and discusses some of these complexities in the context of two large ongoing cancer screening trials. We describe the challenges and opportunities inherent in these trials across varied settings. The overarching goal is to foster collaboration, ensuring that strategies developed are both theoretically sound and practically beneficial in diverse populations worldwide.
The CHinA National CancEr Screening (CHANCES) trial
Launched in 2019 by the China National Cancer Center, the CHANCES trial is a 3-arm community-based multicenter randomized controlled trial (RCT) focused on lung and colorectal cancers, designed to address key questions in China that are also applicable internationally. It is the first statistically powered study in China to assess the effectiveness of different lung cancer screening intervals using low-dose computed tomography (LDCT), while also evaluating the cost-effectiveness and efficiency of various screening strategies for both lung and colorectal cancers.
Study design and unique research questions
The CHANCES trial has 3 arms:
Control arm: Participants choose between 1 colonoscopy or 5 annual fecal immunochemical test (FIT) screenings.
Annual LDCT arm: Participants receive 5 annual LDCT screenings and a one-time colonoscopy.
Biennial LDCT arm: Participants receive LDCT screening every other year (3 rounds total) with 5 annual FIT screenings.
The primary aims of the CHANCES Trial are to determine whether LDCT screening reduces lung cancer-specific mortality in high-risk Chinese urban residents and, if so, to establish whether annual or biennial screening is the optimal time interval for reducing lung cancer-specific mortality in LDCT screening arms. The control arm serves as a comparison group with no LDCT screening. CRC screening tests differ across the 3 arms, with varying combinations of colonoscopy and FIT. Secondary aims are outlined in the Supplementary Contexts of the protocol paper (to be published in BMC Cancer).
Rationale and background
Several population-based RCTs have demonstrated that repeated screening with LDCT in high-risk populations can reduce lung cancer mortality, including the US-based NLST2 and the Dutch–Belgian NELSON trial.5 Yet, nearly all data regarding the efficacy of lung cancer screening are from Western health-care systems, and estimating the efficacy in an Asian population is an important motivation for the CHANCES trial. Additionally, there is a lack of compelling evidence to determine the optimal frequency of LDCT screenings among high-risk groups. In particular, the relative benefits and harms of annual versus biennial screening remain unknown, a critical consideration for individual health-care decisions, as well as resource allocations. This is of particular importance in regions where health-care and economic resources, disease incidence rate, and access to medical services differ from Western countries.
Control arm: choice of colorectal screening
In concert with input from the Institutional Review Board (IRB) and the Institutional Review Committee (IRC) in China, participants in all 3 study arms are offered a screening test. This involved choosing a suitable screening procedure for the control arm. In making this selection, several factors were considered:
The study endpoint should not conflict with LDCT screening.
It should demonstrate effective screening performance and potentially benefit the participants.
The disease risk of control arm participants should match the broader lung screening population to ensure relevance and comparability based on lung cancer screening criteria.
The selected screening should be both readily available and acceptable to Chinese participants.
It should offer cost-effective screening.
Taking all these criteria into account, screening exams for colorectal cancer were chosen for the control arm.
Colonoscopy presents its own set of challenges within China:
Colonoscopy screening coverage is not yet comprehensive across many communities in China, necessitating intensified efforts by public health professionals to promote its broader implementation.
It requires experienced gastroenterologists, who are in limited supply relative to screening needs.
It is more expensive than FIT.
Participant adherence to colonoscopy screenings in China has not yet met expectations. The main reasons may include its frequent performance without anesthesia or sedation, lack of medical insurance coverage, and significant travel distances to screening centers.
Given these considerations, the trial investigators chose a one-time colonoscopy in Arm 1 to be compared with 5 annual FITs in Arm 2 for in-depth assessment and determination of participant acceptance. Participants in Arm 3 are given the choice of annual FIT or one-time colonoscopy.
Sample size considerations
The CHANCES trial uses the Hu and Zelen model6 to estimate the sample size sufficiently powered to detect the targeted reduction in lung cancer mortality. Due to the limited population and the inability to achieve 80% power when directly comparing Arm 1 to Arm 2, power and sample size were estimated separately for Arm 1 vs Arm 3 and Arm 2 vs Arm 3 using a 1-sided level-α test. This resulted in an unbalanced 2:3:3 randomization to achieve 90% power. The input parameters for the Hu-Zelen model are sourced from the China cancer registry,7 obtained from existing literature, or based on expert discussions. Notably, the estimated sample size for the primary endpoint in the CHANCES trial is considerably larger than that in the NLST, for two reasons: the lower incidence rate observed among Chinese urban residents, and the comparison of biennial versus no LDCT screening within the 3-arm design of the CHANCES trial. By contrast, the NLST used a 2-arm design of annual LDCT versus chest x-ray, a screening test that has been shown to have little or no effect on lung cancer mortality. Table 1 provides a comparison between the key design parameters of CHANCES and NLST.
Table 1. Summary of design of the China National Cancer Screening (CHANCES) Trial and the National Lung Screening Trial (NLST).
Trial | Randomization ratio | Age | Accrual | Sex | Smoking eligibility | Incidence rate | Arm 1 (sample size) | Arm 2 (sample size) | Arm 3 (sample size) |
---|---|---|---|---|---|---|---|---|---|
CHANCES | 2:3:3 | 50-74 | 3-year | | | 2/1000 | 26 400 | 39 600 | 39 600 |
NLST | | 55-74 | 2-year | No requirement | Current and former smokers with 30+ pack-years | 6.7/1000 | 25 000 | 25 000 | |
Abbreviations: LDCT = low-dose helical computed tomography; FIT = fecal immunochemical test.
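The arithmetic behind the allocation and the effect of a low event rate on power can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a simple normal approximation for comparing two proportions with a 1-sided test, not the Hu and Zelen model used by CHANCES, and the function names, the control-arm mortality risks, and the 20% relative reduction are hypothetical values chosen for the example.

```python
from math import sqrt
from statistics import NormalDist


def allocate(total, ratio):
    """Split a total sample size across arms according to an integer ratio."""
    parts = sum(ratio)
    return [total * r // parts for r in ratio]


def power_two_proportions(n_a, n_b, p_control, rel_reduction, alpha=0.05):
    """Approximate power of a 1-sided z-test comparing two proportions.

    Simple normal approximation; NOT the Hu-Zelen model used by CHANCES.
    n_a is the size of the screened arm, n_b the size of the comparison arm.
    """
    p_screen = p_control * (1.0 - rel_reduction)
    se = sqrt(p_screen * (1 - p_screen) / n_a + p_control * (1 - p_control) / n_b)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((p_control - p_screen) / se - z_alpha)


# The arm sizes reported in Table 1 follow the 2:3:3 ratio exactly.
arms = allocate(105_600, (2, 3, 3))
print(arms)  # [26400, 39600, 39600]

# Hypothetical inputs: compare a 26 400-person arm with a 39 600-person arm,
# assuming a 20% relative reduction and two different control-arm risks.
low_risk = power_two_proportions(arms[0], arms[2], p_control=0.01, rel_reduction=0.20)
high_risk = power_two_proportions(arms[0], arms[2], p_control=0.03, rel_reduction=0.20)
print(round(low_risk, 2), round(high_risk, 2))
```

With the same arm sizes and the same relative reduction, the lower assumed event risk yields lower power, which is the direction of the effect described above for a population with a lower incidence rate.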
Compliance and contamination
Compliance: One rationale for offering screening exams in the control arm is to achieve high participation and retention rates in the trial, given that each arm provides a form of screening.
Contamination: The contamination rate is presumed to be low, because most participants do not have insurance that covers LDCT screening expenses. Participants in the CHANCES Trial receive screening exams free of charge.
The Tomosynthesis Mammographic Imaging Screening Trial (TMIST)
The Tomosynthesis Mammographic Imaging Screening Trial (TMIST) is a large ongoing randomized controlled trial (RCT) with the primary objective of comparing the rate of “advanced cancer” between women screened for breast cancer with tomosynthesis mammography (TM) versus digital mammography (DM).8 In 2011, the Food and Drug Administration approved the first TM machine for breast cancer screening. To date, evidence from RCTs evaluating the effectiveness of TM has come largely from trials conducted in Europe,9-13 mostly focusing on short-term outcomes (eg, recall rate or rate of interval cancers). However, screening practices, population characteristics, and breast cancer rates in Europe are not representative of the United States or the rest of the world,14-16 raising concerns about interpreting European trial results globally.
In TMIST, each woman will undergo either annual or biennial screening for 5 years (where screening frequency depends on risk factors), and each woman will be followed up for between 3 and 8 years depending on time of study entry. The trial was activated in 2017 and is projected to complete enrollment and follow-up at the end of the year 2027. As of September 16, 2024, a total of 104 854 women (out of a target accrual of 108 508) had been enrolled in the United States, Argentina, Thailand, Peru, Italy, Canada, Chile, and South Korea, with more than 60% of accrual being international, mostly from Argentina. Notably, the trial has successfully recruited members of groups who are historically underrepresented in clinical research; 20.9% of the US participants are Black or African American, and 48.7% of all TMIST participants are Hispanic/Latina. International sites qualified for participation in TMIST by submitting data on their usual breast cancer screening practices and results to ensure that their screening systems were similar to those of US practitioners. Specifically, the TMIST study team checked that international sites' recall rates and cancer detection rates per 1000 were similar to those in US practice.
TMIST has multiple secondary and exploratory objectives including subgroup comparisons, comparisons of biopsy and recall rates, comparisons of health-care utilization, and assessing the effects of social determinants of health. The trial is also creating a large biorepository that, combined with data from TMIST, has the potential to address many of the research gaps listed in the latest US Preventive Services Task Force (USPSTF) recommendations,17 including questions surrounding breast density, individualized screening, estimation of overdiagnosis, and exploring the reasons for higher breast cancer mortality among Black women.
Design considerations
The ultimate goal of cancer screening is to reduce cancer-specific mortality, making cancer-specific mortality the ideal endpoint in cancer screening trials (although other considerations such as intensity of treatment or number and severity of adverse events are also important). But for less aggressive cancers, using cancer-specific mortality may be infeasible because it would require extremely large sample sizes or very long follow-up, necessitating the use of surrogates for cancer-specific mortality. In order for the surrogate to be a “valid surrogate” for cancer-specific mortality, the effect of screening on the surrogate must accurately predict the effect of the screening on cancer-specific mortality.18,19 Having the surrogate outcome simply correlate with cancer-specific mortality is not sufficient for the surrogate to be “valid.”20 A recent systematic review found at least a moderate correlation between reduction in later-stage cancers and reduction in breast cancer-specific mortality.21
Breast cancer screening is an example in which conducting a screening trial comparing two screening modalities or methods in an asymptomatic population using breast cancer-specific mortality as the primary endpoint is not feasible. After much deliberation involving interdisciplinary stakeholders including patient advocates, the endpoint in TMIST was selected as the novel endpoint of advanced cancer, defined as cancers meeting any of the following criteria:
There are distant metastases,
There is at least one lymph node macrometastasis (i.e., greater than 2 mm in size and excluding micrometastases and isolated tumor cells),
The cancer is invasive, is greater than 10 mm in size, and is either (i) ER negative or low positive (≤10%), PR negative or low positive (≤10%), and HER2 negative, or (ii) HER2 positive, or
The cancer is invasive and is greater than or equal to 20 mm in size, unless of pure mucinous, papillary (does not include micropapillary), tubular, adenoid cystic, or invasive cribriform histology; these histologic types are excluded.
Cancers meeting the definition of advanced cancers roughly correspond to cancers requiring chemotherapy, and reduction in the rate of advanced cancers is believed to correlate with reduction in breast cancer-specific mortality.22
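The four criteria above can be read as a decision rule. The following is a minimal sketch of that rule, assuming a hypothetical `Tumor` record; the field names and the way histology is encoded as a string are illustrative choices made for this example and are not TMIST data elements.

```python
from dataclasses import dataclass

# Histologic types excluded from the size-only (>= 20 mm) criterion.
EXCLUDED_HISTOLOGIES = {
    "pure mucinous", "papillary", "tubular", "adenoid cystic", "invasive cribriform",
}


@dataclass
class Tumor:
    """Hypothetical record; field names are illustrative only."""
    distant_metastases: bool
    largest_node_metastasis_mm: float  # 0.0 if there is no nodal metastasis
    invasive: bool
    size_mm: float
    er_percent: float                  # percent of cells staining ER positive
    pr_percent: float                  # percent of cells staining PR positive
    her2_positive: bool
    histology: str


def is_advanced(t: Tumor) -> bool:
    """Return True if the tumor meets any criterion of the advanced cancer definition."""
    # Criterion 1: distant metastases.
    if t.distant_metastases:
        return True
    # Criterion 2: at least one lymph node macrometastasis (greater than 2 mm).
    if t.largest_node_metastasis_mm > 2.0:
        return True
    # Criterion 3: invasive, > 10 mm, and either ER/PR negative or low with HER2 negative,
    # or HER2 positive.
    if t.invasive and t.size_mm > 10.0:
        hormone_low_her2_neg = (
            t.er_percent <= 10.0 and t.pr_percent <= 10.0 and not t.her2_positive
        )
        if hormone_low_her2_neg or t.her2_positive:
            return True
    # Criterion 4: invasive and >= 20 mm, unless of an excluded histology.
    if t.invasive and t.size_mm >= 20.0 and t.histology not in EXCLUDED_HISTOLOGIES:
        return True
    return False


small_er_pos = Tumor(False, 0.0, True, 8.0, 90.0, 80.0, False, "ductal")
her2_12mm = Tumor(False, 0.0, True, 12.0, 90.0, 80.0, True, "ductal")
tubular_25mm = Tumor(False, 0.0, True, 25.0, 90.0, 80.0, False, "tubular")
ductal_25mm = Tumor(False, 0.0, True, 25.0, 90.0, 80.0, False, "ductal")
print([is_advanced(t) for t in (small_er_pos, her2_12mm, tubular_25mm, ductal_25mm)])
# [False, True, False, True]
```

The tubular example shows the role of the histology exclusion: a 25 mm invasive tumor that is hormone receptor positive and HER2 negative is counted as advanced only if its histology is not one of the excluded types.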
An alternative, but weaker, objective for cancer screening studies focuses on estimating diagnostic accuracy, that is, the ability of a screening test to detect cancer when it is present and identify absence of cancer for cancer-free participants (this was the primary objective of DMIST,23 the precursor to TMIST, which compared DM and film mammography for breast cancer screening). The diagnostic accuracy for cancer screening is often reported using sensitivity, specificity, the receiver operating characteristic (ROC) curve that plots sensitivity vs specificity for varying thresholds, the area under the ROC curve, and negative and positive predictive value, all of which compare the imaging result to a reference status.24 The advantage of diagnostic accuracy studies, compared with long-term surrogates of cancer-specific mortality, is that they require shorter follow-up and smaller sample sizes. But diagnostic accuracy is not a validated surrogate endpoint, because finding more cancers is not guaranteed to improve cancer-specific mortality and it does not provide information on what proportion of the additionally diagnosed cancers are aggressive disease vs cancers that represent overdiagnosis.
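As a small illustration of the accuracy measures named above, the sketch below computes sensitivity, specificity, and the predictive values from a 2x2 table of imaging result against reference status. The counts are hypothetical and were chosen only to show how a test with good sensitivity and specificity can still have a low positive predictive value when the disease is rare in the screened population.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of test result vs reference status."""
    return {
        "sensitivity": tp / (tp + fn),  # cancers correctly called positive
        "specificity": tn / (tn + fp),  # cancer-free participants correctly called negative
        "ppv": tp / (tp + fp),          # positive screens that are truly cancer
        "npv": tn / (tn + fn),          # negative screens that are truly cancer-free
    }


# Hypothetical counts: 10 000 screened participants, 50 with cancer by the reference standard.
measures = diagnostic_accuracy(tp=40, fp=900, fn=10, tn=9050)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

None of these quantities says anything about whether the additional cancers found are aggressive or overdiagnosed, which is the limitation the paragraph above describes.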
Sample size considerations
In screening trials, it is common to follow participants for some time after the last protocol-mandated screen for outcomes, and a dilution of the effect of screening after screening is no longer mandated is expected. There is also often a time lag between the time of first screen and when the effect of screening on patient-centered outcomes shows up. As a result, the proportional hazards assumption is likely violated, and sample size calculations that rely on such assumptions can be faulty (although log-rank tests are still valid in the absence of proportional hazards). In the TMIST design, the sample size calculations incorporated a varying hazard rate with smaller differences in efficacy in the beginning and diminishing effects after protocol-mandated screening stopped. Expert panel discussion concluded that a 20% relative reduction in the rate of advanced cancers at 4.5 years from enrollment would be considered clinically important. Using data from the Breast Cancer Screening Consortium25 on the rate of advanced cancers and incorporating dropout and crossover between arms, a sample size of 108 508 resulted in 80% power of a 2-sided log-rank test with a type 1 error rate of 0.05 using 1:1 randomization.
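The effect of a delayed onset and a later dilution can be illustrated with a piecewise-constant hazard. The sketch below is not the TMIST sample size calculation; the control-arm hazard, the period lengths, and the hazard ratios are all hypothetical values chosen to show the mechanics.

```python
from math import exp


def cumulative_incidence(hazards_per_year, durations_years):
    """Cumulative incidence under a piecewise-constant hazard."""
    cumulative_hazard = sum(h * d for h, d in zip(hazards_per_year, durations_years))
    return 1.0 - exp(-cumulative_hazard)


# All numbers below are hypothetical.
control_hazard = 0.002             # events per person-year in the comparison arm
durations = [1.0, 4.0, 2.0]        # lag period, full-effect period, post-screening period
hazard_ratios = [1.0, 0.70, 0.90]  # no early effect, full effect, diluted effect

control = cumulative_incidence([control_hazard] * len(durations), durations)
screened = cumulative_incidence(
    [control_hazard * hr for hr in hazard_ratios], durations
)
relative_reduction = 1.0 - screened / control
print(f"relative reduction in cumulative incidence: {relative_reduction:.1%}")
```

Even though the hazard is reduced by 30% during the full-effect period, the relative reduction in cumulative incidence over the whole follow-up is noticeably smaller, because the early lag and the later dilution contribute person-time with little or no effect. A calculation that assumed a constant 30% reduction throughout would therefore overstate the power.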
Discussion
It is essential to have robust high-level evidence drive health-care policymaking, an especially relevant issue when the target population is healthy. The public also has a right to expect that the evidence of net benefit is strong when a medical intervention is promoted as a matter of policy. When evaluating the benefits and risks of cancer screening, RCTs are the gold standard. Although often informative, analysis of observational data on cancer screening is prone to various strong biases such as lead time bias, length time bias, and confounding26,27 that impede determination or estimates of benefits. Several statistical methods have been developed to adjust for such biases, but all rely on untestable assumptions.28,29 Consequently, results from observational analyses are inherently more speculative and the conclusions less definitive compared with results from RCTs.
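Lead time bias, one of the biases named above, can be shown with simple arithmetic. In the hypothetical example below, screening moves the date of diagnosis earlier but does not change the age at death, yet survival measured from diagnosis appears longer. The ages are invented for illustration.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years survived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient whose age at death is the same with or without screening.
age_at_death = 70.0
age_at_symptomatic_diagnosis = 67.0
age_at_screen_diagnosis = 64.0  # screening advances diagnosis by 3 years (the lead time)

without_screening = survival_from_diagnosis(age_at_symptomatic_diagnosis, age_at_death)
with_screening = survival_from_diagnosis(age_at_screen_diagnosis, age_at_death)
print(without_screening, with_screening)  # 3.0 6.0
```

Survival after diagnosis doubles in this example although the patient gains no additional life, which is why comparisons of survival from diagnosis in observational data can suggest a benefit that a mortality endpoint in a randomized trial would not show.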
Nonetheless, few randomized trials for cancer screening are conducted compared with therapeutic trials, largely because cancer screening trials are large and expensive. RCTs that focus on answering more nuanced questions such as comparing screening intervals or screening refinements are even more rare. Furthermore, as systemic therapy improves, the efficacy of screening may diminish, but RCTs to reassess screening in the setting of better therapy are almost nonexistent. As a result, many countries make screening recommendations based on results of RCTs conducted in very different populations and health-care settings. But various factors can lead to study results that are poorly generalizable to a new population due to different population characteristics, clinical practices, and treatment options.30 Recently developed statistical methods can combine trial data with covariate data from different target populations to estimate the effect of interventions in a new target population with different covariate distribution.31-33 For example, these methods were recently used to evaluate the effect of LDCT screening on lung cancer-specific mortality in the target population of all individuals eligible for lung cancer screening in the United States by combining trial data from NLST with nationally representative survey data of noninstitutionalized people in the United States eligible for lung cancer screening.34 However, like observational data analysis, such statistical adjustments depend on untestable assumptions, and they only account for differences in observed population characteristics. 
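The basic idea of reweighting trial results to a target population can be sketched with direct standardization over a single observed covariate. This is a deliberately simplified illustration and not the estimators developed in the cited methods; the strata, the stratum-specific risk differences, and the population mixes are hypothetical.

```python
def standardized_effect(stratum_effects, weights):
    """Average stratum-specific effects using a population's covariate distribution."""
    total = sum(weights.values())
    return sum(stratum_effects[s] * w / total for s, w in weights.items())


# Hypothetical stratum-specific risk differences (screened minus control) from a trial.
effects = {"age 55-64": -0.002, "age 65-74": -0.006}

# Hypothetical covariate distributions in the trial and in a target population.
trial_mix = {"age 55-64": 0.7, "age 65-74": 0.3}
target_mix = {"age 55-64": 0.4, "age 65-74": 0.6}

effect_in_trial = standardized_effect(effects, trial_mix)
effect_in_target = standardized_effect(effects, target_mix)
print(f"trial: {effect_in_trial:.4f}, target: {effect_in_target:.4f}")
```

The adjustment changes the estimate only through the covariates that were measured. If the populations also differ in unmeasured ways, such as treatment availability, the reweighted estimate can still be wrong, which is the limitation noted above.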
Pragmatic trials, designed to show real-world effectiveness of interventions, try to recruit participants who have similar characteristics to the users of the intervention if it became standard of care, and they deliver the intervention and conduct follow-up similarly to how it would be done in practice.35 They have been used in cancer screening36 and have the potential to provide more generalizable results (eg, reducing healthy volunteer bias). However, they are most useful to improve estimates of real-world effectiveness of interventions in the population where the trial is conducted and are less useful to adjust for between-population differences in factors such as disease prevalence, acceptance of screening in the target population, and availability of effective therapy options.
Reducing cancer-specific mortality through screening relies not only on the effectiveness of the screening modality, but also on uptake among the eligible population and quality of care. All present varying degrees of challenge internationally.37,38 For example, in the cross-sectional 2019 US Behavioral Risk Factor Surveillance System survey, self-reported uptake rate of screening LDCT among participants eligible for screening according to the 2013 USPSTF recommendations was only 12.8%, a major impediment to effectiveness at the population level.39 What affects uptake of cancer screening differs between communities,37,40,41 highlighting the need for culturally appropriate measures for increasing screening uptake and optimizing therapy after diagnosis of cancers.
In this article, we provide details on two large cancer screening trials, CHANCES and TMIST, that fill in critical gaps in information about the performance of new screening techniques and will provide important data on crucial questions such as optimal screening intervals and screening effectiveness among racial and ethnic groups. Both trials show the importance of international collaboration and partnerships to address questions of relevance across international boundaries.
Acknowledgments
The NIH had no role in the design and conduct of the study; in the collection, analysis, and interpretation of the data; or in the preparation, review, or approval of the manuscript. The authors alone are responsible for the views expressed in this article, and they do not necessarily represent the views, decisions, or policies of the institutions with which they are affiliated. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Funding
This article was supported by the National Cancer Institute of the National Institutes of Health under award numbers UG1CA189828, U10CA180820, and U10CA180794.
Monograph sponsorship
This article appears as part of the monograph “Statistical and Practical Considerations in Design and Analysis of Clinical Trials with Patient-Centered Outcomes,” sponsored by the National Cancer Institute (NCI) and the following NCI Community Oncology Research Program (NCORP) Research bases: NRG Oncology (UG1CA189867), Alliance (UG1CA189823), SWOG (UG1CA189974), ECOG-ACRIN (UG1CA189828), Wake Forest University (UG1CA189824), University of Rochester (UG1CA189961), and Children’s Oncology Group (5UG1CA189955-11).
Conflicts of interest
None declared.
Data availability
As TMIST and CHANCES are still ongoing, data from TMIST and CHANCES are not publicly available.
Author notes
P. Hu and J.A. Steingrimsson contributed equally to this work.