Intraclass Correlation Estimates for Cancer Screening Outcomes: Estimates and Applications in the Design of Group-Randomized Cancer Screening Studies

Background
Screening has become one of our best tools for early detection and prevention of cancer. The group-randomized trial (GRT) is the most rigorous experimental design for evaluating multilevel interventions. However, identifying the proper sample size for a GRT requires reliable estimates of intraclass correlation (ICC) for screening outcomes, which are often not available to researchers. We present crude and adjusted ICC estimates for cancer screening outcomes for various levels of aggregation (physician, clinic, and county) and provide an example of how these ICC estimates may be used in the design of a future trial.

Methods
Investigators working in the area of cancer screening were contacted and asked to provide crude and adjusted ICC estimates using the analysis of variance method estimator.

Results
Of the 29 investigators identified, estimates were obtained from 10 investigators who had relevant data. ICC estimates were calculated from 13 different studies, with more than half of the studies collecting information on colorectal screening. In the majority of cases, ICC estimates could be adjusted for age, education, and other demographic characteristics, leading to a reduction in the ICC. ICC estimates varied considerably by cancer site and level of aggregation of the groups.

Conclusions
We present 130 crude and adjusted ICC estimates covering breast, cervical, colon, and prostate screening and have detailed them by level of aggregation, screening measure, and study characteristics. We have also demonstrated their use in planning a future trial and highlighted the need to evaluate the proposed interval estimator for binary outcomes under conditions typically seen in GRTs.

Screening has become one of our best tools for early detection and prevention of breast, cervical, and colorectal cancers. Despite periodic modifications of specific recommendations, these screening tests continue to include the following: mammography for breast cancer; prostate-specific antigen for prostate cancer; the Papanicolaou test and testing for high-risk types of human papillomavirus for cervical cancer; and fecal occult blood test, flexible sigmoidoscopy, and colonoscopy for colorectal cancer (1,2). In spite of their efficacy, uptake of these screening tests is not optimal, and further outreach and dissemination efforts are needed to inform the community about screening test availability and recommended intervals, to reduce health service access barriers to obtaining screening, and to encourage positive decisions to seek screening (2). These issues are particularly apparent in rural communities, such as those in Appalachia (3-6).
Public health interventions to increase screening include efforts focusing on individuals, the health-care providers, the health-care delivery systems, other organizational groups in the community (churches and work sites), or an entire community (2,4,6,7). When an intervention operates at a group level, when it cannot be delivered to individuals, or when it manipulates the social or physical environment, a cluster or group-randomized trial (GRT) may be employed to evaluate the intervention effects. GRTs are a natural extension of the usual randomized clinical trial; in GRTs, distinct groups rather than individuals are randomly assigned to the intervention or control condition (8,9).
Because the primary goal of a GRT is to compare treatment conditions that are assigned to groups, not to individuals, the design and analysis of the trial must account for each individual's membership in a group. Group membership is expressed as the correlation among individuals in the same group. Individuals who see the same physician, who go to the same clinic, who work in the same place, or who live in the same community are expected to share some common characteristics, creating a positive intraclass correlation (ICC). A positive ICC inflates the estimated variance of the intervention effect by a factor of $1 + (m - 1)\rho$, where $m$ is the average number of individuals per group and $\rho$ is the ICC between members of the group (10). For large $m$, the inflation factor may substantially increase the variance, even when $\rho$ is small, as it often is in GRTs.
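To make the size of this variance inflation concrete, the short Python sketch below evaluates the factor 1 + (m - 1) times the ICC for a few group sizes. The function name `design_effect` and the chosen values (an ICC of .05 and group sizes of 25, 50, and 75) are illustrative assumptions, not results from the studies described here.

```python
def design_effect(m, icc):
    """Variance inflation factor 1 + (m - 1) * icc for average group size m."""
    return 1 + (m - 1) * icc


# Illustrative values only: a small ICC still inflates the variance
# noticeably once groups become large.
for m in (25, 50, 75):
    print(f"m = {m:3d}, ICC = 0.05 -> inflation factor = {design_effect(m, 0.05):.2f}")
# m =  25 -> 2.20, m =  50 -> 3.45, m =  75 -> 4.70
```

Even with an ICC of only .05, groups of 75 members would inflate the variance almost fivefold compared with an analysis of independent individuals.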
Identifying the proper sample size for a GRT requires reliable estimates of ICC, which are often not published or easily available to researchers. An underestimated ICC will result in an underpowered study, whereas an inflated ICC will require too many groups to be randomized. Accurate sample size estimates are needed for the efficient and timely use of scarce research funding.
Gathering estimates of relevant ICCs is an important step in planning a GRT. We are aware of only two articles that have published ICCs for cancer screening outcomes (11,12). In this article, we present the results of a study to gather both crude and adjusted ICC estimates for different cancer screening outcomes for various levels of aggregation (physician, clinic, county, and region). Furthermore, we provide an example of how these ICC estimates may be used in the design of a future trial.

Data Sources
Twenty-nine investigators working in the area of cancer screening were identified based on our experience in cancer screening research and through discussions with officials at the National Cancer Institute; all were contacted via e-mail in February 2009. Each was asked if he or she had access to data on cancer screening outcomes (ever screened, yes/no; screened within guidelines, yes/no) and would be willing to work together to calculate crude and adjusted ICC estimates. Approximately 2 weeks after the initial e-mail, regular follow-up phone calls began to address investigators' concerns and to answer questions they had in calculating ICCs. Regular contact continued with each investigator to compile results and to ensure that all calculations were performed in a consistent fashion. Use of all data was approved by the investigators' local institutional review boards.

Cancer Screening Outcomes
For each estimated ICC, collaborating investigators provided details on the study's design, including the target cancer under study, the percentage of individuals ever screened or screened within guidelines, the type and number of groups, and the number of individuals for each group.

Analysis Methods
To calculate ICC estimates consistently, investigators were asked to estimate the ICC, $\rho$, via the analysis of variance or analysis of covariance method, which has been shown to perform well for continuous and binary outcomes (13). ICCs were calculated as follows:

$$\hat{\rho} = \frac{MSB - MSW}{MSB + (m_0 - 1)\,MSW},$$

where $MSB$ and $MSW$ are the between-group and within-group mean squares, and

$$m_0 = \frac{1}{g - 1}\left(N - \frac{\sum_{i=1}^{g} m_i^2}{N}\right),$$

which is a weighted mean group size. The total number of subjects is given by

$$N = \sum_{i=1}^{g} m_i,$$

where $m_i$ is the number of subjects in the $i$th group and $g$ is the number of groups. When possible, unadjusted/crude estimates of ICC, ICC adjusted for age and education, and ICC adjusted for other covariates were provided for each outcome and level of aggregation.
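As an illustration of the crude (unadjusted) analysis of variance estimator, the Python sketch below assumes the standard one-way form, (MSB - MSW) / (MSB + (m0 - 1) MSW), with m0 the weighted mean group size. The function names and the toy data are hypothetical; the collaborating investigators' actual software is not described in the text.

```python
def weighted_mean_group_size(sizes):
    """m0 = (N - sum(m_i^2) / N) / (g - 1) for group sizes m_1, ..., m_g."""
    g = len(sizes)
    n_total = sum(sizes)
    return (n_total - sum(m_i ** 2 for m_i in sizes) / n_total) / (g - 1)


def anova_icc(groups):
    """Crude ANOVA estimate of the ICC.

    groups: one list of outcomes per group (0/1 for screened no/yes,
    or continuous values). Requires at least two groups and some
    variation in the data (otherwise the denominator is zero).
    """
    g = len(groups)
    sizes = [len(y) for y in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(y) for y in groups) / n_total
    means = [sum(y) / len(y) for y in groups]

    # Between-group and within-group sums of squares
    ssb = sum(m_i * (mean - grand_mean) ** 2 for m_i, mean in zip(sizes, means))
    ssw = sum((v - mean) ** 2 for y, mean in zip(groups, means) for v in y)

    msb = ssb / (g - 1)
    msw = ssw / (n_total - g)
    m0 = weighted_mean_group_size(sizes)
    return (msb - msw) / (msb + (m0 - 1) * msw)


# Toy data: screening status is identical within each clinic
print(anova_icc([[1, 1], [0, 0]]))   # complete clustering
# Toy data: every clinic has the same screening rate
print(anova_icc([[1, 0], [1, 0]]))   # estimate falls below zero
```

The second toy example shows that this estimator can return negative values when groups are more alike than chance would predict, which is why point estimates below zero can appear in practice.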

Results
Of the 29 investigators initially contacted, two referred us to their collaborators who were principal investigators of pertinent cancer screening studies; one investigator initially contacted was involved in a research project of a principal investigator already contacted by us. Of the 28 investigators of unique research projects, 10 agreed to collaborate, 11 indicated that they did not have any relevant data to share, three declined to participate because of time constraints, and four did not respond.
From the 10 participating investigators, we received 138 ICC estimates from 12 different studies. Characteristics of each data source are presented in Table 1. More than half of the studies collected information on colorectal and mammography screening, five of the 12 studies collected data on Papanicolaou test screening, and only two studies could provide information on prostate cancer screening (prostate-specific antigen). Outcomes were assessed via medical record abstraction/chart review for more than half of the studies, and the majority of studies enrolled participants older than 40 years. Table 2 presents crude and adjusted ICCs and further study characteristics. All ICC estimates are from baseline data, except when noted as coming from follow-up. Adjustment for basic demographics (age and education) as well as adjustment for other factors reduced the estimated ICCs in most cases. Adjusted ICCs (models 2 and 3) most often used a continuous covariate for age and a categorical covariate for education. Exceptions are noted in Table 2. Estimates of ICCs varied considerably by cancer site and by the size of the aggregated group, with larger groups tending to have smaller ICCs (24). Adjustment factors considered by investigators, other than age and education, included income, marital status, race, ethnicity, city, insurance status, smoking status, comorbidities, and the number of primary care visits recorded.

Application of Findings for Trial Design
Details of how to use ICC estimates in sample size calculations for GRTs have been described elsewhere (8,9). Here, we provide a relevant example for potential GRTs in cancer screening. We consider a nested cohort design to examine the effect of a new intervention program to increase colon cancer screening in a diverse urban population of men and women. We plan to implement our intervention in community health clinics and will verify up-to-date colorectal cancer screening via chart review. We expect that approximately 40% of adults in our population are already appropriately screened, and we believe that a relative increase in this rate of 30%, to 52% screened, would be a reasonable and scientifically meaningful increase. Moreover, we believe that we can recruit at least 25 patients on average from each clinic. The planned analysis of this trial will be via a mixed model analysis of covariance, adjusting for baseline covariates. The sample size formula for this type of trial can be written as follows (8):

$$g_c = \frac{2\sigma_y^2\left[(1 - \rho)\hat{\theta}_m + m\rho\hat{\theta}_g\right]\left(t_{\alpha/2,df} + t_{\beta,df}\right)^2}{m\Delta^2}$$

Here, $g_c$ is the number of clinics per condition, $m$ is the average number of individuals per group (clinic), $\sigma_y^2$ is the variance of the primary endpoint, $\rho$ is the ICC, and $\Delta$ is the difference to be detected. The critical values $t_{\alpha/2,df}$ and $t_{\beta,df}$ reflect the acceptable Type I and II error rates for this trial, and $\hat{\theta}_m$ and $\hat{\theta}_g$ reflect one minus the proportion of variance reduction expected through regression adjustment for member-level and group-level covariates, respectively.
For example, we may expect regression adjustment for member-level covariates to reduce variance in our outcome by 10%, and therefore, $\hat{\theta}_m$ would be set to .9. Note that conservative estimates of $\hat{\theta}_m$ and $\hat{\theta}_g$ would be 1. Sample size calculation begins with critical values $t_{\alpha/2,df}$ and $t_{\beta,df}$ set for infinite df. Next, we use the calculated sample size to determine an updated estimate of df and iterate through the calculation, updating the critical values appropriately. Given the proposed study's target population and outcome, we will use the ICC estimates from Ferrante et al. (15) for this example (cf Table 2). We calculate sample size as follows, allowing the estimated ICC to be .05, $\hat{\theta}_m = .90$, $\hat{\theta}_g = .80$, and a Type I error of 5% and 80% power, and taking $\sigma_y^2 = .46(1 - .46) = .2484$:

$$g_c = \frac{2(.2484)\left[(.95)(.90) + 25(.05)(.80)\right](1.96 + 0.84)^2}{25(.12)^2} \approx 20.1,$$

which rounds up to 21 clinics per condition. In the above calculation, we began by using the critical values with infinite df. We can recalculate sample size, assuming df equal to $2(21 - 1) = 40$:

$$g_c = \frac{2(.2484)\left[(.95)(.90) + 25(.05)(.80)\right](2.02 + 0.85)^2}{25(.12)^2} \approx 21.1.$$
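The iterative calculation described above can be sketched in Python using only the standard library. Several pieces of this sketch are assumptions rather than details given in the text: the function names are hypothetical, the outcome variance is taken as the binomial variance at the average of the two screening rates, the sample size formula is assumed to have the form 2 x variance x [(1 - ICC) x theta_m + m x ICC x theta_g] x (sum of critical values)^2 / (m x difference^2), and the t critical values come from a Cornish-Fisher series approximation rather than an exact routine (a statistics package with an exact t quantile would normally be used instead).

```python
import math
from statistics import NormalDist


def t_quantile(p, df=None):
    """Approximate Student-t quantile; df=None means infinite df (normal)."""
    z = NormalDist().inv_cdf(p)
    if df is None:
        return z
    # Cornish-Fisher series in powers of 1/df: an approximation that is
    # adequate for the moderate-to-large df that arise in this setting.
    return (z
            + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)
            + (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / (384 * df**3))


def groups_per_condition(p0, p1, m, icc, theta_m=1.0, theta_g=1.0,
                         alpha=0.05, power=0.80, max_iter=50):
    """Groups needed per condition, iterating on df = 2 * (g - 1)."""
    delta = p1 - p0
    p_bar = (p0 + p1) / 2
    var_y = p_bar * (1 - p_bar)  # ASSUMPTION: binomial variance at the mean rate
    k = 2 * var_y * ((1 - icc) * theta_m + m * icc * theta_g) / (m * delta**2)

    g, df = None, None           # start from infinite df
    for _ in range(max_iter):
        t_sum = t_quantile(1 - alpha / 2, df) + t_quantile(power, df)
        g_new = math.ceil(k * t_sum**2)
        if g_new == g:           # group count has stabilized
            break
        g, df = g_new, 2 * (g_new - 1)
    return g


# Inputs from the worked example: 40% vs 52% screened, 25 patients per clinic
print(groups_per_condition(0.40, 0.52, m=25, icc=0.05,
                           theta_m=0.90, theta_g=0.80))  # 22
```

Under these assumptions the first pass gives 21 clinics per condition, the second pass (40 df) gives 22, and a third pass (42 df) leaves the answer unchanged.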
Therefore, with 22 clinics per condition and 25 patients per clinic, we will have 80% power to detect a 12 percentage point absolute increase in screening from a baseline of 40%, given the above assumptions. To gauge the sensitivity of the calculated sample size to the study assumptions, we vary both the number of patients to be recruited per clinic and the estimated ICC. Because we expect to be able to recruit at least 25 patients per clinic, a reasonable upper value may be 75 patients per clinic. To obtain a range of ICC values, we calculate the one-sided upper 80% confidence limit for the ICC based on the method described by Searle (25) and by Snedecor and Cochran (26). This method was developed for continuous outcomes, and it is unknown if the nominal coverage level is maintained for binary outcomes (27,28). Even so, we use this method here only to provide an approximate range of values for sample size calculation. Further investigation of the properties of this confidence interval method for binary outcomes is needed under conditions typically seen in GRTs. Kieser and Wassmer (29) discuss the use of confidence limits for estimates used in sample size calculation to take into account uncertainty of sample estimates.
Kieser and Wassmer confirm that using the upper one-sided 80% confidence limit should guarantee that the planned power be $1 - \beta$, with probability of at least $1 - \alpha$. Table 2 provides ICC estimates and their associated number of groups and average number of members per group needed to calculate confidence limits as outlined above. Using the ICC estimate, the associated number of groups, and members from Ferrante et al. (15), we calculate the upper 80% one-sided confidence limit for the ICC to be approximately .08.
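For readers who wish to reproduce such a limit, the following is a sketch of the usual F-based upper confidence limit for the ICC in a balanced one-way random-effects design with $g$ groups of $m$ members each. It is offered as an assumption about the general form of the method in (25,26), not as the exact computation performed here; the symbols $F$ and $F_L$ are introduced only for this illustration.

```latex
% Observed variance ratio, recoverable from a reported ICC estimate and m:
F = \frac{MSB}{MSW} = \frac{1 + (m - 1)\hat{\rho}}{1 - \hat{\rho}}

% One-sided upper 80% confidence limit, where F_L is the 20th percentile
% of the F distribution with g - 1 and g(m - 1) degrees of freedom:
\hat{\rho}_U = \frac{F / F_L - 1}{m + F / F_L - 1}
```

Because $F$ can be recovered from the ICC estimate and the average group size, the ICC estimate, the number of groups, and the average number of members per group are sufficient inputs for this calculation.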
Varying these values, Table 3 outlines the required study sample size per condition. In the range specified, increasing the number of individuals enrolled per clinic reduces the number of groups required, although the reduction diminishes beyond 50 patients per clinic (8). In contrast, any increase in ICC contributes substantially to the number of groups per condition needed to detect our hypothesized treatment effect, with 80% power and 5% two-sided probability of Type I error.
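The pattern described above can be seen algebraically. Assuming the sample size formula from (8) takes the form $g_c = 2\sigma_y^2[(1-\rho)\hat{\theta}_m + m\rho\hat{\theta}_g](t_{\alpha/2,df} + t_{\beta,df})^2 / (m\Delta^2)$, dividing through by $m$ separates the member-level and group-level contributions:

```latex
g_c = \frac{2\sigma_y^2\left(t_{\alpha/2,df} + t_{\beta,df}\right)^2}{\Delta^2}
      \left[\frac{(1 - \rho)\hat{\theta}_m}{m} + \rho\hat{\theta}_g\right]

% As m grows, the first term in brackets vanishes and g_c approaches a floor:
\lim_{m \to \infty} g_c =
      \frac{2\sigma_y^2\,\rho\,\hat{\theta}_g\left(t_{\alpha/2,df} + t_{\beta,df}\right)^2}{\Delta^2}
```

Only the member-level term shrinks as more patients are enrolled per clinic, so additional patients yield diminishing returns, whereas the group-level term is proportional to $\rho$ and is unaffected by $m$.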
We note that others have suggested varying approaches to account for uncertainty in estimation of the ICC (30-32). Turner et al. (30) use a Bayesian approach that can be extended to combining multiple prior estimates of the ICC. Blitstein et al. (32) developed a method to combine ICC estimates based on techniques common in meta-analysis. Both methods attempt to provide a means to incorporate interstudy heterogeneity and give investigators the ability to use all data available. Moreover, both groups of authors provide guidance that we find useful for the selection of external ICC estimates (30,32). Available ICCs should be collected from studies that are as similar as possible to the study to be designed. Specifically, it is preferred that the ICC estimates come from studies with a similar endpoint, which use a comparable method of measurement, and are calculated from measurements taken on the same general target population. Furthermore, it is preferable if the design and analysis of the trial from which the ICCs are derived are similar to those of the study being planned (32). Turner et al. (30) relax some of these criteria to incorporate other relevant data sources but allow these to have less influence when combining ICC estimates.

Conclusions
Previously, we had found only two articles with published ICCs for cancer screening outcomes, one of which discussed cervical screening, whereas the other investigated breast cancer screening (11,12). Their reported ICCs fall in line with those presented here (.02-.07) for breast and cervical cancer screening.
Our work makes at least three relevant contributions to the literature. First, we have compiled and described crude and adjusted ICC estimates from 13 studies covering breast, cervical, colon, and prostate screening estimates. Estimates are detailed by level of aggregation, screening measure, and study characteristics. Second, all ICC estimates in Table 2 were calculated in the same manner for consistency. Finally, we have provided an illustration of how these estimates can be used to plan future trials.
There is considerable variation in the ICC estimates both between and within screening types. This is a function of the screening outcome measure, level of aggregation, and overall study design. We note that adjustment for basic demographic characteristics beyond age and education, which are likely available in almost any study, generally aids in reducing the ICC estimate. In fact, in several instances, the point estimate for an ICC fell below zero. In practice, we would recommend using a small positive value for sample size calculation instead of a negative value or zero. As we have done in the above example, investigators can consider calculating the one-sided upper 80% confidence limit for the ICC estimate, which would likely correspond to a small positive number.
We also note that adjustment for covariates can increase the ICC estimate, as it did in a few cases in Table 2. Group-level ICCs can increase as a result of covariate adjustment (8). This can occur when the uneven distribution of a covariate across groups masks what is otherwise a higher level of within-group correlation. When we adjust for the covariate, we remove the mask and the ICC estimate increases.
Although the studies presented should provide a starting point for investigators in planning future studies, it is likely that they will have to do some of their own pilot work to determine the most accurate ICCs for their studies. These pilot ICCs can be combined with published estimates using either of the methods mentioned above to determine a more robust estimate of ICC for sample size determination.