Abstract

Although randomized controlled trials are regarded as the gold standard for comparison of treatments, evidence from observational studies is still relevant. To cope with the problem of possible confounding in these studies, investigators need methods for analyzing their results which adjust for confounders and lead to unbiased estimation of the treatment effect. In this paper, the authors describe the main principles of three statistical methods for doing this. The first method is the classical approach of a multiple regression model including the effects of treatment and covariates. This considers the relation between prognostic factors and the outcome variable as a relevant criterion for adjustment. The second method is based on the propensity score, focusing on the relation between prognostic factors and treatment assignment. The third method is an ecologic approach using a grouped treatment variable, which may aid in avoiding confounding by indication. These approaches are applied to a partially randomized trial conducted in 720 German breast cancer patients between 1984 and 1997. The study had a comprehensive cohort study design that included recruitment of patients who had consented to participation but not to randomization because of a preference for one of the treatments. This design offers a unique opportunity to contrast results from the nonrandomized portion of a study with those for a randomized subcohort as a reference.

Randomized controlled trials are considered the gold standard for comparison of clinical treatments. Consequently, treatment recommendations depend primarily on the results of randomized controlled trials. Nevertheless, evidence from nonrandomized observational studies is relevant. Randomization is sometimes not acceptable to patients (e.g., when treatments differ qualitatively, as in surgical therapy vs. medical therapy). For some questions, nonrandomized observational studies may be the sole source of available evidence.

In the analysis of results from nonrandomized observational studies, a simple overall comparison of the treatment arms may lead to a biased estimate of the treatment effect due to confounding factors (covariates). Various statistical methods have been proposed for analyzing data from nonrandomized observational studies such that estimated treatment effects may be interpreted as causal effects. In this paper, we consider three different approaches. The classical approach of fitting a multiple regression model including the effects of treatment and of covariates considers the relation between prognostic factors and outcome as a relevant criterion for adjustment. The second approach is based on the propensity score, focusing on the relation between prognostic factors and treatment assignment (1). The third approach estimates the effect of a grouped-treatment (GT) variable related to the assigned treatment, for which it can be assumed that no confounding by indication exists (2–4).

We illustrate these approaches using an example of a study with a so-called comprehensive cohort study design (5, 6), in which all patients who fulfill the clinical eligibility criteria and consent to participation are recruited. Patients who consent to randomization are randomized between the study treatments; those who do not receive their preferred study treatment according to the protocol. This results in a prospective cohort study that includes as a subcohort the participants in the classical randomized clinical trial. This design offers an ideal situation for investigating the properties of the different approaches for analysis of nonrandomized observational studies (7).

MATERIALS AND METHODS

Breast cancer study with a comprehensive cohort study design

In 1984, the German Breast Cancer Study Group started a comprehensive cohort study of breast cancer at 44 hospitals in Germany. The study was initiated to compare, using a 2 × 2 design, three cycles of chemotherapy with six cycles of chemotherapy and to investigate the additional effect of tamoxifen as an adjuvant treatment among patients with primary histologically proven nonmetastatic node-positive breast cancer who had been treated with mastectomy. Chemotherapy was administered according to the so-called CMF (cyclophosphamide-methotrexate-fluorouracil) regimen, consisting of 500 mg/m² cyclophosphamide, 40 mg/m² methotrexate, and 600 mg/m² fluorouracil given intravenously on days 1 and 8 of a 4-week treatment period. Endocrine therapy consisted of a daily dose of 30 mg of tamoxifen over 2 years. The study was performed after approval by an ethical committee. Informed consent was obtained from each patient. Further details on the study design have been provided elsewhere (8, 9). In 1986, the protocol was changed; after that time, premenopausal patients were not allowed to receive tamoxifen and were randomized only with regard to three cycles of CMF (3×CMF) versus six cycles of CMF (6×CMF). Therefore, the patients in the randomized portion of the study were all randomized with regard to 3×CMF versus 6×CMF, but only some of them were randomized with regard to tamoxifen. Here, we consider only the effect of 3×CMF versus 6×CMF. Tamoxifen treatment is considered a covariate and is dealt with in the same way as the prognostic factors.

The covariates given in table 1 were considered as possible confounders using the listed categories. All analyses were restricted to the 450 of 473 randomized patients and 238 of 247 nonrandomized patients with complete information on the covariates. Table 2 shows the randomization rates at the 44 clinical centers.

TABLE 1.

Distribution of randomized and nonrandomized patients according to prognostic factors and treatment with tamoxifen in a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997

Factor Proportion 
Randomized patients (n = 450) Nonrandomized patients (n = 238) 
Menopausal status   
    Premenopausal 0.42 0.43 
    Postmenopausal 0.58 0.57 
No. of positive lymph nodes   
    1–3 0.56 0.53 
    4–9 0.30 0.29 
    >9 0.14 0.18 
Tumor size (mm)   
    ≤20 0.27 0.24 
    21–30 0.42 0.42 
    >30 0.31 0.34 
Tumor grade   
    I 0.12 0.12 
    II 0.66 0.62 
    III 0.22 0.25 
Estrogen receptor status   
    Positive 0.60 0.65 
    Negative 0.40 0.35 
Progesterone receptor status   
    Positive 0.59 0.64 
    Negative 0.41 0.36 
Treatment with tamoxifen   
    No 0.60 0.71 
    Yes 0.40 0.29 

TABLE 2.

Rates of randomization versus rates of treatment with three cycles of CMF,* by clinical center, among nonrandomized patients in a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997

Clinical center    No. of patients    Proportion of randomized patients    No. of nonrandomized patients    Proportion of nonrandomized patients treated with three cycles of CMF (grouped-treatment variable)
1        12     0.50    …      1.00
2        …      0.33    …      1.00
3        …      0.80    …      1.00
4        30     0.97    …      1.00
5        34     0.38    21     0.90
6        34     0.65    12     0.83
7        22     0.82    …      0.75
8        27     0.33    18     0.72
9        20     0.50    10     0.70
10       …      0.57    …      0.67
11       11     0.55    …      0.60
12       18     0.17    15     0.53
13       26     0.77    …      0.50
14       12     0.83    …      0.50
15       …      0.67    …      0.50
16       46     0.89    …      0.40
17       22     0.27    16     0.38
18       24     0.42    14     0.36
19       …      0.50    …      0.33
20       28     0.89    …      0.33
21       …      0.57    …      0.33
22       20     0.85    …      0.33
23       10     0.70    …      0.33
24       11     0.00    11     0.27
25       46     0.83    …      0.25
26       …      0.00    …      0.25
27       19     0.37    12     0.17
28       45     0.69    14     0.14
29       14     0.07    13     0.00
30       12     0.25    …      0.00
31       …      0.00    …      0.00
32       …      0.25    …      0.00
33       19     0.95    …      0.00
34       45     1.00    …      …
35       …      1.00    …      —
36       …      1.00    …      —
37       …      1.00    …      —
38       …      1.00    …      —
39       …      1.00    …      —
40       …      1.00    …      —
41       …      1.00    …      —
42       …      1.00    …      —
43       …      1.00    …      —
44       …      1.00    …      —
Total    688    0.65    238    0.46

* CMF, cyclophosphamide-methotrexate-fluorouracil.

—, not applicable.

The primary endpoint for analysis was event-free survival time. Patients were followed at regular intervals until the middle of 1997, leading to a median follow-up time of approximately 8.5 years (10). Event-free survival time was defined from mastectomy to the first event of failure (locoregional recurrence, distant metastasis, a second cancer contralateral or at a distant site, or death without previous failure); 400 events were observed.
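The endpoint definition above can be sketched for a single patient as follows. This is a minimal illustration, not the study's data-management code: the function name and arguments are hypothetical, and censoring at the date of last follow-up for patients without any failure is our assumption (the usual convention, but not stated explicitly in the text).

```python
from datetime import date

def event_free_survival(mastectomy, failure_dates, last_follow_up):
    """Return (time_in_days, event_indicator) for one patient.

    failure_dates: dates of any failures (locoregional recurrence, distant
    metastasis, second cancer, death without previous failure); may be empty.
    event_indicator: 1 if a failure was observed, 0 if the patient is
    censored at last follow-up (assumed convention).
    """
    observed = [d for d in failure_dates if d <= last_follow_up]
    if observed:
        # Event-free survival ends at the FIRST failure of any type.
        return (min(observed) - mastectomy).days, 1
    return (last_follow_up - mastectomy).days, 0

# Hypothetical patients, for illustration only.
with_event = event_free_survival(
    date(1985, 3, 1), [date(1990, 6, 15), date(1988, 3, 1)], date(1997, 6, 30)
)
censored = event_free_survival(date(1986, 1, 1), [], date(1987, 1, 1))
```

The pair (time, indicator) is the form in which such data are usually passed to Kaplan-Meier or Cox regression routines.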

Unadjusted analysis of treatment effect

For unadjusted analysis, the hazard ratio for the comparison between treatment groups and its 95 percent confidence interval were calculated, and a two-sided Wald test of the hypothesis of no treatment effect was performed in a Cox regression model (11) including only treatment as a covariate. Cumulative event-free survival rates for the treatment arms were estimated by means of the Kaplan-Meier method (12). This analysis was performed separately for randomized and nonrandomized patients. For the latter, this analysis is obviously inadequate, but it is included for illustration.
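For readers unfamiliar with the Kaplan-Meier method, the product-limit calculation can be sketched in a few lines. This is a generic textbook version written for illustration, not the software used in the study; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the event-free survival function.

    times:  follow-up time for each patient
    events: 1 if the patient had an event at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events occurring exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical toy data: five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

At each event time the current estimate is multiplied by the proportion of patients at risk who remain event-free; censored patients leave the risk set without changing the estimate.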

Analysis strategies adjusting for confounders

Multiple Cox regression analysis adjusting for covariates.

An adjusted analysis was performed within a Cox regression model that included the respective covariates in addition to treatment. The hazard ratio for the comparison between treatment groups and its 95 percent confidence interval were calculated, and a two-sided Wald test of the hypothesis of no treatment effect was performed. Cumulative event-free survival rates for the treatment arms adjusted for covariates were estimated through an adjusted Cox model (13). This analysis was performed in both nonrandomized and randomized patients.
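In a common notation (the symbols are ours and are not used elsewhere in the paper), with T the indicator for 3×CMF versus 6×CMF and z the vector of covariates from table 1, the adjusted model and the quantities derived from it can be written as:

```latex
h(t \mid T, z) = h_0(t)\,\exp\!\left(\beta T + \gamma^{\top} z\right),
\qquad
\mathrm{HR} = e^{\beta},
\qquad
\text{95\% CI: } \exp\!\left(\hat\beta \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta)\right),
\qquad
W = \frac{\hat\beta}{\widehat{\mathrm{SE}}(\hat\beta)}
```

Here h_0(t) is an unspecified baseline hazard, and the Wald statistic W is compared with the standard normal distribution. The unadjusted model is the special case without the covariate term.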

Propensity-score-based analysis.

In the nonrandomized portion of the study, a propensity-score-based analysis was performed. The propensity score is defined as the conditional probability of receiving one of the treatments under comparison, here 3×CMF, given the observed covariates (1). Then a stratified analysis of the treatment effect is performed using strata that are homogeneous with respect to the propensity score. Since the true propensity score has the property that treatment assignment and covariates are conditionally independent given the propensity score (1), within homogeneous strata, covariates should be balanced between treatment groups. Thus, a stratified analysis theoretically leads to unbiased estimation of the treatment effect, assuming that all confounders are accounted for.
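In symbols (our notation, introduced only for illustration), with T = 1 for 3×CMF, X the observed covariates, and s indexing the propensity score strata, the definition, the balancing property, and a stratified Cox model with stratum-specific baseline hazards read:

```latex
e(x) = \Pr(T = 1 \mid X = x),
\qquad
T \perp X \mid e(X),
\qquad
h_s(t \mid T) = h_{0s}(t)\,\exp(\beta T)
```

The treatment effect β is assumed common to all strata, whereas each stratum has its own baseline hazard h_{0s}(t).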

First, the propensity score is estimated by means of a logistic regression model for treatment assignment, 3×CMF versus 6×CMF, dependent on the covariates. We included for estimation of the propensity score only those covariates that showed an effect on treatment assignment with a p value smaller than 0.157, corresponding to the Akaike criterion (14). Second, patients are divided into strata based on the estimated propensity score, and we ascertain whether covariates are balanced between the treatment groups. Then the treatment effect is estimated using a stratified Cox regression model.
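The two steps can be illustrated with a deliberately simplified sketch. Note the substitution: instead of the logistic regression model described above, the sketch estimates the propensity score as the observed proportion treated within each covariate cell, which needs no fitting routine but generally gives different values from a main-effects logistic model. All names (`cellwise_propensity`, `stratify`, the record keys) and the toy data are hypothetical.

```python
from collections import defaultdict

def cellwise_propensity(records, covariate_keys, treatment_key):
    """Estimate P(treated | covariates) as the proportion treated per cell.

    This is a stand-in for the logistic model in the text (assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for r in records:
        cell = tuple(r[k] for k in covariate_keys)
        counts[cell][0] += r[treatment_key]
        counts[cell][1] += 1
    return {cell: treated / total for cell, (treated, total) in counts.items()}

def stratify(records, covariate_keys, scores):
    """Group patients into strata sharing the same estimated propensity score."""
    strata = defaultdict(list)
    for r in records:
        cell = tuple(r[k] for k in covariate_keys)
        strata[scores[cell]].append(r)
    return dict(strata)

# Hypothetical toy data: cmf3 = 1 means the patient received 3xCMF.
toy = (
    [{"nodes": "1-3", "tamoxifen": 0, "cmf3": t} for t in (1, 1, 0, 0)]
    + [{"nodes": "1-3", "tamoxifen": 1, "cmf3": t} for t in (1, 1, 1, 0)]
    + [{"nodes": ">9", "tamoxifen": 0, "cmf3": t} for t in (1, 0, 0, 0)]
)
scores = cellwise_propensity(toy, ["nodes", "tamoxifen"], "cmf3")
strata = stratify(toy, ["nodes", "tamoxifen"], scores)
```

Within each stratum, the treatment comparison would then be made by a stratified Cox model; that step requires a survival analysis package and is not shown here.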

The GT approach.

In the nonrandomized portion of the study, an analysis using a GT approach was performed. In the GT approach, the individually assigned treatment is considered to be confounded by indication, which means that patients may be selected to receive one of the treatments because of known or unknown prognostic factors (4). Whereas the first two approaches try to adjust only for known confounders, the GT approach also tries to eliminate bias arising from unknown confounders. This approach requires several assumptions (2), which are sometimes called instrumental-variable assumptions (3):

  1. The GT variable must be related to the treatment individually assigned, in order to have reasonable strength.

  2. The GT variable must be unrelated to observed and unobserved prognostic factors.

  3. The GT variable must be unrelated to outcome, except through pathways that operate via the treatment individually assigned.

Hospital practice in treating patients with 3×CMF or 6×CMF varied considerably among the 33 clinical centers that entered nonrandomized patients into the study (see table 2). We therefore used as the GT variable (3) the proportion of patients treated with 3×CMF at the respective center. This variable was estimated from the data on the nonrandomized patients. A causal diagram satisfying assumptions 1–3 with regard to the GT variable is given in figure 1. Assumptions 1–3 imply that the hospital where a patient is treated, and thus the probability of receiving 3×CMF (i.e., the GT variable), can be considered a pseudorandomized treatment choice. The key assumptions for valid inferences using the GT approach are similar to those in randomized controlled trials (15). In randomized controlled trials, the random assignment has no direct effect on the outcome (i.e., assumption 3), except via the close relation between the randomized treatment assignment and the actual treatment received (i.e., assumption 1). Like the randomized treatment assignment, the GT variable must be unrelated to observed and unobserved prognostic factors (i.e., assumption 2), which means that prognostic factors do not influence the patient's choice of a certain hospital.

FIGURE 1.

Causal diagram satisfying the assumptions regarding the grouped treatment (GT) variable used in the nonrandomized portion (n = 238) of a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997. 3×CMF, three cycles of cyclophosphamide-methotrexate-fluorouracil.

The effect of treatment on event-free survival will be estimated by the regression coefficient of the GT variable in a Cox regression model for event-free survival including only the GT variable as a covariate.
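Constructing the GT variable amounts to replacing each patient's own treatment by the treatment proportion at his or her center. A minimal sketch, with a hypothetical function name and toy data, might look like this:

```python
from collections import defaultdict

def grouped_treatment(patients):
    """Assign to each patient the proportion of patients treated with 3xCMF
    at the same center.

    patients: list of (center_id, received_3xcmf) pairs, received_3xcmf in {0, 1}
    Returns a list of GT values in the same order as the input.
    """
    n_total = defaultdict(int)
    n_3xcmf = defaultdict(int)
    for center, received in patients:
        n_total[center] += 1
        n_3xcmf[center] += received
    proportion = {c: n_3xcmf[c] / n_total[c] for c in n_total}
    return [proportion[center] for center, _ in patients]

# Hypothetical toy data: two centers with different treatment habits.
toy = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
       ("B", 0), ("B", 0), ("B", 1), ("B", 0)]
gt = grouped_treatment(toy)
```

Because the GT variable ranges from 0 to 1, the exponentiated Cox regression coefficient compares a center that treats all of its patients with 3×CMF with a center that treats none of them. The Cox fit itself is not shown here.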

Data storage and analysis were performed using the Statistical Analysis System (16).

RESULTS

Analysis of the randomized portion of the study

The results of the comparison between 3×CMF and 6×CMF in the 450 randomized patients are shown in figure 2 and table 3. The unadjusted comparison showed no difference between 3×CMF and 6×CMF. Since the major prognostic factors were well balanced between the randomized treatment arms (8, 9), the result remained unchanged when prognostic factors were adjusted for in the analysis.

TABLE 3.

Effect of treatment with three cycles of CMF* versus treatment with six cycles of CMF among randomized and nonrandomized breast cancer patients in unadjusted and adjusted analyses, using different methods of adjustment, German Breast Cancer Study Group, Germany, 1984–1997

Method of analysis Hazard ratio Standard error 95% confidence interval p value†
Randomized patients (n = 450; 262 events)
    Unadjusted‡ 1.077 0.124 0.845, 1.372 0.55
    Conventional adjustment for covariates§ 1.054 0.125 0.825, 1.345 0.67
Nonrandomized patients (n = 238; 138 events)
    Unadjusted‡ 0.693 0.173 0.494, 0.973 0.034
    Conventional adjustment for covariates§ 1.002 0.195 0.683, 1.470 0.99
    Stratified for propensity score¶ 0.987 0.192 0.677, 1.438 0.95
    Grouped-treatment variable# 0.758 0.280 0.438, 1.311 0.32

* CMF, cyclophosphamide-methotrexate-fluorouracil.

† p value from a two-sided Wald test in a Cox regression model.

‡ Cox regression model including a treatment indicator.

§ Cox regression model including a treatment indicator and the covariates listed in table 1.

¶ Cox regression model including a treatment indicator, stratified for six values of the estimated propensity score.

# Cox regression model including the grouped-treatment variable listed in the last column of table 2.

FIGURE 2.

Event-free survival rates by duration of chemotherapy in the randomized portion (n = 450) of a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997. a) Unadjusted Kaplan-Meier estimates; b) adjusted estimates from a Cox model, adjusted for the covariates listed in table 1. 3×CMF, three cycles of cyclophosphamide-methotrexate-fluorouracil; 6×CMF, six cycles of cyclophosphamide-methotrexate-fluorouracil.

Comparison of randomized and nonrandomized patients

Analysis of factors that influenced patients' consent to randomization revealed that the major factor was the hospital where the patient was informed about the study. Consent to randomization seemed to depend mainly on the doctor's effort to seek consent rather than on the patient (table 2). Prognostic factors were fairly well balanced between randomized and nonrandomized patients (table 1), and event-free survival rates of randomized and nonrandomized patients were nearly identical (figure 3).

FIGURE 3.

Event-free survival rates by randomization status in a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997.

Analysis of the nonrandomized portion of the study

Unadjusted analysis.

In an unadjusted comparison of the treatments, 3×CMF showed a higher event-free survival rate than did 6×CMF (figure 4, part a). The estimated hazard ratio was 0.693 (95 percent confidence interval (CI): 0.494, 0.973; p = 0.034) (table 3).
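The confidence interval and p value quoted above can be recomputed from the hazard ratio and standard error listed in table 3. The sketch below assumes (our reading; the paper does not say so explicitly) that the tabulated standard error refers to the log hazard ratio, and it uses the normal approximation underlying the Wald test. The function name is hypothetical.

```python
import math

def wald_summary(hazard_ratio, se_log_hr, z_crit=1.959964):
    """Return (lower, upper, p_value) for a hazard ratio.

    se_log_hr is assumed to be the standard error of log(hazard ratio).
    """
    log_hr = math.log(hazard_ratio)
    lower = math.exp(log_hr - z_crit * se_log_hr)
    upper = math.exp(log_hr + z_crit * se_log_hr)
    z = log_hr / se_log_hr
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lower, upper, p_value

# Unadjusted comparison in nonrandomized patients (values from table 3).
lower, upper, p_value = wald_summary(0.693, 0.173)
print(f"95% CI: {lower:.3f}, {upper:.3f}; p = {p_value:.3f}")
```

Under this assumption the recomputed interval and p value agree with the published ones up to rounding, which supports reading the standard errors in table 3 as being on the log scale.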

FIGURE 4.

Event-free survival rates by duration of chemotherapy in the nonrandomized portion (n = 238) of a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997. a) Unadjusted Kaplan-Meier estimates; b) adjusted estimates from a Cox model, adjusted for the covariates listed in table 1. In part b, the dotted line is not visible because the dotted line and the solid line are superimposed upon each other. 3×CMF, three cycles of cyclophosphamide-methotrexate-fluorouracil; 6×CMF, six cycles of cyclophosphamide-methotrexate-fluorouracil.

In patients with good prognostic factors such as fewer than four positive axillary lymph nodes or a tumor smaller than 3 cm, the rate of treatment with 3×CMF was higher than that in patients with a poor prognosis (table 4). Additionally, the choice of treatment with tamoxifen was related to length of chemotherapy, with a higher 3×CMF treatment rate in patients receiving tamoxifen. This emphasizes the necessity of adjusting the comparison of the treatment arms for covariates.

TABLE 4.

Relation of covariates to individual treatment assignment and to proportion of patients treated with 3×CMF* at the respective clinical center (grouped-treatment variable) in the nonrandomized portion (n = 238) of a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997

Factor Proportion of patients treated with 3×CMF Mean proportion of patients treated with 3×CMF at the respective clinical center (mean grouped-treatment variable) 
Menopausal status   
    Premenopausal 0.44 0.43 
    Postmenopausal 0.48 0.49 
No. of positive lymph nodes   
    1–3 0.60 0.53 
    4–9 0.33 0.38 
    >9 0.26 0.39 
Tumor size (mm)   
    ≤20 0.53 0.41 
    21–30 0.50 0.49 
    >30 0.36 0.46 
Tumor grade   
    I 0.46 0.45 
    II 0.50 0.49 
    III 0.38 0.39 
Estrogen receptor status   
    Positive 0.50 0.49 
    Negative 0.39 0.41 
Progesterone receptor status   
    Positive 0.47 0.46 
    Negative 0.45 0.46 
Treatment with tamoxifen   
    No 0.40 0.45 
    Yes 0.61 0.49 

* 3×CMF, three cycles of cyclophosphamide-methotrexate-fluorouracil.

Multiple Cox regression analysis with adjustment for covariates.

From a Cox regression analysis including the prognostic factors for adjustment, the hazard ratio for 3×CMF versus 6×CMF was estimated as 1.002 (95 percent CI: 0.683, 1.470; p = 0.99) (table 3). Figure 4, part b, shows the corresponding adjusted event-free survival rates. After adjustment for prognostic factors, no difference was observed between treatment arms in nonrandomized patients.

Propensity-score-based analysis.

Logistic regression analysis revealed that the number of positive axillary lymph nodes and the decision to undergo tamoxifen treatment were both related to the choice of 3×CMF versus 6×CMF (p < 0.157; results not shown in detail). Therefore, the propensity score was estimated with a logistic regression model including these two covariates (table 5). Thus, the estimated propensity score had only six different values, defined by the categories of the number of positive axillary lymph nodes and the decision to undergo tamoxifen treatment. Its median value was 0.54 among patients receiving 3×CMF (range, 0.22–0.75) and 0.41 among patients receiving 6×CMF (range, 0.22–0.75), showing considerable overlap.
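The selection threshold of 0.157 can be made plausible with a short calculation. For a term with a single parameter, the Akaike criterion prefers the larger model when the test statistic exceeds 2, and the upper tail probability of a chi-square distribution with 1 degree of freedom at 2 is about 0.157. The sketch below covers only this one-parameter case; the function name is ours.

```python
import math

def chi2_1df_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    Uses P(Z**2 > x) = P(|Z| > sqrt(x)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2))

# The AIC penalty is 2 per parameter, so a one-parameter term is retained
# when its test statistic exceeds 2.
aic_p_threshold = chi2_1df_upper_tail(2.0)
print(f"{aic_p_threshold:.3f}")
```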

TABLE 5.

Effects of prognostic factors on treatment assignment (three cycles of CMF* vs. six cycles of CMF) in the nonrandomized portion (n = 238) of a comprehensive cohort study of breast cancer, German Breast Cancer Study Group, Germany, 1984–1997

Factor                          Odds ratio   95% confidence interval   p value
No. of positive lymph nodes
    1–3                         Reference                              <0.0001
    4–9                         0.29         0.16, 0.55
    >9                          0.23         0.10, 0.51
Treatment with tamoxifen
    No                          Reference                              0.003
    Yes                         2.54         1.38, 4.68
* CMF, cyclophosphamide-methotrexate-fluorouracil.

p value from a two-sided Wald test in a logistic regression model.

Patients were divided into the six strata defined by the values of the included covariates, leading to perfect balance of these covariates. From a Cox regression analysis stratified for the six propensity score strata, the hazard ratio for 3×CMF versus 6×CMF was estimated as 0.987 (95 percent CI: 0.677, 1.438; p = 0.95) (table 3).
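As a minimal sketch of why a propensity score built from two categorical covariates takes only six values, the snippet below combines the odds ratios reported in table 5 with a baseline odds. The baseline odds (1–3 positive nodes, no tamoxifen) are not reported in the paper; the value 1.2 is a hypothetical choice made only for illustration, and the names `BASELINE_ODDS` and `propensity` are likewise illustrative.

```python
from itertools import product

# Odds ratios for receiving 3xCMF (vs. 6xCMF) as reported in table 5;
# reference categories are set to 1.0.
OR_NODES = {"1-3": 1.00, "4-9": 0.29, ">9": 0.23}
OR_TAMOXIFEN = {"no": 1.00, "yes": 2.54}

# ASSUMPTION: the baseline odds are not given in the paper.
# 1.2 is a hypothetical value used for illustration only.
BASELINE_ODDS = 1.2


def propensity(nodes, tamoxifen):
    """Probability of 3xCMF implied by a main-effects logistic model."""
    odds = BASELINE_ODDS * OR_NODES[nodes] * OR_TAMOXIFEN[tamoxifen]
    return odds / (1.0 + odds)


# One score per covariate cell: 3 node categories x 2 tamoxifen categories.
scores = {cell: propensity(*cell) for cell in product(OR_NODES, OR_TAMOXIFEN)}

# Print the cells ordered by their propensity score.
for cell, ps in sorted(scores.items(), key=lambda kv: kv[1]):
    print(cell, round(ps, 3))
```

Because each covariate cell maps to exactly one score, stratifying on the score is the same as stratifying on the six cells, which is why the covariates are perfectly balanced within strata.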

The GT approach.

Hospital practices varied considerably in the choice of 3×CMF versus 6×CMF in nonrandomized patients (table 2). Thus, an important prerequisite for the use of the GT approach (4) was met in our study. The mean of the GT variable was 0.66 in patients who received 3×CMF and 0.29 in patients who received 6×CMF. The rank correlation between the individually assigned treatment variable and the GT variable was 0.61, indicating that assumption 1 was fulfilled for the GT variable.
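The construction of the GT variable can be sketched on toy data. The patient records below are hypothetical and are not taken from the study. The sketch also assumes that each patient's own treatment is counted in her center's proportion, which the paper does not state explicitly.

```python
from collections import defaultdict

# Hypothetical toy data: (center id, individual treatment),
# where 1 = 3xCMF and 0 = 6xCMF.
patients = [
    ("A", 1), ("A", 1), ("A", 1), ("A", 0),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
    ("C", 1), ("C", 1), ("C", 0), ("C", 0),
]

# center -> [number treated with 3xCMF, total number of patients]
counts = defaultdict(lambda: [0, 0])
for center, treated in patients:
    counts[center][0] += treated
    counts[center][1] += 1

# GT variable: proportion of patients treated with 3xCMF at each center
# (ASSUMPTION: the index patient is included in the proportion).
gt_by_center = {c: n3 / n for c, (n3, n) in counts.items()}

# Every patient inherits the value of her center.
gt = [gt_by_center[center] for center, _ in patients]


def mean_gt(arm):
    """Mean of the GT variable among patients who received the given arm."""
    values = [g for g, (_, treated) in zip(gt, patients) if treated == arm]
    return sum(values) / len(values)


print("mean GT, 3xCMF:", round(mean_gt(1), 3))
print("mean GT, 6xCMF:", round(mean_gt(0), 3))
```

A clear difference between the two means, as in the study (0.66 vs. 0.29), indicates that the GT variable is related to the treatment actually received.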

Table 4 shows the relations of the covariates to individual treatment assignment and the GT variable. Imbalances observed between the individual treatment variable and the covariates were substantially reduced when the GT variable was examined in relation to the covariates. A difference of more than 0.1 in the mean GT variable between covariate categories remained only for the number of positive axillary lymph nodes. This indicates that assumption 2 for the GT variable seems to be reasonably fulfilled for most of the observed prognostic factors, although some differences between clinical centers may still exist regarding the disease status of patients. The large variation in the proportion of 3×CMF also supports the assumption that the heterogeneous treatment assignment at the different centers is influenced more by hospital practice than by individual covariates.
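The balance check described above can be reproduced from the table 4 rows listed earlier. The sketch reads the first numeric column as the proportion of patients receiving 3×CMF and the second as the mean of the GT variable, an interpretation that is assumed here because it agrees with the text. The names `TABLE4` and `spread` are illustrative.

```python
# Values transcribed from table 4.
# ASSUMPTION: each pair is (proportion receiving 3xCMF, mean GT variable).
TABLE4 = {
    "No. of positive lymph nodes": {
        "1-3": (0.60, 0.53), "4-9": (0.33, 0.38), ">9": (0.26, 0.39)},
    "Tumor size (mm)": {
        "<=20": (0.53, 0.41), "21-30": (0.50, 0.49), ">30": (0.36, 0.46)},
    "Tumor grade": {
        "I": (0.46, 0.45), "II": (0.50, 0.49), "III": (0.38, 0.39)},
    "Estrogen receptor status": {
        "positive": (0.50, 0.49), "negative": (0.39, 0.41)},
    "Progesterone receptor status": {
        "positive": (0.47, 0.46), "negative": (0.45, 0.46)},
    "Treatment with tamoxifen": {
        "no": (0.40, 0.45), "yes": (0.61, 0.49)},
}

THRESHOLD = 0.10  # imbalance criterion used in the text


def spread(factor, column):
    """Largest minus smallest category value, rounded to table precision."""
    values = [cell[column] for cell in TABLE4[factor].values()]
    return round(max(values) - min(values), 2)


# Factors whose categories differ by more than the threshold.
imbalanced_individual = [f for f in TABLE4 if spread(f, 0) > THRESHOLD]
imbalanced_gt = [f for f in TABLE4 if spread(f, 1) > THRESHOLD]

print("individual treatment:", imbalanced_individual)
print("GT variable:", imbalanced_gt)
```

Rounding to two decimals before comparing avoids floating-point noise at the 0.10 boundary (e.g., tumor grade, where the difference in mean GT is exactly 0.10 and therefore not flagged).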

Using the GT variable, the hazard ratio for 3×CMF versus 6×CMF was estimated as 0.758 (95 percent CI: 0.438, 1.311; p = 0.32) (table 3). This seems to have eliminated some, but not all, of the bias: the estimated effect of 3×CMF versus 6×CMF was less extreme than in the unadjusted analysis but still much larger than in the other adjusted analyses and in the randomized portion of the study.

DISCUSSION

In this paper, we conducted separate analyses of the randomized and nonrandomized portions of a comprehensive cohort study in breast cancer patients. Our intention was to present and compare different methods proposed for estimation of causal effects in nonrandomized studies and to consider their ability to reproduce the correct unbiased result (which is known from the randomized portion of the study). The underlying assumption that the randomized and nonrandomized patients were comparable seems to have been fulfilled in our study.

The analysis of the randomized portion of the study has been published previously (9). An unadjusted analysis of the treatment effect and an analysis adjusted for the known covariates showed almost identical results—namely, no difference between 3×CMF and 6×CMF with respect to event-free survival, with an estimated hazard ratio close to 1.

An unadjusted analysis of the nonrandomized portion of the study that did not take covariates into account showed an apparent superiority of treatment arm 3×CMF as compared with treatment arm 6×CMF, with an estimated hazard ratio of 0.693 (95 percent CI: 0.494, 0.973; p = 0.034); this contrasted with the results of the randomized portion of the study. However, obvious imbalances of the covariates between treatment arms indicated confounding. With a conventional Cox regression analysis adjusting for known prognostic factors and treatment with tamoxifen, we could correct for this bias. The resulting estimate of the treatment effect now agreed with that obtained in the randomized portion of the study.

An analysis based on the propensity score was also able to correct for the bias, with a resulting treatment effect estimate that was also close to 1. Recently, investigations have been performed on which variables should be included for optimal estimation of the propensity score (17, 18). This has not been considered in detail here. We selected those factors that showed an influence on treatment assignment, namely the number of positive axillary lymph nodes and the decision to undergo tamoxifen treatment. We stratified the analysis for the resulting six propensity score strata. Other methods of adjusting for the propensity score have been proposed in the literature (e.g., matching for the propensity score or propensity-score-based weighting), which may lead to quite different estimated treatment effects in the case of a nonuniform treatment effect across different values of the propensity score (19). Our procedure is close to an approach that matches for the propensity score, because only patients with an identical propensity score are grouped in a stratum. In a sensitivity analysis, we additionally estimated separate treatment effects in the six strata and found no indication of a nonuniform effect in our study. We thus may conclude that our result is not much influenced by the choice of method of adjusting for the propensity score. We performed an additional sensitivity analysis in which an indicator of treatment center was included for estimation of the propensity score. The analysis of the treatment effect was then stratified using quintiles of the estimated propensity score, producing a result similar to the one presented in table 3 (data not shown).

The GT variable used in our study, the proportion of patients treated with 3×CMF at the respective clinical center, was not able to remove the bias due to confounding. The estimated effect of 3×CMF versus 6×CMF was less extreme than, but still close to, that obtained in the unadjusted analysis. This may be due to the residual imbalance with respect to the number of positive axillary lymph nodes, which is especially problematic because the number of positive axillary lymph nodes is the strongest prognostic factor in this patient population (9). For situations where the GT variable is related to known prognostic factors, it has been proposed that these factors be included as individual covariates for adjustment in the model relating the GT variable to the outcome (3). However, this must also be regarded as problematic, because it may introduce confounding with other factors in an uncontrolled way. In our study, including the number of positive axillary lymph nodes as an individual covariate in the regression model led to an estimated effect for the GT variable close to 1, in good agreement with the unbiased effect from the randomized portion of the study. Another consideration concerns the effect of clustering of the GT variable across clinical centers. Table 2 shows that no important clustering is present, and an additional sensitivity analysis adjusting the GT variable for the number of patients per center showed a result similar to the one presented in table 3.

The tamoxifen treatment was dealt with in the same way as the prognostic factors, although, in a strict sense, it is not a variable causing confounding by indication but rather must be considered a part of the hospital practice. This was not considered problematic, especially because its imbalance in relation to the individual treatment with 3×CMF or 6×CMF could be substantially reduced by considering the GT variable.

Note that the estimated standard error of the effect of the GT variable is larger than the estimated standard error of the estimated treatment effect when analyzing the individually assigned treatment. This was also observed in other, similar applications (15, 20) and in a simulation study on the properties of the GT approach (3). This is due to the reduced variability of the GT variable as compared with the individual treatment variable. Some authors refer to this as a loss in precision due to the use of an imperfect surrogate (3, 15). Another point sometimes mentioned is that a considerable correlation between patients at a given center can lead to larger confidence intervals, and it has been proposed that statistical methods accounting for this correlation be used (21). However, this correlation would be relevant not only for the analysis using the GT variable but also for the analysis using the individual treatment variable. Nevertheless, if this were considered necessary, it could be accomplished in survival analysis by including an appropriate frailty term in the model (22).
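The loss in precision can be made concrete by back-calculating standard errors from the confidence intervals reported in the text. This is a rough sketch that assumes the intervals are Wald-type intervals, symmetric on the log hazard ratio scale; the paper does not state how they were computed, and the helper `se_log_hr` is illustrative.

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal distribution

# 95% confidence limits for the hazard ratio (3xCMF vs. 6xCMF)
# in the nonrandomized patients, as reported in the text.
reported = {
    "unadjusted": (0.494, 0.973),
    "Cox regression, adjusted": (0.683, 1.470),
    "propensity score strata": (0.677, 1.438),
    "GT variable": (0.438, 1.311),
}


def se_log_hr(lower, upper):
    """Standard error of the log hazard ratio implied by a Wald-type CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z)


se = {name: se_log_hr(*limits) for name, limits in reported.items()}

for name, value in se.items():
    print(f"{name}: SE(log HR) = {value:.3f}")
```

Under this assumption, the implied standard error for the GT analysis is the largest of the four, in line with the statement above.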

A further point of interest is interpretation of the effect estimates resulting from the different methods. Adjustment of the analyses for covariates results in estimates of the treatment effect that are conditional on the observed covariates and marginal with respect to unobserved covariates, whereas the analyses unadjusted for covariates, the analysis based on the propensity score (23), and the GT approach (3, 20) all result in estimates of the treatment effect that are marginal with respect to observed and unobserved covariates. The marginal effect is interpreted on the population level; it is the change in the hazard of the population if all patients were to receive one treatment in comparison with all patients' receiving the other treatment. The conditional effect is interpreted more on the individual level; it is the change in hazard for a patient if she receives one treatment rather than the other, conditional on the measured covariates. In the proportional hazards model in general, these parameters do not coincide (3, 24–26). In expectation, the marginal effect is more conservative (i.e., closer to unity) than the conditional effect. In many applied settings where both conventional regression models and propensity-score-based methods are compared, this difference in parameters is often not considered.
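The difference between the conditional and the marginal hazard ratio can be illustrated numerically. The sketch below uses entirely hypothetical values: a constant baseline hazard, a conditional hazard ratio of 0.5 for treatment, and a binary prognostic covariate with hazard ratio 4 that is equally distributed in both arms (i.e., no confounding). It computes the population hazard ratio at single time points and is not the estimate a marginal Cox model would return, which averages over follow-up time.

```python
import math

# All values are hypothetical and chosen for illustration only.
BETA = math.log(0.5)   # conditional log hazard ratio for treatment
GAMMA = math.log(4.0)  # log hazard ratio of a binary covariate Z
P_Z = 0.5              # prevalence of Z = 1, identical in both arms
BASE = 1.0             # constant baseline hazard


def marginal_hazard(t, x):
    """Population hazard at time t in arm x, averaged over Z among survivors."""
    num = 0.0
    den = 0.0
    for z, p in ((0, 1.0 - P_Z), (1, P_Z)):
        lam = BASE * math.exp(BETA * x + GAMMA * z)  # conditional hazard
        surv = math.exp(-lam * t)                    # conditional survival
        num += p * lam * surv
        den += p * surv
    return num / den


def marginal_hr(t):
    """Marginal hazard ratio (treated vs. untreated) at time t."""
    return marginal_hazard(t, 1) / marginal_hazard(t, 0)


for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t = {t}: marginal HR = {marginal_hr(t):.3f}")
```

At time 0 the marginal ratio equals the conditional ratio, but afterwards high-risk patients are depleted faster in the untreated arm, so the marginal ratio moves toward 1 even though the covariate is balanced at baseline.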

We conclude that in our study, the propensity-score-based approach does not yield any obvious advantages over the conventional regression approach, as has also been stressed in other recent applications and reviews (20, 27–29). The conventional regression analysis and the propensity-score-based approach can adjust only for measured confounders. The GT approach additionally tries to reduce bias due to unmeasured variables' being confounders by indication on the individual level. This requires the assumption that these variables are not related to the GT variable—that is, are not confounders by indication on the ecologic level. In our study, this means that prognostic factors did not influence the patient's choice of a certain hospital because of its practice concerning the frequency of giving the treatments under comparison. We conjecture that this may be fulfilled more often than “no confounding” on the individual level. Nevertheless, with the GT approach in our study, we did not succeed in removing the confounding completely.

The comprehensive cohort study design offers an ideal situation for investigating the properties of the different approaches for analyzing nonrandomized observational studies mentioned above. Because the randomized and nonrandomized patients are part of the same study, the results obtained in the nonrandomized patients can be compared directly with the unbiased results obtained in the randomized patients.

Abbreviations

  • CI: confidence interval
  • CMF: cyclophosphamide-methotrexate-fluorouracil
  • GT: grouped treatment

This work was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation), Research Unit FOR 534.

Conflict of interest: none declared.

References

1. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.
2. Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000;29:722-9.
3. Johnston SC, Henneman T, McCulloch CE, et al. Modelling treatment effects on binary outcomes with grouped treatment variables and individual covariates. Am J Epidemiol. 2002;156:753-60.
4. Wen SW, Kramer MS. Use of ecological studies in the assessment of intended treatment effects. J Clin Epidemiol. 1999;52:7-12.
5. Olschewski M, Scheurlen H. Comprehensive cohort study: an alternative to randomized consent design in a breast preservation trial. Methods Inf Med. 1985;24:131-4.
6. Schmoor C, Olschewski M, Schumacher M. Randomized and nonrandomized patients in clinical trials: experiences with comprehensive cohort studies. Stat Med. 1996;15:263-71.
7. D'Agostino RB Jr, D'Agostino RB Sr. Estimating treatment effects using observational data. JAMA. 2007;297:314-6.
8. Schumacher M, Bastert G, Bojar H, et al. Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group (GBSG). J Clin Oncol. 1994;12:2086-93.
9. Sauerbrei W, Bastert G, Bojar H, et al. Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients: an update based on 10 years follow up. German Breast Cancer Study Group (GBSG). J Clin Oncol. 2000;18:94-101.
10. Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials. 1996;17:343-6.
11. Cox DR. Regression models and life tables (with discussion). J R Stat Soc Ser B. 1972;34:187-220.
12. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53:457-81.
13. Makuch RW. Adjusted survival curve estimation using covariates. J Chronic Dis. 1982;35:437-43.
14. Teräsvirta T, Mellin I. Model selection criteria and model selection tests in regression models. Scand J Stat. 1986;13:159-71.
15. Schneeweiss S, Solomon DH, Wang PS, et al. Simultaneous assessment of short-term gastrointestinal benefits and cardiovascular risks of selective cyclooxygenase 2 inhibitors and nonselective nonsteroidal antiinflammatory drugs: an instrumental variable analysis. Arthritis Rheum. 2006;54:3390-8.
16. SAS Institute, Inc. SAS language, version 8. Cary, NC: SAS Institute, Inc; 1999.
17. Brookhart MA, Schneeweiss S, Rothman KJ, et al. Variable selection for propensity score models. Am J Epidemiol. 2006;163:1149-56.
18. Austin PC, Grootendorst P, Anderson GM. A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study. Stat Med. 2007;26:734-53.
19. Kurth T, Walker AM, Glynn RJ, et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol. 2006;163:262-70.
20. Stukel TA, Fisher ES, Wennberg DE, et al. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007;297:278-85.
21. Johnston SC. Combining ecological and individual variables to reduce confounding by indication: case study, subarachnoid hemorrhage treatment. J Clin Epidemiol. 2000;53:1236-41.
22. Clayton DG, Cuzick J. Multivariate generalizations of the proportional hazards model (with discussion). J R Stat Soc Ser A. 1985;148:82-117.
23. Senn S, Graf E, Caputo A. Stratification for the propensity score compared with linear regression techniques to assess the effect of treatment or exposure. Stat Med. 2007;26:5529-44.
24. Schumacher M, Olschewski M, Schmoor C. The impact of heterogeneity on the comparison of survival times. Stat Med. 1987;6:773-84.
25. Austin PC, Grootendorst P, Normand SLT, et al. Conditioning on the propensity score can result in biased estimation of common measures of treatment effects: a Monte Carlo study. Stat Med. 2007;26:754-68.
26. Johnston KM, Gustafson P, Levy AR, et al. Use of instrumental variables in the analysis of generalized linear models in the presence of unmeasured confounding with applications to epidemiological research. Stat Med. 2007 Sep 10 [Epub ahead of print].
27. Shah B, Laupacis A, Hux J, et al. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol. 2005;58:550-9.
28. Stürmer T, Joshi M, Glynn RJ, et al. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol. 2006;59:437-47.
29. Stürmer T, Schneeweiss S, Brookhart MA, et al. Analytic strategies to adjust confounding using exposure propensity scores and disease risk scores: nonsteroidal antiinflammatory drugs and short-term mortality in the elderly. Am J Epidemiol. 2005;161:891-8.