## Abstract

Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.

Binary outcomes are routinely encountered in epidemiologic research. Traditionally, the effect of an exposure or intervention on such outcomes has been expressed in the form of an odds ratio. Although the odds ratio has some advantages, such as being directly estimable in case-control studies, it may be difficult for nonepidemiologists to interpret and is often misinterpreted as a relative risk (1). As a result, arguments have been made in favor of estimating relative risks for prospective studies (1–3), rather than odds ratios.

Relative risks can be estimated by log binomial regression (4), a generalized linear model that combines a log link with a binomial distribution. The model may be written as

$$\log(\pi_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki}, \tag{1}$$

where π_{i} is the probability of experiencing the outcome of interest for subject *i*, and *X*_{1i}, *X*_{2i}, …, *X*_{ki} are predictor variables. This model has the flexibility to accommodate both categorical and continuous predictors but may fail to converge (5). Convergence problems occur during the iterative estimation procedure when the right-hand side of equation 1 exceeds 0 for some subject(s), based on the current values of the parameter estimates, since the left-hand side of equation 1 must be less than or equal to zero for π_{i} to be a valid probability.
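The convergence condition can be made concrete with a minimal Python sketch that evaluates the linear predictor of a log-link model for a few subjects and flags those whose implied probability would exceed 1. The coefficient values, the subject data, and the helper name `invalid_subjects` are illustrative assumptions, not taken from the article.

```python
import math

def invalid_subjects(beta, rows):
    """Return indices of subjects whose linear predictor exceeds 0.

    beta: coefficients [b0, b1, ..., bk] (hypothetical values).
    rows: one list of predictor values [x1, ..., xk] per subject.
    Under a log link, pi = exp(eta), so eta > 0 implies pi > 1.
    """
    flagged = []
    for idx, x in enumerate(rows):
        eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        if eta > 0:
            flagged.append(idx)
    return flagged

# Hypothetical current estimates: baseline risk 0.2, relative risk 2 for a
# binary exposure, relative risk 2 per unit of a continuous covariate.
beta = [math.log(0.2), math.log(2.0), math.log(2.0)]
subjects = [[0, 0.5], [1, 0.5], [1, 1.5]]
print(invalid_subjects(beta, subjects))
```

Only the third subject is flagged here, because an exposed subject with a large covariate value pushes the linear predictor above 0. This is in line with the observation that continuous covariates tend to cause more trouble than binary ones.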

To overcome convergence problems, researchers have suggested a number of alternative methods for estimating relative risks (5–8), including the popular modified Poisson regression approach (5). This method also uses a log link and, hence, has the same form as equation 1 but applies a Poisson distribution to the data, rather than a binomial distribution. It produces consistent estimates of the parameters in equation 1 but inconsistent variances, since the variance under a Poisson model is larger than the variance under a binomial model unless the outcome is rare (9, 10). Robust variance estimation is therefore used to avoid overestimating standard errors of parameter estimates (5). The modified Poisson regression approach is now highly cited (466 citations in the *Thomson Reuters Web of Knowledge* (formerly *ISI Web of Knowledge*) as of January 10, 2011) and has been applied across a broad range of observational (11, 12) and intervention (13, 14) studies.
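For the simplest case of independent data and a single binary exposure, the effect of the robust variance can be written in closed form. The sketch below compares the Poisson model-based variance of the log relative risk with the sandwich variance, which in this special case reduces to the usual large-sample variance of a log risk ratio. The counts and the function names are hypothetical, and the sketch ignores clustering and covariates.

```python
import math

def log_rr_variances(events_exp, n_exp, events_unexp, n_unexp):
    """Relative risk plus model-based and robust variances of log(RR).

    Valid only for independent data with one binary exposure and at
    least one event in each group.
    """
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    var_model = 1 / events_exp + 1 / events_unexp        # Poisson assumption
    var_robust = var_model - 1 / n_exp - 1 / n_unexp     # sandwich estimator
    return rr, var_model, var_robust

def wald_ci(rr, var_log_rr, z=1.96):
    """Confidence interval computed on the log scale and back-transformed."""
    half = z * math.sqrt(var_log_rr)
    return rr * math.exp(-half), rr * math.exp(half)

# Hypothetical counts: 40/200 events among exposed, 20/200 among unexposed.
rr, var_model, var_robust = log_rr_variances(40, 200, 20, 200)
print(rr, wald_ci(rr, var_model), wald_ci(rr, var_robust))
```

The model-based interval is wider than the robust one, and the gap shrinks as the outcome becomes rarer, since the subtracted terms `1 / n_exp` and `1 / n_unexp` become small relative to the reciprocal event counts.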

Modified Poisson regression was proposed in the context of independent data and has been shown both analytically and by simulation to be appropriate in this setting (5, 9, 10, 15–18). Clustering is present in many prospective studies and may result from repeated measurements taken on the same subject over time (e.g., presence or absence of depressive symptoms at multiple psychological assessments) or measurements taken on multiple subjects within a group (e.g., presence or absence of postoperative infection among patients within hospitals). The need to investigate the performance of modified Poisson regression in the context of clustered data was identified in 2004 when the method was first suggested (5), but this still remains to be done.

Despite a lack of evidence supporting the use of modified Poisson regression for clustered prospective data, application of this method to such data is not uncommon (19, 20). Standard errors are often corrected by using generalized estimating equations (GEEs), which use a form of robust variance estimation and take the clustering into account (21), rather than the usual robust variance estimation approach applied to independent data. Investigation of the performance of modified Poisson regression combined with GEEs is essential to determine whether the results of previous clustered studies using this method are valid, and whether this method should continue to be applied to clustered data in the future.
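The way a GEE-type robust variance takes clustering into account can be sketched for the simplest possible case. The Python code below assumes a single binary exposure and an independence working correlation (not the exchangeable structure used in this article), so that the point estimate is just the ratio of observed risks; the variance is obtained by summing score contributions within each cluster before forming the sandwich. The function name and the toy data are hypothetical, and no small-sample correction is applied.

```python
from collections import defaultdict

def cluster_robust_log_rr(data):
    """Estimate a relative risk and the cluster-robust variance of its log.

    data: list of (cluster_id, exposure, outcome), exposure/outcome in {0, 1}.
    Assumes at least one event in each exposure group.
    """
    n, events = [0, 0], [0, 0]
    for _, x, y in data:
        n[x] += 1
        events[x] += y
    mu = [events[0] / n[0], events[1] / n[1]]  # fitted risk per exposure group

    # "Bread": inverse of A = sum of mu * x x^T with x = (1, exposure).
    # Only the exposure row of the inverse is needed for Var(log RR).
    bread_row = [-1 / events[0], 1 / events[0] + 1 / events[1]]

    # "Meat": sum over clusters of the outer product of summed score vectors.
    scores = defaultdict(lambda: [0.0, 0.0])
    for cluster, x, y in data:
        resid = y - mu[x]
        scores[cluster][0] += resid
        scores[cluster][1] += resid * x
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    var_log_rr = sum(bread_row[i] * meat[i][j] * bread_row[j]
                     for i in range(2) for j in range(2))
    return mu[1] / mu[0], var_log_rr

# Hypothetical toy data: 10 unexposed (2 events) and 10 exposed (4 events).
subjects = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 0)] * 6 + [(1, 1)] * 4
singletons = [(i, x, y) for i, (x, y) in enumerate(subjects)]
# Same subjects, each duplicated inside its own cluster.
pairs = [(i, x, y) for i, (x, y) in enumerate(subjects) for _ in range(2)]
print(cluster_robust_log_rr(singletons))
print(cluster_robust_log_rr(pairs))
```

In this toy example both calls give a variance of about 0.55. Duplicating every subject inside its cluster adds no information, and the cluster-level sandwich reflects that, whereas treating the 40 duplicated rows as independent would halve the variance.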

The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using GEEs to account for clustering. We consider both intervention and observational studies where clustering is present due to observations made on multiple subjects within a group.

## MATERIALS AND METHODS

To evaluate the performance of modified Poisson regression for clustered data, we conducted a simulation study. For each simulation scenario, 1,000 data sets were generated with 20 or 50 clusters of various sizes according to a Poisson distribution with a mean of 25 or 10, respectively, to give an average total sample size of 500. Half of the subjects were assigned to the treatment/exposed group, while the other half were assigned to the control/unexposed group. Treatment/exposure group assignment occurred at the individual level, independent of the cluster. Binary outcomes were generated with probability π_{ij} = exp(β_{0} + β_{1}*X*_{1ij} + β_{2}*X*_{2ij} + *u*_{i}) for subject *j* in cluster *i*, where *X*_{1ij} is a binary indicator variable for treatment/exposure, *X*_{2ij} is a binary or continuous covariate, and *u*_{i} is a normally distributed random cluster effect with mean 0 and variance 0.1 or 0.2, used to induce clustering. This corresponds to an intracluster correlation coefficient (ICC) of between 0.01 and 0.15, depending on the treatment and covariate effects (22). ICCs of this magnitude are typical for clustered studies in practice (23). For each simulation scenario, the expected baseline risk was 0.1 or 0.2, while the treatment/exposure and covariate relative risks were 1, 1.25, or 2. If π_{ij} exceeded 1 for a given combination of values for the treatment/exposure group, covariate, and random cluster effect, new values of the covariate and/or random cluster effect were generated until π_{ij} < 1.
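The data-generating steps described above can be sketched in Python as follows. This is a rough illustration under several stated assumptions rather than the authors' SAS code: the outcome probability is taken to have the log-linear form exp(β0 + β1·X1 + β2·X2 + u), the intercept is calibrated so that the expected baseline risk equals the target, the binary covariate is drawn as an independent Bernoulli(0.5) variable instead of from a beta-binomial model, and an invalid probability triggers a redraw of the whole cluster's covariates and random effect. All function and parameter names are hypothetical.

```python
import math
import random

def poisson_draw(rng, mean):
    """Knuth's multiplication method for a Poisson variate (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_dataset(rng, n_clusters=20, mean_size=25, baseline_risk=0.2,
                     rr_trt=1.25, rr_cov=2.0, var_u=0.2):
    """Return rows of (cluster, x1, x2, probability, outcome)."""
    # Assumption: E[exp(b0 + u)] = baseline_risk for normally distributed u.
    b0 = math.log(baseline_risk) - var_u / 2
    b1, b2 = math.log(rr_trt), math.log(rr_cov)
    sizes = [poisson_draw(rng, mean_size) for _ in range(n_clusters)]
    total = sum(sizes)
    # Individual-level assignment: half treated, independent of cluster.
    trt = [1] * (total // 2) + [0] * (total - total // 2)
    rng.shuffle(trt)
    rows, pos = [], 0
    for cluster, size in enumerate(sizes):
        x1s = trt[pos:pos + size]
        pos += size
        while True:  # redraw until every probability in the cluster is valid
            u = rng.gauss(0.0, math.sqrt(var_u))
            x2s = [int(rng.random() < 0.5) for _ in range(size)]
            probs = [math.exp(b0 + b1 * x1 + b2 * x2 + u)
                     for x1, x2 in zip(x1s, x2s)]
            if all(p < 1 for p in probs):
                break
        for x1, x2, p in zip(x1s, x2s, probs):
            rows.append((cluster, x1, x2, p, int(rng.random() < p)))
    return rows

data = simulate_dataset(random.Random(2011))
print(len(data), sum(row[4] for row in data))
```

Redrawing values whenever a probability reaches 1 truncates the distribution of the random effects slightly, so the realized baseline risk in such a sketch is only approximately equal to the target.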

Data were simulated under both intervention and observational study designs. For the intervention study scenarios, the covariate was generated independently of the treatment group assignment. Binary covariates were generated by using a beta-binomial model with an average cluster-specific prevalence of 0.5 and an ICC of 0.05 (24). Continuous covariates were generated by using the model *X*_{2ij} = μ + *a*_{i} + *e*_{ij}, with average cluster-specific mean μ = 0.5 and normally distributed random cluster effects (*a*_{i}) and error terms (*e*_{ij}), each with mean 0. Variances were chosen such that the total variance was 0.25, and the ICC was 0.05. For the observational study scenarios, the covariate depended on the exposure status to induce confounding. Binary covariates were generated with prevalence 0.4 for nonexposed subjects and 0.6 for exposed subjects, as well as an ICC of 0.05 by using the method of Qaqish (25). Continuous covariates were generated as for the intervention study but with an average cluster-specific mean of μ = 0.4 for nonexposed subjects and μ = 0.6 for exposed subjects.
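As a worked illustration of the continuous covariate model, the sketch below splits the total variance into between-cluster and within-cluster components and then draws covariate values as a mean plus a cluster effect plus an error term. It assumes the standard definition of the ICC as the between-cluster variance divided by the total variance; the function names and the cluster sizes are hypothetical.

```python
import math
import random

def variance_components(total_var, icc):
    """Split a total variance into (between-cluster, within-cluster) parts.

    Assumes ICC = between / (between + within).
    """
    between = icc * total_var
    return between, total_var - between

def continuous_covariate(rng, sizes, mu=0.5, total_var=0.25, icc=0.05):
    """Draw mu + a_i + e_ij for every subject j in every cluster i."""
    between, within = variance_components(total_var, icc)
    values = []
    for size in sizes:
        a = rng.gauss(0.0, math.sqrt(between))        # shared cluster effect
        values.append([mu + a + rng.gauss(0.0, math.sqrt(within))
                       for _ in range(size)])
    return values

print(variance_components(0.25, 0.05))
covariate = continuous_covariate(random.Random(7), sizes=[25, 24, 27])
```

With a total variance of 0.25 and an ICC of 0.05, this gives a between-cluster variance of 0.0125 and a within-cluster variance of 0.2375. For an observational design along the lines described above, the same function could be called with `mu=0.4` for nonexposed and `mu=0.6` for exposed subjects.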

Each simulated data set was analyzed by using modified Poisson regression, as well as log binomial regression for comparison. GEEs with an exchangeable working correlation structure were used to account for clustering. An exchangeable structure was chosen because equal correlation is a reasonable assumption for the type of clustered data considered. This choice of structure is not a requirement for the extension of the modified Poisson regression approach to accommodate clustered data, however. Both treatment/exposure and the covariate were included in the analysis model. Because GEEs are known to underestimate the variance of the parameter estimates when the number of clusters is small (e.g., less than 40) (26), bias corrections were applied in simulation scenarios involving 20 clusters by using the method of Mancl and DeRouen (27), as implemented in the SAS macro, diag103.sas (28). All analyses were performed by using SAS, version 9.2, software (SAS Institute, Inc., Cary, North Carolina).

For each simulation scenario, the following properties were determined for both log binomial regression and modified Poisson regression: the convergence rate, calculated as the percentage of simulated data sets where the fitting algorithm converged; the type I error rate, calculated as the percentage of Wald tests that resulted in a rejection of the null hypothesis of no treatment/exposure effect at the 2-sided 5% level when the null hypothesis was true; the coverage rate, calculated as the percentage of standard error-based 95% confidence intervals for the treatment/exposure relative risk that contained the true value; and the mean percent relative bias, where relative bias was calculated as the estimated relative risk for treatment/exposure minus the true relative risk, divided by the true relative risk. Type I error and coverage rates that differed significantly from the nominal level were identified. For 1,000 simulated data sets, type I error rates of less than 3.6% or greater than 6.4% differ significantly (*P* < 0.05) from the nominal level of 5% based on a 2-sided normal approximation test for a proportion. Similarly, coverage rates of less than 93.6% or greater than 96.4% differ significantly from the nominal level of 95%. Where log binomial regression failed to converge for a particular data set, results from that data set were necessarily excluded for the log binomial method from both the numerator and denominator when determining the type I error rate, coverage rate, and mean percent relative bias. Actual cutoff values used to identify significant differences from the nominal level for the type I error and coverage rates therefore differed, depending on the number of simulated data sets that converged. 
Relative risks estimated by using modified Poisson regression were plotted against estimates obtained by using log binomial regression, and summary statistics were calculated for the difference in estimates between methods (modified Poisson minus log binomial) across all simulations. Results are presented for an expected baseline risk of 0.2 and a variance of 0.2 for the random cluster effects only; full results are available from the corresponding author on request.
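The performance measures and significance cutoffs described above can be reproduced with a short Python sketch. It assumes that each fitted data set is summarized as a tuple of the estimated relative risk and its confidence limits, with `None` marking a fit that did not converge; these data structures and function names are illustrative only.

```python
import math
from statistics import NormalDist

def nominal_cutoffs(nominal, n_datasets, alpha=0.05):
    """Bounds outside which an observed rate differs from the nominal level
    (2-sided normal approximation test for a proportion)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(nominal * (1 - nominal) / n_datasets)
    return nominal - half, nominal + half

def summarize(fits, true_rr):
    """fits: list of (rr_hat, ci_lower, ci_upper), or None if not converged."""
    converged = [f for f in fits if f is not None]
    n = len(converged)  # non-converged fits leave numerator and denominator
    coverage = sum(lo <= true_rr <= hi for _, lo, hi in converged) / n
    bias = sum((rr - true_rr) / true_rr for rr, _, _ in converged) / n
    return {
        "convergence_pct": 100 * n / len(fits),
        "coverage_pct": 100 * coverage,
        "mean_pct_relative_bias": 100 * bias,
        "coverage_cutoffs_pct": tuple(100 * c for c in nominal_cutoffs(0.95, n)),
    }

print(nominal_cutoffs(0.05, 1000))   # type I error cutoffs, all fits converged
print(nominal_cutoffs(0.05, 194))    # wider cutoffs when few fits converge
```

With 1,000 data sets the type I error bounds are about 3.65% and 6.35%, consistent with the 3.6% and 6.4% thresholds quoted above. If, for example, only 194 data sets converged, the bounds would widen to roughly 1.9% and 8.1%, which is why the cutoffs depend on the convergence rate.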

## RESULTS

Modified Poisson regression converged for all simulated data sets in the vast majority of scenarios, and convergence rates did not fall below 99.9% for any scenario (data not shown). Log binomial regression also converged for all simulated data sets in most scenarios when adjustment was made for a binary covariate but regularly failed to converge for some simulated data sets when adjustment was made for a continuous covariate. Convergence rates fell as low as 34.7% for scenarios with 50 clusters and 0.3% for scenarios with 20 clusters. The convergence rate decreased as the treatment or covariate relative risk increased and as the number of clusters decreased, but was largely unaffected by the study design (Table 1). Decreasing the baseline risk resulted in more convergence problems in scenarios where the covariate had no effect on the outcome but fewer problems otherwise, while decreasing the variance of the random cluster effects had little influence on the results (data not shown).

Table 1. Convergence rates (%) for log binomial regression using GEEs

| Treatment/Exposure Relative Risk | Covariate Relative Risk | Intervention Study, 20 Clusters | Intervention Study, 50 Clusters | Observational Study, 20 Clusters | Observational Study, 50 Clusters |
| --- | --- | --- | --- | --- | --- |
| 1.00 | 1.00 | 98.5 | 100.0 | 98.5 | 100.0 |
| 1.00 | 1.25 | 93.9 | 100.0 | 95.1 | 100.0 |
| 1.00 | 2.00 | 19.4 | 91.8 | 15.4 | 91.1 |
| 1.25 | 1.00 | 98.5 | 100.0 | 98.5 | 100.0 |
| 1.25 | 1.25 | 89.9 | 100.0 | 91.4 | 100.0 |
| 1.25 | 2.00 | 13.9 | 84.6 | 8.1 | 80.9 |
| 2.00 | 1.00 | 93.8 | 100.0 | 93.8 | 100.0 |
| 2.00 | 1.25 | 75.5 | 99.9 | 73.2 | 99.9 |
| 2.00 | 2.00 | 3.3 | 51.5 | 4.0 | 46.4 |

Abbreviation: GEE, generalized estimating equation.

Both log binomial regression and modified Poisson regression sometimes produced type I error rates that were too high (Table 2) and coverage rates that were too low (Table 3) compared with the nominal level. Any type I error problems were minimal, however, as the highest type I error rates across all scenarios were only 7.1% and 7.0% for log binomial regression and modified Poisson regression, respectively. Coverage problems were also minimal when adjustment was made for a binary covariate but could be substantial when controlling for a continuous covariate, particularly for log binomial regression. The minimum coverage rate across all scenarios was 72.7% for this method, compared with 91.0% for modified Poisson regression. The poorest coverage rates tended to occur when the treatment and covariate effects were strongest. Type I error and coverage rates were largely unaffected by the study design. Decreasing the baseline risk and random effects variance resulted in fewer type I error and coverage problems (data not shown).

Table 2. Type I error rates (%) for log binomial and modified Poisson regression using GEEs

| No. of Clusters | Covariate Relative Risk | Intervention Study, Binary Covariate: Log Binomial | Intervention Study, Binary Covariate: Modified Poisson | Intervention Study, Continuous Covariate: Log Binomial | Intervention Study, Continuous Covariate: Modified Poisson | Observational Study, Binary Covariate: Log Binomial | Observational Study, Binary Covariate: Modified Poisson | Observational Study, Continuous Covariate: Log Binomial | Observational Study, Continuous Covariate: Modified Poisson |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20 | 1.00 | 5.9 | 6.0 | 5.1 | 5.1 | 6.0 | 5.9 | 5.4 | 5.8 |
| 20 | 1.25 | 6.7<sup>a</sup> | 6.9<sup>a</sup> | 6.2 | 6.5<sup>a</sup> | 6.2 | 6.0 | 4.6 | 5.1 |
| 20 | 2.00 | 6.5<sup>a</sup> | 6.6<sup>a</sup> | 2.6 | 5.3 | 6.0 | 5.7 | 3.2 | 4.7 |
| 50 | 1.00 | 5.2 | 5.1 | 5.0 | 5.0 | 6.2 | 6.2 | 5.1 | 5.1 |
| 50 | 1.25 | 6.4<sup>a</sup> | 6.3 | 4.8 | 4.8 | 5.0 | 5.1 | 6.0 | 6.2 |
| 50 | 2.00 | 4.4 | 4.7 | 7.1<sup>a</sup> | 6.7<sup>a</sup> | 6.3 | 6.3 | 4.8 | 4.1 |

Abbreviation: GEE, generalized estimating equation.

<sup>a</sup> Significantly different (*P* < 0.05) from the nominal level of 5% based on a 2-sided normal approximation test for a proportion.

Table 3. Coverage rates (%) for log binomial and modified Poisson regression using GEEs

| No. of Clusters | Treatment/Exposure Relative Risk | Covariate Relative Risk | Intervention Study, Binary Covariate: Log Binomial | Intervention Study, Binary Covariate: Modified Poisson | Intervention Study, Continuous Covariate: Log Binomial | Intervention Study, Continuous Covariate: Modified Poisson | Observational Study, Binary Covariate: Log Binomial | Observational Study, Binary Covariate: Modified Poisson | Observational Study, Continuous Covariate: Log Binomial | Observational Study, Continuous Covariate: Modified Poisson |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20 | 1.00 | 1.00 | 94.1 | 94.0 | 94.9 | 94.9 | 94.0 | 94.1 | 94.6 | 94.2 |
| 20 | 1.00 | 1.25 | 93.3<sup>a</sup> | 93.1<sup>a</sup> | 93.8 | 93.5<sup>a</sup> | 93.8 | 94.0 | 95.4 | 94.9 |
| 20 | 1.00 | 2.00 | 93.5<sup>a</sup> | 93.4<sup>a</sup> | 97.4 | 94.7 | 94.0 | 94.3 | 96.8 | 95.3 |
| 20 | 1.25 | 1.00 | 93.7 | 93.7 | 94.4 | 94.4 | 94.4 | 94.5 | 94.3 | 94.4 |
| 20 | 1.25 | 1.25 | 93.6<sup>a</sup> | 93.5<sup>a</sup> | 93.8 | 93.6<sup>a</sup> | 94.1 | 94.0 | 93.8 | 94.3 |
| 20 | 1.25 | 2.00 | 94.3 | 94.2 | 95.7 | 95.0 | 94.6 | 94.4 | 95.1 | 94.4 |
| 20 | 2.00 | 1.00 | 93.9 | 94.1 | 94.0 | 94.2 | 94.3 | 94.4 | 94.0 | 93.7 |
| 20 | 2.00 | 1.25 | 93.4<sup>a</sup> | 93.3<sup>a</sup> | 94.3 | 94.3 | 94.2 | 94.4 | 93.6 | 93.5<sup>a</sup> |
| 20 | 2.00 | 2.00 | 95.1 | 94.5 | 72.7<sup>a</sup> | 91.0<sup>a</sup> | 94.3 | 93.8 | 85.0<sup>a</sup> | 93.4<sup>a</sup> |
| 50 | 1.00 | 1.00 | 94.8 | 94.9 | 95.0 | 95.0 | 93.8 | 93.8 | 94.9 | 94.9 |
| 50 | 1.00 | 1.25 | 93.6<sup>a</sup> | 93.7 | 95.2 | 95.2 | 95.0 | 94.9 | 94.0 | 93.8 |
| 50 | 1.00 | 2.00 | 95.6 | 95.3 | 92.9<sup>a</sup> | 93.3<sup>a</sup> | 93.7 | 93.7 | 95.2 | 95.9 |
| 50 | 1.25 | 1.00 | 94.2 | 94.3 | 95.1 | 94.9 | 94.4 | 94.6 | 94.4 | 94.7 |
| 50 | 1.25 | 1.25 | 93.7 | 93.7 | 94.7 | 94.8 | 94.4 | 94.4 | 95.0 | 95.1 |
| 50 | 1.25 | 2.00 | 94.5 | 93.8 | 93.9 | 94.4 | 94.2 | 94.5 | 92.8<sup>a</sup> | 93.1<sup>a</sup> |
| 50 | 2.00 | 1.00 | 93.8 | 93.7 | 94.5 | 94.4 | 93.9 | 93.9 | 94.2 | 94.2 |
| 50 | 2.00 | 1.25 | 94.4 | 94.6 | 94.3 | 94.4 | 94.8 | 94.6 | 94.1 | 94.3 |
| 50 | 2.00 | 2.00 | 93.7 | 92.8<sup>a</sup> | 84.1<sup>a</sup> | 91.7<sup>a</sup> | 92.5<sup>a</sup> | 93.1<sup>a</sup> | 90.1<sup>a</sup> | 94.2 |

Abbreviation: GEE, generalized estimating equation.

<sup>a</sup> Significantly different (*P* < 0.05) from the nominal level of 95% based on a 2-sided normal approximation test for a proportion.

The mean percent relative bias in the estimated relative risk was generally small for both log binomial regression and modified Poisson regression (Table 4). Values ranged from −10.4% to 6.4% for log binomial regression and from −4.1% to 6.4% for modified Poisson regression across all scenarios. Large differences in the mean percent relative bias occurred between methods in scenarios where convergence rates were poor for log binomial regression. In these scenarios, the magnitude of the bias was typically larger for log binomial regression compared with modified Poisson regression. Bias was otherwise very similar between methods. Relative risk estimates for individual simulated data sets were also very similar between methods. Plots of estimates obtained using the 2 approaches for several simulation scenarios are shown in Figure 1; plots for other scenarios showed a similar pattern. The median difference in estimates across all scenarios was 0.00 (interquartile range = 0.00–0.01).

Table 4. Mean percent relative bias in the estimated relative risk for log binomial and modified Poisson regression using GEEs

| No. of Clusters | Treatment/Exposure Relative Risk | Covariate Relative Risk | Intervention Study, Binary Covariate: Log Binomial | Intervention Study, Binary Covariate: Modified Poisson | Intervention Study, Continuous Covariate: Log Binomial | Intervention Study, Continuous Covariate: Modified Poisson | Observational Study, Binary Covariate: Log Binomial | Observational Study, Binary Covariate: Modified Poisson | Observational Study, Continuous Covariate: Log Binomial | Observational Study, Continuous Covariate: Modified Poisson |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20 | 1.00 | 1.00 | 1.85 | 1.86 | 1.90 | 2.01 | 2.44 | 2.43 | 2.07 | 2.26 |
| 20 | 1.00 | 1.25 | 1.58 | 1.55 | 2.46 | 2.18 | 0.98 | 0.95 | 2.40 | 2.23 |
| 20 | 1.00 | 2.00 | 0.33 | 0.37 | −0.16 | 0.68 | 1.31 | 1.31 | 3.68 | 1.03 |
| 20 | 1.25 | 1.00 | 2.21 | 2.22 | 1.97 | 2.13 | 1.20 | 1.22 | 1.96 | 2.09 |
| 20 | 1.25 | 1.25 | 1.16 | 1.21 | 0.72 | 1.31 | 1.16 | 1.13 | 2.09 | 2.17 |
| 20 | 1.25 | 2.00 | 0.57 | 0.64 | −4.68 | −0.03 | 1.29 | 1.28 | −1.11 | 1.22 |
| 20 | 2.00 | 1.00 | 0.86 | 0.86 | 1.13 | 1.43 | 2.82 | 2.81 | 1.06 | 1.38 |
| 20 | 2.00 | 1.25 | 1.38 | 1.37 | −0.53 | 0.66 | 1.64 | 1.61 | 1.53 | 1.95 |
| 20 | 2.00 | 2.00 | 1.09 | 1.57 | −10.39 | −2.93 | 1.70 | 1.92 | −5.18 | −1.82 |
| 50 | 1.00 | 1.00 | 2.11 | 2.13 | 1.60 | 1.58 | 2.41 | 2.40 | 1.88 | 1.86 |
| 50 | 1.00 | 1.25 | 1.63 | 1.65 | 1.88 | 1.89 | 2.29 | 2.28 | 1.84 | 1.81 |
| 50 | 1.00 | 2.00 | 1.01 | 1.09 | 1.29 | 1.07 | 0.63 | 0.59 | 1.89 | 1.99 |
| 50 | 1.25 | 1.00 | 1.56 | 1.59 | 1.54 | 1.53 | 2.17 | 2.19 | 1.54 | 1.53 |
| 50 | 1.25 | 1.25 | 1.68 | 1.72 | 1.82 | 1.83 | 1.84 | 1.82 | 1.93 | 1.96 |
| 50 | 1.25 | 2.00 | 1.11 | 1.06 | −1.03 | −0.19 | 1.29 | 1.36 | 1.01 | 1.75 |
| 50 | 2.00 | 1.00 | 1.55 | 1.57 | 1.55 | 1.54 | 1.39 | 1.40 | 1.60 | 1.59 |
| 50 | 2.00 | 1.25 | 1.34 | 1.32 | −0.33 | −0.19 | 2.08 | 2.04 | 1.83 | 1.91 |
| 50 | 2.00 | 2.00 | 1.33 | 1.33 | −8.22 | −4.09 | 1.82 | 1.80 | −5.42 | −2.63 |

Abbreviation: GEE, generalized estimating equation.

Two illustrative examples follow: an intervention study and an observational study.

For an example intervention study, we considered data from the Second Australian National Blood Pressure Study (29) that was conducted in general practices across Australia. Patients attending these practices were eligible to participate if they were aged 65–84 years and had hypertension, defined by either an average systolic blood pressure of at least 160 mm Hg or an average diastolic blood pressure of at least 90 mm Hg combined with an average systolic blood pressure of at least 140 mm Hg. Averages were based on 2 measurements taken at least 1 week apart. Patients were recruited and randomized between 1995 and 1998 to receive 1 of 2 antihypertensive drugs: angiotensin-converting-enzyme (ACE) inhibitor or diuretic. A postal survey of medication adherence was conducted in 2000 by using the instrument of Morisky et al. (30). We analyzed responses to the question (“Sometimes, if you felt worse when you took your medicine, did you stop taking it?”) using log binomial regression and modified Poisson regression. This question was answered by 3,664 patients from 657 practices. GEEs with exchangeable correlation were used to account for clustering due to the grouping of patients within general practices. The ICC for this outcome was 0.03, indicating a weak positive dependence between responses of patients from the same practice. This dependence may be due in part to the use of a common treatment approach for patients attending the same practice. Results were adjusted separately for gender, age, marital status, or all 3 covariates, each of which was related to the outcome. The mean age of responders was 72 years, 50% were male, 66% were married, 23% were widowed, and 11% were single.

Of the 1,858 responders assigned to the ACE-inhibitor group, 196 (11%) indicated that they sometimes stopped taking their medication when they felt worse, compared with 222 (12%) of the 1,806 responders assigned to the diuretic group. The relative risk of answering “yes” to the question (“Sometimes, if you felt worse when you took your medicine, did you stop taking it?”) comparing the ACE-inhibitor group with the diuretic group is given in Table 5. Modified Poisson regression gave results similar to those from log binomial regression, independent of the covariate(s) controlled for in the analysis. All analyses showed that the ACE-inhibitor group had a lower risk of answering “yes” compared with the diuretic group, with relative risks of around 0.86, although this did not reach statistical significance.

Table 5. Relative risk of answering "yes" to the adherence question, comparing the ACE-inhibitor group with the diuretic group, using GEEs

| Covariate Adjustment | Log Binomial: Relative Risk | Log Binomial: 95% Confidence Interval | Modified Poisson: Relative Risk | Modified Poisson: 95% Confidence Interval |
| --- | --- | --- | --- | --- |
| Gender | 0.87 | 0.72, 1.04 | 0.87 | 0.72, 1.04 |
| Age | 0.86 | 0.71, 1.03 | 0.86 | 0.71, 1.03 |
| Marital status | 0.85 | 0.71, 1.02 | 0.85 | 0.71, 1.02 |
| All of the above | 0.86 | 0.72, 1.03 | 0.86 | 0.72, 1.03 |

Abbreviations: ACE, angiotensin-converting enzyme; GEE, generalized estimating equation.

Question: “Sometimes, if you felt worse when you took your medicine, did you stop taking it?”
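The statement that the treatment effect did not reach statistical significance can be checked approximately from the reported interval. The sketch below back-calculates the standard error of the log relative risk from a 95% confidence interval and forms a Wald test. It uses the rounded values for the fully adjusted model in Table 5 (relative risk 0.86; 95% confidence interval: 0.72, 1.03), so the resulting *P* value is only approximate; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def wald_from_ci(rr, lower, upper, level=0.95):
    """Recover SE(log RR), the Wald z statistic, and a 2-sided P value
    from a relative risk and its confidence limits."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return se, z, p

se, z, p = wald_from_ci(0.86, 0.72, 1.03)
print(round(se, 3), round(z, 2), round(p, 2))
```

The *P* value obtained this way is above 0.05, in agreement with a confidence interval that includes 1.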

For an example observational study, we considered data from patients with hyperlipidemia who participated in the control arm of the Point of Care Testing Trial (31). These patients were recruited from general practices in Australia in 2005, and baseline information was collected from patients’ medical records. Patients were asked to complete a questionnaire to collect baseline demographic information and to determine whether they knew they had hyperlipidemia. Over an 18-month follow-up period, the results of any cholesterol tests performed as part of their usual patient management were collected. The primary outcome for the trial was whether the final cholesterol result was within a prespecified target range (32), based on clinical guidelines. We performed an analysis to determine whether patient awareness of their medical condition influenced this outcome for high density lipoprotein cholesterol. Analysis was based on 1,048 patients from 23 practices who completed the baseline questionnaire and who had at least 1 cholesterol test performed during the follow-up period. Small sample bias corrections were made by using the method of Mancl and DeRouen (27). Clustering due to the grouping of patients within general practices was taken into account by using GEEs with exchangeable correlation. The dependence between outcomes of patients from the same practice was weak, with an ICC of only 0.02. Results were adjusted separately for gender, age, diabetes, or all 3 covariates, each of which was related to both the outcome and the exposure of interest. No adjustment was made for frequency of testing during the follow-up period, because there was no evidence to suggest that this was related to either the outcome or the exposure. The mean age of patients was 66 years, 50% were male, and 35% had diabetes.

Of the 817 patients who knew they had hyperlipidemia, 644 (79%) had their final high density lipoprotein cholesterol result within the target range, compared with 166 (72%) of the 231 patients who were unaware they had hyperlipidemia. The relative risk of having a final high density lipoprotein cholesterol result within target range, comparing those who knew they had hyperlipidemia with those who did not, is given in Table 6. As for the intervention study example, log binomial regression and modified Poisson regression produced similar results when the former method converged. In contrast to the previous example, relative risks varied depending on the covariate(s) included in the model, ranging from 1.04 to 1.11. Statistical significance also varied. The results of the model adjusting for all 3 covariates are of most interest because of greater control for confounding compared with models adjusting for only a single covariate. Log binomial regression failed to converge in this case, while modified Poisson regression produced a relative risk of 1.04 (95% confidence interval: 0.94, 1.14), indicating a nonsignificant increase in the probability of a positive health outcome for patients who were aware of their condition.

Table 6. Relative risk of a final HDL cholesterol result within target range, comparing patients who knew they had hyperlipidemia with those who did not, using GEEs

| Covariate Adjustment | Log Binomial: Relative Risk | Log Binomial: 95% Confidence Interval | Modified Poisson: Relative Risk | Modified Poisson: 95% Confidence Interval |
| --- | --- | --- | --- | --- |
| Gender | Did not converge | | 1.06 | 0.96, 1.17 |
| Age | 1.11 | 1.00, 1.22 | 1.10 | 1.00, 1.22 |
| Diabetes | 1.06 | 0.97, 1.16 | 1.06 | 0.97, 1.17 |
| All of the above | Did not converge | | 1.04 | 0.94, 1.14 |

Abbreviations: GEE, generalized estimating equation; HDL, high density lipoprotein.

## DISCUSSION

We studied the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data via simulation, using GEEs with an exchangeable correlation structure to account for clustering. This method performed well across a range of settings, including intervention and observational study designs, as well as small or large numbers of clusters. It produced type I error and coverage rates that were close to the nominal levels and mean percent relative biases that remained small across the range of scenarios considered.

Modified Poisson regression gave results similar to those from log binomial regression when the latter model converged, as seen in the example data sets and in the simulation study when adjustment was made for a binary covariate. These results are consistent with previous findings in the context of independent data (5, 9, 15, 16, 18). Modified Poisson regression generally outperformed log binomial regression in terms of bias and coverage in scenarios where log binomial regression suffered most from convergence problems. Log binomial regression failed to converge for up to 99.7% of simulated data sets for a given scenario, highlighting the need to consider alternative methods for estimating relative risks. Surprisingly, modified Poisson regression also failed to converge on rare occasions. Convergence problems have not been observed for this method in the independent data setting (9, 18). We investigated scenarios where modified Poisson regression failed to converge in our simulation study and found that convergence could be achieved if the exchangeable working correlation structure was replaced by an independence structure. In contrast, convergence often remained a problem for log binomial regression when an independence working correlation structure was specified. These findings suggest that modified Poisson regression can overcome the convergence problems of the log binomial model in the clustered data setting. However, different working correlation structures may need to be considered in order to achieve convergence in practice.

Modified Poisson regression was proposed as an alternative to log binomial regression for estimating relative risks in the context of independent data, and its performance in the context of clustered data is only now being investigated. Despite this, the modified Poisson regression approach is already in use for analyzing clustered data, with GEEs often used to account for clustering (19, 20). Our results suggest that application of modified Poisson regression combined with GEEs is appropriate in this setting.

An alternative to using GEEs to account for clustering is to fit a mixed-effects model with a random cluster effect. However, this approach requires a distribution to be assumed for the random effects; the assumption may be difficult to verify, and misspecification can have a substantial impact on the results (33). An advantage of using GEEs is that the working correlation structure used to account for clustering does not need to be correctly specified in order to produce consistent parameter estimates (21).

Type I error rates were calculated for the simulation study based on a Wald test of the null hypothesis of no treatment/exposure effect. The Wald test was chosen because it could be performed when small sample bias corrections were applied and led to conclusions that were consistent with the 95% confidence intervals for the treatment/exposure effect. However, the score test may be preferable to the Wald test in practice. We investigated type I error rates based on the score test for simulation scenarios where the number of clusters was 50 and, hence, small sample bias corrections were not applied. Compared with the Wald test, the score test produced fewer type I error rates that differed significantly from the nominal level, but it did not alter our conclusions.
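The correspondence between the Wald test and the 95% confidence interval can be sketched as follows. This is a minimal illustration that assumes a standard normal reference distribution for the Wald statistic; the function names `wald_test` and `type_i_error_rate` and the numerical inputs are hypothetical and do not come from our simulation study.

```python
import math


def wald_test(log_rr, se, z_crit=1.959964):
    """Two-sided Wald test of RR = 1 and the matching 95% CI for the RR."""
    z = log_rr / se
    # Two-sided p value under a standard normal reference distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se))
    return z, p_value, ci


def type_i_error_rate(p_values, alpha=0.05):
    """Proportion of data sets simulated under the null that reject at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical estimates: the same relative risk with two different standard errors
for se in (0.15, 0.30):
    z, p, (lower, upper) = wald_test(math.log(1.5), se)
    excludes_one = not (lower <= 1.0 <= upper)
    print(f"SE = {se:.2f}: z = {z:.2f}, p = {p:.4f}, "
          f"95% CI = ({lower:.2f}, {upper:.2f}), excludes 1: {excludes_one}")
```

Rejecting at the 5% level is equivalent to the 95% confidence interval excluding a relative risk of 1, which is why the two summaries lead to consistent conclusions.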

Our simulation study had several limitations. First, we did not consider situations where cluster size is informative and, hence, standard GEEs may not be valid. Cluster-weighted GEEs may be more appropriate in this case (34–36) and could be applied to both the log binomial and modified Poisson approaches when cluster size is related to the outcome. Second, we did not consider situations where entire clusters are randomized to the same treatment group or have the same exposure status. This work is currently in progress. Finally, we did not consider clustered data arising from repeated measurements taken on the same individuals over time. Modified Poisson regression has been applied to clustered data of this type (37, 38), and its performance in this context is an area worthy of investigation.

In conclusion, log binomial regression can be a useful tool for providing an easily interpreted estimate of the effect of treatment on a binary outcome. If log binomial regression fails to converge, relative risks can be estimated by using the modified Poisson regression approach. This approach has previously been shown to work well for analyzing independent data (5, 9, 10, 15–18). Our results support the use of modified Poisson regression for analyzing clustered prospective data when clustering is taken into account by using GEEs.

### Abbreviations

- ACE: angiotensin-converting enzyme
- GEE: generalized estimating equation
- ICC: intracluster correlation coefficient

Author affiliations: Discipline of Public Health, The University of Adelaide, Adelaide, South Australia, Australia (Lisa N. Yelland, Amy B. Salter, Philip Ryan).

This work was supported by a Commonwealth-funded Australian Postgraduate Award (L. N. Y.).

The authors thank the following: members of the Second Australian National Blood Pressure Study Management Committee, which included Lindon Wing, Christopher Reid, Lawrence Beilin, Mark Brown, Garry Jennings, Colin Johnston, Graham MacDonald, John McNeil, John Marley, Trefor Morgan, Philip Ryan, and Malcolm West; members of the Point of Care Testing Trial Management Committee, which included Justin Beilby, Janice Gill, Briony Glastonbury, Roger Killeen, Pamela McKittrick, Caroline Laurence, Mark Shephard, Andrew St John, David Thomas, Phil Tideman, Rosy Tirimacco, and Paul Worley; and Thomas Sullivan for reviewing drafts of the manuscript.

Conflict of interest: none declared.