Institutional Knowledge at Singapore Management University Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method

The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspeciﬁed (Liang and Zeger, 1986). However, the efﬁciency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efﬁcient than a GEE using a misspeciﬁed working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efﬁcient parameter estimates. In simulations, the hybrid method’s ﬁnite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efﬁcient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.


INTRODUCTION
Generalized estimating equations (GEEs) have been found to be very useful in analysis of correlated and longitudinal outcomes using marginal regression models. Following Liang and Zeger (1986), many aspects of GEE have been explored. Reviews of GEE include Pendergast and others (1996) and Desmond (1997).
In a marginal regression model, the primary interest is in the regression parameters, which characterize the expectations of the subject's response over time. However, in order to make proper inference about the regression parameters, the within-subject covariance (correlation) structures must be taken into consideration. The GEE approach has been popular because estimates of mean parameters remain consistent even if the correlation or the covariance structure is misspecified. On the other hand, accurate modeling of the correlation structure generally improves statistical inference on means (Albert and McShane, 1995;Fitzmaurice, 1995;Hall and Severini, 1998). Wang and Carey (2003) analyzed how efficiency can be affected by (i) the choice of the working correlation structure, (ii) the method by which the working correlation parameters are estimated, and (iii) the layout of the design matrix. Higher moments can be incorporated into estimation using a generalized version of GEE called GEE2 (Liang and others, 1992). However, bias or efficiency losses may be introduced if higher moment assumptions of GEE2 are incorrectly specified. For this reason, GEE2 has yet to receive wide application.
In GEE modeling, the most commonly used working correlation models are the exchangeable, AR(1) and MA(1). Wang and Carey (2003) found that among the 3, AR(1) is the most robust. However, they also demonstrated scenarios where the exchangeable and MA(1) models give better results than the AR(1) model. Therefore, it remains an issue of how to choose a working correlation model in a particular GEE analysis. The AR(1) and MA(1) working correlation models appear to be favored by users of GEE because (i) in most situations, they are sufficient as an approximation to the true correlation structure and (ii) they represent sensible compromises between the independence model (which ignores within-subject correlations) and the completely unstructured model (which requires the estimation of large number of nuisance parameters). These considerations lead us to propose a method that incorporates all 3 working correlation models in a single framework, yielding a method that is efficient if one of these 3 working correlation models correctly captures the true correlation structure, and robust even if none of the working correlation models is correct. The proposed method can be generalized to combining any number of GEEs with working correlation models other than the exchangeable, AR(1) and MA(1) models.
Each GEE with a particular working correlation model is a mean-zero estimating equation under the true parameters. When we are interested in combining multiple GEEs, then there are more estimating equations than the number of parameters. In situations involving independent data where the number of estimating equations may be larger than the number of parameters, Qin and Lawless (1994) showed how to combine efficiently the estimation equations using an empirical likelihood (EL) (Owen, 1988). We exploit this attribute of the EL technique to combine GEEs. The individual GEEs using different working correlations are used as constraints in an EL for the parameters of interest. The parameter estimates are then obtained by maximizing the EL. Other than its role as a tool for combining estimating equations, EL also inherits a number of desirable properties from its parametric counterparts, as described in Owen (2001).
The rest of this paper is organized as follows. Section 2 presents the basic problem, the modeling framework of the proposed method, and its large-sample properties. The results of a simulation study are summarized in Section 3. In Section 4, the method is illustrated using a real data set. Section 5 concludes the paper with a discussion. Detailed simulation results and proofs are given as supplementary material available at Biostatistics online, http://biostatistics.oxfordjournals.org.

COMBINING GEES
Consider a longitudinal study in which there are n subjects, each of whom is measured at K time points. Let y i = (y i1 , . . . , y i K ) T denote the underlying outcome for the ith subject, x i an associated vector of r × 1 covariates, and x ik the value of the covariate at time k. Denote the marginal mean outcome at the 438 D. H. Y. LEUNG AND OTHERS kth measurement for the ith subject by µ ik (β) = g(x T ik β), for a vector of unknown parameters, β. For conciseness, we suppress the explicit association of µ ik with β if there is no confusion.
Following Liang and Zeger (1986), a GEE can be used to estimate the regression parameters, β, where A i is a diagonal matrix representing the variances of y ik , R(α) is a "working correlation" depending on a set of unknown parameters α, and φ is a scale parameter used to model over-dispersion or under-dispersion. Liang and Zeger (1986) showed that, whether or not V i is correctly specified, the estimators of β obtained from (2.1) remain consistent. In addition, if V i = Cov(y i ) can be consistently estimated up to n −1/2 , then the estimator of β is fully efficient. On the other hand, an incorrectly specified V i will lead to a loss of efficiency (Wang and Carey, 2003).
is a function of β only. In general, the dimension of h i is higher than the dimension of β. Our propose is to use EL to combine the estimating equations i , then the EL estimate will be optimal. If none of them is optimal, then the EL estimate is still consistent and combines optimally the information in S 1 i ,. . . ,S J i . In practice, a few popular choices of R(α) may be used; for example, exchangeable, AR(1) and MA(1).
We now describe how to use an EL framework to combine the GEEs in h i . Let F be the distribution function associated with the observations Then, the nonparametric likelihood of the data can be written as n i=1 dF(y i |x i ) ≡ n i=1 p i , subject to the constraints 0 p i 1, i = 1, . . . , n, and n i=1 p i = 1. Without any other information, the maximum nonparametric likelihood estimate of F is the empirical distribution function F n (y i |x i ) = n i=1 I (y i y|x i ), which corresponds to p i = 1/n. However, suppose we know that E(h i (β)) = 0 under F. Then, the empirical distribution function is no longer desirable because E(h i (β)) = 0 under dF n ≡ p i = 1/n. Instead, we can use the (empirical) likelihood Efficient parameter estimation in longitudinal data analysis 439 subject to the constraints In this formulation, maximizing the EL gives a set of depend on the extra conditions E(h i (β)) = 0, which in turn depend on the value of β, the EL estimate of F is sensitive to the value of β.
The EL (2.4) can be maximized as a constrained maximization problem. By introducing Lagrange The values of {p i } n i=1 can be profiled out by differentiating (2.5) with respect to p i to give Using (2.7) and (2.8), η and {p i } n i=1 can be profiled out in the negative log-EL to give (2.10) The maximum EL estimates (β,λ) are the solutions to (2.8) and (2.10). Note that (2.8) consists of J equations for each parameter and (2.10) consists of r equations, so in total there are (J +1)r simultaneous equations to solve. We now give the results for the large-sample behavior of the parameter estimates using the proposed method. THEOREM 2.2 If one of the S 1 i ,. . . ,S J i is the optimal estimating equation, then the EL estimate will be optimal in the sense that it will be equivalent to the GEE estimate with the correct specification of R(α). In that case, as n → ∞, (2.12) In Theorem 2.2, S 1 i ,. . . ,S J i refer to the researcher's "guesses" of the optimal estimating equation. In practice, it is possible that none of the guesses correspond to the optimal estimating equation. We demonstrate that even in that case, the EL method is still optimal in the sense that it optimally combines the guesses S 1 i , . . . , S J i . This fact can be established by considering the following. Expand the left-hand side of (2.8) in a Taylor expansion around λ = 0 to give (2.13) Substitute (2.13) back into the left-hand side of (2.10) to give asymptotically. Expression (2.14) is in the form of the optimal combination of S 1 i ,. . . ,S J i (Small and McLeish, 1994, p 94).
In practice, finding the solution to the maximum EL via (2.8) and (2.10) may encounter numerical problems. Furthermore, solving (2.10) requires finding h β i (β), which is not straightforward analytically. Therefore, we follow the method of Mittelhammer and others (2003) by profiling out the Lagrange multipliers as well, so that for fixed β, the Lagrange multipliers are λ(β) = (λ T 1 (β), . . . , λ T J (β)) T . Given β, the first and second derivatives of (2.9) with respect to λ are (2.15) Therefore, with some abuse of notation, for given β and a starting value λ 0 , the following Newton-Raphson procedure can be used: (2.16) and the solution used as λ(β). Substituting λ(β) back into (2.9) then gives which can be maximized with respect to β. Hence, the algorithm can be seen as a nested algorithm with an outside loop that involves maximizing (2.17) with respect to β, while for each β, the inside loop evaluates λ(β) using (2.16). The overall maximum gives the maximum EL estimateβ. Therefore, instead of solving (J + 1)r simultaneous equations, only a function of r parameters needs to be maximized. This method becomes especially useful when the number of estimating equations, J , is large. In our simulations, we used a simple modification of Owen's S program for the inside loop (http://wwwstat.stanford.edu/∼owen/empirical/). The outside loop was performed using the optim function in R. Details of the program can be found in the supplementary material available at Biostatistics online.

SIMULATIONS
We carried out a simulation study to evaluate the moderate sample properties of the proposed method. Two sets of simulations were used: Set A: x ik , i = 1, . . . , n, k = 1, . . . , 10, are independent and identically distributed as N (1, σ 2 x ) with σ x = 1. In this setup, x ik s are subject-specific covariates that may change over time and are different between subjects but there is no time trend. x R) with R a 10 × 10 matrix with unit diagonal and off-diagonal elements equal to 0.2 and σ x = 0.5. In this setup, the intrasubject covariates are correlated over time.
Each set of simulations was based on 1000 runs. Samples sizes were n = 100 and 200. The following model was used for the mean response at time k for the ith subject, E(y ik ) ≡ µ ik = β 0 − β 1 x ik , k = 1, . . . , 10, i = 1, . . . , n. The true values of (β 0 , β 1 ) were (1, −1). The simulation study shows that the proposed method is nearly as efficient as the standard GEE using the correct working correlation model and is superior to the standard GEE using an incorrect working correlation model. We also evaluated the empirical coverage probability of 95% confidence intervals of (β 0 , β 1 ) using Theorem 2 and found that they are close to the nominal level. Details of the results are given as supplementary material available at Biostatistics online.

APPLICATION TO INDONESIAN CHILDREN'S INFECTION DATA
In this section, we apply the proposed method to data from a longitudinal study of the respiratory infection rate in a group of Indonesian children (Diggle and others, 2002). The sample consists of 275 preschool children examined at 3-month intervals for 18 months. The maximum number of visits is therefore K = 6. In total, the 275 children generated 1200 repeated measures of the response (infection versus no infection). The primary interest in this study is to determine the relationship between respiratory infection and Vitamin A deficiency while adjusting for a number of confounders, as listed in Table 1.
We fitted the data by a GEE using an exchangeable (CS), AR(1), and MA(1) working correlation. We then used the proposed method using R 1 (α) = CS, R 2 (α) = AR(1), and R 3 (α) = MA(1). The results are given in Table 1. Standard errors of the estimates for GEE MA(1) , GEE CS , and GEE AR(1) were obtained from the R routine geese in the geepack package. Those for the proposed method were estimated using Theorem 1. The results using the 4 methods are quite similar. The conclusions from all methods are the same, that there is no evidence of increased risk for infection due to xerophthalmia. These conclusions are similar to those in earlier studies (e.g. Zeger and Karim, 1991;Lin and Carroll, 2001).
As a means to compare the merits of the different models, we used Akaike's information criterion (AIC) for GEE as developed by Pan (2001) ). Then, to assess the merits of a model with parameter estimatesβ obtained using a working correlation matrix R, the AIC is defined as −2Q(β) + 2 trace(ˆ −1 Iˆ R ), whereˆ −1 I is the inverse of the variance of the model coefficients under an independence working correlation andˆ R is that under working correlation R. The AIC values for the 4 methods are given in Table 2. In Table 2, we also give the second term of the AIC, that is, 2 trace(ˆ −1 Iˆ R ), which has been shown in Hin and Wang (2009) to be more accurate in capturing the true correlation structure. The method proposed in this paper has the lowest AIC value and  the lowest value in 2 trace(ˆ −1 Iˆ R ) and therefore, by these measures, is the most preferred method for this data set.

CONCLUSION AND FUTURE RESEARCH
We have introduced a method for combining GEEs in analyzing longitudinal data so as to improve the efficiency of the GEE method when, as is typically the case in practice, correct specification of the correlation structure is problematic.
Validity of the GEE approach requires correct specification of the mean function. If some observations are missing completely at random (Little and Rubin, 1987), the mean function is not affected, and in those cases, the method proposed here remains valid. However, if the missingness probability depends on the observed responses (missing at random) or on the missing responses conditional on the observed responses (nonignorable missingness), then correct modeling of the missingness probability is required for the GEE approach, and therefore our method, to be valid. However, correct specification of the missingness probability is a nontestable condition (Gill and others, 1997;Manski, 2003), therefore, if missing at random or nonignorable missingness are suspected, some sort of sensitivity analysis is necessary.
A related method to the one proposed in this paper is the quadratic inference function (QIF) by Qu et al. (2000). In their work, the inverse of the working correlation matrix is approximated by a linear combination of basis matrices, M i , i = 1, . . . , m, such as R(α) −1 ≈ a 0 M 0 + a 1 M 1 + · · · + a m M m , (5.1) where M 1 = I K ×K is an identity matrix and M i , i = 2, . . . , m, are known symmetric matrices. Instead of estimating a 0 , . . . , a m directly, they recognized that a GEE based on (5.1) is equivalent to solving the linear combination of a vector of estimating equations: which can be performed using the generalized method of moments (Hansen, 1982). Their method giveŝ β QIF = arg min β Q n (β) ≡ g T n (β)C −1 n (β)g n (β), where C n (β) = 1/n 2 n i=1 g i (β)g T i (β) is an estimate of the variance of g n (β). We used a modest simulation study to compare our method to QIF and found that the 2 methods give very similar results throughout when the true correlation structure is AR(1), whereas for the CS structure there does seem to be a substantial difference in favor of EL when α = 0.7. Detailed results of the study are given as supplementary material available at Biostatistics online. However, we view these results as preliminary. More work needs to be done to compare these 2 related methods. Finally, our proposed method is motivated on combining GEEs to find an optimal combination of working correlations for a single data set. There are also situations where multiple longitudinal studies are to be combined in a single analysis, for example, in a meta-analysis or multicenter study (e.g. Inoue and others, 2004). In that case, S 1 (β), . . . , S J (β) may be viewed as GEEs from the different studies that share a common parameter β of interest. The difference between that situation and the one considered here is the multiple samples in the former. The method proposed here can be modified using a multiple sample EL.

ACKNOWLEDGMENTS
We thank Dr Annie Qu and Ms Guei-feng (Cindy) Tsai for their valuable comments. We also thank the referees and the coeditor for their valuable comments.