Abstract

Longitudinal data often contain missing observations and error-prone covariates. Extensive attention has been directed to analysis methods that adjust for the bias induced by missing observations, but there is relatively little work investigating the effects of covariate measurement error on estimation of the response parameters, and even less that simultaneously accounts for the biases induced by both missing values and mismeasured covariates. In particular, the impact of ignoring measurement error when analyzing longitudinal data with both missing observations and error-prone covariates is not well understood. In this article, we study the effects of covariate measurement error on estimation of the response parameters in longitudinal studies, and we develop an inference method that adjusts for the biases induced by both measurement error and missingness. The proposed method does not require full specification of the distribution of the response vector; it only requires modeling the mean and variance structures. Furthermore, it employs the so-called functional modeling strategy for the covariate process, leaving the distribution of the covariates unspecified. These features, together with its simplicity of implementation, make the proposed method attractive. We establish the asymptotic properties of the resulting estimators. Using the proposed method, we conduct sensitivity analyses on a cohort data set arising from the Framingham Heart Study. Simulation studies are carried out to evaluate the impact of ignoring covariate measurement error and to assess the performance of the proposed method.

1. INTRODUCTION

Longitudinal studies are commonly conducted in the health sciences, biochemistry, and epidemiology. Although longitudinal studies are designed to collect data on every individual at each assessment, missing observations often arise for various reasons. There has been increasing interest in valid inference methods for longitudinal data with missing values. Yet there is relatively little work investigating the effects of covariate measurement error on estimation of the response parameters, especially work that simultaneously accounts for the biases induced by both missing values and mismeasured covariates. Measurement error in covariates is, however, a typical feature of longitudinal data. Sometimes covariates of interest are difficult to observe precisely because of physical location or cost. Sometimes it is impossible to measure covariates accurately because of their nature. In other situations, a covariate may represent the average of a quantity over time (e.g. cholesterol level (CHOL)), and any practical way of measuring such a quantity necessarily involves measurement error.

It has been recognized and well documented in other contexts that ignoring covariate measurement error may lead to severely biased results. For example, Fuller (1987) pointed out that the slope in a simple linear regression model is attenuated if covariate measurement error is ignored. For survival data analysis, Prentice (1982), Li and Lin (2003), Yi and He (2006), and Yi and Lawless (2007), among others, investigated measurement error effects and developed inference methods to correct for the bias resulting from measurement error in covariates. For an overview of measurement error problems, see Carroll and others (2006).

In this paper, we investigate the impact of covariate measurement error on longitudinal data analysis. This work is motivated by the need for methods that simultaneously address missingness and measurement error, two features that longitudinal data often possess. For example, a data set arising from the Framingham Heart Study contains error-prone covariates, and a portion of its subjects dropped out of the study during the follow-up period. An objective of this study is to understand how obesity is associated with covariates such as age, blood pressure, and CHOL. It is well known that individual measurements of blood pressure and CHOL involve substantial measurement error. Here, the true values of these covariates are defined as their long-term averages; a measurement at a specific time point fluctuates with time, seasonal variation, and other confounding factors. The combination of covariate measurement error and dropout presents a challenge to existing inference methods. In this paper, we develop an inference method for analyzing longitudinal data that have both dropout and error-contaminated covariates. We use marginal methods to model the response process and a functional method for the measurement error process. Such a method is appealing because it does not require specification of the covariate distribution.

The remainder of the paper is organized as follows. Notation and model setup are introduced in Section 2. In Section 3, we describe a simulation–extrapolation (SIMEX) method to account for both dropout and covariate measurement error. A data set arising from the Framingham Heart Study is analyzed with the proposed method in Section 4. In Section 5, we conduct simulation studies to assess the performance of the proposed method and the impact of ignoring measurement error in covariates. General discussion is given in Section 6.

2. NOTATION AND MODEL SETUP

2.1 Response process

Longitudinal data analysis may typically be conducted based on marginal, random-effects, and transitional models (Diggle and others, 2002). In this paper, we focus on marginal analysis with the primary interest centered on the marginal mean parameters. Let Yij be the response variable for subject i at time point j, xij be the covariate vector subject to error, and zij be the vector of error-free covariates, i=1,2,…,n and j=1,2,…,m. Denote Yi=(Yi1,Yi2,…,Yim)′, xi=(xi1′,xi2′,…,xim′)′, and zi=(zi1′,zi2′,…,zim′)′. Let μij=E(Yij|xi,zi) and vij=var(Yij|xi,zi) be the conditional expectation and variance of Yij, respectively, given the covariates xi and zi.

We model the influence of the covariates on the marginal response mean by means of the regression model
$$g(\mu_{ij}) = x_{ij}'\beta_x + z_{ij}'\beta_z, \qquad (2.1)$$
where β=(βx′,βz′)′ is the vector of regression parameters of dimension p, say, and g(·) is a known monotone function. If necessary, the intercept may be included in βz by adding a unit vector to covariates zi. Furthermore, assume vij=h(μij;φ), where h(·;·) is a known function and φ is the dispersion parameter that is known or may be estimated. We treat φ as known here with emphasis on estimation of β.
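As a minimal sketch of the mean and variance structures, assume a binary response with the logit link g(μ) = log{μ/(1 − μ)} and the Bernoulli variance function h(μ; φ) = φμ(1 − μ). The text leaves g and h general, so both choices, like the function names below, are illustrative assumptions.

```python
import numpy as np


def expit(eta):
    """Inverse of the (assumed) logit link g(mu) = log(mu / (1 - mu))."""
    return 1.0 / (1.0 + np.exp(-eta))


def marginal_mean(x, z, beta_x, beta_z):
    """mu_ij = g^{-1}(x_ij' beta_x + z_ij' beta_z) for the m occasions of one subject.

    x: (m, dim_x) error-prone covariates; z: (m, dim_z) error-free covariates
    (add a column of ones to z to include an intercept in beta_z).
    """
    return expit(x @ beta_x + z @ beta_z)


def variance_function(mu, phi=1.0):
    """v_ij = h(mu_ij; phi) = phi * mu_ij * (1 - mu_ij); phi = 1 for Bernoulli data."""
    return phi * mu * (1.0 - mu)


# Example: one subject, m = 3 occasions, one x and an intercept-only z.
x = np.array([[0.0], [1.0], [2.0]])
z = np.ones((3, 1))
mu = marginal_mean(x, z, beta_x=np.array([np.log(3.0)]), beta_z=np.array([0.0]))
v = variance_function(mu)
```

With βx = log 3 the three means are 0.5, 0.75, and 0.9, and the corresponding Bernoulli variances are 0.25, 0.1875, and 0.09.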

Here, we assume that the dependence of the mean μij on the subject-level covariates xi and zi is completely reflected by the time-specific covariates xij and zij, that is, E(Yij|xi,zi)=E(Yij|xij,zij). This assumption has been widely adopted in modeling longitudinal data; see, for example, Diggle and Kenward (1994), Robins and others (1995), Cook and others (2004), and Yi and Thompson (2005). The assumption was noted by Pepe and Anderson (1994) and justified from the viewpoint of formulating unbiased estimating functions. Model (2.1) may include baseline covariates such as gender, age, and treatment status, as well as time-varying covariates. With an exogenous covariate process (i.e. a time-varying covariate that is not predicted by past outcomes), properly including current or lagged values of the covariates may satisfy this assumption (e.g. Miglioretti and Heagerty, 2004). Both cross-sectional and longitudinal effects of time-varying covariates may be featured in model (2.1); see Diggle and others (2002, Chapter 12) for a more detailed discussion.

2.2 Missing data process

Let Rij be 1 if Yij is observed and 0 otherwise, and let Ri=(Ri1,Ri2,…,Rim)′ be the vector of (non)missing data indicators, i=1,2,…,n. We consider dropout, or monotone missing data patterns; that is, Rij=0 implies Rij′=0 for all j′>j. Without loss of generality, assume that Ri1=1 for every subject i. According to how the missing data process depends on the response process, missing data mechanisms may be classified as missing completely at random (MCAR), missing at random (MAR), or not missing at random (NMAR) (e.g. Kenward, 1998).

In this paper, we assume an MAR mechanism for the dropout process; that is, the conditional distribution $f(r_i\mid x_i,z_i,y_i)$ depends only on the covariates and the observed response components $y_i^{\mathrm{obs}}$. Let $\lambda_{ij}=P(R_{ij}=1\mid R_{i,j-1}=1,x_i,z_i,y_i)$ and $\pi_{ij}=P(R_{ij}=1\mid x_i,z_i,y_i)$. Because $R_{i1}=1$ and the pattern is monotone, $\pi_{ij}=\prod_{t=2}^{j}\lambda_{it}$. Let $H_{ij}^{y}=\{y_{i1},\ldots,y_{i,j-1}\}$ denote the response history up to (but not including) time point $j$.
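Under the monotone pattern, the marginal observation probabilities are cumulative products of the conditional ones. A minimal sketch follows; the function names and the convention that λi1 is stored but ignored are assumptions made for illustration.

```python
import numpy as np


def observation_probs(lam):
    """pi_ij = prod_{t=2}^{j} lambda_it for j = 1, ..., m, so pi_i1 = 1.

    lam: length-m array (lambda_i1, ..., lambda_im); lambda_i1 is ignored
    because every subject is observed at the first time point.
    """
    lam = np.asarray(lam, dtype=float).copy()
    lam[0] = 1.0
    return np.cumprod(lam)


def is_monotone(r):
    """True if R_i1 = 1 and R_ij = 0 implies R_ij' = 0 for all j' > j."""
    r = np.asarray(r)
    return bool(r[0] == 1 and np.all(np.diff(r) <= 0))


pi = observation_probs([0.3, 0.9, 0.8, 0.5])   # -> 1, 0.9, 0.72, 0.36
```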

Logistic regression models are commonly used to model the dropout process (e.g. Diggle and Kenward, 1994; Robins and others, 1995), namely,
$$\operatorname{logit}(\lambda_{ij}) = u_{ij}'\alpha, \qquad (2.2)$$
where $u_{ij}$ is the vector consisting of information on the covariates $x_i$ and $z_i$ and the observed response history $H_{ij}^{y}$, and $\alpha$ is the vector of regression parameters.

Let $M_i$ be the random dropout time for subject $i$ and $m_i$ its realization, $i=1,2,\ldots,n$. Define $L_i(\alpha)=(1-\lambda_{im_i})\prod_{t=2}^{m_i-1}\lambda_{it}$, where $\lambda_{it}$ is determined by model (2.2). Let $S_i(\alpha)=\partial\log L_i(\alpha)/\partial\alpha$ be the vector of score functions contributed by subject $i$. Denote $\theta=(\alpha',\beta')'$ and $q=\dim(\theta)$.
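For the logistic dropout model (2.2), the likelihood contribution and its score can be sketched as follows. Two conventions are assumptions not stated in the text: row j of `u` holds u_ij (the row for j = 1 is unused), and completers are coded with m_i = m + 1 so that the factor (1 − λ_{im_i}) is absent for them. The score uses the logistic identities ∂log λ/∂α = (1 − λ)u and ∂log(1 − λ)/∂α = −λu.

```python
import numpy as np


def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def dropout_loglik(alpha, u, m_i):
    """log L_i(alpha) = log(1 - lambda_{i,m_i}) + sum_{t=2}^{m_i - 1} log lambda_it.

    u: (m, dim_alpha) array, row j-1 holding u_ij; m_i: 1-based dropout time
    (first occasion with R_ij = 0), with completers coded as m_i = m + 1.
    """
    m = u.shape[0]
    lam = expit(u @ alpha)                 # lambda_ij from the logistic model
    ll = np.sum(np.log(lam[1:m_i - 1]))    # occasions t = 2, ..., m_i - 1
    if m_i <= m:                           # subject dropped out at occasion m_i
        ll += np.log(1.0 - lam[m_i - 1])
    return ll


def dropout_score(alpha, u, m_i):
    """S_i(alpha) = d log L_i(alpha) / d alpha for the logistic model."""
    m = u.shape[0]
    lam = expit(u @ alpha)
    stay = slice(1, m_i - 1)               # occasions the subject stayed on
    s = ((1.0 - lam[stay])[:, None] * u[stay]).sum(axis=0)
    if m_i <= m:
        s = s - lam[m_i - 1] * u[m_i - 1]
    return s
```

The score for subject i is the i-th piece of the dropout block of H_i(θ) used in Section 3.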

2.3 Measurement error process

Let $W_{ij}$ be an observed measurement of the covariate $x_{ij}$, $i=1,2,\ldots,n$, $j=1,2,\ldots,m$. We assume that $x_{ij}$ and $W_{ij}$ follow a classical additive measurement error model; that is, conditional on $x_i$ and $z_i$,
$$W_{ij} = x_{ij} + e_{ij}, \qquad (2.3)$$
where the error terms $e_{ij}$ are assumed to follow $N(0,\Sigma_e)$, with $\Sigma_e$ the covariance matrix (e.g. Wang and others, 1998).
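Under model (2.3), surrogate covariates can be simulated from the true ones as in the sketch below. The function name is hypothetical, and drawing the errors independently across subjects and time points is the usual reading of the classical model but is stated here as an assumption.

```python
import numpy as np


def add_measurement_error(x, sigma_e, rng):
    """Classical additive error (2.3): W_ij = x_ij + e_ij with e_ij ~ N(0, Sigma_e).

    x: array of shape (..., d) holding true covariate vectors x_ij;
    sigma_e: (d, d) error covariance; rng: a numpy Generator.
    """
    d = x.shape[-1]
    e = rng.multivariate_normal(np.zeros(d), sigma_e, size=x.shape[:-1])
    return x + e


rng = np.random.default_rng(1)
x = np.zeros((10000, 3, 2))                  # n = 10000 subjects, m = 3, two covariates
sigma_e = np.array([[0.25, 0.25], [0.25, 1.0]])
W = add_measurement_error(x, sigma_e, rng)
```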

It is known that nonidentifiability is often a problem if model (2.3) is employed. For identifiability of model parameters, one needs a validation data set consisting of {Yij,xij,Wij,zij} or repeated measurements Wij to estimate the parameters associated with Σe. If neither validation data nor repeated measurements Wij are available, then one may conduct sensitivity analyses based on background information about the measurement process to assess the impact of different degrees of measurement error on estimation of β (Yi and Lawless, 2007). In this paper, error distribution parameters are assumed known.

3. INFERENCE PROCEDURES

3.1 Weighted estimating functions

The inverse probability weighted generalized estimating equation (IPWGEE) method is often employed to account for the bias induced by incomplete data (Robins and others, 1995) when primary interest lies in the marginal mean parameters $\beta$ in model (2.1). For $i=1,2,\ldots,n$, let $D_i=\partial\mu_i'/\partial\beta$ be the matrix of derivatives of the mean vector $\mu_i$ with respect to $\beta$, and let $\Delta_i=\operatorname{diag}\{I(R_{ij}=1)/\pi_{ij},\ j=1,2,\ldots,m\}$ be the weight matrix accommodating missingness, where $I(\cdot)$ is the indicator function. Let $V_i=A_i^{1/2}C_iA_i^{1/2}$ be the covariance matrix of $Y_i$, where $A_i=\operatorname{diag}(v_{ij},\ j=1,2,\ldots,m)$ and $C_i=[\rho_{i;jk}]$ is the correlation matrix, with $\rho_{i;jk}$ the correlation coefficient of the response components $Y_{ij}$ and $Y_{ik}$ for $j\neq k$, and $\rho_{i;jj}=1$. For $i=1,2,\ldots,n$, define $U_i(\theta)=D_iV_i^{-1}\Delta_i(Y_i-\mu_i)$ and $H_i(\theta)=(U_i'(\theta),S_i'(\alpha))'$.
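For a binary response with the logit link (an illustrative assumption), ∂μij/∂β = μij(1 − μij)(xij′, zij′)′, and one subject's contribution U_i(θ) can be computed as in this sketch. The function name is hypothetical, π_ij is taken as given (in practice it comes from model (2.2)), and unobserved responses may be stored as NaN because they receive zero weight.

```python
import numpy as np


def ipw_gee_contribution(y, r, pi, X, beta, corr=None):
    """U_i = D_i V_i^{-1} Delta_i (Y_i - mu_i) for one subject (logit link).

    y, r, pi: length-m arrays of responses (NaN if missing), indicators R_ij,
    and probabilities pi_ij; X: (m, p) rows (x_ij', z_ij'); corr: working C_i.
    """
    m = len(y)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    a = mu * (1.0 - mu)                      # diagonal of A_i
    D = (a[:, None] * X).T                   # D_i = d mu_i' / d beta, shape (p, m)
    C = np.eye(m) if corr is None else np.asarray(corr)
    A_half = np.diag(np.sqrt(a))
    V = A_half @ C @ A_half                  # V_i = A_i^{1/2} C_i A_i^{1/2}
    w = np.where(r == 1, 1.0 / pi, 0.0)      # diagonal of Delta_i
    resid = np.where(r == 1, y - mu, 0.0)    # missing Y_ij carry zero weight
    return D @ np.linalg.solve(V, w * resid)


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, np.nan])
r = np.array([1, 1, 0])
pi = np.array([1.0, 0.8, 0.6])
U = ipw_gee_contribution(y, r, pi, X, np.zeros(2))
```

Under working independence (`corr=None`) the expression collapses to Σ_j x_ij (R_ij/π_ij)(Y_ij − μ_ij); here that gives (−0.125, −0.625).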

In the absence of measurement error, that is, when the covariates $x_{ij}$ are precisely observed, $E[H_i(\theta)]=0$; hence $H(\theta)=\sum_{i=1}^{n}H_i(\theta)$ are unbiased estimating functions for $\theta$. A consistent estimator $\hat\theta$ of $\theta$ can be obtained by solving
$$\sum_{i=1}^{n}H_i(\theta)=0, \qquad (3.1)$$
where moment estimates may be used for the correlation matrix $C_i$ or, alternatively, the working independence matrix $A_i$ may be used in place of $V_i$.
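Because S_i(α) does not involve β, one way to solve (3.1) is to estimate α first from Σ_i S_i(α) = 0 and then solve the β block with the fitted π_ij plugged in. The sketch below performs that second stage by Fisher scoring, assuming (for illustration) a logit link and the working independence choice V_i = A_i, under which the β block reduces to weighted logistic-regression score equations. The function name and the simulated data are hypothetical.

```python
import numpy as np


def solve_beta_working_independence(X, y, w, tol=1e-10, max_iter=50):
    """Solve sum_ij w_ij x_ij (y_ij - mu_ij) = 0 for beta by Fisher scoring.

    X: (N, p) rows stacked over all (i, j); y: length-N responses (any finite
    value where w_ij = 0); w: length-N weights I(R_ij = 1) / pi_ij.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - mu))                       # weighted U(beta)
        info = X.T @ ((w * mu * (1.0 - mu))[:, None] * X)  # expected information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(2)
N = 2000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))).astype(float)
pi = np.full(N, 0.8)                 # known observation probabilities (illustrative)
r = rng.binomial(1, pi)
w = r / pi
beta_hat = solve_beta_working_independence(X, y, w)
```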

3.2 SIMEX approach

When the covariates $x_{ij}$ are measured with error, $H(\theta)$ is no longer unbiased if $x_{ij}$ is replaced with its observed measurement $W_{ij}$, so an adjustment is needed to account for the bias induced by using $W_{ij}$. In the sequel, we describe the SIMEX method for this adjustment. Let $B$ be a given positive integer and $\Lambda=\{\lambda_1,\lambda_2,\ldots,\lambda_M\}$ be a sequence of nonnegative numbers taken from $[0,\lambda_M]$ with $\lambda_1=0$.

1. Simulation step

For $i=1,2,\ldots,n$ and $j=1,2,\ldots,m$, generate $e_{ijb}\sim N(0,\Sigma_e)$ for $b=1,2,\ldots,B$. Given $\lambda\in\Lambda$, set $W_{ij}(b,\lambda)=W_{ij}+\sqrt{\lambda}\,e_{ijb}$.

2. Estimation step

For given $\lambda$ and $b$, obtain an estimate $\hat\theta(b,\lambda)$ by solving (3.1) with $x_{ij}$ replaced by $W_{ij}(b,\lambda)$. This step can be implemented quickly by applying the SAS GENMOD procedure to the data set $\{Y_i,W_i(b,\lambda),z_i: i=1,2,\ldots,n\}$. The model-based covariance matrix for $\hat\theta(b,\lambda)$ is given by
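$$\hat\Omega(b,\lambda)=n^{-1}\,\hat\Gamma(b,\lambda)^{-1}\,\hat\Sigma(b,\lambda)\,\{\hat\Gamma(b,\lambda)^{-1}\}',$$
where $\hat\Gamma(b,\lambda)=n^{-1}\sum_{i=1}^{n}\partial H_i(\theta)/\partial\theta'$ and $\hat\Sigma(b,\lambda)=n^{-1}\sum_{i=1}^{n}H_i(\theta)H_i'(\theta)$, with $x_{ij}$ replaced by $W_{ij}(b,\lambda)$ and both evaluated at $\theta=\hat\theta(b,\lambda)$.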

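Denote by $\hat\omega_r(b,\lambda)$ the $r$th diagonal element of the covariance matrix of $\hat\theta(b,\lambda)$ given in Step 2, and by $\hat\theta_r(b,\lambda)$ the $r$th component of $\hat\theta(b,\lambda)$, $r=1,2,\ldots,q$. Define $\hat\theta_r(\lambda)=B^{-1}\sum_{b=1}^{B}\hat\theta_r(b,\lambda)$, $\hat\tau_r(\lambda)=B^{-1}\sum_{b=1}^{B}\hat\omega_r(b,\lambda)$, $s_r^2(\lambda)=(B-1)^{-1}\sum_{b=1}^{B}\{\hat\theta_r(b,\lambda)-\hat\theta_r(\lambda)\}^2$, and $\hat\omega_r(\lambda)=\hat\tau_r(\lambda)-s_r^2(\lambda)$.

3. Extrapolation step

For $r=1,2,\ldots,q$, fit a regression model to each of the sequences $\{(\lambda,\hat\theta_r(\lambda)):\lambda\in\Lambda\}$ and $\{(\lambda,\hat\omega_r(\lambda)):\lambda\in\Lambda\}$, and extrapolate it to $\lambda=-1$; let $\hat\theta_r(-1)$ and $\hat\omega_r(-1)$ denote the corresponding predicted values. Then $\hat\theta_{\mathrm{SIMEX}}=(\hat\theta_1(-1),\hat\theta_2(-1),\ldots,\hat\theta_q(-1))'$ is the SIMEX estimator of $\theta$, and $\sqrt{\hat\omega_r(-1)}$ is the associated standard error of $\hat\theta_r(-1)$, $r=1,2,\ldots,q$.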
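The simulation, estimation, and extrapolation steps above can be sketched generically as below. Assumptions for illustration: the user supplies a hypothetical `fit` function that returns the estimates and the diagonal of their covariance matrix (playing the role of solving (3.1)), Step 3 uses a quadratic extrapolant (the choice made in Section 4), and the surrogates are stored as an (N, d) array. The extrapolated variance can turn out negative in small samples, in which case the returned standard error is NaN. The usage lines apply the sketch to least squares with one mismeasured covariate, a setting chosen only because its answer is known.

```python
import numpy as np


def simex(fit, W, sigma_e, lambdas, B, rng):
    """SIMEX with quadratic extrapolation to lambda = -1.

    fit(W_star) -> (theta_hat, var_hat): estimates and the diagonal of their
    covariance matrix, computed with the surrogates W replaced by W_star.
    W: (N, d) surrogates; sigma_e: (d, d) known error covariance;
    lambdas: grid starting at 0; B >= 2 replicates per lambda.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    theta_bar, omega = [], []
    for lam in lambdas:
        reps = []
        for _ in range(B):
            # Step 1: add extra error with covariance lam * Sigma_e
            e = rng.multivariate_normal(np.zeros(W.shape[1]), sigma_e, size=W.shape[0])
            # Step 2: re-estimate with the further-contaminated surrogates
            reps.append(fit(W + np.sqrt(lam) * e))
        thetas = np.array([t for t, _ in reps])
        variances = np.array([v for _, v in reps])
        theta_bar.append(thetas.mean(axis=0))
        # average model-based variance minus between-replicate variance
        omega.append(variances.mean(axis=0) - thetas.var(axis=0, ddof=1))
    theta_bar, omega = np.array(theta_bar), np.array(omega)

    def extrapolate(values):
        # Step 3: quadratic fit in lambda for each component, evaluated at -1
        return np.array([np.polyval(np.polyfit(lambdas, values[:, r], 2), -1.0)
                         for r in range(values.shape[1])])

    return extrapolate(theta_bar), np.sqrt(extrapolate(omega))


# Usage: least squares with one mismeasured covariate (true slope 2, reliability 0.8)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
W = (x + rng.normal(scale=0.5, size=n))[:, None]


def ols_fit(W_star):
    X = np.column_stack([np.ones(n), W_star[:, 0]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - 2)
    return coef, s2 * np.diag(np.linalg.inv(X.T @ X))


naive, _ = ols_fit(W)
theta_simex, se_simex = simex(ols_fit, W, np.array([[0.25]]), np.linspace(0, 2, 5), 20, rng)
```

The naive slope settles near 1.6, while the quadratic SIMEX estimate moves most of the way back toward 2.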

The SIMEX approach is a simulation-based method proposed by Cook and Stefanski (1994) for parametric measurement error models. Its idea can be illustrated intuitively with simple linear regression. Suppose that the regression model is $Y=\beta_0+\beta_x x+\epsilon$, where $\epsilon$ has mean 0. If $x$ is replaced with its observed measurement $W$, modeled by $W=x+e$ with $e$ independent of $x$ and $\epsilon$ and having mean 0 and variance $\sigma^2$, then the resulting least squares estimator $\hat\beta_x$ of $\beta_x$ converges in probability to $\beta_x^*=\{\sigma_x^2/(\sigma_x^2+\sigma^2)\}\beta_x$ (Fuller, 1987), where $\sigma_x^2$ is the variance of $x$. Similarly, if $x$ is replaced with $W+\sqrt{\lambda}\,\sigma e_b$, where $e_b$ is generated from $N(0,1)$, then the resulting estimator $\hat\beta_x(b,\lambda)$ converges in probability to $\beta_x^*(b,\lambda)=\{\sigma_x^2/(\sigma_x^2+(1+\lambda)\sigma^2)\}\beta_x$. When $\lambda=0$, $\hat\beta_x(b,0)$ is just the naive estimator $\hat\beta_x$, whereas at $\lambda=-1$ the limit $\beta_x^*(b,-1)$ equals the true parameter $\beta_x$. Thus, tracing how the estimate changes as extra error is added ($\lambda>0$) and extrapolating back to $\lambda=-1$ removes the effect of the original measurement error.
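The limits in this illustration can be evaluated directly. The sketch below (hypothetical names; βx = 2, σx² = 1, and σ² = 0.25 are chosen only for illustration) traces βx*(b, λ) over a grid, confirms that the exact curve returns βx at λ = −1, and shows that a quadratic extrapolant fitted on [0, 2] lands close to, but not exactly at, βx. This is one reason the asymptotic result below is stated for the exact extrapolation function.

```python
import numpy as np


def attenuated_slope(lam, beta_x, var_x, var_e):
    """beta_x^*(b, lambda) = sigma_x^2 / (sigma_x^2 + (1 + lambda) sigma^2) * beta_x."""
    return var_x / (var_x + (1.0 + lam) * var_e) * beta_x


beta_x, var_x, var_e = 2.0, 1.0, 0.25
lambdas = np.linspace(0.0, 2.0, 9)
curve = attenuated_slope(lambdas, beta_x, var_x, var_e)
naive_limit = curve[0]                                    # lambda = 0: naive limit, 1.6
exact_at_minus_one = attenuated_slope(-1.0, beta_x, var_x, var_e)   # recovers 2.0
quadratic_at_minus_one = np.polyval(np.polyfit(lambdas, curve, 2), -1.0)
```

Here the curve is 8/(5 + λ), which has the rational form a + b/(c + λ); an extrapolant of that form would reproduce βx exactly, whereas the quadratic falls slightly short.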

For univariate parametric models, Carroll and others (1996) established the asymptotic normality of the SIMEX estimator. However, their results cannot be applied directly here, because the current development involves multiple response outcomes along with an additional process for the missing data indicators. If the exact extrapolation function is used in Step 3 above, we can establish the following asymptotic distribution for the SIMEX estimator $\hat\theta_{\mathrm{SIMEX}}$. The proof is outlined in the Appendix.

 

THEOREM:
Under regularity conditions, 
graphic
where Gγ(γ; −1) and Q(γ) are defined in the Appendix. Hence, forumla has an asymptotic normal distribution with mean 0 and covariance matrix being the upper p × p matrix of forumla.

4. AN EXAMPLE

As an illustration, we apply the proposed method to the cohort 2 subset of the GAW13 (Genetic Analysis Workshop 13) data arising from the Framingham Heart Study. The data set consists of measurements on 1672 patients from a series of exams, with 5 assessments planned for each individual. Measurements such as height, weight, age, systolic blood pressure (SBP), and CHOL were collected at each assessment. About 24% of the patients dropped out of the study.

It is of interest to study how an individual's obesity changes with age and how it is associated with SBP and CHOL. In practice, body mass index (BMI), defined as weight (kg)/height² (m²), is a convenient and cost-effective measure of adiposity that correlates well with more direct and invasive measures of percentage body fat (Strug and others, 2003). Following Yoo and others (2003), we let Y be the binary response variable indicating the obesity status of a subject: it takes value 1 if his/her maximum BMI (Max BMI) over all ages is no less than the 90th percentile of the Max BMI values observed in each replicate being analyzed, and 0 otherwise. The responses and the covariates are related by the logistic regression model
$$\operatorname{logit}(\mu_{ij}) = \beta_0 + \beta_{x1}x_{ij1} + \beta_{x2}x_{ij2} + \beta_z z_{ij},$$
where $x_{ij1}$ represents SBP, rescaled as $\log(\mathrm{SBP}-50)$ as in Carroll and others (2006), $x_{ij2}$ is the standardized CHOL, and $z_{ij}$ is AGE for subject $i$ at time point $j$.
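The response and covariate definitions above can be sketched as follows. The array layout (subjects by exams, NaN for missed exams), the use of `np.percentile` with its default linear interpolation for the 90th percentile, and standardizing CHOL over all available measurements are assumptions; the text does not specify these details, and the function names are hypothetical.

```python
import numpy as np


def obesity_indicator(weight_kg, height_m):
    """Y_i = 1 if the subject's maximum BMI over exams is >= the 90th percentile."""
    bmi = weight_kg / height_m**2                 # BMI = weight (kg) / height^2 (m^2)
    max_bmi = np.nanmax(bmi, axis=1)              # Max BMI per subject
    cutoff = np.percentile(max_bmi, 90)
    return (max_bmi >= cutoff).astype(int)


def transform_covariates(sbp, chol):
    """x1 = log(SBP - 50); x2 = CHOL standardized to mean 0 and SD 1."""
    x1 = np.log(sbp - 50.0)
    x2 = (chol - np.nanmean(chol)) / np.nanstd(chol)
    return x1, x2


weight = np.arange(1.0, 11.0)[:, None] * np.array([[1.0, 0.9]])  # 10 subjects, 2 exams
height = np.ones((10, 2))
Y = obesity_indicator(weight, height)
x1, x2 = transform_covariates(np.array([150.0, 130.0]), np.array([200.0, 240.0]))
```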
It is well known that both SBP and CHOL are subject to substantial measurement error. We are concerned with how measurement error in SBP and CHOL affects estimation of the parameter $\beta=(\beta_0,\beta_{x1},\beta_{x2},\beta_z)'$, and hence we conduct sensitivity analyses. Let $W_{ij}=(W_{ij1},W_{ij2})'$ and $x_{ij}=(x_{ij1},x_{ij2})'$. Assume that the error model is given by (2.3) with
$$\Sigma_e=\begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.$$
Each of $\sigma_1$ and $\sigma_2$ is set to 0, 0.5, and 1.0 to represent scenarios with different degrees of measurement error in SBP and CHOL, and distinct values of $\rho$ are considered to represent different strengths of correlation between the two errors. The missing data process is characterized by the logistic regression model
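The sensitivity scenarios can be enumerated as below, assuming Σe has marginal standard deviations σ1 and σ2 and error correlation ρ. The σ grid and ρ ∈ {0, 0.5} follow the text; the function name is hypothetical. Each Σe would then be passed to the SIMEX procedure of Section 3, with Σe = 0 corresponding to no correction.

```python
import itertools

import numpy as np


def error_cov(sigma1, sigma2, rho):
    """Sigma_e with marginal SDs sigma1, sigma2 and error correlation rho."""
    off = rho * sigma1 * sigma2
    return np.array([[sigma1**2, off], [off, sigma2**2]])


sigma_grid = (0.0, 0.5, 1.0)
rho_grid = (0.0, 0.5)
scenarios = {
    (s1, s2, rho): error_cov(s1, s2, rho)
    for s1, s2, rho in itertools.product(sigma_grid, sigma_grid, rho_grid)
}
```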
graphic
(4.1)

Three analyses are conducted. Analysis 1 ignores measurement error in SBP and CHOL, naively replacing xi with Wi in (3.1); Analysis 2 accounts for measurement error in the response model but not in the missing data model; and Analysis 3 addresses measurement error in both the response and the missing data models. In implementing the SIMEX method, we choose B = 200, M = 9, and a quadratic regression for each extrapolation step.

The analyses show that only α4 in model (4.1) is statistically significant under the various scenarios considered for error model (2.3); the other coefficients, α1, α2, and α3, are not statistically significant. The results suggest that the dropout rate increases as the subjects become older, and that the dropout probability does not depend on the previous obesity status, SBP, or CHOL.

We conduct the analyses for ρ = 0 and ρ = 0.5. Table 1 reports the results for ρ = 0. Not surprisingly, the 3 analyses give very similar results when there is no measurement error in SBP and CHOL. When measurement error does exist, the estimates and associated standard errors may be considerably affected by the degree of measurement error in SBP or CHOL. If there is no error in SBP (i.e. σ1 = 0), neither CHOL nor AGE is statistically significant, whereas SBP has a significant positive effect regardless of the degree of measurement error in CHOL.

Table 1.

Sensitivity analyses of the data from the Framingham Heart Study (ρ = 0)

σ1    σ2    Analysis  βx1                        βx2                        βz
                      Estimate  SE      p-value  Estimate  SE      p-value  Estimate  SE      p-value
0.00  0.00  1         2.9465    0.3103  <0.0001  0.0904    0.0852  0.2886   −0.0067   0.0057  0.2427
0.00  0.00  2         2.9465    0.3119  <0.0001  0.0904    0.0854  0.2897   −0.0067   0.0057  0.2450
0.00  0.00  3         2.9465    0.3103  <0.0001  0.0904    0.0852  0.2886   −0.0067   0.0057  0.2427
0.00  0.50  1         2.9827    0.3085  <0.0001  0.0419    0.0721  0.5614   −0.0060   0.0057  0.2937
0.00  0.50  2         2.9736    0.3119  <0.0001  0.0541    0.0871  0.5341   −0.0061   0.0057  0.2820
0.00  0.50  3         2.9737    0.3102  <0.0001  0.0541    0.0868  0.5334   −0.0061   0.0057  0.2792
0.00  1.00  1         3.0069    0.3068  <0.0001  0.0072    0.0503  0.8859   −0.0055   0.0057  0.3372
0.00  1.00  2         3.0016    0.3100  <0.0001  0.0140    0.0706  0.8434   −0.0056   0.0057  0.3285
0.00  1.00  3         3.0017    0.3083  <0.0001  0.0140    0.0704  0.8426   −0.0056   0.0057  0.3262
0.50  0.00  1         0.2828    0.0897  0.0016   0.1751    0.0797  0.0280   0.0121    0.0053  0.0232
0.50  0.00  2         0.5050    0.1346  0.0002   0.1654    0.0802  0.0391   0.0106    0.0053  0.0455
0.50  0.00  3         0.5051    0.1343  0.0002   0.1654    0.0799  0.0385   0.0106    0.0052  0.0441
0.50  0.50  1         0.2316    0.0968  0.0167   0.0797    0.0728  0.2737   0.0144    0.0053  0.0063
0.50  0.50  2         0.5182    0.1335  0.0001   0.1277    0.0820  0.1194   0.0112    0.0053  0.0337
0.50  0.50  3         0.5183    0.1332  <0.0001  0.1276    0.0817  0.1181   0.0112    0.0052  0.0324
0.50  1.00  1         0.2599    0.1018  0.0107   0.0088    0.0538  0.8701   0.0157    0.0053  0.0030
0.50  1.00  2         0.5331    0.1333  <0.0001  0.0703    0.0676  0.2988   0.0123    0.0052  0.0188
0.50  1.00  3         0.5331    0.1330  <0.0001  0.0703    0.0674  0.2966   0.0123    0.0052  0.0180
1.00  0.00  1         0.0412    0.0454  0.3648   0.1852    0.0794  0.0196   0.0137    0.0054  0.0107
1.00  0.00  2         0.0801    0.0693  0.2477   0.1830    0.0797  0.0216   0.0135    0.0054  0.0121
1.00  0.00  3         0.0802    0.0692  0.2464   0.1831    0.0793  0.0210   0.0135    0.0053  0.0116
1.00  0.50  1         0.0073    0.0488  0.8809   0.1074    0.0719  0.1351   0.0156    0.0053  0.0035
1.00  0.50  2         0.0858    0.0688  0.2128   0.1433    0.0817  0.0794   0.0142    0.0054  0.0080
1.00  0.50  3         0.0858    0.0686  0.2114   0.1432    0.0813  0.0780   0.0142    0.0053  0.0076
1.00  1.00  1         0.0112    0.0517  0.8285   0.0414    0.0535  0.4388   0.0169    0.0053  0.0015
1.00  1.00  2         0.0917    0.0688  0.1828   0.0811    0.0675  0.2296   0.0154    0.0053  0.0036
1.00  1.00  3         0.0917    0.0686  0.1814   0.0811    0.0671  0.2271   0.0154    0.0053  0.0034

If there is moderate error in SBP (i.e. σ1 = 0.5), the 3 analyses still suggest that SBP has a significant positive effect on obesity. In contrast to the case with no error in SBP, AGE is found to be statistically significant by all 3 analyses, and the evidence tends to become stronger as the error in CHOL becomes more substantial. The significance of CHOL, however, depends on whether there is error in CHOL: if there is none, there is moderate evidence that CHOL has a positive effect on obesity; otherwise, CHOL is not statistically significant.

When measurement error in SBP becomes more severe (i.e. σ1 = 1.0), none of the 3 analyses indicates a significant effect of SBP. Again, AGE has a positive effect, and the evidence tends to become stronger as the error in CHOL increases. CHOL is statistically significant when there is no error in CHOL and marginally significant when the error is moderate (p ≈ 0.08 under Analyses 2 and 3); when the error in CHOL becomes larger, there is no evidence to support an effect of CHOL.

To save space, we do not display the results for ρ = 0.5 but summarize the findings here. Moderate correlation ρ tends to decrease the estimates of the effects of both SBP and CHOL and to increase the associated standard errors, leading to larger p-values. The impact of ρ on the AGE effect is different: moderate correlation tends to increase the estimated AGE effect while leaving the standard errors essentially unchanged, so the resulting p-values become smaller.

5. SIMULATION STUDIES

In this section, we conduct simulation studies to investigate the impact of ignoring measurement error on estimation and to compare the performance of the 3 analyses described in Section 4. The SIMEX method is implemented with the same configuration as in Section 4 (B = 200, M = 9, and quadratic extrapolation).

In the following simulation study, we set n = 200 and m = 3 and run 200 simulations for each parameter configuration. Consider the logistic regression model
graphic
where zij takes value 0 or 1 with probability 1/2, representing random assignment of each subject to the control or treatment group. Independently of zij, xij = (xij1, xij2)′ is generated from N(μx, Σx), where μx = (μx1, μx2)′ and
$$\Sigma_x=\begin{pmatrix}\sigma_{x1}^2 & \rho_x\sigma_{x1}\sigma_{x2}\\ \rho_x\sigma_{x1}\sigma_{x2} & \sigma_{x2}^2\end{pmatrix},$$
with μxr = 0.5 and σxr = 1.0 (r = 1, 2). Set βx1 = log(1.5), βx2 = log(1.5), and βz = log(0.75). The surrogate Wij = (Wij1, Wij2)′ is generated from the normal distribution N(xij, Σe) with
$$\Sigma_e=\begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.$$
Various configurations are considered to represent distinct scenarios of measurement error in the covariate xij. Specifically, each of σ1 and σ2 takes the values 0.15, 0.50, and 0.75, representing minor, moderate, and severe marginal measurement error. Both ρx and ρ are set to 0.5 to represent moderate correlations. The missing data indicators are generated from model (4.1) with α0 = α1 = 0.5, α2 = α3 = 0.1, and αz = 0.2.
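The covariate and surrogate parts of this design can be generated as below, assuming Σx and Σe are bivariate normal covariances with unit or σ1, σ2 marginal SDs and correlations ρx and ρ. The response model and model (4.1) are not reproduced in this section, so the sketch stops at (zij, xij, Wij). Treating zij as constant within a subject (one randomization per subject) and drawing xij independently across time points are assumptions, and the function names are hypothetical.

```python
import numpy as np


def bivariate_cov(s1, s2, rho):
    off = rho * s1 * s2
    return np.array([[s1**2, off], [off, s2**2]])


def simulate_covariates(n=200, m=3, sigma1=0.5, sigma2=0.5, rho=0.5, rho_x=0.5, seed=None):
    """Return z (n, m), true x (n, m, 2), and surrogates W (n, m, 2)."""
    rng = np.random.default_rng(seed)
    z = np.repeat(rng.binomial(1, 0.5, size=n)[:, None], m, axis=1)  # treatment arm
    mu_x = np.array([0.5, 0.5])
    x = rng.multivariate_normal(mu_x, bivariate_cov(1.0, 1.0, rho_x), size=(n, m))
    e = rng.multivariate_normal(np.zeros(2), bivariate_cov(sigma1, sigma2, rho), size=(n, m))
    return z, x, x + e                     # W_ij = x_ij + e_ij


z, x, W = simulate_covariates(seed=3)
```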

Table 2 reports the finite-sample bias (Bias, the average of the estimates minus the true value), the empirical standard error (SE), and the coverage rate (CR, in percent) of 95% confidence intervals. If measurement error is minor, for instance when both σ1 and σ2 are 0.15, even Analysis 1 gives reasonable results, with small finite-sample biases and CRs close to the nominal 95% level, and the 3 analyses provide comparable results.
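The three summaries reported in Table 2 can be computed from the replicate estimates as sketched below. Forming each interval as estimate ± 1.96 × (that replicate's estimated standard error) is an assumption about how the confidence intervals were built, and the function name is hypothetical.

```python
import numpy as np

Z_975 = 1.959963984540054   # 97.5% standard normal quantile


def summarize_replicates(estimates, std_errors, true_value):
    """Return (Bias, empirical SE, CR in percent) over simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_value                   # average estimate minus truth
    emp_se = est.std(ddof=1)                         # SD of the estimates
    covered = (est - Z_975 * se <= true_value) & (true_value <= est + Z_975 * se)
    return bias, emp_se, 100.0 * covered.mean()


bias, emp_se, cr = summarize_replicates([1.0, 1.2, 0.8, 1.4], [0.1] * 4, true_value=1.0)
```

In this toy input only the first interval, 1.0 ± 0.196, contains the true value, so the coverage rate is 25%.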

Table 2.

Simulation results

σ1    σ2    Analysis  βx1                       βx2                       βz
                      Bias      SE      CR (%) Bias      SE      CR (%) Bias      SE      CR (%)
0.15  0.15  1         −0.0175   0.1323  95.5   0.0000    0.1241  95.5   0.0044    0.2315  94.0
0.15  0.15  2         −0.0073   0.1357  96.0   0.0094    0.1277  97.5   0.0038    0.2320  94.5
0.15  0.15  3         −0.0073   0.1358  95.5   0.0094    0.1278  97.0   0.0038    0.2321  94.0
0.15  0.50  1         0.0223    0.1303  94.5   −0.1030   0.1098  87.5   0.0068    0.2314  94.0
0.15  0.50  2         0.0012    0.1366  95.0   −0.0135   0.1389  96.0   0.0050    0.2341  94.5
0.15  0.50  3         0.0011    0.1367  94.5   −0.0135   0.1393  95.5   0.0053    0.2344  94.5
0.15  0.75  1         0.0579    0.1282  91.0   −0.1839   0.0957  54.5   0.0080    0.2305  94.0
0.15  0.75  2         0.0253    0.1365  94.0   −0.0728   0.1376  91.5   0.0061    0.2343  94.0
0.15  0.75  3         0.0252    0.1365  94.0   −0.0727   0.1381  90.5   0.0065    0.2347  94.0
0.50  0.15  1         −0.1199   0.1175  79.0   0.0389    0.1233  97.5   0.0093    0.2316  94.5
0.50  0.15  2         −0.0327   0.1472  95.5   0.0179    0.1301  97.0   0.0077    0.2338  94.5
0.50  0.15  3         −0.0326   0.1475  95.0   0.0178    0.1307  97.0   0.0079    0.2342  94.5
0.50  0.50  1         −0.0970   0.1180  83.0   −0.0798   0.1113  89.5   0.0129    0.2314  94.0
0.50  0.50  2         −0.0259   0.1458  95.5   −0.0068   0.1386  96.0   0.0094    0.2364  94.5
0.50  0.50  3         −0.0258   0.1463  95.0   −0.0067   0.1394  95.0   0.0098    0.2370  94.5
0.50  0.75  1         −0.0641   0.1173  92.5   −0.1754   0.0982  58.5   0.0148    0.2303  93.5
0.50  0.75  2         −0.0066   0.1451  96.5   −0.0683   0.1387  91.0   0.0109    0.2369  94.5
0.50  0.75  3         −0.0067   0.1456  96.0   −0.0681   0.1395  90.0   0.0114    0.2375  94.5
0.75  0.15  1         −0.1976   0.1028  48.5   0.0730    0.1219  93.0   0.0118    0.2314  94.5
0.75  0.15  2         −0.0908   0.1458  84.5   0.0411    0.1312  96.5   0.0107    0.2343  94.5
0.75  0.15  3         −0.0906   0.1461  84.5   0.0410    0.1320  96.5   0.0111    0.2348  94.5
0.75  0.50  1         −0.1899   0.1041  54.5   −0.0478   0.1109  93.5   0.0161    0.2311  94.0
0.75  0.50  2         −0.0865   0.1454  86.0   0.0117    0.1386  98.0   0.0127    0.2370  94.5
0.75  0.50  3         −0.0864   0.1459  85.5   0.0118    0.1396  96.5   0.0133    0.2377  94.5
0.75  0.75  1         −0.1637   0.1041  63.0   −0.1498   0.0985  68.5   0.0184    0.2300  93.5
0.75  0.75  2         −0.0698   0.1451  89.0   −0.0521   0.1389  92.5   0.0144    0.2376  94.0
0.75  0.75  3         −0.0697   0.1456  88.5   −0.0518   0.1399  92.5   0.0152    0.2384  93.5
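Table 2. Simulation results. Bias, SE, and CR denote the bias, standard error, and coverage rate (%) of the 95% confidence interval for each parameter under Analyses 1, 2, and 3.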


When there is moderate or substantial measurement error in the covariates xij, the performance of Analysis 1 in estimating the effects of the error-prone covariates deteriorates markedly, and Analysis 1 may yield considerably biased estimates of βx1 and βx2. For example, with σ1 = 0.75 and σ2 = 0.15 in Table 2, the bias of the Analysis 1 estimate of βx1 is −0.1976, and the CR of the 95% confidence interval for βx1 is as low as 48.5%. By accounting for measurement error in the response model, Analyses 2 and 3 both improve performance markedly, giving much smaller biases and much higher CRs for the 95% confidence intervals. Analysis 2 gives results very comparable to those of Analysis 3, although Analysis 2 appears to yield slightly larger finite-sample biases. This simulation study suggests that ignoring measurement error when modeling the missing data process has a much smaller impact than ignoring it when modeling the response process.

For βz, Analysis 1 produces larger biases than Analyses 2 and 3, although the differences are far less striking than for the estimates of βx1 and βx2. Among the three analyses, Analysis 1 gives the smallest standard errors and Analysis 3 the largest, but the differences between Analyses 2 and 3 are small. The CRs of the 95% confidence intervals from all three analyses agree reasonably well with the nominal level.
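The Bias, SE, and CR columns of Table 2 are standard Monte Carlo summaries. As a minimal sketch, assume Bias is the average estimate minus the true value, SE is the average of the per-replicate standard errors, and CR is the percentage of Wald intervals (estimate ± 1.96 SE) that cover the true value. The exact definitions used in the simulation study may differ. Under those assumptions, the three quantities for one parameter could be computed as follows. The function name `simulation_summary` is hypothetical.

```python
import statistics


def simulation_summary(estimates, std_errors, true_value, z=1.96):
    """Return (bias, mean SE, coverage %) for one parameter across replicates.

    Assumed definitions (illustrative; the simulation study may define them differently):
      bias     = mean of the estimates minus the true value
      mean SE  = mean of the per-replicate standard errors
      coverage = % of intervals estimate +/- z * SE that contain the true value
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need equally many estimates and standard errors")
    bias = statistics.fmean(estimates) - true_value
    mean_se = statistics.fmean(std_errors)
    # Count the Wald intervals that contain the true parameter value.
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return bias, mean_se, 100.0 * covered / len(estimates)
```

For instance, four replicates with estimates 0.9, 1.1, 1.3, and 0.5, each with standard error 0.2 and true value 1.0, give a bias of about −0.05, an SE of 0.2, and a CR of 75%. Only the last interval, [0.108, 0.892], misses the true value.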

In summary, ignoring measurement error may lead to substantially biased results, so covariate measurement error must be properly addressed in the estimation procedure. The proposed method (i.e. Analysis 3) performs reasonably well under various configurations. Its performance becomes less satisfactory when measurement error is substantial. With σ1 = 0.75, for instance, the CRs for βx1 under Analysis 3 range from 84.5% to 88.5%. Even so, the proposed SIMEX method substantially improves on the naive analysis (i.e. Analysis 1).

6. DISCUSSION

In this paper, we propose a simulation-based marginal method to analyze longitudinal data with both missing observations and error-contaminated covariates. This work is of particular interest because missingness and covariate measurement error arise commonly in longitudinal studies, yet little work to date has addressed both features simultaneously (Liu and Wu, 2007). Yi (2005) discussed inference approaches for continuous or count data arising from longitudinal studies, but those methods cannot be applied to binary responses because of the nature of logistic regression. The proposed method, however, can handle binary responses as well as continuous responses and count data. Moreover, in the models of Yi (2005) only precisely observed covariates may enter model (2.2), whereas the proposed method allows the missing data process to depend on error-prone covariates. The proposed method is simple yet flexible. It can be implemented straightforwardly with slight modifications to standard statistical software such as PROC GENMOD in SAS. It does not require full specification of the distribution of the response process, only the structures of the marginal means and variances. Nor does it require modeling the underlying covariate process, which is desirable in many practical problems.

The proposed method may also be applied to clustered or other correlated data. In some situations, interest may also lie in the strength of association among responses within a cluster. Following Yi and Cook (2002), one could construct a second set of estimating equations for the association parameters. In that formulation, appropriate adjustments would be needed to account for the biases induced by both missing observations and covariate measurement error.

In this paper, we focus on the IPWGEE method, which assumes an MAR missing data mechanism. Other modeling frameworks, such as random-effects models, could be employed to accommodate NMAR mechanisms as well. In the absence of missing observations, Wang and others (1998) studied random-effects models that account for covariate measurement error. It would be interesting to develop methods that simultaneously adjust for the biases resulting from missingness and measurement error in this context.
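Under the IPWGEE approach with monotone dropout and an MAR mechanism, each observed response is weighted by the inverse of the probability that the subject is still observed at that visit. That probability is the cumulative product of the conditional probabilities of remaining in the study. The sketch below is illustrative only. It assumes the first visit is always observed, and it uses a hypothetical logistic model (helper `continuation_prob`, coefficient vector `alpha`) for the conditional probability of remaining. It does not reproduce the paper's missing data model (2.2).

```python
import math


def continuation_prob(alpha, z):
    """Hypothetical logistic model for P(observed at visit j | observed at j-1, history).

    alpha and z are equal-length sequences of coefficients and covariates
    (e.g. intercept, previous response, covariates); both are assumptions.
    """
    eta = sum(a * v for a, v in zip(alpha, z))
    return 1.0 / (1.0 + math.exp(-eta))


def ipw_weights(continuation_probs):
    """Return inverse probability weights 1/pi_ij for visits 1..m of one subject.

    continuation_probs[k] is the conditional probability of being observed at
    visit k+2 given observation at visit k+1. Visit 1 is assumed always observed,
    so pi_i1 = 1 and pi_ij is the running product of these probabilities.
    """
    weights = [1.0]
    pi = 1.0
    for p in continuation_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("continuation probabilities must lie in (0, 1]")
        pi *= p
        weights.append(1.0 / pi)
    return weights
```

For example, `ipw_weights([0.5, 0.5])` returns `[1.0, 2.0, 4.0]`. In a weighted GEE, only the visits actually observed would contribute, each multiplied by its weight.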

In modeling the missing data process, we consider the case in which the true but error-prone covariates Xi govern the missingness probability. In some instances, it may be more sensible to let dropout depend on the observed covariates Wi. The proposed method then applies with a minor modification. See Carroll and others (2006, Chapter 2, Section 11.8) for a general discussion of whether to build a model by conditioning on the true underlying covariates or on the observed data.

As noted in Section 4, no additional information, such as repeated measurements of SBP and CHOL, is available to estimate the variance parameters σ1 and σ2. We therefore undertake sensitivity analyses, specifying a sequence of values of σ1 and σ2 to assess the impact of measurement error on estimation of the response parameters β. In other settings, additional information on the measurement error process is available and the associated parameters can be estimated. The variability induced by estimating the error parameters must then be accounted for. With replicate measurements Wi available, for example, the proposed method may be modified by adapting the arguments of Devanarayan and Stefanski (2002) to accommodate measurement error models with unknown variance parameters.
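When replicate measurements are available, a standard estimate of the error variance uses the pooled within-subject variance, Σi Σr (Wir − W̄i)² / Σi (Ri − 1), where Ri is the number of replicates for subject i. This assumes the classical additive model Wir = Xi + eir. The sketch below computes that estimate for a single error-prone covariate. It is not the empirical SIMEX procedure of Devanarayan and Stefanski (2002). Plugging the estimate into SIMEX would still require accounting for its sampling variability, as noted above. The function name is hypothetical.

```python
import statistics


def pooled_error_variance(replicates):
    """Pooled within-subject variance of replicate measurements.

    Assumes W_ir = X_i + e_ir with independent errors of common variance
    sigma^2. Subjects with fewer than two replicates are skipped because they
    carry no information about sigma^2.
    """
    ss = 0.0  # pooled sum of squared deviations from subject means
    df = 0    # pooled degrees of freedom, sum of (R_i - 1)
    for w in replicates:
        if len(w) < 2:
            continue
        mean_w = statistics.fmean(w)
        ss += sum((v - mean_w) ** 2 for v in w)
        df += len(w) - 1
    if df == 0:
        raise ValueError("need at least one subject with two or more replicates")
    return ss / df
```

For example, replicates `[[1.0, 3.0], [2.0, 2.0, 5.0], [4.0]]` give a pooled sum of squares of 8 on 3 degrees of freedom, so the estimate is 8/3. Its square root would play the role of σ1 or σ2.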

FUNDING

Natural Sciences and Engineering Research Council of Canada.

APPENDIX

Adapting the arguments of Carroll and others (1996), we outline the proof of the Theorem as follows. Let Ui(θ; b, λ), Si(α; b, λ), and Hi(θ; b, λ) denote Ui(θ), Si(α), and Hi(θ), respectively, with xij replaced by Wij(b, λ). By standard estimating equation theory, under some regularity conditions, θ̂(b, λ) →p θ(λ) as n → ∞, where θ(λ) is the solution of E[Hi(θ; 1, λ)] = 0.
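The simulation step can be illustrated in a deliberately simplified setting. The sketch below assumes a single covariate with classical additive normal error of known standard deviation `sigma`. It generates remeasured covariates W(b, λ) = W + √λ·σ·U_b with U_b ~ N(0, 1), which is the usual SIMEX construction. It then uses an ordinary least-squares slope as a stand-in for solving the estimating equations Σi Hi(θ; b, λ) = 0, and averages over b = 1, ..., B to obtain the estimate at each λ. The function names are hypothetical, and the paper's joint response and missingness estimating equations are not reproduced here.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x; a stand-in for the paper's estimating equations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simex_simulation_step(w, y, sigma, lambdas, n_sim=100, seed=1):
    """Average the naive estimate over n_sim remeasured data sets for each lambda.

    Assumes classical additive normal error with known SD sigma, so that
    W(b, lambda) = W + sqrt(lambda) * sigma * U_b, with U_b ~ N(0, 1),
    has total error variance (1 + lambda) * sigma**2.
    Returns a dict mapping each lambda to the averaged estimate.
    """
    rng = random.Random(seed)
    averaged = {}
    for lam in lambdas:
        scale = math.sqrt(lam) * sigma
        estimates = []
        for _ in range(n_sim):
            # Add extra simulated error on top of the observed covariate.
            w_b = [wi + scale * rng.gauss(0.0, 1.0) for wi in w]
            estimates.append(ols_slope(w_b, y))
        averaged[lam] = statistics.fmean(estimates)
    return averaged
```

With λ = 0 the remeasured data equal the observed data, so the result reproduces the naive estimate. As λ grows, the averaged slope is attenuated further. The extrapolation step exploits this trend.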

Let [formula]. For each given b and λ, a Taylor series expansion leads to
[display equation]
and therefore, for very large B,
[display equation]
Let [formula] be qM × 1 vectors, and let [formula]. Then, by the central limit theorem, as n → ∞,
[display equation]
where [formula].
Assume that the exact extrapolation functions, say G(γ; λ), are available in the extrapolation step to fit [formula], where γ is a parameter vector of dimension d. Fit [formula] to [formula] and define [formula]. Let [formula] be the d × qM matrix and [formula] be a d × d matrix. Then, by an argument similar to that in Carroll and others (1996), we obtain, as n → ∞,
[display equation]
where [formula]. Setting λ = −1, that is, evaluating the fitted extrapolation function at λ = −1, gives the SIMEX estimator [formula]. Therefore, the asymptotic distribution of the SIMEX estimator is
[display equation]
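In practice the exact extrapolation function G(γ; λ) is rarely known, and a quadratic in λ is a commonly used approximation (Carroll and others, 2006). As a sketch under that assumption, the code below fits G(γ; λ) = γ0 + γ1λ + γ2λ² by least squares to pairs (λ, averaged estimate) for one scalar parameter and evaluates the fit at λ = −1. The function names are hypothetical. A quadratic extrapolant generally removes only part of the bias, rather than giving the exact limit assumed in the asymptotic argument above.

```python
def fit_quadratic(lams, thetas):
    """Least-squares fit of theta = g0 + g1*lam + g2*lam**2; returns (g0, g1, g2)."""
    if len(lams) != len(thetas):
        raise ValueError("lams and thetas must have the same length")
    if len(set(lams)) < 3:
        raise ValueError("need at least three distinct lambda values")
    # Augmented normal equations: [sum lam^(j+k) | sum theta*lam^j], j, k = 0..2.
    aug = [
        [sum(l ** (j + k) for l in lams) for k in range(3)]
        + [sum(t * l ** j for l, t in zip(lams, thetas))]
        for j in range(3)
    ]
    # Gaussian elimination with partial pivoting on the 3 x 4 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    # Back substitution.
    g = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        g[r] = (aug[r][3] - sum(aug[r][k] * g[k] for k in range(r + 1, 3))) / aug[r][r]
    return tuple(g)


def simex_extrapolate(lams, thetas, at=-1.0):
    """Evaluate the fitted quadratic extrapolant at lambda = at (default -1)."""
    g0, g1, g2 = fit_quadratic(lams, thetas)
    return g0 + g1 * at + g2 * at ** 2
```

With averaged estimates stored in a dict `d` keyed by λ, the extrapolated value would be `simex_extrapolate(list(d.keys()), list(d.values()))`. If the estimates lie exactly on 1 − 0.3λ + 0.05λ², the result is 1.35.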

The author thanks the referees for their helpful comments, and thanks Boston University and the National Heart, Lung, and Blood Institute (NHLBI) for providing the Framingham Heart Study data set (No. N01-HC-25195) used in the illustration. The Framingham Heart Study is conducted and supported by the NHLBI in collaboration with Boston University. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI. Conflict of Interest: None declared.

References

Carroll RJ, Küchenhoff H, Lombard F, Stefanski LA (1996). Asymptotics for the SIMEX estimator in nonlinear measurement error models. Journal of the American Statistical Association 91, 242–250.
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006). Measurement Error in Nonlinear Models, 2nd edition. Boca Raton (FL): Chapman & Hall.
Cook J, Stefanski LA (1994). A simulation extrapolation method for parametric measurement error models. Journal of the American Statistical Association 89, 464–467.
Cook RJ, Zeng L, Yi GY (2004). Marginal analysis of incomplete longitudinal binary data: a cautionary note on LOCF imputation. Biometrics 60, 820–828.
Devanarayan V, Stefanski LA (2002). Empirical simulation extrapolation for measurement error models with replicate measurements. Statistics and Probability Letters 59, 219–225.
Diggle P, Heagerty P, Liang K-Y, Zeger S (2002). Analysis of Longitudinal Data, 2nd edition. New York: Oxford University Press.
Diggle P, Kenward MG (1994). Informative drop-out in longitudinal data analysis (with discussion). Applied Statistics 43, 49–93.
Fuller WA (1987). Measurement Error Models. New York: Wiley.
Kenward MG (1998). Selection models for repeated measurements with nonrandom dropout: an illustration of sensitivity. Statistics in Medicine 17, 2723–2732.
Li Y, Lin X (2003). Functional inference in frailty measurement error models for clustered survival data using the SIMEX approach. Journal of the American Statistical Association 98, 191–203.
Liu W, Wu L (2007). Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics 63, 342–350.
Miglioretti DL, Heagerty PJ (2004). Marginal modeling of multilevel binary data with time-varying covariates. Biostatistics 5, 381–398.
Pepe MS, Anderson GL (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics - Simulation and Computation 23, 939–951.
Prentice RL (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69, 331–342.
Robins JM, Rotnitzky A, Zhao LP (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90, 106–121.
Strug L, Sun L, Corey M (2003). The genetics of cross-sectional and longitudinal body mass index. BMC Genetics 4(Suppl 1), S14.
Wang N, Lin X, Gutierrez RG, Carroll RJ (1998). Bias analysis and SIMEX approach in generalized linear mixed measurement error models. Journal of the American Statistical Association 93, 249–261.
Yi GY (2005). Robust methods for incomplete longitudinal data with mismeasured covariates. Far East Journal of Theoretical Statistics 16, 205–234.
Yi GY, Cook RJ (2002). Marginal methods for incomplete longitudinal data arising in clusters. Journal of the American Statistical Association 97, 1071–1080.
Yi GY, He W (2006). Methods for bivariate survival data with mismeasured covariates under an accelerated failure time model. Communications in Statistics - Theory and Methods 35, 1539–1554.
Yi GY, Lawless JF (2007). A corrected likelihood method for the proportional hazards model with covariates subject to measurement error. Journal of Statistical Planning and Inference 137, 1816–1828.
Yi GY, Thompson ME (2005). Marginal and association regression models for longitudinal binary data with drop-outs: a likelihood-based approach. The Canadian Journal of Statistics 33, 3–20.
Yoo YJ, Huo Y, Ning Y, Gordon D, Finch S, Mendell NR (2003). Power of maximum HLOD tests to detect linkage to obesity genes. BMC Genetics 4(Suppl 1), S16.