Improving upon the efficiency of complete case analysis when covariates are MNAR

Missing values in covariates of regression models are a pervasive problem in empirical research. Popular approaches for analyzing partially observed datasets include complete case analysis (CCA), multiple imputation (MI), and inverse probability weighting (IPW). In the case of missing covariate values, these methods (as typically implemented) are valid under different missingness assumptions. In particular, CCA is valid under missing not at random (MNAR) mechanisms in which missingness in a covariate depends on the value of that covariate, but is conditionally independent of outcome. In this paper, we argue that in some settings such an assumption is more plausible than the missing at random assumption underpinning most implementations of MI and IPW. When the former assumption holds, although CCA gives consistent estimates, it does not make use of all observed information. We therefore propose an augmented CCA approach which makes the same conditional independence assumption for missingness as CCA, but which improves efficiency through specification of an additional model for the probability of missingness, given the fully observed variables. The new method is evaluated using simulations and illustrated through application to data on reported alcohol consumption and blood pressure from the US National Health and Nutrition Examination Survey, in which data are likely MNAR but conditionally independent of outcome.


INTRODUCTION
Missing data in covariates of regression models are a common problem in epidemiological and clinical studies. Three commonly applied approaches for analyzing datasets with missing covariates are complete case analysis (CCA), multiple imputation (MI), and inverse probability weighting (IPW). As typically implemented, each makes different assumptions about various aspects of either the data or the mechanism causing missingness.
CCA has the advantage of being simple to apply, and is usually the default method in statistical packages. It is well known that CCA gives valid inferences when data are missing completely at random. Perhaps less widely appreciated is the fact that CCA gives valid inferences provided that the probability of being a complete case is independent of the outcome in the model of interest, conditional on the model's covariates (Little and Rubin, 2002). In particular, CCA is valid under missing not at random (MNAR) mechanisms in which missingness in a covariate is dependent on the value of that covariate, but is conditionally independent of outcome (White and Carlin, 2010). However, even when this assumption holds, CCA does not make full use of the observed information, since observed data from the incomplete cases are discarded.
MI involves creating multiple imputed values for each missing value, creating a number of imputed datasets. Each imputed dataset is analyzed separately, and their estimates combined using rules developed by Rubin (1987). If data are missing at random (MAR) and the imputation model is correctly specified, MI gives valid inferences, and is generally more efficient than CCA since it uses the observed data from incomplete cases and potentially also from auxiliary variables which are not involved in the model of interest. This has led to MI being widely advocated and used in applications (Sterne and others, 2009).
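For readers less familiar with Rubin's rules, a minimal sketch of the pooling step for a scalar parameter is given below in Python (standard library only). The function name `rubin_combine` is our own, and the degrees of freedom use Rubin's (1987) large-sample formula rather than any small-sample refinement.

```python
import math
import statistics


def rubin_combine(estimates, variances):
    """Pool a scalar parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed dataset
    variances: their squared standard errors U_1..U_m (within-imputation variances)
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b              # total variance
    if b == 0:
        df = math.inf                        # no between-imputation variability
    else:
        df = (m - 1) * (1 + w_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


# Example: three imputed-data estimates of a regression coefficient
est, total_var, df = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, math.sqrt(total_var), df)
```

Only the per-imputation estimates and their squared standard errors are needed, which is why the same pooling step applies whatever analysis model is fitted to each imputed dataset.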
IPW is also typically implemented assuming MAR. IPW avoids the necessity to model the distribution of the partially observed variables, and instead relies on a model for the missingness mechanism (Seaman and White, 2011). However, IPW is difficult to implement with non-monotone missingness, and is usually less efficient than MI. In recent years, doubly robust MAR estimators have been proposed which attempt to improve upon the efficiency of IPW, and which also give additional robustness to model misspecification (Carpenter and others, 2006; Tsiatis, 2006).
In this article, we argue that in some settings an MNAR missingness mechanism under which CCA is valid is more plausible than an MAR mechanism which is required for validity of a conventional MI or IPW analysis. In such settings, it may therefore be preferable from the perspective of bias to use CCA. However, as previously noted, CCA is inefficient because it fails to draw on the information available in those subjects with some data missing. To address this, we develop an augmented CCA estimation method which can improve upon the efficiency of CCA, through specification of an additional model for the probability of missingness given the fully observed variables.
In Section 2, we argue why the CCA assumption may in some settings be more plausible than MAR, and propose an estimation method which makes this assumption but which draws on the information available from incomplete cases. We explore the performance of the proposed method in simulations in Section 3. In Section 4, we illustrate the method using alcohol and blood pressure data from the US National Health and Nutrition Examination Survey (NHANES), and give some concluding comments in Section 5.

IMPROVING UPON THE EFFICIENCY OF CCA
Consider a study where an outcome Y and covariates X and Z are intended to be collected for a random sample of independent subjects. Either of X and Z (or both) may be vector valued. We assume throughout that the following conditional mean model holds:

$$ E(Y \mid X, Z) = g(X, Z; \beta), \tag{2.1} $$

where the function g(X, Z; β) is a known smooth function of β, a finite-dimensional parameter to be estimated, with true value β*. Conditional mean models include familiar models such as linear and logistic regression.

Missingness assumptions
We assume that Y and Z are fully observed, while the covariate X is partially observed. In the case where X is vector valued, we assume that for a given subject either all elements of X are observed or all elements are missing (see Section 5 for discussion on extensions to more complex missingness patterns). We let R denote whether X is observed (R = 1) or missing (R = 0). Within this setup, the MAR assumption, upon which most MI and IPW methods rely, is that R ⊥⊥ X | Y, Z. In contrast, CCA provides consistent parameter estimates and valid inferences if missingness is independent of outcome conditional on covariates, i.e. R ⊥⊥ Y | X, Z. This condition encompasses both certain MAR mechanisms, whereby R depends on the fully observed covariates Z but, given Z, is independent of X and Y, and certain MNAR mechanisms, whereby R depends on X (and possibly Z) but, given these, is independent of Y. Unfortunately, as discussed by White and Carlin (2010), it is usually impossible to distinguish on the basis of the observed data which, if either, missingness assumption is appropriate. In Appendix A of supplementary material available at Biostatistics online, we show that there exist exceptions whereby the assumption that R ⊥⊥ Y | X, Z may be testable on the basis of the observed data. However, generally our contextual knowledge must guide us as to which, if either, is plausible. For some questions (and variables), it may be deemed likely from contextual knowledge and experience that propensity to respond to the question is at least partly determined by the value of that variable, such that missingness is not at random. Examples include surveys in which participants are asked about their income, with those with low or high income generally considered less likely to respond (Little and Zhang, 2011).
In Section 4, we consider data on alcohol consumption and blood pressure from NHANES, in which we argue missingness in alcohol consumption is likely to depend largely on alcohol consumption itself, and given consumption (and other covariates), be independent of blood pressure level. As we describe in further detail in Appendix B of supplementary material available at Biostatistics online, in other settings the assumption that R ⊥⊥ Y | X, Z may be plausible if the covariate X is measured much earlier in time than the outcome Y. Thus, sometimes the assumption that R ⊥⊥ Y | X, Z may be plausible, while the MAR assumption that R ⊥⊥ X | Y, Z will not be.
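The claim that CCA remains valid under this kind of MNAR mechanism, but not when missingness depends on the outcome, can be checked with a small simulation. The sketch below is illustrative only: the data-generating values, the logistic missingness models, and the helper names (`solve`, `ols`, `cca_estimate`) are hypothetical choices, not taken from the paper.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve(xtx, xty)


def cca_estimate(n, mechanism, seed=1):
    """Simulate Y = 0.2 X + 0.2 Z + e, make X missing per `mechanism`, return CCA estimates."""
    rng = random.Random(seed)
    design, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 0.5 * z + rng.gauss(0, 1)
        y = 0.2 * x + 0.2 * z + rng.gauss(0, 1)     # true (beta0, betaX, betaZ) = (0, 0.2, 0.2)
        if mechanism == "depends_on_x":
            lin = 0.5 - 1.0 * x + 0.3 * z           # MNAR, but R independent of Y given X, Z
        else:
            lin = 0.5 - 2.0 * y                     # R driven by the outcome itself
        if rng.random() < 1 / (1 + math.exp(-lin)):  # R = 1: subject is a complete case
            design.append([1.0, x, z])
            ys.append(y)
    return ols(design, ys)


est_x = cca_estimate(100_000, "depends_on_x")
est_y = cca_estimate(100_000, "depends_on_y")
print(est_x)   # expected: close to (0, 0.2, 0.2)
print(est_y)   # expected: clearly biased, notably in the intercept
```

In the first scenario the complete cases have a shifted distribution of X, yet the regression of Y on (X, Z) among them is unchanged because R ⊥⊥ Y | X, Z; in the second, selecting on Y distorts that regression.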

Estimation with full data
Before considering estimation with partially observed X, we first consider estimation in the absence of missing data. Let (X_i, Z_i, Y_i, R_i), i = 1, ..., n, denote an i.i.d. sample of n subjects. All regular and asymptotically linear estimators of the parameter β indexing the conditional mean model of interest (equation (2.1)) can be expressed (up to asymptotic equivalence) as the solution β̂ to an estimating equation of the form

$$ \sum_{i=1}^n d(X_i, Z_i)\, \epsilon_i(\beta) = 0, \tag{2.2} $$

where d(X, Z) is a vector-valued function of (X, Z) with dimension that of β, and ε(β) = Y − g(X, Z; β) (Rotnitzky and Robins, 1997). Estimating β by solving such an estimating equation yields a consistent estimator because the estimating function d(X, Z)ε(β) has expectation zero when evaluated at the true value β*: by the conditional mean model, E{ε(β*) | X, Z} = 0. The efficiency of the estimator depends on the choice of the function d(X, Z), with the optimal choice being given by

$$ d_{\mathrm{opt}}(X, Z) = \frac{\partial g(X, Z; \beta)}{\partial \beta}\Big|_{\beta = \beta^*} \mathrm{Var}(Y \mid X, Z)^{-1}. $$
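As a concrete instance, take a linear conditional mean model and assume, for this illustration only, a constant conditional variance Var(Y | X, Z) = σ². Then the optimal d(X, Z) is proportional to ∂g/∂β = (1, X, Z)ᵀ, and the full-data estimating equation reduces to the ordinary least squares normal equations:

```latex
\begin{aligned}
g(X, Z; \beta) &= \beta_0 + \beta_X X + \beta_Z Z,
\qquad \frac{\partial g(X, Z; \beta)}{\partial \beta} = (1, X, Z)^T, \\
0 &= \sum_{i=1}^n \begin{pmatrix} 1 \\ X_i \\ Z_i \end{pmatrix}
\bigl( Y_i - \beta_0 - \beta_X X_i - \beta_Z Z_i \bigr).
\end{aligned}
```

Without constant variance, this choice of d(X, Z) still gives a consistent estimator, just not the most efficient one.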

Estimation with partially observed X
Now suppose that X is partially observed, with R ⊥⊥ Y | X, Z. As described previously, under this assumption CCA gives valid inferences, but fails to draw on the observed information in the incomplete cases.
In Appendix C of supplementary material available at Biostatistics online, we show that if a fully parametric model f(Y | X, Z, β) is assumed (rather than the conditional mean model of (2.1)), without making further assumptions all regular and asymptotically linear estimators of β only use information from the complete cases. It follows that we must make additional assumptions in order to extract information from the incomplete cases. One route to gaining efficiency over CCA is to take a fully parametric approach, which involves specifying parametric models for P(R | X, Z), for f(Y | X, Z), and for f(X | Z), or a semi-parametric approach as in Rotnitzky and Robins (1997), based only (in addition to the conditional mean model of interest) on a parametric model for P(R | X, Z). These are somewhat unappealing because the validity of resulting inferences will depend on the correct specification of these models. In particular, since a model for P(R | X, Z) cannot be directly estimated using the observed data (whenever R = 0, X is missing), ensuring that this model is correctly specified would be difficult.
Instead, we consider estimation of β given specification of a model P(R | Y, Z; α), indexed by parameter α, for P(R | Y, Z). Note that, under our assumptions, this is not a model for the underlying missingness mechanism P(R | X, Z). However, since the model P(R | Y, Z; α) only involves fully observed variables (unlike a model for the underlying missingness mechanism), estimation of α is standard, e.g. by maximum likelihood (ML). Specifically, we assume that the following logistic model holds:

$$ \operatorname{logit} P(R = 1 \mid Y, Z) = h(Y, Z; \alpha), \tag{2.3} $$

where h(Y, Z; α) is a known function, linear in α, and α is a finite-dimensional parameter with true value α*. Suppose for the moment that the true value α* of α is known.
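Then, under suitable regularity conditions, the estimator β̂ which solves

$$ \sum_{i=1}^n \Bigl[ R_i\, d(X_i, Z_i)\, \epsilon_i(\beta) - \bigl\{R_i - \pi(Y_i, Z_i; \alpha^*)\bigr\}\, \phi(Y_i, Z_i, \beta) \Bigr] = 0, \tag{2.4} $$

where ε_i(β) = Y_i − g(X_i, Z_i; β), π(Y, Z; α) = expit{h(Y, Z; α)}, and d(X, Z) and φ(Y, Z, β) are arbitrary functions with dimension the same as β, is consistent and asymptotically normal. The first part of the summand is identical to the CCA estimating function, which has mean zero (at β*) when R ⊥⊥ Y | X, Z. The second part, to which both subjects with X observed and those with X missing contribute, has mean zero provided that the model for P(R | Y, Z) is correctly specified, since then

$$ E\bigl[\{R - \pi(Y, Z; \alpha^*)\}\, \phi(Y, Z, \beta)\bigr] = E\bigl[\phi(Y, Z, \beta)\, E\{R - \pi(Y, Z; \alpha^*) \mid Y, Z\}\bigr] = 0. $$

In practice, α must be estimated. We assume that α is estimated by its MLE, which is the value α̂ solving the likelihood score equations

$$ \sum_{i=1}^n \bigl\{R_i - \pi(Y_i, Z_i; \alpha)\bigr\} \frac{\partial h(Y_i, Z_i; \alpha)}{\partial \alpha} = 0. \tag{2.5} $$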
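Because the model for P(R = 1 | Y, Z) only involves fully observed variables, its likelihood score equations can be solved with any logistic regression routine. The following Python sketch (standard library only) spells out Newton-Raphson for the case h(Y, Z; α) = α_0 + α_1 Y + α_2 Z; the function name `fit_missingness_model`, the simulated values, and the linear form of h are assumptions for illustration. Quadratic terms, as used in Section 4, just add columns to each row of `h_rows`.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_missingness_model(h_rows, r, tol=1e-10, max_iter=50):
    """ML fit of logit P(R = 1 | Y, Z) = h(Y, Z; alpha), with h linear in alpha.

    h_rows[i] is the derivative of h with respect to alpha for subject i, e.g. [1, Y_i, Z_i];
    r[i] is 1 if X_i is observed. Newton-Raphson on the score equations
    sum_i {R_i - expit(h_i' alpha)} h_i = 0; every subject contributes, complete or not.
    """
    p = len(h_rows[0])
    alpha = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for h, ri in zip(h_rows, r):
            pi = expit(sum(a * v for a, v in zip(alpha, h)))
            w = pi * (1.0 - pi)
            for j in range(p):
                score[j] += (ri - pi) * h[j]
                for k in range(p):
                    info[j][k] += w * h[j] * h[k]
        step = solve(info, score)                   # Newton step: I(alpha)^{-1} S(alpha)
        alpha = [a + s for a, s in zip(alpha, step)]
        if max(abs(s) for s in step) < tol:
            break
    return alpha


# Hypothetical example: only (Y, Z) and R are needed, never X.
rng = random.Random(7)
true_alpha = [0.3, -0.8, 0.5]
h_rows, r = [], []
for _ in range(20_000):
    y, z = rng.gauss(0, 1), rng.gauss(0, 1)
    h = [1.0, y, z]
    h_rows.append(h)
    r.append(1 if rng.random() < expit(sum(a * v for a, v in zip(true_alpha, h))) else 0)
alpha_hat = fit_missingness_model(h_rows, r)
print(alpha_hat)
```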

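The parameter β of interest can then be estimated by solving the estimating equation (2.4) with the unknown α* replaced by its MLE α̂, i.e. by solving

$$ \sum_{i=1}^n \Bigl[ R_i\, d(X_i, Z_i)\, \epsilon_i(\beta) - \bigl\{R_i - \operatorname{expit}\{h(Y_i, Z_i; \hat\alpha)\}\bigr\}\, \phi(Y_i, Z_i, \beta) \Bigr] = 0. \tag{2.6} $$

In Appendix D.1 of supplementary material available at Biostatistics online, we show that under suitable regularity conditions this augmented complete case (ACC) estimator β̂_ACC is consistent and asymptotically normal. Writing U_i(β, α) for the ith summand in (2.6) (with α̂ replaced by a generic α), and S_i(α) = [R_i − expit{h(Y_i, Z_i; α)}] ∂h(Y_i, Z_i; α)/∂α for subject i's contribution to the score for α, its influence function is

$$ \mathrm{IF}_i = A^{-1}\bigl\{ U_i(\beta^*, \alpha^*) + B\, I_\alpha^{-1} S_i(\alpha^*) \bigr\}, \tag{2.7} $$

where I_α = Var{S_i(α*)} and

$$ A = -E\Bigl\{ \frac{\partial U_i(\beta^*, \alpha^*)}{\partial \beta^T} \Bigr\}, \tag{2.8} $$

$$ B = E\Bigl\{ \frac{\partial U_i(\beta^*, \alpha^*)}{\partial \alpha^T} \Bigr\}. \tag{2.9} $$

The asymptotic variance of β̂_ACC is equal to n^{−1} times the variance of the influence function (2.7), and this can be estimated using (2.8) and (2.9), replacing expectations and variances by their empirical counterparts, and α* and β* by their corresponding sample estimates.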
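For concreteness, one way to write the empirical plug-in version of this sandwich variance is sketched below. It follows from standard stacked M-estimation arguments under the sign convention U_i(β, α) = R_i d(X_i, Z_i) ε_i(β) − {R_i − π_i(α)} φ(Y_i, Z_i, β), with π_i(α) = expit{h(Y_i, Z_i; α)} and ḣ_i = ∂h(Y_i, Z_i; α)/∂α (free of α because h is linear in α). The hatted symbols are our own notation, offered as a sketch rather than as the exact expressions of the appendix.

```latex
\begin{aligned}
&\hat\pi_i = \operatorname{expit}\{h(Y_i, Z_i; \hat\alpha)\}, \quad
 S_i = (R_i - \hat\pi_i)\,\dot h_i, \quad
 U_i = U_i(\hat\beta_{\mathrm{ACC}}, \hat\alpha), \\
&\hat A = -\frac{1}{n}\sum_{i=1}^n \frac{\partial U_i(\beta, \hat\alpha)}{\partial \beta^T}\Big|_{\beta = \hat\beta_{\mathrm{ACC}}}, \quad
 \hat B = \frac{1}{n}\sum_{i=1}^n \hat\pi_i (1 - \hat\pi_i)\, \phi(Y_i, Z_i, \hat\beta_{\mathrm{ACC}})\, \dot h_i^T, \\
&\hat I_\alpha = \frac{1}{n}\sum_{i=1}^n \hat\pi_i (1 - \hat\pi_i)\, \dot h_i \dot h_i^T, \quad
 \widehat{\mathrm{IF}}_i = \hat A^{-1}\bigl(U_i + \hat B\, \hat I_\alpha^{-1} S_i\bigr), \\
&\widehat{\mathrm{Var}}(\hat\beta_{\mathrm{ACC}}) = \frac{1}{n^2}\sum_{i=1}^n \widehat{\mathrm{IF}}_i\, \widehat{\mathrm{IF}}_i^{T}.
\end{aligned}
```

The term B̂ Î_α⁻¹ S_i is the correction for having estimated α; dropping it would treat α̂ as if it were the true α*.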
The choices of the functions d(X, Z) and φ(Y, Z, β) affect the efficiency of β̂_ACC. For simplicity, we consider how to choose φ(Y, Z, β) in order to minimize the variance of β̂_ACC for a given choice of d(X, Z) (e.g. the choice we would use with full data). In Appendix D.2 of supplementary material available at Biostatistics online, we show that the optimal function φ_opt(Y, Z, β) is given by

$$ \phi_{\mathrm{opt}}(Y, Z, \beta) = E\{ d(X, Z)\, \epsilon(\beta) \mid Y, Z, R = 1 \}, \tag{2.10} $$

which in particular improves upon the efficiency of CCA, which is obtained by choosing φ(Y, Z, β) = 0. We let β̂_ACC-TRUE denote the estimator which uses φ(Y, Z, β) = φ_opt(Y, Z, β).
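The form of the optimal augmentation can be motivated by a short projection argument. The sketch below treats α* as known, writes the estimating function as U = R d(X, Z)ε(β) − {R − π(Y, Z)}φ(Y, Z, β) with π(Y, Z) = P(R = 1 | Y, Z), and uses the fact that the variance of U is minimized when the augmentation term is the projection of R d ε onto functions of the form {R − π(Y, Z)}ψ(Y, Z). It is a heuristic check applied componentwise, not a substitute for the argument in Appendix D.2, which also accounts for estimating α.

```latex
\begin{aligned}
&\text{Choose } \phi \text{ so that } U \text{ is uncorrelated with every } \{R - \pi(Y,Z)\}\,\psi(Y,Z):\\
&\quad E\bigl[\{R\,d\,\epsilon - (R - \pi)\phi\}(R - \pi)\,\psi \bigr] = 0 \ \text{for all } \psi
 \iff \phi = \frac{E\{R\,d\,\epsilon\,(R - \pi) \mid Y, Z\}}{E\{(R - \pi)^2 \mid Y, Z\}}.\\
&\text{Since } R(R - \pi) = R(1 - \pi) \text{ and } E\{(R - \pi)^2 \mid Y, Z\} = \pi(1 - \pi):\\
&\quad \phi_{\mathrm{opt}}(Y, Z, \beta) = \frac{(1 - \pi)\,\pi\,E\{d(X, Z)\,\epsilon(\beta) \mid Y, Z, R = 1\}}{\pi(1 - \pi)}
 = E\{d(X, Z)\,\epsilon(\beta) \mid Y, Z, R = 1\}.
\end{aligned}
```

Only complete cases carry information about this conditional expectation, which is why the implementations that follow model or smooth X given (Y, Z) within the subset R = 1.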

Implementation
The optimal choice φ_opt(Y, Z, β) depends on aspects of the data generating mechanism about which we have not made assumptions. We consider two approaches for estimating φ_opt(Y, Z, β). The first is to posit a parametric working model f(X | R = 1, Y, Z; η) and calculate the expectations required in φ_opt(Y, Z, β). We denote the resulting estimator by β̂_ACC-WM. Note that while mis-specification of this working model will affect the efficiency of the estimator, it will not affect its consistency. This also means that, by Newey and McFadden (1994, Theorem 6.2), estimation of η can be ignored when calculating variance estimates, and that the estimator in which η is estimated will have the same asymptotic efficiency as the estimator which uses the probability limit value η*. If the working model is correctly specified, β̂_ACC-WM will thus have the same efficiency as β̂_ACC-TRUE. In our simulations and illustrative example, we estimate η by ML in the complete cases and calculate the expectations involved in φ_opt(Y, Z, β) by Monte-Carlo integration. This involves generating m improper imputations from the implied distribution f(X | R = 1, Y, Z; η̂) for each subject, and approximating the required expectations by their empirical means based on these imputations. Inferences may be anti-conservative if a small value of m is used, although we did not find this to be the case in simulations (see Section 3).
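The Monte-Carlo step can be sketched as follows for the linear model with d(X, Z) = (1, X, Z)ᵀ and a normal linear working model for X | Y, Z, R = 1. Everything here is an illustrative assumption: the working model, the function names, and the simulated values. The draws are improper imputations, since η̂ and σ̂ are held fixed rather than drawn from their sampling distribution. Under this particular working model the expectation also has a closed form, included only as a check on the Monte-Carlo average.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_working_model(complete_cases):
    """ML fit of the hypothetical working model X | Y, Z, R = 1 ~ N(eta0 + eta1*Y + eta2*Z, sigma^2)
    from complete cases [(x, y, z), ...]. Returns (eta, sigma)."""
    rows = [[1.0, y, z] for _, y, z in complete_cases]
    xs = [x for x, _, _ in complete_cases]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xtx_x = [sum(r[j] * x for r, x in zip(rows, xs)) for j in range(3)]
    eta = solve(xtx, xtx_x)
    rss = sum((x - sum(e * v for e, v in zip(eta, r))) ** 2 for r, x in zip(rows, xs))
    return eta, math.sqrt(rss / len(xs))          # ML variance estimate uses divisor n


def phi_opt_mc(y, z, beta, eta, sigma, m, rng):
    """Monte-Carlo estimate of E{d(X, Z) eps(beta) | Y=y, Z=z, R=1} with d = (1, X, Z),
    for E(Y | X, Z) = b0 + bX X + bZ Z, averaging over m draws from the working model."""
    mean = eta[0] + eta[1] * y + eta[2] * z
    total = [0.0, 0.0, 0.0]
    for _ in range(m):
        x = rng.gauss(mean, sigma)                 # one improper imputation of X
        eps = y - (beta[0] + beta[1] * x + beta[2] * z)
        total[0] += eps
        total[1] += x * eps
        total[2] += z * eps
    return [t / m for t in total]


def phi_opt_normal(y, z, beta, eta, sigma):
    """Closed form of the same expectation under the normal working model (check only)."""
    mu = eta[0] + eta[1] * y + eta[2] * z
    e_eps = y - beta[0] - beta[1] * mu - beta[2] * z
    e_x_eps = mu * (y - beta[0] - beta[2] * z) - beta[1] * (mu * mu + sigma * sigma)
    return [e_eps, e_x_eps, z * e_eps]


rng = random.Random(3)
cc = []
for _ in range(20_000):
    y, z = rng.gauss(0, 1), rng.gauss(0, 1)
    cc.append((0.5 + 0.3 * y - 0.2 * z + rng.gauss(0, 0.7), y, z))
eta_hat, sigma_hat = fit_working_model(cc)
print(phi_opt_mc(1.0, 0.2, [0.0, 0.2, 0.2], eta_hat, sigma_hat, m=10, rng=rng))
```

With a non-normal working model (for example the negative binomial model used in Section 4), only the fitting and sampling steps change.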
If the posited working model f(X | R = 1, Y, Z; η) is mis-specified, there is no guarantee that β̂_ACC-WM will improve upon the efficiency of CCA. In Appendix D.3 of supplementary material available at Biostatistics online, we give details of a modified estimator β̂_ACC2 which, for a given choice of φ(Y, Z, β) (or working model used to estimate φ_opt(Y, Z, β)), ensures that estimates are at least as efficient as CCA. We denote the corresponding estimator which uses a parametric working model to estimate φ_opt(Y, Z, β) by β̂_ACC2-WM.

The second approach we consider is non-parametric estimation of φ_opt(Y, Z, β) using kernel regression. Following similar approaches used in the MAR context (Qi and others, 2005), we estimate φ_opt(Y, Z, β) using the Nadaraya-Watson estimator. Letting K denote a kernel function, this is given by

$$ \hat\phi_{\mathrm{opt}}(y, z, \beta) = \frac{\sum_{i=1}^n R_i\, K_h(Y_i - y, Z_i - z)\, d(X_i, Z_i)\, \epsilon_i(\beta)}{\sum_{i=1}^n R_i\, K_h(Y_i - y, Z_i - z)}, $$

where K_h(·) = K(·/h) and h denotes a vector of bandwidths. To avoid having to calculate φ̂_opt(Y, Z, β) repeatedly when solving the estimating equations, one can instead use φ̂_opt(Y, Z, β̂_CCA), where β̂_CCA denotes the CCA estimator. In Appendix E of supplementary material available at Biostatistics online, we show that under suitable regularity conditions, the resulting estimator, denoted by β̂_ACC-NP, has the same asymptotic distribution as β̂_ACC-TRUE. This means in particular that, as in the case of a parametric working model, kernel estimation of φ_opt(Y, Z, β) can be ignored for the purposes of variance estimation. Letting r denote the number of continuous components in (Y, Z), the bandwidth conditions of Appendix E of supplementary material available at Biostatistics online can be satisfied by choosing h to be of order n^{−1/p}, for some integer p with p > r + 4 and p > 2r.
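A direct implementation of this Nadaraya-Watson estimator, again for the linear model with d(X, Z) = (1, X, Z)ᵀ, is sketched below. It uses a product Gaussian kernel on (Y, Z) and the bandwidth rule h_j = σ̂_j n_CC^{−1/7} described for the simulations in Section 3 (p = 7 satisfies p > r + 4 and p > 2r when r = 2); the function names and the toy data are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidths(cc_conditioning, power=7):
    """h_j = sd_j * n_cc^(-1/power), with sd_j the complete-case standard deviation of the
    jth conditioning variable; cc_conditioning is a list of (y, z) tuples for R = 1."""
    n_cc = len(cc_conditioning)
    return [statistics.stdev(col) * n_cc ** (-1.0 / power) for col in zip(*cc_conditioning)]


def phi_opt_nw(y0, z0, complete_cases, beta, h):
    """Nadaraya-Watson estimate of E{d(X, Z) eps(beta) | Y=y0, Z=z0, R=1}, d = (1, X, Z),
    from complete cases [(x, y, z), ...] with a product Gaussian kernel K_h(u) = K(u / h).
    Kernel normalising constants cancel in the ratio, so they are omitted."""
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for x, y, z in complete_cases:
        w = math.exp(-0.5 * (((y - y0) / h[0]) ** 2 + ((z - z0) / h[1]) ** 2))
        eps = y - (beta[0] + beta[1] * x + beta[2] * z)
        num[0] += w * eps
        num[1] += w * x * eps
        num[2] += w * z * eps
        den += w
    return [v / den for v in num]


# Toy check with two well-separated clusters of complete cases (x, y, z):
cc = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 10.0, 10.0), (7.0, 10.0, 10.0)]
print(phi_opt_nw(0.0, 0.0, cc, beta=[-1.0, 0.0, 0.0], h=[1.0, 1.0]))  # approx [1.0, 2.0, 0.0]
print(rule_of_thumb_bandwidths([(y, z) for _, y, z in cc]))
```

Evaluating the estimator at every subject's (Y_i, Z_i) costs O(n_CC) per subject, so computing it for the whole sample is quadratic in sample size.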
In the special case of a linear conditional mean model, φ_opt(Y, Z, β) depends only on E(X | Y, Z, R = 1) and E(X² | Y, Z, R = 1), and so in the simulation study and illustrative analysis we also implement an estimator β̂_ACC-NP2 in which φ_opt(Y, Z, β) is estimated using the Nadaraya-Watson estimates of E(X | Y, Z, R = 1) and E(X² | Y, Z, R = 1).
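For the linear model with d(X, Z) = (1, X, Z)ᵀ, φ_opt(Y, Z, β) = Y E(d | Y, Z, R = 1) − E(d dᵀ | Y, Z, R = 1) β, so the ACC estimating equation is linear in β and can be solved in one step once π(Y, Z) and the first two conditional moments of X are available. The sketch below assumes the estimating function R d ε − {R − π(Y, Z)}φ(Y, Z, β) and, to isolate the estimating-equation step, uses a hypothetical design with binary X in which π(Y, Z) and E(X | Y, Z, R = 1) are known exactly; in practice these would come from the fitted logistic model and from a working model or kernel estimates. It therefore corresponds to β̂_ACC-TRUE with α known, not to the estimators used in the paper's analyses.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical design: binary X, so every needed quantity has a closed form.
BETA = (0.0, 0.5, 0.3)       # true (beta0, betaX, betaZ) in E(Y | X, Z); Var(Y | X, Z) = 1
P_OBS = {0: 0.8, 1: 0.3}     # P(R = 1 | X = x): MNAR, and R independent of Y given X, Z


def oracle(y, z):
    """Exact pi(y, z) = P(R = 1 | Y, Z) and E(X | Y, Z, R = 1) (= E(X^2 | ...) for 0/1 X)."""
    q = expit(0.5 * z)                                             # P(X = 1 | Z)
    f1 = q * math.exp(-0.5 * (y - BETA[0] - BETA[1] - BETA[2] * z) ** 2)
    f0 = (1 - q) * math.exp(-0.5 * (y - BETA[0] - BETA[2] * z) ** 2)
    px1 = f1 / (f1 + f0)                                           # P(X = 1 | Y, Z)
    pi = P_OBS[1] * px1 + P_OBS[0] * (1 - px1)
    return pi, P_OBS[1] * px1 / pi


def acc_linear(data):
    """Solve sum_i [R_i d_i eps_i(beta) - (R_i - pi_i) phi_i(beta)] = 0, where
    phi_i(beta) = Y_i m_i - M_i beta, m_i = E(d | Y_i, Z_i, R=1), M_i = E(d d' | Y_i, Z_i, R=1)."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, y, z, r, pi, mu1, mu2 in data:
        m = [1.0, mu1, z]
        mm = [[1.0, mu1, z], [mu1, mu2, mu1 * z], [z, mu1 * z, z * z]]
        for j in range(3):
            b[j] -= (r - pi) * y * m[j]
            for k in range(3):
                a[j][k] -= (r - pi) * mm[j][k]
        if r == 1:                                   # X is used only when observed
            d = [1.0, x, z]
            for j in range(3):
                b[j] += d[j] * y
                for k in range(3):
                    a[j][k] += d[j] * d[k]
    return solve(a, b)


def simulate(n, seed=2024):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 1 if rng.random() < expit(0.5 * z) else 0
        y = BETA[0] + BETA[1] * x + BETA[2] * z + rng.gauss(0, 1)
        r = 1 if rng.random() < P_OBS[x] else 0
        pi, mu1 = oracle(y, z)
        data.append((x if r else None, y, z, r, pi, mu1, mu1))
    return data


beta_acc = acc_linear(simulate(100_000))
print(beta_acc)   # expected: close to BETA
```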

SIMULATIONS
In this section, we present simulation results to examine the performance of the proposed estimator for a linear conditional mean model, and compare it to CCA, MI (assuming MAR), and an IPW CCA estimator (assuming MAR). The simulation setup is described in detail in Appendix F of supplementary material available at Biostatistics online. In brief, for 1000 datasets of size n = 1000, the observation indicator R was simulated with P(R = 1) = 0.5, and the variables (Y, X, Z) were then generated from a trivariate normal distribution conditional on R, such that R ⊥⊥ Y | X, Z. The setup meant that R | Y, X, Z was a logistic regression with X and Z as covariates, and that R | Y, Z was a logistic regression with Y and Z as linear covariates. The conditional mean model of interest was

$$ E(Y \mid X, Z) = \beta_0 + \beta_X X + \beta_Z Z, $$

with β_0 = 0, β_X = β_Z = 0.2, and the coefficient of determination ≈ 0.1. We present results for the following estimators:

1. CCA;
2. MI, assuming MAR: estimates based on 10 (proper) MIs of X, assuming a normal linear regression imputation model for X | Y, Z;
3. IPW MAR: the standard IPW CCA estimator assuming MAR, using weights found from a logistic regression model with Y and Z included as linear covariates;
4. the ACC estimator, assuming a logistic regression model for R | Y, Z:
   (a) β̂_ACC-TRUE, using the true φ_opt(Y, Z, β);
   (b) β̂_ACC-WM1, with φ_opt(Y, Z, β) estimated using Monte-Carlo integration (10 imputations) based on a parametric working model (normal linear regression for X | Y, Z, R = 1, with Y and Z as covariates);
   (c) β̂_ACC2-WM1, using the working model of β̂_ACC-WM1;
   (d) β̂_ACC-WM2, with φ_opt(Y, Z, β) estimated using Monte-Carlo integration but with a misspecified working model (normal linear regression with Y² and Z² as covariates);
   (e) β̂_ACC2-WM2, using the working model of β̂_ACC-WM2;
   (f) β̂_ACC-NP and β̂_ACC-NP2, using a normal kernel, assuming independence, and with h_j = σ̂_j n_CC^{−1/7}, where σ̂_j denotes the sample standard deviation of the jth conditioning variable in the subset where R = 1 and n_CC denotes the number of complete cases.

Table 1 shows the simulation results, based on 1000 simulations for each scenario. CCA was unbiased for all scenarios, as expected. Both MI and IPW assuming MAR were biased for β_0, but had little bias for β_X and β_Z. These findings are consistent with the analytical results of Appendix G of supplementary material available at Biostatistics online, where we derive analytical expressions for the bias of MI assuming MAR for a simpler parametric linear regression model setting without Z.
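The data-generating scheme can be sketched as follows. Appendix F is not reproduced here, so the shift in the mean of (X, Z) between R = 0 and R = 1, the unit variances, and the zero correlation are hypothetical choices; the residual variance 0.72 is chosen so that the coefficient of determination within each level of R is 0.08/(0.08 + 0.72) = 0.1, in line with the description above. Generating (X, Z) given R and then Y given (X, Z) yields a trivariate normal (Y, X, Z) given R with common covariance, with R ⊥⊥ Y | X, Z holding by construction.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve(xtx, xty)


def simulate_dataset(n, seed, shift=(0.5, 0.5)):
    """R ~ Bernoulli(0.5); (X, Z) | R ~ N(mean depending on R via hypothetical `shift`, I);
    Y = 0.2 X + 0.2 Z + e with e ~ N(0, 0.72) independent of R, so R is independent of Y
    given (X, Z). Returns a list of (x, y, z, r)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(0.0 if r else shift[0], 1.0)
        z = rng.gauss(0.0 if r else shift[1], 1.0)
        y = 0.2 * x + 0.2 * z + rng.gauss(0.0, math.sqrt(0.72))
        rows.append((x, y, z, r))
    return rows


data = simulate_dataset(20_000, seed=5)
for level in (1, 0):
    sub = [(x, y, z) for x, y, z, r in data if r == level]
    print(level, ols([[1.0, x, z] for x, _, z in sub], [y for _, y, _ in sub]))
```

Fitting the conditional mean model separately within R = 1 and R = 0 recovers the same coefficients, a direct consequence of R ⊥⊥ Y | X, Z, even though the distribution of (X, Z) differs between the two groups.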
The ACC estimator was unbiased for all choices of φ(Y, Z, β), as expected from the asymptotic theory. Using the true optimal φ_opt(Y, Z, β) (β̂_ACC-TRUE) resulted in an efficiency gain for β_Z compared with CCA, but estimates of β_0 and β_X had similar efficiency to CCA. Using a correctly specified working model (β̂_ACC-WM1) resulted in identical efficiency to β̂_ACC-TRUE, in agreement with the asymptotic theory, which states that in our setting there is no cost (asymptotically) to estimating the working model parameters. Since the working model was correct here, as expected the estimator β̂_ACC2-WM1 had identical efficiency to β̂_ACC-WM1.
Using a mis-specified working model (β̂_ACC-WM2) led to estimates of β_0 and β_X which were less efficient than CCA, although, as predicted by theory, estimates remained unbiased. With this mis-specified working model, use of β̂_ACC2 ensured, as predicted, that efficiency was at least as good as CCA (in fact β̂_ACC2-WM2 had the same efficiency as the optimal estimator). The non-parametric estimator β̂_ACC-NP was less efficient, with estimates in fact more variable than CCA. However, the estimator β̂_ACC-NP2, which estimated φ_opt(Y, Z, β) using non-parametric estimates of E(X | Y, Z, R = 1) and E(X² | Y, Z, R = 1), attained the same efficiency as β̂_ACC-TRUE. Table 2 shows the empirical coverage of the nominal 95% confidence intervals for the various ACC estimators, found using the sandwich estimator described in Section 2.3. Coverage was close to the nominal 95% level for all choices of φ(Y, Z, β).

APPLICATION TO NHANES
To illustrate the proposed method, we consider data on alcohol consumption and systolic blood pressure (SBP) from the 2003-2004 NHANES. We focus on the dependence of SBP on the reported average number of alcoholic drinks consumed per day on days where the participant drank alcohol ("no. drinks", obtained via a questionnaire), with adjustment for age and body mass index (BMI). Data are available for n = 2418 men, of whom 278 are missing SBP and 181 are missing BMI. As argued by Little and Zhang (2011), it is plausible that missingness in SBP and BMI is completely at random due to missed visits, and therefore excluding these participants ought not to introduce bias. Amongst the remaining 2111 participants, 720 (34.1%) have the alcohol variable missing. It is a priori plausible that missingness in the alcohol variable is primarily dependent on the value of the alcohol variable (i.e. MNAR), and given this, and age and BMI, is independent of SBP. Consequently, CCA is expected to give valid inferences, while the MAR assumption likely does not hold. A logistic regression model was fitted relating whether the alcohol variable was observed to age, BMI, and SBP (linear and quadratic terms) (Table 3). There was strong evidence that age was associated with missingness, with increasing age associated with reduced odds of responding. Increasing BMI was independently associated with reduced odds of responding to the alcohol question. Lastly, there was evidence (joint test p = 0.028) that SBP was independently associated with the probability of missingness, with reduced odds of responding to the question for those with low or high SBP, relative to those with average SBP. Assuming that increasing levels of reported alcohol consumption are independently associated with increased SBP (see CCA results below), this finding is consistent with the probability that the alcohol variable is missing being elevated for those with either low or high alcohol consumption.
We fitted a linear regression model (using ordinary least squares and sandwich standard errors to allow for non-constant variance) for SBP with age (linear and quadratic effects), BMI, and log(no. drinks + 1) as covariates. The number of alcoholic drinks variable was entered using a (natural) log transformation so that the few participants with very large values did not have undue influence on parameter estimates, and because preliminary analyses suggested that a multiplicative effect of number of drinks fitted the data better. Table 4 shows the CCA estimates, which are unbiased provided that missingness in the alcohol variable is independent of SBP, conditional on age, BMI, and reported average number of alcoholic drinks per day. There was strong evidence that, as expected, increasing age is associated with increased SBP, with some suggestion of a non-linear effect. Increasing BMI was associated with increasing SBP, and there was evidence that increasing reported alcohol consumption is associated with increasing SBP. Next we estimated the conditional mean model parameters assuming missingness in the alcohol variable was MAR, first using MI. The alcohol variable on its original scale was imputed 200 times using a negative binomial regression model with covariates age (linear and quadratic), BMI (linear and quadratic), and SBP (linear and quadratic). Standard errors were obtained using Rubin's rules, but using the sandwich estimator of variance when estimating within-imputation variances. Consistency of MI here relies on the MAR assumption holding and the imputation model being correctly specified. The resulting estimates were fairly similar to CCA, although the coefficient of BMI was somewhat lower, the coefficient of the alcohol variable was somewhat higher, and the estimated constant was lower than that from CCA. Standard errors were smaller than those from CCA for the effects of BMI and age.
Since consistency of MI relies on the imputation model being correctly specified, we also used complete case IPW, with weights calculated using the previously described logistic regression model. Sandwich standard errors were found by stacking the estimating equation used to estimate the parameters of this logistic regression with the IPW complete case estimating equations. The estimated linear age effect was similar to that from CCA, but the estimated quadratic effect was smaller. The estimated coefficient of BMI was slightly smaller than from CCA, and the estimated constant was closer to that from CCA than the MI estimate. The estimated coefficient of the alcohol variable was almost identical to the MI estimate. As is typical, the (sandwich) standard errors for IPW CCA were larger than those for CCA (except for the linear age coefficient).
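For a linear model, the IPW complete-case point estimate amounts to weighted least squares over the complete cases, with weights 1/π̂_i, where π̂_i is the fitted probability of being a complete case from the logistic model for R given age, BMI, and SBP. The sketch below (hypothetical names, covariates reduced to (1, X, Z) for brevity) computes only the point estimate; the stacked sandwich standard errors described above are not implemented.

```python
def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ipw_complete_case(complete_cases, pi_hat):
    """Weighted least squares of Y on (1, X, Z) over subjects with R = 1, each weighted by
    1 / pi_hat_i, where pi_hat_i is that subject's fitted P(R = 1 | Y, Z).
    complete_cases: list of (x, y, z); pi_hat: matching list of fitted probabilities."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (x, y, z), p in zip(complete_cases, pi_hat):
        w = 1.0 / p
        d = (1.0, x, z)
        for j in range(3):
            b[j] += w * d[j] * y
            for k in range(3):
                a[j][k] += w * d[j] * d[k]
    return solve(a, b)


# Noise-free toy data on y = 1 + 2x + 3z: any positive weights recover (1, 2, 3).
cases = [(x, 1 + 2 * x + 3 * z, z) for x, z in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]]
est = ipw_complete_case(cases, [0.9, 0.5, 0.4, 0.7, 0.2])
print(est)
```

Subjects with a low fitted probability of being complete receive large weights, which is the usual source of the larger standard errors of IPW relative to CCA noted above.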
Lastly, we used the proposed ACC estimator, using the logistic model shown in Table 3 for P(R = 1 | Y, Z). We first used β̂_ACC-WM, with the parametric working model identical to that used to impute the alcohol variable in MI (i.e. negative binomial regression). This gave estimates with smaller standard errors than CCA, and also smaller than MI. The estimated constant and effect of alcohol were both in between the corresponding CCA and MI estimates. The estimated BMI effect was close to that from CCA, and the estimated age effects were similar to those from CCA. Using β̂_ACC2-WM led to an estimated constant closer to that from CCA and an estimated effect of alcohol which was smaller than from CCA. Standard errors were very slightly smaller than from β̂_ACC-WM. The non-parametric estimator β̂_ACC-NP gave estimates with much larger standard errors, whereas β̂_ACC-NP2 gave estimates with standard errors smaller than those from β̂_ACC2-WM.
Overall inferences from the methods were fairly similar. Nevertheless, standard errors were smallest from the proposed ACC estimator(s), and in particular were smaller than CCA for the effects of the fully observed covariates. There is the suggestion that the estimated effect of the alcohol variable was larger when assuming MAR compared with assuming missingness conditionally independent of outcome. Unlike in the simulations, there was no apparent substantial bias in the estimated constant from the methods which assumed MAR.

DISCUSSION
In some settings, contextual knowledge will suggest that missingness in a covariate, such as income, is driven primarily by the value of the covariate itself, such that data are MNAR. In prospective studies, missingness in covariates measured at study entry may often plausibly be affected by the covariates themselves (and hence again MNAR), and given these, be independent of the (future) outcome. In these settings, an analysis based on the MAR assumption, such as most implementations of MI and IPW, will lead to asymptotically biased estimates and invalid inferences. For a linear conditional mean model, the analytical results of Appendix G of supplementary material available at Biostatistics online and simulation analyses suggest that the biases may be moderately large for the intercept parameter, but are sometimes modest for the parameters corresponding to covariate effects. However, there likely exist MNAR scenarios in which CCA is unbiased but MAR estimators of covariate effects are biased to a larger extent.
In contrast, if missingness is conditionally independent of outcome, which includes a particular class of MNAR mechanisms, CCA is unbiased but does not make use of all of the observed information. Our proposed augmented CCA estimator improves upon the efficiency of CCA by relying on a parametric model for how missingness is associated with the fully observed covariates and outcome. While one may question whether it is appropriate to increase precision by relying on additional models, we note that this is also the case for other missing data methods. Furthermore, standard model selection techniques can be used, since this model only involves fully observed variables. Given the assumption that missingness is independent of outcome given covariates, CCA and our proposed augmented CCA estimator are both consistent, provided the missingness model used in the latter is correctly specified.
Consistent with the findings of others (e.g. White and Carlin, 2010), in both our simulations and data analysis, while efficiency gain is possible for the coefficients of fully observed covariates, neither MI nor ACC gave improved efficiency for the covariate which was partially observed. This emphasizes the point that, in the absence of auxiliary variables or external information, the gain in efficiency (for both MI and ACC) is achieved through utilizing the observed information in incomplete cases. This implies that in studies where missingness only occurs in the exposure of interest, there is little efficiency to be gained for the exposure effect through using an estimation method which utilizes the incomplete cases.
For the proposed estimator, we have considered estimating the optimal augmentation function φ(Y, Z, β) either using a parametric working model or using non-parametric kernel regression. For the latter, we found that direct non-parametric estimation of the optimal augmentation function led to estimates with high variability, and in fact efficiency worse than CCA. In contrast, non-parametric estimation of the first two moments of the partially observed covariate, which is sufficient in the case of a linear conditional mean model, gave estimates with efficiency essentially identical to that obtained using the true optimal augmentation function. Further work is thus warranted on how best to estimate the optimal augmentation function non-parametrically for conditional mean models which are not linear.
The data analyst who adopts the assumption R ⊥⊥ Y | X, Z should be aware that the observed data may sometimes carry evidence to refute it (see Appendix A of supplementary material available at Biostatistics online). This can be a concern in settings where the outcome is continuous but the covariates are discrete with few levels (see, e.g. Vansteelandt, 2009, for an example where failure to study the testability of missing data assumptions led to a severely biased analysis); however, we tend not to worry about it in more realistic settings, where the power to refute the assumption will typically be very low. Further, the data analyst should consider whether their postulated model for R | Y, Z is compatible with the restrictions imposed by the missing data assumption R ⊥⊥ Y | X, Z together with the conditional mean model; careful checking of the missingness model is therefore recommended.
Throughout, we have restricted our development to the case of a single partially observed covariate or vector of covariates. However, we believe the approach may be extendable to more general patterns of missingness, including non-monotone patterns, and describe how this could be done for the case of two partially observed variables in Appendix H of supplementary material available at Biostatistics online. An advantage of such an approach would be that the missingness assumption R ⊥⊥ Y | X, Z, where R is a vector of missingness indicators, is easier to interpret than MAR when missingness is non-monotone (Robins and Gill, 1997).
Lastly, we note that in the case of data MAR, so-called doubly robust estimators are available, which remain consistent so long as either the model for missingness or the imputation type model is correct (Tsiatis, 2006). The ACC estimator developed here does not possess such a doubly robust property, and it is indeed unclear whether such estimators exist under the assumptions considered here.

SOFTWARE
A Stata program implementing β̂_ACC2-WM for linear conditional mean models is available for free download by typing "net from http://missingdata.lshtm.ac.uk/stata" into Stata's command window and selecting "augcca".

ACKNOWLEDGMENTS
Conflict of Interest: None declared.