Doubly robust nonparametric inference on the average treatment effect

D. Benkeser, M. Carone, M. J. van der Laan & P. B. Gilbert

Biometrika, Volume 104, Issue 4, December 2017, Pages 863–880, https://doi.org/10.1093/biomet/asx053
Summary
Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different methods for constructing such estimators and investigate the extent to which they may be modified to also allow doubly robust inference. We find that while targeted minimum loss-based estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need for and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly robust inference in other problems.
1. Introduction
In recent years, doubly robust estimators have gained immense popularity in many fields, including causal inference. An estimator is said to be doubly robust if it is consistent for the target parameter of interest when any one of two nuisance parameters is consistently estimated. This property gives doubly robust estimators a natural appeal: any possible inconsistency in the estimation of one nuisance parameter may be mitigated by the consistent estimation of the other. In many problems, doubly robust estimators arise naturally in the pursuit of asymptotic efficiency. Locally efficient estimators often exhibit double robustness due to the form of the efficient influence function of the estimated parameter in the statistical model considered. For many parameters that arise in causal inference, the efficient influence function assumes a doubly robust form, which may explain why doubly robust estimators arise so frequently in that area. For example, under common causal identification assumptions, the statistical parameter identifying the mean counterfactual response under a point treatment yields a doubly robust efficient influence function in a nonparametric model (Robins et al., 1994). Thus, locally efficient estimators of this statistical target parameter are naturally doubly robust. General frameworks for constructing locally efficient estimators can therefore be utilized to generate estimators that are doubly robust.
While the conceptual appeal of doubly robust estimators is clear, questions remain about how they should be constructed in practice. It has long been noted that finite-dimensional models are generally too restrictive to permit consistent estimation of nuisance parameters (Bang & Robins, 2005), but much current work on double robustness involves parametric working models and maximum likelihood estimation. Kang & Schafer (2007) showed that doubly robust estimators can be poorly behaved if both nuisance parameters are inconsistently estimated, leading to recent proposals for estimators that minimize bias resulting from misspecification (Vermeulen & Vansteelandt, 2014, 2016). While providing an improvement over conventional techniques, these estimators rely upon consistent estimation of at least one nuisance parameter using a parametric model. An alternative approach is to employ flexible, data-adaptive estimation techniques for both nuisance parameters to reduce the risk of inconsistency (van der Laan & Rose, 2011).
A general study of the behaviour of doubly robust estimators under inconsistent estimation of a nuisance parameter is needed in order to understand how to perform robust inference. This topic has not received much attention, perhaps because the problems arising from misspecification are well understood when parametric models are used. For example, if nuisance parameters are estimated using maximum likelihood, the resulting estimator of the parameter of interest is asymptotically linear even if one of the nuisance parameter models has been misspecified. Although in this scenario the asymptotic variance of the estimator may not be easy to calculate explicitly, resampling techniques may be employed for inference. When the estimator is the solution of an estimating equation, robust sandwich-type variance estimators may also be available. In contrast, when nuisance parameters are estimated using data-adaptive approaches, the implications of inconsistently estimating one nuisance parameter are much more serious. Generally, the resulting estimator is irregular, exhibits large bias and has a convergence rate slower than root-|$n$|. As we show below, the implications for inference are dire: regardless of nominal level, the coverage of naïve two-sided confidence intervals tends to zero and the Type I error rate of two-sided hypothesis tests tends to unity as the sample size increases. This phenomenon cannot simply be avoided by better variance estimation, and it occurs even when the true variance of the estimator is known exactly. Neither is the nonparametric bootstrap a remedy, as, due to the use of data-adaptive procedures, it is not generally valid for inference.
In view of these challenges, investigators may believe it simpler to use parametric models. However, this is unappealing, since both nuisance parameters, and hence also the parameter of interest, are then likely to be inconsistently estimated. The use of flexible data-adaptive techniques, such as ensemble machine learning, appears necessary if one is to have any reasonable expectation of consistency for either of the nuisance parameter estimators (van der Laan & Polley, 2007). However, because such methods are highly adaptive, research is needed into developing appropriate methods for doubly robust inference that use such tools.
A first theoretical study of the problem of doubly robust nonparametric inference is the work of van der Laan (2014), who focused on the counterfactual mean under a single time-point intervention and considered targeted minimum loss-based estimation. As the average treatment effect is the difference between two counterfactual means under different treatments, it too was directly addressed. The estimators proposed therein were shown to be doubly robust with respect to not only consistency but also asymptotic linearity. Furthermore, under regularity conditions, the analytic form of the influence function in van der Laan (2014) is known, paving the way for the construction of doubly robust confidence intervals and |$p$|-values. The proposed procedure is quite complex and has never been implemented. We are therefore motivated to study theoretically and numerically the following three questions about doubly robust nonparametric inference on an average treatment effect or, equivalently, on a counterfactual mean:
1. How badly does inconsistent nuisance parameter estimation affect inference using data-adaptive estimators, and how do estimators that allow doubly robust inference perform?
2. Can existing targeted minimum loss-based estimators be improved by new versions that require estimation of lower-dimensional nuisance parameters?
3. Can simpler alternatives to targeted minimum loss-based estimation be used to construct estimators that are doubly robust for inference and also easier to implement?
As we shall demonstrate in § 5, the answer to question 1 is that naïvely constructed confidence intervals can have very poor coverage, whereas intervals constructed based on appropriate correction procedures have coverage near their nominal level. This suggests that the methods discussed in the present paper are truly needed and may be quite useful. For question 2, we show that it is possible to reduce the dimension of the nuisance parameters introduced in the quest for doubly robust inference, which can provide finite-sample benefits relative to van der Laan (2014). More importantly, this methodological advance is likely to be critical to any extension of the methods discussed here to the setting of treatments defined by multiple time-point interventions. For question 3, we show that the most popular alternative framework to targeted minimum loss-based estimation, the so-called one-step approach, may not be used to theoretically guarantee doubly robust inference unless one knows which nuisance parameter is consistently estimated.
2. Doubly robust estimation
2.1. Notation and background
Suppose that we observe |$n$| independent draws |$O_1,\ldots,O_n$| of the data unit |$O=(W,A,Y)\sim P_0$|, where |$W$| is a vector of baseline covariates, |$A$| is a binary treatment indicator and |$Y$| is the outcome of interest, and that |$P_0$| is only known to lie in a model |$\mathcal{M}_0$|; the target of inference is the counterfactual mean under treatment, |$\psi_0=\Psi(P_0)$|. As the parameter of interest depends on |$P$| only through |$Q=Q(P)=(\bar{Q}, Q_W)$|, we will at times write |$\Psi(Q)$| in place of |$\Psi(P)$|. We will denote |$Q(P_0)$| by |$Q_0=(\bar{Q}_0,Q_{W,0})$| for short, where |$\bar{Q}_0$| is the true outcome regression and |$Q_{W,0}$| the true distribution of |$W$|. The propensity score, defined as |$g(w)=g_P(w) = {\rm pr}(A=1\mid W=w)$|, plays an important role, and throughout this paper the true propensity score |$g_0$| is assumed to satisfy |$g_0(w) > \delta$| for some |$\delta>0$| and all |$w$| in the support of |$Q_{W,0}$|. Below, we make use of empirical process notation, letting |$Pf$| denote |$\int f(o) \,{\rm d} P(o)$| for each |$P\in\mathcal{M}_0$| and |$P$|-integrable function |$f$|. We also denote by |$P_n$| the empirical distribution function based on |$O_1,\ldots,O_n$|, so |$P_n f$| is the average |$n^{-1} \sum_{i=1}^n f(O_i)$|.
2.2. Doubly robust consistency
This representation reduces the analysis of the plug-in estimator |$\Psi(Q_n)$| to the consideration of four terms. The first term, |$(P_n-P_0)D^*(Q,g)$|, is the average of |$n$| independent draws of the random variable |$D^*(Q,g)(O) - P_0 D^*(Q,g)$|, which has mean zero if either |$Q=Q_0$| or |$g=g_0$|. This observation is a simple but fundamental fact underlying doubly robust estimation. Since |$Q_{W,n}$| is known to converge to |$Q_{W,0}$|, the statement |$Q=Q_0$| is equivalent to |$\bar{Q}=\bar{Q}_0$|. The second term, |$B_n(Q_n, g_n)$|, is a first-order bias term that must be accounted for to allow |$\Psi(Q_n)$| to be asymptotically linear. The third term is an empirical process term that is often asymptotically negligible, that is, |$M_n=M_{n}(Q_n, Q, g_n, g)=o_{\rm p}(n^{-1/2})$|. This is true, for example, if |$D^*(Q_n,g_n)$| falls in a |$P_0$|-Donsker class with probability tending to 1 and |$P_0\{D^*(Q_n,g_n) - D^*(Q,g)\}^2$| converges to zero in probability. For a comprehensive reference on empirical processes, we refer readers to van der Vaart & Wellner (1996). Finally, the fourth term, |$R_n=R(Q_n,Q_0,g_n,g_0)$|, is the remainder from the linearization. By inspection, this term tends to zero at a rate determined by how fast the nuisance functions |$\bar{Q}_0$| and |$g_0$| are estimated.
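Display (1) is not reproduced in this excerpt; for reference, a standard form of the expansion, consistent with the four terms just described though not necessarily identical to the paper's display, is

$$
\Psi(Q_n)-\psi_0=(P_n-P_0)D^*(Q,g)+B_n(Q_n,g_n)+M_n+R_n,
$$

where |$B_n(Q,g)=-P_n D^*(Q,g)$|, |$M_n=(P_n-P_0)\{D^*(Q_n,g_n)-D^*(Q,g)\}$| and |$R_n=\Psi(Q_n)-\psi_0+P_0 D^*(Q_n,g_n)$|, and where, for the counterfactual mean under treatment, the efficient influence function is |$D^*(Q,g)(o)=a\{y-\bar{Q}(w)\}/g(w)+\bar{Q}(w)-\Psi(Q)$|. With these definitions the display holds as an algebraic identity.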
This approach appeared in Ibragimov & Has’minskii (1981) and Pfanzagl (1982), and is the infinite-dimensional extension of the one-step Newton–Raphson technique for efficient estimation in parametric models. In this paper, the efficient influence function is a linear function of the parameter of interest. As such, the one-step estimator equals the solution of the optimal estimating equation for this parameter and is therefore equivalent to the so-called augmented inverse probability of treatment estimator (Robins et al., 1994; van der Laan & Robins, 2003). Owing to their simple construction, one-step estimators are generally computationally convenient, though this convenience comes at a cost, as the one-step correction may produce estimates outside the parameter space, such as probability estimates either below zero or above one. Targeted minimum loss-based estimation, formally developed in van der Laan & Rubin (2006) and comprehensively discussed in van der Laan & Rose (2011), provides an algorithm to convert |$Q_n$| into a targeted estimator |$Q_n^*$| of |$Q_0$| such that |$B_n(Q_n^*,g_n)=0$|, which may then be used to define the targeted plug-in estimator |$\psi_n^*=\Psi(Q_n^*)$|. The first update of |$Q_n$| in this recursive scheme consists of the minimizer of an empirical risk over a least favourable submodel through |$Q_n$|. The process is iterated using updated versions of |$Q_n$| until convergence to yield |$Q_n^*$|. In the problem considered here, convergence occurs in a single step. By virtue of being a plug-in estimator, |$\psi_n^*$| may exhibit improved finite-sample behaviour relative to its one-step counterpart (Porter et al., 2011).
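To fix ideas, the one-step estimator adds the estimated bias term back to the plug-in estimator, |$\psi_n^+=\Psi(Q_n)+P_n D^*(Q_n,g_n)$|, while targeted minimum loss-based estimation instead updates |$Q_n$| so that this term is exactly zero. The following minimal R sketch illustrates the one-step computation for the counterfactual mean under treatment; it assumes fitted values of the two nuisance estimators are already available, and the function and variable names are ours rather than from any package.

```r
# Minimal sketch of the one-step (AIPTW) estimator of the counterfactual mean
# under A = 1. Qbar1 holds fitted values of the estimated outcome regression
# E(Y | A = 1, W_i) and gn holds fitted values of the estimated propensity
# score pr(A = 1 | W_i); all names are illustrative.
one_step <- function(Y, A, Qbar1, gn) {
  plug_in <- mean(Qbar1)                    # Psi(Q_n), with Q_W the empirical law
  correction <- mean(A / gn * (Y - Qbar1))  # P_n D*(Q_n, g_n); the Qbar - Psi part
                                            # averages to zero under empirical Q_W
  psi <- plug_in + correction               # one-step estimate psi_n^+
  IF <- A / gn * (Y - Qbar1) + Qbar1 - psi  # estimated influence function values
  list(psi = psi, se = sd(IF) / sqrt(length(Y)))
}
```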
The large-sample properties of both |$\psi_n^+$| and |$\psi_n^*$| can be studied through (1). As above, suppose that the empirical process term |$M_{n}$| is asymptotically negligible. If both |$Q_0$| and |$g_0$| are estimated consistently, so that |$Q=Q_0$| and |$g=g_0$|, and if estimation of these nuisance functions is fast enough to ensure that the remainder term |$R_{n}$| is asymptotically negligible, then |$\psi_n^+$| is asymptotically linear with influence function equal to |$D^*(Q_0,g_0)$|, and thus it is asymptotically efficient. The same can be said of |$\psi_n^*$| if the same conditions on |$M_n$| and |$R_n$| hold with |$Q_n$| replaced by |$Q_n^*$|. If only one of |$Q=Q_0$| or |$g=g_0$| holds, it is impossible to guarantee the asymptotic negligibility of the remainder term, even if both |$Q$| and |$g$| lie in parametric models. Nevertheless, provided |$Q=Q_0$| or |$g=g_0$|, under very mild conditions, the remainder term |$R_n$| based on either |$Q_n$| or |$Q_n^*$| tends to zero in probability, and the empirical process term |$M_n$| remains asymptotically negligible. Since |$D^*(Q,g)(O)$| has mean zero if either |$Q=Q_0$| or |$g=g_0$| and has finite variance, the central limit theorem implies that |$(P_n-P_0)D^*(Q,g)$| is |$O_{\rm p}(n^{-1/2})$|, so both |$\psi_n^+$| and |$\psi_n^*$| are consistent estimators of |$\psi_0$|. This is so-called double robustness: consistent estimation of |$\psi_0$| if either of the nuisance functions |$Q_0$| or |$g_0$| is consistently estimated.
2.3. Doubly robust asymptotic linearity
Doubly robust asymptotic linearity is a more stringent requirement than doubly robust consistency. It is also arguably more important, since without it the construction of valid confidence intervals and tests may be very difficult, if not impossible. A careful study of |$R_n$| is required to determine how doubly robust inference may be obtained.
When both the outcome regression and the propensity score are consistently estimated, |$R_n$| is a second-order term consisting of the product of two differences, both tending to zero. Hence, provided |$\bar{Q}_0$| and |$g_0$| are estimated sufficiently fast, |$R_{n} = {o}_{\rm p}(n^{-1/2})$|. This holds, for example, if both |$\bar{Q}_{n} - \bar{Q}_{0}$| and |$g_{n} - g_{0}$| are |$o_{\rm p}(n^{-1/4})$| with respect to the |$L^2(P_0)$|-norm. If only one of the outcome regression or propensity score is consistently estimated, one of the differences in |$R_n$| does not tend to zero. Consequently, |$R_n$| is either of the same order as or tends to zero more slowly than |$(P_n-P_0)D^*(Q,g)$|. As such, |$R_n$| at least contributes to the first-order behaviour of the estimator and may determine it entirely. If a correctly specified parametric model is used to estimate either |$\bar{Q}_0$| or |$g_0$|, the delta method generally implies that |$R_n$| is asymptotically linear. In this case, both |$\psi_n^+$| and |$\psi_n^*$| are also asymptotically linear, though their influence function is the sum of two terms: |$D^*(Q,g) - P_0 D^*(Q,g)$| and the influence function of |$R_n$| as an estimator of zero. Correctly specifying a parametric model is seldom feasible in realistic settings, however, so it may be preferable to use adaptive estimators of the nuisance functions. In such a situation, whenever one nuisance parameter is inconsistently estimated, the remainder term |$R_n$| tends to zero slowly and thus dominates the first-order behaviour of the estimator of |$\psi_0$|. Therefore, in this case the one-step and targeted minimum loss-based estimators are doubly robust with respect to consistency but not with respect to asymptotic linearity.
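For the counterfactual mean under treatment, the remainder has the familiar doubly robust product structure; writing it for fixed nuisance estimates, a direct calculation from the definitions of § 2.1 gives

$$
R(Q,Q_0,g,g_0)=\int \frac{g(w)-g_0(w)}{g(w)}\,\{\bar{Q}(w)-\bar{Q}_0(w)\}\,\mathrm{d}Q_{W,0}(w),
$$

which vanishes whenever |$\bar{Q}=\bar{Q}_0$| or |$g=g_0$| and makes explicit the product-of-differences behaviour described above.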
3. Doubly robust inference via targeted minimum loss-based estimation
3.1. Existing construction
Expression (2) is the bivariate regression of the true propensity of treatment on an outcome regression and a propensity score, whereas (3) is the univariate regression of the residual from an outcome regression on a propensity score in the treated subgroup. The subscript |$r$| emphasizes that these nuisance parameters are of reduced dimension relative to |$g_0$| and |$\bar{Q}_0$|. This dimension reduction is critical since it essentially guarantees that consistent estimators of these parameters can be constructed in practice. For example, we may be unable to consistently estimate |$g_0$|, which is a function of the entire vector of potential confounders; however, we can guarantee consistent estimation of |$g_{0,r}$|, which involves only a bivariate summary of |$W$|.
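Displays (2) and (3) are not reproduced in this excerpt; based on the descriptions above, and purely as a reading aid, they correspond to nuisance parameters of the form

$$
g_{0,r}(\bar{Q},g)(w)={\rm pr}_0\{A=1\mid \bar{Q}(W)=\bar{Q}(w),\,g(W)=g(w)\},
$$
$$
\bar{Q}_{0,r}(\bar{Q},g)(w)=E_0\{Y-\bar{Q}(W)\mid A=1,\,g(W)=g(w)\},
$$

with the precise definitions being those of van der Laan (2014).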
3.2. Novel reduced-dimension construction
We now show how to theoretically improve upon the proposal of van der Laan (2014) through an alternative formulation of a targeted minimum loss-based estimator. We derive an approximation of the remainder that relies on alternative nuisance parameters of lower dimension than those presented in § 3.1. This makes the estimation problem involved more feasible and may also pave the way to a generalization of this work to settings with longitudinal treatments.
These are univariate regressions, in contrast to the bivariate regression |$g_{0,r}(\bar{Q},g)$| described in § 3.1 and used in van der Laan (2014). Nonparametric estimators of these univariate parameters often achieve better rates than those proposed therein. Use of this alternative representation yields estimators guaranteed to be asymptotically linear under weaker conditions than previously required.
An algorithm to construct nuisance estimators that satisfy (5) can be devised based on targeted minimum loss-based estimation. Without loss of generality, suppose |$0 \leqslant Y \leqslant 1$|. Defining |$H_1(g)(a,w) = a / g(w)$|, |$H_2(g_1,g_2)(a,w) = a g_2(w)/g_1(w)$| and |$H_3(\bar{Q},g)(w) = \bar{Q}(w)/g(w)$|, we implement the following recursive procedure.
Construct initial estimates |$\bar{Q}_n^0$| and |$g_n^0$| of |$\bar{Q}_0$| and |$g_0$|, and set |$k=0$|.
Construct estimates |$g^{k}_{1,n,r}$| and |$g^{k}_{2,n,r}$| of |$g_{1,0,r}$| and |$g_{2,0,r}$| based on |$g_n^k$| and |$\bar{Q}_n^{k,\circ}$|.
Construct estimates |$\bar{Q}^{k}_{n,r}$| of |$\bar{Q}_{0,r}$| based on |$g_n^k$| and |$\bar{Q}_n^{k+1}$|.
Set |$k=k+1$| and iterate Steps 1–6 until |$K$| is large enough that |$P_n D^*(Q_n^{K}, g_n^K) \approx P_n D_A(\bar{Q}_{n,r}^K, g_n^K) \approx P_n D_Y(\bar{Q}_n^K, g_{1,n,r}^K, g_{2,n,r}^K) \approx 0$|.
Set |$\bar{Q}_n^* = \bar{Q}_n^K$|, |$g_n^*=g_n^K$|, |$\bar{Q}_{n,r}^* = \bar{Q}_{n,r}^K$|, |$g_{1,n,r}^* = g_{1,n,r}^K$| and |$g_{2,n,r}^*= g_{2,n,r}^K$|.
Theorem 1 implies that doubly robust confidence intervals and tests can readily be crafted. For example, the Wald construction |$\psi_n^{*,{\rm c}} \pm z_{1-\alpha/2} \sigma_n n^{-1/2}$| is a doubly robust |$100\times(1-\alpha)\%$| asymptotic confidence interval for |$\psi_0$|, and prescribing rejection of the null hypothesis |$\psi_0=\psi^\circ$| versus the alternative |$\psi_0\neq\psi^\circ$| only when |$| n^{1/2}(\psi_n^{*,{\rm c}}-\psi^\circ)/\sigma_n|>z_{1-\alpha/2}$| constitutes a doubly robust hypothesis test with asymptotic level |$\alpha$|. Thus, valid statistical inference is preserved when one nuisance parameter is inconsistently estimated, in contrast to conventional doubly robust estimation, wherein only consistency is preserved.
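A minimal R sketch of this Wald construction is given below; it takes as inputs a corrected point estimate and an influence-function-based standard deviation estimate |$\sigma_n$|, and the function name is ours.

```r
# Sketch of the doubly robust Wald interval and test described above, given a
# corrected estimate psi_star (e.g., psi_n^{*,c}) and an estimate sigma_n of
# the standard deviation of its influence function; names are illustrative.
dr_wald <- function(psi_star, sigma_n, n, psi_null = 0, alpha = 0.05) {
  se <- sigma_n / sqrt(n)
  z  <- (psi_star - psi_null) / se
  list(ci = psi_star + c(-1, 1) * qnorm(1 - alpha / 2) * se,  # 100(1-alpha)% CI
       p_value = 2 * pnorm(-abs(z)),                          # two-sided p-value
       reject = abs(z) > qnorm(1 - alpha / 2))                # level-alpha test
}
```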
4. Doubly robust inference via one-step estimation
In § 2 we discussed the construction of doubly robust, locally efficient estimators of |$\psi_0$| and argued that both the one-step approach and targeted minimum loss-based estimation can be used for bias correction. For the sake of constructing asymptotically efficient estimators, these two strategies are generally considered to be alternatives, with targeted minimum loss-based estimation possibly delivering better finite-sample behaviour but the one-step approach often simpler to implement. In § 3, we outlined how the bias-correction feature of the targeted minimum loss-based estimation framework could be leveraged to achieve doubly robust asymptotic linearity and thus perform doubly robust inference. Since targeted minimum loss-based estimation can be more complicated to implement than the one-step correction procedure, it is natural to wonder whether a one-step approach could also account for the additional bias terms that result from the inconsistent estimation of either |$\bar{Q}_0$| or |$g_0$|. If so, the resulting one-step estimator could provide a computationally convenient alternative to the algorithm described in § 3.2.
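In the notation of § 3.2, and writing it only as a reading aid since display (6) is not reproduced in this excerpt, the corrected one-step estimator considered below plausibly takes the form

$$
\psi_n^{+,{\rm c}}=\Psi(Q_n)+P_n D^*(Q_n,g_n)+P_n D_A(\bar{Q}_{n,r},g_n)+P_n D_Y(\bar{Q}_n,g_{1,n,r},g_{2,n,r}),
$$

that is, the standard one-step estimator augmented by empirical means of the two extra correction terms that the targeted procedure of § 3.2 instead drives to zero.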
The one-step estimator |$\psi_n^{+,{\rm c}}$| corrects for both inconsistent estimation of |$\bar{Q}_0$| and inconsistent estimation of |$g_0$|. However, for consistent estimation of |$\psi_0$|, no more than one of these two nuisance parameters can in reality be inconsistently estimated. In this case there is necessarily overcorrection in |$\psi_n^{+,{\rm c}}$|, and it is not a priori obvious whether this may be detrimental to the behaviour of the estimator. Elucidating this issue requires a careful study of each of the two bias-correction terms in settings where they are not in fact needed. For example, the term |$B_{A,n}(\bar{Q}_{n,r}, g_n)$|, used to correct for bias resulting from inconsistent estimation of |$\bar{Q}_0$|, must be analysed in the scenario where it is actually |$g_0$| that has been inconsistently estimated.
This warrants further discussion. The above theory shows that the targeted minimum loss-based estimation framework is able to simultaneously account for inconsistent estimation of either the outcome regression or the propensity score without the need to know which is required. In contrast, the one-step approach requires knowledge of which nuisance parameter is possibly inconsistently estimated to achieve doubly robust asymptotic linearity. Without this knowledge, asymptotic linearity cannot be guaranteed in a doubly robust fashion. This is relevant for future work to derive procedures for doubly robust inference on other parameters admitting doubly or multiply robust estimators.
5. Simulation study
5.1. Data-generating mechanism and cross-validation set-up
In each of the simulations below, the baseline covariate vector |$W = (W_1, W_2)$| had independent components, with |$W_1$| uniformly distributed over the interval |$(-2,+2)$| and |$W_2$| a binary random variable with success probability |$1/2$|. The conditional probability of |$A=1$| given |$W=(w_1,w_2)$| was |$g_0(w_1,w_2)=\mbox{expit}(-w_1 + 2 w_1 w_2)$|. The outcome |$Y$| was a binary random variable whose conditional probability of occurrence given |$A=a$| and |$W=(w_1,w_2)$| was |$\bar{Q}_0(a,w) = \mbox{expit}(0{\cdot}2a - w_1 + 2 w_1 w_2)$|.
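This mechanism is fully specified above, so it can be reproduced directly; a minimal R sketch, with illustrative names, is:

```r
# Sketch of the simulation data-generating mechanism described above.
expit <- plogis  # expit(x) = 1 / (1 + exp(-x))
simulate_data <- function(n) {
  W1 <- runif(n, -2, 2)                     # W1 ~ Uniform(-2, 2)
  W2 <- rbinom(n, 1, 0.5)                   # W2 ~ Bernoulli(1/2)
  g0 <- expit(-W1 + 2 * W1 * W2)            # true propensity score
  A  <- rbinom(n, 1, g0)                    # treatment assignment
  Q0 <- expit(0.2 * A - W1 + 2 * W1 * W2)   # true outcome regression
  Y  <- rbinom(n, 1, Q0)                    # binary outcome
  data.frame(W1 = W1, W2 = W2, A = A, Y = Y)
}
```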
We implemented and compared the performance of the following six estimators:
(i) the standard, uncorrected targeted minimum loss-based estimator;
(ii) the corrected targeted minimum loss-based estimator using bivariate nuisance regression, as proposed by van der Laan (2014);
(iii) the corrected targeted minimum loss-based estimator using univariate nuisance regressions, as introduced in Theorem 1;
(iv) the standard, uncorrected one-step estimator, commonly referred to as the augmented inverse probability weighted estimator;
(v) the corrected one-step estimator using bivariate nuisance regression;
(vi) the corrected one-step estimator using univariate nuisance regressions, as given in (6).
We evaluated these estimators in the following three scenarios.
I. Only the outcome regression is consistently estimated.
II. Only the propensity score is consistently estimated.
III. Both the outcome regression and the propensity score are consistently estimated.
The consistently estimated nuisance parameter, either the outcome regression or the propensity score, was estimated using a bivariate Nadaraya–Watson estimator with bandwidth selected by crossvalidation, while the inconsistently estimated nuisance parameter was estimated using a logistic regression model with main terms only, thus ignoring the interaction between |$W_1$| and |$W_2$|. The reduced-dimension nuisance parameters required for the additional correction procedure involved in computing estimators (ii), (iii), (v) and (vi) were estimated using the Nadaraya–Watson estimator with bandwidth selected by leave-one-out crossvalidation (Racine & Li, 2004). For scenarios I and II, we considered sample sizes |$n=250, 500, 1000, 3000, 5000, 9000$|. For scenario III, theory dictates that all estimators considered should be asymptotically equivalent, so we used only the sample sizes |$n=100, 250, 500, 750, 1000$|. For each |$n$| we generated 5000 datasets. We summarized estimator performance in terms of bias, bias times |$\surd n$|, coverage of 95% confidence intervals, and accuracy of the standard error estimator, which we assessed by comparing the Monte Carlo variance of the estimator and the average estimated variance across simulations. We examined the following hypotheses based on our theoretical work.
(A) In scenarios I and II, the bias of estimators (i), (iv), (v) and (vi) tends to zero more slowly than |$n^{-1/2}$|, whereas that of estimators (ii) and (iii) does so faster than |$n^{-1/2}$|.
(B) In scenarios I and II, the slow convergence of the bias for estimators (i), (iv), (v) and (vi) adversely affects the nominal confidence interval coverage, whereas the corrected targeted minimum loss-based estimators (ii) and (iii) have asymptotically nominal coverage.
(C) In scenarios I and II, influence function-based variance estimators are accurate for the corrected estimators (ii), (iii), (v) and (vi), but not for the uncorrected estimators (i) and (iv).
(D) In scenario III, all estimators exhibit approximately the same behaviour.
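The nuisance estimation set-up referenced before the list can be sketched as follows in R, using the np package of Racine & Li (2004) for Nadaraya–Watson regression; the exact kernel and cross-validation settings used in the paper may differ, and variable names follow the earlier simulation sketch.

```r
# Sketch of the nuisance estimation set-up for scenario II: the propensity
# score is estimated consistently by a bivariate Nadaraya-Watson (local
# constant) regression with cross-validated bandwidths, while the outcome
# regression is estimated inconsistently by a main-terms logistic regression
# that omits the W1*W2 interaction.
library(np)
dat <- simulate_data(500)
bw   <- npregbw(A ~ W1 + W2, data = dat, regtype = "lc")  # CV bandwidths
g_nw <- fitted(npreg(bw))                                 # consistent g estimate
q_glm <- glm(Y ~ A + W1 + W2, family = binomial, data = dat)  # misspecified Qbar
```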
5.2. Results
We first focus on the results for scenario I, in which only the outcome regression is consistently estimated. In Fig. 1(a), the bias of each estimator tends to zero, illustrating the conventional double robustness of these estimators. However, Fig. 1(b) supports hypothesis (A), as the bias of the uncorrected estimators tends to zero at a slower rate than |$n^{-1/2}$|, while the bias of the corrected targeted minimum loss-based estimators tends to zero faster. The bias of the corrected one-step estimators is reduced relative to the uncorrected estimators, and for the sample sizes considered we do not yet see the expected divergence in the bias when inflated by |$n^{1/2}$|. Figure 1(c) indicates strong support for hypothesis (B): the coverage of intervals based on the uncorrected estimators is not only far from the nominal level but also U-shaped, suggesting worsening coverage in larger samples, as is expected based on our arguments in § 2.3. Intervals based on the corrected estimators have approximately nominal coverage in moderate and large samples. Figure 1(d) indicates that the variance estimators for the uncorrected estimators are liberal, which contributes to the poor coverage of intervals. The variance estimators for the corrected estimators are approximately accurate in larger samples, thus supporting hypothesis (C).
Simulation results when only the outcome regression is consistently estimated, with the following performance measures plotted against the sample size |$n$|: (a) bias; (b) |$\surd n\,\times\:$|bias; (c) coverage of 95% confidence intervals; (d) accuracy of the standard error estimator. Squares represent estimators that do not account for inconsistent nuisance parameter estimation, circles represent estimators using the bivariate correction of van der Laan (2014), and triangles represent estimators using the proposed univariate corrections.
Figure 2 summarizes the results for scenario II. In Fig. 2(b) we again see that the bias of the uncorrected estimators tends to zero more slowly than |$n^{-1/2}$|; this is also true of the corrected one-step estimators. In contrast, the bias of the corrected targeted minimum loss-based estimators appears to converge to zero faster than |$n^{-1/2}$|. Figure 2(c) partially supports hypothesis (B): intervals based on the uncorrected estimators achieve near-nominal coverage for moderate and large sample sizes in spite of the large bias. However, we again find the expected U-shape, with an eventual downturn in coverage as the sample size increases further. Intervals based on the corrected targeted minimum loss-based estimators using bivariate nuisance regression have improved coverage throughout, and intervals based on either corrected targeted minimum loss-based estimator have nearly nominal coverage in larger samples. Intervals based on the corrected one-step estimator with the univariate correction achieve approximately nominal coverage, while those based on the one-step estimator with bivariate correction do not, probably due to larger bias. Figure 2(d) shows that the variance estimator for the uncorrected one-step estimator is conservative, whereas that based on the uncorrected targeted minimum loss-based estimator is approximately accurate. The variance estimators based on the corrected one-step or targeted minimum loss-based estimators appear to be valid throughout, although that based on the latter using univariate nuisance regressions appears to be liberal in smaller samples.
Fig. 2. Simulation results when only the propensity score is consistently estimated: (a) bias, (b) |$\surd n\,\times\:$|bias, (c) coverage of 95% confidence intervals, and (d) accuracy of the standard error estimator plotted against |$n$|. Squares represent estimators that do not account for inconsistent nuisance parameter estimation, circles represent estimators using the bivariate correction of van der Laan (2014), and triangles represent estimators using the proposed univariate corrections.
Finally, Fig. 3 supports hypothesis (D): when both the propensity score and the outcome regression are consistently estimated, all of the estimators perform approximately equally well, even in smaller samples. This suggests that implementing the correction procedures needed to achieve doubly robust asymptotic linearity and inference does not come at the cost of estimator performance in situations where the additional corrections are not needed.
Fig. 3. Simulation results when both the outcome regression and the propensity score are consistently estimated: (a) bias, (b) |$\surd n\,\times\:$|bias, (c) coverage of 95% confidence intervals, and (d) accuracy of the standard error estimator plotted against |$n$|. Squares represent estimators that do not account for inconsistent nuisance parameter estimation, circles represent estimators using the bivariate correction of van der Laan (2014), and triangles represent estimators using the proposed univariate corrections.
The run-times for the targeted minimum loss-based estimators (ii) and (iii) are on average two to three times those of their one-step counterparts. The run-time required for the bivariate correction of estimators (ii) and (v) is on average one and a half times that of the univariate correction of estimators (iii) and (vi). Two additional simulation studies in the Supplementary Material compare the proposed estimators with existing doubly robust estimators. The results demonstrate the advantage of estimators that allow for flexible nuisance parameter estimation in complex data-generating mechanisms. Results from a simulation study including a greater number of covariates, also reported in the Supplementary Material, suggest a potential reduction in finite-sample bias for the proposed univariate-corrected targeted minimum loss-based estimator relative to the bivariate-corrected estimator of van der Laan (2014).
6. Concluding remarks
An interesting finding of this work is that it is possible to theoretically guarantee doubly robust inference under mild conditions using targeted minimum loss-based estimation, though not with the one-step approach. While we have found the corrected one-step estimators to perform relatively well in simulations, we cannot expect this in all scenarios, since theory suggests otherwise. Therefore, the preferred approach to providing doubly robust inference, in spite of its computational complexity, may be targeted minimum loss-based estimation. These methods are implemented in the R (R Development Core Team, 2017) package drtmle, available from the Comprehensive R Archive Network (Benkeser, 2017).
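For completeness, a hypothetical invocation of the drtmle package is sketched below. The argument names reflect the package's documented interface as we recall it and should be checked against the installed release; the learner choices are purely illustrative.

```r
# Hypothetical sketch of doubly robust inference with the drtmle package;
# the arguments (W, A, Y, a_0, SL_Q, SL_g, SL_Qr, SL_gr) and the ci() method
# follow our reading of the package documentation and should be verified.
library(drtmle)
dat <- simulate_data(1000)  # data generated as in the simulation sketch above
fit <- drtmle(W = dat[, c("W1", "W2")], A = dat$A, Y = dat$Y,
              a_0 = c(0, 1),                 # counterfactual means under A = 0, 1
              SL_Q = "SL.glm", SL_g = "SL.glm",        # primary nuisance learners
              SL_Qr = "SL.npreg", SL_gr = "SL.npreg")  # reduced-dim corrections
ci(fit, contrast = c(-1, 1))  # doubly robust CI for the average treatment effect
```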
It may be fruitful to incorporate universally least favourable parametric submodels (van der Laan, 2016) into the targeted minimum loss-based estimation algorithms used here. Such submodels facilitate the construction of estimators using minimal data-fitting in the bias-reduction step of the algorithm. Rather than requiring iterations to perform bias reduction, use of these submodels would yield algorithms that converge in only a single step. This could yield improved performance in finite samples, particularly in extensions of this work to more complex parameters, such as average treatment effects defined by longitudinal interventions.
Acknowledgement
The authors thank the associate editor and reviewer for helpful suggestions. This work was a portion of the PhD thesis of Benkeser, supervised by Carone and Gilbert and funded by the Bill and Melinda Gates Foundation. Carone, van der Laan and Gilbert were partially funded by the National Institute of Allergy and Infectious Diseases. Carone was also funded by the University of Washington Department of Biostatistics Career Development Fund.
Supplementary material
Supplementary material available at Biometrika online includes results from the additional simulation studies described in § 5.2.
Appendix A
First-order expansion of the remainder
If, for example, each of |$\bar{Q}_{0n,r} - \bar{Q}_{n,r}$|, |$\bar{Q}_{n,r} - \bar{Q}_{0,r}$| and |$g_n - g_0$| is |$o_{\rm p}(n^{-1/4})$| in the |$L^2(P_0)$|-norm, then |$R_{3n}$| and |$R_{4n}$| are |$o_{\rm p}(n^{-1/2})$|. Furthermore, if |$D_A(\bar{Q}_{n,r}, g_{n})$| falls in a |$P_0$|-Donsker class with probability tending to one and |$P_0\{D_A(\bar{Q}_{n,r}, g_{n}) - D_A(\bar{Q}_{0,r}, g_{0})\}^2 = o_{\rm p}(1)$|, then |$M_{1n} = o_{\rm p}(n^{-1/2})$|.
If, for example, each of |$\bar{Q}_n - \bar{Q}_0$|, |$g_{0n,r} - g_{0,r}$| and |$g_{n,r} - g_{0,r}$| is |$o_{\rm p}(n^{-1/4})$| in the |$L^2(P_0)$|-norm, then |$R_{5n}$|, |$R_{6n}$| and |$R_{7n}$| are |$o_{\rm p}(n^{-1/2})$|. Furthermore, if |$D_Y(\bar{Q}_n, g_{n,r})$| falls in a |$P_0$|-Donsker class with probability tending to one and |$P_0 \{D_Y(\bar{Q}_n, g_{n,r}) - D_Y(\bar{Q}_0, g_{0,r}) \}^2 = o_{\rm p}(1)$|, then |$M_{2n}=o_{\rm p}(n^{-1/2})$|. Thus, we have shown that (4) holds with |$R_n^* = R_{1n} + R_{2n}$|, |$R_{Q,n} = R_{3n} + R_{4n} + M_{1n}$| and |$R_{g,n} = R_{5n} + R_{6n} + R_{7n} + M_{2n}$|.
Appendix B
Derivation of the reduced-dimension remainder representation
If, for example, each of |$\bar{Q}_n - \bar{Q}_0$|, |$g_{2,0n,r} - g_{2,0,r}$| and |$g_{2,n,r} - g_{2,0,r}$| is |$o_{\rm p}(n^{-1/4})$| in the |$L^2(P_0)$|-norm, it generally follows that |$\tilde{R}_{5n}$| and |$\tilde{R}_{6n}$| are |$o_{\rm p}(n^{-1/2})$|. Furthermore, if |$D_Y(\bar{Q}_n, g_{1,n,r}, g_{2,n,r})$| falls in a |$P_0$|-Donsker class with probability tending to one and |$P_0\{D_Y(\bar{Q}_n, g_{1,n,r}, g_{2,n,r}) - D_Y(\bar{Q}_0, g_{1,0,r}, g_{2,0,r})\}^2 = o_{\rm p}(1)$|, it also follows that |$\tilde{M}_{2n}=o_{\rm p}(n^{-1/2})$|. This implies that (4) holds with |$R_n^* = R_{1,n} + R_{2,n}$|, |$R_{Q,n} = R_{3n} + R_{4n} + M_{1n}$| and |$R_{g,n} = \tilde{R}_{5n} + \tilde{R}_{6n} + \tilde{M}_{2n}$| when the alternative reduced-dimension parameterization of the remainder is used.
Appendix C
Behaviour of unnecessary correction terms