Abstract

We present a method for improving estimation in linear regression models in samples of moderate size, using shrinkage techniques. Our work connects the theory of causal inference, which describes how variable adjustment should be performed with large samples, with shrinkage estimators such as ridge regression and the least absolute shrinkage and selection operator (LASSO), which can perform better in sample sizes seen in epidemiologic practice. Shrinkage methods reduce mean squared error by trading off some amount of bias for a reduction in variance. However, when inference is the goal, there are no standard methods for choosing the penalty “tuning” parameters that govern these tradeoffs. We propose selecting the penalty parameters for these shrinkage estimators by minimizing bias and variance in future similar data sets drawn from the posterior predictive distribution. Our method provides both the point estimate of interest and corresponding standard error estimates. Through simulations, we demonstrate that it can achieve better mean squared error than using cross-validation for penalty parameter selection. We apply our method to a cross-sectional analysis of the association between smoking and carotid intima-media thickness in the Multi-Ethnic Study of Atherosclerosis (multiple US locations, 2000–2002) and compare it with similar analyses of these data.

When studying the association between an exposure and an outcome with regression, the accuracy of inference depends critically on the choice of adjustment variables. When there is clear information about causal relationships, it is now well understood how this choice of adjustment affects the large-sample consistency of the regression estimate (1). However, when we have modest sample sizes and/or many potential confounders, an approach that includes all the variables suggested by large-sample causal considerations may not be optimal. What to do in these settings remains largely unclear and is a source of confusion in practice.

We consider this problem in the setting of the classical linear model  
$$y = \beta x + \gamma_1 z_1 + \cdots + \gamma_p z_p + \epsilon, \tag{1}$$
with outcome of interest y, exposure of interest x, independent error ε, and potential confounders z₁, …, zₚ determined from prior knowledge, following the approach of, for example, Pearl (2). Interest lies in the association, denoted β, between x and y adjusted for confounders; estimates of β are evaluated by their mean squared error (MSE), the sum of their squared bias and variance. The fully adjusted ordinary least squares (OLS) estimates of the regression parameters are those values β̂OLS = (β̂, γ̂₁, …, γ̂ₚ) that minimize the quantity
$$\sum_{i=1}^{n}\left(y_i - x_i\hat{\beta} - z_{i1}\hat{\gamma}_1 - \cdots - z_{ip}\hat{\gamma}_p\right)^2. \tag{2}$$
Among linear unbiased estimators, β̂OLS has minimum variance and, thus, minimum MSE (3). However, strong correlations among the confounders may cause β̂OLS, although unbiased, to be unstable, with high variance in small samples. Stable biased estimators may outperform β̂OLS by having lower variances that compensate for their small but nonzero bias, leading to lower MSE (3, 4).
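In symbols, the error criterion used throughout decomposes as

$$\mathrm{MSE}(\hat{\beta}) = E\!\left[(\hat{\beta} - \beta)^2\right] = \mathrm{Bias}(\hat{\beta})^2 + \mathrm{Var}(\hat{\beta}),$$

so a slightly biased estimator can still have lower MSE than an unbiased one if its variance is sufficiently smaller.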

To pick a set of adjustment covariates when using OLS estimators, a variety of model selection methods are used in the epidemiology literature, including stepwise procedures, screening variables using significance tests or changes in estimates, and choosing models based on information criteria (5–9). Many of these procedures do not directly address questions of inference and can work poorly in practice (6, 10). Moreover, model-selection steps are often ignored when making inference in the final model, which can result in precision being overstated (10, 11).

Shrinkage estimators

Shrinkage methods can provide improvements in MSE by trading off bias in the estimator for reductions in its variance. They are not widely used in epidemiologic practice (8), although a recent application of note is to use shrinkage estimators as an alternative to high-dimensional propensity score models (12). Here, we consider shrinkage estimators based on ridge regression (4) and the least absolute shrinkage and selection operator (LASSO) (13). These estimators are the values (β̂, γ̂₁, …, γ̂ₚ) that minimize
$$\sum_{i=1}^{n}\left(y_i - x_i\hat{\beta} - z_{i1}\hat{\gamma}_1 - \cdots - z_{ip}\hat{\gamma}_p\right)^2 + \lambda\sum_{j=1}^{p}\left|\hat{\gamma}_j\right|^{\nu}, \tag{3}$$
where ν = 1 for the LASSO and ν = 2 for ridge regression. The second summation term penalizes large values of the coefficients γ̂ⱼ, so that they are shrunk toward 0 by an amount controlled by the penalty parameter λ. Unlike ridge regression, the LASSO may shrink some coefficients exactly to 0, hence performing variable selection as well as estimation. In model 3 (equation 3 above), only the confounders z₁, …, zₚ are penalized, not the coefficient of x. In this way, we select estimators of β that lie between the fully adjusted OLS estimate (equivalent to λ = 0) and the unadjusted OLS estimate from a model without any confounder adjustment (equivalent to λ = ∞) (4).
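As a concrete illustration, a criterion of the form of equation 3 can be fit with standard software; the sketch below uses the glmnet R package, leaving the exposure coefficient unpenalized via its penalty.factor argument. The toy data and variable names are illustrative only, glmnet scales its penalty differently from equation 3 (so its λ is not on the same scale), and this is not the authors' eshrink implementation.

```r
## Illustrative only: fit ridge and LASSO versions of equation (3) with glmnet,
## penalizing the confounders but not the exposure (penalty.factor = 0 for x).
library(glmnet)

set.seed(1)
n <- 100; p <- 6
Z <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("z", 1:p)))
x <- drop(Z %*% rep(0.3, p)) + rnorm(n)          # toy exposure, correlated with confounders
y <- 1 * x + drop(Z %*% rep(0.3, p)) + rnorm(n)  # toy outcome with true beta = 1
X <- cbind(x = x, Z)

pf  <- c(0, rep(1, p))                                # 0 = leave the exposure unpenalized
lam <- exp(seq(log(5), log(0.01), length.out = 50))   # candidate penalty values

fit_ridge <- glmnet(X, y, alpha = 0, lambda = lam, penalty.factor = pf)
fit_lasso <- glmnet(X, y, alpha = 1, lambda = lam, penalty.factor = pf)

coef(fit_ridge, s = 0.5)["x", 1]   # ridge estimate of beta at one candidate penalty
coef(fit_lasso, s = 0.5)["x", 1]   # LASSO estimate of beta at the same penalty
```

In the method described below, the penalty would then be chosen by the fMBV criterion rather than by cross-validation.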

To apply these shrinkage estimators, we must select the penalty parameter λ. In other settings, this is frequently done by cross-validation (14), but cross-validation does not address our goal of parameter estimation: it minimizes the error of the observations around their predicted means, not the error of the estimate β̂ around the true effect β. Here we present a method that directly addresses the primary goal, estimation of β.

Bayesian adjustment for confounding

Bayesian inference uses posterior distributions, which combine prior information about parameters with information from the observed data. A Bayesian analog to model selection is model averaging, which combines models with different confounders (weighted according to their posterior probabilities) to provide a single estimate (15). Model averaging is used in Bayesian adjustment for confounding (BAC), which fits a set of treatment models (i.e., regression models with the exposure as the dependent variable) and forces covariates in the treatment model to also be included in the outcome model (16–18). With this approach, however, variables that are uninformative for the outcome may be included in the outcome model solely because of their strong correlation with the exposure.

METHODS

We present a new method for selecting the penalty parameter λ in model 3 (equation 3 above) that minimizes a combination of the shrinkage estimator’s bias and variance. Because the true model parameters are unknown, the bias and variance of potential estimators cannot be calculated directly; however, for a given λ, we can compute what we term future bias (fBias(λ)) and future variance (fVar(λ)): bias and variance computed on potential data sets generated similarly to the observed data. We take a Bayesian approach for computing the optimal value of λ—but not one that requires model averaging—by obtaining potential data sets as samples from the posterior predictive distribution.

To implement this method, we consider model 1 with conjugate normal priors for the coefficients β, γ₁, …, γₚ and an inverse-gamma prior for the variance of ε. We assume that the model contains all confounding variables in the correct functional form, although it may contain additional variables that are not confounders. For a candidate λ value, we evaluate fBias(λ) and fVar(λ) on data drawn from the posterior predictive distribution. We define λ̂ as the value that minimizes the posterior expectation of fMBV(λ), the maximum of fBias² and fVar. The choice of fMBV(λ) targets settings in which fBias² and fVar are equal, in contrast to the alternative loss criterion, future MSE (fMSE(λ)), which is the sum of fBias² and fVar and is used for comparison in the simulations below. Our approach could be adapted to more general functions of fBias(λ) and fVar(λ), although we do not do so here. We obtain an estimate of β from the original data by applying ridge regression or the LASSO, using λ̂. We call these estimators ridge-fMBV and LASSO-fMBV, respectively.
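In symbols, the 2 loss criteria and the resulting penalty choice are

$$\mathrm{fMBV}(\lambda) = \max\!\left\{\mathrm{fBias}^2(\lambda),\, \mathrm{fVar}(\lambda)\right\},
\qquad
\mathrm{fMSE}(\lambda) = \mathrm{fBias}^2(\lambda) + \mathrm{fVar}(\lambda),$$

$$\hat{\lambda} = \operatorname*{arg\,min}_{\lambda} \; E\!\left[\mathrm{fMBV}(\lambda) \mid \text{observed data}\right],$$

where the expectation is taken with respect to the posterior distribution given the observed data.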

In brief, the procedure can be outlined as follows:

  1. Draw a sample from the posterior distribution of β, γ₁, …, γₚ.

  2. For a range of candidate λ values, compute the average value of fMBV(λ) across the posterior sample.

  3. Select the value λ̂ that minimizes the averages from step 2.

  4. Apply ridge regression or the LASSO to the original data, using λ̂.

Although the approach is conceptually the same for ridge and LASSO estimators, closed-form expressions for fBias and fVar simplify step 2 for ridge regression, whereas computing fMBV(λ) for LASSO requires simulating additional data sets from the posterior predictive distribution. A formal description of this algorithm is provided in Web Appendix 1 (available at https://academic.oup.com/aje), and the method is implemented in R (R Foundation for Statistical Computing, Vienna, Austria) in the publicly available package “eshrink.”
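To make step 2 concrete for ridge regression, the sketch below averages fMBV(λ) over posterior draws using the usual ridge closed forms, under the assumptions that future data sets share the observed design matrix and that each posterior draw supplies the full coefficient vector and error variance. The function name and arguments are ours; this is a schematic illustration, not the eshrink implementation.

```r
## Schematic sketch of step 2 for ridge regression (illustrative; not the eshrink code).
## For each posterior draw (theta, sigma^2), compute the future bias and variance of the
## ridge estimate of beta in a new study with the same design X, then take max and average.
ridge_fMBV <- function(lambda, X, post_coef, post_sig2) {
  # X:         n x (1 + p) matrix; exposure in column 1, confounders in the remaining columns
  # post_coef: draws x (1 + p) matrix of posterior samples of (beta, gamma_1, ..., gamma_p)
  # post_sig2: vector of posterior samples of the error variance
  D    <- diag(c(0, rep(1, ncol(X) - 1)))   # penalize the confounders only
  XtX  <- crossprod(X)
  Ainv <- solve(XtX + lambda * D)
  M    <- Ainv %*% XtX                      # E[ridge coefficients] = M %*% true coefficients
  v1   <- (M %*% Ainv)[1, 1]                # Var(beta_hat(lambda)) = sigma^2 * v1
  per_draw <- vapply(seq_len(nrow(post_coef)), function(s) {
    theta  <- post_coef[s, ]
    fbias2 <- ((M %*% theta)[1] - theta[1])^2   # squared future bias of the exposure coefficient
    fvar   <- post_sig2[s] * v1                 # future variance of the exposure coefficient
    max(fbias2, fvar)                           # fMBV contribution for this draw
  }, numeric(1))
  mean(per_draw)                            # posterior average of fMBV(lambda)
}

## Step 3 would then pick the minimizer over a grid of candidate penalties, e.g.:
## lambda_hat <- lambda_grid[which.min(sapply(lambda_grid, ridge_fMBV, X = X,
##                                            post_coef = draws, post_sig2 = sig2_draws))]
```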

To conduct inference about β using the ridge-fMBV and LASSO-fMBV estimators, we compute standard errors and confidence intervals. We estimate the standard error by the posterior standard deviation of the expected values of the estimator, which accounts for the variability in choosing λ̂. Because shrinkage estimators are inherently biased, a standard symmetrical confidence interval may not have correct coverage. To obtain an interval with improved coverage, we use an “inverting the test” approach (i.e., constructing the set of β values under which the observed estimate would not be extreme). Technical details for estimating standard errors and confidence intervals are provided in Web Appendix 1.
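Schematically, and assuming a Wald-type test (the exact construction is given in Web Appendix 1), such an interval collects the values β₀ that the observed estimate does not reject:

$$\mathrm{CI}_{1-\alpha} = \left\{ \beta_0 : \left| \hat{\beta}(\hat{\lambda}) - E_{\beta_0}\!\left[\hat{\beta}(\hat{\lambda})\right] \right| \le z_{1-\alpha/2}\, \widehat{\mathrm{SE}}\!\left(\hat{\beta}(\hat{\lambda})\right) \right\},$$

where $E_{\beta_0}[\cdot]$ denotes the expected value of the shrinkage estimator when the true coefficient is β₀; because the estimator is biased, the resulting interval need not be symmetrical around β̂(λ̂).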

SIMULATIONS

Simulation setup

We demonstrate the behavior of our estimators in 5 settings. We compare them with the unadjusted OLS estimator; the fully adjusted OLS estimator; BAC (via the approximate procedure available in the BACprior R package (19)); the posterior mean from a standard Bayesian analysis; the OLS estimators corresponding to the models selected by minimizing Akaike’s information criterion and the Bayesian information criterion among all possible subsets of confounder combinations; the estimator from a backwards elimination procedure; an inverse probability weighting estimator; the estimator selected by the procedure of Crainiceanu et al. (20); LASSO and ridge regression with λ selected by 10-fold cross-validation (21), where we do not penalize the coefficient of x; and ridge regression, with λ selected by generalized cross-validation (22). We also compare our estimators’ behaviors with that of an oracle ridge estimator that uses the true value of β (instead of values in a posterior sample) to choose λˆ. Additionally, we present results for the ridge-fMSE and LASSO-fMSE estimators, which use fMSE(λ) as the loss function in place of fMBV(λ).

In our simulations, we generated outcomes independently from the linear model (model 1) with sample size 100, β = 1, ε normally distributed with variance 1, and γ values that varied by simulation. In simulations 1, 2, and 3, there were 6 confounders with moderate, strong, and weak effects, respectively. We then considered a more complex situation with 12 putative confounders in simulations 4 and 5, which had moderate and weak confounding effects, respectively. Web Appendix 2 and Web Figure 1 give details of the confounding effects and correlation structure, respectively. The MSE of the estimators was evaluated using 1,000 replications. To compute λ̂, we used a posterior sample of size 2,000 and, for the LASSO, 500 draws from the posterior predictive distribution.
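For orientation, a single replicate in the style of simulation 1 might be generated as below; the confounder correlation structure and γ values shown are placeholders, since the actual values are specified in Web Appendix 2 and Web Figure 1.

```r
## Placeholder data-generating sketch for one replicate (simulation 1 style).
## The correlation matrix and gamma values here are illustrative, not the
## actual values used in the paper.
library(MASS)

set.seed(2)
n <- 100; p <- 6
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))             # placeholder confounder correlation
Z <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
x <- drop(Z %*% rep(0.4, p)) + rnorm(n)            # exposure related to the confounders
gamma <- rep(0.3, p)                               # placeholder "moderate" confounder effects
y <- 1 * x + drop(Z %*% gamma) + rnorm(n, sd = 1)  # model (1): beta = 1, Var(eps) = 1
```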

Results

The MSE and bias of the estimators for each simulation are provided in Table 1. In simulation 1, the ridge-fMBV and LASSO-fMBV estimators had the smallest MSE among all (nonoracle) estimators (MSE = 0.060 and MSE = 0.059, respectively). This is a substantial reduction in MSE compared with the fully adjusted estimator (MSE = 0.088) and corresponds to increasing the sample size by 46%. The 2 estimators performed markedly better than the ridge-fMSE and LASSO-fMSE estimators, which had MSEs of 0.072 and 0.075, respectively. Despite the similar MSE, the ridge-fMBV estimator had less bias than the LASSO-fMBV estimator (0.07 and 0.11, respectively). Selecting λ by cross-validation for both LASSO and ridge estimators resulted in estimates almost identical to the unadjusted estimate. The estimators from the information criteria approaches, backwards elimination, and the method of Crainiceanu et al. (20), which do not penalize coefficients but may not include all variables in the model, performed slightly better (MSEs between 0.076 and 0.085) than full adjustment but did not achieve as small an MSE as the shrinkage estimators. Using BAC achieved an MSE similar to that of the fully adjusted model. Figure 1 highlights the tradeoff of bias for variance that ridge-fMBV and LASSO-fMBV make to achieve lower MSE.

Table 1.

Mean Squared Error and Bias for Estimates of β in the Simulations

                                  MSE (×10)                              Bias
Estimator                  1     2     3     4     5        1      2      3      4       5
Unadjusted               0.92  8.76  0.32  2.02  0.75     0.29   0.93   0.15   0.438   0.255
Fully adjusted           0.88  0.88  0.88  0.99  0.99    −0.02  −0.02  −0.02  −0.004  −0.004
AIC, all subsets         0.79  1.24  0.52  0.96  0.78     0.13   0.12   0.08   0.054   0.169
BIC, all subsets         0.85  2.04  0.36  0.94  0.77     0.24   0.33   0.13   0.129   0.238
BAC                      0.88  0.88  0.88  0.93  0.85    −0.02  −0.02  −0.02   0.031   0.030
Posterior mean           0.88  0.88  0.88  0.99  0.99    −0.02  −0.02  −0.02  −0.004  −0.004
IPW                      1.06  6.90  0.86  1.02  0.87     0.17   0.78   0.10   0.236   0.203
Backwards elimination    0.76  1.15  0.58  0.96  0.78     0.10   0.12   0.06   0.039   0.142
CDP                      0.76  4.40  0.43  0.97  0.85     0.18   0.59   0.08   0.154   0.127
LASSO
 fMBV                    0.59  1.14  0.41  0.67  0.62     0.11   0.19   0.06   0.095   0.116
 fMSE                    0.75  1.02  0.60  0.81  0.77     0.05   0.03   0.03   0.054   0.068
 CV                      0.92  5.99  0.32  1.27  0.75     0.29   0.75   0.15   0.332   0.255
Ridge
 fMBV                    0.60  1.02  0.50  0.75  0.66     0.07   0.16   0.03   0.096   0.075
 fMSE                    0.72  0.97  0.62  0.86  0.79     0.04   0.05   0.02   0.058   0.047
 CV                      0.92  7.00  0.32  1.49  0.75     0.29   0.82   0.15   0.371   0.255
 GCV                     0.87  0.90  0.66  0.79  0.86     0.08   0.08   0.05   0.047   0.039
 Oracle                  0.50  0.81  0.27  0.61  0.50     0.13   0.06   0.11   0.134   0.139

Abbreviations: AIC, Akaike information criterion; BAC, Bayesian adjustment for confounding; BIC, Bayesian information criterion; CDP, method of Crainiceanu, Dominici, and Parmigiani (20); CV, cross-validation; fMBV, maximum of squared future bias and future variance; fMSE, future mean squared error; GCV, generalized cross-validation; IPW, inverse-probability weighting; LASSO, least absolute shrinkage and selection operator; MSE, mean squared error.


Figure 1.

Variance and squared bias of estimators in simulation 1. The curve represents the theoretical mean squared error for ridge estimators for varying values of λ. AIC, Akaike information criterion; BIC, Bayesian information criterion; CDP, method of Crainiceanu, Dominici, and Parmigiani (20); CV, cross-validation; fMBV, maximum of squared future bias and future variance; GCV, generalized cross-validation; LASSO, least absolute shrinkage and selection operator.


Within a particular data set, there is posterior uncertainty about fBias² and fVar. Web Figure 2 illustrates this uncertainty by showing, for a single data set, the posterior distributions of fBias² and fVar for several estimators. The distributions are mostly symmetrical with respect to fVar but are skewed toward higher values of fBias² (except for the adjusted estimate, which is unbiased relative to the posterior mean). Web Figure 3 shows the effect of this skewness on the posterior distributions of fMBV and fMSE. Because fMBV targets the setting in which fBias² and fVar are equal, it favors greater shrinkage. In contrast, fMSE targets the setting in which the sum of fVar and fBias² is smallest, which occurs here when fBias² is much smaller than fVar, and it thus yields less shrinkage. These differences are reflected in the distributions of λ̂ for ridge-fMBV and ridge-fMSE across the simulation data sets (Web Figure 4), which had medians of 10.7 and 7.8, respectively.

In simulation 2, the MSEs for ridge-fMBV and for LASSO-fMBV were 0.102 and 0.114, respectively, which were both worse than the fully adjusted estimator’s MSE (0.088). This is expected; the large confounding effects mean that shrinking the coefficients introduces nontrivial bias. In this setting, choosing λ by cross-validation gave an MSE of 0.700 for the ridge regression and 0.599 for LASSO—both far worse than our proposed estimators. Here, the LASSO-fMSE and ridge-fMSE estimators outperformed their fMBV counterparts, with MSEs of 0.102 and 0.097, respectively, and much smaller bias (0.03 and 0.05 compared with 0.19 and 0.15, respectively). This is consistent with less shrinkage being favorable when the confounding effects are large.

In simulation 3, the confounder effect sizes were smaller than in the first 2 settings, so the confounding bias in the unadjusted estimator is relatively small (0.15). The MSEs for ridge-fMBV and LASSO-fMBV (0.050 and 0.041, respectively) were higher than that of the unadjusted estimate (MSE = 0.032), but there was still a reduction of more than 20% compared with the adjusted estimate (MSE = 0.088). In this setting, shrinkage improved performance, and the fMSE estimators performed worse than those based on the fMBV loss.

Table 2 provides the true and estimated standard errors for ridge-fMBV and LASSO-fMBV. For both, the estimated standard error tends to be slightly smaller than the true value, although this difference disappears in larger samples. For ridge-fMBV, the estimator that accounts for selection of λ̂ is closer to the truth than the naive estimator that treats λ̂ as fixed. Nominal 95% confidence intervals achieve correct coverage, although they are slightly wider than the corresponding confidence intervals for the fully adjusted OLS estimators (Table 3).

Table 2.

True and Estimated Standard Errors for the LASSO-fMBV and Ridge-fMBV in the Simulations

                              Simulation
Estimator                 1      2      3      4      5
LASSO-fMBV
 True SE                0.216  0.275  0.194  0.241  0.220
 Estimated SE^a         0.212  0.264  0.201  0.221  0.206
Ridge-fMBV
 True SE                0.236  0.279  0.222  0.256  0.246
 Estimated SE^a         0.214  0.264  0.204  0.224  0.211
 Estimated SE, naive^b  0.203  0.244  0.195  0.215  0.208

Abbreviations: fMBV, maximum of squared future bias and future variance; LASSO, least absolute shrinkage and selection operator; SE, standard error.

a Estimated via the Bayesian procedure described in Web Appendix 1, which accounts for uncertainty in selecting λ̂.

b Assumes that λ̂ is known and fixed.


Table 3.

Coverage and Average Width for Nominal 95% Confidence Intervals for the Fully Adjusted OLS and Ridge-fMBV Estimators in the Simulations

                             Simulation
Estimator                1      2      3      4      5
Coverage
 OLS (fully adjusted)  0.938  0.938  0.938  0.936  0.936
 Ridge-fMBV            0.948  0.948  0.946  0.942  0.940
Width
 OLS (fully adjusted)  1.170  1.170  1.170  1.197  1.197
 Ridge-fMBV            1.200  1.194  1.203  1.176  1.179

Abbreviations: fMBV, maximum of squared future bias and future variance; OLS, ordinary least squares.


In simulations 4 and 5, the LASSO-fMBV and ridge-fMBV estimators outperformed all competing estimators in terms of MSE. In these simulations, the confidence interval coverages were slightly below the nominal level (0.942 and 0.940, respectively), although they were similar to coverage from confidence intervals for the fully adjusted estimator (0.936).

CARDIOVASCULAR OUTCOMES IN THE MULTI-ETHNIC STUDY OF ATHEROSCLEROSIS

To demonstrate our approach using data from a cohort study, we analyzed the association between smoking status and carotid intima-media thickness (cIMT) in the Multi-Ethnic Study of Atherosclerosis (MESA; multiple US locations, 2000–2002). The diverse MESA cohort comprises adults aged 45–84 years from 6 US metropolitan areas who were free of clinical cardiovascular disease at study entry (23). MESA was designed to study subclinical cardiovascular disease, and measurements of cIMT were made at baseline. Lefebvre et al. (17) recently analyzed cIMT in subcohorts of MESA to demonstrate the BAC method, and we used the same data for our comparison. Although described by Lefebvre et al. (17) as common carotid artery intima-media thickness, the values reported by those authors correspond to internal carotid artery intima-media thickness, which we used in our study.

Analysis

Following Lefebvre et al. (17), we considered the association between having ever been a smoker (smoking >100 cigarettes in one’s lifetime) and baseline cIMT. We performed separate analyses on 2 subcohorts of the MESA cohort: white adults younger than age 65 years and Chinese American adults younger than age 65 years. We considered the potential confounders of age, sex, body mass index, physical activity, cholesterol levels (total and high-density lipoprotein), triglyceride levels, inflammatory marker levels (interleukin 6 and C-reactive protein), diabetes, use of diabetes and lipid-lowering medications, hemostatic marker (fibrinogen) levels, alcohol consumption, education, and income. Table 4 provides summary statistics for the 2 subcohorts, stratified by smoking status.

Table 4.

Characteristics of the White and Chinese-American Subcohorts of the Multi-Ethnic Study of Atherosclerosis, United States, 2000–2002

Values are mean (SD) for continuous characteristics and no. (%) for categorical characteristics.

                                              White Subcohort                             Chinese-American Subcohort
Characteristic                                Ever Smoker (n = 774)  Nonsmoker (n = 604)  Ever Smoker (n = 93)  Nonsmoker (n = 343)
cIMT, μm                                      993 (473)              904 (446)            840 (335)             763 (327)
Age, years                                    54.9 (5.6)             54.4 (5.6)           54.6 (5.3)            54.4 (5.9)
Male sex                                      373 (48)               275 (46)             84 (90)               126 (37)
Body mass index^a                             28.0 (5.5)             28.0 (5.5)           24.8 (3.1)            24.0 (3.3)
Alcohol consumption, drinks/week              7.2 (10.8)             3.8 (7.9)            1.5 (3.6)             0.6 (2.5)
Physical activity^b                           2,440 (2,625)          2,697 (2,882)        1,609 (1,827)         1,792 (2,241)
Total cholesterol, mg/dL                      196 (36)               199 (37)             195 (31)              195 (31)
High-density lipoprotein cholesterol, mg/dL   52.2 (16.3)            51.9 (15.0)          44.5 (9.5)            50.4 (13.7)
Triglyceride levels, mg/dL                    136 (91)               136 (119)            156 (86)              142 (86)
Interleukin-6, pg/mL                          1.40 (1.25)            1.23 (1.13)          1.06 (0.81)           1.06 (1.01)
C-reactive protein, mg/L                      3.25 (4.39)            3.42 (6.10)          1.66 (2.89)           1.89 (4.42)
Fibrinogen, mg/dL                             323 (68)               326 (67)             312 (55)              325 (60)
Diabetes                                      38                     25                   11 (12)               31
Use of antidiabetic medications               27                     19                                         21
Use of lipid-lowering medications             115 (15)               84 (14)              10 (11)               26
Education completed
 Less than high school                        26                     11                   16 (17)               61 (18)
 High school                                  375 (48)               198 (33)             33 (35)               128 (37)
 College                                      172 (22)               166 (27)             25 (27)               81 (24)
 Graduate school                              201 (26)               229 (38)             19 (20)               73 (21)
Annual income
 <$25,000                                     82 (11)                51                   29 (31)               123 (36)
 $25,000–$49,999                              179 (23)               139 (23)             27 (29)               84 (24)
 $50,000–$99,999                              296 (38)               214 (35)             23 (25)               84 (24)
 ≥$100,000                                    217 (28)               200 (33)             14 (15)               52 (15)

Abbreviations: cIMT, carotid intima-media thickness; SD, standard deviation.

a Weight (kg)/height (m)².

b Metabolic equivalent of task-hours per week.


We compared the ridge-fMBV and LASSO-fMBV estimators with the fully adjusted and unadjusted estimators, the BAC estimates reported by Lefebvre et al. (17), and the posterior mean from a standard Bayesian linear regression analysis with no shrinkage applied. To select λ̂, we used the weakly informative priors of Raftery et al. (15), which were also used in the BAC analysis of Lefebvre et al. (17). We used posterior samples of size 3,000 and candidate values of λ between 1 and 5,000 when computing fMBV (Web Figure 5). Categorical variables were coded with a sum-to-zero constraint, and alcohol consumption was log-transformed. The design matrix was standardized before the ridge and LASSO regressions were fitted, but the reported estimates are back-transformed to the original scale.
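The coding and scaling steps described above could be carried out as in the sketch below, shown on a small invented data frame; the variable names, factor levels, and use of log1p to handle zero drinks per week are illustrative assumptions, not the authors' code or the MESA data set.

```r
## Illustrative preprocessing sketch (toy data; not the MESA data set):
## sum-to-zero coding, log-transformed alcohol use, and a standardized design matrix.
dat <- data.frame(
  cimt    = c(950, 820, 1010, 760, 880, 905),
  smoker  = c(1, 0, 1, 0, 1, 0),
  educ    = factor(c("HS", "College", "Grad", "HS", "College", "LessHS")),
  alcohol = c(7, 0, 2, 4, 0, 1)                  # drinks/week
)
dat$alcohol <- log1p(dat$alcohol)                # log transform; the offset for zeros is assumed

X  <- model.matrix(~ smoker + educ + alcohol, data = dat,
                   contrasts.arg = list(educ = "contr.sum"))[, -1]  # sum-to-zero coding
Xs <- scale(X)                                   # standardize before ridge/LASSO fitting
```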

Results

Of the 1,378 participants in the white subcohort with complete covariate information, 774 were ever smokers and 604 were nonsmokers (Table 4). The estimated association between smoking and cIMT under each approach is reported in Table 5. The unadjusted difference in cIMT between smokers and nonsmokers was 88.8 μm (estimated standard error, 25.0). The fully adjusted OLS estimate was 48.6 μm (standard error, 24.5). The ridge-fMBV and LASSO-fMBV estimates, 63.8 μm and 67.0 μm, respectively, fell between the unadjusted and adjusted estimates. Given the large size of the white subcohort, the shrinkage estimators provided only a slight reduction in estimated standard error (23.7 for ridge-fMBV and 23.8 for LASSO-fMBV). The posterior mean estimate was nearly identical to the fully adjusted estimate, which is unsurprising given the large sample size and uninformative priors. The BAC estimate reported by Lefebvre et al. (17) was also similar to the fully adjusted estimate (49.7 μm).

Table 5.

Difference (μm) in Carotid Intima-Media Thickness Associated With Smoking in the Multi-Ethnic Study of Atherosclerosis, United States, 2000–2002

Estimator                              Estimate   Standard Error   95% Confidence Interval
White subcohort
 Unadjusted model                        88.8          25.0           39.7, 137.9
 Fully adjusted model                    48.6          24.5            0.6, 96.6
 Posterior mean (Bayesian analysis)      48.6          24.3            1.2, 96.1^a
 Ridge-fMBV                              63.8          23.7            0.8, 93.2
 LASSO-fMBV                              67.0          23.8            3.8, 106.4
 BAC^b                                   49.7          24.4            1.7, 97.6
Chinese-American subcohort
 Unadjusted model                        76.4          38.4            1.0, 151.8
 Fully adjusted model                   −13.6          40.6          −93.5, 66.2
 Posterior mean (Bayesian analysis)     −13.6          40.0          −92.4, 64.8^a
 Ridge-fMBV                              13.1          38.8          −91.3, 62.9
 LASSO-fMBV                              27.4          38.6          −69.0, 84.8
 BAC^b                                    2.6          40.1          −76.0, 81.2

Abbreviations: BAC, Bayesian adjustment for confounding; cIMT, carotid intima-media thickness; fMBV, maximum of squared future bias and future variance; LASSO, least absolute shrinkage and selection operator.

a Intervals for Bayesian posterior means are 95% credible intervals.

b Based on Lefebvre et al. (17).


LASSO-fMBV shrank the coefficients for physical activity, C-reactive protein, fibrinogen, diabetes medication, graduate education, income greater than $50,000, and alcohol use to 0. By contrast, the most probable model under BAC left out body mass index, triglyceride levels, C-reactive protein, fibrinogen, diabetes (diagnosis and medication use), and income (17). Examination of the correlations between the potential confounders, cIMT, and smoking status gives some insight into why the 2 methods select different confounders. Alcohol use was correlated with smoking status (r = 0.22) but not with cIMT (r = 0.05). Because BAC requires all variables in the exposure model to be included in the outcome model, the correlation of alcohol use with smoking status forces its inclusion in the most probable BAC model for cIMT. Our shrinkage estimator approach considers only a model with cIMT as the outcome, and the relatively weak correlation between alcohol use and cIMT means that it is one of the first variables removed; it does not have the large effect size required to introduce substantial confounding bias. Conversely, body mass index and triglyceride levels were slightly correlated with cIMT (r = 0.14 and r = 0.13, respectively) but not with smoking status (r = 0.00 for both). Although the most probable BAC model included neither variable, both had nonzero estimated coefficients in our shrinkage estimator. A plot of the correlations between the potential confounders, cIMT, and smoking status is provided in Web Figure 6.

In the Chinese-American subcohort, 93 of the 436 participants were smokers. The crude association was a 76.4-μm (standard error, 38.4) higher cIMT among smokers, whereas the fully adjusted estimate provided no evidence of an association: −13.6 μm (standard error, 40.6). The ridge-fMBV and LASSO-fMBV estimates were positive (13.1 and 27.4, respectively) and had smaller standard errors than the fully adjusted results (38.8 and 38.6, respectively).

As a sensitivity analysis, we reanalyzed the Chinese-American cohort, using the posterior distribution from the white subcohort analysis as the prior distribution for selecting λ. Because the smoking association in the white subcohort was strongly positive, this sensitivity analysis yielded estimates that were larger than those derived from analyzing the Chinese-American data alone (Web Appendix 3).

DISCUSSION

The selection of penalty parameters is a key step in the application of shrinkage methods for estimating associations, and here we have presented a principled approach to this problem. Our method, based on minimizing the bias and variance of the estimator in future similar data sets, directly addresses performance in estimating β. Although our method is related to using the posterior predictive distribution for model checking (24) and model selection (25, 26), our method uses the posterior predictive distribution to directly target estimation of a specific coefficient, which we believe is a novel contribution. Through simulations, we demonstrated that when there are many correlated putative confounders with small (possibly null) effects, this approach to choosing λ can reduce the error in estimation of β.

The shrinkage estimators we propose are inherently biased, but their reduced variability makes them useful for inference in analyses of small samples. Furthermore, our estimator provides information about variable importance by causing confounders either to drop out of the final estimate or to have much smaller coefficients than they do under OLS. This secondary information can be useful for further exploration of causal relationships.

Our approach, like most model-selection procedures, does assume that the truth is contained in the class of models being considered, particularly for its calculation of standard errors. In practice, this means that a priori scientific knowledge for identifying candidate confounders is critical, and any known confounders with strong effects can be included without penalty. As with similar regression-modeling approaches, flexible representations of covariates (e.g., splines (27)) and their interactions could be used instead of linear terms to lessen the reliance on these assumptions while exploring variable relationships that are not as well understood. Similarly, our definition of fMBV only considers future studies that have the same design as the observed study. More general forms of study could be considered, but choosing the most relevant design would require considerable knowledge or assumptions beyond the study at hand. Shrinkage of other terms could also be considered—for example, shrinking differences between effects in different strata to 0, when considering effect modification.

We presented our method in the context of linear models; in principle, however, it can be extended to the generalized linear model setting. However, for noncollapsible models such as logistic regression, the conditional or marginal interpretation of the parameters changes between the shrinkage and fully adjusted estimators. In such a case, the shrinkage estimator may not necessarily have reduced bias compared with the fully adjusted estimator, because it is estimating a different quantity.

These estimators can be computed simply using the eshrink R package we have developed. Although the LASSO version can be slow because of the many repeated optimizations required, this approach can be parallelized easily and remains computationally simpler than approaches such as Markov chain Monte Carlo methods. The choice between the ridge and LASSO estimators can be tailored to each problem. The LASSO estimator provides variable selection in addition to a point estimate, which may be beneficial in determining which putative confounders should be included in future models. However, calculating confidence intervals for ridge-fMBV is computationally simpler than for LASSO-fMBV, so ridge-fMBV may be preferred if conclusions will be based on this estimate alone.

In the MESA analysis, our estimators gave point estimates that were notably higher than the fully adjusted estimates and the BAC estimate. The standard errors of our estimators were slightly smaller than those of the fully adjusted estimators, but this difference was much smaller than the change in the point estimate. Compared with the simulations, the correlations among the MESA covariates were quite weak, which is likely why the reductions in standard error were only slight.

Overall, the approach we present for applying shrinkage estimators to the problem of effect estimation provides important benefits for regression analyses with modest sample sizes.

ACKNOWLEDGMENTS

Author affiliations: Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland (Joshua P. Keller); and Department of Biostatistics, School of Public Health, University of Washington, Seattle, Washington (Kenneth M. Rice).

This work was supported by the National Institutes of Health through grants/contracts T32ES015459, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, and UL1-TR-001079.

Conflict of interest: none declared.

Abbreviations

     
BAC, Bayesian adjustment for confounding
cIMT, carotid intima-media thickness
fBias, future bias
fMBV, maximum of squared future bias and future variance
fMSE, future mean squared error
fVar, future variance
LASSO, least absolute shrinkage and selection operator
MESA, Multi-Ethnic Study of Atherosclerosis
MSE, mean squared error
OLS, ordinary least squares

REFERENCES

1. Hernán M, Robins J. Causal Inference. Boca Raton, FL: Chapman & Hall/CRC Press. In press.
2. Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. New York, NY: Cambridge University Press; 2009.
3. Seber GA, Lee AJ. Linear Regression Analysis. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc.; 2003.
4. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
5. Greenland S. Invited commentary: variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol. 2008;167(5):523–529.
6. Greenland S, Pearce N. Statistical foundations for model-based adjustments. Annu Rev Public Health. 2015;36(1):89–108.
7. Weng HY, Hsueh YH, Messam LL, et al. Methods of covariate selection: directed acyclic graphs and the change-in-estimate procedure. Am J Epidemiol. 2009;169(10):1182–1190.
8. Walter S, Tiemeier H. Variable selection: current practice in epidemiological studies. Eur J Epidemiol. 2009;24(12):733–736.
9. Vansteelandt S, Bekaert M, Claeskens G. On model selection and model misspecification in causal inference. Stat Methods Med Res. 2012;21(1):7–30.
10. Greenland S, Daniel R, Pearce N. Outcome modelling strategies in epidemiology: traditional methods and basic alternatives. Int J Epidemiol. 2016;45(2):565–575.
11. Dominici F, Wang C, Crainiceanu C, et al. Model selection and health effect estimation in environmental epidemiology. Epidemiology. 2008;19(4):558–560.
12. Franklin JM, Eddings W, Glynn RJ, et al. Regularized regression versus the high-dimensional propensity score for confounding adjustment in secondary database analyses. Am J Epidemiol. 2015;182(7):651–659.
13. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267–288.
14. Bühlmann P, van de Geer S. Statistics for High-Dimensional Data: Methods, Theory and Applications. New York, NY: Springer Publishing Company; 2011.
15. Raftery A, Madigan D, Hoeting J. Bayesian model averaging for linear regression models. J Am Stat Assoc. 1997;92(437):179–191.
16. Wang C, Parmigiani G, Dominici F. Bayesian effect estimation accounting for adjustment uncertainty. Biometrics. 2012;68(3):661–671.
17. Lefebvre G, Delaney JA, McClelland RL. Extending the Bayesian Adjustment for Confounding algorithm to binary treatment covariates to estimate the effect of smoking on carotid intima-media thickness: the Multi-Ethnic Study of Atherosclerosis. Stat Med. 2014;33(16):2797–2813.
18. Wang C, Dominici F, Parmigiani G, et al. Accounting for uncertainty in confounder and effect modifier selection when estimating average causal effects in generalized linear models. Biometrics. 2015;71(3):654–665.
19. Lefebvre G, Atherton J, Talbot D. The effect of the prior distribution in the Bayesian Adjustment for Confounding algorithm. Comput Stat Data Anal. 2014;70:227–240.
20. Crainiceanu CM, Dominici F, Parmigiani G. Adjustment uncertainty in effect estimation. Biometrika. 2008;95(3):635–651.
21. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer Publishing Company; 2009.
22. Golub GH, Heath M, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics. 1979;21(2):215–223.
23. Bild DE, Bluemke DA, Burke GL, et al. Multi-Ethnic Study of Atherosclerosis: objectives and design. Am J Epidemiol. 2002;156(9):871–881.
24. Gelman A, Meng X, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Stat Sin. 1996;6(4):733–807.
25. Gelfand AE, Ghosh SK. Model choice: a minimum posterior predictive loss approach. Biometrika. 1998;85(1):1–11.
26. Hahn PR, Carvalho CM. Decoupling shrinkage and selection in Bayesian linear models: a posterior summary perspective. J Am Stat Assoc. 2015;110(509):435–448.
27. Vittinghoff E, Glidden DV, Shiboski SC, et al. Regression Methods in Biostatistics. 2nd ed. New York, NY: Springer Publishing Company; 2012.
