## Abstract

Four covariate selection approaches were compared: a directed acyclic graph (DAG) full model and 3 DAG and change-in-estimate combined procedures. Twenty-five scenarios with case-control samples were generated from 10 simulated populations to assess the performance of these covariate selection procedures in the presence of confounders of various strengths and under DAG misspecification with omission of confounders or inclusion of nonconfounders. Performance was evaluated by standard error, bias, square root of the mean-squared error, and 95% confidence interval coverage. In most scenarios, the DAG full model without further covariate selection performed as well as or better than the other procedures when the DAGs were correctly specified, as well as when confounders were omitted. Model reduction by using change-in-estimate procedures showed potential gains in precision when the DAGs included nonconfounders, but underestimation of the regression-based standard error might reduce 95% confidence interval coverage. For modeling binary outcomes in a case-control study, the authors recommend construction of a “conservative” DAG, determination of all potential confounders, and then change-in-estimate procedures to simplify this full model. The authors advocate that, under the conditions investigated, selection of the final model should be based on changes in precision: Adopt the reduced model if its standard error (derived from logistic regression) is substantially smaller; otherwise, the full DAG-based model is appropriate.

Directed acyclic graphs (DAGs) and change-in-estimate procedures for confounder identification and selection during data analysis have, to date, been discussed separately in the epidemiologic literature (1–8). With few exceptions (9–11), data analysts have also tended to apply the procedures separately, although no obvious subject matter considerations preclude their joint use. This has been a natural course of action because the use of DAGs is generally based only on prior knowledge or a priori assumptions about causal relations among variables of interest in the source population from which the study sample is taken, while the change-in-estimate procedure relies on sample-based relations among variables. These fundamental differences also highlight a limitation of each approach: DAGs ignore sampling variation, and the change-in-estimate procedure does not take underlying causal relations into consideration.

Although use of prior knowledge in model building has long been advocated (2, 12–14), previous studies have not comprehensively examined how covariate selection procedures that combine both approaches affect parameter estimation. Thus, in this simulation study, we investigated whether combined approaches could improve parameter estimation over the DAG approach alone in the presence of confounders of various strengths and under DAG misspecification resulting from omission of confounders or inclusion of nonconfounders. This objective distinguishes this study from previous studies on confounder selection strategies (5, 15, 16). We do not discuss problems resulting from adjustment for colliders or from confounder selection based on significance tests, as these issues have been addressed elsewhere (3–7, 15, 17).

## MATERIALS AND METHODS

### Populations and samples

We created 10 different populations, each consisting of 500,000 observations with a binary exposure (*E*) and outcome (*O*), as well as covariates (*X*_{j}) (Table 1). We then selected random samples of sizes 300 and 1,000 from these populations with confounders of various strengths and generated 25 scenarios (Table 2) to examine the performance of the covariate selection procedures under correctly and incorrectly specified DAGs for populations with different confounding structures. Misspecified DAGs may omit confounders or include nonconfounding covariates (i.e., those nonconfounders that are neither colliders nor consequences of either exposure or outcome).

Covariate^{a} | Population | |||||||||||||||||||

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |||||||||||

β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | β_{E} | β_{O} | |

X_{1} | 1.5 | 1.5 | 1.0 | 1.0 | 0.8 | 0.8 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 2.0 | 0 |

X_{2} | 0.5 | 0.5 | 0.5 | 0.5 | 0.3 | 0.3 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.0 | 1.0 | 1.5 | 0 |

X_{3} | 1.5 | 0.5 | 1.0 | 0.5 | 0.8 | 0.3 | −1.5 | −1.5 | −0.5 | −0.5 | 0.5 | 0.5 | 1.5 | 0 | 0 | 1.5 | 0.5 | 0.5 | 0.5 | 0 |

X_{4} | 0.5 | 1.5 | 0.5 | 1.0 | 0.3 | 0.8 | −0.5 | −0.5 | 0.5 | −0.5 | −1.5 | 0 | −1.5 | 0 | 1.5 | 0 | 0 | 2.0 | ||

X_{5} | 0.5 | 0.5 | 0.5 | 0.5 | 0.3 | 0.3 | −1.5 | 0 | −1.5 | 0 | 0 | −1.5 | 0 | −1.5 | 1.0 | 0 | 0 | 1.5 | ||

X_{6} | −1.5 | 0 | −1.0 | 0 | −0.8 | 0 | −1.5 | 0 | −1.5 | 0.5 | 0 | 0 | 1.0 | |||||||

X_{7} | 0 | −1.5 | 0 | −1.0 | 0 | −0.8 | 0 | 1.5 | 0 | 0.5 | ||||||||||

X_{8} | 0 | 1.0 | ||||||||||||||||||

X_{9} | 0 | 0.5 | ||||||||||||||||||

Exposure | 1.5 | 1.0 | 0.8 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | ||||||||||

Intercept | −3.6 | −7.2 | −3.6 | −6.3 | −3.3 | −5.4 | −2.1 | −5.2 | −2.1 | −5.3 | −2.1 | −5.3 | −2.5 | −5.2 | −1.1 | −6.0 | −3.5 | −6.5 | −3.1 | −8.4 |

^{a} All covariates except *X*_{5} in populations 1–3 and *X*_{7} in population 10 were Bernoulli random variables with a success probability of 0.2. The 4 excepted covariates followed a normal distribution with a mean of 3 and a variance of 1.

Covariate^{a} | Scenarios^{b} | ||||||||||||||||||||||||

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | |

X_{1} | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | ||

X_{2} | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | ||||

X_{3} | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | ||||||||

X_{4} | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | ||||||||||

X_{5} | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | ||||||||

X_{6} | √ | √ | √ | √ | √ | √ | √ | ||||||||||||||||||

X_{7} | √ | √ | √ | √ | √ | √ | |||||||||||||||||||

X_{8} | √ | √ | √ | ||||||||||||||||||||||

X_{9} | √ | √ | |||||||||||||||||||||||

Exposure | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | |

Intercept | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ | |

Source population | 1 | 2 | 3 | 1 | 4 | 1 | 1 | 1 | 3 | 1 | 3 | 5 | 6 | 1 | 7 | 9 | 10 | 1 | 8 | 9 | 10 | 10 | 1 | 9 | 10 |

^{a} Covariate relations with the exposure and outcome are described in Table 1 for each of the 10 source populations.

^{b} Scenarios 1–3 contained confounders of decreasing strength. Scenarios 4 and 5 represented misspecified directed acyclic graphs that omitted a strong confounder. Scenarios 6–9 represented directed acyclic graphs that omitted a moderate or weak confounder. Scenarios 10–12 represented directed acyclic graphs that omitted 2 same-direction confounders, while scenario 13 omitted 2 opposite-direction confounders. Scenarios 14–17 represented misspecified directed acyclic graphs that included nonconfounders associated with only study exposure, while scenarios 18–22 represented the inclusion of nonconfounders associated with only study outcome. Scenarios 23–25 included nonconfounders that were each associated with only study exposure or outcome.

For each sample size, we repeated cumulative incidence case-control sampling from the population 1,000 times. Samples of size 300 consisted of 150 cases and 150 controls, and samples of size 1,000 consisted of 500 cases and 500 controls. Sampling was without replacement within each iteration, but the sampled observations were returned to the population before the next iteration. SAS software for Windows (18) was used to generate the samples.
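The sampling scheme described above can be sketched as follows. This is a minimal illustration in Python, not the SAS code used in the study; the function name `repeated_case_control_samples` and its arguments are hypothetical, and controls are assumed to be drawn from all noncases in the population.

```python
import random


def repeated_case_control_samples(outcomes, n_cases, n_controls, n_iterations, seed=0):
    """Draw repeated case-control samples from one population.

    outcomes: sequence of 0/1 outcome indicators, one per member of the population.
    Returns a list of (case_indices, control_indices) pairs, one per iteration.
    """
    rng = random.Random(seed)
    case_ids = [i for i, o in enumerate(outcomes) if o == 1]
    control_ids = [i for i, o in enumerate(outcomes) if o == 0]  # noncases
    samples = []
    for _ in range(n_iterations):
        # random.sample draws without replacement within an iteration; every
        # iteration starts again from the full lists, so earlier draws are
        # available again.
        cases = rng.sample(case_ids, n_cases)
        controls = rng.sample(control_ids, n_controls)
        samples.append((cases, controls))
    return samples
```

Under these assumptions, `repeated_case_control_samples(outcomes, 500, 500, 1000)` would correspond to the 1,000 iterations at the larger sample size.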

### Covariate selection procedures

Logistic regression was used to model the parameter of interest, that is, the odds ratio (OR) relating *E* to *O* while simultaneously controlling for selected covariates. The covariate selection procedures investigated in this study are as follows.

#### DAG full model.

The covariates identified as confounders by the relevant DAG were included in the logistic regression model without further covariate selection. For example, *X*_{1}, …, *X*_{5}, but not *X*_{6} or *X*_{7}, were included in the logistic regression models for all the samples in scenarios 1–3 (Tables 1 and 2). Different sets of covariates were included in the DAG full models in scenarios 4–25 to represent different types of DAG misspecification (Table 2). For example, *X*_{1}, …, *X*_{4}, but not *X*_{5}, were included in the DAG full model for scenario 4 (sampled from population 1) to represent a misspecified DAG that omitted a confounder (i.e., *X*_{5}).

#### DAG gold-standard change-in-estimate procedure.

In this procedure, the initial full model was the DAG full model, and covariates were selected by backward elimination. At each stage, the 1 covariate whose removal caused the smallest change in the exposure OR (defined as ΔOR) was removed, provided that the ΔOR was less than 0.1 (a 10% change). ΔOR, at each stage, was given by the following equation:

$$\Delta\mathrm{OR} = \frac{\left|\mathrm{OR}_{i} - \mathrm{OR}_{\mathrm{DAG}}\right|}{\mathrm{OR}_{\mathrm{DAG}}} \tag{1}$$

where OR_{i} is the OR estimated at the *i*th step of the procedure, and OR_{DAG} is the OR estimated by using the initial DAG full model.

The procedure was discontinued at the step where removal of each remaining covariate failed the criterion (i.e., ΔOR ≥ 0.1).
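The elimination loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS macro used in the study: `estimate_or` is a hypothetical callable standing in for a logistic regression fit that returns the exposure OR adjusted for a given covariate set, and `toy_or` with its inflation factors is invented purely to make the example runnable.

```python
def gold_standard_change_in_estimate(covariates, estimate_or, threshold=0.1):
    """Backward elimination that compares every candidate model with the full model.

    covariates:  names of the covariates in the DAG full model.
    estimate_or: hypothetical callable mapping a frozenset of covariate names
                 to the exposure OR from a model adjusted for them.
    """
    or_dag = estimate_or(frozenset(covariates))  # reference OR, fixed throughout
    retained = list(covariates)
    while retained:
        candidates = []
        for name in retained:
            reduced = frozenset(c for c in retained if c != name)
            delta_or = abs(estimate_or(reduced) - or_dag) / or_dag
            candidates.append((delta_or, name))
        delta_or, weakest = min(candidates)  # smallest change in the OR
        if delta_or >= threshold:
            break  # no remaining covariate can be removed
        retained.remove(weakest)
    return retained


# Invented example: omitting a covariate multiplies a "true" OR of 2.0.
INFLATION = {"X1": 1.30, "X2": 1.04, "X3": 1.03}


def toy_or(adjusted):
    value = 2.0
    for name, factor in INFLATION.items():
        if name not in adjusted:
            value *= factor
    return value


selected = gold_standard_change_in_estimate(["X1", "X2", "X3"], toy_or)
```

With these invented factors, the weak covariates `X3` and `X2` are dropped in turn and the strong covariate `X1` is retained.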

#### DAG gold-standard change-in-estimate procedure with consideration of precision.

After selecting the model using the previously described DAG gold-standard procedure, we compared precision, quantified using the logistic regression-estimated standard error of the natural logarithm-transformed OR (lnOR), of this simplified model with precision of the DAG full model. At the final step, the model simplified by using the DAG gold-standard change-in-estimate procedure was selected if, and only if, it had greater precision (i.e., a smaller regression-estimated standard error) than did the DAG full model. Otherwise, the DAG full model was selected.

#### DAG-stepwise change-in-estimate procedure.

The DAG full model was the initial model, and backward elimination was applied. Instead of being compared with the OR obtained from the DAG full model, as in the case of the DAG gold-standard change-in-estimate procedure, the OR obtained at each step was compared with that generated at the previous step (i.e., $\Delta\mathrm{OR} = |\mathrm{OR}_{i} - \mathrm{OR}_{i-1}|/\mathrm{OR}_{i-1}$). The 0.1 criterion previously described was also applied to this procedure.
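The difference between the 2 reference points can be seen in a short numeric sketch. The OR values below are hypothetical and chosen only to show how the 2 definitions of ΔOR can disagree; they are not taken from the study.

```python
def relative_change(new_or, reference_or):
    """Delta-OR: absolute change in the OR as a fraction of the reference OR."""
    return abs(new_or - reference_or) / reference_or


# Hypothetical exposure ORs: DAG full model, then after 1 and 2 removals.
or_path = [2.00, 2.15, 2.30]

# Stepwise rule: compare each OR with the OR from the previous step.
stepwise = [relative_change(or_path[i], or_path[i - 1]) for i in range(1, len(or_path))]

# Gold-standard rule: compare each OR with the OR from the DAG full model.
gold_standard = [relative_change(or_path[i], or_path[0]) for i in range(1, len(or_path))]

for step, (s, g) in enumerate(zip(stepwise, gold_standard), start=1):
    print(f"removal {step}: stepwise {s:.3f}, gold standard {g:.3f}")
```

In this made-up sequence, the second removal passes the 0.1 criterion under the stepwise rule (about 0.070) but fails it under the gold-standard rule (0.150), because successive small changes accumulate relative to the full-model OR.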

We used a SAS macro (19) to perform the DAG-stepwise change-in-estimate procedure (2). This macro was further modified to execute the DAG gold-standard change-in-estimate procedures.

### Performance measures

The performances of these 4 procedures were assessed by comparing the lnOR calculated from each sample with the corresponding population lnOR. The performance measures were as follows: 1) standard error, estimated by using the standard deviation of the sample lnORs; 2) bias, estimated as the difference between the mean of the sample lnORs and the population lnOR; 3) square root of the mean-squared error, estimated as the square root of the sum of the bias squared and the variance (squared standard error) of the sample lnORs; and 4) 95% confidence interval coverage, calculated as the percentage of sample 95% confidence intervals that included the population lnOR. Sample confidence intervals were Wald confidence intervals derived from logistic regression.
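The 4 measures defined above can be computed as in the following sketch. The function name and arguments are hypothetical; the sample standard deviation (*n* − 1 denominator) and the 1.96 multiplier for the Wald interval are assumptions, because the text does not state them.

```python
import math
import statistics


def performance_measures(ln_or_estimates, regression_ses, true_ln_or, z=1.96):
    """Summarize simulation results for one procedure in one scenario.

    ln_or_estimates: lnOR estimated in each simulated sample.
    regression_ses:  regression-based standard error of lnOR in each sample.
    true_ln_or:      population lnOR.
    """
    se = statistics.stdev(ln_or_estimates)  # sampling distribution-based SE
    bias = statistics.mean(ln_or_estimates) - true_ln_or
    rmse = math.sqrt(bias ** 2 + se ** 2)
    covered = sum(
        1
        for estimate, s in zip(ln_or_estimates, regression_ses)
        if estimate - z * s <= true_ln_or <= estimate + z * s  # Wald interval
    )
    coverage_pct = 100.0 * covered / len(ln_or_estimates)
    return {"se": se, "bias": bias, "rmse": rmse, "coverage_pct": coverage_pct}
```

As a check on the definition of the square root of the mean-squared error, the scenario 1 values reported for the DAG gold-standard change-in-estimate procedure in Table 3 (standard error 0.177, bias 0.057) give √(0.177² + 0.057²) ≈ 0.186, which matches the tabulated value.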

To quantify precision, we calculated the standard errors derived from both the standard deviation of the 1,000 iterations (sampling distribution-based standard error) and logistic regression (regression-based standard error). These 2 standard error estimates were then compared with each other.

### Simulations

For simulations, all covariates followed a Bernoulli distribution with a success probability of 0.2 except for *X*_{5} in populations 1–3 and *X*_{7} in population 10, which were continuous and distributed normally with a mean of 3 and a standard deviation of 1 (Table 1). After predetermination of the distribution of each covariate, its relations with the exposure and outcome were modeled by using logistic regression (refer to equations 2 and 3):

$$\Pr(E=1 \mid Z) = \frac{1}{1+\exp(-Z\beta_{E})} \tag{2}$$

$$\Pr(O=1 \mid Z') = \frac{1}{1+\exp(-Z'\beta_{O})} \tag{3}$$

where *E* and *O* are as previously defined, and $\Pr(E=1 \mid Z)$ is the success probability of exposure conditional on the 1 × *n* covariate matrix *Z* = (1, *X*_{1}, …, *X*_{j}) (*n* = *j* + 1); $\Pr(O=1 \mid Z')$ is the success probability of the outcome conditional on the 1 × *n*′ covariate matrix *Z*′ = (1, *E*, *X*_{1}, …, *X*_{k}) (*n*′ = *k* + 2); and β_{E} and β_{O} are the *n* × 1 and *n*′ × 1 matrices of the regression coefficients expressing the covariate-exposure associations and the effects of the study exposure and other covariates on outcome, respectively.
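The data-generating steps described above can be sketched in Python as follows. This is an illustration only, not the SAS code used in the study. The coefficients are those of population 1 as we read them from Table 1; independent covariate draws and the names `simulate_person` and `simulate_population` are assumptions.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Population 1 coefficients as read from Table 1 (assumed transcription).
BETA_E = {"intercept": -3.6, "X1": 1.5, "X2": 0.5, "X3": 1.5, "X4": 0.5,
          "X5": 0.5, "X6": -1.5, "X7": 0.0}
BETA_O = {"intercept": -7.2, "E": 1.5, "X1": 1.5, "X2": 0.5, "X3": 0.5,
          "X4": 1.5, "X5": 0.5, "X6": 0.0, "X7": -1.5}


def simulate_person(rng):
    # Binary covariates: Bernoulli(0.2); X5: normal with mean 3 and SD 1.
    row = {f"X{j}": int(rng.random() < 0.2) for j in (1, 2, 3, 4, 6, 7)}
    row["X5"] = rng.gauss(3.0, 1.0)
    # Exposure depends on the covariates only.
    lin_e = BETA_E["intercept"] + sum(BETA_E[k] * row[k] for k in row)
    row["E"] = int(rng.random() < expit(lin_e))
    # Outcome depends on the exposure and the covariates.
    lin_o = BETA_O["intercept"] + sum(BETA_O[k] * row[k] for k in row)
    row["O"] = int(rng.random() < expit(lin_o))
    return row


def simulate_population(size, seed=0):
    rng = random.Random(seed)
    return [simulate_person(rng) for _ in range(size)]
```

The study populations contained 500,000 observations each; a smaller `size` is convenient for experimenting with the sketch.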

Logistic regression intercepts were predetermined in order to ensure that the 95th percentile of outcome incidence proportion and the median of exposure prevalence in the source populations were approximately 10% and 20%, respectively. The percentiles were derived from the exposure-covariate joint probabilities, computed by using the predetermined regression coefficients (Table 1). The means of the continuous covariates and an exposure prevalence of 0.2 were used in these computations.

## RESULTS

Although the relative performances of the procedures did not vary with sample size, the magnitudes of the differences between them were greater at *n* = 1,000 than at *n* = 300. We present only the results for *n* = 1,000.

### Simulation results

The simulated 95% confidence interval coverages of properly specified DAG full models were all close to the nominal level (95%–96%), and the absolute values of the bias were all small (0.003–0.025) (Table 3; only results from scenarios 1–3 are shown). These results indicated that the number of simulations performed in the study was satisfactory, because confounding of less than 0.1 (a 10% change in OR) was considered inconsequential in covariate selection in this study.

| Measures by Scenario^{b} (rows); Covariate Selection Methods (columns) | DAG | DAG-GS CE | DAG-GS P | DAG-S CE |
| --- | --- | --- | --- | --- |
| Standard error | | | | |
| 1 | 0.173 | 0.177 | 0.177 | 0.188 |
| 2 | 0.164 | 0.167 | 0.167 | 0.179 |
| 3 | 0.176 | 0.177 | 0.177 | 0.184 |
| Bias | | | | |
| 1 | 0.007 | 0.057 | 0.057 | 0.077 |
| 2 | −0.006 | 0.058 | 0.058 | 0.119 |
| 3 | 0.003 | 0.072 | 0.072 | 0.159 |
| Square root of the mean-squared error | | | | |
| 1 | 0.173 | 0.186 | 0.186 | 0.204 |
| 2 | 0.164 | 0.177 | 0.177 | 0.215 |
| 3 | 0.176 | 0.191 | 0.191 | 0.244 |
| 95% confidence interval coverage, % | | | | |
| 1 | 96 | 94 | 94 | 90 |
| 2 | 95 | 93 | 93 | 86 |
| 3 | 96 | 94 | 94 | 84 |

Abbreviations: DAG, directed acyclic graph full model; DAG-GS CE, directed acyclic graph gold-standard change-in-estimate procedure without consideration of precision; DAG-GS P, directed acyclic graph gold-standard change-in-estimate procedure with consideration of precision; DAG-S CE, directed acyclic graph stepwise change-in-estimate procedure.

Standard errors, bias, square root of the mean-squared error, and the 95% confidence interval coverage for the natural logarithm of sample odds ratios were obtained by using the 4 model selection methods shown. The results were from 1,000 case-control samples, each consisting of 500 cases and 500 controls.

^{b} Scenarios 1–3 contained confounders of decreasing strength.

### Performance measures

#### No misspecification: strength of confounding (*scenarios 1–3*).

Overall, when the DAG correctly specified the underlying causal relations, the DAG full model performed best, and the DAG-stepwise change-in-estimate procedure performed worst (Table 3). This was true regardless of strength of confounding. A comparison of the DAG gold-standard change-in-estimate procedure with versus without precision considerations showed that the results were identical on all performance measures.

Bias and 95% confidence interval coverage were the measures that produced the most substantial differences among the 4 procedures. The DAG full model consistently produced the least bias, whereas the DAG-stepwise change-in-estimate procedure consistently produced the greatest. Ninety-five percent confidence interval coverage was closer to nominal coverage for both the DAG full model and the DAG gold-standard change-in-estimate procedures than for the DAG-stepwise change-in-estimate procedure. The differences among the 4 procedures for these 2 performance measures (bias and 95% confidence interval coverage) were inversely associated with strength of confounding; that is, the differences increased as the strength of confounding decreased.

Standard errors generated by the DAG-stepwise change-in-estimate procedure were as much as 9% greater than DAG full model standard errors. Although the differences in standard error were small between the DAG full model and the DAG gold-standard change-in-estimate procedures, they increased with strength of confounding.

#### Misspecification: omission of confounders (*scenarios 4–13*).

When the DAG omitted confounders, the DAG full model performed as well as or better than the other procedures, and the DAG-stepwise procedure performed worst (Table 4). In scenarios 5, 12, and 13, in which only 2 covariates were included in the initial full model for selection, performance measures were essentially the same across the 4 methods. In all other scenarios (where ≥3 covariates were included in the initial full model for selection), the DAG full model performed best, with respect to bias, square root of the mean-squared error, and 95% confidence interval coverage, and the DAG-stepwise change-in-estimate procedure performed worst. This was also true in general for the standard errors.

| Measures by Scenario (Omitted Confounders^{b}) (rows); Covariate Selection Methods (columns) | DAG | DAG-GS CE | DAG-GS P | DAG-S CE |
| --- | --- | --- | --- | --- |
| Standard error | | | | |
| 4 (1 strong) | 0.169 | 0.173 | 0.173 | 0.183 |
| 5 (1 strong) | 0.172 | 0.172 | 0.172 | 0.172 |
| 6 (1 moderate) | 0.164 | 0.165 | 0.165 | 0.171 |
| 7 (1 moderate) | 0.163 | 0.164 | 0.164 | 0.170 |
| 8 (1 weak) | 0.171 | 0.184 | 0.184 | 0.189 |
| 9 (1 weak) | 0.174 | 0.178 | 0.178 | 0.185 |
| 10 (2 moderate)^{c} | 0.156 | 0.155 | 0.155 | 0.156 |
| 11 (2 weak)^{c} | 0.171 | 0.177 | 0.177 | 0.181 |
| 12 (2 weak)^{c} | 0.169 | 0.169 | 0.169 | 0.169 |
| 13 (2 weak)^{d} | 0.157 | 0.157 | 0.157 | 0.157 |
| Bias | | | | |
| 4 (1 strong) | 0.214 | 0.265 | 0.265 | 0.292 |
| 5 (1 strong) | 0.171 | 0.171 | 0.171 | 0.171 |
| 6 (1 moderate) | 0.153 | 0.195 | 0.195 | 0.212 |
| 7 (1 moderate) | 0.149 | 0.197 | 0.197 | 0.207 |
| 8 (1 weak) | 0.051 | 0.076 | 0.076 | 0.081 |
| 9 (1 weak) | 0.017 | 0.084 | 0.084 | 0.159 |
| 10 (2 moderate)^{c} | 0.272 | 0.307 | 0.307 | 0.308 |
| 11 (2 weak)^{c} | 0.101 | 0.152 | 0.152 | 0.175 |
| 12 (2 weak)^{c} | 0.082 | 0.082 | 0.082 | 0.082 |
| 13 (2 weak)^{d} | 0.022 | 0.022 | 0.022 | 0.022 |
| Square root of the mean-squared error | | | | |
| 4 (1 strong) | 0.272 | 0.317 | 0.317 | 0.345 |
| 5 (1 strong) | 0.243 | 0.243 | 0.243 | 0.243 |
| 6 (1 moderate) | 0.224 | 0.256 | 0.256 | 0.272 |
| 7 (1 moderate) | 0.221 | 0.256 | 0.256 | 0.267 |
| 8 (1 weak) | 0.179 | 0.199 | 0.199 | 0.206 |
| 9 (1 weak) | 0.175 | 0.196 | 0.196 | 0.244 |
| 10 (2 moderate)^{c} | 0.314 | 0.344 | 0.344 | 0.345 |
| 11 (2 weak)^{c} | 0.199 | 0.234 | 0.234 | 0.251 |
| 12 (2 weak)^{c} | 0.188 | 0.188 | 0.188 | 0.188 |
| 13 (2 weak)^{d} | 0.159 | 0.159 | 0.159 | 0.159 |
| 95% confidence interval coverage, % | | | | |
| 4 (1 strong) | 78 | 63 | 63 | 55 |
| 5 (1 strong) | 85 | 85 | 85 | 85 |
| 6 (1 moderate) | 85 | 79 | 79 | 74 |
| 7 (1 moderate) | 88 | 79 | 79 | 76 |
| 8 (1 weak) | 95 | 92 | 92 | 90 |
| 9 (1 weak) | 96 | 92 | 92 | 84 |
| 10 (2 moderate)^{c} | 62 | 51 | 51 | 50 |
| 11 (2 weak)^{c} | 93 | 86 | 86 | 82 |
| 12 (2 weak)^{c} | 93 | 93 | 93 | 93 |
| 13 (2 weak)^{d} | 96 | 96 | 96 | 96 |

Abbreviations: DAG, directed acyclic graph full model; DAG-GS CE, directed acyclic graph gold-standard change-in-estimate procedure without consideration of precision; DAG-GS P, directed acyclic graph gold-standard change-in-estimate procedure with consideration of precision; DAG-S CE, directed acyclic graph stepwise change-in-estimate procedure.

Standard errors, bias, square root of the mean-squared error, and the 95% confidence interval coverage for the natural logarithm of sample odds ratios were obtained by using the 4 model selection methods shown. The results were from 1,000 case-control samples, each consisting of 500 cases and 500 controls.

^{b} Number and strength of omitted confounders.

^{c} Omitted confounders were in the same direction.

^{d} Omitted confounders were in the opposite direction.

The results of the DAG gold-standard change-in-estimate procedure with versus without precision considerations were identical for all 4 performance measures for scenarios 4–13.

#### Misspecification: inclusion of nonconfounding covariates (*scenarios 14–25*).

When the initial DAG included nonconfounders, the DAG gold-standard change-in-estimate and DAG-stepwise change-in-estimate procedures produced smaller standard errors than did the DAG full model in most scenarios, regardless of whether the nonconfounder(s) were associated with only outcome or only exposure (Table 5). These procedures also outperformed the DAG full model with respect to bias and square root of the mean-squared error when the DAGs included nonconfounders that were associated with only study outcome. The DAG full model resulted in 95% confidence interval coverage that was closer to nominal coverage than that of the other procedures.

| Measures by Scenario (Included Nonconfounders^{b}) (rows); Covariate Selection Methods (columns) | DAG | DAG-GS CE | DAG-GS P | DAG-S CE |
| --- | --- | --- | --- | --- |
| Standard error | | | | |
| 14 (1 E only) | 0.181 | 0.184 | 0.184 | 0.189 |
| 15 (1 E only) | 0.180 | 0.173 | 0.173 | 0.173 |
| 16 (3 E only) | 0.190 | 0.187 | 0.187 | 0.184 |
| 17 (1 E only) | 0.181 | 0.173 | 0.173 | 0.172 |
| 18 (1 O only) | 0.181 | 0.183 | 0.183 | 0.192 |
| 19 (1 O only) | 0.178 | 0.173 | 0.173 | 0.173 |
| 20 (3 O only) | 0.192 | 0.184 | 0.184 | 0.185 |
| 21 (3 O only) | 0.197 | 0.184 | 0.184 | 0.181 |
| 22 (4 O only) | 0.201 | 0.186 | 0.186 | 0.183 |
| 23 (1 E, 1 O only) | 0.188 | 0.190 | 0.190 | 0.193 |
| 24 (3 E, 3 O only) | 0.201 | 0.192 | 0.192 | 0.188 |
| 25 (3 E, 4 O only) | 0.223 | 0.204 | 0.204 | 0.193 |
| Bias | | | | |
| 14 (1 E only) | 0.010 | 0.060 | 0.060 | 0.077 |
| 15 (1 E only) | 0 | 0.005 | 0.005 | 0.005 |
| 16 (3 E only) | 0.059 | 0.087 | 0.087 | 0.083 |
| 17 (1 E only) | 0.013 | 0.007 | 0.007 | 0.001 |
| 18 (1 O only) | 0.028 | 0.065 | 0.065 | 0.080 |
| 19 (1 O only) | 0.032 | 0.021 | 0.021 | 0.021 |
| 20 (3 O only) | 0.095 | 0.088 | 0.088 | 0.091 |
| 21 (3 O only) | 0.050 | 0.029 | 0.029 | 0.026 |
| 22 (4 O only) | 0.065 | 0.037 | 0.037 | 0.028 |
| 23 (1 E, 1 O only) | 0.030 | 0.069 | 0.069 | 0.079 |
| 24 (3 E, 3 O only) | 0.102 | 0.095 | 0.095 | 0.092 |
| 25 (3 E, 4 O only) | 0.069 | 0.041 | 0.041 | 0.027 |
| Square root of the mean-squared error | | | | |
| 14 (1 E only) | 0.181 | 0.193 | 0.193 | 0.204 |
| 15 (1 E only) | 0.180 | 0.173 | 0.173 | 0.173 |
| 16 (3 E only) | 0.199 | 0.206 | 0.206 | 0.202 |
| 17 (1 E only) | 0.182 | 0.173 | 0.173 | 0.172 |
| 18 (1 O only) | 0.183 | 0.194 | 0.194 | 0.208 |
| 19 (1 O only) | 0.181 | 0.174 | 0.174 | 0.174 |
| 20 (3 O only) | 0.214 | 0.204 | 0.204 | 0.206 |
| 21 (3 O only) | 0.203 | 0.186 | 0.186 | 0.183 |
| 22 (4 O only) | 0.211 | 0.190 | 0.190 | 0.185 |
| 23 (1 E, 1 O only) | 0.191 | 0.202 | 0.202 | 0.208 |
| 24 (3 E, 3 O only) | 0.226 | 0.214 | 0.214 | 0.210 |
| 25 (3 E, 4 O only) | 0.234 | 0.208 | 0.208 | 0.195 |
| 95% confidence interval coverage, % | | | | |
| 14 (1 E only) | 95 | 92 | 92 | 90 |
| 15 (1 E only) | 96 | 95 | 95 | 95 |
| 16 (3 E only) | 94 | 92 | 92 | 93 |
| 17 (1 E only) | 95 | 94 | 94 | 94 |
| 18 (1 O only) | 95 | 92 | 92 | 90 |
| 19 (1 O only) | 94 | 94 | 94 | 94 |
| 20 (3 O only) | 92 | 92 | 92 | 92 |
| 21 (3 O only) | 94 | 93 | 93 | 94 |
| 22 (4 O only) | 95 | 93 | 93 | 94 |
| 23 (1 E, 1 O only) | 94 | 90 | 90 | 90 |
| 24 (3 E, 3 O only) | 94 | 92 | 92 | 92 |
| 25 (3 E, 4 O only) | 95 | 90 | 90 | 92 |

Abbreviations: DAG, directed acyclic graph full model; DAG-GS CE, directed acyclic graph gold-standard change-in-estimate procedure without consideration of precision; DAG-GS P, directed acyclic graph gold-standard change-in-estimate procedure with consideration of precision; DAG-S CE, directed acyclic graph stepwise change-in-estimate procedure; *E*, study exposure; *O*, study outcome.

Standard errors, bias, square root of the mean-squared error, and the 95% confidence interval coverage for the natural logarithm of sample odds ratios were obtained by using the 4 model selection methods shown. The results were from 1,000 case-control samples, each consisting of 500 cases and 500 controls.

The number of included nonconfounders and their relations with study exposure and outcome are shown in parentheses. For example, scenario 25 included 7 nonconfounders, 3 of which were associated with exposure only and 4 with outcome only.

When the DAG was misspecified to include nonconfounders, the results of the DAG gold-standard change-in-estimate procedure with versus without precision considerations were, again, identical.
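The four performance measures reported in the table can be related to one another with a short Python sketch. It assumes that bias is the mean estimated lnOR minus the true lnOR, that the standard error is the standard deviation of the estimates across samples, and that coverage is based on Wald intervals built from each sample's regression-based standard error. The function name, its arguments, and these exact definitions are illustrative assumptions, not the authors' SAS macros.

```python
import math
import statistics


def performance(estimates, reg_ses, true_lnor, z=1.96):
    """Summarize simulated lnOR estimates (hypothetical helper).

    estimates : lnOR estimate from each simulated sample
    reg_ses   : regression-based SE reported for each estimate
    true_lnor : lnOR in the source population
    """
    # Bias: average estimate minus the true value (assumed definition).
    bias = statistics.fmean(estimates) - true_lnor
    # Sampling distribution-based SE: spread of the estimates across samples.
    se = statistics.stdev(estimates)
    # Square root of the mean-squared error around the true value.
    rmse = math.sqrt(statistics.fmean([(e - true_lnor) ** 2 for e in estimates]))
    # Coverage: share of Wald intervals that contain the true value, in percent.
    covered = sum(
        1 for e, s in zip(estimates, reg_ses)
        if e - z * s <= true_lnor <= e + z * s
    )
    coverage = 100.0 * covered / len(estimates)
    return {"bias": bias, "se": se, "rmse": rmse, "coverage": coverage}


if __name__ == "__main__":
    # Toy input, not data from the study.
    print(performance([0.5, 0.7, 0.9], [0.1, 0.1, 0.1], true_lnor=0.6))
```

Under these definitions the squared root-mean-squared error is approximately the squared bias plus the squared standard error. For scenario 25, for instance, the DAG full model's tabulated standard error of 0.223 and bias of 0.069 give sqrt(0.223² + 0.069²) ≈ 0.233, which agrees with the tabulated 0.234 up to rounding.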

### Regression-based versus sampling distribution-based standard errors

The DAG gold-standard change-in-estimate procedure had a smaller mean regression-based standard error than did the corresponding DAG full model in 22 of 25 scenarios (Table 6). On the other hand, the DAG gold-standard change-in-estimate procedure produced a smaller sampling distribution-based standard error than did the DAG full model in only 11 scenarios. Ten of these were scenarios in which the DAG included nonconfounders.

| Scenario | Sampling distribution-based SE: DAG | Sampling distribution-based SE: DAG-GS CE | Regression-based SE: DAG | Regression-based SE: DAG-GS CE | % of DAG-GS CE with smaller regression-based SE |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.173 | 0.177 | 0.176 | 0.173 | 98.3 |
| 2 | 0.164 | 0.167 | 0.165 | 0.162 | 99.7 |
| 3 | 0.176 | 0.177 | 0.183 | 0.179 | 100 |
| 4 | 0.169 | 0.173 | 0.170 | 0.166 | 99.7 |
| 5 | 0.172 | 0.172 | 0.176 | 0.176 | 19.1 |
| 6 | 0.171 | 0.184 | 0.174 | 0.171 | 39.5 |
| 7 | 0.164 | 0.165 | 0.169 | 0.166 | 99.5 |
| 8 | 0.163 | 0.164 | 0.167 | 0.164 | 99.4 |
| 9 | 0.174 | 0.178 | 0.182 | 0.178 | 99.8 |
| 10 | 0.156 | 0.155 | 0.160 | 0.159 | 99.9 |
| 11 | 0.171 | 0.177 | 0.177 | 0.176 | 99.8 |
| 12 | 0.169 | 0.169 | 0.175 | 0.175 | 25.9 |
| 13 | 0.157 | 0.157 | 0.167 | 0.167 | 24.7 |
| 14 | 0.181 | 0.184 | 0.181 | 0.173 | 99.9 |
| 15 | 0.180 | 0.173 | 0.178 | 0.171 | 95.8 |
| 16 | 0.190 | 0.187 | 0.190 | 0.178 | 100 |
| 17 | 0.181 | 0.173 | 0.182 | 0.166 | 100 |
| 18 | 0.181 | 0.183 | 0.181 | 0.172 | 100 |
| 19 | 0.178 | 0.173 | 0.178 | 0.169 | 93.6 |
| 20 | 0.192 | 0.184 | 0.192 | 0.179 | 100 |
| 21 | 0.197 | 0.184 | 0.191 | 0.170 | 100 |
| 22 | 0.201 | 0.186 | 0.197 | 0.171 | 100 |
| 23 | 0.188 | 0.190 | 0.186 | 0.172 | 100 |
| 24 | 0.201 | 0.192 | 0.205 | 0.180 | 100 |
| 25 | 0.223 | 0.204 | 0.220 | 0.177 | 100 |

Abbreviations: DAG, directed acyclic graph full model; DAG-GS CE, directed acyclic graph gold-standard change-in-estimate procedure; SE, standard error.

The percentage of models simplified by using the DAG-GS CE procedure that resulted in a smaller regression-based standard error than the corresponding directed acyclic graph full models is also presented. The results were from 1,000 case-control samples, each consisting of 500 cases and 500 controls.
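The distinction between the two standard errors in the table can be illustrated with a minimal Python sketch. It simulates an unadjusted case-control analysis, for which the logistic regression standard error of the lnOR coincides with the Woolf formula sqrt(1/a + 1/b + 1/c + 1/d), and compares the mean of that per-sample standard error with the standard deviation of the lnOR estimates across samples. The exposure prevalences, the seed, and the function name are hypothetical, and no covariate selection is simulated, so the two quantities are expected to agree closely here; the sketch does not reproduce the discrepancies reported in the table.

```python
import math
import random
import statistics


def simulate(n_samples=1000, n_cases=500, n_controls=500,
             p_exp_cases=0.46, p_exp_controls=0.30, seed=1):
    """Return (sampling-distribution SE, mean regression-based SE) of the lnOR.

    All parameter values are illustrative assumptions, not the study's
    population settings.
    """
    rng = random.Random(seed)
    lnors, reg_ses = [], []
    for _ in range(n_samples):
        # a: exposed cases, b: unexposed cases,
        # c: exposed controls, d: unexposed controls.
        a = sum(rng.random() < p_exp_cases for _ in range(n_cases))
        c = sum(rng.random() < p_exp_controls for _ in range(n_controls))
        b, d = n_cases - a, n_controls - c
        lnors.append(math.log((a * d) / (b * c)))
        # Woolf SE: equals the unadjusted logistic regression SE of the lnOR.
        reg_ses.append(math.sqrt(1 / a + 1 / b + 1 / c + 1 / d))
    sampling_se = statistics.stdev(lnors)    # spread across samples
    mean_reg_se = statistics.fmean(reg_ses)  # average model-based SE
    return sampling_se, mean_reg_se


if __name__ == "__main__":
    sampling_se, mean_reg_se = simulate()
    print(f"sampling distribution-based SE: {sampling_se:.3f}")
    print(f"mean regression-based SE:       {mean_reg_se:.3f}")
```

In a fixed, prespecified model such as this one, the regression-based standard error is a reasonable estimate of the sampling variability. The comparison in the table asks whether that still holds once the model has been chosen by a data-driven selection step.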

## DISCUSSION

This study generated 25 different scenarios to investigate whether covariate selection strategies that combined DAGs and change-in-estimate approaches could improve parameter estimation over the DAG procedure used by itself under different scenarios of correctly specified DAGs with various strengths of confounding and of DAG misspecification.

The finding that the DAG full model consistently performed best when DAGs were correctly specified with various strengths of confounding (scenarios 1–3) suggests that further model simplification by using the change-in-estimate procedure, which takes sampling variation into account, did not on average improve parameter estimation. The observations that the DAG full model provided minimal bias and the best 95% confidence interval coverage when DAGs were misspecified with omission of confounders (scenarios 4–13) simply reflect the fact that the variable selection procedures used in this study will not, in general, attenuate the bias resulting from inaccurate or incomplete causal assumptions made at the initial stage. The findings that the DAG gold-standard change-in-estimate procedures performed better with respect to bias than did the DAG-stepwise change-in-estimate procedure in scenarios 1–13 highlight a potential deficiency in the latter. If a 0.1 change in parameter estimate is used as the criterion, with regard to bias, the DAG gold-standard change-in-estimate and the DAG-stepwise change-in-estimate procedures estimate ORs that conform to (0.9) × OR_{DAG} < OR < (1.1) × OR_{DAG} and (0.9)^{j} × OR_{DAG} < OR < (1.1)^{j} × OR_{DAG}, respectively (where OR_{DAG} and *j* are the estimated OR and the number of confounders included in the initial DAG full model, respectively). Thus, the DAG-stepwise change-in-estimate procedure results in a wider range of estimated ORs than does the DAG gold-standard change-in-estimate approach and, on average, is expected to produce more biased estimates when a sizable number of covariates are being considered for elimination and when the bias produced by the DAG full model cannot be corrected by model reduction, such as when the DAG is incorrectly specified to omit confounders.
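To make the two ranges concrete, the following Python sketch computes them for a given full-model OR. It reads the bounds as 0.9 and 1.1 times OR_{DAG} for the gold-standard procedure and as (0.9)^{j} and (1.1)^{j} times OR_{DAG} for the stepwise procedure, with the threshold applied on a relative scale. The function name and the example values are hypothetical.

```python
def ce_bounds(or_dag, j, threshold=0.10):
    """Return the admissible OR ranges for a reduced model.

    or_dag    : OR estimated by the DAG full model
    j         : number of confounders in the initial DAG full model
    threshold : change-in-estimate criterion (assumed to be relative)
    """
    lower, upper = 1.0 - threshold, 1.0 + threshold
    # Gold standard: each reduced model is compared with the full model,
    # so the estimate stays within one threshold of OR_DAG.
    gold_standard = (lower * or_dag, upper * or_dag)
    # Stepwise: each step is compared with the previous model, so the
    # allowed change can compound over up to j eliminations.
    stepwise = (lower ** j * or_dag, upper ** j * or_dag)
    return gold_standard, stepwise


if __name__ == "__main__":
    for j in (1, 3, 5):
        gold_standard, stepwise = ce_bounds(2.0, j)
        print(j, gold_standard, stepwise)
```

With a single candidate covariate the two ranges coincide. The stepwise range widens as *j* grows, which is the compounding that the paragraph above identifies as a potential source of additional bias.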

From a statistical point of view, model reduction should resolve the tradeoff of a larger bias for a smaller variance (13, 20–22). We found, however, that model reduction with logistic regression resulted in a larger bias but not necessarily a smaller standard error in most of the scenarios when the underlying causal assumptions were correctly specified or when the DAG was misspecified by the omission of confounders. Our study demonstrated that use of logistic regression-based standard error in covariate selection is problematic. Although the DAG gold-standard change-in-estimate procedure always produced an equal or smaller mean of regression-based standard error than did the corresponding DAG full model, it had a smaller sampling distribution-based standard error than did the DAG full model only in 1 of these 13 scenarios. Earlier work by Robinson and Jewell (21) and Robinson et al. (22) explains the identical results of the DAG gold-standard change-in-estimate with versus without precision considerations observed in our study. The inconsistency between the regression-based and the sampling distribution-based standard errors is in agreement with findings from a previous study (16) and likely reflects inflated precision resulting from ignoring covariate selection-related uncertainty in regression modeling (16, 23). Use of regression-based standard errors in covariate selection optimizes the regression-estimated precision but not necessarily the true precision, which is estimated by the underlying sampling distribution.

When the DAG was misspecified with inclusion of nonconfounders (scenarios 14–25), the DAG gold-standard change-in-estimate and the DAG-stepwise change-in-estimate procedures produced smaller standard errors than did the DAG full model in most scenarios regardless of whether the nonconfounders that were included were associated with only study exposure or only outcome. That the largest differences in the mean regression-based standard errors between the DAG gold-standard change-in-estimate procedure and the DAG full model were consistently observed in these scenarios is noteworthy and might serve as an indication for misspecification of DAGs with inclusion of nonconfounders. However, the reduction in the 95% confidence interval coverage resulting from the DAG gold-standard change-in-estimate procedures in many of these scenarios signaled the potential downward bias of the regression-based standard error. The DAG full model produced 95% confidence interval coverage that was closer to nominal coverage more consistently than did the other procedures in these scenarios.

The paradox that the DAG full models produced the largest bias in most of the scenarios when the DAG included nonconfounders that were associated with only study outcome, but not when they were associated with only study exposure, could be explained by the noncollapsibility of the OR (22, 24–27). After adjustment for these covariates that were associated with only study outcome, the DAG full model produced the smallest bias in 10 of these 12 scenarios (data not shown).

We caution the readers that the scenarios examined in our study are restricted to only a small region of the parameter space, and the conclusions made from this study are limited to the population structures generated and the number of covariates investigated. For instance, the underlying structure that we investigated involved confounders that were each independent of the others, that is, not on the same backdoor paths of the DAGs. Redundant confounders such as those lying on the same backdoor paths as other confounders were not investigated. We further acknowledge that there are different covariate selection procedures commonly used in epidemiologic studies, such as hierarchical backwards elimination (13, 28). The results from this study may not be generalizable or directly comparable with these procedures. Nevertheless, although we used case-control sampling, the results from this study should be applicable to a cohort study that uses risk or rate ratios as effect measures (29, 30) as we ensured that the outcomes were rare (i.e., incidence proportion <10%) in most joint strata of covariates in the source populations. On the other hand, our study encountered problems specific to using OR in confounder identification (12, 24–26). Additionally, although all logistic regression models converged, this study did not investigate constraints on model convergence. Although we expect that the results would be applicable to other commonly used models such as Poisson and Cox proportional-hazards models, the methods might not be feasible for others, such as log binomial models, as problems of model convergence using these models have been previously reported (31–33).

In conclusion, potential, but not conclusive, benefits of performing further covariate selection using change-in-estimate procedures were observed only when the DAGs were misspecified by the inclusion of nonconfounders. We conclude, therefore, that the primary task for the researcher/analyst is to ensure that proper causal assumptions are made, in particular, that no strong confounders are excluded from data collection or analysis. Given that the investigator is never certain about the accuracy of prior causal assumptions, the recommended strategy is to construct a “conservative” DAG, including all known confounders and potential confounders even at the risk of including nonconfounders (given that they are neither colliders nor downstream effects of the exposure or the outcome), and use this full model in data analysis. An alternative is to construct a series of DAGs, each having plausibility based on prior knowledge, with various degrees of “conservativeness” regarding potential but not established confounders. The DAG full model for each can then be reported with complete transparency about the assumed underlying model. The analysts then could perform covariate selection using the DAG gold-standard change-in-estimate procedure. A large reduction in regression-based standard error in the simplified model might be an indication of misspecification of the underlying causal assumption and, specifically, with inclusion of nonconfounders. However, even under this type of misspecification, bias may be increased, and the 95% confidence interval coverage may deviate from nominal coverage through covariate selection.

A final caveat is that results related to bias are applicable on the basis of the average of 1,000 iterations, but in any given single study, the actual performance of model reduction is unknown and may, in some circumstances, produce a less biased estimate of effect.

### Abbreviations

- DAG: directed acyclic graph
- lnOR: natural logarithm-transformed odds ratio
- OR: odds ratio

Author affiliations: Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois (Hsin-Yi Weng); Department of Biostatistics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana (Ya-Hui Hsueh); Department of Public Health and Preventive Medicine, School of Medicine and School of Veterinary Medicine, St. George's University, Grenada, West Indies (Locksley L. McV. Messam); and Department of Public Health Sciences, School of Medicine, University of California at Davis, Davis, California (Irva Hertz-Picciotto).

The authors thank Lora D. Delwiche of the Public Health Sciences, University of California at Davis, for assisting with modifying the SAS macros for covariate selection procedures.

Conflict of interest: none declared.