Abstract

Pharmacoepidemiologic studies are increasingly conducted within linked databases, often to obtain richer confounder data. However, the potential for selection bias is frequently overlooked when linked data are available only for a subset of patients. We highlight the importance of accounting for potential selection bias by evaluating the association between antipsychotics and type 2 diabetes in youths within a claims database linked to a smaller laboratory database. We used inverse probability of treatment weights (IPTW) to control for confounding. In analyses restricted to the linked cohorts, we applied inverse probability of selection weights (IPSW) to create a population representative of the full cohort. We used pooled logistic regression weighted by IPTW only or by both IPTW and IPSW to estimate treatment effects. Metabolic conditions were more prevalent in the linked cohorts than in the full cohort. Within the full cohort, the confounding-adjusted hazard ratio was 2.26 (95% CI: 2.07, 2.49) comparing initiation of antipsychotics with initiation of control medications. Within the linked cohorts, the magnitude of the association differed from the full-cohort estimate when selection was not adjusted for, whereas applying IPSW resulted in point estimates similar to the full cohort’s (e.g., an adjusted hazard ratio of 1.63 became 2.12). Linked database studies may generate biased estimates without proper adjustment for potential selection bias.

Abbreviations

     
  • CI

    confidence interval

  • HR

    hazard ratio

  • IPSW

    inverse probability of selection weights

  • IPTW

    inverse probability of treatment weights

  • T2D

    type 2 diabetes

Health-care databases, such as administrative claims data, electronic health records, and registries, are widely used in pharmacoepidemiology and health services research. However, these databases are not collected for research purposes and often lack information on important confounders (1). With the widespread availability of health-care databases, records for the same patient may be available across different data sources (2, 3). As a result, richer patient data can be obtained through data linkage (4–11). Guidance on the feasibility of data linkage and recommendations for transparent reporting have been published elsewhere (12–15).

One advantage of a linked database study is improved confounding control, but the potential for selection bias is often overlooked. Typically, data linkage is feasible for only a subset of the study population. For example, a claims database from a health plan may be linked to an electronic health record database from a delivery system but only among patients who appear in both data sources. Therefore, linked database studies are generally restricted to a subset of the original study population (16–20). If the subset of linked patients is not representative of the original study population, then restricting an analysis to the linked population may introduce bias (21).

Here we highlight the importance of considering potential selection bias when working with linked data sources to estimate treatment effects in the original study population (i.e., the target population of interest). We also demonstrate how available analytical approaches can be used to account for this potential bias.

METHODS

Application example

We applied the proposed approaches to evaluate the association between antipsychotic use and type 2 diabetes (T2D) in youths (aged 5–24 years). In young patients, antipsychotic use is associated with a 2- to 3-fold increased risk of developing T2D (22, 23), as well as an increased risk of other adverse cardiometabolic side effects, such as weight gain and lipid and glucose abnormalities (24, 25). Therefore, the American Diabetes Association recommends metabolic monitoring for youths treated with antipsychotics (26). Metabolic screening prior to treatment initiation is intended to guide treatment decision making, so patients with poor metabolic health at baseline may choose an alternative treatment. Consequently, a study evaluating antipsychotics and the risk of T2D within a database that does not capture laboratory data may be subject to residual confounding by unmeasured metabolic test results.

Definitions

We defined the “primary data set” as the data set in which the original study population was identified (and which therefore contains the target population we would like to make inferences about) and the “supplemental data set” as the data set that contains additional covariate data not available in the primary data set. The “linked cohort” consisted of patients within the primary data set who were linked to the supplemental data set.

Data sources

The primary data set was the IBM MarketScan Commercial Database (MarketScan; IBM Watson Health, Cambridge, Massachusetts; January 2010 to March 2019), a nationwide claims database in the United States (27). To obtain additional confounder data, we identified a supplemental data set that captures test results from select laboratory networks for patients who have laboratory tests ordered (IBM MarketScan Lab Database). A previous study showed that the distribution of laboratory results within this database is representative of the general US population (28). Using the deidentified patient enrollment identification number, we linked records across data sources for the subset of MarketScan patients who had a test result available within the laboratory database. The laboratory database includes results from dozens of tests. For the application example, we were interested in 3 of these tests: hemoglobin A1c (HbA1c), cholesterol (total, high-density lipoprotein, low-density lipoprotein), and triglycerides. This study was approved by the Institutional Review Board of Harvard Pilgrim Health Care with a waiver of informed consent.

Study population

We defined the study population within the claims database (primary data set) as youths aged 5–24 years who initiated an antipsychotic medication or a comparator psychotropic medication. The exposed group included initiators of an antipsychotic medication. The date of the first observed dispensing for an antipsychotic served as the index date. The comparator group consisted of new users of other psychotropic drugs (antidepressants, medications for attention-deficit/hyperactivity disorder, and mood stabilizers; details in Web Table 1 available at https://doi.org/10.1093/aje/kwab299). The date of the first observed dispensing for a comparator drug served as the index date. In the comparator group, we required no previous use of the initiation drug, but use of the other comparator drugs was allowed. For example, antidepressant initiators were required to have no previous antidepressant dispensings, but previous dispensings of medications for attention-deficit/hyperactivity disorder or mood stabilizers were allowed.

We excluded patients who, during the 180 days prior to the index date, lacked continuous medical and pharmacy enrollment, had a diagnosis of diabetes or a dispensing for an antihyperglycemic medication, or were pregnant. Youths were additionally required to have ≥1 mental health diagnosis on the index date or at any point prior to the index date. In the comparator group, we also required patients to have no antipsychotic use in the 180 days prior to the index date.

Linked subset

For patients with data available in both the primary and supplemental data sets, we varied the definition of the linked cohort as follows (illustrated in Figure 1):

  1. Linked cohort 1: linked patients with data for any laboratory test (not limited to the 3 tests of interest) during the study period. This linked cohort included eligible patients who appeared in both data sets, but some of these patients might not have a recorded measurement for the tests of interest during the covariate assessment period (defined in the next section).

  2. Linked cohort 2: linked patients with data for any laboratory test (not limited to the 3 tests of interest) during the covariate assessment period. This linked cohort included eligible patients who had any supplemental data available during the covariate assessment period. As in linked cohort 1, some patients might not have a recorded measurement for the tests of interest during the covariate assessment period.

  3. Linked cohort 3: linked patients with a recorded result for each of the 3 tests of interest during the covariate assessment period. This linked cohort included eligible patients who would be included in a conventional complete-case analysis.
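The three definitions above can be sketched in code. The following Python example is illustrative only and is not the study's SAS implementation: the function name `linked_cohort_flags`, the record layout, the test labels, and the exact study-period boundary dates are assumptions, as is the choice to treat the covariate assessment period as the 180 days strictly before the index date.

```python
from datetime import date, timedelta

# Hypothetical labels for the 3 tests of interest.
TESTS_OF_INTEREST = {"hba1c", "cholesterol", "triglycerides"}

def linked_cohort_flags(lab_records, index_date, study_start, study_end,
                        window_days=180):
    """Return (cohort1, cohort2, cohort3) membership flags for one patient.

    lab_records: list of (test_name, test_date) tuples from the supplemental
    data set. The covariate assessment period is assumed to be the
    window_days before the index date, excluding the index date itself.
    """
    window_start = index_date - timedelta(days=window_days)
    in_study = [name for name, d in lab_records if study_start <= d <= study_end]
    in_window = [name for name, d in lab_records if window_start <= d < index_date]
    cohort1 = len(in_study) > 0                    # any test, study period
    cohort2 = len(in_window) > 0                   # any test, assessment period
    cohort3 = TESTS_OF_INTEREST <= set(in_window)  # all 3 tests of interest
    return cohort1, cohort2, cohort3

study_start, study_end = date(2010, 1, 1), date(2019, 3, 31)  # assumed bounds
index = date(2015, 6, 1)
records = [("cbc", date(2015, 3, 1)), ("hba1c", date(2015, 4, 15))]
print(linked_cohort_flags(records, index, study_start, study_end))
```

In this toy case the patient belongs to linked cohorts 1 and 2 but not to linked cohort 3, because cholesterol and triglyceride results are absent from the assessment period.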

Figure 1

Overview of study cohorts, IBM MarketScan Data, United States, 2010–2019. We implemented the following framework for identifying the study cohorts: The study population was defined within the primary database. This population reflected the target population that we would like to make inferences about. To obtain additional confounder data, the primary database was linked to the supplemental database. Several definitions were considered for identifying the subset of patients who appeared in both data sources (linked cohort 1, linked cohort 2, linked cohort 3). In our study, the IBM MarketScan Commercial Database served as the primary database and the IBM MarketScan Lab Database served as the supplemental database.

Patient characteristics

We measured baseline covariates during the 180 days prior to the index date (covariate assessment period). Within the claims database, we identified several characteristics as potential confounders or proxies of confounders, including demographic factors, metabolic conditions, psychiatric conditions, laboratory tests ordered, lifestyle factors, medications, indicators of health-care utilization, and the pediatric comorbidity index (full list in Web Table 2) (29). As is typical in claims database studies, patients were defined as not having a certain characteristic (e.g., depression) unless they had a recorded diagnosis or dispensing in the database. Therefore, there were no missing values in the claims-based covariates.

Within the laboratory database, we obtained test results for hemoglobin A1c, cholesterol, and triglycerides. Implausibly extreme laboratory values were set to missing (details in Web Table 2). When there were multiple records of the test result available, we used only the value closest to the index date.
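The record-selection rule described above can be illustrated with a short Python sketch. It is not the study's code: the function name `closest_valid_result`, the record layout, and the plausibility bounds are hypothetical (the actual thresholds are given in Web Table 2), and the sketch assumes that only results dated before the index date within the covariate assessment period are eligible.

```python
from datetime import date, timedelta

def closest_valid_result(records, index_date, low, high, window_days=180):
    """Return the plausible test value closest to the index date, or None.

    records: list of (test_date, value) tuples for one patient and one test.
    low, high: hypothetical plausibility bounds; values outside them are
    treated as missing.
    """
    window_start = index_date - timedelta(days=window_days)
    usable = [
        (test_date, value)
        for test_date, value in records
        if window_start <= test_date < index_date and low <= value <= high
    ]
    if not usable:
        return None  # left missing, to be handled by imputation
    # Within the window, the latest date is the one closest to the index date.
    return max(usable, key=lambda rec: rec[0])[1]

index = date(2015, 6, 1)
hba1c_records = [
    (date(2015, 1, 10), 5.4),
    (date(2015, 5, 20), 55.0),  # implausible under the assumed bounds
    (date(2015, 4, 2), 5.6),
    (date(2014, 9, 1), 5.1),    # outside the 180-day window
]
print(closest_valid_result(hba1c_records, index, low=3.0, high=20.0))
```

Here the most recent record is discarded as implausible, so the value from the next closest date is used.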

Outcome

We followed patients until the onset of T2D, end of insurance coverage, or end of available data. We defined cases using a previously validated algorithm for identifying T2D in children using health-care databases (positive predictive value = 87%) (30). This definition was based on the presence of inpatient or outpatient diagnosis codes for T2D and use of antihyperglycemic medications.

Statistical analysis

Descriptive statistics.

First, we examined whether patients within the linked cohorts were representative of the full cohort by comparing the distributions of baseline patient characteristics across cohorts. Then we compared the distributions of patient characteristics between treatment groups within each cohort to assess for potential confounding.

Adjusting for confounding only.

We used inverse probability of treatment weights (IPTW) to adjust for baseline confounding (31). We estimated stabilized treatment weights as the marginal probability of treatment divided by the probability of treatment conditional on measured baseline covariates, separately for the full cohort and each of the linked cohorts. Then we truncated weights at the 1st and 99th percentiles to prevent outliers from influencing the analysis (32). For each cohort, we applied 2 levels of baseline confounding adjustment: 1) adjusted for claims-based covariates only and 2) adjusted for claims-based and laboratory covariates. To estimate the association between antipsychotics and T2D, we estimated hazard ratios (HRs) using pooled logistic regression models weighted by IPTW. These analyses did not account for potential selection bias (see next section).
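As a minimal sketch of the weighting step, the Python example below computes stabilized treatment weights from propensity scores and truncates them at the 1st and 99th percentiles. It assumes the conditional treatment probabilities have already been estimated (e.g., by a logistic regression model, which is not shown); the function names and the nearest-rank percentile definition are illustrative choices, and other software may compute percentiles slightly differently.

```python
import math

def stabilized_iptw(treated, propensity):
    """Stabilized weight = marginal P(A = a) / P(A = a | covariates).

    treated: list of 0/1 treatment indicators.
    propensity: list of estimated P(A = 1 | covariates), assumed given.
    """
    p_marginal = sum(treated) / len(treated)
    weights = []
    for a, ps in zip(treated, propensity):
        if a == 1:
            weights.append(p_marginal / ps)
        else:
            weights.append((1 - p_marginal) / (1 - ps))
    return weights

def truncate(weights, lower_pct=1, upper_pct=99):
    """Clip weights to the [lower_pct, upper_pct] percentile range."""
    ordered = sorted(weights)
    n = len(ordered)

    def percentile(p):
        rank = max(1, math.ceil(p * n / 100))  # nearest-rank definition
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(w, low), high) for w in weights]

treated = [1, 1, 0, 0, 0, 0]
propensity = [0.50, 0.25, 0.25, 0.50, 0.20, 0.30]
weights = truncate(stabilized_iptw(treated, propensity))
print([round(w, 3) for w in weights])
```

With only six toy patients the truncation step leaves the weights unchanged; in a large cohort it would pull the most extreme 1% at each tail back to the percentile values.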

For analyses adjusted for laboratory covariates, we used multiple imputation by chained equations to handle missing laboratory data, implemented with the MI procedure (PROC MI) in SAS, version 9.4 (SAS Institute, Inc., Cary, North Carolina) (33, 34). We performed imputation on the continuous laboratory covariates and then dichotomized the respective variables to define high total cholesterol, high low-density lipoprotein cholesterol, and high triglyceride levels. The imputation models included the outcome and all previously defined claims-based and laboratory covariates. We assumed that the laboratory covariates were missing at random. We created 20 imputed data sets and specified a multivariate normal distribution for the imputation of continuous laboratory covariates. We fitted separate treatment weights and pooled logistic regression models (as described above) for each imputation (35, 36) and then used Rubin’s rules to pool HRs and 95% confidence intervals (CIs) across imputations (37).
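The pooling step can be sketched as follows. This Python example applies Rubin's rules to log HRs and their variances from each imputed data set; the function name `pool_rubin` and the input values are hypothetical, and for brevity the interval uses a normal reference distribution rather than the t distribution with Rubin's degrees of freedom.

```python
import math

def pool_rubin(log_hrs, variances, z=1.96):
    """Pool per-imputation log hazard ratios with Rubin's rules.

    Returns (HR, lower 95% limit, upper 95% limit).
    Simplification: normal-based interval instead of Rubin's t reference.
    """
    m = len(log_hrs)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations with matching variances")
    q_bar = sum(log_hrs) / m                       # pooled point estimate
    within = sum(variances) / m                    # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in log_hrs) / (m - 1)
    total = within + (1 + 1 / m) * between         # total variance
    se = math.sqrt(total)
    return math.exp(q_bar), math.exp(q_bar - z * se), math.exp(q_bar + z * se)

# Hypothetical estimates from 3 imputed data sets.
hr, lower, upper = pool_rubin([0.60, 0.80, 1.00], [0.01, 0.01, 0.01])
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

The between-imputation term widens the interval whenever the imputed data sets disagree, which is how uncertainty about the missing laboratory values is carried into the final CI.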

Adjusting for selection bias and confounding.

To account for potential selection bias in restricting analyses to the linked cohorts, we applied inverse probability of selection weights (IPSW) (38, 39). Within each treatment group, we estimated stabilized selection weights as the marginal probability of being in the respective linked cohort divided by the probability of being in the respective linked cohort conditional on all previously described claims-based covariates. Then we truncated weights at the 1st and 99th percentiles. This weighting created a pseudopopulation in which the distributions of measured factors related to selection were expected to be balanced between the full cohort and the respective linked cohort. To evaluate the performance of these weights, we reexamined the distributions of characteristics across cohorts in the reweighted sample.
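To show why these weights restore representativeness, the following Python sketch computes stabilized selection weights in a toy cohort. It is a simplified illustration under stated assumptions: selection probabilities are estimated by stratifying on a single binary covariate (a stand-in for the logistic regression on all claims-based covariates described above), and the function names, field names, and data are hypothetical.

```python
from collections import defaultdict

def stabilized_ipsw(rows):
    """Stabilized selection weights, estimated within treatment group.

    rows: list of dicts with 0/1 keys 'treated', 'obese', 'linked'.
    Unlinked patients get weight 0 because they are excluded from the
    linked-cohort analysis.
    """
    n_group, linked_group = defaultdict(int), defaultdict(int)
    n_stratum, linked_stratum = defaultdict(int), defaultdict(int)
    for r in rows:
        group, stratum = r["treated"], (r["treated"], r["obese"])
        n_group[group] += 1
        linked_group[group] += r["linked"]
        n_stratum[stratum] += 1
        linked_stratum[stratum] += r["linked"]
    weights = []
    for r in rows:
        if not r["linked"]:
            weights.append(0.0)
            continue
        group, stratum = r["treated"], (r["treated"], r["obese"])
        p_marginal = linked_group[group] / n_group[group]
        p_conditional = linked_stratum[stratum] / n_stratum[stratum]
        weights.append(p_marginal / p_conditional)
    return weights

def weighted_prevalence(rows, weights, key):
    """Weighted proportion of rows with rows[key] == 1."""
    return sum(w * r[key] for r, w in zip(rows, weights)) / sum(weights)

# Toy control group: 10% obese overall, but obese patients are linked more often.
rows = (
    [{"treated": 0, "obese": 1, "linked": 1}] * 5
    + [{"treated": 0, "obese": 1, "linked": 0}] * 5
    + [{"treated": 0, "obese": 0, "linked": 1}] * 9
    + [{"treated": 0, "obese": 0, "linked": 0}] * 81
)
ipsw = stabilized_ipsw(rows)
print(round(weighted_prevalence(rows, ipsw, "obese"), 3))
```

In the unweighted linked subset, 5 of 14 patients (about 36%) are obese; after weighting, the prevalence returns to the full-cohort value of 10%.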

Then, within each newly defined pseudopopulation, we applied IPTW to account for baseline confounding. Specifically, we used logistic regression models weighted by IPSW to estimate stabilized treatment weights separately for each of the linked cohorts. We included the previously described baseline covariates in these weight models. By adjusting for confounding within the pseudopopulation created by IPSW, this approach would ideally create covariate balance across treatment groups (internal validity) within the target population of interest. For analyses adjusted for laboratory covariates, we filled in missing laboratory data using multiple imputation by chained equations (details above) before estimating IPTW.

To estimate treatment effects adjusted for confounding and selection bias, we fitted pooled logistic regression models weighted by IPTW and IPSW to generate HRs.
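Pooled logistic regression is fitted to person-time records rather than to one row per patient. The Python sketch below shows one way such records could be built, with each row carrying the product of the two weights. The function name, field names, and the 30-day interval length are assumptions made for illustration (the interval length is not stated here), and the model-fitting step itself is omitted.

```python
def person_period_rows(patient_id, treated, followup_days, event,
                       iptw, ipsw, interval_days=30):
    """Expand one patient into one row per follow-up interval.

    event: 1 if follow-up ended with T2D onset, 0 if censored.
    The outcome indicator is 1 only in the final interval of a case.
    """
    n_intervals = max(1, -(-followup_days // interval_days))  # ceiling division
    weight = iptw * ipsw  # combined confounding and selection weight
    rows = []
    for k in range(n_intervals):
        is_last = k == n_intervals - 1
        rows.append({
            "id": patient_id,
            "interval": k,
            "treated": treated,
            "event": int(bool(event) and is_last),
            "weight": weight,
        })
    return rows

for row in person_period_rows("p1", treated=1, followup_days=75, event=1,
                              iptw=1.2, ipsw=0.8):
    print(row)
```

A weighted logistic model of `event` on `treated` and a function of `interval`, fitted to rows like these, yields an odds ratio that approximates the hazard ratio when the outcome is rare within each interval.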

For all analyses, we computed 95% CIs for the HRs using the standard sandwich variance estimator (40) and further quantified precision using confidence limit ratios (41), the ratio of the upper limit to the lower limit of the 95% CI. We additionally estimated variance using a nonparametric bootstrapping method. We assumed that the targeted treatment effects were identifiable under the assumptions of conditional exchangeability, positivity, and causal consistency (42).
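The two precision measures mentioned above can be sketched briefly in Python. The confidence limit ratio follows directly from its definition; the bootstrap function is a generic illustration in which patients are resampled with replacement and a user-supplied statistic is recomputed. The function names are hypothetical, and in a full analysis the statistic would include re-estimating the weights and the outcome model within each resample.

```python
import random
import statistics

def confidence_limit_ratio(lower, upper):
    """Ratio of the upper to the lower confidence limit."""
    if lower <= 0 or upper < lower:
        raise ValueError("expected 0 < lower <= upper")
    return upper / lower

def bootstrap_se(data, statistic, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error of statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Full-cohort estimate reported in the abstract: 95% CI 2.07, 2.49.
print(round(confidence_limit_ratio(2.07, 2.49), 2))
```

A ratio close to 1 indicates a narrow interval, so the measure allows precision to be compared across cohorts of very different sizes.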

The SAS (SAS Institute, Inc.) code used to implement the main analyses is available in the Web Material.

RESULTS

Data linkage

The full cohort, identified from the claims database, consisted of 349,180 antipsychotic initiators and 2,000,308 initiators of a control medication (Figure 2). After linkage to the laboratory database, 10.7% of antipsychotic initiators and 8.4% of control patients remained in linked cohort 1. Requiring data for any laboratory test during the covariate assessment period (linked cohort 2) reduced the sample to 5.3% of antipsychotic initiators and 2.8% of control patients. Restriction to complete cases (linked cohort 3) substantially reduced the sample size (0.4% of antipsychotic initiators, 0.1% of control patients).

Figure 2

Flow diagram of cohort assembly, IBM MarketScan Data, United States, 2010–2019. A) New users aged 5–24 years of an antipsychotic medication. B) New users aged 5–24 years of a control medication (attention-deficit/hyperactivity disorder medications, antidepressants, mood stabilizers). HbA1c, hemoglobin A1c.

Patient characteristics

Compared with the full cohort, patients within the linked cohorts were slightly older (mean age of controls: 16.2 years [standard deviation, 5.3] in the full cohort versus 18.2 years [standard deviation, 4.7] in linked cohort 2; Table 1, Web Table 3). They were also more likely to have diagnoses of metabolic conditions and laboratory tests ordered, with the prevalence increasing as the definition of the linked cohort became more restrictive. Notably, the prevalence of obesity or overweight diagnosis among control patients was 3.5% in the full cohort, 5.2% in linked cohort 1, 7.4% in linked cohort 2, and 26.0% in linked cohort 3. Similar trends were observed in the antipsychotic group. Other measured characteristics were generally similar across cohorts.

Table 1

Characteristics of Patients Who Initiated Antipsychotic Treatment or Control Treatment Before and After Accounting for Selection Bias, IBM MarketScan Data, United States, 2010–2019

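Values are No. (%) unless otherwise indicated.

| Characteristic | Antipsychotics, Full Cohort | Antipsychotics, Linked Cohort 1 | Antipsychotics, Linked Cohort 2 | Antipsychotics, Linked Cohort 3 | Other Psychotropic Drugs, Full Cohort | Other Psychotropic Drugs, Linked Cohort 1 | Other Psychotropic Drugs, Linked Cohort 2 | Other Psychotropic Drugs, Linked Cohort 3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No Adjustment for Selection | | | | | | | | |
| Age at initiation, years^a | 16.7 (4.9) | 17.2 (4.6) | 17.7 (4.5) | 17.6 (4.5) | 16.2 (5.3) | 17.4 (4.9) | 18.2 (4.7) | 18.8 (4.4) |
| Female sex | 155,078 (44.4) | 19,365 (51.9) | 7,243 (54.9) | 425 (47.5) | 997,621 (49.9) | 104,221 (61.9) | 36,821 (65.8) | 1,515 (58.6) |
| Pediatric comorbidity index^a | 5.6 (4.9) | 6.2 (4.2) | 6.9 (4.2) | 6.3 (4.0) | 2.6 (2.7) | 3.0 (2.9) | 3.7 (3.1) | 4.1 (3.1) |
| Medical diagnoses | | | | | | | | |
| Obesity or overweight | 16,538 (4.7) | 2,310 (6.2) | 1,086 (8.2) | 142 (15.9) | 70,254 (3.5) | 8,678 (5.2) | 4,118 (7.4) | 673 (26.0) |
| Weight management | 7,235 (2.1) | 1,203 (3.2) | 590 (4.5) | 44 (4.9) | 30,784 (1.5) | 4,175 (2.5) | 1,833 (3.3) | 159 (6.1) |
| Abnormal glucose/prediabetes | 2,404 (0.7) | 359 (1.0) | 181 (1.4) | 41 (4.6) | 6,202 (0.3) | 869 (0.5) | 509 (0.9) | 153 (5.9) |
| Hyperlipidemia | 6,381 (1.8) | 1,058 (2.8) | 600 (4.5) | 93 (10.4) | 20,521 (1.0) | 3,158 (1.9) | 1,975 (3.5) | 329 (12.7) |
| Bipolar disorder | 121,548 (34.8) | 13,558 (36.3) | 5,187 (39.3) | 336 (37.6) | 78,173 (3.9) | 7,311 (4.3) | 2,751 (4.9) | 152 (5.9) |
| Depression | 169,212 (48.5) | 19,562 (52.4) | 7,300 (55.3) | 434 (48.5) | 596,370 (29.8) | 56,573 (33.6) | 21,394 (38.2) | 1,135 (43.9) |
| Psychotic disorders | 42,016 (12.0) | 4,785 (12.8) | 1,831 (13.9) | 160 (17.9) | 16,240 (0.8) | 1,657 (1.0) | 645 (1.2) | 40 (1.5) |
| Laboratory tests ordered | | | | | | | | |
| Comprehensive metabolic panel | 125,125 (35.8) | 16,969 (45.5) | 8,624 (65.3) | 738 (82.6) | 373,700 (18.7) | 49,460 (29.4) | 30,241 (54.1) | 2,181 (84.3) |
| Glucose test | 16,098 (4.6) | 2,479 (6.6) | 1,428 (10.8) | 128 (14.3) | 36,774 (1.8) | 5,281 (3.1) | 3,361 (6.0) | 311 (12.0) |
| HbA1c test | 18,080 (5.2) | 2,820 (7.6) | 1,789 (13.6) | 812 (90.8) | 44,708 (2.2) | 6,871 (4.1) | 4,775 (8.5) | 2,390 (92.4) |
| Lipid test | 54,724 (15.7) | 8,235 (22.1) | 5,016 (38) | 811 (90.7) | 156,308 (7.8) | 22,537 (13.4) | 15,322 (27.4) | 2,376 (91.9) |
| Weighted by the Inverse Probability of Selection | | | | | | | | |
| Age at initiation, years^a | 16.7 (4.9) | 16.7 (4.3) | 17.1 (4.0) | 17.9 (4.4) | 16.2 (5.3) | 16.3 (5.3) | 16.8 (5.1) | 19.3 (5.5) |
| Female sex | 155,078 (44.4) | 13,744 (44.7) | 4,702 (49.1) | 263 (38.4) | 997,621 (49.9) | 87,829 (50.7) | 29,688 (56.7) | 2,491 (62.3) |
| Pediatric comorbidity index^a | 5.6 (4.9) | 5.7 (3.6) | 6.2 (3.3) | 7.3 (3.8) | 2.6 (2.7) | 2.6 (2.8) | 3.0 (2.7) | 3.9 (3.7) |
| Medical diagnoses | | | | | | | | |
| Obesity or overweight | 16,538 (4.7) | 1,486 (4.8) | 558 (5.8) | 72 (10.6) | 70,254 (3.5) | 6,527 (3.8) | 2,464 (4.7) | 867 (21.7) |
| Weight management | 7,235 (2.1) | 691 (2.3) | 281 (2.9) | 33 (4.8) | 30,784 (1.5) | 3,018 (1.7) | 1,191 (2.3) | 197 (4.9) |
| Abnormal glucose/prediabetes | 2,404 (0.7) | 222 (0.7) | 83 (0.9) | 1 (0.2) | 6,202 (0.3) | 592 (0.3) | 259 (0.5) | 16 (0.4) |
| Hyperlipidemia | 6,381 (1.8) | 573 (1.9) | 220 (2.3) | 32 (4.7) | 20,521 (1.0) | 1,950 (1.1) | 834 (1.6) | 142 (3.6) |
| Bipolar disorder | 121,548 (34.8) | 10,648 (34.7) | 3,527 (36.8) | 302 (44.1) | 78,173 (3.9) | 6,914 (4.0) | 2,351 (4.5) | 259 (6.5) |
| Depression | 169,212 (48.5) | 15,029 (48.9) | 4,982 (52) | 370 (54.0) | 596,370 (29.8) | 52,611 (30.4) | 17,340 (33.1) | 1,898 (47.4) |
| Psychotic disorders | 42,016 (12.0) | 3,703 (12.1) | 1,255 (13.1) | 240 (35.0) | 16,240 (0.8) | 1,447 (0.8) | 538 (1.0) | 142 (3.5) |
| Laboratory tests ordered | | | | | | | | |
| Comprehensive metabolic panel | 125,125 (35.8) | 11,149 (36.3) | 3,923 (40.9) | 240 (35.1) | 373,700 (18.7) | 34,134 (19.7) | 12,445 (23.8) | 661 (16.5) |
| Glucose test | 16,098 (4.6) | 1,464 (4.8) | 619 (6.5) | 92 (13.4) | 36,774 (1.8) | 3,522 (2.0) | 1,658 (3.2) | 314 (7.8) |
| HbA1c test | 18,080 (5.2) | 1,614 (5.3) | 626 (6.5) | 23 (3.4) | 44,708 (2.2) | 4,171 (2.4) | 1,791 (3.4) | 54 (1.3) |
| Lipid test | 54,724 (15.7) | 4,879 (15.9) | 1,774 (18.5) | 36 (5.3) | 156,308 (7.8) | 14,423 (8.3) | 5,834 (11.1) | 97 (2.4) |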

Abbreviation: HbA1c, hemoglobin A1c.

a Values are expressed as mean (standard deviation).


After applying IPSW, the distributions of baseline patient characteristics (including metabolic conditions) in linked cohort 1 and linked cohort 2 were similar to those of the full cohort (Table 1, Web Table 4). There were residual imbalances for linked cohort 3 compared with the full cohort. Within the full cohort and linked cohorts 1 and 2, characteristics were similar between treatment groups after weighting by IPTW and IPSW, with absolute standardized differences of less than 0.10 for nearly all measured covariates (Table 2, Web Figure 1) (43).
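For reference, the balance metric can be computed as in the Python sketch below, which gives the absolute standardized difference for a binary covariate from the weighted prevalence in each treatment group. The function name is hypothetical; the example inputs are the rounded full-cohort prevalences of obesity or overweight from Table 2 (4.0% vs. 3.7%), so the result is approximate.

```python
import math

def abs_std_diff_binary(p_treated, p_control):
    """Absolute standardized difference for a binary covariate.

    Uses the pooled standard deviation sqrt((p1(1-p1) + p0(1-p0)) / 2).
    """
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    if pooled_sd == 0:
        return 0.0  # covariate is constant in both groups
    return abs(p_treated - p_control) / pooled_sd

asd = abs_std_diff_binary(0.040, 0.037)
print(round(asd, 3), asd < 0.10)
```

This value falls well below the 0.10 threshold, consistent with the covariate being balanced between treatment groups after weighting.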

Table 2

Distribution of Patient Characteristics After Accounting for Potential Selection Bias and Baseline Confounding^a, IBM MarketScan Data, United States, 2010–2019

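Columns are grouped by treatment (initiators of antipsychotics, then initiators of other psychotropic drugs) and by cohort. Values are No. (%) unless otherwise indicated.

| Characteristic | Antipsychotics, Full Cohort | Antipsychotics, Linked Cohort 1 | Antipsychotics, Linked Cohort 2 | Antipsychotics, Linked Cohort 3 | Other Psychotropic Drugs, Full Cohort | Other Psychotropic Drugs, Linked Cohort 1 | Other Psychotropic Drugs, Linked Cohort 2 | Other Psychotropic Drugs, Linked Cohort 3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No. of patients | 288,402 (100) | 25,069 (100) | 7,804 (100) | 14,797 (100) | 2,020,663 (100) | 173,439 (100) | 52,681 (100) | 16,137 (100) |
| Demographic factors | | | | | | | | |
| Age at initiation, years^b | 16.7 (4.9) | 16.4 (4.0) | 16.8 (3.7) | 20.2 (16.3) | 16.3 (5.3) | 16.3 (5.1) | 16.9 (5.1) | 20.0 (9.9) |
| Female sex | 155,078 (44.4) | 12,400 (49.5) | 4,363 (55.9) | 10,751 (72.7) | 993,276 (49.2) | 81,207 (48.9) | 29,240 (55.5) | 7,460 (46.2) |
| Pediatric comorbidity index^b | 5.6 (4.0) | 3.7 (2.7) | 4.1 (2.6) | 5.3 (13.1) | 3.1 (3.3) | 3.2 (3.3) | 3.6 (3.2) | 9.5 (13.9) |
| Metabolic conditions | | | | | | | | |
| Obesity or overweight | 11,452 (4.0) | 1,003 (4.0) | 368 (4.7) | 2,967 (20.0) | 74,650 (3.7) | 6,738 (3.9) | 2,548 (4.8) | 1,571 (9.7) |
| Weight management | 5,109 (1.8) | 470 (1.9) | 186 (2.4) | 973 (6.6) | 32,891 (1.6) | 3,122 (1.8) | 1,215 (2.3) | 818 (5.1) |
| Abnormal weight gain | 2,708 (0.9) | 248 (1.0) | 100 (1.3) | 1,664 (11.2) | 17,259 (0.9) | 1,566 (0.9) | 646 (1.2) | 282 (1.7) |
| Abnormal glucose or prediabetes | 1,255 (0.4) | 99 (0.4) | 48 (0.6) | 4,802 (32.5) | 7,487 (0.4) | 667 (0.4) | 275 (0.5) | 74 (0.5) |
| Metabolic syndrome | 338 (0.1) | 39 (0.2) | 18 (0.2) | 425 (2.9) | 2,025 (0.1) | 186 (0.1) | 76 (0.1) | 45 (0.3) |
| Hyperlipidemia | 3,697 (1.3) | 347 (1.4) | 145 (1.9) | 1,034 (7.0) | 23,156 (1.1) | 2,101 (1.2) | 894 (1.7) | 1,140 (7.1) |
| Hypothyroidism | 4,636 (1.6) | 435 (1.7) | 190 (2.4) | 3,692 (24.9) | 27,990 (1.4) | 2,501 (1.4) | 1,131 (2.1) | 81 (0.5) |
| Lab tests ordered | | | | | | | | |
| Comprehensive metabolic panel | 71,358 (24.7) | 6,330 (25.2) | 2,198 (28.2) | 9,807 (66.3) | 436,490 (21.6) | 38,472 (22.2) | 13,899 (26.4) | 9,780 (60.6) |
| Glucose test | 7,467 (2.6) | 682 (2.7) | 302 (3.9) | 1,708 (11.5) | 45,888 (2.3) | 4,159 (2.4) | 1,914 (3.6) | 1,819 (11.3) |
| HbA1c test | 8,717 (3.0) | 778 (3.1) | 318 (4.1) | 14,368 (97.1) | 53,728 (2.7) | 4,807 (2.8) | 2,034 (3.9) | 12,380 (76.7) |
| Lipid test | 27,361 (9.5) | 2,500 (10.0) | 955 (12.2) | 9,985 (67.5) | 179,603 (8.9) | 16,042 (9.2) | 6,373 (12.1) | 9,561 (59.3) |
| Psychiatric conditions | | | | | | | | |
| ADHD | 57,598 (20.0) | 4,824 (19.2) | 1,339 (17.2) | 1,564 (10.6) | 472,542 (23.4) | 39,894 (23.0) | 11,165 (21.2) | 1,818 (11.3) |
| Anxiety | 103,698 (36.0) | 9,038 (36.1) | 2,913 (37.3) | 10,375 (70.1) | 676,179 (33.5) | 58,712 (33.9) | 19,353 (36.7) | 10,414 (64.5) |
| Autism | 13,169 (4.6) | 1,187 (4.7) | 348 (4.5) | 612 (4.1) | 70,769 (3.5) | 5,956 (3.4) | 1,693 (3.2) | 686 (4.3) |
| Bipolar disorder | 32,263 (11.2) | 2,873 (11.5) | 1,008 (12.9) | 138 (0.9) | 185,632 (9.2) | 14,993 (8.6) | 5,269 (10.0) | 8,145 (50.5) |
| Depression | 112,228 (38.9) | 9,891 (39.5) | 3,357 (43.0) | 9,377 (63.4) | 671,151 (33.2) | 57,974 (33.4) | 19,239 (36.5) | 9,741 (60.4) |
| Psychotic disorders | 9,693 (3.4) | 882 (3.5) | 318 (4.1) | 172 (1.2) | 54,744 (2.7) | 4,326 (2.5) | 1,709 (3.2) | 1,153 (7.1) |
| Medications | | | | | | | | |
| Lithium | 3,621 (1.3) | 314 (1.3) | 112 (1.4) | 8 (0.1) | 17,624 (0.9) | 1,344 (0.8) | 497 (0.9) | 1,076 (6.7) |
| Anticonvulsant mood stabilizers | 28,030 (9.7) | 2,513 (10.0) | 845 (10.8) | 1,823 (12.3) | 140,177 (6.9) | 11,651 (6.7) | 4,047 (7.7) | 5,986 (37.1) |
| SSRIs | 153,764 (53.3) | 13,339 (53.2) | 4,382 (56.1) | 7,373 (49.8) | 964,072 (47.7) | 84,308 (48.6) | 27,539 (52.3) | 7,879 (48.8) |
| Other antidepressants | 50,626 (17.6) | 4,438 (17.7) | 1,492 (19.1) | 2,002 (13.5) | 271,163 (13.4) | 22,950 (13.2) | 7,860 (14.9) | 4,836 (30.0) |
| ADHD medications | 133,131 (46.2) | 11,166 (44.5) | 3,119 (40.0) | 4,299 (29.0) | 1,019,952 (50.5) | 85,993 (49.6) | 23,645 (44.9) | 4,151 (25.7) |
| Health-care utilization | | | | | | | | |
| No. of outpatient visits^c | 5 (2–10) | 5 (2–10) | 6 (3–11) | 7 (6–12) | 4 (2–8) | 4 (2–10) | 5 (3–9) | 9 (8–13) |
| No. of distinct generic drugs^c | 3 (1–5) | 4 (2–5) | 4 (3–6) | 3 (2–5) | 4 (2–5) | 3 (2–5) | 3 (2–5) | 6 (2–9) |
| Any hospitalization | 29,066 (10.1) | 2,583 (10.3) | 972 (12.5) | 281 (1.9) | 152,044 (7.5) | 12,546 (7.2) | 4,833 (9.2) | 5,085 (31.5) |
| Laboratory test results^d | | | | | | | | |
| HbA1c, %^b | 5.28 (0.2) | 5.30 (0.2) | 5.34 (0.2) | 5.26 (0.9) | 5.27 (0.5) | 5.3 (0.4) | 5.33 (0.3) | 5.26 (0.8) |
| Proportion missing, % | 99.8 | 98.6 | 96.4 | 0.0 | 99.8 | 98.4 | 96.3 | 0.0 |
| Total cholesterol, mg/dL^b | 161.06 (26.2) | 159.07 (21.5) | 157.12 (21.4) | 156.75 (94.8) | 163.11 (42.2) | 160.28 (35.3) | 157.15 (32.3) | 153.87 (81.8) |
| Proportion missing, % | 99.4 | 95.4 | 87.1 | 0.0 | 99.3 | 94.6 | 87.5 | 0.0 |
| Proportion high cholesterol, % | 13.1 | 12.0 | 12.2 | 6.6 | 14.5 | 13.2 | 12.4 | 11.0 |
| LDL cholesterol, mg/dL^b | 90.04 (21.2) | 88.75 (17.1) | 87.69 (16.6) | 78.04 (89.8) | 90.85 (34.2) | 89.42 (27.7) | 87.67 (23.9) | 81.72 (70.7) |
| Proportion missing, % | 99.4 | 95.7 | 88.0 | 0.0 | 99.3 | 95.0 | 88.8 | 0.0 |
| Proportion high LDL cholesterol, % | 8.5 | 7.9 | 7.6 | 3.2 | 9.8 | 9.3 | 8.9 | 4.7 |
| HDL cholesterol, mg/dL^b | 51.13 (10.6) | 50.83 (8.6) | 50.62 (8.2) | 61.33 (70.1) | 51.82 (17.5) | 50.79 (14.1) | 49.76 (12.4) | 50.25 (32.9) |
| Proportion missing, % | 99.4 | 95.6 | 87.6 | 0.0 | 99.3 | 94.9 | 88.5 | 0.0 |
| Triglycerides, mg/dL^b | 101.76 (43.7) | 99.93 (34.9) | 98.87 (33.2) | 87.53 (218.0) | 103.31 (67.3) | 102.25 (53.4) | 101.43 (42.8) | 101.58 (154.2) |
| Proportion missing, % | 99.4 | 95.7 | 88.1 | 0.0 | 99.3 | 95.0 | 89.0 | 0.0 |
| Proportion high triglycerides, % | 16.5 | 15.6 | 14.8 | 10.0 | 15.9 | 15.5 | 15.3 | 22.5 |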

Abbreviations: ADHD, attention-deficit/hyperactivity disorder; HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; LDL, low-density lipoprotein; SSRI, selective serotonin reuptake inhibitor.

a The means of the inverse probability of treatment weights after truncation were as follows: 0.98 (standard deviation, 0.63) for the full cohort, 0.98 (standard deviation, 0.73) for linked cohort 1, 0.98 (standard deviation, 0.77) for linked cohort 2, and 1.00 (standard deviation, 1.02) for linked cohort 3.

b Values are expressed as mean (standard deviation).

c Values are expressed as median (interquartile range).

d Distribution of lab test results summarized among available values (before multiple imputation).

Treatment effects

In the full cohort, the unadjusted association suggested a 3-fold increased hazard of T2D among antipsychotic initiators compared with control patients (HR = 3.06, 95% CI: 2.87, 3.25; Figure 3A, Web Table 5). The association was attenuated after controlling for baseline confounders in the claims data (HR = 2.26, 95% CI: 2.07, 2.49; Figure 3B) and for baseline confounders in both the claims and the imputed laboratory data (HR = 2.25, 95% CI: 2.05, 2.47; Figure 3C). The distributions of laboratory test results were generally similar before and after imputation (Web Table 6).

Figure 3

Comparison of treatment effect estimates before and after accounting for selection bias and confounding, IBM MarketScan Data, United States, 2010–2019. A) Estimates with no confounding adjustment. B) Confounding adjustment by claims covariates. C) Confounding adjustment by claims and laboratory covariates. The full cohort, identified in the primary data set, was the target population of interest. The linked cohorts gradually became more restrictive: Linked cohort 1 consisted of linked patients with data for any laboratory test (not limited to the 3 tests of interest) during the study period, linked cohort 2 consisted of linked patients with data for any laboratory test (not limited to the 3 tests of interest) during the covariate assessment period, and linked cohort 3 consisted of linked patients with a recorded result for each of the 3 lab tests of interest during the covariate assessment period (complete confounder data). Estimates reported for the full cohort were repeated after adjustment for selection bias for ease of comparison. The 95% confidence intervals (CIs) are based on standard sandwich variance estimators; see Web Table 8 for 95% CIs based on variance estimated using a nonparametric bootstrap method. HR, hazard ratio.

With no adjustment for potential selection bias, effect estimates in the linked cohorts suggested a different magnitude of association from the full cohort. The HR adjusted for claims covariates was 1.79 (95% CI: 1.41, 2.27) in linked cohort 1, 1.63 (95% CI: 0.99, 2.68) in linked cohort 2, and 0.87 (95% CI: 0.29, 2.56) in linked cohort 3 (Figure 3B). Estimates were similar after controlling for both claims and laboratory covariates (Figure 3C).

After accounting for selection bias, claims-only confounding-adjusted HRs in linked cohort 1 (HR = 2.11, 95% CI: 1.62, 2.74) and linked cohort 2 (HR = 2.12, 95% CI: 0.88, 5.13) were close to the point estimate observed in the full cohort (HR = 2.26, 95% CI: 2.07, 2.49; Figure 3B). In linked cohort 3, the claims-only confounding-adjusted estimate (HR = 8.15, 95% CI: 1.24, 53.56) remained different from the full cohort estimate, but accounting for potential selection bias corrected the direction of association (Figure 3B). After adjustment for both claims and laboratory covariates, effect estimates remained similar in linked cohort 1 and linked cohort 2, whereas the variance increased substantially in linked cohort 3 (HR = 13.90, 95% CI: 1.61, 120.09; Figure 3C). As expected, treatment effects estimated within the linked cohorts were less precise than those estimated in the full cohort, and weighting by IPSW resulted in even wider 95% CIs. The extent to which applying IPSW increased variance differed for each cohort (Web Table 7). For example, for claims-only confounding-adjusted estimates, IPSW had limited impact on precision in linked cohort 1 (confidence limit ratio, i.e., the upper 95% confidence limit divided by the lower limit, of 1.61 before IPSW vs. 1.69 after IPSW), whereas a more substantial increase in variance was observed in linked cohort 2 (confidence limit ratio 2.71 vs. 5.83) and linked cohort 3 (confidence limit ratio 8.83 vs. 43.19). The standard errors obtained from bootstrapping were generally similar to those obtained from the standard sandwich variance estimator (Web Table 8), although in the weighted settings, the bootstrap standard errors were slightly smaller.
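The confidence limit ratios quoted above can be reproduced from the reported 95% CIs by dividing the upper limit by the lower limit. The short Python sketch below does this for the claims-only confounding-adjusted estimates; the function and variable names are illustrative, and the inputs are the rounded limits reported in the text, not the underlying model output.

```python
def confidence_limit_ratio(lower, upper):
    """Upper 95% confidence limit divided by the lower limit."""
    return upper / lower

# (lower, upper) 95% confidence limits as reported in the text.
reported_cis = {
    ("linked cohort 1", "IPTW only"): (1.41, 2.27),
    ("linked cohort 1", "IPTW and IPSW"): (1.62, 2.74),
    ("linked cohort 2", "IPTW only"): (0.99, 2.68),
    ("linked cohort 2", "IPTW and IPSW"): (0.88, 5.13),
    ("linked cohort 3", "IPTW only"): (0.29, 2.56),
    ("linked cohort 3", "IPTW and IPSW"): (1.24, 53.56),
}

clr = {
    key: round(confidence_limit_ratio(lower, upper), 2)
    for key, (lower, upper) in reported_cis.items()
}

for (cohort, weighting), value in clr.items():
    print(f"{cohort}, {weighting}: {value}")
```

A ratio near 1 indicates a narrow interval, so the change in the ratio before and after IPSW gives a simple summary of the precision lost to the selection weights.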

DISCUSSION

We highlighted the importance of accounting for potential selection bias in linked database studies using the example of antipsychotics and the risk of T2D in youths within a claims and laboratory linked database. While this linked database offered more potential for confounding control compared with the claims database alone, patients within the linked cohorts were not representative of the full claims-based cohort, and failure to account for potential selection bias resulted in incorrect point estimates. In our application example, the laboratory values within the supplemental data set were not strong confounders, and restriction to a linked cohort introduced a substantial amount of selection bias. Applying IPSW resulted in effect estimates that were comparable to the full, original study cohort (the target population), demonstrating that a valid solution exists for addressing the often-neglected issue of selection bias in linked database studies.
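To make the reweighting idea concrete, the following Python sketch shows how inverse probability of selection weights can restore a full-cohort covariate distribution in a linked subset. It is a toy illustration under stated assumptions: the cohort is hypothetical, and the selection probability is estimated nonparametrically within strata of a single binary covariate, which is a simplification and not the weight model used in the study.

```python
from collections import Counter

def stratum_selection_weights(cohort):
    """Return 1 / P(linked | stratum) for each stratum of the full cohort.

    `cohort` is a list of (stratum, linked) pairs with linked in {0, 1}.
    """
    totals = Counter(stratum for stratum, _ in cohort)
    linked = Counter(stratum for stratum, is_linked in cohort if is_linked)
    # Positivity: every stratum needs at least one linked patient.
    empty = [s for s in totals if linked[s] == 0]
    if empty:
        raise ValueError(f"no linked patients in strata {empty}")
    return {s: totals[s] / linked[s] for s in totals}

# Hypothetical full cohort of 100 patients. Stratum 1 = metabolic condition.
# Patients with the condition are more likely to be linked (5/10 vs. 10/90).
full_cohort = [(0, 1)] * 10 + [(0, 0)] * 80 + [(1, 1)] * 5 + [(1, 0)] * 5

weights = stratum_selection_weights(full_cohort)
linked_strata = [s for s, is_linked in full_cohort if is_linked]

full_prevalence = sum(s for s, _ in full_cohort) / len(full_cohort)
crude_prevalence = sum(linked_strata) / len(linked_strata)
weighted_prevalence = (
    sum(weights[s] * s for s in linked_strata)
    / sum(weights[s] for s in linked_strata)
)
print(full_prevalence, round(crude_prevalence, 3), weighted_prevalence)
```

In this example the unweighted linked subset overstates the prevalence of the metabolic condition, whereas the weighted linked subset matches the full cohort.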

Studies conducted within linked databases should carefully consider who is being analyzed and what the target population is. We assumed that the target population was the full cohort identified in the primary data set, and therefore, estimates obtained within linked subsets of the full cohort could potentially be biased if selection bias was not appropriately accounted for. IPSW provided a valid approach to extending our inferences from the linked population to the full cohort. However, some linked database studies may consider the linked subset as the target population (as opposed to the full cohort). In such circumstances, effect estimates from the linked subset target the proper estimand associated with the population of interest and are not necessarily prone to selection bias (44). In other words, the estimates would be unbiased for the linked subset, but they may not be generalizable to the full cohort. Finally, some studies may consider the target population as an external population that is distinct from the study sample (i.e., partially or completely nonoverlapping with the full cohort or the linked subset). In these situations, findings from the study sample can potentially be extended to an external population using approaches for transportability (such as inverse odds weighting) that are described elsewhere (45–48).

Our study highlights the importance of explicitly specifying the target population (e.g., full cohort, linked subset, an external population) and identifying an appropriate approach to generate unbiased effect estimates for the target population of interest (with the usual assumptions of conditional exchangeability, positivity, and causal consistency). We found that applying IPSW created a pseudopopulation comparable to the full cohort for 2 of the 3 linked cohorts. However, residual selection bias was present in linked cohort 3 (complete cases), which was not unexpected given the small sample size (reflecting <0.5% of the full cohort) that was substantially unrepresentative of the full cohort. A complete-case analysis is also known to be biased in most circumstances (49, 50). While we highlighted several definitions of a linked cohort, we anticipate that linked cohort 2 (any linked data during covariate assessment period) will be the most widely used in pharmacoepidemiologic studies.

To further investigate the residual biases in linked cohort 3, we explored different ways to truncate inverse probability weights and a different specification of the weight models (Web Table 9, Web Figure 2). In these exploratory analyses, we found that possible positivity violations likely generated extreme weights, which resulted in residual imbalances across treatment groups and unstable effect estimates (adjusted HR ranging from 0.63 to 11.20). By requiring a recorded test result for hemoglobin A1c, cholesterol, and triglycerides, we likely included a much greater proportion of patients who were at a higher metabolic risk in the complete-case cohort and were not able to adequately account for the selection bias due to insufficient information on patients with a lower metabolic risk (who were more common in the full cohort).
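As a rough illustration of weight truncation, the Python sketch below caps weights at chosen percentiles of their own distribution. The percentile cutoffs, the interpolation rule, and the example weights are assumptions made for this sketch; the truncation rules actually explored are those reported in Web Table 9.

```python
def percentile(sorted_values, pct):
    """Linear-interpolation percentile of a sorted list, pct in [0, 100]."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = (len(sorted_values) - 1) * pct / 100
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction

def truncate_weights(weights, lower_pct=1.0, upper_pct=99.0):
    """Set weights outside the chosen percentiles to the percentile values."""
    ordered = sorted(weights)
    floor = percentile(ordered, lower_pct)
    cap = percentile(ordered, upper_pct)
    return [min(max(w, floor), cap) for w in weights]

# Hypothetical weights with one extreme value, as might arise when few
# linked patients resemble a large part of the full cohort.
weights = [0.5, 0.8, 1.0, 1.1, 1.2, 0.9, 1.0, 1.3, 0.7, 40.0]
truncated = truncate_weights(weights, lower_pct=0.0, upper_pct=90.0)
print(max(weights), round(max(truncated), 2))
```

Truncation limits the influence of the extreme weight, at the possible cost of residual imbalance, which is the tradeoff described in the limitations.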

There are several potential limitations to this analysis. First, we applied the approaches to only one empirical example. In our study, the potential confounders from the supplemental database turned out not to be strong confounders, but the data linkage introduced a substantial amount of selection bias. The extent to which selection bias or confounding may be present in a linked database study may differ in other applications. Nevertheless, the outlined approaches can be applied to other linked database studies in general. Second, we used multiple imputation to handle missing laboratory data, but other approaches, such as inverse probability weighting, could be considered (51, 52). Third, we truncated inverse probability weights to minimize the influence of outliers; compared with no weight truncation, this approach increased precision at the expense of potentially increasing the imbalances between treatment groups (32). Since the degree of truncation was small, it is unlikely to have substantially influenced our findings. Finally, as expected, we observed a bias-variance tradeoff in effect estimates weighted by IPSW. Weighting can increase variance (32), and we observed that variances grew progressively larger as the linked cohorts differed more from the full cohort. However, accurate point estimates are generally prioritized in nonrandomized studies to minimize bias and achieve internal validity.

CONCLUSIONS

Studies conducted within linked databases, often with the goal of improved confounding control, may be restricted to patients who are not representative of the target population of interest. Analyses conducted within linked cohorts may generate biased effect estimates for the target population of interest, but this selection bias can be reduced through inverse probability of selection weights.

ACKNOWLEDGMENTS

Author affiliations: Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States (Jenny W. Sun, Rui Wang, Dongdong Li, Sengwee Toh); and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States (Rui Wang).

This study was funded by Harvard Medical School and Harvard Pilgrim Health Care Institute through the Thomas O. Pyle Fellowship Fund and the Agency for Healthcare Research and Quality (grant R01HS026214).

This study was based on data from the IBM MarketScan Commercial Database obtained and used under license for the present study. Restrictions apply to the availability of these data, so they are not publicly available. The data underlying the results presented in the study are available for purchase by contacting the database owners.

We thank Jenny Hochstadt for her assistance in accessing the MarketScan data.

This work was presented as a podium presentation at the 37th International Conference on Pharmacoepidemiology (online), August 23–25, 2021.

J.W.S. is currently employed by Pfizer Inc. for unrelated work. All aspects of this work included in the initial submission, including the study design, data analysis, and manuscript draft, were completed prior to employment at Pfizer Inc. The other authors report no conflicts.

REFERENCES

1. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58(4):323–337.

2. Bradley CJ, Penberthy L, Devers KJ, et al. Health services research and data linkages: issues, methods, and directions for the future. Health Serv Res. 2010;45(5):1468–1488.

3. Trifirò G, Sultana J, Bate A. From big data to smart data for pharmacovigilance: the role of healthcare databases and other emerging sources. Drug Saf. 2018;41(2):143–149.

4. Mears GD, Rosamond WD, Lohmeier C, et al. A link to improve stroke patient care: a successful linkage between a statewide emergency medical services data system and a stroke registry. Acad Emerg Med. 2010;17(12):1398–1404.

5. García Álvarez L, Aylin P, Tian J, et al. Data linkage between existing healthcare databases to support hospital epidemiology. J Hosp Infect. 2011;79(3):231–235.

6. van Herk-Sukel MPP, Lemmens VEPP, van de Poll-Franse LV, et al. Record linkage for pharmacoepidemiological studies in cancer patients. Pharmacoepidemiol Drug Saf. 2012;21(1):94–103.

7. Harron K, Goldstein H, Wade A, et al. Linkage, evaluation and analysis of national electronic healthcare data: application to providing enhanced blood-stream infection surveillance in paediatric intensive care. PLoS One. 2013;8(12):e85278.

8. Setoguchi S, Zhu Y, Jalbert JJ, et al. Validity of deterministic record linkage using multiple indirect personal identifiers: linking a large registry to claims data. Circ Cardiovasc Qual Outcomes. 2014;7(3):475–480.

9. Patorno E, Gopalakrishnan C, Franklin JM, et al. Claims-based studies of oral glucose-lowering medications can achieve balance in critical clinical variables only observed in electronic health records. Diabetes Obes Metab. 2018;20(4):974–984.

10. Huybrechts KF, Gopalakrishnan C, Franklin JM, et al. Claims data studies of direct oral anticoagulants can achieve balance in important clinical parameters only observable in electronic health records. Clin Pharmacol Ther. 2019;105(4):979–993.

11. Schmidt M, Schmidt SAJ, Adelborg K, et al. The Danish health care system and epidemiological research: from health care contacts to database records. Clin Epidemiol. 2019;11:563–591.

12. Pratt NL, Mack CD, Meyer AM, et al. Data linkage in pharmacoepidemiology: a call for rigorous evaluation and reporting. Pharmacoepidemiol Drug Saf. 2020;29(1):9–17.

13. Rivera DR, Gokhale MN, Reynolds MW, et al. Linking electronic health data in pharmacoepidemiology: appropriateness and feasibility. Pharmacoepidemiol Drug Saf. 2020;29(1):18–29.

14. Lin KJ, Schneeweiss S. Considerations for the analysis of longitudinal electronic health records linked to claims data to study the effectiveness and safety of drugs. Clin Pharmacol Ther. 2016;100(2):147–159.

15. Dusetzina SB, Tyree S, Meyer A-M, et al. Linking Data for Health Services Research: A Framework and Instructional Guide. Rockville, MD: Agency for Healthcare Research and Quality (US); 2014. https://www.ncbi.nlm.nih.gov/books/NBK253313/. Accessed November 19, 2020.

16. Mansfield KE, Nitsch D, Smeeth L, et al. Prescription of renin–angiotensin system blockers and risk of acute kidney injury: a population-based cohort study. BMJ Open. 2016;6(12):e012690.

17. Bouras G, Markar SR, Burns EM, et al. The psychological impact of symptoms related to esophagogastric cancer resection presenting in primary care: a national linked database study. Eur J Surg Oncol. 2017;43(2):454–460.

18. Solomon DH, Liu C-C, Kuo I-H, et al. Effects of colchicine on risk of cardiovascular events and mortality among patients with gout: a cohort study using electronic medical records linked with Medicare claims. Ann Rheum Dis. 2016;75(9):1674–1679.

19. Lee MP, Glynn RJ, Schneeweiss S, et al. Risk factors for heart failure with preserved or reduced ejection fraction among Medicare beneficiaries: application of competing risks analysis and gradient boosted model. Clin Epidemiol. 2020;12:607–616.

20. Berger A, Simpson A, Leeper NJ, et al. Real-world predictors of major adverse cardiovascular events and major adverse limb events among patients with chronic coronary artery disease and/or peripheral arterial disease. Adv Ther. 2020;37(1):240–252.

21. Bohensky M. Bias in data linkage studies. In: Harron K, Goldstein H, Dibben C, eds. Methodological Developments in Data Linkage. London, UK: John Wiley & Sons, Ltd; 2015:63–82.

22. Galling B, Roldán A, Nielsen RE, et al. Type 2 diabetes mellitus in youth exposed to antipsychotics: a systematic review and meta-analysis. JAMA Psychiat. 2016;73(3):247–259.

23. Bobo WV, Cooper WO, Stein CM, et al. Antipsychotics and the risk of type 2 diabetes mellitus in children and youth. JAMA Psychiat. 2013;70(10):1067.

24. De Hert M, Detraux J, van Winkel R, et al. Metabolic and cardiovascular adverse effects associated with antipsychotic drugs. Nat Rev Endocrinol. 2012;8(2):114–126.

25. De Hert M, Dobbelaere M, Sheridan EM, et al. Metabolic and endocrine adverse effects of second-generation antipsychotics in children and adolescents: a systematic review of randomized, placebo controlled trials and guidelines for clinical practice. Eur Psychiatry. 2011;26(3):144–158.

26. American Diabetes Association. Consensus development conference on antipsychotic drugs and obesity and diabetes. Diabetes Care. 2004;27(2):596–601.

27. IBM. MarketScan Research Databases. 2019. https://www.ibm.com/products/marketscan-research-databases. Accessed November 19, 2020.

28. Brookhart MA, Todd JV, Li X, et al. Estimation of biomarker distributions using laboratory data collected during routine delivery of medical care. Ann Epidemiol. 2014;24(10):754–761.

29. Sun JW, Bourgeois FT, Haneuse S, et al. Development and validation of a pediatric comorbidity index. Am J Epidemiol. 2021;190(5):918–927.

30. Teltsch DY, Fazeli Farsani S, Swain RS, et al. Development and validation of algorithms to identify newly diagnosed type 1 and type 2 diabetes in pediatric population using electronic medical records and claims data. Pharmacoepidemiol Drug Saf. 2019;28(2):234–243.

31. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550–560.

32. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168(6):656–664.

33. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393.

34. SAS Institute Inc. SAS/STAT 14.1 User’s Guide: The MI Procedure. Cary, NC: SAS Institute Inc; 2015.

35. Leyrat C, Seaman SR, White IR, et al. Propensity score analysis with partially observed covariates: how should multiple imputation be used? Stat Methods Med Res. 2019;28(1):3–19.

36. Granger E, Sergeant JC, Lunt M. Avoiding pitfalls when combining multiple imputation and propensity scores. Stat Med. 2019;38(26):5120–5132.

37. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York, NY: Wiley; 1987.

38. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–625.

39. Cole SR, Stuart EA. Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. Am J Epidemiol. 2010;172(1):107–115.

40. Lin DY, Wei L-J. The robust inference for the Cox proportional hazards model. J Am Stat Assoc. 1989;84(408):1074–1078.

41. Poole C. Low P values or narrow confidence intervals: which are more durable? Epidemiology. 2001;12(3):291–294.

42. Hernán MA, Robins JM. Causal Inference: What If? Boca Raton, FL: CRC Press LLC; 2020.

43. Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med. 2009;28(25):3083–3107.

44. Hernán MA. Invited commentary: selection bias without colliders. Am J Epidemiol. 2017;185(11):1048–1050.

45. Dahabreh IJ, Robertson SE, Tchetgen Tchetgen EJ, et al. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics. 2019;75(2):685–694.

46. Westreich D, Edwards JK, Lesko CR, et al. Transportability of trial results using inverse odds of sampling weights. Am J Epidemiol. 2017;186(8):1010–1014.

47. Dahabreh IJ, Robertson SE, Steingrimsson JA, et al. Extending inferences from a randomized trial to a new target population. Stat Med. 2020;39(14):1999–2014.

48. Webster-Clark M, Lund JL, Stürmer T, et al. Reweighting oranges to apples: transported RE-LY trial versus nonexperimental effect estimates of anticoagulation in atrial fibrillation. Epidemiology. 2020;31(5):605–613.

49. Laird NM. Missing data in longitudinal studies. Stat Med. 1988;7(1–2):305–315.

50. Ross RK, Breskin A, Westreich D. When is a complete-case approach to missing data valid? The importance of effect-measure modification. Am J Epidemiol. 2020;189(12):1583–1589.

51. Horton NJ, Kleinman KP. Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat. 2007;61(1):79–90.

52. Little RJ, Rubin DB. Statistical Analysis With Missing Data. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2019.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
