Eva Petkova, Thaddeus Tarpey, Zhe Su, R. Todd Ogden, Generated effect modifiers (GEM’s) in randomized clinical trials, Biostatistics, Volume 18, Issue 1, January 2017, Pages 105–118, https://doi.org/10.1093/biostatistics/kxw035
In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an “effect modifier”. Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration.
1. Introduction
Precision medicine focuses on making treatment decisions for an individual patient based on the patient’s measures (e.g., clinical and biological features). The idea underlies a long history of attempts to identify characteristics that exhibit interaction with treatment assignment in a regression model for the outcome of interest. Such baseline characteristics, called “treatment effect modifiers”, indicate that the outcome under one treatment compared to another treatment depends on these characteristics. Measures with such interactions can aid decisions about which treatment to prescribe (Gail and Simon, 1985; Wellek, 1997; Song and Pepe, 2004; Wang and Ware, 2013).
It has long been recognized that features that are important for predicting outcome might not necessarily be useful for making treatment decisions (e.g., Wellek, 1997; Song and Pepe, 2004). Much recent research has focused on identification of individual baseline covariates related to the treatment effect (i.e., variables that exhibit interactions with the treatment indicator in predicting treatment outcome) in contrast to being important in the baseline model. A major challenge in precision medicine is that most baseline measures typically have small moderating effects and individually contribute little to informed treatment decisions. Unconstrained regression models with |$p$| predictors (plus treatment and predictor-by-treatment interactions) become unwieldy, unstable and difficult to interpret when |$p$| is moderate to large. Various strategies have been proposed to deal with the problem (see Qian and Murphy, 2011; Gunter and others, 2011; Lu and others, 2011, among others). Extensions of the methodology that allow functional data objects to be incorporated as baseline features have also been developed (e.g., McKeague and Qian, 2014; Ciarleglio and others, 2015).
A parsimonious alternative to these previous methods that has received little attention is to use a simple model with only a single “composite” predictor. Herein, a methodology is developed for combining several baseline predictors into a single treatment effect modifier in the context of the classic linear model, which we call a generated effect modifier (GEM). Given a vector of |$p$| predictors |$\boldsymbol{x} = (x_1,\dots, x_p)^\prime$|, we consider linear combinations of the predictors |$z =\boldsymbol{\alpha}^\prime \boldsymbol{x}$| for |$\boldsymbol{\alpha} \in \Re^p$| as potential GEMs. The idea of combining covariates was proposed by Tukey (1991, 1993) for balancing and increasing the precision of the estimates of treatment effect in RCTs. A closely related approach was proposed by Tian and Tibshirani (2011) who developed a method of constructing binary “markers” from continuous variables (via cut-off values) and forming an index to detect treatment–marker interactions. Emura and others (2012) introduced a compound covariate approach for predicting survival time in the case when there are too many covariates, for example, gene expression data. In contrast to this work, we propose to combine covariates with the goal of obtaining a single moderating variable, a GEM, that would aid in deciding which treatment is appropriate for any particular patient. Although the GEM model is more restrictive than an unconstrained model, it provides a parsimonious single index approach for making individualized treatment decisions.
Alternative approaches to optimal treatment decision estimation have been proposed that fall in the realm of machine learning and can often be framed in the context of classification problems (Zhang and others, 2012a). Examples are the outcome weighted learning (OWL) (e.g., Zhao and others, 2015; Song and others, 2015) based on support vector machines, tree-based classification (e.g. Laber and Zhao, 2015), and the Kang and others (2014) method based on adaptive boosting. Although these approaches can be appealing options in many settings, we base our general approach on the linear model as it is most frequently utilized in practice and lends itself very well to interpretability. This paper fulfills the practical need of providing a simple treatment effect modifier methodology in the classic linear model setting for making precision medicine decisions. Also, the GEM approach provides the benefit of a visual presentation that is familiar to clinicians.
In efficacy studies, after the primary analysis of treatment efficacy has been performed, the usual practice is to seek individual effect modifiers (single patient baseline characteristics) with the ultimate goal of informing treatment decisions. When no single variable has a strong modifying effect, the GEM is an appealing and novel approach for secondary exploratory analysis to find a strong treatment effect modifier. The GEM can be particularly useful for analysis of studies designed to discover biosignatures for treatment response.
2. Criteria for choosing a GEM
Since the GEM is defined as a linear combination of predictors, the GEM model lends itself most naturally to continuous predictors. In the results that follow, there is nothing that precludes the use of discrete predictors; only care must be taken in how discrete predictors are coded and how the corresponding GEM is to be interpreted. It is very common in clinical practice that categorical variables are actually discretized versions of continuous variables. If this is the case, we recommend that the original variable is used in the GEM instead of its discretized version.
There are several principled criteria one can use for choosing |$\boldsymbol{\alpha}$| to optimize the GEM. A natural choice, in terms of moderator analysis, is to maximize the magnitude of the interaction in the GEM model. Alternatively, |$\boldsymbol{\alpha}$| can be chosen to provide the best fit to the data using a GEM model, which is consistent with the classic goal in linear models of minimizing the error sum of squares. A third approach, also consistent with the linear model framework, is to determine the |$\boldsymbol{\alpha}$| that maximizes the statistical significance of the interaction effects via an |$F$|-test. In summary, we consider the following three criteria, which we refer to as the “numerator” (N), “denominator” (D) and “|$F$|-ratio” (F) criteria, respectively:
(N) Maximizing the interaction effect: Maximize the variability in the GEM scaling coefficients |$\gamma_k$|’s in (2.3), corresponding to maximizing the Numerator of an |$F$|-test for significance of interaction effects. When there are |$K=2$| treatment groups, this is the same as maximizing the squared difference between the scaling coefficients |$\gamma_1$| and |$\gamma_2$| in the GEM model.
(D) Fidelity to the data: Minimize the sum of squared residuals in the GEM model (2.3). This corresponds to the Denominator of an |$F$|-test for significance of interaction effects.
(F) |$F$|-ratio: Combine the first two criteria and maximize the ratio of the variability of the GEM scaling coefficients relative to the sum of squared residuals for the GEM model. This criterion corresponds to choosing |$\boldsymbol{\alpha}$| to maximize the |$F$|-test statistic when testing significance of interactions in the GEM model.
2.1. The “numerator” criterion: maximizing the interaction effect
Using (2.4), we seek |$\boldsymbol{\alpha}$| that maximizes |$ \boldsymbol{\alpha}'\boldsymbol{\Psi}_x\boldsymbol{B}\boldsymbol{\Psi}_x\boldsymbol{\alpha} =\boldsymbol{\alpha}'\boldsymbol{\Psi}_x^{1/2}\left [\boldsymbol{\Psi}_x^{1/2}\boldsymbol{B}\boldsymbol{\Psi}_x^{1/2}\right]\boldsymbol{\Psi}_x^{1/2}\boldsymbol{\alpha}, $| where |$\boldsymbol{\Psi}_x^{1/2}$| is the symmetric square root of |$\boldsymbol{\Psi}_x$|. The solution is |$\boldsymbol{\alpha}^N = \boldsymbol{\Psi}_x^{-1/2}\boldsymbol{e}_1,$| where |$\boldsymbol{e}_1$| is the eigenvector of |$\boldsymbol{\Psi}_x^{1/2}\boldsymbol{B}\boldsymbol{\Psi}_x^{1/2}$| that is associated with the largest eigenvalue. To obtain an estimator |$\hat{\boldsymbol{\alpha}}^N$|, we can apply the plug-in principle: use the pooled estimator of |$\boldsymbol{\Psi}_x$| from (2.5) and the usual unrestricted LS estimators |$\hat{\boldsymbol{\beta}}_k$| in place of the |$\boldsymbol{\beta}_k$|’s. The GEM |$\gamma_k$|’s and intercepts can then be estimated via LS.
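The plug-in computation of the numerator direction can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code. It assumes |$\boldsymbol{B}=\sum_k \pi_k(\boldsymbol{\beta}_k-\bar{\boldsymbol{\beta}})(\boldsymbol{\beta}_k-\bar{\boldsymbol{\beta}})'$| with |$\bar{\boldsymbol{\beta}}=\sum_k\pi_k\boldsymbol{\beta}_k$|, which agrees with the identity |$\boldsymbol{D}=\boldsymbol{B}+\bar{\boldsymbol{\beta}}\bar{\boldsymbol{\beta}}'$| used in Section 2.2 but is not spelled out here. The names `sym_sqrt` and `numerator_gem` are hypothetical.

```python
import numpy as np

def sym_sqrt(psi):
    """Return the symmetric square root and inverse square root of an SPD matrix."""
    vals, vecs = np.linalg.eigh(psi)
    root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    inv_root = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return root, inv_root

def numerator_gem(psi_x, betas, pis):
    """Sketch of alpha^N = Psi_x^{-1/2} e_1.

    psi_x : (p, p) covariance matrix of the predictors (or its pooled estimate)
    betas : (K, p) per-treatment regression coefficients (or LS estimates)
    pis   : (K,) treatment group proportions
    """
    betas = np.asarray(betas, dtype=float)
    pis = np.asarray(pis, dtype=float)
    beta_bar = pis @ betas                       # ASSUMED: weighted mean of the beta_k
    centered = betas - beta_bar
    B = (centered * pis[:, None]).T @ centered   # ASSUMED form of B
    root, inv_root = sym_sqrt(psi_x)
    vals, vecs = np.linalg.eigh(root @ B @ root)
    e1 = vecs[:, -1]                             # eigh returns eigenvalues in ascending order
    return inv_root @ e1

# Toy inputs (made up for illustration only).
psi = np.array([[1.0, 0.2], [0.2, 2.0]])
alpha_n = numerator_gem(psi, [[1.0, 0.5], [0.2, -0.3]], [0.5, 0.5])
print(alpha_n / np.linalg.norm(alpha_n))
```

Under the assumed form of |$\boldsymbol{B}$|, with |$K=2$| the matrix has rank one and the returned direction is proportional to |$\boldsymbol{\beta}_1-\boldsymbol{\beta}_2$| whatever |$\boldsymbol{\Psi}_x$| is. The sign and scale of an eigenvector are arbitrary, so only the direction is meaningful.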
Section 1.1 of the supplementary material shows that for |$K=2$|, in terms of population parameters, the treatment decision based on the unrestricted regression is equivalent to the treatment decision based on the numerator GEM model. Minor differences in the empirical decision rules from these two methods are due to differences in the LS estimates using the GEM predictor versus using the original predictors in the unrestricted model.
2.2. The “denominator” criterion: minimizing the residual error
This subsection gives the LS expression for |$\boldsymbol{\alpha}$| that minimizes the sum of squared residuals in a GEM model, that is, that provides the best fitting GEM model. Under the assumption of normality, the LS estimator coincides with the maximum likelihood estimator in the GEM linear model.
The sum of squared residuals from a standard linear model using LS can be written as |$\boldsymbol{y}'(\boldsymbol{I}-\boldsymbol{H})\boldsymbol{y},$| where |$\boldsymbol{H}$| is the hat matrix and |$\boldsymbol{I}$| is an identity matrix. This sum of squared residuals (when divided by its associated degrees of freedom) is an estimate of the quantity |$\sigma_y^2 - \boldsymbol{\Psi}_{xy}'\boldsymbol{\Psi}_x^{-1}\boldsymbol{\Psi}_{xy}.$| In the GEM model (2.3) with |$K$| treatment arms, the hat matrix in the |$k$|th group is |$ \boldsymbol{H}_k(\boldsymbol{\alpha}) = (\boldsymbol{X}_k\boldsymbol{\alpha})(\boldsymbol{\alpha}'\boldsymbol{X}_k'\boldsymbol{X}_k\boldsymbol{\alpha})^{-1}(\boldsymbol{X}_k\boldsymbol{\alpha})'. $| Letting |$\boldsymbol{D}=\sum_{k=1}^K\pi_k\boldsymbol{\beta}_k \boldsymbol{\beta}_k'=\boldsymbol{B}+\bar{\boldsymbol{\beta}}\bar{\boldsymbol{\beta}}',$| Section 1.2 of the supplementary material available at Biostatistics online shows that the |$\boldsymbol{\alpha}$| minimizing the “denominator” criterion is given by |$\boldsymbol{\alpha}^D = \boldsymbol{\Psi}_x^{-1/2}\boldsymbol{e}_2,$| where |$\boldsymbol{e}_2$| is the leading eigenvector of |$\boldsymbol{\Psi}_x^{1/2} \boldsymbol{D} \boldsymbol{\Psi}_x^{1/2} $|. As before, |$\boldsymbol{\alpha}^D$| can be estimated by plugging in the LS estimators for |$\boldsymbol{\beta}_k$| in the expression for |$\boldsymbol{D}$| and using the sample covariance matrix of the pooled predictors (2.5) to estimate |$\boldsymbol{\Psi}_x$|.
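The denominator direction follows the same eigenvector recipe, with |$\boldsymbol{D}$| in place of |$\boldsymbol{B}$|. The NumPy sketch below is illustrative only; the function name `denominator_gem` and the toy inputs are hypothetical, and population quantities can be replaced by their plug-in estimates.

```python
import numpy as np

def denominator_gem(psi_x, betas, pis):
    """Sketch of alpha^D = Psi_x^{-1/2} e_2, with e_2 the leading eigenvector
    of Psi_x^{1/2} D Psi_x^{1/2} and D = sum_k pi_k beta_k beta_k'."""
    betas = np.asarray(betas, dtype=float)   # shape (K, p)
    pis = np.asarray(pis, dtype=float)       # shape (K,)
    D = (betas * pis[:, None]).T @ betas     # sum_k pi_k beta_k beta_k'
    # Symmetric square root and inverse square root of Psi_x.
    vals, vecs = np.linalg.eigh(psi_x)
    root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    inv_root = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    w, v = np.linalg.eigh(root @ D @ root)
    e2 = v[:, -1]                            # eigenvalues are sorted ascending
    return inv_root @ e2

# Toy check: beta_2 is a multiple of beta_1, so D has rank one.
psi = np.array([[1.0, 0.1], [0.1, 1.5]])
beta1 = np.array([1.0, 0.5])
alpha_d = denominator_gem(psi, [beta1, 2.0 * beta1], [0.5, 0.5])
print(alpha_d / np.linalg.norm(alpha_d))
```

In the toy check the two coefficient vectors are collinear, so |$\boldsymbol{D}$| has rank one and the recovered direction is proportional to |$\boldsymbol{\beta}_1$|, up to sign.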
2.3. The “|$F$|-criterion”: maximizing the |$F$|-statistic
The derivation is in Section 1.3 of the supplementary material. Once again, |$\boldsymbol{\alpha}^F$| can be estimated by plugging parameter estimates into (2.10) and extracting the leading eigenvector.
3. Fitting a GEM when the GEM model is misspecified
From (2.10), |$\boldsymbol{\alpha}^F$| depends on the error variance; the results above are for a coefficient of determination |$R^2=0.8$|. As expected, |$\boldsymbol{\alpha}^F$| is closer to |$\boldsymbol{\alpha}^D$| when the data are from a GEM model, since the GEM regression fits the data well in this case, while when the model is far from a GEM model, |$\boldsymbol{\alpha}^F$| is closer to |$\boldsymbol{\alpha}^N$|. This observation, together with results from simulations, suggests the use of the |$F$|-criterion in practice.
4. Permutation testing for the interaction in a GEM model
Theoretical details for using permutation tests for interaction effects in the presence of possible main effects have been investigated previously in the literature (e.g., Wang and others, 2015, p. 2046).
5. Simulation studies
The sample sizes per treatment group considered are |$n: n_1=n_2=100,\;300, \mbox{ and }1000$|, mimicking typical situations in medical research. For each sample size, the numbers of predictors used were |$p=10$| and |$p=200$| (except when |$p>n$|, namely |$n=100$| and |$p=200$|). The predictors are generated from |$p$|-variate normal distributions with mean zero, variances equal to 1, and small randomly selected pairwise correlations (from |$-$|0.2 to 0.2), while ensuring a positive definite correlation matrix. For each |$p$|, |$\boldsymbol{\beta}_1=(1, \frac{1}{2}, \dots, \frac{1}{p})$|. Under GEM, |$\boldsymbol{\beta}_2$| is computed to satisfy the respective |$R^2$| and |$ES$|. Under non-GEM, |$\boldsymbol{\beta}_2$| is obtained by adding random noise to the |$p$| coefficients in |$\boldsymbol{\beta}_1$| and computing the angle |$\theta$| between |$\boldsymbol{\beta}_1$| and |$\boldsymbol{\beta}_2$|. More details about the values of |$\boldsymbol{\beta}_2$| are given in Section 3.2 of the supplementary material. For each combination of |$p$| and the |$\boldsymbol{\beta}_k$|’s (|$k=1, 2$|), a large sample (|$N=10^6$|) is generated with known outcome values under both treatments, and it is used to evaluate the “true” optimal population average outcome |$V^+$|, which is the highest achievable value of any decision.
For each simulation configuration (|$n, p, \boldsymbol{\beta}_1,\boldsymbol{\beta}_2$|, |$R^2$| and ES), |$B=1000$| data sets are generated and estimates of |$\boldsymbol{\alpha}^N, \boldsymbol{\alpha}^D, \mbox{ and } \boldsymbol{\alpha}^F$| are computed, as well as the |$\boldsymbol{\beta}_1$| and |$\boldsymbol{\beta}_2$| coefficients of the unrestricted regression model (2.2). These estimates are used to define treatment decisions as described in Section 2. These decisions are applied to the |$N=10^6$| cases in the large data set to obtain the estimated values |$V$| of the respective decisions |$V(d(\boldsymbol{x}; \hat{\boldsymbol{\beta}}_1,\hat{\boldsymbol{\beta}}_2))$|, |$V(d^N(\boldsymbol{x}; \hat{\boldsymbol{\alpha}}^N,\hat{\boldsymbol{\gamma}}^N))$|, |$V(d^D(\boldsymbol{x}; \hat{\boldsymbol{\alpha}}^D,\hat{\boldsymbol{\gamma}}^D)),$| and |$V(d^F(\boldsymbol{x}; \hat{\boldsymbol{\alpha}}^F,\hat{\boldsymbol{\gamma}}^F))$|. For the sake of comparison, these values are expressed as a proportion of the “true” optimal average outcome |$V^+$|, also taking into account the worst average outcome |$V^-$|, which is obtained by choosing the worst (lower) outcome for each subject in the large data set. For example, the values of the treatment decision based on the “numerator” GEM approach are reported as |$\left [ V\left( d^N(\boldsymbol{x}; \hat{\boldsymbol{\alpha}}^N,\hat{\boldsymbol{\gamma}}^N) \right) - V^- \right ] / (V^+ - V^-) $|.
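The normalized value |$(V-V^-)/(V^+-V^-)$| can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions, not the simulation code used in the paper: predictors are independent standard normal (the paper uses small non-zero correlations), the "known outcomes" are taken to be the noise-free means |$\boldsymbol{\beta}_k'\boldsymbol{x}$| with zero intercepts, the reference sample has |$10^5$| rather than |$10^6$| cases, and the noise scale used to build `beta2` is arbitrary. The function name `normalized_value` is hypothetical.

```python
import numpy as np

def normalized_value(decision, out1, out2):
    """(V - V^-) / (V^+ - V^-) for a decision taking values 1 or 2.

    out1, out2 : outcomes each subject would have under treatment 1 and 2
                 (larger is better).
    """
    v_plus = np.mean(np.maximum(out1, out2))    # best outcome for every subject
    v_minus = np.mean(np.minimum(out1, out2))   # worst outcome for every subject
    v = np.mean(np.where(decision == 1, out1, out2))
    return (v - v_minus) / (v_plus - v_minus)

rng = np.random.default_rng(0)
N, p = 100_000, 10                               # ASSUMED smaller N for speed
X = rng.standard_normal((N, p))                  # ASSUMED independent predictors
beta1 = 1.0 / np.arange(1, p + 1)                # (1, 1/2, ..., 1/p) as in the text
beta2 = beta1 + rng.normal(scale=0.3, size=p)    # ASSUMED noise scale
out1, out2 = X @ beta1, X @ beta2                # ASSUMED noise-free outcomes

oracle = np.where(out1 >= out2, 1, 2)            # always picks the better treatment
worst = np.where(out1 >= out2, 2, 1)             # always picks the worse treatment
coin = rng.integers(1, 3, size=N)                # random assignment to 1 or 2

for name, d in [("oracle", oracle), ("worst", worst), ("coin flip", coin)]:
    print(name, round(normalized_value(d, out1, out2), 3))
```

By construction the oracle rule scores 1 and the always-wrong rule scores 0, so any estimated decision rule falls between these two anchors; random assignment lands near 0.5.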
Figure 1. GEM data generation model. Mean and 95% Monte Carlo (MC) confidence intervals (based on the |$B =1000$| MC runs) of the values |$V$| of the decisions, as a proportion |$(V-V^-)/(V^+-V^-)$|, for |$p=10$| (left half of panels) and |$p=200$| (right half of panels), and for ES = 0.1 (top half of panels) and ES = 0.3 (bottom half of panels). The three panels per (|$p$|, ES) combination correspond to |$R^2=0.2$| on the left, |$R^2=0.5$| in the middle and |$R^2=0.8$| on the right. The method based on the unrestricted regression and the three GEM approaches are denoted as: (i) unrestricted (red, leftmost); (ii) numerator criterion (green, second from left); (iii) denominator criterion (blue, third from left); (iv) |$F$|-criterion (purple, rightmost). The “Number of observations” on the bottom horizontal axis is the sample size per group.
Figure 2. Non-GEM data generation model. Mean and 95% Monte Carlo (MC) confidence intervals (based on the |$B =1000$| MC runs) of the values |$V$| of the decisions, as a proportion |$(V-V^-)/(V^+-V^-)$|, for |$p=10$| (left half of panels) and |$p=200$| (right half of panels), and for small deviation from GEM (top half of panels) and large deviation from GEM (bottom half of panels). The three panels per (|$p$|, deviation from GEM) combination correspond to |$R^2=0.2$| on the left, |$R^2=0.5$| in the middle and |$R^2=0.8$| on the right. The method based on the unrestricted regression and the three GEM approaches are denoted as: (i) unrestricted (red, leftmost); (ii) numerator criterion (green, second from left); (iii) denominator criterion (blue, third from left); (iv) |$F$|-criterion (purple, rightmost). The “Number of observations” on the bottom horizontal axis is the sample size per group.
Figure 2 presents information similar to that in Figure 1, but here the data are generated under a general linear non-GEM model (2.2). It shows that even when the data are not generated from a GEM model, the criteria perform quite well for a relatively small number of covariates |$p$|. For larger |$p$|, larger sample sizes and larger |$R^2$| are needed to achieve good performance. The values of the decisions based on the denominator criterion are meaningfully inferior to the values of the decisions from the other methods as the deviation from GEM becomes large. The denominator criterion’s inferiority becomes more pronounced as |$R^2, n$|, and |$p$| increase. Regardless of the data generating model, the values produced by the |$F$| method are either the best or very close to the best values produced by any of the other methods compared here. Additionally, simulations were run using the non-GEM generating model except that a subset of predictors was discretized to be binary (5 out of 10 for |$p=10$| and 20 out of 200 when |$p=200$|); the results are very similar to those when all predictors are continuous, and details are provided in the supplementary material.
Section 4 of the supplementary material available at Biostatistics online presents results on the performance of the GEM methods in the case when the data generation is not from the linear model (2.2). There we show simulation results based on a doubly-robust estimation procedure using an augmented inverse probability weighted estimator (AIPWE) of the value |$V(d)$| (Robins and others, 1994; Zhang and others, 2012b). Although the GEM approach based on the AIPWE does marginally worse than the unrestricted approach described in Zhang and others (2012b) using an example with |$p=3$| predictors, their approach becomes computationally infeasible for larger values of |$p$|. In cases with large |$p$|, the GEM reduces the dimensionality of the predictor space to |$1$| making the AIPWE approach fast and feasible.
6. Application to data from a RCT
We illustrate the three GEM procedures using data from a RCT for the treatment of depression comparing antidepressants of the class of selective serotonin reuptake inhibitors (SSRI) against placebo. In addition to establishing the overall efficacy of the SSRI, the investigators were interested in finding biosignatures for SSRI treatment response. The investigators defined “biosignature” as a baseline patient characteristic or a combination of such characteristics, that constitutes a moderator of the treatment effect of SSRI vs. placebo.
Data from 76 and 72 subjects randomized to placebo and SSRI, respectively, were available. The outcome was the change from baseline (week 0) to week 8 of treatment on the Hamilton Rating Scale for Depression (HRSD). Higher values of HRSD indicate higher depression severity, and thus a positive change (week 0 |$-$| week 8) indicates a reduction of depression. The following baseline clinical measures were proposed as potential moderators: (i) level of anxiety; (ii) severity of anger attack; (iii) suicidal risk; (iv) medical comorbidity score; and (v) experience of pleasure score.
Outcome was modeled as a linear function of a baseline measure, treatment indicator (SSRI |$A=1$| vs. placebo |$A=0$|) and the interaction between them, for each measure individually. None of the interaction terms was statistically significant (see Table 1). A comparison of a full unrestricted model with all five predictors and their interactions with treatment against a reduced model without the interactions yielded a non-significant |$F$|-test for the interactions (|$F_{(5,136)}=1.41,\ p\mbox{ value}=0.14$|). Thus, the usual approaches of treating each predictor separately or fitting a full unrestricted model for all predictors fail to find evidence for a heterogeneous effect of SSRI and consequently fail to identify patients who stand to benefit from or be harmed by it.
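The full-versus-reduced comparison described above is a standard nested-model |$F$|-test, which can be sketched with NumPy as follows. The data here are synthetic and serve only to show the mechanics; the function name `interaction_f_stat` is hypothetical and this is not the authors' analysis code. With |$n=148$| subjects and |$p=5$| predictors, the full model has 12 coefficients, which gives the degrees of freedom |$(5, 136)$| reported in the text.

```python
import numpy as np

def interaction_f_stat(X, a, y):
    """F statistic for all predictor-by-treatment interactions.

    X : (n, p) baseline predictors
    a : (n,) treatment indicator coded 0/1
    y : (n,) outcome
    Returns (F, df1, df2).
    """
    n, p = X.shape
    reduced = np.hstack([np.ones((n, 1)), a[:, None], X])   # intercept, A, main effects
    full = np.hstack([reduced, X * a[:, None]])              # plus A-by-x interactions

    def rss(M):
        # Residual sum of squares from an ordinary least-squares fit.
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ coef
        return resid @ resid

    df1 = p                      # number of interaction terms being tested
    df2 = n - full.shape[1]      # residual degrees of freedom of the full model
    F = ((rss(reduced) - rss(full)) / df1) / (rss(full) / df2)
    return F, df1, df2

# Synthetic data with the same dimensions as the example (values are made up).
rng = np.random.default_rng(1)
n, p = 148, 5
X = rng.standard_normal((n, p))
a = rng.integers(0, 2, size=n)
y = X @ rng.standard_normal(p) + a + rng.standard_normal(n)
F, df1, df2 = interaction_f_stat(X, a, y)
print(F, df1, df2)
```

The sketch returns only the statistic and its degrees of freedom; a p value would come from the upper tail of the |$F_{(df_1, df_2)}$| distribution, for example via `scipy.stats.f.sf`.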
Table 1. SSRI clinical biosignature: potential moderators of the efficacy of treatment with SSRI vs. placebo with respect to change in HRSD from baseline to week 8. The 3rd column gives the |$p$|-values for the predictor-by-treatment interaction term and the 4th column gives the effect size of the predictor as a moderator of treatment effect from a regression model with only that variable as a predictor in addition to treatment. The last two columns give the regression coefficients from models with all five baseline measures as predictors for treatment |$A=0$| (placebo) and |$A=1$| (SSRI), respectively.
| Predictor | Mean | St. dev. | Interaction p value | Effect size | Reg. coef., |$A=0$| | Reg. coef., |$A=1$| |
|---|---|---|---|---|---|---|
| Anxiety | 5.36 | 1.80 | 0.797 | 0.020 | 1.06 | 1.44 |
| Anger attack | 3.05 | 2.12 | 0.671 | 0.034 | |$-$|0.59 | |$-$|0.09 |
| Suicide risk | 5.42 | 2.37 | 0.155 | 0.113 | 1.00 | |$-$|0.38 |
| Medical comorbidity score | 2.01 | 2.78 | 0.092 | 0.140 | 0.11 | |$-$|0.58 |
| Life pleasure score | 33.17 | 5.51 | 0.065 | 0.148 | |$-$|0.20 | 0.04 |
Next, the linear combinations |$\boldsymbol{\alpha}$| for the three GEM criteria were estimated (see Table 2). The numerator and |$F$|-criteria give similar results, but only the |$F$|-criterion has a statistically significant permutation |$p$| value (|$p<0.05$|). Note that the effect sizes for the GEMs based on the numerator and the |$F$|-criterion (which are very similar, both |$ES=0.27$|) are nearly double that of the strongest individual predictor (0.148). The denominator GEM, on the other hand, does not produce a significant interaction |$p$| value (and also has a very small estimated ES). This is consistent with the observation that, since the angle between the unrestricted regression coefficient vectors is relatively large (|$0.35\pi$|), the model deviates quite a bit from a true GEM model.
Table 2. GEM model for SSRI clinical biosignature. The estimated GEMs of the SSRI treatment effect on change in HRSD. The bottom rows give the GEM effect sizes (row 6), permutation-adjusted |$p$|-values (row 7); the estimated value (1.1) of the decision based on GEM criteria along with a 95% cross-validated bootstrap confidence interval (CI) (row 8); the difference in value and 95% cross-validated bootstrap CI for the difference between the decision based on the respective GEM and the decision (i) give everyone placebo (row 9), (ii) give everyone SSRI (row 10), and (iii) give everyone SSRI or placebo at random (row 11).
| Predictor / quantity | |$\hat{\boldsymbol{\alpha}}^N$| | |$\hat{\boldsymbol{\alpha}}^D$| | |$\hat{\boldsymbol{\alpha}}^F$| |
|---|---|---|---|
| Anxiety | 0.12 | 0.55 | 0.12 |
| Anger attack | 0.15 | |$-$|0.15 | 0.15 |
| Suicide risk | |$-$|0.42 | 0.14 | |$-$|0.42 |
| Medical comorbidity score | |$-$|0.21 | |$-$|0.10 | |$-$|0.21 |
| Life pleasure score | 0.07 | |$-$|0.04 | 0.07 |
| Effect size | 0.27 | 0.01 | 0.27 |
| Permutation |$p$|-value | 0.061 | 0.895 | 0.048 |
| Value of GEM | 8.03 | 7.60 | 8.03 |
| (95% CI) | (6.28, 9.78) | (5.62, 9.43) | (6.21, 9.68) |
| Value of GEM |$-$| Value of placebo | 2.02 | 1.57 | 2.00 |
| (95% CI) | (1.97, 2.06) | (1.52, 1.62) | (1.96, 2.05) |
| Value of GEM |$-$| Value of SSRI | 0.52 | 0.07 | 0.50 |
| (95% CI) | (0.48, 0.55) | (0.04, 0.10) | (0.46, 0.54) |
| Value of GEM |$-$| Value of random | 1.29 | 0.84 | 1.27 |
| (95% CI) | (1.25, 1.32) | (0.80, 0.87) | (1.24, 1.31) |
For the sake of comparison, estimates of the value for the three GEM criteria were obtained using an Inverse Probability Weighted Estimator (IPWE) |$\mbox{IPWE} = {1\over n}\sum_{i=1}^n{C(\hat{d}(\boldsymbol{x}_i))y_i\over \pi^{A_i}(1-\pi)^{1-A_i}},$| where |$C(\hat{d}(\boldsymbol{x}_i)) = 1$| if the treatment assignment |$A_i$| and the treatment decision |$\hat{d}(\boldsymbol{x}_i)$| coincide for subject |$i$| with covariates |$\boldsymbol{x}_i$|, and 0 otherwise. Here, |$\pi$| is the probability of treatment assignment, which is constant in an RCT and equals 0.5 in this example. Row 8 of Table 2 gives a 95% cross-validation bootstrap confidence interval (using 1000 bootstrap samples) for the value of each GEM criterion. The CIs were computed using 10-fold cross-validation on each bootstrap sample: the sample was divided into 10 non-overlapping subsamples of equal size, treatment decisions were estimated by applying the respective GEM approach to 9 of the subsamples, and the resulting decision was applied to the remaining 10th subsample to obtain an estimate of its value. These estimates were then averaged across the 10 folds of the cross-validation. As Table 2 shows, the |$F$| and numerator approaches produce very similar bootstrap confidence intervals for the value of the decision, while the denominator criterion results in a lower decision value that has a wider 95% CI. The last three rows of Table 2 show the differences between the values of the treatment decisions derived from each of the three GEM approaches and the values of three commonly used comparison decisions: (i) give everyone placebo; (ii) give everyone SSRI; and (iii) give placebo and SSRI at random. These differences were estimated by the same cross-validation approach based on 1000 bootstrap samples.
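To make the estimator concrete, the following is a minimal Python sketch of the IPWE and of the fold-averaging step described above. It is an illustration under stated assumptions, not the authors' code: the function names `ipwe_value` and `cross_validated_value`, the argument `fit_rule` (a user-supplied routine that returns a decision rule from training data), the 0/1 treatment coding, and the interleaved fold construction are all hypothetical choices made here. The outer bootstrap loop used for the confidence intervals is omitted.

```python
from typing import Callable, List, Sequence


def ipwe_value(y: Sequence[float], a: Sequence[int], d: Sequence[int],
               pi: float = 0.5) -> float:
    """Inverse probability weighted estimate of the value of a decision.

    y  : observed outcomes
    a  : assigned treatments, coded 1/0 (coding is an assumption)
    d  : treatments recommended by the decision rule, coded 1/0
    pi : probability of assignment to treatment 1 (0.5 in the example above)
    """
    n = len(y)
    if len(a) != n or len(d) != n:
        raise ValueError("y, a and d must have the same length")
    total = 0.0
    for yi, ai, di in zip(y, a, d):
        if ai == di:  # C = 1: assignment and decision coincide
            total += yi / (pi if ai == 1 else 1.0 - pi)
        # C = 0 otherwise: the subject contributes nothing to the sum
    return total / n


def cross_validated_value(x: Sequence, y: Sequence[float], a: Sequence[int],
                          fit_rule: Callable, k: int = 10,
                          pi: float = 0.5) -> float:
    """Average the held-out IPWE over k non-overlapping folds.

    fit_rule(x_train, y_train, a_train) must return a function that maps
    one covariate vector to a recommended treatment (1 or 0).
    Assumes len(y) >= k so that no fold is empty.
    """
    n = len(y)
    folds: List[List[int]] = [list(range(f, n, k)) for f in range(k)]
    values = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        rule = fit_rule([x[i] for i in train_idx],
                        [y[i] for i in train_idx],
                        [a[i] for i in train_idx])
        d = [rule(x[i]) for i in test_idx]
        values.append(ipwe_value([y[i] for i in test_idx],
                                 [a[i] for i in test_idx], d, pi))
    return sum(values) / k
```

With a toy sample of four subjects, outcomes `[10, 4, 8, 6]`, assignments `[1, 0, 1, 0]` and decisions `[1, 1, 0, 0]`, only the first and last subjects have matching assignment and decision, so `ipwe_value` returns (10/0.5 + 6/0.5)/4 = 8.0.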
The relationship between the GEMs obtained from the three criteria and the change in depression (HRSD) from baseline to week 8 for the SSRI (blue) and placebo (red) interventions. The GEMs corresponding to each of the criteria are plotted on the horizontal axis. The lines are the LS lines and the shaded areas indicate the 95% pointwise CIs. The densities of the respective GEMs for the two treatment groups are indicated in the lower part of each panel. The vertical lines indicate the cut-off point on the linear combinations of predictors above which a depressed patient would benefit from treatment with SSRI.
7. Discussion
This article has shown how to combine several baseline characteristics into a single generated effect modifier in the context of the classic linear model. Closed-form expressions have been derived for these GEMs that do not require complex iterative computations. The GEM offers a straightforward approach to determining beneficial treatments for patients. From this perspective, GEMs can be viewed as indices for treatment decisions. Of the three criteria, we generally recommend the |$F$|-criterion, because it simultaneously maximizes the interaction effect (the numerator) and minimizes the prediction error (the denominator) in the class of GEM models. Additionally, in our results, the |$F$|-criterion’s performance is either optimal or very close to optimal with respect to producing treatment decision rules with the highest values.
In practice, after testing the main hypotheses in efficacy studies, investigators attempt to discover baseline patient features that moderate the effect of treatment. For most illnesses, variables with large moderating effects, if any exist, have already been discovered, so it is not surprising that researchers regularly fail to discover other moderators in studies where the primary goal is to establish efficacy. The proposed methods show that combining patient characteristics with little to no moderating effect of a treatment can result in a strong treatment effect modifier that can help with making treatment decisions. Of course, any treatment decision has to be validated in properly designed studies; for example, a 3-arm RCT in which the experimental treatment, the control treatment, and treatment according to the investigated treatment decision are compared. The proposed methodology is expected to be of particular utility in studies specifically designed to discover biosignatures for response to treatment, as discussed in the Introduction.
Several generalizations of the GEM procedure are currently under development, such as extending the GEM to generalized linear models and longitudinal outcomes. Work is also underway to allow the outcome to depend on nonparametric functions of GEMs, similar to generalized additive models. It will be useful to compare the linear GEM model developed here and a more flexible nonparametric GEM model to other methods for precision medicine for providing guidance in making treatment decisions.
Supplementary material
Supplementary material is available at http://biostatistics.oxfordjournals.org
Acknowledgements
The authors are thankful to the editors and three reviewers whose feedback has greatly improved this article. Conflict of interest: None declared.
Funding
National Institutes of Health grant R01 MH099003.



