Testing gene–environment interactions in the presence of confounders and mismeasured environmental exposures

Abstract Interest in investigating gene–environment (GxE) interactions has rapidly increased over the last decade. Although GxE interactions have been extremely investigated in large studies, few such effects have been identified and replicated, highlighting the need to develop statistical GxE tests with greater statistical power. The reverse test has been proposed for testing the interaction effect between continuous exposure and genetic variants in relation to a binary disease outcome, which leverages the idea of linear discriminant analysis, significantly increasing statistical power comparing to the standard logistic regression approach. However, this reverse approach did not take into consideration adjustment for confounders. Since GxE interaction studies are inherently nonexperimental, adjusting for potential confounding effects is critical for valid evaluation of GxE interactions. In this study, we extend the reverse test to allow for confounders. The proposed reverse test also allows for exposure measurement errors as typically occurs. Extensive simulation experiments demonstrated that the proposed method not only provides greater statistical power under most simulation scenarios but also provides substantive computational efficiency, which achieves a computation time that is more than sevenfold less than that of the standard logistic regression test. In an illustrative example, we applied the proposed approach to the Veterans Aging Cohort Study (VACS) to search for genetic susceptibility loci modifying the smoking-HIV status association.


Introduction
It is commonly believed that genetic variants can explain a proportion of the risk for most complex human diseases (e.g., cancer, diabetes, and asthma), where additional unexplained risk could be explained, in part, by gene-environment (GxE) interactions (Thomas 2010). Although many studies have attempted to identify genetic susceptibility loci interacting with environmental exposures in samples using logistic regression analysis, few such interactions have been identified and replicated (Aschard et al. 2012). Reasons for the failure to detect GxE interactions may include: (1) insufficient power for testing interaction effects compared to testing main effects (Smith and Day 1984); (2) measurement errors in the environmental exposure sacrifice the power of the statistical test; (3) relatively small interaction effect sizes compared to the main effect sizes of genetic variants and environmental exposures, and (4) a large number of interactions to be tested as compared to a relatively small sample size in epidemiological studies. In addition, since GxE research is inherently nonexperimental, it is important to control for potential confounders (Keller 2014). For example, it is important to adjust for population stratification so that the interaction effects detected between genetic variants and exposures are not driven by ethnicity (Witte et al. 1999;Kraft and Hunter 2005;Wang et al. 2006;Wang and Lee 2008). In an attempt to minimize the number of covariates in the logistic regression model due to the concern about low power, investigators usually enter the confounders as covariates into the regression model, and ignore the potential confounder-environment or confounder-gene interaction effects. Several authors (e.g., VanderWeele et al. 2013;Keller 2014) suggested to take into consideration the possible confounderenvironment and confounder-gene interactions in order to obtain a more robust analysis result.
In this study, we focus on statistical methods for testing the interaction effect between a genetic variant and a continuous environmental exposure on a binary disease outcome. The genetic variant can be either binary or ordinal. When the environmental exposure is a continuous variable, Aschard et al. (2018) proposed a test that "reverses" the role of the disease status and the continuous exposure in the logistic regression model, i.e., treating the disease status as an independent variable and the environmental exposure as the dependent variable. This regression model is now a linear regression including an interaction term between the genetic variant and disease status, encoding the GxE interaction effect. When the error term in this linear regression model follows a constant-variance normal distribution, both the reverse test and the standard logistic regression test evaluate the same null hypothesis (Aschard et al. 2018). The reverse test has several advantages over the logistic regression approach. First, the reverse test approach generally uses less computation time, since the reverse test statistic has a closed form, in contrast to the iterative optimization procedure used in logistic regression. Simulation studies and some theoretical analysis that follows indicate that the reverse test tends to achieve a several-to 10-fold reduction in computation time compared to the logistic regression test, as the sample size becomes large. Second, in the reverse test approach, measurement error in the environmental exposure does not cause bias in the point estimates of the regression coefficients, because the measurement error is adsorbed in the error term. Third, the reverse test approach exhibits greater statistical power than the logistic regression approach, especially when the main effect of the exposure and the GxE interaction effect are large.
The reverse test in Aschard et al. (2018) did not consider confounders except for a brief discussion on this issue, whereas, as pointed out above, in GxE interaction studies it is usually necessary to control for a set of potential confounders. In this study, we extend the reverse test to adjust for confounders in the evaluation of the GxE interaction. Specifically, we introduce confounder terms into the linear regression model, which include not only the main effect of confounders but also the interaction effects between the confounders and gene/disease. In the scenarios where the confounder-environmental exposure relationship may be nonlinear, spline terms for the confounders can be incorporated in model. We perform simulation studies to evaluate the type I error rate and power of this test and compare it to the standard logistic regression approach. In an illustrative example, we apply the reverse test to the Veterans Aging Cohort Study (VACS) to investigate genetic variants modifying the association of smoking and HIV status.

Materials and methods
The logistic regression approach In genome-wide association studies (GWAS), the logistic regression model has been commonly used to estimate the GxE interaction effects and test for their presence. This model has the following form is the logit function, and X, G, D, and Z ¼ ½Z 1 ; . . . ; Z p T denote a continuous exposure, a genetic variant, a binary disease status and possible confounders, respectively. We first consider a binary genetic variant, G ¼ f0; 1g, to denote the risk allele noncarriers and carriers. The ordinal scenario for G will be considered at the end of this subsection. The confounders, Z j (j ¼ 1; . . . ; p), can be continuous or binary. Here, e ax , denoted as ORðDjX; G ¼ 0Þ, is the odds ratio (OR) for the disease for a one unit increase in X at the reference genetic level G ¼ 0, e ag , denoted as ORðDjG; X ¼ 0Þ, is the OR for G ¼ 1 vs G ¼ 0 at the reference exposure level X ¼ 0. The parameter of interest is e agx ¼ ORðDjX;G¼1Þ ORðDjX;G¼0Þ , which is the ratio of the ORs (ROR) with respect to X for G ¼ 1 vs G ¼ 0, representing the effect of the GxE interaction. The null hypothesis, ROR ¼ 1, or equivalently a gx ¼ 0, indicates no interaction between the genetic variant and environmental exposure.
Without loss of generality, in the logistic regression model (1), we assume that the confounders term, mðZÞ, can be written as P p j¼1 m j ðZ j Þ. For binary Z j , m j ðZ j Þ ¼ b z j Z j . For continuous variable Z j , we may use the restricted cubic spline (Durrleman and Simon 1989) or other spline functions to model Z j . If using restricted cubic splines, we have m j ðZ j Þ ¼ P MÀ1 k¼1 a z j ;k B j;k ðZ j Þ, where fB j;k ðZ j Þg MÀ1 k¼1 is the spline basis based on M knots (Durrleman and Simon 1989). If M ¼ 2, m j ðZ j Þ includes only a linear term, a z j ;1 Z j . The knots are often placed at evenly spaced percentiles over the distribution of Z j . The confounder term, mðZÞ, can be rewritten as a T zZ , where the vectorZ denotes all the terms generated by Z, including all the possible spline basis, and a z are the corresponding coefficients. Throughout this study, all vectors are column vectors. Generally, the maximum likelihood estimator of a ¼ ½a 0 ; a T z ; a x ; a g ; a gx T can be obtained through an iteratively procedure. A 1-df wald test with respect to the null hypothesis, H 0 : a gx ¼ 0, can be used to test the GxE interaction. The wald test statistic is s 2 a ¼â 2 gx VarðâgxÞ , which follows a 1-df v 2 -distribution under the null.
In the above, Z-G or Z-X interaction effects have not been included in model (1). Vanderweele et al. (2013) and Keller (2014) pointed out that neglecting Z-G interaction terms could bias estimates of the GxE interaction and inflate the type I error of the null hypothesis. A similar argument applies to the Z-X interaction terms. Therefore, the Z-G and Z-X interaction terms must also be included in order to obtain valid GxE interaction estimates and tests. In model (1), it is straightforward to control for all the main effects and interaction effects of Z by replacing Z in the term mðZÞ with Z Ã ¼ ½Z T ; ðZXÞ T ; ðZGÞ T T . Inclusion of these terms will not change the estimation and test procedure for the GxE interaction effect.
The genetic variant, G, may also be treated as an ordinal variable with values 0, 1, and 2, corresponding to wild type, heterozygous genotype, and homozygous genotype, respectively. Under this scenario, e ag now represents the OR for one unit increase in G at the exposure reference level X ¼ 0, and e agx , the ROR, represents the ratio of ORs in X for one unit increase of G. The parameter estimation and testing for H 0 : a gx ¼ 0, or equivalently H 0 : ROR ¼ 1, are similar to those in the binary genetic variant scenario.
The reverse test approach Aschard et al. (2018) proposed a reverse test that exchanges the roles of the disease outcome D and continuous exposure X. Now, taking into account potential confounders, Z, we assume X, conditional on D, G, and Z, follows a normal distribution with constant variance; i.e., where e $ Nð0; r 2 Þ. Similar to mðZÞ in the logistic regression model, hðZÞ can be written as P p j¼1 h j ðZ j Þ with each of h j ðZ j Þ a linear form b z j Z j or a spline function. Estimation of the unknown parameters in (2), denoted by b, can use ordinary least squares (OLS). Unlike with the logistic regression model (1), the parameter estimators for this linear regression model have a closed form. Thus, this linear regression approach is computationally more efficient than the logistic regression approach. Now, we calculate the ROR based on the linear regression model (2). For notational simplicity, we assume the genetic variant is binary. The approach for ordinal genetic variants will be similar. In Appendix A, we show that ORðDjX; G ¼ gÞ ¼ e ðb d þb gd gÞ=r 2 for g ¼ 0, 1. Therefore, ROR ¼ e b gd =r 2 , and testing H 0 : ROR ¼ 1 is equivalent to testing H 0 : b gd ¼ 0 in linear model (2). Also, using the linear regression model, the ROR can be estimated bŷ ROR ¼ expðb gd =r 2 Þ, whereb gd andr 2 are the OLS estimates of b gd and the estimated variance of the error term, r 2 . The analytic formula for the standard error and estimated confidence interval of ROR are shown in Appendix B. In the linear model, we can also control for possible interaction effects between confounders and gene/disease by replacing Z in the term hðZÞ with Z ÃÃ ¼ ½Z T ; ðZDÞ T ; ðZGÞ T T . In this linear model including the additional interactions, e b gd =r 2 also represents the ROR (see Supplementary Material Appendix A for more detail); therefore, testing b gd ¼ 0 is also valid for testing ROR ¼ 1. Similar to the logistic regression test, the reverse test can uses a 1-df wald test statistic s 2 b ¼b 2 gd Var ðb gd Þ to check for the null hypothesis H 0 : b gd ¼ 0. One of major advantages of the logistic regression approach is that it can provide valid statistical inference if the data are a case-control study that is retrospectively sampled from a source population with known disease status. In Appendix C, we prove that the reverse test is also a valid approach under such a casecontrol sampling design.
Consider the following linear model with Z-G interaction terms, where h $ ðZÞ; h % ðZÞ and h $% ðZÞ are functions of Z and e $ Nð0; r 2 Þ as for (2). Note that model (2) is a special case of (3). As shown in Supplementary Material Appendix B, the logistic model (1) and linear model (3) can hold simultaneously, and if both models hold, we can develop the following parametric relationships between the logistic regression and linear models: Otherwise, however, the logistic regression model (1) and linear model (2) cannot hold simultaneously. In Supplementary Material Appendix B, we show that models (1) and (2) can also hold simultaneously and the parametric relationships above still hold. This occurs under the following two conditions. First, the error term in the linear model (2) is normally distributed with a constant variance. Second, logitðPrðD ¼ 1jG; where G can be binary or ordinal, and b 0 , b 1 , b 2 , b z ðZÞ, and b zg ðZ; GÞ are functions of the unknown parameters in models (1) and (2), and the definitions of b's are shown in Supplementary Material Equation (S2). The linear regression model (2) mainly relies on two assumptions: (i) the conditional normality assumption; i.e., the error term in the linear model follows a normal distribution and (ii) the constant variance assumption; i.e., VarðeÞ ¼ r 2 is constant. A normality test, such as the Jarque-Bera test (Jarque and Bera 1987) or the Shapiro-Wilk test (Shapiro and Wilk 1965), can be used to check whether regression residuals, , satisfy the conditional normality assumption. If necessary, a transformation on X can be applied before performing the GxE analysis using the reverse test approach. Commonly used transformations include logarithm, square root, Box-Cox (Box and Cox 1964), and Yeo-Johnson transformations (Yeo and Johnson 2000).
The constant variance assumption can be evaluated by implementing Levene's test (Levene 1961) or the White test (White 1980) on the linear regression model's residuals. When the constant variance assumption does not hold, i.e., VarðeÞ depends on Z, D, and G, the OLS estimator of b gd will be consistent but the variance estimator ofb gd is invalid, often inflating the type I error also follows a 1-df v 2 -distribution under the null hypothesis. Sun et al. (2018) suggested that the sandwich estimator of variance may be biased downwards in finite samples and cause inflated type I error in hypothesis testing, and provided a bootstrap-based method, referred as BICS, to improve the performance of the s 2 statistic. It should be noted that, in the presence of heteroskedasticity, testing H 0 : b gd ¼ 0 is no longer equivalent to testing ROR ¼ 1. In the next section, we present alternative definitions of the GxE interaction and show that testing H 0 : b gd ¼ 0 is still a valid test for GxE interaction, even if the constant variance or the conditional normality assumption does not hold.

Alternative definitions of the GxE interaction
In the previous two sections, the OR was used to represent the exposure-disease association; if the exposure-disease association differs in genetic subgroups, a GxE interaction was identified. In this section, we present several other parameters that also represent the exposure-disease association, leading to alternative definitions for GxE interactions. The conditional mean difference of X between cases and controls (abbreviated as MD), i.e., EðXjD ¼ 1; G; ZÞ À EðXjD ¼ 0; G; ZÞ, can be used to represent the exposure-disease association. The difference of MD between the subgroups of G ¼ 1 and G ¼ 0, referred as the difference of mean difference (DMD), can represent the GxE interaction effect, and DMD ¼ 0 represents no GxE interaction (Aschard et al. 2018). In model (2), we can show the MDs at the genetic levels G ¼ 1 and G ¼ 0 are b d þ b gd and b d , respectively, and therefore DMD ¼ b gd . Noticing that the previous derivation of MD and DMD only requires that EðXjD; G; ZÞ agrees with what is presented in (2), it does not place any restriction on the distribution of the error term in linear model (2). In other words, even if in (2) is nonnormally distributed and heteroskedastic, H 0 : b gd ¼ 0 is valid for testing for no GxE interaction H 0 : DMD ¼ 0.
Alternatively, CorrðX; DjG; ZÞ also measures the exposuredisease association, and the difference in correlation coefficient (DCC), defined as CorrðX; DjG ¼ 1; ZÞ À CorrðX; DjG ¼ 0; ZÞ, also represents GxE interaction. If DCC ¼ 0, there is no exposuredisease correlation difference across genetic subgroups, and therefore no GxE interaction. In Supplementary Material Appendix C, we show that testing for H 0 : DCC ¼ 0 is also valid through evaluating H 0 : b gd ¼ 0, under the assumptions of weak X-G and D-G associations, regardless of whether the normality assumption holds or not.

Measurement errors in the environmental exposure
Many environmental exposures are measured with error, including data from self-administered questionnaires and laboratory measurements. In this section, we consider scenario under the classical additive measurement error model: where X Ã is the measurement of X, and d, independent from X, G, Z, and D, is the measurement error term following a mean zero normal distribution with variance r 2 d . We can use the regression calibration method (Rosner et al. 1989(Rosner et al. , 1990) to obtain the corrected estimator for the interaction coefficient,â gx , in the logistic regression model (1). Using the regression calibration approach, we fit the logistic regression model with X replaced by X Ã : It follows thatâ gx %â Ã gx =q, whereq ¼V ar ðXÞ Var ðXÞþr 2 d and q represents the magnitude of the measurement error, when either (i) PrðD ¼ 1jX; X Ã ; G; ZÞ ¼ PrðD ¼ 1jX; G; ZÞ, the disease is rare, and XjG; Z is normal (Rosner et al. 1989); or (ii) PrðD ¼ 1jX; X Ã ; G; ZÞ ¼ PrðD ¼ 1jX; G; ZÞ and VarðXjX Ã ; G; ZÞ is small (Carroll et al. 2006). Under either of the above conditions, testing H 0 : a Ã gx ¼ 0 in model (5) is also a valid test for no GxE interaction H 0 : ROR ¼ 1, although the point estimatorâ Ã gx is attenuated. It is more straightforward to cope with the measurement errors in the linear regression model (2). Based on the assumption that d is independent from D, G, and Z, we can show that EðX Ã jG; D; ZÞ ¼ EðX þ djG; D; ZÞ ¼ EðXjG; D; ZÞ and VarðX Ã jG; D; ZÞ ¼ VarðX þ djG; D; ZÞ ¼ r 2 þ r 2 d . Noting that both d and are normally distributed, we have where e Ã $ Nð0; r 2 þ r 2 d Þ. All of the coefficients in the conditional mean (6) equal those in the original linear model (2) , however, will be attenuated, becauser 2 tends to be overestimated as the variance of the error term, r 2 þ r 2 d , is larger than r 2 . If the measurement error term depends on G, D, or Z, i.e., VarðdÞ ¼ VðG; D; ZÞr 2 d , where VðG; D; ZÞ is an unknown positive function, the linear model (6) d is no longer a constant. As discussed in the section above, in this heteroskedasticity scenario, we can use the sandwich variance in the test for H 0 : b gd ¼ 0.

Simulation studies
To assess the validity of the reverse test approach and its power compared to the standard logistic regression model approach, we conducted simulation studies under a range of scenarios. We describe first the models considered in the simulation study, and then the data generation procedures.
Specifically, we consider the following logistic and linear regression model: where e $ Nð0; 1Þ, and Z 1 and Z 2 are potential confounders. A brief summary of the data generation process is shown as below and the detailed information can be found in Supplementary Material Appendix D. Specifically, we first generate Z 1 and Z 2 , followed by the genetic variant G. Next the disease D is generated conditional on Z 1 , Z 2 , and G. Finally, the exposure X is generated based on the linear regression model in (7). Note that the logistic model in (7) also holds by this generation procedure, since the data generating process for DjG; Z 1 ; Z 2 are carefully manipulated such that the distribution DjX; G; Z 1 ; Z 2 coincides with the linear model in (7). We now specify the parameter values in (7). Noting that , we choose the values of a 0 such that the baseline disease prevalence ranged from 5% to 50%. In the logistic regression model, we set e a1 ; e a2 and e a3 as 1.2, and specified a x , a g and a gx over realistic values such that ORðDjX; G ¼ 0Þ 1:5; ORðDjG; X ¼ 0Þ 1:5 and ROR 1:5, each corresponding to a one unit increase in X and G. In the linear regression model, we fixed b 0 ¼ 0; b 1 ¼ 0:03 and b 2 ¼ 0:03, and defined b 3 and b g for a series of correlations between the outcome X and independent variables G, D and Z, with b 3 ¼ were set at a range from 0.01 to 0.2, the conditional variances of X were specified as 1, and the conditional variances of G and Z 2 were set to their corresponding unconditional variances. We specified We first simulated a cohort study with 100,000 subjects using the parameters and data generation procedure as above, and then randomly selected 1000 cases and 1000 controls from the cohort to create a 1:1 matched case-control study. Then, we fit the logistic model and linear model to obtainâ gx andb gd , where a cubic B-spline approach with 3 interior knots was used for Z 1 in each model. Next, we tested H 0 : a gx ¼ 0 and H 0 : b gx ¼ 0 based on a 1-df wald test. We repeated the above procedure 500,000 times to evaluate the validity and statistical power of the two tests. Under measurement error scenarios, we generated X Ã based on (4), where the measurement error was set to explain 25-75% of the variance of X Ã ; that is, q ¼ VarðXÞ In addition, we compared performance of the two tests when models (1) and (2) do not hold simultaneously. Specifically, we considered two scenarios, corresponding to the two conditions that are required for both models hold simultaneously. In Scenario I, we used a simple logistic regression model with probability logitðPðD ¼ 1jG; ZÞÞ ¼ a 0 þ a 1 Z 1 þ a 2 Z 2 1 þ a 3 Z 2 2 þ a g G to generate the disease outcome, instead of the more complex expression given above to generate D in order to ensure both models hold simultaneously. In Scenario II, we considered scenarios where error term in the linear model does not follow normality and set the error term to follow a rectified Gaussian distribution (Socci et al. 1998), a right-skewed distribution that resets the negative elements of a normal distribution to 0. In each of these scenarios, we considered two data generation procedures, referred as the logistic data generation procedure and linear data generation procedure, corresponding to the following two cases: (i) a correctly specified logistic model but a misspecified linear model and (ii) a correctly specified linear model but a misspecified logistic model. More details for the simulation setting-ups in Scenarios I and II are deferred to Supplementary Material Appendix E and F, respectively.

Illustrative example: the VACS
We illustrated the utility of the proposed reverse test by investigating the genome-wide interaction of gene-smoking on HIV status from the VACS. VACS is a multi-center, longitudinal observational study of HIV-infected and -uninfected veterans, whose primary objective is to understand the risk of substance abuse in subjects with HIV infection (Justice et al. 2006;Wu et al. 2019). As the GxE interaction effects may vary among different ethnic groups, we focused on the subgroup of African Americans in this example. The environmental exposure, smoking, was measured by cigarettes per day (! 0) in patient surveys collected at six clinic visits. Previous literature shows that cigarette smoking is a potential risk factor for HIV acquisition as it may be associated with high-risk sexual behavior (Burns et al. 1991;Marshall et al. 2009). For each subject, we defined smoking (cigarettes per day) as the average of the smoking data over the six clinic visits. The distribution of smoking is highly right-skewed with the coefficient of skewness 3.54 (Supplementary Figure S2A). To alleviate the skewness, a Yeo-Johnson transformation (Yeo and Johnson 2000) was applied. Specifically, the transformed smoking value is TðxÞ ¼ ðxþ1Þ c À1 c , where x is the original smoking values and c is a tuning parameter which equals c ¼ À0:074 here. Because c is close to 0, we can make the following approximation by L'Hopital's rule: TðxÞ % ðx þ 1Þ c Â logðx þ 1Þ % logðx þ 1Þ. Therefore, approximately, we can interpret the transformed smoking values as the logarithm of one plus the number of cigarettes per day. The density plots for the transformed exposure are shown in Supplementary Figure S2B, which presents that the transformed smoking variable is barely skewed, with the coefficient of skewness 0.03. In the reverse test, we used the transformed smoking varible as the response. We also used the transformed smoking variable in the logistic regression model.
All samples were genotyped on the Illumina OmniExpress BeadChip, and then imputed using IMPUTE2 (Howie et al. 2009) with the 1000 Genomes Phase 3 data as a reference panel, which resulted in a total of 17,092,657 SNPs. In this application study, we excluded subjects whose environmental exposure or HIV status were unavailable, and among that, the proportion of successfully imputed SNPs < 95%. Thus, after data cleaning, 1484 subjects were retained in the analysis, with 1403 males and 81 females, of whom 965 were HIV positive and 519 were HIV negative. The characteristics of the 1484 subjects stratified by the HIV status are provided in Supplementary Table S1, where the HIV negative group has lower proportions of males and smokers, whereas other characteristics are balanced between the HIV positive and negative groups. SNPs with minor allele frequency (MAF) > 1% and call rate > 95% were included in the analysis, which resulted in 10,079,672 SNPs for analysis of their interaction effects with smoking on the HIV infection. As for the confounders in the logistic and linear model, we considered age at baseline (continuous), and gender, as well as top 10 Principal Components of genotypes to control for population structure. We also consider alcohol intake (binary, drinkers vs nondrinkers) as a potential confounder in the smoking-HIV association because evidence shows that alcohol consumption may be associated with cigarette smoking and is also a potential risk factor for the incidence of HIV (Shuper et al. 2010). In the analysis, strong nonlinear main effects of age at baseline were observed. Thus, we used the cubic spline with 3 interior knots to adjust for the main effect of age in both models. We also included age-by-smoking and age-by-gene interactions in the logistic model to control for possible interaction effects between age at baseline and smoking/gene on disease outcome. Similarly, age-by-gene and age-by-HIV interaction terms were included in our reverse test.

Type I error
We evaluated the validity of the reverse approach and logistic regression approach for testing GxE interaction by calculating the nominal type I error rates at 5% and 0.01% significant levels. Table 1 presents the empirical type I error rates of these tests over a range of scenarios, with and without measurement error in X in the circumstance when both the logistic regression and linear models hold. The null models were simulated by setting a gx ¼ logðRORÞ ¼ 0 in the data simulation procedure. As shown in Table 1, type I error rates were close to the nominal P-value thresholds over the full range of design parameters we studied for both tests, regardless of whether there was a measurement error.
We evaluated the performance of both tests in Scenarios I and II where the logistic regression model (1) and linear model (2) do not hold simultaneously. Table 2 shows the type I error rates at a 0.01% significant level for both tests in Scenario I where PðDjG; ZÞ follows a simple logistic regression model. The reverse test approach had well-controlled type I error rates under both the logistic and the linear data generation procedures, whereas the logistic regression test exhibited very slight type I error deflation under the linear data generation procedure. At the 5% significant level, both approaches always presented satisfactory type I error rates when either the linear or the logistic regression model did not hold simultaneously (Supplementary Table S2).
Supplementary Tables S3 and S4 provided empirical type I error rates in Scenario II where the error term in the linear model follows a rectified Gaussian distribution. Under the linear data generation procedure, the reverse test exhibited no inflation or deflation at any of the nominal threshold considered; in contrast, the logistic regression test had inflated type I error under most settings considered, especially when the OR of X was large without measurement error. Under the logistic data generation procedure, the logistic regression test had well-controlled empirical type I error rates; for the reverse test, while the type I error rates of the reverse test were almost always below the nominal levels with occasionally a slight deflation.

Power comparison
We compared the statistical power of the logistic regression test and reverse test by calculating the ratio v 2 rev v 2 log , simply referring as v 2 ratio henceforth, where v 2 rev and v 2 log are the average v 2 test statistics of the reverse test and logistic regression test over 500,000 repetitions for each simulation study scenario. A v 2 ratio greater than 1 indicates that the reverse test obtained statistical power than the logistic regression test. Figure 1 provides the v 2 ratio of the reverse test against logistic regression test in the absence of measurement error in X. We considered the exposure main effects, i.e., ORðDjX; G ¼ 0Þ, from 1.1 to 1.5, and GxE effects, i.e., ROR, from 1.1 to 1.5, while considering weak and strong exposure-confounder correlation (CorrðX; Z 2 jZ 1 ¼ G ¼ D ¼ 0Þ ¼ 0:01, 0.2), weak and strong genetic variant main effects (ORðDjG; X ¼ 0Þ ¼ 1:1, 1.5), rare and common disease prevalence (PrðD ¼ 1jG ¼ X ¼ Z 1 ¼ Z 2 ¼ 0Þ¼0.05, 0.2), and weak and strong gene-exposure correlation (CorrðX; GjZ 1 ¼ Z 2 ¼ D ¼ 0Þ¼0.01, 0.2). As shown in Figure 1, all the v 2 ratios were greater than 1, indicating the reverse test provided higher statistical power than the logistic regression test. The power advantage of the reverse test against logistic regression improved with the increase of the main effect of exposure; for example, the v 2 ratios were generally below 1.25 for ORðDjX; G ¼ 0Þ ¼ 1:1 and ROR ¼ 1:5 and were greater than 1.45 for ORðDjX; G ¼ 0Þ ¼ 1:5 and ROR ¼ 1:5. As the ROR increased, the reverse test also tended to have higher statistical power than the logistic regression test. However, we observed a slight decrease in the relative efficiency as the association between X and Z 2 and the exposure-gene association increased. The disease prevalence and disease-gene association had minimal influence on the relative efficiency of the two tests.
More detailed results for the power comparison between the reverse and logistic regression test are provided in Table 3, in which we investigate the impact of exposure measurement error on the relative efficiency. Typically, the magnitude of the measurement error resulted in a decrease in the relative efficiency of the reverse test vs the logistic regression test, but the reverse test still provided a power advantage even when q ¼ 0:25. For example, when ORðDjX; G ¼ 0Þ ¼ 1:5, the v 2 ratio decreased from over 1.4 in the absence of measurement error (q ¼ 1) to below 1.15 for q ¼ 0:25. Supplementary Figure S5 presents the rejection rates (i.e., power) under the alternative hypothesis that ROR ¼ 1.5 across 50,000 replications for the logistic regression approach and the reverse test at significance levels of 0.01 and 5% when the logistic and linear regression models hold simultaneously. As expected, the rejection rate raises when measurement error decreases or significance level increases. The reverse test exhibits higher rejection rates under nearly all settings in the absence of measurement error and also provides power advantage under most simulation settings with a large measurement error (q ¼ 0:25).
We also investigated relative efficiency when the linear and logistic regression models did not hold simultaneously. The results In this table, "logistic" and "reverse" represent the logistic regression test and reverse test respectively. P0ðD ¼ 1Þ; Corr0ðX; Z2Þ, and Corr0ðX; GÞ denote Here, q ¼ VarðXÞ VarðX Ã Þ denotes the magnitude of the measurement error, where X is the true exposure and X Ã is the observed exposure measured with error. The empirical type I error rates were calculated across 500,000 simulations for each scenario, where the empirical type I error rates outside the 95% confidence boundary, i.e., p61:96 Â ffiffiffiffiffiffiffiffiffiffiffi pð1ÀpÞ B q , were highlighted in bold. Here, p denotes the nominal threshold (5 or 0.01%) and B denotes the number of replication (500,000).
for Scenario I are shown in Supplementary Table S6. As expected, the reverse test exhibited a significant power advantage under the linear model data generation procedure. Under the logistic data generation procedure, the reverse test still provided power advantage among 62.5% of the simulation scenarios under a weak D-X association (ORðDjX; G ¼ 0Þ ¼ 1:1) and 50%, when ORðDjX; G ¼ 0Þ ¼ 1:5. Among the simulation scenarios where v 2 ration <1, only 5 out of 84 (6.0%) had v 2 ratio below 0.9. Supplementary Table S7 presents the relative efficiency for Scenario II where the error term in the linear model followed a rectified Gaussian distribution. Under the linear data generation procedure, the reverse test generally outperformed the logistic In this table, "logistic" and "reverse" represent the logistic regression test and reverse test respectively. P0ðD ¼ 1Þ; Corr0ðX; Z2Þ, and Corr0ðX; GÞ denote Here, q ¼ VarðXÞ VarðX Ã Þ denotes the magnitude of the measurement error, where X is the true exposure and X Ã is the observed exposure measured with error. The empirical type I error rates were calculated across 500,000 simulations for each scenario, where the empirical type I error rates outside the 95% confidence boundary, i.e., p61:96 Â ffiffiffiffiffiffiffiffiffiffiffi pð1ÀpÞ B q , were highlighted in bold. Here, p denotes the significance level (0.01%) and B denotes number of replication (500,000). Figure 1 Power comparison between the reverse test and logistic regression test, where the x-axis is the ROR representing the magnitude of the GxE interaction, and the y-axis is the average v 2 ratio of the reverse test against the logistic regression test. The results were calculated through 500,000 simulations, with increasing the ROR and main effect of X (i.e., ORðXjG ¼ 0Þ) from 1.1 to 1.5. Left column: weak vs strong exposure-confounder association, of CorrðX; Z 2 jZ 1 ¼ D ¼ G ¼ 0Þ ¼ 0.01 (A) and 0.2 (E); Second column: weak vs strong disease-gene association, ORðGjX ¼ 0Þ¼ 1.1 (B) or 1.5 (F); Third column: weak vs strong gene-exposure correlation, of CorrðX;   where v 2 rev and v 2 rev are the average v 2 statistics for the logistic regression test and reverse test based on 500,000 simulation repetitions for each scenario. regression test, the v 2 ratio was less than 1 in only 12 out of 192 (6.3%) simulation scenarios. Under the logistic data generation procedure, the reverse test still provided satisfactory results; the v 2 ratio was less than 0.9 in 49 out of 192 (25.6%) simulation scenarios. In addition, we also observed that the reverse test provided greater statistical power against the logistic regression test for small main effects of X and weak correlations between exposure and genetic variant.

Computational time
The reverse test is more computationally efficient than the logistic regression test, since it is based on a closed form test statistics, in contrast to the iterative procedure used by the logistic regression test. Table 4 displays the computation time of the two tests using R software on a single desktop computer (3.7 GHz CPU and 16 GB RAM) for GxE analysis of 1,000,000 SNPs with the sample sizes from 1000 to 200,000. We can see that the reverse test outperformed the logistic regression test in all scenarios. Furthermore, as the sample size increased, the ratio of the computation time of the logistic regression test comparing to the reverse test increased. With a sample size of 200,000, an approximately 7.5-fold reduction in computation time was observed for the reverse test.

Illustrative example: gene-smoking interactions in VACS
Interaction effects between genetic variants and smoking in relation to HIV status was considered among 10,079,672 SNPs in 1484 African Americans in VACS, using both the reverse test and logistic regression approach, adjusting for age, sex, alcohol intake, and the top 10 Principal Components. Quantile-Quantile (Q-Q) plots and Manhattan plots for both tests are presented in Figure  2. The reverse test displayed no evidence of P-value inflation in the QQ plot ( Figure 2B), with a genomic inflation factor, k, close to 1, while the logistic regression test showed some evidence of in-flation in the QQ plot ( Figure 2D), with k ¼ 1:07 We observed a strong correlation between the P-values of the logistic and reverse tests, with the correlation coefficient 0.82. Table 5 presents the results for SNPs where at least one of the reverse and logistic regression tests provided a P-value below 1 Â 10 À6 . We can see the top-ranked SNPs largely overlapped between the two approaches. No SNPs showed genome-wide significance (P-value < 5 Â 10 À8 ) for the interaction under the logistic regression approach. However, the reverse test presents that two SNPs, rs10744166 and rs10773060, have significant interaction effects (P-value ¼ 3:71 Â 10 À8 and 3:42 Â 10 À8 ). Both SNPs are located at gene ZNF664 in chromosome 12, which was previously identified to be associated with clubfoot (Zhang et al. 2014). The minor allele frequencies (MAFs) of the two SNPs were 45.6 and 45.4% in the study sample, and were slightly higher as compared to the 1000 Genomes (with MAF 37.5 and 37.4%, respectively). In order to illustrate interactions across genotypes, averages of the smoking values (cigarettes per day) between the HIV-infected and -uninfected veterans stratified by different genotypes of rs10744166 are visualized in Supplementary Figure S3A. As can be seen from the figure, an increase in the number of minor alleles of rs10744166 indicates a larger mean difference between the smoking values in the HIV-infected and -uninfected individuals, such that the mean difference for the genotype with two minor alleles is at nearly 3.5 times the mean difference for the genotype with no minor alleles. In other words, the minor allele in rs10744166 is associated with a stronger smoking-HIV infection relationship. A similar pattern between genotypes and mean differences were also found in rs10773060, as visualized in Supplementary Figure S3B. A linkage disequilibrium analysis shows evidence that rs10744166 and rs10773060, along with the other four SNPs located at ZNF664 shown in Table 5, are in linkage disequilibria spanning a 1.5 kb length (Supplementary Figure  S3C), where the squared allele frequency correlations R 2 for each pair of SNPs are ranged between 0.95 and 0.99. It is of note that  there was limited information on the detected SNPs given the small sample size of this illustrative example. Replication in an independent dataset is needed to confirm those associations. We further performed a separate analysis for gene-smoking interactions in the subgroup of smokers (n ¼ 905). The QQ and Manhattan plots of both tests are visualized in Supplementary Figure S4. We observed no SNPs exhibiting genome-wide significance based on either logistic regression test or reverse test. This is not surprising as the sample size is relatively small. Although rs10744166 and rs10773060 show genome-wide significance previously, both SNPs do not appear to be significant in the smokers subgroup by either the reverse test (P-value ¼ 0.08 and 0.09) and the logistic regression test (P-value ¼ 0.02 and 0.02).

Discussion
In this study, we propose a reverse test for interaction between an environmental exposure and a genetic variant on a binary disease status, adjusting for confounding. Comparing with the standard logistic regression test for interaction with a continuous or binary environmental exposure, the reverse test only applies to a continuous environmental exposure. This reverse test leverages the spirit of linear discriminant analysis by reversing the roles of the environmental exposure and disease status, and obtains a closed form of the GxE test statistic. Our analysis shows that when the error term in the linear model follows a normal distribution with constant variance, both the reverse and standard logistic regression approaches are valid for testing H 0 : ROR ¼ 1. Compared to the logistic regression approach, the reverse test has a larger statistical power. As a trade-off, the reverse test approach additionally assumes that the exposure is continuous.
The logistic regression approach has a wider range of applications, in which the exposure can be continuous, binary, count, and categorical variables.
The reverse approach can be extended to test for interactions between a genetic marker set and a continuous environmental exposure with a binary disease outcome. The standard approach is based upon the logistic regression model logit½PrðD ¼ 1jX; . . . ; G p T denotes a genetic marker set containing p genetic variants and S ¼ ½G 1 X; G 2 X; . . . ; G p X T denotes the interaction terms between the genetic marker set and exposure. The null hypothesis for the interaction is H 0 : a gx ¼ 0. Following reverse approach, we assume a linear model for X, where e $ Nð0; r 2 Þ and U ¼ ½G 1 D; G 2 D; . . . ; G p D T is the interaction terms between the genetic set and disease status. We can also build a parametric relationship such that a gx ¼ b gd r 2 . Then, testing for H 0 : a gx ¼ 0 in the logistic model can be evaluated through testing for H 0 : b gd ¼ 0 in the linear model. Based on the theory of linear discriminant analysis (Efron 1975), it is expected that the reverse approach outperforms the reverse test with respect to the statistical power if linear model is correctly specified and the error term follows a homoskedastic normal distribution.
We conducted simulation studies across a wide range of scenarios to evaluate the performance of the proposed approach. Several observations followed from our simulation experiments. First, the reverse test produces correct type I error rates, both for standard or very small P-value threshold of 5 and 0.01%, whether or not there is exposure measurement error. Second, the reverse test generally exhibited greater statistical power than the standard logistic approach, and its relative statistical power improved when the magnitude of the GxE Figure 2 Manhattan plots and Quantile-Quantile plots for the test of interaction effect between 10,079,672 common SNPs and smoking (cigarettes per day) using the reverse test (upper panel) and standard logistic regression test (lower panel). In the Quantile-Quantile plots, k denotes the genomic inflation factor. Red line: genome-wide significance level (P-value¼ 5 Â 10 À8 ). Blue line: suggestive level (P-value¼ 1 Â 10 À5 ). interaction effect or the main effect of exposure increased. Although measurement error in the exposure tended to diminish the power advantage of the reverse test, the reverse test still provided some power gain even under very severe measurement error of q ¼ 0:25. Third, the reverse test is substantially more computationally efficient. It achieves a computation time that is more than sevenfold less than that of the logistic regression test, a great advantage in coping with large-scale genomic studies with millions of SNPs. Fourth, the proposed approach performed reasonably well when the error term is not normal. In summary, the reverse test provided a valid, powerful, and computationally efficient alternative for investigating GxE interactions in large-scale genomic research.
Does the relative computational advantage of the reverse test change as sample size increases? In fact, the ratio of the computation time of the logistic regression test and the reverse test is likely bounded by a constant when the sample size is sufficiently large. Specifically, fitting a linear model with an OLS algorithm has Oðnp 2 Þ computational complexity in general (Drineas et al. 2011;Iyer 2015), where n is the sample size and p is the number of unknown coefficients . The computational complexity of logistic regression model depends on the optimization algorithm used. One popular algorithm is iteratively reweighted least squares (IRLS), which is also the default algorithm for fitting generalized linear models in many statistical softwares (such as R). As an iterative algorithm, the IRLS solves a weighted least squares (WLS) subproblem at each iteration, and the running time of this WLS subproblem is comparable with the OLS algorithm if they share same number of sample size and unknown coefficients (see Supplementary Material Appendix G for more details). In other words, the additional computational burden of the logistic regression test compared to the reverse approach mainly depends on the number of the IRLS iterations, when both tests adjusted for same number of covariates. Although the number of IRLS iterations is affected by initial values, as discussed in Komarek (2004), it is typically around 5 or 10 in most of the scenarios and barely larger than 30. It follows that the computation time of the logistic regression test based on IRLS tends to be several to 10 times longer than the reverse test when the sample size is sufficient large. These sorts of gains in computational efficiency can become quite important when whole-genome scanning, involving 6.4 billion SNPs, are to be undertaken.
In conclusion, given its power advantages and substantial benefits in computing time, the reverse test can be quite a useful tool in investigating GxE interactions, permitting whole-genome scans over many exposures simultaneously, measured with or without measurement error.

Data availability
The authors state that all data necessary for confirming the conclusions of the article are present within the article. A tutorial for implementing the proposed methods in R software is available at https://github.com/chaochengstat/GxE2020. The R codes for replicating the simulation studies and analysis of the VACS illustrative example are available at https://github.com/chaochengstat/ GxE2020/tree/main/Rcode_Genetics. Supplementary Material for this manuscript is attached at the end of the article, which includes Supplementary Material Appendix A-G, Supplementary  Figures S1-S3, and Supplementary Tables S1-S7.
Supplementary material is available at G3 online.