Abstract

Motivation

Many variants identified by genome-wide association studies (GWAS) have been found to affect multiple traits, either directly or through shared pathways. There is currently a wealth of GWAS data collected in numerous phenotypes, and analyzing multiple traits at once can increase power to detect shared variant effects. However, traditional meta-analysis methods are not suitable for combining studies on different traits. When applied to dissimilar studies, these meta-analysis methods can be underpowered compared to univariate analysis. The degree to which traits share variant effects is often not known, and the vast majority of GWAS meta-analyses consider only one trait at a time.

Results

Here, we present a flexible method for finding associated variants from GWAS summary statistics for multiple traits. Our method estimates the degree of shared effects between traits from the data. Using simulations, we show that our method properly controls the false positive rate and increases power when an effect is present in a subset of traits. We then apply our method to the North Finland Birth Cohort and UK Biobank datasets using a variety of metabolic traits and discover novel loci.

Availability and implementation

Our source code is available at https://github.com/lgai/CONFIT.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Over the past few decades, genome-wide association studies (GWAS) have found numerous genetic variants associated with phenotypic variation (Dorn and Cresci, 2009; Eskin, 2015; McCarthy et al., 2008). These phenotypes include a wide range of diseases and medically relevant traits such as heart disease (Dorn and Cresci, 2009; Lee et al., 2013; Nikpay et al., 2015), cholesterol level (Postmus et al., 2016) and depression (Cai et al., 2015; Hyde et al., 2016), among others. In some cases, variants have been found to affect multiple traits, a phenomenon known as pleiotropy (Andreassen et al., 2015). For example, multiple psychiatric disorders, immune diseases and nervous system phenotypes have been found to share causal variants (Chen et al., 2016; Chesler et al., 2005; Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013; Solovieff et al., 2013; Zeggini and Ioannidis, 2009). Variants associated with disease have also been found to be associated with tissue-specific gene expression phenotypes (Liu et al., 2016). Considering multiple traits at once may increase power to detect variant effects when there is pleiotropy.

One approach to combining information from different studies is to apply meta-analysis. Meta-analysis methods are often used in GWAS to combine results from different studies on the same trait to increase power (Berndt et al., 2016; Nikpay et al., 2015; Postmus et al., 2016). Intuitively, one can effectively increase the sample size by pooling summary statistics from multiple small studies, which also has the benefit that summary statistics are more readily obtainable than individual level data. The two classic versions of meta-analysis are fixed effects (FE) meta-analysis and random effects (RE) meta-analysis (Fleiss, 1993). In the FE model, a variant is assumed to have the same effect in each study, which is only realistic if all studies in the meta-analysis measure the same phenotype in the same population. If instead the true effect size differs between studies, we say there is heterogeneity. The RE model allows for heterogeneity by assuming study-specific effect sizes are drawn independently from a normal distribution. The binary effects (BE) model also allows for heterogeneity (Han and Eskin, 2012). In BE meta-analysis, a variant may either have an effect of fixed size or no effect in each study (Han and Eskin, 2012). A variant's configuration of effects across traits may then be expressed as a binary vector with entries indicating whether or not the effect is zero for each trait.

However, it is problematic to directly apply meta-analysis to combine studies that analyze different traits for a number of reasons. First, some traits share many causal variants while others share very few. Existing meta-analysis methods do not allow for varying degrees of shared variants between traits, and combining unrelated traits in a meta-analysis may actually decrease power compared to independent analysis of such traits. Second, a variant that affects one trait may have no effect in a different trait. While RE meta-analysis and related methods allow for differences in effect size between studies, such methods inherently assume an effect is present in all studies in the meta-analysis. Finally, studies may share individuals across traits. For example, data on several traits may be collected from the same cohort of individuals. Meta-analysis techniques assume that the studies are independent, but this only holds if the studies are performed on non-overlapping individuals.

In this paper, we present CONFIT, a novel meta-analysis method for multiple traits that addresses these shortcomings. CONFIT estimates the degree of shared effects between traits from the data using GWAS summary statistics, then uses these estimates to analyze multiple traits while allowing effects to be present in only a subset of the traits. CONFIT is inspired by the existence of pleiotropy and its potential to increase power to detect variants that affect multiple traits. Unlike traditional meta-analysis methods, CONFIT is designed to combine GWAS on different traits and does not assume a particular relationship between the different traits. Our test statistic is a likelihood ratio averaged over many models, where each model assumes the variant to have non-zero effect in a particular subset of traits and is weighted by a prior estimated from the data.

We tested CONFIT and show it has increased power compared to multiple independent (MI) GWAS in simulated data when variants have an effect in multiple traits. We also show CONFIT accounts for correlated effect size estimates from overlapping individuals between studies. We then demonstrate that CONFIT finds unique loci when combining studies on multiple traits using the North Finland Birth Cohort (NFBC) dataset and the UK Biobank (UKBB) dataset. CONFIT has many potential applications due to the vast variety of GWAS datasets available.

2 Materials and methods

2.1 Finding associated variants in one trait using a genome-wide association study (GWAS)

We now describe how to test a variant $v$ for association in a trait $t$ using a GWAS. Let $g_{vt}$ be the vector of genotype values in the $n_t$ individuals collected in the study for trait $t$. Denote entry $j$ in $g_{vt}$ as $g_{vt,j}$, which corresponds to the genotype of the $j$th individual in study $t$, i.e. the number of copies of variant $v$ they possess. Thus $g_{vt,j} \in \{0, 1, 2\}$. Let $x_{vt}$ be the vector of standardized genotype values in study $t$. In other words, $x_{vt}$ is obtained by mean-centering and scaling $g_{vt}$ to have a sample variance of 1.

Let $y_t$ be the vector of phenotype values in the $n_t$ individuals for trait $t$. Assume $y_t$ has been centered to have mean 0. Given $x_{vt}$ and $y_t$, GWAS assumes the linear model
$$y_t = \beta_{vt} x_{vt} + e_t \qquad (1)$$
where $\beta_{vt}$ is the effect of $v$ on trait $t$ and $e_t \sim N(0, \sigma_e^2 I)$ is Gaussian noise (Eskin, 2015). The magnitude of $\beta_{vt}$ indicates how predictive $v$ is. One then finds the estimated effect $\hat{\beta}_{vt}$ by linear regression. The solution given by ordinary least squares is
$$\hat{\beta}_{vt} = (x_{vt}^T x_{vt})^{-1} x_{vt}^T y_t \qquad (2)$$
where
$$\hat{\beta}_{vt} \sim N\left(\beta_{vt}, (x_{vt}^T x_{vt})^{-1} \sigma_e^2\right) \qquad (3)$$
Since $\sigma_e^2$ is unknown, we estimate it as $\hat{\sigma}_e^2 = \frac{1}{n_t - 1} \lVert y_t - \hat{\beta}_{vt} x_{vt} \rVert_2^2$. Let $d_{vt} = (x_{vt}^T x_{vt})^{-1} \hat{\sigma}_e^2$ be the estimated variance of $\hat{\beta}_{vt}$. The summary statistic for $v$ in study $t$ is then the pair $(\hat{\beta}_{vt}, d_{vt})$. One may also estimate $\hat{\beta}_{vt}$ and $d_{vt}$ using a linear mixed model (LMM), which corrects for population structure within the study cohort (Furlotte and Eskin, 2015; Kang et al., 2010).
Because the variance may differ from study to study, we normalize each effect by its standard error to obtain a z-score, where for each variant $v$, we have
$$z_{vt} = \hat{\beta}_{vt} / \sqrt{d_{vt}} \sim N(\lambda_{vt}, 1) \qquad (4)$$
where $\lambda_{vt}$ is the true normalized effect size. One may then use $z_{vt}$ as a test statistic to test whether $v$ is associated with $t$. Let $\alpha$ be the desired significance level. If $|z_{vt}|$ exceeds some threshold value $z_\alpha$, or equivalently, $p(z_{vt}) = \Pr(|z| \ge |z_{vt}| \mid H_0) \le \alpha$, then we conclude $v$ is significantly associated with $t$.
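
The single-variant computation above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming a simple ordinary least squares fit without covariates or a mixed model; the function name `gwas_summary` and the toy inputs are hypothetical and are not taken from the CONFIT or pyLMM code.

```python
import math

def gwas_summary(genotypes, phenotypes):
    """Return (beta_hat, d, z, p) for one variant in one trait (OLS sketch)."""
    n = len(genotypes)
    # Standardize genotypes to mean 0 and sample variance 1.
    g_mean = sum(genotypes) / n
    g_sd = math.sqrt(sum((g - g_mean) ** 2 for g in genotypes) / (n - 1))
    x = [(g - g_mean) / g_sd for g in genotypes]
    # Center the phenotype to mean 0.
    y_mean = sum(phenotypes) / n
    y = [v - y_mean for v in phenotypes]
    xtx = sum(xi * xi for xi in x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / xtx      # OLS estimate
    resid_ss = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = resid_ss / (n - 1)                            # noise variance
    d = sigma2_hat / xtx                                       # variance of beta_hat
    z = beta_hat / math.sqrt(d)                                # effect / standard error
    # Two-sided P-value under N(0, 1); erfc keeps precision in the far tail.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_hat, d, z, p

beta_hat, d, z, p = gwas_summary(
    [0, 1, 2, 0, 1, 2, 1, 1],
    [0.1, 1.0, 2.1, -0.1, 1.1, 1.9, 0.9, 1.0],
)
```

The sketch does not handle a monomorphic variant (zero genotype variance), which would need to be filtered out beforehand.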

Because a typical GWAS may test millions of variants, $\alpha$ should be set to account for multiple testing at the variant level. Say 0.05 is the desired significance level for the whole family of tests. A simple way to correct for multiple testing is to apply the Bonferroni correction, which yields $\alpha = 0.05/|V|$. However, due to the presence of linkage disequilibrium (LD) in the human genome, the Bonferroni correction on the total number of variants is overly conservative. In the GWAS community, $\alpha_{GWAS} = 5 \times 10^{-8}$ is commonly accepted as a significance level that takes into account the number of SNPs and presence of LD in the human genome (Consortium, 2005; McCarthy et al., 2008; Pe'er et al., 2008).

2.2 Finding associated variants in at least one of multiple traits using MI GWAS

Suppose we have a variant $v$ and a set of traits $T = \{t_1, \ldots, t_k\}$, and we are given the GWAS effect size and variance $(\hat{\beta}_{vt}, d_{vt})$ of $v$ for each trait $t \in T$. To perform MI GWAS on a set of traits, one simply performs a GWAS as described above for each variant $v$ on each trait to obtain a vector of z-scores across traits $z = (z_{vt_1}, z_{vt_2}, \ldots, z_{vt_k})$. The MI GWAS test statistic is then $\max_t |z_{vt}|$, or equivalently the smallest GWAS P-value across traits, $\min_t p(z_{vt})$. In MI GWAS, one must correct for two levels of multiple testing: multiple variants and multiple traits. If we assume each trait to be an independent test, then we may apply the Bonferroni correction for $k$ traits to $\alpha_{GWAS}$, yielding the multiple testing corrected significance level $\alpha_{MI} = \alpha_{GWAS}/k$. Then $v$ is significant if $\min_t p(z_{vt}) \le \alpha_{MI}$.
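
As a small illustration of the MI GWAS decision rule, the sketch below takes the minimum two-sided P-value across traits and compares it with the Bonferroni-corrected threshold. The function name `mi_gwas` and the example z-scores are hypothetical.

```python
import math

ALPHA_GWAS = 5e-8  # genome-wide significance level used in the text

def mi_gwas(z_scores, alpha=ALPHA_GWAS):
    """Return (min P-value across traits, significant?) for one variant."""
    # Two-sided P-value for each trait under N(0, 1).
    p_values = [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]
    min_p = min(p_values)
    alpha_mi = alpha / len(z_scores)   # Bonferroni correction for k traits
    return min_p, min_p <= alpha_mi

strong_in_one = mi_gwas([6.0, 0.1])    # large effect in a single trait
moderate_in_two = mi_gwas([5.5, 5.5])  # moderate effect in both traits
```

With two traits the threshold is $2.5 \times 10^{-8}$, so the first example is declared significant while the second is not, even though both of its traits show a sizeable z-score. This is the situation a joint test is meant to address.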

2.3 Finding associated variants using CONFIT

CONFIT attempts to find variants $v \in V$ that affect at least one of $k$ traits $t_1, \ldots, t_k$, given summary statistics from a GWAS on each trait. CONFIT assumes each variant $v$ either has zero effect on the trait, or if it has non-zero effect, that its normalized effect size, i.e. its non-centrality parameter (NCP), follows a Fisher polygenic model. We say a variant is active in a trait if it has non-zero effect on that trait, and describe its activity in each of the $k$ traits using a binary vector $c = [c_1, \ldots, c_k]$, where $c_t = 1$ if the variant is active in trait $t$ in that configuration and 0 otherwise.

For convenience, we use a fixed $\lambda$ for all traits and variants when explaining the test statistic in this section. This fixed-$\lambda$ assumption is very strong; later, we describe how it can be relaxed to allow different NCPs for each variant $v$. We also assume the z-scores are independent across studies given the activity configuration, an assumption we likewise relax in a later section. Let $z_v = [z_{vt_1}, \ldots, z_{vt_k}]$. Then
$$z_v \sim N(\lambda c, I) \qquad (5)$$
Our test statistic at $v$ is a likelihood ratio with multiple alternate models, where each model is a different activity configuration. The statistic sums the likelihood of each alternate configuration relative to that of $c_0$, weighted by a prior on each configuration $\Pr(c)$. Let $C$ denote the set of all possible configurations, $c_0$ denote the null configuration $c_0 = [0, \ldots, 0]$, and $C_A$ denote the set of alternate configurations, $C_A = C \setminus \{c_0\}$. Then
$$F_v = \frac{\sum_{c \in C_A} p(z \mid c, \lambda) \Pr(c)}{p(z \mid c_0) \Pr(c_0)} \qquad (6)$$
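
To make the fixed-NCP statistic concrete, the following sketch enumerates all alternate configurations and sums their prior-weighted likelihood ratios against the null. It assumes independent unit-variance z-scores, so each active trait contributes a factor $\exp(\lambda z_t - \lambda^2/2)$; the function name, the dictionary layout of the prior and the example values are hypothetical.

```python
import itertools
import math

def confit_stat_fixed_ncp(z, prior, lam):
    """Prior-weighted sum of likelihood ratios over alternate configurations.

    z     : sequence of z-scores, one per trait
    prior : dict mapping each 0/1 configuration tuple to Pr(c)
    lam   : fixed NCP shared by all active traits (a strong simplification)
    """
    k = len(z)
    null = (0,) * k
    numerator = 0.0
    for c in itertools.product((0, 1), repeat=k):
        if c == null:
            continue
        # log of p(z | c, lam) / p(z | c0) for independent N(., 1) z-scores
        log_lr = sum(ct * (lam * zt - 0.5 * lam ** 2) for ct, zt in zip(c, z))
        numerator += prior[c] * math.exp(log_lr)
    return numerator / prior[null]

prior = {(0, 0): 0.97, (1, 0): 0.01, (0, 1): 0.01, (1, 1): 0.01}
f_shared = confit_stat_fixed_ncp((3.0, 3.0), prior, lam=3.0)
f_single = confit_stat_fixed_ncp((3.0, 0.0), prior, lam=3.0)
```

Note that a single positive $\lambda$ only rewards z-scores of matching sign, which is one reason the fixed-NCP form is best read as an expository device. The loop visits $2^k - 1$ configurations, so this direct enumeration is practical only for a modest number of traits.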

2.4 Setting a prior on each activity configuration

Many choices of prior on the configurations are possible. We set an initial prior $\Pr_0(c)$ as the fraction of variants which have univariate GWAS P-value less than a threshold of $10^{-4}$ in the subset of traits that are active in $c$. We chose $10^{-4}$ as a threshold because we wished to capture shared effects between variants which are not necessarily strong enough to reach GWAS significance. If $c$ contains only one active trait, we set the final prior $\Pr(c)$ to the average of $\Pr_0(c')$ over all configurations $c'$ with a single active trait. Otherwise we set $\Pr(c) = \Pr_0(c)$. The reason for this is that the CONFIT model assumes a similar distribution of GWAS z-scores for each trait, but in real data, some traits may tend to have larger effects and others smaller effects. We mitigate this by averaging the prior for each trait alone being active. Traits with large effect sizes will then still have high power even with a smaller prior on their configuration, and traits with small effect sizes will gain power from a larger prior. This is the default choice of prior for CONFIT.
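
The prior described above can be sketched as follows. The text leaves open whether a variant counts toward configuration $c$ when it passes the threshold in exactly the active traits of $c$ or in at least those traits; this sketch assumes the "exactly" reading, so that the initial priors sum to one, and that assumption is mine rather than the source's. The function names are hypothetical.

```python
import itertools
import math
from collections import Counter

def two_sided_p(z):
    """Two-sided P-value of a z-score under N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def estimate_config_priors(z_rows, p_threshold=1e-4):
    """Estimate Pr(c) from per-variant z-score tuples (one entry per trait).

    Assumption: a variant is assigned to the configuration whose active
    traits are exactly the traits in which its P-value is below threshold.
    """
    k = len(z_rows[0])
    m = len(z_rows)
    counts = Counter(
        tuple(int(two_sided_p(z) < p_threshold) for z in row) for row in z_rows
    )
    configs = list(itertools.product((0, 1), repeat=k))
    prior = {c: counts[c] / m for c in configs}
    # Average the prior over all single-active-trait configurations.
    singles = [c for c in configs if sum(c) == 1]
    mean_single = sum(prior[c] for c in singles) / len(singles)
    for c in singles:
        prior[c] = mean_single
    return prior

rows = [(5.0, 0.0)] * 3 + [(0.0, 5.0)] + [(5.0, 5.0)] + [(0.0, 0.0)] * 5
priors = estimate_config_priors(rows)
```

In this toy input the two single-trait configurations start at 0.3 and 0.1 and are both set to their average, 0.2. A configuration that is never observed receives a prior of zero under this scheme, which removes it from the test statistic.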

2.5 Significance testing with Fv

We now describe how to find a P-value and perform significance testing for variant v using Fv.

We find a null distribution for $F_v$ by generating GWAS summary statistics at a variant $v$ under the null hypothesis, drawing a vector of z-scores across traits $z \sim N(0, I)$. To generate GWAS summary statistics under the null in the real dataset, one may permute the labels on the set of phenotypes for each trait before performing GWAS, such that the correlation between traits is preserved but the variant-phenotype correlation is not. Alternatively, one could perform GWAS on the real genotypes and simulated phenotypes generated under the null.

The null distribution of $F_v$ also depends on the estimated priors $\{\Pr(c) : c \in C\}$. Say we have estimated priors $\{\Pr(c) : c \in C\}$ from the data. We generate GWAS summary statistics for $5 \times 10^9$ variants under the null hypothesis and compute $F_v$ on the null data using the $\{\Pr(c) : c \in C\}$ from the original data. This gives a null distribution for $F_v$. Because larger values of $F_v$ are stronger evidence against the null, the P-value of $F_v$, $p(F_v)$, is the fraction of null variants with test statistic at least as large as $F_v$. Let $p_\alpha$ be the desired P-value threshold. If $p(F_v) \le p_\alpha$, we then conclude variant $v$ is associated with at least one of the $k$ traits.
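
The empirical P-value computation can be sketched generically. The code below draws independent null z-scores, evaluates an arbitrary statistic on them, and reports the fraction of null statistics at least as large as the observed one, on the reasoning that a large likelihood-ratio statistic is evidence against the null. The function names are hypothetical, and the maximum absolute z-score stands in for $F_v$ purely as a placeholder statistic.

```python
import random

def simulate_null_stats(stat_fn, k, n_null, seed=0):
    """Evaluate stat_fn on n_null vectors of k independent N(0, 1) z-scores."""
    rng = random.Random(seed)
    return [
        stat_fn([rng.gauss(0.0, 1.0) for _ in range(k)])
        for _ in range(n_null)
    ]

def empirical_p_value(observed, null_stats):
    """Fraction of null statistics at least as large as the observed one."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return exceed / len(null_stats)

# Placeholder statistic; in practice this would be the CONFIT statistic
# evaluated with the priors estimated from the real data.
null_stats = simulate_null_stats(lambda z: max(abs(v) for v in z), k=2, n_null=10000)
p_obs = empirical_p_value(2.5, null_stats)
```

The smallest P-value that can be resolved this way is one over the number of null draws, which is why a very large null panel is needed to reach genome-wide thresholds.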

In a simulated dataset containing $m$ independent variants, one may set $p_\alpha$ as the Bonferroni corrected threshold $p_\alpha = 0.05/m$. However, the Bonferroni correction is overly stringent when LD is present between variants, as is the case in real datasets. For the NFBC and UKBB datasets, we perform significance testing with $F_v$ at the P-value threshold $p_\alpha = 5 \times 10^{-8}$. This threshold is widely used by the GWAS community to account for multiple testing across the human genome (McCarthy et al., 2008; Pe'er et al., 2008).

2.6 Setting a prior on the NCP

We now return to our assumption that the NCP $\lambda_{vt} = \lambda$ is fixed for all variants. We relax this assumption by allowing each variant to have an NCP drawn from a zero-mean normal distribution with variance $\sigma^2$, as in the Fisher polygenic model. Consider a vector of z-scores at the same variant across traits, rather than across variants. Recall our earlier simple formulation, with fixed $\lambda_v$ for all variants:
$$(z \mid \lambda_v, c) \sim N(\lambda_v c, I)$$
This assumption about $\lambda_v$ is strong and not necessarily realistic. We instead model the NCP for a given variant as a vector, and allow it to differ between traits. Let $\lambda_v = [\lambda_{vt_1}, \ldots, \lambda_{vt_k}]$ be the vector of NCPs across traits for variant $v$. Supposing a true causal status $c$, we then put a prior on $\lambda_v$:
$$\lambda_v \mid c \sim N(0, \sigma^2 I_{k \times k}) \qquad (7)$$
where $I_{k \times k}$ is the $k$-dimensional identity matrix. This prior assumes a Fisher polygenic model on the active traits, where the parameter $\sigma^2$ is a fixed value set by the user. In our experiments, we set $\sigma^2 = 25$. However, performance is not very sensitive to the choice of $\sigma^2$, as shown in power simulation results for CONFIT with $\sigma^2 \in \{4, 10, 36\}$ in Supplementary Table S2.
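
One way to read the effect of this prior on the test statistic is to integrate the NCP out. Under the assumption of independent unit-variance z-scores, an active trait then has marginal distribution $N(0, 1 + \sigma^2)$ while an inactive trait remains $N(0, 1)$. This marginalization is a standard Gaussian identity, but its use here is my interpretation of the model and is not quoted from the source; the function name and example prior are hypothetical.

```python
import itertools
import math

def confit_stat_marginal(z, prior, sigma2=25.0):
    """Prior-weighted likelihood ratio with the NCP integrated out.

    Assumes independent z-scores: active traits are N(0, 1 + sigma2),
    inactive traits are N(0, 1).
    """
    k = len(z)
    null = (0,) * k
    v = 1.0 + sigma2
    # Per-trait ratio N(z; 0, v) / N(z; 0, 1) for an active trait.
    lr = [math.exp(0.5 * zt * zt * sigma2 / v) / math.sqrt(v) for zt in z]
    numerator = 0.0
    for c in itertools.product((0, 1), repeat=k):
        if c == null:
            continue
        term = prior[c]
        for ct, lrt in zip(c, lr):
            if ct:
                term *= lrt
        numerator += term
    return numerator / prior[null]

prior = {(0, 0): 0.97, (1, 0): 0.01, (0, 1): 0.01, (1, 1): 0.01}
f_both = confit_stat_marginal((4.0, 4.0), prior)
f_one = confit_stat_marginal((4.0, 0.0), prior)
```

Unlike the fixed-NCP form, this statistic depends on each z-score only through its square, so effects of either sign are treated alike.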

2.7 Correcting for overlapping individuals across studies

We may also relax the assumption that the estimated effects are independent across traits given the NCPs. This is useful in scenarios where there are overlapping individuals across studies, such as studies where multiple traits are collected from the same individuals. When the cohorts fully overlap between studies (i.e. the $k$ traits are collected from the same individuals), we assume a linear model in each trait
$$y_{t_1} = \beta_{vt_1} x_{vt_1} + e_{t_1}, \; \ldots, \; y_{t_k} = \beta_{vt_k} x_{vt_k} + e_{t_k} \qquad (8)$$
where for each individual $j$, we have $y_j = (y_{t_1,j}, \ldots, y_{t_k,j})$ following the model
$$y_j = \beta_v x_{v,j} + e_j \qquad (9)$$
where $e_j \sim N(0, \sigma_e^2 \Sigma_e)$. $\Sigma_e$ is a $k \times k$ covariance matrix representing how the environmental effect on an individual is correlated across traits. Note that under this single-variant linear model,
$$\Sigma_e = \mathrm{Cov}(e_{t_1,j}, \ldots, e_{t_k,j}) = \mathrm{Cov}(y_{t_1,j}, \ldots, y_{t_k,j}) \qquad (10)$$
Let $Y$ be the matrix of phenotype values such that entry $y_{ij}$ is the value of the $i$th trait in the $j$th individual. The correlation between traits can be modeled as a mix of correlation explained by genetics and correlation explained by shared environment. $\Sigma_e$ should represent the correlation explained by the environment. Assume the proportion of covariance explained by genetics is 50%, i.e. each trait in the analysis is 50% heritable. Then $\Sigma_e$ may be estimated as
$$\hat{\Sigma}_e = \frac{1}{2} \left( \frac{Y Y^T}{n - 1} + I_{k \times k} \right) \qquad (11)$$
where $n$ is the number of individuals.
If individual level phenotype data is not available, as is often the case with publicly released summary statistics, $\Sigma_e$ may instead be approximated using the correlation between z-scores across traits, assuming that the contribution of any particular variant is small and the heritability is known. Let $Z$ be the matrix of z-scores such that entry $z_{ij}$ is the z-score of the $i$th trait at the $j$th SNP. Then if $m$ is the number of SNPs,
$$\hat{\Sigma}_e = \frac{1}{2} \left( \frac{Z Z^T}{m - 1} + I_{k \times k} \right) \qquad (12)$$

Under this model with correlated environmental effects for each individual, the distribution of $z_v$ under the null becomes $N(0, \Sigma_e)$ instead of $N(0, I)$, and given a particular alternate configuration $c$, $z \mid c \sim N(\lambda c, \Sigma_e)$ instead of $N(\lambda c, I)$. We then compute the test statistic $F_v$ as in Equation (13) using this distribution for $z$ to account for correlation due to sharing of individuals between studies.

To generate null CONFIT test statistics to set a significance threshold when studies are correlated, we now draw $z \sim N(0, \Sigma_Z)$, where $\Sigma_Z = \frac{Z Z^T}{m - 1}$ is the empirical correlation matrix for the GWAS z-scores. Again assuming that the contribution of any particular variant is small, $\Sigma_Z$ will capture correlation of z-scores between traits due to the environment and due to variants besides the one being tested.
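
The summary-statistic route can be sketched with plain Python lists. The code below forms the empirical z-score matrix $Z Z^T/(m-1)$ for a $k \times m$ matrix of z-scores, averages it with the $k \times k$ identity under the 50% heritability assumption stated above, and draws a correlated null vector through a hand-written Cholesky factor. All function names are hypothetical, and the sketch assumes the matrix passed to the sampler is positive definite.

```python
import math
import random

def zscore_covariance(Z):
    """Z Z^T / (m - 1) for Z given as k lists (traits) of m z-scores."""
    k, m = len(Z), len(Z[0])
    return [[sum(a * b for a, b in zip(Z[i], Z[j])) / (m - 1)
             for j in range(k)] for i in range(k)]

def sigma_e_from_zscores(Z):
    """Average of the z-score matrix and the k x k identity."""
    S = zscore_covariance(Z)
    k = len(S)
    return [[0.5 * (S[i][j] + (1.0 if i == j else 0.0)) for j in range(k)]
            for i in range(k)]

def cholesky(A):
    """Lower-triangular L with L L^T = A (A must be positive definite)."""
    k = len(A)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def draw_correlated_null(cov, rng):
    """One draw of z ~ N(0, cov) via z = L u with u ~ N(0, I)."""
    L = cholesky(cov)
    u = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(L[i][j] * u[j] for j in range(i + 1)) for i in range(len(cov))]

Z = [[1.2, -0.4, 0.3, -1.1, 0.8], [0.9, -0.1, 0.5, -0.7, 0.2]]
z_null = draw_correlated_null(zscore_covariance(Z), random.Random(0))
```

For a real analysis the Cholesky factor would be computed once and reused across all null draws rather than recomputed inside the sampler.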

3 Results

3.1 Method overview

CONFIT tests whether variant $v$ affects at least one of $k$ traits $t_1, \ldots, t_k$, given summary statistics from a GWAS on each trait. Assume that for each trait, variant $v$ either has an effect on the trait or not, and in each trait where there is an effect, $v$'s non-centrality parameter (NCP) $\lambda_{vt}$ (i.e. its standardized effect size) follows a Fisher polygenic model and is drawn from $\lambda_{vt} \sim N(0, \sigma^2)$. If the variant has non-zero effect on a phenotype, then it is considered 'active' in that phenotype. We can then describe a potential activity configuration of a variant in the $k$ traits as a binary vector $c = [c_1, \ldots, c_k]$, where $c_t = 1$ if it is active in trait $t$ and 0 otherwise. Let $C$ denote the set of all possible configurations, $c_0$ denote the null configuration $c_0 = [0, \ldots, 0]$, and $C_A$ denote the set of alternate configurations.

The CONFIT test statistic is a sum of the relative likelihoods for each alternate configuration c against c0, weighted by a prior on each configuration Pr(c):
$$F_v = \frac{\sum_{c \in C_A} p(z \mid c) \Pr(c)}{p(z \mid c_0) \Pr(c_0)} \qquad (13)$$
where $z = [z_1, \ldots, z_k]$ is a vector of standardized GWAS effect sizes for each trait $t$, $z_t \sim N(\lambda_{vt}, 1)$. The null hypothesis is that $v$ is not active in any trait (corresponding to the null configuration $c_0$), and the alternate hypothesis is that $v$ is active in at least one trait. We estimate the prior on configuration $c$, $\Pr(c)$, using GWAS summary statistics for each variant and trait. More details of the method are given in Section 2. We then run CONFIT on simulated datasets to evaluate its performance, and apply it to two real datasets on metabolic traits to find novel variants.

3.2 CONFIT increases power when a variant has effect in multiple traits

To measure the power of CONFIT, we generated simulated GWAS summary statistics for $k$ traits as follows. For each variant, we draw a true effect configuration from a multinomial distribution with known probability $\Pr_s(c)$ for each configuration $c \in C$, where $C$ is the set of all possible effect configurations. We set $\Pr_s(c) = 0.005$ for each alternate configuration. Then the probability of a variant being active in a given trait is dependent on whether it is active in other traits.

Given the true configuration, for each variant we draw GWAS z-scores with mean zero in traits where there is no effect, and mean $\lambda_s \sim N(0, 25)$ where there is an effect. For each of the following experiments, we generated a panel of $5 \times 10^5$ variants. We then run CONFIT by setting the priors on each configuration from the $5 \times 10^5$ variants, then computing the CONFIT test statistic $F$ for each variant. We run this experiment in two and three simulated traits.
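
The simulation procedure for uncorrelated studies can be sketched as below. Each variant receives a configuration drawn with probability 0.005 per alternate configuration, with the remaining mass on the null; an NCP drawn from $N(0, 25)$ in each active trait; and a z-score with unit variance around that NCP. The function name, seed handling and panel size in the example are hypothetical choices for illustration.

```python
import itertools
import math
import random

def simulate_variants(k, n_variants, p_alt=0.005, sigma2=25.0, seed=0):
    """Return a list of (configuration, z-scores) pairs for n_variants."""
    rng = random.Random(seed)
    configs = list(itertools.product((0, 1), repeat=k))
    n_alt = len(configs) - 1
    # Each alternate configuration gets p_alt; the null gets the remainder.
    weights = [p_alt if sum(c) > 0 else 1.0 - p_alt * n_alt for c in configs]
    sd = math.sqrt(sigma2)
    panel = []
    for _ in range(n_variants):
        c = rng.choices(configs, weights=weights)[0]
        # NCP is zero in inactive traits and N(0, sigma2) in active traits.
        ncp = [rng.gauss(0.0, sd) if ct else 0.0 for ct in c]
        z = [rng.gauss(lam, 1.0) for lam in ncp]
        panel.append((c, z))
    return panel

panel = simulate_variants(k=2, n_variants=20000, seed=1)
```

With two traits there are three alternate configurations, so roughly 98.5% of simulated variants are null. Because the NCP is centered at zero, many active variants have small true effects, which bounds the attainable power of any method on such a panel.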

The CONFIT test statistic threshold is set using $5 \times 10^9$ null simulations for each experiment, and we find no false positives in the simulations. To demonstrate that the threshold is properly calibrated, we compute the genomic control (GC) factor (Devlin and Roeder, 1999) for CONFIT and for GWAS in each trait in the CONFIT analysis (Tables 1 and 2). The GC factor measures how far the median test statistic or P-value deviates from the expected median under the null hypothesis, where larger values indicate more inflation. We find that the GC factor for CONFIT is similar to or below the GC factors of the input GWAS. We also show quantile-quantile plots for CONFIT P-values on the NFBC and UKBB datasets in Supplementary Figure S1.
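
A common way to compute a GC factor from P-values is to convert each P-value to a one-degree-of-freedom chi-square statistic and divide the observed median by the expected null median. The sketch below follows that convention; whether the reported values were computed in exactly this way is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist, median

def gc_factor(p_values):
    """Median 1-df chi-square statistic divided by its null expectation.

    P-values must lie in (0, 1]; a value of exactly 0 cannot be converted.
    """
    norm = NormalDist()
    # A two-sided P-value p corresponds to |z| = -inv_cdf(p / 2), chi2 = z^2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = norm.inv_cdf(0.75) ** 2   # about 0.455 for 1 df
    return median(chi2) / expected_median

uniform_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
lam_null = gc_factor(uniform_p)                      # close to 1
lam_inflated = gc_factor([p / 10 for p in uniform_p])  # well above 1
```

Values near 1 indicate a well-calibrated test, values above 1 indicate inflation, and values below 1 indicate a conservative test.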

Table 1. GC factors for the NFBC dataset

| Method | GC |
| --- | --- |
| GLU | 1.000761 |
| HDL | 0.998390 |
| INS | 1.002076 |
| LDL | 0.998764 |
| TG | 0.997929 |
| CONFIT | 0.841884 |

Notes: We report GC factors for univariate GWAS in each trait and for CONFIT on the glucose (GLU), HDL, insulin (INS), LDL and TG traits.

Table 2. GC factors for the UKBB dataset

| Method | GC |
| --- | --- |
| High cholesterol | 1.125458 |
| Cholesterol medication | 1.101478 |
| Insulin medication | 1.030950 |
| Elevated blood glucose | 1.031507 |
| CONFIT | 1.106578 |

Notes: We report GC factors for univariate GWAS in each trait and for CONFIT applied to GWAS summary statistics in four traits.

From our power simulations, we find that CONFIT loses power compared to MI GWAS when the variant is only active in one trait, but strongly outperforms MI GWAS when the variant is active in more than one trait (Tables 3 and 4). To understand when CONFIT has more power than MI GWAS, we plotted the $H_0$ rejection region for each method on simulated GWAS z-scores in two traits (Fig. 1A). MI GWAS is slightly more powerful if the GWAS statistic is large in only one trait, but CONFIT is able to detect variants with moderate effects in both traits.

Table 3. Power simulation in two traits

| $\lambda_s \sim N(0, 25)$ | Uncorrelated studies: 1 active trait | Uncorrelated studies: 2 traits | Correlated studies: 1 active trait | Correlated studies: 2 traits |
| --- | --- | --- | --- | --- |
| GWAS in $t_1$ | *0.290* | | *0.291* | |
| MI GWAS | **0.278** | 0.474 | **0.283** | 0.481 |
| CONFIT | 0.272 | **0.513** | 0.276 | **0.540** |

Notes: The power of univariate GWAS in $t_1$ is in italics. Bolded values indicate the multi-trait method with the highest power for each simulation. Here, the probability of each alternate configuration is set as 0.5%. We draw the true NCP for each variant in each trait from a normal distribution, $\lambda_s \sim N(0, 25)$, either with or without correlation of effect size between traits. For GWAS in $t_1$, we only count simulated SNPs which truly have an effect in $t_1$. We find significant variants using a P-value significance threshold of $5 \times 10^{-8}$. For MI GWAS, we apply the Bonferroni correction to this threshold to account for multiple testing of traits.

Table 4. Power simulation in three traits with 0.5% true probability of drawing each alternate configuration

| $\lambda_s \sim N(0, 25)$ | Uncorrelated studies: 1 active trait | Uncorrelated studies: 2 traits | Uncorrelated studies: 3 traits | Correlated studies: 1 active trait | Correlated studies: 2 traits | Correlated studies: 3 traits |
| --- | --- | --- | --- | --- | --- | --- |
| GWAS in $t_1$ | *0.283* | | | *0.286* | | |
| MI GWAS | **0.274** | 0.469 | 0.607 | 0.267 | 0.457 | 0.602 |
| CONFIT | 0.272 | **0.504** | **0.681** | **0.285** | **0.518** | **0.697** |

Notes: The power of univariate GWAS in $t_1$ is in italics. Bolded values indicate the multi-trait method with the highest power for each simulation. We draw the true NCP $\lambda_s$ from a normal distribution for each variant, $\lambda_s \sim N(0, 25)$, either with or without correlation of effect size between traits. For GWAS in $t_1$, we only count simulated SNPs which truly have an effect in $t_1$.

Fig. 1. Rejection regions for MI GWAS and CONFIT. We ran MI GWAS and CONFIT on simulated GWAS summary statistics in two traits with simulation settings $\lambda_s \sim N(0, 25)$ for (A) uncorrelated and (B) correlated studies. In each plot, the variants are color coded black if significant by both MI GWAS and CONFIT (i.e. MI GWAS P-value $\le 2.5 \times 10^{-8}$ and CONFIT P-value $\le 5 \times 10^{-8}$), red if found significant by CONFIT but not MI GWAS, blue if found significant by MI GWAS and not CONFIT, and grey if not found significant by either method.

In real datasets, it is possible that some traits will tend to have larger or smaller effects than others. To see how CONFIT performs in this case, we also ran simulations where non-zero effects for one trait are drawn from $\lambda_{s1} \sim N(0, 4)$ or $\lambda_{s1} \sim N(0, 100)$, and non-zero effects in the remaining traits are drawn from $\lambda_s \sim N(0, 25)$. We found that CONFIT still increases power when an effect is present in more than one trait (Table 5).

Table 5. Power simulation in three traits with differing effect size distributions between traits

| Effect size distribution in $t_1$ | Method | 1 active trait | 2 traits | 3 traits |
| --- | --- | --- | --- | --- |
| $\lambda_{s1} \sim N(0, 4)$ | GWAS in $t_1$ | *0.013* | | |
| $\lambda_{s1} \sim N(0, 4)$ | MI GWAS | 0.182 | 0.3404 | 0.474 |
| $\lambda_{s1} \sim N(0, 4)$ | CONFIT | **0.198** | **0.384** | **0.552** |
| $\lambda_{s1} \sim N(0, 100)$ | GWAS in $t_1$ | *0.581* | | |
| $\lambda_{s1} \sim N(0, 100)$ | MI GWAS | **0.366** | 0.605 | 0.768 |
| $\lambda_{s1} \sim N(0, 100)$ | CONFIT | 0.347 | **0.627** | **0.832** |

Notes: The power of univariate GWAS in $t_1$ is in italics. Bolded values indicate the multi-trait method with the highest power for each simulation. In the first trait $t_1$, we draw the true effect size $\lambda_{s1} \sim N(0, 4)$ or $\lambda_{s1} \sim N(0, 100)$, and in the other two traits, we draw $\lambda_s \sim N(0, 25)$. The true probability for each alternate configuration is 0.5%. For GWAS in $t_1$, we only count simulated SNPs which truly have an effect in $t_1$.

3.3 CONFIT increases power in polygenic variants when applied to studies with overlapping cohorts

To model the scenario where each trait is measured in the same cohort, i.e. dependent studies, we simulate summary statistics with correlation ΣZ between the z-scores across traits, using ΣZ computed from the Northern Finland Birth Cohort (NFBC) low-density lipoprotein (LDL) and high-density lipoprotein (HDL) traits for simulations in two traits, and from LDL, HDL and triglycerides (TG) for simulations in three traits. We find that the ΣZ estimated from the covariance between individual level phenotypes matches closely with ΣZ estimated from summary statistics (results not shown). We then run CONFIT with the correction for overlapping individuals described in Section 2.7.

Again, we see that CONFIT achieves slightly less power than MI GWAS when the effect is present in one trait, and increased power when the effect is present in more than one trait (Tables 3 and 4). The rejection region for CONFIT is now shifted relative to the rejection region for CONFIT without the overlapping individuals assumption, as shown in Figure 1B.

3.4 CONFIT finds unique loci for metabolic traits in the NFBC

Next, we applied CONFIT to a real dataset of metabolic traits from the NFBC (Kang et al., 2010; Sabatti et al., 2009). This dataset contains 331 476 variants and 5326 individuals, with data collected on ten traits from each individual. These traits include a variety of metabolic traits. We selected the five traits with at least one SNP with a GWAS P-value less than $10^{-4}$ in two or more traits and ran CONFIT on their summary statistics. These traits were measurements for glucose (GLU), HDL, insulin level (INS), LDL and TG. Note that for MI GWAS with five traits, the significance threshold is $1 \times 10^{-8}$ for the minimum GWAS P-value out of the five traits.

We used pyLMM (https://github.com/nickFurlotte/pylmm) to obtain GWAS summary statistics on the full NFBC cohort for each trait under a linear mixed model (LMM) as in Kang et al. (2010). Our GWAS results are consistent with those reported by a previous GWAS in the NFBC data also using LMMs (Kang et al., 2010). We report the univariate GWAS P-value in each trait as well as the CONFIT P-value in Table 6. For MI GWAS in five traits, the significance threshold is $1 \times 10^{-8}$.

Table 6. P-values of peak CONFIT SNPs in analysis of five metabolic traits in NFBC data (columns GLU to TG are univariate GWAS P-values)

| Found by | Chr | Position | rsID | GLU | HDL | INS | LDL | TG | CONFIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CONFIT only | 8 | 19875201 | rs10096633 | 4.5E–01 | 3.0E–06 | 4.1E–01 | 9.3E–01 | 1.9E–08 | 8.0E–10 |
| CONFIT only | 16 | 66570972 | rs255049 | 8.4E–01 | 1.7E–08 | 7.3E–01 | 1.7E–01 | 1.9E–01 | 2.0E–08 |
| MI GWAS only | 19 | 11056030 | rs11668477 | 8.3E–01 | 1.8E–02 | 1.4E–02 | 3.5E–09 | 1.7E–02 | 6.4E–08 |
| Both CONFIT and MI GWAS | 1 | 109620053 | rs646776 | 8.8E–01 | 1.2E–01 | 1.0E–01 | 3.0E–15 | 7.6E–01 | <2.0E–10 |
| Both CONFIT and MI GWAS | 2 | 21047434 | rs6728178 | 1.6E–01 | 6.7E–07 | 8.9E–01 | 4.8E–08 | 1.8E–07 | <2.0E–10 |
| Both CONFIT and MI GWAS | 2 | 27584444 | rs1260326 | 2.4E–01 | 2.6E–01 | 3.2E–01 | 2.1E–01 | 1.9E–10 | 2.0E–10 |
| Both CONFIT and MI GWAS | 2 | 169471394 | rs560887 | 6.9E–13 | 8.8E–01 | 9.9E–01 | 3.8E–01 | 6.2E–01 | <2.0E–10 |
| Both CONFIT and MI GWAS | 7 | 44177862 | rs2971671 | 4.4E–09 | 9.0E–01 | 2.4E–01 | 5.9E–01 | 5.4E–01 | 8.6E–09 |
| Both CONFIT and MI GWAS | 11 | 92308474 | rs3847554 | 2.4E–10 | 3.5E–01 | 1.3E–02 | 6.2E–01 | 5.9E–01 | 8.0E–10 |
| Both CONFIT and MI GWAS | 15 | 56470658 | rs1532085 | 2.3E–01 | 7.2E–12 | 5.1E–01 | 5.6E–01 | 8.8E–02 | <2.0E–10 |
| Both CONFIT and MI GWAS | 16 | 55550825 | rs3764261 | 4.4E–01 | 1.0E–32 | 7.5E–01 | 2.8E–01 | 1.2E–01 | <2.0E–10 |

Notes: Table contains loci found significant by CONFIT or MI GWAS. The traits used in the analysis are glucose (GLU), HDL, insulin (INS), LDL and triglyceride (TG) levels.

Table 7. P-values of peak CONFIT SNPs in analysis of four metabolic traits in NFBC dataset (columns CRP to TG are univariate GWAS P-values)

| Found by | Chr | Position | rsID | CRP | HDL | LDL | TG | CONFIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CONFIT only | 8 | 19875201 | rs10096633 | 3.9E–01 | 3.0E–06 | 9.3E–01 | 1.9E–08 | 4.0E–09 |
| CONFIT only | 16 | 66570972 | rs255049 | 7.8E–01 | 1.7E–08 | 1.7E–01 | 1.9E–01 | 4.2E–08 |
| Both CONFIT and MI GWAS | 1 | 109620053 | rs646776 | 1.4E–01 | 1.2E–01 | 3.0E–15 | 7.6E–01 | <2.0E–10 |
| Both CONFIT and MI GWAS | 1 | 157908973 | rs1811472 | 1.2E–15 | 4.8E–02 | 6.1E–01 | 8.7E–01 | <2.0E–10 |
| Both CONFIT and MI GWAS | 2 | 21047434 | rs6728178 | 5.3E–02 | 6.7E–07 | 4.8E–08 | 1.8E–07 | <2.0E–10 |
| Both CONFIT and MI GWAS | 2 | 27584444 | rs1260326 | 5.1E–02 | 2.6E–01 | 2.1E–01 | 1.9E–10 | 2.4E–09 |
| Both CONFIT and MI GWAS | 12 | 119873345 | rs2650000 | 2.2E–12 | 2.8E–01 | 6.8E–01 | 6.0E–01 | <2.0E–10 |
| Both CONFIT and MI GWAS | 15 | 56470658 | rs1532085 | 7.1E–01 | 7.2E–12 | 5.6E–01 | 8.8E–02 | <2.0E–10 |
| Both CONFIT and MI GWAS | 16 | 55550825 | rs3764261 | 3.2E–01 | 1.0E–32 | 2.8E–01 | 1.2E–01 | <2.0E–10 |
| Both CONFIT and MI GWAS | 19 | 11056030 | rs11668477 | 8.7E–01 | 1.8E–02 | 3.5E–09 | 1.7E–02 | 3.4E–08 |

Notes: Table contains peak CONFIT SNPs for loci found significant by CONFIT or MI GWAS. In the original table, italics mark the only locus found significant by Furlotte and Eskin (2015) in their joint analysis of all four traits (the Chr 1 locus containing rs1811472).

Table 8. P-values of peak SNPs in analysis of four metabolic traits in UKKB dataset (the four phenotype columns are univariate GWAS P-values)

| Found by | Chr | Position | rsID | High cholesterol | Cholesterol medication | Insulin medication | Elevated blood GLU | CONFIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CONFIT only | 3 | 135925191 | rs1154988 | 5.2E–08 | 9.8E–07 | 7.2E–01 | 2.3E–01 | 5.6E–09 |
| CONFIT only | 7 | 73020301 | rs799157 | 3.2E–08 | 3.1E–05 | 5.4E–01 | 7.3E–01 | 3.9E–08 |
| CONFIT only | 7 | 150690176 | rs3918226 | 3.0E–08 | 3.0E–07 | 3.9E–01 | 1.9E–01 | 1.0E–09 |
| CONFIT only | 10 | 94772638 | rs10748588 | 2.3E–07 | 1.5E–06 | 8.7E–01 | 3.1E–01 | 3.0E–08 |
| CONFIT only | 11 | 126225876 | rs112771035 | 5.9E–07 | 4.2E–06 | 3.6E–02 | 8.0E–01 | 4.8E–08 |
| CONFIT only | 20 | 17844492 | rs2618567 | 1.6E–08 | 6.0E–07 | 3.9E–01 | 2.4E–01 | 1.0E–09 |

Notes: Table contains peak SNPs found significant by CONFIT only (CONFIT P-value < 5E–08). SNPs found significant by MI GWAS only are shown in the Supplementary Material.

CONFIT finds two unique loci in the NFBC data compared to MI GWAS. One of these loci (Chr 16, peak SNP rs255049) is significant for HDL under a univariate GWAS threshold, and the other locus (Chr 8, peak SNP rs10096633) has been associated with TG in a larger study from 2010 (Kamatani et al., 2010). CONFIT missed one locus found by MI GWAS only (Chr 19, peak SNP rs11668477), which is GWAS significant for LDL only according to the P-values in Table 6.

3.5 CONFIT outperforms a multi-variate linear regression model when applied to multiple traits

Next, we compared the performance of CONFIT against another multi-trait analysis method. Previously, Furlotte and Eskin applied multi-variate regression with a LMM (implemented in their software mvLMM) to the NFBC dataset using four traits: C-reactive protein (CRP), HDL, LDL and TG (Furlotte and Eskin, 2015). When applying mvLMM to CRP, HDL, LDL and TG simultaneously, they found only one significant locus, which contains SNPs rs1811472, rs2794520, rs2592887 and rs12093699.

We applied CONFIT to the NFBC dataset in these same four traits, again using GWAS summary statistics generated by pyLMM. Results are shown in Table 7. CONFIT finds this locus, as well as nine other loci which were all reported in the univariate LMM analysis performed by Kang et al. (2010). CONFIT discovers the same loci in these four traits as in the analysis on GLU, HDL, INS, LDL and TG, with the exception of the GLU-specific loci. It also finds a locus (Chr 19, rs11668477) that it missed in the five-trait analysis. Although CONFIT can discover SNPs with effects present in only a subset of traits in the analysis, the specific traits chosen will affect its performance.

3.6 CONFIT finds unique loci in the UKKB dataset

We also applied CONFIT to UKKB summary statistics publicly released by the Neale lab. We selected four traits related to the metabolic traits we used in the NFBC data: self-reported high cholesterol (phenotype code 20002_1473), use of cholesterol lowering medication (phenotype code 6177_1), use of insulin medication (phenotype code 6177_3) and diagnosis of elevated blood glucose level (phenotype code R73, ICD10 R73). CONFIT finds 6 unique loci (Table 8), MI GWAS finds 44 unique loci (shown in Supplementary Material) and 304 loci are found by both methods (not shown). The loci found only by CONFIT are all close to GWAS significance in both the self-reported high cholesterol and use of cholesterol medication phenotypes, whereas the loci it fails to discover are mostly borderline GWAS significant in a single trait (Supplementary Table S1).

4 Discussion

Here, we present CONFIT, a method for detecting associated variants from independent GWAS in multiple traits using summary statistics. We demonstrate our method in simulated data on two and three traits, and on real data in up to five traits, though this framework may be applied to larger numbers of traits. CONFIT controls the false positive rate and increases power relative to MI GWAS when the variant is active in multiple traits in the analysis. When the variant is only active in one trait, CONFIT is less powerful than MI GWAS, which is the standard method for analyzing independent traits, so CONFIT does not discover exactly the same SNPs as GWAS. We discover unique loci when applying CONFIT to summary statistics from the NFBC and UKKB datasets.

A related problem exists in the field of eQTL studies, which often collect gene expression data from individuals in multiple tissues. In this case, the phenotypes are a given gene’s expression levels in each tissue, and the problem is to find variants associated with the gene’s expression in at least one tissue. Several approaches have successfully increased power in these multi-tissue eQTL datasets. Examples include MetaTissue (Sul et al., 2013), RECOV (Duong et al., 2017) and eQTL-bma (Flutre et al., 2013). MetaTissue uses RE meta-analysis to combine data from different tissues. RECOV explicitly models correlation between studies using a covariance matrix. eQTL-bma uses configurations to allow heterogeneity and performs Bayesian model averaging using each potential configuration as a model. We note the similarity of our test statistic to that of eQTL-bma, which was developed by Flutre et al. specifically for the multi-tissue eQTL context (Flutre et al., 2013). A variant is an eQTL if it is associated with the expression of any gene in any tissue, which is quite likely when there are a large number of tissues. For this reason, methods developed for multi-tissue eQTL studies differ from those for traditional GWAS in that eQTL studies typically do not assume a sparse model. In contrast, the majority of variants are believed to have no effect on the majority of disease traits. Hence it is not obvious whether multi-phenotype analysis methods for eQTL studies are also applicable to GWAS. Our results suggest they may be applicable.

The CONFIT framework is general and there are many options for setting the priors on each configuration. Here, we used a relatively simple method to estimate the priors by counting the number of SNPs with GWAS summary statistics that match each configuration. One alternative is to formulate this as an optimization problem and select priors that explicitly maximize power, with some form of regularization to avoid overfitting. Another possibility is to use external information about the variants to set the prior. This has been done previously in eQTL data, where variants in regulatory regions receive a stronger prior for association (Duong et al., 2016).
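A count-based prior of this kind can be sketched as follows. This is one plausible reading rather than the exact procedure used by CONFIT: here a configuration is a tuple of booleans recording in which traits a SNP passes a P-value cutoff, and the cutoff, the pseudocount and the function name are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def configuration_priors(p_values_by_snp, p_cutoff=1e-4, pseudocount=1.0):
    """Estimate a prior over configurations by counting SNPs.

    p_values_by_snp: list of per-SNP lists of P-values, one entry per trait.
    A SNP 'matches' the configuration given by which of its P-values fall
    below p_cutoff (hypothetical rule). A pseudocount keeps unseen
    configurations from receiving a prior of exactly zero.
    """
    n_traits = len(p_values_by_snp[0])
    counts = Counter(
        tuple(p < p_cutoff for p in row) for row in p_values_by_snp
    )
    # All 2**n_traits configurations, including the all-null one.
    configs = list(product((False, True), repeat=n_traits))
    total = len(p_values_by_snp) + pseudocount * len(configs)
    return {c: (counts[c] + pseudocount) / total for c in configs}


# Toy summary statistics for four SNPs in two traits (made-up values).
toy_p_values = [
    [0.5, 0.5],    # no signal
    [1e-6, 0.5],   # signal in trait 1 only
    [1e-6, 1e-7],  # signal in both traits
    [0.2, 0.9],    # no signal
]

priors = configuration_priors(toy_p_values)
for config, prior in priors.items():
    print(config, prior)
```

The dictionary has 2^T entries for T traits, which makes concrete why the counts per configuration thin out quickly as traits are added.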

The count-based prior used here does not scale well as the number of traits grows. The number of possible configurations grows exponentially with the number of traits, so the probability of observing any particular configuration amongst the GWAS statistics decreases sharply and the counts become less and less informative. From a computational viewpoint, the runtime of CONFIT also grows exponentially with the number of traits. For these reasons, we do not recommend running CONFIT on more than 10 traits. If the user has a large set of candidate traits, they may narrow down which traits to include in the analysis by choosing sets of traits with overlapping GWAS significant SNPs. One may use the Jaccard index to measure overlap between traits while also accounting for the fact that one trait may simply have more significant SNPs than other traits.
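As an illustration of the trait-selection heuristic, the Jaccard index between the sets of GWAS significant SNPs of two traits is the size of their intersection divided by the size of their union. The sketch below uses made-up SNP identifiers and a hypothetical function name; it is not part of the CONFIT software.

```python
def jaccard_index(snps_a, snps_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of SNP identifiers.

    Returns 0.0 when both collections are empty, so that traits with no
    significant SNPs are treated as non-overlapping.
    """
    a, b = set(snps_a), set(snps_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Made-up sets of GWAS significant SNPs for three traits.
significant = {
    "trait_1": {"snp_a", "snp_b", "snp_c"},
    "trait_2": {"snp_b", "snp_c", "snp_d"},
    "trait_3": {"snp_e"},
}

traits = sorted(significant)
for i, first in enumerate(traits):
    for second in traits[i + 1:]:
        score = jaccard_index(significant[first], significant[second])
        print(first, second, score)
```

Dividing by the union rather than by the size of either set is what keeps a trait with many significant SNPs from appearing to overlap strongly with every other trait.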

It is common for GWAS datasets to share individuals between studies. For example, a study may collect both LDL and triglyceride levels from each individual, or controls may be shared across multiple case-control studies. CONFIT handles the case where the studies use the same cohort by approximating the correlation between traits due to sharing of individuals as proportional to the correlation between traits or association statistics. This assumes the effect and residuals are approximately independent, and that any individual SNP or LD block has a small effect on the phenotype. In this paper, we assume heritability of 50% when estimating this correlation, but a more sophisticated approach would be to use trait-specific heritability estimates. There are also many other methods to address the issue of overlapping individuals. For example, MetaTissue uses LMMs to model effects in multiple studies with shared individuals (Sul et al., 2013). Although their method was designed for multi-tissue eQTL studies, a similar LMM approach could be applied to combine GWAS. This approach has the advantage of estimating the proportion of the phenotype that can be attributed to sharing of individuals, and applies even if there is only partial overlap between studies. However, it requires individual level data and is relatively computationally expensive.

Several methods for analyzing multiple traits require individual level genotype and phenotype data, such as multi-variate regression. Several methods, such as GEMMA-mvLMM, mvLMM and GAMMA, extend this to use LMMs, which allow for correction of population structure and other covariates (Furlotte and Eskin, 2015; Joo et al., 2016; Zhou and Stephens, 2014). As with traditional meta-analysis, multi-variate regression is not suitable for combining data on arbitrary traits and may achieve sub-optimal power for detecting variants that only affect one of the traits tested, or in the case where the variant only affects one trait, which indirectly affects another (Stephens, 2013). Such methods are typically applied to sets of traits that are already believed to share an underlying genetic basis (Furlotte and Eskin, 2015). Thus there is a need for flexible approaches to association testing when the traits only partially share a genetic basis and the study cohorts are not independent between traits.

Funding

L.G. and E.E. were supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676, 1065276, 1302448, 1320589 and 1331176 and National Institutes of Health grants K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-ES021801, R01-MH101782 and R01-ES022282.

Conflict of Interest: none declared.

References

Andreassen O.A. et al. (2015) Genetic pleiotropy between multiple sclerosis and schizophrenia but not bipolar disorder: differential involvement of immune-related gene loci. Mol. Psychiatry, 20, 207–214.

Berndt S.I. et al. (2016) Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat. Commun., 7, 10933.

Cai N. et al. (2015) Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature, 523, 588–591.

Chen L. et al. (2016) Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell, 167, 1398–1414.e24.

Chesler E.J. et al. (2005) Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet., 37, 233–242.

Consortium T.I.H. (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.

Cross-Disorder Group of the Psychiatric Genomics Consortium. (2013) Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381, 1371–1379.

Devlin B., Roeder K. (1999) Genomic control for association studies. Biometrics, 55, 997–1004.

Dorn G.W., Cresci S. (2009) Genome-wide association studies of coronary artery disease and heart failure: where are we going? Pharmacogenomics, 10, 213–223.

Duong D. et al. (2016) Using genomic annotations increases statistical power to detect eGenes. Bioinformatics, 32, i156–i163.

Duong D. et al. (2017) Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes. Bioinformatics, 33, i67–i74.

Eskin E. (2015) Discovering genes involved in disease and the mystery of missing heritability. Commun. ACM, 58, 80–87.

Fleiss J. (1993) Review papers: the statistical basis of meta-analysis. Stat. Meth. Med. Res., 2, 121–145.

Flutre T. et al. (2013) A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet., 9, e1003486.

Furlotte N.A., Eskin E. (2015) Efficient multiple-trait association and estimation of genetic correlation using the matrix-variate linear mixed model. Genetics, 200, 59–68.

Han B., Eskin E. (2012) Interpreting meta-analyses of genome-wide association studies. PLoS Genet., 8, e1002555.

Hyde C.L. et al. (2016) Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat. Genet., 48, 1031–1036.

Joo J.W.J. et al. (2016) Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure. Genetics, 204, 1379–1390.

Kamatani Y. et al. (2010) Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet., 42, 210–215.

Kang H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat. Genet., 42, 348–354.

Lee J.-Y. et al. (2013) A genome-wide association study of a coronary artery disease risk variant. J. Human Genet., 58, 120–126.

Liu G. et al. (2016) Cis-eQTLs regulate reduced LST1 gene and NCR3 gene expression and contribute to increased autoimmune disease risk: table 1. Proc. Natl. Acad. Sci., 113, E6321–E6322.

McCarthy M.I. et al. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet., 9, 356–369.

Nikpay M. et al. (2015) A comprehensive 1000 genomes–based genome-wide association meta-analysis of coronary artery disease. Nat. Genet., 47, 1121–1130.

Pe’er I. et al. (2008) Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol., 32, 381–385.

Postmus I. et al. (2016) Meta-analysis of genome-wide association studies of HDL cholesterol response to statins. J. Med. Genet., 53, 835–845.

Sabatti C. et al. (2009) Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat. Genet., 41, 35–46.

Solovieff N. et al. (2013) Pleiotropy in complex traits: challenges and strategies. Nat. Rev. Genet., 14, 483–495.

Stephens M. (2013) A unified framework for association analysis with multiple related phenotypes. PLoS One, 8, e65245.

Sul J.H. et al. (2013) Effectively identifying eQTLs from multiple tissues by combining mixed model and meta-analytic approaches. PLoS Genet., 9, e1003491.

Zeggini E., Ioannidis J.P. (2009) Meta-analysis in genome-wide association studies. Pharmacogenomics, 10, 191–201.

Zhou X., Stephens M. (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Meth., 11, 407–409.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com