PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics

Abstract Background Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to much individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. Two state-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using only genome-wide association study (GWAS) summary results. Results Here, we present an integrated R toolkit, PhenoSpD, to use LD score regression to estimate phenotypic correlations using GWAS summary statistics and to utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest that it is possible to identify nonindependence of phenotypes using samples with partial overlap; as overlap decreases, the estimated phenotypic correlations will attenuate toward zero and multiple testing correction will be more stringent than in perfectly overlapping samples. Also, in contrast to LD score regression, metaCCA will provide approximate genetic correlations rather than phenotypic correlation, which limits its application for multiple testing correction. In a case study, PhenoSpD using UK Biobank GWAS results suggested 399.6 independent tests among 487 human traits, which is close to the 352.4 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to an estimated 5,618 pair-wise phenotypic correlations among 107 metabolites using GWAS summary statistics from Kettunen's publication and PhenoSpD suggested the equivalent of 33.5 independent tests for these metabolites. Conclusions PhenoSpD extends the use of summary-level results, providing a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics. This is particularly valuable in the age of large-scale biobank and consortia studies, where GWAS results are much more accessible than individual-level data.


Abstract
Background: Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. State-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using genome-wide association study (GWAS) summary statistics.
Results: Here, we present an integrated R toolkit, PhenoSpD, to 1) apply metaCCA (or LD score regression) to estimate phenotypic correlations using GWAS summary statistics; and 2) to utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest it is possible to estimate phenotypic correlation using samples with only a partial overlap, but as overlap decreases correlations will attenuate towards zero and multiple testing correction will be more stringent than in perfectly overlapping samples. In a case study, PhenoSpD using GWAS results suggested 324.4 independent tests among 452 metabolites, which is close to the 296 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to estimated 7,503 pair-wise phenotypic correlations among 123 metabolites using GWAS summary statistics from Kettunen et al. and PhenoSpD suggested 44.9 number of independent tests for theses metabolites. Conclusion: PhenoSpD integrates existing methods and provides a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics, which is particularly valuable for post-GWAS analysis of complex molecular traits.
A v a i l a b i l i t y : R code and documentation for PhenoSpD V1.0.0 is available online (https://github.com/MRCIEU/PhenoSpD).

Introduction
Phenotypic correlations between complex human traits and diseases provide useful etiological insights. For GWAS metaanalysis, a lack of individual-level phenotype data makes it difficult to estimate the phenotypic correlation across human traits and diseases. Here we consider two methods that estimate phenotypic correlations using GWAS summary statistics: metaCCA (Cichonska et al., 2016) and bivariate LD score regression (Bulik-Sullivan et al., 2015b). The metaCCA framework estimates phenotypic correlation between two traits based on a Pearson correlation between two univariate regression coefficients (betas) across a set of genetic variants; The bivariate LD score regression approach estimates the phenotypic correlation amongst the overlapping samples of two GWAS.
The recently developed MR-Base (Hemani et al.2016) and LD Hub (Zheng et al., 2017) tools include harmonized GWAS summary-level results. This provides an opportunity to estimate the phenotypic correlation structure across a wide range of high-dimensional, complex molecular traits, such as metabolites, that are potentially highly correlated. Bonferroni correction would markedly overcorrect for the inflated false-positive rate in such correlated datasets, resulting in a reduction in power. An appropriate method to correct for multiple testing among human traits and diseases based on the spectral decomposition of matrices (SpD) (Nyholt, 2004;Li and Ji, 2005) is considered in this study. Figure 1 illustrates the key steps of the proposed pipeline, PhenoSpD: step (1) harmonise GWAS summary results fro the same sample; step (2) apply the harmonized GWAS results to metaCCA or LD score regression to estimate the ph notypic correlation matrix of the traits; step (3) apply the phenotypic correlation matrix to the SpD approach and estim the number of independent variables among the traits.

Simulation and validation of phenotypic correlation estimation
Firstly, we simulated the influence of the number of single nucleotide polymorphisms (SNPs), sample sizes of two GWASs and sample overlap between two GWASs on the accuracy of the phenotypic correlation estimation. As shown Figure 2, we first created two samples A and B with different number of individuals (from 300 to 10,000 individuals), where the sample overlap between sample A and B ranged from 10% to 90%. Within each sample, we further assume complex human traits were influenced by both genetic and environmental factors. We simulated the phenotype data o two correlated human traits (phenotype 1 and phenotype 2) based on varying numbers of genetic factors (ranging from to 10,000 SNPs) and 100 environmental factors. We then simulated the genotype data in each sample. After simulatin the two phenotypic traits and the genotypic data in sample A and B, we then conducted four GWASs (GWASs of phe type 1 in sample A and B; GWASs of phenotype 2 in sample A and B) and recorded the summary statistics of these from phetimate wn in ls), me of om 10 ing heno-GWASs. To measure the accuracy of phenotypic correlation using GWAS summary statistics, we (1) calculated the observational phenotypic correlation (the Pearson correlation) of trait 1 and trait 2 in sample A and B separately; (2) estimated the phenotypic correlation of trait 1 and trait 2 using GWAS summary statistics of sample A and B separately. We simulated step (2) 100 times and estimate the mean and standard deviation of the estimated phenotypic correlations. Finally, we compared the estimated phenotypic correlation with the observational phenotypic correlation and recorded the deviation between observed and estimated correlations. To demonstrate the simulation systematically, we set up 4 groups of comparisons: (i) tested the influence of sample size; (ii) simulated the influence of sample overlap; (iii) estimated the influence of unbalanced sample size in sample A and B; (iv) tested the influence of number of SNPs. The R script for this simulation were provided as an supplementary file (simulation.R). We then tested the accuracy of phenotypic correlation estimation using GWAS summary statistics of 452 metabolites from Shin's (Shin et al, 2014). Shin et al. also reported the observational phenotypic correlation in the supplementary table (Table S4), which was used as bench mark of our real case accuracy test. Based on the simulation and real case validation, we listed our traits selection criteria in Table S1.

Figure 2.
Demonstration of the simulation. For two samples A and B, we simulated the genotype data and pheno type data of two correlated human traits, phenotype 1 and phenotype 2. The sample overlap between sample A and B were ranged from 10% to 90% in this simulation.

Estimating the phenotypic correlations
Within our GWAS summary results database containing 1094 human traits, we selected 123 metabolites from Kettunen et al as a real case application (Kettunen et al, 2016) since these complex molecular traits are potentially highly correlat ed. We then applied metaCCA to these 123 metabolites to estimate the phenotypic correlation matrix (Table S2). Among the 123 metabolites, we further applied LD score regression to 107 of them (Table S3), which fit the assumptions of LD score regression (traits with large sample size (e.g. N > 5,000), good SNP coverage (e.g. number of SNPs > 200,000) and heritable (e.g. SNP heritability > 0.05)).

Multiple testing correction for human traits
We applied the SpD approach to correct for multiple testing among the 123 metabolites. We implemented the R code o the well-known method, SNPSpD (Nyholt, 2004;Li and Ji, 2005), to estimate the number of independent traits using the phenotypic correlation matrix as input (Fig. 1). The output of the SpD function is the estimated the number of independ ent tests. Table 1 show the influence of number of SNPs, sample sizes of two GWASs and sample overlap between two GWASs on the accuracy of the phenotypic correlation estimation. We found that the accuracy of phenotypic correlation estimation is mainly influenced by the number of overlapped individual in two GWAS studies. For example, the deviation between observed and estimated phenotypic correlation (Deviation_obs_est) improved from 82.1% to 5.8% when the percentage of sample overlap between two samples increased from 10% to 90%. Base on this simulation, we only applied metaCCA to GWASs from the same study to maximize the sample overlap between GWASs. In addition, we observed that the number of SNPs in GWAS will also influence the accuracy of the phenotypic correlation estimation. We should include as many SNPs as possible to maximize the accuracy of the estimation. Besides, we did not observe any major influence of the sample size of the GWAS on the accuracy of the estimation, so we included metabolites GWASs with various o sample sizes.  In this simulation, we compared the agreements of the observational (calculated from phenotypes) and estimated phenotypic correlation ( We also tested the accuracy of phenotypic correlation estimation by comparing the observed phenotypic correlations (Ta ble S4) and the estimated phenotypic correlation (Table S5) using real data from Shin's (Shin et al, 2014). Figure 3 showed the estimated phenotypic correlations have a very high agreement with the observed phenotypic correlations (r =0.89). The only exception is that some metabolites with large observed correlation have estimated correlation towards null.

Evaluation of phenotypic correlation estimation using simulated and real GWAS summary data
To further investigate this exception, we measured the difference between observed and estimated phenotypic correla tions by using the sum of errors of phenotypic correlation (the sum of the differences between one metabolites on the res metabolites). As show in Figure 4, the points with high level of errors (disagreements) are metabolites with limited sam ple size. The metabolites with limited sample size also have a limited number of sample overlap with other metabolites which drive the phenotypic correlation estimation towards null.  ae 3 (r 2 rds arest mtes, ach s of one A practical comparison between metaCCA and LD score regression on estimating phenotypic correlation Both metaCCA and LD score regression has its advantages and limitations on estimating phenotypic correlation. In this section, we summarized the practical difference between the two to inform the PhenoSpD users on how to choose the appropriate methods. MetaCCA can be applied to almost all GWASs (e.g. in our simulation, the sample size>300 and the number of SNPs>1000). However, 1) the genetic effect of SNPs may bias the phenotypic correlation estimation; 2) it only provides the central estimate of the phenotypic correlation; 3) it is difficult to quantify the effect of sample overlap. LD score regression is designed to estimate genetic correlation between a pair of human traits. As a side product, it also provides the pair-wise phenotypic correlation estimation with standard errors. It deals with sample overlap automatically (when there is no sample overlap between two GWASs, the phenotypic correlation estimation will be zero). However, its application is limited to traits with large sample size (e.g. N > 5,000), good SNP coverage (e.g. number of SNPs > 200,000) and heritable (SNP heritability at least > 0.05) to fit the assumptions of LD score regression (Bulik-Sullivan et al., 2015a).

The phenotypic correlations of the human metabolome
In a real case study, we applied both metaCCA and LD score regression to the human metabolome. We firstly estimated 108,978 pair-wise phenotypic correlations among these 123 metabolites from Kettunen et al (Kettunen et al, 2016) and 452 metabolites from Shin et al (Shin et al, 2014). More details of the metabolites were listed in Table S2. The phenotypic correlations estimated by metaCCA were presented in Table S5 and S6. Among the 123 metabolites from Kettunen et al, we further selected 107 of them (Table S3), which fit the assumptions of LD score regression analysis (details of the assumptions were listed in methods section). We then estimated 5,618 pair-wise correlations using LD score regression. The phenotypic correlation structure estimated by LD score regression was presented in Table S7. Table 2 shows the number of independent traits for two high-dimensional, complex metabolites datasets. PhenoSpD using GWAs results suggested 324.4 independent tests among 452 metabolites from Shin et al., which is close to 296 independent tests estimated using real phenotypic correlation. For metabolites from Kettunen et al, PhenoSpD suggested 44.9 number of independent tests for theses metabolites, which greatly reduced the dimensionality for these complex molecular traits. More details of the multiple testing correction are listed in Table S8.

Discussion
In this study, we present an integrative method which allows phenotypic correlation estimation and multiple testing correction for human phenome using GWAS summary statistics. We illustrate the application of PhenoSpD by estimating the phenotypic correlation structure of the correlation structure of 123 metabolites from Kettunen's study for the very first time (Kettunen et al, 2016). These results showcase the ability of PhenoSpD to estimate an appropriate multiple testing correction for complex molecular traits.

Advantages and limitations of PhenoSpD
There are two key advantages of PhenoSpD. Firstly, PhenoSpD integrated the phenotypic correlation estimation function of metaCCA and LD score regression, with the spectral decomposition of matrices, which provides a simple way of correcting multiple testing for the human phenome using only GWAS summary results. Secondly, such multiple testing correction is still stringent (since limited sample overlap between two GWASs will drive correlated traits towards null), but more appropriate than Bonferroni correction, which is particularly valuable for GWAS of complex molecular traits. It can also be used as an indicative threshold for post-GWAS data mining tools such as MR-Base and LD Hub. For limitation, PhenoSpD can only be used for human traits from the same sample, which is a general limitation of estimating phenotypic correlation using GWAS summary statistics.