-
PDF
- Split View
-
Views
-
Cite
Cite
Aika Terada, Ryo Yamada, Koji Tsuda, Jun Sese, LAMPLINK: detection of statistically significant SNP combinations from GWAS data, Bioinformatics, Volume 32, Issue 22, November 2016, Pages 3513–3515, https://doi.org/10.1093/bioinformatics/btw418
- Share Icon Share
Summary: One of the major issues in genome-wide association studies is to solve the missing heritability problem. While considering epistatic interactions among multiple SNPs may contribute to solving this problem, existing software cannot detect statistically significant high-order interactions. We propose software named LAMPLINK, which employs a cutting-edge method to enumerate statistically significant SNP combinations from genome-wide case–control data. LAMPLINK is implemented as a set of additional functions to PLINK, and hence existing procedures with PLINK can be applicable. Applied to the 1000 Genomes Project data, LAMPLINK detected a combination of five SNPs that are statistically significantly accumulated in the Japanese population.
Availability and Implementation: LAMPLINK is available at http://a-terada.github.io/lamplink/.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Background
Genome-wide association studies (GWASs) have identified hundreds of loci associated with various complex human traits (Welter et al., 2014). These studies conduct screening of individual single nucleotide polymorphisms (SNPs) using statistical tests to assess the association of each SNP with a phenotype. However, this procedure is known to cause the ‘missing heritability’, namely, a large proportion of heritability remains unexplained by loci identified (Maher, 2008), hence it is increasingly important to evaluate combinatorial effects of SNPs (Wei et al., 2014).
Several types of software have been developed to detect interactions among SNPs related to a phenotype (Purcell et al., 2007; Zhang and Liu, 2007; Calle et al., 2010; Wan et al., 2010; Kam-Thong et al., 2012; Van Lishout et al., 2013). However, few methods can simultaneously overcome two major problems. One is that statistical validity is not performed. Most methods that can enumerate higher-order interactions do not evaluate statistical significance of the results. The other is that a combination size is limited in practical application. Existing statistical techniques such as logistic regression and multifactor dimensionality reduction can be used to find combinatorial effects. When we investigate all combinatorial effects, these techniques have to be applied to all possible combinations, which is too computationally intensive. Both problems need to be overcome if high-order interaction analysis is to be successfully performed.
A recently proposed statistical method called Limitless Arity Multiple-testing Procedure (LAMP) (Terada et al., 2013) provides a possibility of detecting statistically significant higher-order interactions. LAMP is a multiple testing procedure for listing statistically significant combinatorial effects by introducing a theoretical upper bound of family-wise error rate tighter than Bonferroni correction. Its application to GWAS analysis may uncover synergistic effects of SNPs associated with diseases.
We therefore developed LAMPLINK, a software that incorporates LAMP with a widely used GWAS analysis software PLINK (Purcell et al., 2007). Applied to the 1000 Genomes Project data, it detected a combination of five SNPs accumulated in the Japanese population with statistical significance.
2 Methods and implementation

Overview of LAMPLINK. (a) Workflow to detect statistically significant SNP combinations. (b) Two significant combinations including three and five SNPs (IDs 3 and 10 in Supplementary Table S3) detected in Procedure 1. Each petal corresponds to the P-value of a single SNP, and the central circle represents that of the SNP combination. Color shows the adjusted P-values. The P-value of the combination was smaller than the P-values of any single SNP, suggesting the existence of an epistatic effect among the three SNPs. (c) Detected combination after Procedure 2. ID 10 has been eliminated because it includes pairs of SNPs whose . (d) Manhattan plot of P-values from the test of the association between the Japanese population and other populations. Crosses represent significant SNP combinations in (b). The horizontal line indicates the adjusted significance level (5.49e-07)
2.1 Detection of statistically significant SNP combinations
The ––lamp option with ––model-dom (or ––model-rec) can be used for enumerating statistically significant SNP combinations (Procedure 1 in Fig. 1a). The input and output filenames are specified with the ––file (or ––bfile for binary format) and ––out options, respectively. When you set ––model-dom, LAMPLINK detects statistically significant combinations of SNPs according to a dominant exclusive model, whereas ––model-rec uses a recessive exclusive model. These two genetic models are defined in Supplementary Text.
LAMPLINK results are exported to files: ‘<lamp_out_filename >.lamp’ and ‘<lamp_out_filename >.lamplink’. The former file reports all SNP combinations statistically significantly associated with the phenotype. The latter file reports detailed information about each SNP in a format similar to the result generated by PLINK for association analysis. All columns of the result files are listed in Supplementary Table S2.
2.2 Elimination of redundant SNP combinations
Procedure 1 may end up listing combinations of SNPs that are in the same linkage disequilibrium (LD) region, which may prevent understanding of SNP–phenotype associations. The ––lamp-ld-remove option is useful to filter out uninformative combinations (Procedure 2 in Fig. 1a). Using this option eliminates SNP combinations whose members have r2 higher than the user-specified threshold, on the assumption that they are located in the same LD region. If all r2 scores computed for SNP pairs in each chromosome are higher than the threshold, the combination is removed.
3 Analysis of exome data
We applied LAMPLINK to human exome data provided by the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2010), including 12 758 SNPs and 697 individuals from seven populations. We demonstrated a case–control study by regarding 105 Japanese individuals as cases and the remaining as controls. We detected combinations of SNPs accumulated in Japanese with statistical significance. All of the settings and commands for this demonstration are described in Supplementary Text. The experiments were run on a machine with an Intel Xeon E5-2680v2 processor at 2.6 GHz running Red Hat Enterprise Linux 6.4.
We compared the time performance of LAMPLINK with the ––epistatic option in PLINK, which exhaustively analyzes the relationship of pairs of SNPs to a phenotype. The calculation time of LAMPLINK was 21.281 s, whereas PLINK required over 150 min to investigate all pairs of SNPs, showing that LAMPLINK has the ability to identify combinatorial effects of SNPs within a short time despite investigating all possible combinations of SNPs. A detailed time performance analysis of LAMPLINK is provided in Supplementary Text.
Procedure 1 detected 106 statistically significant SNP combinations, including 10 SNP combinations that consisted of three or more SNPs (Supplementary Table S3). These combinations could not be detected by PLINK.
Figure 1(b) illustrates two statistically significant combinations (IDs 3 and 10 in Supplementary Table S3). ID 3 consisted of three SNPs located within the genes PCDHGA1, VPS13C and VNN3. These SNPs are located within different genes on different chromosomes (Fig. 1d). ID 10 consisted of five SNPs. Four SNPs are located within the same gene MTRR, and hence this combination is eliminated in Procedure 2 (Fig. 1c). We discuss these results in detail in Supplementary Text.
These two results show that LAMPLINK has the ability to detect statistically significant SNP combinations from genome-wide case–control data. By replacing the phenotype with a disease, it might be possible to identify causal mutations of complex diseases. LAMPLINK is the first implementation that can detect statistically sound high-order interactions from tens of thousands of markers. Hence, LAMPLINK may contribute to the identification of combinatorial effects from multiple markers by re-analysis of existing GWAS datasets.
4 Future work
LAMPLINK currently supports two genetic models (dominant and recessive exclusive models), but it cannot handle the combination of recessive and dominant models (known as the jointly recessive-dominant model for two loci; Li and Reich, 2000) due to a theoretical limitation in LAMP. Future work includes supporting the jointly recessive-dominant model as well as the threshold (Greenberg, 1981) and additive models (Neuman and Rice, 1992), which may help solving the problem of missing heritability.
We also plan to incorporate other statistical models into LAMPLINK for analyzing various types of data. For example, statistical assessment using the Mann–Whitney U-test or a regression model is useful to analyze numerical traits data. Additionally, LAMP has been developed to avoid spurious results caused by a confounding variable (e.g. age or gender of patients) (Terada et al., 2016). Incorporating these methods will greatly improve the versatility of our software.
Acknowledgement
We thank Assist Prof. David duVerle and Dr. Raissa Relator for insightful comments and suggestions. Supercomputing resources were provided by NIG (ROIS).
Funding
This work was supported by Precursory Research for Embryonic Science and Technology (PRESTO), Japan Science and Technology Agency (JST) to A.T.; Core Research for Evolutional Science and Technology (CREST), JST to R.Y., K.T. and J.S.; and KAKENHI [14469361 to R.Y., 15H05711 to K.T. and 15H01717, 15H05713 and 24240044 to J.S.].
Conflict of Interest: none declared.
References
Author notes
Associate Editor: Oliver Stegle