A maximum kernel-based association test to detect the pleiotropic genetic effects on multiple phenotypes

Abstract Motivation Testing the association between multiple phenotypes with a set of genetic variants simultaneously, rather than analyzing one trait at a time, is receiving increasing attention for its high statistical power and easy explanation on pleiotropic effects. The kernel-based association test (KAT), being free of data dimensions and structures, has proven to be a good alternative method for genetic association analysis with multiple phenotypes. However, KAT suffers from substantial power loss when multiple phenotypes have moderate to strong correlations. To handle this issue, we propose a maximum KAT (MaxKAT) and suggest using the generalized extreme value distribution to calculate its statistical significance under the null hypothesis. Results We show that MaxKAT reduces computational intensity greatly while maintaining high accuracy. Extensive simulations demonstrate that MaxKAT can properly control type I error rates and obtain remarkably higher power than KAT under most of the considered scenarios. Application to a porcine dataset used in biomedical experiments of human disease further illustrates its practical utility. Availability and implementation The R package MaxKAT that implements the proposed method is available on Github https://github.com/WangJJ-xrk/MaxKAT.


Introduction
The advances in high-throughput technology allow researchers to investigate the occurrence, development and treatment of disease from the genetic perspective. Up to now, over ten thousands of disease-associated single-nucleotide polymorphisms (SNPs) have been detected via genome-wide association study (GWAS) based on single trait per variant analysis. However, GWAS suffers from power loss because of weak marginal effects, ignorance of interactions among genetic variants, and strict multiple comparison policy. To make up for these defects, the analysis based on gene sets is a good alternative, where genetic variables are grouped into different sets based on biological knowledge. Many gene-based tests for analysis between a single trait and multiple SNPs have been proposed. For example, Yu et al. (2009) proposed an adaptive rank truncated product test, Wu et al. (2011) developed a sequence kernel association test (SKAT), Pan et al. (2014) constructed an adaptive sum of powered U-score (aSPU) test, and Svishcheva et al. (2019) extended diverse powerful methods developed for gene-based association analysis to situations using the GWAS summary statistics as inputs.
There is growing evidence suggesting the wide existence of pleiotropic genetic effects, i.e. many SNPs are prone to be simultaneously associated with several human complex traits, especially those located in one gene or one pathway (Stearns 2010, Razeto-Barry et al. 2011, Zhang et al. 2017, Wu et al. 2022, Bu et al. 2023. For example, the gene coding for the enzyme phenylalanine hydroxylase not only relates to phenylketonuria, but also affects multiple phenotypes via diverse mutations within this gene, such as mental retardation, seizures, and hair pigmentation (Paul 2000). As a result, it is rational to conduct joint analysis of multiple phenotypic traits. The RV coefficient (Escoufier 1973), which generalizes the Pearson correlation coefficient to multivariate situation, is a useful tool for this issue and provides insight into linear relationships between two sets of variables. And to accommodate diverse association types, it has been extended to cases of nonlinear relationships by using two definite kernel functions instead of the linear kernel, and the corresponding test is known as the kernel-based association test (KAT). KAT has been widely used in genetic and genomic studies. For example, Broadaway et al. (2016) used KAT to test crossphenotype effects of rare variants, Chen et al. (2016) used the kernel-based regression model to conduct microbiome association studies, Lee et al. (2017) proposed a SKAT-based multivariate association procedure to identify rare variants related to multiple phenotypes, Zhan et al. (2017a) proposed a kernel-based RV statistic to investigate the association between host gene expression and microbiome composition, and Zhan et al. (2017b) generalized the KAT to test associations for high-dimensional structured phenotypes.
Despite its wide usage, KAT is subject to substantial power loss when multiple phenotypes are moderately to strongly correlated. To overcome this drawback, we propose a maximum kernel-based association test (MaxKAT). We first conduct eigenvalue decomposition on KAT and rewrite KAT as a weighted summation of angles between two eigenspaces. By adjusting the weights, we establish a series of tests to describe the association strength at different levels. Since the optimal test is unknown, we choose the most significant one among them as the final test. For convenience of calculation, a generalized extreme value distribution is used to calculate the statistical significance of MaxKAT under the null hypothesis. In addition, the proposed test can accommodate highdimensional data and yield high power against various alternative hypotheses. Extensive simulation studies show its superior performance over KAT via a wide range of scenarios. And application to a biomedical porcine dataset demonstrates its efficiency in practical situations.

Materials and methods
Suppose n individuals are randomly sampled from a source population. For each individual, data of m SNPs in a target gene or region on a chromosome andm phenotypes of interest are collected. Let X ¼ ðx 1 ; x 2 ; . . . ; x n Þ > and Y ¼ ðy 1 ; y 2 ; . . . ; y n Þ > be the corresponding n Â m and n Âm observed data matrix, respectively, where the superscript > denotes the transpose of a matrix or a vector, x i ¼ ðx i1 ; x i2 ; . . . ; x im Þ > and y i ¼ ðy i1 ; y i2 ; . . . ; y im Þ > are the genotypic and phenotypic data for the ith individual, respectively, i ¼ 1; 2; . . . ; n. The null hypothesis holds that these m SNPs are not associated withm phenotypes, and the alternative hypothesis is that at least one SNP is associated with at least one phenotype.

Weight-adjusted KATs
Given two definite kernel functions kðÁ; ÁÞ andkðÁ; ÁÞ, two similarity matrices S ¼ ðs ij Þ nÂn andS ¼ ðs ij Þ nÂn can be calculated via s ij ¼ kðx i ; x j Þ ands ij ¼kðy i ; y j Þ, respectively, for i; j ¼ 1; 2; . . . ; n. Then KAT (Chen et al. 2016, Lee et al. 2017, Zhan et al. 2017a is where trðÁÞ indicates the trace of a matrix and H ¼ I n À 1 n 1 n 1 > n is the centering matrix with I n being the identity matrix and 1 n being an n-dimensional column vector with all 1 units.
To overcome the defect of KAT and develop a powerful and robust test, we take another look at KAT. Based on eigenvalue decomposition strategy, we have where k 1 ! k 2 ! Á Á Á ! k n andk 1 !k 2 ! Á Á Á !k n are the eigenvalues of HSH and HSH, respectively, and q 1 ; q 2 ; . . . ; q n andq 1 ;q 2 ; . . . ;q n are their corresponding eigenvectors. Then KAT can be rewritten as This indicates that KAT is a weighted summation of angles between two sets of eigenvectors. These angles represent the association strengths between genotypes and phenotypes projected on different coordinate levels, and their weights are determined by the corresponding eigenvalues. This summation, however, may cause information loss when some important information contained in angles corresponding to small eigenvalues is concealed owing to their small weights. To deal with this issue, the relative differences among weights need to be adjusted to ensure that important association information becomes visible in the summation. However, the angles vary across different situations and the optimal weights are difficult to know beforehand. Therefore, we next consider power weights and construct a series of weight-adjusted KATs (abbreviated as wKAT) by changing weights in KAT as where h andh are two positive constants. Note that the parameter 1 n in KAT has been changed to 1 n hþh À1 in wKATðh;hÞ for the convenience of asymptotic property derivation, which does not influence the statistical significance of wKAT.
Obviously, when h ¼h ¼ 1; wKATðh;hÞ becomes KAT. It is expected that large weights should give to small angles and small weights to large angles. To detect how h andh affect the power of wKATðh;hÞ, we conduct simulations to see their performances. Simulation results show that when elements within multiple variates (such as the m dimensional genotypes) are weakly related, index (such as h) larger than 1 helps wKAT obtain higher power than KAT does. On the other hand, if elements within multiple variates (such as them dimensional phenotypes) are strongly related, index (such ash) should set to be less than 1 to boost the power. The detailed simulation settings and simulation results are given in Supplementary Section S1 in the Supplementary Materials. The simulations presented above for wKAT indicate that the distribution of eigenvalues of kernel matrices is affected by the correlation among variables, and the resulting eigenvalue-based weights impact the performance of wKAT. To give a succinct explanation for this phenomenon, we use a linear kernel kðy 1 ; y 2 Þ ¼ Pm r¼1 y 1r y 2r to construct a kernel matrix. Under this condition, the eigenvalues of the kernel matrix are n-1 times those of the sample covariance matrix. Therefore, when the correlation strengths among variables are strong, the eigenvalue distribution of the covariance matrix becomes uneven, resulting in an uneven eigenvalue distribution of the kernel matrix. For instance, let us consider m ¼ 100. When multiple variants are independent with D y ¼ Im , all the eigenvalues of D y are 1. However, when multiple variants are correlated, and D y ¼ ð1 À qÞIm þ q1m 1 > m with q ¼ 0:1, the first eigenvalue of D y is 10.9, and the remaining 99 eigenvalues are all 0.9. Similarly, when D y ¼ ð1 À qÞIm þ q1m 1 > m with q ¼ 0:9, the first eigenvalue of D y is 90.1, with the rest of the eigenvalues being 0.1. Therefore, if variables are strongly related, the eigenvalue distribution of the kernel matrix is very uneven. Under this condition, signals in KAT weighted by small weights are potentially to be masked by those with larger weights, which leads to a loss of power in KAT. Powers of eigenvalues (i.e. h;h) less than 1 can alleviate this uneven weight distribution, making all the signals visible. Thus, power weight adjustment in wKAT can help enhance its power.
We can write wKATðh;hÞ as wKATðh;hÞ ¼ 1 where ðHSHÞ h can be seen as an extension of HSH that depicts similarities between pairs of x i s at a different level indexed by h. Denote the (i, j)th element of ðHSHÞ h as d h;ij , and that of HSH as d ij . Given h ¼ 0:5, we have jl . This indicates that d ij calculates similarity between the ith and jth subjects by integrating their respective similarities with all the other subjects at the level of h ¼ 0:5 as a whole. Similarly, when h ¼ 2, d h;ij ¼ P n l¼1 d il d jl , indicating that d h;ij calculates similarity between the ith and jth subjects by integrating their respective similarities with all the other subjects measured by the kernel function kðÁ; ÁÞ as a whole. So compared with d ij , d h;ij ðh 6 ¼ 1Þ can be treated as a similarity measure to some extent.

A maximum kernel-based association test
Although wKATðh;hÞ is valid for arbitrary positive constants h andh, good choices on h andh can lead to power gains over KAT. Since the optimal choices of h andh, which are always unknown beforehand, vary with the true relationships between genotypes and phenotypes, we propose a MaxKAT as where H andH are two sets of candidate choices for h andh, respectively, andl hh andD 2 hh are respective estimates of the mean and variance of wKATðh;hÞ, which can be expressed aŝ We would like to point out that smaller hs (those approaching zero) make all the eigenvalues approach one, resulting in almost the same weights for different angles. In contrast, larger hs may highlight angles corresponding to large eigenvalues overly. In order to strike a balance, candidate values for h andh are set to be close to one. The results of simulations and real data studies show that H ¼H ¼ ð 1 2 ; 3 4 ; 1; 2; 3Þ is sufficient in most scenarios, and we use this setting in our work.
Since the joint distribution of multiple wKATs is complex, we apply the permutation procedure to the significance calculation for MaxKAT. Once calculating the observed MaxKAT using the original data, we simultaneously permute the rows and columns of matrix HSH for a great many times (denoted as N), say 1000 times, and plug the permuted HSH into the expression of MaxKAT to obtain N permuted MaxKATs. Then the P-value of the observed MaxKAT is the proportion of all these permuted MaxKATs larger than the observed one.
On the other hand, since MaxKAT is the maximum over a Gaussian random field (See Theorem 2 of Supplementary Section S2 in the Supplementary Materials), we can use the generalized extreme value (GEV) distribution to approximate its distribution. The cumulative distribution function of a GEV random variable Z is where j; a; g are unknown parameters. Denote the sample mean, variance, and skewness of the N permutation replicates obtained above by f 1 , f 2 , and f 3 , respectively. Then by matching them with the three corresponding theoretical moments, we have the following three equations as Then parameters j; a; and g can be estimated by solving these equations and can be denoted asĵ;â; andĝ, respectively. Given the observed value of the MaxKAT, t, its P-value is calculated by plugging the estimatesĵ;â; andĝ into 1 À FðtÞ.
Note that there exist some inference methods for KAT-type statistics that are free from doing permutations, such as the KRV method proposed by Zhan et al. (2017a). These methods can be used to approximate the distribution of KAT and wKAT. However, they are not applicable to the distributional approximation of MaxKAT, since MaxKAT is a maximumtype statistic, whose moments are very complicated to calculate. So we use the permutation strategy in this work.
As described above, the significance of MaxKAT can be calculated through two procedures, the permutation strategy and the GEV distribution approximation. We denote them as MaxKAT.perm and MaxKAT.gev, respectively. Though both MaxKAT.perm and MaxKAT.gev rely on the permutation strategy to calculate P-values, MaxKAT.gev has a computational advantage over MaxKAT.perm. That is, MaxKAT.gev allows to obtain P-values that are arbitrarily small, which will reduce computational cost greatly when the significance level is stringent. For example, when the number of permutation is 1000, the minimum P-value that MaxKAT.perm can get is 0.001, while the minimum P-value that MaxKAT.gev can get is arbitrarily small. Under this condition, when the significance level is stringent, such as 1 Â 10 À6 , the number of permutation for MaxKAT.perm needs to be no less than 1 Â 10 6 to obtain meaningful results, but that for MaxKAT.gev can still be 1000. We conduct simulation to demonstrate this point in Section 3.2.
When phenotypes are of mixture type, i.e. including both continuous and binary traits, we can use two types of kernel to form a mixture one.
When there are covariates, such as sex, age, and population stratification confounders, to be adjusted for, we can first regress phenotypes on covariates to obtain residuals, which can be treated as "new phenotypes", and then use these "new phenotypes" in the subsequent association analysis. Applications of this strategy are illustrated in the later data analysis.

Simulation studies
In this section, we conduct simulations to evaluate MaxKAT from the perspective of the type I error rates and powers, computational time and kernel function choice.

Type I error rates and powers
We conduct extensive simulations to demonstrate the type I error rates and powers of MaxKAT by comparing with KAT. The sample size is set to 100. And the dimensions of genotypes and phenotypes are, respectively, chosen from f20; 50; 100; 200; 500g. To generate genotypes and phenotypes, we first sample latent variables z i1 and z i2 from the multivariate normal distribution Nð1 50 ; I 50 Þ, and then produce x i and y i based on z i1 and z i2 , respectively. To be specific, x i is sampled from Nðz > i1 b x ; D x Þ and y i is generated from the following model where function f ðÁÞ is set to be element-wise for simplicity. We set three types of models to generate y i : Since the association strengths between pairs of genotype decrease as their physical distances increase, D x is set to have autoregressive structure with its (i, j)th element being q jiÀjj . To model different association strengths within multiple phenotypes, two covariance structures are considered for D y : the autoregressive structure mentioned above and the compoundsymmetry structure D y ¼ ð1 À qÞIm þ q1m 1 > m . We set q to be 0.1, 0.5, and 0.9 to indicate diverse association strengths. For the sake of simplicity, parameter matrices b x and b y are set to be equal constant matrices, and the values of their elements b are listed in Table 1. To mimic the coding of genotype data which is coded as 0, 1, and 2, x i is transferred to 0, 1, and 2 based on two quantile values of ð1 À qÞ 2 and 1 À q 2 to consider the Hardy-Weinberg equilibrium, where q ¼ 0.3 is the minor allele frequency. The recoded x i is still denoted as x i for the notational simplicity. Under the null hypothesis of no association, z i1 and z i2 are sampled independently so that x i and y i are independent. When the alternative hypothesis is true, z i1 and z i2 are set to be equal.
We apply both MaxKAT.perm and MaxKAT.gev to the simulation study to investigate the accuracy of GEV approximation strategy. The significance of KAT is also calculated via permutation procedure to make the results comparable. For each scenario, 1000 repetitions are conducted and the type I error rates and powers are obtained by calculating the proportions of P-values less than the significance level of 0.05. Given each dataset, we use the inner product-based kernel (Price et al. 2006) described above to calculate genetic similarities and use the Gaussian kernel to calculate phenotypic similarities.
The type I error rates are presented in Figs 1 and 2, and Supplementary Figs S1-S4 in the Supplementary Materials. The purple horizontal line corresponding to 0.05 in each plot is drawn for better reference. It can be seen that all these three methods KAT, MaxKAT.perm, and MaxKAT.gev can control the type I error rates properly, with their P-values fluctuating around the level of 0.05. For example, when D y is of autoregressive structure (Fig. 1) Fig. 4 as an example. In this figure, the powers of MaxKAT.perm and MaxKAT.gev are close, and they both are higher than that of KAT in all the considered scenarios. MaxKAT has higher powers even when the data are of high dimension. To be specific, as the dimension m increases, the power We also consider situations with different association patterns. The dimensions m andm are chosen from f50, 100, 200g. Phenotypes are generated via Model I. On one hand, to consider situations where multiple genetic variables and multiple phenotypes are partially related, we respectively let a proportion of r elements in x i and y i be related to the potential variants, with proportion r being chosen from f20%; 50%; 80%g. On the other hand, to consider different effect sizes, non-zero values in b x and b y are generated from the uniform distribution Unifð0:5; 1:5Þ for autoregressive D y ðq ¼ 0:5Þ and Unifð1:5; 2:5Þ for compound symmetry D y ðq ¼ 0:5Þ. The bar plots for powers of different methods are shown in Figs 5 and 6. These results show that when the variables are partially related to different effect sizes, the newly proposed MaxKAT still outperforms KAT in detecting association relationships.

Computational time comparison
As described in Section 2.2, the GEV-based MaxKAT.gev has a computational advantage over MaxKAT.perm and KAT when the significance level is stringent. We conduct simulations to illustrate this point. Since MaxKAT.perm and KAT    MaxKAT 5 calculate P-values through the permutation strategy, the number of permutation N increases as the significance level gets more stringent. For example, N ¼ 1000 is sufficient when the significance level is 5 Â 10 À2 , but N needs to be no less than 1 Â 10 6 when the significance level is set to be 1 Â 10 À6 . On the contrary, 1000 is always sufficient to calculate P-values for MaxKAT.gev, no matter how stringent the significance level is. We let N choose values from f1 Â 10 3 ; 1 Â 10 4 ; 1 Â 10 5 ; 1 Â 10 6 g for KAT and MaxKAT.perm, and keep N for MaxKAT.gev to be 1000, to investigate their run time.
Dimensions m andm are set to be equal, taking values from f50, 100, 200g. And phenotypes are generated via Model I with compound symmetry D y ðq ¼ 0:5Þ. In each scenario, the repetition time is 10 to calculate the average running time for each method. The results are presented in Table 2. Just as expected, the computational time for KAT and MaxKAT.perm increases as N gets longer, i.e. as the significance level gets more stringent. And MaxKAT.perm is much more time-consuming than MaxKAT.gev as N increases, demonstrating the advantage of GEV approximation in situations with stringent significance levels. Though KAT is more time-saving than MaxKAT.gev when N is no larger than 1 Â 10 5 , the running time of MaxKAT.gev is acceptable and it achieves much higher powers, which has been demonstrated before. And the running time of KAT excels that of MaxKAT.gev when N is 1 Â 10 6 , verifying the advantage of MaxKAT.gev further under conditions of stringent significance level.

Comparison of different choices on kernel functions
Note that although we use the inner product-based kernel (for genetic data) and Gaussian kernel (for phenotypic data) in the above analysis, other types of kernels can also be used. We conduct simulations to investigate the effect of kernel choice on the performances of different methods. Four types of kernel choice are considered: I Apply inner product-based kernel to genetic data, and Gaussian kernel to phenotypic data.
II Apply IBS kernel to genetic data, and Gaussian kernel to phenotypic data. III Apply inner product-based kernel to genetic data, and polynomial kernel to phenotypic data. IV Apply IBS kernel to genetic data, and polynomial kernel to phenotypic data.
Data are generated via Model I. And the powers of different methods for autoregressive D y ðq ¼ 0:5Þ and compound symmetry D y ðq ¼ 0:5Þ are displayed in Figs 7 and 8, respectively. It can be seen that the choices of kernel functions do have a remarkable effect on powers of all the methods, and the combination of inner product-based kernel and Gaussian kernel performs better than other kernels in the scenarios considered here. Since how to choose kernel functions is not the focus of our attention, we omit this issue in the present work and leave it to further study. However, it is noteworthy that although the powers vary with different kernel functions, MaxKAT outperforms KAT in all conditions with diverse kernel functions.

Applications to biomedical porcine datasets
Serum lipids are important indicators for health status of humans. Their abnormal concentrations are associated with cardiovascular disease and diabetes. In clinical tests, six blood lipid traits, including total cholesterol (TC), triglycerides (TG), atherosclerosis index (AI), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein (HDL-C), and the ratio of the HDL-C and LDL-C (denoted as HDL-C/LDL-C), are widely used in risk assessment of cardiovascular disease. Locating genetic variants regulating these traits might be useful for the prevention, identification, and treatment of the disease.
Pigs have been used as biomedical experimental materials of human disease research for decades. Yang et al. (2015) genotyped 61 565 SNPs of the pig genome and conducted GWAS on these six blood lipid traits to identify the trait-related SNPs. In this work, we apply KAT and MaxKAT to the Erhualian pig dataset in Yang et al. (2015) to compare their performances. Erhualian is a Chinese indigenous pig breed which is well known for its high prolificacy. We first preprocess the data using quality control. Three hundred and thirtythree pigs are genotyped on 61 565 SNPs and phenotyped on the 6 blood lipid traits mentioned above. After removing observations with missing trait values, data from 322 pigs remain. Quality control is also conducted on genotypes. Specifically, only SNPs with missing value proportion being <5%, minor allele frequency being larger than 1%, and minimum genotype cell number being larger than 5 can be used for further analysis. So 20 687 SNPs remain to be tested.
We apply multidimensional scaling (MDS) (Li and Yu 2008) to the covariance information of the SNP data to correct for population stratification, which is implemented in the R package AssocTests (Wang et al. 2020). Ten principal coordinates are generated and treated as confounder covariates. Six traits are regressed on these covariates to remove their effects, and the residuals are used to calculate the similarity matrix via the Gaussian kernel function. The inner productbased kernel (Price et al. 2006) is used to calculate the genetic similarity matrix. h andh are set to take on values in  Wang et al.
respectively, while KAT gives a P-value of 0.072. These values indicate that MaxKAT detects the association relationship between phenotypes and genotypes, whereas KAT misses it.
MaxKAT and KAT can also be used in association analysis based on SNP regions, such as genes, pathways, and SNP intervals. In the GWAS analysis of Yang et al. (2015), eight SNPs of Erhualian are identified to be related to the phenotypes of interest. To compare the performances of MaxKAT and KAT in the SNP region-based analysis, we apply both methods to SNP intervals of length 3 Mb that are, respectively, centered at these SNPs. First, we conduct the same quality control procedure as described before in these intervals. The numbers of SNPs in these SNP intervals before and after quality control (denoted as m 1 and m 2 , respectively) are presented in Table 3. For each SNP interval, we use SNPs in the corresponding interval to calculate the genotypic similarity matrix via the inner product-based kernel function. The phenotypic similarity matrix is the same as that used before. The resulting P-values for all eight SNP intervals are shown in Table 3. It can be seen that both KAT and MaxKAT detect the association relationships of the first two SNP intervals, but they do not identify the association relationships of the rest SNP intervals except for the fifth SNP interval. This is because that these SNPs are identified by Yang et al. (2015) via GWAS analysis based on single trait and single SNP, while KAT and MaxKAT conduct association analysis based on multiple traits and multiple SNPs, where association signals may be weakened when multiple traits and multiple SNPs are partially correlated. However, as for the fifth SNP interval, the P-values of KAT MaxKAT.perm and MaxKAT.gev are 0.020 and 0.023, respectively, confirming the finding reported in Yang et al. (2015). In contrast, the P-value obtained using KAT is 0.065, indicating a failure to detect this association relationship. These results suggest that MaxKAT performs better than KAT in identifying association signals based on SNP regions.

Discussion
Association testing methods being free of constraints on data dimension and data structure are in great demand nowadays in the big data era. The kernel-based approach is a good alternative to this issue and has been generally recommended (Chen et al. 2016, Lee et al. 2017, Zhan et al. 2017a. However, it suffers from substantial power loss when the correlation among phenotypes is moderate to strong. To handle this problem, we propose a maximum type kernel-based association test, MaxKAT, and use a generalized extreme value distribution to appropriate its distribution. Simulation studies and real data analysis demonstrate the superiority of MaxKAT over KAT, verifying its efficiency and sensitivity in detecting pleiotropic genetic effects for multiple phenotypes.
As MaxKAT relies on a permutation procedure to calculate its P-value, it takes more time to implement MaxKAT in larger sample sizes. However, compared with distributional approximation-based methods, this permutation-based method is free from requirement on the sample size and has high powers even when sample sizes are small. Besides, given that MaxKAT boosts the power in detecting association relationships remarkably, its time cost is acceptable.
Although MaxKAT is developed for genetic pleiotropy study, it can also be applied to other areas, such as   MaxKAT 7 microbiome data analysis and genomic study, where association analysis between microbiome composition and phenotypes, and between gene expression levels and genetic variants are conducted, respectively. We would like to point out that MaxKAT can also be used to handle data with more complex structures such as tree, network, and image, only if a reasonable kernel is available. The R codes for the implementation of MaxKAT are assembled in the R package named MaxKAT which is posted on GitHub https://github.com/WangJJ-xrk/ MaxKAT.