Multivariate genome-wide association analysis by iterative hard thresholding

Abstract

Motivation: In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association studies operate marker-by-marker and are computationally intensive.

Results: We present a sparsity-constrained regression algorithm for multivariate genome-wide association studies based on iterative hard thresholding (IHT) and implement it in the convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n = 185 656) in 20 h and an 18-trait joint analysis (n = 104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effects of all SNPs and dozens of traits.

Availability and implementation: Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.


IHT on LD-Pruned Genotypes
In Figure 1 of the main text, we compared the false positive counts of IHT, CCA, and mvLMM. Similar to Table 1 in the main text, we also computed the power for detecting independent and pleiotropic SNPs in these simulations. These results are presented in Figure 1 of this supplement.

Additional Simulation Studies for IHT
In the main text, our simulation study explores the situation where causal SNPs are shared roughly equally among all traits. This is a reasonable assumption because we expect most multivariate GWAS to be conducted on similar traits, which are expected to share a similar number of causal variants. In practice, however, researchers may confront highly polygenic traits alongside traits with very few causal variants.
Here we describe a simulation study with three traits, where the traits are perturbed by 10, 100, and 1000 causal SNPs, respectively. We use n = 10,000 unrelated subjects from the UK Biobank and restrict analysis to a genotype matrix constructed from 29,481 SNPs on chromosome 10. Traits Y_{r×10000} are simulated just as in our main simulation study. We ignore the genetic relationship matrix since related subjects have been excluded. For the rth trait, r ∈ {1, 2, 3}, the effect sizes of the causal SNPs j ∈ S_{r,causal} are Gaussian deviates β_j ∼ N(0, 0.1), with |S_{1,causal}| = 10, |S_{2,causal}| = 100, and |S_{3,causal}| = 1000. The causal SNP indices are chosen uniformly across the chromosome. The covariance between traits, Σ_{3×3}, is generated as in our main simulation.
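The simulation design above can be sketched in a few lines of numpy. This is an illustrative toy version, not the actual study code: the dimensions are shrunk, and a Gaussian matrix stands in for the real UK Biobank genotypes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 500, 2000, 3       # toy sizes; the study uses n = 10,000 and p = 29,481
sizes = [10, 100, 1000]      # |S_{r,causal}| for the three traits

# Gaussian surrogate for a standardized genotype matrix (p SNPs x n subjects)
X = rng.standard_normal((p, n))

# Sparse effect sizes: trait t has sizes[t] causal SNPs with beta_j ~ N(0, 0.1)
B = np.zeros((r, p))
for t, k in enumerate(sizes):
    idx = rng.choice(p, size=k, replace=False)     # causal indices chosen uniformly
    B[t, idx] = rng.normal(0.0, np.sqrt(0.1), size=k)

# Trait covariance Sigma (3 x 3); here a random SPD matrix for illustration
A = rng.standard_normal((r, r))
Sigma = A @ A.T + r * np.eye(r)

# Y is r x n: mean BX plus correlated Gaussian noise with covariance Sigma
E = np.linalg.cholesky(Sigma) @ rng.standard_normal((r, n))
Y = B @ X + E
```

Sampling causal indices without replacement guarantees each trait receives exactly its intended number of causal SNPs.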

Loglikelihood
Consider multivariate linear regression with r traits under a Gaussian model. Up to a constant, the loglikelihood for the response vector y_i of subject i can be written
\[
L_i(B, \Gamma) = \frac{1}{2} \log \det \Gamma - \frac{1}{2} (y_i - B x_i)^T \Gamma (y_i - B x_i),
\]
where B is the r × p matrix of regression coefficients, x_i is the p × 1 vector of predictors, and Γ is the r × r unstructured precision (inverse covariance) matrix. For n independent subjects, let Y be the r × n matrix with ith column y_i and let X be the p × n design matrix with ith column x_i. Then the loglikelihood for all subjects is
\[
L(B, \Gamma) = \frac{n}{2} \log \det \Gamma - \frac{1}{2} \operatorname{tr}\left[\Gamma (Y - BX)(Y - BX)^T\right].
\]
In subsequent sections we will present both full and block ascent IHT. The former updates B and Γ simultaneously. The latter alternates updates of B and Γ, holding the other parameter block fixed.
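To make the two forms of the loglikelihood concrete, the numpy sketch below (illustrative dimensions, random data) evaluates the matrix expression and compares it with the per-subject sum; the two agree.

```python
import numpy as np

def loglikelihood(B, Gamma, Y, X):
    """Multivariate Gaussian loglikelihood, up to a constant:
    (n/2) logdet(Gamma) - (1/2) tr(Gamma (Y - BX)(Y - BX)^T)."""
    n = Y.shape[1]
    R = Y - B @ X                                  # r x n residual matrix
    return 0.5 * n * np.linalg.slogdet(Gamma)[1] - 0.5 * np.trace(Gamma @ R @ R.T)

rng = np.random.default_rng(1)
r, p, n = 3, 5, 50
B = rng.standard_normal((r, p))
A = rng.standard_normal((r, r))
Gamma = A @ A.T + r * np.eye(r)                    # SPD precision matrix
X = rng.standard_normal((p, n))
Y = rng.standard_normal((r, n))

ll_matrix = loglikelihood(B, Gamma, Y, X)

# Per-subject form: sum_i (1/2) logdet(Gamma) - (1/2)(y_i - Bx_i)^T Gamma (y_i - Bx_i)
ll_per_subject = sum(
    0.5 * np.linalg.slogdet(Gamma)[1]
    - 0.5 * (Y[:, i] - B @ X[:, i]) @ Gamma @ (Y[:, i] - B @ X[:, i])
    for i in range(n)
)
```

Working with the r × n residual matrix R avoids looping over subjects, which is why the matrix form is the one used in computation.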

First Directional Derivative
Recall that the Hadamard semi-directional derivative [1,2] of a function f(x) in the direction v is defined as the limit
\[
d_v f(x) = \lim_{t \downarrow 0,\; w \to v} \frac{f(x + t w) - f(x)}{t}.
\]
To calculate the directional derivative of the loglikelihood (eq 1 in main text), we perturb B in the direction U and Γ in the symmetric direction V. The sum and product rules then give [2]
\[
d_{(U,V)} L(B, \Gamma) = \frac{n}{2} \operatorname{tr}(\Gamma^{-1} V) - \frac{1}{2} \operatorname{tr}\left[V (Y - BX)(Y - BX)^T\right] + \frac{1}{2} \operatorname{tr}\left[\Gamma (U X)(Y - BX)^T\right] + \frac{1}{2} \operatorname{tr}\left[\Gamma (Y - BX)(U X)^T\right].
\]
The trace properties tr(CD) = tr(DC) and tr(C^T) = tr(C) consequently imply
\[
d_{(U,V)} L(B, \Gamma) = \operatorname{tr}\left[U^T \Gamma (Y - BX) X^T\right] + \frac{n}{2} \operatorname{tr}(\Gamma^{-1} V) - \frac{1}{2} \operatorname{tr}\left[V (Y - BX)(Y - BX)^T\right]. \tag{1.1}
\]
Because this last expression is linear in (U, V), the loglikelihood is continuously differentiable.
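A quick numerical sanity check of this directional derivative: the sketch below (random illustrative data) compares the closed form against a one-sided finite difference of the loglikelihood along (U, V).

```python
import numpy as np

rng = np.random.default_rng(2)
r, p, n = 3, 4, 40
B = rng.standard_normal((r, p))
A = rng.standard_normal((r, r))
Gamma = A @ A.T + r * np.eye(r)                    # SPD precision matrix
X = rng.standard_normal((p, n))
Y = rng.standard_normal((r, n))
U = rng.standard_normal((r, p))                    # perturbation of B
V = rng.standard_normal((r, r)); V = V + V.T       # symmetric perturbation of Gamma

def L(B, Gamma):
    R = Y - B @ X
    return 0.5 * n * np.linalg.slogdet(Gamma)[1] - 0.5 * np.trace(Gamma @ R @ R.T)

# closed form of the directional derivative
R = Y - B @ X
dL = (np.trace(U.T @ Gamma @ R @ X.T)
      + 0.5 * n * np.trace(np.linalg.solve(Gamma, V))   # tr(Gamma^{-1} V)
      - 0.5 * np.trace(V @ R @ R.T))

# finite-difference approximation along the same direction
t = 1e-6
fd = (L(B + t * U, Gamma + t * V) - L(B, Gamma)) / t
```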

Second Directional Derivative
Now we take the directional derivative of the directional derivative (1.1) in the new directions Ũ and Ṽ. This action requires the inverse rule d_Ṽ Γ^{-1} = −Γ^{-1} Ṽ Γ^{-1} proved in Example 3.2.7 of [2]. Accordingly, we find
\[
d_{(\tilde{U}, \tilde{V})} \operatorname{tr}\left[U^T \Gamma (Y - BX) X^T\right] = -\operatorname{tr}\left(U^T \Gamma \tilde{U} X X^T\right) + \operatorname{tr}\left[U^T \tilde{V} (Y - BX) X^T\right].
\]
We also calculate
\[
d_{(\tilde{U}, \tilde{V})} \left\{ \frac{n}{2} \operatorname{tr}(\Gamma^{-1} V) - \frac{1}{2} \operatorname{tr}\left[V (Y - BX)(Y - BX)^T\right] \right\} = -\frac{n}{2} \operatorname{tr}\left(\Gamma^{-1} \tilde{V} \Gamma^{-1} V\right) + \operatorname{tr}\left[\tilde{U}^T V (Y - BX) X^T\right].
\]
Finally, setting the two directions equal so that Ṽ = V and Ũ = U produces the quadratic form
\[
d^2_{(U,V)} L = -\operatorname{tr}\left(U^T \Gamma U X X^T\right) + 2 \operatorname{tr}\left[U^T V (Y - BX) X^T\right] - \frac{n}{2} \operatorname{tr}\left(\Gamma^{-1} V \Gamma^{-1} V\right) \tag{1.2}
\]
generated by the second differential.
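The quadratic form (1.2) can likewise be checked numerically. The sketch below (random illustrative data) compares the closed form against a central second difference of the loglikelihood along (U, V).

```python
import numpy as np

rng = np.random.default_rng(3)
r, p, n = 3, 4, 30
B = rng.standard_normal((r, p))
A = rng.standard_normal((r, r))
Gamma = A @ A.T + r * np.eye(r)                    # SPD precision matrix
X = rng.standard_normal((p, n))
Y = rng.standard_normal((r, n))
U = rng.standard_normal((r, p))
V = rng.standard_normal((r, r)); V = V + V.T       # symmetric direction

def L(B, Gamma):
    R = Y - B @ X
    return 0.5 * n * np.linalg.slogdet(Gamma)[1] - 0.5 * np.trace(Gamma @ R @ R.T)

# closed-form quadratic form (1.2)
R = Y - B @ X
GinvV = np.linalg.solve(Gamma, V)                  # Gamma^{-1} V
d2 = (-np.trace(U.T @ Gamma @ U @ (X @ X.T))
      + 2.0 * np.trace(U.T @ V @ R @ X.T)
      - 0.5 * n * np.trace(GinvV @ GinvV))

# central second difference along (U, V)
t = 1e-4
fd2 = (L(B + t * U, Gamma + t * V) - 2.0 * L(B, Gamma)
       + L(B - t * U, Gamma - t * V)) / t**2
```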

Extraction of the Gradient and Expected Information
To extract the gradient from a directional derivative, we recall the identity d_v f(x) = ∇f(x)^T v for vectors v and x and the identity tr(A^T B) = vec(A)^T vec(B) for matrices A and B [5]. The first identity shows that the directional derivative is the inner product of the gradient with the direction v. The second displays the trace function as an inner product on dimensionally identical matrices. Thus, the matrix directional derivative is
\[
d_V f(X) = \operatorname{tr}\left[\nabla f(X)^T V\right] = \operatorname{vec}\left[\nabla f(X)\right]^T \operatorname{vec}(V).
\]
Inspection of the directional derivative (1.1) now leads to the gradient with blocks
\[
\nabla_B L = \Gamma (Y - BX) X^T, \qquad \nabla_\Gamma L = \frac{n}{2} \Gamma^{-1} - \frac{1}{2} (Y - BX)(Y - BX)^T.
\]
Analogously, the quadratic form (1.2) implicitly defines the Hessian H through the identity
\[
d^2_{(U,V)} L = \begin{pmatrix} \operatorname{vec}(U) \\ \operatorname{vec}(V) \end{pmatrix}^T H \begin{pmatrix} \operatorname{vec}(U) \\ \operatorname{vec}(V) \end{pmatrix}.
\]
It follows that J_{BB} = XX^T ⊗ Γ. Similarly,
\[
J_{\Gamma\Gamma} = \frac{n}{2}\, \Gamma^{-1} \otimes \Gamma^{-1}.
\]
Because E(Y − BX) = 0 under the model, the cross term in (1.2) vanishes in expectation. In summary, the expected information matrix takes the block diagonal form
\[
J = \begin{pmatrix} XX^T \otimes \Gamma & 0 \\ 0 & \frac{n}{2}\, \Gamma^{-1} \otimes \Gamma^{-1} \end{pmatrix}.
\]
In our projected steepest ascent algorithm, the expected information matrix is never explicitly formed. It is implicitly accessed in the step-size calculation through the associated quadratic form Q(B, Γ).
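The gradient blocks and the Kronecker identity behind J_BB can be verified numerically. The sketch below (random illustrative data) compares the closed-form gradients against entry-wise central finite differences and checks that the trace quadratic form equals its vec/Kronecker counterpart.

```python
import numpy as np

rng = np.random.default_rng(4)
r, p, n = 3, 4, 30
B = rng.standard_normal((r, p))
A = rng.standard_normal((r, r))
Gamma = A @ A.T + r * np.eye(r)                    # SPD precision matrix
X = rng.standard_normal((p, n))
Y = rng.standard_normal((r, n))

def L(B, Gamma):
    R = Y - B @ X
    return 0.5 * n * np.linalg.slogdet(Gamma)[1] - 0.5 * np.trace(Gamma @ R @ R.T)

# gradient blocks from the text
R = Y - B @ X
grad_B = Gamma @ R @ X.T
grad_Gamma = 0.5 * n * np.linalg.inv(Gamma) - 0.5 * R @ R.T

# entry-wise central finite differences
eps = 1e-5
fd_B = np.zeros_like(B)
for i in range(r):
    for j in range(p):
        E = np.zeros_like(B); E[i, j] = eps
        fd_B[i, j] = (L(B + E, Gamma) - L(B - E, Gamma)) / (2 * eps)

fd_G = np.zeros_like(Gamma)
for i in range(r):
    for j in range(r):
        E = np.zeros_like(Gamma); E[i, j] = eps
        fd_G[i, j] = (L(B, Gamma + E) - L(B, Gamma - E)) / (2 * eps)

# Kronecker identity behind J_BB: tr(U^T Gamma U XX^T) = vec(U)^T (XX^T kron Gamma) vec(U)
U = rng.standard_normal((r, p))
u = U.reshape(-1, order='F')                        # column-major vec
quad_trace = np.trace(U.T @ Gamma @ U @ (X @ X.T))
quad_kron = u @ np.kron(X @ X.T, Gamma) @ u
```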

Full IHT Step Size
Let (B, Γ) be the matrix of regression coefficients and variance components obtained by horizontally concatenating Γ ∈ R^{r×r} to B ∈ R^{r×p}. The next iterate in full IHT is the projection of the point
\[
\Delta_{m+1} = (B_m, \Gamma_m) + t_m (C_m, W_m), \tag{1.7}
\]
where C_m = ∇_B L and W_m = ∇_Γ L evaluated at (B_m, Γ_m). The loglikelihood along the ascent direction is a function of the scalar t_m and can be approximated by the second-order expansion
\[
L\left[(B_m, \Gamma_m) + t (C_m, W_m)\right] \approx L(B_m, \Gamma_m) + t \left(\|C_m\|_F^2 + \|W_m\|_F^2\right) - \frac{t^2}{2} Q(C_m, W_m),
\]
where Q(U, V) = vec(U)^T (XX^T ⊗ Γ_m) vec(U) + (n/2) vec(V)^T (Γ_m^{-1} ⊗ Γ_m^{-1}) vec(V). The choice
\[
t_m = \frac{\|C_m\|_F^2 + \|W_m\|_F^2}{Q(C_m, W_m)}
\]
maximizes the approximation. If the support of the matrix (B, Γ) does not change under projection, then this IHT update is particularly apt.
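The step-size computation can be sketched directly from these formulas. The numpy fragment below (random illustrative data, not the MendelIHT.jl implementation) evaluates Q through traces, so the Kronecker-structured information matrix is never formed.

```python
import numpy as np

rng = np.random.default_rng(6)
r, p, n = 3, 4, 30
B = rng.standard_normal((r, p))
A = rng.standard_normal((r, r))
Gamma = A @ A.T + r * np.eye(r)                    # SPD precision matrix
X = rng.standard_normal((p, n))
Y = rng.standard_normal((r, n))

# gradient blocks (C_m, W_m)
R = Y - B @ X
C = Gamma @ R @ X.T
W = 0.5 * n * np.linalg.inv(Gamma) - 0.5 * R @ R.T

# Q(C, W) evaluated through traces; the Kronecker products are never materialized:
# vec(C)^T (XX^T kron Gamma) vec(C)       = tr(C^T Gamma C XX^T)
# vec(W)^T (Gamma^{-1} kron Gamma^{-1}) vec(W) = tr(Gamma^{-1} W Gamma^{-1} W)
GinvW = np.linalg.solve(Gamma, W)
Q = np.trace(C.T @ Gamma @ C @ (X @ X.T)) + 0.5 * n * np.trace(GinvW @ GinvW)

# step size maximizing the second-order expansion
grad_sq = np.linalg.norm(C, 'fro')**2 + np.linalg.norm(W, 'fro')**2
t_m = grad_sq / Q
```

Since Γ is positive definite and XX^T is positive semidefinite, Q is positive and the step size is well defined.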

IHT Projection
Recall that full IHT iterates according to
\[
(B_{m+1}, \Gamma_{m+1}) = P_{S_k}(\Delta_{m+1}),
\]
where ∆_{m+1} is derived in equation (1.7). Here k is a positive integer representing the sparsity level, which is assumed known. In practice k is found through cross-validation. The projection P_{S_k}(∆) splits into separate projections for B and Γ. One can project each row of B to sparsity independently. Alternatively, one can require all rows of B to share the same sparsity pattern if the same set of predictors plausibly contributes to all r traits. The Γ projection must preserve symmetry and positive semidefiniteness. Symmetry is automatic because the gradient with respect to Γ is already symmetric. To project to positive semidefiniteness, one takes the spectral decomposition of Γ and projects its eigenvalues λ to nonnegativity. One can even project Γ to the closest positive definite matrix with an acceptable condition number [6].
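Both projections are simple to state in code. The sketch below (an illustrative numpy version, not the package's implementation) implements row-wise sparsity projection for B and eigenvalue clipping for the positive semidefinite projection of Γ.

```python
import numpy as np

def project_rows_sparse(B, k):
    """Keep the k largest-magnitude entries in each row of B; zero the rest."""
    out = np.zeros_like(B)
    for i, row in enumerate(B):
        idx = np.argsort(np.abs(row))[-k:]         # indices of the k largest |entries|
        out[i, idx] = row[idx]
    return out

def project_psd(G):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues at 0."""
    lam, Q = np.linalg.eigh(G)                     # spectral decomposition G = Q diag(lam) Q^T
    return (Q * np.clip(lam, 0.0, None)) @ Q.T

# example usage on random data
rng = np.random.default_rng(7)
B = rng.standard_normal((4, 50))
B_proj = project_rows_sparse(B, 5)

G = rng.standard_normal((4, 4)); G = G + G.T       # symmetric, generally indefinite
G_proj = project_psd(G)
```

Requiring a shared sparsity pattern across rows would instead rank predictors by a column-wise norm of B and keep the top k columns; the row-wise version above treats each trait separately.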

The Block Ascent IHT
In block ascent we alternate updates of B and Γ. The exact update (eq 4 of main text) of Γ,
\[
\Gamma_{m+1} = n \left[(Y - B_{m+1} X)(Y - B_{m+1} X)^T\right]^{-1},
\]
is particularly convenient, and we take advantage of it. Symmetry and positive semidefiniteness are automatically preserved. Inversion can be carried out via Cholesky factorization. This choice of Γ simplifies the step length for the B update to
\[
t_m = \frac{\|C_m\|_F^2}{\operatorname{tr}\left(X^T C_m^T \Gamma C_m X\right)}.
\]
Note that the denominator of the step size does not require formation of the n × n matrix X^T C_m^T Γ C_m X. One can write
\[
\operatorname{tr}\left(X^T C_m^T \Gamma C_m X\right) = \operatorname{tr}\left[(L^T C_m X)^T (L^T C_m X)\right] = \|L^T C_m X\|_F^2,
\]
where L is the Cholesky factor of Γ = LL^T. The matrix L^T C_m X is fortunately only r × n.
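This Cholesky trick is easy to confirm numerically. The sketch below (random illustrative data) computes the trace both ways and checks that only an r × n intermediate is needed in the economical version.

```python
import numpy as np

rng = np.random.default_rng(5)
r, p, n = 3, 6, 200
C = rng.standard_normal((r, p))                    # stands in for the gradient C_m
X = rng.standard_normal((p, n))
A = rng.standard_normal((r, r))
Gamma = A @ A.T + r * np.eye(r)                    # SPD precision matrix
L = np.linalg.cholesky(Gamma)                      # Gamma = L L^T

# naive evaluation: forms the n x n matrix X^T C^T Gamma C X
naive = np.trace(X.T @ C.T @ Gamma @ C @ X)

# economical evaluation: only the r x n matrix L^T C X is formed
M = L.T @ C @ X
econ = np.linalg.norm(M, 'fro')**2
```

For GWAS-scale n, avoiding the n × n product is the difference between a trivial and a prohibitive step-size computation.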

UK Biobank Runtime Script
Here is the script used to perform our UK Biobank analysis: