Phenotype-specific differences in polygenicity and effect size distribution across functional annotation categories revealed by AI-MiXeR

Abstract Motivation Determining the relative contributions of functional genetic categories is fundamental to understanding the genetic etiology of complex human traits and diseases. Here, we present Annotation Informed-MiXeR, a likelihood-based method for estimating the number of variants influencing a phenotype and their effect sizes across different functional annotation categories of the genome using summary statistics from genome-wide association studies. Results Extensive simulations demonstrate that the model is valid for a broad range of genetic architectures. The model suggests that complex human phenotypes substantially differ in the number of causal variants, their localization in the genome and their effect sizes. Specifically, the exons of protein-coding genes harbor more than 90% of variants influencing type 2 diabetes and inflammatory bowel disease, making them good candidates for whole-exome studies. In contrast, <10% of the causal variants for schizophrenia, bipolar disorder and attention-deficit/hyperactivity disorder are located in protein-coding exons, indicating a more substantial role of regulatory mechanisms in the pathogenesis of these disorders. Availability and implementation The software is available at: https://github.com/precimed/mixer. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
The rapid technological advances of the last years have provided an enormous amount of genetic data, promoting the development of statistical methods aimed at unraveling the genetic architecture of complex traits (Evans et al., 2018). A key effort has been to estimate single nucleotide polymorphism (SNP)-based heritability, either using individuallevel genotype data (Yang et al., 2010), or summary-level statistics from genome-wide association studies (GWAS) (Bulik-Sullivan et al., 2015). However, heritability estimates provide a limited picture of the genetic architecture underlying complex phenotypes. For example, they are agnostic about the number of genetic variants influencing a phenotype and their effect sizes (Timpson et al., 2018): both of these quantities can vary and still result in the same heritability, which is proportional to their product (Frei et al., 2019;Holland et al., 2020b). Importantly, the proportion of variants influencing a phenotype (polygenicity) and the variance their effect sizes (discoverability) substantially affects the power of GWAS and may inform the design of future genetic studies to maximize discovery (Schork et al., 2016;Smeland et al., 2019).
Recently, we developed a model which allows the breakdown of SNP-heritability into the number of variants influencing a given phenotype (non-null variants) and the distribution of their effect sizes using summary statistics from GWAS and detailed populationspecific linkage disequilibrium (LD) structure (Frei et al., 2019;Holland et al., 2020b). The model assumes that the non-null variants are distributed uniformly throughout the genome and that their effect sizes are drawn from a Gaussian distribution. However, prior genetic studies suggest that non-null variants are differentially enriched across functional genomic categories and complex phenotypes (Schaub et al., 2012;Schork et al., 2013). Here, we present a model, Annotation Informed (AI)-MiXeR, which extends our previous work by allowing different (non-overlapping) predefined functional annotation categories of the genome to have various densities of non-null variants with different effect size distributions.
Several conceptually related methods that aim to characterize the genetic architecture of phenotypes using GWAS summary statistics have recently been developed. The partitioned LD score regression (LDSC) analysis (Finucane et al., 2015) estimates the proportion of SNP-based heritability explained by variants within predefined functional categories but does not estimate the abundance of non-null variants or assess their effect sizes. The RSS-E method (Zhu and Stephens, 2018) only estimates the abundance of non-null variants in different annotation categories, while the distribution of effect sizes of non-null variants is assumed to be the same for all annotation categories. The GENESIS model (Zhang et al., 2018) allows several groups of trait-susceptibility variants with different densities and effect size distributions but assumes the non-null variants to be uniformly distributed among the groups and does not support prior group definition (e.g. in terms of functional annotation categories). In contrast to these methods, AI-MiXeR allows simultaneous modeling of abundance and effect size magnitudes of non-null variants in arbitrary predefined functional annotation categories.
Here, we extensively tested AI-MiXeR on synthetic GWAS data generated under various setups to establish scenarios where it reconstructs the underlying parameters correctly. We then applied AI-MiXeR to GWAS summary statistics for 11 complex phenotypes representing a range of diverse human traits and diseases. Our analysis suggests that both densities and effect sizes of non-null variants vary considerably across different genomic annotation categories and reveals diverse patterns of genetic architecture in different phenotypes.

Ai-MiXeR model overview
We consider an additive model of genetic effects ignoring gene-environment interactions, epistasis and dominance effects. Variant effect sizes are modeled with point-normal mixture priors, where both proportion of non-null variants and distribution of their effect sizes can vary between different predefined functional genomic categories. Each functional category in the model is characterized by the proportion of non-null variants (polygenicity, p) and the variance of their effect sizes (discoverability, r 2 ). The pure (i.e. not induced by LD) effect of the kth variant (b k ) is modeled as a mixture of null and non-null components: , where p C and r 2 C , respectively are proportion and variance of non-null variants effect sizes in the functional category C, and N 0; r 2 C À Á denotes the normal distribution with mean 0 and variance r 2 C . The signed association test statistics (z-score) of the jth variant is then given by: where N is the GWAS sample size, H k is the heterozygosity of variant k, M is the number of variants in LD with variant k, r jk is the Pearson's correlation coefficient between the genotypes of the jth and kth variants and is a Nð0; r 2 0 Þ distributed residual factor. Functional category-specific polygenicities and discoverabilities of a GWAS trait are estimated by maximizing the likelihood of the GWAS summary statistics (z-scores). To reduce computational burden, we randomly select a subset of 10 6 variants of all GWAS variants to use for maximization of the likelihood function. For specific details of the model and its implementation, please refer to the following sections.

Ai-MiXeR model details
Consider a quantitative phenotype standardized to mean 0 and variance 1. Let y be a random variable representing a phenotype measurement for an individual in the population (E y ð Þ ¼ 0, var y ð Þ ¼ 1). Let G ¼ g j f g j¼1...M be a fixed set of M random variables representing genotypes of bi-allelic variants. These are assumed to be centered where f j is the minor allele frequency of variant j and h j is its heterozygosity). We assume an additive genetic model for the phenotype generation: where is a normally distributed error term with mean 0 and variance r 2 e . b j G ð Þ is understood here as the (unknown) effect of variant j as would be obtained from a multiple linear regression of the phenotype y on all genotypes G in a hypothetical infinite sample. This definition of the effect size b j G ð Þ implies that b j G ð Þ will reflect only the true causal effect of the jth variant (thus b j G ð Þ ¼ 0 if the jth variant is not causal) whenever G includes all causal variants for the trait. On the other hand, if any causal variants are missing in the set G, b j G ð Þ will also include the effects of those missing causal variants that happen to be tagged by the jth variant. Any variant j with b j G ð Þ 6 ¼ 0 will be called a non-null variant. Further, we will henceforth consider G to be fixed and omit it from the notation.
Consider now a GWAS on a quantitative phenotype. Let N be the number of individuals in the GWAS and assume that N is sufficiently large so that the allelic composition (i.e. genotype frequencies) of the variants observed in the GWAS is approximately equivalent to the allelic composition of the same variants in the population. Thenŷ 1 y 1 . . .ŷ N ð Þis a vector of phenotypes,Ĝ ¼ĝ ij Â Ã i¼1...N;j¼1...M is the N Â M matrix of genotypes observed in the GWAS and ¼ 1 . . . N ð Þis a vector of residuals ( i represents the residual term for the ith individual). Using (1) and this notation we can write: y ¼Ĝb þ; (2) which is a sample-equivalent of Equation (1). Denote also withĝ j 1 g 1j . . .ĝ Nj À Á the vector of genotypes of variant j observed in the GWAS (jth column ofĜ matrix). The marginal effect of variant j estimated in GWAS (b 0 j ) is obtained from the simple linear regression of the phenotype on the genotype of variant j: (3) Ã is the sample heterozygosity of variant j and b k is the hypothetical effect size of variant k from a multiple linear regression in an infinite population (as discussed above). Assuming the considered phenotype is complex, i.e. it is influenced by many variants each explaining only a tiny fraction of phenotypic variance, then the variance of the simple linear regression error is approximately equal to the sample variance of the phenotype, varê j À Á ffi varŷ ð Þ :¼ 1, whereê j ¼ê 1j . . .ê Nj À Á . Using this approximation, we can write an expression for the standard error of b j : Combining Equations (3) and (4), we can write an expression for the z-score of variant j observed in GWAS: In Equation (5),r jk ,Ĥ k ,ĝ ij and N are known constant factors iĝij is an unknown (because the i are) residual.
Remembering that, by definition, i ; i ¼ 1; . . . ; N are realizations of the random variable [see Equations (1) and (2)] and assuming that these realizations are independent from each other, 0 can be modeled as a normally distributed random variable with mean 0 and variance (equally for all variants): By construction [Equation (1)], when there is no genetic effect on the phenotype, r 2 e ¼ 1. However, the assumption of independence of all i is often violated in GWAS due to the presence of various confounding factors such as population stratification and cryptic relatedness. Moreover, bothĤ k andr jk are usually estimated from external genotyping panels where variant frequencies and correlations may differ from those in the GWAS sample. In addition, due to technical limitations,r jk estimates are commonly truncated (e.g. disregarding all correlations below a certain threshold). For the model to be able to mitigate these discrepancies we introduce a r 2 0 parameter and model 0 in (5) as a random variable distributed as N 0; r 2 0 À Á . It was shown that, in the framework of the infinitesimal model, r 2 0 has the same mathematical meaning as the intercept term in the LDSC model (Frei et al., 2019). The last unknown factor in (5), b k , is modeled as a random variable with point-normal mixture distribution, where the variance is allowed to differ between different variant annotation categories: where variant k 2 C, C G is a subset of variants in G constituting some annotation category, p C is the proportion of variants with non-zero effect (non-null variants) in the annotation category C and r 2 C is the variance of the effect sizes among all non-null variants in C. The set of annotation categories C j È É j¼1...T defined on G must form a partition of G (i.e. each variant from the G must belong to one and only one annotation category C j ).
Modeling b k as Equation (6), 0 as N 0; r 2 0 À Á and taking r jk , h k and N as known constant factors, Equation (5) allows to derive the probability density function (pdf) of z j (pdf z ) as the convolution of b k (k ¼ 1 . . . M) and 0 random variables.

Estimation of pdf of z-scores
We derive the pdf of a random variable z representing a variant's association z-score from the convolution of b k (k ¼ 1 . . . M) and 0 random variables. To simplify notation, we omit the indices reflecting the annotation category, replace 0 with and denote: where r 2 e;k ¼ N i r 2 ik H k r 2 , $ N 0; r 2 0 À Á . We can then rewrite Equation (5) as: The pdf of z at z 0 (in our case z 0 is the z-score from the GWAS) can be written as the inverse Fourier transform of its characteristic function / z t ð Þ: where pi is Archimedes' constant (i.e. pi % 3:14) and i is the unit imaginary number.
Assuming that the non-null effects (b j ) are independent from each other and from the error term : Using the definition of characteristic function and expression (7), we can write the characteristic function of n k as: and similarly for : Combining the last two expressions, the characteristic function of z can be written as: from which we can obtain the point estimate of pdf z at z 0 : The result is a definite integral (i.e. a number), which can be computed numerically.

Optimization setup
The polygenicity (p) and discoverability (r 2 ) parameters are estimated by maximizing the likelihood of the z-scores observed in the GWAS summary-level data (z 0 ): pdf z z 0 ð ÞÀ! p;r 2 ;r 2 0 max, where the AI-MiXeR probability density function of the z-scores (pdf z ) is modeled as described in the section above. Specific estimation details are given below.
• Variants from the extended major histocompatibility complex region (genome build 19 locations chr6:25119106-33854733) were excluded from the optimization due to the high complexity of the LD structure in this region. • The z-scores of 10 6 randomly selected variants were used for the optimization of the cost function. This procedure was replicated 50 times to limit selection bias. • The cost function was defined as -log(likelihood)/10 6 , where 10 6 reflects the number of variants (z-scores) used at each replica of the optimization procedure.

Implementation
Data management and configuration procedures were implemented in Python. For optimization SciPy implementations of both Nelder-Mead and differential evolution methods were used. Evaluation of the cost function was implemented in C using GNU Scientific Library (http://www.gnu.org/software/gsl/) for numeric integration and OpenMP (https://www.openmp.org/) for parallelization. AI-MiXeR's source code is freely available at https://github.com/pre cimed/mixer.

Computational time
A single optimization run using the setup described in the 'Optimization setup' section above took between 3 and 6 h on a computing node with dual Intel Sandy Bridge E5-2670 (16 physical computing cores) running at 2.6 GHz, and 64 Gb RAM.

Simulations with synthetic data
We analyzed the performance of the model on GWAS summary statistics generated from synthetic genotypes and phenotypes with various genetic architectures under model assumptions.
2.6.1 Synthetic genotypes 10 5 synthetic genotypes were generated with Hapgen2 (Su et al., 2011) using 503 European samples from 1000 Genomes Phase 3 data (1000Genomes Project Consortium et al., 2015 as described in the study by Frei et al. (2019). A set of 11 015 833 biallelic variants was considered. The LD structure was estimated from a subset of 10 4 genotypes using PLINK 1.9 (Chang et al., 2015) ignoring all correlations between variant genotypes at r 2 <0.01 and transchromosome correlations.

Functional annotation categories
Two non-overlapping functional annotation categories were considered: exonic and non-exonic. The exonic annotation category includes all variants within exons (including 5 0 and 3 0 untranslated regions) of protein-coding genes, while the non-exonic category contains all remaining variants. This choice was motivated by previous research showing that protein-coding exons (including 5 0 and 3 0 untranslated regions) are most strongly enriched for association with many complex human phenotypes (Schork et al., 2013). Additionally, the exonic category, as defined above, largely overlaps with the genomic regions investigated in whole-exome genotyping and whole-exome sequencing studies. Its modeling can therefore serve as a projection for future discoveries in whole-exome studies.
All variants were functionally annotated using UCSC's Table Browser (hg19/GRCh37) (Karolchik et al., 2004). With this definition, the non-exonic category contains approximately 70 times more variants than the exonic category.

Synthetic phenotypes
Synthetic phenotypes were generated using SIMU (Frei, 2016). A given number of non-null variants was selected at random for each functional annotation category. Effect sizes for the selected non-null variants were sampled from the standard normal distribution and then rescaled to obtain the required level of heritability, given different predefined ratios (see below) between the average effect sizes of the two dichotomous functional annotation categories. For each synthetic genotype, a quantitative synthetic phenotype was then generated as the sum of allelic effects over all non-null variants complemented by a certain proportion of a random Gaussian noise (representing effects of the environment) required to keep the predefined level of heritability. Finally, association tests were performed using PLINK 1.9 to obtain GWAS summary statistics.

GWAS summary statistics
We applied the model to GWAS summary statistics on 11 phenotypes (Table 1). Like in simulations with synthetic data, we considered here two functional annotation categories for the variants: exonic and non-exonic. For ease of comparison with partitioned LDSC method (Finucane et al., 2015), the LD structure was estimated with PLINK 1.9 using genotype data from LDSC's template containing 9 997 231 biallelic variants for 489 unrelated European individuals [originally derived from 1000 Genomes Phase 3 data (1000Genomes Project Consortium et al., 2015]. Transchromosome correlations as well as correlations between variant genotypes at r 2 <0.05 were disregarded. For each phenotype, 50 independent optimization runs were performed to maximize the likelihood of the GWAS z-scores observed in different subsets of 10 6 randomly selected variants.

Simulations with synthetic data
The simulations with synthetic data demonstrate that the true parameters are estimated accurately when the proportions of heritability carried by both functional categories are comparable and each category individually carries >2% of the total heritability; if one of the functional categories carries a negligible fraction (<2%) of the total heritability the model often fails to reconstruct its parameters accurately (Supplementary Fig. S1). Selected simulation cases representing scenarios closely resembling complex human phenotypes analyzed in this study are presented in Figure 1. These simulations show that in the range of parameters observed (according to the model) in the 11 phenotypes analyzed in this study, the model is able to provide instructive unbiased estimates of p and r 2 parameters for both exonic and non-exonic functional annotation categories. A complete comparison of true simulation parameters and corresponding model estimates for all 810 simulated phenotypes is shown in Supplementary Figures S2-S4, and the corresponding numerical results are given in Supplementary Table S3. Of note is that, in general, heritability estimates are more robust than estimates of p and r 2 .

GWAS summary statistics
The model was used to estimate exonic and non-exonic polygenicities, discoverabilities and heritabilities of 11 complex phenotypes presented in Table 1 (Fig. 2, Supplementary Table S1). We also obtained estimates of SNP-heritability per functional category for all 11 phenotypes using partitioned LDSC. These estimates are compared with AI-MiXeR's in Figure 2 (right) and Supplementary Table  S2. The polygenicity parameters (p exonic and p non-exonic ) can be converted into the number of non-null variants by multiplying them by the total number of variants within the corresponding annotation category. The numbers ensuing for the analyzed phenotypes are presented in Supplementary Table S1.
The majority of 11 analyzed phenotypes fall into the portion of parameter space where, according to our simulations, the model is expected to produce robust parameter estimates ( Supplementary  Fig. S1, Fig. 1). However, two phenotypes (ADHD and WHR) fall in a portion of parameter space where the model is prone to return inconsistent results in the exonic category (due to this category's limited size, its very low polygenicity in ADHD and its very low discoverability in WHR). This is reflected in larger error bars for the exonic category in these phenotypes (Fig. 2, Supplementary Table  S1). However, the observed consistency of parameter estimates across all 50 independent optimization runs for all analyzed phenotypes suggests that some robust conclusions can be drawn about actual features of the underlying genetic architecture.
The model suggests, that despite having similar heritability, phenotypes may differ substantially in polygenicity and discoverability of non-null variants. For example both AI-MiXeR and partitioned LDSC provide comparable estimates of total and partitioned heritability for LDL and T2D (AI-MiXeR LDL: h 2 total ¼ 0.14, h 2 exonic ¼ 0.09; AI-MiXeR T2D: h 2 total ¼ 0.13, h 2 exonic ¼ 0.06) (Fig. 2,  Supplementary Table S2). However, our model suggests that the genetic architectures underlying these phenotypes differ drastically, with T2D being approximately 5 times more polygenic than LDL and having 91% (versus 15% in LDL) of non-null variants within exons. The polygenicity deficit is compensated in LDL with a discoverability 3.5 times larger than in T2D (50 times larger in the exonic category). According to the model, EA has the largest number of non-null variants (48 000, with only 0.6% of exonic variants) among all analyzed phenotypes, while IBD has the smallest (3000, 91% exonic) (Supplementary Table S1). However, the effects of the non-null variants are on average five times stronger in IBD than in EA and result in a larger total heritability for the former (0.25 in IBD versus 0.1 in EA). SCZ and BD show similar polygenicity  1. Performance of the model on a selected set of scenarios with synthetic GWAS data. True simulation parameters (pexonic, pnonÀexonic, r 2 exonic /r 2 nonÀexonic and h 2 total ) are shown on the left. The blue bars represent the non-exonic category, the orange bars represent the exonic category. The bar lengths represent the median values obtained from 10 optimization runs with independently generated GWAS (locations and effect sizes of non-null variants). The parameter values from individual optimization runs are shown with vertical black dashes. Empty bars with green borders and hatches show the true values of the corresponding parameters used for GWAS simultion (34 000 and 30 000 non-null variants, of which 9% and 10%, respectively, are exonic) and discoverability (with exonic effects being approximately twice stronger). In contrast, most non-null variants for height and T2D are exonic (62% out of 15 500 and 90% out of 9000, respectively) having on average 10-and 3-times weaker effects, respectively, than non-exonic variants.

Discussion
We present the AI-MiXeR model, which can be used to decouple and partition a phenotype's heritability into functional categoryspecific polygenicity (proportion of non-null variants in a given category) and discoverability (variance of non-null effect sizes) components and thus better characterize the phenotype's genetic architecture. This may inform the design of future genetic studies, as efforts to improve discoverability in different genomic categories are likely to have different impact across complex phenotypes depending on their unique genetic architecture.
It is widely assumed that protein-coding exons contain a higher proportion of causal variants (higher polygenicity) and have on average stronger effects (higher discoverability) on complex phenotypes compared to non-exonic regions (Minelli et al., 2013;Schork et al., 2013;Sveinbjornsson et al., 2016). In our study, the AI-MiXeR model suggests that less than half (5 of 11) of the analyzed phenotypes (SCZ, BD, COG, LDL and BMI) support this assumption. Four other phenotypes (T2D, IBD, height and WHR) show higher density of non-null variants in exonic regions but stronger average effects in the non-exonic portion of the genome. In the two remaining traits (ADHD and EA), the pattern appears to be reversed, with a higher density of weaker effect variants in nonexonic regions. Since non-exonic regions cover a substantially larger fraction of the genome compared to exonic regions (containing roughly 70 times more variants), the former account for a greater portion of SNP-heritability than the latter for most phenotypes. Only IBD and T2D present substantially higher fractions of non-null variants in exonic variants.
From our simulation studies on synthetic GWAS, we can infer that the balance of h 2 partition between the functional annotation categories has a strong effect on the model's performance.
Extremely small values of polygenicity (p) or discoverability (r 2 ) in a functional annotation category (relative to the complementary category) result in a heavily unbalanced heritability partition between the categories and can thus lead to substantial errors in the estimates of p and r 2 for the category with smaller absolute heritability ( Supplementary Fig. S1 top and bottom). Despite this, heritability estimates were generally robust (Supplementary Figs S1-S4, Supplementary Table S3).
Decoupling the heritability of different functional categories into polygenicity and discoverability may facilitate trait-specific experimental designs prioritizing certain genomic regions for detailed investigation. For instance, by looking only at the heritability pertaining exons in T2D (h 2 exonic ¼ 0.06) and LDL (h 2 exonic ¼ 0.09), one could expect the yield of an exome-wide scan for both phenotypes to be comparable. However, AI-MiXeR predicts that the average effect size (square root of discoverability) of exonic non-null variants is approximately seven times larger in LDL than in T2D. An exome study of the former therefore could be expected to result in a higher yield of statistically significant findings, given a moderately sized sample. This speculation may be indirectly supported by comparing existing exome-wide studies of T2D and LDL. One of the largest exome sequencing studies on T2D published so far (20 791 cases, 24 440 controls) identified 15 variants in 7 distinct genomic loci reaching exome-wide significance level (Flannick et al., 2019). In contrast, an exome-wide association study of serum lipids in a comparable sample (N ¼ 39 087) reported 66 exome-wide significant LDL susceptibility variants within 14 loci (Dewey et al., 2016). AI-MiXeR's predictions also suggest, however, that a significant increase in the sample size in T2D whole-exome studies will yield more phenotype-associated variants than an equivalent sample size increase in LDL whole-exome studies, since T2D has substantially larger polygenicity.
AI-MiXeR relies on design and implementation quality of the specific GWAS. In general, model predictions for a given phenotype may differ depending on a GWAS's sample size, as well as on the coverage of the tested variants. The sample sizes of the GWAS tested here vary by more than one order of magnitude, from approximately 5 Â 10 4 for BD and ADHD to more than 7 Â 10 5 for EA and height. In all simulations, we kept the sample size constant (N ¼ 10 5 ) and varied only the heritability (h 2 ¼ 0.1, 0.4, 0.7). Since these quantities contribute to the GWAS z-scores distribution only through their product [follows from formula (5)], our simulation scenario with , discoverability (variance of non-null effect sizes) and heritability of exonic and non-exonic functional categories in 11 traits. Traits are shown in the left column: schizophrenia (SCZ), bipolar disorder (BD), attention-deficit/hyperactivity disorder (ADHD), general cognitive ability (COG), educational attainment (EA), type 2 diabetes (T2D), inflammatory bowel disease (IBD), low-density lipoproteins (LDL), body mass index (BMI), height and waist-hip ratio (WHR). For AI-MiXeR results, a bar's length shows the mean value of the parameter obtained from 50 independent optimization runs with 106 randomly selected variants used to maximize the likelihood of the observed GWAS z-scores, while the black bars show the range (min, max) of such estimates. The heritability estimates obtained with partitioned LDSC are shown in darker colors with hatching (right) and black bars representing the standard errors of the estimates N ¼ 10 5 and h 2 ¼0.7 is equivalent to a scenario with, for example, N ¼ 7 Â 10 5 and h 2 ¼ 0.1 (given that polygenicities are equal in both scenarios). Other aspects of potential GWAS-related issues (e.g. coverage of tested variants) were not tested.
The model underlying AI-MiXeR is sensitive to the LD structure estimates. Ideally, the LD structure should be estimated on the same sample used for association testing. However, this is mostly impractical if not impossible. Here, in the analysis of GWAS summary statistics for 11 phenotypes, we estimated the LD structure using the 1000 Genomes Phase 3 genotype panel. Inconsistencies between the LD structure of the samples used for association testing and that of the 1000 Genomes Phase 3 panel could skew the model's results. Additionally, roughening the LD structure (e.g. by ignoring all correlations with r 2 below a certain threshold) also could result in biased parameter estimates. In our simulations, r 2 was estimated from 10 000 synthetic genotypes (randomly sampled from the complete set of 100 000 synthetic genotypes used for association testing) ignoring all correlations with r 2 < 0.01. A subset of European ancestry samples from the 1000 Genomes Phase 3 panel (N ¼ 489) was used to estimate LD r 2 values for the GWAS data because of the wide availability of these data, ease of comparison with LDSC and the fact that genotypes in a majority of analyzed GWAS were imputed using this panel as a reference. The limited size of the 1000 Genomes panel, however, results in relatively low confidence r 2 values, especially for weak correlations involving low-frequency variants. To mitigate this issue, we increased the r 2 cutoff, disregarding all correlations with r 2 <0.05. Nevertheless, consistency between partitioned heritability estimates produced by AI-MiXeR and LDSC (Fig. 2, Supplementary Table S2) suggests the absence of the modelspecific systematic biases.
AI-MiXeR makes further simplifying assumptions, including uniform distribution of non-null variants within functional annotation categories and the effect size's independence of allele frequency and LD. It has recently been shown that these simplified assumptions, which have been used implicitly or explicitly in many earlier methods, can lead to substantial biases in heritability estimates (Speed et al., 2017). We previously demonstrated (Frei et al., 2019) that these factors also bias the model's estimates of p and r 2 when no distinction is made between annotation categories. We did not investigate how disregarding them affect AI-MiXeR's category-specific estimates. These assumptions are likely violated to different degrees in different phenotypes and make the model more suitable for some phenotypes than for others. In this report, we provide examples of successful applications of AI-MiXeR but advise prospective users to carefully assess the model's suitability for a given phenotype. Recently, we proposed a model (Holland et al., 2020a) for the distribution of non-null variants and their effect sizes which takes allele frequency and LD into account. Combining the latter with the AI model presented here involves considerable complexity, however, it is a logical next step. Additionally, the model assumes additivity of genetic effects. Allowing dominance genetic effects and epistasis may further improve model fitness for some phenotypes. Introducing this flexibility requires significant technical complexity and will be considered in our future work.
It is also important to note that the numbers of non-null variants presented (Supplementary Table S1) are estimated for the hypothesized distributions of variant effects, capturing the broad outlines of polygenicity. The observed relative proportions of non-null variants (and their effect sizes) between functional annotation categories within one trait or across different traits might be more reliable indicators of actual genetic architecture features or differences.
The model allows for simulating any number of annotation categories simultaneously (in the marginal scenario, each variant can be treated as a separate annotation category). However, with the current implementation, the computational cost of the optimization increases rapidly as the number of annotation categories grows. For this reason, we restricted the main analysis of this study to exonic and non-exonic categories, which we reckoned could be informative in the context of whole-exome studies. An exploratory analysis of several other functional annotation categories (including intronic, promoter and enhancer regions) for SCZ and T2D can be found in Supplementary Material.
The AI-MiXeR method presented here considers predefined annotation categories allowing both different proportions of non-null variants and different effect size distributions in various functional annotation categories, which is not possible with other methods available to date (Finucane et al., 2015;Zhang et al., 2018;Zhu and Stephens, 2018). The ability to model predefined annotation categories separately allows hypothesis-driven studies of complex phenotypes, which in turn can provide a better understanding of the genetic architecture of those complex phenotypes. Our analysis suggests that both the polygenicity and the discoverability in different functional categories vary considerably across human traits and disorders. Knowing such patterns may facilitate trait-specific experimental designs prioritizing specific genomic regions for detailed investigation.