Comparing heritability estimators under alternative structures of linkage disequilibrium

Abstract The single nucleotide polymorphism heritability of a trait is the proportion of its variance explained by the additive effects of the genome-wide single nucleotide polymorphisms. The existing approaches to estimate single nucleotide polymorphism heritability can be broadly classified into 2 categories. One set of approaches models the single nucleotide polymorphism effects as fixed effects and the other treats the single nucleotide polymorphism effects as random effects. These methods make certain assumptions about the dependency among individuals (familial relationship) as well as the dependency among markers (linkage disequilibrium) to provide consistent estimates of single nucleotide polymorphism heritability as the number of individuals increases. While various approaches have been proposed to account for such dependencies, it remains unclear which estimates reported in the literature are more robust against various model misspecifications. Here, we investigate the impact of different structures of linkage disequilibrium and familial relatedness on heritability estimation. We show that the performance of different methods for heritability estimation depends heavily on the structure of the underlying pattern of linkage disequilibrium and the degree of relatedness among sampled individuals. Moreover, we establish the equivalence between the 2 method-of-moments estimators, one using a fixed-single nucleotide polymorphism-effects approach, and another using a random-single nucleotide polymorphism-effects approach.


Introduction
Fundamental to the study of inheritance is the partitioning of the total phenotypic variation into genetic and environmental components (Visscher et al. 2008). Using family studies, the phenotypic variance-covariance matrix can be parameterized to include the variance of an additive genetic effect, and an environmental effect (Lynch and Walsh 1998). Specific family designs, such as twin studies can accommodate both shared and nonshared environmental effects. The ratio of the genetic variance component to the total phenotypic variance is the proportion of genetically controlled variation and is termed as the "narrow-sense heritability." As shown in the recent review of more than 17,000 twin studies (Polderman et al. 2015), heritability provides useful information on the power to identify causal genetic markers in a genome-wide association study (GWAS), is used to estimate familial recurrence risk of disease, and informs the genetic architecture of the trait (e.g. through partitioning by genomic region or tissue-specific expression).
GWASs seek to understand the relationship between these traits and millions of single nucleotide polymorphisms (SNPs), a type of genetic variant. Linear models are widely used in the field of statistical genetics to assess both individual and cumulative contribution of genetic variants on a trait. The individual contribution is assessed by treating each variant as a fixed effect (fixed-SNP-effect model) while adjusting for relevant covariates in a linear regression (Dicker 2014;Bulik-Sullivan et al. 2015;Schwartzman et al. 2019) or by treating each variant as a random effect (random-SNP-effect model) by using a linear mixed effect model (Yang et al. 2010(Yang et al. , 2011Speed et al. 2012). The fixed-SNP-effect-based approaches model individuals as independent, but incorporate the dependencies among the markers explicitly into the model. On the other hand, the random-SNPeffect models use the genetic relatedness among individuals to improve the efficiency of estimation of genetic variance. Nowadays, with the increasing ability to sequence many genetic variants in large cohort studies [UK Biobank Bycroft et al. (2018), Precision Medicine cohort Collins and Varmus (2015), and the Million Veterans Program Gaziano et al. (2016) are a few such examples], there is significant interest to estimate the cumulative contribution of the genome-wide causal variants. Often we assess such cumulative contribution by estimating the proportion of variance explained by the additive effects of the causal variants in the genome; that is, the "SNP heritability." The random-SNP-effect models assume an infinitesimal model for the SNP effects and use of genome-wide SNP data on distantly related individuals (Haseman and Elston 1972;Yang et al. 2010Yang et al. , 2011Yang et al. , 2012Lee et al. 2011Lee et al. , 2012Speed et al. 2012;Bulik-Sullivan et al. 2015;Seal et al. 2022) to estimate the pairwise genetic relatedness between sampled individuals. These approaches assume that each causal SNP makes a random contribution to the phenotype, and these contributions are correlated between individuals who have similar genotypes. By partitioning the phenotypic covariance matrix among all individuals into a genetic similarity matrix and a random variation matrix, the approach estimates the proportional contribution of the genetics to the total phenotypic variation. The estimation of the heritability parameter heavily depends on the estimation of a high-dimensional genetic relationship matrix (GRM). The matrix is usually estimated from the observed data on M markers for all n individuals in the cohort. Two methods of estimation are used to estimate heritability under this model. One is a likelihood-based approach, which includes Genome Wide Complex Trait Analysis, or GCTA (Yang et al. 2010) and Linkage-Disequilibrium Adjusted Kinships, or LDAK (Speed et al. 2012). It uses the restricted maximum likelihood (REML) estimation technique (Corbeil and Searle 1976) to estimate heritability. The other approach uses method-ofmoments (MOM) technique to estimate heritability, such as Haseman-Elston (HE) regression (Haseman and Elston 1972). A major advantage of this mixed effect model approach is that it can account for related individuals, but the general recommendation is to exclude individuals with relatedness greater than 0.025 in the estimation of heritability (Yang et al. 2011) due to shared environment. These approaches do not explicitly account for the linkage disequilibrium (LD) among the markers, and REML-based estimators have been shown to be sensitive to the patterns of LD (Speed et al. 2012).
There have been attempts to rectify such bias due to LD by partitioning the genome into regions with different LD structures and by assigning a different genetic variance parameter to each partition (Evans et al. 2018). Such correction has been shown to improve the bias in heritability estimation for REML-based estimators. However, such corrections are often ad hoc and the performance depends on the underlying LD structure. Recently, Pazokitoroudi et al. (2020) used a similar partitioning strategy on the HE regression estimator. However, it is not clear if LD will impact the MOM estimator in the same way as it does the REMLbased estimators. Moreover, the performance of the MOM estimators in presence of LD has not been studied extensively.
The fixed-SNP-effect approaches assume SNP effects are arbitrary and fixed (Dicker 2014;Schwartzman et al. 2019), thus giving more flexibility to each SNP effect. The proposed estimators are consistent and asymptotically normal in high-dimensional linear models with Gaussian predictors and errors, where the number of causal predictors m is proportional to the number of observations n; in fact, consistency holds even in settings where m=n ! q, where 0 < q < 1. This set of approaches cannot easily accommodate relatives in the model, and thus the consistency of the estimator is derived under the assumption that the sampled individuals have independent genotypes. These approaches directly incorporate the LD among SNPs into the model and have been shown to provide consistent estimates of heritability under the correctly specified LD model for n > M, where M is the total number of markers. However, these methods make different approximations to derive the heritability estimator for n < M, since the LD matrix is not invertible then. The properties and differences between these approximation-based estimators (Dicker 2014;Hou et al. 2019) are not well studied for n < M.
In this article, we take a closer look at these random-SNPeffects and fixed-SNP-effects models for heritability estimation using both likelihood and MOM approaches. This article provides an analytical comparison of 2 popular MOM estimators from each of these categories. We aim to understand the fundamental differences or similarities between the principles of these 2 lines of approaches. We present a set of simulation studies with varying structures of LD and compare the performance of a wide array of estimators. We further provide some theoretical results that justify the observed simulation performance. We demonstrate through theoretical derivations as well as simulation studies that the potential impact of LD on a random-SNP-effect model-based estimator (Haseman and Elston 1972) depends on the extent and structure of correlation of the causal and noncausal variants. We also show that the fixed-SNP-effect model estimator proposed by Dicker (2014) is essentially equivalent to the HE method-of-moments estimator (Haseman and Elston 1972) for n < m.
Our findings in this article do not demonstrate any particular advantage of the fixed-SNP-effect models over the random-SNPeffect models in presence of LD, at least for the case when heritability is estimated using a genome-wide marker model and when n < M. One could partition the genome into small segments to account for the differences in genome-wide LD structure and handle the influence of LD better using a fixed-SNP-effect estimator for each partition separately (Hou et al. 2019). However, there is a potential overfitting issue for having a separate heritability parameter for every partitioned segment.
The rest of the article is organized as follows: first, different methods to estimate heritability are explained and analytical formulae to compare their performance under different LD structures are presented. We then describe strategies to simulate LD and relatedness structure and to evaluate both fixed-SNP-effects and random-SNP-effects models. Finally, the results are presented and discussed.

Genotypes, phenotypes, and heritability estimation
We consider a set of n individuals from a homogeneous population, typed at M SNP markers, assumed to be in Hardy-Weinberg equilibrium. Note that notation is also listed in Table 1. Assume an n Â M matrix of genotypes G ¼ ðG ij Þ, where G ij ¼ 0; 1; 2 is the number of copies of the reference allele for individual i at locus j with population frequency p j . Thus G ij ; i ¼ 1; 2; . . . ; n, has mean 2p j and variance 2p j ð1 À p j Þ; j ¼ 1; 2; . . . ; M. The vector of standardized genotypes for individual i at marker j is given by so that C ij has mean 0 and variance 1. The matrix of standardized genotypes for all markers, C A ¼ ðC ij Þ, carries information on the relatedness of individuals, and the LD among markers. While EðC ij C i' Þ ¼ r j' is the genotypic correlation between loci within an individual, EðC ij C kj Þ ¼ / ik measures the genotypic correlation between individuals. We define the GRM W as in Yang et al. (2010) and we define the LD matrix as where we use the single quote ( 0 ) to denote the transpose of a matrix. In large samples, the empirical allele frequencies ð2nÞ À1 P n i¼1 G ij can be used as an estimate of the population frequency p j of Equation (1) in forming the matrices W and R.
Suppose that the first m of the M markers are causal, having a direct impact on phenotype. We denote by C C the matrix consisting of the m columns of C A corresponding to the causal markers, and adopt the classical trait model of Fisher (1919). The phenotype of individual i is given by where b j , C ij , and i are mutually independent and have mean 0. Then Eðy i jC C Þ 0 so that varðy i Þ ¼ Eðvarðy i j C C ÞÞ. In SNP heritability estimation, r 2 g is the phenotypic variance attributable to SNPs, and r 2 e is the phenotypic variance attributable to environmental factors. In a random-SNP-effect model, we assume b j $ Nð0; r 2 g =mÞ, and i $ Nð0; r 2 e Þ. Then Then either conditionally on C C as m becomes large, or taking expectations over C ij , varðy i Þ ¼ r 2 g þ r 2 e . On the other hand, a fixed-SNP-effect model assumes that b is a fixed quantity, with b j ¼ 0 for noncausal markers. In that case, we define r 2 g ¼ b 0 R Ã b, and note EðC ij C i' Þ ¼ r j' and because of normalization, EðC 2 ij Þ ¼ 1 so that Thus in either case, the phenotypic variance varðy i Þ ¼ r 2 g þ r 2 e and SNP heritability are h 2 ¼ r 2 g =ðr 2 g þ r 2 e Þ. If phenotypes are standardized to have variance 1, then r 2 g ¼ h 2 and r 2 e ¼ 1 À h 2 . More generally, estimation of heritability is primarily concerned with the estimation of r 2 g , the estimate of h 2 being then obtained by dividing by the empirical variance of the phenotypes y i ; i ¼ 1; . . . ; n.

Overview of estimators
In our overview of the methodologies for heritability estimation, we concentrate on method-of moments estimation and likelihood-based estimation for the random-SNP-effect models. We further compare these estimators with the fixed-SNP-effect method of moments model-based estimators (Dicker 2014;Schwartzman et al. 2019). The Supplementary material provides more details on these estimators.
For the likelihood methods, we consider the GCTA (Yang et al. 2011) and LDAK (Speed et al. 2012) approaches. In brief, GCTA is a random-SNP-effect model derived under assumptions similar to those of Genotypes, Phenotypes, and Heritability Estimation. The approach uses REML (Patterson and Thompson 1971) to estimate r 2 g and r 2 e . It estimates heritability assuming that phenotypes are drawn from a multivariate Normal distribution, where the log-likelihood function is À n 2 logð2pÞ À 1 2 log detðr 2 g W þ r 2 e IÞ þ y 0 ðr 2 g W þ r 2 e IÞ À1 yÞ: LDAK (Speed et al. 2012) uses a similar model, except reweighting the SNP markers to adjust for LD. More details on the GCTA and LDAK approaches can be found in Supplementary Section 1. Note that r 2 g is only identifiable when W is not the identity matrix. For the method-of-moments estimators, we first considered an HE estimator (Haseman and Elston 1972), an estimator from the random-SNP-effect approach category. The estimator of r 2 g , derived in Supplementary Section 2.1, has the form An estimate of heritability is then given by dividing by the empirical variance of phenotypes Y i . Further properties of this estimator in the case of no LD are given in Supplementary Section 2.1.
We also considered 2 method-of moment estimators from the fixed-SNP-effect approach category, which we denote Dicker-1 The population frequency of locus j r 2 g The variance in phenotype attributable to genotypic effects r 2 e The variance in phenotype attributable to environment. r jl The genotypic correlation between loci j and l in the population / ik Genotypic correlation between individuals i and k G A matrix of genotypes with n rows and M columns C A A matrix of normalized genotypes with n rows and M columns (Equation 1) C C A matrix formed by the m columns of C A that correspond to the causal markers W The GRM calculated using all markers; an n Â n matrix.
An m-vector of effects of causal loci on phenotype, or sometimes an M-vector augmented by 0's (Equation 4) y An n-vector of phenotypic values of individuals. and Dicker-2 (Dicker 2014). Dicker-1 is applicable in the case of no LD. It is derived and discussed in the Supplementary Section 2.2 and takes the form We consider this estimator primarily for comparison with the HE estimator: see Supplementary Section 2.1 and in the section Impact of LD on the Dicker-1 estimator.
Throughout this article, we refer to the estimator in Equation (7) as Dicker-1, but we also present a variant of Dicker-1, which we denote Dicker-1-R. In the presence of LD, if R is invertible, Dicker-1-R is However, in many cases, R is not invertible because M > n, and hence we do not consider Dicker-1-R in our simulations. In these cases, to address the LD, Dicker (2014) derives an estimator which we denote Dicker-2. This estimator uses moments of the trace of the LD matrix R to correct for LD, resulting in an estimator of r 2 Further details of the Dicker-2 estimator are given in Supplementary Section 3.3.

Impact of LD on the HE estimator
In this section, we consider the impact of marker mispecification and marker LD on the numerator and denominator of the HE estimator, and hence on the estimate of r 2 g . We assume unrelated individuals but correlated markers, so that r j' 1, and r jj ¼ 1. We split the markers into m causal markers C and ðM À mÞ noncausal markers F. Note that all markers are used in the GRM: A , but that only causal markers C C contribute to the phenotype y. For convenience, assume that the first m markers are causal: C ¼ f1; . . . ; mg and F ¼ fðm þ 1Þ; . . . ; Mg. Then, following the same derivation as in Supplementary Section 2.1, for i 6 ¼ k we obtain, If the b j have mean 0 and are uncorrelated, we have only terms in j ¼ ', and this reduces to Here, using that individuals i and k are independent and that b 2 j has expectation r 2 g =m. Then where for convenience, we denote the sums of squared correlations r 2 j' between causal and non À causal markers r 2 j' among non À causal markers: Considering similarly the denominator of the HE estimator, Equation (13) approximates the expectation of the HE estimator and gives insight into its bias. First, if there is no LD, R CC ¼ m, R CF ¼ 0, and R FF ¼ ðM À mÞ, giving the results of Supplementary Section 2. Second, if the GRM contains only causal markers M ¼ m, then LD among these causal markers does not cause bias, as approximated by Equation (13). Third, if additional markers F are not in LD with each other, nor with the causal markers C, R CF ¼ R FF ¼ 0, and again no bias results. Note that generally inclusion of additional markers in the GRM is less serious than omission of causal markers. If C A is missing causal markers j, then Equation (11) will not include the contributions of those b j and S YW will be decreased, but S WW will not (on average) be affected, leading to underestimation of r 2 g . In some special cases, biases cancel out. Consider first a special case of causal markers in regions of "average LD"; suppose all r j' ¼ s for j 6 ¼ '. Then, R CC ¼ m þ mðm À 1Þs 2 ; R CF ¼ mðM À mÞs 2 and R FF ¼ ðM À mÞ þ ðM À mÞðM À m À 1Þs 2 , and some arithmetic show there is no bias. Two other examples occur in the simulations of Results. In both the autocorrelation and block simulations, causal and noncausal markers are alternating. Then M ¼ 2m and R FF ¼ R CC , and Equation (13) again shows there is no bias. This is demonstrated in the simulation results in Figs. 1 and 2. We note that although we only show the case of M ¼ 2m, we show in Supplementary Section 3.1 that the approximate theoretical bias is also quite small for other ratios of causal to noncausal markers, and there is no bias for equally sized blocks.
In other cases, there can be bias. For example, if causal markers are in regions of high LD, then (per marker) R CC dominates over R FF , and r 2 g will be overestimated, while if causal markers are in regions of low LD R FF in the denominator will dominate, and r 2 g will be underestimated. The case of duplication of markers also considered in the simulations (Simulation Study 1: , and (iv) n ¼ 2; 000; m ¼ 500. Five hundred data sets were simulated for each condition. A horizontal line is shown at h 2 ¼ :8, the simulated truth. On the bottom row, the X-axis is the parameter q, and the MSE of each of the estimators is plotted on the Y-axis.
Five hundred data sets were simulated for each condition. A horizontal line is shown at h 2 ¼ :8, the simulated truth. On the bottom row, the X-axis is the parameter q, and the MSE of each of the estimators is plotted on the Y-axis.
Impact of Marker LD) is different, and Equation (13) again provides an estimate of the bias. In this example, there is no LD in the m causal markers, so R CC ¼ m. The genotypes at subset of d of these markers are replicated r additional times, but these replicates are Note that if no markers are replicated (d ¼ 0) or all markers are replicated (d ¼ m) then there is no bias. Note also that the result only depends on the proportion of markers replicated. If d ¼ gm, then Equation (14) reduces to ð1 þ rgÞ 2 r 2 a =ð1 þ 2rg þ r 2 gÞ. Although the expectation of the ratio is approximated by the ratio of expectations in Equation (13), simulation shows this approximation gives an accurate estimate of the bias: see Supplementary Section 3.1 ( Supplementary Fig. 3).

Impact of LD on the Dicker-1 estimator
Through our simulations, we found that the Dicker-1 estimator had a generally greater bias than the HE estimator (Simulation Study 1: Impact of Marker LD). This is because the Dicker-1 estimator is derived from a linear expression of quadratic forms which is inflated due to LD. On the other hand, the HE estimator is a ratio of quadratic forms, and LD inflates both the numerator and the denominator, which potentially reduces the overall effect of LD. We recall from Equation (7) that the Dicker-1 estimator takes the formr 2 g ¼ ðnðn þ 1ÞÞ À1 ðMy 0 Wy À My 0 yÞ. Through calculations shown in detail in Supplementary Section 3.2, we show that We note that R CC þ R CF ! m, and hence, from this approximation, we have thatr 2 g is greater than r g , and the magnitude to which it is greater increases as R CC and R CF increase.

Equivalence of HE and Dicker-2
As will be shown in Simulation Study 1: Impact of Marker LD, estimates from the Dicker-2 and HE regression were very similar, although Dicker-2 explicitly models LD. Analytically, under certain normalization schemes, the 2 estimators are effectively equivalent. This suggests that efforts to correct for LD in the Dicker-2 framework do not ensure improved performance of this estimator compared to the HE estimator.
We begin by reformulating HE regression. We recall from Equation (6) that the HE estimator is given bỹ We can rewrite this in matrix form, giving us S YW ¼ y 0 Wy À y 0 diagðWÞy 2 : Under Hardy-Weinberg equilibrium, the GRM should have values that are approximately 1 on the diagonal. We assume y is normalized to have variance 1, which results in S YW % y Wy À n 2 : Now we consider S WW . Noting that trðW 0 WÞ ¼ P Together, the HE estimator is approximately y Wy À n trðW WÞ À n : Now consider the equations for Dicker-2. First note that in Equations (9) and (10), l 1 ¼ 1 since the genotypes are normalized to have variance 1. Next, we use the property of traces that trðABCDÞ ¼ trðDABCÞ to calculate Since n % n þ 1, for large n, we have that the Dicker 2 estimator (Equation 9) is approximately which is the same as Equation (15).

Impact of relatedness of individuals on moments estimators
Under the assumption of independence of individuals, the SD of the HE estimator of r 2 g or of h 2 increases with the number of markers M (Supplementary Section 2). This arises because in the limit, the matrix W converges in probability to the identity matrix, i.e. all off-diagonal terms converge in probability to 0. This leads to poor behavior of the HE estimator because the numerator and denominator of the HE estimator converge in probability to 0. However, this is an artifact of the assumption of complete independence (unrelatedness) of individuals. In any real sample, regardless of the extent of correction for population structure, there will always be variation in the degree of relatedness of individuals, even if any single pairwise relatedness measure is small. Note that the original formulation of HE estimators (Haseman and Elston 1972) made use of the genetic similarity between known relatives. In this section, we therefore consider the case where individuals may be related, so standardized genotypes C ij and C kj are no longer independent. For simplicity, we ignore LD: that is C ij and C k' are independent, for j 6 ¼ l, whether or not i ¼ k.
Under relatedness and inbreeding, it remains the case that EðC ij Þ ¼ 0, but varðC ij Þ ¼ ð1 þ F i Þ (Crow et al. 1970) and EðC ij C kj Þ ¼ / ik , where F i is the inbreeding coefficient of individual i, and / ik is the relatedness of i and k, or twice the coefficient of kinship between i and k. To consider the HE estimator (6), for i 6 ¼ k, Hence as M ! 1 S WW tends to We can also calculate Thus, contrary to the results of Supplementary Section 2.1 for unrelated individuals, the SD of the HE estimator no longer increases as M ! 1, but rather will depend on the magnitude of ð P k P i < k / 2 ik Þ. Although this sum may be small, if even any of the / ik are nonzero it is strictly positive, and eventually, relatedness will bound the SD of the estimator of r 2 g . Relatedness poses greater problems for the Dicker-1 estimator (Equation 7), which involves the diagonal terms of the GRM matrix W. Considering the expected quadratic form Now, by expanding and simplifying, even the coefficient of r 2 e is no longer mn but P n i¼1 P m w¼1 EðC 2 iw Þ ¼ Mðn þ P k i¼1 F i Þ while that of r 2 g is, as in Supplementary Section 2.2 This expectation now involves not only ð1 þ F i Þ 2 , and / 2 ik but also higher-order moments.
Although the derivation of distributional properties of the Dicker method-of-moments estimators depends critically on the assumption of 2n independent genomes, there is nothing in the derivation of Equivalence of HE and Dicker-2 that assumes W is diagonal. Indeed, the trace equation used in showing the approximate equivalence of the HE and Dicker-2 estimators, suggests that the Dicker-2 accommodation of LD in the absence of relatedness is alternatively accommodating relatedness in the absence of LD. Thus, as will be seen in the results of Simulation Study 3: Impact of Relatedness in Individuals, the close equivalence of the Dicker-2 and HE estimators should hold under relatedness, and, as seen from Equation (16) above, the standard deviation will no longer increase indefinitely as M ! 1.

Simulation strategy
We performed simulation studies to assess the impact of LD structure and relatedness of individuals on heritability estimation. Each simulated data set consisted of genotypes G at M markers (m causal markers) for n unrelated individuals. The marker allele frequencies were those of a randomly chosen subset of markers from the 1000 Genomes Project from Chromosome 1 in the African (AFR) population (Clarke et al., 2017). This set of frequencies was filtered to have allele frequency less than 0.95 and greater than 0.05 and was fixed over data set simulations.
Genotypes are standardized using their empirical allele frequencies. Phenotypes were simulated for n individuals, given their genotypes at the m causal markers, in accordance with the linear model of Equation (4): For the chosen value of h 2 , ð0 < h 2 < 1Þ, the m-vector of genetic effects b was simulated with independent components b j $ Nð0; h 2 =mÞ for j ¼ 1; . . . ; m. The independent residual effects i $ Nð0; 1 À h 2 Þ for i ¼ 1; . . . ; n. Thus, for the purposes of the simulation r 2 g ¼ h 2 , r 2 e ¼ 1 À h 2 , and varðy i Þ ¼ 1, with h 2 set to 0.8 for all simulations (see Genotypes, Phenotypes, and Heritability Estimation).
We implemented the Dicker and HE estimators in R Version 4.0.2 as described in Supplementary Section 2. We used GCTA (Yang et al. 2011) and LDAK (Speed et al. 2012) as representative likelihood estimators, both of which are described in more detail in Supplementary Section 1. For every simulated data set, we applied each of these estimators. We also report a gold standard estimator to assess the performance of these different methods. The gold standard estimate is calculated assuming we know the true values of b: the empirical variance of C C b is divided by the empirical variance of the phenotypes. This gold standard estimator can be expressed as ðC C b À C C bÞ 0 ðC C b À C C bÞ ðy À yÞ 0 ðy À yÞ : In simulation study 1, we assessed the impact of different LD structures on heritability estimation. We generated genotypes assuming three kinds of LD structure: autocorrelated, block, and repeat. More details of the LD structures are given in Supplementary Section 4. Each data set was simulated with a new b and G. For each LD structure, we studied the impact of both sample size n and the number of causal markers m on heritability estimation, and simulated data sets at 5 levels of LD. For each LD structure and level, we generated 500 simulated data sets.
Comparing (1) and (2) provides insight on differences in estimates of h 2 depending on if n > m or m > n, whereas (2) and (3) compares estimates with different number of causal markers, and (2) and (4) compares estimates with different numbers of individuals. We first generated genotypes at M ¼ 2m markers. We used marker correlations q ¼ 0; 0:2; 0:4; 0:6, and 0.8., as detailed in Supplementary Section 4. (Note that q ¼ 0 is the no-LD case.) The markers were then assigned to be alternating causal and noncausal (m ¼ M=2).
In simulation study 2, we investigated the behavior of likelihood models by plotting log-likelihood values (Equation 5) as a function of r 2 g and r 2 e . The GRM W in Equation (5) was calculated using Equation (2). Of interest was the relationship between the shape of the log-likelihood function and the number of individuals and causal markers, and the shape of the likelihood as the number of repeats increased. From the results of simulation study 1, we hypothesized that the shape would be different when m > n, where GCTA underestimated heritability, compared to when m < n, where GCTA overestimated [comparing (i) and (iv) of Fig. 3]. The combinations of numbers of markers and individuals were the same as with the repeats in Simulation Study 1, and the allele frequencies were taken from the AFR sample of the 1000 Genomes Project, as before. We include plots with no repeated markers (Fig. 4a) and with 10% of the markers repeated 8 times (Fig. 4b).
The log-likelihoods minus the maximum log-likelihood were plotted. Likelihoods were truncated at the 60% quantile of b(i) for rows (i) and (iv), and at the 60% quantile of a(ii) for rows (ii) and (iii). These cutoffs were chosen because they were the plots that had the lowest 60% quantile. Plots were generated for (i) n ¼ 1; 000; m ¼ 200 (ii) n ¼ 200; m ¼ 1; 000, (iii) n ¼ 200; m ¼ 3; 000 (iv) n ¼ 2; 000; m ¼ 1; 000, in following with simulation study 1. We averaged log-likelihoods of 100 simulated data sets with grid spacing 0.05. Due to differences in ranges, there is a shared color bar between (i) and (iv), and a different shared color bar between (ii) and (iii). A circle within each plot is used to mark the location of the maximum log-likelihood.
In simulation study 3, we assessed the impact of related individuals on heritability estimation. We simulated first, second, and third cousins using the rres package in R (Wang et al. 2017) as well as unrelated individuals to illustrate our findings in Impact of Relatedness of Individuals on Moments Estimators. The segment length option in rres was set to 3,000 cM. Using the same set of allele frequencies as previously, we simulated marker genotypes for 400 individuals, in 10 40-ships. A k-ship is defined to be a set of k cousins related to a certain degree. Each cousinship is unrelated with all other cousinships. The number of markers ranged from 400 to 4,000 in steps of 400. Phenotypes were generated using Equation (17). For every combination of cousinships and number of markers, we simulated 500 sets of 10 40-ships. A visualization of the GRM of the dataset is shown in Supplementary Fig. 5 using 1,000 markers.

Results
Simulation study 1: bias and variance when q 5 0 or r 5 0 (no LD) The special case of no LD in Simulation Study 1 is shown in Figs. 1-3 in panels of the upper row at the left-hand point of each point. These figures verify that the estimators were generally unbiased in estimating the heritability. One exception is in LDAK, where when n ¼ 200, LDAK seemed to underestimate heritability.
Although we generally observed no bias in the estimators under independent markers, we saw that the estimators had a wide range of variances. In the cases n ¼ 1; 000; M ¼ 200 and n ¼ 2; 000; M ¼ 1; 000 [columns (i) and (iv) in Figs. [1][2][3], the variance of the GCTA, and LDAK estimators were lower than the variance of the moments estimators, but this difference is less pronounced in the cases where n ¼ 200 [columns (ii) and (iii)], which may suggest that the number of individuals affects the likelihood-based estimators more than the moments-based estimators. The lower variance resulted in lower MSE for GCTA for all conditions with q ¼ 0, but the bias in LDAK caused it to have comparable MSE to the moments estimators when n ¼ 200 (Figs. 1-3).
We can also compare cases when the number of individuals is kept constant while the number of markers is increased by comparing n ¼ 200; m ¼ 500 in column (ii) vs n ¼ 200; m ¼ 1; 500 in column (iii). In Supplementary Section 2.1, we found that with unrelated individuals and independent markers, the standard deviation of heritability should be asymptotically proportional to ffiffiffiffi ffi M p =n in the case of the HE estimator. Accordingly, since M ¼ 2m in the simulations, when the number of causal markers increases, the standard deviation of the heritability estimates increased as well. This is shown in both Figs. 1-3, where MSEs were higher in column (iii) compared to column (ii). This trend appeared to hold true for both the likelihood-based estimates and the moments-based estimates.
We can compare cases when the number of individuals increased while holding the number of markers constant by comparing n ¼ 200; m ¼ 500 in column (i) vs n ¼ 2; 000; m ¼ 500 in column (iv). The variance and MSE of the heritability estimates decreased for all estimators, which agreed with the theoretical result for the HE estimator.
Finally, some of the biases in the behavior of LDAK may be that the LDAK model does not match our generative model. LDAK reweights their genotypes using X ij ¼ ðG ij À 2f j Þ Â ½2f j ð1 À f j Þ a , and a is recommended to be 1.25 (Speed et al. 2012(Speed et al. , 2017. More details can be found in Supplementary Section 1. Our model does not explicitly simulate phenotypes in this manner, however. To investigate this, we also chose a in LDAK to be -1, which matches our simulated phenotypes due to our normalization scheme (Equation 1). Results (not included) were largely similar, although the estimated heritability was slightly closer to the simulated truth in the repeat case.

Autocorrelation structure
Data were simulated using the autocorrelation structure as described in Simulation Strategy, and a representative set of moments and likelihood estimators are evaluated on these simulated data. The estimated variance and bias of different estimators are shown in Fig. 1.
The HE estimator and the Dicker-1 estimator do not explicitly account for LD structure, and because the Dicker-1 estimate was developed for the no-LD case, it shows bias when LD is present. In the top row of Fig. 1, the Dicker-1 estimator shown in gold showed an increase in bias as q increased for all of (i)-(iv). Consequently, the MSE of the Dicker-1 estimator increases rapidly compared to all of the other estimators as we increase q (bottom row of Fig. 1). In contrast, there was no increase in the MSE of the HE estimator when markers were autocorrelated, agreeing with Impact of LD on the HE Estimator. In the top row of Fig. 1, the estimates of h 2 from the HE estimator did not appear to visually differ significantly from the true value of 0.8. This estimate behaved very similarly to the Dicker-2 estimator, despite the Dicker-2 estimator explicitly attempting to correct for LD. This is analytically shown in Equivalence of HE and Dicker-2.
The likelihood estimators in Fig. 1 showed generally lower MSE and no obvious bias. The GCTA estimator is shown in black and the LDAK estimator is shown in light blue. Both of these estimators seemed to have lower MSE across all values of q than the moments estimators, as seen in the bottom row.
When n ¼ 200; m ¼ 3; 000, as q increased, there was a decrease in the MSE in all the estimators except the Dicker-1 estimator. In Fig. 1, it can be seen that as q increases, the first and third quartiles of the estimates of h 2 decrease. It has previously been shown that fewer causal markers lead to decreased variance (Dicker 2014), and hence this effect may be driven by a decrease in the effective number of markers as LD increases. Figure 2 shows the estimated variance and bias in different estimators when the genotypes were simulated from the block structure with parameter q, as described in Simulation Strategy. Similarly to the autocorrelation structure, the Dicker-1 estimator had significant bias and high MSE, although this is expected because Dicker-1 as implemented here relates to the no LD case. The HE and Dicker-2 estimators were not as affected by the LD. In contrast to the autocorrelation, however, LDAK underestimated h 2 in Fig. 2, columns (ii), (iii), and (iv). In the bottom row of Fig. 2, this resulted in an MSE that was comparable to that of HE and Dicker-1. GCTA estimates appeared to still largely be unbiased and produced MSEs that were lower than the other estimators. Again, it was observed that there are cases when the MSE decreases as q increases, similarly to the autocorrelation case. Figure 3 shows the variance and bias patterns when the genotypes were simulated from the repeat structure with parameter r, as described in Simulation Strategy. As r increases, the number of times that 10% of the markers were simulated increased. There were m causal markers simulated and n individuals. For example, when m ¼ 1,000 and r ¼ 3, there were 1,000 causal markers that were simulated, and the first 100 markers were repeated 3 times, leading to a total of 1,300 markers that were entered into the analysis. An increased value of r indicates more markers that are in perfect LD with the original causal markers. We also examined behavior when repeated markers had a small probability of not being exact duplicates, and results were similar but less pronounced (results not shown).  Fig. 3. Simulation Study 1C (repeated markers). On the top row, the X-axis plots the parameter r, the number of times that 10% of the markers are being repeated as described in Supplementary Section 4. Estimates of h 2 using different estimators are plotted along the Y-axis. The value n refers to the number of individuals simulated. The value m is the total number of causal markers simulated, as described in Simulation Strategy. We consider (i) n ¼ 1; 000; m ¼ 200 (ii) n ¼ 200; m ¼ 1; 000, (iii) n ¼ 200; m ¼ 3; 000 (iv) n ¼ 2; 000; m ¼ 1; 000. Five hundred data sets were simulated for each condition. A horizontal line is shown at h 2 ¼ :8, the simulated truth. On the bottom row, the X-axis is the parameter r, and the MSE of each of the estimators is plotted on the Y-axis. Figs. 1 and 2, the estimates for Dicker-1 increase rapidly as r increases, agreeing with analytical calculations from Impact of LD on the Dicker-1 Estimator. In contrast to Figs. 1 and 2, in Fig. 3, the estimates for HE and Dicker-2 decrease as r increases, corresponding to Equation (14) and to results in Supplementary Fig. 3, where those equations were verified through simulation. The MSE of these 2 estimators also increases as r increases (Fig. 3) and further produce very similar estimates, agreeing with analytical calculations from Equivalence of HE and Dicker-2.
The LDAK estimator shows the same pattern of bias as GCTA in that as r increases, h 2 is underestimated when n > m and overestimated when n < m. This bias is less pronounced than with GCTA, however. In the bottom panel of Fig. 3, it can be seen that as r increases, the MSE of LDAK appears relatively constant, whereas the MSE of GCTA is increasing when n ¼ 200; M ¼ 1; 000 or n ¼ 200; M ¼ 3; 000, as seen in columns (ii) and (iii).

Simulation study 2: likelihood surfaces
In Fig. 3, GCTA displayed an upward bias when n > m, and a downward bias when n < m. We hence hypothesized that the likelihood would be different if n > m vs if m > n. The likelihood surface captures the joint likelihood of r 2 e and r 2 g . From the model in Equation (5), Varðy i Þ ¼ r 2 e þ r 2 g . Hence we expect that the maximum likelihood lies on a diagonal, as r 2 g % Varðy i Þ À r 2 e . This appears to be true when the number of individuals is much larger than the number of markers, but when the number of individuals much less than the number of markers, the axis of the conditional maxima becomes more horizontal (Fig. 4). An intuition for Fig. 4. Simulation Study 2. The difference of the log-likelihood from the maximum log-likelihood is plotted for parameters r 2 e on the Y-axis and r 2 g on the X-axis. The colors depict the value of the difference from the maximum log-likelihood. Likelihoods are truncated at the 60% quantile of b(i) for rows (i) and (iv), and at the 60% quantile of a(ii) for rows (ii) and (iii) for visibility. Row labels correspond with Fig. 3, with (i) n ¼ 1; 000; m ¼ 200 (ii) n ¼ 200; m ¼ 1; 000, (iii) n ¼ 200; m ¼ 3; 000, and (iv) n ¼ 2; 000; m ¼ 1; 000. Column A has markers with no LD, and in column B, 10% of the markers are repeated 8 times, corresponding the rightmost points in Fig. 3. The average of 100 independent simulations using a grid with spacing 0.05 is plotted in each panel. Note that there is one color scale shared between (i) and (iv) on the left, and a different color scale shared between (ii) and (iii) on the right due to different ranges. The red point indicates the location of the maximum likelihood. this result is that as the number of individuals improves, we have better knowledge of the total phenotypic variance.
The likelihood surfaces also demonstrate a faster rate of change in the likelihood surface when the number of individuals is increased, comparing Fig. 4, a and g where the range of the colors is greater than in Fig. 4, c and e. This observation corresponds with simulation study 1, where as the number of individuals increased, the variance of the estimates of heritability decreased. Finally, on the right hand side of Fig. 4, the surfaces are still either diagonal or horizontal, but the maxima (circles within each plot) are shifted. This agrees with simulation study 1 results, where there was bias in the GCTA method when the number of repeats increased.

Simulation study 3: impact of relatedness in individuals
In simulation study 3, we studied the effect of familial structure on estimates of heritability using cousinships and found that an increase in the number of causal markers generally increased MSE unless relatedness was high.
The Dicker-2, HE, and GCTA estimators appeared unbiased for each of the relatedness structures (Fig. 5). For GCTA and HE, we reasoned that because their model is conditional on the GRM, it took into account relatedness. Furthermore, because we have shown that HE and Dicker-2 are equivalent (Equivalence of HE and Dicker-2), we can also explain the unbiasedness of Dicker-2. LDAK was also largely unbiased in the case of unrelated individuals, For the different relatedness structures (unrelated, full sibs, first cousins) we considered, we observed similar pattern in the change of MSE as we increased the number of markers. MSE was generally the lowest when the number of markers was closer to the sample size. However, as we increased the number of markers, MSE for each estimator increased (Fig. 6). For HE and Dicker-2, the unrelated individuals had the lowest MSE when the number of markers was 400, but increased as the number of markers increased. On the other hand, first cousins had MSE that remained steady (Fig. 6). When the number of markers was 4,000, the MSE of the unrelated individuals was larger than the MSE of the first cousins, agreeing with analytical calculations from Impact of Relatedness of Individuals on Moments Estimators.
For each of our estimators, unrelated individuals had the highest MSE and first cousins had the lowest MSE. Furthermore, comparing the case of related to unrelated individuals, the MSE increased more slowly with the increase in the number of causal markers in related individuals.

Discussion
The methods for SNP heritability estimation can be broadly classified into 2 groups; fixed-SNP-effects models and the random-SNP-effects models. The fixed-SNP-effect models (Dicker 2014;Schwartzman et al. 2019) can more easily accommodate the LD structure among the genetic variants and can accommodate variants as both causal or noncausal. However, these approaches rely on independence among the individuals in the sample. On the other hand, the random-SNP-effect models (Yang et al. 2011) can accommodate and borrow power from related individuals, though it is generally recommended to exclude relationships with higher relatedness than 0.025 (this corresponds approximately to relatives second cousins or closer) to avoid confounding due to shared environments. These random-SNP-effects models assume all variants are causal and the majority of the methods do not accommodate LD among the markers in a statistically rigorous way. The asymptotic properties of these heritability estimators depend on model assumptions. In this article, we have studied the impact of model misspecification on heritability estimation through extensive simulation studies. We have simulated data under various LD structures and have allowed a certain portion of the variants to be noncausal. We found little difference in the performance of a fixed-SNP-effect model method-of-moments estimator and an MOM estimator from a random-SNP-effect model under different model misspecification.
We have derived the analytic expression for the approximate bias of the HE estimator in presence of LD among markers. Impact of Linkage Disequilibrium considers various scenarios for the LD among causal and noncausal markers and analytically shows the impact of this correlation on the HE estimator. Our simulation studies and numerical results have also considered various LD scenarios to illustrate that the bias in heritability estimation depends on the underlying LD pattern and is often small. In many cases, the standard practice of pruning markers to reduce LD (Calus and Vandenplas 2018) may be unnecessary.
In the case where R À1 can be computed (M < n), Dicker (2014) proposes a heritability estimator (Dicker-1) that can account for the LD among markers by rotating the genotypes. The derivation of the consistency of the estimator, however, relies on the Normality assumption. In case of large n and M (M ) n), our simulation studies and analytical derivation in Equivalence of HE and Dicker-2 show that the Dicker-2 estimator (fixed-SNP-effect model-based estimator) and HE estimator (random-SNP-effect model-based estimator), are essentially the same. Hence, in the situation M ) n, Dicker-2 estimator has limited ability to correct for the LD among markers because the HE estimator has bias under some forms of LD, as shown in Equation (13). This is a contradiction to the claim that the Dicker (2014) always provides an improved estimator of heritability in presence of LD among markers.
Further estimators have been proposed in, for example, Hou et al. (2019), which proposes the h 2 GRE estimator. This estimator's goal is to relax assumptions about the LD structure of the data by giving each causal effect its own SNP-specific variance and has been shown to provide some robustness to LD structures. In the case that the LD matrix is estimable (n > M), if no binning is used, the h 2 GRE estimator is approximately equivalent to the Dicker-1-R estimator if n ! 1 and M remains constant (Supplementary Section 5), but expands the scope of the Dicker-1-R estimator by using a pseudoinverse. This allows the h 2 GRE estimator to be used in cases when some markers are in perfect LD, which was not possible with the Dicker-1-R estimator. The h 2 GRE estimator also corrected bias in the Dicker-1-R estimator in our simulations for a finite number of individuals. Furthermore, we found that in some cases, the h 2 GRE estimator has lower variance than the Dicker-1 estimator even if there is no LD (Supplementary Fig. 6). This is possibly due to the use of the empirical R in h 2 GRE estimator which may reduce the variance of the estimate. We note, however, that the h 2 GRE estimator is not defined if q ¼ n, where q is the rank of R (Supplementary Section 5). This situation may arise in the case that n < M. We did not study h 2 GRE in detail because we aimed to analytically understand the simple estimators (estimators without any binning or weighting).
Another estimator that demonstrated robustness to some forms LD was proposed in Pazokitoroudi et al. (2020). This estimator aimed to expand upon the HE estimator by allowing partitioning of heritability to multiple variance components. These partitioning methods can be ad hoc, but have been shown to improve robustness of estimators to MAF and LD in some cases (Evans et al. 2018). In the case that the genome is not partitioned, this estimator reduces to the HE estimator (Supplementary Section 6). We did not consider partitioning in this article so that we would be able to more easily understand the estimators analytically.
Fixed-SNP-effect model-based estimators generally assume that sampled individuals are independent. These approaches do not accommodate related individuals in the heritability estimation. We demonstrate that even in the absence of LD, the Dicker-1 is severely biased in the presence of related individuals. However, because of its equivalence to the HE estimator, the Dicker-2 estimator generates consistent estimates of heritability with related individuals in the absence of LD.
The likelihood-based approaches from the random-SNPeffects model category, especially the LDAK approach showed more bias under certain model misspecification as compared to the MOM estimators. Under different LD structures, the traditional GCTA approach showed more stability in terms of both bias and precision over the LDAK estimator. We did not observe any specific advantage of adjusting for LD by using the LDAK estimator.
Under the assumption of independence of individuals, the standard errors of the heritability estimator increase with the number of causal markers. This is an artifact of the assumption of complete independence (unrelatedness) of individuals. In any real sample, regardless of the extent of correction for population structure, there will always be variation in the degree of relatedness of individuals, and the extent of variation would depend on the nature of relatedness present in the sample. As shown in Simulation 3, the precision of the heritability estimators improves if we include relatives in the sample. The MSE of the estimators was generally lower when we had certain relatedness present in the sample. Moreover, the impact of increasing the number of markers on MSE was significantly less pronounced if we had relatedness in the sample. Hence, we highly recommend to at least include second cousins, if present in the study sample, in the SNP heritability estimation. If the study sample has substantial number of first cousins, it may be beneficial to assess the sensitivity of the heritability estimate after inclusion of first cousins.
In general, MOM estimators had much larger standard errors compared to the likelihood-based estimators. However, the computational gain of these MOM estimators over the likelihood estimators is significant for large n and M and often outweighs limitation of large standard error (Lin et al. 2022). There was no apparent bias in these estimators besides the repeat structure in Simulation 1C. For repeat structures of the causal markers, we observed underestimation in HE regression and a small upward bias for GCTA estimator.

Data availability
The code used to generate the data is available at https://github. com/alantmin/heritability. Supplemental material is available at G3 online.