The infinitesimal model with dominance

Abstract The classical infinitesimal model is a simple and robust model for the inheritance of quantitative traits. In this model, a quantitative trait is expressed as the sum of a genetic and an environmental component, and the genetic component of offspring traits within a family follows a normal distribution around the average of the parents’ trait values, and has a variance that is independent of the parental traits. In previous work, we showed that when trait values are determined by the sum of a large number of additive Mendelian factors, each of small effect, one can justify the infinitesimal model as a limit of Mendelian inheritance. In this paper, we show that this result extends to include dominance. We define the model in terms of classical quantities of quantitative genetics, before justifying it as a limit of Mendelian inheritance as the number, $M$, of underlying loci tends to infinity. As in the additive case, the multivariate normal distribution of trait values across the pedigree can be expressed in terms of variance components in an ancestral population and probabilities of identity by descent determined by the pedigree. Now, with just first-order dominance effects, we require two-, three-, and four-way identities. We also show that, even if we condition on parental trait values, the “shared” and “residual” components of trait values within each family will be asymptotically normally distributed as the number of loci tends to infinity, with an error of order $1/\sqrt{M}$. We illustrate our results with some numerical examples.


Introduction
In the classical infinitesimal model, a quantitative trait is expressed as the sum of a genetic and a nongenetic (environmental) component, and the genetic component of offspring traits within a family follows a normal distribution around the average of the parents' trait values, and has a variance that is independent of the trait values of the parents. With inbreeding, the variance decreases in proportion to relatedness. When trait values are determined by the sum of a large number of Mendelian factors, each of small effect, as we show in Barton et al. (2017), one can justify the infinitesimal model as a limit of Mendelian inheritance. Crucially, the results of Barton et al. (2017) show that evolutionary forces such as random drift and population structure are captured by the pedigree; conditioning on that pedigree, and trait values in the population in all generations before the present, the within-family distributions in the present generation will be given by a multivariate normal, with variance determined by that in the ancestral population and probabilities of identity by descent that can be deduced from the pedigree. If some traits in the pedigree are unknown, then averaging with respect to the ancestral distribution, the multivariate normality is preserved. It was also shown that under some forms of epistasis, trait values within a family are still normally distributed, although the mean will no longer be a simple function of the traits in the parents (as there are epistatic components which cannot be observed directly).
We emphasize that as a result of selection, population structure, and so on, the trait distribution across the population can be far from normal; the infinitesimal model as we define it only asserts that the within-family distributions of the genetic component of the trait are Gaussian, with a variance-covariance matrix that is determined entirely by that in an ancestral population and the probabilities of identity determined by the pedigree. Moreover, as a result of the multivariate normality, conditioning on some of the trait values within that pedigree has predictable effects on the mean and variance within and between families. In other words, knowing the trait values for some individuals in the population does not distort the multivariate normality of the distribution of the unobserved traits, and the mean and covariances of these traits may be derived explicitly (albeit after rather tedious calculations).
In this paper, we show that this extraordinary robustness of the infinitesimal model extends to include dominance. The distribution of the genetic part of the trait will once again be a multivariate normal distribution whose mean and variance are expressed in terms of the variance components in an ancestral population and probabilities of identity by descent determined by the pedigree, but now, with just first-order dominance effects, the identities required will involve up to four genes. As with the case of epistasis, the mean is not a simple function of the trait values in the parents, and there is nontrivial covariance between families. One can think of the genetic component of the trait values within a family as consisting of two parts. Both are normally distributed. In the additive case, the first reduces to the mean of the trait values of the parents; with dominance it will be random (even if we condition on knowing the parental traits), but the same for all individuals in the family. What is at first sight surprising is that even if we condition on knowing the trait values of the parents, this shared quantity is normally distributed. Assuming there is no mutation to ease the presentation [the effect of mutation was studied in Barton et al. (2017)], our first contribution is to show how to calculate its mean and variance from knowledge of variance components in the ancestral population and the pedigree, both with and without knowledge of the trait values of the parents. Knowing the trait values of the parents shifts the mean in a predictable way; the variance is independent of the parental trait values. The second part of the trait value, which is independent for each offspring in the family, is independent of the first; it encodes the randomness of Mendelian inheritance. It is a draw from a normally distributed random variable with mean zero and variance again determined by the pedigree and variance components in the ancestral population. It is not affected by conditioning on parental trait values. This segregation of the trait into a shared part and a residual part that is independent for each member of a family is not the classical subdivision into additive and dominance components, but it arises naturally both in the formulation of the infinitesimal model and in its derivation as a limit of Mendelian inheritance for a large number of loci each of small effect. We give a more mathematical description of it in Equation (2).
Our work can be seen as an extension of that of Abney et al. (2000), who establish sufficient conditions for a Central Limit Theorem to be applied to the vector of trait values in the presence of dominance and inbreeding. Our second contribution in this work is to establish the magnitude of the error in that normal approximation, verify that in conditioning on the trait values of the parents of an individual we are not (unless those traits are very extreme or the pedigree is very inbred) leaving the domain where the normal approximation is valid, and write down the effect of knowing those parental trait values on the distribution of the individual's own trait. A careful statement of our results can be found in Theorems 4 and 5. The notation we shall need is rather involved, but in a nutshell, we shall write the trait $\tilde{Z}^i$ of a given diploid individual $i$ in generation $t$ as the sum over $M$ loci of per-locus allelic effects that are functions of the allelic states $\chi_l^1, \chi_l^2$ of the two genes of individual $i$ at locus $l$, plus an environmental contribution $E^i$ (that we shall assume to be Gaussian):
$$\tilde{Z}^i = \bar{z}_0 + \frac{1}{\sqrt{M}} \sum_{l=1}^{M} \big( \eta_l(\chi_l^1) + \eta_l(\chi_l^2) + \phi_l(\chi_l^1, \chi_l^2) \big) + E^i. \tag{1}$$
Here, $\bar{z}_0$ is the average trait value in the ancestral population (itself a sum of average allelic effects) and the sum encodes the contribution of all loci to the deviation from this average [each per-locus deviation being of order $1/\sqrt{M}$, see Barton et al. (2017) and the third section below for a justification]. In this sum, the term $\eta_l(\chi_l^1) + \eta_l(\chi_l^2)$ models the additive part of the contribution of locus $l$ and $\phi_l(\chi_l^1, \chi_l^2)$ models the part due to dominance. Assuming Mendelian inheritance and no linkage between the $M$ loci, at each locus the allelic state $\chi_l^1$ is a copy of the allelic state of one of the two genes in the "first" parent of $i$, chosen uniformly at random, and $\chi_l^2$ is a copy of the allelic state of one of the two genes in the "second" parent of $i$, again chosen uniformly at random. Considering the two alleles at locus $l$ in the first parent and the two alleles in the second parent, we can then write the sum over all loci in Equation (1) as the sum of an average parental contribution (shared by all offspring of these parents), and a residual term of mean zero that encodes the stochasticity of Mendelian inheritance (the actual genetic contribution of the parents minus their average contribution). To avoid introducing even more notation, here we simply write $R_A^i$ and $R_D^i$ for the parts of the residual due to the additive terms and to the dominance terms respectively. Explicit formulae are given in Equations (24)-(27). Doing so, we obtain
$$\tilde{Z}^i = \bar{z}_0 + (A^i + D^i) + (R_A^i + R_D^i) + E^i. \tag{2}$$
The genetic component of the trait can thus be seen either as the sum of an additive part $(A^i + R_A^i)$ and a dominance part $(D^i + R_D^i)$, or as the sum of a shared part $(A^i + D^i)$ and a residual part $(R_A^i + R_D^i)$. Following the same strategy as in Barton et al. (2017), in Theorem 4 we show that even conditionally on (i.e. knowing) the parental traits $\tilde{Z}^{i[1]}$ and $\tilde{Z}^{i[2]}$, as $M$ tends to infinity the residual part converges in distribution to a Gaussian distribution with mean 0 and a variance depending only on variance components in the ancestral population and on the probability of identity by descent between two parental genes (which is fully determined by the pedigree). Crucially, the limiting variance does not depend on the parental traits. This convergence happens at a rate proportional to $1/\sqrt{M}$. Turning to the shared part, we use a different approach to prove that conditional on $\tilde{Z}^{i[1]}$ and $\tilde{Z}^{i[2]}$, $A^i + D^i$ also converges to a Gaussian distribution as $M$ tends to infinity. Again, the nonzero mean and the variance of the limiting normal distribution can be fully described, the variance is independent of the parental traits, and the convergence happens at a rate proportional to $1/\sqrt{M}$. This is the content of Theorem 5, in the special (and most difficult) case when individual $i$ was produced by selfing. For both the shared and the residual parts, the rate of convergence deteriorates when the pedigree is too inbred (leading to probabilities of identity by descent close to 1 between some pairs of parental genes), or when some traits in the population are too extreme (as knowing the trait value then gives us too much information about the unobserved underlying allelic states).
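The split of the genetic component into a shared part and a Mendelian residual can be checked by brute force on a very small example. The sketch below is only illustrative: the per-locus effects `ETA` and `PHI`, the parental genotypes, and all function names are hypothetical, and the $1/\sqrt{M}$ scaling is omitted so that the arithmetic stays exact. It computes the shared part as the average over the four equally likely pairs of transmitted genes at each locus, and confirms that the residual (actual contribution minus that average) has mean zero over all possible offspring.

```python
from fractions import Fraction
from itertools import product

# Hypothetical effects at a biallelic locus (alleles 0 and 1).
ETA = {0: Fraction(-1), 1: Fraction(1)}           # additive effect of one allele
PHI = {(0, 0): Fraction(1, 2),                    # symmetric dominance term,
       (0, 1): Fraction(-1, 2),                   # keyed by the sorted allele pair
       (1, 1): Fraction(1, 2)}


def genotype_value(a, b):
    """Unscaled contribution eta(a) + eta(b) + phi(a, b) of one locus."""
    return ETA[a] + ETA[b] + PHI[tuple(sorted((a, b)))]


def shared_part(parent1, parent2):
    """Average contribution over the four equally likely choices
    (one gene from each parent) at every locus."""
    total = Fraction(0)
    for genes1, genes2 in zip(parent1, parent2):
        total += sum(genotype_value(a, b) for a in genes1 for b in genes2) / 4
    return total


def offspring_value(parent1, parent2, picks):
    """Contribution of one offspring; picks[l] = (k1, k2) records which gene
    (0 or 1) is copied from each parent at locus l."""
    return sum(genotype_value(g1[k1], g2[k2])
               for g1, g2, (k1, k2) in zip(parent1, parent2, picks))


# Hypothetical parental genotypes at three unlinked loci.
parent1 = [(0, 1), (1, 1), (0, 0)]
parent2 = [(0, 1), (0, 1), (1, 0)]

shared = shared_part(parent1, parent2)
# Enumerate every Mendelian outcome: 4 choices per locus, 4**3 in total.
all_picks = list(product(product((0, 1), repeat=2), repeat=len(parent1)))
residuals = [offspring_value(parent1, parent2, picks) - shared
             for picks in all_picks]
mean_residual = sum(residuals) / len(residuals)
print(shared, mean_residual)
```

The shared part is the same for every sibling, while the residual varies between siblings and averages to zero, which is the structure described in the text; nothing here addresses normality, which requires many loci.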
Our derivation of the infinitesimal model as the limit of a finite-locus model has two interesting corollaries. First, as mentioned above, we obtain that the error made by approximating the trait distribution within a family by a Gaussian distribution increases by a quantity of order $1/\sqrt{M}$ in each generation. Consequently, for very large $M$, we expect the infinitesimal model with dominance to be valid for a time of the order of $\sqrt{M}$ generations, provided the population is not too inbred and no overly extreme traits appear in the meantime. Second, the set of technical lemmas that are key to the proofs of these results, presented in Appendix E, show that the infinitesimal model leaves essentially no signature on the allele frequencies at any given locus: even knowing the ancestral traits, the distribution of the allelic state at a single locus in a given individual is barely distorted by selection acting on the trait and the result is that, at the population level, the allelic distribution evolves in an essentially neutral way. In particular, its variance only depends on the variance of the allele distribution in the ancestral population and on identities by descent, which are not changed by knowledge of the trait values.
The rest of the paper is organized as follows. In the next section, we define the identity coefficients (that is, probabilities of identity by descent) that we shall need to formulate the model precisely. We show how to compute them knowing the population pedigree in Appendix A and provide the corresponding Mathematica code in supplementary material (Barton 2023). Next, we spell out the model in terms of quantities that are familiar from classical quantitative genetics, and we explore its accuracy numerically in a dedicated section. Finally, we derive this extension of the infinitesimal model as a limit of a model of Mendelian inheritance on the pedigree. The calculations are somewhat involved, and almost all will be relegated to the appendices. We must modify the strategy of Barton et al. (2017), which, although valid for the part of the trait value which is independent for each individual within the family, does not suffice for proving normality of the part of the trait value that is shared by all individuals within a family. To prove that this is normally distributed requires a new approach, based on an extension of Stein's method of exchangeable pairs. To keep the expressions in our calculations manageable, we satisfy ourselves with presenting the details only in the case in which we condition on knowing the trait values of the parents of an individual, in contrast to the additive case of Barton et al. (2017), in which we conditioned on knowing all the trait values in the pedigree right back to the ancestral generation. Our approach could readily be extended to conditioning on knowledge of more trait values, which amounts to conditioning a multivariate normal on some of its marginals. In Appendix H, we present the new ideas that are required to control the way in which errors in the infinitesimal approximation accumulate from knowledge of trait values of more distant relatives in the presence of dominance.
Just as in the additive case, the key will be to show that because many different combinations of allelic states are consistent with the same trait value, knowledge of the pedigree, and the trait values of the parents of an individual in that pedigree, actually gives very little information about the allelic state at a particular locus in that individual, or about correlations between two specific loci. An important consequence of this is that, in practice, it is going to be hard to observe signals of polygenic adaptation, because even a large shift in a trait caused by strong selection does not yield a prediction about alleles at a particular locus.

Identity coefficients
In the case of an additive trait, the infinitesimal model can be expressed in terms of the variance in the ancestral population (that is, the base population which we shall call generation zero) and two-way identity coefficients at a single locus. Recall that two genes at a given locus are identical by descent if their allelic states are identical and were inherited from a common ancestor. Since we assume that individuals are diploid, we need to specify which genes we consider when defining the identity coefficients.
For two distinct individuals $i$ and $j$ in the same generation, we define $F_{ij}$ to be the probability of identity by descent between two genes (at a given locus), one taken uniformly at random among the two genes of individual $i$ and one taken uniformly at random among the two genes of individual $j$. When $i = j$, $F_{ii}$ is defined to be the probability of identity by descent of the two distinct genes in the diploid individual $i$.
The definition naturally extends to subsets of three or four genes taken from two distinct individuals (again, at a given locus), for which we shall talk about three- and four-way identities. These quantities will be required to state our results below.
We use $F_{122}$ for the probability that the two genes in individual 2 are identical by descent and they are identical by descent with a gene chosen at random from individual 1. We write $F_{1122}$ for the probability that all four genes across individuals 1 and 2 are identical by descent; this corresponds to the quantity $\delta$ in Walsh and Lynch (2018, Chapter 11). We need an expression for the probability that each gene in individual 1 is identical by descent with a different gene in individual 2, but the four genes are not all identical. We shall denote this by $\tilde{F}_{1212}$. This is denoted by $(\Delta - \delta)$ in Walsh and Lynch (2018). Finally, we need the probability that the two genes in individual 1 are identical, as are the two genes in individual 2, but the four genes are not all identical, which we shall denote by $\tilde{F}_{1122}$. We illustrate the three- and four-way identities in Fig. 1.
During the course of our mathematical derivations, it will be convenient to express all two-, three-, and four-way identities in terms of the nine possible four-way identities (Walsh and Lynch 2018, Fig. 11.5). This is illustrated in Fig. B1.
In Appendix A, we discuss how to compute these identity coefficients given a pedigree. From now on we simply write "identity" instead of "identity by descent."
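For the two-way identities only, the computation on a pedigree with discrete generations can be sketched with the standard kinship recursion. This is not the Mathematica code of the supplementary material, and it does not cover the three- and four-way identities; the function names and the representation of the pedigree (a list of parent index pairs per generation) are assumptions made for illustration. The recursion uses the definitions above: the two genes of an offspring come one from each parent, and two genes drawn at random from the same individual are the same gene with probability 1/2.

```python
def pair_identity(F, a, b):
    """Probability that a gene drawn at random from individual a and a gene
    drawn at random from individual b are identical by descent. If a == b,
    the same gene is drawn twice with probability 1/2; otherwise the two
    distinct genes of a are compared, with identity probability F[a][a]."""
    if a == b:
        return 0.5 * (1.0 + F[a][a])
    return F[a][b]


def next_generation(F, parents):
    """One step of the recursion.

    F       -- identity matrix of the parental generation (F[a][a] is the
               identity between the two distinct genes of individual a)
    parents -- list of (first parent, second parent) index pairs, one entry
               per offspring
    Returns the identity matrix of the offspring generation."""
    n = len(parents)
    G = [[0.0] * n for _ in range(n)]
    for i, (a, b) in enumerate(parents):
        # The two genes of offspring i are copies of one gene from each parent.
        G[i][i] = pair_identity(F, a, b)
        for j in range(i + 1, n):
            c, d = parents[j]
            # A random gene of i comes from a or b, a random gene of j from c or d.
            G[i][j] = G[j][i] = 0.25 * (
                pair_identity(F, a, c) + pair_identity(F, a, d)
                + pair_identity(F, b, c) + pair_identity(F, b, d)
            )
    return G


# Two unrelated, non-inbred founders produce two full sibs, who then mate.
founders = [[0.0, 0.0], [0.0, 0.0]]
sibs = next_generation(founders, [(0, 1), (0, 1)])
grandchild = next_generation(sibs, [(0, 1)])
print(sibs[0][1], grandchild[0][0])
```

In this example the identity between the full sibs and the identity within their offspring are both 1/4, in agreement with the classical values for full-sib mating.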

The infinitesimal model with dominance
For ease of exposition, in this section we leave aside the environmental component of the trait value and we focus on its genetic component, which we denote by $Z$ [so that in the notation of Equation (1), $\tilde{Z} = Z + E$]. We first introduce the different quantities that are involved in this component of the trait value in a rigorous way, most of which were already hinted at in the Introduction, and then we compute the mean and variance of the shared and residual parts of $Z$ with and without knowledge of the parental traits.
The population is diploid and trait values are determined by the allelic states at $M$ unlinked loci. Each locus thus corresponds to a pair of genes. We assume that in generation zero (i.e. in the "ancestral" population), the individuals that found the pedigree are unrelated and sampled from an ancestral population in which all loci are in linkage equilibrium and are in Hardy-Weinberg equilibrium (that is, in the ancestral population the two allelic states at each locus in a given individual are sampled independently of each other and therefore the probability that an individual carries a given pair of alleles is given by the product of the probabilities of each allele being sampled). In order to define the various quantities that enter into our model, we introduce notation to express the trait as a sum of effects over loci. However, we emphasize that once these components, all of which are familiar from classical quantitative genetics, have been calculated for the ancestral population, the model can be defined without reference to the effects of individual loci.
To adhere to the notation of Barton et al. (2017), we use $\chi_l^1, \chi_l^2$ for the allelic states of the two genes at locus $l$ in a given individual in the pedigree. When we talk about the distribution of the allelic state of a single gene, we drop the superscript 1 or 2 and simply write $\chi_l$. We write $\bar{z}_0$ for the mean trait value in the ancestral population and express the trait value of an individual as $\bar{z}_0$ plus a sum of allelic effects. The influence of each locus will scale as $1/\sqrt{M}$, where $M$ is the total number of loci (assumed large). We write $\eta_l(\chi_l)$ to denote the (order one) scaled additive effect of the allele $\chi_l$ and $\phi_l(\chi_l^1, \chi_l^2)$ for the scaled dominance component (where $\phi_l$ is assumed to be a symmetric function of the two allelic states $\chi_l^1$ and $\chi_l^2$). That is, the total contribution of locus $l$ to the trait value will be of the form
$$\frac{1}{\sqrt{M}} \big( \eta_l(\chi_l^1) + \eta_l(\chi_l^2) + \phi_l(\chi_l^1, \chi_l^2) \big).$$
We shall assume that both $\eta_l$ and $\phi_l$ are uniformly bounded (i.e. they will all take their values in some finite interval $[-B, B]$). We also suppose that dominance effects are sufficiently "balanced" that inbreeding depression is finite at least in the ancestral population. More precisely, let $\hat{\chi}_l$ denote an allele sampled at random from the distribution of alleles at locus $l$ in the ancestral population; then $\iota$ defined by
$$\iota = \frac{1}{\sqrt{M}} \sum_{l=1}^{M} \mathbb{E}\big[ \phi_l(\hat{\chi}_l, \hat{\chi}_l) \big] \tag{3}$$
is bounded (as a function of $M$). This condition is crucial to our result. It is not obvious that it can hold, as the number of terms in the sum grows linearly with $M$ while the scaling factor $1/\sqrt{M}$ decreases much more slowly. Such a uniform bound is possible for instance if we consider a situation in which the contributions of the different loci compensate each other in a "random-walk-like" way, i.e. each expectation is either positive or negative (by the same amount, say), and the numbers of positive and negative expectations differ by at most $O(\sqrt{M})$. An example is presented at the beginning of the section on numerics. Note however that the quantity $\iota$ may be bounded uniformly in $M$ for many other reasons. For simplicity, we do not consider higher order dominance components (that is, $D \times D$ or more complex components) here.
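The "random-walk-like" balance can be made explicit in a short calculation. The display below is a worked example under stated assumptions rather than a result from the paper: we take $\iota$ to be the sum over loci of the expected dominance term of a homozygote by descent, scaled by $1/\sqrt{M}$, and we suppose that this expectation equals $+c$ at $M_+$ loci and $-c$ at $M_-$ loci; the symbols $c$, $M_+$, $M_-$, and $K$ are introduced only for this illustration.

```latex
\begin{align*}
\iota &= \frac{1}{\sqrt{M}} \sum_{l=1}^{M} \mathbb{E}\big[\phi_l(\hat{\chi}_l, \hat{\chi}_l)\big]
       = \frac{c\,(M_+ - M_-)}{\sqrt{M}}, \qquad M_+ + M_- = M, \\
|\iota| &\le c\,K \quad \text{whenever } |M_+ - M_-| \le K\sqrt{M}, \\
|\iota| &= c\,\sqrt{M} \to \infty \quad \text{if all loci have the same sign } (M_+ = M \text{ or } M_- = M).
\end{align*}
```

So a bound on $\iota$ that is uniform in $M$ requires cancellation between loci; without it, the inbreeding depression grows like $\sqrt{M}$.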
Remark 1. Note that $\hat{\chi}_l$ is the random variable describing a draw from the distribution of allelic states at locus $l$ in the ancestral population (generation 0), while we use $\chi_l$ to denote the allelic state at locus $l$ in a given individual in the pedigree (living in generation $t$, say). A priori, the law of $\chi_l$ is a biased version of the law of $\hat{\chi}_l$, obtained after letting selection and drift act over $t$ generations, but in Appendix E we shall show that, in effect, this distortion is very small for each given locus, and $\hat{\chi}_l$ and $\chi_l$ have the same distribution up to a small error even if we condition on knowing the parental (or ancestral) trait values.
For an individual in the ancestral population, its allelic states at locus $l$, which we denote by $\hat{\chi}_l^1, \hat{\chi}_l^2$, are independent draws from a distribution $\hat{\nu}_l$ on possible allelic states that we assume is known.
It is convenient to normalize so that $\mathbb{E}[\eta_l(\hat{\chi}_l)] = 0$, $\mathbb{E}[\phi_l(\hat{\chi}_l^1, \hat{\chi}_l^2)] = 0$, and for any value $x'$ of the allelic state at locus $l$, the conditional expectations
$$\mathbb{E}\big[\phi_l(\hat{\chi}_l^1, \hat{\chi}_l^2) \,\big|\, \hat{\chi}_l^1 = x'\big] = \mathbb{E}\big[\phi_l(\hat{\chi}_l^1, \hat{\chi}_l^2) \,\big|\, \hat{\chi}_l^2 = x'\big] = 0.$$
We explain in the section on modeling Mendelian inheritance why these assumptions do not result in a loss of generality. The genetic component of the trait value takes the form [compare with Equation (1), the expression for the observed trait including environmental noise]
$$Z^i = \bar{z}_0 + \frac{1}{\sqrt{M}} \sum_{l=1}^{M} \big( \eta_l(\chi_l^1) + \eta_l(\chi_l^2) + \phi_l(\chi_l^1, \chi_l^2) \big). \tag{4}$$
Let us write $i[1]$ and $i[2]$ for the parents of the individual labeled $i$.
As advertised in the Introduction, the genetic component of an offspring's trait value has two contributions. The first one is shared by all its siblings, and is a random quantity which is characteristic of the family. The second contribution is unique to the individual and independent of the first one. In our proofs, we shall investigate these two parts separately. We shall use the notation
$$Z^i = \bar{z}_0 + (A^i + D^i) + (R_A^i + R_D^i),$$
where the shared part has been further subdivided into the contribution $A^i$ from the additive component, and the contribution $D^i$ from the dominance component. The residuals $R_A^i$ and $R_D^i$ are determined by Mendelian inheritance and correspond to the contributions from the additive and dominance components respectively. Explicit expressions for these quantities are in Equations (24)-(29) below. In this notation, the additive part of the trait value is $A^i + R_A^i$ and the dominance deviation is $D^i + R_D^i$.

Trait values for a given pedigree
We now define the infinitesimal model in terms of classical quantities of quantitative genetics that can be expressed in terms of expectations in the ancestral population and identities determined by the pedigree. We use the notation of Walsh and Lynch (2018), which we recall in Table 1. Under the infinitesimal model, conditional on the pedigree, the components $(A^i + D^i)$ and $(R_A^i + R_D^i)$ of the trait values of individuals in a family follow independent multivariate normal distributions. In Appendix B, the expressions presented in this section will be justified by taking the trait values determined by Equation (4) under a model of Mendelian inheritance. In writing down the infinitesimal model, we shall assume that as the number of loci tends to infinity, the quantities defined in the top part of Table 1 converge to well-defined limits.
To simplify notation, we shall use 1 and 2 in place of $i[1]$ and $i[2]$ in our expressions for identity; thus, for example, $F_{12}$ will be the probability of identity of two genes, one sampled at random from each of the parents $i[1]$ and $i[2]$, and $F_{11}$ will be the probability of identity by descent of the two genes in parent $i[1]$. The mean and variance of $(A^i + D^i)$ are then and In this expression, the term proportional to $\sigma_A^2$ is the variance of $A^i$, the term proportional to $\sigma_{ADI}$ is twice the covariance of $A^i$ and $D^i$, and the remaining sum gives the variance of $D^i$. Recall that we are assuming here that the ancestral population is in linkage equilibrium. With linkage disequilibrium there is an additional term, cf. the remark below Equation (11). The components $(A + D)$ are also correlated across families. For individuals labeled $i$ and $j$, respectively, Note that, in contrast to our expression for the variance of $Z^i$, in this expression, the subscripts $i$ and $j$ in the identities refer to the individuals themselves, not their parents; for example, the expression $F_{ij}$ is the probability of identity of two genes, one sampled at random from individual $i$ and one sampled at random from individual $j$. We reserve letters for individuals in the current generation, and numbers for their parents.
If we combine the components $R_A^i$ and $R_D^i$ that segregate within families, we have that the sums $(R_A^i + R_D^i)$ are independent of each other (due to the independence of the variables encoding Mendelian inheritance), mean zero, normally distributed random variables with variance Here again, the term proportional to $\sigma_A^2$ is the variance of $R_A^i$, the term proportional to $\sigma_{ADI}$ is twice the covariance of $R_A^i$ and $R_D^i$, and the remaining sum equals the variance of $R_D^i$. We calculate the mean, variance, and covariance of these different components in Appendix B. In order to recover the mean and variance of the trait values, we add the contributions of $(A^i + D^i)$ and $(R_A^i + R_D^i)$ and observe that the identity $F_{12}$ in our expressions for the variances of these quantities (which we recall was the probability of identity of one gene sampled at random from each of the parents $i[1]$, $i[2]$ of our individual) corresponds to $F_{ii}$. This yields that, conditional on the pedigree, and For a single individual, its trait value can only depend on the two alleles that it carries at each locus, so it is no surprise that this expression depends only on pairwise identities between those two genes. We remark that Equation (11) differs from the corresponding expression [Equation (11.6c) in Walsh and Lynch (2018)]. To recover exactly their expression, one must add $(f - F_{ii}^2)(\iota^2 - \iota^*)$ to the right-hand side, where $f$ is the probability of identity at two distinct loci in individual $i$. We see how to recover this term in Remark B1, but because we have assumed linkage equilibrium in our base population, for the period over which the infinitesimal model remains a good approximation, under our assumptions we have $f \approx F_{ii}^2$. This is not to say that there is not a significant contribution to the trait value from linkage disequilibrium; it is just that for any specific pair of loci it is negligible. We shall see a toy example that reinforces this point at the beginning of the section on modeling Mendelian inheritance.
We emphasize again that our partition of the trait values into a contribution that is shared by all individuals in a family and residuals differs from the conventional split into an additive part and a dominance deviation. The additive part of the trait is From our calculations in Appendix B, we can read off and
Remark 2. Notice that the purely additive case can be simply recovered by taking $\phi_l \equiv 0$, so that $D^i = 0 = R_D^i$, and $\sigma_A^2$ is the only nonzero variance coefficient. This yields and finally

Table 1. Coefficients of classical quantitative genetics (top) and elements of individual trait decomposition (bottom).
$\sigma_A^2$: Additive variance
$\iota^*$: Sum of squared locus-specific inbreeding depressions
$\sigma_{DI}^2$: Variance of dominance effects in inbred individuals
$\sigma_{ADI}$: Covariance of additive and dominance effects in inbred individuals
$A^i$: Additive part of the shared component
We use $\hat{\chi}_l$ to denote an allelic state sampled from the distribution $\hat{\nu}_l$ of possible allelic states at locus $l$ in the ancestral population; $\hat{\chi}_l^1, \hat{\chi}_l^2$ are independent draws from the same distribution.

Conditioning on trait values of parents
Under the infinitesimal model, the trait values of individuals across the pedigree are given by a multivariate normal. Therefore, standard results on conditioning multivariate normal random vectors on their marginal values, which for ease of reference we record in Appendix C, allow us to read off the effect on the distribution of $Z^i$ of conditioning on $Z^{i[1]}$ and $Z^{i[2]}$. However, a little care is needed; we shall be justifying the normal distribution within families as an approximation as the number of loci tends to infinity, and we must be sure that asymptotic normality is preserved under this conditioning. We shall see that if, for example, parental trait values are too extreme, then the conditioning pushes us to a part of the probability space where the normal approximation breaks down. This is particularly evident in the toy example that we present in the section on modeling Mendelian inheritance. A justification for asymptotic normality even after conditioning is outlined in that section, and details are presented in the appendices.
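The standard conditioning result can be sketched numerically for the case needed here, a scalar quantity conditioned on two observed values. The code below is a generic illustration of the textbook formulas for jointly normal variables, not the expressions of Appendix C; the function name and argument layout are assumptions. It is applied to the purely additive case with unrelated, non-inbred parents, where the shared part is half the sum of the parental deviations from the ancestral mean.

```python
def condition_scalar_on_pair(mu_x, var_x, cov_xy, mu_y, cov_yy, y):
    """Conditional mean and variance of a scalar X given a pair Y = y,
    for (X, Y) jointly normal.

    mu_x, var_x -- mean and variance of X
    cov_xy      -- (Cov(X, Y1), Cov(X, Y2))
    mu_y        -- (E[Y1], E[Y2])
    cov_yy      -- 2 x 2 covariance matrix of Y as nested tuples
    y           -- observed values (y1, y2)
    """
    (a, b), (c, d) = cov_yy
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix of Y must be positive definite")
    # Regression weights w = cov_xy * inverse(cov_yy), written out for 2 x 2.
    w1 = (cov_xy[0] * d - cov_xy[1] * c) / det
    w2 = (cov_xy[1] * a - cov_xy[0] * b) / det
    cond_mean = mu_x + w1 * (y[0] - mu_y[0]) + w2 * (y[1] - mu_y[1])
    cond_var = var_x - (w1 * cov_xy[0] + w2 * cov_xy[1])
    return cond_mean, cond_var


# Additive case, unrelated non-inbred parents with genetic variance 1,
# all values measured as deviations from the ancestral mean.
# X = (Y1 + Y2) / 2, so Var(X) = 1/2 and Cov(X, Y1) = Cov(X, Y2) = 1/2.
parents = (0.4, -1.0)  # hypothetical parental deviations
cond_mean, cond_var = condition_scalar_on_pair(
    mu_x=0.0, var_x=0.5, cov_xy=(0.5, 0.5),
    mu_y=(0.0, 0.0), cov_yy=((1.0, 0.0), (0.0, 1.0)), y=parents,
)
print(cond_mean, cond_var)
```

In this additive example the conditional mean is the midparent deviation and the conditional variance vanishes, as expected when the shared part is a deterministic function of the parental values. With dominance, the covariances passed in would instead come from the identity coefficients and variance components of the model.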
Just as in the classical infinitesimal model, the mean and variance of the residuals $R_A^i + R_D^i$ are unchanged by conditioning on the trait values of the parents [recall that these residuals encode the stochasticity due to Mendelian inheritance at each locus; expressions for $R_A^i$ and $R_D^i$ are given in Equations (24)-(27)]. For the shared components, the mean and variance will be distorted by quantities determined by the covariances between $(A^i + D^i)$ and $Z^{i[1]}$, $Z^{i[2]}$. Let us write $C(i, i[1])$ for the covariance between $(A^i + D^i)$ and $Z^{i[1]}$, with a corresponding definition for $C(i, i[2])$. Then, once again using 1 and 2 in place of $i[1]$ and $i[2]$ in our expressions for identities, with $C(i, i[2])$ given by the corresponding expression with the roles of the subscripts 1 and 2 interchanged. (A derivation of this expression is provided in Appendix B.) With this notation, and the expression is simpler as we are then conditioning a bivariate normal on one of its marginals.)
Remark 3. In the purely additive case, things simplify greatly. From the expressions above, before conditioning, the mean of $A^i + D^i$ is zero (since $\iota = 0$), and the variance is Moreover, Substituting into Equations (17) and (18), and observing that we find that, conditional on the trait values of the parents, the shared component reduces to the mean of the parental trait values, with variance zero, and we recover the classical infinitesimal model.
Although in the presence of dominance the expressions (17) and (18) are rather complicated, we emphasize that they are derived from knowledge of just the ancestral population and the pedigree, and are expressed in terms of familiar quantities from classical quantitative genetics.

Numerical examples
In this section, we present numerical examples to illustrate the accuracy of the predictions of the infinitesimal model, again disregarding the environmental component of the trait.
We first generated a pedigree for a population of constant size of $N = 30$ diploid individuals over 50 discrete generations. Mating is random, but with no selfing. In order to facilitate comparison of different scenarios, the same pedigree was used for all subsequent simulations. In this way, the identity coefficients are held constant. As expected, the mean probability of identity between pairs of genes sampled from different individuals in generation $t$ is close to $1 - (1 - 1/(2N))^t$.
We define a trait, $Z$, which depends on $M = 1{,}000$ bi-allelic loci. There is no epistasis, so that the trait value is a sum across loci. In the examples here, we assume complete dominance, so that the effects of the three genotypes at each locus are either $-\alpha : -\alpha : +\alpha$ or $-\alpha : +\alpha : +\alpha$. In order to ensure that the inbreeding depression $\iota$ is bounded, we need to have some "balance" and so we choose the effects at each locus according to an independent Bernoulli random variable with parameter $H$; that is, the probability that the effects across the three genotypes at locus $l$ are $-\alpha : -\alpha : +\alpha$ is $1 - H$, independently for each locus. The effect size $\alpha$ is taken to be $1/\sqrt{M}$ for all loci and $H = \frac{1}{2} + \frac{2}{\sqrt{M}}$. With these choices the additive and dominance variances will be $O(1)$.
In the ancestral population, the allele frequencies were generated to mimic neutral allele frequencies with very low mutation rates, but conditioned to segregate at each locus. Thus, allele frequencies at every locus were sampled independently and according to a distribution with density proportional to $(p(1 - p))^{1-\epsilon}$, with ϵ = 0.001, but with those in [0, 1/60] and [1 − 1/60, 1] discarded (and the distribution renormalized). Then for each population replicate, these frequencies were used to endow each individual in the base population with an allelic type at every locus.
Variance components are defined with respect to this reference set of allele frequencies. For the population generated for the examples presented here, these values were $\sigma_A^2 = 0.269$, $\sigma_D^2 = 0.073$, and the inbreeding depression ι = −0.531. The additive and dominance components are uncorrelated in the base population (Cov(A, D) = 0). In the numerical experiments that follow, each replicate population is started at time zero from a different collection of genotypes, sampled from this base distribution.
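For bi-allelic loci these quantities have familiar closed forms, sketched below. The helper names, the coding of the heterozygote effect as d = ±α, and the allele frequencies are assumptions made for illustration; they are not the frequencies behind the values quoted above. Here p is the frequency of the allele whose homozygote contributes +α.

```python
def variance_components(freqs, dom_signs, alpha):
    """Sum per-locus additive variance, dominance variance, inbreeding
    depression, and squared inbreeding depression.

    Genotype values at each locus are (-alpha, d, +alpha) with
    d = sign * alpha (complete dominance in either direction).
    """
    var_a = var_d = iota = iota_star = 0.0
    for p, sign in zip(freqs, dom_signs):
        q = 1.0 - p
        d = sign * alpha
        avg_effect = alpha + d * (q - p)   # average effect of substitution
        var_a += 2.0 * p * q * avg_effect ** 2
        var_d += (2.0 * p * q * d) ** 2
        iota_l = -2.0 * p * q * d          # change in mean under full inbreeding
        iota += iota_l
        iota_star += iota_l ** 2
    return var_a, var_d, iota, iota_star


def locus_mean_var(p, d, alpha, inbreeding=0.0):
    """Brute-force mean and variance at one locus with inbreeding F."""
    q = 1.0 - p
    f = inbreeding
    probs = (q * q + f * p * q, 2.0 * p * q * (1.0 - f), p * p + f * p * q)
    values = (-alpha, d, alpha)
    mean = sum(w * v for w, v in zip(probs, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(probs, values))
    return mean, var


# Hypothetical three-locus example with alpha = 1.
freqs, signs = [0.3, 0.1, 0.6], [1, 1, -1]
print(variance_components(freqs, signs, 1.0))
```

In this bi-allelic setting the dominance variance coincides with the sum of squared per-locus inbreeding depressions, and the mean at a locus with inbreeding coefficient F shifts by F times that locus's inbreeding depression, which is the linear dependence on $F_{ii}$ seen in the simulations.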
We first simulated a neutral model. Figure 2 illustrates how the different components of the trait values change over fifty generations of neutral evolution. Recall that we always use the same realization of the pedigree. For each replicate, we take an independent sample of allelic types at time zero. For each individual in the pedigree we evaluate the additive and dominance components A and D and then in each generation we calculate the mean and variance of these quantities across the 30 individuals in the population. This is only intended to give some feeling for the ways in which the components fluctuate through time. Of course the infinitesimal model is only providing a prediction for the distribution of trait values within families; a single realization will see substantial contributions to trait values from linkage disequilibrium (cf. the toy example in the section on modeling Mendelian inheritance and Theorem 5). In the following figures, we compare these quantities to the detailed predictions of the infinitesimal model.
The top row in Fig. 2 is a single replicate, while the bottom is the average over 300 replicates. On the left, we have the mean of the additive and dominance components and their sum; on the right, we have plotted the variance components. For a single replicate, there is indeed a substantial contribution from linkage disequilibrium. When we plot just the genic components (that is, the sum over variances at each locus, ignoring the contribution from linkage disequilibrium), as expected, the picture is much smoother and we see that the predictions of the infinitesimal model are close to the values obtained by averaging over 300 replicates. Since linkage disequilibrium will dissipate rapidly, halving in each generation, it is the genic component that determines the long-term evolution.
All components are measured relative to the base population. In practice, in natural populations, one does not have access to the ancestral population and so one measures components relative to the current population. This amounts to a change of reference (Hill et al. 2006). We do not do this in our setting, as it would result in different variance components for every replicate.
In Fig. 3, we explore the relationship between the dominance deviation and inbreeding. Since we use the same pedigree for all our experiments, each individual is characterized by a single $F_{ii}$ (the probability of identity of the two genes at a given locus). For each of 1,000 replicates (that is, independent samples of allelic types for the individuals in generation zero), we calculated the dominance deviation for each individual in the pedigree. The plot in Fig. 3 shows the dominance deviation averaged over those 1,000 replicates for each individual in the pedigree. Thus, there are 30 points in each generation, one for each individual in the population. As expected, the mean of the dominance component decreases in proportion to $F_{ii}$, $E[D] = -0.53\,F_{ii}$ (recall that ι = −0.53 for our base population).
[Figure caption fragment: ... Equations (12)-(14) (note that the identity coefficients $F_{ii}$ increase through time due to genetic drift).]
Figure 4 shows how the (co)variance of A and D depends on identity $F_{ii}$ for pairs of individuals in the pedigree. As in Fig. 3, for each individual in the pedigree, A and D are calculated for each of the 1,000 replicates; Fig. 4 shows the variances and covariances of the resulting values for each of the 30 individuals in generations 5, 10, 20, and 40 and these are compared to the theoretical predictions. Note that since in the bi-allelic case $\sigma_D^2 = \iota^*$, the expression (14) for the variance of the dominance component reduces to
Next, we consider the variances of the residuals $R_A$ and $R_D$ within families. One hundred pairs of parents were chosen at random from the population, and from each 1,000 offspring were generated. This was repeated for 10 replicates made with the same pedigree and the same set of parents; within-family variances were then averaged over replicates. In Fig. 5, in each plot there are 100 points, one for each pair of parents. The two lines correspond to least-squares regression (blue) and theoretical predictions (red) which can be read off from Equation (8). For readability, in the figure we use the notation $V_{R_A}$, $V_{R_D}$, and $V_{R_A,R_D}$ to denote the variance of $R_A$, the variance of $R_D$ and the covariance between $R_A$ and $R_D$, respectively. Using Equation (8) and the explanation below, together with the fact that $\sigma_D^2 = \iota^*$ in our bi-allelic case, we have )/2 is the within-individual identity averaged over parents 1 and 2; and where F (3) is defined as follows:
The full force of our theoretical results is that even if we condition on the trait values of parents, the within-family distribution of their offspring will consist of two normally distributed components and, in particular, the variance components will be independent of the trait values of the parents. We test this by imposing strong truncation selection on the population. We retain the same pedigree relatedness, but working down the pedigree, each individual's genotype is determined by generating two possible offspring from its parents and retaining the one with the larger trait value. In Fig. 6, we compare the results with simulations of the neutral population. Dashed lines are for the neutral simulations, solid ones for the simulation with selection. For the population under selection, we see an immediate drop in the total genetic variation, caused by the strong selection; there is significant negative linkage disequilibrium between individual loci, as predicted by Bulmer (1971). The blue is the additive component. We see that about one-third of the variance is dominance variance. The bottom row shows that the genic components are hardly affected by selection, as predicted by the infinitesimal model. With or without selection, the variance components change as a result of inbreeding.
Finally, Fig. 7 compares the variance components at 50 generations for neutral simulations with those with truncation selection as the number of loci increases from M = 100 to $M = 10^4$. Replicate simulations were generated as in Fig. 6. Under the infinitesimal model, these components should take the same values with and without selection. This is reflected in the simulations, with the covariance between the additive and dominance effects being the slowest to settle down to the infinitesimal limit.
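The within-family truncation selection used in these experiments (generate two candidate offspring and keep the one with the larger trait value) can be sketched as follows. The genotype encoding (alleles coded 0 and 1, unlinked loci), the function names, and the per-locus effect tables are assumptions made for this illustration only.

```python
import random


def gamete(parent, rng):
    """Pass on one of the two genes at each locus with probability 1/2,
    independently across loci (no linkage)."""
    return [pair[rng.randrange(2)] for pair in parent]


def offspring(mother, father, rng):
    """Diploid offspring as a list of (maternal gene, paternal gene)."""
    return list(zip(gamete(mother, rng), gamete(father, rng)))


def trait(genotype, effects):
    """effects[l][c] is the contribution at locus l of a genotype that
    carries c copies of allele 1 (c = 0, 1, 2)."""
    return sum(effects[l][a + b] for l, (a, b) in enumerate(genotype))


def selected_offspring(mother, father, effects, rng, candidates=2):
    """Generate `candidates` sibs and keep the one with the largest trait."""
    sibs = [offspring(mother, father, rng) for _ in range(candidates)]
    return max(sibs, key=lambda g: trait(g, effects))


rng = random.Random(0)
M = 50
alpha = M ** -0.5
# Hypothetical effects: every locus has pattern -alpha : +alpha : +alpha.
effects = [(-alpha, alpha, alpha)] * M
het = [(0, 1)] * M
child = selected_offspring(het, het, effects, rng)
print(trait(child, effects))
```

Because the parents of every individual are fixed in advance by the pedigree, selection of this kind changes which genotypes are realized but leaves the pedigree, and hence the identity coefficients, untouched.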

The infinitesimal model with dominance as a limit of Mendelian inheritance
In this section, we turn to the justification of our model as a limit of a model of Mendelian inheritance as the number M of loci tends to infinity. Although we shall focus on the distribution of the genetic components of the trait values in the pedigree, in this section we consider the general situation where the observed trait of an individual, $\tilde{Z}^i$, is the sum of a genetic component $Z^i$ and an environmental component $E^i$. Our mathematical assumptions on $E^i$ are detailed in Main results below.
Our work is an extension of that of Abney et al. (2000), which in turn builds on Lange (1978). The distinctions here are that we explicitly model the component of the trait value that is shared by all individuals in a family separately from the part that segregates within that family; we identify the effect on each of these components of conditioning on knowing the trait values of the parents of the family; and we estimate the error that we are making in taking the normal approximation, thus providing information on when the infinitesimal approximation breaks down.
The fact that the genetic component of trait values within families is normally distributed is a consequence of the Central Limit Theorem. That this remains valid even when we condition on the trait values of the parents stems from the fact that knowing the trait value of an individual actually provides very little information about the allelic state at any particular locus. This in turn is because, typically, there are a large number of different genotypes that are consistent with a given phenotype. In Barton et al. (2017), this was illustrated through a simple example which can be found on p. 402 of Fisher (1918), which concerned an additive trait in a haploid population. Here we adapt that example to the model for which we performed our numerical experiments.
Suppose then that we have M bi-allelic loci. We denote the alleles at locus l by $a_l$ and $A_l$. The contributions to the trait of the three genotypes $a_l a_l$, $a_l A_l$ and $A_l A_l$ are −α, −α, α respectively with $\alpha = 1/\sqrt{M}$. For simplicity, in contrast to our numerical experiments, we suppose that the probabilities of genotypes $a_l a_l$, $a_l A_l$, $A_l A_l$ are 1/4, 1/2, 1/4 respectively. Now suppose that we observe the trait value to be $k/\sqrt{M}$. What is the conditional probability that the allelic types at locus l, which we denote $\chi^1_l \chi^2_l$, are $A_l A_l$? For definiteness, we take M and k both to be even and l = 1.
First consider the probability that the contribution to the trait value from locus 1 is $+1/\sqrt{M}$. Let us write $p_+$ for the (unconditional) probability that the contribution from locus 1 is $1/\sqrt{M}$, that is for the contribution to the trait from locus l. We have ) .
An application of Bayes' rule then gives Similarly, and
In view of the Central Limit Theorem, we would expect a "typical" value of k to be on the order of $\sqrt{M}$; conditioning has only perturbed the probability that which we expect to be of order $1/\sqrt{M}$. In the purely additive case, which corresponds to taking $p_+ = p_- = 1/2$, at the extremes of what is possible (k = ±M), we recover complete information about the values of $\chi^1_1$, $\chi^2_1$; however, with dominance that is no longer true.
Notice that for the difference between the trait value of an individual and the mean over the population to be order one requires order $\sqrt{M}$ of the loci to be "nonrandom," but observing the trait does not tell us which of the possible M loci these are. Similarly, performing the entirely analogous calculation for pairs of loci, and observing that we deduce that, For a "typical" trait value the last term in Equation (19) is order 1/M. When we sum over loci, this is enough to give a nontrivial contribution to the trait value coming from the linkage disequilibrium. However, although observing the trait of a typical individual tells us something about linkage disequilibria, it does not tell us enough to identify which of the order $M^2$ pairs of loci are in linkage disequilibrium.
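The single-locus part of this toy calculation can be reproduced exactly with Bayes' rule and rational arithmetic. The sketch below assumes, for illustration, that each locus independently contributes +α with probability `p_plus` and −α otherwise, with α = 1/√M and no centering, so that the observed trait k/√M is a sum of M steps of ±1 in units of α; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def sum_pmf(m, s, p_plus):
    """P(sum of m independent +/-1 contributions equals s), where each
    contribution is +1 with probability p_plus."""
    if abs(s) > m or (m + s) % 2:
        return Fraction(0)
    n_plus = (m + s) // 2
    return comb(m, n_plus) * p_plus ** n_plus * (1 - p_plus) ** (m - n_plus)


def posterior_plus(M, k, p_plus=Fraction(1, 4)):
    """P(locus 1 contributes +alpha | trait = k * alpha) by Bayes' rule.

    M and k must have the same parity and |k| <= M, otherwise the
    observed trait value is impossible.
    """
    num = p_plus * sum_pmf(M - 1, k - 1, p_plus)
    den = num + (1 - p_plus) * sum_pmf(M - 1, k + 1, p_plus)
    return num / den


M = 1000
for k in (-520, -500, -480):
    post = posterior_plus(M, k)
    print(k, post, float(post - Fraction(1, 4)))
```

Under these assumptions the posterior works out to (M + k)/(2M) whatever the value of `p_plus`, since the loci are exchangeable given the total. Moving k by order √M around its mean therefore shifts the probability at any one locus by order 1/√M, and only the extreme values k = ±M pin down the contribution of every locus.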
Essentially the same argument will apply to the much more general models that we develop below. In particular, for the infinitesimal model to be a good approximation, the observed parental trait values must not contain too much information about the allelic effect at any given locus, which requires that the parental traits must not be too extreme [corresponding to k in our toy model being $O(\sqrt{M})$]. In the additive case, it was enough to control the additional information that we gained about any particular locus from knowledge of the trait value in the parents. This is because, in that case, the variance of the shared contribution within a family is zero and independent Mendelian inheritance at each locus ensures that linkage disequilibria do not distort the variance of the residual component that segregates within families. With dominance, we must estimate the (nontrivial) variance of the shared component, and for this we shall see that we need to control the build up of linkage disequilibrium between pairs of loci. It will turn out that since all pairs of loci are in linkage equilibrium in the ancestral population, any given pair of loci will be approximately in linkage equilibrium for the order $\sqrt{M}$ generations for which the infinitesimal approximation is valid.
[Fig. 6 caption, partially recovered: ... A, D); black, blue, red, purple). The bottom row is the changes to genic variances with time against predictions of the infinitesimal model. The values are averages over 300 replicates for the neutral case, 1,000 for the selected case, made with the same pedigree. There are M = 1,000 loci, and thus we expect the infinitesimal model to be accurate for about $\sqrt{M} \sim 30$ generations. Selection is made within families; for each offspring, two individuals are generated from the corresponding parents, and the one with the larger trait value retained.]
This does not mean that the linkage disequilibria do not affect the trait values, but because of the very many different combinations of alleles in an individual that are consistent with a given trait, observing the trait tells us very little about the allelic state at a particular locus. The allele at that locus can only ever contribute $O(1/\sqrt{M})$ to the overall trait value. As the population evolves, and we are able to observe more and more traits on the pedigree, we gain more and more information about the allele that an individual carries at a particular locus. In Barton et al. (2017), we considered an additive trait in a population of haploid individuals. In that setting, we showed that for a given individual, one does not gain any more information about the state at a given locus from looking at the trait values on the whole of the rest of the pedigree than one does from observing just the parents of that individual. In our model for diploid individuals with dominance, this is no longer the case; observing the trait values of any relatives, no matter how distant, provides some additional information about the allelic state at a locus. The difference arises from the fact that the contribution that a gene makes to the trait value of an individual depends not only on its own allelic state, but also on that of the other copy of the gene at that locus. As a result, we gain information about the allelic state in a focal individual by observing trait values in any other individuals in the pedigree with which it may be identical by descent at that locus. However, the amount of information gleaned about the allelic state of an individual from observing new individuals in the pedigree will decrease in proportion to the probability of identity, and so for distant relatives in the pedigree is very small; provided our pedigree is not too inbred, and trait values are not too extreme, we can still expect the infinitesimal model to be a good approximation for order $\sqrt{M}$ generations.

Environmental noise
Our derivations will depend on two approaches to proving asymptotic normality. The first, which we apply to the portion $R^i_A + R^i_D$ of the trait values, uses a generalized Central Limit Theorem (which allows for the summands to have different distributions), which provides control over the rate of convergence as M → ∞. (It is this control that tells us for how many generations we can expect the infinitesimal model to be valid.) However, the Central Limit Theorem guarantees only the rate of convergence of the cumulative distribution function of the normalized sum of effects at different loci. Our proofs exploit convergence to the corresponding probability density function, which may not even be defined. To get around this, we can follow the approach of Barton et al. (2017) and make the (realistic) assumption that rather than observing the genetic component of a trait directly, the observed trait has an environmental component with a smooth density. This results in the trait distribution having a smooth density which is enough to guarantee the faster rate of convergence. In addition to the benefit in terms of regularity of the trait distribution, an environmental noise with a smooth distribution also reinforces the property that observing the trait value gives us very little information on the allelic state at a given locus: a continuum of combinations of genetic and environmental components may have led to the observed trait, in which each given locus contributes an infinitesimal amount. (To ensure sufficient regularity of the trait density, we could instead make the assumption that the distribution of allelic effects at every locus has a smooth probability density function.)
The approach to proving asymptotic normality of the shared component uses an extension of Stein's method of exchangeable pairs. Once again in the presence of environmental noise (to ensure that the trait distribution has a smooth density) we recover convergence with an error of order $1/\sqrt{M}$.
, respectively (see supplementary material for details). Thus, convergence is somewhat faster than
If the environmental component is taken to be normally distributed, then exactly as in Barton et al. (2017), we can adapt our application of Theorem C1 in Appendix C to write down the conditional distribution of the genetic components given observed traits; i.e., traits distorted by a small environmental noise, cf. Remark F2.

Assumptions and notation
Recall that we assume that in generation zero, the individuals that found the pedigree are unrelated and sampled from an ancestral population in which all loci are assumed to be in linkage equilibrium. The allelic states at locus l on the two chromosomes drawn from the ancestral population will be denoted $\hat{\chi}^1_l$, $\hat{\chi}^2_l$. They are independent draws from a distribution on possible allelic states that we denote by $\hat{\nu}_l(dx)$. Without loss of generality, by replacing and observing that the second and third terms on the right-hand side are functions of $\hat{\chi}^1_l$ and $\hat{\chi}^2_l$, respectively, which we may therefore subsume into $\eta_l(\hat{\chi}_l)$, we may assume that for any value $x'$ of the allelic state at locus l, the conditional expectation As a consequence, partitioning over the possible values of $\hat{\chi}^2_l$, we have that the cross variation term With this modification of $\phi_l(x, x')$, Moreover, still without loss of generality, by absorbing the mean into $\bar{z}_0$, we may assume that In this notation, the genetic component of the trait of an individual in the ancestral population (which we denote by $\hat{Z}$ to make it clear that the following property is specific to individuals in generation 0) is and by Equations (22) and (23), we have We assume that the scaled allelic effects $\eta_l$, $\phi_l$ are bounded; $|\eta_l|, |\phi_l| \le B$, for all l. We also assume that all the quantities in the top part of Table 1 exist in the limit as M → ∞.

Inheritance
We now need some notation for Mendelian inheritance. Recall that $i[1]$ and $i[2]$ are the labels of the parents of individual i in our pedigree, each of which contributes exactly one gene at each locus in a given offspring. Mendelian inheritance translates into the property that the gene passed on by parent $i[1]$ was the one inherited from its own "first" parent $(i[1])[1]$ with probability 1/2, or from its "second" parent $(i[1])[2]$ with probability 1/2. Even though we do not distinguish between males and females, it is convenient to think of the chromosomes in individual i as being labeled 1 and 2, according to whether they are inherited will denote the allelic states of the two genes at locus l in parent $i[1]$, respectively inherited from its own "first" and "second" parent. Again following the conventions of Barton et al. (2017), extended to account for the fact that we are now considering diploid individuals, we use independent Bernoulli(1/2) random variables, $X^i_l$, $Y^i_l$ to determine the inheritance of genes 1 and 2, respectively, at locus l in individual i. Thus, $X^i_l = 1$ if the allelic state of gene 1 at locus l in individual i is inherited from gene 1 in $i[1]$, and In this notation, the trait of individual i in generation t is given by where and
The terms $A^i$ and $D^i$ are shared by all descendants of the parents $i[1]$ and $i[2]$. In the third section of this paper, we presented the mean and variance of their sum, conditional on the pedigree P(t). The sums (24)+(25) and (26)+(27) comprise what we previously called $R^i_A$ and $R^i_D$, respectively; each has mean zero. They capture the randomness of Mendelian inheritance. They are uncorrelated with $A^i + D^i$. Again, in a previous section we gave expressions for the variances and covariance of $R^i_A$ and $R^i_D$ in terms of the ancestral population and identities generated by the pedigree. These calculations allowed us to identify the mean and variance of the parts $A^i + D^i$ and $R^i_A + R^i_D$ in terms of the classical quantities of quantitative genetics in Table 1. Since we are assuming unlinked loci, the asymptotic normality of these quantities when we condition on the pedigree, but not on the trait values within that pedigree, is an elementary application of Theorem D2 in Appendix D, a generalized Central Limit Theorem which allows for nonidentically distributed summands.
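For the additive part alone, the split into a component shared by all sibs and a Mendelian residual can be written down directly from the Bernoulli(1/2) inheritance variables. The sketch below is an illustration under assumptions: it covers only the additive effects, the dominance terms of Equations (24)-(27) are not reproduced, and the function name and input layout are hypothetical.

```python
import random


def additive_split(eta_p1, eta_p2, rng):
    """Additive value of one offspring and its shared/residual split.

    eta_p1[l] = (effect of gene 1, effect of gene 2) at locus l in parent
    i[1]; eta_p2 is the same for parent i[2].  X and Y are independent
    Bernoulli(1/2) variables choosing which gene each parent transmits.
    """
    value = shared = residual = 0.0
    for (a1, a2), (b1, b2) in zip(eta_p1, eta_p2):
        x = rng.randrange(2)  # X = 1: gene 1 of parent i[1] is passed on
        y = rng.randrange(2)  # Y = 1: gene 1 of parent i[2] is passed on
        value += (a1 if x else a2) + (b1 if y else b2)
        # Shared part: average over the two genes in each parent.
        shared += 0.5 * (a1 + a2) + 0.5 * (b1 + b2)
        # Residual part: mean-zero Mendelian sampling term.
        residual += (x - 0.5) * (a1 - a2) + (y - 0.5) * (b1 - b2)
    return value, shared, residual


rng = random.Random(3)
M = 200
scale = M ** -0.5
parent1 = [(rng.choice((-1, 1)) * scale, rng.choice((-1, 1)) * scale) for _ in range(M)]
parent2 = [(rng.choice((-1, 1)) * scale, rng.choice((-1, 1)) * scale) for _ in range(M)]
print(additive_split(parent1, parent2, rng))
```

The shared term does not depend on X or Y, so it is identical for every sib in the family, while the residual is a sum over loci of independent mean-zero terms even after the parental genotypes are fixed.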
In Barton et al. (2017), we showed that in the purely additive case, the vector $(R^i_A)_{i=1}^{N_t}$, which determines the joint distribution of the trait values within families in generation t (recalling that in the additive case $R^i_D = 0$), is asymptotically a multivariate normal, even when we condition not just on the pedigree relatedness of the individuals in generation t, but also on knowing the observed trait values of all individuals in the pedigree up to generation t − 1, which we denote by $\tilde{Z}(t - 1)$ (notice the difference between this notation and the notation $\tilde{Z}_t$ for the observed trait of an individual living in generation t). Our main result extends this to include dominance, at least under the assumption that the ancestral population was in linkage equilibrium.
With dominance, the expressions for the mean and variance-covariance matrix of the multivariate normal $Z^1, \ldots, Z^{N_t}$, conditioned on the pedigree up to generation t and some collection of the observed trait values of individuals in that pedigree up to generation t − 1, are sums of the quantities of classical quantitative genetics in Table 1, weighted by four-way identities and deviations of trait values from the mean. In principle, they can be read off from Theorem C1 in Appendix C.
We will focus on proving that conditional on knowing just the trait values of the parents of individual i and the pedigree, the components $(A^i + D^i)$ and $(R^i_A + R^i_D)$ are both asymptotically normal, but we explain why our proof allows us to extend to the case in which we also know trait values of other individuals. The importance (and surprise) is that given the pedigree relationships between the parents and classical coefficients of quantitative genetics for a base population (assumed to be in linkage equilibrium), knowing the traits of the parents distorts the distribution of their offspring in an entirely predictable way. In particular, this is what we mean when we say that the infinitesimal model continues to hold even with dominance.
The extra challenge compared to the additive case is that, in contrast to the part $R^i_A + R^i_D$, where Mendelian inheritance ensures independence of the summands corresponding to different loci even after conditioning on trait values, when we condition on trait values the terms in $A^i + D^i$ will be (weakly) dependent and proving a Central Limit Theorem becomes more involved.

Main results
Recall that the trait values that we observe, and therefore on which we condition, are the sum of a genetic component and an independent environmental component; that is, the observed trait value is $\tilde{Z}^i = Z^i + E^i$, where, for convenience, the $\{E^i\}$ are independent $N(0, \sigma_E^2)$-valued random variables. We suppose that the environmental noise is shared by individuals in a family (so we can think of it as part of the component $A^i + D^i$ of the trait value, whose distribution therefore also has a smooth density).
We write $N_t$ for the number of individuals in the population in generation t, $(Z^1_t, \ldots, Z^{N_t}_t)$ for the corresponding vector of trait values, and P(t) for the pedigree up to and including generation t. A simple application of the Central Limit Theorem gives that is asymptotically distributed as a multivariate normal random variable as M → ∞. More precisely, let $(\beta_1, \beta_2, \ldots, \beta_{N_t}) \in \mathbb{R}^{N_t}$, and write $Z_\beta = \sum_{i=1}^{N_t} \beta_i Z^i_t$, then using Theorem D2, , for suitable constants $C$, $\tilde{C}$ (which can be made explicit), where $\mathcal{N}(z)$ is the cumulative distribution function for a standard normal random variable. The mean and variance of $Z_\beta$ can be read off from Equations (9), (10), and (11).
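The $1/\sqrt{M}$ scale of the error in a normal approximation of this kind can be seen in the simplest possible setting. The sketch below is an illustration only: it replaces the sum over loci by a Binomial(M, p) count (so the summands are identically distributed, unlike in Theorem D2), and compares the exact cumulative distribution function of the standardized sum with that of a standard normal. The function names and the choice p = 1/4 are assumptions.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of a standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def kolmogorov_distance(M, p=0.25):
    """sup over z of |P((S - M p) / sd <= z) - Phi(z)| for S ~ Binomial(M, p).

    The supremum is attained at a jump of the binomial distribution
    function, so it is enough to look just below and at each jump.
    Intended for M up to about 1,000 (float range of the binomial
    coefficients).
    """
    sd = sqrt(M * p * (1.0 - p))
    cdf, worst = 0.0, 0.0
    for n in range(M + 1):
        phi = normal_cdf((n - M * p) / sd)
        worst = max(worst, abs(cdf - phi))   # left limit at the jump
        cdf += comb(M, n) * p ** n * (1.0 - p) ** (M - n)
        worst = max(worst, abs(cdf - phi))   # value at the jump
    return worst


for M in (100, 400, 900):
    d = kolmogorov_distance(M)
    print(M, d, d * sqrt(M))
```

The last column stays of order one as M grows, which is the behaviour that a Berry-Esseen type bound guarantees for sums of independent bounded terms.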
Our main results concern the components of the trait values of offspring when we condition on the observed trait values of their parents. The following result follows in essentially the same way as the additive case of Barton et al. (2017).
are asymptotically normally distributed, with an error of order $1/\sqrt{M}$. More precisely, for all $z \in \mathbb{R}$, where , and we have used $p(\sigma^2, x)$ to denote the density at x of a mean zero normal random variable with variance $\sigma^2$. The constants $C'$, $\tilde{C}'$, $C''$, $C'''$ depend only on the bound B on the scaled allelic effects. The variances in the expressions above are all calculated conditional on P(t − 1), but not on observed parental trait values.
Put simply, the normal approximation is good to an error of order $1/\sqrt{M}$; the constant in the error term will be large, meaning that the approximation will be poor, if the within-family variance somewhere in the pedigree is small or if the observed trait values are very different from their expected values. Just as in the additive case, we could prove an entirely analogous result when we condition on any number of observed trait values in the pedigree, except that with dominance this is at the expense of picking up an extra term in the error for each observed trait value on which we condition. The justification required for this is provided by Appendix H.
What is at first sight more surprising is that the shared component of the trait value within a family, i.e., the random variable A + D + E, is also asymptotically normally distributed, even when we condition on observed parental trait values. Note that the randomness of the shared component comes from the fact that the allelic states underlying the parental traits are still random (they are unobserved). In the case of a purely additive trait, it turns out that the shared component can be simply expressed as the average of the two parental traits and therefore conditioning on these traits renders the shared contribution totally deterministic, but such a simplification no longer occurs when we add dominance, due to the nonlinearity of the allelic contributions in D [see Equation (29)]. Our proof of normality uses the fact that we consider the environmental noise to be shared by individuals within the family; in this way we can guarantee that the shared component of the observed trait value also has a smooth density.
We are only going to prove the result for the shared component of a family in generation one that was produced by selfing. In what follows, for a given function h we write ‖h‖ for the supremum norm of h, and $\mathcal{N}_{\mu,\sigma^2}(h)$ for the integral of h with respect to the distribution of an $N(\mu, \sigma^2)$ random variable (whenever this quantity makes sense):
Theorem 5. Let W = A + D + E denote the shared component of the trait value in a family in generation one. Let h be an absolutely continuous function with $\|h'\| < \infty$, then where $\mu_W$ is given by Equation (F5), and $\sigma_W^2$ is the sum of the variance of the environmental noise and the expression in Equation (F21).
Remark 6. 1) Although we only prove that $A^i + D^i + E^i$ is asymptotically normal in this special case of an individual in generation one that is produced by selfing, the same arguments will apply in general. However, the expressions involved become extremely cumbersome. By considering selfing, we capture all the complications that arise in later generations (when distinct parents may nonetheless be related).
2) We do not record the exact bound on the constant C. It takes the same form as the error function C in Theorem 4, except that the constants $C'$, $\tilde{C}'$, $C''$, $C'''$ depend on the inbreeding depression ι, as well as the bound B on the scaled allelic effects. In particular, just as there, the asymptotic normality will break down if the trait value of the parent is too extreme, or if the variance of the trait values among offspring is too small.
3) Since we are assuming that the environmental noise has a smooth density, convergence in the sense of Equation (30) is sufficient to deduce that the cumulative distribution of
In Fig. 8, we show the cumulative distribution functions of the additive and dominance parts of the shared and residual components of trait within 10 families after 20 generations of neutral evolution, with M = 1,000 loci. All 10 within-family distributions of $R_A$, $R_D$ are close to Gaussian; they vary somewhat in slope, since families vary in identity coefficients (see Fig. 5), but this is not apparent in these plots. The normal approximation is better for the residual components than for the shared component. This may be due to the fact that the random variables encoding Mendelian inheritance at different loci are independent and identically distributed, which makes the summands in the expressions for $R_A$ and $R_D$ more weakly dependent than the summands in A and D, leading to faster convergence to a Gaussian distribution. This also explains why we need a more elaborate approach to show convergence of the shared parts to Gaussians.

Strategy of the derivation
Our first task will be to show that conditional on the pedigree, the distribution of the trait values in generation t is approximately multivariate normal (with an appropriate error bound). Since Mendelian inheritance ensures that (before we condition on knowing any of the previous trait values in the pedigree) the allelic states at different loci are independent, this is a straightforward application of a generalized Central Limit Theorem (generalized because the summands are not required to all have the same distribution). Just as in Barton et al. (2017), we can keep track of the error that we are making in assuming a normal approximation at each generation. In this way we see that, under our assumptions, the infinitesimal model can be expected to be a good approximation for order $\sqrt{M}$ generations.
The same Central Limit Theorem guarantees that the joint distribution of $(Z^{i[1]}, Z^{i[2]}, A^i + D^i)$ is asymptotically normally distributed as the number of loci tends to infinity. This certainly suggests that the conditional distribution of $A^i + D^i$ given $Z^{i[1]}$, $Z^{i[2]}$ should be (approximately) normal with mean and variance predicted by standard results on conditioning a multivariate normal distribution on some of its marginals (which we recall in Theorem C1). However, this is not immediate. It is possible that the conditioning forces the distribution on to the part of our probability space where the normal approximation breaks down.
To verify that the conditional distribution is asymptotically normal, we shall show that observing the trait value of an individual provides very little information about their allelic state at any particular locus, or any particular pair of loci, and consequently conditioning on parental trait values provides very little information about allelic states in their offspring. This is (essentially) achieved through an application of Bayes' rule, although some care is needed to control the cumulative error across loci. We use this to calculate the first and second moments of $A^i + D^i$ conditional on $\tilde{Z}^{i[1]}$, $\tilde{Z}^{i[2]}$. The fact that they agree with the predictions of Theorem C1 depends crucially on the assumption that dominance is "balanced," in the sense that the inbreeding depression ι is well defined. This quantity enters not just in the expression for the expected trait value of inbred individuals, but also in our error bounds, cf. Remark F4.
Of course, checking that the first two moments of the conditional distribution of $A^i + D^i$ are (approximately) consistent with asymptotic normality is not enough to prove that the conditioned random variable is indeed (approximately) normal. Moreover, we cannot apply our generalized Central Limit Theorem to this term. Instead we use a generalization of Stein's method of "exchangeable pairs" (outlined in Appendix D), which relies on our ability to control the (weak) dependence between the contributions to $A^i + D^i$ from different loci that is induced by the conditioning. We present the details in the case of identical parents (which is the case in which normality is most surprising) in Appendix G.
We only present our results in the case in which we condition on the parental traits of a single individual in generation t. Just as in the additive case, this can be extended to conditioning on any combination of traits in the pedigree up to generation t − 1, but the expressions involved become unpleasantly complex. Instead of writing them out, we content ourselves with explaining the only step that requires a new argument. We must show that knowing the traits of all individuals up to generation t − 1 does not provide enough information about the allelic states at any particular locus in an individual in generation t to destroy the asymptotic normality of its trait value. This is justified in Appendix H using the fact that, because of Mendelian inheritance, the amount of information gleaned about an allele carried by individual i from looking at the trait value of one of its relatives is proportional to the probability of identity with that individual as dictated by the pedigree.

Asymptotic normality conditional on the pedigree
We first illustrate the application of the generalized Central Limit Theorem by showing that in the ancestral population, the distribution of $(Z^1_0, \ldots, Z^{N_0}_0)$ is multivariate normal with mean vector $(\bar{z}_0, \ldots, \bar{z}_0)$ and variance-covariance matrix $(\sigma_A^2 + \sigma_D^2)\,\mathrm{Id}$, where $\mathrm{Id}$ is the identity matrix and $\sigma_A^2$ and $\sigma_D^2$ were defined in Table 1.
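To prove this, it is enough to show that for any choice of $\beta = (\beta_1, \ldots, \beta_{N_0}) \in \mathbb{R}^{N_0}$, the linear combination $\sum_{j=1}^{N_0} \beta_j Z^j_0$ is asymptotically normal with mean $\bar{z}_0 \sum_{j=1}^{N_0} \beta_j$ and variance $(\sigma_A^2 + \sigma_D^2) \sum_{j=1}^{N_0} \beta_j^2$. We apply Theorem D2, due to Rinott (1994), which provides control of the rate of convergence as $M \to \infty$. We abuse notation by writing $\Psi^j_l$ for the sum of the scaled additive and dominance effects at locus $l$ in the $j$th individual in generation zero. Set $E_l = \sum_{j=1}^{N_0} \beta_j \Psi^j_l$. Recalling our assumption that all $\eta_l$ and $\phi_l$ are bounded by some constant $B$, so that the sum of the scaled effects at each locus is bounded by $3B$, we have that $|E_l|$ is bounded by $3B\|\beta\|_1$ for all $l$. Moreover, since the individuals that found the pedigree are assumed to be unrelated and sampled from an ancestral population in which all loci are in linkage equilibrium, using Equations (22) and (23), we find that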

Theorem D2 then yields Here, $\mathcal{N}$ is the cumulative distribution function of a standard normal random variable. The right-hand side can be bounded above by for a suitable constant $C$. In particular, taking $\beta_k = 0$ for $k \ne j$ and $\beta_j = 1$, we read off that the rate of convergence to the normal distribution of $Z^j_0$ as the number of loci tends to infinity is of order $1/\sqrt{M}$.
Note that the normal approximation is poor if the variance $\sigma_A^2 + \sigma_D^2$ is small. Exactly the same argument shows that the distribution of the trait values $(Z^1, \ldots, Z^{N_t})$ of the individuals in generation t converges to that of a multivariate normal, with mean vector $(\bar{z}_0 + \iota F_{11}, \ldots, \bar{z}_0 + \iota F_{N_t N_t})$ and variance-covariance matrix determined by Equations (10) and (11).
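Our proof of asymptotic normality of $A^i + D^i$ conditional on the observed trait values of parents will exploit that the joint distribution of $(A^i + D^i, Z^{i[1]}, Z^{i[2]})$ is asymptotically normal, also with an error of order $1/\sqrt{M}$. This time we show that $\beta_1 (A^i + D^i) + \beta_2 Z^{i[1]} + \beta_3 Z^{i[2]}$ is asymptotically normal for every choice of the vector $(\beta_1, \beta_2, \beta_3) \in \mathbb{R}^3$. We apply Theorem D2 with $E_l$ the corresponding linear combination of the contributions of locus $l$, namely $\Phi^i_l$ for $A^i + D^i$, $\Psi_l(i[1])$ for $Z^{i[1]}$, and, with a symmetric expression, $\Psi_l(i[2])$ for $Z^{i[2]}$.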
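Theorem D2 then shows that the difference between the cumulative distribution function of the normalized linear combination and that of a standard normal random variable is of order $1/\sqrt{M}$. The normalizing variance can be deduced from the expressions for the variances and covariances of $\Psi_l(i[1])$, $\Psi_l(i[2])$ and $\Phi^i_l$ that are calculated in Appendix B and recorded in Equations (10), (11), and (16).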

Conditioning on trait values of the parents
We suppose that for each i, we know the parents of the individual i and their trait values $Z^{i[1]}$ and $Z^{i[2]}$. We shall treat the shared components $(A^i + D^i)$ and the residuals $(R^i_A + R^i_D)$ separately. Both will converge to multivariate normal distributions which are independent of one another.
Mendelian inheritance ensures that the contributions to $R^i_A + R^i_D$ from different loci are independent, and so normality becomes an easy consequence of Theorem D2 once we have shown that the information gleaned from knowing the trait values only perturbs the distribution by order $1/\sqrt{M}$. This is checked in Equation (F7). The proof then closely resembles the proof in the additive setting of Barton et al. (2017), and so we omit the details.
The proof that $(A^i + D^i)$ is normal is more involved, as once we condition on the trait values in the parents, the contributions $\Phi^i_l$ for $l = 1, \ldots, M$ will all be (weakly) correlated. Our approach uses an extension of Stein's method of exchangeable pairs, which we recall in Appendix D and apply to our setting in Appendix G. This calculation is more delicate, but the key is that our conditioning induces very weak dependence between loci. The deviation from normality is controlled by a quantity involving the partial derivative with respect to $z_1$ of the density of the parental trait values, and the corresponding quantity for the partial derivative with respect to $z_2$ (both to be interpreted as ratios of densities), evaluated at $\widetilde{Z}^{i[1]}$, $\widetilde{Z}^{i[2]}$ respectively. (We recall that $\widetilde{Z}$ denotes observed trait value.) The normal approximation will break down if the trait values are too extreme or if the pedigree is too inbred.

Discussion
The essence of the infinitesimal model is that the distribution of a polygenic trait across a pedigree is multivariate normal.
Necessarily, if some individuals are selected (that is, if we condition on their trait values), there can be an arbitrary distortion away from Gaussian across the population. However, conditional on parental values and on the pedigree, offspring within each family still follow a Gaussian distribution. This was shown in Barton et al. (2017) in the purely additive case, and is extended here to the case with dominance. The only difference is that with dominance, the part of the trait shared by all siblings, $A + D$, is still random even when conditioning on the parental traits (observing the parental traits does not give us full information on the contribution of the parental alleles to the average offspring trait, as it did in the purely additive case), and the most difficult part of our analysis consists in showing that this shared contribution is also Gaussian. Our results rely strongly on our assumption that inbreeding depression, ι, is finite (it is zero in the purely additive case). Armed with these results, the classic theory for neutral evolution of quantitative traits can be used to predict evolution, even under selection. Theorems 4 and 5 show that this infinitesimal limit holds with dominance, at least over timescales of order the square root of the number of loci. Indeed, they show that conditional on the parental traits, the distance between the distributions of the components of the offspring trait and a normal distribution is of order $1/\sqrt{M}$. Hence, the distance between the trait distribution of an individual and the infinitesimal approximation increases in every generation by an amount of order $1/\sqrt{M}$, and the error bound becomes macroscopic (i.e. of order 1) after of the order of $\sqrt{M}$ generations. Our work provides some mathematical justification for the ubiquity of the Gaussian, and the empirical success of quantitative genetics, a success which is remarkable given the complex interactions that underlie most traits. The limit is not universal: a nonlinear transformation of a Gaussian trait leads to a non-Gaussian distribution, and failure of the infinitesimal model. This is because epistatic and dominance interactions then have a systematic direction, which violates the terms of the Central Limit Theorem. (Recall that in our toy example in the section on modeling Mendelian inheritance, we needed a "balance" in the dominance component, which we see reflected in our main results in the requirement that ι be well defined.) Nevertheless, if the population is restricted to a range that is narrow relative to the extremes that are genetically possible, then the infinitesimal model may be accurate, even if the genotype-phenotype map is not linear. This links to another way to understand our results: if very many genotypes can generate the same phenotype, then knowing the trait value gives us negligible information about individual allele frequencies. To put this another way, the infinitesimal limit implies that selection on individual alleles is weak relative to random drift ($N_e s \sim 1$), so that neutral evolution at the genetic level is barely perturbed by selection on the trait (Robertson 1960).
If traits truly evolve in this infinitesimal regime, then it will be impossible to find any genomic trace of their response to selection. This extreme view is contradicted by finding an excess of "signatures" of selection in candidate genes, though it might nevertheless be that these signals are generated by alleles with modest $N_e s$, such that the infinitesimal model remains accurate for the trait. Indeed, Boyle et al. (2017) argue that the very large numbers of single nucleotide polymorphisms that are typically implicated in genome-wide association studies for complex traits imply an "omnigenic" view, in which trait variance is largely due to genes with no obvious functional relation to the trait. Frequencies of nonsynonymous and synonymous mutations suggest that selection on deleterious alleles is typically much stronger than drift ($N_e s \gg 1$; Charlesworth 2015). However, it might still be that selection on the focal trait is comparable with drift, even if the total selection on alleles is much stronger. Whether the infinitesimal model accurately describes trait evolution under such a pleiotropic model is an interesting open question.
In principle, we can simulate the infinitesimal model exactly, by generating offspring from the appropriate Gaussian distributions. For the additive case, this is straightforward, since we need only follow the breeding value of each individual, and the matrix of relationships amongst individuals (e.g. Barton and Etheridge 2011, 2018). However, to simulate the infinitesimal model with dominance, we need to track four-way identities, which is only feasible for small populations (<30, say).
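For the additive case only, one generation of such a simulation might be sketched in Python as follows. This is not the authors' code. It assumes the standard quantitative-genetics form of the within-family (segregation) variance, $(V_A/2)(1 - (F_1 + F_2)/2)$, where $F_1$ and $F_2$ are the inbreeding coefficients of the two parents and $V_A$ is the additive variance in the ancestral population. The function and parameter names are hypothetical.

```python
import random
from statistics import fmean, pvariance


def segregation_variance(v_a, f_mother, f_father):
    """Within-family variance of breeding values (assumed standard additive formula)."""
    return 0.5 * v_a * (1.0 - 0.5 * (f_mother + f_father))


def sample_offspring(rng, a_mother, a_father, f_mother, f_father, v_a, n_offspring):
    """Draw offspring breeding values: Gaussian around the mid-parent value."""
    mid_parent = 0.5 * (a_mother + a_father)
    sd = segregation_variance(v_a, f_mother, f_father) ** 0.5
    return [rng.gauss(mid_parent, sd) for _ in range(n_offspring)]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
sibs = sample_offspring(rng, a_mother=1.0, a_father=-0.2,
                        f_mother=0.0, f_father=0.0, v_a=1.0, n_offspring=10_000)
print(fmean(sibs), pvariance(sibs))
```

The within-family variance does not depend on the parental breeding values, only on how inbred the parents are. The dominance case would additionally need the higher-order identities discussed in the text.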
We have not set out the extension of the infinitesimal model to structured populations in detail. In principle, this just requires that we track the identities within and between the various classes of individual. One motivation for the present theoretical work was to extend our infinitesimal model of "evolutionary rescue" (Barton and Etheridge 2018) to include inbreeding depression and partial selfing. This should be feasible, provided that we do not need to track identities between specific individuals, but instead group individuals according to the time since their most recent outcrossed ancestor, an approach applied successfully by Sachdeva (2019). Already, Lande and Porcher (2015) applied the infinitesimal model to a deterministic model of partial selfing, while Roze (2016) analyzed an explicit multilocus model of partial selfing, allowing for dominance and drift, assuming that all loci are equivalent, and that linkage disequilibria are weak.
One of the most obviously unreasonable assumptions of the classical infinitesimal model, and the extension described here, is that there are an infinite number of unlinked loci. Santiago (1998) showed how loose linkage could be approximated by averaging over pairwise linkage disequilibria. In the additive case, the infinitesimal model can be defined precisely for a linear genome, by assuming that very many genes are spread uniformly over the genome (Sachdeva and Barton 2018). The techniques used in our approach are not robust to (even moderately) high levels of linkage, as groups of genes passed on together will decrease the number of "independent" units of heritable contributions to the trait value, leading to an effective number of loci $M_{\mathrm{eff}}$ too low for the Gaussian approximation to be valid (or more precisely, for the bound between the trait distribution and the appropriate Gaussian distribution in Theorems 4 and 5 to be small). In this case, one needs to consider explicit models of recombination that are out of the scope of this work.
The main value of the infinitesimal model may be to show that trait evolution depends on only a few macroscopic parameters; even if we still make explicit multilocus simulations, this focuses attention on those key parameters, and gives confidence in the generality of our results. Quantitative genetics has developed quite separately from population genetics. Although the theoretical synthesis half a century ago (Robertson 1960; Bulmer 1971; Lande 1975) stimulated much subsequent work (empirical as well as theoretical), the failure to find a practicable approximation for the evolution of the genetic variance (e.g. Turelli and Barton 1994) was an obstacle to further progress. The infinitesimal model provides a justification for neglecting the intractable effects of selection on the variance components, and treating them as evolving solely due to drift and migration. This approach may be helpful for understanding evolution in the short and even medium term.

In the recursion (A1), the quantity $F^*_{kl}$ is the probability of identity of two genes drawn independently from individuals $k$ and $l$ (this independent drawing corresponds to Mendelian inheritance); if $k = l$, then we may either pick the same gene twice, which happens with probability 1/2 (and since the two genes are identical, they are also identical by descent), or pick the two genes of individual $k$, again with probability 1/2, and their probability of identity by descent is then $F_{kk}$ by definition. Restating (A1) in words, the probability that a gene taken in individual $i$ and a gene taken in individual $j$, both in generation t, are identical by descent is equal to the sum over all potential pairs $(k, l)$ of parents in the previous generation (t − 1) of the probability that the gene in $i$ descends from $k$, the gene in $j$ descends from $l$, and that the "parental" genes in $k$ and $l$ are themselves identical by descent.
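The verbal description of the pairwise recursion can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: each gene is taken to descend from either parent with probability 1/2, the diagonal entry for an individual is taken to be the probability of identity of its two distinct genes (one from each parent), and the names `next_generation_identities`, `F_prev` and `parents` are hypothetical.

```python
def next_generation_identities(F_prev, parents):
    """One step of the pairwise identity recursion described in words above.

    F_prev[k][l]: probability of identity by descent between a gene of k and a gene
    of l in the previous generation (for k == l, between the two distinct genes of k).
    parents[i] = (first_parent, second_parent), as indices into the previous generation.
    """
    def F_star(k, l):
        # Two genes drawn independently from k and l. For k == l, the same gene
        # is drawn twice with probability 1/2, otherwise the two genes of k are drawn.
        return 0.5 * (1.0 + F_prev[k][k]) if k == l else F_prev[k][l]

    n = len(parents)
    F_new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # The two genes of i come one from each parent.
                F_new[i][i] = F_star(parents[i][0], parents[i][1])
            else:
                # Each gene descends from either parent with probability 1/2.
                F_new[i][j] = sum(F_star(k, l)
                                  for k in parents[i] for l in parents[j]) / 4.0
    return F_new


F0 = [[0.0, 0.0], [0.0, 0.0]]                           # two unrelated founders
F1 = next_generation_identities(F0, [(0, 1), (0, 1)])   # two full sibs
F2 = next_generation_identities(F1, [(0, 1)])           # offspring of a sib mating
F_self = next_generation_identities(F0, [(0, 0)])       # offspring of a selfed founder
print(F1, F2, F_self)
```

Only pairwise identities are propagated here. The three- and four-way identities needed with dominance require the more general bookkeeping described in the following section.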

Calculating two-, three-, and four-way identities
Several papers have developed algorithms for calculating identity coefficients, given a pedigree (Karigl 1981; Abney 2009; García-Cortés 2015; Kirkpatrick et al. 2019). These assume a single genetic locus, and primarily consider the nine condensed identity coefficients of Fig. B1 that describe the relationship between two diploid individuals. This body of work has developed algorithms that can efficiently calculate identity coefficients involving two individuals, across large pedigrees. Karigl (1982) considers (but does not implement) calculation of identities amongst more than two individuals.
Here, we define and implement a (fairly) simple algorithm that deals with multiple sets of genes across multiple individuals. The corresponding code in Mathematica can be found in the supplementary material (Barton 2023). This is unlikely to be as efficient as existing algorithms for identities amongst one set of genes across two individuals; it is limited by the need to calculate and store identities amongst very many sets of ancestral genes, corresponding to the very many routes by which genes may descend through the pedigree.
First, we establish our notation. The two genes in each individual each receive a separate label. Thus, a gene in individual $i$ will have label $\mathbf{i} = \{i, 1\}$ or $\mathbf{i} = \{i, 2\}$. Sets of genes will be generically denoted by $S = \{\mathbf{i}_1, \ldots, \mathbf{i}_k\}$. We define $F[S_1, S_2, \ldots, S_n]$ to be the probability that the genes contained in each set $S_1, S_2, \ldots, S_n$ are identical by descent, tracing back to $n$ distinct founders in the ancestral population. For example, $F[\{\mathbf{i}_1\}, \{\mathbf{i}_2, \mathbf{i}_3\}, \{\mathbf{i}_4, \mathbf{i}_5\}]$ is the probability that these three sets of genes, $S_1 = \{\mathbf{i}_1\}$, $S_2 = \{\mathbf{i}_2, \mathbf{i}_3\}$ and $S_3 = \{\mathbf{i}_4, \mathbf{i}_5\}$, each trace back to three distinct founders: one ancestral to $\mathbf{i}_1$, another one ancestral to $\mathbf{i}_2$ and $\mathbf{i}_3$, and a last one ancestral to $\mathbf{i}_4$ and $\mathbf{i}_5$. Necessarily, $F[\{\mathbf{i}\}] = 1$ (a single gene traces back to a unique founder), and the probability of identity of genes $\mathbf{i}_1$ and $\mathbf{i}_2$ satisfies $F[\{\mathbf{i}_1, \mathbf{i}_2\}] = 1 - F[\{\mathbf{i}_1\}, \{\mathbf{i}_2\}]$. Identities in generation t are denoted $F_t$.
Given the pedigree, the identities are defined recursively; $F_t$ is a linear combination of identities $F_{t-1}$ in the previous generation. Here, we simply outline the algorithm. A detailed explanation in terms of the Mathematica code is in the supplementary material (Barton 2023).
In generation t = 0 all individuals are assumed unrelated, and so $F_0[S_1, \ldots, S_n]$ is set to be 1 if each $S_k$ comprises a single gene and these $n$ genes are all distinct. Otherwise it is set to zero.
The algorithm proceeds in two steps, first identifying the possible parents from which each gene is descended and then the possible genes within that parent. In this way, a list of all possible scenarios is generated, with each scenario having equal probability. A slight twist here is that if a set contains a single gene in a given individual, that gene traces back to one or other parent of the individual, with equal probability; two genes in the same individual must trace back to the two parents, although those may be the same individual if there is selfing. This list contains many permutations that are equivalent, differing only by order; these are tallied to reduce the number of configurations that need to be stored, resulting in a weighted list. This gives a recursion back to the founder generation. The number of generations and size of pedigree is limited by the amount of memory needed to store the intermediate lists.
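For very small pedigrees, the quantity $F[S_1, \ldots, S_n]$ can also be checked by brute force. The Python sketch below is not the recursive algorithm outlined above, nor the Mathematica code of the supplementary material: it simply enumerates every equally likely Mendelian transmission scenario and traces each gene back to a founder gene, so its cost grows exponentially with the number of non-founders. It assumes that gene `c` of a non-founder is inherited from its parent number `c`; the function name and the pedigree encoding are hypothetical.

```python
from fractions import Fraction
from itertools import product


def identity_probability(pedigree, gene_sets):
    """Exact F[S_1, ..., S_n], by enumerating all Mendelian transmissions.

    pedigree: dict mapping each individual to None (founder) or (parent_0, parent_1).
    A gene is a pair (individual, c); gene c of a non-founder comes from parent_c.
    gene_sets: list of lists of genes.
    """
    inherited = [(ind, c) for ind, par in pedigree.items()
                 if par is not None for c in (0, 1)]

    def founder_gene(gene, copied):
        ind, c = gene
        while pedigree[ind] is not None:
            parent = pedigree[ind][c]
            c = copied[(ind, c)]  # which of the parent's two genes was passed on
            ind = parent
        return (ind, c)

    hits, total = 0, 0
    for choice in product((0, 1), repeat=len(inherited)):
        copied = dict(zip(inherited, choice))
        origins = [{founder_gene(g, copied) for g in s} for s in gene_sets]
        within = all(len(o) == 1 for o in origins)               # identity inside each set
        distinct = len(set().union(*origins)) == len(gene_sets)  # distinct founder genes
        hits += within and distinct
        total += 1
    return Fraction(hits, total)


# Two unrelated founders, two full sibs, and one offspring of the sib mating.
pedigree = {"m": None, "f": None, "s1": ("m", "f"), "s2": ("m", "f"), "o": ("s1", "s2")}
print(identity_probability(pedigree, [[("o", 0), ("o", 1)]]))
print(identity_probability(pedigree, [[("s1", 0), ("s2", 0), ("o", 0)]]))
```

The first call returns the inbreeding coefficient of the offspring of a full-sib mating, and the second a three-way identity. Tallying equivalent configurations, as described above, is what makes the recursive version feasible for larger pedigrees.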

Appendix B: Conditioning on the pedigree
In this section, we illustrate how to recover the expressions for the mean and variance of the two parts $(A^i + D^i)$ and $(R^i_A + R^i_D)$ of the trait of individual i from identity coefficients of its parents i[1] and i[2] and the classical coefficients of Table 1. Covariances between families are calculated in the same way. We also calculate the covariance between $(A^i + D^i)$ and $Z^{i[1]}$ and $Z^{i[2]}$ (given the pedigree), which will be important for establishing the effect of conditioning on the trait values of the parents. Although these expressions are well known, it seems to be hard to find an explicit derivation such as that presented here. Note that at this stage we are only conditioning on the pedigree, not on the observed trait values, and the results in this section do not require us to assume the presence of an environmental noise term.

Notation
Throughout this section, we are going to be calculating quantities conditional on the pedigree. We shall suppress that in our notation.

Mean and variance of A i + D i
The contribution to the trait $Z^i$ from the $l$th locus is determined by the four alleles $\chi^{i[1],1}_l$, $\chi^{i[1],2}_l$, $\chi^{i[2],1}_l$, and $\chi^{i[2],2}_l$ and the independent Bernoulli random variables $X^i_l$ and $Y^i_l$. The mean and variance of $(A^i + D^i)$ and $(R^i_A + R^i_D)$ will depend on which combinations of these alleles are identical. First, we introduce some notation for the nine possible identity classes. In Fig. B1, the two copies of each gene in each individual are represented by two (horizontally adjacent) dots. Lines between dots represent identity by descent. It is convenient to think of the genes within an individual as being ordered.
For each of the nine possible identity classes between i[1] and i[2], we calculate two quantities from which the mean and variance of To see where these expressions come from, consider for example identity state Δ 3 , with, say, = : χ 2 l , where "=" here means identical by descent.Then, using Equations ( 21)-( 23), The following quantities can be calculated in the same way.They are important for calculating the covariance between the trait values of parent and offspring (in particular the covariance between (A i + D i ) and Z i [1] and Z i [2] ) which will dictate the change in distribution of the trait values within families arising from conditioning on knowing the traits of the parents.We record them here for later reference. id We can express two-and three-way identities between the parents in terms of the four-way identities Δ 1 , . . ., Δ 9 .Recall that we write, for example, F 11 for the probability of identity of the two genes in i[1] and F 12 for the probability of identity of two genes, one selected at random from i[1] and one from i [2].In terms of the nine identity states, we have Combining the above, we find with a symmetric expression for 1 and the expression (6) for the variance of (A i + D i ) follows.
Remark B1.Walsh and Lynch (2018) give an expression for the variance when there is linkage disequilibrium.In their notation, f is the probability of identity at two distinct loci.Then for l ≠ m, so that our expression for will be multiplied by ( f /F 2 12 ), resulting (when we subtract Correcting for this by adding ( f − F 2 12 )(ι 2 − ι * ) to our expression (11) for the variance of Z i (for which we recall that F 12 becomes F ii ), we recover the expression of Walsh and Lynch (2018).

The covariance between $A^i + D^i$ and $A^j + D^j$
To understand the expression (7) for the covariance between $A^i + D^i$ and $A^j + D^j$ for $i \ne j$, consider The 16 terms corresponding to products of additive effects correspond to the 16 different possibilities for the allelic types at locus $l$ if we choose one allele at random from individual $i$ and one from individual $j$, and the contribution to the expectation will be nonzero precisely if the chosen alleles are identical, in which case they contribute $\mathbb{E}[\eta_l(\chi_l)^2]$. Summing over $l$, the overall contribution of such terms to the covariance will therefore be $2\sigma_A^2 F_{ij}$. Similarly, terms involving one factor of $\eta_l$ and one of $\phi_l$ will only be nonzero if all are evaluated on the same allelic type, hence the terms multiplied by $F_{iij}$ and $F_{ijj}$ in Equation (7).
Continuing in this way and using that $\mathbb{E}[A^i + D^i] = \iota F_{ii}$, we recover Equation (7).

The residuals
The corresponding calculations for the mean and variance of the residuals, $R^i_A + R^i_D$, follow exactly the same pattern. It is convenient to consider $R^i_A$ and $R^i_D$ separately, and then calculate the covariance. The first of these, corresponding to the additive part, is very straightforward since it is only going to depend on pairwise identities.
Recall first that Since the Mendelian inheritance is independent of the allelic states, $R^i_A$ has mean zero; to establish the variance, we must calculate its square. Since inheritance is independent at distinct loci, only the diagonal terms contribute and we find This is, of course, exactly the expression we would obtain in the purely additive case. The second residual, $R^i_D$, also has mean zero, but its variance will now involve higher order identities. Recall that Once again, since Mendelian inheritance is independent at different loci, $\mathbb{E}[(R^i_D)^2]$ will be entirely determined by the diagonal terms.
Note that for independent Bernoulli (parameter 1/2) random variables X and Y, and So, taking expectations over the variables X i l and Y i l , we find The first term depends only on pairwise identities and we see immediately that it is The second term in Equation (B2) is most easily calculated conditional on identity class.Let us write Ξ(l) for the summand corresponding to locus l.
Using our notation for identities, this becomes Thus, The covariance of . We need to establish the mean of We have been able to drop the "−1/4" terms in the second bracket and so the mean of Equation ( B3) is that of Taking expectations (conditional on the pedigree) and summing over loci, we find Finally, for two distinct parents, we have found that in generation t, conditional on the pedigree up to time t, We can also read off the result for when the two parents are the same from this formula.In that case and  F 1212 = 1 − F 11 .

Appendix C: Conditioning multivariate Gaussian vectors
For ease of reference, we record here a standard result for conditioning multivariate normal random vectors on their marginal values.
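Theorem C1. Suppose that $(x_1, x_2)$ is a multivariate normal random vector, partitioned into two blocks, with mean vector $(\mu_1, \mu_2)$ and variance-covariance matrix with blocks $\Sigma_{11}$, $\Sigma_{12}$, $\Sigma_{21} = \Sigma_{12}^{T}$ and $\Sigma_{22}$, where $\Sigma_{22}$ is nonsingular. Then, conditional on $x_2 = a$, the vector $x_1$ is multivariate normal with mean $\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2)$ and variance-covariance matrix $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$. The proof can be found, for example, in Brockwell (1996, Proposition 1.3.1 in Appendix A).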
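As a numerical illustration of this standard conditioning result, the Python sketch below computes the conditional mean and variance of a scalar (playing the role of the shared component) given two jointly normal observations (playing the role of the two parental traits). The function and argument names are assumptions made for this example, and the numbers correspond to a purely additive toy case with unrelated, non-inbred parents, genetic variance 1 and environmental variance 1; they are not taken from the paper.

```python
def condition_on_two(mean_x, var_x, mean_z, cov_zz, cov_xz, z_obs):
    """Conditional mean and variance of scalar X given a 2-vector Z = z_obs,
    for jointly normal (X, Z)."""
    (a, b), (c, d) = cov_zz
    det = a * d - b * c
    if det <= 0:
        raise ValueError("cov_zz must be positive definite")
    # Explicit inverse of the 2x2 covariance matrix of Z.
    inv = ((d / det, -b / det), (-c / det, a / det))
    # Regression weights: cov_xz multiplied by the inverse of cov_zz.
    w = (cov_xz[0] * inv[0][0] + cov_xz[1] * inv[1][0],
         cov_xz[0] * inv[0][1] + cov_xz[1] * inv[1][1])
    cond_mean = mean_x + w[0] * (z_obs[0] - mean_z[0]) + w[1] * (z_obs[1] - mean_z[1])
    cond_var = var_x - (w[0] * cov_xz[0] + w[1] * cov_xz[1])
    return cond_mean, cond_var


# Toy additive case: X is the mid-parent breeding value (variance 1/2), and each
# observed parental trait has variance 2 and covariance 1/2 with X.
mean, var = condition_on_two(mean_x=0.0, var_x=0.5, mean_z=(0.0, 0.0),
                             cov_zz=((2.0, 0.0), (0.0, 2.0)),
                             cov_xz=(0.5, 0.5), z_obs=(2.0, 0.0))
print(mean, var)
```

Setting the environmental variance to zero in this toy case (so that `cov_zz` has diagonal entries 1) gives conditional variance zero, matching the remark in the Discussion that in the purely additive case the parental traits determine the shared component completely.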

Appendix D: Generalized central limit theorems
We shall exploit known techniques for proving both convergence to a normal distribution, and for establishing the rate of convergence, in situations which go beyond the classical setting of independent identically distributed random variables.For convenience we recall the key results that we need here.
We begin with a result of Rinott (1994) on the rate of convergence in a generalized Central Limit Theorem; generalized because the summands are not identically distributed and it allows some dependence between elements in the sum. We do not use this second feature here, but it would be needed to extend our results to include effects that depend on more than one locus, and so for completeness we include it in the statement of the result. It also gives an idea of how quickly the rate of convergence deteriorates if one includes epistasis or higher order dominance effects. This result can be used both to prove asymptotic normality when we condition only on the pedigree (and not on any observed trait values), and to prove asymptotic normality of the residuals (that is, the part of the trait distribution within families that is not shared among offspring) conditional on the observed traits of ancestors in the pedigree.
The dependence is captured by a dependency graph.
Definition D1. Let $\{X_l;\ l \in V\}$ be a collection of random variables. The graph $G = (V, E)$, where $V$ and $E$ denote the vertex set and edge set respectively, is said to be a dependency graph for the collection if for any pair of disjoint subsets $A_1$ and $A_2$ of $V$ such that no edge in $E$ has one endpoint in $A_1$ and the other in $A_2$, the sets of random variables $\{X_l;\ l \in A_1\}$ and $\{X_l;\ l \in A_2\}$ are independent.
The degree of a vertex in the graph is the number of edges connected to it and the maximal degree of the graph is just the maximum of the degrees of the vertices in it.
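Theorem D2. (Rinott 1994, Theorem 2.2) Let $E_1, \ldots, E_M$ be random variables having a dependency graph whose maximal degree is strictly less than $D$, satisfying $|E_l - \mathbb{E}[E_l]| \le B$ almost surely for $l = 1, \ldots, M$, $\mathbb{E}[\sum_{l=1}^{M} E_l] = \lambda$ and $\mathrm{Var}(\sum_{l=1}^{M} E_l) = \sigma^2 > 0$. Then, for every $w$,
$$\left| P\left[ \frac{\sum_{l=1}^{M} E_l - \lambda}{\sigma} \le w \right] - \mathcal{N}(w) \right| \le \frac{1}{\sigma} \left\{ \sqrt{\frac{1}{2\pi}}\, D B + 16 \left( \frac{M}{\sigma^2} \right)^{1/2} D^{3/2} B^2 + 10 \left( \frac{M}{\sigma^2} \right) D^2 B^3 \right\},$$
where $\mathcal{N}$ is the distribution function of a standard normal random variable.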
In particular, when $D$ and $B$ are of order one and $\sigma^2$ is of order $M$, the bound is of order $1/\sqrt{M}$. Since we are only allowing for dominance effects that depend on allelic states at a single locus, and we have no epistasis, our dependency graphs will have no edges, and so the maximal degree of any vertex will be zero and we may take $D = 1$. Epistasis, or higher order dominance effects, will increase the degree. The accuracy of the normal approximation guaranteed by this bound will deteriorate rapidly as the number of combinations through which the allelic state at a single locus can influence the trait grows.
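The $1/\sqrt{M}$ rate can be observed numerically in a toy setting. The Python sketch below is only an illustration, not part of the proof: it replaces the per-locus contributions by independent Bernoulli($p_l$) variables with unequal $p_l$ (so the summands are independent but not identically distributed, and the dependency graph has no edges), computes the exact distribution of their sum by convolution, and measures its sup-distance from the normal distribution function with matching mean and variance. The choice of the $p_l$ and all names are assumptions made for this example.

```python
from math import erf, sqrt


def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def kolmogorov_distance(ps):
    """Sup-distance between the distribution function of a sum of independent
    Bernoulli(p_l) variables and the normal one with the same mean and variance."""
    pmf = [1.0]
    for p in ps:  # exact distribution of the sum, by repeated convolution
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    mean = sum(ps)
    sd = sqrt(sum(p * (1.0 - p) for p in ps))
    dist, cdf = 0.0, 0.0
    for k, mass in enumerate(pmf):
        target = normal_cdf((k - mean) / sd)
        dist = max(dist, abs(cdf - target))  # left limit at the jump
        cdf += mass
        dist = max(dist, abs(cdf - target))  # value at the jump
    return dist


results = {}
for M in (100, 400, 1600):
    ps = [(0.3, 0.5, 0.7)[l % 3] for l in range(M)]
    results[M] = kolmogorov_distance(ps)
    print(M, results[M], results[M] * sqrt(M))
```

If the rate is of order $1/\sqrt{M}$, the last printed column should stay roughly constant as $M$ grows, while the distance itself shrinks.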

Exchangeable pairs
In order to prove the asymptotic normality of the part of the trait value that is shared by all the offspring in a family conditional on parental traits, we require a different approach. Because we are conditioning on the trait values of the parents, there will be weak dependence between all the pairs of loci within the sums defining $A^i + D^i$ (and so the dependency graph for the summands would be the complete graph). To check that nonetheless the limit is Gaussian, we shall use a variant of Stein's method of exchangeable pairs, originally introduced in Stein (1986).
Recall that the pair of random variables $(W, W')$ is called an exchangeable pair if their joint distribution is symmetric. Suppose that $\mathbb{E}[W - W' \mid W] = \lambda (W - R)$ (D1) for some $0 < \lambda < 1$, where $R$ is a random variable of small order.
Let us write $\Delta = W - W'$ and define In this case, one can show (see Chen et al. 2011, §2.3) that

Proposition D3. (Chen et al. 2011, Proposition 2.4i) Let $h$ be an absolutely continuous function with $\|h'\| < \infty$, and $\mathcal{F}$ any σ-algebra containing $\sigma(W)$. If Equation (D1) holds, then , where

Corollary D4. Suppose that $(W, W')$ is an exchangeable pair with where $R$ is a random variable of small order. Then defining $\hat{K}_1$, $\hat{K}_2$, $h$ and $\mathcal{F}$ as in Proposition D3, where $\mathcal{N}_{\mu_W, \sigma^2_W}$ denotes the distribution of a normal random variable with mean $\mu_W$ and variance $\sigma^2_W$.
Remark D5. Although this result is enough to guarantee that $W$ is asymptotically normal, because we require $\|h'\| < \infty$, it is not enough to bound even the distance between the cumulative distribution function of $W$ and that of a standard normal random variable with an error of order $1/\sqrt{M}$. To propagate our argument from one generation to the next requires convergence of the density function of the observed trait value, and once again it is our assumption that there is some environmental noise (with a smooth density) that allows us to guarantee this convergence based on the result proved here.

Appendix E: Key lemmas
Notation E1. Throughout the rest of the appendices, to ease the notation we shall assume that the (Gaussian) environmental noise is subsumed into the trait value $Z$, so that its distribution can be assumed to have a smooth density. That is, what we call $Z$ below is the observed trait $\widetilde{Z}$ discussed in the main text. Moreover, when we write $P[Z = z]$, we actually mean the density function of the distribution of $\widetilde{Z}$ evaluated at the value $z$ (in formulae, $P[Z = z] := \varphi_{\widetilde{Z}}(z)$, with $\varphi_{\widetilde{Z}}$ the density of $\widetilde{Z}$). This notation allows us to cover both the case when the allelic distributions are general (potentially concentrated on a finite number of values) and the environmental component is smooth enough that the distribution of their sum is also smooth, and the case when there is no environmental noise but the scaled allelic distributions have a smooth density over $[-B, B]$ (in which case the distribution of the genetic component $Z$ is itself smooth enough for the method below to be employed).
In this section, we prove two key lemmas which will underpin our proof. They will allow us to estimate the effect on the distribution of the allelic types at a particular locus, or particular pair of loci, of knowing the trait value. We shall be using Bayes' rule. With a slight abuse of notation Let us write $\Psi_l(x, x') = \eta_l(x) + \eta_l(x') + \phi_l(x, x')$ and $Z_{-l}$ for the trait value of an individual with the effect of locus $l$ removed, then the ratio in this expression becomes Of course, this ratio of probabilities should be interpreted as a ratio of density functions. Moreover, bearing in mind our remarks on environmental noise, we are going to suppose that these density functions are sufficiently smooth that we can justify an application of Taylor's Theorem. Of course, we know that $Z_{-l}$ is approximately normally distributed, using exactly the same argument as for $Z$, and it is no surprise that the ratio differs from one by something of order $1/\sqrt{M}$. The importance of the next lemma will become evident when we sum conditional expectations over loci; cf. Remark E5.

Lemma E2. In the notation above, where the function $C_l(z)$ in the error term can be bounded independent of $l$ and $z$.
Remark E3. (Conditioning on the pedigree) Although we have suppressed it in the notation, this lemma holds in any generation, but the expectations $\mathbb{E}$ should be interpreted as being calculated conditional on the pedigree (which will determine the probability of identity of $\chi^1_l$, $\chi^2_l$).
Proof of Lemma E2. We are going to abuse notation (still further) and imagine that $P[\chi^1_l = x, \chi^2_l = x', Z = z]$ has a density with respect to $x, x'$. Of course, we do not expect that to be true (even with environmental noise), but it makes our expressions easier to parse than using a more mathematically accurate notation. We begin with an application of Taylor's Theorem (with respect to $z$): Provided that $P[Z = z]$ has a uniformly bounded third derivative, our assumption that the terms that make up $\Psi_l$ are uniformly bounded allows us to deduce that $\widetilde{C}_l$ is uniformly bounded in $l$ and $z$. Notice that the expression in Equation (E2) is just $P[Z = z]$.
Since we are not conditioning on any trait values in the pedigree, and the ancestral population is assumed to be in linkage equilibrium, (χ^1_l, χ^2_l) and Z_{−l} are independent. Combining this observation with Equation (E1), and once again applying Taylor's Theorem, we find

where the function C_l in the last line is uniformly bounded independent of l and (x, x′, z). (To justify this last statement, recall that we are abusing notation and implicitly subsuming the environmental noise into the distribution of Z. The density function here is actually a convolution of that of the environmental noise, which is smooth, and the true distribution of Z, and is therefore smooth.) Still assuming sufficient regularity, differentiating the previous equation we find

and C_l uniformly bounded.
Finally, substituting Equations (E5) and (E6) in Equations (E3) and (E4), we obtain

as required. □

We also require an analog of Lemma E2 with which to control the effect of conditioning on the trait value on the distribution of the allelic values at pairs of loci. We write

for the trait value with the contributions from loci l and m removed. The following lemma follows on iterating the argument that gave us Lemma E2.

Lemma E4. In the notation above, where the functions C_{l,m}(z) are uniformly bounded in l, m, z.

Proof of Lemma E4. We iterate the previous result:

now substitute for P[Z_{−l} = z] and its derivatives. □

Remark E5. Just as for Lemma E2, the proof of Lemma E4 applies in any generation as long as one interprets the expectations as being taken conditional on the pedigree. We have assumed that our base population is in linkage equilibrium to write

We shall only be presenting the detailed proofs for individuals in generation one. To extend to the general case requires an analog of Lemma E2 when we consider the trait values of the two parents of an individual. For completeness, we record that lemma here.

Lemma E6. Let us use
In the following expression, all expectations should be interpreted as taken conditional on the pedigree:

Remark F1. Since we already checked that the trait Z^{i[1]} is approximately normally distributed, and the same argument evidently gives that Z_{−l} is approximately normally distributed for each l, the derivation above may seem unnecessarily complex. However, in summing the terms in Equation (F2) over loci, we exploited the fact that we could pull the ratio P′[Z^{i[1]}]/P[Z^{i[1]}] outside the sum. Only then did we approximate it by the limiting normal distribution. We could only do this because we expressed everything in terms of the distribution of the whole trait. If we try to approximate the distribution of Z^{i[1]}_{−l} directly by a normal distribution, and then sum, we cannot control the error. We shall use this trick repeatedly in what follows.

Similarly,

The terms of order one and 1/√M vanish as a result of Equations (20), (21), (22), and (23). Multiplying by 1/√M and summing over loci, we find that

Recalling that the trait distribution in the ancestral population is (almost) normally distributed with mean z̄_0, we see that if we ignore environmental effects, so that the variance of the trait distribution in generation zero is σ_A^2 + σ_D^2, then adding z̄_0 to the right-hand side of Equation (F3), and substituting the corresponding Gaussian density for the parental trait values, we recover that, up to an error of order 1/√M, the expected trait value among offspring is as predicted by Theorem C1.
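The ratios of derivatives of the trait density that appear in these calculations take a simple closed form when the density is exactly Gaussian. As a worked check, assuming a normal density p with mean μ and variance σ^2 (symbols introduced here for illustration only, not the paper's notation):

```latex
\begin{align*}
p(z) &= \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\!\left(-\frac{(z-\mu)^2}{2\sigma^2}\right),\\
\frac{p'(z)}{p(z)} &= -\frac{z-\mu}{\sigma^2},\\
\frac{p''(z)}{p(z)} &= \frac{(z-\mu)^2}{\sigma^4}-\frac{1}{\sigma^2},\\
\frac{p''(z)}{p(z)}-\left(\frac{p'(z)}{p(z)}\right)^{2} &= -\frac{1}{\sigma^2}.
\end{align*}
```

The first identity is what turns a term proportional to P′/P into a linear function of the deviation of the parental trait from the mean; the last shows that the corresponding combination of second-order terms does not depend on the trait value at all.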

Remark F2. (The breeder's equation)
Suppose that as a result of environmental noise, the observed trait of each individual in the ancestral population is its genetic trait plus an independent N(0, σ_E^2) random variable. Then assuming normality of the ancestral trait distribution, and using Theorem C1, we find that for unrelated parents the mean trait in generation one is

where σ_Z^2 is the total variance of the observed trait in the ancestral population; that is, σ_Z^2 = σ_A^2 + σ_D^2 + σ_E^2. Equation (F4) is the breeder's equation.
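For readers who want to experiment, here is a minimal sketch of the breeder's equation in its standard textbook form: the expected offspring trait is the ancestral mean plus the narrow-sense heritability σ_A^2/σ_Z^2 times the deviation of the midparent value from that mean. The function name and argument names are hypothetical, and the formula is the classical one, which we assume corresponds to Equation (F4) rather than reproducing the paper's exact display.

```python
def expected_offspring_mean(z1, z2, zbar0, var_a, var_d, var_e):
    """Breeder's-equation prediction for the offspring of two unrelated parents.

    z1, z2 : observed trait values of the two parents
    zbar0  : mean trait value in the ancestral population
    var_a, var_d, var_e : additive, dominance and environmental variances
    """
    var_z = var_a + var_d + var_e   # total variance of the observed trait
    h2 = var_a / var_z              # narrow-sense heritability
    midparent = 0.5 * (z1 + z2)
    return zbar0 + h2 * (midparent - zbar0)


if __name__ == "__main__":
    # Parents at 2.0 and 4.0, ancestral mean 1.0, heritability 0.5.
    print(expected_offspring_mean(2.0, 4.0, 1.0, 0.5, 0.25, 0.25))
```

Only the additive variance enters the numerator: with unrelated parents, dominance and environmental variance dilute the response but do not contribute to it.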

Mean trait value, same parent
We now turn to the expected trait value in a family in generation one that is produced by selfing. The calculation for the additive term is unchanged, but now we have a nontrivial contribution from the dominance component. We denote the trait value of the parent by Z^{i[1]}. Since 1] ] and E[ϕ l (χ 1] ]. Our strategy is as before: we express each of these probabilities in terms of the distribution of the trait value minus the contribution from locus l and we apply Lemma E2. Thus, once again using that in generation zero, before conditioning, the two alleles at locus l in Z^{i[1]} are independent draws from ν_l,

Using Equations (21), (22), and (23), we see that on integration the only nonzero contribution comes from the term η_l(x)ϕ_l(x, x), which can be integrated to yield 1] ] P[Z i [1] ] E[η l ( χ l )ϕ l ( χ l , χ l )] + O 1 M .

Multiplying by 1/√M and summing over loci, we find that the mean of the term D_i in Equation (29), conditional on i and on knowing the trait value Z^{i[1]}, is 1] ] P[Z i [1] ] 1 2M

Adding on the additive terms that we calculated before and restating everything in terms of the quantities in Table 1, we obtain that for two identical parents

Notice that the factor of 1/2 in front of ι is the probability of identity F* of the two genes in the offspring.
Of course, there is no surprise here:

Thus, up to the error term, Equation (F5) is just Var(Z i [1] ) , as we expect from the (approximately) bivariate normal distribution of (A_i + D_i) and Z^{i[1]}.
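The value 1/2 for the probability of identity of the two genes in an offspring produced by selfing can be checked by direct enumeration. The sketch below is illustrative only: it assumes a single locus, a parent carrying two distinct (non-identical) genes, and fair Mendelian transmission; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def identity_probability_selfing():
    """Probability that the two genes of a selfed offspring are copies of the
    same parental gene, assuming fair Mendelian segregation at one locus."""
    parent_genes = ("a", "b")  # the two (distinct) genes carried by the parent
    # Each of the offspring's two genes is an independent, uniform draw
    # from the parent's pair, so all four ordered outcomes are equally likely.
    outcomes = list(product(parent_genes, repeat=2))
    identical = sum(1 for g1, g2 in outcomes if g1 == g2)
    return Fraction(identical, len(outcomes))


if __name__ == "__main__":
    print(identity_probability_selfing())
```

If the parent is itself inbred, so that its two genes may already be identical by descent, the probability of identity in the offspring is larger; the enumeration above covers only the generation-one case discussed here.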

Conditional variance (A_i + D_i), generation one
We now turn to the variance of the shared parental contribution. This is where the complications associated with incorporating dominance really start to be felt. In the process of calculating the conditional mean above, we established that conditioning on the parental trait values (and whether or not they are identical) distorts the distribution of the allelic state at a given locus by a factor of order 1/√M. This distortion is enough to shift the mean trait (as we see in the breeder's equation), and, as we shall see, the variance of the sum over loci will have a contribution from linkage disequilibrium.

Conditional variance (A_i + D_i), generation one, same parent
First we consider the case in which the parents are the same. We need to calculate the expectation of (A_i + D_i)^2 conditional upon the parental trait. We begin with the "diagonal" terms, corresponding to a single locus. We take these in three parts. First, proceeding as before,

Notice that the term arising from the Taylor expansion is already of order 1/√M, and, since we multiply each of the terms in the sum by 1/M, we have no need to develop the expansion further. Indeed, all terms in the expression for the variance will be multiplied by 1/M and so for the "diagonal" terms in the square of the sum, we only need an expression to leading order.

Remark F3. The error that we are making in discarding the terms arising from the Taylor expansion is 1/√M multiplied by a term that depends on P′[Z^{i[1]}]/P[Z^{i[1]}] ≈ −(Z^{i[1]} − E[Z^{i[1]}])/Var(Z^{i[1]}). As usual, the approximation will be poor if the trait value of the parent is too extreme, or the variance is too small.

As a result, for these terms we can calculate with respect to the distribution in the ancestral population and we find

Similarly, recalling that we are still considering the case of identical parents, and )) × (ϕ l (χ

Combining all these terms we find that if the parents are identical, then the contribution to E[(A_i + D_i)^2 | Z^{i[1]}] from the "diagonal" terms is

We must now turn to the contribution from correlations across loci. For this, we must compute ))(η m (χ 1] ] P[Z i [1] ]

There are four terms of this form, and so multiplying by 1/(16M) and summing gives

((σ_D^2)^2 / 4) P″[Z^{i[1]}] / P[Z^{i[1]}]. (F19)

Combining Equations (F17), (F18), and (F19), we find that Equation (F11) is

(F20)

Adding Equations (F8), (F13), (F16), and (F20) yields E[(A_i + D_i)^2], and subtracting the square of Equation (F5), we obtain

Now if we substitute the Gaussian density for Z^{i[1]}, observing that P″[Z^{i[1]}]/P[Z^{i[1]}] − (P′[Z^{i[1]}]/P[Z^{i[1]}])^2 = −1/Var(Z^{i[1]}) for a Gaussian density,

(and since Φ_l is uniformly bounded independent of l, the error is bounded independent of l), and then averaging out over l as in the definition of T(W), on substituting Equation (G5) and Cov(Z, W) = ρσ_Zσ_W, we find

Using the approximation for the conditional distribution of (χ^1_l, χ^2_l) given Z obtained in Appendix E,

so we can rewrite Equation (G6) as

Substituting in Equation (G1),

We are going to apply Corollary D4 to (W, W′) with F = σ(W). We set λ = 1/(M(1 − ρ^2)) and observe from Equation (G7) that we may take a remainder term R with E[|R|] of order 1/√M in Equation (D3). Moreover,

and so, since by construction |Δ| < C/√M, E[K^2] is also of order at most 1/√M.
Since with these definitions

it remains to control

Again using the results of Appendix E,

(the first term being the conditional variance if the random variables were distributed exactly as a bivariate normal), whereas

(Note that we see the unconditioned σ_W^2 in this second expression since it involves only diagonal terms.) To control Equation (G8), observing that, by the Cauchy-Schwarz inequality,

it suffices to control

In particular, we should like to show that this expression is of order 1/M. Now we use the standard decomposition of conditional expectations: for two random variables X and F,

For us, X = M(W − W′)^2 = (Φ_L − Φ*_L)^2, and F = W, so 2 , and we seek

Fig. 1. Three- and four-way identities. Lines indicate identity by descent between genes. See the main text for further explanation.

Fig. 2. Changes of the mean and variance of the additive part of the trait, the dominance part, and their sum over 50 generations of neutral evolution. The top row shows a single replicate, while the bottom row shows the average over 300 replicates using the same sequence of individuals spanning the 50 generations. The left column shows the means (Ḡ = Ā + D̄, Ā, D̄; black, or middle curve; blue, or bottom curve; red, or top curve), while the right column shows the variance components (V_G = Var(G), V_A = Var(A), V_D = Var(D), V_{A,D} = Cov(A, D); black, or top curve; blue, or middle top curve; red, or middle bottom curve; purple, or bottom curve). On the right, solid lines show the total variances and covariance, while the dashed lines show the genic component. These differ through the contribution of linkage disequilibrium, which generates substantial variation. The genic component changes smoothly, as expected with a large number (M = 1,000) of loci. With M = 1,000 loci, we expect the infinitesimal model to be accurate for about √M ∼ 30 generations. Simulations are made on a single pedigree with 30 individuals; variance components are measured relative to the ancestral population. The predicted values for these means and variances under the infinitesimal model are given in Equations (12)-(14) (note that the identity coefficients F_ii increase through time due to genetic drift).

Fig. 3. The relation between the dominance deviation and the probability of identity of the two genes within an individual. There is one point for the average over 1,000 replicates for each of the 30 individuals in generations 5, 10, 20, 40 (black, or left-most group of points; blue, or second left-most group; purple, or second right-most group; red, or right-most group). (Recall that the pedigree is fixed, so identities are the same for each replicate.) The mean of D decreases as ιF_ii = −0.53 F_ii (solid line), in accordance with Equation (12).

Fig. 4. The variance and covariance of A and D versus identity F_ii for individuals in the pedigree. As in Fig. 3, there are 30 points in each generation, one corresponding to each of the 30 individuals in the population. Generations 5, 10, 20, 40 (black, or left-most group of points; blue, or second left-most group; purple, or second right-most group; red, or right-most group). Here, again we use the shorter notation V_A = Var(A), V_D = Var(D), V_{A,D} = Cov(A, D), and the theoretical predictions were derived in Equations (13) and (14).

Fig. 5. The variance and covariance within families between the residual additive and dominance deviations R_A and R_D (V_{R_A} = Var(R_A), V_{R_D} = Var(R_D), V_{R_A,R_D} = Cov(R_A, R_D)). One hundred pairs of parents were chosen at random from the ancestral population, and from each pair one thousand offspring were generated. The within-family variances obtained in this way were averaged over 10 replicates (with the same pedigree and parents). Each of the 100 points in each plot corresponds to one pair of parents. The five outliers are families produced by selfing. The blue lines (or top lines) show a least-squares regression; the red lines (or bottom lines) are the theoretical predictions [see Equation (8)]. The two lines exactly coincide in the plot on the right.

Fig. 6. Comparison between a neutral population (dashed lines) and one subject to truncation selection (solid lines). Top row: change in means relative to the initial value (G = A + D, A, D; black, or top curves; blue, or middle curves; red, or bottom curves); middle: variances, including linkage disequilibria (top to bottom: V_G = Var(A + D), V_A = Var(A), V_D = Var(D), V_{A,D} = Cov(A, D); black, blue, red, purple). The bottom row shows the changes to genic variances with time against predictions of the infinitesimal model. The values are averages over 300 replicates for the neutral case, 1,000 for the selected case, made with the same pedigree. There are M = 1,000 loci, and thus we expect the infinitesimal model to be accurate for about √M ∼ 30 generations. Selection is made within families; for each offspring, two individuals are generated from the corresponding parents, and the one with the larger trait value is retained.

Fig. 7. Convergence of the variance components at 50 generations, as the number of loci increases from M = 100 to M = 10^4 (same notation as in Fig. 6). Simulations with 50% truncation selection are compared with neutral simulations (solid, dashed lines). The replicate simulations were generated as in Fig. 6 (see main text). Regressions of the log absolute difference between selected and neutral variance components against ln(M) have slopes −0.62, −0.72, −0.70, −0.66 for V_G, V_A, V_D, V_{A,D}, respectively (see supplementary material for details). Thus, convergence is somewhat faster than

Fig. 8. The distributions of the residual (top row: R_A, R_D) and shared (bottom row: A, D) components of phenotype (M = 1,000 loci); for each, the cumulative distribution function is plotted as standard deviations of a Gaussian, z, so that a normal distribution appears as a straight line. These are calculated from families of 1,000 offspring, from multiple pairs of parents, each replicated 10 times, drawn after 20 generations without selection. The residuals are calculated by subtracting values from the family mean, and pooling across the 10 replicates. Thus, for each family there are 10,000 values; the cumulative distribution function is shown for 10 pairs of parents, in 10 colors. The shared component is calculated by taking the mean of each family, and pooling across 100 pairs of parents and across the 10 replicates. Thus, for each plot there are 1,000 points. There is now some deviation from a Gaussian.
Fig. B1. All possible four-way identities. The dots represent the four genes across the two parents (each parent corresponding to a row) and lines indicate identity (cf. Abney et al. 2000).