Genomic prediction of hybrid crops allows disentangling dominance and epistasis

Abstract We revisited, in a genomic context, the theory of hybrid genetic evaluation models of hybrid crosses of pure lines, as the current practice is largely based on infinitesimal model assumptions. Expressions for covariances between hybrids due to additive substitution effects and dominance and epistatic deviations were analytically derived. Using dense markers in a GBLUP analysis, it is possible to split specific combining ability into dominance and across-groups epistatic deviations, and to split general combining ability (GCA) into within-line additive effects and within-line additive by additive (and higher order) epistatic deviations. We analyzed a publicly available maize data set of Dent × Flint hybrids using our new model (called GCA-model) up to additive by additive epistasis. To model higher order interactions within GCAs, we also fitted “residual genetic” line effects. Our new GCA-model was compared with another genomic model which assumes a uniquely defined effect of genes across origins. Most variation in hybrids is accounted by GCA. Variances due to dominance and epistasis have similar magnitudes. Models based on defining effects either differently or identically across heterotic groups resulted in similar predictive abilities for hybrids. The currently used model inflates the estimated additive genetic variance. This is not important for hybrid predictions but has consequences for the breeding scheme—e.g. overestimation of the genetic gain within heterotic group. Therefore, we recommend using GCA-model, which is appropriate for genomic prediction and variance component estimation in hybrid crops using genomic data, and whose results can be practically interpreted and used for breeding purposes.


Introduction
Many plant species are presently cultivated in the form of singlecross hybrid varieties, especially when a strong heterosis effect is observed for yield-related traits (e.g. maize, sunflower, sugarbeet, etc.). These hybrids are generally obtained by crossing inbred lines originated from two complementary populations, called heterotic groups. Breeders' objective is therefore to identify (i) the best singlecross hybrids among all possible crosses between existing inbred lines from the two groups and (ii) create new lines within heterotic group, from crosses of existing lines, which will improve the performance of candidate hybrids at a next generation. Models for genetic improvement of hybrid crops (e.g. maize) across two heterotic groups are typically based on the notions of general combining ability (GCA) and specific combining ability (SCA) (Griffing 1962;Stuber and Cockerham 1966;Bernardo 2010). The genotypic value G ij of the cross of lines i and j, as a function of uniting gametes from i and j, can be written as follows: where GCA of line i is the average effect of a gamete when ideally crossed to all gametes from the reciprocal heterotic group. SCA of the combination of line i and j is the remainder.
It is important to notice, for readers not familiar with hybrid crops, that in many hybrid crops such as maize, parents are pure homozygous individuals (inbred lines). Thus, all gametes produced by i (and j) are identical, and all F1 descendants of i and j are identical. This is different from crosses of other species such as animals (pigs for instance) where full-sibs show genetic variation. As a result, GCA contains single locus (additive, in the statistical sense) and multiple loci (additive by additive and higher additive interactions) effects. This is because the whole genotype (gamete) of the pure line is transmitted to the F 1 descendants, including any possible epistatic combination, and regardless of whether loci in interaction are in the same or in different chromosomes. In this, GCA is different from the concept of Breeding Value in Animal Genetics, which captures the part of functional epistatic effects that is contained in the additive substitution effects, but it does not contain epistatic deviations as they are broken down by meiosis.
Informally, the GCAs within group 1 (group 2) are the sum of additive, additive x additive, additive x additive x additive. . . deviations within group 1 (group 2), whereas SCA are the sum of dominance, all epistatic interaction involving dominance, and any epistatic additive interaction across both groups. Stuber and Cockerham (1966) presented the covariance across GCAs of different lines and SCAs of different pairs of lines as a function of probabilities of alleles at loci being identical by descent. These probabilities are, in their work, implicitly based on pedigree, and are the same across all loci or pairs of loci. For this reason, and based on pedigree of lines alone, some components of the variance cannot be distinguished; for instance, dominance and across-groups additive epistasis cannot be separated (Bernardo 2010).
With the advent of molecular markers (SNPs), it is possible to obtain better predictions based, ultimately, on similarity across lines (or pairs of lines) measured using markers, using for instance kinship matrices based on VanRaden (2008). Bernardo (1994) transposed the concepts of Stuber and Cockerham (1966) to the prediction of hybrid lines of corn using molecular markers. These ideas have been used extensively (e.g. Massman et al. 2013;Technow et al. 2014;Bouvet et al. 2016;Kadam et al. 2016;Acosta-Pech et al. 2017;Westhues et al. 2017;Schrag et al. 2018). However, the infinitesimal assumptions of Stuber and Cockerham (1966) are not needed or completely pertinent anymore. SNP markers allow finer distinction of patterns of relationships across and within regions. For instance, it is possible to distinguish relationship within locus and across loci-thus making it possible to split dominance and across-groups epistasis. Also, relationships across dominance deviations are not simple expressions built from additive relationships (Vitezica et al. 2013(Vitezica et al. , 2017. Therefore, it is important to properly re-define statistical models for genomic prediction, in order to have models that are more adequate to the physical nature of the genome (finite and not infinitesimal), better understood, and potentially more accurate. The model including a GCA component for each group is convenient because individuals are selected within groups.
In this work we develop a new model, called GCA-model, which corresponds to Stuber and Cockerham (1966) and to Bernardo (2010) ideas, and whose theory we re-develop from scratch for the case of markers. We rederive orthogonal models for genomic prediction in hybrid crops using the notions of effects defined "according to origin" (GCAs and SCAs), using quantitative genetics theory and considering the substitution effects of markers. We present expressions for additive, dominant and epistatic relationships across pure lines and hybrids. Then, as an illustration, we use our method, in an increasing order of complexity, to estimate variance components and predict hybrid performance using a publicly available data set (Technow et al. 2014). We compare results of our orthogonal model (GCA-model) with effects defined "according to origin" to the model with effects "defined uniquely" at the hybrid level (G-model), and to the existing formulation of Technow et al. (2014) that we show to be a simplification of our approach.

Theory
Here we derive the theory for genomic relationship matrices in the analysis of hybrid crosses of inbred lines from two populations. We draw on the tradition of separately modeling effects of gametes coming from each heterotic group (Sprague and Tatum 1942;Griffing 1962;Stuber and Cockerham 1966;Bernardo 2010).
To derive the genetic model, an ideal (issued from random mating of heterotic pools), and large population of hybrids was assumed. In this way the content of a random sample of gametes depends on the allele frequency in the population. This is the traditional treatment in Quantitative Genetics. The two parental populations or heterotic groups (e.g. Dent and Flint in the case of North European maize) are named 1 and 2. Extension to more than two heterotic groups is immediate.
Our aim is to split the total genotypic value of a single-cross hybrid in two statistical additive effects (one from each group), a single dominance deviation (particular to each cross) and epistatic interactions (either intra-group or across-groups). Ideally, this partition is orthogonal and all components should be estimable. Orthogonality is a definitional system that guarantees that variance components are "a priori" (before seeing the data) independent, whereas practical orthogonality depends on the information available in the data set. To our knowledge, this partition and its use in hybrid crops using markers have not been presented elsewhere.

Additive substitution effects and dominance deviations in hybrid crops
We start from a genotypic model to derive statistical effects (Falconer 1981;Bernardo 2010). Consider a biallelic single locus/ gene with alleles B/b and the allele origin denoted as i ¼ 1 and j ¼ 2, thus population 1 has B 1 and b 1 with respective frequencies p 1 and q 1 ¼ 1 À p 1 and population 2 has B 2 and b 2 with frequencies p 2 and q 2 . The hybrid population has genotypes (frequencies) B 1 B 2 (p 1 p 2 ), B 1 b 2 (p 1 q 2 ), b 1 B 2 (q 1 p 2 ) and b 1 b 2 (q 1 q 2 ).
We assume additive and dominant gene action, and separate effects of gametes coming from 1 and 2. Then the genotypic value G of a hybrid can be written (up to a common constant) as where a 1 is the functional additive effect for B 1 from P1, a 2 is the functional additive effect for B 2 from P2, and d is the functional dominance value of both heterozygotes (B 1 b 2 and b 1 B 2 ). The genetic mean of the hybrid population is therefore Classically, the genotypic values of a hybrid are split into statistical additive values, one per parent, and a dominant deviation for the hybrid, as in (Bernardo 2010): Þ ) is the additive effect of a gamete from population 1 (from population 2) combined with a gamete from population 2 (population 1), whereas g D is the dominant deviation.
Additive values g A 1 ð Þ and g A 2 ð Þ of the gametes include average effects of each gene/allele. The average effect of alleles (a B1 , a b1 , a B2 and a b2 ) are derived from the genotypic values. The reasoning is set out in Table 1 (following Table 7.2 in Falconer (1981)). If gametes carrying B 1 from population 1 are mated at random with gametes from population 2, the frequencies of the genotypes produced will be p 2 of B 1 B 2 and q 2 of B 1 b 2 . The genotypic value of a hybrid B 1 B 2 is G B1B2 ¼ ða 1 þ a 2 Þ, that of B 1 b 2 is G B1b2 ¼ ða 1 þ dÞ, and the mean of these two, taking into account of the proportions in which they occur is E GjB 1 ð Þ¼p 2 a 1 þ a 2 ð Þþ q 2 a 1 þ d ð Þ . For instance, considering allele B 1 , the difference between the mean value conditional on a particular genotype of the gamete (e.g. E GjB 1 ð Þ) and the population mean ðE G ð ÞÞ is the average effect of the allele, a B1 . The average effects of alleles (a B1 , a b1 , a B2 and a b2 ) are therefore Now, we can derive the average effect of the allele-substitution, for instance, letting b 1 be substituted by B 1 . From the b 1 alleles taken at random from the population for substitution, a proportion p 2 will be found in b 1 B 2 genotypes and a proportion q 2 in b 1 b 2 . The substitution will, respectively, change the value from a 2 þ d ð Þ to a 1 þ a 2 ð Þand from 0 to a 1 þ d ð Þ(see Table 1). Thus, the average effect of the allele-substitution ða 1 Þ of population 1 is The same result can be obtained as the difference between average effects: a 1 ¼ a B1 À a b1 . For population 2 ða 2 Þ, it is a 2 ¼ a B2 À a b2 . The average effects of alleles can also be rewritten as function of the allele substitution effect as follows: a B1 ¼ q 1 a 1 , a b1 ¼ Àp 1 a 1 , a B2 ¼ q 2 a 2 and a b2 ¼ Àp 2 a 2 .
Note that allele-substitution effects involve both functional additive ða 1 þ a 2 Þ and dominant ðdÞ effects, and allele frequencies of the other parental population. Similar expressions were presented by Vitezica et al. (2016) but an identical functional additive effect ða ¼ a 1 ¼ a 2 Þ was assumed in both parental populations, ignoring the origin of the allele.
The statistical additive effects of a gamete are equal to the sum of the average effects of the alleles it carries. Thus, for a single locus, the statistical additive effects are This can also be written as g A 1 ð Þ ¼ z 1 a 1 and g A 2 ð Þ ¼ z 2 a 2 with Subtracting statistical additive effects from genotypic values (G B1B2 , G B1b2 , G b1B2 and G b1b2 ) gives dominance deviations which are interactions between the alleles received from parental populations. This is detailed in the Appendix, and the dominance deviation of the hybrid according to its genotype is So, the dominance deviation of a hybrid individual can be written as g D ¼ wd with w ¼ À2q 1 q 2 2q 1 p 2 2p 1 q 2 À2p 1 p 2 for genotypes Or, equivalently, w ¼ À2z 1 z 2 for z 1 , z 2 defined as above. Finally, the model for analysis of hybrid crosses considering additive and dominance effects can be written in matrix form for a set of crosses as

Derivation of additive and dominance genomic relationships
Now we extend the analysis to multiple markers, using Þ are matrices with as many rows as inbred lines present in each heterotic group, and as many columns as the number of markers, nsnp. The matrix W ¼ w . . . w nsnp ð Þ has as many rows as hybrid individuals and as many columns as markers.
Genotypes in pure lines are in matrices M 1 and M 2 which contain zero for genotypes b 1 b 1 and b 2 b 2 , respectively; and 1 for genotypes B 1 B 1 and B 2 B 2 , respectively. The observed B allele frequencies for marker j in the heterotic groups composed by n lines can be computed as p j ¼ P n i¼1 M ij n . Matrices Z are obtained subtracting p (which is equal to centering if p is computed from observed genotypes), as Z 1 ¼ M 1 À 1p 0 for population 1. It is analogous for population 2. Now, we can set up the covariance matrices. Additive covariance matrix for population 1 (for g A 1 ð Þ ) assuming linkage equilibrium, is where r 2 a1 is the variance of the allele-substitution effect ða 1 Þ of population 1. For one locus, if the population 1 is a population of pure lines (individuals are homozygotes), the genetic variance of its gametes g A 1 ð Þ is By construction of the matrices, E g A 1 ð Þ ð Þ ¼ 0 and then we have for one locus the following table of gametes and their effects So, Eðg 2 and Var g A 1 ð Þ ð Þ ¼ p 1 q 1 a 2 1 .
Assuming linkage equilibrium, we generalize this result to all nsnp markers Therefore, we can now divide Var g A 1 ð Þ ð Þ above by this variance and we have is the additive genomic relationship matrix across lines in population 1 of size n 1 Â n 1 . The reasoning is identical for population 2 and only the allele frequencies change. These results are similar, but not identical, to VanRaden (2008). In particular, using VanRaden's method 1 directly, while coding genotypes in pure lines as 0/2, results in G VR 1 ð Þ ¼ 2G A 1 ð Þ . The reason for this discrepancy is because of the reference population used for the additive variance. For a single population with an arbitrary level of inbreeding, the covariance of additive values is expressed as the relationship matrix times the additive variance in an outbred population with the same allele frequencies (Endelman and Jannink 2012). The additive variance, r 2 A 1 ð Þ defined here is for the fully inbred population, and these two definitions of additive variance differ by a factor of 2. VanRaden's additive relationship matrix divided by two to obtain a kinship (or coancestry) matrix results in G VR 1 ð Þ =2 which is equal to our result. Note that the choice of reference population changes the scaling for G.
For the dominance deviations, the covariance matrix for hybrids, assuming linkage equilibrium, is where r 2 d is the variance of the dominant effect at the locus level, defined at the hybrid population.
Because gametes are uncorrelated, The variance of dominance deviations for F1 hybrids is therefore, for one locus, r 2 D ¼ 4p 1 q 1 p 2 q 2 d 2 . This is as in Reif et al. (2007), Hallauer et al. (2010) and Vitezica et al. (2016) although the d effect is defined differently in their models, that use "uniquely defined" effects. From here and assuming LE across loci we have which results in the covariance matrix is the dominance relationship matrix across hybrids of size n Â n. Note that we assume that allele substitution effects and dominance effects are random with respective variances r 2 a and r 2 d . Technow et al. (2014) modelled specific combining abilities using element-by-element products of matrices G A 1 ð Þ and G A 2 ð Þ , following Stuber and Cockerham (1966). Clearly this is not the same as our matrix D above, that results directly from modelling dominance deviations. We will show later that Technow et al. (2014) approach, in fact, only models additive by additive acrossheterotic groups epistasis, which is indeed a part of the SCA, and that their approach is an approximation to our D. Also we show that the method of Stuber and Cockerham (1966) using elementby-element products to obtain relationships across SCA assumes an infinitesimal model, and should not be directly transposed to marker-based models.

Some properties of the additive and dominance relationship matrices
Matrices G A 1 ð Þ , G A 2 ð Þ and D have an average diagonal equal to 1 and an average value equal to 0 across the whole matrix. This implies that estimates of variance components can be interpreted as genetic variances (Legarra 2016). Also, z and w, the underlying incidence matrices to G A 1 ð Þ , G A 2 ð Þ and D are orthogonal (see Appendix for the proof), which implies that by construction statistical estimates are a priori independent from each other. Also, because the basic bricks z and w are orthogonal, extension to higher order of interaction (epistasis) is immediate and also orthogonal (as mentioned later). In the next section, we present epistatic relationship matrices.

Epistasis in hybrid populations
So far, we have written down the model for analysis of hybrid crosses including additive and dominance relationships. Now we use Kronecker products to extend the incidence matrices z and w to epistatic interactions.
The classical model (1) including GCA and SCA effects for a hybrid individual can be written as y ij ¼ l þ gca i þ gca j þ sca ij þ e ij , where y ij is the phenotypic value of the hybrid, l is the population mean, gca i is the GCA of line i, gca j is the GCA of line j and sca ij is the SCA which depends of the combination of alleles received from i and j. Stuber and Cockerham (1966) showed that the GCA-term ðgca i Þ includes, in addition to the additive gametic effects, the additiveby-additive epistasis across loci for alleles present in the line (equation 1, page 1279), and all higher order additive interactions in the line. This is because the lines are inbred-so exactly the same gamete is always transmitted to the hybrid, contrary to animal breeding where recombination breaks down epistatic combinations. So, for instance the GCA of a gamete from population 1, considering two loci, k and m and second-order epistasis,

Epistasis intrapopulation
where aa ð Þ k;m 1 is the deviation due to epistatic interaction across loci in population 1. The coefficients of the incidence matrix, z 1 k ð Þ z 1 m ð Þ , for second-order epistatic effects between two loci can be computed as the Kronecker products ( ) of the respective incidence matrices for single locus effects. A Kronecker product of orthogonal incidence matrices results in an orthogonal incidence matrix (Van Loan 2000), so the orthogonality (always under the assumption of linkage equilibrium) extends to any order of epistasis.
For multiple loci, the matrix Z 11 of additive-by-additive interaction effects can be written using Kronecker products of each row (corresponding to each line) of the preceding matrices as Matrix Z 11 is of large size (the number of rows is nsnp 2 ) but it is not explicitely used. Following Vitezica et al. (2017), we know that where is the Hadamard product; following developments in Vitezica et al. (2017) the genomic additive-byadditive epistatic relationship matrices of lines of population 1 with themselves is thus in agreement (up to a scaling factor) with Cockerham (1954), Henderson (1984), Martini et al. (2016), and Vitezica et al. (2017).
In the above expression, tr is the trace and n 1 is the number of lines in population 1. Therefore, the covariance matrix for the additive-by-additive interaction within population 1 ðg AA 11 ð ÞÞ is: The reasoning for population 2 is the same resulting in The dimensions of G AA ð2;2Þ and G AA ð2;2Þ are n 1 Â n 1 and n 2 Â n 2 , respectively.

Epistasis across populations
According to Stuber and Cockerham (1966), the SCA-term sca ij ð Þ ; in addition to the dominant deviation effects, includes the additive-by-additive epistasis across loci in alleles coming from different populations (equation 2, page 1279), the additive-bydominant and dominant-by-dominant interactions, plus higher order interactions that we will not detail here as the reasoning is the same. So, SCA for a hybrid from populations 1 and 2, considering two loci, k and m, The different z 1 and z 2 come from two parental lines i and j. Let T 1 be a matrix relating hybrids to lines in population 1 with 1 in the k, l position if the k-th hybrid comes from the l-th line in population 1 and T 2 a similar matrix linking hybrids to lines in population 2. The covariance matrix for the additive-byadditive interaction between populations 1 and 2 (g AA 12 ð Þ) can be calculated as: where n is the number of hybrids and the matrix G AA ð1;2Þ has size n Â n. In other words, the matrix G AA ð1;2Þ is formed as follows: 1) For each pair of hybrids i; j with respective parents parent1ðiÞ (from population 1), parent2ðiÞ (from population 2) and parent1ðjÞ (from population 1) and parent2ðjÞ (from population 2) do: 2) Scale the resulting matrix to an average diagonal of 1. Technow et al. (2014) used G AA ð1;2Þ to model the relationship matrix of SCA's. We have shown that this is incorrect because G AA ð1;2Þ models across-population epistasis (interactions across loci for alleles coming from different heterotic groups) but it does not model dominance deviations (interactions within loci). We will see in the Discussion section that G AA ð1;2Þ is in fact an approximation of D.
Relationships for the other pairwise epistatic interactions (all of them present in the SCA and of size n Â n) are: • Additive in population 1 by dominant: In the same manner, it is possible to derive relationships for third and higher order interactions, using Hadamard products of G A 1 ð Þ , G A 2 ð Þ and D including the incidence matrices T for across-population interactions.
As the two gametes in each hybrid are uncorrelated, the genotypic variance (Stuber and Cockerham1966) (ignoring third and higher order epistatic terms) is In our analysis on real data, we will ignore the epistatic interaction terms r 2 ð Þ D and r 2 DD as their estimate is very inaccurate. We remark that a breeder is interested in r 2 GCA (because it indicates how much variation is expected in hybrids) but also in r 2 A , which determines the genetic progress of hybrids that is achievable by selecting lines, crossing them, and producing new inbred lines within heterotic groups (Stuber and Cockerham 1966). This is because epistasis combinations are broken down by recombination when creating new source populations for line development within each heterotic group.

Materials and methods
As illustration of the genomic relationship matrices developed here, variance components were estimated using the publicly available data set from the breeding program of the University of Hohenheim (https://doi.org/10.1534/genetics.114.165860) (Technow et al. 2014).
All parental inbred lines were genotyped with the Illumina Maize SNP50 BeadChip (Ganal et al. 2011). Here, we used the 35,478 SNP available after quality control (see details in Technow et al. 2014). Markers that were monomorphic in one group but segregating in the other group were kept. Genotypes of tested single-cross hybrids were derived from parental genotypes.

Genomic evaluation models
Our new GCA-model was compared with Stuber and Cockerham (1966) model for effects of genes "defined uniquely", whose implementation in a marker based-model for hybrid crops is actually the NOIA system (Á lvarez-Castro and Carlborg 2007; Vitezica et al. 2017). We will call this the G-model. Both genomic models were used for analysis of hybrid records (either estimation of genetic parameters or cross-validation). Note that the difference from our GCA-model to previous studies is that we use genomic relationship matrices that we have completely developed from theory (and not by transposition of pedigree-based concepts).
GCA model. Here, effects were defined "according to origin" (Sprague and Tatum 1942;Griffing 1962;Stuber and Cockerham 1966), but with relationship matrices developed in the Theory section. So, GCA-model for the (i; j) hybrid resulting from the combination of parental lines i (from population 1) and j (from population 2) can be written as: where y ij is the hybrid phenotype (entry mean), l is the overall mean. Our models differ in the explicit modelling of the GCA and SCA into sub-components due to additive, dominant and epistatic statistical effects. In classical settings, GCA and SCA may be modelled either as fixed or as independent random effects (Bernardo 2010;Hallauer et al. 2010). For instance, Giraud et al. (2017) used a model with random, uncorrelated GCA and SCA for the analysis of a half-diallel design. As mentioned before, GCA contains additive, additive x additive and further withinheterotic epistatic effects, whereas SCA can be split into dominance, across-population additive effects, and epistatic effects including dominance. For instance, gca ð1Þ ¼ g A 1 ð Þ þ g AA ð1;1Þ þ g AAA ð1;1Þ þ g AAAA ð1;1Þ þ Á Á Á where all terms are obviously not always fit in the model or estimable in practice.
Still, and because there are potentially several hybrids for each line, we use "residual genetic" r effects (e.g. Endelman et al. 2018) (see explanation below-similar to the "permanent environmental effect" in animal breeding) as a catch-all term that includes genetic effects that are not explicitely included. For instance, if we assume gca ð1Þ ¼ g A 1 ð Þ þ r, the term r captures r ¼ g AA ð1;1Þ þ g AAA ð1;1Þ þ g AAAA ð1;1Þ . . . and further terms, but if we assume gca ð1Þ ¼ g A 1 ð Þ þ g AA ð1;1Þ þ r, then r ¼ g AAA ð1;1Þ þ g AAAA ð1;1Þ . . .. The "residual genetic" r effects are assumed random and uncorrelated, and can be estimated because there are repeated hybrids for each line. In this manner the models are more robust to the fact of fitting genetic effects up to an arbitrary complexity that may be enough or not. The strategy of fitting "catch-all" "residual genetic" effects could also be followed for the SCA, but because hybrids are not repeated in the entry means, "residual genetic" r effects are not estimable as they are confounded with the residual.
Then we make several choices of genetic effects to be explicitly included in GCA and SCA, leading to several models that will be described later. The most complete model includes 2Þ . We ignore second-order epistasis including dominance and higher order epistatic interactions. Thus, the most complete model is: which in vectorial form (all phenotypes in vector y) is: where the T incidence matrices assign hybrids to parents in each heterotic group.
The additive effects of gametes from each inbred line are Fit of "residual genetic" GCA effects. As discussed before, the GCA effect conceptually contains within-population additive effects and additive epistatic interactions of any order. As not all these interactions are explicitely modelled, and because pure lines are repeated in hybrids, we fit "residual genetic" r effects in GCAmodels. For instance, if only additive effects are fit, this "residual genetic" r effect account for all additive epistasis within-group present in the GCA effect. Fitting residual genetic GCA effects is similar to fitting individual permanent environmental effects in animal breeding (e.g. for a cow that gives repeated performances of milk yield). It is known in animal breeding that this effect captures, among other things, genetic effects not explicitly modelled such as dominance or epistasis (Kruuk 2004;Vitezica et al. 2018). Indeed, historically GCAs have been estimated as random, unrelated effects, for instance in diallel designs (Sprague and Tatum 1942;Hallauer et al. 1988). Thus, all the models detailed above (shown in Table 2) include "residual genetic" r effects.
G model. This model ignores the origin of the gametes and uses a "uniquely defined" effect per hybrid (Stuber and Cockerham 1966), as developed in a genomic context by Vitezica et al. (2017) using the NOIA approach, to correctly model dominance deviations under the constraint that hybrids are not in HWE. The G-model for single-cross hybrid individuals can be written as:

Sub models and model comparison
Variance components were estimated for nested models (GCAmodel and G-model) that added, in succession, additive effects (g A ), dominance effects (g A þ g D ), additive-by-additive genetic effects (g A þ g D þ g AA ) in addition to "residual genetic" r effects of lines in the GCA-model. The additive-by-additive epistatic effects can be interactions of loci within line from population 1 g AA  Table 2. Also, the variance attributable to GCAs is r 2 ð Þ þ r 2 r 2 ð Þ and for SCAs is r 2 SCA ¼ r 2 D þ r 2 AA 1;2 ð Þ . Also, another set of models was identical but "residual genetic" r effects of lines were not fit in any of the models.
Goodness-of-fit of models was compared based on the deviance information criterion (DIC), which balances model fit and model complexity to avoid overfitting (Spiegelhalter et al. 2002). The lower the DIC value, the better fit of the model to the data.
Predicted ability of phenotypes of "untested hybrids" for the different models was tested performing a T2-T1-T0 cross-validation scheme as in Technow et al. (2014). The prediction accuracy of T2, T1, and T0 hybrids (two, one and zero parents, respectively) was obtained for 300 hybrids (N D ¼ 90 and N F ¼ 53 of Dent and Flint parental lines) in the training set. Predictive performance of hybrids was computed separately for each group of hybrids by dividing the correlation of predicted and observed values by ffiffiffiffiffiffi H 2 p . H 2 is the genomic broad-sense heritability. The H 2 estimated with the full GCA-model was used. The cross-validation process was repeated 100 times.
Estimation of variance components and cross-validation were performed in a Bayesian approach using the BGLR R-package (Pé rez and de los Campos 2014). To speed up computation, the eigenvalue decomposition of the variancecovariance matrices was done according to Acosta-Pech et al.
(2017) and modeled as Bayesian Ridge Regression (BRR). For each model, inferences were based on 30,000 samples collected from 60,000 iterations after discarding 30,000 for burn- GCA-model (effects (g) and variances (r 2 ) defined within heterotic group): additive (A 1 ð Þ and A 2 ð Þ ), dominance (D), "residual genetic" (r) and additive-by-additive epistasis (AA) within heterotic groups ((1,1) and (2,2)) and between heterotic groups (1,2). All the models detailed above were also run without the "residual genetic" r effect term. in and thinning of 10. Convergence of variance parameters was inspected by trace plots and convergence diagnostic was assessed using the BOA R-package (Smith 2007).

Data availability
The data set from the breeding program of the University of Hohenheim is available with the publication of Technow et al. 2014 (https://doi.org/10.1534/genetics.114.165860). Estimation of variance components and cross-validation were performed in a Bayesian approach using the BGLR R-package (Pé rez and de los Campos 2014). The software can be downloaded from https:// cran.r-project.org/web/packages/BGLR/index.html. A program to build the genomic matrices is available at http://genoweb.tou louse.inra.fr/~zvitezic/maize.

Variance components estimates and heritabilities
Variance components and broad-sense heritabilities (H 2 ) for GCA-and G-models are shown in Tables 3 and 4, and for the model without "residual genetic" r effects, in Table A2.
We consider first Table 3 and the GCA-model. The total variance of GCA e.g. for population 1 is r 2 ð Þ , and this changes very little across models: total GCA variance for population 1 oscillates between 27.66 and 29.16, and for population 2 between 18.53 and 20.74. Within GCA, each individual component varies across the different models but changes are not large; the major change is the diminution from 23.16 to 19.06 in the estimate of r 2 A 1 ð Þ when r 2 AA 1;1 ð Þ is fit, and similarly (r 2 A 2 ð Þ changes from 12.92 to 10.61) for population 2. This is due to reassignment of total GCA variance across its component parts; the r effect does not fully account for the absence of the r 2 AA 1;1 ð Þ variance component. As for the SCA, its two components (r 2 D and r 2 AA 1;2 ð Þ ) also show some changes and there is some reassignment from one to the other. Still, for GCA and SCA components changes are not great and enter well within the confidence intervals of the estimates.
On the contrary, when "residual genetic" r effects were not fit (Table A2), changes were much larger. In particular, the inclusion of within-group epistatic variances (r 2 AA 1;1 ð Þ and r 2 AA 2;2 ð Þ ) reduced additive variances from 31.08 to 22.30 for r 2 A 1 ð Þ , and from 22.11 to 14.42 for r 2 A 2 ð Þ in the full model. Thus, we conclude that the GCA-model is reasonably (empirically) orthogonal when "residual genetic effects" for GCA of lines are fit, regardless of the level of complexity for the genetic modelling (additive, additive by additive, etc.). Using "residual genetic effects" allows to accommodate non-additive effects of lines not explicitly modelled.
Some of these changes can be attributed to similarity of relationship matrices. The correlation between G A 1 ð Þ and G AA 1;1 ð Þ (and regression coefficient of G A 1 ð Þ $ G AA 1;1 ð Þ) was 0.39 (0.92); and the correlation between G A 2 ð Þ and G AA 2;2 ð Þ (and regression coefficient of G A 2 ð Þ $ G AA 2;2 ð Þ) was 0.55 (1.63). These results show a high similarity between these relationship matrices and explain that the additive effects tend to capture additive by additive effects if the latter are not explicitly fit (as described above and shown in Table A2), something that is ameliorated fitting "residual genetic" r effects (Table 3).
Similarly, D and G AA 1;2 ð Þ matrices were highly correlated (0.88), suggesting that it is difficult to accurately separate dominance and across-groups additive by additive epistasis. To avoid redundancy between the D and G AA 1;2 ð Þ matrices, we corrected the G AA 1;2 ð Þ matrix by subtracting the contribution of dominance as in Alves et al. (2019). However, the resulting G AA 1;2 ð Þhad a similar correlation to D. Also the regression coefficient of D $ G AA ð1;2Þ was Þ ) and residual (r 2 e ) variances for GCA-and G-models and successively added additive effects (A), dominance effects (AD), additive-by-additive effects (AD(AA)). Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. The additive-by-additive epistatic effects can be interactions between loci within group ( AA ð Þ 11 ð Þ and AA ð Þ 22  Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. H 2 is the genomic broad-sense heritability. 1.02, indicating that elements of G AA 1;2 ð Þ are unbiased (but shrunken) estimators of the elements of D.
For the G-model, Table 3 shows that the additive variance estimate in the G : A model (51.77) was higher than the addition of additive variance estimates for Dent and Flint groups (36.08) in the GCA : A model. Similarly, estimates of dominance (r 2 D ðHÞ ) and epistasis within hybrids (r 2 AA H ð Þ ) variances were higher than in GCA-models. Both results are in agreement with Stuber and Cockerham (1966). The inclusion of the additive-by-additive epistasis effects in the G : AD AA Estimates of genomic broad-sense heritability H 2 for grain yield ranged from 0.73 to 0.80, and from 0.74 to 0.80 in the full GCA-and G-models, respectively (Table 4). Estimates of residual variances were similar between GCA-models and G-models (Table 3). The estimates of residual variances decreased as the non-additive genetic effects were added in both GCA-and G-models. Table 4 shows the Deviance Information Criteria (DIC) values for each model. Inclusion of non-additive effects in GCA-and Gmodels improved the goodness of fit of both models. DIC values between GCA-and G-models were very similar. Among all models, the best model (with lower DIC value) was the GCA : A AA ð Þ 1;2 ð Þ . Among the G-models, the best one was that accounted only for additive and additive-by-additive epistatic effects. Models including both dominance and epistatic effects had slightly worse DIC values than the best one. This can be explained because increasing the number of parameters may lead to overfitting and thus, it penalizes DIC values. Based on the inspection of the trace plot and convergence diagnostic with BOA R-package (Smith 2007), all estimates of the variance parameters in all models converged to the posterior distribution.

Cross-validation
The results for cross-validation are shown in Table 5. In general, prediction accuracy was considerably high for maize grain yield (>0.80 in all cases) and similar values were obtained with the Gand GCA-models in this data set. The inclusion of non-additive genetic effects did not improve the prediction accuracy of hybrid values in the testing sets compared to models including only additive effects. The only factor that counted in the predictive ability was the fact of having both, one, or no parents in the training data set, with respective prediction accuracies 0.80, 0.88 and 0.92.

Discussion
In this study, the theory in the analysis of hybrid crosses of inbred lines from two populations using relationship matrices was revisited in a genomic context. Models for genomic prediction in hybrid crops using the notions of effects defined "according to origin" (GCAs and SCAs) were rederived and expressions for additive, dominant and epistatic relationships for hybrids were presented. These models were applied to a public data set to exemplify the theory and its consequences in real life.

Insights into relationships for dominance and across-population pairwise epistasis
A surprising fact (to us) is that, in the classical pedigree-based methods, it is not possible to disentangle dominance deviations from across-population epistasis, whereas using markers it is possible. This seems not to have been recognized by previous researchers, leading to the wrong conclusion that in a genomic setting the relationship of dominance deviations is a product of corresponding additive relationships of parental lines. In this section we try to explain why such a difference. Stuber and Cockerham (1966) used the notion of identity by descent (IBD) coefficients to model relationships, where the starting block is the use of coancestries U -the probability that two alleles drawn at random from each of two pure lines are identical by descent. Although in our work we use genomic relationships (that are not probabilities), the concept of IBD is useful in the following.
For two hybrids, the IBD dominance relationship coefficient at locus k (say d ðkÞ ) is the probability that two complete genotypes at locus k in hybrids (i and j) are identical, and because the lines are fully inbred, this is the joint probability that both "parents1" (ancestors from population 1) are IBD at locus k (with probability Þ ) and both "parents2" (ancestors from population 2) are IBD at locus k (with probability U k ð Þ parent2 i ð Þparent2 j ð Þ ) (see Figure 1).
However, in practice, pedigree-based coancestries at specific loci are not observable and they are replaced by infinitesimal coancestries: GCA-and G-models are models that successively added additive effects (A), dominance effects (AD), and additive-by-additive genetic effects (ADðAAÞ). The additive-by-additive epistatic effects can be interactions between loci within group ( AA ð Þ 11 ð Þ and AA ð Þ 22 ð Þ ), across groups AA ð Þ 12 ð Þ or within hybrids AA ð Þ H ð Þ .
Superscripts 1 and 2 in parenthesis refers to dent and flint heterotic groups, respectively. The values refer to the mean (standard deviation) over 100 crossvalidation runs with the different models. For T2, T1 and T0 group hybrids, two, one and zero parents were tested in the training set.
expression presented by Stuber and Cockerham (1966). The approximation results from the fact that the genome is finite. For instance, if there were m ¼ 3 loci, there would be 3 local IBD, one at each locus, whose average will not in general be the same as the pedigree-based IBD which does assume infinite loci (Hill and Weir 2011). In an infinitesimal model, the approximation is exact. Now we address the across-population epistatic additive by additive relationships. Consider two loci k and l. In an IBD framework, across-population epistatic additive by additive relationship for hybrids i and j at two loci k and l (say w k;l ð Þ ij ) is the joint probability that both "parents1" (ancestors from population 1) have the same genotype at locus k and that both "parents2" (ancestors from population 2) have the same genotype at locus l (see Figure 2). Thus, w k;l Whole-genome epistatic relationship would therefore be   explains why w ij is an estimator (albeit not necesssarily a good one) of d ij , and it explains why the elements of G AA 1;2 ð Þ are unbiased (but not necessarily accurate) estimators of the elements of D, as shown by the results.
Thus, we have shown that the Stuber and Cockerham (1966) relationships assuming pedigrees are only exact under infinitesimal models. In previous sections we have shown that observing the genome (i.e. with markers), different relationships can be formed for each, additive substitution and dominant and epistatic deviations. Thus, contrary to pedigree-based formulations, a marker-based formulation allows disentangling of the different variance components.
Thus, in pedigree-based models the dominance and acrosspopulation epistatic relationships are conceptually different, but the lack of other information forces to use the same estimator for both. This is not the case in marker-based models, where we can actually observe different relationships within locus or across loci from two populations.

Partition of genetic variance components and heritability
The partition of the genetic variance in terms of statistical additive effects, and dominance and epistatic deviations effects, was possible using the relationship matrices developed here. In our model, estimates of additive genetic variance based on allele substitution effects are useful for selection or in the prediction of potential selection response in pool improvement. Vitezica et al. (2013) compared a classical model (in terms of statistical values for breeding purposes) with a genotypic model (biological values at the gene level) proposed by Su et al. (2012). When the genotypic model is used, additive and dominant genotypic variances are obtained. Both models are able to explain the data but their results and interpretation is different (Vitezica et al. 2013;Varona et al. 2018). The genotypic model has been used for hybrid genomic prediction (Fristche-Neto et al. 2018;Werner et al. 2018;Alves et al. 2019;Ramstein et al. 2020), but estimates of genotypic additive variance should not be interpreted for breeding purposes. The GCA-(proposed here) and G-models are equivalent models to explain the data only if all relevant gene actions (i.e. high order interactions) are included (Stuber and Cockerham 1966), but it is impossible to ascertain if all relevant interactions are included. In our results, both definitional systems perform similarly for prediction. However, as the G-model assumes gene effects uniquely within hybrids and does not provide additive values within pool, it can not be directly used for the selection of inbred lines within pools for recurrent pool improvement. Thus the GCA-model is more useful.
Orthogonal partitioning of the effects has been described extensively (e.g. Cockerham 1954;Kempthorne,1954;Lynch and Walsh 1998) for classical HWE populations but also for hybrid crosses (e.g. Griffing 1962;Stuber and Cockerham 1966;Bernardo 1996). Statistically, orthogonality means that inclusion of new terms in the model does not change the definition (in practical terms: the estimates) of already included effects in an ideal, infinitely large population. For instance, by construction, in an orthogonal model there is no covariance across statistical additive and dominance effects. This implies that the covariance across hybrids can be split in covariance due to additive effects, covariance due to dominance deviations, and so on (Lynch and Walsh 1998). Another advantage of using orthogonality in Genetics and breeding is the interpretability. It is the only way to carry out the estimation of GCA (additive "statistical" effects þ within-group epistatic "statistical" interactions) in an unambiguous manner, i.e. such that their definitions do not depend on other genetic terms that are fitted in the model.
In practice, additive, epistatic and other variances can not be accurately disentangled with a small data set and many (unknown) QTL loci. However, even with a thousand records and thirty thousand markers (as in this work), it still makes sense to orthogonally define the genetic effects in the model. Not using orthogonal partitions might lead to ambiguous definitions of effects and to potential mistakes. For instance, if the additive variance is inflated, a possible consequence is that the genetic progress can be overestimated. If the dominance variance is inflated, the role of assortative mating of pairs of lines to produce a hybrid could be exaggerated. In our work we used orthogonal definitions of effects and the corresponding relationship matrices, as well as "residual genetic" r effects to account for unmodelled higherorder effects. In this manner we obtained, in the GCA-model, empirically orthogonal estimates of additive, dominance and epistatic variances for maize grain yield.
In the GCA-model, after fitting the "residual genetic" r effect, additive variances were similar across different models ($22 and $12 for group 1 and 2, respectively: see Table 3) showing empirical orthogonality (Hill and Mä ki-Tanila 2015;Vitezica et al. 2017). For planning the breeding scheme (to estimate genetic gain and selection of within pools crosses), it is important to obtain good estimates of the genetic variance, and therefore we recommend fitting "residual genetic" r effects, in order to avoid overestimation of the genetic additive variance. The latter option is only possible if each line contributes to several phenotyped hybrids.
In the G-model, when within-group epistatic effects were not fitted, additive variance was overestimated. Similar results were observed by Bernardo (1995). He attributed this to multicollinearity between the additive and within-group epistatic relationships, as we observe. Working with repeated measures per individual, Vitezica et al. (2018) fitted a G-model with "residual genetic" r effects and they obtained empirically unbiased estimates of additive variance. However, in the present work it was not possible to fit "residual genetic" hybrid effects in the G-model because in our dataset each hybrid has a single record (adjusted entry means).
Genomic relationship matrices for within and across groups epistasis (in the full GCA-model) allows to partition the genetic variance in terms of GCA and SCA effects, as was originally defined by Stuber and Cockerham (1966) in an infinitesimal context. With our model, it is possible to split the GCA effect into the additive gametic effect and the additive-by-additive epistasis interaction within the line; and split the SCA effect into dominance deviation effect and additive-by-additive epistasis across groups. This has practical implications in hybrid breeding programs that will be discussed later.
Compared to the estimates of genetic variance component from Technow et al. (2014), we obtained similar estimates with the GCA : A AA ð Þ ð1;2Þ model (see Table 3). This makes sense because, as indicated in the theory, the estimate of SCA variation from Technow et al. (2014), is in fact the estimate of epistasis variation across populations r 2 AA 1;2 ð Þ . Their entry-mean heritability was 0.87, whereas our genomic estimate of broad-sense heritability was slightly lower (0.81). Differences between our and their estimates are mainly because we used entry means (publicly available) instead of the whole data set, which can be seen through the estimated residual variance which was much lower (e.g. $17) than their estimated values (179).

Goodness of fit
Models with lower DIC values better fit the data, and a difference less than 7 units is often considered as irrelevant (Plummer et al. 2006). In general, the inclusion of non-additive genetic effects improved the goodness of fit to the data in both GCA-and G-models in this set of hybrids. This result agrees with previous studies in maize hybrids ( Ferrão et al. 2020;Alves et al. 2019;Hunt et al. 2020). DIC values obtained with the GCA-model were similar to those obtained in G-models, indicating that they are equivalent models in terms of fitting the data. The best model, with a best balance between goodness of fit and model complexity, was the GCA : A AA ð Þ 1;2 ð Þ , which corresponds to a frequently used model in genomic prediction of hybrids (Technow et al. 2014). That means this model is efficient to fit the data. However, fit to the data is not the only aspect that should be considered-interpretation of the model in a genetic context is important.

Cross-validation
Overall, cross-validation analyses yielded a high prediction accuracy of hybrid performance (>0.80). This is because a high heritability generally results in high prediction accuracy, as was showed theoretically and empirically (Daetwyler et al. 2010;Combs and Bernardo 2013). Inclusion of non-additive genetic effects did not show improvement in prediction accuracy. This result agrees with other studies using real data where virtually no benefit was observed by including SCA effects in genomic prediction models of inter-heterotic-group hybrids (Bernardo 1994;Schrag et al. 2006Schrag et al. , 2018Maenhout et al. 2010;Kadam et al. 2016). This is because in inter-heterotic-group hybrids the proportion of SCA variance is often low and GCA high (Reif et al. 2007). We used the splitting of cross-validation considering T2, T1 and T0 (groups of hybrids with two, one and zero parents known in the training set) as in Technow et al. (2014). Our predictive abilities were comparable to those reported by Technow et al. (2014). For instance, the correlation obtained with the GCA : AD AA ð Þ 1;2 ð Þ results in values of 0.92, 0.88 and 0.80, which are close to the correlations of 0.91, 0.85 and 0.77 (for 300 hybrids in the training set) for T2, T1 and T0, respectively, reported by Technow et al. (2014) for grain yield. Assuming marker effects defined uniquely at the hybrid level (G-models) gave similar prediction accuracy than assuming gene effects according to origin (GCA-models). This result was also reported by Technow et al. (2014) with the same data set, but also by Alves et al. (2019) who analyzed a population of hybrids derived from a convergent population. Thus, GCA-and G-models are equivalent in terms of predictive ability of hybrid performance. However, our aim in this work is to introduce a more meaningful model (the GCA-model), and its superiority is not to be considered only in terms of better prediction ability in the hybrids.

Practical implications in hybrid breeding
The way of partitioning the genetic variance is to a certain extent a matter of convenience. Partitioning in terms of GCA (within group) is more convenient because inbred lines are actually created and selected within group. The magnitude of the GCA variance gives to the breeder an idea of how much overall genetic variation coming from the parents is expected in the hybrids. Further, splitting the GCA variance into additive and epistasis within group is relevant at the moment of planning the genetic progress in maize breeding programs. The genetic improvement in hybrid performance is through the selection of inbred lines. So that, breeders create new segregating (e.g. F2) populations by crossing elite lines within groups followed by subsequent generation of inbreeding to develop new inbred lines. Therefore, the particular additive-by-additive (and higher order) epistatic combination existing in a particular elite line is not transmitted as a whole to its F2 (and further selfing) progeny, because meiosis and recombination shuffles alleles of the two parents in the cross, breaking down the original epistatic combinations present in the elite inbred lines and creating new epistatic combinations. Thus, the use of the additive variance, instead of the total GCA variance, is more appropriate for the prediction of genetic progress that is achievable by selecting within heterotic pools (Stuber and Cockerham 1966). In addition, variance of epistasis within groups is expected to be converted in new additive genetic variance in the long term by random drift, thus, it affects the longterm selection response indirectly (Hill 2017). Also, for pool improvement, it is better to use estimates of additive effects instead of estimates of GCA, because the first reflect better expected genetic progress.
Splitting the SCA variance into dominance deviations and epistasis across groups could also have practical implications. Estimates of additive and dominance effects might be important for hybrid pool development. For instance, Zhao et al. (2015) suggested to use additive and dominance effects from an incomplete factorial in order to develop heterotic pools in wheat. Further, estimates of dominance deviations are relevant in the definition of mate allocation procedures (Varona et al. 2018); for instance, they could be used to maximize hybrid performance or maintain diversity for long-term genetic gain in hybrid breeding programs (e.g. Allier et al. 2019).
In maize, there is evidence of directional dominance (Reif et al. 2003;Ramstein et al. 2020). Indeed, directional dominance as a biological mechanism should exist, given that hybrids show heterosis. When there is directional dominance (i.e. a higher percentage of positive than negative dominance effects, EðdÞ 6 ¼ 0), overall heterosis could be considered in the genetic evaluation model. If individuals expressing the trait show considerable variation in heterozygosity (e.g. in a diallel design with crosses within-and across-groups), a more diverse individual will show more positive heterosis at the trait. De Boer and Hoeschele (1993) showed analytically that not fitting this heterosis (usually as a covariate) leads to spurious overestimation of dominance variation, as shown with real data (Xiang et al. 2016, Aliloo et al. 2017, Varona et al. 2018. Nonetheless, preliminary results in this work showed that heterosis (measured as number of heterozygotic loci) was very similar across hybrids and fitting heterosis in the models led to very similar results (not shown).

Conclusions
Models developed here, with effects defined according to origin (GCA-), and using genomic relationships properly defined for each statistical component, allow for a proper partition of statistical additive effects, dominance deviations, and epistatic deviations, in hybrids derived from inbred lines from two populations. Contrary to common belief, using SNP genotypes, it is possible to split SCA into dominance deviations and across-groups epistasis, and to split GCA into within-line additive effects and within-line epistatic effects. Our GCA-model is appropriate for genomic prediction and variance component estimation in hybrid crops using genomic data, and its results (estimates of genetic variance components, breeding values and deviations) can be practically interpreted and used for breeding purposes. 0 p 1 i q 1 i À Á 0 P i p 1 i q 1 i which sums to 0. The same proof holds for G A 2 ð Þ . The diagonal of D sums to P i p 1 i p 2 i À2q 1 i q 2 i À Á 2 þ p 1 i q 2 i 2q 1 i p 2 i À Á 2 þ q 1 i p 2 i 2p 1 i q 2 i À Á 2 þ q 1 i q 2 i À2p 1 i p 2 i À Á 2 P i 4p 1 i q 1 i p 2 i q 2 i which is equal to 1. In addition, the average value of the entire matrix D is 0. In effect, this sum can be written as P i p1 i p2 i p1 i q2 i q1 i p2 i q1 i q2 i À Á À2q 1i q 2i 2q1 i p2 i 2p 1i q 2i À2p 1i p 2i 0 B B @ 1 C C A À2q 1i q 2i 2q1 i p2 i 2p 1i q 2i À2p 1i p 2i 0 B B @ 1 C C A 0 p1 i p2 i p1 i q2 i q1 i p2 i q1 i q2 i À Á P i 4p 1i q 1i p 2i q 2i which sums to 0.

Genotype at P2
Genotype at P1