MAGE: metafounders-assisted genomic estimation of breeding value, a novel additive-dominance single-step model in crossbreeding systems

Abstract

Motivation: Using both purebred and crossbred data in animal genetics is widely recognized as an effective strategy for improving the predictive accuracy of breeding values. In practice, the different genetic backgrounds of the purebred populations and their crossbred offspring limit the application of traditional prediction methods. Several studies have attempted to predict crossbred performance via partial relationships, which divide the data into distinct sub-populations sharing a common genetic background, such as a single purebred population and its corresponding crossbred descendants. However, this strategy makes prediction inaccurate because it ignores half of the parental information of crossbred animals. Furthermore, dominance effects, although they play a significant role in crossbreeding systems, cannot be modeled under such a prediction model.

Results: To overcome this weakness, we developed a novel multi-breed single-step model that uses metafounders to assess ancestral relationships across diverse breeds under a unified framework. We propose multi-breed combined dominance relationship matrices to model additive and dominance effects simultaneously. Our method provides a straightforward way to evaluate the heterosis of crossbreds and the breeding values of purebred parents efficiently and accurately. We performed simulation and real data analyses to verify the potential of the proposed method. The proposed model improved prediction accuracy under all scenarios considered compared with commonly used methods.

Availability and implementation: The software implementing our method is available at https://github.com/CAU-TeamLiuJF/MAGE.

, where both animals $i$ and $j$ are from breed A. When animal $i$ is from breed A and animal $j$ is from breed B, $a^{\Gamma}_{ij} = \Gamma_{AB}$. Here $a_{ij}$ is the traditional relationship, while $\Gamma_{A}$ and $\Gamma_{AB}$ denote the within- and across-breed ancestral relationships, respectively.
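To make the role of the within- and across-breed ancestral relationships concrete, the following minimal sketch applies the usual tabular recursion with metafounders treated as related founders. It builds the full (not the partial) relationship matrix; the function name `metafounder_relationship`, the toy pedigree, and the values in `gamma` are hypothetical and chosen only for illustration.

```python
import numpy as np

def metafounder_relationship(gamma, pedigree):
    """Tabular relationship matrix with metafounders as related founders.

    gamma    : (m, m) ancestral relationships among m metafounders
    pedigree : list of (sire, dam) index pairs for real animals, parents
               before offspring; indices 0..m-1 are metafounders and real
               animals are numbered m, m+1, ... in list order
    """
    m = gamma.shape[0]
    n = m + len(pedigree)
    a = np.zeros((n, n))
    a[:m, :m] = gamma  # the metafounder block is Gamma itself
    for k, (s, d) in enumerate(pedigree):
        i = m + k
        # relationship with every earlier individual: mean of the parents' rows
        a[i, :i] = 0.5 * (a[s, :i] + a[d, :i])
        a[:i, i] = a[i, :i]
        # self-relationship: 1 + half the relationship between the parents
        a[i, i] = 1.0 + 0.5 * a[s, d]
    return a[m:, m:]  # keep only the real animals

# Hypothetical example: metafounder 0 = breed A, metafounder 1 = breed B
gamma = np.array([[0.6, 0.2],
                  [0.2, 0.5]])
pedigree = [(0, 0),   # animal 2: breed A base animal
            (0, 0),   # animal 3: breed A base animal
            (1, 1),   # animal 4: breed B base animal
            (2, 4)]   # animal 5: crossbred AB from animals 2 and 4
A_gamma = metafounder_relationship(gamma, pedigree)
```

In this toy case two unrelated breed A base animals are related by the within-breed value in `gamma`, and a breed A and a breed B base animal by the across-breed value, which matches the statement above.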
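Consequently, the pedigree-based additive partial relationship matrix $\mathbf{A}^{A}$ can be expressed as a 2 × 2 block matrix based on the breeds of the animals:

$$\mathbf{A}^{A} = \begin{bmatrix} \mathbf{A}^{A}_{A,A} & \mathbf{A}^{A}_{A,AB} \\ \mathbf{A}^{A}_{AB,A} & \mathbf{A}^{A}_{AB,AB} \end{bmatrix}$$

where the subscripts of each submatrix indicate the breeds within the covariance matrix. The derivation can be presented separately for the different blocks.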
The formula for $\mathbf{A}^{A\Gamma}_{A,A}$ is straightforward, since $\mathbf{A}^{A}_{A,A}$ consists only of purebred animals: (2)

Given that $\mathbf{A}^{A}_{A,AB}$ contains only off-diagonal elements, the formula for $\left(\mathbf{A}^{A\Gamma}_{A,AB}\right)_{ij}$ can be described as:

where this formula assumes that animal $i$ is from the crossbred population AB, while animal $j$ is from breed A. The parents $f(i)$ and $m(i)$ are from breeds A and B, respectively.

Thus, it is evident that

and

where the term $\Gamma^{A}_{AB}$ can be referred to as the partial cross-breed ancestral relationship of breed A. And

where the term $\Gamma^{A}_{B}$ represents the special partial within-breed ancestral relationship. It is the part of the within-breed ancestral relationship of breed B that is measured only by the information from breed A. Clearly, $\Gamma^{A}_{B}$ is equal to 0. Therefore, this formula can be written as:

The diagonal elements in two-way crossbreeding systems, $a^{A\Gamma}_{AB,AB,ii}$, can be written as:

In summary, $\mathbf{A}^{A\Gamma}$ can be written as:

was increased to 7,000 animals for sampling. The proportion of males to females was kept constant throughout all generations.
Two purebred populations (A and B) were established by sampling 50 males and 500 females in G0. Sampling was based on phenotypes, and the two samples did not overlap. The first generation of breed A was randomly sampled from the lowest 30% of phenotypes in G0, while breed B was sampled from the highest 30%. Within each purebred population, random mating continued for 20 generations (G1-G20). In each generation, 50 males and 500 females were randomly selected, and every female produced 10 offspring.
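The phenotype-based founder sampling can be sketched as follows. This is only an illustration under assumptions that are not stated in the text: the helper `sample_founders` is hypothetical, the G0 phenotypes are drawn from a normal distribution, the sex ratio is taken as equal, and 50 males and 500 females are drawn per breed.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_founders(phenotype, is_male, low, n_males=50, n_females=500, frac=0.3):
    """Randomly draw founders from the lowest (low=True) or highest tail."""
    cut = np.quantile(phenotype, frac if low else 1.0 - frac)
    in_tail = phenotype <= cut if low else phenotype >= cut
    males = np.flatnonzero(in_tail & is_male)
    females = np.flatnonzero(in_tail & ~is_male)
    # sampling without replacement inside the tail
    return (rng.choice(males, n_males, replace=False),
            rng.choice(females, n_females, replace=False))

# Hypothetical G0 of 7,000 animals (phenotypic variance 10, equal sex ratio assumed)
phenotype = rng.normal(0.0, np.sqrt(10.0), 7000)
is_male = rng.random(7000) < 0.5

a_males, a_females = sample_founders(phenotype, is_male, low=True)   # breed A
b_males, b_females = sample_founders(phenotype, is_male, low=False)  # breed B
```

Because the two tails are disjoint, the breed A and breed B founders cannot overlap.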
During generations 11 to 20 (G11-G20), a crossbred population (AB) was created by randomly selecting 100 males from breed A and 500 females from breed B. Individuals mated randomly across breeds. Each purebred female produced ten crossbred offspring. A total of ten crossbred generations (CG1-CG10) were simulated.
The genome comprised 18 chromosomes of 1.2 Morgan, each containing 500 QTL and 10,000 SNPs. QTL positions were simulated at random, while SNP positions were distributed uniformly. Biallelic SNPs were simulated with uniformly distributed allele frequencies. During the historical generations, the mutation rate of QTL and SNPs was 2.5e-5, with no mutations occurring after the historical generations.
Following quality control, 5,472 QTL and 49,111 SNPs were segregating in G0 with a minor allele frequency greater than 0.05. Subsequently, all effects and phenotypes simulated by QMSim were discarded, and a new simulation was conducted.
The phenotypic and additive variances were set at 10 and 1, respectively, resulting in a constant narrow-sense heritability of 0.1. From the 5,472 segregating QTL, 1,000 were randomly selected, and the additive value of each QTL was sampled from a standard normal distribution. The additive values were then scaled based on QTL allele frequencies to adjust the additive variance to 1.
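The scaling step can be illustrated with a short sketch. It assumes the purely additive single-locus expression for the additive variance, the sum over QTL of 2pq a², under Hardy-Weinberg and linkage equilibrium; the allele frequencies and the helper `additive_variance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_qtl = 1000
p = rng.uniform(0.05, 0.95, n_qtl)   # hypothetical QTL allele frequencies
a_raw = rng.standard_normal(n_qtl)   # unscaled additive values, N(0, 1)

def additive_variance(a, p):
    # sum over loci of 2pq * a^2 (no dominance, HWE and linkage equilibrium assumed)
    return np.sum(2.0 * p * (1.0 - p) * a ** 2)

target_var_a = 1.0
a = a_raw * np.sqrt(target_var_a / additive_variance(a_raw, p))

heritability = additive_variance(a, p) / 10.0   # phenotypic variance fixed at 10
```

Multiplying every effect by the same constant rescales the variance by the square of that constant, which is why a single square-root factor is enough.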
The size of the non-additive effects of QTL was assumed to depend on the size of the additive effects at these QTL. The dominance value ($d$) was calculated as the product of the dominance coefficient ($\delta$) and the absolute value of the additive value ($|a|$). The dominance coefficient, or the degree of dominance (Falconer 1996), was sampled from a normal distribution with a mean of 0.2, in line with empirical observations (Bennewitz and Meuwissen 2010; Sun and Mumm 2016). The standard deviation of the distribution was based on the proportion of dominance and the QTL allele frequencies. The proportion of dominance, defined as the ratio of dominance variance to genotypic variance, was simulated as 0, 10%, 30%, or 50%. The additive and dominance variances were calculated as follows:

Lastly, environmental effects were sampled from a normal distribution with a mean of 0 and a variable standard deviation, ensuring a constant phenotypic variance of 10.
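A minimal sketch of these quantities is given below. It assumes the standard single-locus expressions under Hardy-Weinberg and linkage equilibrium, with additive variance summing 2pq[a + d(q − p)]² and dominance variance summing (2pq d)² over QTL. The allele frequencies, the scale of the additive values, and the standard deviation of 0.3 for the dominance coefficient are hypothetical; only the mean of 0.2 and the phenotypic variance of 10 come from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

n_qtl = 1000
p = rng.uniform(0.05, 0.95, n_qtl)        # frequency of the allele with value +a (hypothetical)
q = 1.0 - p
a = 0.05 * rng.standard_normal(n_qtl)     # hypothetical additive values
delta = rng.normal(0.2, 0.3, n_qtl)       # mean 0.2 from the text; SD 0.3 is an assumption
d = delta * np.abs(a)                     # dominance value d = delta * |a|

alpha = a + d * (q - p)                   # average effect of allele substitution
var_a = np.sum(2.0 * p * q * alpha ** 2)  # additive variance
var_d = np.sum((2.0 * p * q * d) ** 2)    # dominance variance
var_g = var_a + var_d                     # genotypic variance
dominance_proportion = var_d / var_g

var_p = 10.0                              # phenotypic variance held constant
var_e = var_p - var_g                     # environmental variance fills the remainder
e = rng.normal(0.0, np.sqrt(var_e), size=5000)
```

The environmental standard deviation is "variable" in the sense that it changes with the genotypic variance, so that the three components always add up to 10.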

Appendix C. Formulae for complex crossbreeding systems
In this appendix, we present the formulae of our model for complex crossbreeding systems, mainly three-way and four-way crossbreeding systems, as are typical in the production of pigs and poultry.
In addition, these formulae also apply when at least one parent of an individual is crossbred and both parents share the standard breed, for example when crossbred animals from the same crossbred population mate with each other.

Additive relationship matrix
In three-way or more complex crossbreeding systems, it is easily shown that the pedigree-based additive partial relationship matrix $\mathbf{A}^{A\Gamma}$ is similar to that in two-way crossbreeding systems:

where the subscript $P$ represents every purebred population except purebred A. The terms $\Gamma_{A}$ and $\Gamma^{A}_{AP}$ denote within- and across-breed ancestral relationships, respectively. The recursive formulae for the elements of $\mathbf{A}^{A}$ are:

where the element $a^{A}_{ij}$ is the additive partial relationship between animals $i$ and $j$, and animal $j$ is not a descendant of $i$. Additionally, the terms $f(i)$ and $m(i)$ represent the parents of animal $i$, and the term $f^{A}_{i}$ denotes the breed A proportion of animal $i$.
Then, the coefficient matrices $\mathbf{K}^{A}$ and $\mathbf{Q}^{A}_{LP}$ differ from those in two-way crossbreeding systems:

where the terms $k^{A}_{ij}$ and $q^{P}_{AP,ij}$ are the elements of the coefficient matrices.
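The breed A proportion of an animal follows from the pedigree as the mean of its parents' proportions, with purebred founders set to 1 (breed A) or 0 (any other breed). The sketch below illustrates this for a three-way cross and for crossbreds mated with each other; the function `breed_proportion` and the animal labels are hypothetical.

```python
def breed_proportion(pedigree, founder_breed, breed="A"):
    """Breed proportion of every animal in a pedigree.

    pedigree      : dict animal -> (sire, dam), parents listed before offspring
    founder_breed : dict founder -> breed label of that purebred founder
    """
    prop = {k: 1.0 if b == breed else 0.0 for k, b in founder_breed.items()}
    for animal, (sire, dam) in pedigree.items():
        # each parent transmits half of its own breed composition
        prop[animal] = 0.5 * (prop[sire] + prop[dam])
    return prop

founders = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
ped = {
    "ab": ("a1", "b1"),       # two-way cross
    "abc": ("ab", "c1"),      # three-way cross
    "ab2": ("a2", "b1"),      # second two-way cross
    "abxab": ("ab", "ab2"),   # crossbred animals mated with each other
}
f_A = breed_proportion(ped, founders)
```

Here the two-way cross carries half of breed A, the three-way cross one quarter, and the offspring of two AB crossbreds again one half.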

Dominance relationship matrix
In three-way or more complex crossbreeding systems, especially when crossbred animals sharing the standard breed mate with each other, the off-diagonal elements of the pedigree-based dominance partial relationship matrix $\mathbf{D}^{A}$ must be adjusted to:

where the element $a_{f(i)f(j)}$ is the additive relationship between animals $f(i)$ and $f(j)$, which are parents of animals $i$ and $j$, respectively.
Then, the right-hand term of $d^{A}_{ij}$ is defined as:

$$\left( a_{f(i)f(j)}\, a_{m(i)m(j)} \right)^{A} = a^{A}_{f(i)f(j)}\, a^{A}_{m(i)m(j)} + a^{A}_{f(i)f(j)} \sum_{P} k^{P} a^{P}_{m(i)m(j)} + a^{A}_{m(i)m(j)}$$

where the superscript $P$ denotes all breeds excluding breed A.
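For orientation, the sketch below computes the conventional (non-partial) pedigree-based dominance relationship, in which the off-diagonal element for non-inbred animals is one quarter of the sum of the two cross-products of parental additive relationships. It is not the adjusted breed-specific formula of this appendix; the function `dominance_relationship` and the full-sib example are hypothetical.

```python
import numpy as np

def dominance_relationship(a, sire, dam):
    """Conventional dominance relationships from additive relationships.

    a         : (n, n) additive relationship matrix
    sire, dam : parent indices per animal, -1 if unknown
    Assumes no inbreeding, so the diagonal is 1.
    """
    n = a.shape[0]
    d = np.eye(n)
    for i in range(n):
        for j in range(i):
            if min(sire[i], dam[i], sire[j], dam[j]) < 0:
                continue  # unknown parent: dominance relationship left at 0
            fi, mi, fj, mj = sire[i], dam[i], sire[j], dam[j]
            d[i, j] = d[j, i] = 0.25 * (a[fi, fj] * a[mi, mj]
                                        + a[fi, mj] * a[mi, fj])
    return d

# Two unrelated founders (0, 1) and their two full-sib offspring (2, 3)
a = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
sire = [-1, -1, 0, 0]
dam = [-1, -1, 1, 1]
D = dominance_relationship(a, sire, dam)
```

In this example the full sibs have a dominance relationship of 0.25, while parent and offspring have none.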

Figure 1. The simulation of breed L, breed Y, and their crossbred descendants LY. Breed Y underwent positive selection and breed L negative selection, both based on simulated performance records. The crossbred LY population consists of crossbred descendants of animals from the last ten generations of breeds L and Y.