Application of multi-trait Bayesian decision theory for parental genomic selection

Abstract In all breeding programs, the decision about which individuals to select and intermate to form the next selection cycle is crucial. The improvement of genetic stocks requires considering multiple traits simultaneously, given that economic value and net genetic merits depend on many traits; therefore, with the advance of computational and statistical tools and genomic selection (GS), researchers are focusing on multi-trait selection. Selection of the best individuals is difficult, especially in traits that are antagonistically correlated, where improvement in one trait might imply a reduction in other(s). There are approaches that facilitate multi-trait selection, and recently a Bayesian decision theory (BDT) has been proposed. Parental selection using BDT has the potential to be effective in multi-trait selection given that it summarizes all relevant quantitative genetic concepts such as heritability, response to selection and the structure of dependence between traits (correlation). In this study, we applied BDT to provide a treatment for the complexity of multi-trait parental selection using three multivariate loss functions (LF), Kullback–Leibler (KL), Energy Score, and Multivariate Asymmetric Loss (MALF), to select the best-performing parents for the next breeding cycle in two extensive real wheat data sets. Results show that the high ranking lines in genomic estimated breeding value (GEBV) for certain traits did not always have low values for the posterior expected loss (PEL). For both data sets, the KL LF gave similar importance to all traits including grain yield. In contrast, the Energy Score and MALF gave a better performance in three of four traits that were different than grain yield. The BDT approach should help breeders to decide based not only on the GEBV per se of the parent to be selected, but also on the level of uncertainty according to the Bayesian paradigm.


Introduction
Genetic improvement of plants and animals is based on selecting and intermating the best performing parents to form the next improved population. The selection of which candidates to advance to the next breeding cycle is based on the breeding value (BV) and net genetic merits of multiple traits, and decisions are made based on their performance in crop fields and greenhouses. In genomic selection (GS), the genomic estimated breeding value (GEBV) of unobserved candidates is predicted using only genotypic information and a statistical model trained with phenotypic and genotypic data of individuals in the training population (Meuwissen et al., 2001).
The main decision in breeding is how to select the parents that maximize the response to selection (R) expressed as the difference between the mean of the offspring of the selected individuals (l 2 ), and the mean of the original population (l 1 ) (R ¼ l 2 À l 1 ). This decision can be achieved by different methods. The conventional strategy for selecting the best parents in a single trait is selection by truncation, which consists of ranking in descending or ascending order based on GEBV (depending on the desired direction of selection) and selecting some fraction of the top lines. With current genomic information, the optimum contribution theory (OCT) is a method for selecting the parental candidates with the aim of increasing the genetic gain by optimizing the genetic contribution of each individual to the next generation for a given rate of inbreeding (Henryon et al. 2014;Woolliams et al. 2015). Under the OCT method, the selection of candidates is based on high genetic merit, and on the relationship among the candidates for selection. In general, the main question is how to balance high genetic gains in the next generation while maintaining genetic diversity (Kinghorn 2011;Cowling et al. 2019). Akdemir and Sá nchez (2016) optimized genomic mating between parents under GS by applying a method that uses a function that combines measures of inbreeding as part of the objective function being minimized for a single trait. Furthermore, Han et al. (2017) selected the donor parents for the introgression of alleles to recipient individuals by proposing an optimized algorithm. Bulmer (1980) and Gianola and Fernando (1986) showed that, for a single trait, the conditional expectation of each candidate for selection maximizes the mean of true genotypic values and minimizes a squared loss function.
Decisions made during GS are based solely on the GEBV of the candidates for selection. Genomic approaches (1) estimate the GEBV through a statistical model and information about the unobserved (genotyped) individuals (candidate population) using the phenotypic and genotypic data of their parents, (2) rank the lines based on GEBV, and (3) select the top-ranking lines. Recently, Villar-Herná ndez et al. (2018) proposed a method based on Bayesian decision theory (BDT) for selecting the best candidates (in a single trait or in multi-trait) that maximize R; results were obtained by simulating a breeding program. For a single trait, and assuming the candidates have the same amount of information and are identically distributed, R could be expressed in terms of the selection differential ðS ¼ l S À l 1 , the difference between the mean of the selected individuals, l S , and the mean of the original population, l 1 ) multiplied by the heritability ðh 2 Þ. Thus, R¼h 2 S, and when h 2 ! 1; R!S (maximum expected response to selection, minimum expected loss in the decision of which candidates to select based on our breeding goals), whereas when h 2 ! 0, R ( S (minimum expected response to selection, maximum expected loss). The BDT methodology proposed by Villar-Herná ndez et al. (2018) considers the variance-covariance matrix between traits and the trait mean while minimizing the posterior expected distance between the distribution of the offspring of the selected individuals (distribution of the candidates) and the distribution of the selected individuals (distribution of the selected parents), and therefore maximizing the expected response to selection (R) given the phenotypic, genotypic and genomic information at hand. Minimizing the distance between the distribution of the parental candidates and the progeny distribution increases the accuracy of selection (assuming equal selection intensity).
Multiple trait selection is a concern addressed by animal and plant breeding in the past (Smith 1936;Hazel 1943;Henderson and Quass 1976) and also in the era of GS (Sun et al. 2017;Montesinos-Ló pez et al. 2019;Neyhart et al. 2019;Lenz et al. 2020). Multi-trait selection models are promising because they have the potential to increase the accuracy of GEBV (given that they use information about genetically correlated traits), especially in the presence of low heritability traits (Jia and Jannink 2012;Guo et al. 2014;Ward et al. 2019). Also, the improvement of genetic stocks requires considering multiple traits simultaneously because economic value and net genetic merits depend on all traits (Falconer and Mackay 1996).
The selection of multi-traits can be facilitated by ranking lines based on a single number; for example, genomic selection indices (SI) score lines based on a weighted average of GEBV, and then select those lines with high scores (Ceró n-Rojas and Crossa 2020). The approach of Villar-Herná ndez et al. (2018) ranks the lines based on the posterior expected loss (PEL, a single number) given our breeding goals, in terms of the mean and the genetic variance-covariance matrix. Thus, those candidates whose distributions are closer to the theoretical parental distribution will have the lowest expected loss value (high R), and the decision should be to advance those lines given that they reach the desired mean and keep the genetic variance as much as possible (high h 2 ).
As described in Villar-Herná ndez et al. (2018), the LF is the vehicle to go from the action space (candidate lines) to the resulting space (selected lines) given a Bayesian action (an action that guarantees a minimum PEL given our preferences). Of the three multivariate LFs used by Villar-Herná ndez et al. (2018), the Kullback-Leibler (KL) LF is easier to understand. We can compute the KL distance between two multivariate normal distributions, one of them truncated in a t-dimensional vector y c (reflecting the breeder's preference for high or low phenotypic values, y c of a length equal to the number of traits). The KL metric implies that the distance between both distributions decreases when the phenotypic (P) and genotypic variance-covariance (G) matrices tend to explain the same amount of variation between traits, i.e., GP À1 ¼ I (a quantity similar to narrow-sense heritability in a single-trait setting). Depending on the trait, the KL metric employs the divergence criterion, which induces less penalty for those lines that have more density (probability) to the right of censoring values (y c ) (increasing BVs) or more density to the left of censoring values (y c ) (decreasing BVs).
Similar interpretations of other LFs, Energy Score and MALF can be found in Villar-Herná ndez et al. (2018). The three LFs were derived and described based on univariate and multi-trait heritability, the response to selection, and the selection differential.
The advantage of the Villar-Herná ndez et al. (2018) method is that while minimizing the LF, the response to selection is maximized, considering uncertainty throughout the full posterior predictive distributions, and not only based on punctual estimates. Although Villar-Herná ndez et al. (2018) presented simulated and real data, they did not present extensive practical applications. Therefore, based on the previous considerations, the main objective of this research is to show the practical details when applying the BDT in a real GS prediction based on quantitative genetic concepts in breeding decisions. We used two extensive datasets (multi-traits from 766 and 320 wheat lines) in which we applied 10% selection intensity (a value commonly used in GS-assisted breeding) according to the minimum PEL criterion.

Experimental datasets
Dataset 1 (Elite wheat lines) information was given in Velu et al. (2016), and both datasets can be found at the link below.

Data and software availability
The two datasets used in this study and the codes for running the proposed models can be found at the following link http://hdl. handle.net/11529/10548420.

Direction of improvement in the datasets
In Dataset 1, selection of lines with low GEBVs for DTHD, DTMT, and Plant Height is required; thus, improvement focuses on the decreasing direction (-) of the trait. In contrast, for trait GY, progress lies in the increasing direction (þ), i.e., breeders want those lines with high GEBVs for GY. In Dataset 2, improvement of all traits under consideration lies in the positive (increasing) direction, i.e., lines with high GEBVs.

Loss function as a mechanism for parental selection
In general, the LF should reflect the distance/divergence of two probability distributions. The expected distance (loss of information) will be minimal if the two distributions approach each other, and the distance will be zero if both distributions are identical. Therefore, we can measure how close the distribution of the candidates is to the truncated parental distribution.

Univariate KL loss function
We first describe the single-trait case based on the idea of truncation selection. Let y c be a scalar. Lines with EBV above y c are the selected ones. The population from which we select the best lines is the base population (with some mean l 1 and some variance r 2 ). For most quantitative traits, normal distribution is assumed for the base population, i.e., Y $ Nðl 1 ; r 2 Þ; then the selected lines (parents in the following generation) formally follow a truncated distribution, i.e., Y s $ N T l 1 ; r 2 ; a ¼ y c ; b ¼ 1 À Á using the formal definition of a truncated distribution, but for simplicity, hereinafter we will denote Y s $ N T l 1 ; r 2 ; y c À Á . Note that truncated normal distribution is a function of three parameters: (l 1 ; r 2 ; y c ) and the mean of BVs after truncation occurs is l S ¼ l 1 þ r /ððycÀl 1 Þ=rÞ where / and U denote the probability density function and the distribution function of standard normal density, respectively. Also, note that this occurs within the same generation.
After crossing the selected lines, there is an offspring population that has some distribution with mean (l 2 ) hopefully greater than the mean of the base population (l 2 > l 1 ) and with some variance (we expect that it will be approximately equal to the variance of the base population in order to maintain the genetic diversity). We assume that the offspring population is also approximately normal, i.e., Y o $ Nðl 2 ; r 2 Þ. Then, using this idea, we can construct a metric that quantifies the distance between the distribution of the truncated (parental) population and the distribution of the offspring (candidates for selection) population, such that the candidates that guarantee maximum genetic progress are those that guarantee minimum distance between the two distributions; this occurs between generations. With this idea in mind, we can construct any metric, for example, based on the Kullback Leibler (KL) loss function or any other divergence function. It is important to note that KL distance can be calculated whatever the distribution of the parents and the candidates is. When normality is assumed, KL has an analytical expression, and otherwise can be approximated with numerical or simulation methods.
Univariate KL, as presented below in Equations 1a-1c, appears when we calculate and simplify the expectation of the log ratio between the theoretical parental distribution and the candidate distribution with respect to the base distribution: In the previous equations, the divergence measured is between F Yo and F Ys , where F Yo denotes the distribution function of Y o (random variable representing the phenotypic values of the offspring/candidate with mean l 2 ) and F Ys represents the distribution function of Y s (random variable denoting the phenotypes of selected lines obtained based on truncation selection with mean l s ). Thus, S ¼ l S À l 1 is the selection differential, R ¼ l 2 À l 1 is the selection response, and the standardized selection differential i ¼ S=r is the selection intensity. The second term on the right-hand side of Equation 1b implies that when R approaches S (while the selection intensity stays the same), the divergence between the truncated distribution and the candidate's distribution decreases and the genetic gain increases. That is, D KL F Yo ; F Ys ð Þdepends on the intensity of selection (which is assumed fixed) and is a decreasing function of h 2 (1c).
Note that the KL distance is not specific to normal distribution and can be applied to any pair of distributions. Appendix 1 ( Figures A1 and A2) shows a step-by-step explanation of a Bayesian decision approach using the KL distance metric to measure the distance between any pair of distributions (candidate distribution and truncated distribution).

Multivariate KL loss function
The idea of truncation selection can be extended to multi-traits where the base population, the parental individuals, and the offspring population follow multivariate distributions. The parental distribution is a truncated distribution in a vector y c (of a length equal to the number of traits). Assuming multivariate normal Truncated Normal (TMVN) distribution has mean vector l 1 , phenotypic variance-covariance matrix P; and truncation vector y c . The Multivariate Normal (MVN) distribution has a mean vector l 2 and the same P as the base population. After simplification, KL reduces to (2 b): Upon inspecting Equation 2b, when phenotypic (P) and genotypic variance-covariance (G) matrices tend to explain the same amount of variation and similar association between traits, then ðI À GP À1 Þ ! 0, i.e., the distance between the truncated distribution and the candidate/offspring distribution tends to decrease. The product matrix GP À1 ¼ h 2 is equivalent to multi-trait heritability (the ratio of the genetic variance in the numerator and the phenotypic variance in the denominator); thus, when GP À1 ¼ I, the heritability of the traits approaches 1; then R ¼ S and the mean of the offspring/candidate distribution is equal to the mean of the truncated distribution, l 2 ¼ l s . The multivariate KL LF contains the term -log z ð Þ; which indicates that the joint probability of multivariate BVs that is > y c has less penalty. As already mentioned for the single trait, the KL distance is not specific to normal distribution.

Another two multi-trait loss functions: MALF and Energy Score
We can use other metrics, not only the KL. Two simple and easy to implement LFs are the Energy Score and the Multivariate Asymmetric Loss Function (MALF). Both measures are generalizations of the single-trait selection fully discussed in Villar-Herná ndez et al. (2018). Brief theoretical descriptions of the Energy Score and MALF are given in Appendix 2.
Fitting the multi-trait mixed model and evaluating the posterior expected loss function To conduct selection through the Bayesian decision framework within genomic-enabled prediction of multiple traits, three steps are required: Step1: training a regression model using available phenotypic multivariate data and genotypic records (training population); Step 2: approximate posterior predictive distributions of each candidate assuming the same sampling model as in step 1; Step 3: approximate the PEL for each candidate so that we can identify the lines with minimum PEL that fulfill the desired selection intensity.
In our case, the multiple trait regression model used in step 1 was the following: where ðy 1 ; y 2 ; y 3 ; y 4 Þ 0 is the vector (or vectors) of phenotypic values for traits 1,2,3 and 4, respectively, l 1 ; l 2 ; l 3 and l 4 are the means for each trait, ðg 1 ; g 2 ; g 3 ; g 4 Þ 0 is the vector of genomic values of each line for the four traits, Z 1 ; Z 2 ; Z 3 and Z 4 are the corresponding design matrices for random effects, and ð 1 ; 2 ; 3 ; 4 Þ 0 is the vector of random residuals for all traits. Assuming that ðg 1 ; g 2 ; g 3 ; g 4 Þ 0 $ MVNð0; R g GÞ, where is the variance-covariance matrix of genomic values for the four traits, and ð 1 ; 2 ; 3 ; 4 Þ 0 $ MVNð0; R IÞ with I nÂn being the identity matrix, and the residual variance-covariance matrix. The model in (3) was fitted using the MTM R package (de los Campos and Grü neberg 2016, that is available at GitHub: https://github.com/QuantGen/ MTM) and R programming language version 3.6.1 (R Development Core Team 2019). The diagonal matrix R assumes a null residual covariance matrix among environments that is seldom used in plant breeding and animal breeding to avoid an important increase in computing time. After fitting the model given in Equation 3, we used outputs from the MTM R package to estimate the full posterior distributions of the model's parameters using 10,000 MCMC (Markov Chain Monte Carlo) samples after discarding the 30,000 samples as burn-in and thinning at lag 5. We then approximated the posterior predictive distribution of each line considered as a candidate, using multivariate normal distribution as the sampling model. Finally, the LF is inserted in Equation 4 in order to approximate the PEL of each candidate where p hjy; XÞ À represent the joint posterior distributions of the model's parameters after fitting the model in step 1 (model in Equation 3), and f y o jh; x 0 o Þ denotes the multivariate normal distribution (the sampling model), L F Yo ; hÞ À is the used LF (KL or Energy Score or MALF), and L o is the PEL. Then, for each candidate o, Equation 4 has to be evaluated. Integrals cannot be evaluated analytically; instead we used approximation iterating at each realization of MCMC chains; then the integrals are reduced to summations and averaged over the total number of MCMC chains (for breeders and geneticists who are not familiar with MCMC theory and Bayesian statistics, we attached all the source codes used in this study). It is important to recall that both the regression model in (3) and the LF used, can be replaced with any multiple trait model and any LF, i.e., the Bayesian decision approach is a general formulation. Here, we used three LFs that we think have a genetic sense.
It is important to note that the MTM R Package does not return posterior distributions of the model's parameters (which are needed to approximate the posterior predictive distribution) as MCMC objects, but internally it does. Therefore, we downloaded the source code and modified a couple of lines to save them. In http://hdl.handle.net/11529/10548420 the modified source code is added to all the datasets for reproducibility of the results presented in this paper.
In this study, we used information on 767 lines in Dataset 1 and the 320 records in Dataset 2 to train the regression model; the aim was to select the best 10% of lines (76 lines selected from Dataset 1 and 32 lines selected from Dataset 2) with minimum PEL; thus, the entire set of lines were considered as candidates.
For Dataset 2, the desired direction for all traits is in the increasing direction, so we chose y c ¼ ð6; 38; 31; 43Þ 0 for traits GY(þ), TKW(þ), GZnC(þ), and GFeC(þ), respectively. Each value in y c is greater than the simple average of the traits, 5.3 (GY), 33.1 (TKW), 28.1 (GZnC) and 38.6 (GFeC), and in all traits corresponds to 0.9 empirical quantile of observed phenotypic values. Note that the Bayesian decision formulation requires censoring on the right side of the parental distribution to reflect that improvement is in the positive direction, but in cases where the improvement is in the decreasing direction, we need a truncation on the left side of the distribution. A practical approach is to change the sign (multiplying by À1) of the GEBVs when evaluating the LF. This approach was used for traits DTHD(-), DTMT(-) and Height(-) in Dataset 1.

Dataset 1
Figure 1 displays pair-wise plots of the GEBVs for the four traits in Dataset 1 obtained under the KL LF. Figure 1A plots traits DTHD(-) and DTMT(-), given that improvement is in the decreasing direction for both traits, and the 76 selected lines (red dots) are concentrated at low values of GEBVs for both traits; the phenotypic correlation between DTHD and DTMT is high: 0.84 (Table 1). Similar results are observed in Figure 1C for DTHD (-) and Height(-) and in Figure 1E for the pair of traits DTMT(þ) and Height(-) with correlations of 0.20 and 0.14, respectively. Figure 1E depicts GEBVs of DTHD(-) and GY(þ), where the red dots are concentrated in the top left quadrant of the picture, given that we need to increase the phenotypic values of GY and, at the same time, decrease the value of DTHD. The same is shown in Figure 1D for the pair of traits DTMT(-) and GY(þ), and finally, in Figure 1F, for GY(þ) and Height(-), the selected lines are concentrated in the right bottom quadrant and have high GEBVs of GY and small values of Height. Phenotypic correlations of GY with the other traits were negligible, except with plant height (0.24) ( Table 1). These lines are the best according to the KL loss, which considers improvement in all traits, in either the increasing or decreasing direction. Similar plots were obtained using the Energy Score ( Figure A3, Appendix 3) MALF LFs ( Figure A4, Appendix 3), both of them with similar interpretations. Table 2 shows the values of the best five lines and the worst lines ranked based on the value of the three LFs, KL, Energy Score and MALF, for Dataset 1. Table 2 also shows, for each trait and for each of the five top ranking wheat lines and the worst five lines, the GEBVs and the PEL under each LF. Values of PEL were standardized to range from 0 (minimum loss) to 1 (maximum loss). The values inside () represent the rank of each line in the top 5 lines and in the worst 5 lines. Note that line 141 was ranked at the top (minimum PEL) for the three LFs. The reason for this is that it has low values of DTHD(-), DTMT(-) and Height(-), and although it does not have the maximum value of GY(þ), it has a relative high GEBV (6.57 ton/ha for trait GY); thus it is reasonable that it is ranked first. Wheat line 635 with the highest GEBV of 6.8 tons/ha for GY(þ) ranked five based on the KL LF; this is because the GEBVs for the other traits were not so low; the other four top lines (141, 232, 210, and 334) based on KL criteria had the lowest values for the other three traits DTHD, DTMT and Height; these lines gave slightly lower grain yield (6.6, 6.5, 6.6, and 6.7 tons/ha) than the line ranked five (635) based on KL criteria.
The other two LFs, Energy Score and MALF, except for line 141, selected another set of wheat lines in the top five, with lower values of the traits than the wheat lines selected by the KL LF. Regarding grain yield (þ), the top lines selected under Energy Score and MALF had lower grain yield than those selected by the KL criterion; however, the Energy Score and MALF criteria selected lines with lower values for traits DTHD(-), DTMT(-), and Height(-) than the KL criteria. As for the worst five lines, both criteria selected similar lines. Although only line (141) was selected under all LF criteria, in the 76 lines selected to be parents of the next generation for Dataset 1, KL vs Energy Score selected 30.26% of the same lines, KL vs MALF selected 31.58% of the same lines, and Energy Score vs MALF selected 92.11% of the same lines. Of the five worst lines based on PEL, we can see that they have high values for traits DTHD, DTMT and Height, but it was difficult to find lines with low values for trait GY.  In summary, for Dataset 1, the KL LF gave similar importance to all traits. In contrast, the Energy Score and MALF gave better performance in three of four traits (DTHD, DTMT and Height) leaving the GY trait as less important. In terms of the posterior variance, these differences were negligible. The LF approach should help breeders to decide based not only on the GEBV values per se of the parent to be selected, but also on the level of uncertainty according to the Bayesian paradigm.

Dataset 2
The phenotypic correlations among traits for Dataset 2 are shown in Table 1. Figure 2 displays a pair-wise plot for every combination of the posterior mean of the GEBV of the four traits. In this scenario, we are interested in increasing genetic gain in the positive direction for all traits. Therefore, the points representing selected individuals are in the top-right corner of the pair-wise plots for correlated traits. Those lines represented by red dots should be the ones breeders select to make crosses and move to the next improvement cycle, thereby assuring simultaneous increase in genetic gains (response to selection) in the four traits. Similar plots were obtained using the Energy Score ( Figure A5, Appendix 3) and MALF LFs ( Figure A6, Appendix 3), both of them with similar interpretation as in the KL LF.
The multi-trait values of the top five lines and the worst five lines ranked based on the three LFs, KL, Energy Score and MALF, for wheat Dataset 2 are shown in Table 3. In this case, wheat line 177 had the highest GEBV (5.97 tons/ha) for GY and was ranked in 1st place by KL LF; the other four top lines based on KL criteria also had high values of GEBV for TKW, GZnC and GFeC traits; the means of traits GY, TKW, GZnC and GFeC were 5.4, 33.2, 28.1 and 38.7, respectively, and all GEBVs for all traits had values greater than the mean. The other two LFs, Energy Score and MALF, selected different sets of wheat lines than those identified by KL; they also had lower values for GY (and higher for the other traits) than those wheat lines selected by the KL LF. As already mentioned, regarding grain yield, the top lines selected under Energy Score and MALF had lower grain yield than those selected by the KL criteria (lower than 5.5 tons/ha). However, the Energy Score and MALF criteria selected lines with higher values for traits  In summary, for this dataset, results show that on average the LFs performed similarly, in terms of the posterior mean and the posterior variance of the selected individuals.

Discussion
The main objective of this study was to present practical examples of how the GS via LF concepts and BDT of Villar-Herná ndez et al. (2018) can be used in candidate selection on two extensive datasets from which we wish to identify the 10% best performing individuals according to LF criteria, and advance them to the next generation of random mating. Based on this, we believe that the proposed multi-trait decision theory gives a clear interpretation of quantitative genetic and plant breeding methods because it selects the lines that maximize the response to selection of multi-traits by minimizing the LF (which, in turn, is a function of the heritability, selection differential and multi-trait phenotypic and genotypic covariance matrices).
To perform GS selection using a decision theory approach, the following steps are required: (1) training a multi-trait mixed regression model with genomic and phenotypic data, (2) approximating the posterior predictive distribution of each parental candidate using genomic information and the trained model, (3) calculating the PEL via MCMC approximation, and (4) selecting the best lines with minimum PEL according to the desired selection intensity.
The central part of the decision theory approach is the concept of an LF reflecting the breeder's preferences for the best performing parental candidates for selection. By minimizing the PEL, we maximize the genetic progress in all traits considered (maximization of the net genetic merit of individual lines), understanding that genetic progress is a compromise between increasing/decreasing (depending on the desired direction of improvement) BV (or GEBV) for all traits in successive selection cycles with the lowest possible loss of genetic diversity.
The use of LF methodology raised some natural questions; for example, how to reflect breeders' preference for high/low phenotypic values, or what the best values of y c (threshold vector) are. The proposed LF can be studied by incorporating inbreeding and co-ancestry information, and by extending the LF concept with non-Gaussian traits (Poisson, Binomial, etc.). The LFs facilitate the selection of multi-trait scoring in a single metric (a scalar) for each candidate line to be a parent of the next generation. The selection of multi-traits is important because the net genetic merit and the economic value usually depend more on some traits than on others. In general, genomic-enabled prediction multitrait models have become more useful than single-trait models because trait correlation information can be exploited to increase the prediction accuracy of correlated traits. In fact, multi-trait selection occurs even using a single-trait selection approach; however, if selection is based on a multi-trait regression model (parametrized as mixed or not) and the BDT framework, the researcher is selecting the best performing individuals for all the traits together. The BDT is the ideal approach for correlated traits.
By formulating the multi-trait selection in GS as a Bayesian decision problem, all uncertainty/risk components such as uncertainty in model parameters (mean and variance) and  (765) The five best lines and five worst lines based on posterior expected loss (PEL), under KL, Energy Score and MALF loss functions. Summaries are given for GEBVs for all traits in Dataset 1. Values of PEL were standardized to vary from 0 (minimum) to 1 (maximum). Values in () represent the rank of the line with respect to each loss function.
uncertainty in the effects of molecular markers are simultaneously considered in the LF when computing PEL. Furthermore, the LFs can be interpreted in terms of common concepts for geneticists such as heritability, and LFs are minimized when heritability increases, as was pointed out by Villar-Herná ndez et al.
. As a complement, LF concepts and selection through Bayesian decision is a well-established theory in statistics and its applications (Berger 1985, Ch4;Dawid 2007;Robert 2007, Ch2). Bayesian decision is a coherent way of selecting the "best parents/individuals" to advance in GS because the consequences of selection cannot be completely anticipated and uncertainty is contemplated in a unified framework. Expected loss theory assigns a quantitative loss to each possible decision, and then selects an action that minimizes the expected value of the resulting loss. This idea has proven to be a widely applicable description of rational behavior (Parmigiani and Inoue 2009). In this study, we conducted a selection of parental individuals using the LF approach in two extensive real wheat datasets comprising four traits. In the first dataset (Dataset 1), the genetic progress of three of four traits was in the decreasing direction, and in one trait the progress needed to increase; however, correlations between traits were positive but low, except for two traits. In the second dataset (Dataset 2), the progress needed was in the increasing direction and correlations between traits were, in general, negligible. Results from both datasets indicate that all LFs performed similarly, although in Dataset 1 there was a minor advantage of Energy Score and MALF functions over the KL (Energy Score and MALF performed better than KL loss in three of four traits, but sacrificed one trait: GY). This small difference in favor of Energy Score and MALF may be explained by the fact that in Dataset 1 there are traits whose improvement directions are in the positive and negative sense, or that positive and negative correlations between traits are present. These cases were not present in Dataset 2. In terms of posterior variance, the three LFs performed similarly in both datasets, i.e., sometimes LFs selected lines with high posterior variance, but other times they selected lines with low/medium posterior variance, but on average, percentual differences were insignificant. Additionally, it is important to note that Energy Score and MALF selected up to 96% of the same lines. In general, results from both datasets show that the lines with the highest GY values are not always those that will give less uncertainty and minimize the LF while maximizing the response to selection. The reason for this is that LFs weighted gains in all traits, not only GY; in Dataset 1, the best GY line ranked 5th under KL criteria and was not selected at all under Energy and MALF LF criteria. However, in Dataset 1, one line was selected as the best parent for the three LFs. In Dataset 2, the best GY line was the one with minimum KL value, but a line that was ranked 4th based on KL criteria with less GY than the one that ranked first was in fact selected as the best based on Energy and MALF criteria.
The use of Mean Squared Error, is in a sense, a quadratic distance that has its generalization in the LF named Continuous Ranked Probability Score (CRPS). The Energy Score LF, on the other hand, is a generalization of the CRPS LF in the multivariate context. Thus, on this respect the "mean squared error" is indeed included in the context of this study by means of the "Energy Score" LF.

Some differences between the multi-trait Bayesian decision and selection indices
Both the multi-trait BDT and the SI theory are indeed related because they are based on estimations of P and G. However, differences can be pointed out. First, the main difficulty when using SI in plant breeding is determining a vector of economic weights, whereas when using the Bayesian decision approach, the truncated y c values for the truncated distribution are easy to determine for increasing or/and decreasing traits. The BDT framework uses complete posterior distributions of each candidates and not only punctual estimates as SI does. The SI maximizes the response to selection based on the estimation of the coefficients of the economic weights that maximizes the correlation between the index and the net genetic merits and thus maximizes the selection response, whereas the BDT employs the divergence between distributions, which, as already mentioned, induces less penalty for those lines that have more density to the right of censoring values (y c ) or put more density to the left of censoring values (y c ). The three LFs used here, were derived and described based on univariate and multi-trait heritability, response to selection, and selection differential. Although economic weights are not necessary to implement selections indices, a relevant question would be how to implement these economical weight in the proposed LFs. Further research if required for comparing the multivariate LFs proposed in this study and the selection index theory and practice applied in breeding. Another comparison of the multivariate LF of the proposed method could be with a relevant methodology based on multi-objective optimization based approaches as in Akdemir et al. (2018).

Conclusions
In this research we conducted multi-trait selection of the best performing individuals using two extensive real wheat datasets with four traits under selection, through the BDT framework and LF concept. The main objective was to show a practical application and clarify some doubts and omissions in explanations not covered in the proposal of the authors in a preceding work (Villar-Herná ndez et al. 2018) where three LFs were explained as mechanisms for conducting multi-trait selection: the KL, the Energy Score, and the MALF. After applying the methodology, we found that for our datasets, all LFs performed similarly, selecting a subset of lines that guarantees the greatest genetic progress of all traits, although for one dataset we found a small advantage of Energy Score and MALF over the KL loss (i.e., in three of four traits in Dataset 1, the Energy Score and MALF reported greater gains than KL). In terms of genetic variance, the three LFs performed similarly in both datasets, i.e., sometimes LFs selected lines with high variance, but other times they selected lines with low/medium variance, but on average, the perceptual differences in variance with respect to the variance of the whole population for the three LFs were insignificant. Selection using LFs has the potential to be effective in multi-trait selection in GS given that it summarizes all relevant genetic concepts such as heritability, response to selection and the structure of dependence between traits (correlation).

Appendix 2: The Energy Score and MALF loss functions
The Energy Score is a multivariate loss function that comes from a well-known univariate loss function called Continuous Ranked Probability Score (CRPS) that is used in diverse applications (Hersbach 2000;Gneiting and Raftery 2007). The generalization of CRPS to multivariate scenarios was addressed by Szé kely and Rizzo (2013). For multi-trait GS purposes, the Energy Score is expressed as where | Á | denote the Euclidean norm, Y o and l s were previously defined, and Y o ' denotes an independent random vector with the same distribution as Y o , i.e, F Yo . Another multivariate loss function tested in this research is MALF (Komunjer and Owyang 2012), which is an asymmetric multivariate loss function. MALF comes from a very commonly used univariate loss function known in financial and actuarial literature as LinLin (Linear-Linear) Loss (Berk 2011). The MALF for multi-trait GS is expressed as where e ¼ l s À l 2 ð Þ 0 or, alternatively, expressed as e ¼ S À R ¼ S I À GP À1 ð Þ (note that e is a vector of length equal to the number of traits), s is a vector of length equal to the number of traits that controls the degree of asymmetry and has support (-1 < s < 1).
The term e ¼ e 1 þ e 2 þ Á Á Á þ e t ð Þ ¼ P je i j, i.e., the L 1 À norm, whereas e ¼ ðe 1 ; e 2 ; . . . ; e t Þ 0 is the vector of deviations for trait 1; 2; . . . ; t. Note that when ðI À GP À1 Þ ! 0, the MALF is minimized, similar to KL loss.   (765) Five best lines and five worst lines based on posterior expected loss (PEL), under KL, Energy Score and MALF. Summaries are given for posterior variances. Values of PEL were standardized to vary from 0 (minimum) to 1 (maximum). Values in () represent the rank of the line with respect to each loss function.