Bayesian multitrait kernel methods improve multienvironment genome-based prediction

Abstract When multitrait data are available, the preferred models are those able to account for correlations between phenotypic traits, because when the degree of correlation is moderate or large, this increases the genomic prediction accuracy. For this reason, in this article we explore Bayesian multitrait kernel methods for genomic prediction and illustrate the power of these models with three real datasets. The kernels under study were the linear, Gaussian, polynomial, and sigmoid kernels; they were compared with the conventional Ridge regression and GBLUP multitrait models. The results show that, in general, the Gaussian kernel method outperformed the conventional Bayesian Ridge and GBLUP multitrait linear models by 2.2–17.45% (datasets 1–3) in terms of prediction performance based on the mean square error of prediction. This improvement can be attributed to the fact that the proposed model captures nonlinear patterns more efficiently than linear multitrait models. However, not all kernels perform well in the datasets used for evaluation, which is why more than one kernel should be evaluated in order to choose the best one.


Introduction
Genomic selection (GS) has been widely adopted because its predictive methodology enables the selection of candidates before phenotypes are available on all individuals (Meuwissen et al. 2001). Current research in GS includes the use of prediction models that were successful in other fields, the adaptation or development of models specific to GS (Montesinos-López et al. 2019b, 2019c), and models that couple mechanistic and statistical approaches (Tong et al. 2020). At the same time, breeders usually select multiple traits that are often genetically correlated, with correlations ranging from weak to strong. Often, analyses of multitrait data are performed with uni-trait (UT) models, which assume zero genetic and residual covariances among these traits, so that information from other traits is not used (Montesinos-López et al. 2019b) when obtaining expected breeding values of the evaluated individuals for the traits under study (Okeke et al. 2017). However, the optimal estimation process combines information from multiple traits and estimates breeding values using multitrait (MT) models (van der Werf 1992; Ducrocq 1994; Okeke et al. 2017).
The use of UT models is very common, partly due to the lower number of existing MT models. However, the attraction of MT models continues to grow, as pointed out by Mbebi et al. (2021). UT models are trained using only one dependent variable and are therefore unable to capture the correlation between traits, since the training process is done separately for each trait (Montesinos-López et al. 2019b), whereas MT models are trained using all the available traits simultaneously, which is why they are able to capture the correlation between traits. When this correlation is moderate or large, most of the time the prediction performance of MT models is better than that of UT models (Montesinos-López et al. 2016, 2019b, 2019c). In MT models, even when the traits are unfavorably correlated (opposite signs), improved prediction performance is expected compared to UT models because borrowing of information is possible (Neyhart et al. 2019). However, from a practical perspective, unfavorable correlations are common and complicate breeders' decisions (Neyhart et al. 2019). Opposite directions of such correlations imply an unfavorable response in one trait when selecting on another (Falconer and Mackay 1996); thus the underlying cause will impact the prospects of long-term improvement (Neyhart et al. 2019).
There is empirical evidence that MT models (frequentist and Bayesian) outperform UT models when the traits are correlated, as reported by authors such as Calus and Veerkamp (2011), Jia and Jannink (2012), Jiang et al. (2015), Montesinos-López et al. (2016), He et al. (2016), and Schulthess et al. (2018), who reported that, at least for some traits, MT models outperform UT models in terms of prediction accuracy. Schulthess et al. (2018) also reported that, compared to UT models, MT models improve parameter estimates. Small differences are observed between frequentist and Bayesian methods in terms of prediction performance.
However, it has also been reported that when the correlation between traits is low, MT models are not really advantageous, since MT models provide fewer benefits when the degree of relatedness between traits is low (Montesinos-López et al. 2016, 2019). An early study of multivariate genomic prediction (Jia and Jannink 2012) showed the usefulness of multivariate models, but large differences were only observed when variable selection methods (BayesA and BayesC) were applied to nonpolygenic traits (20 QTLs), and little difference was observed in polygenic traits.
Montesinos-López et al. (2019b) pointed out the following seven advantages of MT models over UT models: (1) MT models represent complex relationships between traits more efficiently; (2) they exploit not only the correlation between lines, but also the correlation between traits; (3) they are much more interpretable than a series of UT models; (4) they are more computationally efficient (less training time) than multiple UT models fitted individually; (5) they improve the selection index because they allow more precise estimates of the random effects of lines and of the genetic correlation between traits; (6) they can improve indirect selection because they increase the precision of genetic correlation parameter estimates between traits; and (7) they provide greater power in hypothesis testing than UT models.
Although MT models have many advantages over UT models, they require the estimation of more parameters (i.e., genetic and error covariances), which affects the prediction performance of the MT models as well as the accuracy of breeding value estimates. The larger the number of traits, the larger the number of parameters that need to be estimated (Runcie et al. 2021). Also, the more complex the model and the larger the number of traits included, the greater the chances of facing convergence problems in the analysis (Runcie et al. 2021). This means that MT models require more data to accurately estimate the additional parameters (Okeke et al. 2017). The optimum training size depends upon the effective population size and the available genetic diversity within the population (Arojju et al. 2020). In general, results have shown that Bayesian MT methods have fewer convergence problems than frequentist MT methods (Montesinos-López et al. 2019b).
However, despite these seven advantages, most MT models are unable to capture complex nonlinear patterns of the inputs. For example, MT models with a linear predictor are unable to capture these complex nonlinear patterns (Cuevas et al. 2016, 2017); however, it is quite straightforward to use the machinery of linear models for nonlinear tasks using Reproducing Kernel Hilbert Space (RKHS) methods (Gianola and van Kaam 2008). The use of RKHS methods for UT analysis is very common in GS (Cuevas et al. 2016, 2017; Crawford et al. 2018). For example, Long et al. (2010) reported that RKHS methods outperformed linear models for body weight of broiler chickens. Crossa et al. (2010) reported better prediction performance of RKHS methods relative to linear Bayesian Lasso regression in wheat. In maize and wheat data, Cuevas et al. (2016, 2017, 2018, 2020) reported greater performance of RKHS with Gaussian kernels over linear GBLUP for several UT genomic predictions incorporating genomic × environment interaction. Cuevas et al. (2019) also reported that nonlinear kernel methods (Gaussian kernel and arc-cosine kernel) outperformed linear kernel methods in terms of prediction performance using markers, pedigree, and near-infrared spectroscopy data in the predictor.
The basic idea of RKHS methods is to project the original independent variables, given in a finite-dimensional vector space, into an infinite-dimensional Hilbert space (Gianola and van Kaam 2008). Kernel methods transform the independent variables (inputs) using a kernel function, and the transformed inputs can then be used repeatedly in conventional machine learning techniques at low computational cost, with better results in terms of prediction performance (Shawe-Taylor and Cristianini 2004). RKHS methods based on implicit transformations have become very popular for analyzing nonlinear patterns in datasets from various fields of study. Kernel methods obtain measures of similarity between objects that do not have a natural vector representation (Montesinos-López et al. 2021).
Due to its many attractive characteristics, the mixed-model framework under a frequentist approach is still very popular in GS for the implementation of MT models. However, the adoption of the Bayesian paradigm in plant breeding continues to grow thanks to great computational advancements and new methodological applications and elucidations. Bayesian MT models offer the following advantages, mentioned by Montesinos-López et al. (2019b): (1) they allow prior information to be incorporated; (2) unlike restricted maximum likelihood, they do not need good starting values to estimate parameters of interest; (3) they increase the precision of parameter estimates (smaller standard errors); (4) conclusions can be drawn about the correlations between the dependent variables, notably the extent to which the correlations depend on the individual and on the group level; (5) testing whether the effect of an explanatory variable on dependent variable Y1 is larger than its effect on Y2, when Y1 and Y2 were observed (totally or partially) in the same individuals, is possible only by means of a multivariate analysis; (6) a single test of the joint effect of an explanatory variable on several dependent variables also requires a multivariate analysis; such a single test can be useful, e.g., to avoid the danger of chance capitalization, which is inherent in carrying out a separate test for each dependent variable; and (7) they do not have strong identifiability problems. In general, the MT Bayesian approach has the advantage of being more parsimonious and providing a more informative and powerful analysis. However, Bayesian MT analysis is computationally more demanding than univariate analysis, and its implementation is therefore often impractical.
Furthermore, the implementation of conventional MT (frequentist and Bayesian) models is, in general, computationally demanding (Runcie et al. 2021). The fragility of these methods is due to the number of variance-covariance parameters that must be estimated, which increases quadratically with the number of traits (Runcie et al. 2021). The computational demands increase even more dramatically, cubically or worse, with the number of traits (Zhou and Stephens 2014), because most algorithms require repeated inversion of large covariance matrices. These matrix operations dominate the time required to fit conventional MT models, leading to models that take days, weeks, or even years to converge (Runcie et al. 2021).
In this study, we propose Bayesian kernel methods for the multitrait genome-enabled prediction of multienvironment trials. We applied the proposed methods to three extensive wheat multitrait multienvironment trial datasets and compared the prediction performance of four kernels, linear (GBLUP), Gaussian (GK), polynomial (PK), and sigmoid (SK), and of conventional Bayesian multitrait Ridge Regression (BRR), under two scenarios: Scenario 1, in which all traits are missing in the testing set (MT), and Scenario 2, in which only a fraction of the traits are missing in the testing set (MT_P). We also evaluated the prediction performance with and without genotype × environment interaction (G × E) under a multitrait framework. Finally, we also provide the R code to implement these methods in conventional Bayesian multitrait software.

Bayesian multitrait kernel model
This model is given in (1) as:

Y = 1_n μ^T + X_E β_E + Z_L g + Z_EL gE + ε,   (1)

where Y is the matrix of phenotypic response variables of order n × n_T, with n = JI, where J and I denote the number of lines and environments, respectively. Y is ordered first by environments and then by lines, and n_T denotes the number of traits. 1_n is a vector of ones of length n, and μ is a vector of intercepts for each trait of length n_T, that is, μ = [μ_1, ..., μ_{n_T}]^T, where ^T denotes the transpose of a vector or matrix. X_E is the design matrix of environments of order n × I, β_E is the matrix of beta coefficients for environments of dimension I × n_T, Z_L is the design matrix of lines of order n × J, and g is the matrix of random effects of lines of order J × n_T distributed as g ~ MN_{J×n_T}(0, K_l, R_T), that is, with a matrix-variate normal distribution with parameters M = 0, U = K_l, and V = R_T. K_l is the lth type of kernel matrix built with marker data (equivalent to a genomic relationship matrix) of order J × J that captures linear or nonlinear relationships (l = linear, Gaussian, polynomial, or sigmoid), and R_T is the variance-covariance matrix of traits of order n_T × n_T. Note that Z_L g contains the BLUPs of lines for the n_T traits, repeated across the I environments. Z_EL is the design matrix of the genotype × environment interaction of order n × JI, and gE is the matrix of genotype × environment interaction random effects distributed as gE ~ MN_{JI×n_T}(0, K_l ⊗ R_E, R_T), where R_E is a diagonal variance-covariance matrix of environments of order I × I, and K_l ⊗ R_E is the Kronecker product of the lth type of kernel matrix of lines and the environmental relationship matrix. The term Z_EL gE contains the BLUPs corresponding to the genotype × environment interaction terms of the n_T traits. ε is the residual matrix of dimension n × n_T distributed as ε ~ MN_{n×n_T}(0, I_IJ, R), where I_IJ is an identity matrix of order IJ and R is the residual variance-covariance matrix of order n_T × n_T.
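The matrix-variate normal assumption g ~ MN(0, K_l, R_T) is equivalent to vec(g) ~ N(0, R_T ⊗ K_l), so a draw can be generated from Cholesky factors of K_l and R_T. The following R sketch (not from the paper; toy dimensions and toy matrices) illustrates this:

```r
# Sketch: simulate line effects g ~ MN_{J x nT}(0, K, R_T) via
# g = L_K %*% Z %*% t(L_R), where K = L_K t(L_K), R_T = L_R t(L_R),
# and Z has iid N(0, 1) entries. All inputs here are toy values.
set.seed(1)
J  <- 5   # number of lines (toy)
nT <- 3   # number of traits (toy)

X <- matrix(rnorm(J * 10), J, 10)               # toy "marker" matrix
K <- tcrossprod(scale(X)) / 10 + diag(1e-6, J)  # toy PD kernel matrix
R_T <- matrix(0.5, nT, nT); diag(R_T) <- 1      # toy trait covariance

L_K <- t(chol(K))    # lower-triangular factor of K
L_R <- t(chol(R_T))  # lower-triangular factor of R_T
Z <- matrix(rnorm(J * nT), J, nT)
g <- L_K %*% Z %*% t(L_R)  # one draw from MN(0, K, R_T); dimension J x nT
```

The same factorization idea underlies the Gibbs sampler updates in Bayesian implementations, which is why a positive-definite kernel matrix is required.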
The criterion for choosing these four kernels (linear, Gaussian, polynomial, and sigmoid) was that they are very popular kernels in statistical science, and two of them (linear and Gaussian) in genomic prediction.

The kernel methods
The linear kernel (LK) was computed as K(x_i, x_j) = x_i^T x_j (Shawe-Taylor and Cristianini 2004). Since x_i^T and x_j^T are any two rows of the scaled matrix of markers (X, of order J × p) divided by the square root of the total number of markers (p), this is indeed the linear kernel relationship matrix proposed by VanRaden (2008) and called the Genomic Best Linear Unbiased Predictor (GBLUP). The polynomial kernel (PK) was computed as K(x_i, x_j) = (a x_i^T x_j + c)^d, where a is a real scalar, c = 1, and d = 3 is a positive integer (Shawe-Taylor and Cristianini 2004). The sigmoid kernel (SK) was computed as K(x_i, x_j) = tanh(a x_i^T x_j + c) (Shawe-Taylor and Cristianini 2004). The Gaussian kernel (GK), also known as the radial basis function kernel, was computed as K(x_i, x_j) = exp(−c ||x_i − x_j||^2), where c is a positive real scalar (Shawe-Taylor and Cristianini 2004); in this application, the parameter used was c = 1, assuming that the markers were scaled.
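The four kernel matrices above can be computed directly from the scaled marker matrix. A minimal R sketch follows; the marker matrix is a toy continuous matrix, and the scalar a = 1 is illustrative (the text fixes c = 1 and d = 3 but does not give a value for a):

```r
# Sketch of the four J x J kernel matrices from scaled markers.
set.seed(2)
J <- 6; p <- 50
M <- matrix(rnorm(J * p), J, p)  # toy marker matrix (continuous for simplicity)
X <- scale(M) / sqrt(p)          # scaled markers divided by sqrt(p)

XXt <- tcrossprod(X)             # matrix of cross-products x_i' x_j
a <- 1                           # illustrative value, not from the paper
K_linear  <- XXt                 # GBLUP / VanRaden-type relationship matrix
K_poly    <- (a * XXt + 1)^3     # (a x_i' x_j + c)^d with c = 1, d = 3
K_sigmoid <- tanh(a * XXt + 1)   # tanh(a x_i' x_j + c) with c = 1
D2 <- as.matrix(dist(X))^2       # squared Euclidean distances
K_gauss   <- exp(-1 * D2)        # exp(-c ||x_i - x_j||^2) with c = 1
```

Note that the Gaussian kernel always has ones on its diagonal, whereas the other three kernels do not in general.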

Computational implementation of the Bayesian multitrait kernel model
Note that when R_T, R_E, and R are diagonal matrices, model (1) is equivalent to separately fitting a univariate linear model to each trait. Also, when a linear kernel is used for K_l in model (1), the model is equivalent to a conventional multitrait GBLUP model. The Bayesian multitrait kernel model (1) can be implemented in the BGLR package of de los Campos and Pérez-Rodríguez (2014). The GitHub version of the BGLR R library can be accessed at https://github.com/gdlc/BGLR-R and can be installed directly in the R console by running the following commands: install.packages('devtools'); library(devtools); install_github('gdlc/BGLR-R'). First we need to have computed X_E, the design matrix of environments; Z_L, the design matrix of lines; K_l, any of the four kernels described above (l = linear, Gaussian, polynomial, or sigmoid); and the expanded kernels KL and KLE. Model (1) can then be fitted with the Multitrait function of this version of the BGLR package. The first argument of the Multitrait function is the response variable, a phenotype matrix in which each row corresponds to the measurements of the n_T traits for one individual. The second argument is a list predictor in which the first sub-list specifies the design matrix and prior model for the fixed-effects part of the predictor in model (1), while the second sub-list specifies the parameters of the distribution of the random genetic effects (g), where KL is the specified expanded genomic relationship matrix, which accounts for the similarity between individuals based on marker information. The third sub-list specifies the parameters of the distribution of the random genotype by environment effects gE, where KLE is the specified genomic relationship matrix that accounts for the similarity between individuals across environments. df0 = v_T and S0 = S_T are the degrees of freedom parameter (v_T) and the scale matrix parameter (S_T) of the inverse Wishart prior distribution for R_T, respectively.
In the third argument (resCov), S0 and df0 are the scale matrix parameter (S_R) and the degrees of freedom parameter (v_R) of the inverse Wishart prior distribution for R. The last two arguments are the required number of iterations (nI) and the burn-in period (nb) used to run the Gibbs sampler.
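Based on the argument descriptions above, a call to the Multitrait function might look like the following sketch. All inputs here are toy placeholders (Y, X_E, KL, KLE, the hyperparameters v_T, S_T, v_R, S_R, and the iteration counts nI and nb follow the text's notation), and the exact sub-list syntax should be verified against the BGLR documentation:

```r
# Hedged sketch of the Multitrait() call described in the text, with toy inputs.
set.seed(3)
n <- 8; nT <- 2
Y   <- matrix(rnorm(n * nT), n, nT)                   # toy phenotype matrix
X_E <- model.matrix(~ 0 + factor(rep(1:2, each = 4))) # toy environment design
KL  <- tcrossprod(matrix(rnorm(n * 5), n, 5)) / 5 +
       diag(1e-6, n)                                  # toy expanded kernel Z_L K_l Z_L'
KLE <- KL                                             # placeholder for Z_EL (K_l Kronecker R_E) Z_EL'
v_T <- 5; S_T <- diag(nT); v_R <- 5; S_R <- diag(nT)  # toy inverse Wishart hyperparameters
nI <- 1000; nb <- 500                                 # iterations and burn-in

ETA <- list(
  list(X = X_E, model = "FIXED"),                              # fixed environment effects
  list(K = KL,  model = "RKHS", Cov = list(df0 = v_T, S0 = S_T)),  # line effects g
  list(K = KLE, model = "RKHS", Cov = list(df0 = v_T, S0 = S_T))   # G x E effects gE
)
resCov <- list(df0 = v_R, S0 = S_R, type = "UN")      # unstructured residual covariance

# The fit itself requires the BGLR package; guard so the sketch runs without it.
if (requireNamespace("BGLR", quietly = TRUE)) {
  fm <- BGLR::Multitrait(y = Y, ETA = ETA, resCov = resCov,
                         nIter = nI, burnIn = nb)
}
```

Dropping the third sub-list of ETA gives the model without the G × E interaction term.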
Datasets 1-3: elite wheat yield trial years 2013-2014, 2014-2015, and 2015-2016 These three datasets were collected by the Global Wheat Program (GWP) of the International Maize and Wheat Improvement Center (CIMMYT) and belong to elite yield trials (EYT) established in four different cropping seasons with four or five environments each. The lines involved in each of the environments of the same year are the same, but those in different years are different lines. EYT dataset 1 was sown in 2013-2014 and contains 767 lines, EYT dataset 2 was established in 2014-2015 and contains 775 lines and EYT dataset 3 was cultivated in 2015-2016 and contains 964 lines. The experimental design used was an alpha-lattice design and the lines were sown in 39 trials, each covering 28 lines and two checks in six blocks with three replications. In each dataset, several traits were available for some environments and lines. In this study we included four traits that were measured for each line in each environment: days to heading (DTHD, number of days from germination to 50% spike emergence), days to maturity (DTMT, number of days from germination to 50% physiological maturity or the loss of the green color in 50% of the spikes), plant height, and grain yield (GY). Full details of the experimental design and how the BLUEs were computed are given in Juliana et al. (2018).
Genome-wide markers for the 2506 (767 + 775 + 964) lines in the three datasets were obtained using genotyping-by-sequencing (GBS; Elshire et al. 2011; Poland et al. 2012) at Kansas State University using an Illumina HiSeq2500. After filtering, 2038 markers were retained from an initial set of 34,900 markers. The imputation of missing marker data was carried out using LinkImpute (Money et al. 2015), implemented in TASSEL (Bradbury et al. 2007), version 5. Lines with over 50% missing data were removed, and 2506 lines were used in this study (767 lines in the first dataset, 775 lines in the second dataset, and 964 lines in the third dataset). A high level of relatedness, given by pedigree or kinship, is also expected between lines within a year of testing and across years of testing due to the nature of the lines under study.

Evaluation of prediction accuracy with random cross-validation
The prediction accuracy of the Bayesian multitrait kernel model was evaluated with cross-validation (CV). A fivefold CV was implemented: the original dataset was partitioned into five subsamples of equal size, and each time four of them were used for training and the remaining one for testing. In fivefold CV, one observation cannot appear in more than one fold. In the design, some lines can be evaluated in some, but not all, target environments, which mimics a prediction problem faced by breeders in incomplete field trials. Our validation strategy is exactly the same as the strategy denoted as CV2, proposed and implemented by Jarquín et al. (2014), in which a certain portion of test lines (genotypes) in a certain portion of test environments is predicted, since some test lines that were evaluated in some test environments are assumed to be missing in others.
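The fold assignment described above can be sketched in a few lines of R; this is a generic fivefold partition (toy size, not the paper's data) in which each observation falls in exactly one fold:

```r
# Sketch of a balanced fivefold random partition for cross-validation.
set.seed(4)
n <- 20                                      # toy number of observations
folds <- sample(rep(1:5, length.out = n))    # random, balanced fold labels

# Training/testing split for fold k:
k <- 1
test_idx  <- which(folds == k)               # held-out observations
train_idx <- which(folds != k)               # observations used for fitting
```

In the CV2 setting, the held-out entries correspond to line-environment combinations, so a line can appear in the training set for some environments while being predicted in others.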
We used the mean square error of prediction, MSE = (1/T) * sum_{i=1}^{T} (y_i − f̂(x_i))^2, where y_i is the observed value of the ith observation, f̂(x_i) is the prediction that f̂ gives to the ith observation, and T is the number of observations in the testing set, to evaluate the prediction performance, since we are working with continuous variables; the MSE was calculated for each environment and each trait in each of the testing sets. This formula was used to compute the MSE in each fold, and the average over all folds was reported as the measure of genome-based prediction performance. The lower the average MSE, the better the prediction performance. All the analyses were carried out using the R statistical software (R Core Team 2020).
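The per-environment MSE computation described above can be sketched as follows (illustrative toy values, not the paper's data):

```r
# MSE = (1/T) * sum((y_i - yhat_i)^2), computed within each environment.
mse <- function(y, yhat) mean((y - yhat)^2)

# Toy observed/predicted values with an environment label:
env  <- rep(c("E1", "E2"), each = 3)
y    <- c(1, 2, 3, 4, 5, 6)
yhat <- c(1, 2, 4, 4, 4, 6)
mse_by_env <- tapply(seq_along(y), env, function(i) mse(y[i], yhat[i]))
# Here each environment has one unit-sized error among three observations,
# so both per-environment MSEs equal 1/3.
```

In the paper's setting this is computed per trait and per testing fold, and the fold averages are the reported values.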

Results
The results are given in two sections that correspond to datasets 1 and 2. For each dataset, genome-based prediction performance was assessed both without and with G × E interaction. Both cases are provided under the following scenarios: (1) when all the traits in the testing set are predicted (standard MT method) and (2) when only a fraction of the traits in the testing sets is predicted (MT_P). Two traits were considered: DTHD and DTMT. For simplicity and clarity, results from dataset 3 are provided in Appendix A, where genome-based predictions measured by the MSE of prediction, without and with G × E interaction, are described under the two scenarios, MT and MT_P.
Results are presented for each trait, including (I) and ignoring (WI) the G × E interaction, for each of the scenarios MT and MT_P, in the form of tables and figures for each environment (of each dataset) and across environments.

DTHD (without G × E interaction, WI)
We first compared the prediction performance for trait DTHD in terms of MSE (Figure 1A, WI, and Table 1) without G × E interaction under conventional multitrait Bayesian Ridge Regression (BRR) and four types of kernels [linear GBLUP, Gaussian (GK), polynomial (PK), and sigmoid (SK)], when all traits in the testing set are predicted (MT) and when only a fraction of the traits is predicted (MT_P). In Figure 1A, WI, and Table 1, under both scenarios (MT and MT_P), the best performance for most of the four environments was observed under the multitrait GK and the worst under the multitrait SK. In environment EHT under scenario MT_P, the predictions were considerably better than under scenario MT, while in environment LHT, scenario MT was slightly better than scenario MT_P (Table 1 and Figure 1A, WI). Across environments, multitrait GK was always better than the other kernels for MT and MT_P (Figure 2A, WI, and Table 2). For the MT predictions, the GK outperformed the BRR, GBLUP, PK, and SK by 7.012%, 6.76%, 0.928%, and 48.8%, respectively, while across environments for the MT_P predictions, the GK outperformed the BRR, GBLUP, PK, and SK by 6.17%, 6.19%, 1.32%, and 44.64%, respectively. Under Scenario 2 (MT_P), the genome-based predictions were slightly better than under scenario MT.
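The paper does not spell out how the percentage gains above are computed; a standard definition consistent with such comparisons is the relative MSE reduction of the best method with respect to a competitor, sketched here with illustrative numbers (not the paper's):

```r
# Illustrative only: percent by which a reference method (e.g., GK)
# outperforms a competitor in MSE, as 100 * (MSE_other - MSE_ref) / MSE_ref.
pct_outperform <- function(mse_other, mse_ref) {
  100 * (mse_other - mse_ref) / mse_ref
}
pct_outperform(mse_other = 10.7, mse_ref = 10)  # reference is better by 7%
```

Under this definition, a negative value (as reported for SK in some comparisons below) means the reference method was actually worse than the competitor.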

DTHD (G × E interaction, I)
Taking into account the G × E interaction term, we again see that the worst performance was observed under the SK in both scenarios (MT and MT_P; Figure 1B, I, and Table 1). The best performance was observed under the GK under MT_P in environments Bed5IR and EHT, and under BRR and GBLUP in environments Flat5IR and LHT. Large differences were not observed between the predictions without G × E interaction (Figure 1A, WI) and with G × E interaction (Figure 1B, I).

DTMT (without G × E, WI)
The prediction performance for trait DTMT is provided in terms of MSE (Figure 3A, WI, and Table 1) under conventional multitrait Ridge regression (BRR) and the four types of kernels (GBLUP, GK, PK, and SK) under the same two scenarios (MT and MT_P). In Figure 3A, WI, and Table 1, it is observed that, ignoring the G × E interaction term, the worst performance under both scenarios (MT and MT_P) was for SK, while the best performance was for the GK method in all environments except under MT_P in Flat5IR (MSE = 9.66). The SK was considerably worse than the other methods under both scenarios (Figure 3A, WI). In environment LHT, scenario MT was slightly better than MT_P (Figure 3A, WI, and Table 1). Across environments, under scenario MT, the GK was better than BRR, GBLUP, PK, and SK by 5.23%, 5.06%, 0.57%, and 42.34%, respectively, while under MT_P, the GK outperformed the BRR, GBLUP, PK, and SK by 5.10%, 5.23%, 1.02%, and 40.14%, respectively (Figure 2B, WI, and Table 2). The genome-based predictions under MT_P were better than under MT (Figure 2B, WI, and Table 2).

DTMT (G × E, I)
Considering the G × E interaction term, we also see that the worst performance was observed under the SK in both scenarios (MT and MT_P; Figure 3B, I, and Table 1). The best performance was observed under the GK in environments Bed5IR and EHT, and under BRR and GBLUP in environments Flat5IR and LHT. Large differences were not observed between the predictions without G × E interaction (Figure 3A, WI) and with G × E interaction (Figure 3B, I).
For the trait DTMT across-environment analyses, taking the G × E interaction into account, the worst performance under MT and MT_P was observed under the SK, and in general, scenario MT_P was better than MT (Figure 2B, I, and Table 2). Under MT predictions across environments, the GK was superior in genomic-enabled prediction accuracy to BRR, GBLUP, PK, and SK by 9.90%, 8.43%, 4.76%, and 71.37%, respectively, whereas for MT_P, the GK was better than BRR, GBLUP, PK, and SK by 9.98%, 8.31%, 3.97%, and 65.25%, respectively (Figure 2B, I, and Table 2). As for trait DTHD, there was a slight, consistent increase in genome-based prediction accuracy when including G × E (Figure 2B, I) compared to when ignoring G × E (Figure 2B, WI), and for Scenario 2 (MT_P) over scenario MT (Table 2).

Summary of results for dataset 1
The nonlinear multitrait Gaussian kernel showed the best genome-based prediction accuracies in most of the environments for both traits, DTHD and DTMT, whereas the sigmoid kernel (SK) gave the worst predictions. Consistently across the four kernel methods (linear GBLUP, GK, PK, and SK), the model including G × E gave lower MSE than the models ignoring G × E, whereas the scenario in which all the traits are predicted (MT) gave slightly worse prediction accuracy than the scenario in which only a fraction of the traits in the testing sets is predicted (MT_P). Although these patterns are expressed in most (but not all) of the environments, the across-environment analyses of Table 2 and Figure 2 clearly display these conclusions.

Dataset 2 (EYT 2014-2015)

DTHD (without G × E, WI)
We first compared the prediction performance of the five methods (Figure 4A, WI, and Table 3) under the MT and MT_P scenarios when ignoring G × E (WI). The best performance was observed under the GK, and the worst under the SK. The SK was also considerably worse than the other methods under both MT and MT_P (Figure 4A, WI). Figure 4A, WI, and Table 3 also show that the worst predictions under both MT and MT_P scenarios were in environment EHT, whereas the best were in environment Bed2IR. In all environments, MT_P slightly outperformed MT (Figure 4A, WI).

DTHD (G × E, I)
When the G × E interaction term was taken into account for trait DTHD (Figure 4B, I, and Table 3), the best prediction performance under MT occurred under the GK, PK, and GBLUP kernels, but we found differences in the prediction performance of the five methods between environments, since the worst predictions were observed in environment EHT and the best in environment LHT. For this trait, the worst predictions were observed for SK. Under MT_P, the best model was GK (with GBLUP being the best only for Flat5IR). The sigmoid kernel (SK) was also the worst under both scenarios when the G × E interaction term was considered. The best performance was observed in environments LHT and EHT under the GK, in environments Bed5IR and Bed2IR with PK, and in Flat5IR under GBLUP. No large differences were found in predictions without (Figure 4A) and with (Figure 4B) the G × E interaction term.

[Table 3: Average mean squared error (MSE) of prediction for five multitrait multienvironment model-methods: BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased predictor; GK, Gaussian kernel; PK, polynomial kernel; SK, sigmoid kernel; without G × E (WI) and with G × E (I) for two scenarios (MT and MT_P), environments Bed2IR, Bed5IR, EHT, Flat5IR, and LHT, and two traits (DTHD, days to heading, and DTMT, days to maturity). Boldface indicates the model-method with the lowest MSE for the environment.]
Across environments, MT_P was slightly better than the MT scenario (Figure 5A, I, and Table 4). For MT across environments, the GK method had better prediction accuracy than BRR, GBLUP, PK, and SK by 16.67%, 15.95%, 0.716%, and 110.91%, respectively, while for MT_P predictions, the GK method outperformed the BRR, GBLUP, PK, and SK by 17.45%, 16.97%, 2.87%, and 109.22%, respectively. As previously found, including G × E improved the genome-based prediction accuracy compared to ignoring the interaction term, and MT_P had better prediction accuracy than MT.

DTMT (without G × E, WI)

Figure 6A, WI, and Table 3 show the results of the five methods under both scenarios in terms of MSE without the G × E interaction term for trait DTMT. The worst performance under both scenarios was observed using the sigmoid kernel (Figure 6A, WI). In general, under MT and MT_P, GK was slightly better than the other four methods. For this trait we found no differences between MT and MT_P (Figure 6A, WI, and Table 3).

DTMT (G × E, I)
For trait DTMT, when the G × E interaction term was taken into account (Figure 6B, I, and Table 3), the best prediction performance under both MT and MT_P was obtained under the GK, but we found differences in the prediction performance of the five methods between environments, since the worst predictions were observed in environment EHT and the best in environment LHT. For this trait, the worst predictions were observed for SK, which was again the worst under both scenarios. Across environments, and in each individual environment, scenario MT_P slightly outperformed MT (Figure 5B, I, and Table 4). Under scenario MT across environments, the GK was better than BRR, GBLUP, PK, and SK by 11.05%, 10.94%, 3.87%, and 77.14%, respectively, while under MT_P, the GK method outperformed the BRR, GBLUP, PK, and SK by 10.07%, 10.44%, 4.35%, and 72.65%, respectively (Figure 5B, I, and Table 4).

[Table 4: Average mean squared error (MSE) of prediction across environments for five model-methods: BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased predictor; GK, Gaussian kernel; PK, polynomial kernel; SK, sigmoid kernel; without G × E (WI) and with G × E (I) for two scenarios (MT and MT_P) and two traits (DTHD, days to heading, and DTMT, days to maturity). Boldface indicates the model-method with the lowest MSE for the scenario.]

Summary of results for dataset 2
Results for dataset 2 were similar to those obtained for dataset 1. The nonlinear multitrait Gaussian kernel had the best genome-based prediction accuracies in most environments for both traits (DTHD and DTMT), while the sigmoid kernel (SK) produced the worst predictions. For the four kernels, the model including G × E and the scenario including MT_P gave better predictions than the model ignoring G × E and/or including all the traits (MT). These patterns are shown in Table 4 and Figure 5.

Dataset 3 (EYT 2014-2015)
Details of the results are given in Figures A1 (A and B), A2 (A and B), and A3 (A and B) and Tables A1 and A2. In dataset 3 under the MT scenario, the models with the G × E interaction outperformed the models that did not include the G × E interaction by 20.30% (BRR), 20.42% (GBLUP), 32.77% (GK), 29.8% (PK), and −0.1% (SK), while under MT_P, the outperformance was 18.82% (BRR), 19.40% (GBLUP), 31.82% (GK), 29.27% (PK), and −0.6% (SK). In general, the GK was the best genome-based prediction method, together with the model that included the G × E interaction. Further details of the results are given in Appendix A.

With and without G × E interaction
In general terms, we observed the best predictions when the G × E interaction term was taken into account, although the superiority over ignoring the G × E interaction ranged from slight to large. In dataset 1 across environments and traits, under the MT scenario, the models with G × E interaction outperformed the models without G × E interaction by 17.51% (BRR), 18.95% (GBLUP), 21.80% (GK), 15.81% (PK), and 1.4% (SK), while under the MT_P scenario, the outperformance of the models with G × E interaction over those ignoring it was 14.54% (BRR), 16.47% (GBLUP), 19.30% (GK), 14.91% (PK), and 0.9% (SK). In dataset 2 across environments and traits, the outperformance of the models with the G × E interaction over those that ignored it was 15.83% (BRR), 16.14% (GBLUP), 27.05% (GK), 24.86% (PK), and −1.2% (SK) under scenario MT, while under scenario MT_P, the superiority was 16.20% (BRR), 16.27% (GBLUP), 27.35% (GK), 24.06% (PK), and −0.6% (SK). Finally, in dataset 3 under the MT scenario, the models with the G × E interaction outperformed the models without it by 20.30% (BRR), 20.42% (GBLUP), 32.77% (GK), 29.8% (PK), and −0.1% (SK), while under MT_P, the outperformance was 18.82% (BRR), 19.40% (GBLUP), 31.82% (GK), 29.27% (PK), and −0.6% (SK). Note that we only report the results for traits DTHD and DTMT, since we did not observe an improvement of the MT model over the UT model when predicting the other two traits (plant height and GY). This could be because the two maturity traits (DTHD and DTMT) are highly genetically correlated (with genetic correlations of 0.985, 0.974, and 0.983 in datasets 1, 2, and 3, respectively) and also show relatively little genotype × environment interaction.
Due to the high genetic correlation between these traits, the relative advantage of multivariate approaches is greater than it would be for traits with lower genetic correlations (e.g., plant height and GY). The fact that we did not observe an increase in prediction performance for plant height and GY is not surprising since, as pointed out by one reviewer, this model should work only for some traits, because each trait has a different structure. It is important to point out that our results agree, in terms of the outperformance over non-kernel methods, with those obtained in the context of univariate kernel methods (Cuevas et al. 2016, 2017, 2018, 2019).

Kernel differences
Under scenarios with and without the G × E interaction, the kernel that generally provided the best performance was the GK, which outperformed the other kernels by between 0.258% and 110.91%, while the worst performance was observed with the SK kernel. In part, these results may be due to the lack of an efficient tuning strategy for the hyperparameters of each kernel. They may also be due to the type of nonlinear patterns in the datasets, the size of the data, and the nature of the sigmoid kernel function itself. Also, in general, the GK outperformed the popular GBLUP and BRR models by between 2.22% and 17.45%. Even though this superiority is not considerably large, it is a small further step toward improving the GS methodology. We did not apply a significance test to prove that there are significant differences in performance between the GK and the conventional methods (GBLUP and BRR), but we inspected the plots; since the confidence intervals of the conventional methods (GBLUP and BRR) and the GK overlap, we can say that the observed differences are significant only in some cases. In the three datasets evaluated, the GK was always the best genome-based prediction kernel.
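For reference, the four kernels compared here can be written down in a few lines. The sketch below is illustrative only, not the authors' implementation: the marker matrix X, the median-distance scaling of the Gaussian bandwidth, and the default gamma = 1/p for the polynomial and sigmoid kernels are assumptions made for the example.

```python
import numpy as np

def linear_kernel(X):
    """Linear (GBLUP-like) kernel: K = XX'/p for an n x p marker matrix."""
    return X @ X.T / X.shape[1]

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel K_ij = exp(-h * d_ij^2 / median(d^2));
    median scaling is one common bandwidth heuristic."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against negative round-off
    med = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / med)

def polynomial_kernel(X, gamma=None, c=1.0, degree=2):
    """Polynomial kernel K_ij = (gamma * x_i'x_j + c)^degree."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    return (gamma * (X @ X.T) + c) ** degree

def sigmoid_kernel(X, gamma=None, c=0.0):
    """Sigmoid kernel K_ij = tanh(gamma * x_i'x_j + c);
    note it is not guaranteed to be positive semidefinite."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    return np.tanh(gamma * (X @ X.T) + c)
```

The bandwidth h, the polynomial degree and offset, and the sigmoid parameters are exactly the hyperparameters whose tuning (or lack of an efficient tuning strategy) is discussed above.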

General issues
Kernel methods are powerful tools for improving prediction performance, since they help to capture complex patterns in the data. They also offer flexibility, since they can be implemented in a two-step process using conventional statistical machine learning algorithms: in the first stage, the kernels are computed, and in the second stage, those kernels are used in conventional linear algorithms. However, although there is empirical evidence that these methods improve prediction performance in GS under a univariate prediction framework, there were still no generalizations and applications for the multitrait framework. The models/methods used in this study, when applied to multitrait multienvironment data from the three datasets, showed consistent improvement in prediction performance, mainly with the GK kernel.
For this reason, in this research we proposed a Bayesian multitrait kernel method to capture nonlinear patterns in the input data under a multitrait framework. The method uses a conventional Bayesian multitrait model but, instead of being restricted to a linear kernel, allows many types of kernels, such as polynomial, Gaussian, and sigmoid kernels. Although in the present paper only four kernels were evaluated, including the linear kernel, other types of kernels can be considered. This is possible because the implementation of the Bayesian multitrait kernel method is a two-step process: the kernel is computed in the first stage, and in the second stage the computed kernel replaces the linear kernel of the Bayesian multitrait model. For this reason, we also do not expect significant differences in implementation time between the proposed kernels and the conventional GBLUP model, since the number of parameters to estimate is the same for the proposed kernel methods and the GBLUP method.
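The two-step logic can be sketched end to end. The sketch below is a minimal frequentist stand-in, not the paper's Bayesian model: the toy marker matrix, the median-scaled Gaussian bandwidth, and the single ridge parameter lam are assumptions, and a kernel-ridge solve with a multi-column response replaces the Bayesian multitrait machinery (which additionally estimates genetic and residual trait covariances). The point it illustrates is that only step 1 changes between kernels; step 2 is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_traits = 60, 200, 2
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))             # toy marker matrix (hypothetical data)
B = rng.normal(size=(p, n_traits))
Y = X @ B / np.sqrt(p) + rng.normal(size=(n, n_traits))  # two correlated toy traits

# Step 1: compute the kernel once. Swapping in the linear kernel
# X @ X.T / p here would reproduce a GBLUP-type model.
sq = np.sum(X**2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
K = np.exp(-d2 / np.median(d2[d2 > 0]))                  # Gaussian kernel, median-scaled bandwidth

# Step 2: plug the kernel into a standard solver with a multi-column
# response (one column per trait); predictions for all traits come
# from a single solve.
train, test = np.arange(50), np.arange(50, 60)
lam = 1.0
alpha = np.linalg.solve(K[np.ix_(train, train)] + lam * np.eye(train.size), Y[train])
Y_hat = K[np.ix_(test, train)] @ alpha                   # 10 x 2 matrix of predictions
```

Replacing the Gaussian kernel in step 1 with a polynomial or sigmoid kernel leaves step 2 untouched, which is why the parameter count, and hence the fitting time, matches that of GBLUP.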
Our results show that implementing the Bayesian multitrait kernel model improves prediction performance relative to conventional linear multitrait methods, since the Gaussian kernel outperformed the conventional methods (Ridge regression and GBLUP) by between 5.06% and 10.35% (dataset 1), between 2.53% and 17.45% (dataset 2), and between 2.22% and 16.39% (dataset 3), and did so in all three datasets. The proposed method can be implemented with conventional mixed multitrait models because only a two-step process is required. It is important to point out that we do not expect the proposed method to outperform the conventional multitrait model in all datasets, since not all datasets are expected to have complex patterns in their input; however, in datasets with complex nonlinear patterns in the input, the proposed method is expected to improve prediction performance. The small superiority of the MT model over the UT model could be due, in part, to the small number of markers rather than to the strong correlation of the traits. These results, although not strong for improving GS genome-based prediction accuracy, represent a step in the right direction.
Another advantage of the Bayesian multitrait kernel methods is that they can significantly reduce the computational resources needed in comparison with Ridge regression multitrait models, since instead of directly using the inputs (independent variables), they use a transformed input that usually has a lower dimension than the number of inputs. However, as with all kernel methods, due to this transformation of the input, the estimates of the beta coefficients are not interpretable as in conventional regression methods; for this reason, these methods do not help to further understand the complex relationship between input and output, and it is important to avoid false expectations about these methods in terms of interpretability (Montesinos-López et al. 2021). Finally, as one reviewer pointed out, the implementation of the multitrait kernel method proposed here is straightforward when the dataset is balanced in the response variable (no missing data) and in the environments, and more complicated when the data are not balanced, but the method still works provided the imbalance is handled appropriately. Also, it is important to point out that the phenotypic correlations between environments did not negatively impact the prediction performance of the proposed method, since all the phenotypic correlations between environments are positive for all traits (Cuevas et al. 2016; see Appendix C).
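The computational claim above has a standard algebraic basis: ridge regression can be solved in its dual form through the n × n linear kernel K = XX' instead of the p × p marker cross-product, which is much cheaper when markers far outnumber lines. A small sketch with simulated data (the dimensions and ridge parameter are arbitrary choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 2000                       # far more markers than lines
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam = 2.0

# Primal ridge: solves a p x p system (2000 x 2000 here).
beta_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Dual (kernel) ridge: solves only an n x n system (40 x 40) built
# from the unscaled linear kernel K = XX'.
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(n), y)
beta_dual = X.T @ alpha               # recovers the same coefficients

assert np.allclose(beta_primal, beta_dual)
```

For a nonlinear kernel, the beta coefficients can no longer be recovered this way; only alpha, attached to the transformed input, is available, which is exactly the interpretability loss described above.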
Some limitations of the proposed Bayesian multitrait kernel methods are: (1) it is more difficult to tune the hyperparameters of the kernels than in UT kernel methods; (2) negative phenotypic correlations between environments can negatively affect the prediction performance, as stated by Cuevas et al. (2016); and (3) as in UT kernel methods, the beta coefficients resulting from multitrait kernel methods are not interpretable as in conventional linear regression methods, although there is ongoing research to allow variable selection with kernel methods (Crawford et al. 2018).

Conclusions
The proposed Bayesian multitrait kernel method is an attractive and novel approach to capture complex nonlinear patterns in multitrait data that helps take advantage of the correlation between traits. We found that the proposed MT kernel method outperformed the prediction performance of conventional Bayesian multitrait models. However, out of the four nonlinear kernels evaluated, we found that the best performance was obtained using the Gaussian kernel, and the worst, using the sigmoid kernel. In addition, we pointed out that the proposed methods can be implemented in conventional software for Bayesian multitrait models but require a two-step process. In the first step, the kernels are built, and in the second step, those kernels replace the genomic relationship matrices in the multitrait models. Additionally, we provided the data and the R code used in such a way that other scientists can implement this model with their own data.

Data availability
Phenotypic and genomic data for the three datasets are available at the following link: https://hdl.handle.net/11529/10548629.

DTHD (without G × E, WI)
Figure A1A and Table A1 show the results of the five methods for trait DTHD without including G × E for both scenarios (MT and MT_P). Under both scenarios, we notice that in terms of MSE, the best performance was observed with the Gaussian kernel (GK) and the worst with the sigmoid kernel (SK). With the exception of the SK, the other three methods (BRR, GBLUP, and PK) were only slightly worse than the GK under both scenarios (Figure A1A, WI). In environment FlatDrip, MT outperformed MT_P by a sizeable amount. Under MT scenario predictions across environments, the GK method outperformed the BRR, GBLUP, PK, and SK by 4.95%, 4.93%, 0.34%, and 72.68%, respectively, while under the MT_P scenario, the GK method outperformed the BRR, GBLUP, PK, and SK by 5.13%, 5.18%, 0.46%, and 72.03%, respectively (Figure A2A, WI, and Table A2). Scenario MT_P gave a slight and consistent increase in prediction accuracy over the MT scenario.

DTHD (G × E, I)
Considering the model with the G × E interaction term, the sigmoid kernel (SK) was the worst under both MT and MT_P scenarios. The Gaussian kernel (GK) was the best under both scenarios in all environments (Figure A1B, I, and Table A1).
Under MT predictions across environments, the GK method outperformed the BRR, GBLUP, PK, and SK by 15.58%, 14.60%, 3.41%, and 137.54%, respectively, while under MT_P predictions, the GK method outperformed the BRR, GBLUP, PK, and SK by 16.39%, 15.35%, 3.25%, and 135.61%, respectively (Figure A2A, I, and Table A2). Including the G × E term increases the genome-based prediction accuracy, as shown in Figure A2A, WI and I, and Table A2.

DTMT (without G × E, WI)
For trait DTMT, Figure A3A, WI, and Table A1 show the results of the five methods in terms of MSE under both scenarios. Here, too, the sigmoid kernel (SK) had the worst prediction performance without the interaction term under MT and MT_P. We observed that the GK was slightly better than the other methods and considerably better than the SK. Between MT and MT_P, only small differences were observed. The worst predictions (Figure A3A, WI) under both scenarios were observed in environments FlatDrip and Flat5IR and the best in environment Bed2IR for MT and MT_P.

DTMT (G × E, I)
With the G × E interaction (Figure A3B and Table A1), the best prediction performances were found under the GK for trait DTMT, but we did not find large differences in the prediction performance of the other four methods. However, between environments we found significant differences: the worst predictions were observed in environment FlatDrip and the best in environment Bed2IR. In all methods and scenarios, the worst predictions were observed under the SK method.
Also, the predictions under MT_P across environments were slightly better than under MT (Table A2) and clearly superior in environment FlatDrip (Figure A3B). Under scenario MT predictions across environments, the GK method outperformed the BRR,

Figure A1 Dataset 3, DTHD. Prediction performance in terms of mean square error of prediction (MSE) for five methods (BRR, GBLUP, GK, PK, and SK) (A) without G × E interaction (WI) and (B) including G × E interaction (I) for five environments (Bed2IR, Bed5IR, EHT, Flat5IR, and LHT) and two scenarios (MT and MT_P).
Implementation of the models using BGLR

load('Pheno.RData', verbose = TRUE)  ### Pheno contains at least four
### columns: Lines, Environment (Env), and at least two columns of the
### response variables (Y)
######### Compute the design matrix of lines and environments
nt = ncol(Pheno) - 2  #### number of traits under study
XE = model.matrix(~0 + as.factor(Env), data = Pheno)

Appendix C: Phenotypic correlations of the three datasets