Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.

Plant breeding programs aim to improve multiple traits and develop new, higher yielding varieties that are disease and drought resistant and regionally adapted to different environments and growing conditions. However, to reach that challenging goal, plant breeders have been looking for new tools to improve the selection of candidate genotypes in early stages. Genomic selection (GS) proposed by Meuwissen et al. (2001) is one of the most powerful tools used by breeders and is now revolutionizing plant breeding. One of the fundamental features of GS is the use of high-density markers. Rather than seeking to identify individual loci that are significantly associated with a trait, GS uses all marker data simultaneously as performance predictors of a line, and this consequently delivers predictions that are more accurate.
Selection under this approach is based on GS predictions, potentially leading to more rapid genetic gains at a lower cost. Also, GS allows marker data to be combined with phenotypic and pedigree data (when available) in an attempt to increase the accuracy of the prediction of breeding and genotypic values (Goddard and Hayes 2007). Although there is empirical evidence showing that GS is a powerful tool for selecting candidate genotypes in early stages, there is still the need for novel statistical models for selecting multiple correlated traits evaluated in multiple environments. Recent examples of publications on this type of models for multiple traits and multiple environments are those of Montesinos-López et al. (2016 for continuous and count data. However, although these proposed models are mathematically elegant, they are computational very limited; as the number of genotypes, environments, and traits increases, these models cannot be implemented because it is extremely difficult to sample from large multivariate normal distributions. For the abovementioned reasons, statisticians and plant scientists are open and looking for better prediction alternatives in the context of data for multiple traits and multiple environments. However, the need to increase prediction accuracy is not exclusive to GS; it is present in many other areas like marketing, e-commerce, finance, and biology. Each area has made a significant effort to increase predictions; however, in general, few cases have been successful. This reminds us that making predictions is usually very challenging since the target inference is to predict unobserved quantities at the time of the inferences. In this article, we will explore two methods that are very popular in recommender systems. A recommender system is a subclass of an information filtering system that seeks to predict the "rating" or "preference" that a user would give to an item (Ricci et al. 2011). Recommender systems have become increasingly popular in recent years, and are utilized in a variety of areas including movies, music, news, books, research articles, search queries, social tags, and products in general. In this research, we will implement the following two recommender systems methods in the context of GS: item-based collaborative filtering (IBCF) and the matrix factorization method. The first method (IBCF) belongs to the collaborative filtering techniques, which are a set of prediction methods that have been significantly successful in building recommender systems in various settings (e.g., marketing, e-commerce, etc.). The basis of this technique is to use the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users.
The second method (matrix factorization) belongs to the latent factor models, and tries to explain the ratings by characterizing both items and users on a reduced number of factors that approximate the rating matrix. The matrix factorization method is considered one of the most successful of the latent factor models. In its basic form, matrix factorization characterizes both items and users by vectors of factors inferred from item rating patterns. High correspondence between item and user factors leads to a recommendation. These methods have become popular in recent years by combining good scalability with predictive accuracy. In addition, they offer much flexibility for modeling various real-life situations.
The two recommender systems mentioned above (IBCF and matrix factorization) work by building a matrix of preferences (called a rating matrix) where each row represents a user, each column represents an item, and the number at the intersection of a row and a column represents the user's rating value. The absence of a rating score at this intersection indicates that that user has not yet rated the item. We will use simulated and real data to compute the prediction accuracy of both recommender systems (IBCF and matrix factorization) and we will compare their predictions with those produced by two conventional GS models that take into account the genotype · environment interaction term.

IBCF
As mentioned above, the IBCF technique is a model-based algorithm for recommender items or products. In IBCF, similarities between items are calculated from a rating matrix (Table 1); based upon those similarities, a user's preference for an item that has not been rated by a user is calculated. In general, Table 1 can be constructed for n users and m items. Here, we present a step-by-step example of IBCF with four users and three items. For example, let us assume that we have the rating matrix shown in Table 1. With this information, we create an itemto-item similarity matrix (using the cosine similarity ¼ cosðuÞ ¼ P n j¼1 x j y j = ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P n j¼1 x 2 j q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P n j¼1 y 2 j q or Pearson correlation), which provides information on how similar an item is to another item. The complete cosine similarity matrix between items for this example is shown in Table 2. We then predict each user's ratings for items that have not previously been rated. In this example, we will calculate the rating for user u1 in item I2; along with user u2 in item I3 and for user u4 in item I1: Each of these predictions can be calculated using a simple weighted average to predict the rating, P i;j ; for user i in item j ; as follows (Sarwar et al. 2001): where the summation is over all other rated items (jeN) for user i; w j;j is the weight between items j and j ; and y i;j is the rating for user i on item j: Therefore, the predicted rating for item I2 for user u1 will be P 1;2 ¼ ðy 1;1 w 1;2 þ y 1;3 w 3;2 Þ=ð w 1;2 þ w 3;2 Þ ¼ ð2 · 0:76 þ 3 · 0:86Þ= ð0:76 þ 0:86Þ ¼ 2:53; while the predicted rating for item I3 for user u2 will be P 2;3 ¼ ðy 2;1 w 1;3 þ y 2;2 w 2;3 Þ=ð w 1;3 þ w 2;3 Þ ¼ ð5 · 0:78þ 2 · 0:86 Þ= ð0:78þ 0:86Þ ¼ 3:43; and the predicted rating for item I1 for user u4 will be P 4;1 ¼ ðy 4;2 w 2;1 þ y 4;3 w 3;1 Þ=ð w 2;1 þ w 3;1 Þ ¼ ð2 · 0:76 þ 2 · 0:78Þ=ð0:76 þ 0:78Þ ¼ 2: We provide the R code for the IBCF in Appendix A1.
It is important to point out that when calculating the similarities between users instead of between items, based upon these similarities, a user's preference for an item that he/she has not previously rated is calculated. This method is called user-based collaborative filtering (UBCF) and is basically the same as IBCF but refers to users (lines in plant breeding). While item-based algorithms generate predictions based on similarities between items, user-based algorithms generate predictions based on similarities between users. In this article, the main focus is on the IBCF because it is more computationally efficient than UBCF when the number of items is lower than the number of users.
Additionally, there is empirical evidence showing that IBCF is more accurate in predicting ratings than UBCF (Sarwar et al. 2001).

Matrix factorization (MF)
Matrix factorization is another algorithm for recommending items or products. Matrix factorization consists of factorizing a matrix, i.e., to find two matrices, which, when multiplied, will produce the original matrix. Formally, the matrix factorization model looks for two matrices, P and Q, of order n · K and m · K; respectively, such that where R is the matrix that contains all the ratings that the users have assigned to the items and it is of order n · m; K is the prespecified number of latent features (variables) that is lower than or equal to the minðn; mÞ:R is the estimated matrix of ratings. Each row of P represents the strength of the associations between a user and the features, while each row of Q represents the strength of the associations between an item and the features. To obtain the prediction of a rating of item I j by user u i ; we calculate the dot product of the two vectors corresponding to u i and I j : However, to obtain the predictions for users of the missing items, we need to estimate matrices P and Q; with the no missing entries of the rating matrix. There are several methods for estimating matrices P and Q: One method consists of minimizing the squared error function with the observed entries of R: where l is the regularization parameter that controls overfitting. To minimize the squared error function, we have to know in which direction we have to modify the values of p ik and q kj : In other words, we need to know the gradient at the current values, and therefore we differentiated the above equation with respect to these two variables separately: Since we have the gradient, next we provide the updating rules for both p ik and q kj : where e ij ¼ r ij 2 p ik q kj and a is the learning rate; we usually choose a small value, for example, 0.0002. The learning rate affects the learning time, and too large a value may lead to divergence. In general, a smaller learning rate gives better performance, but the learning time is also longer. Using the updated rules, we can iteratively perform the operation until the error converges to its minimum. Before implementing this gradient descendent algorithm, we need to select a learning rate a; a regularization coefficient l; and the required number of latent features K: Also, we need to set the starting values of matrices P and Q: The R code for implementing this algorithm is provided in Appendix A2.
Multiple-environment (ME) mixed model For each trait, the following univariate linear mixed model is proposed: where y ij represents the normal response from the jth line in the ith environment (i ¼ 1; 2; . . . ; I, j ¼ 1; 2; . . . ; JÞ: For illustration purposes, we will use I ¼ 3: E i represents the fixed effect of the ith environment and it is assumed as a fixed effect, g j represents the random effect of the genomic effect of the jth line, with g ¼ ðg j ; . . . ; g J Þ T $ Nð0; G g Þ; and G g is of order J · J and represents the genomic relationship matrix. It was calculated using the VanRaden (2008) method as G g ¼ WW T =p; with W as the matrix of markers of order J 3 p: gE ij is the random interaction term between the genomic effect of the jth line and the ith environment, where gE ¼ ðgE 11 ; . . . ; gE IJ Þ T $ Nð0; I I 5GÞ and e ij is a random error term associated with the jth line in the ith environment distributed as Nð0; s 2 Þ: As previously mentioned, this model was used for each of the l ¼ 1; . . . ; L traits, where L denotes the number of traits under study.

Multiple-trait and multiple-environment unstructured mixed model
To account for the correlation between traits, we stacked the information of all the L traits given in Equation (7). In matrix notation, the whole mixed model is equal to where Y is of order Ln 3 1; X is of order Ln 3 IL; b is of order IL 3 1 and contains the b coefficients of the environment-trait combinations, Z 1 is of order Ln · LJ; b 1 is of order LJ · 1; Z 2 is of order Ln · IJL; b 2 is of order IJL · 1, and e is of order Ln 3 1: Then b 1 $ Nð0; G 1 Þ; b 2 $ Nð0; G 2 Þ, and e $ Nð0; RÞ; where G 1 ¼ G g 5Σ t ; Σ t is the genetic covariance matrix between traits and is assumed unstructured, and 5 denotes a Kronecker product, n Table 2 Item-to-item similarity matrix constructed with information in Table 1   I1  I2  I3 I1 w 1;1 = 1 w 1;2 = 0.76 w 1;3 = 0.78 I2 w 2;1 = 0.76 w 2;2 = 1 w 2;3 = 0.86 I3 w 3;1 = 0.78 w 3;2 = 0.86 w 3;3 = 1 Value w j;j represents the similarity between item j and item j´, obtained using cosine similarity.
n Table 1 Rating matrix data set with four users (rows) and three items (columns), with the rating of each user given in an ordinal scale of three points denotes missing values that need to be predicted.
where Σ E is assumed as a general matrix of order I · I: It is important to point out that the trait · environment (T · E) interaction term is included in the fixed effect b; while the trait · genotype (T · G) interaction term is included in the random effect b 1 , and the three-way (T · G · E) interaction term is included in b 2 : The errors are assumed to be correlated with the covariance defined as R ¼ I n 5R e ; where R e is the residual general covariance matrix between traits.

Proposed methods
In this section, we present the proposed methods for analyzing MTME data with continuous phenotypes.
Method IBCF: In this method, the prediction analysis is performed with IBCF using Equation (1) directly. However, the training of the multipletrait and multiple-envirionmnet (MTME) data first need to be scaled for each trait-environment combination given in Table 3. When the genotypes under study are the same in all environments, we create a rectangular rating matrix where the number of rows is equal to the number of genotypes and the number of columns is equal to the resulting combinations of environments and traits; that is, if the number of traits is 4 and the number of environments is 5, then the resulting number of columns in this rating matrix should be 20 (Table 3). However, when the data are only multiple trait (MT) or ME, the rows should be genotypes, while the number of columns should be the phenotypes measured in each environment or trait. Table 3 gives an example of how the phenotypic information resulting from a MTME data set should be arranged to implement IBCF.
Method MF: This method implements the matrix factorization method described in Equations (2)-(6). To be correctly implemented, the data need to be placed as shown in Table 3; implementation was performed using the R code given in Appendix A2.
Method ME mixed model: Method ME consists of using Equation (7) for each trait; it is thus a typical genomic-based model with the main effects of environments, genotypes, and the genotype · environment interaction term.
Method MTME mixed model: Method MTME consists of using the following multitrait and multienvironment model: where y ijl represents the normal response of the jth line in the ith environment for trait l (i ¼ 1; 2; . . . ; I, j ¼ 1; 2; . . . ; J; l ¼ 1; . . . ; LÞ: T l represents the fixed effect of the lth trait, TE il is the fixed interaction term between the lth trait and the ith environment, gT jl represents the random effect of the interaction of genotype j and the lth trait, with gT ¼ ðgT 11 ; . . . ; gT JL Þ T $ Nð0; G 5I L Þ; gET ijl is the three-way interaction of genotype j; the ith environment and the lth trait, with gET ¼ ðgET 111 ; . . . ; gET IJL Þ T $ Nð0; I I 5G 5I L Þ; and e ijl is a random error term associated with the jth line, the ith environment and the lth trait is distributed as Nð0; s 2 Þ: Simulated data sets For testing the proposed models and methods, we simulated MTME data using Equation (8) with three environments, three traits, and 1000 genotypes, along with one replication for environment-trait-genotype combinations. We assumed that b T ¼ ½15, 8,7,12,6,7,14,9,8], where the first three beta coefficients belong to traits 1, 2, and 3 in environment 1, the next three values to the three traits in environment 2, and the last three to environment 3. Also, this first set of variance-covariance matrices were assumed to be . These three variance-covariance matrices gave rise to a correlation of 0.85 between each pair of traits (genetic and residual) and between each pair of environments. We also assumed that the genomic relationship matrix is known, G g ¼ 0:7I 1000 þ 0:3J 1000 ; where I 1000 is an identity matrix of order 1000 and J 1000 is a matrix of order 1000 · 1000 of ones. Therefore, the total number of observations was 3 · 1000 · 3 · 1 ¼ 9000; that is, 3000 for each trait. With these parameters, we simulated three data sets which were used for testing the prediction accuracy of the proposed models under each of the three studied scenarios. These scenarios included (S1), which generated the data as normal data using Equation (8); (S2), where the data were also generated with Equation (8) but with the error term replaced by exp[21.25 · abs(e)], where e is exactly the same as what was generated under the first scenario, however, it now produced negative skewed data; and (S3), the last scenario, which also generated the data with Equation (8) but with the error term replaced by exp(1.25 · abs(e)) to induce positive skewed data.
Additionally, under the same conditions as mentioned above, we studied two other sets of variance-covariance matrices: set 2, with  Table 3 Phenotypic information for building the rating matrix for multiple-trait and multiple-environment data for J genotypes, I environments, and L traits G denotes genotypes, E denotes environment, T represents trait, and y ijl is the phenotype from the jth line in the ith environment for the lth trait.
the third set of variance-covariance matrices produced a pair of correlations between traits and between environments of 0.25. These three sets of variance-covariance matrices were proposed for studying the performance of the methods proposed in the context of high correlation (first set of variance-covariances), medium correlation (second set of variancecovariances), and low correlation (third set of variance-covariances) between traits (genetic and residual) and between environments.

Experimental data sets
Here we present information on the data sets used for implementing the proposed methods. In total, three data sets were used (one of maize and two of wheat).
Data set 1: A total of 250 wheat lines were extracted from a large set of 39 yield trials grown during the 2013-2014 crop season in Ciudad Obregon, Sonora, Mexico (Rutkoski et al. 2016). The trials were sown in mid-November and grown on beds with five and two irrigations, in addition to drip irrigation. Days to heading (HD) was recorded as the number of days from germination until 50% of spikes had emerged in each plot in the first replicate of each trial. Grain yield (GY) was the total plot grain yield measured after maturity, and plant height (PH) was recorded in centimeters. Image data of the yield trials were collected using a hyperspectral camera (A-series, Mirco-Hyperspec VNIR; Headwall Photonics, Fitchburg, MA) mounted on a manned aircraft, which allowed us to calculate vegetative indices for each plot. The green normalized difference vegetation index (GNDVI) was one of the traits used in this study, since it is considered a good predictor when used with pedigree and/or genomic prediction of GY in wheat due to its high heritability and genetic correlation. Trait GNDVI can also be measured remotely on large numbers of candidates for selection.
Genotyping-by-sequencing (GBS) was used for genome-wide genotyping. Single nucleotide polymorphisms (SNPs) were called across all lines using the TASSEL GBS pipeline anchored to the genome assembly of Chinese Spring. SNP calls were extracted, and markers were filtered so that the percent of missing data did not exceed 80%. Individuals with .80% missing marker data were removed, and markers were recorded as 21, 0, and 1, indicating homozygous for the minor allele, heterozygous, and homozygous for the major allele, respectively. Next, markers with ,0.01 minor allele frequency were removed, and missing data were imputed with the marker mean. A total of 12,083 markers remained after marker editing.
Data set 2: A total of 309 doubled haploid maize lines were phenotyped and genotyped; they are part of the data set used by Crossa et al. (2013) and Montesinos-López et al. (2016), which is comprised of a total of 504 doubled haploid lines derived by crossing and backcrossing eight inbred lines to form several full-sib families. Traits available in this data set include grain yield (GY), anthesis-silking interval (ASI), and plant height (PH); each of these traits was evaluated in three optimum rainfed environments (Env1, Env2, and Env3). The experimental field design in each of the three environments was an a-lattice incomplete block design with two replicates. Data were preadjusted using block estimates, while environmental effects were derived from a linear model that accounted for the incomplete block design within environment and for environmental effects.
The genomic data were obtained with GBS for each maize chromosome. The number of markers after initial filtering and the number of markers after imputation were summarized in Crossa et al. (2013). Filtering was first done by removing markers that had .80% of the maize lines with missing values; then markers with a minor allele frequency #0.05 were deleted. The total number of GBS data were 681,257 SNPs and, after filtering for missing values and minor allele frequency, 158,281 SNPs were used for the analyses. About 20% of cells were missing in the filtered GBS information used for prediction; these missing values were replaced by their expected values before doing the prediction.
Data set 3: This third data set was composed of 26,264 wheat lines that were planted over 3 yr (year_13_14; year_14_15, and year_15_16). In the first year (year_13_14), 7672 lines were planted, in the second year (year_14_15) 9091 lines were planted, and in the third year (year_15_16) the remaining 9501 lines were planted. The following five traits were measured on each line: days to heading (HD), days to maturity (DMT), plant height (PH), lodging, and grain yield (GY).

Assessing prediction accuracy
To assess prediction accuracy, 20 training-testing random partitions were implemented, as well as two types of cross-validations. The first one (CV1) mimicked a situation where lines were evaluated in some environments for the traits of interest; however, some lines were missing in all the other environments. In this cross-validation, we assigned 80% of the lines to the training set and the remaining 20% to the testing set. The second cross-validation (CV2) mimicked a situation where we wanted to predict all the information of one trait for a complete year; however, the available information was for a previous year for all the traits under study and, for the target year, we had information for several traits except the one of interest. We used the Pearson correlation to compare the predictive phenotipic values to the observed phenotype. Models with higher correlation values had better predictions.
In the above-mentioned case, cross-validation CV1 was implemented for the simulated data sets and the first two real data sets (Data set 1 and Data set 2), while cross-validation CV2 was implemented for the real Data set 3. It is important to point out that in Data set 3, there were no common lines across years. Also, to illustrate how to predict one trait for the whole year, we assumed that one trait was missing. For example, we assumed that trait GY was missing in the year_14_15 (9091 lines missing for GY) and the information on the training set was the information from year_13_14 and year_14_15, but with GY missing in year_14_15. To predict GY for year_15_16, we used the information from year_13_14, year_14_15 and year_15_16 as the training set, but with GY missing in year_15_16 (9501 records missing for GY in year_15_16). The same was done when another trait was assumed to be missing.

Data availability
The maize and wheat phenotypic and genotypic data sets used in this study can be downloaded from the link: http://hdl.handle.net/11529/ 11099. The maize phenotypic and genotypic data sets are Maize_data. RData and Gg.RData, respectively. The wheat phenotypic and genotypic data sets are Data.Trigo.RData and G.Trigo.RData, respectively. The wheat large data set is Large_wheat_data.RData.

RESULTS
The results are organized into two main sections. The first section presents the results in terms of prediction accuracy for the simulated data, while the second section gives the prediction accuracy of the four proposed methods but for the real data sets.
Prediction accuracies using simulated data What follows are the prediction accuracies for each of the three studied scenarios obtained using the four proposed methods. Figure 1 shows the results of using a correlation between traits (genetic and residual) and between environments of 0.85 (high correlation), while Figure 2 depicts the results when the correlation assumed between traits (genetic and residual) and between environments was 0.5 (intermediate correlation). Finally, Figure 3 displays the results when the correlation assumed between traits (genetic and residual) and between environments was 0.25 (low correlation). Figure 1 shows that under the first scenario (when the data were normal), method IBCF was the best in terms of prediction accuracy for all the trait-environment combinations. The second-best method was matrix factorization and the worst was method MTME. It is important to point out that, on average, method IBCF was 10.8% better than matrix factorization, 13.8% better than method ME, and 31.3% better than MTME. Under scenario S2 (considering the residual negative skew multiplied by 1.25), we observed that method MTME was the best, followed by method ME, then by IBCF, and finally matrix factorization, which was the worst. However, the gains in terms of average predictions of method MTME with regard to methods IBCF, matrix factorization, and ME were only 12.5, 15.8, and 6.2%, respectively (Figure 1). Under scenario S3 (with the residual positive skew multiplied by 1.25), method IBCF was the best, followed by matrix factorization; MTME was the worst and, on average, IBCF was better than methods matrix factorization, ME, and MTME by 3.3, 39.3, and 67.1%, respectively (Figure 1). Appendix A7 gives the SE of the average Pearson correlations for each scenario and environment-trait combination of the predictions given in Figure 1, Figure 2, and Figure 3. Figure 2 shows that in the first scenario (when the data were normal), method IBCF (collaborative filtering) was the best in terms of prediction accuracy, as shown in seven out of nine trait-environment combinations, as opposed to matrix factorization, which was the worst. Also, it must be noted that, on average, method IBCF was 14.5% better than matrix factorization, 11.3% better than ME, and 7.7% better than MTME. Under scenario S2 (with the residual negative skew multiplied by 1.25), method MTME was the best, while matrix factorization was the worst; MTME was, on average, 43.8% better than IBCF, $45.7% better than matrix factorization, and 29.9% better than ME (Figure 2). Under scenario S3 (with the residual positive skew multiplied by 1.25), method IBCF was the best, MTME was the worst and, on average, IBCF was better than methods matrix factorization, ME, and MTME by 3.62, 51.3, and 53.3%, respectively ( Figure 2). Figure 3 shows that under the first scenario (when the data were normal), method MTME was the best in terms of prediction accuracy, as shown by the results of all the environment-trait combinations; the second best was ME, and the worst was matrix factorization. On average, MTME was 34% better than IBCF, 44.9% better than matrix factorization, and 29.1% better than ME. Under scenario S2 (with the residual negative skew multiplied by 1.25), method MTME was the best, matrix factorization was the worst, and MTME was, on average, better than IBCF, matrix factorization, and ME by 70.9, 71.4, and 47.7%, respectively (Figure 3). Under scenario S3 (with the residual positive skew multiplied by 1.25), method IBCF was the best, followed by matrix factorization; ME was the worst and, on average, IBCF was better than methods matrix factorization, ME, and MTME by 3.0, 53.4, and 27.1%, respectively (Figure 3). In general, method IBCF was poorer when the correlation between traits (genetic and residual) and between environments was low; however, in Figure 3 under scenario (S3), we can see than sometimes it was better.
Prediction accuracy using real data sets In this section, we present the results of predictions under the four proposed methods for the three real data sets. We begin by presenting the Figure 1 Simulated data. Average Pearson correlation (APC) for each environment-trait combination using the four methods under study [IBCF, matrix factorization (MF), ME, and MTME] for data simulated with a correlation of traits (genetic and residual) and correlation of environments of 0.85. S1 is the scenario under normality, S2 is the scenario under the error negative skew multiplied by 1.25, and S3 represents the scenario under the error positive skew multiplied by 1.25. The notation E1_T1 means environment 1, trait 1. results obtained using the first data set (wheat data set), followed by the results for the second data set (maize data set), and finally, the results for the third real data set (large wheat data set).
In Table 4 we present the prediction accuracies using the wheat data set under method MF with 10 different values of the regularization parameter lambda (lÞ: The table shows that the worst predictions were made without the regularization parameter, that is, when l ¼ 0: Then it shows that when l = 0.4, the prediction accuracy increased with regard to l ¼ 0 by 15.5%, continuing to increase the value of l until 1.6, the increase in the average prediction accuracy is up to 0.594 (21.88%). However, after a value of l = 2.2, the prediction accuracy starts to decrease. This implies that to get the best performance of method matrix factorization, various values of the parameter l need to be tested before doing the final implementation of this model.
In Table 5, we compare the predictions of the four proposed methods using the first data set (wheat data set). Table 5 indicates that IBCF without markers was the best method in terms of prediction accuracy, as shown by six out of the 12 environment-trait combinations; the second best was method matrix factorization, and the worst was method ME with and without markers. Method IBCF without markers was, on average, better than methods IBCF with markers, matrix factorization and ME without markers, ME with markers, MTME without markers, and MTME with markers by 44.6, 1.7, 99.2, 99.5, 29.9, and 36.3%, respectively. These findings of the good performance of method IBCF are related to the phenotypic correlations between traits, which are not low, as can be seen in Appendix A4. Here it is important to point out that to take into account the marker information, the genomic relationship matrix created with the marker information was used as a proxy for the phenotypic correlation between genotypes (users). Then we applied the UBCF instead of the IBCF. However, according to these results, there is evidence that using the genomic relationship matrix alone as a proxy for the phenotypic correlation between genotypes is not very reasonable. Table 6 indicates that for the second data set (maize data set), the best prediction accuracies were observed under ME with markers (univariate model), followed by model MTME with markers (multivariate model), and the worst were observed under model matrix factorization without makers. In general, the predictions of ME were slightly better than those of MTME with and without markers; the second best were obtained using method IBCF with markers, while the worst were obtained using matrix factorization without markers. Methods ME and MTME were considerably better than the remaining methods. ME and MTME with markers were $30.9% better than method IBCF without markers, 27.8% better than IBCF with markers, 35.4% better than matrix factorization without markers, 37.5% better than ME without markers, and 37.7% better than MTME without markers. Here it is important to point out that when methods IBCF, ME, and MTME took into account the genomic relationship matrix, the predictions improved. However, the improvement of IBCF by taking into account the markers with regard to IBCF without markers was only 4.4%. In general, the predictions with this real data set were low, which is mainly explained by the low phenotypic correlations between the nine environment-trait combinations, as can be seen in Appendix A5. Table 7 shows the results for the third real data set (the large wheat data set), where only methods IBCF and matrix factorization were implemented. Method matrix factorization was implemented with three values of latent features ðK ¼ 2; 3; 4Þ: It is important to recall that in this data set there are five columns (items) in the rating matrix (GY, HD, DMT, PH, and lodging) and 36,181 rows (observations), with each row corresponding to a different wheat line. The lines were evaluated in 4 yr: 7672 were evaluated in year_13_14, 9091 were evaluated in year_14_15, 9501 were evaluated in year_15_16, and 9917 were evaluated in year_16_17. In this cross-validation scheme (CV2), we assumed trait GY was missing in the following year and predicted using as training data 1, 2, or 3 yr before. For example, when GY was predicted in year_14_15, only the information of year_13_14 and Figure 2 Simulated data. Average Pearson correlation (APC) for each environment-trait combination using the four methods being studied [IBCF, matrix factorization (MF), ME, and MTME] for data simulated with a correlation of traits (genetic and residual) and correlation of environments of 0.5. S1 is the scenario under normality, S2 is the scenario under the error negative skew multiplied by 1.25, and S3 represents the scenario under the error positive skew multiplied by 1.25. The notation E1_T1 means environment 1, trait 1.
year_14_15 was used as training data, with 9091 lines missing trait GY in year_14_15 to be predicted. When 9501 lines (missing trait GY) were predicted for year_15_16, two scenarios were studied: (1) when only one previous year was used as training and denoted as GY_Year_15_16_1yb; and (2) when two previous years were used as training and denoted as GY_Year_15_16_2yb. Finally, when 9917 lines (missing trait GY) were predicted for year_16_17, three scenarios were studied: (1) when only one previous year was used as training, denoted as GY_Year_16_17_1yb; (2) when two previous years were used as training, denoted as GY_Year_16_17_2yb; and (3) when three previous years, denoted as GY_Year_16_17_3yb, were used as training sets. When the other traits were predicted, the same scenarios were studied, and the corresponding trait to be predicted was assumed to be missing.
The results in Table 7 indicate that when we predicted GY with IBCF in all six scenarios, the average prediction in terms of the Pearson correlation was $0.305, with an average agreement of 30.93% among the first 2000 lines predicted and observed. Under matrix factorization, GY predictions were lower, with Pearson correlations between 0.16 (with K = 2 latent features) and 0.175 (with K = 3); however, it is interesting to point out that under the three latent features used, the average agreement among the first 2000 lines predicted and observed was also 30.93%. When the HD trait was predicted under all six scenarios, good performance was observed, as the average prediction for the Pearson correlation was 0.601 under IBCF, 0.454 under matrix factorization with K = 2, 0.59 under matrix factorization with K = 3, and 0.60 under matrix factorization with K = 4, with average agreement of 48.33% in the first 2000 lines predicted and observed using the two methods. Similar behavior was observed when predicting the DMT trait, as the average prediction accuracy was 0.569 (Pearson correlation) under IBCF, 0.454 under matrix factorization with K = 2, 0.60 with K = 3, and 0.61 under matrix factorization with K = 4. In this trait, an average agreement of 41.6% in the first 2000 lines was predicted and observed for the two methods. On the other hand, when we predicted the traits PH and lodging, the average predictions were really poor: lower than 0.1145 for PH using the two methods and lower than 0.11 for lodging using the two methods. Additionally, the average agreements in the first 2000 lines predicted and observed were 24.642 and 20.77% for traits PH and lodging, respectively. The results in Table 6 show that methods IBCF and matrix factorization did a good job of predicting HD and DMT, and a reasonable job of predicting GY. However, they did a poor job of predicting PH and lodging. These contrasting results are due to the level of phenotypic correlation (see Appendix A6) since the traits that had high correlation with at least one other trait showed higher prediction accuracy and vice versa; that is, traits with low correlation with other traits were found to have low prediction accuracies. Also in Table 7 it is clear that for this data set, model IBCF was better than model matrix factorization.

DISCUSSION
In this article, we propose using two methods for recommender systems: IBCF and matrix factorization for predicting multiple traits of some genotypes that are missing in some environments. Our results using simulated and real data sets are interesting, since we found that when the phenotypic correlation between traits and environments is reasonably high, we can make predictions with reasonable accuracy (as shown in the simulation study presented) with both methods IBCF and matrix factorization, but method IBCF was the best. This is a very good sign, since the implementation of method IBCF is straightforward, as it is only necessary to place the information in a rectangular rating matrix where the rows are the genotypes and the columns represent the trait-environment combinations. Then, we need to scale each column by subtracting its Figure 3 Simulated data. Average Pearson correlation (APC) for each environment-trait combination using the four methods under study [IBCF, matrix factorization (MF), ME, and MTME] for data simulated with a correlation of traits (genetic and residual) and correlation of environments of 0.25. S1 is the scenario under normality, S2 denotes the scenario under the error negative skew multiplied by 1.25, and S3 is the scenario under the error positive skew multiplied by 1.25. The notation E1_T1 means environment 1, trait 1. mean and dividing by its SD. Obviously, this scaling process should be done only on the training data. Then, using the R code provided in Appendix A1, we can apply the IBCF technique. See the example in Appendix A3 to understand how to implement this algorithm.
We also provided the R code (Appendix A2) for implementing the matrix factorization method. However, the implementation of matrix factorization, even though it produced competitive predictions, was lower than that produced by method IBCF (using the R code given in Appendix A1). For its successful application, the following should to be taken into account: (1) first obtain the tuning parameter l using crossvalidation, (2) choose the number of latent features (parameter KÞ also using cross-validation; and (3) obtain appropriate starting values to get n  n  Prediction accuracies with Pearson correlation for each environment-trait (Env-Trait) combination of the proposed methods for the wheat data set from Rutkoski et al. (2016), under cross-validation scheme CV1. The best predictions of the four methods are in boldface, and the comparisons are made by row. "No markers" means that genomic information was not used, while "with markers" means that genomic information was used. MF, matrix factorization.
convergence. Since the IBCF is extremely fast, we suggest using these values as starting values for implementing the matrix factorization method. See the example in Appendix A3 to learn how to implement the matrix factorization algorithm.
From our results with simulated and real data sets, we found that the IBCF technique (method IBCF) is very competitive even compared to the MTME model for positive skew data. Under most scenarios where there was moderately high phenotypic correlation, the IBCF technique n Prediction accuracies with Pearson correlation for each environment-trait (Env-Trait) combination of the proposed methods for the maize data set under crossvalidation scheme CV1. The best predictions of the seven methods are in boldface, and the comparisons are made by row. "No markers" means that genomic information was not used, while "with markers" means that genomic information was used. MF, matrix factorization.
n outperformed (in terms of prediction accuracy and implementation time) the MTME. The IBCF technique is very efficient in terms of the computational time required for its implementation and can be applied on large data sets. These techniques (IBCF and matrix factorization) can also be applied to MT or ME analysis only, as was done on the third real data set (large wheat data set). However, the placement of the data are slightly different, since in the rectangular rating matrix the rows are the genotypes but the columns are the phenotypes of traits or the phenotypes of environments. The expected predictions depend on the degree of the phenotypic correlation between traits or environments. We believe that these approaches could be very useful when the number of traits or environments, as well as the correlations between them, are large. The results obtained for the large wheat data set are very promising mostly under method IBCF, since they show that when the phenotypic correlation between traits is moderately high, we can predict the information of thousands of lines using cross-validation CV2 (in this case, 9091 lines were predicted for year_14_15, 9501 lines for year_15_16, and 9917 lines for year_17_17). The IBCF technique also has the advantage that the implementation time is fast and allows parallel computing in the event of larger data sets. A disadvantage of using IBCF is that it works with phenotypic correlations between items (trait-environment combinations) or users (genotypes), and when we use the genomic relationship matrix as a proxy for the phenotypic correlation between genotypes, this does not really represent the phenotypic correlation between genotypes and can produce poor predictions. This was observed in our results where only in the second real data set using the genomic relationship matrix produced a little improvement in prediction accuracy when the genomic relationship matrix was used. When applied to the first real data set (wheat data set), the predictions obtained were poor. Also, it is important to point out that due to the nature of both models (IBCF and matrix factorization), they not allow for inclusion of the genomic relationship matrix. For this reason, we used the genomic relationship matrix as a proxy for the phenotypic correlation between genotypes (users) to be able to implement the UBCF technique, but we did not find an efficient way to incorporate this information under the matrix factorization method. However, according to the results obtained, we have evidence that using the genomic relationship matrix alone as proxy for the phenotypic correlation between genotypes (users) using the UBCF is not reasonable. For this reason, we believe that to take into account the genomic relationship matrix in these models (IBCF and matrix factorization), some novel modifications need to be made to these recommender systems methods to be able to take advantage of all the genomic information.
We must point out that the process for scaling each column is very important in order to have all the columns (items = trait-environment combinations) on the same scale, since methods IBCF and matrix factorization were originally proposed for ordinal items (example 1 = totally disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = totally agree). For this reason, the user must remember to scale each column of the rating matrix. We encourage the use of both IBCF and matrix factorization for ordinal or binary phenotypic data, given that IBCF was originally proposed in the context of ordinal data. However, according to our review of these two statistical techniques (methods IBCF and matrix factorization), they had only been implemented using the cosine and Pearson correlation. However, we believe that predictions can be improved using the polychoric correlation for ordinal data or the tetrachoric correlation for binary data. Also, since we usually use 0, 1, or 2 to denote the information of each marker in a line in its position in the genome with SNPs, we believe that when the correlation between the columns of SNPs is reasonably large, this method can be used successfully for imputation of missing markers.
It is important to point out that both methods can be implemented using the R code given in Appendices A1 and A2. Although there are some R packages that can be used to implement IBCF and matrix factorization, we need to be careful when using these packages, since some of them were developed exclusively for binary or ordinal data; thus, when we want to use them for scaled continuous data, they do not work because the maximization process they use is only for positive values of the input provided. One package that has such restrictions is the package rrecsys (Coba et al. 2017).
Finally, one of the main advantages of using IBCF is that its implementation is very fast and can be used on large data sets without problems. In the case of data sets with very large numbers of columns, parallel processing is possible because its computation is not iterative. However, if the correlation between the columns of the created rating matrix is poor (low correlation), this technique produces poor predictions; the larger the correlation between the columns of the rating matrix, the better the sample predictions will be.

APPENDIX A5
Phenotypic cosine correlations for environment-trait combinations in data set 2 (maize data set)

APPENDIX A6
Phenotypic cosine correlations for environment-trait combinations in data set 3 (large wheat data set) SE of the prediction accuracies given in Figure 1, Figure 2, and Figure 3, with Pearson correlation for each environment-trait (Env-Trait) combination using the four methods under study [IBCF, matrix factorization (MF), ME, and MTME] for data simulated with a correlation (Cor) of traits (genetic and residual) and correlation of environments of 0.85, 0.5, and 0.25. S1 is the scenario under normality, S2 is the scenario under the error negative skew multiplied by 1.25, and S3 represents the scenario under the error positive skew multiplied by 1.25. The notation E1_T1 means environment 1, trait 1.