A Bayesian Genomic Regression Model with Skew Normal Random Errors

Genomic selection (GS) has become a tool for selecting candidates in plant and animal breeding programs. In the case of quantitative traits, it is common to assume that the distribution of the response variable can be approximated by a normal distribution. However, it is known that the selection process leads to skewed distributions. There is a vast statistical literature on skewed distributions, and the skew normal distribution is of particular interest in this research. This distribution includes a third parameter that drives the skewness, so that it generalizes the normal distribution. We propose an extension of Bayesian whole-genome regression to data with skew normal distributions in the context of GS applications, where the number of predictors usually vastly exceeds the sample size; however, the model can also be applied when the number of predictors is smaller than the sample size. We used a stochastic representation of a skew normal random variable, which allows the implementation of standard Markov Chain Monte Carlo (MCMC) techniques to efficiently fit the proposed model. The predictive ability and goodness of fit of the proposed model were evaluated using simulated and real data, and the results were compared to those obtained with the Bayesian Ridge Regression model. Results indicate that, based on the DIC criterion and cross-validation respectively, the proposed model has a better fit and is as good as the conventional Bayesian Ridge Regression model for prediction. A computer program coded in the R statistical package and the C programming language to fit the proposed model is available as supplementary material.

In genetic studies of plants or animals, it is common to find quantitative traits whose distribution is not normal; this is because the data are obtained from multiple sources or contain isolated observations (Li et al. 2015). Landfors et al. (2011) noted that it is often necessary to normalize the data to remove variation introduced during the development of the experiment. However, such standard normalization techniques are not always able to remove bias, because a large number of observations may be positively or negatively affected by some treatment. In addition, a suitable transformation for the data may be difficult to find, and may bring further complications when estimating and interpreting the results obtained (Fernandes et al. 2007). To avoid transformations, different methods have been developed that are flexible enough to represent the data and reduce unrealistic assumptions (Arellano-Valle et al. 2005). In the genomic selection framework (GS; Meuwissen et al. 2001), dense molecular marker genotypes and phenotypes are used to predict the genetic values of candidates for selection. The availability of high-density molecular markers for many agricultural species, together with promising results from simulations (e.g., Meuwissen et al. 2001) and empirical studies in plants (e.g., de los Campos et al. 2010; Crossa et al. 2010) and animals (e.g., VanRaden 2008; Weigel et al. 2009), is prompting the adoption of GS in several breeding programs. The parametric model for GS explains phenotypes (y_i; i = 1, ..., n) as functions of marker genotypes (x_ij) using a linear model of the form y_i = b_0 + Σ_{j=1}^{p} x_ij b_j + e_i, where n is the number of phenotypic records, p is the number of markers, x_ij ∈ {0, 1, 2} represents the number of copies of a bi-allelic marker (e.g., an SNP), and b_j is the additive effect of the reference allele at the j-th marker, j = 1, ..., p.
In matrix notation, the model is expressed as y = b_0 1 + Xb + e, where y = {y_i}, b = {b_j} and e = {e_i} are vectors of phenotypes, marker effects and Gaussian zero-mean errors, respectively, and X = {x_ij} is a matrix of marker genotypes of dimensions n × p. However, when the data are not normal, normal regression methods generate estimates that are inconsistent with the natural distribution of the data and, therefore, the estimation of the conditional mean given the covariates is also inconsistent (Bianco et al. 2005).
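As a concrete illustration, the linear model above can be simulated in a few lines. The sketch below is in Python rather than the paper's R/C implementation, and the dimensions, effect sizes and error variance are hypothetical choices for illustration only, not values from the paper's datasets:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: p > n, as is typical in genomic selection
n, p = 100, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)  # marker genotypes in {0, 1, 2}
b0 = 2.0                                           # intercept
b = rng.normal(0.0, 0.05, size=p)                  # additive marker effects
e = rng.normal(0.0, 1.0, size=n)                   # Gaussian zero-mean errors

# Phenotypes under the linear model y = b0*1 + X b + e
y = b0 + X @ b + e
print(y.shape)
```

The same `X @ b` term reappears below as the "genetic signal" whose recovery the simulation study evaluates.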
With dense molecular markers, the number of markers exceeds the number of records in the reference population (p >> n) and, therefore, penalized regression estimation methods and their Bayesian counterparts are commonly used. Penalized estimation methods produce regression parameter estimates that are often equivalent to posterior modes. The literature on GS offers a long list of Bayesian models whose main difference is the prior distributions assigned to marker effects, which leads to what is known as the Bayesian Alphabet (Gianola 2013; de los Campos et al. 2013). The above-mentioned models assume that the response follows a normal distribution. Several phenotypic traits have skewed distributions, for example, female flowering time, male flowering time, the interval between male and female flowering, categorical measurements of diseases on an ordinal scale, count data, etc. In these cases, either a normalizing transformation of the response variable (e.g., the Box-Cox transformation) or a model that deals with skewed responses may be used. Varona et al. (2008) proposed using linear mixed models with asymmetric residual distributions to tackle the problem in the context of animal breeding when pedigree information is available. Nascimento et al. (2017) proposed Regularized Quantile Regression as a way to overcome the issue of non-symmetric distributions when marker information is available.
If a population is selected based on one trait (Y) when another trait of interest (O) exceeds (or does not exceed) some threshold, then the conditional distribution of Y | O > o, for a fixed o, is skewed (Arnold and Beaver 2000). One such distribution is the skew-normal (SN) distribution, which is of particular interest in this research. This distribution is a generalization of the normal distribution (Azzalini 1985) with a shape parameter added so that it can adopt skewed forms. It has the advantage of being mathematically tractable and shares properties with the normal distribution; for example, the density of the SN is unimodal (Genton 2004). Varona et al. (2008) argue that the asymmetric distributions observed for phenotypes are the result of environmental factors and that the data can be modeled using non-symmetric residual distributions.
Based on the previous considerations, and motivated by the fact that a great number of traits in plant breeding are skew normal distributed, such as flowering time in most crop species, as well as categorical traits such as diseases (ordinal, binomial, or count data), in this study we propose a general Bayesian genomic regression model for skew-normal phenotypic traits with skew-normal random errors. The model uses a stochastic representation of the response variable (Arnold and Beaver 2000) in order to ease computations, and it also works in the case when n > p. It should be pointed out, however, that the aim of the paper is not to describe and study the causes of the skewed distribution; rather, we assume that the skewed data are given, and the objective is to propose a robust statistical model that deals with the skew-normal distribution of the phenotypes and residuals.
The structure of this paper is as follows. In section 2, we present the statistical models and describe the latent variable model used in the regression. In section 3, we describe a simulation experiment that is performed to evaluate the predictive power of the proposed model. In section 4, we present an application with real data; section 5 includes the discussion and concluding remarks.

Statistical models
In this section we introduce the statistical models to be used in the manuscript. We begin by giving a brief review of the skew normal model. Then we introduce the concept of data augmentation and use it to generate a skew normal random variable. After that, we introduce the "centered parameterization" of the skew normal model and the regression model with skew normal random errors. Finally, we present the prior, posterior and full conditional distributions for the regression model with skew normal residuals.

Skew-normal model: A continuous random variable U is said to follow the skew-normal law with shape parameter λ ∈ ℝ, denoted by U ~ SN_D(λ), if its density function is:

f_U(u | λ) = 2 φ(u) Φ(λu), u ∈ ℝ, (1)

where φ(·) and Φ(·) denote the density and cumulative distribution functions of a standard normal random variable, respectively. The subscript D indicates the use of the "direct parametrization" (Azzalini 1985). Note that the skew normal distribution reduces to the normal case when λ = 0.
The mean and variance of U are given by:

E(U) = δ√(2/π), Var(U) = 1 − 2δ²/π, with δ = λ/√(1 + λ²).

The coefficient of skewness of U is:

γ_1 = ((4 − π)/2) · (δ√(2/π))³ / (1 − 2δ²/π)^{3/2}.

If Y is a random variable defined by Y = ξ + ωU, then Y is said to have a skew-normal distribution with location (ξ), scale (ω), and shape (λ) parameters, and is denoted as Y ~ SN_D(ξ, ω, λ). The density function of Y is given by:

f_Y(y | ξ, ω, λ) = (2/ω) φ((y − ξ)/ω) Φ(λ(y − ξ)/ω).

It can be shown that the coefficient of skewness of Y equals the skewness coefficient of U.
Hidden truncation: Let V and W be two random variables whose joint distribution is given as follows:

(V, W)ᵗ ~ MN_2(0, Σ), with Σ = [[1, ρ], [ρ, 1]],

where MN_2(m, Σ) denotes a bivariate normal random variable with mean m and variance-covariance matrix Σ, and ρ ∈ (−1, 1); the random variable U is defined as U = W | V > 0, "as if" we had observed the latent variable Z = V > 0. The conditional distribution of U | Z = z is N(ρz, 1 − ρ²), and Z ~ TN(0, 1, 0, ∞), a truncated normal random variable with location parameter 0, scale parameter 1, lower truncation bound 0 and upper truncation bound ∞. Therefore, the joint distribution of U and Z is f_{U|Z}(u | z; ρ) f_Z(z) = f_{U,Z}(u, z | ρ), that is:

f_{U,Z}(u, z | ρ) = (1/√(1 − ρ²)) φ((u − ρz)/√(1 − ρ²)) · 2φ(z) 1(z > 0).

Note that the density function of U can be obtained by integrating f_{U,Z}(u, z | ρ) with respect to z; that is, f_U(u | λ) = ∫₀^∞ f_{U,Z}(u, z | ρ) dz. Estimating the parameters in the direct parametrization is troublesome, so the "centered parametrization" is more appropriate for parameter estimation and interpretation (Azzalini 1985; Pewsey 2000; Azevedo et al. 2011, among others). If U ~ SN_D(λ) and λ = ρ/√(1 − ρ²), then E(U) = ρ√(2/π) and Var(U) = 1 − 2ρ²/π. Figure 1 shows the density function of U for several values of the shape parameter λ and the corresponding values of ρ. The random variable:

Y = μ + σ_e (U − ρ√(2/π)) / √(1 − 2ρ²/π)

is said to be a skew normal random variable with parameters μ ∈ ℝ, σ_e > 0 and γ_1, where γ_1 is Pearson's skewness coefficient given by

γ_1 = ((4 − π)/2) · (ρ√(2/π))³ / (1 − 2ρ²/π)^{3/2},

and the range of γ_1 is (−0.99527, 0.99527). In this case, E(Y) = μ and Var(Y) = σ_e². The usual notation is Y ~ SN_C(μ, σ_e², γ_1). If we consider the transformations Y = μ + σ_e (U − ρ√(2/π))/√(1 − 2ρ²/π) and Z = Z, it can be shown, using Jacobians (Casella and Berger, 2002, Chapter 2), that the joint density of Y and Z is given by:

f_{Y,Z}(y, z | μ, σ_e, ρ) = (√(1 − 2ρ²/π)/(σ_e √(1 − ρ²))) φ(((y − μ)√(1 − 2ρ²/π)/σ_e + ρ√(2/π) − ρz)/√(1 − ρ²)) · 2φ(z) 1(z > 0). (5)

Note that the density function of Y ~ SN_C(μ, σ_e², γ_1) can be obtained by integrating f_{Y,Z}(y, z | μ, σ_e, ρ) with respect to z; that is, f_Y(y | μ, σ_e², γ_1) = ∫₀^∞ f_{Y,Z}(y, z | μ, σ_e, ρ) dz.
This representation is very convenient because it allows us to write an augmented likelihood function (Hea-Jung 2005; Liseo and Parisi 2013), "as if" we had observed the latent value z. The density function of the skew normal random variable under the "centered parametrization" is a complicated expression; its explicit form was given by Azevedo et al. (2011).
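As a quick numerical check of the stochastic representation and the centered parametrization, the following Python sketch generates Y by first drawing the latent Z as |N(0, 1)|, then U | Z = z ~ N(ρz, 1 − ρ²), and finally standardizing U; the values of ρ, μ and σ_e are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
rho, mu, sigma_e = 0.9, 10.0, 1.5   # hypothetical values chosen for illustration

# Data augmentation: Z ~ TN(0, 1, 0, inf) is |N(0, 1)|, and U | Z = z ~ N(rho*z, 1 - rho^2)
z = np.abs(rng.standard_normal(n))
u = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

# Centered parametrization: standardize U so that E(Y) = mu and Var(Y) = sigma_e^2
mean_u = rho * np.sqrt(2.0 / np.pi)
sd_u = np.sqrt(1.0 - 2.0 * rho**2 / np.pi)
y = mu + sigma_e * (u - mean_u) / sd_u

print(y.mean(), y.std())   # should be close to mu = 10 and sigma_e = 1.5
```

The Monte Carlo mean and standard deviation of `y` recover μ and σ_e, while the draws remain right-skewed for ρ > 0.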
Regression with skew normal random errors: Azzalini and Capitanio (1999) and Rusell and González (2002) proposed a simple linear regression model where the error terms are independent and identically distributed as SN_D(0, ω, λ). The proposed model is:

y_i = b_0 + b_1 x_i + e_i, e_i ~ SN_D(0, ω, λ);

from the properties of the skew normal distribution, it follows that E(y_i | x_i) = b_0 + b_1 x_i + ωδ√(2/π), with δ = λ/√(1 + λ²). The model can be easily extended to include more covariates; that is:

y_i = b_0 + Σ_{j=1}^{p} x_ij b_j + e_i.

Azzalini and Capitanio (1999) and Rusell and González (2002) used the maximum likelihood method to estimate the parameters of the model. These ideas can be extended to the case of errors that are independent and identically distributed as SN_C(0, σ_e², γ_1).
Bayesian regression with skew normal random errors (BSN): Let y_i = b_0 + Σ_{j=1}^{p} x_ij b_j + e_i, with e_i independent and identically distributed as SN_C(0, σ_e², γ_1), i = 1, ..., n. Then, the likelihood function is given by:

L(y | b_0, b, σ_e², γ_1) = Π_{i=1}^{n} f_Y(y_i | b_0 + Σ_{j=1}^{p} x_ij b_j, σ_e², γ_1).

Let θ = (b_0, bᵗ, σ_e², γ_1)ᵗ, and let p(θ | Ω) be the prior distribution of θ, where Ω is a set of hyper-parameters that index the prior distributions. Then, by Bayes' theorem, the joint posterior distribution p(θ | y) is as follows:

p(θ | y) ∝ L(y | θ) p(θ | Ω).
Neither the joint posterior distribution nor the full conditional distributions of the parameters of interest have a closed form; therefore, the implementation of this model within the Bayesian framework is computationally intensive. We propose using hidden truncation together with two standard MCMC techniques in Bayesian analysis, (i) Gibbs sampling (Geman and Geman 1984) and (ii) random walk Metropolis sampling, to alleviate some of the computational burden.
Prior, posterior and full conditional distributions: Consider the joint distribution of Y and Z given in (5). In the regression context, we set μ_i = b_0 + Σ_{j=1}^{p} x_ij b_j; then the augmented likelihood function is:

L(y, z | b_0, b, σ_e², ρ) = Π_{i=1}^{n} f_{Y,Z}(y_i, z_i | μ_i, σ_e, ρ). (6)

In order to fully specify the Bayesian model, prior distributions for the unknown parameters must be defined. Let b | σ_b² ~ MN_p(0, σ_b² I). For ρ, based on the transformation ρ = 1 − 2B, where B ~ Beta(a_0, b_0), the prior is denoted as p(ρ | a_0, b_0); depending on the hyper-parameters a_0 and b_0, it can take a rich variety of shapes, just as in the case of the Beta distribution. For the intercept, the prior distribution is b_0 | σ²_{b0} ~ N(0, σ²_{b0}); for the scale parameter, a scaled inverted chi-squared prior distribution was used, that is, σ_e² | df_e, S_e ~ χ⁻²(df_e, S_e); and finally, σ_b² | df_b, S_b ~ χ⁻²(df_b, S_b) (see Sorensen and Gianola, 2002, p. 85, for details about the parametrization used in this paper). The joint prior distribution is:

p(b_0, b, σ_e², σ_b², ρ) = p(b_0 | σ²_{b0}) p(b | σ_b²) p(σ_b² | df_b, S_b) p(σ_e² | df_e, S_e) p(ρ | a_0, b_0). (7)

By combining equations (6) and (7) through Bayes' theorem, the posterior distribution p(b_0, b, σ_e², σ_b², ρ | data) is obtained up to a proportionality constant. The full conditional distributions of the parameters, which are the inputs for the Gibbs and random walk Metropolis samplers, are given in Appendix A. In Appendix B, some pragmatic rules to elicit values for the hyper-parameters σ²_{b0}, S_e, df_e, S_b, df_b, a_0 and b_0 are given. The rules for setting S_e, df_e, S_b, df_b are based on those given by de los Campos et al. (2013) and Pérez and de los Campos (2014). In this paper, we set σ²_{b0} = 1 × 10⁶ in order to reduce shrinkage, because in practice it mimics a non-informative but proper distribution. To sample from the full conditionals of ρ and σ_e², we implemented a random walk Metropolis sampler whose parameters are tuned so that the acceptance rate is about 0.23 (see Appendix A for details).
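The prior on ρ induced by the Beta transformation above is easy to inspect by simulation; a minimal Python sketch (the choice a_0 = b_0 = 1 below reproduces the flat case):

```python
import numpy as np

rng = np.random.default_rng(3)

# Prior on rho built from a Beta distribution: rho = 1 - 2B, B ~ Beta(a0, b0).
# a0 = b0 = 1 gives the Uniform(-1, 1) prior used in the paper.
a0, b0 = 1.0, 1.0
B = rng.beta(a0, b0, size=100_000)
rho = 1.0 - 2.0 * B
print(rho.min(), rho.max(), rho.mean())
```

The draws stay in (−1, 1) by construction, and asymmetric choices of a_0 and b_0 tilt the prior mass toward positive or negative skewness.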
The BSN model can be re-parametrized by replacing Xb with t = Xb; if the prior distribution of marker effects is normal with mean 0 and variance σ_b², then the prior of t is t ~ MN_n(0, σ_b² XX'), which leads to a G-BLUP model (see de los Campos et al. 2013, for details about G-BLUP) but with skew normal residuals, that is, y_i = b_0 + t_i + e_i or, in matrix notation, y = b_0 1 + t + e. This is a skew linear mixed model, a particular case of the model proposed by Arellano-Valle et al. (2005), who relaxed all normality assumptions in a standard mixed model.

Bayesian ridge regression with normal random errors (BRR): Regression with normal random errors is a special case of the proposed model when ρ = 0. The model is widely used in the GS literature (e.g., de los Campos et al. 2013). In the GS context, the model is given by:

y_i = b_0 + Σ_{j=1}^{p} x_ij b_j + e_i,

where the e_i's are independent and identically distributed as N(0, σ_e²). The prior distributions for the unknown parameters are: b_0 | σ²_{b0} ~ N(0, σ²_{b0}); b | σ_b² ~ MN_p(0, σ_b² I); for the scale parameter, a scaled inverted chi-squared distribution, that is, σ_e² | df_e, S_e ~ χ⁻²(df_e, S_e); and finally, σ_b² | df_b, S_b ~ χ⁻²(df_b, S_b). The joint prior distribution is the product of these components, and the posterior distribution p(b_0, b, σ_e², σ_b² | data) is proportional to the likelihood times the joint prior. The full conditional distributions of the parameters required for implementing a Gibbs sampler can be found elsewhere (e.g., de los Campos et al. 2013). We set the hyper-parameters using the same rules as in the BSN model. The BRR model can be fitted easily using the BGLR statistical package (Pérez and de los Campos 2014).

Monte Carlo Simulation
In this section, we use data simulated from marker genotypes of a wheat dataset made publicly available by Crossa et al. (2010). The dataset includes genotypic information for 599 wheat lines that were genotyped for 1279 DArT markers coded as 0 and 1. We simulated the phenotypes using the following additive genetic model:

y_i = b_0 + Σ_{j=1}^{p} x_ij b_j + e_i, (9)

where e_i ~ SN_C(0, 1.5², γ_1), with γ_1 obtained from ρ through the relationship between the two parameters given previously. The idea here is to verify, through simulation, whether the proposed model works satisfactorily. We therefore obtained point estimates for b_0, b, σ_e² and ρ. We also fitted the Bayesian Ridge Regression model and compared the estimates of regression coefficients, predictions and estimates of genetic values of both models. Let b̂ be the vector of posterior means of the regression coefficients. Pearson's correlation between the observed (y) and predicted (b̂_0 1 + Xb̂) values is a goodness-of-fit measure; Pearson's correlation between the "true" genetic values (Xb) and the predicted values (Xb̂) is a measure of how well the genetic values are estimated; finally, Pearson's correlation between the "true" marker effects (b) and the estimated effects (b̂) indicates how good a model is at uncovering marker effects. We also computed the effective number of parameters (pD) and the deviance information criterion (DIC) for the two fitted models (see Spiegelhalter et al. 2002, for more details). The algorithm used in this simulation experiment is described briefly below.
1. Set b_0, b, σ_e² and ρ.
2. Simulate the phenotypes using equation (9).
3. Fit the BSN model to the simulated data.
4. Fit the BRR model to the simulated data.
5. Compute the correlation between observed and predicted phenotypes, "true" and predicted genetic values, and "true" and estimated regression coefficients with both regression models.
6. Compute the effective number of parameters (pD) and the deviance information criterion (DIC) for the two fitted models.
7. Repeat steps 1 to 6 one hundred times and obtain the averages of the correlations, the intercept (b_0), σ_e² and ρ.
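A compact Python sketch of steps 1, 2 and 5 is given below. All numbers (dimensions, effect sizes, ρ, the regularization value) are hypothetical, and a closed-form ridge estimate stands in for the paper's Bayesian samplers, which this sketch does not implement:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 300, 120                      # hypothetical dimensions
X = rng.integers(0, 2, size=(n, p)).astype(float)   # DArT-like 0/1 markers
b0_true, sigma_e, rho = 1.0, 1.5, 0.9
b_true = rng.normal(0.0, 0.2, size=p)

# Step 2: skew normal errors via the stochastic representation + centered parametrization
z = np.abs(rng.standard_normal(n))
u = rho * z + np.sqrt(1 - rho**2) * rng.standard_normal(n)
e = sigma_e * (u - rho * np.sqrt(2 / np.pi)) / np.sqrt(1 - 2 * rho**2 / np.pi)
y = b0_true + X @ b_true + e

# Stand-in for the model fitting steps: closed-form ridge estimate on centered data
lam = 10.0                           # hypothetical regularization value
Xc = X - X.mean(axis=0)
yc = y - y.mean()
b_hat = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Step 5: correlations between "true" and estimated quantities
corr_effects = np.corrcoef(b_true, b_hat)[0, 1]         # marker effects
corr_signal = np.corrcoef(X @ b_true, X @ b_hat)[0, 1]  # genetic values
print(corr_effects, corr_signal)
```

Wrapping this in a loop over replicates and over values of ρ reproduces the structure, though not the models, of the experiment described above.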

Application to real data
This dataset is from the Drought Tolerance Maize (DTMA) project of CIMMYT's Global Maize Program (http://www.cimmyt.org). The dataset comes from a large study aimed at detecting chromosomal regions affecting drought tolerance. The genotypic data consist of information from 300 tropical inbred lines that were genotyped using 1,152 SNPs (Single Nucleotide Polymorphisms). The analyzed trait is Gray Leaf Spot (GLS) caused by the fungus Cercospora zeae-maydis, which was evaluated at three different sites, Kakamega (Kenya), San Pedro Lagunillas (Mexico) and Santa Catalina (Colombia) (see Supporting information). Crossa et al. (2011) analyzed a subset of these data; the response variable was transformed using the Box-Cox transformation (Box and Cox 1964). Figure 2 shows density plots for GLS rating at the three sites. Kernel density was estimated using a Gaussian kernel, with the bandwidth estimated according to Venables and Ripley (2002). Figure 2 also shows the sample skewness index, ĝ_1 = m_3/s³, where m_3 = n⁻¹ Σ_{i=1}^{n} (y_i − ȳ)³, ȳ is the sample mean and s is the sample standard deviation (see Joanes and Gill 1998); in the three cases, the distribution is skewed to the right, so most of the distribution is concentrated around small values of the response variable. We propose using the regression model with skew normal random errors to predict disease resistance. We fitted two models: (1) the standard Bayesian Ridge Regression, where the errors e_i ~ NIID(0, σ_e²), i = 1, ..., n, with "NIID" standing for "normally, independently and identically distributed"; and (2) the proposed model with skew normal random errors. The Bayesian Ridge Regression was fitted using the BGLR package (Pérez and de los Campos 2014), whereas the proposed model was fitted using the algorithm described in Appendix A.
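The moment-based skewness index used in Figure 2 is straightforward to compute; a small Python sketch (using n in both moment denominators, one of the conventions discussed by Joanes and Gill 1998; the paper's s may instead use n − 1):

```python
import numpy as np

def sample_skewness(y):
    """Moment-based skewness index g1 = m3 / m2^(3/2), with n in both
    denominators (the paper's s may use the n-1 convention instead)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    m2 = np.mean(d**2)
    m3 = np.mean(d**3)
    return m3 / m2**1.5

print(sample_skewness([1, 2, 3, 4, 5]))       # symmetric data -> 0.0
print(sample_skewness([0, 0, 0, 0, 10]) > 0)  # right-skewed data -> True
```

Positive values of the index correspond to the right-skewed GLS distributions observed at the three sites.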
The models were first fitted using the full data, and subsequently 100 random partitions with 80% of observations in the  training set and 20% of observations in the testing set were generated. The two models were fitted for each of these random partitions; then the phenotypes of the individuals in the testing set were predicted and the ability of each model to make predictions was evaluated using Pearson's correlation between observed and predicted values. Inferences for each fit were based on 100,000 samples obtained after discarding 50,000 samples that were taken as burn-in. Convergence was checked by inspecting trace plots of the parameters.

Data and program availability
The data and programs are available as File S1, which corresponds to a compressed zip folder. The zip folder also contains a description of the data and commands to read it into the R statistical software.

Table 1 shows point estimates of b_0, b, σ_e² and ρ for the BSN and BRR models for different values of ρ. It also shows an estimate of the ratio σ_e²/σ_b², a regularization parameter that is widely used in Bayesian Ridge Regression. Higher values of this parameter are associated with more shrinkage; note that the estimates of σ_e² are very similar in both models, so small values of σ_b² could be associated with more precise estimates of b. It is also clear from this table that the point estimates of b_0 and σ_e² are very close to the true values used in the simulation. The correlation between observed and predicted values and the mean squared error are quite similar for both models, and there is no clear winner. Finally, the algorithm is not able to estimate the parameter ρ precisely for distributions that are only slightly asymmetric. Table 2 shows the effective number of parameters (pD), the deviance information criterion (DIC), the correlation between "true" and estimated marker effects, and the correlation between "true" and estimated signals. The table shows that, in general, the pD and the DIC (smaller is better) favored the BSN model. The correlation between "true" and estimated marker effects is slightly better for BSN, and the difference between the two models becomes clearer as ρ increases. The same pattern is observed for the correlations between true and estimated genetic signals.

Application to real data
Full data: Table 3 shows estimates of the posterior means of the parameters σ_e², σ_b² and ρ, as well as the effective number of parameters (pD) and the deviance information criterion (DIC). From Table 3 it is clear that the estimation of marker effects is more precise for the BSN model than for the BRR model; the pD and the DIC also favored the BSN model. The estimated ρ parameter also supports the assumption of skew normal random errors, since the point estimate is not close to 0, except in the case of San Pedro Lagunillas. Figure 3 shows scatterplots of the GLS predictions obtained with the BSN and BRR models. As expected, Pearson's correlation between the two sets of predictions is very high (higher than 0.95). This implies that even when the data are skewed, if a BRR model is fitted in order to obtain candidates for selection, we can expect to select about the same individuals.

Two models, BRR and BSN, were fitted for each site. Figure 4 shows scatterplots of Pearson's correlation between observed and predicted values for individuals in the testing set obtained after fitting the BSN and BRR models at the three locations. When the correlation is higher for BSN than for BRR, this is represented by a filled circle, and by an open circle otherwise. The figure also shows the number of times Pearson's correlation was higher for the BSN than for the BRR model. From this figure, it is clear that the BSN model predicts slightly better than the BRR model. Figure 5 shows scatterplots of the mean squared errors in the testing set for the three locations. When the MSE for BSN is smaller than the MSE for BRR, this is represented by an open circle, and by a filled circle otherwise. The number of times that the MSE for BRR is greater than the MSE for BSN is also shown in the plots. From this figure, it is clear that, in general, the MSE for BRR is greater than the MSE for BSN. Table 4 shows the average Pearson's correlation and mean squared error (MSE) between observed and predicted values in the testing set. The averages and standard deviations are very similar for both models and the differences between the models are non-significant, but the figures suggest that the BSN model predicts slightly better than the BRR model.

Table 2: True and estimated posterior mean of ρ, effective number of parameters (pD), deviance information criterion (DIC), correlations between "true" and estimated marker effects, and correlations between "true" and estimated genetic signals; standard deviations in parentheses. Phenotypes were simulated under model (9) with ρ ∈ {0, 0.5, 0.75, 0.90, 0.95, 0.99}, and then regression models with skew normal (BSN) and normal (BRR) errors were fitted.

DISCUSSION AND CONCLUSIONS
We have proposed a Bayesian regression model for skewed responses with applications when p >> n in the GS context, but it can also be employed in other cases, including, of course, when p < n. In addition to generalizing linear whole-genome regression models to various discrete distributions (ordinal, binomial, etc.), this study further completes the Bayesian toolbox for whole-genome regression. The proposed model uses a stochastic representation of a skew normal random variable in order to facilitate the computations; it also allows using standard MCMC techniques to fit the model. Results of the simulation and of the applications with real data suggest that the proposed model fits the data better and also predicts slightly better than the standard Ridge Regression model. The Ridge Regression model is a particular case of our model when ρ = 0. On the other hand, our results also suggest that BRR is a very robust model: although in the simulated data we already knew that it was the wrong model to fit, its predictive power was still very good. Although the conventional Bayesian whole-genome regression is robust, it does not correctly deal with skewed phenotypic data, and this can decrease its genomic-enabled prediction accuracy and its goodness of fit to the data. Thus, the advantages of the proposed Bayesian whole-genome regression compensate for its complexity and the possible increase in computational time as compared to the conventional Bayesian ridge regression. The model proposed in this study is conceptually and operationally different from, and presumably simpler than, the skew-normal linear mixed model of Arellano-Valle et al. (2005), which uses a multivariate skew-normal distribution in order to relax normality.
Despite the fact that skewness is a major concern for breeding data analyses, and may often be a result of uneven sampling of "high"- and "low"-performing individuals, selection, environmental effects, etc., the theoretical developments presented in this study are also applicable to many other areas of research in agronomy and in agriculture in general. For example, most crop flowering time data are indeed skewed, as are categorical data representing different types of diseases such as those presented in this research. Whether or not the skewness of the phenotypic response is the result of an artificial phenomenon, the aim of this study was to propose a statistical model that deals with that problem more appropriately.
Results of this study can be compared with those of two other studies, Crossa et al. (2011) and González-Camacho et al. (2012). Crossa et al. (2011) included one site in Mexico (San Pedro Lagunillas) that was also analyzed by transforming the original GLS ordinal scale using the Box-Cox transformation; the prediction accuracy of different models (e.g., Bayesian Lasso and Reproducing Kernel Hilbert Spaces) ranged from 0.416 to 0.462. Although a strict comparison with the results obtained in this study is not possible because different random cross-validations were generated, the prediction accuracies of the BSN (0.5489) and BRR (0.5450) models were higher than those previously reported by Crossa et al. (2011) for the same site. The stochastic representation can be used to extend Reproducing Kernel Hilbert Space models (de los Campos et al. 2010), which in many empirical studies have led to more accurate predictions than Bayesian Ridge Regression and the Bayesian LASSO, among others (e.g., Pérez-Rodríguez et al. 2012); this is a topic for future research. Further studies to extend the model proposed here to include genotype × environment interaction should not be complicated. The proposed model can also be extended by assigning different prior distributions to the marker effects, for example, to induce variable selection. This could potentially lead to a new Bayesian alphabet with skew normal random errors.

Figure 5: Plots of the mean squared error (MSE) in the testing set for each of the 100 cross-validations and 3 locations. When the MSE for BSN is smaller than the MSE for BRR, this is represented by an open circle; when the MSE for BRR is greater than the MSE for BSN, this is represented by a filled circle. The number of times that the MSE for BRR is greater than the MSE for BSN is also shown in the plots.

Table 4: Average Pearson's correlation and mean squared error (MSE) between observed and predicted values in the testing set. The predictions were obtained after fitting the BSN and BRR models. The average is across the 100 random partitions with 80% of observations in the training set and 20% in the testing set. Standard deviations are given in parentheses.

ACKNOWLEDGMENTS
We thank the researchers of the Global Maize Program of the International Maize and Wheat Improvement Center (CIMMYT) who carried out the maize trials and provided the phenotypic and genotypic data analyzed in this article.
Authors' contributions: PPR, RAP, SPE, CVC, JSE and JC conceived the idea and designed the study; RAP, JC and PPR developed the statistical models; RAP wrote the R scripts; RAP and PPR carried out the statistical analysis; RAP, JC and PPR wrote the first version of the manuscript; JC, SPE, CVC, and JSE contributed critical comments and to the writing of the final manuscript.

APPENDIX A: SAMPLING FROM THE FULL CONDITIONAL DISTRIBUTIONS
Let q = (1/2) log((1 + ρ)/(1 − ρ)) = tanh⁻¹(ρ), so that q has support in ℝ. Note that the density function of q can be obtained using the transformation method (Casella and Berger, 2002, Chapter 2), and is given by p(q | else) ∝ p(ρ | else) · sech²(q). In the random walk Metropolis algorithm, we generated q by choosing a proposal transition kernel that adds noise to the current state. Assuming that the current value of q is q_k and that we want to update it so that in the next iteration we have q_{k+1}, we follow steps d-f below.

d. Sample q* = q_k + Z, where Z ~ N(0, ν²).
e. Sample u, with U ~ Uniform(0, 1).
f. If u < p(q* | else)/p(q_k | else), then q_{k+1} = q*; otherwise q_{k+1} = q_k.

Once q_{k+1} has been obtained, compute ρ = tanh(q_{k+1}). The parameter ν² can be tuned so that we obtain an optimal acceptance rate (Roberts et al., 1997).
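Steps d-f can be sketched as follows in Python (the paper's implementation is in R/C). The sketch assumes the Uniform(−1, 1) prior on ρ, i.e., a_0 = b_0 = 1, in which case only the Jacobian sech²(q) remains in the target; the step size ν = 1 is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(11)

def log_target(q):
    # Target for q = atanh(rho): p(q | else) ∝ p(rho | else) * sech(q)^2.
    # Under the Uniform(-1, 1) prior on rho, only the Jacobian sech(q)^2 remains.
    return -2.0 * np.log(np.cosh(q))

def rw_metropolis(n_iter=20_000, nu=1.0):
    q, draws, accepted = 0.0, [], 0
    for _ in range(n_iter):
        q_star = q + rng.normal(0.0, nu)       # step d: random walk proposal
        # steps e-f: accept with probability p(q* | else) / p(q_k | else)
        if np.log(rng.uniform()) < log_target(q_star) - log_target(q):
            q, accepted = q_star, accepted + 1
        draws.append(np.tanh(q))               # map back: rho = tanh(q)
    return np.array(draws), accepted / n_iter

rho_draws, acc_rate = rw_metropolis()
print(rho_draws.min(), rho_draws.max(), acc_rate)
```

Working on the log scale avoids underflow in the acceptance ratio, and the tanh map guarantees every stored draw of ρ lies in (−1, 1).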
Samples from the posterior distribution can be obtained using the Gibbs sampler (Geman and Geman, 1984) and the random walk Metropolis algorithm. In the algorithm, we sampled from each of the full conditional distributions until we obtained a sample of the desired size. We implemented the algorithm in the R statistical package (R Core Team, 2016). In order to speed up computations, the routines that sample from p(b_j | else), j = 1, ..., p, were implemented in the C programming language (Kernighan and Ritchie, 1988); a shared library was generated and the compiled routines were then called from R. The R script and the C source code are available upon request from the first author.

APPENDIX B: SETTING THE HYPER-PARAMETERS FOR THE PRIOR DISTRIBUTIONS
The hyper-parameters can be set using a set of default rules similar to those used in the BGLR software (de los Campos et al., 2013; Pérez and de los Campos, 2014). With these rules we assign proper but weakly informative prior distributions, partitioning the total variance of the phenotypes into two components, (1) the error and (2) the linear predictor, that is:

Var(y_i) = Var(g_i) + Var(e_i), (B.1)

where g_i = Σ_{j=1}^{p} x_ij b_j. A priori, the total genetic variance is Σ_{i=1}^{n} Var(g_i) = σ_b² Σ_{i=1}^{n} Σ_{j=1}^{p} x_ij², and the a priori average genetic variance is:

V_g = (1/n) Σ_{i=1}^{n} Var(g_i) = σ_b² MSx, (B.2)

where MSx = (1/n) Σ_{i=1}^{n} Σ_{j=1}^{p} x_ij². So, from (B.1) and (B.2), the partition of the phenotypic variance is given by:

V_y = σ_b² MSx + V_e, (B.3)

where V_e = Var(e_i) = σ_e².

Setting the hyper-parameters for χ⁻²(σ_b² | S_b, df_b): In the parametrization used in this work, E(σ_b² | S_b, df_b) = S_b/(df_b − 2) and Mode(σ_b² | S_b, df_b) = S_b/(df_b + 2). We set df_b = 5 so that the prior distribution has a finite mean. From equation (B.2),

σ_b² MSx = V_g; (B.4)

thus, if we replace σ_b² on the left-hand side of (B.4) with the mode of σ_b² | S_b, df_b, then:

(S_b/(df_b + 2)) MSx = V_g. (B.5)

From (B.5), solving for S_b, we obtain S_b = V_g (df_b + 2)/MSx. From the definition of heritability, h² = V_g/V_y, so that V_g = h² V_y; then:

S_b = h² V_y (df_b + 2)/MSx.

Once we set df_b, we can set S_b; we only need to compute the phenotypic variance (V_y) and MSx, and set R² as the proportion of the variance that is explained a priori by the markers (so that h² is replaced by R²). By default we set R² = 0.5.
Setting the hyper-parameters for χ⁻²(σ_e² | S_e, df_e): In the parametrization used in this work, E(σ_e² | S_e, df_e) = S_e/(df_e − 2) and Mode(σ_e² | S_e, df_e) = S_e/(df_e + 2). We set df_e = 5 and S_e = (1 − R²) V_y (df_e + 2).
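The two scale rules above reduce to a few lines of arithmetic; a Python sketch (the tiny y and X below are made-up numbers used only to exercise the formulas):

```python
import numpy as np

def default_hyperparameters(y, X, R2=0.5, df_b=5, df_e=5):
    """Default rules from the variance partition: split V_y into a marker
    component (R2) and an error component (1 - R2), then solve for the
    scaled-inverse-chi-squared scale parameters via the prior mode."""
    Vy = np.var(y)                        # phenotypic variance (n denominator)
    MSx = np.mean(np.sum(X**2, axis=1))   # MSx = (1/n) * sum_i sum_j x_ij^2
    S_b = R2 * Vy * (df_b + 2) / MSx
    S_e = (1 - R2) * Vy * (df_e + 2)
    return S_b, S_e

# Tiny hypothetical example
y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
S_b, S_e = default_hyperparameters(y, X)
print(S_b, S_e)
```

For this toy input, V_y = 1.25 and MSx = 1, so both scales evaluate to 0.5 · 1.25 · 7 = 4.375.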
Setting the hyper-parameters for N(b_0 | 0, σ²_{b0}): We set σ²_{b0} = 1 × 10⁶ so that the prior assigned to the intercept is effectively flat.
Setting the hyper-parameters for p(ρ | a_0, b_0): Since ρ = 1 − 2B with B ~ Beta(a_0, b_0), the density function of the prior assigned to ρ is:

p(ρ | a_0, b_0) = (1/(2 B(a_0, b_0))) ((1 − ρ)/2)^{a_0 − 1} ((1 + ρ)/2)^{b_0 − 1}, ρ ∈ (−1, 1),

where B(·,·) denotes the Beta function. Setting different values for these hyper-parameters leads to a rich variety of shapes, just as in the Beta distribution. Note that if we set a_0 = b_0 = 1, we obtain a Uniform(−1, 1) prior, which corresponds to the one used in this work.