Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence

Abstract Recent developments allowed generating multiple high-quality ‘omics’ data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.


Introduction
Predicting complex traits is a fundamental aim of quantitative genetics. The use of whole-genome single nucleotide polymorphisms (SNPs) considerably improved the prediction of breeding values, resulting in the process widely known as genomic prediction (Meuwissen et al. 2001). A number of statistical approaches are now applied routinely in breeding programs, such as genomic best linear unbiased prediction (GBLUP) (VanRaden 2008), ridge regression (Whittaker et al. 2000), or methods from the "Bayesian Alphabet" (Gianola et al. 2009). More recently, machine learning algorithms have been tested in the context of genomic prediction (Gonzá lez-Recio et al. 2013;Pook et al. 2020;Zingaretti et al. 2020). These models may have several advantages when compared with the linear models routinely used in animal breeding programs, such as capturing interactions between predictors (nonadditive effects), automatic variable selection, and for making fewer assumptions regarding the underlying genetic architecture of phenotypes (Nayeri et al. 2019; Pé rez-Enciso and Zingaretti 2019). However, compared to the routinely used models, prediction performance from machine learning methods has shown mixed results (Azodi et al. 2019;Abdollahi-Arpanahi et al. 2020;Perez et al. 2022). There seems to be no "one-size-fits-all" model, as results are dependent on trait genetic architecture, size of the data, and fine-tuning of hyperparameters.
Recent development of low-cost high-throughput molecular technologies allowed multiple high-quality 'omics' data to be measured with high accuracy (Fernie and Schauer 2009;Tohge and Fernie 2015;Chawade et al. 2016). This has led to interest in utilizing these as new layers of information to improve the predictive performance of genomic prediction models, ultimately contributing to improve efficiency of breeding programs (Guo et al. 2016;Li et al. 2019). For example, gene expression levels measured in tissue samples by direct RNA sequencing (RNA-seq) are now readily available to animal and plant breeders (Stark et al. 2019). To incorporate these new sources of information into genomic prediction models requires new strategies for integration with the already widely used genome-wide marker data. Although most of the literature focusing on the inclusion of geneexpression data into genomic models to improve predictive performance aimed at predicting phenotypes (Takagi et al. 2014;Guo et al. 2016;Schrag et al. 2018;Azodi et al. 2019;Li et al. 2019;Morgante et al. 2020), fitting gene transcript levels as an additional layer of information into genomic models could indirectly improve the prediction of breeding values. Christensen et al. (2021) presented a 2-step method to incorporate such intermediate omics into genomic evaluations considering complete and incomplete omics-data scenarios. Results were validated using simulated data and suggested superiority of the single-step method including both the intermediate omics and genomics data over the traditional GBLUP using only genomics data. Similar results were observed by Michel et al. (2021) when investigating the integration of gene expression into genomic prediction for disease resistance in wheat by using a hybrid relationship matrix for merging both layers of omics data.
After observing much less variation in the gene expression among monozygotic twins than among siblings or unrelated individuals, Cheung et al. (2003) suggested a strong association between genotypes and the level of gene expression in humans. This finding is an indication that SNP genotypes and gene transcripts might contain redundant information which turns into a challenge when these are used as predictors in genomic models. Therefore, an adequate handling of these associations is necessary to prevent inflated relative contributions of individual layers (Holm et al. 2010;Christensen et al. 2021). In the same line of thought, Wade et al. (2022) have emphasized that the benefits of multiomics integration models over single-omic models are achieved once redundancy of predictors is decreased. Consequently, multiomics models should either automatically or through adequate parametrization be able to identify and manage information redundancy across multiple omics layers.
In this study, we used data from the Diversity Outbred (DO) mouse population Svenson et al. 2012) to evaluate the utility of gene expression in addition to genomewide genetic markers for genomic prediction using different modeling strategies. To this end, the objectives of this study were to: (1) estimate the proportions of phenotypic variance explained by genetic markers and gene transcripts in complex traits recorded in at least 2 time points to assess sampling time dependency; (2) evaluate the predictive accuracy for phenotypes using transcripts and/or marker information for the traits investigated using linear models and machine learning approaches; and (3) evaluate how the inclusion of transcripts affects estimation of genomic breeding values (GEBV) from linear models. We considered best linear unbiased prediction (BLUP) as a representative of linear models, while the gradient boosting algorithm was used as the machine learning representative. The BLUP models tested here vary in number of components, how interactions were modeled, and conditioning of one component on another. In addition to more traditionally used models, we developed and tested a new model called GTCBLUP with the objective to reduce redundancy between genomics and transcriptomics layers. The gradient boosting algorithm is a nonparametric machine learning method and was chosen for its ability to automatically control redundancy and implicitly account for nonlinear effects in prediction, while the BLUP models tested comprise parametric approaches to incorporate genomics and transcriptomics, considering or ignoring the interactions between them.

Phenotypes
Data used for this study were obtained from The Jackson Laboratory (Bar Harbor, ME, USA) and comprise a subset of the dataset used in Perez et al. (2022). The 478 DO mice originated from 4 nonoverlapping generations (4, 5, 7, and 11) with males and females represented equally. The total number of animals per generation was 47, 47, 192, and 192 for generations 4, 5, 7, and 11, respectively, with slight variation in the numbers of missing records across traits (Table 1). The mice were maintained on either standard high-fiber (chow, n ¼ 239) or high-fat diet (n ¼ 239) from weaning until 23 weeks of age. The proportion of males and females within each diet category was close to 50-50 for all generations, as well as within each litter-generation combination (2 litters per generation). This population is maintained under a systematic mating scheme, designed to limit population structure and relatedness. On average, the animals were related to each other at a level equivalent to first cousins, which is by design . More elaborate description of population structure, husbandry and phenotyping methods can be found in Svenson et al. (2012) and Tyler et al. (2021). Table 1 gives for each trait a brief description, the numbers of observations, and the estimated heritability. We considered 6 traits based on range of heritability and presumed genetic architectures. The chosen traits were measured 2 or 3 times during the animal's life, resulting in 13 distinct traits in total. The analyzed traits were bone mineral density at 12 (BMD12) and 21 (BMD21) weeks; body weight at 10, 15, and 20 weeks (BW10, BW15, and BW20); circulating cholesterol at 8 (CHOL8) and 19 (CHOL19) weeks; adjusted body fat percentage at 12 (FATP12) and 21 (FATP21); circulating glucose at 8 (GLUC8) and 19 (GLUC19) weeks; and circulating triglycerides at 8 (TRGL8) and 19 weeks (TRGL19). These traits can be categorized into measurements of body composition (bone mineral density, body weights, and fat percentage) and clinical plasma chemistries (circulating glucose and triglycerides). To allow a fair comparison between parametric and nonparametric models compared in this study, phenotypic records were precorrected for fixed effects of diet, generation, litter, and sex (Perez et al. 2022). Therefore, the precorrected phenotypes (y Ã Þ analyzed here comprise the sum of the additive genetic effect and residual terms.

Genotypes
The genotype data used for the animals in this study were obtained from their derived founder haplotypes (for details see Perez et al. 2022). The complete genotype file used for the analyses included 64,000 markers on an evenly spaced grid, and the average distance between markers was 0.0238 cM. The full genotype dataset was cleaned based on the following criteria: variants with minor allele frequency <0.05, call rates <0.90, and linear correlation between subsequent SNPs >0.80 were removed. After quality control, a total of 50,122 SNP markers were available for the mice with phenotypic, genotypic, and transcriptomic records.

Transcript levels
Transcriptome-wide expression levels were measured from whole livers as previously described (Munger et al. 2014;Chick et al. 2016) for 478 animals at 26 weeks of age. The RNA sample was sequenced using single-end RNA-Seq (Munger et al. 2014) and aligned transcripts to strain-specific genomes from the DO founders (Chick et al. 2016). Read counts were estimated using an expectation-maximization algorithm (EMASE, https://github. com/churchill-lab/emase). Read counts were previously corrected for the effects of sex, diet, and batch by normalizing in each sample using upper quantile normalization and applying a rank Z transformation across samples. After quality control, quantification of transcripts was available for 11,770 genes (Tyler et al. 2017).
Best linear unbiased prediction GBLUP Below we introduce 5 BLUP models and 3 gradient boosting machine (GBM) models with their acronyms and key features summarized in Table 2. The statistical model of GBLUP is: where yÃ is the vector of precorrected phenotypes, 1 is a vector of ones, l is the intercept, and g is the vector of random additive genetic values, where g $ Nð0; Gr 2 g Þ, G is the additive genomic relationship matrix between genotyped individuals, and r 2 g is the additive genomic variance. The matrix G is constructed following the second method described by VanRaden (2008) as ZZ 0 m , where Z is the matrix of centered and standardized genotypes for all individuals and m is the number of markers. Finally, e is the vector of random residual effects, where e $ Nð0; Ir 2 e Þ with r 2 e being the residual variance, and I is an identity matrix.

TBLUP
To evaluate the performance of transcriptomic data for predicting complex traits, we used a Transcriptomic Best Linear Unbiased Predictor (TBLUP) model. This model is similar to GBLUP, but using a transcriptomic relationship matrix, which evaluates the similarity among animals based on gene expression levels (Guo et al. 2016).
The statistical model of TBLUP is: where yÃ, 1, and l are defined as above, t is the vector of random transcript level effects, where t $ Nð0; Tr 2 t Þ and T is the transcriptomic relationship matrix built according to the formula WW 0 k , where W is the matrix of centered and standardized expression levels for all animals and k is the number of genes, and r 2 t is the variance explained by gene transcripts.

GTBLUP and GTIBLUP
The GTBLUP model fitted the g and t as independent random effects, each with their own variance component (Guo et al. 2016;Li et al. 2019). The model is yÃ ¼ 1l þ g þ t þ e; where all the parameters are as defined above.
The GTIBLUP model fitted g, t, and the interaction between g and t with an additional variance component (Morgante et al. 2020). This model is yÃ ¼ 1l þ g þ t þ gt þ e, where yÃ, 1l, g, t; and e are as defined above, and gt is the vector of interaction (between genomic and transcriptomic) effects, where gt $ Nð0; G#Tr 2 gt Þ and # is the Hadamard product.

GTCBLUP
The newly developed GTCBLUP model was similar to GTBLUP in that the g and t that were fitted as independent random effects, each with their own variance component. However, for this model, the transcript levels were conditioned on SNP genotypes, yielding a matrix W c computed as: (Hastie et al. 2009), Z is the matrix of centered and standardized genotypes as before, I is an identity matrix, and k ¼ m Ã r 2 e r 2 g , r 2 e is the residual variance, and r 2 g is the additive genomic variance, both variances estimated with the GBLUP model (including only g). Using the smoother matrix, i.e. including Ik rather than using I À Z Z 0 Z ð Þ À1 Z 0 , reflects that the effects associated with the SNPs are estimated as random rather than fixed effects. The aim of this model is to remove any variance from transcripts that is correlated to variance in genotypes, such that any phenotypic variance both associated with variance in genotypes and transcripts automatically will be associated with the genotypes only. Our hypothesis is that when using the GTCBLUP model for genomic prediction, any overlapping information contained between SNP genotypes and gene transcripts is removed, allowing the model to perform a more accurate partition of variance for the genetic component, and ultimately,  increase prediction accuracy for GEBV. The model is yÃ c k , and all other parameters are as defined above.

Gradient boosting machine models
GBM is an ensemble learning technique that applies an iterative process of assembling "weak learners" into a stronger learner, being largely used for both classification and regression problems (Friedman 2001). In the scope of this investigation, the GBM algorithm represents a nonparametric approach capable of implicitly fitting not only the additive effects of SNP and gene transcripts but also the within-and between-omics layers interactions. The GBM is also capable of performing automatic feature selection, prioritization of important variables, and discarding variables containing irrelevant or redundant information. A detailed description of the GBM algorithm and its application in genomic prediction can be found in Friedman (2002) To obtain the best possible results from the GBM algorithm, a grid search approach was used to determine the combination of hyperparameters that minimized the mean squared error of prediction within the inner training set for each trait. Details of the hyperparameter search method used are found in Perez et al. (2022). We implemented the GBM model using the "gbm" R package (Ridgeway 2020).
We tested 3 different GBM models. The first model considered only SNP genotypes as predictors (GGBM), the second model considered only (standardized) gene transcript levels as predictors (TGBM), and a third model that considered both genetic markers and transcript levels together as predictors (GTGBM). Our objective was to investigate if GBM models could capture within-and between-omics layers associations, while also reducing withinand between-omics layers redundancy by performing automatic variable selection. It is important to note here that although here we used "G" and "T" letters to refer to genomics and transcriptomics data in the GBM model's acronyms, predictors were fit directly in the model and not as relationship matrices.

Variance explained by genetic markers, transcript levels, and combinations of both
To understand how much of the phenotypic variance can be explained by using SNP genotypes, gene transcript levels and the combinations of both sources of information, we estimated variance components using the GBLUP, TBLUP, GTBLUP, GTIBLUP, and GTCBLUP models. Estimates of variance components along with the residual variance (r 2 e ) were obtained from a Bayesian approach analysis, using the BGLR R package (Pé rez and de los Campos 2014). The residual variance and variances from random effects included in the models were assigned a scaled-inverse v 2 density p r 2 where h represent the variance component (g; t; t c ; gt; or eÞ; S and df are the scale and degree of freedom parameters. The value of 5 was used for df in all models. The parameter S was calculated for r 2 e as S e ¼ r 2 p Ã ð1 À R 2 Þ Ã df e À 2 À Á , where r 2 p is the phenotypic variance and R 2 is the prior expectation for the proportion of variance to be explained by the model. For all other variance components, S was calculated as the relationship matrix assigned to the respective variance component (hÞ. For all models, the Gibbs sampler was run for 60,000 iterations, with a 20,000 burn-in period and a thinning interval of 10 iterations. Consequently, inference was based on 4,000 posterior samples.
For the GTIBLUP model, we calculated the proportion of variance explained by SNP genotypes (h 2 ¼ r 2 g r 2 g þr 2 t þr 2 gt þr 2 e Þ, gene , and from the interaction between effects from genetic markers and gene transcripts ). Consequently, the sum of h 2 , t 2 ; and gt 2 represent the proportion of the phenotypic variance explained by 2 layers of omics data (h 2 and t 2 Þ and by the between-omics-layer interactions (gt 2 Þ. The parameters h 2 , t 2 ; and gt 2 were calculated similarly for the other models but omitted any variance components associated with effects not included in the model.

Predictive performance for phenotypes
Performance of predictions from the models was measured by the accuracy, computed as the Pearson correlation (r y Ã ;ŷ ), and the relative root-mean-squared error of prediction (RRMSE) between predictions (ŷÞ and precorrected phenotypes ðy Ã Þ: ðy Ã ÀŷÞ 2 s =r p , where r p is the phenotypic standard deviation. In all analyses, we used a forward prediction validation scheme in which animals from older generations (4, 5, 7) were used as the reference and animals from the younger generation (11) as the validation subset. The standard error (SE) around the r y Ã ;ŷ estimates was obtained by calculating the standard deviations from 10,000 bootstrap samples (Davison and Hinkley 1997). The bootstrapping procedure was implemented using the "boot" R package (Canty and Ripley 2021). Statistical difference between prediction accuracies from different models was tested using the Hotelling-Williams test for dependent correlations (Steiger 1980). We have also assessed prediction bias by obtaining the regression coefficient from the linear regression of corrected phenotypes on model predictions. For these results, values above 1 indicate deflation, while values below 1 indicate inflation of predicted values.
To assess the proportion of variance explained by the models tested, we have calculated the coefficient of determination (R 2 ) from the regression of corrected phenotypes on model predictions for all traits. For the GBM models, we have used results from the model using the previously obtained best hyperparameter set from the standard grid-search procedure to assess the model R 2 for prediction within the reference set.

Predictive performance of different model components
For the BLUP models proposed to integrate SNP genotypes and gene transcripts (GTBLUP, GTIBLUP, and GTCBLUP), in addition to r y Ã ;ŷ we have also calculated the correlations between the solutions for each random effect included in the model (g, t, t c ; or gt) andŷ, as well as pairwise comparisons between all components in the model. Noting that the model component g in fact are GEBV, also the predictive performance for GEBV for any of the models was evaluated as the Pearson correlation between g andŷ. We did not divide those by the square root of the heritability, which would transform them into accuracies of GEBV with an interpretation of being the correlation between estimated and true breeding values, but instead evaluated the predictive performance of each model component in terms of the accuracy of predicting phenotypes. This enabled direct comparison across the different model components. Evaluating solutions from the additive genetic component from these models enabled to assess if the prediction of GEBV can be improved by using models capable of integrating SNP genotypes and gene transcripts for genomic prediction.

Results
Variance components estimation percentage of variance explained within the reference set Genomic heritabilities (h 2 ) obtained with GBLUP ranged from 0.08 to 0.44, representing a wide range of magnitudes across traits (Figs. 1 and 2). The estimated values for each variance component for all traits is presented in Supplementary Tables 1 and 2. When only fitting transcript levels as predictors (TBLUP), the percentage of variance explained (t 2 ) ranged from 0.22 to 0.75 and in general, it was higher than h 2 when comparing within the same trait. The exceptions to that were observed for BMD12 (h 2 ¼ 0.39 and t 2 ¼ 0.35) and GLUC8 (h 2 ¼ 0.30 and t 2 ¼ 0.22). When comparing the same trait measured at different time points, t 2 from TBLUP was higher for phenotypes collected closer to 26 weeks of age (i.e. the age at mRNA data sampling).
In terms of the total phenotypic variance explained, GTBLUP and GTIBLUP showed similar results (Figs. 1 and 2). For body weights (BW10, BW15, and BW20) and fat percentage (FATP12 and FATP21) traits, the variance explained by genetic markers (in GTBLUP and GTIBLUP) was drastically lower when compared with GBLUP for the same traits. For the remaining traits also a decrease in genetic variance captured by markers was observed, albeit that the decrease was much lower. For the interaction component in GTIBLUP (gt 2 ), results observed varied according to the trait analyzed but in general, it was low compared to both h 2 and t 2 . The only exception to that was observed for TRGL8, in which gt 2 was higher than h 2 and t 2 . For CHOL8, GLUC19, and TRGL19, gt 2 was either similar to h 2 or t 2 .
For GTCBLUP, differently from GTBLUP and GTIBLUP, the additive genetic variance captured was always in line with results from GBLUP. On the other hand, the variance explained by transcripts (t 2 ) from GTCBLUP was always lower than observed by other models including transcripts as predictors (TBLUP, GTBLUP, and GTIBLUP).
The variance explained (represented by the R 2 parameter) within the reference data by parametric models was in general lower than by the nonparametric counterparts (Table 3). Independent of being a parametric or nonparametric model, the use of gene transcripts (TBLUP and TGBM) as predictors explained in most cases more of the variance than using exclusively SNP genotypes (GBLUP and GGBM). For GTBLUP, GTIBLUP, and GTGBM, the variance explained was at least similar to observed for TBLUP and TGBM, but generally higher. For GTCBLUP, variance explained by the model was slightly to moderately higher than observed for GBLUP model, but always smaller than observed for GTBLUP, GTIBLUP, and GTGBM. The average R 2 when considering only traits recorded earlier (suffixes 8, 10, or 12) and later (suffixes 19, 20, or 21) moments were 76% and 83%, respectively, when using TBLUP, being the largest difference observed across models when considering these 2 groups of traits.

Prediction performance-phenotype prediction
In  1. Percentage of variance explained by SNP genotypes (g 2 ), gene transcripts (t 2 ), the interaction between them (gt 2 ), and not explained (e 2 ) by GBLUP, TBLUP, GTBLUP, GTIBLUP, and GTCBLUP models tested for the traits BW and FATP. For a description of the traits, see Table 1. For a description of the models, see Table 2.
presented in Supplementary Table 3. Here, we considered GBLUP to be the reference method. It showed prediction accuracies ranging from 0.01 to 0.29, these were highly positively correlated to the portion of variance explained by SNP genotypes by the same model, except for CHOL19. When comparing predictive performance between GBLUP and GGBM models, GBLUP yielded the highest prediction accuracies for 7 traits, while GGBM had the best predictive performance for 6 traits out of 13.  2. Percentage of variance explained by SNP genotypes (g 2 ), gene transcripts (t 2 ), the interaction between them (gt 2 ), and not explained (e 2 ) by GBLUP, TBLUP, GTBLUP, GTIBLUP, and GTCBLUP models tested for the traits BMD, CHOL, GLUC, and TRGL. For a description of the traits, see Table 1. For a description of the models, see Table 2.
For models that include only gene transcripts (TBLUP and TGBM), the TBLUP approach showed predictive accuracies ranging from 0.03 to 0.61, having the best performance for only 4 out of 13 traits. The TGBM model was able to overcome TBLUP for 7 traits, with prediction accuracies ranging from 0.04 to 0.58. For BMD21 and BW15, predictive accuracy was identical between TBLUP and TGBM. The differences between accuracies from TBLUP and TGBM were higher than between GBLUP and GGBM.
For models that combined SNP genotypes and gene transcripts levels (GTBLUP, GTCBLUP, GTIBLUP, and GTGBM), GTBLUP had the highest predictive accuracy for 5 traits out of 13. The second-best model overall was GTGBM, with the highest predictive accuracy for 4 traits. For every trait that GTIBLUP had the highest prediction accuracy, it was identical to the result for GTBLUP, while the GTCBLUP never had the highest predictive accuracy (Table 4).
The prediction error (RRMSE) and bias (b) for model's predictions are presented in Supplementary Tables 4 and 5, respectively. Considering single-omics models, on average BLUP models (GBLUP and TBLUP) yielded less biased predictions than GBM models (GGBM and TGBM). For models integrating SNP genotypes and gene transcripts, GTBLUP and GTIBLUP showed similar bias across traits, while GTGBM had on average less bias than the BLUP models. For the GTCBLUP model, predictions were inflated (b < 1Þ for all traits but BMD21. In terms of prediction error, differences between models were smaller than observed for bias (Supplementary Table 5) or predictive accuracies (Table 4). The lowest RRMSE values were observed for FATP21, while the Table 3. Model R 2 (Â100) for the best linear unbiased prediction (GBLUP, GTBLUP, GTCBLUP, and GTIBLUP) and gradient boosting (GGBM, TGBM, and GTGBM) approaches within training data.  For each group of models, the result with the highest accuracy is indicated in bold, identical results between 2 or more models are indicated in underline. For each row, different letters indicate significant differences between models. a For a description of the traits, see Table 1.
b For a description of the models, see Table 2. highest RRMSE values were observed for GLUC19. The RRMSE values for all traits analyzed were all around 1, indicating the average prediction errors were close to 1 phenotypic standard deviation.

Predictive ability for GEBV and other model components, and the correlation between them in BLUP models
In Table 5, the Pearson's coefficient correlation between model components solutions (ĝ,t,t c , andĝ t ) for the different BLUP models and corrected phenotypes (y Ã ) are shown. Corresponding standard errors are presented in Supplementary Table 6. Overall, results for GTBLUP and GTIBLUP were similar across traits. These 2 models had the most accurate GEBV (qĝ y Ã ) exclusively for GLUC8, while for BMD12 results from these models were matched by GTCBLUP. For GLUC19, all 4 parametric multiomics models yielded the same accuracy for GEBV, which was the lowest (0.01) across traits. In 8 out of 13 traits, the GEBV estimated using GTCBLUP model was the most accurate across all models. The correlation betweent and y Ã (q^t y Ã ) was also similar between GTBLUP and GTIBLUP, being always higher for these 2 models than observed for GTCBLUP. For GTCBLUP exclusively, q^t y Ã was low and negative for CHOL19 (À0.08), GLUC19 (À0.05), and TRGL8 (À0.07). For most traits, although a slight increase in the total variance explained was observed within the reference dataset ( Figs. 1 and 2) when comparing GTBLUP and GTIBLUP, there was not a proportional increase in qĝ y Ã in the validation (Table 5). For GTCBLUP, on the other hand, for all traits, there was an increase in the variance explained by SNP genotypes (g 2 in Figs. 1 and 2) when compared with GTBLUP and GTIBLUP, and the same pattern was observed for qĝ y Ã . Results for qĝ^t varied from þ0.14 to þ0.29 for GTBLUP and from þ0.13 to þ0.29 for GTIBLUP. For GTCBLUP, values for qĝt c were all negative and close to zero, ranging from À0.13 to À0.03 (Table 5). The values for qĝĝ t , only calculated for GTIBLUP, were close to zero for most traits with an exception for CHOL8 and CHOL19, for which qĝĝ t was 0.18. A similar pattern was observed for qtĝ t , for which values varied from À0.12 to þ0.06, with the largest differences from qĝĝ t observed for CHOL8 and CHOL19.

Discussion
Here, we investigated parametric and nonparametric approaches to leverage transcriptomic data for the prediction of complex phenotypes. To accomplish that, we used 478 animals from the DO Mouse population , for which information on phenotypes ) for a wide range of quantitative traits, SNP genotypes and gene transcript levels from liver tissue (Tyler et al. 2017) were available on the same animals. We used the genomic (GBLUP) and transcriptomic (TBLUP) BLUP models to evaluate the value of these omics data to predict phenotypes. In addition, we evaluated models to integrate genome and transcriptome data by modeling both layers independently (GTBLUP) or including an interaction component between the genome and transcriptome (GTIBLUP). Finally, we developed and tested the new GTCBLUP model that removes the betweenomics-layer information redundancy. The GBM algorithm was investigated as a nonparametric approach potentially able to perform variable selection and capture nonlinear effects by fitting either SNP genotypes (GGBM), gene transcript levels (TGBM) or to integrate both layers implicitly modeling interactions within-and between-omics layers (GTGBM).
Using data from 6 distinct traits measured at 2 or 3 time points (resulting in 13 traits in total), we first assessed the proportion of phenotypic variance explained by each variance component included in the parametric models ( Figs. 1 and 2). When using transcripts as predictors, 2 main patterns were observed. For 10 out of 13 traits (Figs. 1 and 2), the TBLUP model explained much a larger portion of the phenotypic variance than GBLUP. The observation that the portion of variance explained by gene transcripts is strongly trait-specific is in line with results observed when assessing the proportion of variance from gene transcripts for Table 5. Pearson's coefficient correlation (qÞ between model's components (ĝ,t,t c , andĝt) solutions and corrected phenotypes (y Ã Þ for BLUP models proposed. a The numbers in bold (per row) show the best GEBV accuracies (qĝ y Ã ) across models, identical results between 2 or more models are underlined. Superscripts were used to identify significant differences among qĝ y Ã (letters a and b) and among qt y Ã (letters u and v) obtained with the different models. a qĝ y Ã ¼ correlation between additive genetic effect and corrected phenotypes; q^t y Ã ¼ correlation between gene transcripts effect and corrected phenotypes; qĝ^t ¼ correlation between the additive genetic and gene transcripts effects; qĝĝ t ¼ correlation between the additive genetic effect and the interaction between genetic and gene transcript effects; q^tĝ t ¼ correlation between the additive genetic effect and the interaction between genetic and gene transcript effects; qt c y Ã ¼ correlation between gene transcripts conditioned on SNP genotypes and corrected phenotypes; qĝt c ¼ correlation between the additive genetic effect and gene transcripts conditioned on SNP genotypes. b For a description of the traits, see Table 1. c For a description of the models, see Table 2. complex traits in Drosophila (Morgante et al. 2020). Ehsani et al. (2012) have analyzed data from an F2 mice population using models integrating genotype markers and liver transcriptomics data. The authors reported that transcripts explained 79%, while genotypes explained 36% of the phenotypic variance for body weight at 8 weeks of age. It is important to emphasize that in Ehsani et al. (2012), RNA samples were measured at the same time point as phenotypes were collected. On the other hand, studies have observed that genetic markers always explained a larger portion of variance than transcripts in maize (Guo et al. 2016;Azodi et al. 2019) and Drosophila (Li et al. 2019). The conflicting results found in literature may reflect the transient nature of how genes are expressed (Kleyman et al. 2017). Differently from genotypes, transcripts are affected by many factors such as the tissue from where samples are collected, the moment in life of sampling and the environmental conditions that the animal was exposed to. These variables most likely impact the variance explained (and concomitantly prediction performance) by transcripts.
The stability of gene transcripts over time has been a topic of interest as it might affect the genotype-to-phenotype link (Karlovich et al. 2009). Bryois et al. (2017) have analyzed transcriptomics data from whole-blood samples in unrelated human individuals collected at 2 time points, separated by 22 months. The authors reported correlations between transcriptomics data of individuals at both time points ranging from À0.50 to 1.00, with an average of 0.31 across the approximately 18,000 genes considered. This indicates that while genome-wide there is a huge variation in stability of expressions across genes, the expression of some genes is highly stable between time points as far as 22 months apart. In this study, transcripts were measured when the mice were 26 weeks of age, while all phenotypes were recorded at younger ages (from 8 to 21 weeks of age). Phenotypes recorded closer to 26 weeks of age had a larger proportion of phenotypic variance explained by transcripts than measurements made earlier in the animal's life for the same phenotype (Figs. 1 and 2). For BW and FATP the transcripts explained a larger proportion of phenotypic variance than genotypes at all time points. For BMD, CHOL, GLUC, and TRGL, this was the case when there was 4 (BMD) to 6 weeks (CHOL, GLUC, and TRGL) in-between measuring phenotypes and transcripts, while genomics explained more phenotypic variance when this time frame increased to 14 (BMD) or 18 weeks (CHOL, GLUC, and TRGL) inbetween. In Azodi et al. (2019), gene transcripts were quantified from whole seedlings, while phenotypes were recorded at a much older age and authors have observed limited predictive ability of transcriptomics data. Together with results from literature, our findings seem to confirm that the magnitude of the association of phenotypes with gene expressions may be time dependent. It is important to emphasize here that although this outcome could have been expected beforehand, to our knowledge it is the first time that this link between amount of variance explained by transcript vs the time difference between measuring transcripts and phenotypes has been shown empirically. The relationship between the tissue from where RNA samples are taken and the phenotype being analyzed can also be detrimental to omics model's performance. The gene expression from whole maize seedlings considered in Azodi et al. (2019) showed no consistent improvement in predictive accuracy when compared with models considering only genomics data. Whole maize seedlings are probably less related to traits collected later in life than the gene expression from liver tissue available for the DO mouse dataset. It is widely known that the liver is strongly linked to many metabolic pathways (Ponsuksili et al. 2019), and therefore likely also especially to the BW and FATP traits used here, while the variation contained in a sample collected from whole seedlings do not reflect a specific tissue but a pool of all tissues in this organism. This could potentially mask any informative tissuespecific signals that could improve predictive accuracy for phenotypes. In general terms, it looks like although in many cases the variation in gene transcripts may capture higher proportions of variance from phenotypes than genetic markers, although this is dependent on the tissue and time of sampling, and these dependencies are likely to be trait specific.
When fitting both SNP genotypes and gene transcripts as predictors the portion of variance explained by SNP genotypes was drastically lower than for the GBLUP model. Ehsani et al. (2012) and Takagi et al. (2014) observed a reduction in captured genetic variance by SNP genotypes of around 50% when fitting genotypes together with transcripts compared to models fitting only genotypes as predictors for complex traits in other mice populations. This seems to confirm the hypothesis that there is redundant information between the genome and transcriptome layers (Wade et al. 2022), as also shown to be the case in Drosophila (Morgante et al. 2020). In our experience, it seems that the closer the phenotype analyzed is to the moment of RNA sampling, the higher the decrease in genetic variance captured by SNP genotypes in GTBLUP and GTIBLUP. This was observed for almost all traits we analyzed in different magnitudes. Takagi et al. (2014) analyzed circulating cholesterol at 10 weeks of age in mice and reported a large decrease in the genetic variance captured from SNP genotypes from models including only SNP genotypes (g 2 ¼ 46%) and together with liver transcripts (g 2 ¼ 19%) also measured at 10 weeks of age. In this study, we observed only a smaller decrease in genetic variance estimated when comparing GTBLUP (g 2 ¼ 28%) and GBLUP (g 2 ¼ 38%) for CHOL8. This seems to confirm that for the same phenotype, measurements made closer to the RNA sampling are prone to exhibit this pattern in a higher magnitude than when measured farther from the RNA sampling. This was further substantiated by the results observed for GTCBLUP. By conditioning the transcripts on the genotypes, the portion of variance explained by SNP genotypes was similar to the GBLUP model, while the variance explained by gene transcripts was much lower than estimated with TBLUP, GTBLUP, and GTIBLUP.
The formal variance partitioning achieved with the BLUP models cannot be achieved with the nonparametric GBM models. To compare the performance of GBM and BLUP in terms of explained variance we investigated the model R 2 within the reference set. For the GBM models, the model R 2 was almost always higher than for the BLUP models (Table 3). From our results, this pattern is recognizable for almost all traits analyzed, in which the GBM algorithm is able to capture a higher portion of variance than the parametric counterpart within the reference dataset (Table 3) but fails to outperform these models when predicting in the validation set (Table 4). The presence of noise in the data, limited size of the training set and the underlying complexity of the event being modeled are often cited as common causes of overfitting in machine learning models (Vabalas et al. 2019;Ying 2019). Here, we used a training dataset of approximately 286 animals and the high number of predictors in the models, coupled with the unavoidable presence of collinearity within-and between-omics layers may have caused GBM models to overfit (Shalev-Shwartz and Ben-David 2014). We have observed a big impact of hyperparameters on the predictive accuracies of the GBM models (results not shown). Having access to larger datasets could help to elucidate the magnitude of this impact for the models analyzed here since it would decrease the impact of hyperparameter definition in predictive performance, improving strength of evidence for any differences found between GBM and other models tested.
Prediction performance for phenotypes can be improved by adding transcripts in addition to genotypes in the model; however, our results suggested that the magnitude of improvement is dependent on the trait analyzed (Table 4). In line with the observed differences in variance explained by model components, TBLUP showed a better predictive ability than GBLUP for most traits except CHOL19, GLUC8, and GLUC19. In contrast, Takagi et al. (2014) reported higher predictive accuracies for circulating glucose and cholesterol when using liver transcripts as predictors when compared with using genotypes in a different heterogeneous mice population (Valdar et al. 2006). A relevant aspect is that in this study phenotypes for CHOL and GLUC were collected at 8 and 19 weeks, while RNA samples were taken at 26 weeks of age. As previously mentioned, in Takagi et al. (2014), RNA samples and phenotypes were collected around the same age (10 weeks). In this study, phenotypes recorded at a time point closer to the moment of transcript profiling resulted in more phenotypic variance being explained by transcripts for all 6 traits ( Fig. 1), and for 5 out of 6 traits in higher predictive accuracy for TBLUP (Table 4). This was very clear for FATP and BW, where the variances explained by gene transcripts in the liver were high, which biologically makes sense. The pattern of increased variance explained and higher prediction accuracy was, however, also observed for BMD where a link to gene expression in the liver is not obvious. For 4 out of 6 traits, GBLUP also yielded higher prediction accuracy when the trait was measured at later time points, which may suggest a general time dependency of variance explained, as the association of the phenotypes with the SNPs is not affected by the moment of RNA sampling. The SNPs, however, explained considerably less variance than the transcripts, while not showing a consistent increase in variance explained over time as the transcripts did. This implies that GBLUP generally had lower power than TBLUP, and suggests that the apparent time dependency of the accuracy for GBLUP, which was less pronounced than for TBLUP, should not be overvalued given the limited size of the data used.
The TGBM model outperformed the TBLUP model for several traits, which was not the case when comparing GBLUP and GGBM. This result indicates that interactions between transcripts were more easily captured or are more relevant than between SNPs. One other hypothesis is that as transcripts are more strongly linked to phenotypes than genetic markers, transcriptby-transcript interactions are also likely to affect the phenotype more strongly than SNP-by-SNP interactions (Green et al. 2019), hence the former is expected to have a clearer and more detectable signal. Morgante et al. (2020) have used the random forest model, a nonparametric ensemble machine learning method like GBM, to predict complex phenotypes in Drosophila using gene transcripts as predictors but did not observe a superior predictive ability when compared with the TBLUP model. While TGBM outperformed TBLUP for 7 out of 13 traits, the GTGBM only outperformed GTBLUP or GTIBLUP in only 4 out of 13 traits. This could mean that the inclusion of SNP genotypes together with gene transcripts as predictors in the GTGBM model may have impaired the ability of GBM to capture linear and nonlinear signals from within-and between-omics layers. The exact cause remains unclear, but the size of dataset together with the substantial increase in number of predictors when switching from TGBM to GTGBM may be in the roots of it. It is likely that the GBM algorithm may require more data to be able to accurately capture all patterns from the complex relationship between omics layers underlying quantitative traits. If this is indeed the case, testing these models using a larger dataset could help to confirm this hypothesis. In Azodi et al. (2019), machine learning models integrating genomics and transcriptomics data were also not able to outperform single-omics models in terms of predictive accuracy for phenotypes in maize using a dataset of similarly limited size as in this study.
Although the inclusion of an interaction component in GTIBLUP captured between 9% and 26% of phenotypic variance (represented by gt 2 in Figs. 1 and 2) within the reference set, it hardly affected the correlations of the model components with the phenotype (qĝ y Ã and q^t y Ã ), and those between the model components (qĝ^t ) ( Table 5). The low values observed for qĝĝ t and qtĝ t in GTIBLUP also seem to suggest that the interaction component is capturing a portion of variance not directly shared withĝ ort components, and therefore it does not affect the relationship between other components. The correlation betweenĝ andt in GTBLUP and GTIBLUP models was always higher than the correlation betweenĝ andt c in GTCBLUP across all traits analyzed. Since in the GTCBLUP model the transcript relationship matrix was conditioned on the variance of SNP genotypes (t c ), the correlation between solutions for the 2 components was expected to be closer to zero than observed for GTBLUP or GTIBLUP.
In this article, we proposed the GTCBLUP model as an alternative to integrate genome and transcriptome data for genomic prediction. There has been an increasing interest in the use of intermediate omics data in animal and plant breeding (Guo et al. 2016;Yang et al. 2017;Azodi et al. 2019;Morgante et al. 2020;Christensen et al. 2021;Michel et al. 2021), such as transcriptomics, metabolomics, or microbiome data. The inclusion of new layers of omics data into genomic prediction models could arguably help in capturing additional portions of variance not explained by genotype data, but at the same time, these layers most likely contain overlapping information, increasing collinearity between predictors. Modeling the relationship between G and T components could be an efficient way to realize the added value of integrating such omics data into genomic prediction models (Wade et al. 2022), but this could also be a challenge given the increase in number of parameters to be estimated. The advantage of the GTCBLUP is that as a preprocessing step it conditions the variance contained in transcripts on the variance of genotypes to minimize the amount of redundant information without having to increase model complexity. In general, the GTCBLUP model was able to produce GEBV that were at least as accurate as or slightly more accurate than the GBLUP model. The percentage of variance explained by SNP genotypes in GTCBLUP was similar to that with the GBLUP model, while it was always lower when using GTBLUP and GTIBLUP (Figs. 1 and 2). The observed reduction in additive genetic variance for GTBLUP and GTIBLUP when compared with GBLUP indicates strong redundancy in information contained in the genomic and transcriptomic layers. So, the conditioning of transcripts on SNP genotypes in GTCBLUP allowed this model to perform a more accurate variance partitioning for the additive genetic component, which consequently resulted in a more accurate estimation of GEBV (Table 5). Thus, when the aim is to predict GEBV considering both genomic and transcriptomic data, of the different models considered here, the GTCBLUP model is the most suitable.
Considering applications in practice, one limitation of the GTCBLUP as well as all other models considered in this study, is that they do not accommodate missing omics information, so all reference individuals must have genomics and transcriptomics data available. In the context of breeding programs, a situation in which all reference animals have multiple omics data available is unlikely to happen due to current high costs involved in collecting this kind of information. However, based on the observed decrease in costs of genotyping which has enabled large-scale genotyping, we may expect similar developments for the costs of transcriptomics and other intermediate phenotypes in the near future (Uzbas et al. 2019). At the same time, there have been some recent model developments that enable including other omics data in genomic prediction, when these other omics data are not available for all animals (Christensen et al. 2021;Zhao et al. 2022).

Conclusion
We have assessed prediction models that incorporate genetic markers and transcriptomics data in genomic prediction of complex phenotypes in mice. The proportion of phenotypic variance explained by transcripts was almost always higher when traits were measured closer to the time of measuring gene transcripts. While GBM models explained more variance in the reference data, their predictive performance did not exceed the GBLUP models. Models including SNP genotypes and gene transcripts did not consistently outperform the best single-omics models to predict phenotypes. While TGBM model was able to outperform TBLUP, this was not the case for GTGBM compared to GTBLUP and GTIBLUP. The newly developed GTCBLUP model was able to force all phenotypic variance associated with SNP genotypes into its additive genetic component, by conditioning gene transcripts on SNP genotypes. GTCBLUP generally yielded considerably lower accuracies of phenotypic predictions than the other models including SNP genotypes and gene transcripts, but it showed the best accuracies for breeding values for most traits. We recommend using the GTBLUP model for prediction of phenotypes, while the GTCBLUP should be preferred when the aim is to estimate breeding values.

Data availability
All data associated with this manuscript, and the code developed and used to perform analyzes described in this manuscript, can be obtained at https://doi.org/10.6084/m9.figshare.15081636.v1. All software used is publicly available.
Supplemental material is available at G3 online.

Funding
This study is part of the GENE-SWitCH project that received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 817998. GAC acknowledges support by the National Institutes of Health (NIH) grant R01 GM070683.