Prediction of Subgenome Additive and Interaction Effects in Allohexaploid Wheat

Whole genome duplications have played an important role in the evolution of angiosperms. These events often occur through hybridization between closely related species, resulting in an allopolyploid with multiple subgenomes. With the availability of affordable genotyping and a reference genome to locate markers, breeders of allopolyploids now have the opportunity to manipulate subgenomes independently. This also presents a unique opportunity to investigate epistatic interactions between homeologous orthologs across subgenomes. We present a statistical framework for partitioning genetic variance to the subgenomes of an allopolyploid, predicting breeding values for each subgenome, and determining the importance of inter-genomic epistasis. We demonstrate using an allohexaploid wheat breeding population evaluated in Ithaca, NY and an important wheat dataset from CIMMYT previously shown to demonstrate non-additive genetic variance. Subgenome covariance matrices were constructed and used to calculate subgenome interaction covariance matrices for variance component estimation and genomic prediction. We propose a method to extract population structure from all subgenomes at once before covariances are calculated to reduce collinearity between subgenome estimates. Variance parameter estimation was shown to be reliable for additive subgenome effects, but was less reliable for subgenome interaction components. Predictive ability was equivalent to current genomic prediction methods. Including only inter-genomic interactions resulted in the same increase in accuracy as modeling all pairwise marker interactions. Thus, we provide a new tool for breeders of allopolyploid crops to characterize the genetic architecture of existing populations, determine breeding goals, and develop new strategies for selection of additive effects and fixation of inter-genomic epistasis.

together to form a hybrid. This establishes heterozygosity (by descent) at all loci to form a homogeneous population. The union of two divergent suites of genes during the formation of an allopolyploid also results in a homogeneous population, but heterozygosity is established across homeologs rather than homologs. Diploid hybrids lose heterozygosity through segregation in following filial generations, but heterozygosity across homeologous genes is subsequently preserved through selfing in the allopolyploid (Mac Key 1970;Ozkan et al. 2001;Abel et al. 2005). Allelic interactions contribute to dominance variance in the diploid hybrid, whereas homeoallelic interactions will be present as part of the additive by additive epistatic variance in an inbred allopolyploid population. As such, allopolyploids may be thought of as an immortalized hybrid (Ellstrand and Schierenbeck 2000;Feldman et al. 2012), although it is not yet clear that these exhibit a true heterotic response as traditional hybrids have demonstrated. Birchler et al. (2010) note that newly synthesized allopolyploids often outperform their sub-genome progenitors, and that the heterotic response appears to be exaggerated in wider inter-specific crosses. This seems to hold true even within species, where autopolyploids tend to exhibit higher vigor from wider crosses (Bingham et al. 1994;Segovia-Lerma et al. 2004). Complementation of deleterious recessive alleles (or pseudo-dominance) has long been the primary explanation of the heterotic response (Stuber et al. 1992;Cockerham and Zeng 1996). However, Birchler et al. (2010) indicate evidence against this, where purging deleterious alleles has increased the additive value of inbred parents but has not reduced the heterotic response observed in the hybrid (Duvick 1999). Complementation also seems an unlikely driver of a heterotic response in allopolyploids, as the inbred subgenome progenitors would supposedly need functional copies of these genes to survive.
The availability of affordable genome-wide markers has sparked a revolution in selection on additive variation through the use of genomic prediction models. The additive genetic merit of an individual can be estimated as the sum of its additive marker effects to produce a genomic estimated breeding value (GEBV) (Meuwissen et al. 2001). When the number of markers is large, marker effects are typically considered random and normally distributed such that only one parameter need be estimated. Alternatively, the additive genetic covariance between individuals can be estimated from the same genome-wide markers and used to predict additive genetic values of individuals based on relatedness (Nejati-Javaremi et al. 1997;Van Raden 2008). These models are equivalent for prediction under the same set of assumptions (Garrick 2007;Van Raden 2008;Strandén and Garrick 2009). Genomic prediction models have since become popular for their ability to predict the performance of genotyped individuals with no phenotypic observations. Selections on unobserved individuals allows for reduction in the cost of phenotyping and breeding cycle time, increasing the rate of genetic gain (Goddard and Hayes 2007;Heffner et al. 2009;Jannink et al. 2010;Heslot et al. 2015).
The potential utility of genome-wide markers has also drawn renewed interest in non-additive genetic variation in recent years (Vitezica et al. 2013;Martini et al. 2016;Jiang and Reif 2015;Huang and Mackay 2016;Jiang et al. 2017). Genomic prediction models that use genome-wide markers can incorporate non-additive genetic components to obtain better estimates of individual performance than based on additivity alone (Technow et al. 2012;Vitezica et al. 2013;Jiang and Reif 2015;Akdemir and Jannink 2015;Akdemir et al. 2017;Wolfe et al. 2016). In outcrossing species such as maize, prediction of dominance effects is key to harnessing heterosis in unobserved hybrids (Technow et al. 2012). In inbred species, additive by additive epistatic effects have been shown to significantly increase genomic prediction accuracy (Crossa et al. 2010;Martini et al. 2016). Epistatic effects can be added to the prediction model by extending the method of expected epistatic covariance estimation Henderson (1985) to marker based covariance estimation (Jiang and Reif 2015;Martini et al. 2016).
The use of genome-wide markers has allowed for the partitioning of genetic variance to specific units of chromatin, previously infeasible with phenotypes alone (Visscher et al. 2007;Yang et al. 2011;Bernardo and Thompson 2016;Gage et al. 2017). Allopolyploids have been traditionally treated as diploids because they undergo disomic inheritance (Mac Key 1970), such that the contribution of each subgenome to the genetic variance is ignored. By assigning markers to each subgenome, an additive genetic covariance based on each subgenome can be calculated. Using these covariances in a genomic prediction model, the genetic merit of an allopolyploid individual can be assigned to each of its subgenomes. These subgenomic estimated breeding values (SGEBV) can then be used to identify parents with complementary subgenome effects for crossing.
Under Hardy Weinberg equilibrium, subgenomes segregate independently, and realized estimates of additive covariance of individuals based on each subgenome will be independent. However, this does not generally hold true in breeding programs, where population structure from nonrandom mating is inherent. As a consequence, the estimates of additive covariance between individuals based on different subgenomes will not be independent, potentially leading to confounding of effects from each subgenome and problems partitioning variance reliably. In an attempt to circumvent this obstacle, we present an approach for removing the largest sources of genetic variance (i.e., population structure) using singular value decomposition of the matrix of marker scores.
Common wheat (Triticum aestivum) is a staple allopolyploid crop which has undergone two allopolyploid events, resulting in three genomes, denoted A, B and D. The A genome ancestor, Triticum uratu, still exists today and was an early domesticate from the fertile crescent important in the neolithic revolution (Dvoˇák et al. 1993). The B genome ancestor (an Aegilops spp.) is believed to have since gone extinct (Blake et al. 1999), but the tetraploid formed by these two genomes, Triticum turgidum, is still cultivated today primarily as emmer and durum wheat. The D genome comes from a goat grass, Aegilops tauschii, which may have been incorporated in a single hybridization event as recently as 10,000 years ago (Salamini et al. 2002). However, recent evidence based on sequence divergence of the D genome from the A and B genome has suggested a much earlier D genome incorporation around 400,000 years ago (Marcussen et al. 2014). Other evidence shows that limited gene flow into the D genome may have occurred after the polyploidization event, but appears to be from a single lineage (Wang et al. 2013). As a result, the D genome has significantly lower genetic variation than either the A or B genome.
We demonstrate methodology to partition subgenome additive variance to estimate SGEBVs as well as subgenome interactions using two allohexaploid wheat data sets, the Cornell small grains breeding program soft winter wheat breeding population dataset presented here (CNLM) and the W-GY wheat data from Crossa et al. (2010).

MATERIALS AND METHODS
Empirical data sets CNLM population: The CNLM dataset consists of 8,692 phenotypic records of 1,447 soft winter wheat inbred lines evaluated at four locations near Ithaca, NY from 2007 to 2016, representing 26 environments. These phenotypic evaluations serve primarily as a first round of evaluation for grain yield and other agronomic traits before relatively few are selected for replicated regional trials around New York State. Lines are introduced and then removed after they are deemed either fit for advanced field trials or to be discarded or recycled in the breeding program. As such, this dataset is unbalanced in nature. Most lines were not replicated within a given trial (i.e., year and location), but various check varieties were used throughout these years and are typically replicated several times within a given trial.
Data were recorded for four agronomic traits, grain yield (GY), plant height (PH), test weight (TW) and heading date (HD). GY and TW have no missing data, but 842 records are missing PH and 246 records are missing HD. To facilitate comparison across traits, all phenotypes were standardized by subtracting the mean and dividing by the standard deviation of the phenotype vector (Table A1). Data were not standardized within environment to preserve all relationships within the data.
The population was genotyped with genotyping by sequencing (GBS; Elshire et al. 2011) markers aligned to the International Wheat Genome Sequencing Consortium (IWGSC) RefSeq v1.0 wheat genome sequence of 'Chinese Spring' (IWGSC 2018). Markers were filtered for minor allele frequency of at least 0.01 ( Figure A1), no more than 30% missing scores, and no more than 10% heterozygous calls. Missing marker scores were imputed using categorical random forest imputation by chromosome, and all heterozygous calls ( , 2% of all calls) were subsequently replaced with the population mode (i.e., homozygous major allele). Marker scores are presented as boolean indicators of the minor allele. Further details of the CNLM dataset can be found in Appendix 1.
W-GY population: The W-GY wheat dataset of 599 historical wheat lines from the CIMMYT Global Wheat Breeding program (Crossa et al. 2010) was used in this study due to its importance in genomic prediction of an inbred population with non-additive variation (Crossa et al. 2010;Martini et al. 2016). The W-GY dataset consists of genotypic values of all lines for grain yield in each of four environments. The genetic correlations between these environments ranged from -0.19 to 0.66 and can be found in Martini et al. (2016). As performance between these environments is not highly correlated, we refer to grain yield performance in each environment as a trait. The dataset was used in its entirety with one exception. Of the 1,279 available DArT markers, only the 1,188 with known chromosomal positions as indicated by (Crossa et al. 2010) were utilized in this study. This information was required to know which markers belonged to which subgenome, such that subgenome specific relationship matrices could be calculated.

Statistical framework
Subgenome additive effects: To illustrate, we begin with a linear mixed model depicting environments (i.e., trials) as fixed effects and genotypes as random where m is the population mean, E i and G j are the fixed environmental and random genetic effects, respectively, of the j th genotype evaluated in the i th environment, and e is the error associated with the k th observation. Using matrix notation, model 1 (denoted G) can be rewritten where 1 is a vector of ones, X is the design matrix of dummy variables for each trial, and b is the vector of fixed environmental effects. Z is the incidence matrix linking observations in the vector y to their respective genotype effects, in the vector g G . Normality was assumed for genotype effects and the residuals, where g G Nð0; s 2 G K G Þ and e N ð0; s 2 IÞ: The genetic covariance, K G , can be derived from the expectation (or coefficient) of co-ancestry between individuals from a pedigree (Henderson 1985), or by an empirical estimation of the realized genetic relationship calculated with genome-wide markers (Van Raden 2008). When genome-wide markers are used to estimate K G , the genomic prediction model initially suggested by Nejati-Javaremi et al. (1997) and Meuwissen et al. (2001) is obtained. Given an n · m matrix, M, of m markers scored as reference allele counts (i.e., f0; 1; 2g) for n individuals, method I of Van Raden (2008) finds the genetic relationship K as where P ¼ 1 n 52p T ; c ¼ 2p T ð1 m 2 pÞ and p is the vector of allele frequencies. The small coefficient of 0.01 was added to the diagonal to recover full rank after centering the matrix, such that K G is invertible. We use allohexaploid wheat to illustrate, but this method is easily truncated to allotetraploids, or extended to higher level allopolyploids. If we allow the total genetic effect, G j ; to be decomposed into individual additive effects for each subgenome, such that G j ¼ A j þ B j þ D j ; the following model (ABD) is obtained In model 4, each subgenome is allowed to have its own additive genetic variance and covariance between individuals, such that The realized additive genetic covariances for each subgenome, K A , K B and K D , are estimated using only markers corresponding to the respective subgenome, and calculated as described above.
Subgenome epistatic interactions: Following Henderson (1985), the epistatic covariance of individuals can be calculated as the Hadamard product of the component covariance matrices. Jiang and Reif (2015) and Martini et al. (2016) provide proofs of Henderson's method using genome-wide markers to estimate the additive by additive covariance matrix, H. An additional linear kernel can then be added for an additive by additive epistatic interaction term, I j ; once the additive covariance is estimated to obtain the following model (G · G) where g I N ð0; s 2 gI HÞ: As shown by Jiang and Reif (2015) and Martini et al. (2016), H is calculated from the marker data as where W ¼ M 2 P: Jiang and Reif (2015, Appendix A1) prove that H is asymptoically equivalent to K ⊙ K when the number of markers is large.
The additive by additive epistatic interaction term, g I ; can also be decomposed into across subgenome interactions and within subgenome epistatic interactions such that I j ¼ AB j þ AD j þ BD j þ I 2 j ; where AB j ; AD j and BD j are the subgenome interaction effects and I 2 j is the remaining epistatic effects due to within subgenome epistasis. Since no markers are shared across subgenomes, subgenome interaction covariances can be estimated by the Hadamard product of their component covariance matrices (Martini et al. 2016). These interactions can then be incorporated in the following model (ABD · ABD) where g AB N ð0; s 2 gAB ðK A ⊙K B ÞÞ; g AD N ð0; s 2 gAD ðK A ⊙K D ÞÞ; g BD N ð0; s 2 gBD ðK B ⊙K D ÞÞ and g ABD N ð0; s 2 gABD ðK A ⊙K B ⊙K D ÞÞ: The three way interaction is included here for biological completeness, but was found to be estimated on the boundary (i.e., zero) for all traits, and was therefore dropped from further analyses.
Accounting for population structure Under Hardy Weinberg equilibrium, subgenomes segregate independently, such that for subgenome effects, CovðA; BÞ ¼ CovðA; DÞ ¼ CovðB; DÞ ¼ 0 and VarðGÞ ¼ VarðAÞ þ VarðBÞþ VarðDÞ: A breeding program, however, intentionally violates this assumption, and therefore may contain significant population structure. Price et al. (2006) demonstrated that the first k largest principal components (PCs) of the kinship matrix can be used to control for population structure in genome-wide association studies, and its use has since become wide spread. Because most realized estimates of additive covariance are proportional to MM T ; singular value decomposition of M, instead of MM T ; can be used to separate the population structure as the first few principal components from the entire matrix of marker scores before it is divided into its subgenome components. This is accomplished by first extracting the first k principal components in the n · k matrix Q. The marker matrix can then be reconstructed by setting the first k singular values of the diagonal matrix to zero and multiplying to produce a matrixM with the population structure removed from each subgenome simultaneously (Appendix 2) Additive covariance matrices with reduced collinearity can then be calculated for each subgenome fromM and incorporated into the model as previously described. Q can then be added to the model as a set of fixed covariates, with slopes g, such that the model will now be of the form for all l genetic terms in the model. Genomic estimated breeding values are then predicted by summing the centered population structure and genetic effects. For this study, a population structure of dimension k ¼ 5 was chosen for both the CNLM and W-GY datasets, and used to compare to the k ¼ 0 models that do not correct for population structure. The number of PCs, k ¼ 5, was chosen based on Supplementary Figure S1, where the correlation of additive covariance estimates between subgenomes appeared to level off in both populations.

Genomic prediction
To determine the predictability of genetic effects and the variability of variance component estimates, five-fold cross-validation was performed with 10 replications. For each replicate, the set of individuals was randomly split into five groups, with 4 groups of 289 and one of 291. For each fold, records of individuals in the fold were removed (i.e., masked) from the dataset. Each model was subsequently fit with the remaining lines and used to predict the whole genome effect of the masked lines in the fold. Predictions for all five folds were gathered and correlated to the "true" genetic values once for each replicate. In this way, prediction results are directly comparable between the different models, and not subject to differences in the individuals sampled. The whole genome values were calculated as the sum of the genotypic additive and epistatic effects in the model as previously described.
Due to the unbalanced nature of the CNLM dataset, "true" genetic values were calculated as in equation 2 but were considered independent with a covariance K G ¼ I.

Software
Models were fit using restricted maximum likelihood (REML) with the software ASReml (Gilmour 1997) implemented in R (Butler 2009

Model fit and variance components
Model fit was assessed using Akaike's Information Criterion (AIC). Whole genome models tended to have the lowest AIC values, with the exception of the PH and HD traits for the epistatic ABD·ABD models in the CNLM population. When whole genome models had lower AIC values, the comparable subgenome models had only marginally higher AIC values (Supplementary Tables S1 and S2). Whole genome predictions between comparable whole genome and subgenome models were correlated at r . 0:999 or r . 0:993 for traits within the CNLM and W-GY populations, respectively. This indicates that little, if any, genetic information was lost by splitting the whole genome into biologically relevant subgenome effects. The lack of perfect correlation is at least partially due to floating point rounding errors during model fitting and summation of genotype effects. Subgenome additive variance parameter estimates were positive for all models, but subgenome interaction variance parameter estimates were often estimated on the boundary (i.e., near zero). Variance parameters estimated on the boundary were thus considered to be exactly zero. Shifts in variance component importance were seen when the epistatic terms were added in the model. For example, for the TW and E1 traits in the CNLM and W-GY populations, respectively, the A genome component was the largest in the additive only model, but was reduced to less than that of the B genome component in the epistatic model. Additive variance components were generally reduced in epistatic models compared to additive only models, but this reduction in additive variance was accompanied by non-zero subgenome interaction components. The B genome contributed the greatest amount of additive variance in the epistatic ABD·ABD models for all traits except HD. While the D genome variance component was far smaller than the A subgenome component for GY and TW in the CNLM population, it was comparable to the A subgenome component for all traits in the W-GY population.
The A·B component was particularly important for the W-GY traits, E1, E3 and E4, as well as the HD and TW in the CNLM population. The A·D component also featured prominently for the PH and TW traits in the CNLM population. The B·D component appeared to be less important, having the largest effect for PH. No epistatic terms were significantly greater than zero for the E2 trait in the W-GY population. Addition of epistatic interactions resulted in a significant likelihood ratio test at p , 10 26 for all traits except GY, which was significant at p , 10 22 . Despite the significant addition of epistatic terms, additive GEBVs were highly correlated with whole genome predictions from the epistatic models, at r $ 0:988 for the CNLM population and r $ 0:869 for the W-GY population. A model containing the three-way subgenome epistatic term was fit for all traits, but estimates of the three-way interaction variance parameter were zero for all traits.
The distributions of variance component estimates from repeated sub-sampling of the data during k-fold cross-validation were centered near the point estimate from the full model fit. These distributions were either as wide ( 2 standard errors from the center) or tighter than expected based on the standard error from the full model fit (Figure 1, Supplementary Figures S2 and S3). Standard errors were generally larger for epistatic variance components relative to their magnitude than additive variance components. Standard errors relative to their respective parameter estimates tended to be larger for all terms in models with more estimated variance parameters (Supplementary Tables S1 and S2).

Subgenome additive effects
Subgenome estimated breeding values (SGEBVs) were moderately correlated with the whole genome effect, but weakly correlated with one another (Supplementary Tables S3 and S4). The individuals with the highest SGEBV for one subgenome never had the highest SGEBV for the other two subgenomes, and were often not in the top 95% quantile of the population based on the other two subgenomes ( Figure 2, Supplementary Figures S4 and S5). For example, the individuals with the highest A, B and D SGEBV for GY in the CNLM population ranked 43 rd , 39 th and 60 th for the whole genome effect, respectively. In contrast, the individual with the best A SGEBV for GY ranked 1067 th , 952 nd for the B and D genome, respectively. The individual with the highest B genome breeding value for GY ranked 221 th and 1393 rd for the A and D genomes, respectively. The individual with the highest D genome breeding value for GY ranked 347 th and 123 rd for the A and B subgenome, respectively. The individual with the highest whole genome GEBV for GY ranked 6 th , 22 nd and 519 th for the A, B and D SGEBV, respectively. In several cases, the top individual based on a SGEBV was not in the top 95% quantile based on their whole genome effect, particularly in the W-GY population.

Prediction accuracy
Including epistasis kernels significantly improved genomic prediction accuracy for all traits except GY and E2 (Table 1). Subgenome models had either comparable or slightly lower mean prediction accuracy than whole genome models for all traits except HD, for which subgenome models had superior accuracy. The variability in the prediction accuracy based on the individuals sampled was either the same (GY and TW) or lower (PH and HD) for the epistatic models compared to the additive models in the CNLM population, but was similar in the W-GY population ( Table 1). The variability in prediction accuracy was increased for the subgenome models compared to the whole genome models in the W-GY population for some traits (E2 and E3), but was either the same or decreased in the the CNLM population.

Adjustment for population structure
The first two principal components explained 17% and 19% of the variance of M in the CNLM and W-GY populations, respectively, indicating that some population structure exists in both populations (Supplementary Figure S6). The correlation of additive genetic covariance estimates between individuals based on the three subgenomes declined as PCs were removed from M, but appeared to level out between 5 to 10 PCs (Supplementary Figure S1). Correlation of whole genome effects between additive models, G and ABD, for k ¼ 0 and k ¼ 5 was $ 0:999 and $ 0:996 for the CNLM and W-GY populations respectively. Whole genome effect correlations were lower between epistatic models G·G and ABD·ABD, with coefficients of $ 0:998 in the CNLM population and $ 0:980 in the W-GY population. Removing population structure with k ¼ 5 reduced most of the SGEBV effect correlation coefficients by up to 0.06 in the CNLM population, but there was one instance in which one correlation coefficient increased from 0.15 to 0.19 between A and B SGEBVs for PH (Supplementary Tables S3 and S4). This was not the case for the W-GY population, where many of the SGEBV effect correlations increased by up to 0.21.
Additive variance generally decreased as k was increased from 0 to 10 ( Figure 3, Supplementary Figure S7). Ranks of additive variance components relative to one another were stable for most traits, while epistatic variance components were more sensitive to changes in k. Significant epistatic variance component rank changes occurred for the PH, TW and E4 traits. For PH, the A·D term was comparable in magnitude with the additive variance components for A and D when k ¼ 0; but declined as k increased. The reduction in A·D variance for PH was accompanied by an increase in both the A·B and B·D terms. Similarly, a decline in A·B variance was followed by an increase in B·D for TW and A·D for E4 as k was increased.
Correlations of variance component estimates were calculated from the average information matrix for models k 2 f0; 1;:::; 10g (Supplementary Figures S8 and S9). Correlations between subgenome additive variance estimates were generally low (0.2-0.4), while correlations of subgenome interaction variance estimates were high (0.8-0.95), and correlations between the two were moderate (0.4-0.6). Despite a small reduction in the correlation of SGEBVs as k was increased from 0 to 5, little reduction in variance component estimate correlations was observed as k was increased from 0. Generally, correlations of additive variance parameter estimates were slightly reduced while correlations between interaction variance parameter estimates increased slightly.

DISCUSSION
Model fit and variance components While whole genome models tended to be the most parsimonious, subgenome models are worth consideration because they provide insight into the biology of the allopolyploid organism. Given the stability of variance component estimation and that no genetic information appears to be lost by partitioning the whole genome into its individual subgenome additive effects, such a partition is informative.
The method presented here could be used for any set of independent loci, such as estimating a variance component and breeding value for each chromosome (Yang et al. 2011). However, this will become computationally burdensome as the number of variance components to be estimated increases. If the number of variance parameters to estimate is high and the data set is small this may become infeasible. It is also unclear if the estimates from larger numbers of additive kernels would be reliable.
Bernardo and Thompson (2016) assigned a breeding value for each of the 10 maize chromosomes by fitting a single ridge regression model to estimate marker effects. They subsequently summed marker effects by chromosome to produce a breeding value for each chromosome. However, this method does not allow for direct estimation of variance components for each unit of chromatin. By fitting each unit simultaneously, variance attributable to sets of loci will be split, and the sum of the variance estimates should not exceed the total genetic variance (Yang et al. 2011). It is unclear what effect linkage disequilibrium across chromosomes has on the variance parameters estimated.
Here we assumed that the subgenome effects are independent, but this is clearly not the case. Generally, we can express the genetic variance due to the three sub genomes as where S is the subgenome covariance matrix, J is an n · n matrix of ones for n genotypes, and K is the additive relationship matrix for within and across subgenomes. In this report, we have assumed that S is diagonal with S ii ¼ s 2 i for the i th subgenome, and K is a block diagonal with the i th diagonal block represented by the subgenome additive covariance matrix.
An unstructured covariance matrix, S, could be estimated, with correlation coefficients between subgenomes. The subgenome effects would be allowed to have a correlation such that However, it is unclear what the covariance structure should be between subgenomes (e.g., K AB ). If consensus haplotypes from uniquely identifiable sequences could be determined with two or more alleles segregating in at least two subgenomes, a covariance across the subgenomes could be constructed. Polymorphisms that predate speciation would be used to identify the consensus haplotypes, while post speciation polymorphisms would be used to identity the subgenome origin. Individuals would then receive a score based on the number of consensus haplotypes they have in common between two subgenomes. This could prove to be a formidable challenge given the evolutionary time between the subgenome ancestors. The Hadamard product of the two additive covariance matrices is a tempting candidate for these off diagonal blocks, however, this would substitute a covariance between additive effects in place of an epistatic variance. It is unclear to these authors if epistasis variance can be thought of and modeled as a covariance between additive effects.

Genetic architecture
The genetic architecture of grain yield (GY, E1, E2, E3, E4) in the two populations investigated here are markedly similar, despite the divergent genetic backgrounds of the two populations. The CNLM population is primarily comprised of breeding lines and varieties derived from germplasm historically grown in the North East, in contrast to the W-GY population which has a broader pedigree. The D genome is known to have low genetic diversity due to limited gene flow from a single Ae. tauschii lineage after the most recent allopolyploidization event (Wang et al. 2013), estimated to have taken place as recently as 10,000 years ago (Salamini et al. 2002;Marcussen et al. 2014). The International Maize and Wheat Improvement Center (Centro Internacional de Mejoramiento de Maíz y Trigo, CIMMYT) has introgressed some genetic material from the D genome ancestor, Ae. tauschii, through the use of synthetically produced hexaploid wheat to increase the genetic diversity of the historically bottle-necked D genome. The higher proportion of D genome variance in the W-GY population may be due to the increased use of wild Ae. tauschii in their breeding program, highlighting the merit of the strategy.
Many of the subgenome epistatic variance parameters were estimated at zero, possibly due to a lack of power to detect them. Greater genetic diversity, larger numbers of individuals, and higher allele frequencies would allow for increased power to detect true interactions. Hill et al. (2008) emphasized the effect of low allele frequencies on epistatic interactions, proving that as allele frequencies (and therefore joint frequencies of alleles at two loci) approach zero or one, most of the epistatic variance becomes additive. For example, suppose two loci have a large interaction, such that one pair of alleles is selected. Once one locus becomes fixed, all remaining variance is due to segregation at the other locus, and becomes strictly additive. The low joint frequency is magnified in the three way interactions, likely causing the inability to detect any three way epistatic interaction signal between the three subgenomes.
This is also apparent in the reduction of additive variance components upon the addition of epistatic terms to the model. These epistatic components were often estimated to be rather large compared to the additive components, but did not change the final whole genome value drastically. This suggests that the additive terms absorb much of the epistatic variance in the absence of the epistatic kernels.
n Table 1 Table of genomic prediction accuracies for eight traits in the CNLM (GY, PH, TW and HD) or W-GY (E1, E2, E3, E4) populations with k ¼ 0 and k ¼ 5. k is the number of principal components removed from the marker matrix prior to calculating subgenome covariance matrices. The first 1 to k principal components were included as fixed effects in the model fit for k > 0 b Mean genomic prediction accuracy across ten replicates of five fold cross validation. c Standard deviation of prediction accuracy across ten replicates are shown in parentheses.
The A·B epistatic terms were the most important for many of the traits, reflecting the greater genetic variation of these two subgenomes. Subgenome interaction terms including the D genome were notably more important for traits known to have important loci on the D genome. PH is partially governed by two dwarfing genes, Rht-1D and Rht-1B on 4B and 4D, respectively. These two genes have been shown to exhibit a less-than-additive (Eshed and Zamir 1996) epistatic interaction, where the double wildtype is less tall than expected based on the additive effects of the two semi dwarfs from the double dwarf (Santantonio et al., 2018b). The B·D term was large for PH, particularly after correction for population structure. Population structure is common for these genes, as many breeding programs primarily utilize one or the other dwarfing gene to avoid producing double dwarfs during crossing, which are often agronomically undesirable.

Selection on SGEBVs
Partitioning genetic variance to the subgenomes of an allopolyploid provides a method for identifying individuals with complementary subgenomes as potential parents for crossing. If we consider the upper 95% quantiles as candidates for parental selection, many of the top candidates based on subgenome breeding values would not be considered candidates based on their whole genome breeding value. When they would be considered, they were typically not the top candidates. The low correlation between SGEBVs highlights the opportunity to identify individuals with complementary subgenomes for crossing.
These individuals may or may not be among the top performing selection candidates, demonstrating that the optimum set of crosses are not always between the top performing individuals (Akdemir and Sánchez 2016).
While it is difficult to evaluate the predictability of SGEBVs per se, the sum of their values appears highly predictive. The low correlation of SGEBVs also suggests that individual subgenomes may be directly manipulated as never before. Prior to the discovery and use of genetic markers to track genomic regions, the phenotype (or some summary statistic thereof) was the only indicator of the genetic structure of a genetically distinct individual. Variety releases still demonstrate this legacy, with phenotypic descriptors that define a new variety as genetically distinct from other similar varieties. One breeding strategy will be selecting parents for crossing that have complementary SGEBVs to increase the potential of transgressive segregation in the resulting offspring. We envision other breeding strategies beyond simply choosing parents with complementary subgenomes, and see an opportunity to weight SGEBVs according to some breeding goal.
For example, a newly formed population could undergo several rounds of genomic selection only on the D genome SGEBVs (i.e., weights of 0, 0 and 1 for the A, B and D subgenomes respectively) before phenotypic or whole genome selection. Because the D genome contributes the least to the total genetic variance, phenotypic selection on D genome loci is challenging. Selection will act on the largest sources of genetic variance first, potentially leading to fixation of small effect loci in the D genome by drift, while selection acts on the large effect loci on the A and B genomes first. By selecting on D genome SGEBVs, gains can be made to the D genome directly with little to no selection on the A and B genomes, a feat previously impossible with phenotypic selection. Signatures of selection on the D genome under this scheme may also help establish the accuracy of SGEBV prediction.

Subgenome interactions
Genomic prediction of GY and E2 did not appear to benefit from including epistaticinteractionsasitdidfortheothersix traits.Thismaybe dueinpart to the highly polygenic nature of grain yield, which is the culmination of essentially all functional genetic variants subjected to stress throughout the growth cycle. The E2 trait in the W-GY population has previously been shown to be invariant to the addition of various epistatic terms (Crossa et al. 2010;Martini et al. 2016), and it is unclear why this population does not exhibit non-additive variation in this environment. It may be that important epistatic interactions of GY in the CNLM population are too small to detect or are involved with differing performance across years or locations, such that they are lacking in a model that does not include genotype by environment interactions.
Subgenome epistatic terms increased genomic prediction accuracy comparable to modeling all pairwise interactions across the subgenomes, suggesting that the most important interacting loci are on different subgenomes. This result is consistent with the observation that newly formed allopolyploids undergo considerable changes in gene expression, known as genome shock (McClintock 1984). This shock has been suggested to be caused by incompatibilities of genetic pathways across the subgenomes (Comai et al. 2003). Residual subgenome incompatibility may still be affecting the germplasm pool, even thousands of years after the last polyploidization event. Decay of negative duplicated gene interactions may take hundreds or thousands of generations before all interacting genes are lost or silenced Conery 2000, 2003).
It is unclear what proportion of this non-additive signal is due to homeoallelic interactions. The proposed method models all pairwise interactions across subgenomes, of which homeoallelic gene interactions are a small minority in number. Smaller homeoallelic regions (Santantonio et al. 2018a) or homeoallele specific marker sets (Santantonio et al. 2018b) have been constructed to determine the relative importance of these interactions relative to other gene interactions across the subgenomes. The usefulness of the epistatic subgenome interactions is currently unclear and warrants further investigation.
Regardless of the source of the epistasis, we suggest that a breeding scheme should be designed to take advantage of beneficial subgenome interactions. If a suitable training set related to the breeding material can be established, subgenome interactions can be predicted in new, genotyped breeding materials. We suggest that a series of small bi-parental populations be constructed from important contributors to the breeding program, and be used in the development of a training population to balance high genetic diversity and high allele frequencies. This training population would be used to predict SGBEVs and subgenome interactions in individuals formed from new crosses. Individuals that contain favorable interactions could then be selected such that they are fixed in early filial generations. After fixation in a given line, phenotypic, whole genome, or subgenome selection could be used for further line development until complete homozygosity is reached.

Adjustment for population structure
The efficacy of the proposed method to handle population structure may need to be improved, or a different approach may need to be taken. While this method reduced the correlation of additive genetic subgenome covariance estimates across the three genomes, variance parameter estimate correlations were not drastically reduced. The correlation of subgenome interaction variance parameter estimates tended to increase slightly when accounting for higher levels of population structure, counter to the assumption that removing this structure should result in better estimates of subgenome interactions.
The lower correlation between epistatic models that correct and do not correct for population structure is likely due to removing Q from the marker matrix. Correcting for population structure also had a small, but negative effect on genomic prediction accuracy for epistatic models. The population structure fixed effect predictors are strictly additive and the loss of accuracy may be due to epistasis variance associated with these PCs (i.e., population structure epistasis). Epistatic variance related to these PCs may be recovered by using the squares of the PC scores, although this was not done in this study. At least for the additive models, it appears little to no genetic information is lost using the population structure adjustment proposed here.
Determining the best value for k will be at the crux for implementing this methodology for various traits and populations. The same population may need different values of k for different traits, depending on the covariance of the trait and population structure. Traits such as PH or HD may have lower covariance with population structure than traits such as TW or GY, due to different marker effect distributions and the history of the breeding population. Several methods might be used to determine k empirically from the marker matrix (Patterson et al. 2006, e.g.), however, these methods do not account for the covariance of trait and structure. We used the first one to k PCs in this study, but there is no reason why we must include all PCs up to some value k. There may be certain PCs that are important for a given trait, and could be tested as fixed effects for inclusion or exclusion.
This method may have better performance in populations with greater degrees of population structure than in the populations presented here. Use of this method for partitioning genetic variance to biologically important sets of chromatin and estimating epistatic interactions will need further testing and validation before widespread use.

CONCLUSION
To our knowledge, we provide the first attempt to assign a breeding value to each subgenome of an allopolyploid crop. With estimates of subgenome additive effects, parents with complementary subgenomes can be selected for crossing. Weighted selection of subgenomes using genomic prediction could be key to increasing the diversity of the D genome in wheat germplasm. Direct selection on the D genome may allow targeted introgression from Ae. tauschii while mitigating the effects of introducing unimproved alleles. Subgenome additive genetic variances appear to be estimated well, and no genetic information is lost partitioning the genome into its subgenome components. This demonstrates that partitioning genetic variance to the subgenomes of an allopolyploid can provide useful information for genomics assisted breeding efforts.
Subgenome interactions increase prediction accuracy, but it is unclear how well the epistatic variance is partitioned to the three interaction terms and what proportion of that variance is due to homeologous gene interactions. Because the homeologous interactions make up relatively few of the possible interactions across subgenomes, they may only explain a small portion of the observed epistatic variance. Yet, seeing as how homeologous genes likely operate in the same or similar physiological pathway, the likelihood for interactions between homeologous loci is high. Further research is needed to investigate the efficacy of modeling subgenome interaction terms, and to what degree this is explained by interactions between homeologous orthologs.
Allopolyploids have traditionally been treated as diploids in breeding programs because they undergo disomic inheritance. With modern DNA marker technology and ever increasing computational power, breeders of allopolyploids can further exploit the genetic complexity of their crops.

ACKNOWLEDGMENTS
Funding of this research was provided by the USDA National Needs Fellowship for N. Santantonio, in partial fulfillment of the requirements for a Ph.D in Plant Breeding and Genetics at Cornell University. Additionally, the field trials comprising the phenotypic data for the CNLM population were funded in part by the Hatch Project # 149-447. Genotyping was funded by the Wheat Coordinated Agricultural Project (WheatCAP). We are also grateful to Jesse Poland's research group at Kansas State University for their contribution to genotyping of CNLM materials. The authors thank the International Wheat Genome Sequencing Consortium for pre-publication access to IWGSC RefSeq v1.0. Finally, we would like to acknowledge the Cornell small grains staff, particularly David Benscher and James Tanaka, who were vital in implementing, collecting and processing the materials used to build the CNLM dataset.

LITERATURE CITED
APPENDIX 1 CNLM DATASET Field plots 1.5 m by 3 m in size were planted with 100 g of seed in September or October of the year prior to the harvest year. Data were recorded for four agronomic traits: grain yield (GY), plant height (PH), test weight (TW) and heading date (HD). Plots were harvested for grain with a plot combine after physiological maturity and oven dried to a grain moisture of approximately 12%. Dried grain was cleaned, weighed and measured for moisture content using a grain moisture analyzer (GAC 2100, Dickey-John). GY was standardized to a uniform grain moisture of 12%. PH was measured as the distance from the ground to the top of the grain head at full extension. TW is used as a measure of grain quality and was measured as the mass of a volume of grain (gL 21 ) using the grain moisture analyzer (GAC 2100 Dickey-John) which corrects TW for moisture content. HD is a proxy for flowering time and was defined as the number of Julian days until 50% of the primary grain heads have extended out of the boot.
The data set initially consisted of 1,552 lines evaluated in 10,069 1.5 m by 3 m plots planted in September to October of the year prior to the harvest year. Thirty one lines from 2007 were not harvested for GY, nor were they genotyped, and were dropped from the data set. Because GY is of primary interest for breeding, plots that were not harvested or had missing values for GY were dropped, resulting in 9,090 plots with GY measurements. This also caused two additional lines with missing GY measurements to be dropped from the dataset.
Due to the reasonable size of the dataset, small physical area of most trials, lack of replication within environment for most lines and the availability of genetic markers, raw plot observations were used and no attempt was made to correct plot level data for various spatial effects or otherwise. Preliminary results also indicated relatively high genomic prediction accuracy, suggesting that spatial correction, such as an AR1 · AR1 row column autocorrelation structure, would be unlikely to reduce error variance drastically. Instead, 59 plots that included breeder comments about bad seed or significant damage to the plot, via animal or otherwise, were removed from the dataset. Observations outside of a four standard deviation interval from the grand mean of uncorrected GY phenotypic observations were also removed to account for any significant undocumented damage, grain spillage or other undocumented mistakes. This included two observations that were deemed too high, and 20 observations that were deemed too low.
Observations of 11 lines lacking at least one phenotypic record in at least two separate trials, 60 lines that were missing at least 50% of their genotypic calls and one line that had greater than 20% heterozygous genotype calls were also removed from the dataset. This resulted in 8,692 phenotypic observations of 1,447 lines across 26 environments, representing 96.6% of the plots with grain yield measurements. HD was not recorded for the 246 observations from 2007, and PH was not recorded for the 840 observations from 2009. Two additional plots were missing PH measurements from the Ketola location, one that was recorded at 2 meters in 2008 which was likely a recording mistake, and one in 2010 which was simply missing a record.
While most of the genotypes were directly from the Cornell small grains breeding program, a few varieties and breeding lines from other breeding programs that had been genotyped and evaluated were not excluded from the dataset as long as they met the previous criteria. This included 75 lines from The Ohio State University wheat breeding program and 93 lines from the Michigan State University wheat breeding programs that were part of the Allele Based Breeding initiative, among other lines from various breeding programs. Genotyping by sequencing (GBS) libraries (Elshire et al. 2011) of 1,521 CNLM were developed using the protocol described by Poland et al. (2012) at Kansas State University, and subsequently sequenced at the Genomic Diversity Facility at Cornell University. Genotyping calls were accomplished using standard parameters of the Tassel 5.0 GBS v2 Pipeline (Glaubitz et al. 2014) and were aligned to the International Wheat Genome Sequencing Consortium (IWGSC) RefSeq v1.0 wheat genome sequence of 'Chinese Spring' (IWGSC 2018). Following Poland et al. (Poland et al. 2012), 64 bp sequence tags containing no more than three Single Nucleotide Polymorphisms (SNPs) per tag were included to increase the likelihood of obtaining subgenome specific markers. Only markers with a minor allele frequency of at least 0.01 ( Figure A1), no more than 30% missing scores, and no more than 10% heterozygous calls were kept for the following analyses. Then individuals with greater than 20% heterozygous calls and individuals with more than 50% missing genotype calls were excluded from the dataset. The process was repeated iteratively, starting by filtering on markers until the number of markers and genotypes converged. This resulted in 11,604 available GBS markers distributed throughout the three subgenomes ( Figure A2). Of the 61 lines removed, 60 were removed due to missing marker information and 1 was removed due to high heterozygosity in the final iteration. The 11 lines with a single observation and the two lines without grain yield observations were subsequently removed to produce genotypic information for the 1,447 lines.
Marker scores were coded using f21; 0; 1g for homozygous major allele, heterozygous and homozygous minor allele, respectively. Categorical marker imputation was done independently for each chromosome using random forest imputation via the R package 'missForest' (Stekhoven and Bühlmann 2011) which relies on the R package 'randomForest' (Liaw and Wiener 2002). Random forest has been shown to be effective for genotype imputation in wheat (Rutkoski et al. 2013). To allow all individuals to be considered completely inbred, the remaining heterozygous calls ( , 2% of all marker scores) were conservatively replaced with the population mode for that marker (i.e. the homozygous major allele, -1). Marker scores were then converted to f0; 1g coding for presence of the minor allele.
Genetic correlations of traits were estimated in a multivariate model fit (Table A2). This was accomplished by treating genotypes as independent, or having a realized additive covariance structure calculated from genetic markers.