Improving Prediction Accuracy Using Multi-allelic Haplotype Prediction and Training Population Optimization in Wheat

The use of haplotypes may improve the accuracy of genomic prediction over single SNPs because haplotypes can better capture linkage disequilibrium and genomic similarity in different lines and may capture local high-order allelic interactions. Additionally, prediction accuracy could be improved by portraying population structure in the calibration set. A set of 383 advanced lines and cultivars that represent the diversity of the University of Minnesota wheat breeding program was phenotyped for yield, test weight, and protein content and genotyped using the Illumina 90K SNP Assay. Population structure was confirmed using single SNPs. Haplotype blocks of 5, 10, 15, and 20 adjacent markers were constructed for all chromosomes. A multi-allelic haplotype prediction algorithm was implemented and compared with single SNPs using both k-fold cross validation and stratified sampling optimization. After confirming population structure, the stratified sampling improved the predictive ability compared with k-fold cross validation for yield and protein content, but reduced the predictive ability for test weight. In all cases, haplotype predictions outperformed single SNPs. Haplotypes of 15 adjacent markers showed the best improvement in accuracy for all traits; however, this was more pronounced in yield and protein content. The combined use of haplotypes of 15 adjacent markers and training population optimization significantly improved the predictive ability for yield and protein content by 14.3 (four percentage points) and 16.8% (seven percentage points), respectively, compared with using single SNPs and k-fold cross validation. These results emphasize the effectiveness of using haplotypes in genomic selection to increase genetic gain in self-fertilized crops.

Genomic selection is an important breeding approach for improving quantitative traits. It was advocated as a marker-assisted selection approach that uses high density SNP genotypes for estimating genomic breeding values (Meuwissen et al. 2001). Genomic selection relies on linkage disequilibrium (LD) between SNP markers and quantitative trait loci (QTL), where the LD among markers is used as a verification for the association between markers and QTL. Several genomic prediction models were proposed including RR-BLUP, Bayes A, Bayes B, Bayes Cp, Bayes LASSO, and Reproducing Kernel Hilbert Space RKHS (Meuwissen et al. 2001;de Los Campos et al. 2009;Kizilkaya et al. 2010;Lorenz et al. 2011). These prediction methods vary in the assumed genetic effects or/and variance associated with markers. Factors affecting the accuracy of genomic prediction include trait architecture, marker density and LD, training population size, and population structure (Daetwyler et al. 2010;Asoro et al. 2011;Heffner et al. 2011;Lorenz et al. 2011; population is phenotyped and genotyped to train a prediction model for estimating breeding values of the selection candidates. The composition of the calibration population is of paramount importance because it determines the efficiency of selecting the best performing individuals. Several studies investigated methods to construct a calibration population for improving the accuracy of genomic prediction including stratified sampling, CDmean optimization, prediction error variance (PEV), and Gmean (Rincent et al. 2012;Akdemir et al. 2015;Isidro et al. 2015;Lorenz and Smith 2015). These methods varied in their improvement of prediction accuracy for traits with different genetic architectures (Isidro et al. 2015;Tiede and Smith 2018). One of the important factors determining prediction accuracy is population structure, which can result in variability in allele frequencies and the degree of the genetic relationship between subpopulations/clusters, leading to changes in the accuracy of prediction. The effect of population structure on the accuracy of genomic prediction was observed in both animals (Hayes et al. 2009b;Saatchi et al. 2011) and plants Technow et al. 2013), resulting in a general recommendation of constructing a mixed calibration population that includes individuals from all clusters for improving the accuracy of prediction. To cope with structured populations that are developed from parents with different breeding histories, the stratified sampling approach was proposed by sampling a representative sample from each cluster and this approach showed improvement in the prediction accuracy for several quantitative traits (Isidro et al. 2015). Unlike PEV and CDmean optimization, the stratified sampling approach is not dependent on trait heritability; thereby, it is expected to perform more consistently across different traits with variable genetic architecture (Rincent et al. 2012). Current methods of genomic selection mostly use single SNP markers to predict the genetic merits of individuals. However, haplotypes may have several advantages over single markers for genomic selection. Phased marker haplotypes can better estimate identity-by-descent and haplotype effects (Meuwissen and Goddard 2000;Hess et al. 2017). Additionally, haplotypes increase the LD between the group of phased markers and QTL, explaining greater levels of QTL variance (Hayes et al. 2007).
The goal of using genomic selection in plant breeding is to improve the rate of genetic gain under conditions of reduced resources available for evaluating the calibration population. We tested the effect of population structure and using haplotypes on the accuracy of genomic prediction in a wheat population that represents the genetic diversity of the University of Minnesota spring wheat breeding program. The goals of this study were to (1) investigate the effect of population structure on prediction accuracy for yield, test weight, and protein content in a hard red spring wheat population, (2) compare stratified sampling optimization with k-fold cross validation for the prediction of the three traits, and (3) compare the prediction accuracy of single markers to four different multi-allelic haplotype blocks with different sizes.

Phenotypic data
The Minnesota wheat genomic selection (MN-WGS) panel is composed of 383 breeding lines that represent the genetic diversity of the University of Minnesota spring wheat breeding program and includes 93 parents and 290 derived lines from 177 unique crosses represented in their pedigrees (Conley et al. 2015). Parents included lines from the spring wheat breeding programs of the University of Minnesota, North Dakota State University, South Dakota State University, AgriPro, WestBred, and CIMMYT. The MN-WGS panel was evaluated together for agronomic traits in two trials in 2013 at St. Paul and Crookston, MN using standard agronomic practices. Plot sizes were 2.6 square meters in St. Paul and 3.4 square meters in Crookston. No fungicides were applied in either location. Lines were planted in a Type II modified augmented field design with 32 blocks. Linkert (Anderson et al. 2018) was used as the primary check with LCS Albany (PI 658002), Briggs (Devkota et al. 2007), Prosper (Mergoum et al. 2013), and Vantage (PI 653518) as secondary checks. Linkert was repeated once in all of the 32 blocks. The population was phenotyped for grain yield, test weight, and protein content. Yield was determined after harvesting plots with a Wintersteiger small plot combine then weighing the grain to express data as kg/ha. The test weight was measured as the weight of seeds that completely fill a quarter pint (118.3 Milliliter) and the resulting data were expressed as kg/hL. Near infrared reflectance spectroscopy (NIR) was used to determine protein content in the harvested grains (Inframatic 9500, Perten Instruments, Sweden).

Phenotypic data analysis
Correction for spatial field variability for yield, test weight, and protein content was done using a moving grid adjustment (Technow 2015; R-package mvngGrAd, R development core team 2017). After setting the field in rows and columns, a moving mean was calculated using a surrounding grid of a particular size. This moving mean was used subsequently as a covariate to calculate the adjusted phenotypes. A moving average window of eight plots was used to determine the phenotypic performance of the line in the center. For all traits, the entire set of lines were used to correct for variance in trial means using the MIXED procedure in SAS 9.4 (Sallam et al. 2015;SAS Institute 2013). In all experiments, genetic and residual variances were calculated using the MIXED procedure in SAS. Broad-sense heritability was estimated using the equation H ¼ s 2 g =ðs 2 g þ s 2 e =nÞ, where s 2 g is genetic variance, s 2 e is the variance of random residuals, and n is the number of trials.
Genotyping and linkage disequilibrium Leaf tissues were harvested from the 383 breeding lines at the three leaf stage. DNA extraction was performed using the BioSprint 96 DNA Plant Kit according to the manufacturer's instructions (Qiagen 2016). The panel was genotyped using the 90K Illumina Infinium iSelect Assay. Clustering was performed using Illumina's Genome Studio Polyploid Clustering Module v1.0 using the procedure described by Wang et al. (2014a), followed by manual curation to correct inaccurately clustered loci. Markers were filtered for MAF , 0.05 and missing data . 0.10 resulting in 16,697 SNP markers. From this marker set 14,086 SNP markers had map positions based on a consensus wheat map developed from six independent double haploid mapping populations (Wang et al. 2014a). Missing marker data were imputed using LD-kNNi, which imputes missing marker genotypes based on the k-nearest neighbor imputation method (Money et al. 2015).
To characterize the level of LD in the MN-WGS panel, the adjacent marker LD was estimated as r 2 for the 21 wheat chromosomes in TASSEL (Bradbury et al. 2007). The genomic additive relationship matrix was estimated among all lines in rrBLUP package of R using all markers (Endelman and Jannink 2012). The genomic additive relationship matrix was estimated as: where: Z = M -P, M being the individuals by SNP loci marker matrix and P the frequencies of alleles expressed as 2(p i -0.5) with p i representing the allele frequency of marker i (VanRaden 2008).

Constructing haplotype blocks
The high density SNP marker genotypes were used to construct haplotype blocks after ordering markers based on the consensus map positions for all 21 chromosomes (Wang et al. 2014a). We generated four different haplotype blocks, each with a fixed number of 5 adjacent markers (Haploblock-5), 10 (Haploblock-10), 15 (Haploblock-15), and 20 (Haploblock-20) for each chromosome. Haplotype alleles for each haplotype block were numbered using a custom script in R (R Development Core Team 2017).
Assessment of population structure and training population optimization A cluster analysis was performed by generating a pairwise distance matrix estimated as 1 -IBS (identity-by-state) probability in TASSEL using SNP marker data for all lines in the MN-WGS panel. Using the distance matrix, K-means clustering was performed using the Hartigan-Wong algorithm implemented in R (R Development Core Team 2017). Based on prior knowledge of pedigree information, three clusters were assumed in K-means clustering. Principal component analysis (PCA) was performed in R using SNP marker data for all lines in the MN-WGS panel to visually identify clusters assigned by the K-means clustering (R Development Core Team 2017). Using the genomic additive relationship matrix, the average genetic relationships were calculated for lines within a cluster (A ij withing ) and lines between clusters (A ij between ). To investigate the effect of population structure on genomic prediction in the MN-WGS panel, using single SNP markers only, the three clusters identified by K-means clustering were used in evaluating the predictive ability by combining two clusters for predicting the performance of the third cluster and repeating this step iteratively for all clusters. The predictive ability was calculated as the correlation between phenotypic values of individuals in the validation population and the estimated genomic predictions of those individuals (Legarra et al. 2008).
To evaluate genomic prediction accuracy, k-fold cross validation was implemented so each individual appeared once in the validation population. We used both single SNP markers and haplotype blocks for the assessment of the predictive ability. The population was randomly shuffled followed by using k-fold cross validation by dividing the MN-WGS panel into four groups. One of those groups were excluded to estimate marker/haplotype effects using the three remaining groups to define 75% (288 individuals) of the population as a random calibration population. The k-fold cross validation was repeated four times iteratively for each of the four randomly assigned groups. These previous k-fold cross validation steps were replicated four times. In addition to k-fold cross validation, a training population optimization procedure using stratified sampling was evaluated. For the stratified sampling procedure, clusters identified by K-means clustering were used as a criterion for selecting the calibration population. A stratified sampling genomic prediction procedure was performed by constructing a calibration population through randomly sampling 75% of lines from each of the three clusters. Therefore, the sample size from each cluster was proportional to the size of the cluster, and a total of 75% (288) of lines in the MN-WGS panel were used as a calibration set to predict the remaining 25% (94). The stratified sampling prediction approach was replicated sixteen times using both single SNP markers and the four haplotype block sizes. Each predictive ability value was transformed using Fisher Z. The test statistics were calculated as T ¼ r ffiffiffiffiffiffiffiffiffiffiffi n 2 2 p = ffiffiffiffiffiffiffiffiffiffiffiffi 1 2 r 2 p , where r is the predictive ability and n is the number of tests (Bobko 2001). The test statistic follows a t N -2 distribution (Bobko 2001). A paired t-test was used for the assessment of statistical significance between single markers and each haplotype block size for the same constructed calibration populations.

Genomic prediction models
For genomic best linear unbiased prediction (GBLUP) using single markers, the mixed model with SNP additive effects (or average effects of gene substitution) based on the partition of genotypic values (Da et al. 2014) was used: where m = population mean, 1 = n · 1 column vector of 1's, n = number of lines, a = m · 1 column vector of marker additive effects, m = number of SNPs, W a = n · m model matrix of a with elements of 2p 2 , p 2 2 p 1 , and 22p 1 for a marker genotype, p k = frequency of allele k of a SNP (k = 1,2), and a ¼ W a a = GBLUP of additive values of the n lines. Assumptions for the first and second moments are: EðyÞ ¼ 1m, VarðaÞ ¼ I m s 2 a , and VarðeÞ ¼ R ¼ I N s 2 e , where s 2 a = variance of SNP additive effects, s 2 e = residual variance, I m = m · m identity matrix, and I N = N · N identity matrix. The GBLUP of additive values, and genomic restricted maximum likelihood (GREML) estimates were calculated using the GVCBLUP computer package (Wang et al. 2014b; https://animalgene.umn.edu).
For haplotype analysis, a multi-allelic haplotype model that treats each haplotype block as a 'locus' and each haplotype within the haplotype block as an allele (Da 2015) was used. The multi-allelic haplotype prediction was modeled as: where m = population mean, 1 = n · 1 column vector of 1's, n = number of lines, a h = n a · 1 column vector of haplotype additive effects, n a = number haplotype additive effects (or average effects of gene substitution), W ah = n · n a model matrix of a h with elements of 2p k , 2ð1 2 2p k Þ, and 22ð1 2 p k Þfor a haplotype genotype, p k = frequency of a haplotype in a haplotype block, and a ¼ W ah a h = GBLUP of additive values of the n lines. Assumptions for the first and second moments are: EðyÞ ¼ 1m, Varða h Þ ¼ I na s 2 ah , and VarðeÞ ¼ R ¼ I N s 2 e , where s 2 ah = variance of haplotype additive effects, s 2 e = residual variance, I na = n a · n a identity matrix, and I N = N · N identity matrix. The GBLUP of additive values were calculated using the GVCHAP computer package (Prakapenka et al. 2020; https://animalgene.umn.edu).

Data availability
Genotypic and raw phenotypic data for this study are available at figshare portal. The link to the genotypic data (https://figshare.com/ articles/Conley_MNWGSpanel_cM_hmp_txt/10031867). The link for the raw phenotypic data (https://figshare.com/articles/Pheno_ MN-WGS/10032326). Supplementary tables are available at figshare (https://figshare.com/articles/Supplemental_Tables_for_MN-WGS_ panel/10031891). Table S1 includes the average adjacent marker LD estimated as (r 2 ) for the 21 wheat chromosomes in the MN-WGS panel. Table S2 includes number of haplotype blocks for each chromosome, distance covered by haplotype blocks, maximum number, and average number of haplotype alleles in fixed length haplotypes of 5, 10, 15, and 20 adjacent markers. Table S3 includes the predictive ability for yield, test weight, and protein content using single markers, haplotype blocks of 5 adjacent markers (Haploblock-5), haplotype blocks of 10 adjacent markers (Haploblock-10), haplotype blocks of 15 adjacent markers (Haploblock-15), and haplotype blocks of 20 adjacent markers .

Phenotypic and genotypic data analysis
The MN-WGS panel was evaluated in two balanced trials in Minnesota for grain yield, test weight, and grain protein content. Correction for spatial field variability and trial effects was performed to improve estimates of phenotypic values of individuals. Significant differences were observed among lines for yield, test weight, and protein content. Estimated genetic variance, residual variance, and the broad-sense heritability for each trait are shown in Table 1. Heritability estimates were 0.28 for yield, 0.67 for test weight, and 0.68 for protein content (Table 1). After quality control filtering, 14,086 markers with map positions were used in the study. Marker density varied among chromosomes and ranged from 73 for chromosome 4D to 1,488 for chromosome 2B (Table S1). Extensive levels of LD, estimated as r 2 , were observed for all chromosomes that varied between 0.45 for chromosome 7A to 0.69 for chromosome 3B (Table  S1). The average adjacent marker LD across all chromosomes was 0.57. K-means clustering identified three different clusters and the number of lines for each cluster were 175 for cluster 1, 89 for cluster 2, and 118 for cluster 3 (Table 2). For the PCA, the majority of individuals in the MN-WGS panel were located in their respective clusters identified by K-means clustering (Figure 1). The first principal component (PC1) explained 10.0% of the variability whereas the second principal component (PC2) explained 8.3% of the variability in the MN-WGS panel (Figure 1). The genomic additive relationship matrix agreed with the results of the K-means clustering in identifying three clusters, each including genetically related individuals ( Figure 2). Table 2 displays the average additive genetic relationship between individuals in different (A ij between ) clusters and individuals within (A ij within ) clusters. Cluster 1 had the highest A ij between and lowest A ij within compared to the other two clusters (Table 2). On the other hand, cluster 2 had the lowest A ij between and highest A ij within ( Table 2). The average yield for the three clusters were 5556, 5617, and 5583 kg/ha for cluster 1, cluster 2, and cluster 3; respectively. No significant difference was observed for yield across the three clusters. The average test weight for the three clusters were 79.1, 77.8, and 78.7 kg/hL for cluster 1, cluster 2, and cluster 3; respectively. No significant difference was observed for test weight across the three clusters. The average protein content for the three clusters were 14.4, 14.0, and 14.4% for cluster 1, cluster 2, and cluster 3; respectively. No significant difference was observed for protein content across the three clusters.

Haplotype block construction
Haplotype blocks of 5, 10, 15, and 20 adjacent markers were generated for all chromosomes, with variable number of haplotype alleles identified for each haplotype locus. We will refer to haplotype blocks of 5, 10, 15, and 20 as Haploblock-5, Haploblock-10, Haploblock-15, Haploblock-20; respectively. With the increase of haplotype lengths , lower number of haplotype blocks were generated across the genome with higher numbers of haplotype alleles per haplotype blocks (Table S2). For Haploblock-5, a total of 2,810 haplotype blocks were identified across all chromosomes with up to 29 haplotype alleles per haplotype block (Table S2). On average, across the 21 wheat chromosomes, each Haploblock-5 covered 2.2 cM (Table S2).
Training population scenarios and comparing between single and haplotype prediction Generally, the predictive ability was lower for yield compared to test weight and protein content in both k-fold cross validation and stratified sampling. To investigate the effect of population structure on the predictive ability, using single SNP markers, the identified clusters were used as training populations by combining two clusters for predicting the third cluster for yield, test weight, and protein content. The size of the formed training populations varied depending on the clusters size ( Table 2). The predictive abilities when including a cluster in the training populations are presented in Table  2. When including cluster 1 in the training population in two cases (cluster 1 + cluster 2 and cluster 1 + cluster 3), the predictive abilities were higher across all traits ( Table 2). The average predictive abilities for cluster 2 were similar to cluster 3 across all traits and both were lower than cluster 1 (Table 2).
After confirming the effect of population structure on the predictive ability, a training population optimization method was used to design a calibration population by sampling a representative sample from each cluster. In general, the stratified sampling resulted in an increase in the predictive ability compared with k-fold cross validation for yield and protein content; while decreasing the predictive ability for test weight using all marker prediction scenarios ( Figure 3; Table S3). Four different haplotype block sizes  were used to assess the effectiveness of haplotypes compared with single markers in genomic prediction. All four haplotype blocks improved the predictive ability in both k-fold cross validation and stratified sampling in yield and protein content compared with single markers (Figure 3; Table S3). For k-fold cross validation in yield, Haploblock-5, Haploblock-10, Haploblock-15, and Haploblock-20 resulted in average increases of 6.3, 2.9, 5.3, and 2.2% in the predictive ability over single markers (Figure 3; Table S3). With the use of stratified sampling, Haploblock-5, Haploblock-10, Haploblock-15, and Haploblock-20 resulted in significant average increases of 6. 8, 5.5, 9.4, and 5.2% in the predictive ability over single markers (Figure 3; Table S3). For k-fold cross validation in protein content, Haploblock-5, Haploblock-10, Haploblock-15, and Haploblock-20 resulted in significant average increases of 3.4, 4.6, 6.7 and 6.9% in the predictive ability over single markers (Figure 3; Table  S3). With the use of stratified sampling, Haploblock-5, Haploblock-10, Haploblock-15, and Haploblock-20 resulted in significant average increases of 2.7, 4.3, 6.0, and 5.8% in the predictive ability for protein content over single markers (Figure 3; Table S3). For test weight, the increase of predictive ability of haplotypes compared with single marker prediction was significant only when using Haploblock-15 and Haploblock-20 with stratified sampling optimization (Figure 3; Table S3).

DISCUSSION
Most genomic selection investigations rely on using single markers for predicting breeding values of individuals. In our breeding experiment, multi-allelic haplotype prediction performed better than single markers for all three traits investigated. For both single markers and haplotype predictions, the predictive ability for yield was lower than protein content and test weight due to a lower heritability estimate for yield compared with other two traits. Traits with low heritability estimates tend to be highly quantitative, controlled by many loci with smaller effects, and have much environmental noise; which can result in lower prediction accuracies (Bernardo and Yu 2007;Daetwyler et al. 2010;Sallam et al. 2015). Daetwyler et al. (2010) demonstrated that increasing heritability will improve the accuracy of the prediction for both GBLUP and Bayes B. In Norwegian dairy cattle, a strong relationship was observed between prediction accuracy and trait heritability (Luan et al. 2009). Similar results were observed in wheat and barley as high heritability traits such as heading date, height, and test weight had higher prediction accuracies compared with low heritability traits such as yield Sallam et al. 2015). With low heritability traits, a larger number of phenotypic records are needed to better estimate marker effects for improving prediction accuracy (Hayes et al. 2009a;Luan et al. 2009).
K-means clustering identified three clusters with variable sizes. The cluster analysis revealed the pedigree structure in MN-WGS panel. The three clusters had similar performances for the three traits. The five most frequent parents each appeared in pedigrees at least 45 times (data not shown). One of those parents was not part of the MN-WGS panel. For example, Sabin (Anderson et al. 2012), assigned in cluster 2, is a parent to 91 individuals, 87 of which are included in cluster 2. RB07 (Anderson et al. 2009), assigned in cluster 3, is a parent to 74 individuals, 71 of which are included in cluster 3. MN02072-7, assigned in cluster 1, is a parent to 66 individuals, 56 of which are included in cluster 1. MN01333-A-2 is a parent to 53 individuals, 41 of which are included in cluster 1. Finally, Glenn (Mergoum et al. 2006), assigned in cluster 1, is a parent to 45 individuals, 32 of which are included in cluster 1. These results indicate that clustering in the population is determined by pedigree stratification. Cluster 2 had the highest A ij within and by searching through pedigree information, we found that all lines in this cluster are halfsibs, sharing Sabin as a common parent. The two training populations that included cluster 2 (cluster 1 + cluster 2 and cluster 2 + cluster 3) had an average predictive ability that is lower than cluster 1, whose included three parents: MN02072-7, Blade (PVP no. 200800075), Faller (Mergoum et al. 2008), and Glenn (Mergoum et al. 2006) and high frequency of their progenies. Blade, Faller, and Glenn are wheat cultivars and used as parents in the MN-WGS panel but with more progeny for those parents in cluster 1 (48 progeny lines). Several direct progenies of these three parents were also included in cluster 2 (16) and cluster 3 (20), which may explain the highest A ij between for cluster 1. The high genetic relationship of cluster 1 with the other two clusters resulted in a higher prediction accuracy of single marker prediction when including cluster 1 in the training population. These results may not be applicable to other breeding situations with different levels of population structure due to admixture or pedigree stratification (Toosi et al. 2010;Asoro et al. 2011). Our findings are in agreement with a genomic selection study in angus beef cattle as one of the five clusters identified using K-means clustering showed lower genetic relationship to other clusters, resulting in the smallest prediction accuracy across all traits when including this cluster in the training population (Saatchi et al. 2011). It has been proven that increasing the genetic relationship between the training and validation populations will improve the accuracy of genomic prediction (Habier et al. 2007;Lorenz et al. 2012;Lorenz and Smith 2015).
The size of the training population is another factor that affects the accuracy of genomic prediction. Training populations including cluster 1 had a larger size compared with the other two clusters and that may contribute to the higher prediction accuracy of cluster 1. Despite the fact that training populations including cluster 3 were larger than cluster 2, both resulted in similar prediction accuracies across all traits. In breeding populations, the change of prediction accuracy due to the increase of training population size is dependent on the genetic relationship (Albrecht et al. 2014) and breeding history (Sallam et al. 2015); therefore, careful selection of the training population is needed for a successful implementation of genomic selection (Lorenz et al. 2012;Lorenz and Smith 2015).
When implementing genomic selection in a breeding program, it is important to consider the best genotypes to be included in the calibration population. Additionally, implementing an efficient marker prediction approach is required to maximize the accuracy of genomic prediction. An effective genomic selection strategy in plant breeding programs is able to design a smaller training population for the purpose of generating genotypic and phenotypic data, which can improve resource allocation (Lorenz 2013;Endelman et al. 2014). Several methods were proposed to optimize calibration population design including CDmean (Rincent et al. 2012), PEV , stratified sampling (Isidro et al. 2015), and Gmean (Lorenz and Smith 2015). The stratified sampling approach outperformed CDmean optimization in structured populations across several traits with different genetic architectures (Isidro et al. 2015).
n■  2, 9.5, 9.8, 9.4, and 9.0% for single markers,  respectively. Stratified sampling reduced the predictive ability for test weight across all the marker/haplotype genotyping scenarios. The reason for the reduction in the test weight predictive ability when implementing stratified sampling is unknown. Similar results were observed in barley when stratified sampling resulted in a reduction of prediction accuracy in yield across selection cycles while increasing the prediction accuracy for Fusarium toxin accumulation (Deoxynivalenol) (Tiede and Smith 2018). There is no absolute training population optimization method that could be applied to all traits with variable genetic architecture (Isidro et al. 2015;Tiede and Smith 2018). This may be partly due to the correlation between the traits and population structure, which can affect the accuracy of prediction (Sallam et al. 2015). However, other optimization methods performed similarly or more consistently in traits with different genetic architecture and in populations with limited population stratification (Isidro et al. 2015;Tiede and Smith 2018). The current study accentuates the improvement of prediction accuracy based on haplotype blocks vs. single marker genomic prediction in a self-fertilized crop species. The four different haplotype sizes significantly improved the accuracy of prediction compared with single marker prediction for yield, test weight, and protein content. It is expected for haplotypes to improve the prediction accuracy over single marker prediction due to the increased LD between haplotypes and causal genetic variants, the effectiveness of capturing genetic relationship using haplotype information, and the ability of haplotype blocks to capture short-range epistatic interactions of nearby genetic variants (Clark 2004;Hayes et al. 2007;Hess et al. 2017;Jiang et al. 2018). Haplotypes may better capture the genomic similarity between lines because LD patterns in each block are considered. A relationship was observed between the length of the haplotype and the accuracy of prediction in animal studies using both simulated (Calus et al. 2008;Villumsen et al. 2009) and empirical data (Hayes et al. 2007;Hess et al. 2017). The increase of haplotype length is expected to capture LD between markers in blocks with QTL;  thereby increasing the accuracy of prediction. However, this may also increase the number of haplotype allelic classes, which may reduce the accuracy of prediction due to smaller sample sizes representing these classes (Villumsen et al. 2009;Da 2015;Hess et al. 2017). In our study, the longer haplotype blocks in Haploblock-20 resulted in a large increase for the number of haplotype allelic classes, on average, compared with other haplotype sizes, leading to no improvement in accuracy over Haploblock-15. Villumsen et al. (2009) found that the ideal haplotype size could be determined based on the LD level and marker density in a population. In the current study, extensive levels of LD were observed in the four haplotype block sizes with average LD of 0.568, 0.569, 0.572, and 0.570 for Haploblock-5, Haploblock-10, Haploblock-15, and Haploblock-20; respectively. The high levels of LD are a consequence of the selfing nature of wheat that results in extension of LD over long distance. In a simulation study in animals, an adjacent marker LD of 0.20 was sufficient for the use genomic prediction (Calus et al. 2008). LD is an important component for driving the accuracy of genomic prediction as the prediction accuracy increases at a similar pattern to the increase of LD (Solberg et al. 2008). It is clearly evident that the prediction accuracy is reduced at lower LD levels (Solberg et al. 2008;Calus et al. 2008). To assess the accuracy of genomic prediction in a rice diversity panel using single markers, LD levels between 0.49 and 0.64 resulted in higher accuracies with reductions in the accuracy of prediction at lower LD levels for three different traits (Ben Hassen et al. 2018). Thus, monitoring LD level while constructing haplotype blocks is a safe approach to ensure improvement of genomic predictions. This is because partial linkage between QTL and a group of markers may reduce QTL variance explained by haplotypes, thereby lowering the prediction accuracy (Villumsen et al. 2009).

Utility of haplotype prediction in plant breeding
We evaluated the implementation of a multi-allelic haplotype genomic prediction model in wheat to assess the changes of the predictive ability compared with single markers. Several methods were proposed for constructing haplotype blocks including fixed-length haplotypes and variable-length haplotypes that are based on haplotype identityby-descent (IBD) and LD-based haplotypes (Calus et al. 2008;Cuyabano et al. 2014;Hess et al. 2017). In current study, four fixed numbers (5, 10, 15, and 20) of adjacent markers were used to construct four different haplotype block sizes that resulted in improvement over single marker prediction for traits with different genetic architectures. With the implementation of haplotype prediction in conjunction with a training population optimization approach such as stratified sampling, the prediction accuracy improved substantially. Using Haploblock-15 and implementing stratified sampling for training population optimization, the predictive ability was improved significantly by 14.3 (four percentage points) and 16.8% (seven percentage points) for yield and protein content, respectively, compared with single markers and random k-fold cross validation. Improvement of prediction accuracy can change the ranking of top performing individuals in the selection candidate population, thereby increasing genetic gain. Figure 3 The predictive ability for yield, test weight, and protein content using single markers, haplotype blocks of 5 adjacent markers (Haploblock-5), haplotype blocks of 10 adjacent markers (Haploblock-10), haplotype blocks of 15 adjacent markers (Haploblock-15), and haplotype blocks of 20 adjacent markers (Haploblock-20). The two validation methods used are k-fold cross validation and the stratified sampling optimization. A star over the error bar indicates a significant difference in the predictive ability between the haplotype and single markers for the same validation method.