Genome-wide association study and genomic prediction of white rust resistance in USDA GRIN spinach germplasm

Abstract White rust, caused by Albugo occidentalis, is one of the major yield-limiting diseases of spinach (Spinacia oleracea) in some major commercial production areas, particularly in southern Texas in the United States. The use of host resistance is the most economical and environment-friendly approach to managing white rust in spinach production. The objectives of this study were to conduct a genome-wide associating study (GWAS), to identify single nucleotide polymorphism (SNP) markers associated with white rust resistance in spinach, and to perform genomic prediction (GP) to estimate the prediction accuracy (PA). A GWAS panel of 346 USDA (US Dept. of Agriculture) germplasm accessions was phenotyped for white rust resistance under field conditions and GWAS was performed using 13 235 whole-genome resequencing (WGR) generated SNPs. Nine SNPs, chr2_53 049 132, chr3_58 479 501, chr3_95 114 909, chr4_9 176 069, chr4_17 807 168, chr4_83 938 338, chr4_87 601 768, chr6_1 877 096, and chr6_31 287 118, located on chromosomes 2, 3, 4, and 6 were associated with white rust resistance in this GWAS panel. Four scenarios were tested for PA using Pearson’s correlation coefficient (r) between the genomic estimation breeding value (GEBV) and the observed values: (1) different ratios between the training set and testing set (fold), (2) different GP models, (3) different SNP numbers in three different SNP sets, and (4) the use of GWAS-derived significant SNP markers. The results indicated that a 2- to 10-fold difference in the various GP models had similar, although not identical, averaged r values in each SNP set; using GWAS-derived significant SNP markers would increase PA with a high r-value up to 0.84. The SNP markers and the high PA can provide valuable information for breeders to improve spinach by marker-assisted selection (MAS) and genomic selection (GS).


Introduction
Spinach (Spinacia oleracea L.) is one of the important vegetable crops in the world economically, which was estimated to average $490 million (fresh and for processing) per year during 2018-20 in the United States (US), with 97% of the value for the fresh market [1]. Spinach is considered a "super food" due to a high concentration of phytonutrients and other health-promoting compounds, including vitamin A and C, carotenoid, lutein, folate, calcium, iron, and antioxidants [2,3]. In the US, spinach has become very popular during the past two decades as healthy-conscious consumers have increased the consumption of leafy vegetables. To meet the greater demand for spinach, commercial production has evolved into high density (up to 10 million/ha) planting, year-round production cycles, overhead sprinkler irrigation systems, high fertilizer application, and expanded production areas; all of them create an environment conducive for the development of diseases. Several diseases reduce yield and quality individually or in combination, thus posing serious challenges to commercial spinach production. Spinach suffers from many diseases while downy mildew, white rust, Fusarium wilt, Stemphylium leaf spot, and anthracnose leaf spot are the most devastating and economically important diseases of spinach.
White rust of spinach is caused by Albugo occidentalis, an obligate oomycete that can reduce yield and quality [3][4][5][6][7][8]. White rust has been an endemic throughout the central and eastern US for many years but has also been reported in other parts of the world, including Greece [9], Mexico [10], and Turkey [11]. The persistent appearance of this disease and its expansion into wider geographic areas pose a significant challenge to the spinach industry in the US and world. If this pathogen is introduced into major US production areas in California and Arizona that produce over 90% of the fresh market product, it would be devastating to the US spinach industry as the vast majority of the cultivars adapted to these areas have little to no resistance incorporated. Resistance to white rust has been previously found in USDA spinach germplasm and breeding lines [4,7,[12][13][14]. High levels of resistance to white rust have been reported in the spinach varieties developed at the spinach breeding program of University of Arkansas as this disease was a primary breeding goal [3]; thus, the Arkansas germplasm has been used as a source of resistance to transfer white rust resistance into several commercial cultivars. However, even this resistance material can suffer severe infection when conditions are highly conducive for the disease development [7,[12][13][14]. Yet, there is little information regarding the genetics of white rust resistance in spinach. Thus it is necessary to routinely evaluate and identify new spinach resources for white rust resistance for the development of cultivars with improved resistance. So far, only quantitative resistance has been found and utilized as no major genes have been reported for white rust resistance [5].
The publication of reference genomes and other new assemblies [15][16][17][18][19] has made genome-wide variant discovery in the germplasm panel and genome-wide association studies (GWAS) possible in spinach. With the decreasing genotyping cost in recent years and advanced statistical methods, GWAS and genomic selection (GS) approaches are commonly utilized to improve complex genetic traits in crops. GWAS, based on the genotyping and phenotyping of a natural germplasm population and high-density markers, has been employed to map simple to complex traits and identify candidate genes in many crops [20][21][22]. GWAS has been used in spinach for many traits, including surface texture, edge shape, and petiole color [23]; bolting, tallness, and erectness [24]; leafminer resistance [25]; oxalate concentration [26]; Verticillium wilt resistance [27], Stemphylium leaf spot resistance [28]; mineral nutrient contents [29]; white rust resistance [30]; growth habit [31]; anthracnose resistance [32]; and downy mildew resistance [33][34][35][36]. The identification of single nucleotide polymorphism (SNP) markers for the traits has provided valuable molecular tools for breeders to develop spinach cultivars more efficiently. It has been demonstrated that quantitative trait of white rust resistance in spinach can be screened under field conditions and is correlated with quantitative resistance to downy mildew [4,37]. Using a panel of 267 spinach accessions with 6111 SNPs,   [30] conducted GWAS analysis for white rust resistance and identified 448 minor alleles (SNPs) associated with white rust disease severity, which maybe be utilized in the selection for resistant plants.
Genomic prediction (GP) is emerging as a promising tool to improve the efficiency and speed of plant breed-ing. So far, GP has been reported in several crops [38][39][40][41][42][43][44][45][46][47][48] for various traits. Genomic estimation breeding value (GEBV) is a key step in GS. Several approaches have been proposed to determine GEBV, such as Bayesian methods (Bayes A, Bayes B, Bayes LASSO, and Bayes ridge regression) and BLUP methods (RR-BLUP, gBLUP, and cBLUP). The GS approaches have been adopted for variety of traits in various crops [49][50][51][52][53][54][55]. All articles reported the prediction accuracy (PA) estimates using the Pearson's correlation coefficient (r) between the observed phenotypic values and the predicted GEBV for each trait in a validation set using several different models.
Currently, USDA-GRIN (Germplasm Resources Information Network) has approximately 400 spinach accessions, which were originally collected from 33 countries and represent a diverse germplasm collection. The overall objectives of this study were to evaluate USDA spinach germplasm for white rust resistance under the field conditions and to identify resistance-associated SNP markers through GWAS to conduct marker-assisted selection and genomic selection for white rust resistance.

White rust phenotyping
The 346 spinach accessions were evaluated for white rust resistance in the Del Monte White Rust Nursery, Crystal City, Texas, during the three winter seasons of 2015-16, 2016-17, and 2017-18. The nursery is known to have high white rust disease pressure over the three decades. The field experiments were performed in a randomized complete block (RCB) design with two replications. In each block, each accession was planted in a 10-feet long row, three feet between rows, and 4-inch between plants within the row. Thus, there were about 30 plants in each row with 60 plants of each accession for evaluation each year. White rust disease was evaluated under natural disease pressure without introducing external inoculum. A susceptible cultivar, Viroflay, was planted as a spreader row on both sides of the tested genotypes.
The 2017-2018 winter season trial had high white rust disease pressure among the three years of evaluations, while the disease severity was lower in the other two years (data not shown), thus only disease severity data from the 2017-18 winter season were reported in this study. The white rust response data of the 346 spinach accessions were analyzed for the analysis of variance (ANOVA) with the general linear models (GLM) in JMP Genomics 9 (SAS Institute, Cary, NC). Multiple comparisons among individual accessions were performed using the student T-test at α = 0.05, and mean, range, standard deviation (SD), standard error (SE), and coefficient of variation (CV) of disease severity were computed. Distribution of white rust disease across accessions was drawn and the mean disease rating of each accession was used as the phenotypic data for GWAS.

Genotyping
DNA was extracted from fresh leaves bulked from 5-10 plants for each genotype. Qualified DNA for each sample was sheared randomly into 350-bp fragments by Covaris Ultrasonic Processor before sequencing. The construction of the DNA libraries followed the process of end repairing, adding A tails, purification, PCR amplification and library qualification [56]. The DNA libraries were pair-end sequenced by whole-genome resequencing (WGR) technology at 10x spinach genome size coverage generating about 10 Gb sequence data for each sample using Illumina NovaSeq Sequencer machine at BGI (https://www.bgi.com/). The spinach genome of Sp75 [18,57] available at SpinachBase was used as a reference to map the WGR data of the 346 spinach genotypes using Burrows-Wheeler aligner software (BWA v0.7.8-r455 [58]). SAMtools (v 0.1.19-44 428 cd) [58] were utilized to sort the bam files and remove duplication reads. The program Picard (v 1.111) [58] was used to merge the bam files from the same sample, and the GATK software (v 3.5) [59]

Association analysis
GWAS was performed in a two step process. In the first step, 2 357 260 SNPs were used to perform GWAS implementing single marker regression (SMR), GLM (PCA), and MLM (PCA + K) methods in TASSEL 5 [61]. However, we only used the 8399 randomly selected SNPs to estimate PCA and Kinship matrixes. PCA matrix was estimated with the PCA tool in TASSEL 5, setting covariance (alterative = correction) and the number of components = 2. Kinship (K) was estimated in TASSEL 5 by using Scald_IBS method. Based on GWAS analysis in TASSEL 5, there were 4836 SNPs with the logarithm of odds (LOD) [−Log 10 (Pvalue)] > 4.0 either in SMR, GLM, or MLM (We use LOD instead of -Log 10 (P-value) in this article.).
Multiple TASSEL and GAPIT models were used to find reliable and stable white rust resistance-associated SNP markers and candidate genes and QTL regions in spinach. The significant threshold of associations was calculated using Bonferroni correction of P-value with an α = 0.05 (0.05 / SNP number), and LOD value of 5.42 was used as significance threshold based on the 13 235 SNPs in this study.

Candidate gene identification/detection
Genes were searched within 50 Kb on either side of significant SNPs of the spinach Sp75 genome annotation at the SpinachBase site (http://www.spinachbase.org/). Our emphasis was to find analogs of disease resistance genes near the significantly associated SNP markers.

Genomic prediction for genomic selection of white rust resistance
The ridge regression best linear unbiased prediction (rrBLUP) method was used to perfrom GP using the rrBLUP package [66] in R Version 4.0.5. In addition, GP was conducted with gBLUP and cBLUP implemented in GAPIT package [54]; Bayesian models including Bayes A, Bayes B, Bayes LASSO (BL), and Bayes ridge regression (BRR) implemented in BGLR package [67]; and random forest (RF) model implemented in Random Forest R package [68] and support vector machines (SVM) [68] implemented in kernlab packages. GP using these packages has been reported in previous studies [49][50][51][52][53].
The PA for the tested models in this study was estimated by calculating the average Pearson's correlation coefficient (r) between the GEBVs estimates from the training set and white rust phenotypic values in the validation set or testing set [40,53,69]. The training and testing sets were randomly generated 100 times; the average r-value was estimated; and distribution charts (boxplots) were drawn using the ggplot2 R package.

Evaluation of white rust resistance
The white rust disease showed signs and symptoms on leaves, and the disease severity was recorded using the 0 to 10 disease scale (Fig. 1). The scale (0-10) in the 346 spinach accessions did not show a normal distribution but skewed toward a higher disease severity due to most material being highly susceptible (Fig. 2) in the association panel. The mean disease severity ranged from 1.0 to 6.5, averaged 4.8 with a standard deviation of 0.911 and the CV was 17.2%. The data showed an extensive range and variation of the white rust disease scale in the 346 accessions, confirming the suitability of the association panel for GWAS. The lines NSL 6098, PI 175311, PI 220686, PI 224959, PI 226671, PI 227045, and PI 648958 showed the highest white rust resistant levels with a score of 2.0 or less (Table 1 & S1), indicating their suitability as parents in breeding programs to develop white rust-resistant hybrids and cultivars.

Genetic diversity among white rust-resistant lines
Among the 346 spinach accessions, 23 showed white rust resistance with a rate 3 or below (Table 1), indicating that the 23 spinach accessions can be used as parents to develop new spinach cultivars or lines for white rust resistance in breeding. Five of the 23 accessions were originally collected from Afghanistan; two from China; two from India; five from Iran; three from Turkey; and six from United States ( Table 1), indicating that the white rust resistance was mainly distributed among Asia and U.S. accessions in this study.
The genetic diversity analysis among the 23 accessions showed that (1) the accessions from the same country were located at neighbor each other with less genetic distance in the phylogenetic tree in most cases; (2) two clusters were grouped: the five accessions, PI 227045, PI 165994, PI 175311, PI 433210 and PI 648949 from Iran, China and India as one group, and other 18 accessions as another group; and (3) In group I, the four accessions, PI 207518, PI 220686, PI 211632, and PI 212120 had different genetic base from others as I-outlier (Fig. 3, Table 1). The phylogenetic analysis will provide information on how to utilize these white rust-resistant accessions.  Table S1.

Association study
Based on the six models in GAPIT 3 and three models in TASSEL 5 when PCA = 2, 40 SNPs, located on chrs 1, 2, 3, 4, 5, and 6, were associated with the white rust resistance (Table S3). The observed vs expected LOD [−log 10 (p)] distributions in QQ-plots showed a large divergence from the expected distribution ( Fig. S3 B), indicating that there were SNPs associated with the white rust resistance in the association panel. The Manhattan plot showed that a dozen SNPs with LOD value greater than 5.42 (significant threshold) across the six GWAS models from GAPIT 3 ( Fig. S3 A & C), were associated with white rust resistance.
BLINK had SNPs with LOD greater than 5.42 on chrs 1, 2, 3, 4, and 6 and FarmCPU had SNPs with LOD >5.42 on chr 2, 3, 4, 5, and 6 ( Fig. 5, Table S3), indicating that there are SNP markers associated with white rust resistance. Gapit.SUPER, Gapit.GLM, Tassel.GLM and Tassel.SMR showed a peak at the region on chr 4, where a dozen SNPs had LOD >5.42 and only this region had SNPs with LOD >10 (Fig. 6, Fig. S4, Table S3), indicating there is a major QTL in the region of chromosome 4 for white rust resistance. However, the Tassel.MLM (Fig. S4), Gapit.MLM, and Gapit.MLMM (Fig. S5) don't have any SNP with LOD >5.42, but have dozen of SNPs with LOD >3.0 ( Fig. S4 and S5) and six SNPs had LOD score > 4.0 or 3.0 in the three models, indicating that there were small-effect QTLs for white rust resistance (Table S3).

Candidate genes for white rust resistance
A total of 121 genes were listed in Supplementary Table  S5 and they were located within 50 Kb distance from the 40 associated SNP markers in Table S3. All Leucine-Rich Repeat (LRR) genes plus those with less than 1 Kb distance from the associated SNP markers were listed in Table 3, where 13 genes were located at 12 associated SNP regions. Six SNPs were inside six genes and three SNPs were with a distance less than 1 Kb from a gene, respectively. Six SNPs were located less than 50 Kb from five disease resistance gene analogues encoding LRR domains ( Table 3).

Genomic prediction with different ratios of the training set to testing set
In this study, there were nine ratios between training and testing sets, two GP models, and three SNP sets making a total of 54 combinations. The average r-value (rȲ 100 ) and its standard error (SE) from the 100 runs for each combination are listed in Table S6 and the 54 averaged r values (rȲ 100 ) displayed in charts created by R package divided by two models: left half from BL (Bayesian LASSO) and right half from rrBLUP, and grouped by the nine folds with three SNP sets (Fig. S6).
The nine-fold sets had similar, although not identical, averaged r values (rȲ 100 ) in each SNP set using the same model, either BL or rrBLUP (Table S6, Fig. S6). From rrBLUP, the r-value averaged 0.67, ranged from 0.63 (2 fold) to 0.69 (8- (Table S6). Overall, the 2-fold had a low r-value but had a smaller SE. However, the SE increased when increasing the fold number. In general, BL model had higher r-value than rrBLUP. The all.13235.SNP set had higher value than the other two sets and the 40-SNP.marker set had a higher r-value than the 9-SNP.marker set (Table S6, Fig. S6). There were 24 combinations for GP analysis, consisting of eight SNP number sets selected from three SNP groups. Each GP analysis was run for 100 times to calculate GP statistical parameters and r values. The average r-value (rȲ 100 ) and SE of 100 runs for each GP combination are presented in the Table S7 and Fig. S7.

Genomic prediction with different SNP numbers
The results showed that the average r value (rȲ 100 ) decreased when decreasing the SNP number (Table S7 and Fig. S7). From the Set.13235SNP, the average r-value (rȲ 100 ) was 0.79 when 4836 SNPs were used and decreased to 0.25 when 9 SNPs were used. In the Set.4836SNP.select, the average r-value (rȲ 100 ) was 0.82 when 4836 SNPs were used and decreased to 0.31 when 9 SNPs were used. In the two SNP groups, the r-value was higher than 0.50 when 100 SNPs or more were used. But the average r-value (rȲ 100 ) was very low in the Set.8839SNP.random of the SNP group; ranged from 0.19 to 0.07; and the highest was only 0.19 when 4836 SNPs were used (Table S7, Fig. S7), indicating that random SNP group without associated markers included can't be used for white rust resistance in GS, but using associated SNP marker will increase the selection efficiency; and when > = 100 SNPs with associated markers can be used as a set to select white rust resistance in GS.

Genomic prediction using different models
Nine GP models (BA, BB, BL, BRR, SVM, RF, rrBLUP, gBLUP, and cBLUP) were used to conduct GP for white rust resistance among seven SNP sets: all, 40 m, 9 m, 40r, 9r, 40rr, and 9rr, where all signifies all 13 235 SNP set; 40 m is the 40 SNP markers in the Table S3; 9 m is the 9 SNP markers listed in Table 2; 40r is a SNP set consisted of 40 SNPs randomly selected from the 13 235 SNP set; 9r is a SNP set of 9 SNPs randomly selected from the 13 235 SNP set; 40rr Table 3. List of 13 genes located at 12 associated SNP regions including 6 SNPs on the 6 genes and 3 SNPs with a distance less than 1 Kb with a gene, respectively, and 6 SNPs with a distance less than 50 Kb with 5 disease resistance gene analogue LRR domain is a set consisting of 40 SNPs randomly selected from the random 8839 SNP set; and 9r is a set of 9 SNPs randomly selected from the random 8839 SNP set (Table S8, Fig. 7).  (Table S8, Fig. 7), indicating that the six GP models (BA, BB, BL, BRR, gBLUP, and cBLUP) are good GP models to be utilized in GS for selecting white rust resistance in spinach. All nine models had low rȲ 100 values in the four random sets (40r, 9r, 40rr, and 9rr), suggesting that we can't use a small SNP number randomly selected from a million SNP set in GS for white rust resistance.

Genomic prediction using GWAS-derived SNP markers
A higher r-value (rȲ 100 ) were observed when using the GWAS-derived SNP marker sets: > = 0.73 in 40-SNP marker set (m40) and > =0.59 in 9-SNP marker set (9 m) across the six GP models (BA, BB, BL, BRR, gBLUP, and cBLUP) (Table S8, Fig. 7). An averaged 0.75 of rȲ 100 value in 40 m set and 0.61 in 9 m set were calculated across nine GP models, which were much higher than those from the four randomly selected SNP sets: either 40r, 9r, 40rr, or 9rr across all tested nine GP models (Table S8, Fig. 7), indicating that the GWAS-derived SNP marker sets had high PA and suggesting that we can use the GWAS-derived SNP markers in GS for selecting white rust resistance.

Evaluation of white rust
White rust is a non-culturable oomycete pathogen making it difficult to screen and select spinach genotypes for resistance. Currently, no efficient method has been developed to evaluate white rust resistance in greenhouse or growth chamber conditions as disease severity among known resistant and susceptible genotypes is difficult to discriminate in a single disease cycle [70]. Therefore, field evaluations, whereby spinach genotypes can be evaluated over multiple secondary infection cycles over a longer period of time, can be used to evaluate and select white rust-resistant lines in spinach [4]. For example, some spinach genotypes that are known to be highly resistant to white rust get infected, but the lesions develop somewhat slower and the rust pustules may not even break through the epidermis. As a result, fewer infections occur on a given plant and the difference between a highly susceptible and highly resistant line in a single infection cycle may not be noticeable, but the differences become more pronounced over a longer period of time under field conditions. However, even in a white rust disease nursery, as was used in this study, disease severity is still highly dependent on the environment from year to year, resulting in a low and unpredictable selection efficiency. Establishing a uniform spread of consistent white rust nurseries under field setting is difficult to accomplish, requiring multiple years of no-rotation spinach crop to build enough inoculum in the soil. In this study, the 346 USDA spinach germplasm accessions were evaluated for white rust resistance in the Del Monte White Rust Nursery in Crystal City, Texas, for three years during the three winter seasons of 2015-16, 2016-17, and 2017-18. The nursery was used for spinach white rust evaluation for commercial spinach hybrids, germplasm, and breeding lines where heavy disease pressure had consistently been observed for 30 years when environmental conditions were favorable for the disease. Because the disease severity was relatively low in the winter seasons 2015-2016 and 2016-2017 including the known susceptible control lines, the white rust disease severity ratings were not robust (i.e. false low ratings due to escape from disease). However, disease severity was high in the winter 2017-2018 season due to the favorable environment. Therefore, this 2017-2018 disease severity report was only used to conduct GWAS and identify SNP markers associated with white rust. Based on the white rust phenotypic data, 23 of 346 spinach accessions showed relatively high resistance to white rust. The accessions showing higher resistance with a disease severity scale of 2.0 or less (Table 1 & S1) can be used as parents in breeding programs to develop white rust-resistant lines and cultivars.

Genome-wide association study and SNP marker identification for white rust resistance
In this study, GWAS was performed in two steps. In the first step, 4836 SNP markers associated with white rust resistance were identified from 2 357 260 SNPs using TASSEL 5. In the second step, 13 235 SNPs (the 4836 SNPs from the first step plus the randomly selected 8399 SNPs) were used to conduct GWAS by implementing multiple models, including six models in GAPIT 3, three models in TASSEL 5, and t-test. Forty SNP markers were associated with white rust resistance with LOD >5.42 in one of the six tested MLM models (Gapit.MLM, MLMM, SUPER, FarmCPU, BLINK, or Tassel.MLM). After combining analysis of the six models in GAPIT 3 and three models in TASSEL 5, nine SNPs located on chrs 2, 3, 4, and 6 were relatively consistent across the models and were selected as the associated SNP markers for white rust resistance in this study (Table 2) [30] was validated in this study since their approach targeted factors associated with susceptibility. Therefore, it is possible to combine resistance and susceptible associated SNPs found in this study to improve spinach resistance. As expected, we observed differences in the number of identified associated SNPs when using TASSEL 5 vs GAPIT 3 GWAS tools or different models such as BLINK, FarmCPU, GAPIT.MLM and GAPIT.GLM. The same differences have been widely reported and discussed in several publications [30][31][32][33][34][35][36]53]. In this study, we selected SNPs as the markers associated with white rust resistance by multiple models combined, including six models in GAPIT 3 and three models in TASSEL 5 if the SNP had a significant LOD value across multiple models.

Candidate genes for white rust resistance
In this study, a total of 121 genes were identified to be located within 50 Kb distance from the 40 associated SNP markers (Table S3). Thirteen genes located at 12 associated SNP regions were selected as condidates for white rust resistance, among them, five were disease resistance gene analogue with LRR domain (Table 3). However, further studies are needed to confirm whether the five LRR genes are related to white rust resistance.
Despite the success of GWAS in identifying genetic loci associated with several agronomic and disease resistance-related traits, it will be challenging to pinpoint the causal gene in each of these loci. A successful GWAS only identifies probable genomic regions but requires subsequent characterization for validating the actual identification of causal relationship with disease using proteomics/ transcriptional profiling. Given the complexity and nature of the white rust, most genes identified in our study, whether a given gene is likely to be involved in determining a resistant phenotype alone would need cloning individual genes in the appropriate genetic background. Most verified plant disease resistance genes isolated to date contain a nucleotide-binding site and leucine-rich repeat (NBS-LRR) domains-similar to the one we identified in this study. Activated NB-LRRs represent a tip of the signaling cascade that triggers defense responses and not the causal genes defining the resistance alone. It will be impulsive to correlate expression patterns of the identified candidate genes and disease reaction. The percent variation (R-square%) explained by individual SNP (Table 3) shows the strength of these variants at the evolutionary and population levels in defining resistance. Hence, the genes identified in this study open new avenues to design white rust resistance through systematic integration of selected accessions in the breeding program of spinach and other closely related species. Detailed characterization of these genes, although intriguing, is beyond the scope of this paper. However, we will continue pursuing the relationship of these genes in resistance mechanisms as a future work through additional studies.

Genomic prediction
In this study, the nine-fold sets from 2 fold (1:1) to 10fold (9:1) had similar, although not identical, averaged r values (rȲ 100 ) in each SNP set using the same model, either BL or rrBLUP (Table S6, Fig. S6). When increasing the fold number, the SE was also increased, suggesting that a larger training set and the smaller testing set would increase the error. The 2-fold set has a smaller rvalue and showed that a smaller training set would have less PA with a smaller r-value. Shi et al. (2021) [53] also reported that different training/testing ratio sets showed similar trends in GP analysis for soybean cyst nematode resistance in common bean. Ravelombola et al. (2021) [49] reported similar results for growth habit, flowering time, and grain yield in the cowpea population evaluated under drought conditions. Keller et al. (2020) [71] reported that a training set of <30% reduces PA due to a small sized training set that results in overfitting of the model. They also noted that increasing training set >80% leads to large variation between cross-validations with a small validation set.
The six GP models (BA, BB, BL, BRR, gBLUP, and cBLUP) had similar high average r-value (rȲ 100 ): > = 0.82 in all.13235-SNP set, > = 0.73 in the 40-SNP marker set, and > =0.59 in the 9-SNP marker set. The three models, SVM, RF, and rrBLUP, still had high average r-value (rȲ 100 ) > =5.2 in the all SNP set, > = 0.63 in the 40-SNP marker, and > =0.47 in 9-SNP marker set, but rrBLUP had low value with 0.38 and 0.25, respectively (Table S8, Fig. 7), indicating that the six GP models are good GP models to be utilized in GS for selecting white rust resistance in spinach. Usually, the Bayesian models such as BA, BB, BL, and BRR had high PA with higher r-value [53]. The gBLUP and cBULP models in GAPIT tool had a lower r-value than other models for SCN resistance in common bean [53]. Since 2021, the gBLUP and cBLUP models running in GAPIT 3 tool in Zhiwu Zheng's lab has improved their prediction and created higher PA than other models as shown in this study and the two models had the highest average r-value (rȲ 100 ) with the smallest SE (Table S8, Fig. 7), indicating their efficiency in GS for white rust resistance in spinach.
GP was thirdly performed with eight different SNP number sets from 9 to 4836 SNPs for white rust resistance using BL model. The results showed that the average r-value (rȲ 100 ) decreased when decreasing the SNP number (Table S7 and Fig. S7). From the Set.13235SNP, the average r-value (rȲ 100 ) was 0.79 when 4836 SNPs were used and decreased to 0.25 when 9 SNPs were used. In the Set.4836SNP.select, the average r-value (rȲ 100 ) was 0.82 when 4836 SNPs were used and decreased to 0.31 when 9 SNPs were used. In the two SNP groups, the r-value was higher than 0.50 when 100 SNPs or more were used. However, the average r-value (rȲ 100 ) was very low in randomly selected set: Set.8839SNP.random, which was randomly selected from 217 531 SNPs (4.06%); ranged from 0.07 to 0.19 (Table S7, Fig. S7), indicating that random SNP group without associated markers included cannot be used for white rust resistance in GS, but using associated SNP marker will increase the selection efficiency. From this study, 100 or more SNPs with associated markers can be used as a set to select white rust resistance in GS. In most reports, the smaller the number of SNPs resulted in lowering the PA [40,53,71,72]. Zhang et al. (2016) [40] estimated PA (r-value) of seed size of 309 soybean accessions and reported r = 0.85 using 2000 SNPs or 31 045 SNPs; and lowered to 0.8 when 1000 SNPs or 500 SNPs were used.
In this study, GP was performed using GWAS-derived SNP markers, either 4836 selected SNPs, 40-SNP marker set, or 9-SNP marker set had higher PA (r-value) than the randomly selected SNP sets in all of the tested GP models (Table S6 and S8). The results suggest the advantage of using the GWAS-derived SNP markers in GS for white rust resistance. Zhang et al. (2016) [40] reported that the r values were 25% higher when using GWAS-derived SNP markers than using the same number of randomly selected SNPs for seed size in soybean. Qin et al. (2019) [72] also reported that the average r values were higher when using SNP markers for 15 amino acid contents in soybean seeds. Spindel et al. (2016) [73] developed a GS model that combines RR-BLUP with GWAS derivedmarkers and reported that this new model outperformed for a variety of traits in multiple environments. Thus, using GWAS-derived SNP markers to perform GS is an approach that combines MAS and GS and can be used in the real-world breeding programs. However, the predictive ability may be biased when GWAS-associated SNP markers are used to predict the GEBVs in the same GWAS panel. The GP will probably be lower if prediction performance is tested in other panels with different individuals. Similar approaches have been tested for many traits in several crops and found it practical to do genome breeding using GWAS-derived SNP markers [49-51, 53, 72]. Therefore, GP approach combining both MAS and GS through GEBVs using associated SNP markers would be valuable in molecular breeding for white rust resistance in spinach and for other quantitative traits in other plant species, and assessment of genomic prediction potential is ongoing on several important traits in spinach [74].

Conclusion
In this study, 346 USDA spinach germplasm accessions were phenotyped for white rust resistance under field conditions; 23 accessions showed white rust resistance or intermediate resistance with a disease rate 3.0 or less based on 0-10 scale; and the seven accessions, NSL 6098, PI 175311, PI 220686, PI 224959, PI 226671, PI 227045, and PI 648958 showed the highest white rust resistant levels with a score of 2.0 or less, indicating their suitability as parents in breeding programs to develop white rust-resistant hybrids and cultivars. Genomewide association study (GWAS) was performed in the 346 accessions with 13 235 SNPs and identified nine SNPs, chr2_53 049 132, chr3_58 479 501, chr3_95 114 909, chr4_9 176 069, chr4_17 807 168, chr4_83 938 338, chr4_ 87 601 768, chr6_1 877 096, and chr6_31 287 118, located on chromosomes 2, 3, 4, and 6 associated with white rust resistance. Genomic prediction (GP) was tested for prediction accuracy (PA) using Pearson's correlation coefficient (r) between the genomic estimation breeding value (GEBV) and the observed values. High averaged r values were observed in each SNP set using different GP models and up to 0.84 when using GWAS-derived significant SNP markers. The SNP markers and the high PA can provide valuable information for breeders to improve spinach by marker-assisted selection (MAS) and genomic selection (GS).