Skim resequencing finely maps the downy mildew resistance loci RPF2 and RPF3 in spinach cultivars whale and Lazio

Abstract Commercial production of spinach (Spinacia oleracea L.) is centered in California and Arizona in the US, where downy mildew caused by Peronospora effusa is the most destructive disease. Nineteen typical races of P. effusa have been reported to infect spinach, with 16 identified after 1990. The regular appearance of new pathogen races breaks the resistance gene introgressed in spinach. We attempted to map and delineate the RPF2 locus at a finer resolution, identify linked single nucleotide polymorphism (SNP) markers, and report candidate downy mildew resistance (R) genes. Progeny populations segregating for RPF2 locus derived from resistant differential cultivar Lazio were infected using race 5 of P. effusa and were used to study for genetic transmission and mapping analysis in this study. Association analysis performed with low coverage whole genome resequencing-generated SNP markers mapped the RPF2 locus between 0.47 to 1.46 Mb of chromosome 3 with peak SNP (Chr3_1, 221, 009) showing a LOD value of 61.6 in the GLM model in TASSEL, which was within 1.08 Kb from Spo12821, a gene that encodes CC-NBS-LRR plant disease resistance protein. In addition, a combined analysis of progeny panels of Lazio and Whale segregating for RPF2 and RPF3 loci delineated the resistance section in chromosome 3 between 1.18–1.23 and 1.75–1.76 Mb. This study provides valuable information on the RPF2 resistance region in the spinach cultivar Lazio compared to RPF3 loci in the cultivar Whale. The RPF2 and RPF3 specific SNP markers, plus the resistant genes reported here, could add value to breeding efforts to develop downy mildew resistant cultivars in the future.


Introduction
Spinach (Spinacia oleracea L.) is a leafy vegetable crop cultivated on 58, 000 acres in the US valued at more than 500 million dollars in farmgate value [1]. The US ranks second in spinach production after China. Spinach is primarily produced in California and Arizona in the US, favored by cool and dry environmental conditions. High nutrition content and other health-promoting compounds make spinach an excellent component of a healthy diet [2]. Demand for spinach is continually increasing in the US, particularly for organically produced spinach, which now accounts for nearly half of the total production. Spinach, a diploid crop comprised of six chromosome pairs, is primarily a dioecious crop, although some other sex forms naturally occur. Selfing is not uncommon in female spinach plants, especially in the absence of pollen, which generates inbreed seeds lacking the desired combination of complementing RPF (Resistance to Peronospora farinosa) loci in hybrid seeds. Commercial seeds of spinach are produced in geographical regions with long days, moderate-cool temperatures, and dry weather during critical stages of pollination and seed set [3]. Optimal conditions for these stages are found in Denmark, Washington, and Oregon, where long days and minimal rainfall prevail. Denmark holds the largest share of spinach seed production, accounting for over 70% of global production, while Washington and Oregon contribute approximately 20%.
Downy mildew infection makes plants unsalable. Obligate oomycete Peronospora effusa, formerly Peronospora farinosa, causes downy mildew disease [2,3] and is host-specific as it is only known to infect spinach. Nineteen distinct P. effusa races are currently known in spinach [4][5][6], with 16 of them documented after 1990. These new P. effusa races have been overcoming newly introgressed resistance genes in spinach cultivars. Thus, managing downy mildew disease is a major agenda of public and commercial breeders. Multiple RPF loci are hypothesized to provide resistance to different P. effusa races [7]. Most spinach hybrid cultivars contain complementing RPF loci from two inbred parents that offer resistance against many pathogen races. Despite the development of spinach cultivars with resistance to specific downy mildew races, no cultivar is known to be resistant to all races of the pathogen except for Sunangel, which is marketed as having resistance to races 1 through 19, although exhibiting intermediate resistance to race 10. Developing a spinach cultivar that can resist all available pathogen races would significantly benefit sustainable production until the appearance of new pathogen races. The search for new RPF genes that can resist new races of P. effusa remains a routine task of spinach breeders to minimize yield loss from downy mildew pathogens. A large number of germplasm comprising both cultivated and wild species are maintained in the National Plant Germplasm System of the United States Department of Agriculture (NPGS-USDA) and the Centre for Genetic Resources, the Netherlands (CGN), Wageningen University and Research (WUR) [3]. These accessions provide valuable genes for the improvement of spinach, including leaf quality, yield, and tolerance to abiotic and biotic stresses [3,8,9]. Strikingly, wild species of spinach have been the main source of disease-resistance genes, including resistance to downy mildew pathogens. Our lab has recently sequenced these cultivated and wild species maintained at the CGN-WUR, which will reveal new molecular tools for selecting resistance to multiple pathogens and other traits. Further, these new genomic resources may indicate the origin of current resistance (R) genes deployed in spinach cultivars that are kept confidential by commercial breeders.
Marker DM1 was established to locate 1.7 cM away from the RPF1 locus in chromosome 3 [10]. Additional loci RPF1, RPF2, and RPF3 were later mapped to a 1.5 Mb of spinach chromosome 3, and diagnostic markers to determine resistant-susceptible alleles were reported [11]. The RPF1 locus was further narrowly mapped to 0.89 Mb of Sp75 chromosome 3 between 0.34-1.23 Mb [12]. Genotyping by sequencing (GBS) based SNP markers mapped the RPF1 locus in a multi-parent progeny population to 0.84 Mb of Sp75 assembly [13]. Similarly, GBS markers mapped another loci resistance to P. effusa race 16 to a 0.57 Mb region of Sp75 assembly [14]. Downy mildew pathogen resistance from field evaluations in the diverse germplasm panel showed R genes between 0.3 to 1.5 Mb of chromosome 3 in Monoe-Virof lay genome assembly [15]. The RPF3 locus in Whale was located at 1.25 Mb and at 2.73-2.74 Mb in Monoe-Virof lay genome assembly, and three regions of Sp75 genome assembly located at 1.19, 1.22-1.23, and 1.75-1.76 Mb [16]. The RPF2 locus was recently reported to be between 1.11 to 1.72 Mb on chromosome 3 of Sp75 genome assembly [17].
Host genetic resistance continues to be the primary method for managing downy mildew disease in spinach, especially in organic production, despite the widespread use of chemical fungicides for managing various crop diseases. Identification of additional unique resistance materials effective against many races of downy mildew pathogens, plus deep molecular and genomic studies of genetic resistance mechanisms, may provide new tools to support precise R gene pyramiding. The downy mildew pathogen-resistant RPF genes have been breaking down with regularly emerging new races of P. effusa. This scenario has urged an in-depth characterization of the spinach-downy mildew pathogen interaction in disease development that may enable optimum use of host genetic resistance. Genome wide association studies (GWAS) are commonly used to identify chromosomal regions associated with trait expression in self-pollinated and cross-pollinated crop species. GWAS have been used to map the disease resistance regulating genomic regions and have identified single nucleotide polymorphism (SNP) markers that are associated with many traits in spinach [13,15,16,18,19]. In this study, we employed the low coverage whole genome resequencing method, also known as skim resequencing, to generate SNP genotype datasets. We selected this method as a continuation of our previous report [16] because of its capability to offer complete genome coverage, which is advantageous for genotyping compared to the GBS method commonly used in spinach genotyping studies [13,14,20], which is limited by incomplete genome coverage. Skim resequencing has been extensively applied in various crop species for discovering genome-wide genetic variants, particularly SNPs, and in trait mapping studies in diverse and bi-parental populations [21][22][23][24].
Spinach differential Lazio constitutes RPF2 and RPF4 loci that resist P. effusa races 1-10 and 15. Whale has an RPF3 locus that provides resistance to races 1, 3, 5, 8-9, 11-12,14, 16, and 19, while Virof lay is entirely susceptible to all known races of downy mildew pathogen [4][5][6]25]. Of the RPF2 and RPF4 loci in Lazio, only the RPF2 locus resist race 5 of P. effusa, while the RPF4 locus is susceptible; thus, the progenies of Lazio segregating for race 5 will uniquely map the RPF2 locus in Lazio. Cross-bred progeny of Lazio and Virof lay screened for resistance to race 5 of P. effusa were used to map the RPF2 locus. In addition, progenies of Virof lay x Lazio plus Virof lay x Whale populations were combined in order to map the overall resistance regulating region. This work identifies markers and genes associated with RPF2 and RPF3 loci and adds genomic resources to facilitate molecular-guided breeding in spinach.

Resistance response in spinach progeny panel
In all inoculation experiments, the parental lines Lazio, NIL2, and NIL3 showed resistant reactions to P. effusa race 5 with no sporulation on cotyledon and true leaves, while the Virof lay and NIL4 were susceptible. The F2 seedling progenies from the cross of Virof lay x Lazio segregated upon inoculation, with 234 scored as resistant and 94 as susceptible. The segregation of Virof lay x Lazio progeny for P. effusa resistance in this experiment fit a 3:1 Mendelian segregation ratio (χ 2 = 2.34, P = 0.13) which is expected for a trait regulated by a single dominant gene. For the Virof lay x Lazio panel, 192 seedlings (comprising 142 resistant and 48 susceptible seedlings and two parents) were sequenced to generate genotype datasets. In addition, 192 progeny panels segregating from Virof lay x Whale for race 5 of P. effusa reported earlier [16] were included to perform a meta-analysis.

Variant discovery
Illumina sequencing of 192 Virof lay x Lazio progeny population generated 173.79 Gb sequence data. There were 1158.57 million raw reads, averaging 6.03 million reads per sample, equaling 1.01x average genome coverage. High-quality nucleotide bases with a Q score of 30 (>Q30) comprising 120.48 Gb data containing 810.06 million sequence reads were aligned against Sp75 reference genome assembly by implementing Illumina Dynamic Read Analysis for GENomics (DRAGEN) pipeline (v3.8.4), and for SNP calling. Raw SNP datasets were filtered for read depth to retain a minimum depth of coverage of 3 (DP 3), minimum genotype quality value of 9 (GQ 9), minor allele frequency (MAF) value of 0.05, and SNP with missing rates >75% using BCFtools [26] that retained filtered dataset comprising 617 998 SNPs with missing values of 68.01%. Beagle imputation retained 602 928 SNPs distributed in six chromosomes of spinach. This SNP dataset was filtered again to eliminate monomorphic SNPs, keep only biallelic variants, and exclude those with a missing rate > 25%, heterozygosity rate > 30%, and MAF value <5%. Further removal of non-polymorphic genotype calls in parents (identical calls in both Virof lay and Whale) retained 15 021 SNPs in six spinach chromosomes that were used for GWAS analysis.
Apart from Virof lay x Lazio progeny panel, SNPs were also discovered for the 384 progeny by merging another progeny panel of Virof lay x Whale reported previously [16], and we retained 34 234 SNPs for downstream GWAS analysis.

Phylogenetic and population structure of progeny panel
Genetic diversity assessment revealed two main subpopulations in the Virof lay x Lazio progeny panel (Supplementary Figure 1). In contrast, the larger panel of the progeny of Virof lay x Lazio and Virof lay x Whale differentiated into four subpopulations, as observed in the PCA plot and NJ dendrogram produced by the GAPIT program (Supplementary Figure 2). Thus, two and four principal components (PC) were computed in both TASSEL and GAPIT programs and added as fixed effect factors during GWAS analysis to control for population structure.

Fine mapping the RPF resistance region
In the first set, association analysis was conducted among the 192 F2 seedling progenies of the Virof lay x Lazio population to map the RPF2 locus. Twenty-three SNPs were identified to associate with the RPF2 locus with a mean LOD value >18 across four GWAS models implemented in TASSEL and GENESIS programs (Table 1, Figure 1A), although more than 100 SNPs were observed with a LOD value >10. These 23 RPF2-associated SNPs were localized in chromosome 3 between 0.47 to 1.46 Mb. The strength of RPF2-associated SNPs was powerful as the peak-associated SNP  Table 1). The peak associated SNP Chr3_1 221 009 showed the highest R 2 in all tested GWAS models.
The second set of GWAS performed in a larger panel of 384 progeny lines of Virof lay x Lazio and Virof lay x Whale evaluated against P. effusa race 5 identified 28 SNPs that were associated with a mean LOD value above 32 in four TASSEL and GENESIS models (Table 1, Figure 1B (Table 1).
Combined GWAS analyses of 384 progenies of Virof lay x Lazio and Virof lay x Whale segregating for RPF2 and RPF3 loci were compared with individual panels to identify unique resistance regions for RPF2 in Lazio and RPF3 in Whale ( Figure 3). Our previous study mapped the RPF3 locus in cv. Whale within 1.19-1.23 and 1.75-1.76 Mb of Sp75 chromosome 3 [16]. The RPF2 locus segregating from Virof lay x Lazio progenies did not show a strong association at the 1.75-1.76 Mb of chromosome 3; however, they were associated between 0.47 through 1.46 Mb (Figure 2A). On the other hand, a combined analysis of lines comprising both RPF2 and RPF3 loci showed an association in the 1.18-1. 23 (Table 1). Overall, these result indicates that both RPF2 and RPF3 loci fall within 1.19-1.23 Mb of chromosome 3 ( Figure 3). However, the RPF2 locus was localized in a more extended region between 0.47-1.06 Mb and a common RPF-associated region of 1.19-1.23 Mb ( Figure 3).

Candidate genes associated with the RPF loci
Searching for resistance (R) genes in the proximity of the 23 RPF2associated SNPs identified in the Virof lay x Lazio progeny revealed three genes within 10 Kb of the associated SNPs, plus Spo12908, located at 14-16 Kb away from the peak SNPs (Table 2, Figure 3). Gene Spo12793 encoding Serine/threonine-protein kinase was located at a six Kb distance from the RPF2-associated SNPs (Chr3_879 118, Chr3_879 241) in Sp75 assembly. Gene Spo12916 that encodes Leucine-rich repeat (LRR) was 8 Kb from the SNP marker Chr3_1 162 051. Gene Spo12821, a CC-NBS-LRR gene annotated to encode disease resistance protein, was within 1-2 Kb of SNPs Chr3_1 221 009, Chr3_1 222 101, and Chr3_1 222 211. The RPF3-associated SNPs between 1.22-1.23 Mb of chromosome 3 were at a distance of 2.41-2.65 Kb of the NBS-LRR gene Spo12821 [16], such that genes Spo12793, Spo12908, and Spo12916 lying between 0.87-1.16 Mb of Sp75 chromosome 3 appear to be more uniquely associated with the RPF2 locus, as presented in Figure 3.

Discussion
In this study, we aimed to map the RPF2 locus in spinach, which is responsible for resistance against race 5 of downy mildew pathogen. Downy mildew is the most damaging disease of cultivated spinach in the US, Europe, and elsewhere. The downy mildew pathogen can cause severe damage to crops, rendering them unsalvageable. In particular, the problem is more severe when the crops are exposed to high humidity and overhead irrigation practices. This facilitates the pathogen's spread across the entire acreage, resulting in massive crop losses. Breeding downy mildew pathogen resistant spinach varieties for commercial production is a continuous activity at public and private breeding programs by combining complementing RPF alleles from the two parents in the hybrid form [3,7]. However, the effectiveness of R genes is limited for a pathogen that evolves rapidly; thus, the resistance effect is not durable. The regular emergence of new races and isolates of the downy mildew pathogen poses a challenge as it often overcomes the R genes deployed in spinach cultivars, making the new cultivars unable to provide long-term resistance against the pathogen. Thus, identifying the new RPF allele that is effective against recently emerged and predominant races of the downy mildew pathogen and developing molecular platforms to select resistant alleles are continuous activities in public and private spinach breeding programs. Continuous development of new markers to select each RPF locus is expected to enhance selection efficiency and expedite spinach breeding.
This study first mapped the RPF2 locus segregating from breeding progenies of Lazio and Virof lay by employing multiple GWAS models to identify resistance locus-associated SNP markers as    assisted selection, especially in combination with other RPF genes. Further, incorporating RPF loci along with the field resistance QTLs [18] may increase the effectiveness of resistance genes. Additional investigations on developing markers f lanking the RPF loci are underway to provide practical molecular tools for selection. The spinach genome assembly shows clusters of R genes within the RPF-associated regions that accommodate six NBS-LRR genes between 0.6 and 1.3 Mb of Sp75 chromosome 3 [27], the same region with three RPF loci (RPF1, RPF2, and RPF3)   Overlay of the RPF2 and RPF3 associated region in spinach chromosome 3 and the disease resistance candidate genes. The physical location is based on the Sp75 assembly [27]. The red-filled circles in the Manhattan plot are the RPF3-associated SNPs based on the LMM model in the GENESIS program from the Virof lay x Whale population [16]. The blue-filled circles in the Manhattan plot are the RPF2-associated SNPs based on the mean LOD values across all four tested GWAS models from the Virof lay x Lazio progeny population in this study. The green-filled circles in the Manhattan plot are RPF2 and RPF3-associated SNPs based on the mean LOD values across all four tested GWAS models in the combined progeny panel (Virof lay x Lazio and Virof lay x Whale) reported in this study. All known disease-resistance R genes in the overlaid region are presented as a black line. Resistant genes near the RPF2 locus are marked with a blue line, while a red line denotes the gene in the vicinity of the RPF3 locus. The R gene Spo12821 appears close to SNPs associated with RPF2 and RPF3 loci. SNPs near the Spo12793, Spo12908, and Spo12916 were only associated with the RPF2 locus.
Leucine-rich repeat units was 8.04 Kb from SNP Chr3_1 162 051. Spo12821, an NBS-LRR encoding gene identified in the RPF2associated region, was also reported as a potential candidate gene for RPF3 locus [16] and RPF2 locus [17], making this gene of interest to have a prominent role in providing resistance in spinach against P. effusa. The NBS-LRR is the most predominant class of R gene in plants that are known to act as a receptor of plant pathogen effector proteins and induces effector triggered immunity (ETI) by activating the downstream defense response to inhibit pathogen infection [28,29]. Tandemly repeating LRR domains often recognize pathogen effector molecules and trigger resistance response [28,30]. Recent reports have indicated structural variation in the length of the LRR regions of the Spo12821 gene between resistant and susceptible cultivars and the possible functional role of such variation in providing resistance response [17]. The disease resistance NBS-LRR genes are primarily found in clusters, as in the case of spinach chromosome 3. Multiple genes can complement in providing effective resistance, as in Arabidopsis against downy mildew pathogen Hyaloperonospora arabidopsidis (formerly P. parasitica) [31] and rice against rice blast pathogen Magnaporthe grisea [32]. The RPF1-RPF6 have been established in spinach, plus molecular markers to select for RPF1-RPF3 loci have been developed. However, the R genes have not been functionally validated. These genes reported in this study, most notably the Spo12821 consistent with both RPF2 and RPF3 loci, should be studied for structural and functional variation to elucidate the downy mildew disease resistance mechanism along with the involvement of other complementing genes in regulating resistance phenomena. New research aiming to characterize the functions of genes with gene editing and over-expression studies may explain the effective genes involved in resistance to downy mildew pathogens.
Low coverage genome sequencing (∼1x) is a cost-effective approach to sequence a sample at a low coverage depth, typically between 1x and 3x, which can significantly reduce sequencing costs while still providing complete genome coverage and the ability to detect a greater amount of genetic variation. However, it is important to note that this approach is known to produce higher levels of missing data compared to higher coverage sequencing methods [24,33]. To overcome this, high missing calls were imputed to determine the missing genotype calls or unobserved data in the samples using the haplotype profile. While genotype imputation can facilitate downstream applications by allele calling of poor quality and increasing marker density, lack of confidence in imputed genotype calls and high error rates remains a challenge. One limitation of low-coverage sequencing and imputation is that the heterozygote genotypes may be called as homozygous references. To address this, we filtered those imputed data by removing imputed calls with a genotype probability (GP) value below 90% (GP <0.90) to retain genotype calls imputed with high accuracy. However, we did not evaluate imputation accuracy and error level in this study. Further research Table 2. Genes and their functions within 20 Kb of SNP markers that were identified to associate with the RPF2 locus in spinach. SNPs highlighted in bold were significantly associated in both association panels analyzed here on assessing the impact of imputation errors at different levels of sequencing depth (1x, 2x, 3x, and more) is necessary to determine the most promising practices for analyzing and utilizing lowcoverage sequencing in heterozygous crops like spinach. However, we suggest employing coverage of 2-3x for effective allele calls. Spinach cultivars containing multiple resistant genes may provide more durable resistance. Continued identification of new resistance sources, mapping new RPF genes, discovering markers, and gaining insights into their regulatory mechanism could present new and improved choices for breeding resistant cultivars. Such efforts are necessary to address the regularly emerging pathogen races that break the resistance loci. Overall, this study reports new sets of SNP markers to distinguish and select RPF2 and the RPF3 loci that can potentially accelerate the breeding process. The novel QTL regions and SNP markers identified here and the previously identified QTLs may have the potential to pyramid resistant genes when developing resistant spinach germplasm and breeding lines [10][11][12][13][14][16][17][18]. In summary, the results of this study emphasize the significance of ongoing efforts in identifying and utilizing novel sources of resistance to develop spinach cultivars with improved resistance against downy mildew. The findings from this study provide new insights into the development of spinach cultivars with enhanced resistance to the downy mildew pathogen, ultimately benefitting both the growers and consumers of spinach.

Conclusion
In conclusion, resistance to downy mildew pathogens is a key target trait in spinach breeding programs, but the frequent breakdown of resistance genes has been a major setback for spinach breeders. In this study, we successfully mapped the RPF2 locus for resistance to downy mildew in the spinach cultivar Lazio within 0.47 to 1.46 Mb of chromosome 3, identifying 23 associated SNP markers with a mean LOD value >18 across four tested GWAS models. The peak associated SNP with the RPF2 locus was located 2.41-3.65 Kb to the CC-NBS-LRR gene Spo12821, and other SNPs associated with the RPF2 locus were found in proximity to genes Spo12793, Spo12908, and Spo12916, which are known to have disease resistance functions in plants. These RPF2associated genes, especially Spo12821, offer promising prospects for future research to explore their role in regulating resistance against downy mildew pathogens and developing new tools and options for breeding resistant cultivars.

Plant materials
The differential cultivars Lazio and Whale were crossed with Virof lay to generate F1 seeds. Lazio and Whale are resistant to race 5 of P. effusa, while Virof lay is susceptible to all races of P. effusa. The F1 male and female plants were allowed to intercross with F2 seeds harvested from female plants and used for inoculation and genetic analyses.
In the beginning, 10-20 F2 plants of Virof lay x Lazio plus parent lines were evaluated for disease reaction against race 5 of P. effusa (isolate UA201715) at the University of Arkansas. After initial screening and understanding of the segregation pattern from the small set of seedlings, the remaining F2 progenies of Virof lay x Lazio (n = 328) were inoculated. In all inoculation trials, parents, six near isogenic line (NIL) differentials (NIL1 through NIL6), and Virof lay were inoculated as controls along with the segregating progeny population. Seeds were sown in plastic trays (25 x 50 cm) filled with a potting mix (Sun Gro Horticulture, Canada). Around 10-15 seeds were sown in a greenhouse plant tray in ten rows. Seedlings were later thinned to 6-8 seedlings per row, and each seedling was labeled. Plants in trays were grown in the greenhouse (25 • C) for two weeks.
In addition, 192 F2 seedlings of Virof lay x Whale segregating for resistance to P. effusa race 5 in our previous study [16] were merged with the Virof lay x Lazio population and reanalyzed here to map the RPF resistance region.

Pathogen inoculation and disease screening
One leaf from every F2 seedling was excised before inoculation and stored for DNA extraction. All two-week-old seedlings in plant trays were spray inoculated using the previously reported wholeplant inoculation method [4,5,25]. Fresh inoculums prepared by washing off conidia from sporulated leaves of Virof lay plants were spray-inoculated on each plant, which was then incubated in a dew chamber (set at 18 • C in the dark) for 24 h, a growth chamber (set at 18 • C with 12 h dark-light cycle) for five days, and again in the dew chamber (set at 18 • C) for 24 h. The disease reactions were observed for sporulation on cotyledons and true leaves and rated using a score of 0 to 4. A disease score of 0 = no sporulation; 1, 2, 3, 4 means up to 25%, 50%, 75%, and 100% leaf area with sporulation, as presented in the previous studies [16]. Each seedling was noted as "resistant" for a score of 0 or "susceptible" for a score of 1, 2, and 3 [16]. Disease responses were confirmed by re-inoculating and incubating for another week and re-scoring disease reaction after a week.

Sequencing and variant calling
Sequencing and SNP variant calling were performed as previously described [16]. Genomic DNA was extracted using an automated KingFisher Flex extraction system (Thermo Fisher Scientific, Waltham, MA, USA), quantified using Qubit Fluorometer, sample integrity was determined based on 1% agarose gel electrophoresis, and sequenced at the genomics facility at Texas A&M. This study used the whole genome resequencing (WGR) method at a low coverage depth to obtain approximately 1 Gb of sequence reads per individual, expecting to provide 1x genome coverage sequences. Sequence reads were mapped to Sp75 reference assembly [27] and genotype calling by implementing Illumina Dynamic Read Analysis for GENomics (DRAGEN) pipeline v 3.8.4. Initially, SNPs were processed to filter low-quality calls by removing variants with a minimum coverage depth of 3 (DP 3), minimum genotype quality value less than 9 (GQ <9), minor allele frequency (MAF <0.05), and genotype calls with missing rate > 75% using BCFtools [26]. These filtered SNPs were then imputed by implementing the Beagle 4.1 [34] tool. Imputed datasets with genotype probability (GP) calls >0.9 were retained.
Next, the SNP dataset from six spinach chromosomes was extracted using BCFtools [26]. The genotype dataset was filtered to remove monomorphic loci, keep only biallelic loci, and remove indels and SNPs around ten bp of indels. The SNP dataset was further filtered to remove SNPs with missing calls >25% calls using BCFtools [26], heterozygosity rate > 30%, and allele frequency < 5%. Finally, SNP showing no polymorphism among twoparent lines, Lazio and Virof lay, were removed.

Population structure and clustering
We swiftly assessed population structure and principal component analysis (PCA) using all SNPs in GAPIT3 [35,36]. The PCA and unweighted neighbor-joining (NJ) tree were drawn in GAPIT for two sub-population for the Virof lay x Lazio progeny panel. Similarly, PCA and NJ trees for multi-parent progeny panels (comprising Virof lay x Laxio and Virof lay x Whale progeny) were drawn for four sub-population in GAPIT3. In addition, PCA analysis was internally performed in TASSEL [37] and GENESIS [38] programs to use as covariates in the GWAS analysis.

Mapping the resistance region
GWAS analysis was initially performed in TASSEL 5.2.85 by implementing single marker regression (SMR), general linear model (GLM), and mixed linear model (MLM) [37]. The GLM and MLM models in TASSEL were run by including two PCA matrices and in-built kinship matrices. GWAS analysis was further performed by implementing logistic mixed model (LMM) that incorporates inbuilt PCAs and kinship matrices in the GENESIS R package [38]. The TASSEL models are more suitable to perform GWAS analysis of quantitative phenotype, while the LMM model in GENESIS is designed to fit the qualitative phenotype scores. Downy mildew disease score of 0 and 1 for a resistant and susceptible reaction was used in GWAS analysis in the GENESIS program, while disease response was converted to 1 for resistant and 9 for susceptible in TASSEL.
A meta-GWAS analysis was planned for a multi-parent population segregating for P. effusa race 5 by merging the Virof lay x Lazio population (described here) with the previous report of Virof lay x Whale population [16]. This meta-analysis aimed to simultaneously identify RPF2 and RPF3 associated regions following combined analysis and potentially identify unique R gene regions compared with the individual GWAS panels of Virof lay x Lazio (reported here) and the previous study comprising Virof lay x Whale progeny population [16]. For 384 progeny lines segregating from two populations, Virof lay x Lazio and Virof lay x Whale, SNPs were discovered following the above-outlined steps and were used for the GWAS analysis described above, including the first four principal components.
Manhattan plots and QQ plots were drawn in the R program using the CMplot package. A GWAS significance threshold oflog 10 (P-value) or the LOD score > 18 was employed to control false positives in Virof lay x Lazio progeny population. For the combined population of Virof lay x Lazio and Virolf lay x Whale, a higher LOD score > 32 was set to consider the significance of the association.

Candidate gene identification
For all RPF-associated SNPs identified across tested GWAS models, genes were searched around 20 Kb of associated SNPs in the Sp75 reference sequences [27]. Genes annotated to provide resistance in crops within 20 Kb of RPF-associated SNPs were considered candidate genes with likely involvement in providing resistance to the downy mildew pathogen. Predicted functions of such potential candidate genes in proximity (20 Kb) of RPF-associated SNPs were described.