Chromosome-level phased genome assembly of “Antonovka” identified candidate apple scab-resistance genes highly homologous to HcrVf2 and HcrVf1 on linkage group 1

Abstract
Apple scab, a fungal disease caused by Venturia inaequalis, leads to losses in both yield and fruit quality of apples (Malus domestica Borkh.). Most commercial apple cultivars, including those containing the well-characterized Rvi6 scab-resistance locus on linkage group (LG) 1, are susceptible to scab. HcrVf2 and HcrVf1 are considered the main paralogs of the Rvi6 locus. The major apple scab-resistance loci Vhc1 in “Honeycrisp” and Rvi17 in “Antonovka” were identified in close proximity to HcrVf2. In this study, we used long-read sequencing and in silico gene sequence characterization to identify candidate resistance genes homologous to HcrVf2 and HcrVf1 in Honeycrisp and Antonovka. The previously published chromosome-scale phased assembly of Honeycrisp and a newly assembled phased genome of Antonovka 172670-B were used to identify HcrVf2 and HcrVf1 homologs spanning the Vhc1 and Rvi17 loci. In combination with 8 available Malus assemblies, 43 and 46 DNA sequences highly homologous to HcrVf2 and HcrVf1, respectively, were identified on LG 1 and 6, with identity and coverage ranging between 87–95 and 81–95%, respectively. Among these homologs, 2 candidate genes in Antonovka and Honeycrisp haplome A are located in close physical proximity to the scab-resistance marker Ch-Vf1 on LG 1. They showed the highest identity and coverage (95%) with HcrVf2 and only minor changes in the protein motifs. They were identical by state to each other, but not to HcrVf2. This study offers novel genomic resources and insights into the Vhc1 and Rvi17 loci on LG 1 and identifies candidate genes for further resistance characterization.


Introduction
Apples (Malus domestica Borkh.) are the most-produced fruit crop grown in temperate climates globally. Over the last 3 decades, the annual global production of apples has more than doubled, from 41 to 93 million tons (FAOSTAT 2021). Extensive pesticide application, improved horticultural practices, and breeding are among the major drivers of increased apple production (Khan and Korban 2022). Although pesticides are highly efficient in ensuring the stable production of disease-free apples, their overuse over the last few decades has had adverse effects on biodiversity and human health (Zaller and Brühl 2019; Hernández et al. 2020). In the last few decades, apple breeding programs have achieved major improvements in consumer-preferred fruit quality, horticultural traits, and storability; however, advances in disease resistance have been limited (Khan and Korban 2022).
Apple scab caused by Venturia inaequalis is the most economically significant fungal disease of apples and is highly challenging to control. The life cycle of V. inaequalis is complex, as it is divided into sexual ascospore production in spring followed by asexual conidia production in spring and summer, both resulting in obvious black, gray, or brown infection lesions on susceptible leaves and fruits (MacHardy 1996). During the growing season, 15-20 fungicide spray applications and/or the implementation of scab-resistant cultivars with a single qualitative resistance gene, such as Rvi6 from Malus floribunda 821, are required for successful scab control (MacHardy 1996; Turechek 2004; Bus et al. 2011). Frequent fungicide application and deployment of monogenic resistance to control scab have resulted in the selection of pesticide-tolerant and Rvi-overcoming fungal strains in all major apple production areas (Parisi et al. 1993; Shirane et al. 1996; Köller et al. 1997, 2002; Köller and Wilcox 2000; Turechek and Köller 2004; Bus et al. 2011; Beckerman et al. 2013; Papp et al. 2019; Patocchi et al. 2020). To ensure sustainable scab control, the identification and characterization of additional scab-resistance genes have been proposed as necessary for developing cultivars with more durable scab resistance (Bus et al. 2011; Khajuria et al. 2018).
To date, 19 major-effect apple scab-resistance loci have been identified and termed Rvi genes (Bus et al. 2011; Patocchi et al. 2020). These loci provide resistance against the majority of the known V. inaequalis strains. A few of these genes, including Rvi5,14,, still result in scab resistance in the host in the majority of apple-producing regions (Patocchi et al. 2020). Furthermore, candidate resistance genes of Rvi1, Rvi6, Rvi12, and Rvi15 have been identified, among which Rvi6 and Rvi15 have been functionally characterized (Belfanti et al. 2004; Malnoy et al. 2008; Joshi et al. 2011; Schouten et al. 2014; Cova et al. 2015; Padmarasu et al. 2018). These studies have shown that the genetic structure of the characterized loci is complex and harbors multiple gene paralogs and/or genes without known resistance functions (Vinatzer et al. 2001; Galli et al. 2010; Bus et al. 2011; Bastiaanse et al. 2016).
In contrast to Rvi6, the underlying alleles and resistance mechanisms of Rvi17 and Vhc1 remain unresolved. Rvi17 is a chlorosis-conditioning locus that is considered instrumental in the complex genetic resistance of "Antonovka Obyknovennaja," known as the "Common Antonovka" or "Schmidt's Antonovka," and the related Antonovka accessions (Bus et al. 2012; Pikunova et al. 2014). It has been suggested to confer resistance against race 6 (Parisi and Lespinasse 1996). The functional allele of Rvi17 has been associated with the 139-bp amplicon size of Ch-Vf1 (Bus et al. 2012; Pikunova et al. 2014), although this marker occasionally results in inconsistent diagnosis of the Rvi17 scab resistance in Antonovka accessions. For example, the apple genotype TN10-8 and the cultivar "Freedom" did not show resistance against V. inaequalis race 6 but yielded the 139-bp amplicon of Ch-Vf1 (Calenge et al. 2004; Bus et al. 2012). Similar to Rvi6, the defense responses conferred by the Vhc1 apple scab-resistance locus of Honeycrisp to a broad range of US V. inaequalis strains vary from no symptoms to sporulation. The 139-bp amplicon size of the Ch-Vf1 marker (the same as in Antonovka) is considered to be diagnostic for the functional allele of the Vhc1 apple scab-resistance locus (Clark et al. 2014).
Genome sequence data can complement genetic mapping for the identification of candidate resistance genes underlying resistance loci. Currently, long-read single-molecule sequencing techniques, such as PacBio long-read sequencing platforms, offer high-quality and cost-effective de novo genome assembly possibilities (Khan and Korban 2022). The long length and high accuracy of sequencing reads have enabled the improved phasing of highly heterozygous genomes (Khan et al. 2022). This approach has resulted in the generation of highly contiguous phased genomes of various rosaceous species, including apples (Aranzana et al. 2019; Sun et al. 2020; Khan et al. 2022; Khan and Korban 2022). Phased genome assemblies have enabled the identification and fine-mapping of candidate alleles for disease resistance and/or several other traits in apples and grapevines (Sun et al. 2020; Frommer et al. 2023). In particular, the recently published fully phased Honeycrisp assembly (Khan et al. 2022) could aid in the identification of apple scab-resistance alleles on LG 1 and the development of molecular markers, compared with the existing candidate LG 1 resistance gene identification approaches conducted across different apple accessions (Boudichevskaia et al. 2009; Broggini et al. 2009). Amplification of HcrVf-associated markers in bacterial artificial chromosome genomic libraries derived from "Florina" resulted in the identification of 22 homologous sequences on LG 1 and 6, showing 73-94% nucleotide identity with HcrVf2 (Broggini et al. 2009). In addition, 685-748 bp PCR products with 85-100% amino acid sequence identity with HcrVf1 and HcrVf2 were identified in cultivars harboring Rvi6, other Rvi genes, and no Rvi genes (Boudichevskaia et al. 2009). The role of these homologs in scab resistance has not yet been demonstrated.
Here, we developed a phased chromosome-level genome assembly of Antonovka 172670-B (referred to as Antonovka hereafter), and identified homologs of the HcrVf2 and HcrVf1 apple scab-resistance genes on the distal end of LG 1. We compared HcrVf homologs identified in Honeycrisp (Khan et al. 2022) and Antonovka genome assemblies with other available apple genomes to identify resistance gene candidates of the Vhc1 locus in Honeycrisp and Rvi17 locus in Antonovka.

Antonovka PacBio HiFi sequencing
Dormant branches of Antonovka 172670-B tree (PI 589956), an accession related to Antonovka Obyknovennaja, were collected from a research orchard at Cornell AgriTech, Geneva, NY, USA. These branches were placed in water until the leaves started to emerge and were subsequently kept in the dark for 2 days. After 2 days, the leaves were sampled and immediately frozen at −80°C. Frozen leaves were shipped on dry ice to the Genotyping Center of the University of Delaware, USA for high-molecular-weight genomic DNA extraction and Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing as previously described (Khan et al. 2022). For PacBio sequencing, the DNA was fragmented into 15-kb fragments, followed by construction of the HiFi library using the SMRTbell Express Template Prep Kit 2.0 and the DNA/Polymerase Binding Kit 2.0 (Pacific Biosciences), according to the manufacturer's protocol. The library was filtered for fragments >10 kb using Sage Blue Pippin (Sage Sciences) to remove smaller fragments and adapter dimers. PacBio Sequel IIe was used in the CCS/HiFi mode with a single SMRT cell with 2 h of pre-extension and 30-h movie times to sequence the library. Finally, the read length distribution and quality of the obtained reads were assessed using Pauvre v0.1923 (Schultz et al. 2018).

Repeat annotation and gene prediction
Apple repeat sequence libraries previously constructed from the genomes of "Gala," Malus sieversii, and Malus sylvestris (Sun et al. 2020) were used to mask the Antonovka genome assembly. Redundant repeat sequences in these libraries were removed using the "cleanup_nested.pl" script in the EDTA package (v2.1.0) with default parameters (Ou et al. 2019). The resulting nonredundant repeat library was used to mask each of the 2 Antonovka haplomes using RepeatMasker (v4.0.8; http://www.repeatmasker.org/). Protein-coding genes were predicted from the repeat-masked Antonovka genome assembly with MAKER (Cantarel et al. 2008), which integrates evidence from ab initio gene prediction, transcripts, and proteins. AUGUSTUS (Stanke et al. 2006) and SNAP (Korf 2004) were used for ab initio gene predictions. Genome-guided transcript assembly of RNA-seq data reported by Sun et al. (2020) and CDS sequences from published apple genomes (Daccord et al. 2017; Zhang et al. 2019; Sun et al. 2020) were used as transcript evidence. Proteome sequences of published apple, peach, strawberry, and Arabidopsis genomes, as well as those from the UniProt database (Swiss-Prot plant division), were used as the protein homology evidence. To functionally annotate the predicted genes, their protein sequences were searched against the SwissProt and TrEMBL databases (https://www.uniprot.org/) using BLASTP with an e-value cutoff of 1E-5, and the InterPro database (https://www.ebi.ac.uk/interpro/) using InterProScan (Paysan-Lafosse et al. 2023).

Genome sequence retrieval and identification of HcrVf homologs
Publicly available apple genome assemblies were downloaded from the Genome Database for Rosaceae (Jung et al. 2019) to identify HcrVf1 and HcrVf2 homologs in apple cultivars and wild Malus species. To estimate similarity levels among different HcrVf homologs from the Rvi6 locus of M. floribunda 821, HcrVf2 (GenBank identifier AJ297740) was compared against HcrVf1, -3, and -4 (AJ297739, AJ297741, and EU794466, respectively; Supplementary Table 1). The assemblies of M. domestica cultivars Antonovka, Honeycrisp, HFTH1 (anther-derived homozygous line of "Hanfu" apple), "Golden Delicious" (GDDH13), "Gala," and wild Malus species Malus baccata, Malus prunifolia Fupingqiuzi, M. sieversii, and M. sylvestris (Chen et al. 2019; Zhang et al. 2019; Sun et al. 2020; Khan et al. 2022; Li et al. 2022) were used to generate a BLAST database using the nucl function of the makeblastdb algorithm (Hahsler and Nagar 2019). Highly homologous sequences were identified by querying the coding sequences of HcrVf1 and HcrVf2 (AJ297739 and AJ297740, respectively) against the generated genomic database using the parameters of a maximal e-value of 1e−6, identity above 70%, and query coverage over 80%. The resulting BLAST output was transformed into FASTA files for comparative analyses.
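The three thresholds above (e-value, identity, and query coverage) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run through an R interface to BLAST. The tab-separated column layout (the 12 standard tabular fields plus the query length), the function names, and the toy rows are all assumptions made for illustration.

```python
# Assumed column layout: the 12 standard BLAST tabular fields plus qlen.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qlen"]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}
TEXT_FIELDS = {"qseqid", "sseqid"}


def parse_hits(text):
    """Turn tab-separated BLAST rows into dictionaries with typed values."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in FIELDS:
            if key in FLOAT_FIELDS:
                row[key] = float(row[key])
            elif key not in TEXT_FIELDS:
                row[key] = int(row[key])
        hits.append(row)
    return hits


def query_coverage(hit):
    """Percentage of the query covered by this single alignment."""
    return 100.0 * (abs(hit["qend"] - hit["qstart"]) + 1) / hit["qlen"]


def filter_hits(hits, max_evalue=1e-6, min_identity=70.0, min_coverage=80.0):
    """Keep hits passing the e-value, identity, and coverage thresholds."""
    return [h for h in hits
            if h["evalue"] <= max_evalue
            and h["pident"] > min_identity
            and query_coverage(h) > min_coverage]


# Invented toy rows (not real results), one per outcome.
TOY = "\n".join([
    "query1\tcontigA\t95.0\t2900\t140\t5\t1\t2900\t500\t3399\t0.0\t4500\t2943",
    "query1\tcontigB\t68.5\t2900\t900\t9\t1\t2900\t10\t2909\t1e-50\t900\t2943",
    "query1\tcontigC\t91.0\t1500\t130\t4\t1\t1500\t10\t1509\t1e-100\t2000\t2943",
    "query1\tcontigD\t85.0\t2600\t380\t8\t1\t2600\t10\t2609\t1e-3\t60\t2943",
])

kept = filter_hits(parse_hits(TOY))
print([h["sseqid"] for h in kept])
```

In this toy input only the first row passes: the second fails the identity threshold, the third covers about half of the query, and the fourth has too large an e-value.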

Sequence alignment and phylogenetic analysis
Homologous sequences were aligned using ClustalW of the msa v1.30.1 multiple sequence aligner (Bodenhofer et al. 2015) in the R statistical environment to determine the sequence similarity among the identified homologs of HcrVf1 and HcrVf2. The effects of differences and similarities among the aligned sequences were assessed and visualized by principal component analysis (PCA) using the glPca v2.1.8 function (Jombart and Ahmed 2011), which assigns loading scores to the variance of individual bases at different alignment positions. The degree of differences at individual positions was displayed by visualizing matches, mismatches, and gaps, and by estimating entropy among the homologs, using the msavisr function of the seqvisr package v0.2.6 (Raghavan 2021) and seqdef of the TraMineR package v2.2-5 in combination with ggseqplot v0.8.1, respectively (Gabadinho et al. 2011). Furthermore, neighbor-joining phylogenetic analysis was performed using MEGA11 (Tamura et al. 2021). First, the best-fitting statistical model was identified using the default settings of the Model Selection tool. The selected model was subsequently applied to construct a tree with 100 bootstrap replicates. The constructed phylogenetic trees were displayed using the Interactive Tree Of Life online tool (Letunic and Bork 2021).
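The per-position entropy mentioned above can be sketched as Shannon entropy computed column by column over an alignment. The Python example below is a generic illustration rather than the TraMineR implementation used in the study. Treating the gap character as an ordinary symbol is an assumption, and the four short sequences are invented.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return max(0.0, h)  # avoids returning -0.0 for fully conserved columns


def alignment_entropy(sequences):
    """Entropy at every position of an alignment of equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Invented toy alignment; '-' marks a gap and is counted as a symbol here.
toy_alignment = ["ACGT-", "ACGTA", "ATGTA", "AGGCA"]
for position, h in enumerate(alignment_entropy(toy_alignment), start=1):
    print(position, round(h, 3))
```

Conserved columns score 0 bits, and the score rises as the bases at a position become more evenly mixed, which is why peaks in such a profile point to the most variable stretch of the gene.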

Homolog mapping and haplotype analysis
Genome sequences of the LGs harboring homologs with the highest similarity to HcrVf2 were aligned to visualize syntenic regions, single nucleotide polymorphism (SNP) density, and gene density in the region of interest. Haplotypes of Antonovka (A), Honeycrisp (A and B), Gala (A), and M. sieversii (A) were aligned using minimap2 v2.24 (Li 2021). Pairwise bed alignment files showing structural variation and SNP positions were generated by SyRI v1.6.3 (Goel et al. 2019), and visualized by plotsr v1.1.1 (Goel and Schneeberger 2022). Antonovka gene positions were retrieved from the sequence annotation files.
The physical positions of the homologs were retrieved from BLAST results and visualized using the circlize v0.4.15 package (Gu et al. 2014) to determine the physical genomic positions of the identified homologs of HcrVf2 and HcrVf1 relative to the physical genomic positions of the microsatellite markers MdExp7, Ch-Vf1, and NzMS on LG 1 of apples. The physical genomic positions of these 3 markers were obtained by blasting their primer sequences (Clark 2014) against the respective apple genome assemblies.
To further verify the identity of the homologs and their identity by state or descent to the HcrVf genes, haplotype analysis was carried out using the publicly available 20K apple SNP array data (Bianco et al. 2014; Howard et al. 2021). Briefly, the accessions that have been genotyped were used to determine the numbers and lengths of shared haplotypes for evaluating their pedigree relationships. The 4 relationship groups consisted of accessions with available SNP data and a known relationship with an LG 1 apple scab-resistance genotype (Supplementary Table 2 and File 1): (1) accessions harboring the Rvi6 gene from M. floribunda 821 ("Liberty," "Remo," "Goldrush," "Nova Easygro," and "Topaz"), (2) Antonovka (Antonovka OB) harboring the Rvi17 gene, (3) Honeycrisp harboring the Vhc1 gene, and (4) scab-susceptible accessions ("McIntosh," "Fuji," "Idared," Golden Delicious, and Gala; Howard et al. 2021). The SNP data set was used to parse the output from SPLoSH analysis (Howard et al. 2021) using the threshold of 4 cM and LG 1 data on the individuals that were visualized.
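The idea of a shared haplotype segment with a minimum length of 4 cM can be sketched as a search for uninterrupted runs of identical SNP alleles between two phased haplotypes. The Python example below is a simplified illustration and does not reproduce the SPLoSH method. The function name, the handling of mismatches, and the toy data are assumptions, and missing genotype calls are not considered.

```python
def shared_segments(positions_cm, hap_a, hap_b, min_length_cm=4.0):
    """Return (start_cM, end_cM) runs where two haplotypes carry identical alleles.

    positions_cm: genetic positions of the SNPs, in ascending order.
    hap_a, hap_b: allele sequences of the two haplotypes, one allele per SNP.
    """
    if not (len(positions_cm) == len(hap_a) == len(hap_b)):
        raise ValueError("positions and haplotypes must have the same length")

    segments = []
    run_start = run_end = None
    for pos, a, b in zip(positions_cm, hap_a, hap_b):
        if a == b:
            if run_start is None:
                run_start = pos  # a new matching run begins here
            run_end = pos
        else:
            # A mismatch closes the current run; keep it if long enough.
            if run_start is not None and run_end - run_start >= min_length_cm:
                segments.append((run_start, run_end))
            run_start = None
    if run_start is not None and run_end - run_start >= min_length_cm:
        segments.append((run_start, run_end))
    return segments


# Invented toy data: 8 SNPs with a single mismatch at 5.0 cM.
positions = [0.0, 1.0, 2.5, 4.5, 5.0, 6.0, 7.0, 12.0]
print(shared_segments(positions, "AAGTCCAG", "AAGTGCAG"))
```

Raising `min_length_cm` discards short runs, which are more likely to match by chance than through a shared ancestral segment.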

Protein domain analysis
Six selected amino acid sequences with the highest homology to HcrVf2 and the corresponding HcrVf1 homologs were searched against the InterPro database using InterProScan (Mulder and Apweiler 2007) with default parameters for protein domain identification. The identified domains of each homolog were visualized and aligned based on their position from the start of the sequence using the online InterProScan module (Blum et al. 2021).

Haplotype-resolved chromosome-scale assembly of Antonovka
In total, 932,852 PacBio HiFi reads with an average length of 16,150 bp were generated, and ∼90% of reads had lengths >11,000 bp, which resulted in a total of 15.1 Gb of reads, corresponding to ∼26× coverage of the Antonovka genome. The estimated genome size and heterozygosity level, based on 21-mers, were 530,399,965 bp and 1.38%, respectively. Two phased haplomes, haplome A (HAP1) and haplome B (HAP2), of Antonovka were de novo assembled into contigs, followed by chromosome assembly using the GDDH13 genome v1.1 as the reference (Daccord et al. 2017). Both haplomes were highly contiguous and similar in size (Table 1). HAP1 was 651 Mb in length, and contained 222 contigs with a contig N50 of 35.4 Mb, whereas HAP2 was 636 Mb in length, and contained 114 contigs with a contig N50 of 36.4 Mb (Table 1). The final HAP1 and HAP2 contained 17 chromosomes, with 97.3 and 97.8% of the assembled contig sequences in the pseudomolecules, respectively. High genome completeness for both haplomes was suggested based on Merqury k-mer and BUSCO analyses (Supplementary Fig. 1). HAP1 and HAP2 showed k-mer completeness of 78.2 and 77.7% with QVs of 59.2 and 59.3, respectively, yielding a total completeness of 96.5% (Table 1). The BUSCO completeness of HAP1 was 97.7% and that of HAP2 was 97.5% with high mutual structural similarity (Fig. 1). A total of 401.2 Mb (61.7%) and 390.8 Mb (61.4%) repetitive sequences were identified in HAP1 and HAP2, respectively, of the Antonovka genome (Supplementary Table 3). Furthermore, a total of 45,200 and 44,969 protein-coding genes were predicted for HAP1 and HAP2, respectively, with BUSCO completeness rates of 97.4 and 97.1%, respectively.
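For readers unfamiliar with the contig N50 statistic reported above, the following Python sketch shows how it is commonly defined: the length of the contig at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy contig lengths are illustrative assumptions and are unrelated to the Antonovka assembly.

```python
def n50(lengths):
    """Contig N50: length at which the sorted cumulative sum reaches 50%."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("at least one contig length is required")


# Invented toy assembly of four contigs (arbitrary units).
toy_contigs = [10, 40, 20, 30]
print(n50(toy_contigs))
```

Because the statistic is weighted by length, a few very long contigs dominate it, which is why an assembly with more than a hundred contigs can still have an N50 close to the size of a whole chromosome.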

Homologs of HcrVf2 and HcrVf1 apple scab-resistance genes at Vhc1 and Rvi17 loci of Honeycrisp and Antonovka
This Antonovka assembly and previously published Malus genome assemblies enabled us to identify HcrVf2 and HcrVf1 candidate homologs in scab-resistant cultivars Antonovka, Honeycrisp, and other Malus genotypes without known scab resistance. Comparison of the DNA sequence of HcrVf2 from the Rvi6 locus of M. floribunda 821 (GenBank accession AJ297740) with the HcrVf1, -3, and -4 homologs (GenBank accessions AJ297739, AJ297741, and EU794466, respectively; Supplementary Table 1) revealed similar levels of identity, coverage, and e-values to HcrVf2 for all 3 homologs (92-95%, 54-55%, and <1e−6, respectively). A total of 43 and 45 sequences homologous to the 2,943- and 3,048-bp coding regions of HcrVf2 and HcrVf1, respectively, were identified in Antonovka, Honeycrisp, and other available apple genome assemblies (Table 2 and Supplementary Table 4). Identity and coverage ranged between 90 and 95% for HcrVf2 (Table 2), and between 87-92% (identity) and 81-94% (coverage) for HcrVf1 (Supplementary Table 4). The majority of these homologs, i.e. 24 and 38 sequences, respectively, were located on LG 1, while the remaining homologs were located on LG 6, 11, or 17, or were assigned to a scaffold. Homologous sequences with the highest identity and coverage of HcrVf2, the causative resistance gene of the Rvi6 locus, were found on LG 1 of haplome A of the scab-resistant cultivars Honeycrisp and Antonovka. Both showed 2,801 (95%) identities and 6 gaps when aligned with HcrVf2. The overall BLAST score of 4,669 of these 2 homologs was followed by the scores of sequences identified in Honeycrisp haplome B, the scab-susceptible cultivar Gala, and M. sieversii, on LG 1B, 1A, and 11A, respectively. In contrast to the sequences with the highest identity and coverage to HcrVf2, HcrVf1-homologous sequences with the highest scores were identified in wild Malus species, i.e. M. baccata, M. sieversii, and M. prunifolia, followed by the sequences from Honeycrisp and Antonovka (Supplementary Table 4). The identity and coverage of the former 3 sequences did not exceed 92 and 94%, respectively, and these sequences contained more gaps, ranging from 60 to 87. In the latter 2 cultivars, they were even lower, at 91 and 94%, respectively.
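The 95% identity reported for the two top HcrVf2 homologs can be checked with simple arithmetic. The short Python sketch below assumes that the denominator is the 2,943-bp HcrVf2 coding sequence. BLAST itself divides by the alignment length, which includes gap columns and is not stated in the text, so the exact denominator here is an assumption.

```python
def percent_identity(identities, aligned_length):
    """Identical positions as a percentage of the aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * identities / aligned_length


# 2,801 identities over the 2,943-bp HcrVf2 coding sequence (assumed denominator).
print(round(percent_identity(2801, 2943)))
```

Adding the 6 reported gaps to the denominator changes the result only in the first decimal place, so the rounded value stays at 95% under either assumption.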
Neighbor-joining phylogenetic analysis of the aligned homologous sequences confirmed the BLAST results and grouped the sequences into 4 major clades (Fig. 2). Sequences found in the second clade, i.e. those on LG 1, showed the highest relatedness to HcrVf2 of M. floribunda 821. In particular, the haplome A homologs of scab-resistant genotypes Antonovka and Honeycrisp showed the highest relatedness to HcrVf2. They showed a moderate node support with a bootstrap value of 0.61 compared with HcrVf2 and a maximum node support with a bootstrap value of 1. This clade showed a lower node support with the most closely related clade, i.e. a bootstrap value <0.50, which comprised sequences on LG 1 of Honeycrisp haplome B, scab-susceptible cultivar Gala haplome A, and M. sieversii haplome A. Other HcrVf2 homologs of the second clade were more distant from HcrVf2 and exceeded the node-support bootstrap value of 0.57, or even belonged to a different clade. Similarly, the PCA (Supplementary Fig. 2) showed that sequences from LG 1 formed a tight cluster around HcrVf2 with positive scores along components 1 and 2, which explained 21.6 and 13.1% of the total sequence variability, respectively. Indeed, the cluster contained the abovementioned sequences from Antonovka, Honeycrisp, Gala, and M. sieversii. The underlying sequence variation is illustrated by the increased sequence entropy along the entire sequence, with the highest increase at the position between 2,000 and 2,500 bp (Supplementary Fig. 3).
In contrast to HcrVf2, neighbor-joining phylogenetic analysis of HcrVf1 homologs from scab-resistant and susceptible/wild Malus accessions indicated that the sequences found in wild Malus accessions were the most related to HcrVf1 (Supplementary Fig. 4). The sequences were grouped into 4 clades. Clade IV contained HcrVf1 and its most related homologs. Among these, homologs on LG 1 of M. baccata and M. sieversii showed the highest relatedness to HcrVf1, with strong node support (bootstrap value of 0.89), followed by the homologs from LG 1 of M. prunifolia and M. baccata scaffold 241, which showed low node support (bootstrap value <0.50). Other homologs found in clade IV, including those from haplome A of LG 1 of scab-resistant Honeycrisp and Antonovka, showed lower relatedness with strong node-support bootstrap values of 1. The other clades were even more distant from HcrVf1, with the maximum node-support bootstrap values of 1. PCA based on sequence similarity (Supplementary Fig. 5) showed that the sequences from LG 1 formed a tight cluster around HcrVf1. Sequences from this cluster showed negative and positive scores along principal components 1 and 2, explaining 15.9 and 13.8% of the total sequence variability, respectively. The underlying sequence variation is depicted by increased sequence entropy along the entire sequence, with the peak entropy at a position of approximately 2,500 bp (Supplementary Fig. 6).
Finally, phylogenetic analysis of only one region within the HcrVf2 sequences, i.e. the 700-800-bp long PCR products of scab-resistant or scab-susceptible genotypes (Boudichevskaia et al. 2009), was unable to confirm observations based on the entire gene (Supplementary Fig. 7). PCR products of "Releta," "Regia" B, Regia A (non-Rvi6 accessions), and "Prima" (Rvi6 accession) showed the highest similarity to HcrVf2, while homologs from scab-resistant Honeycrisp and Antonovka LG 1 of haplome A showed a low relatedness to the analyzed region of HcrVf2.

Location of HcrVf homologs and haplotype identity of LG 1 in scab-resistant and susceptible genotypes
Synteny analysis of the haplomes harboring HcrVf2 and HcrVf1 homologs with the highest similarity to HcrVf2 showed synteny between the scab-resistance regions of haplome A from Antonovka and haplomes A and B from Honeycrisp (Fig. 3a). Synteny was observed on both flanking sides of the Ch-Vf1 scab-resistance marker between haplomes A of Antonovka and Honeycrisp. The SNP density was higher upstream of the marker than in the downstream region. In fact, synteny was observed between all haplotypes from Antonovka and Honeycrisp (Supplementary Fig. 8). In contrast, no synteny was observed in the Ch-Vf1 flanking regions between these haplomes and the haplomes A from scab-susceptible Gala and M. sieversii. The latter 2 haplotypes showed synteny with each other in the Ch-Vf1 flanking regions.
Mapping of the HcrVf2 and HcrVf1 homologs on LG 1 revealed that the most related homologs from scab-resistant Antonovka and Honeycrisp haplome A shared the same physical location (Fig. 3b and Supplementary Fig. 9). The location of the homologs in these 2 cultivars was 23-25 kb away from the 137-bp Ch-Vf1 marker, i.e. the marker used to determine the scab resistance of Rvi6-harboring genotypes. In contrast, other homologs in these 2 cultivars were more distant from Ch-Vf1, with distances of over 400 kb. Distances of the most related homologs from scab-susceptible Gala and the haplome B of Honeycrisp were even larger, i.e. >1.68 Mb from the 139-bp Ch-Vf1 marker. A similarly large distance was observed between the marker and the most closely related homolog found in M. sieversii haplome A.
Furthermore, Honeycrisp and Antonovka (Common Antonovka) shared resistance haplotypes near the Ch-Vf1 marker, while their haplotypes at this locus differed from those of cultivars harboring resistance from M. floribunda 821 (Fig. 3c and Supplementary Fig. 10). Shared haplotype analysis of this region based on the SNPs of genotypes harboring Rvi6, Vhc1, Rvi17, or no resistance loci indicated that HcrVf2-containing cultivars, i.e. Liberty, Remo, Goldrush, and Nova Easygro, showed a large shared genomic region with Topaz in proximity of Ch-Vf1. In contrast, no similarities were observed between SNPs in the Ch-Vf1 region of Antonovka and Honeycrisp compared with Topaz (Fig. 3c). However, haplomes of Honeycrisp and Antonovka, which are thought to harbor resistance alleles, share the distal part of LG 1 near Ch-Vf1 (Supplementary Fig. 10). Finally, scab-susceptible McIntosh, Fuji, Golden Delicious, and Gala did not show SNP identity with Topaz in the close proximity of Ch-Vf1 (Fig. 3c), whereas Idared shared large parts of LG 1 with Topaz. Ch-Vf1 regions of Golden Delicious, Idared, and Gala harbored SNPs that are mostly shared with Honeycrisp (Supplementary Fig. 10).
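Distances such as those between a homolog and the Ch-Vf1 marker follow directly from the start and end coordinates of the two features on the same chromosome. The Python sketch below shows one way to compute them. The function name and the coordinates are invented for illustration and are not taken from the assemblies.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Coordinate difference in bp between two features; 0 if they overlap.

    Start and end may be given in either order, as happens for
    BLAST hits on the reverse strand.
    """
    a_lo, a_hi = sorted((a_start, a_end))
    b_lo, b_hi = sorted((b_start, b_end))
    return max(0, b_lo - a_hi, a_lo - b_hi)


# Invented coordinates: a 137-bp marker amplicon and a reverse-strand hit.
marker = (1_000_000, 1_000_137)
homolog = (1_027_000, 1_024_000)
print(interval_distance(*marker, *homolog))
```

Sorting each coordinate pair first keeps the result independent of strand orientation and of the order in which the two features are passed.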

Amino acid sequence and protein domains of HcrVf homologs in scab-resistant genotypes
Amino acid sequence alignment and neighbor-joining phylogenetic analysis of HcrVf2 confirmed the observations at the DNA level (Supplementary Figs. 11 and 12). Homologs from LG 1 of haplome A of scab-resistant cultivars Honeycrisp and Antonovka were the most closely related to HcrVf2, followed by those from LG 1 of Honeycrisp haplome B, the scab-susceptible cultivar Gala haplome A, and M. sieversii haplome A (Supplementary Fig. 11). Furthermore, different groupings of the sequences originating from LG 1 and 6 were observed at the amino acid level. In contrast to HcrVf2 homologs, amino acid analysis of HcrVf1 homologs revealed a more distinct phylogeny than that at the DNA level. In particular, neighbor-joining analysis indicated that sequences on LG 1 of M. prunifolia and scaffold 241 of M. baccata were the most similar to HcrVf1 (Supplementary Fig. 12). However, most of the homologs showed a greater distance from HcrVf1, with highly variable node-support bootstrap values.
Protein domain analysis using InterPro of the homologs with the highest similarity to HcrVf2 in scab-resistant cultivars Honeycrisp and Antonovka, scab-susceptible cultivar Gala, and wild accession M. sieversii showed that the sequences from haplome A of Honeycrisp and Antonovka contained the same number of LRRs as HcrVf2 (Fig. 4). In addition, the positions of the majority of these LRRs overlapped with the HcrVf2 domains, and the sequences contained the signal peptide, the noncytoplasmic domain containing LRRs, and the transmembrane domain. In Antonovka, the eighth LRR domain was found further upstream in the coding sequence than in HcrVf2. In addition, the last cytoplasmic domain was absent in all the homologs. Haplotypes of Gala and haplome B of Honeycrisp contained a stop codon, which could result in a prematurely terminated protein. Finally, the wild accession M. sieversii showed the absence of an LRR domain and an extended LRR domain compared with HcrVf2. The signal peptide region, as well as several LRRs, showed specific amino acid differences between HcrVf2 and its homologs from Honeycrisp and Antonovka (e.g. positions 9, 17, 26, etc.; Supplementary Fig. 13). Occasionally, the other 3 sequences from Honeycrisp haplome B, Gala, and wild accession M. sieversii showed additional changes in amino acids compared with HcrVf2 (e.g. positions 93, 94, and 96-100).
All HcrVf1 homologs showed variable number and position of LRR domains, with an absence of the cytoplasmic domain (Fig. 4). The most similar homologs from scab-resistant cultivars, i.e. Honeycrisp and Antonovka, showed an additional third LRR domain, whereas the homologs of Gala and haplome B of Honeycrisp showed an absence of the fifth domain. The homolog from M. sieversii contained a stop codon after the third LRR domain. A high number of insertions and amino acid polymorphisms were found among all the HcrVf1 homologs (Supplementary Fig. 14).
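A stop codon that would truncate the protein, as described above for Gala and Honeycrisp haplome B, can be detected by scanning a coding sequence codon by codon in reading frame 1. The Python sketch below is a generic illustration under the assumption that the sequence starts at the first base of the start codon. The function names and the short test sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    seq = cds.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def is_prematurely_terminated(cds):
    """True if an in-frame stop occurs before the last complete codon."""
    index = first_stop_codon(cds)
    last_codon = len(cds) // 3 - 1
    return index is not None and index < last_codon


# Invented toy sequences.
print(is_prematurely_terminated("ATGGCTTGAGGCTAA"))  # stop at codon index 2 of 5
print(is_prematurely_terminated("ATGGCTGGCTAA"))     # stop only at the end
```

Such a check only flags candidates: whether a truncated product lacks specific domains still has to be assessed against the domain annotation.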

Discussion
We assembled a high-quality phased chromosome-scale genome of the Antonovka 172670-B apple with a BUSCO completeness of over 97.5%. The quality of the Antonovka genome assembly is comparable with that of the Honeycrisp, GDDH13, and Gala genomes, with BUSCO completeness rates of 98.7, 97.4, and 97.7%, respectively (Daccord et al. 2017; Sun et al. 2020; Khan et al. 2022). The high quality and phased nature of this genome make it relevant for the genetic identification and characterization of allelic variations in highly heterozygous plants (Khan and Korban 2022). Phased genome assemblies with similarly high-quality haplomes have enabled identification of alleles linked with a variety of traits in domesticated apple cultivars and their wild progenitor species, as well as in other commercially important crops (Sun et al. 2020; Frommer et al. 2022; Hoopes et al. 2022). For example, using phased assemblies of Gala, M. sieversii, and M. sylvestris apples, allele-specific expression of MYB1 and consequently the yellowish-red fruit skin color were found in Gala and M. sieversii. In addition, the "A" allele at base 1,455 of the Ma1 coding gene sequence was found to be associated with the low fruit acidity of the former 2 genotypes. Indeed, the new Antonovka genome assembly in combination with the Honeycrisp assembly and other available Malus assemblies (Chen et al. 2019; Zhang et al. 2019; Sun et al. 2020; Khan et al. 2022; Li et al. 2022) enabled us to identify homologs of HcrVf2 and HcrVf1 on LG 1 and 6 across 9 analyzed genomes. These regions harbor dozens of resistance gene analogs, as demonstrated for LG 1 and 6 (Perazzolli et al. 2014; Singh et al. 2021). The identification of resistance gene homologs within a resistance locus can complement traditional association mapping, which often fails to pinpoint a set of high-confidence resistance gene candidates that can feasibly be functionally characterized (Dunemann and Egerer 2010; Clark 2014). The identified associated resistance loci on LG 1 of Honeycrisp and Antonovka were large, spanning several Mb/cM, that is, 19 and 15 cM for Vhc1 and Rvi17, respectively (Dunemann and Egerer 2010; Clark 2014). As a result, only 2 of the 19 major scab-resistance genes, i.e. Rvi6 and Rvi15, have been functionally validated to date (Bus et al. 2011). The new genome may also enable identification of additional alleles associated with other traits of Antonovka accessions, including their unsurpassed strong and pleasant fruit aroma, cold hardiness, and disease resistance (Wilner 1960; Bus et al. 2012).
Among the identified homologs, the 2 sequences on LG 1 of Honeycrisp and Antonovka haplome A showed the highest relatedness to HcrVf2, the functional allele of the Rvi6 resistance locus. Sequence alignment analysis showed that these 2 sequences were not identical to HcrVf2 but differed by several single-nucleotide changes and a deletion. This observation supports previous comparisons of alleles in these cultivars, showing that these genotypes contain different Ch-Vf1 marker alleles on the distal end of LG 1 (Bus et al. 2012; Clark 2014). Honeycrisp and Antonovka contain the marker allele that is 137 bp long (reported as a 139-bp allele), as opposed to the 159-bp allele associated with HcrVf2 from M. floribunda 821. Furthermore, Honeycrisp and several Antonovka accessions showed resistance to different V. inaequalis isolates compared with the cultivars with HcrVf2 (Bus et al. 2012). Unlike HcrVf2, which has been overcome in many apple-producing regions globally by V. inaequalis race 6 (Papp et al. 2019; Patocchi et al. 2020), Rvi17 has been suggested to confer resistance to race 6 (Parisi and Lespinasse 1996). Defense responses of the Vhc1 locus of Honeycrisp to a broad range of the US V. inaequalis strains are less well-defined and range from no symptoms to sporulation (Clark 2014).
Synteny, physical mapping, and haplotype sharing analyses of LG 1 demonstrated that the homologs in Honeycrisp and Antonovka with high similarity to HcrVf2 and HcrVf1 share physical location but are not identical by state (IBS) or descent (IBD) to the HcrVf genes. In Honeycrisp and Antonovka, the flanking regions of the Ch-Vf1 marker are highly syntenic with each other, and the identified homologs are in tight physical proximity (∼20 kb) to the Ch-Vf1 marker. The locations of the identified homologs in these 2 cultivars may overlap with the locations of HcrVf2 and HcrVf1, which have been identified to be <140 kb from the marker (Patocchi et al. 1993; Vinatzer et al. 2001, 2004). Other homologs were >400 kb apart from Ch-Vf1. Furthermore, no SNPs identical to those in Rvi6-harboring cultivars were identified across the region harboring the Ch-Vf1 marker in Honeycrisp and Antonovka, most likely rendering these regions different by state and descent. This is in line with previous observations that Antonovka and Honeycrisp contain different Ch-Vf1 alleles than M. floribunda and its related cultivars (Bus et al. 2012; Clark 2014). In contrast, haplomes A of Honeycrisp and Antonovka, as well as other Honeycrisp- and Antonovka-related cultivars, are syntenic and share extended haplotypes in the proximity of Ch-Vf1, suggesting that their homologs tightly linked to Ch-Vf1 could potentially be IBS. However, because no common ancestry of Antonovka and Honeycrisp is known, because they show high SNP frequency upstream of Ch-Vf1, and because the shared haplotypes mainly extend downstream of Ch-Vf1 over the region containing HcrVf, their IBD remains to be clarified. The SNP data for the extended shared haplotypes are publicly available for use in breeding and genetic research (Supplementary File 1).
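The idea of an extended shared haplotype around a marker can be illustrated with a minimal sketch. It is not the procedure used in this study. It assumes 2 phased haplotypes given as strings of SNP alleles in physical order, and a hypothetical index for the SNP closest to the marker. The function name, the allele strings, and the index are all invented for illustration.

```python
def shared_run(hap_a, hap_b, marker_idx):
    """Return (start, end), inclusive, of the contiguous block of identical
    SNP alleles around marker_idx, or None if the 2 haplotypes already
    differ at the marker SNP itself."""
    n = min(len(hap_a), len(hap_b))
    if not 0 <= marker_idx < n:
        raise ValueError("marker_idx is outside the compared SNP range")
    if hap_a[marker_idx] != hap_b[marker_idx]:
        return None
    # Extend upstream while alleles remain identical by state.
    left = marker_idx
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    # Extend downstream while alleles remain identical by state.
    right = marker_idx
    while right + 1 < n and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return left, right


# Hypothetical toy haplotypes (not data from this study).
toy_reference = "AACGTTAGC"
toy_other = "GACGTTAGA"
toy_unrelated = "GACGATAGA"

block = shared_run(toy_reference, toy_other, marker_idx=4)
no_block = shared_run(toy_reference, toy_unrelated, marker_idx=4)
```

A long shared block indicates identity by state only. As noted above, identity by descent additionally requires pedigree or ancestry evidence.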
The nucleotide differences among the homologs with the highest similarity to HcrVf2 and HcrVf1 resulted in amino acid changes that were reflected in the number and size of the predicted protein domains. The 2 sequences on LG 1 of Honeycrisp and Antonovka haplome A showed only minor changes in the number and size of the domains, i.e. the lack of a cytoplasmic domain and a shift in 1 LRR domain in Antonovka. The high similarity (>92%) between the 2 sequences exceeded the relatedness of the 4 Rvi6 locus paralogs, as HcrVf1 and HcrVf2 differed in the total number of LRR domains and showed lower identity (78–84%; Belfanti et al. 2004; Xu and Korban 2004). Similarly, in other species, the identity among resistance homologs, such as R2 in Solanum, ranged between 85 and 99% (Srivastava et al. 2018). However, larger changes in amino acid sequences and protein domains were observed in the homologs from Gala, Honeycrisp haplome B, and M. sieversii. These homologs either contain stop codons that are predicted to result in prematurely terminated proteins or differ in the number and position of the protein domains. This resembles the HcrVf3 gene in M. floribunda 821, compared with the other 3 HcrVf paralogs (Xu and Korban 2002, 2004). HcrVf3 contains a stop codon because of a single-point mutation at position 229, which most likely results in a prematurely terminated protein. Altogether, the presence or absence of the identified protein domains in different genotypes should be further validated for their functional roles in scab resistance.
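A premature stop codon of the kind described above can be flagged directly from a coding sequence. The following is a minimal sketch, not the pipeline used in this study. It assumes the standard nuclear genetic code and an in-frame coding sequence starting at the first base. The function names and the toy sequences are hypothetical.

```python
# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds):
    """Return the 1-based codon position of the first in-frame stop codon,
    or None if the sequence contains no in-frame stop."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def has_premature_stop(cds):
    """True if an in-frame stop occurs before the final complete codon."""
    n_codons = len(cds) // 3
    position = first_in_frame_stop(cds)
    return position is not None and position < n_codons


# Hypothetical toy sequences (not HcrVf sequences).
intact = "ATGGCTTGGTAA"        # ATG GCT TGG TAA
truncated = "ATGGCTTGAGGTTAA"  # ATG GCT TGA GGT TAA

intact_flag = has_premature_stop(intact)
truncated_flag = has_premature_stop(truncated)
truncated_pos = first_in_frame_stop(truncated)
```

A flagged sequence is only a prediction of a truncated protein. It would still need to be checked against the gene model and, ideally, expression data.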
In summary, construction of a novel phased genome assembly of scab-resistant Antonovka 172670-B and utilization of other available scab-resistant, scab-susceptible, and wild apple assemblies revealed candidate resistance genes with high homology to HcrVf2 and HcrVf1. In scab-resistant genotypes Antonovka and Honeycrisp, 2 homologs on LG 1 of haplome A showed particularly high similarity and coverage to HcrVf2, the apple scab-resistance paralog of the Rvi6 locus. These 2 homologs show high similarity to HcrVf2 at the amino acid level, have minor changes in their protein motifs, and are located in close proximity to the Ch-Vf1 scab-resistance marker. Potentially, these 2 homologs are identical-by-state, as both homologs share extended syntenic haplotypes in the proximity of Ch-Vf1 and show an absence of shared SNPs with M. floribunda 821-related cultivars. Homologs identified in scab-susceptible Gala, haplome B of Honeycrisp, and M. sieversii have a lower homology with HcrVf2. Furthermore, 2 homologs highly related to HcrVf1 were identified in close proximity to the HcrVf2 homologs on LG 1 of haplome A of Antonovka and Honeycrisp. These homologs show high sequence and protein domain variability. In fact, HcrVf1 showed the highest relatedness to homologs identified in the wild Malus accessions. Finally, the Antonovka 172670-B genome adds to the limited genomic resources, which so far comprised only partial coding DNA sequences of HcrVf2 homologs and marker data associated with the resistance, and hence makes future comparison of the genetic basis for scab resistance and other beneficial traits of this cultivar more feasible. Information on putative resistance gene candidates for Vhc1 and Rvi17 in Honeycrisp and Antonovka can be utilized for further dedicated studies on functional validation and breeding of these homologs.

Fig. 1. MUMmer dotplot of haplomes A and B (x axis and y axis, respectively) of the Antonovka 172670-B apple.

Fig. 2. Phylogeny of HcrVf2 homologs from 12 genome assemblies of M. domestica and wild Malus genotypes generated by neighbor-joining phylogenetic analysis using MEGA11 (Tamura et al. 2021). The 43 homologs of HcrVf2 from Malus formed 4 major clades, supported by bootstrap values. The majority of members in clades I, II, and III were located on chromosome 1 or were placed on another LG, i.e. 11 or 17, or a scaffold (orange). Sequences from clade IV were found on LG 6 or on an unplaced scaffold (purple). Within clade II, HcrVf2 from M. floribunda 821 can be found (blue). The first part of the gene name represents accession/cultivar name (HC, Honeycrisp; Ant, Antonovka 172670-B; Mflo, M. floribunda 821; Mbac, M. baccata; Mpru, M. prunifolia; Msyl, M. sylvestris; Msie, M. sieversii; HFTH1, anther-derived homozygous genotype HFTH1; GDDH13, doubled-haploid derivative of Golden Delicious), followed by a chromosome number, haplome (if available), and the ranking based on the BLAST score from the same genotype. Numbers on nodes are bootstrap values, and values <0.50 are not shown.

Fig. 3. Visualization of the synteny, location of HcrVf homologs, and shared SNPs on LG 1. a) Synteny plot of the LG 1 from Antonovka haplome A, Honeycrisp haplomes A and B, Gala haplome A, and M. sieversii haplome A, indicating the position and size of the Ch-Vf1 scab-resistance marker, and SNP and gene density along the up- and downstream flanking regions. b) Circoplot showing the physical positions of the most related HcrVf2 (green lines) and HcrVf1 (brown lines) homologs in the respective genome assemblies of haplome A of Honeycrisp and Antonovka, Gala, haplome B of Honeycrisp, and M. sieversii relative to the location of the Ch-Vf1 microsatellite marker. c) Extended shared haplotypes of the SNP marker data across the LG 1 from Rvi6-harboring Topaz with accessions known to carry different resistances, i.e. Rvi6 from M. floribunda 821, Vhc1 from Honeycrisp, Rvi17 present in Antonovka, or those found to be susceptible to the majority of V. inaequalis races. Thin and bold lines indicate SNP allelic regions that are not shared/identical and shared/identical, respectively, with the SNPs in Topaz. The red dashed vertical lines indicate the approximate location of the Ch-Vf1 marker on the genetic map.

Fig. 4. Schematic representations of the InterPro (Mulder and Apweiler 2007) domains of HcrVf2 and HcrVf1 homologs on LG 1. Protein domains of the homologs with the highest relatedness to HcrVf2 and HcrVf1 (980 and 1015 amino acids in length, respectively) in scab-resistant accessions of M. floribunda, haplome A of Honeycrisp (HC) and Antonovka 172670-B (Ant), Gala, haplome B of Honeycrisp, and M. sieversii (Msie) include the signal peptide (yellow), N-terminus of the LRR region (red), extracellular LRR domains (dark green), hydrophobic transmembrane domain (light green), and the cytoplasmic terminus domain (brown). Stars indicate the presence of putative stop codons. The first part of the identification name represents accession/cultivar name followed by chromosome (chr) number and haplome (if available).

Table 1. Summary statistics for the phased chromosome-level genome assembly of Antonovka apples.