Extent of wild–to–crop interspecific introgression in grapevine (Vitis vinifera) as a consequence of resistance breeding and implications for the crop species definition

Abstract Over the past two centuries, introgression through repeated backcrossing has introduced disease resistance from wild grape species into the domesticated lineage Vitis vinifera subsp. sativa. Introgression lines are being cultivated over increasing vineyard surface areas, as their wines now rival in quality those obtained from preexisting varieties. There is, however, a lot of debate about whether and how wine laws defining commercial product categories, which are based on the classification of V. vinifera and interspecific hybrid grapes, should be revised to accommodate novel varieties that do not fit either category. Here, we developed a method of multilocus genotype analysis using short–read resequencing to identify haplotypic blocks of wild ancestry in introgression lines and quantify the physical length of chromosome segments free–of–introgression or with monoallelic and biallelic introgression. We used this genomic data to characterize species, hybrids and introgression lines and show that newly released resistant varieties contain 76.5–94.8% of V. vinifera DNA. We found that varietal wine ratings are not always commensurate with the percentage of V. vinifera ancestry and linkage drag of wild alleles around known resistance genes persists over at least 7.1–11.5 Mb, slowing down the recovery of the recurrent parental genome. This method also allowed us to identify the donor species of resistance haplotypes, define the ancestry of wild genetic background in introgression lines with complex pedigrees, validate the ancestry of the historic varieties Concord and Norton, and unravel sample curation errors in public databases.


Introduction
The spontaneous exchange of genetic material between interfertile species, also known as interspecific gene flow, has contributed to the origin of crop plants (e.g. bread wheat [1], date palm [2], and tomato [3][4][5]), to the restoration of crop diversity after domestication or genetic erosion (e.g. in sorghum [6] and soybean [7]) and to the adaptation to challenging environments [8,9]. Plant breeders have also made deliberate use of introgressive hybridization for crop improvement (e.g. wheat [10], rice [11], potato [12,13], tomato [14,15], cassava [16]), using the crop's secondary gene pool to supply genetic variation. Wild ancestry conferring disease resistance traits has been incorporated into newly released varieties in the most important staple and horticultural crops [17].
Grapevine is rather unique in this context. Extensive spontaneous introgressive hybridization has been documented among North American species that could have had a role in stress adaptation [18]. Interspecific hybridization through controlled crosses in breeding programs has also been used to transfer favorable traits from wild interfertile species into the domesticated European lineage (Vitis vinifera L. subsp. sativa). These artificial hybridization events are more often blamed for deliberately introducing unnecessary diversity into an otherwise asexually propagated crop [19] than acknowledged as a solution for reinforcing fragile varieties [20] that are otherwise susceptible to fungal diseases and require intensive use of plant protection products [21]. Over the past two centuries, Vitis species in the Midwest and the East of the United States, Vitis amurensis in Northeastern Asia and Muscadinia rotundifolia have supplied major genes for Plasmopara viticola and Erysiphe necator resistance (e.g. Rpv3-1 [22], Rpv3-2 [23], Rpv3-3 [24], Rpv10 [25], Rpv12 [26], Ren3/9 [27], Rpv1/Run1 [28]) that grape breeders have transferred through repeated backcrossing into the cultivated germplasm (see the next section for details on the specific source and chromosomal location of each gene). Additional major loci [29][30][31][32][33] and minor QTLs [34][35][36][37][38] from diverse wild accessions and species are now being discovered, thanks to a wider germplasm exploration in natural habitats and in ex-situ collections and to the availability of next generation sequencing (NGS)-based genotyping assays for genetic mapping, which may refuel introgressive breeding in the coming years. Despite the scale of historical breeding and the launch of nationwide breeding initiatives in recent years (e.g. VitisGen in the US, ResDur in France), viticulture tends to downplay the role of grape introgression lines.
These varieties are not even called introgression lines, a naming that should descend from the process used to create them, but they are classified as interspecific hybrids, no matter their genome composition, to underscore their origin and place a mark of discredit on them. Only recently, the German authority competent on the authorization of grape varieties for wine production (Bundessortenamt) took a countercurrent stance, granting the classification of V. vinifera to the introgression line "Regent" and removing the stigma around impure V. vinifera genomes.
The reason behind the belief that grape introgression lines will never rival single-species varieties for value of cultivation appears to be twofold: generalization from experience and blind commitment to earlier assumptions. Grape interspecific F1 hybrids were first generated in the nineteenth century in a hurry to curb epidemics that threatened the survival of V. vinifera vineyards in North America and Western Europe. They derived their wild ancestry from the most highly resistant American accessions known at that time, belonging to native species that were characterized by the presence of unusual berry metabolites that pass their unpleasant notes on to wines. Once the fungal diseases that damage the aerial part of the vines could be contained with the use of effective agrochemicals, some wine-producing countries banned the plantation of interspecific F1 hybrids, including those that did not require grafting to withstand root-feeding insects, known as direct producer hybrids, and triggered a negative campaign on non-vinifera wines. Undesirable sensory attributes were attenuated but not eliminated by repeated backcrossing, the process that has generated introgression lines from interspecific hybrids. This pitfall led many in the realm of wine and viticulture to denigrate any attempt at interspecific introgression in V. vinifera, under the assumption that what was true for some accessions of a few donor species will always hold true for all accessions in all species. Advancements in wine flavor chemistry corrected this assumption only recently [39][40][41]. According to another prejudice, novel grape genotypes, by their nature and no matter their ancestry, make lower-quality wines than those we inherited from an earlier time. Emotional patterns are the foundation for this conjecture.
Wine consumer behavior is strongly influenced by habitual taste [42], which is greatly influenced by the primary aromas of a few celebrated varieties [43], the ten most planted of which account for a quarter of the global wine production [44]. Most consumers therefore seek subtle sensory variation, conferred to grape composition by environmental factors and, on top of that, to wine composition by winemaking practices. Some consumers are more prone to exploratory behavior and quest for original tastes [45,46]. Very few are aware of the existence of novel combinations of desirable primary aromas that are expressed by varietal wines made from brand new genotypes, including those obtained from introgression lines.
The perspectives for novel introgression lines to produce wines that will increasingly please the consumer are, however, growing without historical parallel [47][48][49]. A genome-based classification of pure-species varieties, hybrids, and introgression lines will be needed to treat any grapevine specimen with appropriate nomenclature [50]. DNA genotyping and sequencing have made it possible to identify the origin and the pedigree of wine grapes of the V. vinifera species [51,52]. We recently showed that a whole genome sequencing (WGS)-based estimation of the aggregate length and distribution of identity-by-descent (IBD) segments is a powerful method for revealing genealogical relationships between V. vinifera varieties with an unprecedented degree of accuracy [53]. A similar approach could also be applied to the characterization of intermediate forms between the cultivated germplasm of the domesticated lineage (V. vinifera subsp. sativa) and wild grape species, i.e. introgression lines carrying non-vinifera chromosome segments from Vitis species and Muscadinia species, a task that so far has proven not to be trivial [54,55]. Genomic ancestry estimation has already been proposed using Principal Component Analysis (PCA)-based distances [56] and used for quantifying the exploitation of wild grape species in breeding [57]. This approach returned a vinifera ancestry coefficient per diploid genome but no further information on the length, zygosity and chromosomal location of the introgressed genomic segments. Here, we developed a method for the quantification of interspecific introgression that can also be applied to any other crop for the genomic classification of cultivated varieties. We show six diverse applications of this method: (1) to distinguish introgression lines from interspecific hybrids and from accessions with genuinely wild or genuinely vinifera genomes; (2) to quantify their ancestry based on the DNA length and the species-of-origin of each chromosomal segment, including known resistance haplotypes; (3) to compare experimental and simulated data on the effect of linkage drag around resistance loci on the recovery of the recurrent parental genome; (4) to investigate the relationship between V. vinifera ancestry and varietal wine ratings in eight advanced European introgression lines; (5) to authenticate the origin of two historical American introgression lines that have laid the foundation of viticulture in the eastern and mid-western United States; and (6) to correct the classification of grapevine specimens in literature reports.
To this end, we searched public databases for whole-genome short-read DNA resequencing or RNA-Seq data of introgression lines, F1 hybrids, and grape species as of July 2020, including accessions of V. vinifera that were deposited after our previous diversity studies in the cultivated compartment [53]. We then generated DNA resequencing or RNA-Seq data for a set of recently bred resistant varieties, information that was completely lacking in public repositories. During this process, we encountered numerous cases of sample misclassification that required a revision of their current status.

Intraspecific and interspecific SNP variation
To examine the levels of intraspecific and interspecific diversity, we used Illumina short reads from 193 grape specimens, which corresponded to 184 unique genotypes (Table S1), 13 of which were sequenced in this study while the rest were retrieved from public databases. We compared their SNP profiles with a SNP inventory from a V. vinifera subsp. sativa diversity panel [53] (Supplementary Note 1), hereafter referred to as the vinifera diversity panel, and assumed that SNPs not represented in the panel are from non-vinifera species. Fig. 1 shows the densities of homozygous and heterozygous SNPs that are absent from the vinifera diversity panel (private SNPs, hereafter used as a proxy for non-vinifera SNPs) in specimens sorted by a posteriori taxonomical assignment.
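The private-SNP filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `private_snp_density`, the input tuples and the 100-Kb window size are assumptions made for illustration only. The sketch discards every allele already present in a vinifera panel and counts the remaining homozygous and heterozygous private SNPs per window.

```python
from collections import Counter

WINDOW = 100_000  # bp; illustrative window size (assumption)

def private_snp_density(sample_calls, panel_sites, window=WINDOW):
    """Count homozygous and heterozygous private SNPs per genomic window.

    sample_calls: iterable of (chrom, pos, alt_allele, genotype), where
                  genotype is 'hom' or 'het' for the alternative allele.
    panel_sites:  set of (chrom, pos, alt_allele) observed in the vinifera panel.
    Returns {(chrom, window_index): Counter({'hom': n, 'het': m})}.
    """
    density = {}
    for chrom, pos, alt, genotype in sample_calls:
        if (chrom, pos, alt) in panel_sites:
            continue  # allele known in V. vinifera, hence not private
        key = (chrom, (pos - 1) // window)
        density.setdefault(key, Counter())[genotype] += 1
    return density

# Toy data (hypothetical coordinates and alleles)
panel = {("chr1", 1_500, "T"), ("chr1", 250_000, "G")}
calls = [
    ("chr1", 1_500, "T", "het"),    # shared with the panel, ignored
    ("chr1", 42_000, "A", "het"),   # private, window 0
    ("chr1", 99_000, "C", "hom"),   # private, window 0
    ("chr1", 250_000, "G", "hom"),  # shared with the panel, ignored
    ("chr1", 260_000, "T", "het"),  # private, window 2
]
density = private_snp_density(calls, panel)
```

In practice the calls would be parsed from a variant file, and the counts would be normalized by the callable length of each window.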
Local density and zygosity of non-vinifera SNP alleles confirmed the current classification for 88.5% of the specimens. As for the remainder, we detected wild genome introgression in DNA/RNA samples that had been erroneously classified as pure V. vinifera and V. vinifera introgression in DNA/RNA samples of putative wild accessions (Supplementary Note 2). After curation, the germplasm under study consisted of 62 V. vinifera genomes that were not included in the vinifera diversity panel [53], 73 wild species genomes, which both served as validation panels (Supplementary Note 1, Figs. S1-S4), and 49 hybrid genomes or introgression lines. The introgression lines, due to their complex history of repeated backcrossing with different V. vinifera recurrent parents (used in order to avoid inbreeding depression) as well as with multiple non-recurrent parents (often used in order to bring in different resistance genes), can have an introgressed genomic segment on a single homolog, hereafter defined as monoallelic introgression, or on both homologous chromosomes. In this second case, if the two introgressed segments are identical by state we will define it as monoallelic homozygous introgression, and if, as in most cases, the two segments are not identical by state we will define it as biallelic introgression. The entirety of the genome length in wild species as well as specific chromosome segments in introgression lines carrying biallelic introgression were characterized by genomic windows with a high density of non-vinifera SNPs in both homozygous and heterozygous conditions, indicative of the presence of two copies of non-identical non-vinifera homologous chromosome segments (Fig. 1).
The entire genome in interspecific hybrids (V. vinifera × wild species) as well as specific chromosome segments in introgression lines carrying monoallelic introgressions were characterized by genomic windows with a high density of non-vinifera SNPs only in heterozygous condition, indicative of the presence of one copy of non-vinifera homologous chromosomes. The density of heterozygous non-vinifera SNPs was higher in V. vinifera × wild species hybrids than in wild species because, in the latter, a substantial fraction of those sites is homozygous due to the presence of the same non-vinifera variant on both wild homologous chromosomes (i.e. fixed interspecific differences).
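The window signatures described above translate into a simple decision rule, sketched below in Python. The threshold `min_density` and the function name are hypothetical, since the cut-offs actually used are given in Supplementary Note 1 and not in this text. Assigning windows with only homozygous private SNPs to the monoallelic homozygous class is an inference from the definition above (two wild copies that are identical by state), not a rule stated in the text.

```python
def classify_window(hom, het, min_density=50):
    """Classify one genomic window from its private-SNP counts.

    hom, het:     counts of homozygous / heterozygous private SNPs.
    min_density:  hypothetical threshold for calling a density 'high'.
    """
    high_hom = hom >= min_density
    high_het = het >= min_density
    if high_hom and high_het:
        return "biallelic"               # two non-identical wild copies
    if high_het:
        return "monoallelic"             # one wild copy, one vinifera copy
    if high_hom:
        return "monoallelic_homozygous"  # two identical-by-state wild copies (inferred)
    return "free"                        # no signature of introgression

# Toy windows: (homozygous count, heterozygous count)
windows = [(3, 2), (1, 410), (220, 380), (500, 4)]
labels = [classify_window(hom, het) for hom, het in windows]
```

A production classifier would presumably smooth the calls over adjacent windows to avoid spurious switches at window boundaries.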

Wild ancestry assignment
We used local phylogeny to assign the genomic windows carrying signatures of introgression to the non-vinifera species that was matched with the highest identity-by-state (IBS) ratio (IBSR H) value (Supplementary Note 1 and Figs. S5-S6), hereafter referred to as a proxy for the most closely related donor genome. To this end, we curated a reference SNP dataset for the grapevine wild relatives using 66 wild accessions of 47 grapevine species (Table S2). We validated the taxonomic assignment of these accessions by principal component analysis, Bayesian clustering [58] and maximum likelihood phylogenetic analysis (Fig. S5). The grapevine secondary and tertiary gene pools include, respectively, American and Asian Vitis species, which were separated by the continental drift, and Muscadinia species, which were isolated from sympatric American Vitis species by nearly complete reproductive barriers (Supplementary Note 3). The non-vinifera diversity panel has representatives of the following three main groups. The first group includes 29 accessions of 17 American Vitis species, with multiple accessions for taxa with broad geographic distribution and wide intraspecific variation, e.g. Vitis aestivalis, Vitis cinerea, Vitis riparia and Vitis rupestris. The second group includes 34 accessions of 29 Asian Vitis species, with multiple accessions for taxa that were used for resistance breeding (e.g. V. amurensis and V. romanetii) and one accession each for species that were recently exploited for gene/QTL discovery [59,60] (e.g. V. pseudoreticulata and V. quinquangularis). The third group includes three accessions from the genus Muscadinia, represented by the species Muscadinia rotundifolia, which is the only non-Vitis genetic resource usable by grape breeders. M. rotundifolia and V. vinifera have a different chromosome number, 2n = 40 versus 2n = 38.
Their hybrids have an intermediate chromosome number and normally show sterility due to abnormal meiotic pairing [61], which led us to consider Muscadinia as a part of the tertiary gene pool, although rare exceptions allowed breeders to raise intergeneric introgression lines [62].
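The idea of assigning each introgressed window to the best-matching panel species can be sketched as follows. The score used here, the fraction of the sample's private alleles in a window that are also carried by a panel accession, is a simplified stand-in and not the IBSR H statistic defined in Supplementary Note 1. The function name, data layout and toy alleles are likewise assumptions.

```python
def assign_donor(window_alleles, panel):
    """Return the best-matching panel accession for one genomic window.

    window_alleles: set of (pos, allele) private alleles seen in the sample.
    panel:          {accession_name: set of (pos, allele)} for the same window.
    The score is a plain allele-sharing fraction (simplified stand-in).
    """
    if not window_alleles or not panel:
        return None, {}
    scores = {
        name: len(window_alleles & alleles) / len(window_alleles)
        for name, alleles in panel.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy window with four private alleles (hypothetical data)
sample = {(100, "A"), (250, "T"), (400, "G"), (900, "C")}
panel = {
    "V. amurensis": {(100, "A"), (250, "T"), (400, "G"), (700, "T")},
    "V. rupestris": {(900, "C"), (950, "A")},
    "M. rotundifolia": set(),
}
donor, scores = assign_donor(sample, panel)
```

With these toy data the window is assigned to the V. amurensis accession, which shares three of the four private alleles. A low best score across the whole panel, as reported later for the Rpv3-1 haplotype, would flag a segment whose donor is poorly represented.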

Genomic constitution, introgression maps and chromosome painting
For each introgression line, we calculated the aggregate length of genomic windows free-of-introgression, with monoallelic introgression, or with biallelic introgression, and the relative gene content in each fraction (Table S3). We generated introgression maps (e.g. Figs. S20a, S25a, S27-S30a-b, S37a, S40-S41, S43d-f) and used chromosome painting to show the ancestry of each chromosome segment (Fig. S6), which provide graphical outputs that illustrate the composition of the karyotype with a 100-Kb resolution (e.g. Figs. S21, S25b, S37b). The analysis of the genomic constitution in mildew resistant varieties reflected opposite historical trends in grape breeding (Fig. 2).
In North America, crop-wild hybrids were mainly intercrossed or backcrossed to wild species to reinforce resistance in varieties that had to deal with high disease pressure in challenging environments. In Europe, introgression lines were initially intercrossed and then backcrossed for several generations to V. vinifera with the chief aim of improving wine quality while maintaining a sufficient level of disease resistance. These different breeding schemes determined, in comparison to F1 hybrids, an increase of the percentage of the genome carrying biallelic introgression in American resistant varieties (on the left-hand side in Fig. 2) and a progressive removal of the donor wild genome in European resistant varieties (on the right-hand side in Fig. 2). Residual wild DNA remained mostly as monoallelic introgression in the genome of European resistant varieties (Fig. 2) because it was normal practice for breeders to perform at least one generation of backcross to V. vinifera prior to varietal selection. This trend has been partially reversed only recently by the resumption of intercrosses between introgression lines, with the aim of stacking multiple resistance genes originally present in different lineages, and the release of selected varieties directly from an intercross generation (Fig. 2, Artaban and Floréal).
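The aggregate lengths per introgression class and a genome-wide vinifera fraction can be derived from per-window labels, as in the minimal Python sketch below. The conversion from classes to vinifera DNA is an assumption made here: introgression-free windows count as two vinifera copies, monoallelic windows as one, and all other classes as none. The text does not state how the reported percentages were computed.

```python
from collections import Counter

WINDOW_BP = 100_000  # matches the 100-Kb resolution mentioned in the text

# Assumed number of vinifera copies per window class (illustrative convention)
VINIFERA_COPIES = {
    "free": 2,
    "monoallelic": 1,
    "monoallelic_homozygous": 0,
    "biallelic": 0,
}

def summarize_genome(window_classes, window_bp=WINDOW_BP):
    """Return (aggregate length per class in bp, vinifera fraction of the diploid genome)."""
    if not window_classes:
        raise ValueError("no windows to summarize")
    counts = Counter(window_classes)
    lengths = {cls: n * window_bp for cls, n in counts.items()}
    vinifera = sum(VINIFERA_COPIES[cls] * n for cls, n in counts.items())
    return lengths, vinifera / (2 * len(window_classes))

# Toy genome of ten windows
classes = ["free"] * 7 + ["monoallelic"] * 2 + ["biallelic"]
lengths, vinifera_fraction = summarize_genome(classes)
```

Under this convention an F1 hybrid, monoallelic across its whole genome, has a vinifera fraction of 0.5, and the toy genome above has 0.8.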

Recent European resistant varieties
We dedicated special attention to the genome analysis of recent resistant varieties that have been released in Europe since the 2000s, which we sorted in three groups based on their wild ancestry components (Fig. 3).
The first group includes the latest backcross generation of lineages that derived their wild ancestry only from American Vitis species (Fig. 3a). In this group, the percentage of vinifera DNA is on average 84.0%, ranging from the highest 85.2% in "Cabernet Eidos" to the lowest 82.8% in "Sauvignon Rytos". The residual wild donor genome proportion in these backcrosses was reduced slightly less than expected without any selection when compared to their resistant parent "Bianca" (71.8% vinifera DNA, Fig. 3b). For comparison, the variety "Regent," a benchmark for European resistant varieties with American Vitis wild ancestry, has 79.8% vinifera DNA and its resistant parent "Chambourcin" has 55.9% vinifera DNA (Fig. 3c).
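The expectation "without any selection" used in this comparison follows from simple Mendelian averaging: an unselected offspring inherits, on average, half of each parent's genome. The Python sketch below applies this to the figures quoted above, assuming that the second parent of each cross is pure V. vinifera (an assumption for "Regent", whose second parent is not named in this text).

```python
def expected_vinifera_after_cross(parent_a, parent_b=1.0):
    """Expected vinifera fraction in unselected offspring.

    parent_a, parent_b: vinifera fractions of the two parents;
    parent_b defaults to a pure V. vinifera recurrent parent (1.0).
    """
    return (parent_a + parent_b) / 2.0

# Values quoted in the text (vinifera fraction of the resistant parents)
expected_from_bianca = expected_vinifera_after_cross(0.718)
expected_from_chambourcin = expected_vinifera_after_cross(0.559)

# Observed values quoted in the text
observed_bianca_progeny_mean = 0.840
observed_regent = 0.798

shortfall_bianca_progeny = expected_from_bianca - observed_bianca_progeny_mean
```

For the offspring of "Bianca" the expectation is about 85.9%, slightly above the observed mean of 84.0%, which is consistent with the statement that the wild genome was reduced slightly less than expected. For a backcross of "Chambourcin" the same rule gives about 78.0%, compared with 79.8% observed in "Regent".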
The second group includes varieties that derived their wild ancestry from both American and Asian Vitis species (Fig. 3d), as a result of the efforts to combine resistances of different origin, which started in Germany in the 1970s and in Serbia in the 1980s. In the Serbian lineage, the American Vitis ancestry was donated by "Bianca" and the Asian ancestry was donated by the breeding line "SK77-4/5", which was selected from a BC2 generation of the interspecific hybrid V. amurensis × V. vinifera, generating the resistant parent of the lines tested here (Soreli, Fleurtai, Cabernet Volos, Merlot Khorus, Merlot Kanthus, Sauvignon Kretos). The percentage of vinifera DNA in this group has increased to an average of 87.1% compared to the previous group, ranging from a maximum of 90.7% in "Soreli" to a minimum of 85.2% in "Merlot Kanthus", despite the fact that "Bianca" was intercrossed with a second introgression line before being backcrossed to generate these varieties. Since all the introgression lines in these two groups except "Chambourcin" were selected from backcross generations, they carry only monoallelic introgression and all genes carry at least one vinifera allele (Fig. 2).
The third group includes varieties that derived their non-vinifera ancestry from interspecific and intergeneric introgression (Fig. 3e), as a result of the pyramiding efforts to stack Muscadinia and Vitis resistance genes, which started in Hungary and Serbia in 1999 [63] [64], and in France and Germany in 2000 [65]. "Artaban" and "Floréal", which descended from the French-German lineage, combine Muscadinia and American Vitis wild ancestry and contain 90.1% and 76.5% vinifera DNA, respectively. While both of them were selected from an intercross generation, "Artaban" carries only monoallelic introgression and "Floréal" carries biallelic introgression at only 2.5% of the genes. "Pinot Iskra" and "Kersus", which descended from the Hungarian-Serbian lineage, combine Muscadinia, American Vitis and Asian Vitis wild ancestry and contain 92.2% and 94.8% vinifera DNA, respectively. Both "Pinot Iskra" and "Kersus" were selected from a backcross generation and therefore they carry only monoallelic introgressions (Fig. 2).

Grape ancestry and wine expert ratings
Wine ratings were assigned by expert panels [66] to the standardized product of eight introgression lines, whose ancestry is shown in Fig. 3, and seven benchmark V. vinifera varieties (see Supplementary Note 1 for details on the choice of this set). The experts evaluated aroma intensity and finesse, structure, complexity, familiarity, liking and expressed their propensity to use the variety or the varietal wine in their business, hereafter referred to as buying behavior (Supplementary Note 1). All white wines from introgression lines ranked intermediate with respect to V. vinifera controls (Fig. S7a and Fig. S10). In the 2015 vintage, Sauvignon Rytos, the introgression line with the lowest V. vinifera ancestry, ranked highest among the wines obtained from introgression lines. The Sauvignon Rytos 2015 was second only to a genuinely V. vinifera wine obtained from the Sauvignon Blanc clone FVG191a, which is characterized by intense tropical and peachy notes, while it outranked the more vegetal and pungent wine obtained from the Sauvignon Blanc clone FVG195. In the 2016 vintage, a season less conducive to varietal aroma intensities that resulted in a whole-panel appreciation more dictated by wine structure (Figs. S8-S9), the Sauvignon Kretos and Sauvignon Blanc wines contended for the highest ranking (Fig. S7b). As for red wines, the panels showed higher inclination to prefer young wines (from vintage 2015 and evaluated in 2017) from genuinely V. vinifera varieties in comparison with those obtained from introgression lines (all ranging from 85.2 to 86.8% V. vinifera ancestry), with the notable exception of "Merlot Khorus" (Fig. S7c). The inclination of the panels was reversed for aged red wines (from vintage 2016 and evaluated in 2021, Fig. S7d).

Linkage drag around R-genes and recovery of the recurrent genome
Mildew resistance relies on effector-triggered immunity (ETI), which is conferred by the R-gene products of wild alleles at the Rpv3-1, Rpv12, Ren3/9, Run1/Rpv1 loci. The introgression of R-alleles in the resistant varieties of this study came at the cost of dragging 13.6-15 Mb of wild DNA around Rpv3-1, 11.5-22.6 Mb around Rpv12, 13.2-17.7 Mb around Ren3/9, and 7.1 Mb around Run1/Rpv1 (Table S4). As expected, the carrier wild chromosomes were shortened with backcrossing to a lesser extent than non-carrier wild chromosomes (Figs. S11, S13, S15, S17). Individual and cumulative length of target and non-target wild chromosome segments was similar to computer simulation of forward-in-time backcross generations (Figs. S12, S14, S16, S18).
Full agreement between real and simulated backcrosses was found for resistant lineages containing an ETI gene that alone explained a high proportion of the phenotypic variance for mildew resistance (Rpv1 [67], 73%; Rpv12 [26], 78.7%; Run1 [68], complete resistance). This occurred with resistant varieties that inherited Rpv12 from V. amurensis and Run1/Rpv1 from M. rotundifolia. Partial disagreement was found for lineages containing ETI genes from American Vitis that trigger the defense response but explain a lower proportion of the phenotypic variance under field conditions (33.9-50% among years for Rpv3-1-dependent downy mildew resistance [69], 44.5-59.4% among years and 23.1-63.8% at different phenological stages for Ren3/9-dependent powdery mildew resistance [27,70]). These introgression lines have retained more donor background than expected by simulation, both in terms of aggregate length of non-vinifera DNA and in terms of individual length of certain chromosome segments. In Rpv3-1 and Ren3/9 carriers such as "Regent", "Bianca" and "Sauvignon Rytos", only 14.7-19.6% of the wild ancestry is due to linkage with ETI wild alleles (Fig. 3 and Table S4) and the predominant part of wild ancestry is still due to wild background. Minor QTLs, including one on chromosome 7 [69], were shown to contribute to the resistance phenotype in Rpv3-1 segregating progenies [71,72]. The genetic background could indeed contain factors acting downstream of ETI genes that modulate the strength of the ETI response or operate subsidiary defense responses [73]. It is possible that positive selection for minor effects (controlled by non-vinifera alleles and contributing to the expression of higher levels of field resistance under a polygenic model) has counterbalanced the negative selection that is presumed to act against wild background DNA during simultaneous selection for resistance and value for cultivation and use (VCU testing).
Two or more R-genes were historically stacked in various combinations [64,74]. The intercrosses for gene pyramiding often combined lineages at a different stage of backcrossing. The genes Rpv3-1 and Ren3/9 were stacked in the same genotype at very early stages of backcrossing several decades ago, combining downy and powdery mildew resistance in a single individual. In this lineage, many rounds of simultaneous backcrossing did not allow a high recovery of the recurrent genome in the subsequent generations (Figs. S15-S18). Further stacking of Rpv12 and Run1/Rpv1 genes was accomplished using donors of resistance that already harbored a highly vinifera background, which did not cause an increase of the wild ancestry in pyramided lines. In "Artaban", the presence of the resistance haplotypes Rpv3-1, Ren3/9 and Run1/Rpv1 accounts for 46.1% of the residual wild ancestry. In "Soreli", the presence of the resistance haplotypes Rpv3-1 and Rpv12 accounts for 47.8% of the residual wild ancestry. In Pinot Iskra, the presence of the resistance haplotypes Rpv12, Ren3/9 and Run1/Rpv1 accounts for 62.0% of the residual wild ancestry.
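The forward-in-time backcross simulation mentioned in this section can be approximated by a very small model, sketched here in Python. It is not the simulator used for Figs. S12-S18. It assumes a uniform recombination rate, no crossover interference (crossovers follow a Poisson process along the chromosome), selection only for the presence of the resistance allele, and it tracks only the contiguous wild segment that contains the locus. The chromosome length, locus position and the `cm_per_mb` default are hypothetical values.

```python
import random

def drag_length_after_backcrossing(chrom_len_mb, locus_mb, generations,
                                   cm_per_mb=2.0, rng=None):
    """Length (Mb) of the wild segment around a selected locus after n backcrosses.

    Starts from an F1 carrier whose wild homolog spans the whole chromosome.
    In each generation the segment is trimmed at the crossover nearest to
    the locus on either side, if one falls inside the current segment.
    """
    rng = rng or random.Random()
    rate = cm_per_mb / 100.0  # expected crossovers per Mb per meiosis
    left, right = 0.0, chrom_len_mb
    for _ in range(generations):
        left = max(left, locus_mb - rng.expovariate(rate))
        right = min(right, locus_mb + rng.expovariate(rate))
    return right - left

def mean_drag(chrom_len_mb, locus_mb, generations, replicates=1000, seed=1):
    """Average drag length over independent simulated lineages."""
    rng = random.Random(seed)
    total = sum(
        drag_length_after_backcrossing(chrom_len_mb, locus_mb, generations, rng=rng)
        for _ in range(replicates)
    )
    return total / replicates

# Hypothetical 30-Mb chromosome with a resistance locus at 25 Mb
mean_bc1 = mean_drag(30.0, 25.0, generations=1)
mean_bc6 = mean_drag(30.0, 25.0, generations=6)
```

Comparing observed segment lengths with the simulated distribution for the corresponding backcross generation would be one way to detect lineages that retained more donor DNA than expected from linkage alone.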

Donor species of resistance haplotypes and wild genetic background
We first set up and validated a pipeline for identifying the donor species of the wild genome in V. vinifera × Vitis sp. hybrids (Fig. S19), based on the highest IBSR H score with a wild species of the non-vinifera diversity panel. The use of such an approach is justified by the fact that the non-vinifera diversity panel was designed to include all the known donor species utilized in grapevine breeding programs and to span the full breadth of Vitis diversity. We found agreement with the reported pedigree in known V. vinifera × Vitis sp. hybrids (Fig. 4a and Fig. S19a), genome-wide consistency for unknown V. vinifera × Vitis sp. hybrids (Fig. 4b and Fig. S19b), and reliable prediction for Vitis sp. × Vitis sp. hybrids between closely related wild species that are commonly used in viticulture as rootstocks (Fig. 4c and Fig. S19c). We then used the same pipeline to identify the origin of wild haplotypes in introgression lines. We clearly detected the contribution of three highly differentiated gene pools in the resistant varieties with stacked American Vitis, Asian Vitis and Muscadinia introgressions (exemplified by Pinot Iskra in Fig. 5d). The donor species of the wild Muscadinia and Asian Vitis ancestry in the resistant varieties of this study were confirmed to be consistent with pedigree-based information, which reported the origin of the Rpv12 and Run1/Rpv1 carrier chromosomes as well as the associated wild background from V. amurensis and M. rotundifolia, respectively.
The resistant varieties with American Vitis ancestry showed, instead, multiple introgressions from different species belonging to the American gene pool (Fig. 5a-b), as a result of their complex pedigree. Most of the wild DNA in American Vitis introgression lines that carry Rpv3-1 and Ren3/9 is derived from the series Ripariae, with highest IBSR H scores with the species V. rupestris and Vitis acerifolia, and to a lesser extent from the series Aestivales. These introgression lines showed haplotype sharing across the entirety of chromosome 7 and the pericentromeric region of chromosome 18 (upstream of the Rpv3-1 locus) with a genuine American genome (Fig. S20), corresponding to the DNA sample TA-145 "Couderc" (Fig. 4c and Fig. S19c) sequenced by Liang and coworkers [55]. Those authors associated the DNA sample TA-145 with one of the earliest (V. rupestris × Vitis lincecumii) × V. vinifera hybrids obtained in France to combat phylloxera and mildews. We disproved the identity of TA-145 DNA because the corresponding specimen has no vinifera ancestry and may have been sampled from the wild parent "Munson", which is a recurrent line in the pedigree of Rpv3-1 and/or Ren3/9 carriers, or from another American stock initially used for hybridization in Europe. Despite this uncertainty, the TA-145 DNA certifies the presence in an American Vitis accession of large haplotypic blocks that are still shared with the most recent backcrosses of the Rpv3-1 and Ren3/9 lineages.
Even more interesting is the ancestry of the Ren3/9 and Rpv3-1 donor segments [22,27,70]. The Ren3/9 haplotype on the upper arm of chromosome 15 showed the highest IBSR H with V. aestivalis. The Rpv3-1 haplotype that spanned the lower arm of chromosome 18 was unique in showing low IBSR H scores with Vitis and Muscadinia accessions in the non-vinifera reference panel (Fig. S21). Weak matches were found with two divergent American accessions of V. cinerea and with Asian species, suggesting that this resistance haplotype, while being certainly introgressed into breeding lines from American native grapes, has an archaic ancestry. We used resequencing data from an Rpv3-1 homozygous line (UD-21076) [22] to perform a phylogenetic analysis, which confirmed that the chromosome segment carrying Rpv3-1 forms a divergent branch stemming from Asian Vitis lineages (Fig. S22). It is therefore possible that the Rpv3-1 haplotype predates the Asian-American divergence and has survived in American populations due to the selective advantage conferred by the resistance allele in a natural environment infested with Plasmopara viticola, or that it originates from a marginal habitat species, genetically very different from all those represented in our panel. The first hypothesis is fully consistent with our previous data on insertion/deletion polymorphisms [75] that showed substantial levels of shared variation between Rpv3-1 and Asian haplotypes. We searched for confirmation that Asian ancestry is detectable rarely in American native germplasm but not exclusively in Rpv3-1 carriers and not only in the Rpv3-1 region. Both Bayesian clustering and TreeMix analysis provided genome-wide evidence for Asian-American admixture in Southern US grape species (Fig. S5 and Fig. S23), including two accessions of the V. cinerea complex [76].
The second hypothesis, which is however not mutually exclusive with the first one, may find support in the recent finding that the US region of Texas and neighboring states represent a transition zone between eastern and western American Vitis germplasm [77] and are home to the highest richness in Vitis diversity, as they provided refugia during glacial displacement [76], including species in which fungal resistant accessions can be found at highest frequencies [78]. It is in this geographic area, from which much of the starting material for European resistance breeding was dispatched [79,80], that relict genetic lineages and archaic haplotypes are more likely to have survived.

[Figure caption fragment: ... Fig. S6. Asterisks indicate the position of the resistance alleles Rpv1/Run1 (chr12), Rpv12 (chr14), Ren3/9 (chr15) and Rpv3-1 (chr18) on the corresponding resistance haplotypes.]

Historical American resistant varieties
In Europe, the quest for resistant varieties with higher wine quality fostered varietal replacement as soon as advanced generations of backcrosses became available. As a result, the oldest selections disappeared from the vineyards and from the breeders' collections (e.g. Emily, Herbemont d'Aurelles, Seibel 752, Seibel 4595, Seibel 4911, Seibel 2003 × berlandieri), disposing of germplasm that could help illuminate the earliest steps in resistance breeding. In the US, however, resistant varieties that were selected from the earliest crosses are still cultivated. Old varieties such as "Norton" and "Concord" are still widely grown, but their origin and ancestry are only partially known from old records and recent molecular analyses [81][82][83][84].
According to the breeder's note, "Norton" originated in the 1820s from a backcross of an alleged Vitis labrusca × V. vinifera F1 hybrid called "Bland" to V. vinifera (Supplementary Note 4). This origin would imply the presence of monoallelic introgression across approximately 25% of the Norton genome, which conflicts with the genomic constitution reported in Fig. 6a. Ambers [85] noticed that the Norton pedigree is suspicious in multiple aspects (Supplementary Note 4) and proposed an alternative hypothesis, according to which a V. aestivalis × V. vinifera F1 hybrid was backcrossed to a different V. aestivalis accession to generate "Norton". This origin would result in an equally represented mixture of monoallelic and biallelic introgressions across the Norton genome (with regions of biallelic introgression carrying non-vinifera SNPs predominantly in heterozygous state) and in the complete absence of pure-vinifera chromosomes. We provide evidence that supports Ambers' hypothesis. In fact, 53.8% of the Norton genome shows biallelic introgression while the remaining 46.2% shows monoallelic introgression. "Norton" shows the highest IBSR H scores with accessions of V. aestivalis across 35.2% of the genome length, followed by V. × slavinii (20.9%), V. rupestris (19.5%), V. labrusca (11.4%) and V. × novae-angliae (5.6%) (Fig. 6b).
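The expectation stated for Ambers' hypothesis can be checked with a short calculation. The sketch below is only an illustration and is not part of the method used in this study: the function name `expected_classes` is hypothetical, and it assumes that the two gametes are drawn independently and that a gamete from an F1 hybrid carries wild DNA over half of its length on average.

```python
from fractions import Fraction


def expected_classes(wild_p1, wild_p2):
    """Expected genome fractions (free, monoallelic, biallelic) in the offspring.

    wild_p1, wild_p2: expected fraction of each parental gamete that is of
    wild (non-vinifera) ancestry. Gametes are assumed to be independent.
    """
    free = (1 - wild_p1) * (1 - wild_p2)
    mono = wild_p1 * (1 - wild_p2) + wild_p2 * (1 - wild_p1)
    bi = wild_p1 * wild_p2
    return free, mono, bi


# Assumed gamete compositions (illustrative, not taken from the paper's data)
F1_GAMETE = Fraction(1, 2)   # gamete of a wild x vinifera F1 hybrid
WILD_GAMETE = Fraction(1)    # gamete of a pure wild accession

# Ambers' hypothesis: (V. aestivalis x V. vinifera) F1 x V. aestivalis
free, mono, bi = expected_classes(F1_GAMETE, WILD_GAMETE)
print(f"free of introgression: {float(free):.1%}")
print(f"monoallelic:           {float(mono):.1%}")
print(f"biallelic:             {float(bi):.1%}")
```

Under these assumptions the expected split is half monoallelic and half biallelic with no introgression-free fraction, which is the pattern the observed values (46.2% and 53.8%) are compared against in the text.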
"Concord" was reported to be raised in the 1840s from seeds of a V. labrusca accession trellised next to a "Catawba" stock [86]. We could confirm that "Concord" has a parent-offspring relationship with "Catawba" (Fig. S24), which is an F1 hybrid that derived half of the ancestry from a pure American grape, most likely V. labrusca (Fig. 4a), and the other half from V. vinifera "Sémillon" (Fig. S24). The second parent of "Concord" is most likely another accession of V. labrusca, different from the progenitor of "Catawba", because the highest IBSR H values were most frequently scored with V. labrusca throughout the genome and a substantial fraction of non-vinifera SNPs in regions of biallelic introgression were found in heterozygous state (Fig. 6a). The availability of genomic data from an F1 hybrid (Catawba), its pseudo-backcross to a wild accession (Concord) and a self-pollination of "Concord" (Worden) also offered us the possibility of monitoring the results of artificial selection after an event of interspecific hybridization. The genomic constitution of "Worden" appears to be very distant from the result expected under random assortment and recombination of chromosomes during meiosis (Fig. 6a, b). While one would expect a reduction of the amount of wild DNA in an S1 individual phenotypically selected with the aim of improving crop traits, the opposite has occurred with "Worden". "Concord" is heterozygous across 95.7% of the genome length and has monoallelic introgression across 54.9% of the genome length, offering the possibility of obtaining a complete recovery of the vinifera genome across those regions upon selfing. In "Worden", this recovery occurred in 1.4% of the genome length and involved a single tract of DNA containing 450 genes, exactly across a domestication locus on chromosome 17 [52,53]. 
Regions with biallelic introgression increased from 45.1% of the genome length in "Concord" to 60.7% in "Worden", which is counterintuitive if the selection applied by the breeder was aimed solely at the improvement of crop traits, which we assume to benefit from a genome-wide increase in vinifera alleles. Another rarity in "Worden" is the absence of assortment and recombination in as many as 13 out of 19 pairs of "Concord" parental chromosomes, which resulted in "Worden" having 83% of its genome identical to that of "Concord" against a 50% expectation. "Worden" and "Concord" show IBD = 1 across the remainder of the genome (Fig. 6c) and, across all those regions but the domestication locus, "Worden" is homozygous for non-vinifera alleles. The presence of heterozygous inversions, either paracentric or pericentric, such as those frequently observed even in V. vinifera intraspecific comparisons [87], could explain the observed lack of recombination along chromosomes but not the lack of homozygosity and the maintenance of the same chromosome combination found in the parental line. We therefore interpreted this dual-pronged deviation from random expectations as the result of a strong negative postzygotic selection against multiple recessive deleterious alleles that reside in repulsion on both parental chromosomes and result in pseudo-overdominance [88] as well as in selection against homozygous combinations of parental chromosomes. The transmission of reassorted or recombinant chromosomes from an interspecific hybrid, indeed, implies that S1 zygotes carry tracts of homozygous DNA, unless the gamete carrying recombinant chromosomes fuses with another gamete carrying reciprocal crossovers, which is nearly impossible.
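To make the "random expectation" for a self-pollination concrete, the following sketch applies a 1:2:1 Mendelian segregation to the regions of monoallelic introgression reported for "Concord". It is an illustration under simplifying assumptions (no selection, independent segregation of every window, regions of biallelic introgression remaining biallelic after selfing); the function name `selfing_expectation` is hypothetical.

```python
def selfing_expectation(free, mono, bi):
    """Expected genome percentages in an S1 individual under neutrality.

    A window with monoallelic introgression segregates 1:2:1 upon selfing:
    1/4 becomes free of introgression, 1/2 stays monoallelic and 1/4 becomes
    biallelic. Free and biallelic windows are assumed to keep their class.
    """
    exp_free = free + mono / 4
    exp_mono = mono / 2
    exp_bi = bi + mono / 4
    return exp_free, exp_mono, exp_bi


# Percentages of genome length reported for "Concord" in the text
concord = {"free": 0.0, "mono": 54.9, "bi": 45.1}
exp_free, exp_mono, exp_bi = selfing_expectation(**concord)

# Observed value for "Worden" reported in the text
observed_free_worden = 1.4

print(f"expected free of introgression: {exp_free:.1f}%")
print(f"observed in Worden:             {observed_free_worden:.1f}%")
print(f"expected monoallelic:           {exp_mono:.1f}%")
print(f"expected biallelic:             {exp_bi:.1f}%")
```

The comparison of the expected and observed introgression-free fractions gives a rough sense of how far "Worden" departs from a neutral S1; it does not model linkage within chromosomes.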

Taxonomic treatment of questionable specimens and their impact on population genetics analyses
The DNA samples "Turkmenistan 1" and "Pakistan 2" showed local densities of private SNPs far higher than those found in any tested accession of V. vinifera subsp. sylvestris [89,90] (Table S1 and Fig. S25) and only comparable to those observed in interspecific hybrids or in chromosomal segments with interspecific introgression in resistant varieties (Fig. S26). "Turkmenistan 1" carries monoallelic introgression across 84.4% of the haploid genome length. Across these regions, "Turkmenistan 1" showed highest IBSR H scores with V. labrusca and V. rupestris TA-7 or with other American Vitis species (Table S5). "Turkmenistan 1" has likely originated from an interspecific hybridization followed by one or two rounds of backcrossing to V. vinifera. "Pakistan 2" carries monoallelic introgression across 29.3% of the genome and 5.5 Mbp of biallelic introgression. "Pakistan 2" showed highest IBSR H scores with Asian species (V. quinquangularis, V. coignetiae, V. romanetii, V. amurensis, in decreasing order of cumulative window length) and with two accessions of the American species V. cinerea, but in all cases with low IBSR H values of the best matching accession, ranging from 0.81 to 0.89 (Table S5). "Pakistan 2" is therefore an introgression line that has derived its non-vinifera ancestry from a lineage of the Asian gene pool substantially different from those captured by our reference panel. It is possible that this contribution came from V. jacquemontii [91,92], the only species endemic to the south of the Himalayas [92] (the presumed area of origin of "Pakistan 2"), which is reported to be distantly related to the East Asian species included in our study and more closely related to the European species [92] (see Fig. S26 for the density of homozygous non-vinifera SNPs in regions of biallelic introgression). "Turkmenistan 1" and "Pakistan 2" were included in what was taken for granted as a representative sample of nine V. vinifera subsp. sylvestris accessions used for inferring the history of grape domestication [54]. The inadvertent inclusion of two interspecific introgression lines led Zhou and coworkers to state that nucleotide diversity (π) is higher in sylvestris than in sativa [54]. The removal of these introgression lines controverts their conclusion. We estimated π = 6.48 × 10⁻³ in the sativa panel of Zhou and coworkers [54] versus π = 7.18 × 10⁻³ in their sylvestris panel inclusive of "Turkmenistan 1" and "Pakistan 2", a value that dropped to 6.45 × 10⁻³ using only genuine sylvestris. We found other examples of sample curation errors in phylogenetic studies [93] that have included Vitis sp. × V. vinifera hybrids and V. vinifera introgression lines among genuine American grape species (Supplementary Note 2). We also spotted inaccurate sample naming and inappropriate specimen-metadata associations in resequencing datasets [55] that may foreshadow more extensive curation errors in public databases.
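The effect of a misclassified sample on nucleotide diversity can be illustrated with a toy calculation. The sketch below uses the standard per-site estimator of π with a sample-size correction; it is an assumption that this matches the estimator used for the values above, since the text does not specify it. The function name, the genotype matrix and the sequence length are all hypothetical.

```python
def nucleotide_diversity(sites, seq_length):
    """Average pairwise nucleotide diversity (pi) per site.

    sites: list of variant sites; each site is a list with the number of
           alternative alleles (0, 1 or 2) carried by each diploid sample.
    seq_length: number of callable sites, variant and invariant.
    """
    total = 0.0
    for genotypes in sites:
        n = 2 * len(genotypes)           # number of sampled chromosomes
        p = sum(genotypes) / n           # alternative allele frequency
        total += 2 * p * (1 - p) * n / (n - 1)
    return total / seq_length


# Toy data: three genuine samples plus one introgression line (last column)
# that carries private SNPs at sites 2-4. All values are made up.
toy_sites = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 2],
]
TOY_LENGTH = 1000

pi_all = nucleotide_diversity(toy_sites, TOY_LENGTH)
pi_clean = nucleotide_diversity([site[:3] for site in toy_sites], TOY_LENGTH)
print(f"pi with the introgression line:    {pi_all:.2e}")
print(f"pi without the introgression line: {pi_clean:.2e}")
```

In this toy example the private variants of a single introgressed sample are enough to raise π for the whole panel, which is the kind of bias described above.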

Discussion
The boundaries around the group of individuals that collectively form a crop species are blurred in the absence of reproductive barriers with close relatives. In grapevine, the species assignment of individual genotypes not only influences the naming of plants but also has implications for the commercial value of plant-derived products. Divergent opinions on grapevine classification have resulted in the lack of an agreed definition of V. vinifera, sowing confusion in national regulatory bodies and leading to different treatments of the same specimen in different countries. On the conservative side of this debate, it is stated that grape varieties with any degree of non-vinifera ancestry must be considered interspecific hybrids [94] (also called cépages hybrides in France [95]), with the consequence of precluding their use for the production of protected denomination of origin wines in the European Union. This position is motivated by a fascination with the legacy of the past and by concessions to the world's finest wine regions for preserving the dominant role of traditional varieties, but it does not seem to find support from blind wine tastings (Figs. S7 and S10). On the other side of this debate, the use of internationally agreed morphological descriptors was proposed to assign novel varieties to botanical groups on a phenotypic basis, with the possibility of granting a V. vinifera classification to introgression lines in the absence of non-vinifera traits and with the aim of offering more planting options to vine growers without penalizing those who switch to disease resistant varieties. As of June 2021, the International Organisation of Vine and Wine (OIV) is working on a resolution to craft widely accepted taxonomic and commercial definitions of V. vinifera [50].
Population genetics, phylogenetic reconstruction and genomic analyses can provide insightful data to solve the species assignment problem and to help find the best classification criteria that could be applied to V. vinifera. The transition from a phenotype-based classification to the use of a genomic species definition has revolutionized microbial taxonomy [96], thanks to the possibility of establishing systematics on the basis of whole genome information, therefore achieving unprecedented accuracy and detail [97,98]. Here, we offer the grape community an effective method, based on low-cost whole-genome sequencing, that provides an accurate measurement of tract lengths of introgressed DNA and identifies the ancestry of each introgression, without requiring de novo genome assemblies. Methods for detecting genomic introgression have already been used in plant species. They are based on two different approaches. One approach compares populations using differentiation-based and phylogeny-based metrics and has been extensively used for studying natural introgression in the context of crop adaptation [99,100]. The other approach applies to individuals and uses genomic ancestry deconvolution to assign genomic regions to reference panels based on local patterns of allele or haplotype sharing. This approach is more appropriate for studying wild introgression as a result of crop improvement. Sawler and coworkers [56] used a PCA-based ancestry estimation to calculate ancestry coefficients in hybrid grapes, which were validated by the membership coefficients assigned by the Bayesian clustering approach [101]. Sawler's ancestry coefficient is accurate but it only expresses the proportion of variant sites carrying vinifera alleles across the diploid genome upon an independent treatment of each SNP genotype. Our method expresses the relative proportion of vinifera ancestry either as the percentage of vinifera DNA per diploid genome (i.e. over the diploid genome length), which informs on the vinifera ancestry, or as the percentage of the haploid genome length that is free of introgression, which informs on the fraction of the genome that is completely purged from interspecific wild DNA (Table S3). Good agreement in the estimates of vinifera ancestry was found for all the introgression lines that were in common between Sawler's panel and ours. "Regent" was estimated to have 68% vinifera ancestry according to Sawler's coefficient [57] versus either 79.8% or 59.7% according to the method presented in this paper, depending on whether the vinifera ancestry is expressed over the diploid genome length or as the free-of-introgression fraction of the haploid genome length, respectively. "Chambourcin" was estimated to have 46% vinifera ancestry according to Sawler's coefficient [57] versus either 55.9% (over the diploid genome length) or 28.3% (complete vinifera ancestry across the haploid genome length) with the method presented in this paper. "Concord" was estimated to have 31% vinifera ancestry according to Sawler's coefficient [56] versus either 27.4% (over the diploid genome length) or 0% (complete vinifera ancestry across the haploid genome length). "Norton" was estimated to have 31% vinifera ancestry according to Sawler's coefficient [57] versus either 23.1% (over the diploid genome length) or 0% (complete vinifera ancestry across the haploid genome length). "Beta" was estimated to have 11% vinifera ancestry according to Sawler's coefficient [57] versus either 17.4% (over the diploid genome length) or 0% (complete vinifera ancestry across the haploid genome length). Our method exploits multilocus genotype information that capitalizes on the power of haplotypic blocks of non-vinifera polymorphisms to infer local ancestry, providing physical length estimates.
As a result, the output for each individual genome is easily converted into a graphical layout representing the introgression map on the grape karyotype, an in silico version of chromosome painting [102]. Reconstructed ancestries and karyotypes can be used to assign individual genomes to arbitrarily defined taxa on a quantitative basis, providing ground to grape breeders, national regulatory bodies and policymakers for an evidence-informed revision of current classificatory systems and wine laws. In addition to the commercial implication for the wine industry, the species assignment problem is a threat to the validity of the scientific research that is based on comparative analyses of genetic resources. The application of this method unveiled inaccuracy in the taxonomic attribution of plant specimens, which had gone undetected in the absence of a careful inspection of their sequence data, spoiling the robustness of population genetics estimates in V. vinifera [54] and providing possible ground for error propagation [93,103].
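The two ways of expressing vinifera ancestry discussed above can be derived from the same window classification. The sketch below is a minimal illustration, assuming that each genomic window has already been assigned a physical length and a number of introgressed haplotypes (0, 1 or 2); the function name and the input layout are hypothetical and the window lengths are arbitrary units chosen to reproduce the percentages reported for "Norton".

```python
def vinifera_ancestry(windows):
    """Return (% vinifera DNA per diploid genome, % haploid length free of introgression).

    windows: list of (length, n_introgressed) tuples, where n_introgressed is
             0 (free of introgression), 1 (monoallelic) or 2 (biallelic).
    """
    total = sum(length for length, _ in windows)
    free = sum(length for length, n in windows if n == 0)
    wild = sum(length * n for length, n in windows)  # wild DNA over both haplotypes
    pct_diploid = 100 * (1 - wild / (2 * total))
    pct_free = 100 * free / total
    return pct_diploid, pct_free


# "Norton": 46.2% monoallelic and 53.8% biallelic introgression (from the text),
# encoded here as two pooled windows in arbitrary length units.
norton_windows = [(462, 1), (538, 2)]
norton_diploid, norton_free = vinifera_ancestry(norton_windows)
print(f"vinifera DNA per diploid genome: {norton_diploid:.1f}%")
print(f"haploid length free of introgression: {norton_free:.1f}%")
```

A window with monoallelic introgression still contributes half of its diploid DNA to the vinifera fraction, which is why a genome with no introgression-free window can nonetheless have a non-zero percentage of vinifera DNA.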
Lastly, we used the introgression maps and local-ancestry inference of disease resistant varieties to offer grape breeders a genome-based revision of the historical achievements, i.e. the extent of recovery of the recurrent genome, and limitations of empirical breeding, i.e. the persistent linkage drag around resistance haplotypes. In light of the genomic consequences of introgressive breeding and the behavior of target gene-carrier chromosomes and non-target chromosomes that we showed here, it will be possible to optimize the breeding design for generating mosaic genotypes of beneficial wild alleles and vinifera DNA with minimal retention of wild background. Real and simulated data suggest that the highest recovery of the V. vinifera genome can be achieved most efficiently by prioritizing the elimination of the linkage drag around the ETI-genes on the target donor chromosomes, and that genomics-assisted selection is necessary for accomplishing this task. There is compelling evidence for this need because all target R-loci examined in the resistant varieties of this study are located in regions of low recombination (Figs. S11, S13, S15, S17). Data also suggest that pyramiding strategies are more likely to be accompanied by the expected reduction of the background genome if each donor genome is backcrossed stepwise prior to stacking rather than by schemes of simultaneous or convergent backcrossing. With the method presented in this paper, the estimated physical length of each introgressed region, including those spanning known resistance loci, relies on the level of completeness of the grapevine genome assembly that serves as a reference. We obtained all our estimates using the most widely used version of the genome assembly, which had been obtained from a nearly-homozygous line [104] (PN40024, version 12X.v0).
This assembly is haplotype-resolved by definition except for a few regions of residual heterozygosity, and allelic scaffolds originating from heterozygous regions are not included in the chromosome pseudomolecules. Minor assembly issues in the reference genome and structural variation due to presence/absence of dispensable DNA in each individual affect the physical length of each genomic window used for ancestry assignment, which is here calculated based on the reference genome and assumed for further calculations to be invariant among individuals. We expect that these unaccounted sources of variation act randomly in direction and magnitude in every genomic window, with a negligible net effect on the genome-wide estimate of vinifera ancestry. We showed that vinifera ancestry estimates differ in both directions in a validation set of 12 introgression lines (Table S6), with a median variation of −0.02% and with a maximum discrepancy of −2.1% in a single variety (Chambourcin), when using another version of the reference genome assembly [105] that included more sequence scaffolds in the chromosome pseudomolecules, resulting in a 5.6% increase of anchored nucleotide sequence. While underscoring the accuracy of our current vinifera ancestry estimates at the present state of knowledge, we acknowledge that the method presented here will provide ultimate precision once telomere-to-telomere haplotype-phased assemblies of multiple grapevine genomes become available to the grapevine community.

DNA and RNA sequencing
Plant material and publicly available sequences used in this study are listed in Table S1 and Table S7. DNA libraries were generated according to the procedure described in Supplementary Note 1 and 100 bp paired-end DNA reads were obtained using an Illumina HiSeq 2500 sequencer. RNA was extracted from apical leaves using the Spectrum Plant Total RNA Kit (Sigma-Aldrich, Saint Louis, MO). RNA libraries were generated using the Universal Plus mRNA-Seq Library Preparation Kit (Tecan Genomics, Redwood City, CA) and 150 bp paired-end RNA reads were obtained using an Illumina NovaSeq 6000 sequencer.

Bioinformatics analysis
Genomic coordinates refer to the V. vinifera "PN40024" 12Xv0 genome assembly (GCA_000003745.2). DNA reads were aligned to the reference genome using the Burrows-Wheeler Aligner [106]. Uniquely mapping DNA reads with a mapping quality >10 were retained. RNA reads were aligned using STAR [107]. Uniquely mapping RNA reads with a mapping quality of 255 were retained. Raw variants were called using the UnifiedGenotyper tool in GATK [108] version 3.3-0 with the heterozygosity parameter set to 0.01. Non-vinifera SNPs were defined as variant sites exceeding the diversity inventoried in the vinifera diversity panel [53]. Chromosome sequences were segmented into 2367 genomic windows of variable length, each containing 100 kb of non-repetitive DNA. Non-vinifera SNP thresholds for classifying windows with or without introgression, distributions of false vinifera and false non-vinifera discovery rates (Figs. S1-S3), DNA-Seq and RNA-Seq cross-validation (Table S8 and Figs. S27-S29), serial downsampling to simulate variable sequencing depths (Fig. S4), and IBD sharing of wild haplotypes (Fig. S30) are described in Supplementary Note 1. The identity-by-state ratio (IBSR H) was calculated using the following formula: IBSR H = (IBS2 + IBS1)/(IBS2 + IBS1 + IBS0). Gene models refer to the V2.1 gene prediction [109]. Sample curation is described in Supplementary Note 2 and Figs. S31-S43.
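The identity-by-state ratio given above can be sketched in a few lines. This is an illustration only: it assumes that IBS2, IBS1 and IBS0 are the numbers of biallelic sites in a window at which two diploid genotypes share two, one or zero alleles, and that genotypes are coded as counts of the alternative allele. Which sites enter the calculation is described in Supplementary Note 1 and is not reproduced here; the function name and the toy genotypes are hypothetical.

```python
def ibsr_h(genotypes_a, genotypes_b):
    """Identity-by-state ratio (IBS2 + IBS1) / (IBS2 + IBS1 + IBS0).

    genotypes_a, genotypes_b: equal-length lists of alternative allele counts
    (0, 1 or 2) at biallelic sites; None marks a missing genotype.
    Returns None when no site can be compared.
    """
    counts = {0: 0, 1: 0, 2: 0}
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue                      # skip sites with missing data
        shared = 2 - abs(a - b)           # alleles shared identical by state
        counts[shared] += 1
    compared = counts[0] + counts[1] + counts[2]
    if compared == 0:
        return None
    return (counts[2] + counts[1]) / compared


# Toy window: a test sample compared with one reference accession
sample = [0, 1, 2, 2, 1, None]
reference = [0, 1, 0, 2, 2, 1]
score = ibsr_h(sample, reference)
print(f"IBSR H = {score:.2f}")
```

In practice the score would be computed against every accession of the reference panel in each window, and the best-matching accession would be taken as the candidate donor lineage.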

Simulations
Individual-based simulation of forward-in-time backcross generations was performed using the computer code carebBC [110], with R-gene target positions as defined in [22,[26][27][28]70], 200 simulated individuals per generation and one individual carrying the target donor segment selected for the subsequent generation. In order to obtain a segmentation of the whole genome, we used 444 real markers with known recombination rates [105] and 514 simulated markers filling gaps and covering the telomeric ends (Supplementary Note 1). The output of carebBC was converted from genetic length to physical length.
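The principle behind linkage drag in repeated backcrossing can be illustrated with a much simpler model than the one used here. The sketch below is not carebBC and does not reproduce its settings: it follows a single randomly drawn carrier per generation instead of selecting one out of 200 individuals, works in genetic distance only, assumes crossovers without interference (one expected crossover per Morgan per gamete), and tracks only the contiguous donor tract around the target locus. All names and parameter values are hypothetical.

```python
import random


def simulate_donor_tract(target_cm, chrom_cm, generations, seed=0):
    """Track the donor tract around a target locus over backcross generations.

    Returns a list of (left, right) tract boundaries in cM, starting with the
    F1, in which the whole donor chromosome is present.
    """
    rng = random.Random(seed)
    left, right = 0.0, chrom_cm
    history = [(left, right)]
    for _ in range(generations):
        # Distance from the target to the nearest crossover on each side:
        # exponential with a mean of 100 cM under the no-interference model.
        d_left = rng.expovariate(1 / 100.0)
        d_right = rng.expovariate(1 / 100.0)
        # A crossover inside the tract trims it; one outside has no effect.
        left = max(left, target_cm - d_left)
        right = min(right, target_cm + d_right)
        history.append((left, right))
    return history


# Hypothetical chromosome of 80 cM with the resistance locus at 60 cM
history = simulate_donor_tract(target_cm=60.0, chrom_cm=80.0, generations=6)
for generation, (left, right) in enumerate(history):
    print(f"generation {generation}: donor tract {right - left:5.1f} cM")
```

Because only crossovers that fall inside the current tract can shorten it, the tract shrinks slowly once it becomes small, and more slowly still where recombination is suppressed, which is consistent with the persistence of linkage drag around resistance loci discussed in this paper.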

Expert preference testing
Expert panels were formed by wine professionals active in production, commerce, journalism, science and education as defined by [66]. The expert panels performed blind testing on randomly ordered wines from standardized microvinification of grapes harvested from introgression lines and controls grown under the same conditions (see Supplementary Note 1).