European traditional tomatoes galore: a result of farmers’ selection of a few diversity-rich loci

The high phenotypic diversity observed among European traditional tomato varieties was created by traditional farmer-driven selection by inadvertently combining a very few polymorphic loci subjected to balancing selection.


Introduction
The widespread crop plant tomato (Solanum lycopersicum L. var. lycopersicum; SLL) originated in Mesoamerica, in a region corresponding to present-day Mexico, as a result of domestication of the ancestor S. lycopersicum L. var. cerasiforme (SLC) (Blanca et al., 2012(Blanca et al., , 2015Razifard et al., 2020). Tomato was later imported to Europe. The Italian botanist Mattioli described varieties with flat, round, and segmented fruit types in 1544 (McCue, 1952). This indicated that tomato fruits of various shapes had probably arrived to Europe from America (Sahagún, 1577;Luckwill, 1943;Sanfunentes Echevarrría, 2006). The tomato was not immediately adopted for consumption by Europeans as it was considered at different times and regions as poisonous, an aphrodisiac, ornamental, valuable for sauces and soups, a miracle cure, and, finally, a fresh salad ingredient (Harvey et al., 2003). It was only as late as the mid-19th century that the tomato became a regular component of the diet in Britain and North America (Harvey et al., 2003). By contrast, the tomato was better received, extensively cultivated, and consumed as food by the 18th century in Southern Europe, which therefore could have become a secondary center of diversity (Boswell, 1937;Bauchet and Causse, 2012). As a result of this long tradition of use, a large number of traditional varieties of tomato are currently available along the Mediterranean basin, showing an impressive phenotypic diversity in terms of fruit appearance, adaptation to local conditions, and culinary use. Despite the interest in unveiling the population history and the processes that gave rise to the domestication of tomato (Blanca et al., 2015;Razifard et al., 2020), there are as yet no detailed genetic analyses of the diversification history of the European traditional tomato varieties.
The extent and type of the molecular variation in the tomato clade has been extensively analyzed in previous studies. The first molecular studies, carried out with isoenzymes, determined that the worldwide cultivated SLL was less variable than the wild Solanum pimpinellifolium (SP) and that the ancestral SLC (which includes wild, feral, and semi-domesticated accessions) was genetically closer to SLL than to SP (Rick et al., 1974;Rick and Fobes, 1975). A clear trend of diversity reduction was observed at the species/subspecies level, probably due to bottlenecks associated with migrations and to the selection pressure imposed by humans during the early stages of domestication and the development of cultivars from SP to SLC, and, lastly, to SLL (Blanca et al., 2012(Blanca et al., , 2015Razifard et al., 2020).
Several molecular studies have unveiled the worldwide genetic structure within SLL and divided it into four major groups: processing, fresh market, cherry, and traditional tomatoes (Williams and Clair, 1993;Robbins et al., 2011;Sim et al., 2011;Casals et al., 2019). The first three groups corresponded to modern tomato varieties created by breeders in the 20th century, characterized by their different culinary uses and by the introgression of genes from wild species, mainly to increase disease resistance and also to develop new cultivars. In contrast, traditional cultivars (in this study, vintage, landraces, and heirlooms are considered synonymous with traditional) are defined as those that were developed by traditional farmers by intuitive breeding and that were cultivated (indeed, some of them are still cultivated locally nowadays) before the advent of systematic breeding programs. Park et al., (2004) found genetic differentiation between traditional and modern cultivars. More comprehensive analyses using the Solanaceae Coordinated Agricultural Project (SolCAP) genotyping panel, consisting of 7720 single nucleotide polymorphisms (SNPs) (Sim et al., 2012a), confirmed the previously described fresh, processing, and traditional groups (Blanca et al., 2012(Blanca et al., , 2015Sim et al., 2012a). These studies also found two extra clusters, located between SLL and SP, corresponding to cultivated and wild cherry tomatoes (Sim et al., 2012a), and clarified the status of the cherry tomatoes (Blanca et al., 2012(Blanca et al., , 2015-some of them being SLC originating from South America, Mesoamerica, and the subtropical regions, and the others being modern cherry tomatoes obtained by hybridizing cultivated SLL with wild SP. In addition, rarefaction analysis revealed that traditional SLL and SLC from outside Peru and Ecuador had much lower genetic diversity than Peruvian and Ecuadorian SP and SLC (Blanca et al., 2015).
The above-mentioned studies differentiated the modern varieties from the traditional ones, but none of them found any structure within the traditional tomato group. A few studies have addressed the differentiation of traditional variety groups, mostly among Spanish (García-Martínez et al., 2006) and Italian collections (Mazzucato et al., 2008;Sacco et al., 2015;Esposito et al., 2020). These studies have focused on a limited number of accessions from a narrow local diversity, and therefore, a broader view is clearly needed to better understand the history and relationships of the European traditional varieties.
In the present study, the genomes of 1254 European tomato accessions collected from Southern European seed banks were partially sequenced by genotyping by sequencing (GBS) (Baird et al., 2008;Elshire et al., 2011). The genetic structure, diversity, and association between the polymorphic loci and the morphological variations in these accessions were analyzed to shed light on the history of the make-up of the diverse traditional European tomatoes.  ARO, Rehovot, Israel). An additional set of 110 accessions, obtained from the COMAV gene bank (COMAV-UPV, Valencia, Spain), was used as wild and semi-domesticated accessions. This set contained 10 wild accessions from the Galapagos Islands, one accession of each wild species Solanum habrochaites, Solanum chmielewskii, and Solanum peruvianum, 39 S. pimpinellifolium accessions from Peru (SP) and North Ecuador (SP_NECu), and 34 SLC accessions, 19 traditional accessions, four modern accessions, and one S. pimpinellifolium × S. lycopersicum hybrid (SP×SL), corresponding to cherry cultivars and other crosses between the two species. Passport data can be found in Supplementary Table S1. The germplasm collection was extensively phenotyped in the TRADITOM project (Pons et al., 2017). The dataset corresponding to fruit morphology and color traits obtained in a trial at HUJI-ARO was used.

DNA extraction, library preparation, and sequencing
Genomic DNA was isolated from young leaves of 5-10 seedlings per accession, using the DNeasy 96 Plant Mini Kit (Qiagen, Germany). Genotyping by sequencing (GBS) was performed following the procedure reported by Elshire et al., (2011). Briefly, DNA was digested with the restriction enzyme ApeK I, barcoded libraries were prepared to track each accession, and the DNA sequence corresponding to the region flanking the ApeK I site was obtained on an Illumina HiSeq 2000 platform by LGC Genomics GmbH (Berlin, Germany). Following the Variant Call Format standard, we use the term 'sample' to refer to one genotyping experiment from one accession. The sequence data can be found in NCBI (https://www.ncbi.nlm.nih.gov/sra) under accession number PRJNA722111.

Read mapping, SNP calling, and SNP filtering
FastQC was used to evaluate the quality of the sequenced reads. Highquality reads were mapped against the S. lycopersicum genome build 2.50 (Sato et al., 2012) using BWA-MEM (Li, 2013, Preprint). After mapping, in order to avoid possible false positives caused by misalignment, the PHRED quality of three aligned nucleotides from each read end was set to 0 (Li, 2011). Mapping statistics were calculated with the samtools stats command (Li et al., 2009).
SNP calling was carried out by freebayes (Garrison and Marth, 2012, Preprint) using the following parameters: a minimum mapping quality of 57, 5 best alleles, 20 minimum base quality, 0.05 maximum mismatch read alignment rate, 10 minimum coverage, 2 minimum alternate allele count, and 0.2 minimum alternate fraction. To avoid regions with potential assembly problems in the reference genome, the Heinz 1706 reads used to build the reference genome were mapped against the reference assembly version SL2.50. A 50X mean coverage was obtained. Any region having a coverage higher than 200X was removed from the SNP calling.
SNP and genotype processing were carried out by using the Python variation library deposited in the Zenodo repository (https://doi. org/10.5281/zenodo.5783972). All scripts developed for the analyses have been also uploaded to Zenodo (https://doi.org/10.5281/ zenodo.5785413). To create the tier1 SNP set to be used in the rest of the analyses, the genotypes with a quality lower than 5 were set to missing. The variants with SNP quality lower than 50, an observed heterozygosity higher than 0.1, and a call rate lower than 0.6 were filtered out. Moreover, in order to avoid false positives, only variants in which the minor allele was found in more than two samples were kept. This filtering was carried out with the 'create_tier1.py' script. For some analyses, the pericentromeric regions, which seldom recombine, were removed. To locate the pericentromeric regions, a piecewise regression analysis was applied to the relationship between the genetic distance and the physical positions of the markers of the EXPIM map (Sim et al., 2012b). Regression analyses were done using the segmented R library (Muggeo, 2003). The calculated pericentromeric regions were: chromosome 1, from 5488553 to 74024603; chromosome 2, from 0 to 30493730; chromosome 3, from 16493431 to 50407653; chromosome 4, from 7406888 to 50551374; chromosome 5, from 9881466 to 58473554; chromosome 6, from 3861081 to 33077717; chromosome 7, from 4056987 to 58629226; chromosome 8, from 4670213 to 54625578; chromosome 9, from 6225214 to 63773642; chromosome 10, from 3775719 to 55840828; chromosome 11, from 10947270 to 48379978; and chromosome 12, from 5879033 to 61255621.

Principal coordinate analysis and genetic structure, diversity, and linkage disequilibrium
The genetic structure and the division in subpopulations were determined by conducting a series of hierarchical principal coordinate analyses (PCoAs). The PCoAs were carried out with a subset of the variants after filtering. The variant filtering process consisted of several steps. First, only the euchromatic variants were considered, and from those, only the ones with a call rate lower than 0.95 were selected. From this set of variants, the ones in which the minor allele was present in fewer than three samples were removed. For the remaining variants, 2000 evenly distributed across the genome were selected. Furthermore, to avoid overrepresentation of large regions in complete linkage disequilibrium (LD), when several consecutive variants had a correlation higher than 0.95, only one of them was kept. Finally, pairwise distances between samples were calculated (Kosman and Leonard, 2005). PCoA (Krzanowski, 2000) was generated from the distance matrix, following the pycogent implementation. These methods were implemented in the do_pca.py script. Additionally, the genetic structure was also estimated with fastSTRUCTURE (Raj et al., 2014).
The observed and expected heterozygosity and the number of variants per genetic group were calculated considering only the variants variable in the samples involved in the analysis. The script that implemented these analyses is calc_diversities2.py. The allele spectrum figure was plotted by the script calc_maf_trends.py and the rarefaction curves by rarefac-tion_analysis.py.
The LD was calculated between euchromatic markers with a major allele frequency lower than 0.98 following the Rogers and Huff method for loci with unknown phase (Rogers and Huff, 2009).

Detection of introgressions
To find introgressions typical of modern cultivars in the traditional varieties, a haplotype analysis was performed. The genome was divided into windows and, in each one, the Kosman distances were calculated from the non-traditional samples to the haplotypes found in the traditional samples. When the analyzed non-traditional sample haplotype had a non-zero distance from any of the traditional ones, it was marked as distant from the traditional collection.

Genome-wide association study and allele frequencies
A heatmap plot that represents the major allele frequency in each group was generated according to a dendrogram by the method implemented in the Python seaborn library (Waskom, 2021) and was plotted by the get_most_diverse_snps.py script.
A genome-wide association study (GWAS) was carried out with the Genesys R package (Gogarten et al., 2019) on the set of polymorphic variants (95% threshold). The quantitative characters were normalized by using the Box and Cox transformation implemented by the Python scipy library (Virtanen et al., 2020). The character normality was checked with a qqplot plotted by the Python statsmodels library (Seabold and Perktold, 2010). The correction for genetic structure was calculated with a principal component analysis on the filtered variants implemented by the SNPRelate R library with a 0.3 LD threshold (Zheng et al., 2012). The quantitative trait associations were tested with the Wald method, and the binomial ones by the Score method. To account for the multiple tests, a Bonferroni threshold was applied. The step-by-step implementation of the GWAS analysis can be analyzed in the gwas.py script.

Genetic group distances
Nei and Dest Smouse, 2006, 2012) genetic distances among groups were calculated and compared. They were implemented by the Python variation library and the cacl_pop_dists.py script. From those distances both a neighbor-joining tree and a split network were calculated using SplitsTree (Huson and Bryant, 2006).

High-throughput genotyping of a European traditional tomato collection
To genetically characterize traditional European tomatoes, a total of 1254 tomato accessions were used (Supplementary   Table S1). This set, which included an extensive representation of the extant European traditional tomato variability, comprised 506 accessions from Spain, 305 from Italy, 203 from Greece, 96 from France, and 58 from other origins, as well as 25 modern commercial cultivars and 39 SP and 22 SLC accessions (the last two being of American origin). A total of 3700 million reads with a mean phred quality of 33.5 were obtained after GBS, providing an average of 2.9 million reads per sample. Out of these, 99.0% were successfully mapped to the tomato reference genome (v2.50), but only 55.9% were kept after applying the MAPQ filter with a threshold of 57. These reads were mostly properly paired (96.1%). From all genomic positions that comprise the reference genome, 0.79% had an average sequencing coverage per sample higher than 5X, 0.46% higher than 10X, and 0.21% higher than 20X. The complete sequencing and mapping statistics for all samples are available in Supplementary  Table S3, and the number of positions per megabase with more than five reads in at least 90% of the samples is presented in Supplementary Fig. S1. Finally, 448 121 variants were called by freebayes and, after filtering them, a working dataset of 64 943 variants was created.

Genetically defining true European traditional tomatoes and their relationship with American relatives
To genetically position the European tomato collection relative to South American and Mesoamerican germplasms, which represent early domestication and improvement steps (Blanca et al., 2015), the variability of European traditional tomatoes was analyzed together with SP, SLC, SL×SP hybrids, and a sample of modern cultivars. A series of PCoAs (Figs 1, 2) was performed comparing the traditional and modern collections. The genetic classification based on these PCoAs can be found in Supplementary Table S1 under the header rank1 classification.
The PCoA (Fig. 1A, B) showed that the green-fruited and Galapagos wild species, SP, SP_NEcu, Ecuadorian SLC (SLC_ Ecu), Peruvian and Mesoamerican SLC (SLC_Peru_MA), and several SP×SL formed a series of clusters that were clearly separated from the modern and European traditional tomatoes (Fig. 1A, B). The SLC_Peru_MA group was the closest to the European traditional tomatoes. To obtain further insight into the genetic architecture of the European tomatoes, the genetic data were analyzed by using fastSTRUCTURE (Raj et al., 2014). The model marginal likelihoods reached a plateau by four populations (Supplementary Fig. S2). By comparing with the PCoA classification, the four fastSTRUCTURE populations were found to correspond to SP, modern tomatoes, and two distinct traditional populations ( Supplementary Fig. S2).
A continuous gradient from traditional to modern, rather than clearly split groups, was observed in the PCoA plots ( Fig. 1A, C, D). To define the limits between modern and traditional in the PCoA, we chose Heinz 1706 as the reference (indicated in pink in Figs 1, 2), since it was one of the first tomato varieties reported to include introgressions from wild Solanum species on chromosomes 4, 9, 11, and 13 (Sato et al., 2012;Causse et al., 2013;Menda et al., 2014), typical of modern cultivars carrying mainly disease resistance genes.
PCoA-based classifications indicated that a total of 24.9% of the accessions labelled as traditional according to their passport data mapped outside the traditional genetic cluster but within the modern and SP×SL genetic groups (Fig. 1) Several accessions initially catalogued as traditional that mapped close to the modern accessions in the PCoA space were consistently found to include haplotypes not present in the traditional group ( Supplementary Fig. S3) and thus were reclassified as modern accessions (Supplementary Table S1).
The modern accessions (including both modern references and the traditional accessions reclassified as modern) were spread across the PCoAs according to their use (fresh or processing) and their degree of introgression (Figs 1C, D, 2; Supplementary Fig.  S3). PCoAs applied only to the modern accessions resulted in four groups (Fig. 2): modern processing (the most distant group to Heinz 1706), modern long shelf-life (LSL) and processing (the closet to Heinz 1706), modern fresh 1 (between modern processing and Heinz 1706), and modern fresh 2 (closer to Heinz 1706). Each group was characterized by the presence of different introgressions in several chromosomes ( Supplementary Fig. S3).

Diversity among European traditional tomatoes
European traditional tomatoes are usually considered to have low genetic diversity (Blanca et al., 2015). Therefore, it was important to calculate the number of polymorphic variants present in our collection of European traditional tomatoes, the largest collection analyzed by sequencing thus far, and to compare it with the variability present in the wild SP, the wild and semi-domesticated SLC, and the modern cultivars. The total number of variants within the European traditional collection was 26 129, larger than the number found in SP (19 164), SLC (7690), or the materials classified as modern (17 328). However, this comparison could be biased in favor of the traditional collection because of the larger number of traditional samples (890) compared with SP (24), SLC (42), and modern (243).
To correct for this factor, diversity indices were calculated with the same number of samples (20) (Fig. 3A, B) and the analysis was repeated 100 times with a different set of 20 samples chosen at random each time. Both the Nei diversity and the percentage of polymorphic variants (with a 95% threshold) was much higher in the wild SP than in any other group, and both indices were the lowest, by far, in the traditional group. The analysis indicated that many of the variants found in the traditional collection could not be considered polymorphic. At a 95% threshold, the traditional collection contained only 298 polymorphic variants. This scarcity in polymorphic variants in the European traditional group can also be observed in the allele frequency spectrum (Supplementary Fig. S4), in which it is clear that most variants in the traditional collection were almost fixed.
To better compare the amount of genetic variability in each major cultivated group (SLC_Peru_MA, traditional, and modern) a rarefaction analysis was carried out (Fig. 3C). The number of variants found in the traditional group was always lower than in the modern and SLC_Peru_MA groups, but the total number of variants within the traditional collection kept increasing as more samples were added. However, the number of polymorphic variants stabilized with a few samples. Conversely, the Nei diversity decreased (Supplementary Fig.  S5) when more samples were added. This decrease was due to the high number of variants found within the traditional group that were close to fixation.

Linkage disequilibrium
The LD was calculated for SP, SLC from Peru and Mesoamerica, modern, and European traditional varieties, those genetic groups with enough polymorphic markers [minimum allele frequency (MAF) >0.02 threshold] (Supplementary Fig. S6). The wild SP showed the lowest LD (r 2 =0.42) at 5 kb and was the group in which LD decreased the fastest, being only r 2 =0.2 at 25 kb. In SLC, r 2 was 0.8 at 5 kb and 0.4 at 1000 kb. Traditional tomatoes had the highest LD at 25 kb (r 2 =0.97); however, it decreased to the lowest value (r 2 =0.05) at 1000 kb. The modern accessions had a high LD both at 25 kb (r 2 =0.9) and at 1000 kb (r 2 =0.6). The LD found at 1000k kb is likely due to population substructure. The SLC and modern groups had high long-range LDs, perhaps because the modern group included both fresh and processing accessions, which were clearly separated in the PCoAs, and SLC contained accessions from Peru and Mesoamerica, two geographically distant areas. Additionally, modern cultivars often contain introgressions from wild species, including disease resistance genes, that span large regions for which recombination is usually suppressed. SP is also known to have a clear population structure (Blanca et al., 2012) and also showed some long-range LD, which clearly supports the conclusion that LD is due not just to gamete disequilibrium, but to other causes too. The traditional accessions showed the lowest LD at 1000 kb, perhaps because this group has a less remarkable population substructure.

Classification of traditional tomato clusters
To further classify true traditional tomatoes, a series of PCoAs ( Supplementary Fig. S7) was performed. A genetic group was created when several samples that grouped together in the PCoAs shared their geographic origin or traditional variety name, or some aspect of their phenotype (Supplementary Table  S2), for example, fruit shape and size. Most traditional samples could be classified into 27 different genetic groups by using this PCoA strategy (Supplementary Table S2). The genetic classification of traditional tomatoes based on these PCoAs can be found in Supplementary Table S1 under the header rank2 classification.
Two connected clusters of genetic groups (for the sake of clarity we will use 'group' to refer to a PCoA group of samples and 'cluster' to describe a cluster of groups) were observed in PCoA (Supplementary Fig. S7A, B). Within the cluster at the center of PCoA, we found seven Spanish genetic groups, two Italian, and one Greek ( Supplementary Fig. S7C, H), together with 'Marmande', 'Bell pepper' and 'Palosanto pometa 1' (Fig. 4A), which were represented in all four Mediterranean countries (Spain, Italy, France, and Greece). Outside the central cluster, but close to it, we found groups of large tomatoes from all four countries ( Supplementary Fig. S7C, D; Fig.  4A, B). A second cluster included mostly Italian accessions and some Greek and Spanish accessions, all characterized by having a small size with no or weak ribbing ( Supplementary Fig. S7A, B, M-R; Fig. 4A, B). In summary, the PCoA separated traditional accessions mainly by country of origin and fruit size. It is interesting to note that the LSL-type accessions, which were highly represented in the collection, were not grouped together, but rather segregated by country: the Italian LSL varieties were found within the Italian cluster, and the Spanish LSL within the Spanish cluster. Several accessions located between the Spanish and Italian clusters could not be grouped by passport data or any other characteristic.

Allele frequencies across the genome in traditional groups and their relationship with phenotypic diversity
The clustering of the traditional genetic groups based on a distance tree was calculated using the polymorphic variants (Fig. 4A). The defined genetic groups had quite distinct allele frequencies along the genome. Concomitantly, the genetic groups also showed enrichment in specific phenotypic characteristics related to their horticultural classification (Fig. 4B). Furthermore, several noticeable clusters of genetic groups with common phenotypic traits could be observed. For instance, there was a cluster formed by small-fruited, slightly ribbed, LSL and processing Italian genetic groups, which included the well-known Italian 'da Serbo' and 'San Marzano' tomatoes. Another cluster was constituted mainly of LSL colourlessskinned Spanish tomatoes. Close to this cluster were some of the most typical Spanish traditional fresh-market varieties.
Genetic differences between the groups could be due to genetic drift or either inadvertent or conscious selection by traditional farmers. In order to elucidate whether the differentiating variants were associated with the phenotypic variation, a GWAS analysis was carried out using selected fruit traits (Fig. 4C).
Two of the main phenotypic characteristics differentiating the traditional tomatoes are fruit weight (fw) and ribbing (Fig.  4B). In the GWAS analysis, fruit weight was associated with variants on chromosomes 1, 3, and 11. The MAF analysis indicated that most of the small-fruited tomatoes shared the fixation of the same allelic variant in chromosome 1. The pattern found in chromosome 3 was similar. The position of candidate quantitative trait loci (QTLs) and genes is depicted to the right of the association columns in Fig. 4C. In contrast, almost all medium and large tomatoes had a fixed common variant in chromosome 11 that was associated by GWAS with fruit weight, ribbing at the calyx end, and fruit shape index.
Another trait differentiating traditional tomato cultivars was skin colour; for instance, most Spanish LSL tomatoes as well as tomatoes included in the 'Cor de bou', 'Montserrat', and 'Pera girona' genetic groups had colourless skin, which resulted in pinkish fruit (Fig. 4B). GWAS found associations with pink color in chromosomes 1 (two regions), 3, 5, and 10. Comparison of the GWAS and MAF analyses (Fig. 4A, C) showed that different pink genetic groups had different allelic compositions in the associated genomic regions, which might reflect a complex genetic control.
Fruit shape was associated with regions in chromosomes 2, 5, 10, and 12. The region in chromosome 2 was fixed in 'Marmande' and 'Scatolone di bolsena', two groups that are well known for having flat fruits. High-frequency minor alleles, almost fixed in the regions associated with fruit shape in the GWAS, were also observed in other genetic groups, such as the Italian 'LSL da Serbo', in chromosome 5, and 'Ita ellipsoid' and 'Tondo Piccolo', in chromosome 6, as well as in the 'Cor de bou' and 'Pimiento' groups, in chromosome 12.
In the case of use, associations were found in chromosomes 10 and 11, but, in this case, no clear relationship was found between allelic frequencies among the tomato genetic groups and GWAS.

Network analyses support the differentiation between Spanish and Italian traditional tomatoes and the occurrence of hybridization events in traditional tomatoes across Europe
To study the genetic relationships between accessions and groups of accessions, a network based on pairwise Dest group distances was created with SplitsTree. Evolutionary relationships are often represented as a unique tree under the assumption that evolution is a branching or tree-like process (Huson, 1998). However, real data do not always clearly support a tree. Phylogenetic split decompositions represented in a network may be evidence for conflicting reticulated phylogenies due to gene flow and/or hybridization (Huson, 1998).
The SplitsTree network of European tomato is depicted in Fig. 5. Like in the PCoAs ( Supplementary Fig. S7), the group organization in the network (Fig. 5) was structured in two main country-related clusters. One cluster was composed mainly of Spanish traditional groups, which included the Spanish LSL, and the other cluster was composed mostly of the small-fruited Italian LSL and processing groups. Interestingly, the 'Liguria' group, composed mainly of Italian accessions (Supplementary  Table S1), clustered with Spanish clusters, although the branch that linked it with the core Spanish clusters was quite large.
The degree of reticulation found (Fig. 5) suggested that hybridizations might have occurred between the ancestors of accessions collected from the same geographic regions. On the other hand, the groups that included accessions from different countries, such as 'Marmande', 'Pimiento', 'Cor de bou', or 'Palosanto pometa 1', were located between the Spanish and Italian clusters.
These groups of mixed origin could be more modern and derived from hybridization from older Spanish and Italian varieties or, alternatively, they could be very old varieties found across Europe before the Spanish and the Italian diversification started. To test these possibilities, a rarefaction analysis was performed of the number of polymorphic sites found in these three clusters (Supplementary Fig. S8). The number of polymorphic sites was clearly higher in the Italian and Spanish clusters and much lower in the mixed-origin cluster, evidence that supports that Spain and Italy were secondary centers of diversity for the European tomato, whereas the varieties included in the mixed cluster would be more recent.

Very low, but discriminant, variation in traditional European tomatoes
The genetic diversity of the European traditional collection was very low compared with the diversity found in SP or SLC, in agreement with previous surveys on worldwide SLL accessions (Blanca et al., 2012(Blanca et al., , 2015Sim et al., 2012a). Nevertheless, the current analysis represents the first estimate obtained using a comprehensive representation of traditional European tomatoes, and it is relevant to study the role of Europe as a secondary center for tomato diversification. The low level of diversity found in these traditional materials was quite striking and remarkable when we consider the high phenotypic diversity of traditional tomatoes. Moreover, the high LD found in the traditional varieties suggests that it is rather unlikely that the total number of polymorphic blocks would grow much even if whole genome sequences were to be obtained.
Previous studies demonstrated a strong bottleneck during the SLC tomato's travel from Ecuador and Peru to Mesoamerica (Lin et al., 2014;Blanca et al., 2015;Razifard et al., 2020). However, despite the low genetic diversity found in traditional European tomatoes, there are still a few highly polymorphic loci within this gene pool. Some of this variation could be due to the random nature of genetic drift. However, the association study revealed that a sizeable fraction of those diverse loci was associated with the traditional fruit phenotypic/morphological variation. Therefore, it is quite likely that many of those polymorphic loci had been under balancing selection (Delph and Kelly, 2014) during the diversification process and were, in fact, responsible for a sizeable part of the tomato phenotypic variation or, at least, in LD with the variants selected. It may seem paradoxical that the high diversity of shapes, colors, sizes, uses, and other agronomic traits in the traditional group could be maintained by such a poor gene pool, but it seems that the selection carried out by the traditional growers in favor of this agronomic diversity resulted in a desert of variation, with just a handful of scattered polymorphic loci. This is consistent with two highly polymorphic SNPs found in the lc locus  surrounded by loci with 'drastically reduced' diversity. Thus, the high polymorphism seemed to be the result of divergent selection for a low or high number of locules in different cultivars.
Recently, structural variants were studied in tomato using new long-read sequencing technologies and new analysis algorithms (Alonge et al., 2020;Domínguez et al., 2020). A large number of structural variants were identified and were mostly generated by transposons and related repeats. Similar to the variants studied here, most structural variants had a very low frequency, and the majority were singletons. Therefore, the phenotypic diversity present in European traditional tomatoes seems to have been built by remixing, reshuffling, and swapping very few polymorphisms with the selection pressure associated with the creation of new varietal types and the adaptation of these types to different regional environments.

History of tomato movement in Europe
The distribution of the genetic variability in the European traditional tomatoes showed mostly a continuous gradient. However, the Spanish and Italian varieties occupied opposite regions of the PCoA space, which supports a genetic differentiation among varieties originating in those countries. The lack of clear-cut limits might be due to the limited resolution of the current GBS analysis; resequencing of the traditional varieties would provide further details. Nevertheless, we do not expect a different general conclusion given the high LD found among the European traditional tomatoes. Additionally, Spain and Italy share a long common history, which also supports that the lack of limits would be due to migrations between different regions and countries and subsequent intercrossing. Despite this difficulty, the genetic traditional groups proposed here were differentiated by characteristics such as their main geographic origin, use, fruit morphology, and varietal name. The genetic groups sometimes corresponded with the varietal type (e.g. 'Valenciano', 'Muchamiel', 'Penjar'). However, the match between the proposed genetic group and the sample varietal name was seldom complete (the 'Cor de bou' group included two 'Valenciano', one 'Russe', and one 'Costoluto' samples). This may be due to the limitations of the genetic classification methodology utilized or to erroneous passport data. Other genetic groups, such as 'Italian small', showed no clear associations with any varietal name. Finally, cultivars previously classified as belonging to the same variety, such as 'Marmande', were included in many different genetic groups. It is likely that the popularity of some varietal types, such as 'Marmande' (typical large and multi-locule tomatoes), made some growers prone to apply the label to any variety that displayed the typical morphological characteristics of a well-known varietal type.
One clear example of mistaken identity and/or inadvertent outcrossing is provided by some supposed traditional varieties that we reclassified as modern, which may either have been misclassified or correspond to a mixture between traditional and modern varieties. It is not trivial to define the borderline between traditional and modern varieties. One could think that until the 1950s most varieties were heirlooms and landraces maintained by small farmers, but the real history is more complex. When tomato cultivation was popularized in the 19th century in France, England, and the USA, some of the varieties were already provided by seed companies (Boswell, 1937). Seed shipments are documented between countries, for instance, from England to the Canary Islands (Amador et al., 2013). Moreover, from 1910 onwards, professional breeding efforts created new varieties adapted to long-distance shipping and for processing (Boswell, 1937). These efforts did not yet include wild materials, so the genetic diversity of early-bred accessions is not easy to differentiate from traditional diversity in a PCoA analysis. It was only after breeders started introgressing alleles from wild species for disease resistance that the genetic diversity was different enough to be easily differentiated in the PCoA analysis. In any case, the traditional-modern distinction has to be somewhat conventional, although a characteristic of modern cultivars compared with traditional varieties is the introgression of genes from wild species. Therefore, true traditional cultivars were defined based on the absence of wild species' haplotypes. Cultivars carrying any of those introgressions came from modern breeding programs or were the result of a cross with modern cultivars.
Most of these introgressions seem to be related to disease resistance genes such as the Cladosporium fulvum resistance gene Cf-2 on chromosome 6 and Tm-2 (resistance to tomato mosaic virus) on chromosome 9. It is likely that the modern genetic variability has been combined with the true traditional varieties, so that some materials catalogued in the gene banks as 'traditional' are in fact a mixture of traditional and modern. This is to be expected, as the seed collectors/gene banks label as 'traditional' any material considered as such by the farmer from whom the seeds were collected. Although European small farmers often save their own tomato seeds, they may occasionally purchase or obtain plantlets from markets or nurseries, or save seeds from modern varieties purchased in the market and introduce them into their fields. This may lead, after several years of reproduction and farmer selection, to complex hybridizations and mixings. Clearly, there have been many opportunities for introgressing modern haplotypes into the traditional materials, such as unintentional crosses. This leakage might have a positive unintended consequence of increasing the very low diversity of the traditional pool, and it is also the case that evolution consists of change and adaptation of local varieties (Casañas et al., 2017). Therefore, traditional and modern varieties have not been isolated, and genetic exchange has occurred between them from the early professional breeding to the present day. This genetic exchange may explain the continuous genetic gradient and the lack of clear split between modern and traditional varieties. The haplotypes described herein could be used for the identification of non-true European traditional tomatoes.
The allele-frequency-based tree (Fig. 4A) defined three major clusters: Spanish, Italian, and mixed origin. The mixedorigin groups were basal in the tree in Fig. 4A, have longer branch lengths, and occupy an intermediate position between the Italian and Spanish clusters in the Dest network (Fig. 5). These results are compatible with the hypothesis that Italy and Spain formed two centers of diversity. The differentiation of Italian and Spanish gene pools is exemplified by the LSL varieties from both countries. Italian and Spanish LSL varieties were clustered apart from each other, with only a small number of samples from the other country. So, it seems as if the origin of the LSL tomatoes in both countries was independent. The transformation from a fresh to an LSL variety is likely due to a limited number of loci, as observed in Fig. 4A, in which the Catalonian fresh 'Montserrat' type is closely related to the Catalonian LSL 'Penjar' type. Esposito et al. (2020) also observed geographic differentiation of the Italian and Spanish LSL varieties. Therefore, although there may have been migrations from Italy to Spain and vice versa, these may not have been extensive enough to erase the genetic differences between the Italian and Spanish varieties.
Regarding the mixed-origin cluster, the groups included in it were basal in the tree shown in Fig. 4A, have longer branch lengths, and occupy an intermediate position between the Italian and Spanish clusters in the Dest network (Fig. 5). Moreover, the rarefaction analysis supports that this cluster included varieties derived from the two secondary centers of diversity. This could be the result of the long tradition of tomato cultivation in Southern Europe, with the groups included in this cluster being developed from hybridizations between the two centers of diversity (i.e. Spain and Italy). New mutations, other introductions of tomatoes from America, or new genes from varieties developed worldwide might also be involved in the history of the groups of mixed origin.
A complex pattern of migrations can also be inferred in several genetic groups, such as the 'Cor de bou' group, which included varieties from France, Italy, and Spain. The Italian 'Spagnoletta' group was closely related to the 'Marmande' group constituted of French, Spanish, Greek, and Italian accessions. Other genetic groups with mixed geographic origin are 'Liguria', 'Pimiento', and 'Palosanto Pometa 1'.

Do a few polymorphic genes differentiate the true European traditional tomato genetic groups?
In order to shed light on the apparent contradiction between the low genetic diversity and the large phenotypic variation of European traditional tomatoes, a GWAS was carried out with the polymorphic variants and some of the most obvious morphological traits (fruit morphology, color, and ripening behavior). Variants located in the genomic regions of previously identified loci involved in fruit weight, and likely involved in domestication and diversification, were associated with this trait in the GWAS. Most of the small-fruited tomatoes shared fixed variation regions in chromosomes 1 and 3, which mapped close to previously described QTLs and genes associated with fruit size (Fig. 4A, B): fw1.2 (Grandillo et al., 1999) and fw3.2/ KLUH and ENO (Chakrabarti et al., 2013;Yuste-Lisbona et al., 2020). In contrast, almost all medium and large tomatoes shared a region in chromosome 11 that mapped close to FAS (Xu et al., 2015) and fw11.3/CSR (Mu et al., 2017), with both genes playing a known role in controlling fruit size and fasciation. No further associations were observed in other genomic regions for fruit weight, so it seems reasonable to think that these QTLs might be responsible, at least in part, for the variability in fruit size among the European traditional tomatoes. Regarding fruit shape, the associated region on chromosome 2 includes ovate (Liu et al., 2002) and other fruit-shape QTLs (Brewer et al., 2007), and the region on chromosome 10 is located close to the position where the original copy of the sun locus was found (Xiao et al., 2008). In the case of skin color, a different pattern was characteristic of different pink genetic groups. 'LSL Penjar vlc' and 'LSL ramellet' shared a variant at the end of chromosome 1 that matched a region that was previously associated with skin color, the colorless-peel y-locus (Ballester et al., 2010), while 'Pera Girona' had the minor allele for the other chromosome 1 variant, which is located at the beginning of the chromosome and maps close to the SlCMT3 (Gallusci et al., 2016) and PSY3 (Li et al., 2008) genes, involved in epigenetic ripening regulation and carotene biosynthesis, respectively.
The current analysis suggests that variability in fruit morphology among European traditional tomatoes could be the consequence of the combination of a relatively low number of genes, as suggested by Rodríguez et al., (2011), including fw3.2/KLUH, ENO, FAS, SUN, and OVATE. On the other hand, skin color could be a consequence of y-locus and other genes related to phenylpropanoid metabolism. Interestingly, structural variant mutations have been found in fw3.2/KLUH, FAS, and SUN, which supports the impact of structural variants on tomato phenotypic diversity (Alonge et al., 2020;Domínguez et al., 2020). In addition, some cryptic variation hidden in the Mesoamerican tomatoes may have emerged in European tomatoes after generating new combinations and divergent selection by the traditional farmers, as found for the jointless trait in tomato (Soyk et al., 2017(Soyk et al., , 2019Alonge et al., 2020).

Impact on gene bank and on-farm variability management
Many of the few polymorphic genetic variants within the very-low-diversity European traditional tomatoes appeared to be associated with phenotypic variation. This has implications for the conservation efforts carried out by gene banks. Thousands of European traditional tomatoes are maintained in many of these gene banks. However, the cost of these conservation efforts could be drastically reduced if only these few polymorphic loci were considered. Of course, such an approach would ignore most variants, the ones found in very low frequencies, but conserving these low-frequency alleles, which in many cases would be neutral and thus not associated with any phenotypic variation, requires a sizeable investment. An alternative would be to identify the alleles associated with a phenotype; however, this would require an exhaustive phenotypic characterization.
Most of the European accessions analyzed here were collected from farmers during the 1950s to the 1980s and, as landraces, they are appreciated, competitive, and cultivated varieties. The genetic diversity of many other crops has also been maintained as landraces that evolved on-farm. However, this diversity is continuously under threat by the introduction of new modern varieties derived from a limited gene pool that have replaced the traditional varieties. It is generally believed that most of the accessions in seed banks do not contribute to modern varieties (Tanksley and McCouch, 1997), and this is also the case for tomato. Our identification of the morphological and genetic structure present in the European traditional tomato gene pool will be important to guarantee access to that variability as the basis for the development of new varieties or evolved landraces in the future (Casañas et al., 2017).

Conclusion
The entrepreneurship of many local European farmers during the past 500 years has managed to create a very complex and diverse set of tomato varieties adapted to different local tastes and morphological preferences. These localized activities did not restrain farmers from importing other interesting novelties developed by other farmers elsewhere, thus generating a much larger set of varietal tomato types that are characterized by an exuberant diversity that serves as a variety for fresh, processing, and LSL uses. The current report shows that such a plethora of different types has been created from an original material devoid of genetic diversity, by exploiting very few polymorphic loci subjected to balancing selection.

Supplementary data
The following supplementary data are available at JXB online. Fig. S1. Number of genomic positions with high coverage and number of variants per Mb along the genome in all accessions. Fig. S2. FastSTRUCTURE analysis. Fig. S3. Introgressed regions along the genome detected in the modern genetic groups. Fig. S4. Major allele frequency spectrum in the traditional, modern, and SLC_Peru_MA groups. Fig. S5. Rarefaction analysis of the expected heterozygosity for each genetic group. Fig. S6. Genome-wide LD decay in wild, SLC, traditional, and modern accession groups. Fig. S7. Hierarchical PCoA of European traditional tomato varieties. Fig. S8. Rarefaction analysis of the number of polymorphic variants (95% threshold). Table S1. Accessions analyzed in this study. Table S2. Phenotypic characterization of European traditional tomatoes. Table S3. Sequencing and mapping statistics for each sample.