Genome-wide clonal variability in European pear “Rocha” using high-throughput sequencing

Abstract Pears (Pyrus) are one of the most economically important fruits worldwide. The Pyrus genus is characterized by a high degree of genetic variability between species and interspecific hybrids, and several studies have been performed to assess this variability for both cultivated and wild accessions. These studies have mostly been limited by the resolving power of traditional molecular markers, although in the recent past the availability of reference genome sequences or SNP arrays for pear have enhanced the capability of high-resolution genomics studies. These tools can also be applied to better understand the intra-varietal (or clonal) variability in pear. Here we report the first high resolution genomics analysis of a pear clonal population using whole genome sequencing (WGS). Results showed unique signatures for the accumulation of mutations and transposable element insertions in each clone, which are likely related to their history of propagation and cultivation. The nucleotide diversity remained low in the clonal collection with the exception of few genomic windows, suggesting that balancing selection may be occurring. These windows included mainly genes related to plant fertility. Regions with higher mutational load were partially associated with transcription factors, probably reflecting the distinctive phenotypes in the collection. The annotation of variants also revealed the theoretical disruption of relevant genes in pear. Taken together, the results from this study show that pear clones accumulate mutations differently, and that those mutations can play a role on pear phenotypes, meaning that the study of pear clonal populations can be relevant in genetic studies, mainly when comparing with traditional association studies.


Introduction
Pears (Pyrus) include several fruit tree species cultivated globally, being one of the most economically important fruits with nearly 24 million tonnes produced in 2019 [1].The Pyrus genus is characterized by a high degree of genetic variability, mainly promoted by the outcrossing nature of its species and their interspecific fertility, which together with the recent domestication and dissemination events, becomes a hard puzzle when it comes to establish boundaries and recognize different species.Broadly, the estimates of diversity for Pyrus vary between 50 and 80 species and interspecific hybrids [2,3], although there is a consistent number of around 20 primary, non-hybrid species that are accepted and vastly recognized [4].Cultivated Pyrus species are traditionally divided into two groups based on domestication area and geographic distribution.European pears, Pyrus communis, are cultivated mainly in Europe and the U.S. whereas Asian pears like Pyrus pyrifolia, Pyrus bretschneideri, Pyrus ussuriensis and Pyrus sinkiangensis are grown in East Asia.
The assessment of Pyrus variability has been done in the past using several tools, including molecular markers.These include studies on Asian pears, both wild and cultivated [5][6][7][8] as well as European pears [9][10][11][12].In the case of European pear, various studies have been performed to assess the genetic diversity of both cultivated and wild accessions, including studies from Portugal, Spain, Italy, Turkey, Bosnia and Herzegovina, Iran, Tunisia, among others.Despite their relevance on establishing the population structure and phylogeny between accessions, these studies were mainly limited by the resolving power of the few molecular markers used when compared to genome-wide high-density markers such as Single Nucleotide Polymorphisms (SNPs).
In the last decade, several efforts have been made to generate accurate reference genome sequences for pear species.Wu et al. first published in 2013 the draft genome of Pyrus bretschneideri [13], followed by the draft genome of European pear by Chagné and colleagues [14].Later, improved versions of these genomes were made available [15,16] as well as the genome sequence of P. pyrifolia [17], the genome of the interspecific hybrid "Zhongai 1" [18] and even the genome sequence of wild species Pyrus betuleafolia [19].Altogether, these genomes enable researchers to perform in-depth, high resolution studies at a genome-wide level, which surely will i) contribute to a better understanding of pear genomics, evolution and domestication and ii) play a role in allowing the development of new molecular tools to assist pear breeding.This was surely true for the work developed by Wu and colleagues [20] as, to date, it is the only analysis on Pyrus species at scale, using whole genome sequencing technology on 113 globally distributed pear accessions.Their work clarified the processes of divergence and domestication of both Asian and European pears, and it represents the most exhaustive piece of work on the evolutionary history of pears.Concerning novel tools for breeding, reference genomes have also allowed the development of high resolution genetic maps mainly based on SNP markers and quantitative trait loci (QTL) analysis [21,22], genome-wide association studies (GWAS) [23][24][25] or even genomic selection (GS) [26].
The studies on pear genetics have been mainly focused on inter-or intraspecific variability, probably due to the inherited difficulty in understanding the evolution and domestication within the Pyrus genus.However, research on the intra-varietal (or clonal) variability in pear is scarce, despite that many pear varieties have been clonally propagated for centuries.Clonal populations can be of great use for pear genetic studies, mainly because selfing of individuals is not possible, thus F 2 populations are never available for such studies.Also, the generation of genetic populations in pear is a longstanding process, with large intergeneration periods, whereas the identification of bud mutants in pear orchards and their genomic analysis is fast.Although not all traits can be studied using clonal populations, simply because there are not enough bud mutants covering all possible traits, this limitation is also true for bi-parental populations that usually only segregate for a specific group of traits as well.
A bud mutant or bud sport is characterized by a somatic mutation that occurs in an adult plant, giving rise to a different phenotype.These mutants are often propagated into clonal progenies, thus retaining the new characteristic and keeping all the remaining plant traits.In fact, several pear varieties grown nowadays are bud sports of another variety, like "Red bartlett" or "Red d'Anjou" that were originated from "Bartlett" (syn."Williams") and "Beurré d'Anjou", respectively.Despite their importance for pear genetic resources and overall pear production, the genomics of pear clonal populations is largely unknown.Genome-wide clonal variation was already studied for other highly heterozygous crops, such as apple "Fuji" [27], "Jonathan" and "Sweet Jonathan" [28] as well as for grapevine "Malbec" [29], "Chardonnay" [30], "Nebbiolo" [31] and "Zinfandel" [32].These studies report a wide range of variants called between clones regardless of the species, ranging from a few hundred to a few million.The first approach to differentiate clones from a pear variety using SSR markers was performed in the Portuguese pear variety "Rocha" [33].
The origin of "Rocha" variety dates back to the 19 th century, when a chance seedling was identified in the farm of Mr Pedro Rocha by its superior fruit quality [34].Nowadays, approximately 99% of all Portuguese pear production comes from "Rocha", meaning more than 200.000 t per year, spread over more than 12.000 ha and a regular presence in the Top5 of the biggest European producers.Every single "Rocha" tree, in every pear orchard in Portugal, is a clone from that unique, ancient tree.In the 70's, a breeding program focusing on "Rocha" clonal selection and directed to yield and virus free plants was initiated, and the existence of high clonal variation in farmer's orchards was discovered.A collection of clones representing this variability was then established in 1985 at Vieira de Natividade Fruit Research Station (Alcobaça) from the National Institute of Agrarian and Veterinarian Research (INIAV, IP).Initially containing more than 80 unique trees, the collection is maintained nowadays in the field and in vitro at the Portuguese Plant Germplasm Bank (BPGV).This collection offers a unique opportunity to study the genomics of pear clonal variability and to understand how pear genomes evolve after centuries of clonal propagation.
This work aims to precisely quantify the genomic differences between "Rocha" clones and to infer, based on the base pair resolution of genomic variants, where these differences occur.Is the distribution of somatic mutations even across the pear genome or are there specific regions that tend to accumulate such mutations?Are these regions shared between "Rocha" clones, or does each clone possess a specific mutational signature?To address these questions, we have performed a whole genome resequencing of eight selected clones of "Rocha".

Sequencing, variant calling and genome-wide polymorphisms distribution
In this work, eight pear accessions belonging to the Portuguese pear germplasm collection, representing eight clones of the commercial pear variety "Rocha" were genome resequenced.Sequencing yielded from 70 M to 110 M reads depending on the sample, which accounted for a theoretical mean genome depth of 27× ranging from 21× in PRT 50 to 33× in PRT 51, respectively (Table S1).After filtering, approximately 2% of the reads for each sample were excluded due to quality thresholds.Clean reads were then mapped to the reference genome and only those reads mapping with a quality of 30 or above were considered.Reduction of reads due to mapping quality filtering ranged from 34% in sample PRT 58 to 47% in PRT 50.In the end, after all filtering steps, mean depth ranged from approximately 11× in PRT 50 to 20× in PRT 51, averaging 16× (Table S1).The genome coverage (i.e. the proportion of genome covered with at least one read) ranged from 77.6% in PRT 50 to 87.73% in PRT 51, averaging 86.1%.All samples presented 70% or more of their genome covered at least 10 times, with the exception of PRT 50 (Table S1).
After variant calling and filtering, 3 087 709 SNP positions and 339 628 INDELs were identified for the 17 chromosomes belonging to the eight "Rocha" clones.All data presented in this manuscript concerned the 17 pear chromosomes only, and did not include unassigned scaffolds and contigs.In detail, the number of SNPs per sample was stable around 3 million polymorphisms (from 2 941 735 to 3 025 490; Table S2A), with the exception of PRT 50 (2422271) Three primer pairs did not amplify any detectable fragment and were excluded.One primer pair amplified several fragments and was also excluded.For the remaining nine loci variants, seven (78%) presented the expected genotype as called in silico, including five SNPs and two INDELs (Table S7).
Overall, the distribution of missing data per chromosome across the SNP dataset (i.e. the amount of SNP positions that had an unknown genotype for four samples or less) is shown on Table S3A.The great majority of SNP positions had a valid genotype in all eight "Rocha" clones (2 385 442 out of 3 087 709; 77.25%)(Figure S3).Considering all SNPs with valid genotype in at least seven samples, available positions increase to 2 864 498 or 92.77% of all SNPs.The SNP frequency between "Rocha" and the reference genome "Bartlett" was found to be approximately 7.3 SNPs every Kbp (Table S2A).Similar results were obtained for INDELs (Table S2B).The great majority of INDEL positions had a valid genotype for all eight samples (248 667 positions out of 339 628; 73.21%) and the estimated variability between "Rocha" and "Bartlett" genomes was around 0.79 INDELs per Kbp (Table S2B).
The above mentioned rate of SNPs between "Rocha" and "Bartlett" genomes was uneven in the genome, depending on the chromosome considered, but not correlated with chromosome size (ρ = 0.303, α > 0.1).Furthermore, within each chromosome there was also great variability regarding SNP density, with both high and low SNP rate regions (Figure 1A).A relative abundance of regions with low -almost null -variability (i.e. 1 SNP/Kbp or rarer) between "Rocha" and "Bartlett" genomes was found, across all chromosomes.The size of such low variability regions, however, greatly varied between chromosomes, and was as extensive as 6-7 Mbp (Chr11 and Chr16; Figure 1A).In total, these very low variability regions accounted for 41.1 Mbp, which is roughly 1/10 of the combined size of the 17 chromosomes available for the "Bartlett" reference genome.In contrast, highly variable regions (1 SNP every 50 bp or more frequent) were also widely detected, although the extension of such regions was short.Similarly, the distribution of INDELs was uneven across the genome, with high and low variability regions (more than 2 INDEL/kbp or less than 0.2 INDEL/kbp, respectively) also occurring for these type of variants (Figure 1B).The variation in the abundance of both SNPs and INDELs across the genome did not correlate with the availability of genome coverage for variant calling (Figure 1A-B, orange background).

"Rocha" genome evolution and the relatedness among "Rocha" clones
The nucleotide diversity for the population of eight "Rocha" clones was estimated using a SNP dataset, without missing data, with 233 134 polymorphic SNPs.Nucleotide diversity (π /bp −3 ) ranged from 0.029 in chromosome 12 to 0.127 in chromosome 6 and chromosome 10 (Table 2).Similar results were found for the Watterson estimator (θ w /bp −3 ), with the lowest genetic variability being 0.041 for chromosome 12 and the highest being 0.157 for chromosome 6.The distribution of the nucleotide diversity using 50 Kbp non overlapping sliding windows was also investigated (Figure S2A-B).Chromosome 12 clearly presented a low diversity in all of its length, similar to chromosome 3.On the other hand, the most diverse chromosome 6 also had a high diversity throughout, but the highest peaks of nucleotide diversity were found in chromosomes 13, 14 and 15.
In order to understand the evolution of the "Rocha" genome in terms of the randomness of its mutations, and to test whether or not the frequency of polymorphisms in this population matched the history and origin of "Rocha", the Tajima's D was also calculated for all chromosomes (Table 2).
The lowest values for Tajima's D were found in chromosome 12 (−1.669)and the highest in chromosome 16 (−0.757).Using sliding windows of 50 Kbp, several regions with positive values of tajima's D were identified, some of them significantly deviating from the chromosome average (Figure 2).Two regions in particular were isolated, for chromosomes 15 and 17, that presented a positive tajima's D value higher than 1.5 (Table 3).For chromosome 15, there were four genes annotated in "Bartlett" genome for the selected region .In chromosome 17, there were three genes annotated in the positive tajima's D region (4.55-4.60Mbp).The nucleotide diversity within these regions was also very high, mainly for chromosome 15, with values of π /bp −3 and θ w /bp −3   3).The genes that were annotated in these two regions had GO assignments of ATP binding, protein binding, translation initiation factor activity, chromatin remodeling and (1-> 3)-beta-D-glucan biosynthetic process (Table 3).
To further understand the genetic relationship among clones we performed a Principal Component Analysis (PCA) of the SNP variants (Figure 3).PCA clearly separated "Rocha" clones into three main areas.PRT 50 was undoubtedly separated from the rest of the clones based solely on component one, which explained 47.32% of overall variation.PRT 56 was also separated from all other clones, and its positioning was also mainly explained by the variation of a single component, in this case component two (10.69%).All the remaining clones grouped together in a single cluster.
A Kinship analysis was also performed in order to quantify the relationships between "Rocha" clones.SNP data from a local cultivar "Carapinheira" (Accession: PRT 11), without any known relationship with "Rocha" variety, was also included in the analysis (Figure S4).The values of relatedness ranged from 0.0 (not related) in the case of all clones vs "Carapinheira" to 0.49 between all clones with the exception of PRT 50 (highly related;  for the Kbp resolution however, the differences among clones became evident, and all variability was found to be accumulated in very narrow locations.The coverage of sequencing was comparable between clones and the regions without unique SNPs did not correlate with low coverage regions.It also became evident that the greater rate of mutation found for PRT 50 came from a wider dispersion of mutations across the genome, rather than an accumulation in small regions (i.e. higher accumulation in the same Kbp).
To better understand the nature of the regions that accumulated the most mutations, we searched for DNA motifs that could be overrepresented in fragments with high mutation load.In total, 72 DNA motifs were found significantly enriched in these regions, when compared with regions of the same size (1 Kbp) without any mutation recorded.The five most significant DNA motifs for each class of region are presented in Figure S5.All significant motifs obtained were compared with available A. thaliana databases of known motifs.Results from this comparison are detailed in Table S4.Fourteen out of the total 72 DNA motifs discovered in regions of high or very high mutation had a match, the majority of which were known transcription factors.

The annotation of variant effects among "Rocha" clones highlights the theoretical disruption of key genes in pear
The putative impact of variants detected in "Rocha" clones was estimated using only polymorphic variants (233 134 SNPs and 43 954 INDELs).After annotation, those classified as having HIGH impact were isolated and inspected manually.In total, 216 SNP and 557 INDEL variants were classified in this category, which corresponds to 0.09% and 1.27% of all polymorphic variants, respectively.In addition, 168 SNPs and 425 INDELs were exclusive to a given clone and not shared.Within the SNP variants, the most common type of impact was "stop gained", accounting for 50.1% of all types of HIGH impacts annotated (Table S5).In the case Table 3.The two regions of 50 kbp with Tajima's D values higher than 1,5 in the "Rocha" clonal population, the nucleotide diversity of these regions, the genes annotated and their Gene Ontology (GO) of INDELs, the most common was "Frameshift variant", accounting for 89% of all types of impact.The genes that were impacted by these variants had a diverse distribution regarding their gene ontology annotations (Figure 4), with most common ontologies being protein binding, ATP binding, Protein phosphorylation, protein kinase activity and nucleic acid binding.

The majority of transposable element insertions in "Rocha" clones are unique
When compared to the "Bartlett" reference genome, PoPoolationTE2 and Retroseq yielded 1198 and 7518 new TEI in "Rocha" clones, respectively.Surprisingly, these TEI were mainly exclusive to one given clone rather than shared between several or all clones (Table S6).
In detail, following Retroseq results, 46.67% of all new TEI (3509) were exclusive to a given clone, 23.48% (1765) were shared between two clones and only 1.3% (98 TEI) were shared between all "Rocha" clones.Class I LTR retrotransposons were the most common among all new TEI detected (4075 out of 7518), followed by Terminal Inverted Repeat (TIR) and Miniature Inverted-repeat Transposable Elements (MITEs) class II transposons.For PoPoolationTE2 analysis the results were similar: 53.1% were exclusive to one clone, 21.37% were shared between two clones and only 0.58% (7 TEI) were shared between all clones.DNA transposons were the most common, mainly TIR and MITEs, followed by LTR and SINE retrotransposons (Table S6).TEI unique to one given clone were found to be inversely correlated with the amount of unique SNPs for that same clone (ρ = −0.71,α < 0.05 -PoPoolationTE2; ρ = −0.93,α < 0.005 -Retroseq).
In terms of location and distribution, for PoPoolationTE2 TEI accumulated mainly in three genomic regions, regardless of the clone where they were detected.These regions included the distal end of chromosomes five, eleven and fifteen, comprising a total of 20-25 Mbp (Figure S6B).In contrast, the distribution of nonreference TEI discovered with Retroseq was rather uniform, matching the overall SNP and INDEL density (Figure S6A vs Figure 1A/B).

Discussion
The intraspecific variability in P. communis Pears comprise a very rich genus with both wild and cultivated species, with different levels of ploidy and several hybrid species.Also, many pear traits are genetically complex, and others are still to be determined from a genetics perspective.This clearly indicates that genetic and genomic studies are much needed for Pyrus, to clarify the relationship among species, to study and compare pear genomes and their evolution, and to help develop new molecular tools to assist the breeding of key pear traits.In this study, 3 087 709 SNPs and 339 628 INDELs were called for the 17 chromosomes belonging to "Rocha" clones.This represents one measure of intraspecific variability of cultivated European pear and translates   [20] analysed 50 European pear accessions and reported a larger number of SNPs than those found for "Rocha" (6 945 796 or 13.2 SNP/Kbp).However, only 25 out of those accessions were cultivated European pear (Pyrus communis), and the SNP number on those 25 samples was much lower, around 4 220 232 SNPs or 8.02 SNPs/Kbp, which is almost in line with this study, despite the fact that the reference genome used there was the Asian pear genome, Pyrus bretschneideri "Dan-gshanSuli".Thus, Wu's result of 8.02 SNP/Kbp reflects the overall variability between P. communis samples and Pyrus bretschneideri genome, which translates to Pyrus interspecific variability.Another publication by Montanari et al [56], on the development of a 70 K pear SNP array, reported 3 809 750 SNPs detected for the 19 samples of the "communis group", which includes both wild and cultivated P. communis.Montanari's results on the variability of European pear translate into 7.24 SNPs/Kbp, which is in total agreement with the results presented here, even considering that a previous version of "Bartlett" reference genome (V1.0) was used for read mapping, and that their group of 19 samples included some wild European pear species.
Pear fertility is controlled by a gametophytic selfincompatibility system, a mechanism that promotes outcrossing and prevents selfing or crosses between common genome backgrounds, based on the alleles of the S-RNase gene.As a consequence, pears display highly heterozygous genomes.Nevertheless, some genomic regions present identity-by-descent (IBD), characterized by very low variability even when comparing different Pyrus species [20].In this work, roughly 10% of "Rocha" genome was discovered with a SNP rate lower than 1 SNP/Kbp over "Bartlett", suggesting the presence of large IBD regions.Such regions are expected between modern pear varieties due to the reduced number of founder lines used in breeding programs in the past.Indeed, "Rocha" and "Bartlett" are both supposed to be direct descendants of "Whyte Doyenne" [57], meaning they should be half siblings sharing ∼25% of their genome.However, the results from this work suggest that these varieties are third degree relatives (∼12.5% genome shared) rather than second degree relatives.This may be the first relatedness coefficient estimation for "Rocha" and "Bartlett" pair, which may contribute to solve the unresolved pedigree of "Rocha" cultivar.

The clonal variability of "Rocha" pear
For perennial crops, the understanding on how a given variety accumulates mutations over time is of key interest, given that these crops are mainly propagated clonally to maintain their traits, thus carrying their somatic mutations forward.In this study, from the global dataset of "Rocha" vs "Bartlett" SNP variants called in all samples, only 233 134 were polymorphic (i.e. at least one clone had a different genotype from the remaining clones).In other words, the genomic variation at the "Rocha" clonal population level (intra-varietal variation) is estimated at 1/10 (233 134 vs 2 385 442) of the genomic variation of "Rocha" vs "Bartlett" (intraspecific variability).This measure is higher than apple "Fuji" clonal variability [27], where the number of SNP variants between "Fuji" and the apple reference genome "Golden delicious" was 1 744 187 SNPs, whereas between "Fuji" and any of its bud mutants was approximately 51 000 SNPs, representing 1/34 of the intraspecific variability.However, the decreased variability found in apple "Fuji" clones when compared with this study in pear may be explained by the short period since the inception of the "Fuji" variety, which was only released to the market in the 1950's and the "Fuji" clones studied by [27] were isolated not before the 1980's.Another study on apple bud mutants studied the genome-wide differences between "Jonathan" and "Sweet Jonathan" and found as much as 4 198 955 SNPs between those two clones [28], representing 84 times the intra-varietal variation on apple "Fuji" from [27].Grapevine is another outcrossing crop species that has been studied for clonal variability.There are studies reporting as low as 941 intra-varietal SNPs for "Malbec" [29], a few thousand SNPs for "Chardonnay" [30] and "Nebbiolo" [31] and more than 630 000 SNPs for "Zinfandel" clones [32].This comes to show that the study of clonal diversity is a challenging task and results are somewhat inconsistent, either likely due to methodology such as the differences in sequence coverage, quality filtering and overall pipeline design or due to the propagation record and the history of the plant material.
In terms of overall nucleotide diversity, "Rocha" clonal population presented a genome-wide value of 0.070 π/bp −3 .For reference, this compares with 3.53 π/bp −3 for a population of 25 P. communis cultivars [20].Tajima's D values were highly negative for every chromosome and for the overall genome (−1.096), and more negative than at intraspecific level, as expected (−0.78 [20];).Nevertheless, when calculating both nucleotide diversity and Tajima's values over 50Kb sliding windows, highly variable regions among clones and high positive Tajima's D values were found.Two examples of such regions are presented in Table 3 where several key genes are annotated in "Bartlett" genome, including callose synthase, two Malectin Receptor-Like Kinase (RLK) genes and two SWR1-complex protein 4-like isoforms, among others.In A. thaliana, Callose synthase is required for the formation of cell walls during pollen development and may be involved in callose synthesis during pollen tube growth.During plant development, callose is also found as a transitory component of the cell plate in dividing cells and it is found as a structural component of plasmodesmatal canals [58,59].On the other hand, RLK Kinases have been related with immunity responses and overall resistance to pathogens in several plant species [60,61].However, RLK Kinases from the FERONIA subfamily (FER and ANX1/2) also play a key role during communication between pollen tube and synergids, making these genes required for normal fertilization in plants [62,63].In the Pyrus genus, the RLK family of proteins were studied in-depth by [64], with 26 genes in total being identified in Pyrus bretchneideri genome.From those 26, five genes were classified as belonging to the FER and ANX1/2 subfamily (PbrCrRLK1L3, PbrCrRLK1L5, PbrCrRLK1L8, PbrCrRLK1L11, and PbrCrRLK1L26).Kou and colleagues (2017) showed that the expression of PbrCrRLK1L3 and PbrCrRLK1L26 is higher during pollen tube growth when compared with other RLK genes from other subfamilies, and they proved that pollen tube elongation is halted when the expression of these two genes is inhibited.The two RLK genes in the highly variable region presented here (pycom17g06370 and pycom17g06380) have their best BLAST hits with Pyrus bretchneideri FER family genes, suggesting that they could also be required for pollen tube elongation in P. communis.On the other hand, pycom17g06370 and pycom17g06380 also showed great similarity with Malus domestica RLK FERONIA genes, which were recently proven to be involved in the regulation of apple fruit ripening [65].Regarding SWR1-complex protein 4 (SWC4) genes, these are part SWR1-complex which mediates the ATPdependent exchange of histone H2A for the H2A variant HZT1, leading to transcriptional regulation of selected genes by chromatin remodeling in Arabidopsis (UniPro-tKB:Q8VZL6).Arabidopsis artificial miRNA knock-down SWC4 (Ateaf1) lines showed decreased levels of H4K5 acetylation in the promoter regions of major flowering regulator genes and showed early f lowering [66].Altogether, in this work, several genes annotated in the two highly variable 50Kbp regions have orthologs with key functions described for plant fertility and flower development in other plant species.This suggests that balancing selection of fertility-related genes could be occurring in the "Rocha" clonal population.However, since all clones were initially isolated and selected based on the presence of distinctive traits, these highly variable regions are most likely a consequence of human selection.Future research with a larger sample dataset can corroborate these results and allow for a better understanding of these events in the collection.
The relatedness among "Rocha" clones was explored through PCA and Kinship analysis.These approaches revealed that PRT 50 is the least related with the remaining clones, followed by PRT 56.The kinship coefficient is equal to 0.49 for every pair of clones, with the exception of PRT 50 (0.45).These results are partially in agreement with the results by [33] using SSR markers in the same biological material and suggest the origin and the history of propagation of clones as the most likely explanation for their divergence.This is mainly true for PRT 50, since the orchard where it was identified ∼40 years ago was located in the region of Torres Vedras, closer to the location of the original "Rocha" tree.All the remaining clones were identified in orchards from the region of Alcobaça, with little geographical distancing between them (Figure S1).This correlation between genetic and geographical distancing could ref lect the sharing of grafting material between farmers in the 60's or 70 , or even the adaptation of clones to differing conditions, although this cannot be confirmed.
The relatedness between pear clones found in this work can be compared with other outcrossing crops like grapevine.Vondras et al [32] found that the 15 "Zinfandel" clones had kinship coefficients between 0.42 and 0.45, which suggests that grapevine clones tend to be less related than European pear ones.However, "Zinfandel" was confirmed to be the ancient Croatian variety "Tribidrag", for which there are records of cultivation in Croatia for as early as the 15 th century [67], meaning that its propagation history is at least 500 years old.Thus, the lower relatedness of "Zinfandel" clones is most likely a consequence of its extended clonal propagation, which certainly led to a greater accumulation of somatic mutations.
Clone-specific mutations (i.e.SNPs that are present in one clone and absent in the other clones) were not distributed equally in the genome.At the Mbp level, all clones presented the same tendency, but when searching at the Kbp resolution, some Kbp windows were targeted for SNP accumulation, but those windows were usually not shared between clones.Several DNA motifs were overrepresented at these locations, 14 of which had a match with available databases.The majority of these matches were associated with families of transcription factors that are known to play important roles regulating gene expression in plants, namely Basic leucine zipper (bZIP), NAC, basic helix-loop-helix (bHLH) and WRKY [68].This result shows that, despite the fact that somatic mutations are likely to be present throughout the genome, in European pear "Rocha" they can also occur close to transcription factors, thus reinforcing their role in the emergence of new pear phenotypes.This was corroborated by the annotation of variants, which could be directly impacting the coding sequence of genes.In this work, 216 SNPs and 557 INDELs with potential to cause protein truncation or loss of function were annotated.From those, 168 SNPs and 425 INDELs were unique to a given clone.This set of polymorphisms can be responsible for some of the unique phenotypes that can be found in the "Rocha" clonal collection, as hinted by the ontology of the gene affected, and they can contribute to the study of European pear genetics by uncovering the causal genes of key phenotype.

Each "Rocha" clone has a unique signature of transposable elements
In this work several thousand new TE insertions were found in "Rocha", but surprisingly those insertions were hardly shared among clones.In fact, unique TE insertions (those only detected in one single "Rocha" clone) accounted for approximately half of all insertions, and only ∼1% were shared between all eight clones.This is unexpected and is somewhat in disagreement with the history of "Rocha", since theoretically all individuals were propagated from a single ancient pear tree, meaning that most genetic variants that are exclusive to "Rocha" and not present in "Bartlett" reference genome should be shared between clones (which is true for SNPs and INDELs).
One possible explanation for these results is that the genomic location of TE in European pear is mostly common between cultivars, hence the majority of "Rocha" TEs are also present in "Bartlett", thus excluding the most "shared" TE from the analysis, since only non-reference insertions were considered (new TE insertions).Another factor that must be considered to understand this result is the origin of "Rocha" cultivar.Historic records mention a single tree, nearly 200 years ago, as the origin of all "Rocha" that exists nowadays [34].However, the uniqueness of TE location in this clonal collection suggests otherwise.If "Rocha" would have multiple origins or starting materials for propagation in its inception as a cultivar, it could explain the profile of TE found in the clonal population of this work.However, even considering multiple origins, a great degree of TE sharing would be expected that could ref lect a common genomic background vs "Bartlett".Finally, it is also worth mentioning that the quality of sequencing data can play a role in the results, mainly because the coverage and quality of sequencing varies depending not only on the region of the genome but also from sample to sample.In other words, a given TE in a given genomic location is only shared by N clones if there is sufficient sequencing on that specific region, in all N clones, to confidently call that TE.
Despite being unexpected, the uniqueness of new TE insertions in clonal populations is described in the literature for other perennial crops [32].Future work will help to understand what drives the movement of TE in constantly propagated perennial crops, and what are the implications for clonal integrity and genome stability.

Conclusions
The genomic studies of pear species are still scarce.The understanding of the pear genome and its features, as well as the study of the genetics behind many key traits in pear are crucial to assist the breeding of new genetic combinations.This will be very important for pear to adapt to climate changes and to meet other challenges that producers are already being faced with, including a transition into a greener agriculture.Clonal populations of pear can play a role in the understanding of pear genomics, uncomplicating its genome due to high similarity between individuals, thus overcoming the typical complexity of an outcrossing species.In this work we have identified several variants between "Rocha" clones occurring inside coding regions of genes that can be responsible for some of the unique phenotypes presented by these clones.In the future, the in-depth study of somatic variants may unveil the genetics behind several pear traits that are still to be determined to date, opening new opportunities for Marker Assisted Selection (MAS) and Breeding.

Plant material
Eight pear accessions from the Portuguese pear germplasm collection, representing eight clones of the commercial variety "Rocha" were used in this study.These accessions were identified in several orchards in the west region of central Portugal in the late 70's and early 80's (Figure S1), and were grafted in 1985 at the Vieira de Natividade Fruit Research Station (Alcobaça, Portugal).Additionally, a Portuguese local variety "Carapinheira" was also used in this study, but mainly for comparison purposes or as an outgroup.All plant materials are registered in the Portuguese node of the GRIN-Global database (http:// bpgv.iniav.pt/gringlobal/)under the accession numbers PRT 11, PRT 50, PRT 51, PRT 52, PRT 53, PRT 55, PRT 56, PRT 57, PRT 58 (Table 1).

DNA extraction and sequencing
Young leaves were collected from year-grown twigs at four positions in each tree, representing the four cardinal points.DNA was isolated using the innuPREP Plant DNA Kit (Analytik Jena AG, Berlin, Germany), according to the manufacturer's protocol.
After extraction, genomic DNA was checked for quality parameters according to sequencing provider instructions, namely for integrity (using agarose gel electrophoresis), minimum concentration of 20 ng/μL (Qubit ® ) and purity (absorbance at 260, 280 and 230 nm using NanoDrop ND2000 spectrophotometer (Thermo Scientific, Massachusetts, MA, USA)).Samples were shipped to Novogene Europe (Cambridge, UK) for shortinsert paired-end library preparation (2x150bp) followed by whole genome sequencing (WGS) in the Illumina Novaseq platform.

Intra-varietal variation
Since there is no "Rocha" reference genome available for variant calling, the intra-varietal variation could not be directly estimated from "Rocha" vs "Bartlett".However, an intra-varietal variation was considered when a given variant was polymorphic in the population (i.e. when at least one clone had a different genotype call when compared with the remaining clones).Polymorphic positions were also used to isolate unique genotypes as those positions were one clone was different from all the remaining seven clones (i.e. one clone "0/1" vs all the remaining clones "1/1").

Estimation of variant effects
The estimation of variant effects was performed using SnpEff software v4.3t [52].A local database for P. communis Bartlett DH Genome v2.0 annotation was built following SnpEff instructions, using the annotation file available at the Genome Database for Rosaceae (GDR) (ftp://ftp.bioinfo.wsu.edu/species/Pyrus_communis/Pcommunis_DH_genome.v2.0).Variants were then sorted by their predicted effect, and those annotated as of HIGH impact were selected for manual inspection.Genes affected by HIGH impact variants were selected and their predicted functional analysis was retrieved from genome-related files available at GDR, namely GO assignments, IPR assignments and KEGG pathways, using a custom python script.

Transposable elements
Genome-wide analysis of Transposable elements (TE) insertions for each sample was performed using PoPoola-tionTE2 [53] and Retroseq [54] softwares through the McClintock pipeline [55].In detail, a TE consensus library for P. communis was downloaded from URGIT RepetDB (http://urgi.versailles.inra.fr/repetdb/begin.do#search?taxonGroup=23211).Then, McClintock was used with the -make_annotations argument to create the annotation and taxonomy files, followed by TE insertions (TEI) discovery for each sample individually using the PoPoolationte2 and Retroseq methods.The individual result files for each sample were merged and then used to count the number of TEs per class and per number of clones where it was discovered, using in-house python scripts.A given TEI was only included for the final results dataset if it was newly discovered (non-reference) and confirmed by both forward and reverse reads in the case of PoPoolationTE2 (non-redundant).Since new insertions are only pinpointed (i.e. the spawn of the TEI is only 1 bp), the overlap of TEI among clones, or the calculation of which insertions were shared by N clones, was achieved as follows: if the same TE was present in more than one clone, in the same chromosome, it was considered to be shared between that many clones if the pinpoint positions of those TEs were less than 10 Kbp apart.

Figure 1 .
Figure 1.Overall "Rocha" SNPs (A) and INDELs (B) distribution across all 17 pear chromosomes.Blue bars at the bottom represent regions with very low variability.Top red bars represent regions with very high variability.Orange background represents the amount of low coverage base pairs (<5×) along the chromosomes.Black line represents the number of SNPs or INDELs called.

Figure 2 .
Figure 2. The distribution of tajima's D values, for the "Rocha" population, in 50 Kbp windows for all 17 chromosomes of pear.Red line represents windows with negative Tajima's D values.Green line represents positive Tajima's D values.

Figure 3 .
Figure 3. Principal component analysis (PCA) using genome-wide SNP data generated in this study for the eight "Rocha" samples.Only polymorphic SNPs with valid genotypes across all samples were used (233 134 SNPs).

Figure 4 .
Figure 4. GO assignments of genes affected by HIGH impact SNP variants in "Rocha" clones.The size of rectangles is directly correlated with the number of occurrences of a given GO.

Table 1 .
The origin, name year of collection and unique traits of the eight "Rocha" clones studied in this work

Table 2 .
Covered bases, estimates of nucleotide diversity (π and the Watterson estimator θ w ) and Tajimas' D for each chromosome or genomic region.
49overed bases where all eight clones had at least 5× coverage eachThe clonal variability in "Rocha" is uneven and presents both low and high mutation regionsFrom the 233 134 polymorphic SNP variants among clones, 183 407 of them (78.67%) were exclusive for a given clone, meaning that the majority of intra-varietal genomic variation found in "Rocha" is unique to a given accession and not shared.The remaining49727 variants (21.33%) were shared by two or more clones.These SNPs, ranging from 4848 in PRT 51 to 12 955 in PRT 56.The overall frequency of unique SNPs for each clone varied greatly considering 1 Mbp sliding windows, with an average of 282 SNPs/Mbp for PRT 50, 28 SNPs/Mbp for PRT 56 and between 10 and 20 SNPs/Mbp for the remaining clones.Taken together, the macro distribution of unique SNPs had the same tendency in all clones.When looking