Genetic characterization of an almond germplasm collection and volatilome profiling of raw and roasted kernels

Almond is appreciated for its nutraceutical value and for the aromatic profile of the kernels. In this work, an almond collection composed of 96 Sicilian accessions complemented with 10 widely cultivated cultivars was phenotyped for the production of volatile organic compounds using a proton-transfer time-of-flight mass spectrometer and genotyped using the Illumina Infinium®18 K Peach SNP array. The profiling of the aroma was carried out on fresh and roasted kernels enabling the detection of 150 mass peaks. Sixty eight, for the most related with sulfur compounds, furan containing compounds, and aldehydes formed by Strecker degradation, significantly increased during roasting, while the concentration of fifty-four mass peaks, for the most belonging to alcohols and terpenes, significantly decreased. Four hundred and seventy-one robust SNPs were selected and employed for population genetic studies. Structure analysis detected three subpopulations with the Sicilian accessions characterized by a different genetic stratification compared to those collected in Apulia (South Italy) and the International cultivars. The linkage-disequilibrium (LD) decay across the genome was equal to r2 = 0.083. Furthermore, a high level of collinearity (r2 = 0.96) between almond and peach was registered confirming the high synteny between the two genomes. A preliminary application of a genome-wide association analysis allowed the detection of significant marker-trait associations for 31 fresh and 33 roasted almond mass peaks respectively. An accurate genetic and phenotypic characterization of novel germplasm can represent a valuable tool for the set-up of marker-assisted selection of novel cultivars with an enhanced aromatic profile.


Introduction
Almond (Prunus dulcis Mill. D.A. Webb; syn. Prunus amygdalus Batsch.; Amygdalus communis L.; Amygdalus dulcis Mill.), belongs to the genus Prunus, family Rosaceae a taxonomic group that includes numerous species of agronomical interest such as: apple, pear, peach, apricot, cherry, prune, and several berry fruits. Among tree nuts, almond ranks third in worldwide production behind cashew and walnut, with the US being the largest producer 1 . In ancient times, its cultivation rapidly spread throughout the Mediterranean regions from central Asia reaching Sicily during the Greek domination 2 . Nowadays, almond is widely cultivated all over the Mediterranean basin.
Almond cultivation relies mainly on a few highly productive cultivars. However, almond germplasm is composed by thousands of accessions showing wide variability in terms of adaptation to different pedoclimatic conditions, resistance to biotic and abiotic stress and kernel quality traits [3][4][5] . The self-incompatibility of most of the almond cultivars, together with the extensive use of seeds for propagation, played an important role in the differentiation of such massive genetic diversity within the almond species 6,7 .
One of the leading aspects guiding the choice of the almond cultivar is the kernel quality. Such multi-factorial trait encompasses the physical appearance (colour, texture, size), its nutritional properties and the flavour (aroma and taste) 3 . Almonds are particularly valued for their sensory, nutritional, and health attributes 4 and kernels are often consumed fresh or as ingredients in processed foods 5 . Considering their wide use for fresh consumption or for confectionery, the flavour of both raw and roasted almond kernels greatly influences their economic value. While taste is determined primarily by nonvolatile metabolites (sugars, organic acids, amino acids) and it is perceived in the mouth, aroma is the result of the interplay of a large array of volatile organic compounds (VOCs) and it is perceived largely by the olfactory receptors. In light of this, aroma profiles of raw and roasted almond have been dissected through several approaches at harvest and during storage. VOC profile of raw almonds is composed for the most by aldehydes such as hexanal, nonanal and benzaldehyde [6][7][8][9] , although several ketones, alcohols, alkanes and heterocyclic compounds have been reported 10 . Pyrazines, pyrroles, furans and aldehydes comprised the main volatile compound classes in roasted almonds 10 . The chemical reactions behind the formation of the majority of VOCs in roasted almonds are the Maillard reactions 11 which produce branched-chain aldehydes, alcohols, sulfur-containing and heterocyclic compounds, while straight chain volatiles reflect heat-induced oxidation during roasting 8 . Several sulfur-containing aroma compounds are de novo produced during roasting by the degradation of sulfurcontaining amino-acids, such as dimethyltrisulfide and 2furfurylthiol, respectively formed by methionine and cysteine 9,12 .
Since aroma involves the perception of a plethora of VOCs, their assessment is crucial to guarantee the selection and marketability of high-quality fruit. Thus, high priority should be given to replace poor flavour cultivars with favourable ones, exploiting the variability already available in nature. However, the analysis of the aroma trait in many samples, necessary to overcome the significant biological and genetic variability among samples, may be laborious and time consuming. VOC phenotyping is currently a limiting step in breeding programs, due to high costs and complex analytical techniques. Another limitation also raised by the elevated, and difficult to be controlled, the interaction between fruit genetics, environmental effects, and product transformation. Even though different cultivars are often characterized by substantial variations in flavour 8 , most plant breeding programs have historically neglected this trait, given its intrinsic complexity and costs to phenotype 9,10 . To correct this inconsistency and incorporate flavour into breeding program routines, it is necessary to identify the sources of flavour variability, understand their genetic architecture, and define cost-effective methods of selection.
According to recent publications, direct injection mass spectrometry (DI-MS) techniques, like Proton Transfer Reaction -Time of Flight-Mass Spectrometry (PTR-ToF-MS), are powerful high-throughput phenotyping tool for both genetic and quality-related studies 11,12 . The rapidity and the moderate cost of DI-MS analysis may allow to perform a detailed aroma characterization with a peculiar attention to the VOC fold changes caused by ad hoc storage and transformation experiments. Indeed, this technique has been already applied for the VOC characterization of transformed products, such as fermented cocoa 13 and coffee beans 14 , and for genetic association studies of different fruit species 12,15,16 .
The production of these VOCs is controlled by two classes of genes: those encoding enzymes responsible for the synthesis of the end products and those encoding factors regulating the biochemical pathways 9 .
A significant increase in the use of both highthroughput DNA-derived data and advanced phenotyping approaches to dissect the causative genes underlying traits of agronomical interest through marker-trait association approaches [17][18][19] has happened in the last decades. To this extent several segregating populations were developed to build the first genetic maps of almond [20][21][22] . Such studies laid the foundations for QTL analysis approaches to detect genomic regions linked to phytosterol content 21,23 and other traits related to the physical traits of almond nut and kernel 22 . However, none of these genetic association studies was focused on understanding the genetic aroma regulation of almond kernels.
The high genetic similarity between peach and almond 19 allowed the development of interspecific almond x peach segregating population and their use for QTL analyses for traits of agronomical interest such as chilling and heat requirement 24 , brown rot resistance 25 and 'stone-adhesion/flesh-texture' 26 . The availability of high-throughput genotyping platforms enabled the use of genome-wide association study (GWAS) approaches on germplasm collections composed by unrelated individuals 27 . GWAS approaches proved its efficiency in almond 28 as well as in many other outcrossing tree crops, since they are capable of assessing higher allelic variability and smaller linkage blocks compared to other methods.
In light of this, the set-up of an ex situ germplasm collection is a strategic step both for conservation and breeding purposes. The present work is based on the analysis of an ex situ germplasm collection that was already characterized both phenotypically and genetically highlighting a variability both within Sicilian accessions and between those and the Italian and international elite cultivars 29,30 . Overall, such almond germplasm collection    In this survey, our almond collection was both phenotyped using a proton-transfer time-of-flight mass spectrometer and genotyped using the Illumina Infinium ® 18 K Peach SNP array developed by RosBREED consortium 31 . Genetic data were employed for synteny analysis and to decipher both the genetic stratification and the linkage disequilibrium (LD) extent within the collection in the analysis. The same germplasm was phenotyped for the production of VOCs both on raw and roasted kernels using a PTR-ToF-MS.
The aims of this work were (i) to estimate the volatilome variability among almond different genotypes; (ii) to evaluate the effect of roasting on the VOC composition of the almond kernel; (iii) to identify the best performing accessions to be used as superior parental lines for future breeding programs; (iv) to detect molecular markers linked to VOCs of interest. In addition, the results of this study might be useful in defining an objective VOC phenotyping protocol to apply in all production stages, from breeding to the food industry transformation. This study is a first, preliminary, step toward the definition of molecular markers that can be readily employed for marker-assisted selection (MAS) approaches and provide novel insights on the genetic mechanisms regulating the VOCs profile in almond.

Results and discussion
High-resolution VOC phenotyping Almond VOC profile was assessed on raw and roasted kernels in triplicate by PTR-ToF-MS analysis as described in Farneti et al. 11 . VOC mass peaks from the raw PTR-ToF-MS spectra were reduced from 422 to 150, applying noise and correlation coefficient thresholds (Table 1, Supplementary   Fig. 1, Supplementary Tables 1-2). Tentative identification (t.i.) of each mass peak, detected by PTR-ToF-MS, was based on in-house library of pure standards and on literature review [32][33][34][35][36] . VOC profile was considerably altered during roasting, as 122 mass peaks significantly differed between raw and roasted almonds ( Table 1, Supplementary Fig. 1, Supplementary Tables 1-2). To our knowledge, this is the first work about PTR-MS application on almond kernels; this technique has already been successfully applied for the characterization of fermented cocoa and green and roasted coffee beans 13,14 and for the online monitoring of coffee roasting [37][38][39] .
Only a few mass peaks (28 over 150) were not significantly modified by roasting. Among them, several compounds have an important role in the characterization    Fig. 1e). The VOC variability, assessed on raw and roasted almonds, is graphically represented by the PCA plot (Fig.  2a, b) defined by the first two PCs (PC1: 41.8 % and PC2: 21.3% of the total phenotypic variability). VOC differences related to roasting were mostly explainable by PC1, while differences among almond genotypes, in particular for fresh kernels, were mostly related to PC2. Cultivars defined by negative values of PC2 had a more intense VOC profile for both fresh and roasted kernels, as it was also validated by the hierarchical clustering and heatmap (Fig. 2c, d). Almond VOC profile seemed to be mostly influenced by roasting, but still with significant interaction with genetic variability. As a result, fresh and roasted almond genotypes were significantly clustered into two groups (Fig. 2a) based on PC1 variability.
According to solely to the VOC profile assessed on fresh kernels, an accurate prediction of the profile after roasting is quite complex, since several compounds, like sulfur compounds, furans, and few aldehydes, are produced by the degradation of primary metabolites only during roasting ( Fig. 1 and Supplementary figure 1). However, based on results of both PCA analysis (Fig. 2a, b) and hierarchical clustering (Fig. 2c, d), most of the accessions considered in this study maintained a comparable topological structure of the cluster's tree ( Supplementary  Fig. 2). In particular, it was possible to identify two clusters of accessions composed respectively by "Angelica" (#3), "Baggiana" (#4), "Belvedere" (#9), "Cacciatura" (#15), "Montagna" (#64), "Mullisa Piccola" (#67), and "Sarbaggia di Sciascia" (#87); and by "Amara di Martorana" (#2), "Calamonaci" (#17) and "Cesaro 1" (#25) that maintained their VOC characteristics after roasting. These two clusters were characterized, respectively, by an elevated concentration of m/ Moreover, these volatilome results evidenced that all almond elite cultivars assessed in this study, except "Ferraduel" (#44), were characterized by a less intense VOC profile than many of the Sicilian accessions. As for many other horticulture products, this lower VOC content might be the indirect consequence of a cultivar selection for the most oriented to the fruit productivity rather than to the quality 9 . Noticeably, it was possible to define several clusters of cultivars, among the Sicilian accessions, characterized by a considerable high content of compounds with a specific, and easy to be perceived, aroma note, like benzaldehyde (Fig. 1f) or phenyl ethyl alcohol (Fig. 1d). While benzaldehyde is the characteristic and predominant odour compound of bitter almond 45 , phenyl ethyl alcohol, associated with floral and rose aroma note, was already detected in several almond genotypes, but at low concentrations 45,46 . "Don Pitrino" (#36), "Pizzuta grande" (#80), "Comunista" (#28), "Pilusedda" (#76), "Mennula du nigliu" (#57) and "Vaiana" (#101) were some of the cultivars of our germplasm collection characterized by highest phenyl ethyl alcohol concentration. This feature might be interesting not only for the agro-food sector but also for the cosmetic industry 47 .
Taking into account the high genetic variability considered in this study, we aimed to uncover most of the possible VOC variability among Prunus dulcis genotypes. However, without a detailed sensory analysis, quantifying the relevance of each VOC might be too speculative, bearing also in mind the non-linear interaction of these molecules in determining consumer preference. For this reason, in order to reduce any possible statistical bias in the result interpretation, all data were analysed with unsupervised multivariate statistical methodologies (PCA and hierarchical clustering). Nonetheless, considering each quality trait independently (i.e. Supplementary Fig.  1) might be useful for a breeding approach aimed to introduce, or improve, a distinct quality trait to an elite breeding line.
To simplify the application of these results, we limited the number of VOC traits that have to be considered ( Supplementary Fig. 3), according to the loading plots of the principal component analysis and to the results of previously published articles on almond aroma 32,33,[48][49][50][51] . The content of each trait (including also some pomological feature such as fruit and kernel weights, kernel thickness, or flavour) was grouped based on the distribution quantile (low: 0-25%; middle-low: 25-50%; middle-high: 50-75%; high: 75-100%), calculated for both raw and roasted assessment (Supplementary Fig. 3). Accessions employed in the study can be consequently sorted and clustered according to the content of the trait of interest, which can be arbitrarily chosen as implemented in the dedicated webpage https://iuliiakhomenkofmach.shinyapps.io/QualySort/ 52 .

Definition of a robust SNP set and peach/almond synteny analysis
The original set of 16,038 SNPs was filtered using the ASSIsT software 53 resulting in the detection of 471 (2.9%) robust polymorphic markers spanning the eight almond chromosomes. Among the discarded markers, 11,743 (73.2%) were monomorphic, 2321 (14.5%) failed in the amplification and the remaining 1503 (9.3%) were characterized by the presence of null alleles. The relatively low number of failed SNPs confirmed the high synteny between peach and almond genomes, nevertheless, the high fraction of monomorphic markers well reflected the fact that the probes were designed to target SNPs characterizing a different, although similar, species.
The number of mapped SNPs per chromosome spanned from 22 (Pd5) to 123 (Pd2), with a mean value of 59, the average marker density was 1 marker every 424 Kb (Supplementary Table 3).
The physical position of the 471 SNPs on the almond 54 and peach genome 56 was highly consistent (r 2 = 0.96) highlighting a high synteny and collinearity between the two species. 13 SNPs (2.7%) mapped on different chromosomes in the two species. Pd1 and Pd6 did not show any inconsistencies and the SNP positions along the Fig. 3 Collinearity plots between almond (x axis) and peach (y axis). Physical coordinates for SNP markers were retrieved from the Prunus dulcis Texas Genome v2.0 and the Prunus persica Whole-Genome Assembly v2.0 respectively. SNPs mapped in the same linkage groups in both species were represented as full dots while SNPs mapped in different linkage groups were represented as cross almond and peach linkage groups showed an r 2 = 0.997 (Pd1) and 0.998 (Pd6), (Fig. 3). The other linkage groups were characterized by the occurrence of 1-3 SNP(s) mapped in different linkage groups in the two species (Fig. 3). The high synteny between peach and almond was in agreement with previous studies highlighting that most of the genomes of the Prunus species can be considered as a single entity 19,54,57 .

Analysis of genetic structure
The level of genetic stratification was assessed using the Bayesian approach implemented in the software STRUCTURE 58 . Among the different number of subpopulations postulated, K = 3 showed the highest likelihood (ΔK = 346) followed by K = 7 and K = 2 showing similar likelihoods (ΔK = 151 and 116 respectively, Supplementary Fig. 5). Figure 4A showed the genetic configuration of the 106 individuals for K = 3; 45 accessions were characterized by a clear predominance (Qi ≥ 0.8) of one of the three subpopulations, in particular: 19 accessions were predominantly characterized by Subpop1 while both Subpop2 and Subpop3 were represented by 13 accessions each. The remaining 61 genotypes showed a higher level of admixture (Supplementary Table 4). The SNP data analysis and the structure results confirmed the origin of the self-compatible cultivar "Supernova" (#93) as a mutant of the self-incompatible "Tuono" (#98) 59 with the two cultivars characterized by an identical genotypic profile for all the SNP tested (and consequently an identical genetic structure for all the Ks postulated, Supplementary Table 4, Supplementary Fig. 6). Overall, the Apulian and International accessions were characterized by a similar contribution of Subpop1 (12.3% and 12.1% respectively); then the most represented subpopulations were Subpop2 for the International group (54.3%) and Subpop3 (46.1%) for the Apulian accessions (Fig. 4b).

Identification of genomic regions underlying VOCs production
Marker-trait association approaches were successfully employed in most of the tree crops to identify molecular markers in strong LD with the causative gene(s) influencing a trait of agronomical interest. In this study, molecular and phenotypic data were employed for a preliminary application of a GWAS analysis to identify molecular markers linked to the VOC production of the fresh and roasted almond kernel.
Among the 150 mass peaks related to the VOC profile of fresh almond, 31 were characterized by significant markertrait associations for at least one of the SNP tested. Although with different relative frequencies, significant SNPs were For each SNP exceeding the GWAS significance threshold, the corresponding physical position according to the Prunus dulcis Texas Genome v2.0 was reported together with the relative FDR-adjusted p value (expressed as −log10 p value) and the corresponding volatile organic compounds (VOC) mass peak detected in all linkage groups except Pd7. Pd8 resulted significantly associated with 21 VOC mass peaks while for the other linkage groups, the number of VOC mass peaks exceeding the significance threshold ranged from 1 (Pd1, Pd5 and Pd6) to 10 (Pd2) ( Table 2). As for the VOCs contributing the roasted volatile profile of almonds, 33 mass peaks showed significant association (Table 3). Similarly, to what registered for the VOC profiling of fresh almonds, the highest number of signals were detected in Pd8 (21) while no significant associations were observed for Pd6 and Pd7 (Table 3). Among the VOC mass peaks showing a significant association, 15 were in common between fresh and roasted almond kernels (Tables 2 and 3). All those mass peaks were mapped in the same genetic regions in both VOC assessments except for m/z 44.025 (unknown molecule, mapped in Pd8 and in Pd3 respectively, Tables 2 and 3 Table 1), suggesting that, even if the quantity of the VOC changed significantly during roasting, the genetic region associated to the trait remained the same.
In both fresh and roasted phenotypes, the significant signals in Pd8 were detected in two genetic regions: at around 5.5 Mb (2 and 7 SNPs respectively for fresh and roasted kernels) and 17.4 Mb (19 and 14 SNPs respectively) suggesting the existence of either a cluster of genes underlying the synthesis of different aromatic compounds or the presence of common genetic regulation systems (Fig. 5). Further study with higher marker density will help to clarify the number and function of the genes located in Pd8.).

Analysis of LD
The analysis of the non-random association between loci through a whole-genome LD decay scan provides insights on the population genetic forces structuring the germplasm collection in the analysis. The mean r 2 for all intrachromosomal loci pairs was equal to 0.083, while the chromosome-wise LD ranged from 0.076 (Pd2) to  For each SNP exceeding the GWAS significance threshold, the corresponding physical position according to the Prunus dulcis Texas Genome v2.0 was reported together with the relative FDR-adjusted p value (expressed as −log10 p value) and the corresponding volatile organic compounds (VOC) mass peak 0.181 (Pd5). The LD decay detected in this study was slightly higher than in previous reports (mean r 2 = 0.04 23 ). The highly significant r 2 threshold, calculated as the 95th percentile of the r 2 distribution, was 0.41, corresponding to LD blocks of~60 Kb (Fig. 6). LD in almond decayed faster than peach [60][61][62][63] , apricot 64 and cherry 65 . Among the different causes taking part in the species-specific LD decay, the mating system (out-crossing versus self-compatible) is probably the most important factor influencing the different LD decays in Prunus. While almond and cherry, with some rare exceptions, are out-crossing species, peach and apricot are self-compatible and the subsequent self-pollination results in lower heterozygosity and slower LD decay 66 .

Conclusion
Future breeding programs, focused on the optimization of consumer perceived quality, need to consider almond VOC modification related to genetic variability, environmental effect, and transformation. This can be achieved only with a more objective and precise identification of the best performing cultivars to be used as superior parental lines in combination with a reliable phenotyping methodology and genotyping assay. This investigation supports the use of PTR-ToF-MS as an accurate and objective phenotyping tool to evaluate the VOC profile of almonds. Indeed, most of the molecules that were previously identified on a limited number of almond accessions by gas chromatographic A preliminary GWAS analysis enabled the identification of 63 VOC mass peaks (related to fresh and/or roasted treatment) showing a significant phenotype-genotype association. The detection of molecular markers in close linkage to several aroma components could be of great interest for the set-up of marker-assisted selection (MAS) approaches in novel breeding schemes to enhance the almond aroma. However, a better understanding of genes and enzymes involved in the VOC production, during kernel ripening or during roasting, is still needed. Further studies aimed at a real-time VOC assessment during almond roasting will provide a more complete overview of the volatilome of almond kernel, while the availability of a dedicated almond SNP array will allow a better genetic resolution for the detection of candidate genes regulating the aromatic characteristics of almond.

Plant materials
The germplasm was composed by 106 almond accessions maintained in the ex situ germplasm collection held at the 'Museo vivente del mandorlo Francesco Monastra', located in Sicily (latitude: 37.2921094, longitude: 13.5817574, altitude: 121 m above sea level). The germplasm collection was mainly composed of Sicilian almond accessions selected through centuries for their agronomic traits of interest (e.g. fruit quality, resistance to biotic or abiotic stress, shell hardiness) complemented with widely known national and international cultivars as outlined in Supplementary  Table 5. For each accession, almond kernels were harvested from four plants, grown under standard agronomical practices. Data related to the pomological characteristics of the fruit and kernel were retrieved from ref. 67 . Almond kernels were collected at full ripening stage according to the maturity period of the different accessions (from mid-August to mid-September 2019) and conserved at 4°C prior to the analysis. Three biological replicates of 3 g of sliced sample, each obtained by five fresh unpeeled almond kernels, were inserted into 20 mL glass vials equipped with PTFE/silicone septa (Agilent, Cernusco sul Naviglio, Italy). Measurements of almond VOCs were Fig. 6 Genome-wide scatterplot of linkage disequilibrium decay (r 2 , y axis) against the genetic distance (Mb, x axis) for pairs of linked SNPs across the eight linkage groups. In the window below, only the first 500 Mb were displayed together with a LOESS fitting curve summarizing the linkage disequilibrium decay at increasing physical distances (red continuous line) and the relative confidence interval (grey area). The intersection between the LOESS fitting curve and the 95th percentile of the r 2 distribution (black dashed line) was taken as the threshold value to consider two markers in close linkage disequilibrium performed on three biological replicates with a commercial PTR-ToF-MS 8000 (Ionicon Analytik GmbH, Innsbruck, Austria 12 ). The drift tube conditions were as follows: 110°C drift tube temperature, 2.8 mbar drift pressure, 428 V drift voltage, ion funnel (18 V). This leads to an E/N ratio of about 130 Townsend (Td), with E corresponding to the electric field strength and N to the gas number density (1 Td = 10 −17 Vcm −2 ). The sampling time per channel of ToF acquisition was 0.1 ns, amounting to 350,000 channels for a mass spectrum ranging up to m/z = 400. The sample headspace was withdrawn through PTR-MS inlet with 40 sccm flow for 60 cycles resulting in an analysis time of 60 s/sample. Pure nitrogen was flushed continuously through the vial to prevent pressure drop. Each measurement was conducted automatically after 25 min of sample incubation at 40°C and 5 min between each measurement was applied in order to prevent memory effect. All steps of measurements were automated by an adapted GC autosampler (MPS Multipurpose Sampler, GERSTEL) coupled to PTR-ToF-MS. After the PTR-ToF-MS measurement of fresh almonds, each vial, without cup, was transferred into an oven (WTB Binder, Germany) at 150°C for 15 min to achieve a medium roast. These roasting conditions were decided based on literature information 35,36 and on preliminary tests performed on almond kernel genotypes, characterized by different shapes and sizes profile of roasted almonds was assessed in the same way of fresh samples.
The analysis of PTR-ToF-MS spectra proceeded as described in Farneti et al. 11 . The array of masses detected with PTR-ToF-MS was reduced by applying noise and correlation coefficient thresholds. The first removed peaks that were not significantly different from blank samples; the latter excluded peaks with over 99% correlation, which mostly correspond to isotopes of monoisotopic masses 11 . R.4.0.2 68 internal statistical functions and the external packages "mixOmics", "heatmap3", "dendextend", and "ggplot2" were used for the multivariate statistical methods (PCA, heatmap, hierarchical clustering, and tanglegram) and for the "Lollipop graph" employed in this work [69][70][71][72] .

SNP Genotyping and synteny analysis
Total DNA was extracted from fresh leaf tissue using the CTAB extraction method proposed by Doyle and Doyle 73 following the protocol described by Distefano and colleagues 29 . The almond germplasm collection was genotyped employing the Illumina Infinium ® 18 K Peach SNP array 31 . The use of an SNP array developed for peach is due both to the lack of SNP arrays specifically designed for almond and the high marker transferability between the two species 57 . Robust SNPs were filtered using the ASSIsT software 53 with default parameters (allowed missing data = 0.05, unexpected genotype threshold = 0.003, frequency rare allele = 0.05). Markers were ordered along the eight linkage groups using the Prunus dulcis Texas Genome v2.0 54 , while the Prunus persica Whole-Genome Assembly v2.0 56 was employed for collinearity analysis.
Deciphering the population structure of the almond collection The most probable number of subpopulations (K) characterizing the 106 accessions was assessed using the STRUCTURE software v2.3.4 58 . The K tested ranged from 1 to 10. For each K, five independent runs were carried out with a burn-in period of 10,000 and 100,000 Markov chain Monte Carlo replications after burn-in. The K value that best fits the data was assessed by calculating the DeltaK value 74 as implemented in the STRUCTURE HARVESTER program 75 (http://taylor0.biology.ucla.edu/ structureHarvester/). The five independent runs were integrated using the CLUMPP software 76 and resulting Q matrices were displayed in R 68 . Accessions showing a membership coefficient (Q i ) equal or higher than 0.8 were assigned to a subpopulation, while the others were classified as 'admixed' 77 .

Phenotype-genotype association analysis
Phenotypic and genotypic data were integrated in a GWAS analysis using the Efficient Mixed-Model Association eXpedited (EMMAX) implemented in the 'GWAS' function of the rrBLUP R package 78 . The GWAS model employed in the analysis is expressed as follows: where β is a vector of fixed effects modelling both environmental factors and population structure, the variables g and τ models the genetic background of each line as a random effect and the additive SNP effect as a fixed effect respectively; ε summarizes the residual variance. The GWAS model employed takes genetic structure and kinship matrix as covariates to correct for genetic stratification and parental relationship. To minimize typeone errors, significant associations were detected after correcting the p value for multiple testing using the false discovery rate (FDR) method 79 . FDR is computed using the q value package in R 80 ; SNPs exceeding the FDR threshold rate of 0.05 (represented by a dashed line in the Manhattan plot) were considered significantly associated with the phenotype.

LD, QTL anchoring and in silico gene annotation
The LD decay was calculated using the R package sommer v2.9 81 . The genome-wide LD decay was visualized using the R software 68 plotting the LD parameter r 2 against the corresponding physical distance. The 95th percentile of the r 2 distribution was taken as the threshold value to consider two markers in close LD 82 . The r 2 threshold baseline was matched to the locally weighted polynomial regression-based fitting curve (LOESS) to estimate the average LD decay distance using the 'stats' R package 68 .