Dissecting the complex genetic basis of pre- and post-harvest traits in Vitis vinifera L. using genome-wide association studies

Abstract Addressing the pressing challenges in agriculture necessitates swift advancements in breeding programs, particularly for perennial crops like grapevines. Moving beyond the traditional biparental quantitative trait loci (QTL) mapping, we conducted a genome-wide association study (GWAS) encompassing 588 Vitis vinifera L. cultivars from a Chilean breeding program, spanning three seasons and testing 13 key yield-related traits. A strong candidate gene, Vitvi11g000454, located on chromosome 11 and related to plant response to biotic and abiotic stresses through jasmonic acid signaling, was associated with berry width and holds potential for enhancing berry size in grape breeding. We also mapped novel QTL associated with post-harvest traits across chromosomes 2, 4, 9, 11, 15, 18, and 19, broadening our grasp on the genetic intricacies dictating fruit post-harvest behavior, including decay, shriveling, and weight loss. Leveraging gene ontology annotations, we drew parallels between traits and scrutinized candidate genes, laying a robust groundwork for future trait-feature identification endeavors in plant breeding. We also highlighted the importance of carefully considering the choice of the response variable in GWAS analyses, as the use of best linear unbiased estimators (BLUEs) corrections in our study may have led to the suppression of some common QTL in grapevine traits. Our results underscore the imperative of pioneering non-destructive evaluation techniques for long-term conservation traits, offering grape breeders and cultivators insights to improve post-harvest table grape quality and minimize waste.


Introduction
Grapevine plays a pivotal role in global fruit production, generating almost 70 million tons of fruit annually, of which 43.3% is table grapes [1,2].The ideal grape qualities for consumers and growers encompass traits such as size, taste, firmness, and postharvest longevity, while the economic value is intrinsically tied to yield and quality [3][4][5][6][7].However, various factors including rachis browning, cluster decay, and berry cracking can compromise postharvest quality, affecting not only the aesthetics but also the sensory perception of the fruits [8][9][10][11][12][13].
Traditional grapevine breeding methods face challenges in balancing these desirable traits, largely due to the considerable investment in time and resources, in addition to the prolonged juvenile phase characteristic of woody perennial species [14][15][16][17].This issue is particularly pertinent since the fruit characteristics evaluation can only start after the plant matures, typically in the fourth or fifth year of its life cycle [18].An emergent solution lies in employing molecular markers derived from quantitative trait loci (QTL) analysis.This approach facilitates the prediction of fruit characteristics in immature plants, and promises a substantial reduction in the breeding cycle of up to a decade, as well as notable cost savings of 16%-34% [19][20][21].
Due to the commercial importance of seedlessness in table grapes, further research has been carried out to elucidate the genetic architecture of this trait.Bouquet et al. [22] proposed the prevailing hypothesis on seedlessness' genetic control, suggesting it is largely controlled by a dominant regulator gene, the seed development inhibitor (SDI), and three unidentified recessive genes.Chromosome 18 hosts a QTL linked to stenospermocarpic seedlessness, accounting for 50%-90% of seed weight variation [23,24].Further studies have identified VvAGL11, a major functional candidate gene encoding a MADS-box transcription factor involved in seed development [25].Marker-assisted selection (MAS) suitable for use in grape breeding programs has been developed [26].While additional QTL linked to seed dry weight have been found on chromosomes 2 [27], 5 [4,27], and 14 [4,27], their contributions are smaller compared to SDI.In terms of fresh weight, loci have been identified across numerous chromosomes [4,24,[28][29][30].Likewise, loci inf luencing seed number have been detected across various chromosomes [4,24,[28][29][30][31].
The genetic basis of berry shape, defined as the ratio between width and height or based on categorical scales, has also been investigated [41,42] and has been associated with genes on multiple chromosomes and a variety of functions, including transcription regulation, binding activities, catalytic activity, cell wall biogenesis, and protein transport [42].Berry diameter is inf luenced by loci on chromosomes 2 and 18 [24,33], whereas berry volume is impacted by loci on chromosomes 2, 12, 17, and 18 [33,40].Berry cracking has been linked to regions in chromosomes 11 and 13 [11,35].
Linkage mapping has contributed significantly to our understanding of grapevine traits but has limitations due to a reliance on high frequency of recombination events [43].Genome-wide association analysis (GWAS) provides an alternative by studying genetic architecture and gene interactions inf luencing traits [44][45][46][47][48][49].Despite these advances, further GWAS studies are needed for berry diameter, cluster weight, and post-harvest traits, the genetic underpinnings of which remain less explored than those of seed traits.
Exploring trait genetic architecture requires diverse individuals representing existing trait variability [50].Yet, many studies have relied on single-population analysis, underscoring the need for broader analyses, such as the three-population approach as demonstrated by [4].
In this study, we carried out a GWAS analysis on a comprehensive set of 588 genotypes, which includes seven populations and commercial varieties, to investigate into 13 yield-associated traits.Our main goals were i) to discover new QTL, with a special emphasis on the lesser studied post-harvest traits, and ii) to corroborate QTL previously pinpointed in other studies.Consequently, we aimed to elucidate the genetic architecture underpinning these traits.
Moreover, we present two novel statistical methods for the in silico validation of candidate single nucleotide polymorphisms (SNPs) that surpass the set P-value threshold.Firstly, we evaluated the suitability of molecular markers for MAS by integrating the 'bagging' concept [51].In the second approach, we utilized Gene Ontology-derived bioinformatic data to construct a trait network based on correlations, efficiently elucidating genetic interrelationships among traits.This enables the discernment of underlying genetic relationships among traits in an efficient and parsimonious manner.

Phenotype analysis
Figure 2A presents the adjusted phenotypes, which were tested for significant differences among families using the ANOVA test (P < 0.01, Supplementary Table S3).While the distribution of all traits, except for the seed traits, approximated a normal distribution, the seed traits showed mixture of two Gaussian distributions.Despite our attempts to improve the normality of the trait distributions through standard transformations such as log, sqrt, inverse, and asinh, we were unsuccessful.Nonetheless, the residuals of all cases followed the assumption of a normal distribution.

Population structure, genetic relatedness, and linkage equilibrium decay
Figure 2B shows the Principal Component Analysis (PCA) output, where the first eight PCs explain 92.27% of the total genetic variance.We detected nine clusters, including the seven crosses, the diversity panel group (jardin), and a ninth group of self-pollinated accessions.PC1 explains 63.36% of the total genetic variance and separates accessions based on their female ascendant.The PC1 axis divides the accessions based on their female parent, where the left portion of the axis represents four families of halfsibs (406, 411, 900, and 912) with cultivar 23 (Ruby Seedless × Centennial Seedless) as their female parental line and the selfpollinated family with cultivar 23 being both the female and male parental.The right portion of the axis includes the jardin group and the three remaining families (111, 902, and 929).In contrast, PC2, which accounts for 12.43% of the total genetic variance, enables clear identification of all the clusters of full-sibs.The lower portion of PC2 is occupied by two families (111 and 406) that share Crimson (the variety from the diversity panel with the lower value for PC2) as the male parent.
According to the kinship and k-means analysis, there are nine distinct clusters, which can be further categorized into two superclusters, which are consistent with the findings of the PCA.The first supercluster includes all the lines with accession 23 as their female parental line, while the second supercluster comprises the remaining lines.Analysis of linkage disequilibrium (LD) decay indicates that an average LD decay occurs at a distance of 10 kbp for a correlation threshold of r 2 = 0.2, as shown in Supplementary Fig. S4.

Genome-wide association study
We used the BLINK algorithm to assess a panel of 588 genotypes for 13 yield-related traits using a total of 49 210 SNP markers.Our QQ-plots showed deviations from the null hypothesis of no association for 11 of the 13 traits, as detailed in Supplementary Fig. S5.We did not observe any association for S_number and P_rachis_loss.We identified 69 significant associations (Table 1) above the false discovery rate (FDR) threshold, and 49 of these associations also exceeded the Bonferroni threshold (α = 0.05 49210 , LOD > 5.99).Chromosome chr18 had the highest number of associations (16), while chromosomes chr 3, chr 6, chr 7, chr 10, chr 12, and chr 13 had only one association each.Among the 11 traits with significant hits, B_weight had the fewest associations (1), while B_width (11) and S_fresh (14) had the highest number of associations.Additionally, H_rachis and P_cluster showed two associations each as shown in Fig. 3.

Associated SNPs as candidates for marker selection
In evaluating marker-trait associations, any significant SNP marker must exhibit a discernible pattern when we arrange accessions based on their phenotypic values.Consider an extreme case: if an SNP accounts for 100% of the variance, accessions coded as "0" should be first, followed by those genotyped as "1", and lastly by those marked as "2".Following this logic, if lines with genotype "2" register higher phenotypic values than those with genotype "0", a negative Spearman's correlation emerges.However, when phenotypic values rise in lines with genotype "0" Table 1.GWAS results.Information about markers associated with traits is presented in the table, including their physical position (chromosome and exact bp) and LOD score.
Additionally, we provide annotation information on the closest gene, such as gene name, BLAST similar proteins, and gene ontologies of the protein products of these genes

Gene
BLAST hits dist (bp) Gene ontologies  over those with "2", we observe positive correlations.In instances with ambiguous genotype trends, correlations lean towards zero.
Given that the assignments "0" and "2" are arbitrary for any two possible homozygous genotypes of a particular SNP, our focus is on the correlation's absolute value, rather than its original value.
We obtained Spearman's correlation (ρ) values as described in section 2.7 for the 11 traits with significant associations.Those ρ values ranged from 0.516 (H_cluster) to 0.941 (S_dry) (Fig. 4).We observed that related traits exhibited similar patterns, as seen in S_fresh (0.805) and S_dry (0.941), where accessions carrying allele "0" tended to have higher phenotypic values.However, we also identified an interesting phenomenon in the extreme highest phenotypes of S_fresh, which were occupied by accessions carrying allele "1".We found that B_height (0.907) and B_width (0.936) exhibited a continuous and smooth decay, whereas B_weight (0.713) exhibited an irregular and less informative decay.B_shape (0.603) and H_cluster (0.516) show the lower values, and their patterns were close to a uniform distribution in which the average genotypic value for each position is expected to be the average genotypic value in the whole population.Rachis-related traits, H_rachis (0.848) and P_rachis (0.718), showed a similar pattern, where accessions carrying allele "1" were highly likely to be in the top positions.Finally, P_cluster (0.894) and P_cluster_loss (0.803) show a solid but noisy decay trend.
To further interpret the ρ values, we conducted a second experiment using 10 randomly selected SNP markers for each trait instead of the most significant one (see Supplementary Fig. S7).The overall average Spearman's correlation was 0, with a high standard deviation of 0.41.

Candidate genes and gene ontology
We detected a total of 69 SNPs significantly associated with the phenotypic traits of interest (Table 1).We found that 48 SNPs were located within a gene, while 20 SNPs were situated within 10 kbp of a candidate gene among these identified SNPs (Table 1).The nearest gene to the remaining SNP (S_fresh -chr11:13989722) was located at a distance of 14 792 bp.Our study identified a set of 172 candidate genes (Fig. 3).On average, each GWAS hit was associated with 2.49 genes, resulting in 163 distinct genes, with 154 genes associated with only one hit, and nine genes associated with two hits.Notably, four genes (Vitvi18g01899, Vitvi18g01900, Vitvi19g00424, and Vitvi19g00425) were shared between S_fresh and S_dry, while two genes (Vitvi08g1794 and Vitvi08g1795) were common to both B_height and B_width.We observed a single gene in 13 hits and a maximum of six for hit S_fresh -chr15:16996087 followed by five genes for the associations of B_height -chr05:02487910, B_shape -chr05:00336776, B_width -chr08:17678111, and P_rachis -chr02:02666566.

Correlation between traits
We observed strong phenotypic correlations among traits within the same organ (seed, berry, and cluster/rachis) in our study, except for P_cluster_loss, which did not exhibit any correlation with other traits as depicted in (Fig. 5) Berry traits, except for B_shape, exhibited strong correlations among them.B_shape only is correlated with B_height and this correlation is weak in  the diversity panel.Lower correlations were found across organs.Both breeding lines and cultivar panels showed weak and null correlations between seed traits and other traits, respectively.
We observed that correlations based on GO were generally lower than those based on phenotypes, except for the subset of GOs related to the Cellular component (CC) category, which was composed of 10 GOs that appeared more than twice.In contrast, correlations within related traits were significantly lower in CC-GOs compared to phenotypic correlations.Specifically, the strong correlations (r > 0.7) observed were primarily driven by "GO:0005634 nucleus", which was found in traits such as B_weight and P_cluster_loss.Higher correlations between B_width and H_rachis, and between S_dry and P_rachis, were driven by "GO:0016020 membrane" and "GO:0005737 cytoplasm", respectively.
The correlations based on the Molecular Function (MF) and Biological Process (BP) subsets are considerably lower, with only one value exceeding 0.7 for each subset.The MF-GO subset exhibits robust correlations between S_dry and P_cluster_loss, which can be attributed to a comparable pattern in the ontologies "GO:0003700 DNA-binding transcription factor activity" and "GO:0043565 sequence-specific DNA binding", shared with P_rachis.Moreover, the correlations between B_height and P_rachis share the ontologies "GO:0008270 iron ion binding" and "GO:0004185 serine-type carboxypeptidase activity", whereas B_height and B_shape have ATP/GTP-related common ontologies.In the case of the BP-GO subset, all correlations with values higher than 0.3 are caused by the ontology "GO:0006508 proteolysis", in which the four correlated traits (B_height, P_cluster, P_cluster_loss, and S_fresh) are over-represented.S_fresh correlations are lower due to a differential pattern based on chloroplast and photorespiration ontologies.

Trait contribution to gene ontologies
Results for the most frequent GO terms are shown in Fig. 6B.In considering all GO subsets, we observed consistent relative contributions of each trait, aligning with the proportion of GWAS hits for each trait.However, upon analyzing the ten most frequent GO terms for each subset, we discovered the trait-specificity of each GO.Among them, only three GOs showed more than five traits contributing to them, which are "GO:0005634 nucleus" (with 19 contributions from 9 traits), "GO:0005737 cytoplasm" (with Figure 4. Marker selection using the most significant SNP as candidates.The y-axis displays the average genotypic value of the most significant SNP detected for a given trait.The x-axis represents the ranking position, with 1 indicating the highest phenotypic value and the 250 indicating the lowest phenotypic value for a particular trait.For each ranking position, the mean genotypic value is determined by averaging 200 replications of a procedure that randomly samples a subset of 250 accessions, sorts them based on their phenotypic value, and assigns the genotypic value to the ranking positions.The dashed line represents the mean genotypic value discovered in the entire population for the SNP of interest.The highest LOD from the GWAS analysis links each trait to the SNP.We measured the following traits: S_fresh: seed fresh weight, S_dry: seed dry weight, B_height: berry height, B_width: berry width, B_shape: berry shape (height-to-width ratio), H_cluster: cluster weight at harvest, H_rachis: rachis weight at harvest, P_cluster: cluster weight at 45 days post-harvest, P_cluster_loss: percentage of cluster weight loss after 45 days, P_rachis: rachis weight at 45 days post-harvest.
13 contributions from 7 traits), and "GO:0005524 ATP binding" (with 7 contributions from 6 traits).We did not find any common ontologies such as "GO:0016020 membrane" or "GO:0005524 ATP binding" that included post-harvest traits.B_shape was the only trait found in both the organelles "GO:0005739 mitochondrion" and "GO:0009507 chloroplast".
With regards to seed traits, there is limited information available for S_dry due to only one of its associated genes having GO annotation.Specifically, a transcription factor from the TCP family has been identified near the major QTL of chr18:26 M. The SNP at chr14:7345807 is responsible for all of the ontologies related to chloroplast structure and function in S_fresh.This SNP is located within the gene Vitvi14g00472, which encodes for the ATP-dependent zinc metalloprotease FTSHI 5.
Regarding berry traits, we identified an SNP located within the gene sequence of Vitvi15g00014 (F6I5H6) as the only match for B_weight (chr15:485443).This gene encodes a TATA-binding protein (TBP) and is associated with ontologies related to nuclear positioning, ATP binding, and chromatin remodeling.
B_width determines "GO:0022857 transmembrane transporter activity" exclusively, which is identified in two distinct hits.The first hit contains a polymorphic nucleotide at chr11:4447151 within the gene structure of Vitvi11g00454.This gene encodes three NRT1/PTR FAMILY 6.2 proteins, The second SNP is located at chr17:6092969, positioned near the gene Vitvi17g00516, which encodes an NFD4-like protein.
Two distinct hits contribute to the "GO:0006508 proteolysis" in B_height.The first hit is located at chr08:11609006 and contains the genes Vitvi08g00928 and Vitvi08g02111, which encode for aspartic proteinase CDR1 proteins.The second hit, located at chr14:28926364, contains the gene Vitvi14g03061, which encodes for a serine carboxypeptidase.
Finally, we found that H_cluster exclusively contributes to the ontology "GO:0003755 peptidyl-prolyl cis-trans isomerase activity" due to an SNP located at chr04:20111668 within the sequence of Vitvi04g01437.Additionally, H_cluster is the only trait in which the ontology "GO:0034599 cellular response to oxidative stress" was identified.Two different hits (chr04:20111668 and chr08:18000198) contain candidate regions that harbor the genes Vitvi08g01521 and Vitvi04g01438, respectively.Vitvi08g01521 encodes type II peroxiredoxin E, while Vitvi04g01438 is associated with peptide methionine sulfoxide reductase A5. .Correlation between traits.The Pearson's r between traits is represented in each graph, with nodes representing each trait.The width of the edges between nodes is proportional to the strength of the correlation, with edges shown in grey for 0.3 < r > 0.7 and highlighted in black for r > 0.7.Correlations were calculated using two methods: GO-based correlations, which used GO frequencies as features, and phenotype-based correlations, which used BLUE values for cultivars (jardin) or breeding lines (other families).We measured the following traits: S_fresh: seed fresh weight, S_dry: seed dry weight, B_height: berry height, B_width: berry width,B_shape: berry shape (height-to-width ratio), B_weight: berry weight, H_cluster: cluster weight at harvest, H_rachis: rachis weight at harvest, P_cluster: cluster weight at 45 days post-harvest, P_cluster_loss: percentage of cluster weight loss after 45 days, P_rachis: rachis weight at 45 days post-harvest.

Trait adjustment
Historically, the QTL for SDI on chromosome 18 has been pinpointed as pivotal for grapevine seed presence [24,25,27,52].Although serving as a reliable control in calibrating GWAS models, its potential to spur false positives in other traits is acknowledged [4,53].Our results showed that despite leveraging best linear unbiased estimators (BLUEs) corrections for berry and harvest traits covariate effects in our analysis, post-harvest traits did not exhibit correlations with seed traits.The use of BLUEs corrections in our study may have contributed to the absence of some common QTL previously reported in the literature.For example, the SDI-co-located QTL for berry weight on chromosome 18 [54] was not detected in our analysis.To further investigate this, we conducted additional analyses using uncorrected BLUEs as the response variable for mixed linear model (MLM) and BLINK GWAS models.We found that the SDI QTL appeared at a Bonferroni level in MLM and at an FDR level in BLINK, as described in Supplementary Fig. S6.These findings suggest that the use of BLUEs corrections in our study may have led to the suppression of some common QTL in grapevine traits, highlighting the importance of carefully considering the choice of the response variable in GWAS analyses [4].Among our adjusted traits, only berry width aligned with the QTL on chromosome 18 with logarithm of odds (LOD) scores of 5.49 and 6.33.Conversely, most unadjusted post-harvest traits, except for the percentage of rachis weight loss after 45 days, displayed a QTL association on chromosome 18 with LOD scores ranging from 8.73 to 14.53, suggesting an indirect association with seed weight.Further investigation is needed to determine the precise nature of this relationship and whether it can be exploited for grapevine breeding purposes.

Gene ontologies as trait features
Gene ontologies offer a valuable avenue for functional gene comparisons, especially in expansive data analyses [55,56].Our study introduces a pioneering methodology leveraging GO to juxtapose traits, focusing on genes spotlighted in GWAS.Tapping into GO annotations, we probed the nexus between potential genes and target traits.Annotations were ranked from 1 to 5, where ascending scores mirror superior annotation quality, corroborated through empirical evidence or literary sources [55].
This approach provides a starting point for identifying potential trait features and their associated candidate genes, with applications in plant breeding.Moreover, similar approaches have been used in other fields such as human health to explore specific regions of interest [57,58].
We found that the CC subset GOs had higher correlations between traits due to their non-specific nature, as proteins are required in all cellular components for almost all quantitative traits as denoted in Fig. 5.In addition, the MF and BP GOs had lower correlations, which were expected due to their higher specificity.We suggest that the patterns and insights discovered from the MF-GO and BP-GO correlations are more valuable than those found in CC-GO correlations.

Seed traits
The widely recognized SDI QTL for seedlessness [24,25,27,52] is evident in both seed fresh weight (2 hits) and seed dry weight Figure 6.Contribution of traits to the most frequent GOs.To link traits to the most frequent GO terms, we followed a process represented in (A) by means of regions and genes.(B) For each type of GO, including CC, MF, and BP, we displayed the total number of GOs and the top 10 most frequent ones in (B).The bars were colored based on the proportion of GO occurrences that were found for each trait, using the pipeline shown in (A).S_fresh: seed fresh weight, S_dry: seed dry weight, B_height: berry height, B_width: berry width, B_shape: berry shape (height-to-width ratio), B_weight: berry weight, H_cluster: cluster weight at harvest, H_rachis: rachis weight at harvest, P_cluster: cluster weight at 45 days post-harvest, P_cluster_loss: percentage of cluster weight loss after 45 days, P_rachis: rachis weight at 45 days post-harvest.
(4 hits), detailed in Table 1 and Fig. 3 from our analysis.Contrasting with traditional models like MLM, which often detect numerous near-significant SNPs [59,60], the BLINK algorithm pinpoints significant, proximate SNPs.This arises as BLINK removes linked markers based on an LD threshold of r 2 > 0.7, often matching physically associated markers.This concurrence might stem from the major QTL effect, accounting for nearly 70% of the variance [23][24][25].A discernible trend exists in the SDI region for seed number, visible in Supplementary Materials (Fig. S5), albeit not reaching statistical significance.
Significant SNPs on chromosome 1 for seed fresh (chr01:22306525) and dry weight (chr01:19586589) might align with the WRKY3 QTL noted by [61].Similarly, QTL for seed traits in linkage groups 2, 4, and 14 described in [4] may correspond to those in our study.The coincident SNP for both seed traits on chromosome 19 at 5 Mbp (chr19:5737748) could match the NDR1 QTL from [62].This SNP in our study is a structural variant of the Vitvi19g00425 gene, denoted as an ankyrin repeat-containing protein lacking annotated ontologies.
In the SDI QTL, we observed a few GO annotations for candidate genes, which might have resulted in an under-representation of seed dry weight in the GO-related analysis.However, we identified genes Vitvi18g01868 and Vitvi18g01875 as transcription factors from the TCP and MYB families, respectively.
Regarding the significant presence of allele "1" in the accessions with higher fresh seed weights, we identified that six of the top 10 lines came from family 929, all carrying allele "1".Within the top 50 accessions, family 929 represented 12 (24%).All of these carried allele "1".Conversely, of the remaining 38 lines, only eight had allele "1", with the rest having allele "0" (Supplementary Table S9).Importantly, just two accessions had allele "2", and both ranked in the last quartile (at positions 411 and 433).The variation in allele frequencies can be attributed to the unique characteristic of family 929, which has Italia as its only seeded parental line.

Berry traits
In our search for berry weight-related QTL, we failed to identify any stable QTL commonly reported in the literature, such as those in [4].We only found one QTL in linkage group 15 [28]; however, its genetic location deviated considerably from our physical position.The absence of significant SNPs for berry weight might be due to the corrections we made for seed dry weight, as discussed in section 4.1.
For traits associated with size, we discovered QTL on chromosome 5 linked to berry dimensions-height, width, and shape.These QTL might correspond to one highlighted in an earlier study [63].The gene closest to the berry height QTL is characterized as a bromodomain protein that is associated with cell shape regulation ontologies [56].In our search for berry width, we detected a noteworthy SNP on chromosome 11 with an LOD score surpassing 19 and a robust Spearman's correlation (ρ = 0.936).This SNP, interestingly, resides within Vitvi11g00454, encoding a protein designated D7TC02.This protein, a member of the NRT1/PTR family, plays a role in the transmembrane transport of secondary metabolites in response to jasmonic acid.Furthermore, researchers recognize D7TC02 as the last divergent ortholog (LDO) of the Arabidopsis NRT1 protein [64] (locus AT2G26690https://www.arabidopsis.org/servlets/TairObject?id=32128 type = locus).
Concerning ontologies shared by berry height and shape, we identified GO terms including "mitochondrion", "nucleic acid binding", "GTPase activity", "GTP binding", and "mRNA cleavage".Genes like Vitvi05g00104 and Vitvi05g00105, which encode for a zinc finger CCHC transcription factor and the subunit 9A of DNAdirected RNA polymerases II, IV, and V, play a part in some of these ontologies pertaining to berry height.Contrastingly, genes such as Vitvi18g01149 and Vitvi18g02813, which encode for a Rac-like GTP-binding protein RAC1 and the subunit RPA12 of DNA-directed RNA polymerase I, respectively, are pivotal to the ontologies related to berry shape.

Harvest traits
In this study, we explored the QTL linked to cluster weight and rachis weight in grapevine.Given the highly quantitative nature of these traits, inf luenced by numerous minor gene contributions, the literature hasn't reached a consensus on their QTL location.Additionally, population or environmental factors might render these QTL unstable.We delved into an exhaustive literature review, consulting sources like [34,36,40,[44][45][46].Our findings identified two potential overlaps: one on chromosome 11 from [44] and another on chromosome 17 courtesy of [40], both related to cluster weight.However, our search yielded no overlaps for rachis weight.Nevertheless, our data indicates that of all the traits, only cluster weight inf luenced the response to oxidative stress ontologies, and this was evident through two distinct QTL on chromosomes 4 and 8.

Post-harvest traits
In this study, we focused on post-harvest traits, a subject not previously examined in depth in the existing literature.To the best of our understanding, we are the first to investigate the genetic underpinnings of these traits in detail.Consequently, our discoveries shed light on the genetic architecture of postharvest traits, laying a foundation for subsequent research in this domain.Out of all the traits we studied, only seed number and the percentage of rachis weight loss after 45 days of storage remained elusive in yielding significant results.Given that measuring rachis weight is inherently destructive, we could not compute the rachis weight loss using the identical cluster at both harvest and postharvest for each genotype and experimental setup.This added an element of noise and uncertainty to our phenotypic value calculations, highlighting the importance of developing non-destructive methods for measuring post-harvest traits in grapevine breeding programs.
Our exploration revealed QTL linked to several post-harvest traits.Notably, we found a QTL that overlaps with the seedlessness QTL SDI on chromosome 18.We discovered a novel QTL on chromosome 18 at 1 Mbp for cluster weight at 45 days post-harvest, which was linked to the gene Vitvi18g00099.This gene manifests as the pre-rna-processing TSR1 protein D7UD95 and the LDO of Arabidopsis TSR1 protein [64,65] (locus AT1G42440 https://www.arabidopsis.org/servlets/TairObject?accession=locus:2035893), essential in ribosome biogenesis, especially its small subunit.Our results provide novel insights into the genetic basis of postharvest traits and emphasizing the pivotal role of identified QTL in grapevine breeding schemes.We identified two QTL for the percentage of cluster weight loss post-45 days on chromosomes 2 and 11, associating with genes Vitvi02g00420 and Vitvi11g00663, respectively.The annotation of Vitvi02g00420 designates it as a hydroxymethylglutaryl-CoA synthase with roles in acetyl-CoA metabolic events, inclusive of sterol synthesis [56].Even though [35] described a berry-cracking QTL on LG 11, our detection probably doesn't align with it given the spatial disparity.
Assessing rachis weight 45 days post-harvest led us to 6 distinct QTL, with chromosome 18's QTL aligning with SDI.We found these QTL across chromosomes 2, 4, 9, 15, and 19.Except for chromosome 4, all manifested as structural polymorphisms of genes.The genes from QTL on chromosomes 9 and 15 encode proteins involved in catalytic processes, while genes from QTL on chromosomes 18 and 19 are transcription factors from MYB and SCR families, respectively.

Conclusion
In this study, we identified many SNP markers that were significantly associated with yield-related grape traits.Interestingly, 70% of them were located within an annotated gene.We discovered a novel QTL on chromosome 11 affecting grapevine berry width linked to Vitvi11g00454, a gene instrumental in managing stress via jasmonic acid, which encodes an NRT1/PTR protein that is recognized as the LDO of Arabidopsis' NRT1/ PTR FAMILY 6.2 protein.This suggests genetic potential for breeding larger berries.Additionally, we identified QTL inf luencing post-harvest traits on chromosomes 2, 4, 9, 11, 15, 18, and 19.These findings contribute to the understanding of genetic factors that underlie the fruit's susceptibility to decay, shriveling, and weight loss after harvest.Furthermore, our results highlight the need to develop nondestructive methodologies that can accurately assess long-term conservation traits.These insights are valuable for grape breeders and growers who seek to improve the post-harvest quality of table grapes and reduce waste.Additionally, our study highlights the importance of carefully considering the choice of the response variable in GWAS analyses, as the use of BLUEs corrections in our study may have led to the suppression of some common QTL in grapevine traits.Overall, our approach of using gene ontology annotations to compare traits and examine candidate genes may provide a useful starting point for identifying potential trait features and their associated candidate genes in plant breeding.

Plant material and experimental design
In this study, we analyzed a total of 68 table grape cultivars (Supplementary Table S1) sourced from the germplasm collection of the Instituto de Investigaciones Agropecuarias (INIA) in Chile and 536 segregating individuals from seven related F 1 families (Supplementary Table S2) from the INIA table grape breeding program.These families were generated from directed pollination of traditional varieties, including Crimson Seedless (Crimson), Flame Seedless (Flame), and Italia; selections from INIA's breeding program, such as 23 (Ruby Seedless × Centennial Seedless), 5 (Red Seedless × Dawn Seedless), and Iniagrape-one (Flame Seedless × Black Seedless, also known as Kishmish Chernyi); along with unidentified pollen donors (5LL, 3 V, and 18 V).These crosses were performed in 2010, conducted embryo rescue to obtain all plants, except for those resulting from cross 929, which involved the seeded genotype Italia.We then established single plants at the INIA experimental field in La Platina, La Pintana, Santiago, Chile (33 • 34'S, 70 • 37'W, elevation 630 m) in 2013.We planted all vines with row and vine spacing of 3.0 × 1.5 m and trained them using the Guyot system.Drip irrigation was used, and we employed standard agronomic and phytosanitary management practices, except for growth regulator usage, which was not applied.

Phenotyping
We conducted phenotypic characterization of clusters and berries during the 2018, 2019, and 2020 seasons.At harvest and postharvest times, we collected six clusters from each F 1 plant, harvested at 16 • Brix, and determined with an analog refractometer on-site.Similarly, we collected six clusters of three plants from each cultivar in the germplasm collection.
For each plant, we divided the six clusters into groups A and B, each consisting of three clusters.We evaluated clusters from group A only at harvest time, measuring nine traits, including cluster weight (g), berry weight (g), soluble solids ( • Brix), seed number, seed fresh weight (g), dry seed weight (g), berry height (cm), berry width (cm), berry shape (berry height/berry width), and rachis weight (g).For berry-related traits, we obtained measurements from 10 random berries from each cluster using an in-house script for image analysis.Rachises were weighed after trimming all berries.In contrast, we evaluated harvest and postharvest parameters of the same clusters from group B, which were labeled, weighed, and packed under standard commercial conditions before being stored at 0 • C for 45 days in a controlled atmosphere for future post-harvest evaluations.At post-harvest, we measured four traits: final cluster weight (g), cluster weight loss (percentage), rachis final weight (g), and rachis weight loss (percentage).Weight loss was determined as the difference in weight between harvest and post-harvest, divided by the weight at harvest.We measured the same clusters for cluster weight before and after storage and used different clusters for harvest (from group A) and post-harvest (from group B) evaluation of rachis weight, as it is a destructive measurement.An schematic diagram of the phenotypic process is shown in Figure 1.For the sake of simplicity and clarity, we recorded the original trait names as follows: S_number (number of seeds), S_fresh (seed fresh weight), S_dry (seed dry weight), B_height (berry height), B_width (berry width), B_shape (berry shape, i.e. height-to-width ratio), B_weight (berry weight), H_cluster (cluster weight at harvest), H_rachis (rachis weight at harvest), P_cluster (cluster weight at 45 days post-harvest), P_cluster_loss (percentage of cluster weight loss after 45 days), P_rachis (rachis weight at 45 days post-harvest), and P_rachis_loss (percentage of rachis weight loss after 45 days).

Genotyping, quality control, and imputation
Genomic DNA from both germplasm collection and breeding families was obtained using a DNAeasy ® Plant kit (QiaGen, Germany).Samples from breeding families were extracted once, while samples from the germplasm collection were extracted in duplicate.Sequencing and genotyping were performed at the bioinformatics facility of the University of Minnesota.Samples were processed for genotyping-by-sequencing (GBS) using an ApeK1 enzyme and following standard procedures [66][67][68].Pooled samples were sequenced using Illumina HiSeq 2500 equipment.Sequencing reads from each sample were mapped against the V. vinifera reference genome PN40024.12Xavailable from Ensembl genomes, using the Bowtie 2 aligner [69] and FreeBayes Software [70] to perform the SNP calling considering diploid.To filter the raw SNP set, we utilized the vcftools software [71], removing nonbiallelic sites and those with a minimum allele frequency (MAF) <5%.We also excluded samples with a call rate of <50%.After filtering, we obtained an SNP matrix with 49 210 markers.Missing values were imputed using Beagle 5.4 software [72].

Population structure, cryptic relatedness, and LD decay
We utilized the set of SNPs obtained after conducting quality control to examine the population structure of the grape cohort through PCA via the prcomp function.To validate the PCA analysis, we employed two unsupervised clustering machine learning approaches, namely, the hierarchical (hclust function) and the Kmeans algorithms (kmeans).We chose a value of K = 9 for both methods, as recommended by breeders, to account for a group of cultivars, seven crosses resulting in different families, and a potential group of accidental self-pollinated lines.We evaluated the cryptic genetic relatedness between individuals by computing VanRaden's kinship matrix using the AGHmatrix package [73,74].Linkage disequilibrium decay was assessed by calculating the relationship between pairwise squared correlation (r 2 ) of SNPs and physical distance within 500 Kbp via the snprelate package [75].

Modelling of raw phenotypic data
We performed an adjustment of the phenotypic records prior to conducting GWAS analysis.Specifically, we conducted a linear fixed effects model using lm function in base R [76] to remove seasonal and trait-specific covariate effects and obtain the total genetic value via BLUEs.The general model can be expressed as follows: where y represents a vector of phenotypic records for a given trait.We employed the fixed effects design matrices X 1 , X 2 , and X 3 .The vector β denotes the estimates for seasonal effects, while the vector ω represents the estimates for trait-specific covariates.The vector g contains the estimates for the total genetic value, and is a random and homoscedastic error term.We selected trait-specific covariates based on the breeder's knowledge.For berry traits (B_height, B_width, B_shape, B_weight) and harvest traits (H_cluster, H_rachis), we corrected for both soluble solids and seed dry weight (trait S_dry).For post-harvest traits (P_cluster, P_cluster_loss, P_rachis, P_rachis_loss), we corrected using soluble solids.No covariate was used to correct for seed traits (S_number, S_fresh, S_dry).

Genome-wide association study
We conducted GWAS analysis to evaluate associations between SNPs and 13 yield-related traits using BLUEs as response variable, as described in the preceding section.We utilized the BLINK algorithm [59], implemented in R [76] and included in the GAPIT3 package [60], and incorporated the first six PCs to account for population structure [53].To reduce type I errors, we adjusted P-values using standard FDR and Bonferroni corrections.We evaluated deviations from the null hypothesis of no association between SNPs and traits using Q-Q (Quantile-Quantile) plots, a critical step in detecting confounding factors that could inf late P-values [77].
We used the BLINK algorithm because it has been shown to outperform its predecessors, including the general linear model, MLM, and Fixed and random model Circulating Probability Unification (FarmCPU), in terms of both computational efficiency and statistical power, as reported in previous studies [59].The computational efficiency is achieved by substituting the expensive random effects model, which accounts for genetic relatedness and uses the REML algorithm, with an efficient fixed effect model that is fitted by optimizing the Bayesian information criterion (BIC).Additionally, better control of false positives and false negatives is achieved by overcoming the assumption of uniform quantitative trait nucleotide distribution across the genome.This is done by replacing the bin approach of FarmCPU with an LDbased criterion.

Evaluation of associated SNPs as candidates for marker selection
When dealing with simple, qualitative traits, the genetic architecture typically revolves around one or few genes.If we have a genomic marker linked to these genes, we will be able to i) precisely select genotypes that are likely to express the desired phenotype and ii) identify statistically significant phenotypic distinctions by categorizing genotypes according to the alleles of said marker or markers.
However, when it comes with complex, quantitative traits governed by numerous genes, the pursuit of markers that lead to significant phenotypic disparities in populations exhibiting alternative alleles becomes more challenging.The complexity arises from the fact that even if we find a statistical causative link through GWAS, the markers' capacity to account for phenotypic variance may remain relatively limited.
To assess whether a marker contributes to differences, a strategy involves arranging the genotypes based on their phenotypic records.Subsequently, an assessment is made of both the local and global patterns of allelic frequency.We aim to expand this approach by applying the concept of 'bagging' [51], which consists in generating multiple populations or 'bootstrapped samples' from the original one by permutation.
To minimize sampling bias, we generate 200 bootstrapped populations by randomly selecting 250 accessions from the pool with replacement , and then we assigned a ranking position based on their phenotypic values (with the accession with the highest phenotypic being ranked as 1 and the accession with the lowest value being ranked as 250).We then averaged the genotypic value found for each of the ranking positions in the 200 bootstrapped populations.We assessed trends using both visual inspection and an analytical method, using Spearman's ρ correlation between the ranking positions and the average genotypic value.We described the procedure using pseudocode notation in Algorithm 2.7.

Gene annotation and gene ontologies
We used Grapedia as the source for gene annotations of the PN40024.12Xgff file in this study.The "gene" rows (column 3) were the only ones kept in the analysis, as the exon/intron structure of each gene was considered irrelevant to our research objectives.Additionally, we converted a table of GWAS hits into bed format, with the first column representing chromosome number and the second and third columns indicating the start and end positions of the hit, respectively.The fourth column retained relevant metadata.The start and end positions were extended to include a window of 25 kbp around each significant SNP.Next, we utilized bedtools intersect [78] to generate a list of genes that intersected with the 25-kbp range around each GWAS hit, using the reduced gff file.Finally, we used Blast2GO (https://www.blast2go.com/)to obtain functional annotations for each gene identifier to provide additional context.We used the physical positions of markers significantly associated with traits as a reference to identify potential candidate genes.To do this, we selected a bin of 20 kbp around the marker position, which corresponds to ±10 kbp from the SNP position, based on the LD decay pattern of the panel.The variation in LD patterns across chromosomes may ref lect complex historical recombination events or selection pressures specific to some genomic regions.Nevertheless, the ±10-kb bin was selected as a conservative threshold to avoid false positives (the detection of genes that are not truly related to significant SNPs).We then searched for all genes located within this bin as candidate genes.In cases where there were no genes in the 20-kbp bin, we extended the search to a larger bin of 50 kbp (±25 kbp from the SNP position) to identify the closest gene.If the closest gene was located outside the 50-kbp bin, we considered that the hit was missing a candidate gene.
We utilized an in-house script to automatically extract GO information from the Uniprot database [56,79] for all candidate genes within the initial 20-kbp bin, using the GET function from the httr package [80].We classified the GOs into three main subcategories: cellular location, molecular function, and biological process, and linked them to traits via SNP associations to generate a trait feature matrix.Only GOs that appeared more than twice were taken into account.We compared pairwise GO-based trait correlations with correlations obtained from phenotypic records and investigated the relative contribution of each trait to the most frequent GOs found in this study, as shown in Fig. 6A.We conducted all analyses using R version 4.2.2 and utilized packages from the tidyverse family [81] for data processing, including dplyr [82], and used ggplot2 for visualization [83].

Figure 1 .
Figure 1.Schematic workf low of the phenotyping process.Berry, seed, and harvest cluster/rachis weights were phenotyped at the time of harvest from Group A's clusters.After 45 days at storage, cluster weight was phenotyped from Group A's clusters and rachis weight was phenotyped from Group B's clusters.

Figure 3 .
Figure 3. Multi-layered summary of GWAS.Colors indicate the different traits, and significant associations are represented in the inner Manhattan plot.The length of the line and the size of the point are proportional to the LOD value.The white circle's radius represents the Bonferroni threshold (LOD > 5.99), where points in the gray area indicate associations above this threshold.The second layer shows the density of SNP markers.The third layer is a schematic representation of 20-kbp bins in which the gene search was performed.The outermost layer displays the number of genes found in each region (one gene per point).S_fresh: seed fresh weight, S_dry: seed dry weight, B_height: berry height, B_width: berry width, B_shape: berry shape (height-to-width ratio), B_weight: berry weight, H_cluster: cluster weight at harvest, H_rachis: rachis weight at harvest, P_cluster: cluster weight at 45 days post-harvest, P_cluster_loss: percentage of cluster weight loss after 45 days, P_rachis: rachis weight at 45 days post-harvest.

Figure 5
Figure5.Correlation between traits.The Pearson's r between traits is represented in each graph, with nodes representing each trait.The width of the edges between nodes is proportional to the strength of the correlation, with edges shown in grey for 0.3 < r > 0.7 and highlighted in black for r > 0.7.Correlations were calculated using two methods: GO-based correlations, which used GO frequencies as features, and phenotype-based correlations, which used BLUE values for cultivars (jardin) or breeding lines (other families).We measured the following traits: S_fresh: seed fresh weight, S_dry: seed dry weight, B_height: berry height, B_width: berry width,B_shape: berry shape (height-to-width ratio), B_weight: berry weight, H_cluster: cluster weight at harvest, H_rachis: rachis weight at harvest, P_cluster: cluster weight at 45 days post-harvest, P_cluster_loss: percentage of cluster weight loss after 45 days, P_rachis: rachis weight at 45 days post-harvest.

Algorithm 1 .
An algorithm for estimating the suitability of a single SNP marker to phenotypically differentiate bootstrapped populations.Input: Vector of accessions (L), vector of phenotypic values (Y), and vector of genotypic values (X).set R as the number of replicates.set S as the sample size initialize an empty matrix M of R rows and S columns for r = 1 to R do generate a subset C by randomly sampling S accessions from pool L. reorder C subset based on phenotypic values on Y for s = 1 to S do set c as the s-th line in C c ← C s set current element in M as the genotypic value of accession c M rs ← X c end for end for Compute the average genotypic value of each column in M Vector with the average genotypic value of ranked positions based on phenotype