Chromosome-level genome sequence assembly and genome-wide association study of Muscadinia rotundifolia reveal the genetics of 12 berry-related traits

Abstract Vitis has two subgenera: Euvitis, which includes commercially important Vitis vinifera and interspecific hybrid cultivars, and Muscadinia. Of note, the market for Muscadinia grapes remains small, and only Muscadinia rotundifolia is cultivated as a commercial crop. To establish a basis for the study of Muscadinia species, we generated chromosome-level whole-genome sequences of Muscadinia rotundifolia cv. Noble. A total of 393.8 Mb of sequences were assembled from 20 haploid chromosomes, and 26 394 coding genes were identified from the sequences. Comparative analysis with the genome sequence of V. vinifera revealed a smaller size of the M. rotundifolia genome but highly conserved gene synteny. A genome-wide association study of 12 Muscadinia berry-related traits was performed among 356 individuals from breeding populations of M. rotundifolia. For the transferability of markers between Euvitis and Muscadinia, we used 2000 core genome rhAmpSeq markers developed to allow marker transferability across Euvitis species. A total of 1599 (80%) rhAmpSeq markers returned data in Muscadinia. From the GWAS analyses, we identified a total of 52 quantitative trait nucleotides (QTNs) associated with the 12 berry-related traits. The transferable markers enabled the direct comparison of the QTNs with previously reported results. The whole-genome sequences along with the GWAS results provide a new basis for the extensive study of Muscadinia species.


Introduction
Muscadinia grapes, which are native to the southeastern US and constitute one of the two subgenera of Vitis, were the first American grapes to be cultivated [1]. The other subgenus, Euvitis, includes interspecific hybrids of Vitis vinifera and Vitis labrusca (e.g. "Concord", "Niagara") that are familiar European and American bunch grapes, respectively. Muscadinia is composed of three known species, M. rotundifolia, M. munsoniana, and M. popenoei, and is a much smaller group than Euvitis, which includes approximately 70 species [2]. The number of chromosomes is also different between Euvitis (2n = 38) and Muscadinia (2n = 40). Whereas bunch grapes are one of the most economically important fruit crops worldwide, the market for Muscadinia grapes remains small. However, because of the nutritional content of Muscadinia grapes, they are receiving increasing attention [3][4][5]. Among the three Muscadinia species, only M. rotundifolia is commercially important.
Compared to bunch grapes, Muscadinia grapes have many superior traits. Muscadinia grapes are tolerant to many diseases that cause extensive economic losses in bunch grapes, such as downy and powdery mildew, Pierce's disease, and phylloxera [6][7][8][9]. However, introducing these disease resistance traits to Euvitis is not efficient due to the low success rate of interspecific crosses [10]. Only one resistance locus for powdery mildew and downy mildew, RUN1/RPV1, has been introduced to V. vinifera through grape breeding programs [11][12][13][14][15]. Along with the disease resistance traits, Muscadinia grapes are noted for their high nutritional content, including that of phenolic compounds [16]. The phytochemicals in Muscadinia grapes are effective against inflammation [17], in reducing obesity [18], in anticancer, and in antioxidant effects [19,20]. Muscadinia grapes have a distinctive aroma and flavor and contain sweet juice with low acidity [1].
Although Muscadinia grapes have the potential to be grown as important commercial crops due to their beneficial traits, only a few genetic studies related to Muscadinia breeding have been conducted [21][22][23]. Because genetic studies of Euvitis are relatively well established compared to those of Muscadinia, the use of genetic information from Euvitis as a reference would be an effective approach for establishing genetic studies of Muscadinia. For this purpose, molecular markers developed in Euvitis have been used in Muscadinia to develop genetic linkage maps [21,23]. However, the rate of transferability of SNPs obtained by simple sequence comparisons between V. vinifera species was found to be 2.3% among nonvinifera Vitis species and 0% in M. rotundifolia [24]. To address this issue, a recent study targeting the Euvitis core genome utilized RNase H2 enzyme-dependent amplicon sequencing (rhAmpSeq) to develop molecular markers that were transferrable across Euvitis species [25]. A total of 2000 rhAmpSeq markers were developed in the study, and 91.9% were transferable among Euvitis species. This ∼45-fold improvement in marker transferability is expected to be helpful in the genetic study of Muscadinia grapes.
To establish a basis for studies in Muscadinia grapes, we performed whole-genome sequencing and chromosomelevel assembly in a Muscadinia grape cultivar (M. rotundifolia cv. Noble). For high-throughput trait-associated marker development, we also tested the 2000 rhAmpSeq markers in a genome-wide association study (GWAS) involving a total of 12 berry-related traits.

Results and discussion
Sequencing and comparative analysis of the Muscadinia genome Genomic DNA of M. rotundifolia cv. Noble was used to generate a total of 50.7 Gb of 2 × 250 bp sequence reads. The reference-free profiling of the Muscadinia genome revealed diploid with 1.55-1.58% of heterozygosity and the estimated size of the haplotype genome was ∼416.1 Mb (Supplementary Fig. S1). The total lengths of repetitive and unique sequences were estimated to be ∼144.3 (34.7%) and ∼ 271.8 Mb (65.3%), respectively. The sequences were assembled into contigs (N 50 = 107 128 bp) with 399.5 Mb of the total length (Supplementary Table S1). The scaffolding of the contigs was performed by Hi-C and PacBio sequencing, resulting in 20 haploid chromosomes with a total assembled haploid length of 393.8 Mb. To evaluate the completeness of the assembly, we realigned the raw Illumina genome sequencing reads to the assembly, and 98.6% of these reads were mapped to the genome, indicating a high degree of completeness. We also performed a survey of core eukaryotic genes using BUSCO [26], which identified the presence of 96.0% complete universal single-copy orthologs. A recent diploid assembly of M. rotundifolia cv. Trayshed identified a similar number of complete BUSCO orthologs (97.5%) and a similar genome size (400.4 Mb for Haplotype1 and 369.9 Mb for Haplotype 2), suggesting that this "Noble" assembly marginally underrepresented the haploid genome [27]. While the estimated genome size of M. rotundifolia cv. Trayshed by a f low cytometry method was 483.4 ± 3.1 Mb in the previous study [27], our estimation for M. rotundifolia cv. Noble by a K-mer distribution analysis was 416.1 Mb. Considering that the sizes of the two independently assembled Muscadinia genome sequences with longrange sequence information are 393.8 and 400.4 Mb, respectively, the estimated genome size of M. rotundifolia would be close to 416.1 Mb. A total of 26 394 genes were annotated using a combination of PacBio Iso-Seq, Illumina RNA-seq data, and the amino acid sequences of V. vinifera, similar to gene counts obtained for "Trayshed" (25 706 for Haplotype 1 and 23 673 for Haplotype 2). The number of unique genes identified in our gene annotation that do not appear in M. rotundifolia cv. Trayshed was 2600. Compared to the genes identified in the V. vinifera PN40024 genome, the number of unique genes in M. rotundifolia cv. Noble was 1757 and that in M. rotundifolia cv. Trayshed was 1804, indicating that both Muscadinia genomes have similar quality of gene annotation. However, considering the close genetic distance of the two Muscadinia cultivars, the number of the unique genes might be due to the error of gene annotation in both Muscadinia genomes.
To investigate the assembly accuracy of the Muscadinia chromosome sequences, we compared the collinear chromosome sequences of M. rotundifolia cv. Noble, M. rotundifolia cv. Trayshed [27], and V. vinifera PN40024 [28]. For the comparison, we used nucleotide BLAST results between M. rotundifolia cv. Noble and M. rotundifolia cv. Trayshed and the position information of orthologous gene sets between M. rotundifolia cv. Noble and V. vinifera PN40024 (Fig. 1). Between the two Muscadinia cultivars, at least ten partial inversions, three translocations, and five inversions with translocation were observed, indicating that one of the two chromosome assemblies might contain assembly errors or genomic variations. The assembled genome size of V. vinifera PN40024 was 498 Mb [28], about 21% larger than that of M. rotundifolia cv. Noble. Gene synteny was highly conserved between M. rotundifolia cv. Noble and V. vinifera PN40024, but at least 18 partial inversions were also observed. Chromosome 7 of V. vinifera was observed to be divided into chromosomes 7 and 20 in M. rotundifolia, as previously shown [27].

Genotyping of the Muscadinia population with Euvitis rhAmpSeq markers
The chromosome-level assembly of the Muscadinia genome enabled the use of GWAS for the genetic analysis of Muscadinia traits. Of the 2000 rhAmpSeq markers developed to target the core Euvitis genomes [25], a total of 1599 markers showed polymorphisms in 356 individuals (Supplementary Table S2). After filtering and imputation, a total of 1283 markers and 348 individuals were used for the subsequent analyses. The imputation accuracy was 0.94. Consistent with the development of rhAmpSeq markers mainly in gene-rich regions [25], the 1283 markers primarily aligned to gene-rich regions of Muscadinia chromosome sequences ( Fig. 2A, upper panel). The distribution of the distances between adjacent markers was consistent for each chromosome, with median values ranging from ∼120 to 250 kb and interquartile values ranging from ∼70 to 510 kb ( Fig. 2A, lower panel).
Linkage disequilibrium (LD) between adjacent markers averaged 0.36 (r [2]), and the distance over which LD decayed to half of the initial value was ∼2.3 Mb (Fig. 2B). This represents far slower LD decay than that reported in Euvitis, in which rapid LD decay has been observed within 11.6 and 16.6 kb [29,30]. The reason for the slow LD decay might be the origin of the population which consists of breeding populations with multiple parent sets (Supplementary Table S2). Similar slow LD decay was also observed in a GWAS of breeding lines of apple derived from crossing [31].
To identify the number of subpopulations (k) of the Muscadinia population, we performed a parametric clustering analysis implemented by the ADMIX-TURE software [32]. Subpopulations from k = 1 to k = 30 were tested, and the minimum of the crossvalidation error was reached at k = 15 ( Fig. 2C and Supplementary Fig. S2). Therefore, the optimum k value was determined to be 15, and the value was used in the subsequent GWAS analysis.

Phenotype analysis of 12 berry-related traits of Muscadinia grapes
For GWAS, we used phenotype data for 12 berry-related traits of Muscadinia grapes ( Supplementary Fig. S3). To investigate the correlation coefficient among the 12 traits, we performed the Pearson test with the phenotype data. Two groups of the traits revealed significant positive correlations (Fig. 3). One group included three fruit color-related traits like visible color, chroma, and lightness, revealing ≥0.69 of the correlation coefficient (r). The other group included the traits like length, width, weight per berry, and firmness (r ≥ 0.54). The length and width revealed a strong positive correlation (r = 0.92), indicating Muscadinia berries tend to have a round shape. Interestingly, the firmness of berries also showed a positive correlation with the length, width, and weight per berry (r ≥ 0.54), indicating that larger berries tend to be firmer. The number of berries per cluster negatively correlated with the length, width, and weight per berry (r ≤ −0.27).

Quantitative genetic analysis of the 12 berry-related traits with multi-locus GWAS approaches
We performed multi-locus GWAS for the genetic analysis of quantitative traits. Because the quantitative trait nucleotides (QTNs) detected by multiple multi-locus approaches are known to be more reliable [33], we tested a total of six multi-locus methods, including mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pkWmEB, and ISIS EM-BLASSO [34]. To balance the high power of the multi-locus approaches for QTN detection and low falsepositive rate, we used ≥3 of a logarithm of the odds (LOD) threshold value [33]. If multiple methods detect the same QTN, we chose the result with the highestlog 10 (P) value. With this approach, we identified a total of 52 QTNs that were associated with the 12 berry-related traits (Fig. 4, Table 1, Supplementary Fig. S4, and Supplementary Table S3). Of the 52 QTNs, 26 were detected by multiple methods (Fig. 4, indicated by pink dots), indicating that these QTNs are more reliable than those were detected by a single method (Fig. 4, indicated by blue dots). Among the six methods, pLARmEB revealed the most sensitive detection of QTNs by detecting a total of 35 QTNs (Table 1).

Berry visible color, chroma, and lightness
We performed GWAS with three types of berry color: visible color, chroma, and lightness. A total of nine QTNs were detected for these traits, and three of them, including chr4_9358331, chr4_11238026, and chr16_23164495, were pleiotropic QTNs identified in multiple berry color traits (Table 1). Among them, chr4_9358331 and chr4_11238026 were identified as the major QTNs in the three traits revealing 8.06-28.15% of phenotypic variation explained (PVE). Because the markers used in this study were named after the chromosome and position in the reference genome sequence of V. vinifera [28], the marker position in the V. vinifera genome could be identified from the marker name. Therefore, the positions of the two major QTNs for berry color in the V. vinifera genome were 9.36 and 11.24 Mb on chromosome 4. The position of chr4_11238026 was consistent with the previous finding obtained from a high-density linkage map of Muscadinia species [22]. In the Muscadinia linkage map, the berry color locus was predicted to be located between 11.09 and 11.99 Mb on Euvitis chromosome 4, which includes the position of chr4_11238026. Another report identified a candidate gene for the determination of Muscadinia berry color based on the high-density linkage map of Muscadinia [35]. In the study, a glutathione S-transferase gene (VvGST4) associated with the anthocyanin biosynthetic pathway [36,37] was suggested as the candidate gene [35]. To determine whether our QTN is physically associated with the candidate gene, we investigated annotated genes between the two f lanking markers of the QTN in the Muscadinia genome. The positions of the f lanking markers were 9.96-10.75 Mb on Muscadinia chromosome 4, and a total of 29 genes were identified (Supplementary Table S4). Of the 29 genes, four glutathione S-transferase genes were identified. Among them, the two genes between 10.58 and 10.61 Mb of Muscadinia chromosome 4 (Supplementary Table S4, evm.model.chr4.1308 and evm.model.chr4.1310) corresponded to the candidate gene (VvGST4) with ≥98% nucleotide sequence identity. This indicates that the chr4_11238026 QTN for berry color in our study is associated with the known candidate gene for the trait. In the previous genetic study, the berry color locus was identified by qualitative trait scoring (black or bronze), indicating the locus is for qualitative trait determination [22]. Therefore, the chr4_11238026 QTN corresponding to the same locus was also expected to be the qualitative trait locus for Muscadinia berry color.
Because we used quantitative phenotype scores for the berry color-related traits, other QTNs for these traits were expected to determine quantitative traits. The pleiotropic QTN, chr4_9358331, revealed 8.37-10.46% of PVE in the three berry color-related traits and is considered a major QTN for Muscadinia berry-color traits. The flanking markers for this QTN were located at 7.49 and 7.97 Mb on Muscadinia chromosome 4, and a total of 22 genes were identified between the two markers (Supplementary Table S4).
The major quantitative trait loci (QTLs) of berry color in Euvitis species have been identified as VvMYBA1 and VvMYBA2 [29,38]. The orthologs of these genes in Muscadinia were identified around 19.01 Mb of Muscadinia chromosome 2. However, no QTN was detected around that region in this study. A previous study reported additional QTLs for the Euvitis berry color trait based on a high-density linkage map [39]. One of the additional QTLs was identified as being associated with VvGST4, which is related to the qualitative trait of Muscadinia berry color [35]. Among the rest QTLs, only one QTL, col-2-2, showed a similar location with one of our QTNs, chr2_6241099. One of the candidate genes identified by col-2-2 was VIT_02s0012g00630, and its Muscadinia ortholog was located at 7.46 Mb of Muscadinia chromosome 2 which is about 0.35 Mb away from the position of chr2_6241099. The flanking markers of chr2_6241099 were identified at 6.99 and 8.57 Mb on Muscadinia chromosome 2. Therefore, the chr2_6241099 QTN was expected to be the same QTL as col-2-2. Except for the chr2_6241099 QTN, the rest QTNs revealed different positions from the previous results in Euvitis. These results may indicate that the QTNs of berry color traits in Muscadinia are different from those in Euvitis; therefore, the berry color of Muscadinia might be regulated by a different genetic background from that of Euvitis.

Berry length, berry width, weight per berry, berry firmness, and number of berries per cluster
A total of eight QTNs were detected in berry length and width, respectively (Table 1). Of the eight QTNs, six were pleiotropic QTNs identified in both traits. This might explain the strong correlation (Fig. 3, r = 0.92) between the two traits. Among the six pleiotropic QTNs, chr2_4825658 revealed the highest PVE in both traits as 10.26% in length and 13.22% in width. The chr2_4825658 QTN was also detected in weight per berry and firmness, revealing 11.16% and 9.41% of PVE. The PVE of the rest QTNs was ≤6.12%. Therefore, chr2_4825658 was the major QTN for the four traits and might explain their correlation (Fig. 3, r ≥ 0.54). The flanking markers of chr2_4825658 were located at 4.29 and 6.72 Mb of Muscadinia chromosome 2. Between the two flanking markers, a total of 211 genes were identified (Supplementary Table S4). Because of the long distance between the flanking markers, it would also be possible that multiple genes in this region are involved in the four traits.
There has been no genetic study of Muscadinia berries related to size, weight, and firmness. In contrast, a recent study identified four markers associated with berry weight in Euvitis species [29]. However, the four identified markers were on Euvitis chromosomes 17, 18, and 19, different from our findings in Muscadinia. In our study, chr2_4825658 was the major QTN for berry weight, and the next was chr6_506716 with 5.39% of PVE. The two QTNs were located on chromosomes 2 and 6. The rest QTNs for berry weight revealed ≤3.12% of PVE.
In berry firmness, chr2_4825658 was the major QTN. In multiple fruit crops, the firmness of fruits is known to be determined by the activity of a polygalacturonase gene [40][41][42]. Among the genes identified by chr2_4825658, a polygalacturonase gene was identified at 6.23 Mb of Muscadinia chromosome 2 (Supplementary Table S4, evm.model.chr2.980). A QTL study with apple (Malus×domestica Borkh.) identified that the polygalacturonase gene is one of the QTLs that determines the apple fruit firmness [43]. These may indicate that the polygalacturonase identified by chr2_4825658 is a candidate gene for the berry firmness of Muscadinia grapes.
The number of berries per cluster revealed three QTNs. Of the three QTNs, chr5_5884957 revealed the highest PVE at 7.85%, and a total of 42 genes were identified between the flanking markers of the QTN (Supplementary Table S4). Although it revealed a significant negative correlation with the three berry sizerelated traits, it did not share any pleiotropic QTN with the three traits.

Weight per cluster, number of seeds per berry, scar pattern, and maturation date
Because berry size and the number of berries per cluster were negatively correlated, weight per cluster was expected to be related to yield. Only one QTN (chr17_11747237) was detected for the weight per cluster with 16.97% of PVE. A total of 96 genes were identified between flanking markers of the QTN (Supplementary Table S4). In Euvitis grapes, the QTL for cluster weight was analyzed in three different years and revealed unstable QTLs [44], indicating that the QTN for weight per cluster of Muscadinia may also need further study to obtain reliable results. Two QTNs were detected for the number of seeds per berry, including chr7_22781295 and chr17_7695724 with 12.89 and 10.65% of PVE, respectively. A total of 33 and 15 genes were identified by chr7_22781295 and chr17_7695724, respectively (Supplementary Table S4). The scar pattern was measured to the extent of the wound on berry skin when separating a mature berry from the pedicel. A total of four QTNs were detected for scar pattern. Among them, chr11_5215446 revealed the highest PVE with 11.78%. A total of 36 genes were identified between the f lanking markers of chr11_5215446 (Supplementary Table S4). In the maturation date, one QTN (chr18_7185348) was detected with 6.48% of PVE. The QTN was supported by all the six methods, indicating that it is statistically reliable. A total of 72 genes were identified between the flanking markers of the QTN (Supplementary Table S4).

GWAS of Muscadinia grapes
We performed a GWAS of Muscadinia berry-related traits with the 1283 rhAmpSeq markers developed from Euvitis genome sequences [25]. Although the number of markers used was much smaller than that employed in the genotyping by sequencing (GBS) method, the rhAmpSeq markers provide more information for tracking ancestral haplotypes, as each amplicon is a local haplotype spanning multiple SNPs [25]. The rhAmpSeq marker resolution was sufficient here in part because our population showed slow LD decay. However, some chromosomes harbored empty blocks that did not contain markers in gene-rich regions ( Fig. 2A: chromosomes 2 , 4, 6, 8, 10, 18, and 20). This is an issue that should be solved to use transferable markers across a wide range of Vitis species. Although the transferability of the Vitis rhAmpSeq markers was shown to be 91.9% when applied to Euvitis species, it was 61.9% to the Muscadinia species examined in our study. This difference is partly due to the genetic distance between Euvitis and Muscadinia species and the low diversity among the evaluated Muscadinia individuals, which consisted of progeny from multiple parental sets (Supplementary Table S2). The low diversity of the Muscadinia population might also be related to the slow LD decay observed. Therefore, the development of a proper core Muscadinia collection will be necessary to perform an efficient GWAS with a high resolution.
Among the 12 berry traits analyzed in our study, the three traits related to berry color could be compared to the results of the previous genetic analysis in Muscadinia species [22]. The two independent genetic analyses of berry color in Muscadinia grapes revealed the same results, indicating that the results of our GWAS are reliable. The multi-locus GWAS methods are more powerful than the single-locus method in detecting QTNs for complex traits [33]. Using six different multi-locus GWAS methods, we could obtain reliable QTNs for the 12 Muscadinia berry-related traits. In particular, the pleiotropic QTNs identified from the significantly correlated traits would provide strong evidence for further study to identify pleiotropic genes that control the traits.

Conclusions
In this study, we report a chromosome-level reference genome sequence for Muscadinia grapes. Based on the assembled chromosome sequences, the genome size of M. rotundifolia was approximately 21% smaller than that of V. vinifera, but gene synteny was highly conserved between the two genomes. Our multi-locus GWAS analysis of Muscadinia grapes with transferable markers between Muscadinia and Euvitis revealed reliable results in the identification of quantitative trait-associated markers. The transferability of the markers used in our GWAS will provide new opportunities to study comparative genetics between Muscadinia and Euvitis species. Along with the obtained whole-genome sequence, the GWAS of M. rotundifolia will help expand the study of Muscadinia grapes.

Genome sequencing and assembly
Plant samples were collected from a 5-year-old M. rotundifolia cv. Noble grapevine. Flash-frozen leaf was ground with a mortar and pestle under liquid nitrogen, and nuclear DNA was isolated using the Dellaporta method. Sequencing libraries were prepared using 1 ug of DNA using the Illumina TruSeq PCR-free kit (Illumina), and size selection for ∼250 bp fragments was performed using SPRIselect beads (Beckman Coulter). Libraries were sequenced at ∼121× depth on an Illumina HiSeq 2500 in rapid run mode, and a total of 101 428 643 paired-end reads were generated. The total length of the reads was 50.7 Gb. The ploidy level was determined with Smudgeplot software [45]. The heterozygosity and haploid genome length of the Muscadinia genome were estimated with GenomeScope 2.0 [45]. For Smudgeplot and GenomeScope analyses, K-mers were counted with KMC software at K = 21 [46]. We assembled the reads using DISCOVAR de novo (Broad Institute, Cambridge, MA) to generate sequence contigs. To scaffold the resulting contigs, we performed HiRise genome scaffolding using both in situ and in vitro Hi-C. To fill the remaining assembly gaps, we generated ∼10X coverage PacBio sequences with an average fragment size of 10 kb. The gap-filling was performed with PBJelly (ver. 14.9.9).

Gene annotation
To annotate genes, we first annotated transposable elements with RepeatModeler v2.0.1 and masked repeat sequences with RepeatMasker v4.1.0. To identify and characterize genes, we used muscadine transcriptome data. For this purpose, we generated RNA-seq data on the Illumina HiSeq 2500 platform from triplicate samples of 13 tissues (adult leaves, closed flowers, fine roots, fruit set, mature fruits, open flowers, pollen, preveraison, ripening seeds, ripening skin/flesh, tendrils, veraison, and young leaves). In addition, we performed PacBio Iso-Seq on pooled RNA using the Sequel II platform. A total of 4 834 933 circular consensus reads were generated, with a total length of 11.9 Gb and an average read length of 2.5 kb. By combining Illumina RNA-seq, PacBio Iso-Seq, and protein sequences of the V. vinifera genome, we performed gene prediction with BRAKER [47].

Comparative genome analysis
Comparative genome analysis was performed on the chromosome sequences of M. rotundifolia cv. Noble, M. rotundifolia cv. Trayshed, and V. vinifera PN40024. Colinear regions between the two Muscadinia cultivars were identified by nucleotide BLAST search (score: ≥2000, identity: ≥97%, and e-value: ≤e-10). To identify colinear regions between M. rotundifolia cv. Noble and V. vinifera PN40024, orthologous CDSs were identified according to reciprocal best hits from a BLAST search (single HSP, score: ≥150, and e-value: ≤e-8) using all the CDSs of the two species. The reverse transcriptase (RT) domain sequences of LTR retrotransposons were isolated with HMMER (default options) using customized HMM profiles for the Gypsy and Copia superfamilies ( Supplementary  Files 1 and 2). The position information of the RT sequences was used to display the distribution of LTR retrotransposons in the two Muscadinia cultivars and V. vinifera chromosome sequences. The visualization of the comparative analysis results was carried out with an in-house Python script.

Genotyping, population structure, and LD
The genomic DNA of the 356 Muscadinia individuals was extracted with Qiagen DNeasy plant kits by following the provided manual. The genotyping of the 356 individuals was performed with the 2000 rhAmpSeq markers by following the method described in Zou et al. [25]. A total of 1599 markers revealed polymorphisms among the 356 individuals. Markers with a minor allele frequency of ≤0.01 and individuals with a minimum heterozygosity proportion of ≤0.05 were filtered, leaving 1283 markers and 348 individuals, respectively. The missing genotype data were imputed with the LD KNNI method implemented in TASSEL 5.0, and the imputation accuracy was calculated with the same software. The marker density on the Muscadinia chromosome sequences was analyzed by aligning the 1283 marker sequences with BLAST (single hit, single HSP, score: ≥40, and e-value: ≤e-8, identity: ≥0.95). The genotype data of the 348 individuals was used to analyze the population structure. The number of subpopulations (k) was inferred by ADMIXTURE analysis [48]. The bar plot of the 348 individuals was presented with STRUCTURE 2.3.4 [49,50] for k = 15. The LD (r [2]) values among the 1283 markers were calculated with the software implemented in TASSEL 5.0, and the graph of LD decay was generated with an in-house Python script.

Phenotyping and GWAS analysis
All phenotype data of the 348 Muscadinia individuals were obtained from previous studies by Campbell et al. [51][52][53]. The correlation coefficient analysis of the phenotypes was conducted using the R package "corrplot". The associations between phenotypes and markers were analyzed with six multi-locus GWAS methods implemented in the R package "mrMLM.GUI". The Q-matrix for k = 15 was generated with STRUCTURE software and used in the GWAS. Kinship (K) matrix was calculated with the software implemented in mrMLM.GUI. Default options were used for the rest. The Manhattan plots and Q − Q plots were generated with the software implemented in mrMLM.GUI.