-
PDF
- Split View
-
Views
-
Annotate
-
Cite
Cite
Marivi Colle, Courtney P Leisner, Ching Man Wai, Shujun Ou, Kevin A Bird, Jie Wang, Jennifer H Wisecaver, Alan E Yocca, Elizabeth I Alger, Haibao Tang, Zhiyong Xiong, Pete Callow, Gil Ben-Zvi, Avital Brodt, Kobi Baruch, Thomas Swale, Lily Shiue, Guo-qing Song, Kevin L Childs, Anthony Schilmiller, Nicholi Vorsa, C Robin Buell, Robert VanBuren, Ning Jiang, Patrick P Edger, Haplotype-phased genome and evolution of phytonutrient pathways of tetraploid blueberry, GigaScience, Volume 8, Issue 3, March 2019, giz012, https://doi.org/10.1093/gigascience/giz012
- Share Icon Share
Abstract
Highbush blueberry (Vaccinium corymbosum) has long been consumed for its unique flavor and composition of health-promoting phytonutrients. However, breeding efforts to improve fruit quality in blueberry have been greatly hampered by the lack of adequate genomic resources and a limited understanding of the underlying genetics encoding key traits. The genome of highbush blueberry has been particularly challenging to assemble due, in large part, to its polyploid nature and genome size.
Here, we present a chromosome-scale and haplotype-phased genome assembly of the cultivar “Draper,” which has the highest antioxidant levels among a diversity panel of 71 cultivars and 13 wild Vaccinium species. We leveraged this genome, combined with gene expression and metabolite data measured across fruit development, to identify candidate genes involved in the biosynthesis of important phytonutrients among other metabolites associated with superior fruit quality. Genome-wide analyses revealed that both polyploidy and tandem gene duplications modified various pathways involved in the biosynthesis of key phytonutrients. Furthermore, gene expression analyses hint at the presence of a spatial-temporal specific dominantly expressed subgenome including during fruit development.
These findings and the reference genome will serve as a valuable resource to guide future genome-enabled breeding of important agronomic traits in highbush blueberry.
Introduction
Since domestication efforts began in the early 1900s [1], highbush blueberry (Vaccinium corymbosum L.) has rapidly become a high-value fruit crop worldwide [2–4]. Highbush blueberry, compared to hundreds of closely related blueberry species (e.g., huckleberry, Vaccinium ovatum Pursh; bilberry, Vaccinium myrtillus L.; and sparkleberry, Vaccinium arboreum Marshall) in the Ericaceae [5, 6], is widely cultivated due to its adaptation to temperate climates, excellent fruit quality, yield, and composition of phytonutrients [7]. As a result for the demand for fresh blueberries as a "superfruit" [8], highbush blueberry production has increased 600% during the past three decades and steadily grown to a multi-billion dollar industry [9]. In addition to its short domestication history, highbush blueberry is unique in being one of only three major commercially valuable fruit crops, accompanied by cranberry (Vaccinium macrocarpon Ait.) [10] and the garden strawberry (Fragaria x ananassa) [11], with wild progenitor species native to North America.
Blueberries have a single epidermal layer that expresses a rich profile of anthocyanins during ripening that, in combination with epicuticular wax, generates its characteristic "powdery blue" color. The cuticular and epidermal layers contain nearly all of the phytonutrients in the fruit such as anthocyanins, proanthocyanidins, and flavonols [12–14]. Previous studies on blueberry have reported that these groups of compounds may have diverse health-promoting properties, including controlling diabetes, improving cognitive function, and inhibiting tumor growth [15–21]. With the growing awareness of the potential health benefits of blueberry and increasing consumer demand, a primary goal of the blueberry research community is to develop cultivars with improved antioxidant levels along with other important fruit quality traits (e.g., aroma, taste, and firmness) [22]. However, despite its economic importance and health benefit potential, breeding efforts to improve fruit quality traits in blueberry have been slow due, in large part, to the lack of genomic resources. A draft genome for a wild diploid species (2n = 2x = 24) of blueberry was previously assembled [23]. However, that draft genome consists of a large number of scaffolds (13,757 total; N50 of ~145 kb), high percentage of gaps (~27.35%) in a ~393.16 Mb assembly, and, most importantly, does not reflect the genome complexity of the economically important and cultivated tetraploid (2n = 4x = 48) highbush blueberry.
Here, we present the first chromosome-scale genome assembly of tetraploid highbush blueberry. The haplotype-phased assembly consists of 48 pseudomolecules with ~1.68 Gb of assembled sequence, ~1.29% gaps, and an average of 32,140 protein coding genes per haplotype (128,559 total). A haplotype is the complete set of DNA within the nucleus of an individual that was inherited from one parent. We leveraged this genome to examine the origin of the polyploid event, gain insights into the underlying genetics of fruit development, and identify candidate genes involved in the biosynthesis of metabolites contributing to superior fruit quality. Furthermore, we examined gene expression patterns among the four haplotypes in highbush blueberry. This analysis uncovered the presence of spatial-temporal specific dominantly expressed subgenomes. These findings and the reference genome will serve as a powerful platform to further investigate "subgenome dominance" [24–26], facilitate the discovery and analysis of genes encoding economically important traits, and ultimately enable molecular breeding efforts in blueberry.
Results
Assembly and annotation of the tetraploid highbush blueberry genome
Our goal was to obtain a high-quality reference genome for the highbush blueberry cultivar “Draper,” which is widely grown around the world due to its excellent fruit quality. We sequenced the genome using a combination of both 10× Genomics (Pleasanton, CA) and Illumina (San Diego, CA), totaling 324X coverage of the genome (Supplementary Table S1). These data were assembled and scaffolded using the software package DenovoMAGIC3 (NRGene, Nes Ziona, Israel) (Supplementary Table S2). The genome was further scaffolded to chromosome-scale using Hi-C data (91.4X coverage) with the HiRise pipeline (Dovetail, Santa Cruz, CA) (Supplementary Figs. S1 and S2). The total length of the final assembly is 1,679,081,592 bases distributed across 48 chromosome-level pseudomolecules (Fig. 1). The final assembly size falls within the estimated genome size of "Draper" based on flow cytometry (1.63 Gb with 95% confidence interval +/− 0.06 Gb) (Extended Data Table 1).

The haplotype-phased chromosome-scale highbush blueberry genome. (a) Collinearity among the homoeologous chromosomes. The gray lines represent conserved gene arrays between chromosomes. Chromosomes were drawn proportionally with respect to the number of genes on each chromosome. (b) Gene and transposable element (TE) density and LTR assembly index (LAI) in chromosomes 1–12 plotted in 300 Kb sliding window using Circos. The tracks from outside to inside are: 1 = chromosomes, 2 = gene density, 3 = TE density, and 4 = LAI score.
The genome was annotated using a combination of evidence-based and ab initio gene prediction using the MAKER-P pipeline [27] (Supplementary Table S3). RNA sequencing (RNA-seq) data from 13 different gene expression libraries, representing unique organs, developmental stages, and treatments (Supplementary Table S4), and publicly available transcriptome and expressed sequence tags (EST) data of V. corymbosum in theNational Center for Biotechnology Information (NCBI) were used as transcript evidence. Protein sequences from Arabidopsis thaliana [28, 29], Actinidia chinensis [30], and UniprotKB plant database were also used as evidence for genome annotation. We predicted a total of 128,559 protein-coding genes. Benchmarking Universal Single-Copy Orthologs analysis (BUSCO, RRID:SCR_015008) v.3 [31] was performed to assess the completeness of the assembly and quality of the genome annotation. The annotated gene set contains 1,394 out of 1,440 (97%) BUSCO genes (Supplementary Table S5). Functional annotation was assigned using Basic Local Alignment Search Tool (BLAST) 2GO [32] to reference pathways in the Kyoto Encyclopedia of Genes and Genomes database [33] (Supplementary Fig. S3). Comparative genomic analyses assigned genes to 16,909 orthogroups shared by six phylogenetically diverse plant species including five eudicots (A. chinensis [30], A. thaliana [28, 29], Fragaria vesca [34], Rubus occidentalis [35], and Vitis vinifera [36]), each with distinct fruit types, and Zea mays [37] as the outgroup.
Transposable elements (TEs), both Class I and II, were identified and classified in the genome using the protocol described by Campbell et al. [27]. Overall, 44.3% of the blueberry genome is composed of TEs (Supplementary Table S6). Consistent with previous reports [38, 39], the most abundant Class I TEs were long terminal repeat retrotransposons (LTR-RTs), specifically the superfamily LTR/Gypsy followed by LTR/Copia, while for Class II transposons, the miniature inverted repeat (MITE) superfamily hAT was the most abundant. The quality of the genome was further assessed by examining the assembly continuity of repeat space using the LTR Assembly Index (LAI) deployed in the LTR_retriever package (v1.8) [40]. The adjusted LAI score of this blueberry genome is 14, and based on the LAI classification, this score is within the range of "reference" quality (Fig. 1). Estimation of the regional LAI in 3 Mb sliding windows also showed that assembly continuity is uniform and of high quality across the entire genome.
Assessment of the origin of tetraploid highbush blueberry
The origin of highbush blueberry from either a single (i.e., autopolyploid) or multiple diploid progenitor species (i.e., allopolyploid) is a long-standing question [41]. Previous reports have suggested that highbush blueberry may be an autotetraploid based on the segregation ratios of certain traits [42]. However, an analysis of chromosome pairing among different cultivars revealed largely bivalent pairing during metaphase I [43], similar to patterns observed in known allopolyploids [44, 45]. To gain further insights into the polyploid history of highbush blueberry, we calculated sequence similarity and synonymous substitution (Ks; silent mutation) rates between genes in homoeologous regions across the genome. The average sequence similarity is ~96.3% among syntenic homoeologous genes. The average Ks divergence between syntenic homoeologous genes is ~0.036 per synonymous site. The average Ks divergence between homoeologous genes can be used to not only identify polyploid events [46–48] but also to estimate the divergence of the diploid progenitors from their most recent common ancestor (MRCA) [49]. The Ks divergence between homoeologs in highbush blueberry is six times higher than that between orthologs of two A. thaliana lines (Col and Ler; Ks of ~0.006) that diverged roughly 200,000 years ago [50]. Based on the relatively high Ks rate between homoeologous regions across the genome, this suggests that tetraploid blueberry is unlikely an autopolyploid that was formed from somatic doubling or failure during meiosis involving a single individual (parent).
Furthermore, comparative genomics revealed that homoeologous regions are highly collinear, except a few notable chromosome-level translocations (Fig. 1a). These translocations were manually inspected and verified with both the raw sequence and Hi-C data. Rapid changes among homoeologous chromosomes is known to occur in newly formed allopolyploids [44, 45, 51]. We also assessed the level of similarity and content of LTR transposable elements among the four haplotypes. As the most prevalent transposable elements in plants, LTR-RTs undergo continual "bloat and purge" cycles within most plant genomes [52], resulting in a unique signature that may distinguish subgenomes in an allopolyploid. To examine the evolutionary history of LTR-RTs in the highbush blueberry genome, we calculated the mean sequence identity of LTR sequences among each of the four haplotypes (Supplementary Fig. S4). This analysis revealed that the majority of more recent LTRs (>97% similarity) are subgenome specific in highbush blueberry. In other words, the data suggest that LTRs proliferated independently in the genomes of each diploid progenitor (i.e., subgenome), following the divergence from their MRCA, but prior to polyploidy. The pair-wise LTR difference (d) of the two ancestors is 2.4%–2.6%. With Jukes-Cantor correction (K = −3/4*ln(1–4d/3)) and synonymous substitution rate of (μ = 1.3e-8) [53], the estimated time (T = K/2μ) of divergence for the diploid progenitors from their MRCA is between 0.94 to 1.02 million years ago.
These date estimates and the average speciation rate (λ = 0.59 per million years; [54]) for temperate angiosperms suggests that highbush blueberry is either an allopolyploid derived from two closely related species or an autopolyploid derived from the hybridization of two highly divergent populations of a single species. To date the most recent polyploid event in highbush blueberry, we analyzed the unique LTR insertions present in each haplotype. Based on the pair-wise LTR difference between the four haplotypes, which is of 0.81%–0.89%, the polyploid event occurred approximately 313 to 344 thousand years ago. The substitution rate of LTR sequences is likely different from that of protein coding genes. Thus, more accurate date estimates will be possible once the LTR substition rate in highbush blueberry becomes available from future studies.
After allopolyploidization, one of the parental genomes (i.e., subgenomes) often emerges with significantly greater gene content and a greater number of more highly expressed genes [55–58]. The emergence of a dominant subgenome in an allopolyploid is hypothesized to resolve genetic and epigenetic conflicts that may arise from the merger of highly divergent subgenomes into a single nucleus [26, 59, 60]. However, classic autopolyploids, formed by somatic doubling, are not expected to face these challenges or exhibit subgenome dominance since all genomic copies were contributed by a single parent [61]. This was recently supported by genome-wide analyses of a putative ancient autopolyploid (soybean; Glycine max) [62]. It's important to note that subgenome expression dominance could still be observed in intraspecific hybrids and autopolyploids formed by parents with highly differentiated genomes [25].
To explore this in highbush blueberry, we compared gene content and expression-level patterns between homoeologous chromosomes (Fig. 2). While gene content levels were largely similar among homoeologous chromosomes, with a few notable exceptions (Fig. 2a), gene expression levels were highest for one of the four chromosome copies in the majority (average 9.3 of 14) of gene expression libraries (112 of 168 comparisons, x2 test P value < 0.001)(Fig. 2b, Supplementary Fig. S5). Noteworthy, in the three fruit libraries, the most dominantly expressed often became the least expressed among the four homoeologous chromosomes (19 of 36 comparisons; x2 test P value <0.01) or among the two lowest expressed copies (26 of 36 comparisons; x2 test P value <0.01). The most dominantly expressed in other tissues remained so in developing fruit for only two of the chromosomes (6 and 10). These homoeologous chromosome sets have undergone the most structural variation, which may have modified gene expression patterns (Fig. 1a). These analyses are based on a single biological replicate from a plant grown in a growth chamber. Thus, the findings reported here should be considered as preliminary. Future studies should further explore subgenome expression dominance in highbush blueberry, including at the individual homoeolog level [63, 64], with additional biological replicates and across multiple environments.
![Assessment of the origin of polyploid blueberry. (a) Gene content comparison of homoeologous chromosomes (1, 13, 25, and 37) plotted along 2,725 collinear syntenic regions. This analysis for all 48 chromosomes can be regenerated here: [65]. (b) Gene expression comparison (FPKM; fragments per kilobase per million) among the same four homoeologous chromosomes across different blueberry tissues (1 = flower bud; 2 = flower at anthesis; 3 = petal fall; 4 = green fruit; 5 = pink fruit; 6 = ripe fruit; 7, 8 = leaf collected at 12 p.m. and 12 a.m., respectively; 9, 10, 11 = methyl jasmonate-treated leaf collected after 1 hour, 8 hours, and 24 hours, respectively; 12 = shoot; 13 = root; 14 = salt-treated root).](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/gigascience/8/3/10.1093_gigascience_giz012/4/m_gigascience_8_3_giz012_f2.jpeg?Expires=1747876772&Signature=exQZhxmVh6CSIXCXiKzeUNnJVfm6RVhHgtFw-0xb0O~BFRe-T9VeVzlbo~S5ArWo8nJMuNTzZiJpQgcQBgHhUxZlCwpuMT4O4AeNshvoa~6GfZoKifSkT5jbq8SKURAb~tonH81qa5jpxhSq-qvqrtHS3733GUEbt0Uga-Cw7LyihIQ-DB2R3x5yxX~Uedl~irK1S7YZ2UIEuZYb7XMvq3UaLptVHcBLXPO496ZW3Jdy-C2GZ6sd3ZX3YPM1pxAuIpmENBupXHxBaM0sQ34VTmnNhDg7LP7NxqCRG1NlVvRZnvPDYrpU6gO6k0oIiYqPltVPvYszBcqA-TyvzjLU-g__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Assessment of the origin of polyploid blueberry. (a) Gene content comparison of homoeologous chromosomes (1, 13, 25, and 37) plotted along 2,725 collinear syntenic regions. This analysis for all 48 chromosomes can be regenerated here: [65]. (b) Gene expression comparison (FPKM; fragments per kilobase per million) among the same four homoeologous chromosomes across different blueberry tissues (1 = flower bud; 2 = flower at anthesis; 3 = petal fall; 4 = green fruit; 5 = pink fruit; 6 = ripe fruit; 7, 8 = leaf collected at 12 p.m. and 12 a.m., respectively; 9, 10, 11 = methyl jasmonate-treated leaf collected after 1 hour, 8 hours, and 24 hours, respectively; 12 = shoot; 13 = root; 14 = salt-treated root).
Changes in transcript abundance during blueberry fruit development
The progression of fruit development in blueberry is marked with visible external and internal morphological changes including in size and color (Supplementary Fig. S6a). We profiled gene expression in fruit across seven developmental stages from the earliest stage (i.e., post-fertilization) through the final stage (i.e., ripe fruit) to identify genes differentially expressed during fruit development. Distinctive transitions in gene expression were observed between early fruit growth to start of color development and complete color change to ripened fruit. We found that the majority of genes upregulated during early fruit development were involved in phenylpropanoid biosynthesis, nitrogen metabolism, as well as cutin, suberin, and wax biosynthesis (Supplementary Table S7a). In contrast, genes involved in starch and sugar metabolism were highly expressed at the onset of and during fruit ripening (Supplementary Table S7b). Moreover, principal component analysis showed the first two components accounted for 84% of the variation and separated the developmental stages into three groups: early developmental stages, petal fall and small green fruit; middle developmental stages, expanding green and pink fruit; and ,late developmental stages, complete fruit color change, unripe and ripe fruit (Supplementary Figs. S6a and S7).
Genes associated with cell division, cell wall synthesis, and transport were found to be expressed the highest during the earliest developmental stages (Extended Data Table 2), which is consistent with previous work on other fruit species [66, 67]. In addition to genes regulating cell proliferation, defense response-related genes were also highly upregulated during the earliest developmental stages. During the middle developmental stages, genes regulating cell expansion, seed development, and secondary metabolite biosynthesis were highly expressed. During late developmental stages and as the berry transitions to ripening, late embryogenesis, transmembrane transport, defense, secondary metabolite biosynthesis, and abscisic acid-related genes were highly overrepresented. Blueberry is considered a climacteric fruit; however,unlike the ethylene-driven fruit ripening in other climacteric species, abscisic acid has been demonstrated to regulate fruit ripening in blueberry [68]. In summary, global gene expression patterns mirror the morphological and physiological changes observed during blueberry development (Supplementary Fig. S6a).
Antioxidant capacity in blueberry
The economic value of blueberry is largely determined by its fruit quality and nutritional value [7, 18, 69]. We assessed the total antioxidant capacity in mature fruit across a blueberry diversity panel and the abundance of secondary metabolites responsible for its antioxidant activity in developing fruit. A diversity panel, composed of 71 highbush blueberry cultivars and 13 wild Vaccinium species, was evaluated for total antioxidant capacity in mature fruit using the oxygen radical absorbance capacity (ORAC) assay [70]. Similar to previous reports [71–73], we observed a wide range in antioxidant capacity (~5–95 nmol TE/mg FW) across cultivars, with "Draper" having the highest levels of antioxidants (Supplementary Fig. S6b). The observed variation in antioxidants among highbush blueberry, consistent with our results, were previously shown not to correlate with fruit weight or size [74]. However, in another study, a correlation between fruit size and total anthocyanin levels was identified within a few select highbush blueberry cultivars but not across other Vaccinium species or blackberry [75]. This inconsistency is likely due to sample size differences between studies.
To further examine the antioxidant capacity in "Draper" during fruit development, fruits from the seven aforementioned fruit developmental stages were assayed for antioxidant levels (Supplementary Fig. S6a). The highest level of antioxidants was observed at the earliest "petal fall" stage (537 nmol TE/mg FW) (Supplementary Fig. S8) after which, the level of antioxidants declined during the middle and late developmental stages. This is consistent with previous reports on the antioxidant activity in blueberry during fruit maturation [76] and similar to observations in blackberry and strawberry, wherein green fruit have the highest ORAC values [77]. The antioxidant capacity in blueberry is influenced by various metabolites including anthocyanins [12, 75, 78]. Using the same fruit development series, we quantified anthocyanin and flavonol aglycones in "Draper" using liquid chromatography-mass spectrometry (LC-MS). Overall, as the fruit changed its exocarp color from pink to dark blue during ripening, delphinidine-type anthocyanins started to accumulate and were the most abundant compound in ripe fruit (181 peak area/IS/gDW) followed by cyanidin, malvidin, and petunidin (Supplementary Fig. S6c). Flavonols were also detected in all developmental stages, with quercetin glycoside being the most abundant (88 peak area/IS/gDW), while myricetin glycoside and rutin were present at very low levels.
Blueberry also has high levels of phenolic acids; among phenolics, chlorogenic acid (CGA) was the most abundant. High levels of CGA were observed throughout fruit development, with the highest accumulation detected in young fruits (Supplementary Fig. S6d). This correlates with the pattern of antioxidant capacity across different fruit stages, suggesting that CGA is one of the major metabolites contributing to high ORAC values in young developing fruit. CGA is derived from caffeic acid and quinic acid and has vicinal hydroxyl groups that are associated with scavenging reactive oxygen species [79–81]. The antioxidant properties of CGA have been associated with preventing various chronic diseases [82–86].
Expression of antioxidant biosynthesis-related genes
To better understand the biosynthesis of antioxidants in blueberry fruit, we identified homologs of previously characterized genes in other species involved in ascorbate, flavonols, chlorogenic acid, and anthocyanin biosynthesis (Fig. 3 and Extended Data Table 3) [68, 87–89]. The key biosynthetic genes for these compounds exhibited a distinct developmental-specific pattern of expression (Fig. 3c-3e and Supplementary Fig. S9). For example, genes involved in the conversion of leucoanthocyanidins to proanthocyanidins (e.g., LAR and ANR) are highly expressed in the earliest and middle developmental fruit stages but not in ripening fruit (Fig. 3c, green triangle, and Extended Data Table 4). Conversely, genes involved in the conversion of leucoanthocyanidins to anthocyanins (e.g., ANS, UFGT, and OMT) were highly expressed in mature and ripe fruit but not during early fruit developmental stages (Fig. 3c, red circle, and Extended Data Table 4). Additionally, paralogs encoding the same anthocyanin pathway enzymes (e.g., FHT, OMT) and genes involved in vacuolar localization of proanthcyanidins (e.g., glutathione S-transferase and multidrug resistance-associated protein-type) exhibited similar developmental stage-specific expression patterns. The expression of these biosynthetic genes is regulated by specific transcription factors [90]. For example, the transcription factor complex MYB-bHLH-WD regulates expression of anthocyanin biosynthetic genes in eudicots [91–94]. Using the Plant Transcription Factor Database v.4.0 [95], we identified homologs of transcription factors belonging to 55 gene families, and members of some of these gene families were predicted to be involved in the developmental regulation of flavonoid biosynthesis during blueberry fruit growth (Extended Data Table 4), including R2-R3-MYBs, R3-MYBs, bHLHs, and WDRs (Fig. 3b, 3e). These transcription factors also exhibit fruit development-specific expression patterns.
![A schematic presentation of flavonoid biosynthesis in blueberry. (a) Predicted flavonoid biosynthetic pathway leading to production of anthocyanin. The proposed pathway is based on previously described flavonoid biosynthetic pathway in plants (Zifkin et al. [68]) and expression of predicted anthocyanin biosynthetic genes in blueberry. The core genes include phenylalanine ammonia-lyase (PAL), 4-hydroxycinnamoyl CoA ligase (4CL), trans-cinnamate 4-monooxygenase (C4H), cytochrome P450 98A3 (C3H), chalcone synthase (CHS), chalcone flavonone isomerase (CHI), flavanone-3β-hydroxylase (FHT), flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), flavonoid 3′,5′-hydroxylase (F3′5′H), dihydroflavonol reductase (DFR), leucoanthocyanidin reductase (LAR), anthocyanidin reductase (ANR), anthocyanidin synthase (ANS), UDP-glucose flavonoid 3-O-glucosyl transferase (UFGT), anthocyanin-O-methyltransferase (OMT), hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase (HCT), and hydroxycinnamoyl-CoA quinate hydroxycinnamoyltransferase (HQT). (b) Hypothetical regulatory pathway of anthocyanin biosynthetic genes based on the proposed model by Albert et al. [92]. (c) Developmental-specific expression pattern of key anthocyanin biosynthetic gene (green triangles = examples of genes upregulated during early fruit growth; red circles = examples of genes upregulated during late fruit development). (d) Chlorogenic acid biosynthetic genes (1 = petal fall, 2 = small green fruit, 3 = expanding green fruit, 4 = pink fruit, 5 = fruit color completely changed from pink to purple, 6 = unripe, 7 = ripe). (e) Expression profile of transcription factors predicted to regulate anthocyanin biosynthesis in blueberry. A high-resolution version of the heat maps is available on PURR (see Availability of Supporting Data section).](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/gigascience/8/3/10.1093_gigascience_giz012/4/m_gigascience_8_3_giz012_f3.jpeg?Expires=1747876772&Signature=tI25JZIDy6h4IIzMX3bqY3-J0N6mYmxz9OKl2ZcfkNEMek7sGv2Gns4pa-Eh~HZJclkEPRvRaZF-smi88TGcCifHZaqckmF4VdphmsrgAIULb~G9rnWMhjDKXPZnqcw3cD6pAtcKCYuxwh6k8g82C2Xc4SPyjX1x~aIh62j1zGHGEHd9etAVGA8IiopdwxhTu51XMiNwko-1nT~3hmARw7-g0Cjrjgt6a~gbyJlky78QI3Am-YKk~NxPt2xJ0mK3~Jqy6xt4-betiQU0A-SsZY50sdPzh3Ls443LyGxogeQCn5asH~NQR0GnqdBP5DvBKt38bIPDG1OUp6MieSQiDw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
A schematic presentation of flavonoid biosynthesis in blueberry. (a) Predicted flavonoid biosynthetic pathway leading to production of anthocyanin. The proposed pathway is based on previously described flavonoid biosynthetic pathway in plants (Zifkin et al. [68]) and expression of predicted anthocyanin biosynthetic genes in blueberry. The core genes include phenylalanine ammonia-lyase (PAL), 4-hydroxycinnamoyl CoA ligase (4CL), trans-cinnamate 4-monooxygenase (C4H), cytochrome P450 98A3 (C3H), chalcone synthase (CHS), chalcone flavonone isomerase (CHI), flavanone-3β-hydroxylase (FHT), flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), flavonoid 3′,5′-hydroxylase (F3′5′H), dihydroflavonol reductase (DFR), leucoanthocyanidin reductase (LAR), anthocyanidin reductase (ANR), anthocyanidin synthase (ANS), UDP-glucose flavonoid 3-O-glucosyl transferase (UFGT), anthocyanin-O-methyltransferase (OMT), hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase (HCT), and hydroxycinnamoyl-CoA quinate hydroxycinnamoyltransferase (HQT). (b) Hypothetical regulatory pathway of anthocyanin biosynthetic genes based on the proposed model by Albert et al. [92]. (c) Developmental-specific expression pattern of key anthocyanin biosynthetic gene (green triangles = examples of genes upregulated during early fruit growth; red circles = examples of genes upregulated during late fruit development). (d) Chlorogenic acid biosynthetic genes (1 = petal fall, 2 = small green fruit, 3 = expanding green fruit, 4 = pink fruit, 5 = fruit color completely changed from pink to purple, 6 = unripe, 7 = ripe). (e) Expression profile of transcription factors predicted to regulate anthocyanin biosynthesis in blueberry. A high-resolution version of the heat maps is available on PURR (see Availability of Supporting Data section).
In addition, we performed a gene co-expression network analysis to identify metamodules of genes that appear co-regulated during fruit development, specifically genes that are associated with phytonutrient biosynthesis. Our analysis identified 1,988 metamodules of co-expressed genes, of which 428 metamodules contained at least one of the 57 Pfam domains that have been previously categorized as associated with specialized metabolic pathways in plants [96]. Our analysis revealed that 142 of 428 metamodules were more highly expressed in developing fruit compared to other plant tissues. Some metamodules showed clear trends of being highly expressed during either early or late fruit development. For example, METAMOD00377 is expressed early in fruit development and contains homologs to known anthocyanin genes OMT, HCT, PAL,andHQT as well as 31 homologs to known transcription factors. In contrast, METAMOD01221 is expressed late in fruit development and contains homologs of HCT, TT19, UFGT,andOMT and contains 10 homologs to known transcription factors. Moreover, we also examined metamodules for genes associated with other biosynthetic pathways that impart unique blueberry fruit characteristics. We identified two metamodules where genes appear to be co-regulated. Metamodule METAMOD00377, which contains Pfam domains associated with terpene, saccharide, and alkaloid specialized metabolism, and METAMOD01221, which contains terpene and saccharide metabolism. These metamodules contained genes that are differentially expressed during fruit development. Overall, the developmental-specific expression patterns of key biosynthetic genes and their putative transcriptional regulators emphasize the tight regulation of production, conversion, and transport of precursor compounds that lead to the accumulation of antioxidant-related metabolites in blueberry.
Fruit aroma and the role of terpenes
The coregulation of genes involved in the biosynthesis of terpenes and saccharides during early and late fruit development described above reflects a coordinated interplay between these metabolites during fruit growth. Both terpenes and sugars contribute to the characteristic flavor of ripened fruit [97]. In blueberry, two components play a central role in flavor perception: taste, which is a balance of sweetness and acidity, and aroma. Blueberry aroma is a complex blend of volatiles that include aldehydes, esters, terpenes, ketones, and alcohols [98, 99]. Previous reports in blueberry showed that the aroma profile varies greatly across different blueberry ecotypes and cultivars [100–102]. For example, the aroma of highbush blueberry is primarily driven by terpene hydrocarbons (e.g., linalool, geraniol, hydroxycitronellol) and aldehydes (e.g., (E)-2-hexenal, (E)-2-hexenol, (Z)-3-hexenol) [98, 103]. Both linalool and geraniol are associated with sweet floral flavor. However, linalool was reported to largely impart the characteristic blueberry flavor when combined with certain aldehydes [98].
Here, we also identified and examined the expression of genes involved in the biosynthesis of linalool. Four of the linalool synthase homologs in tetraploid blueberry are highly expressed during late fruit development (Extended Data Table 5). This pattern of expression coincides with previous reports of linalool accumulation in ripened blueberry fruit [99, 103, 104]. On the other hand, one homolog of linalool synthase, although it was expressed during fruit growth, did not show a clear fruit development-specific pattern. Investigating the underlying factors regulating these enzymes will facilitate genetic manipulations that may lead to further improving blueberry flavor in the future.
Sugar transporters
Superior fruit quality is also associated with sugar levels [105]. During fruit ripening, sugar levels of the endocarp increase by importing hexose symplastically and/or apoplastically. Sugar transporters (i.e., sugar will eventually be exported transporter [SWEET]), sucrose transporter, and tonoplast sugar transporter (TST) have been demonstrated to regulate intercellular sugar transport in phloem and fruit [106, 107]. In A. thaliana, all clade III SWEET play a role in sucrose transport, with AtSWEET9 primarily functioning in nectary secretion [108], while AtSWEET15 is required for seed filling by acting with SWEET11 and SWEET12 [109]. In blueberry, the clade III SWEET transporters 9 and 10 were highly expressed during early fruit growth, while clade III SWEET transporter 15 was mainly expressed in ripe fruit (Extended Data Table 5). Interestingly, one of the blueberry SWEET15 homologs showed a distinct pattern of expression compared to the other three homologs. To the best of our knowledge, we are the first to report on the potential role of these genes during blueberry fruit development.
In addition, homologs of A. thaliana TST1 [110] and watermelon ClTST1 and ClTST3 (tonoplast sugar transporters) [107] were expressed during fruit ripening in blueberry. Elevated expression of a ClTST1 homolog was observed throughout fruit development, but the ClTST3 homolog showed very low expression. Another gene that is highly expressed during fruit maturation is vacuolar invertase. As described in other systems [111], its upregulation during fruit ripening coincided with the breakdown of starch to sucrose or a mixture of glucose and fructose, suggesting that it may be involved in the regulation of sugar accumulation in blueberry fruit. It was previously reported that vacuolar invertase modulates the hexose to sucrose ratio in ripening fruit [112]. In addition, there are also two sugar transport protein homologs that exhibited developmental specific expression. However, their function remains largely unknown, thus, their potential role in sugar accumulation in the developing berry requires further investigation.
Expansion of antioxidant-related gene families through tandem duplication
Tandemly duplicated genes arise as a result of unequal crossing over or template slippage during DNA repair [113, 114], exhibit high birth-death rates (i.e., predominantly young) [46], and typically are in co-regulated clusters in the genome [115]. Smaller-scale duplications [116], which include tandem duplicates, are highly biased toward certain gene families [117] including those involved in specialized metabolism [118–120]. Furthermore, tandem duplications often results in the increased dosage of gene products [121] and may improve the metabolic flux of rate-limiting steps in certain biosynthetic pathways [122].
Most genes associated with the biosynthesis of antioxidants (CGA, flavonols, anthocyanins, proanthocyanidins) have at least one tandem duplicate present in the highbush blueberry genome, with tandem array sizes ranging from 2 to 10 gene copies (Extended Data Table 6). The largest tandem arrays were found for HQT and HCT genes, which are co-regulated and involved in the CGA pathway (Fig. 3a). Differences in tandem array sizes were also observed between homoeologous chromosomes for various genes. For example, the C3H gene, which is involved in CGA biosynthesis (Fig. 3a), was present on all four homoeologous chromosomes but with varying tandem array sizes. One of the homoeologous chromosomes had two copies of C3H, while the other three homoeologous chromosomes had four copies. This suggests that copy number differences of C3H among subgenomes may be due to either selection for gene duplication or loss or, in the case of allopolyploidy, may be due to preexisting gene content differences among the diploid progenitor species.
Genes in the anthocyanin pathway with other unique duplication patterns include CHS, CHI, OMT, and UFGT. The gene CHS, involved in the conversion of 4-coumaryl-CoA to naringenin chalcone, has two copies, and both have tandem duplicates in at least three of the homoeologous chromosomes. Interestingly, the gene CHI has a single preserved tandem gene duplicate on only one of the homoeologous chromosomes. However, additional copies of CHI were also identified more distantly away from the syntenic ortholog on another homoeologous chromosome, likely involving a transposition event following tandem duplication. The OMT and UFGT genes all have tandem duplicates on all of the homoeologous chromosomes, although with varying array sizes, while the ANR gene involved in the conversion of anthocyanidin to proanthocyanidin is single copy on all homoeologous chromosomes. DFR gene, which is involved in the conversion of dihydroquercetin/dihyromyricetin to leucoanthocyanidin, has a single tandem duplicate on only one of the homoeologous chromosomes. These findings suggest that there may have been greater selective pressure to retain tandem duplicates for genes encoding enzymes involved in anthocyanin production than conversion to proanthocyanidins.
The vast majority of tandem duplicates are eventually lost (i.e., nonfunctionalization); however, in rare instances, some may undergo functional diversification (e.g., sub- and/or neo-functionalization) [46, 123]. Gene expression analysis revealed that 83.4% of the tandem duplicates were expressed in at least one transcriptome library with 73.5% expressed in at least one of the fruit developmental stages. This suggests that a subset of these duplicate genes have nonfunctionalized, subfunctionalized, or neofunctionalized. Future studies are needed to more thoroughly investigate the functions of these genes with more diverse libraries and additional transcriptome analyses.
Discussion
Despite the economic importance of blueberry, molecular breeding approaches to produce superior cultivars have been greatly hampered by inadequate genomic resources and a limited understanding of the underlying genetics encoding important traits. This has resulted in breeders having to solely rely on traditional approaches to generate new cultivars, each with widely varying fruit quality characteristics. For example, our analysis of a diversity panel consisting of 84 cultivars and wild species revealed that "Draper" has antioxidant levels that are up to 19x higher than other cultivars. Thus, the genome of "Draper" should serve as a powerful resource to the blueberry community for guiding future breeding efforts aimed at improving antioxidant levels among other important fruit quality traits. Furthermore, to our knowledge, this is not only the first genome assembly of the cultivated highbush blueberry but is also the first chromosome-scale and haplotype-phased genome for any species in the order Ericales. Ericales includes several other high-value crops (e.g., tea, kiwifruit, and cranberry) and wild species with unique life history traits (e.g., carnivorous, American pitcher plants; parasitic, Sarcodes "snow flower"; and extremophiles, "Jacob cactus"). Thus, we anticipate that this reference genome, plus associated datasets, will be useful for a wide variety of evolutionary studies.
Here, we also leveraged the genome to identify candidate genes and pathways that encode superior fruit quality in blueberry, including those associated with pigmentation, sugar, and antioxidant levels. Furthermore, we found that genes encoding key biosynthetic steps in various antioxidant pathways are enriched with tandem gene duplicates. For example, tandem gene duplications have expanded gene families that are involved in the biosynthesis of anthocyanins. This suggests that, in addition to a recent whole genome duplication, tandem duplications may have greatly contributed to the metabolic diversity observed in blueberry (as previously described in Arabidopsis [124]). These tandem duplicates may have evolved new functions (i.e., neofunctionalized), possibly involved in the biosynthesis of novel compounds, and/or were selected to improve the metabolic flux of specific biosynthetic steps that alter the dosage of certain endpoint metabolites [122]. Future studies are needed to further investigate the possible role of tandem duplications in having modified metabolite levels and composition in wild and cultivated blueberry.
Our analyses also revealed that highbush blueberry, a tetraploid, likely arose from the hybridization of two distinct parents, possibly allopolyploidy, based on the sequence divergence, unique transposable element insertions, and subgenome expression patterns. Our analyses revealed that the subgenomes in highbush blueberry may be controlling a distinct set of genetic programs (e.g., fruit development vs mature leaves). The dominantly expressed subgenome in most surveyed tissues becomes the lowest expressed during fruit development. This observation is similar to findings in allopolyploid wheat where developmental and adaptive traits were shown to be controlled by different subgenomes [125–127]. For example, cell type- and stage-dependent subgenome expression dominance was observed in the developing wheat grain [127]. We argue that both highbush blueberry and hexaploid wheat, each now with high-quality reference genomes [128], make excellent systems to further investigate these underlying mechanisms of subgenome dominance [25]. Subgenome dominance has far-reaching implications to numerous research areas including breeding efforts [58]. For example, marker-assisted breeding needs to target the correct set of dominant homoeologs given the trait in polyploids that exhibit subgenome dominance. Thus, we anticipate that this genome, combined with improved insights into subgenome dominance, will greatly accelerate molecular breeding efforts in the cultivated highbush blueberry.
Materials and Methods
Plant material
Vaccinium corymbosum cv. Draper was selected based on having the highest antioxidant levels among a diversity panel of leading cultivars and due to its overall importance to the industry (Supplementary Fig. S6). Furthermore, cultivar Draper was selected since germplasm is widely available to the community from blueberry nurseries. The genome size (1.63 +/− 0.06 Gb) was estimated using flow cytometry with four technical replicates from Flow Cytometry Core at Benaroya Research Institute at Virginia Mason (Seattle, WA)(Extended Data Table 1).
Genomic sequencing
High-molecular-weight genomic DNA was isolated from young leaf tissue, following a 72-hour dark treatment, using a modified nuclei preparation method [129, 130]. DNA quality was verified by pulsed-field gel electrophoresis. DNA fragments longer than 50 Kb were used to construct a 10× Gemcode library using the Chromium instrument (10× Genomics; Pleasanton, CA) and sequenced at HudsonAlpha Institute for Biotechnology (Huntsville, AL) on a HiSeqX system (lllumina; San Diego, CA) with paired-end 150 bp reads. Approximately 95 Gb (~58-fold coverage, based on an estimated genome size of 1.63 Gb) of 10X Chromium library data was sequenced (Supplementary Table S1). To increase sequence diversity and depth, three separate mate-pair (MP) libraries were constructed with 2–5 Kb, 5–7 Kb, and 7–10 Kb jumps using the Illumina Nextera Mate-Pair Sample Preparation Kit. In addition, two additional size-selected Illumina genomic libraries, ~470 bp and ~800 bp, were sequenced. The ~470 bp and ~800 bp libraries were made using the Illumina TruSeq DNA PCR-free Sample Preparation V2 kit. The ~470 bp library was designed to produce "overlapping libraries" after sequencing with paired-end 265 bp reads on an lllumina Hiseq2500 system, producing "stitched" reads of approximately 265 bp to 520 bp in length. The 800 bp library was sequenced on an Illumina HiSeq2500 system with paired-end 160 bp reads, while the MP libraries were sequenced on an Illumina HiSeq4000 system with paired-end 150 bp reads. A total of ~433 Gb (~266× fold coverage) of additional Illumina sequencing data were generated (Supplementary Table S1). Illumina library construction and sequencing was conducted at Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign.
Genome assembly
The genome of "Draper" was assembled using the DeNovoMAGIC software platform (NRGene, Nes Ziona, Israel), which is a de Bruijn graph-based assembler designed for higher polyploid, heterozygous, and/or repetitive genomes [131, 132]. The Chromium 10X data were utilized to phase, elongate, and validate haplotype scaffolds. Four Dovetail Hi-C libraries were prepared as described previously [133] and sequenced on an Illumina HiSeq X system with paired-end 150 bp reads to a total of 90.7X physical coverage of the genome (Supplementary Fig. S1). The de novo genome assembly, raw genomic reads, and Dovetail Hi-C library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies [134]. Illumina genomic and Dovetail Hi-C library sequences were aligned to the draft input assembly using a modified SNAP read mapper [135]. The separations of Dovetail Hi-C read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins and to make joins to close gaps between contigs.
Collection of blueberry tissue samples, RNA library preparation, and sequencing
Plant tissue samples (flower bud, flower at anthesis, flower post-anthesis, young shoot, leaves treated with methyl jasmonate, small green fruit, expanding green fruit, pink fruit, ripe fruit, and salt-treated and untreated roots) were collected from blueberry cv. Draper grown in the growth chamber (16/8 hours photoperiod; 408mE light intensity; 23/20C day/night temperature). For the fruit developmental series, three biological replicates each of berries at seven developmental stages (petal fall/cup, small green fruit, expanding green fruit, pink fruit, purple reddish fruit, purple unripe fruit, and blue ripe fruit) were collected from cv. Draper in a field at the Horticulture Teaching and Research Center, Michigan State University, in July 2017. All plant tissues were immediately flash frozen in liquid nitrogen, and total RNA isolation was performed using the KingFisher Pure RNA Plant kit (Thermo Fisher Scientific, MA). Isolated total RNA was quantified using a Qubit 3 fluorometer (Thermo Fisher Scientific, MA). RNA libraries were prepared according to the KAPA mRNA HyperPrep kit protocol (KAPA Biosystems, Roche, USA). All samples were submitted to the Michigan State University Research Technology Support Facility Genomics core and sequenced with paired-end 150 bp reads on an Illumina HiSeq 4000 system (Illumina, San Diego, CA).
Genome annotation
The draft genome of V. corymbosum cv. Draper was annotated using the MAKER annotation pipeline [27]. Transcript and protein evidence used in the annotation included protein sequences downloaded from A. thaliana (Araport11) and UniprotKB plant databases, V. corymbosum ESTs from NCBI, and transciptome data assembled with StringTie [136] from different blueberry tissues (Supplementary Table S4). A custom repeat library and Repbase [137] were used to mask repetitive regions in the genome using Repeatmasker [138]. Ab initio gene prediction was performed using gene predictors SNAP [139] and Augustus (Augustus: Gene Prediction, RRID:SCR_008417) [140]. The resulting MAKER Max gene set was filtered to select gene models with Pfam domain and annotation edit distance <1.0. The filtered gene set (MAKER standard) was further scanned for transposase coding regions. The amino acid sequence of predicted genes was searched (BLASTP, 1e-10) against a transposase database [27]. The alignment between the genes and the transposases was further filtered for those caused by the presence of sequences with low complexity. The total length of genes matching transposases was calculated based on the output from the search. If more than 30% of gene length aligned to the transposases, the gene was removed from the gene set. Furthermore, to assess the completeness of annotation, the V. corymbosum Maker standard gene set was searched against the BUSCO v.3 [31] plant dataset (embryophyta_odb9). Genes were annotated with pfam domains using InterProScan (InterProScan, RRID:SCR_005829) v5.26–65.0 [141].
Annotation of repetitive elements
To identify and classify repetitive elements in the genome, LTR retrotransposon candidates were searched using LTRharvest [142] and LTR_finder [143] and further identified and classified (e.g., Copia and Gypsy) using LTR_retriever [40]. A non-redundant LTR library was also produced by LTR_retriever. Miniature inverted transposable elements (MITEs) were identified using MITE-Hunter [144]. MITEs were manually checked for target site duplications and terminal inverted repeats and classified into superfamilies (e.g., Mutator, hAT, Tc1Mariner/Stowaway, and PIF/Harbinger). Those with ambiguous Target Site Duplication (TSD) and Terminal Inverted Repeats (TIR) were classified as “unknowns”. Using the MITE and LTR libraries, the V. corymbosum genome was masked using Repeatmasker. The masked genome was further mined for repetitive elements using Repeatmodeler [145]. The repeats were then categorized into two groups: sequences with and without identities. Those without identities were searched against the transposase database; if they had a match, they were considered a transposon. The repeats were then filtered to exclude gene fragments using ProtExcluder [27] and summarized using the ‘fam_coverage.pl' script in the LTR_retriever package. The assembly continuity of repeat space was assessed using the LLAI [146] deployed in the LTR_retriever package [40]. LAI was calculated based on either 3 Mb sliding windows or the whole assembly using LAI = (Intact LTR-RT length * 100)/Total LTR-RT length. For the sliding window estimation, a step of 300 Kb was used (-step 300,000 -window 3,000,000). To account for dynamics of LTR retrotransposons, LAI was adjusted by the mean identity of LTR sequences in the genome based on all-versus-all blastn search, which was also performed by the LAI program [146].
Transcriptome assembly and gene-expression analysis
Illumina adapters were removed from the raw reads using Trimmomatic/0.33 (Trimmomatic, RRID:SCR_011848) [147], and trimmed reads were filtered using FASTX Toolkit [148]. After quality assessment using FastQC (FastQC, RRID:SCR_014583; [149]), the filtered reads were then aligned to the V. corymbosum genome using STAR [150]. For the samples that were used for annotation, transcript assembly was performed de novo using StringTie. Counts of uniquely mapping reads were generated through HTSeq [151] for all 35 RNA-seq datasets (plant tissue samples as well as fruit developmental series samples). Multimapping reads were excluded from the analysis except for the tandem gene expression analysis. Differential gene expression analysis was performed using the DESeq2 pipeline [152] across fruit developmental stages with three biological replicates per developmental stage (e.g., stage 1 compared to stage 2)(Fig. 3). Gene expression values were derived by calculating the fragments per kilobase per million reads mapped (FPKM) values using the standard formula for FPKM (= read count/‘per million' scaling factor)/gene length in kilobases [Kb]).
To construct the gene co-expression network, genes that were not expressed or very weakly expressed (count <5) in 30 or more conditions were first excluded from the analysis. The count data was then transformed into variance stabilized values using the variance stabilizing transformation function in DEseq [151]. Pairwise correlations of gene expression were calculated using Pearson correlation coefficient (PCC) and mutual rank (MR) [153, 154] using scripts available for download from the project's data repository [155]. MR scores were transformed to network edge weights using geometric decay function e−(MR-1/x)[156]; five different co-expression networks were constructed with x set to 5, 10, 25, 50, and 100, respectively. Edges with PCC <0.6 or edge weight <0.01 were excluded. For each network, modules of co-expressed genes were detected using ClusterONE v1.0 using default parameters [157], and modules with P value > 0.1 or quality score <0.2 were excluded. The results from all co-expression networks were then combined by collapsing modules into metamodules of nonoverlapping gene sets.
Oxygen radical absorbance capacity analysis
Total antioxidant capacity of tissues from the fruit developmental panel was analyzed using the ORAC assay [70]. Briefly, ~20–30 mg of frozen ground fruit tissue was measured for tissue samples prior to extraction. Sample extractions were performed on ground tissue using 1.8 mL of ice cold 50% acetone. Samples were vortexed and then put on a shaker for 5 minutes at room temperature. Samples were then centrifuged at 4°C for 15 minutes (4500 g). The ORAC assay was performed in a 96-well black microplate (Thermo Fisher Scientific, Waltham, MA) using the FLUOstar OPTIMA microplate reader (BMG LABTECH, Offenburg, Germany). Each reaction well contained 150 μL of 0.08 μM fluorescein and 25 μL of 75 mM phosphate buffer (blank), Trolox standards (6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid), or diluted sample extracts. For blueberry tissue samples, 1:80–1:20 dilutions were used. Upon loading all appropriate wells, the 96-well microplate was put into the microplate reader and incubated for 10 minutes at 37°C. Following incubation, 25 μL of 150 mM AAPH (2,2′-azobis-2-methyl-propanimidamide, dihydrochloride) was added to each well, and fluorescence measurements began immediately. Fluorescence measurements (excitation: 485 nm, emission: 520 nm) were taken for 90 seconds per cycle for 70 cycles until the fluorescent probe signal was completely quenched. The area under the fluorescence decay curve (AUC) was calculated for each well. The total antioxidant capacity of a sample was calculated by subtracting the AUC from the blank curve from the AUC of the sample curve to obtain the net AUC. Using Trolox (water-soluble analog of vitamin E) of a known concentration, a standard curve was generated (12.5 μM–100 μM), and the total antioxidant capacity of each sample was calculated as Trolox equivalents. Each sample was run twice for two technical replicates. The coefficient of variation between technical replicates was required to be less than 0.20. Biological replicates (n = 3) were run for all tissues in the fruit developmental series.
Assay of phenolics and anthocyanin content
Berries from "Draper" were collected as described above. Approximately 100 mg (~10:1 solvent/tissue ratio) of each frozen ground sample was resuspended in extraction solvent in a 2 mL tube (80% methanol/20% water + 0.1% formic acid, containing 0.5 nM telmisartan [internal standard]). Ground tissue was immediately mixed thoroughly to prevent thawing during extraction and to prevent metabolism of analytes by enzymes in the samples. All tubes were spun down for 10 minutes at 13,000 × g to pellet protein and other insoluble material. Then, 1 mL of supernatant was transferred to an autosampler vial. Anthocyanin content was evaluated by LC-MS as follows: 5 uL of sample extract were separated using a 10 minute gradient on a Waters Acquity HSS-T3 UPLC column (2.1 × 100 mm) on a Waters Acquity UPLC system interfaced with a Waters Xevo G2-XS quadrupole time-of-flight mass spectrometer (Waters Corp, Milford, MA). Column temperature was maintained at 40°C, and the flow rate was 0.3 mL/min with starting conditions of 100% solvent A (water + 0.1% formic acid) and 0% solvent B (acetonitrile). The gradient was as follows: hold at 100% A for 0.5 minutes, ramp to 50% B at 6 minutes, then ramp to 99% B at 6.5 minutes, hold at 99% B to 8.5 minutes, return to 100% A at 8.51 minutes, and hold at 100% A until 10 minutes. Mass spectra were acquired in positive ion mode electrospray ionization over m/z 50–1500 in continuum mode using a data-independent MSE method that acquires data under both low and high collision energy conditions with the high collision energy setting using a ramp from 20–80 V. Capillary voltage was 3 kV, desolvation temperature was 350°C, source temperature was 100°C, cone gas flow was 25 L/hr, and desolvation gas flow was 600 L/hr. Correction for mass drift was performed using continuous infusion of the lock mass compound leucine encephalin. Anthocyanins and other related flavonoids were identified based on accurate mass and fragmentation pattern. Peak areas were determined using Quanlynx within the Masslynx software package (Waters Corp). Relative anthocyanin content was calculated for each sample using the formula: reported peak area of the compound/peak area of internal standard/weight of extracted tissue (peak area/IS/gdw).
Genomic and gene family analyses
The genome was aligned against itself in CoGe's SynMap program using LAST (LAST, RRID:SCR_006119) and default parameters [158]. Maximum distance between two matches was set to 20 genes, with minimum number of aligned pairs set to 10 genes. Tandemly duplicated genes were identified and filtered from CoGe outputs with a maximum distance of 10 genes. Fractionation bias was calculated, setting the max query and target chromosomes to 48. These analyses can be regenerated using the CoGe platform [65]. Protein sequences of blueberry was searched against previously characterized antioxidant related genes in Arabidopsis and other species in UniprotKB and NCBI databases using blastp in the BLAST+ package [159] with a cut-off e-value of 1E-10.
Availability of supporting data
The genome assembly, annotations, and other supporting data are publicly available on PURR [155] and also via the GigaScience database GigaDB [160] and the CyVerse CoGe platform [65,,161]. The raw sequence data were deposited in the Short Read Archive under NCBI BioProject ID PRJNA494180.
Additional files
Extended Data Table 1
Extended Data Table 2
Extended Data Table 3
Extended Data Table 4
Extended Data Table 5
Extended Data Table 6
Abbreviations
ANR: anthocyanidin reductase; ANS: anthocyanidin synthase; BLASTP: Basic Local Alignment Search Tool (Protein); bHLH: basic helix-loop-helix; bp: base pair; BUSCO: Benchmarking Universal Single-Copy Orthologs; CGA: chlorogenic acid; 4CL: 4-hydroxycinnamoyl CoA ligase; C3H: cytochrome P450 98A3; C4H: trans-cinnamate 4-monooxygenase; CHI: chalcone flavonone isomerase; CHS: chalcone synthase; DFR: dihydroflavonol reductase; EST: expressed sequence tags; F3H: flavanone 3-hydroxylase, F3'H: flavonoid 3′-hydroxylase; F3'5'H: flavonoid 3′,5′-hydroxylase; FHT: flavanone-3β-hydroxylase; FPKM: fragments per kilobase per million; kb: kilo base; HCT: hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase; HQT: hydroxycinnamoyl-CoA quinate hydroxycinnamoyltransferase; LAI: LTR Assembly Index; LAR: leucoanthocyanidin reductase; LC-MS: liquid chromatography-mass spectrometry; LTR: long terminal repeats; LTR-RT: long terminal repeat retrotransposons; Mb: mega base; MRCA: most recent common ancestor; MITE: miniature inverted repeat; MP: mate-pair; MR: mutual rank; MRP: multidrug resistance-associated protein; MYB: myeloblastosis; OMT: anthocyanin-O-methyltransferase; ORAC: oxygen radical absorbance capacity; PAL: phenylalanine ammonia-lyase; PCC: Pearson's correlation coefficient; SWEET: Sugar Will Eventually be Exported Transporter; TE: transposable element; TST: tonoplast sugar transporter; UFGT: UDP-glucose flavonoid 3-O-glucosyl transferase; UPLC: Ultra Performance Liquid Chromatography; WDR: WD40 repeats.
Competing interests
The authors declare that they have no competing interests.
Author contributions
P.P.E. designed the research; M.C., C.P.L., C.M.W., S.O., K.A.B., J.W., J.H.W., A.E.Y., E.I.A., H.T., Z.X., P.C., G.B., A.B., K.B., T.S., L.S., G.S., K.L.C., A.S., N.V., C.R.B., R.V., N.J., and P.P.E. performed research and/or analyzed data; and M.C. and P.P.E. drafted the manuscript. All authors reviewed and edited the manuscript.
ACKNOWLEDGEMENTS
We thank the reviewers and editor for their helpful comments during the review of this manuscript. This work was supported by Michigan State University AgBioResearch, USDA-NIFA HATCH 1009804 to P.P.E., USDA-NIFA AFRI 1015241 to P.P.E and G.S., USDA-NIFA HATCH 1016057 to J.H.W., National Natural Science Foundation of China 31560302 to X.Z., and Inner Mongolia Major and Special Program of Science and Technology 5163901 to X.Z.
References
CoGe – a platform for comparative genomics; https://genomevolution.org/r/12w9o.; Last accessed February 18th, 2019.
Scalable Nucleotide Alignment Program;
; Last accessed February 18th, 2019.