Alternative Splicing Regulation of Anthocyanin Biosynthesis in Camellia sinensis var. assamica Unveiled by PacBio Iso-Seq

Although the pathway and transcription factor regulation of anthocyanin biosynthesis in tea plants [Camellia sinensis (L.) O. Ktze] are known, post-transcriptional regulation mechanisms involved in anthocyanin accumulation have not been comprehensively studied. We obtained the full-length transcriptome of a purple cultivar (‘Zijuan’) and a normal green cultivar (‘Yunkang 10#) of C. sinensis var. asssamica (Masters) showing different accumulation of anthocyanins and catechins through PacBio isoform sequencing (Iso-Seq). In total, 577,557 mapped full-length cDNAs were obtained, and 2,600 average-length gene isoforms were identified in both cultivars. After gene annotations and pathway predictions, we found that 98 key genes in anthocyanin biosynthesis pathways could have undergone alternative splicing (AS) events, and identified a total of 238 isoforms involved in anthocyanin biosynthesis. We verified expression of the C4H, CHS, FLS, CCOM, F3′5’H, LAR, PAL, CCR, CYP73A13, UDP75L12, UDP78A15/UFGT, UDP94P1, GL3, MYB113, ANR, ANS, F3H, 4CL1, CYP98A3/C3H, CHI, DFR genes and their AS transcripts using qRT-PCR. Correlation analysis of anthocyanin biosynthesis and gene expression results revealed that C4H1, FLS1, PAL2, CCR2, UDP75L122 and MYB113-1 are crucial AS transcripts for regulating anthocyanin biosynthesis in C. sinensis var. assamica. Our results reveal post-transcriptional regulation of anthocyanin biosynthesis in tea plants, and provide more new insights into the regulation of secondary metabolism.

Alternative splicing (AS), the process of splicing the exons of primary transcripts (pre-mRNAs) from genes in different arrangements, can produce structurally and functionally distinct mRNA and protein variants (Blencowe 2006). Some important phenotype-related genes undergo AS in plants, e.g., those responding to drought, heat (Liu et al. 2018c), light (Cheng and Tu 2018), circadian rhythm regulation (Filichkin and Mockler 2012) and flowering time regulation (Macknight et al. 2002). Genes in the flavonoid biosynthesis pathway, including CHS, F3H, DFR, ANS and UDP-glucosyltransferase, also exhibit AS events during kiwifruit (Actinidia chinensis 'Hongyang') fruit development, whose main pigments in the fruit inner pericarp are anthocyanins (Tang et al. 2016). AS of the ANS gene, implicated in anthocyanin accumulation, was also detected in variegated peach flowers (Hassani et al. 2015). Additionally, AS of CHS and LAR exists in C. sinensis 'Shuchazao' (Zhu et al. 2018a). We therefore hypothesized that biosynthetic genes may undergo AS, allowing post-translational regulation of anthocyanin biosynthesis in C. sinensis.
The PacBio Iso-Seq (isoform sequencing) platform generates long reads-often up to 10 kb-making it possible to accurately reconstruct full-length splice variants. PacBio full-length cDNA sequencing has proven to be a robust technique for discovering thousands of alternatively spliced isoforms in several species, including sorghum, maize, red clover, wheat and rice (Wang et al. 2018;Chao et al. 2018;Zhang et al. 2019). The most extensive efforts to unravel secondary metabolism have been undertaken in C. sinensis var. sinensis using the single-molecule real-time (SMRT) sequencing platform (Xu et al. 2017;Wei et al. 2018). In our study, we generated the first full-length transcriptome reference sequences from the ZJ and YK cultivars of C. sinensis var. assamica derived from purple and green tender shoots, respectively, using the PacBio Iso-Seq technique. In addition, qRT-PCR verification identified more AS isoforms, contributing to our understanding of post-translational regulation of anthocyanin biosynthesis in tea plants.

MATERIALS AND METHODS
Plant materials and RNA sample preparation Purple young leaves (one bud with two leaves) and green mature leaves of 'Zijuan' (ZJ) and young leaves (one bud with two leaves) of 'Yunkan-10#' (YK) [both cultivars of Camellia sinensis var. asssamica (Masters) Kitamura] were sampled, with six replications, from healthy plants grown in the tea garden of the Tea Seed Multiplication Farm of Pu'er City, Yunnan Province, China. All samples were collected in the morning in early August, quickly frozen in liquid nitrogen, and stored at 280°until total RNA extraction.
Total RNA was extracted using an Omega Plant RNA kit (Omega Bio-Tek, Code: R6827) according to the manufacturer's protocol. The quality, integrity and quantity of total RNA were assessed using a NanoDrop 1000 Spectrophotometer (ThermoFisher Scientific, Wilmington, DE, USA) and an Agilent Bioanalyser 2100 with the Agilent RNA 6000 Nano kit (Agilent Technologies, Santa Clara, CA, USA), respectively.
PacBio Iso-Seq long-read sequencing First-strand cDNA was synthesized using a Clontech SMARTer PCR cDNA Synthesis Kit (Clontech, Code: 634926), and second-strand cDNA was synthesized by PCR using Phusion High-Fidelity DNA Polymerase (New England Biolabs, Code: M0530). After amplification, PCR products were purified using 0.59 AMPure beads (Beckman, Code: A63880). The products were size-selected using a BluePippin Size Selection System with the following bins for each sample: 1-2, 2-3 and 3-10 kb. Amplified cDNA products were used to generate SMRTbell template libraries in accordance with the Iso-Seq protocol (PacBio). Libraries were prepared for sequencing by annealing a sequencing primer and adding polymerase to the primerannealed template. Polymerase-bound template was bound to Mag-Beads, and sequencing was performed on a PacBio RSII instrument by Gene Denovo Biotechnology Co. (Guangzhou, China).

PacBio data analysis
The SMRT-Analysis software package SMRT Link v5.0.1 (PacBio) was used for Iso-Seq data analysis. Initially, the PacBio raw reads were classified into circular consensus sequence (CCS) and non-CCS subreads using ToFu23. The Classify module with default parameters was used to remove adapter sequences, poly-A tails, artificial concatemers and 39-truncated transcript sequences, resulting in a set of non-fulllength isoforms, full-length non-chimeric (FLNC) isoforms, fulllength chimeric isoforms and short reads (which were filtered out). For additional error correction, the FLNC transcripts and non-full-length isoforms were analyzed using PacBio Quiver software and Error Correction (ICE). The resulting high-quality (HQ) isoforms were then mapped to the Galgal 4 reference genome assembly using the Genome Mapping and Alignment Program with default parameters.

Functional annotation of transcripts
Transcripts were annotated by conducting BLASTx searches against public databases, including the NCBI non-redundant protein (Nr), NCBI non-redundant nucleotide (Nt), Swiss-Prot, Protein Family (Pfam), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.kegg.jp/), with an E-value threshold of 10 25 .
Identification and classification of AS events AS events were identified using SUPPA (Alamancos et al. 2015), with the {SE,SS,MX,RI,FL} setting for generateEvents and default setting for Boundary. The tool AS talavista (Foissac and Sammeth 2007) was used to classify AS events, resulting in four major AS categories, IR (AS code: 1^22, 0), ES (AS code: 1-2^, 0), AA (AS code:12, 22) and AD (AS code: 1^, 2^), which were identified from the output files and counted. All AS transcripts were used for KEGG enrichment analysis (https://www.kegg.jp/).

Analysis of catechin and anthocyanin content
To analyze the content of catechins and anthocyanins in shoots and leaves, fresh materials were ground to a fine powder in liquid nitrogen. The powder (0.2 g) was extracted with 5 mL methanol at room temperature for 1 h. The residue was re-extracted three times using this method. Supernatants were filtered through a 0.2-mm membrane. Catechins and anthocyanins were analyzed by highperformance liquid chromatography (HPLC) (Wang et al. 2012) and quantified at 345 nm and 530 nm, respectively. Statistical analyses based on the data were performed using ANOVA to determine significant differences among group means; data are presented as means 6 SD (SD). A p value less than 0.05 was considered statistically significant. Correlation analysis between content of catechins and anthocyanins and qRT-PCR was performed using SPSS 22.0. Results are presented using TBtools (http://www.tbtools.com/).

Data availability
PacBio SMRT sequencing data have been submitted to the SRA of NCBI under accession numbers SRR8639545 and SRR8639546. HPLC chromatography for determation of anthocyanins and catechins ( Figure S1 and Table S2, Supplemental data 1). Length distribution of ZJ and YK Iso-seq sub-reads (Supplemental data 2 and Supplementary Figure S2). Number of ZJ (Purple digital) and YK (Green digital) isoforms enriched in terms of molecular function, cellular component and biological process based on GeneOntology (GO) analysis ( Figure S3). Gene annotation of KEGG pathways involved in anthocyanin and catechin biosynthesis in ZJ and YK. Isoforms from ZJ and YK were gathered in the KEGG database and their numbers compared ( Figure S4). Relative expression quantity of genes and their AS transcripts in ZJ and YK samples with one leaf and two buds compared to ZJ mature leaves based on qRT-PCR studies ( Figure S5); primers used are listed (Table S1). Number of isoforms in anthocyanidin biosynthesis of YK and ZJ (Supplemental data 3 and 4). GeneOntology (GO) annotation of ZJ and YK PacBio Iso-seq isoforms (Supplemental data 5) and KEGG annotation of ZJ and YK PacBio Iso-seq isoforms (Supplemental data 6). Classification of AS events in ZJ and YK (Supplemental data 7). Number of AS evens in anthocyanidin biosynthesis in YK and ZJ (Supplemental data 8). Relative expression quantity of genes and their AS transcripts in ZJ and YK samples with one leaf and two buds compared to ZJ mature leaves (Supplemental data 9). Coefficient of association matrix between chemical composition contents and gene expression (Supplemental data 10). Supplemental material available at figshare: https://doi.org/ 10.25387/g3.10559621.
Overview of PacBio Iso-Seq of ZJ and YK cultivars RNA-Seq technologies using the Illumina platform have been extensively used to analyze flavonoid biosynthesis in teas (Wang et al. 2018). However, the limitations of short-read sequencing result in the vast majority of isotags not representing full-length cDNA sequences. PacBio Iso-Seq provides reads .10 kb, and up to 60 kb (PacBio, Menlo Park, CA, USA 2016). In order to fully characterize the expression profiles of flavonoid biosynthesis genes, we generated supplemental information using PacBio isoform sequencing of multiple genes. Independent pooled samples of ZJ and YK were sequenced to obtain wide-coverage transcriptomes using PacBio Isoseq. We obtained 5,150,499 (ZJ) and 5,039,169 (YK) sub-reads based on 7,831,095,131 (ZJ) and 8,013,752,613 (YK) bases sequenced. The length distribution of ZJ and YK Iso-seq sub-reads is displayed in Supplementary Figure S2. We extracted 416,745 (YK) and 405,979 (ZJ) reads of circular consensus sequence (CCS) from the sub-reads. After trimming, assembly, filtering, classifying and clustering, the majority of these [306,188 (73.47%) for YK and 285,067 (70.22%) for ZJ] were full-length reads as indicated by detection of poly (A), as well as 59 and 39 primer, sequences. The full-length reads were mapped to the reference genome of the YK cultivar. Full-length non-chimeric sequences (FLNCs) of YK ranged from 131 to 10,547 bp, with an average length of 2,302 bp, whereas FLNCs of ZJ ranged from 130 to 8,823 bp, with an average length of 2,251 bp. The N50 values of the cDNAs were 2,602 and 2,680 bp in ZJ and YK, respectively. Subsequently, 101,178 and 89,438 polished, high-quality (HQ) isoform consensus reads were filtered from 180,132 (YK) and 163,658 (ZJ) full-length cDNAs, respectively, after Quiver polishing using the interactive ICE algorithm.
The isoforms we identified improved the map and annotation, and were similar to those of a previous study . The HQ isoforms obtained for ZJ and YK were mapped to reference genome of 'Yunkan-10 # '. Of the total mapped isoforms (87,575/97.92% isoforms for ZJ, 99,884/98.72% isoforms for YK), 13,719/15.34% (ZJ) and 16,620/16.43% (YK) HQ isoforms were multiply mapped, while 73,856/82.58% (ZJ) and 83,264/82.29% (YK) HQ isoforms were mapped to one unique location with greater than 90% coverage and identity, including reads 'mapped to the positive-sense strand or the negative-sense strand'. The remaining 1863 isoforms from ZJ and 1294 isoforms from YK were unmapped (Table 1). Compared to 80,217 isoforms with an average length of 1,781 bp and N50 of 2,459 bp obtained from C. sinensis 'Shuchazao', we obtained a higher number of alternative isoforms, which were also longer (Xu et al. 2017). The 213,389 polished consensus isoforms have been identified in 'Yuncha 1' (C. sinensis var. assamica) previously (Zhu et al. 2018), but the absence of mapping to the reference genome makes these data less reliable than the data we obtained in this paper.
After annotation using four databases (the non-redundant section of the NCBI database, SwissProt, KEGG and GO databases), 23,591 ZJ isoforms and 28,835 YK isoforms had at least one positive hit in the four public databases. In GO annotation, the majority of isoforms were assigned the molecular functions of protein binding (40% for YK and 41% for ZJ) and catalytic activity (49% for YK and 48% for ZJ), the biological process category of developmental and cellular processes, and the cellular component category of cell formation, in particular, cell membrane, organelle and cell component (15%, 22% and 22%, respectively, in both ZJ and YK) (Supplementary Figure S3).

AS transcripts involved in biosynthesis of anthocyanidins
We detected 4,337 and 5,524 alternatively spliced genes in ZJ and YK from PacBio Iso-seq reads, respectively. Both ZJ and YK generated seven main types of AS: exon skipping, alternative 59 splice sites, alternative 39 splice sites, retained introns, mutually exclusive exons, alternative first exon and alternative last exons. Among them, retained introns predominated, accounting for 34.47% and 38.7% of alternative transcripts in ZJ and YK, respectively ( Figure 2A, and Supplemental data 7). These alternatively spliced genes were assigned to 131 and 129 KEGG pathways, which were further grouped into 19 classes, including amino acid metabolism, biosynthesis of other secondary metabolites and carbohydrate metabolism. Most alternatively spliced transcripts belonged to the pathway associated with the biosynthesis of secondary metabolites and included genes associated with biosynthesis of amino acids (ko001230), plant-pathogen interaction (ko04626) and purine metabolism (ko00230) (Figure 3). For buds, young leaves, summer mature leaves, winter old leaves, stems, flowers, fruits and roots of C. sinensis 'Shuchazao', of the 69,569 AS events detected, 20.32% resulted from intron retention (IR), followed by alternative acceptor sites (AA, 15.59%), exon skipping (ES, 15.05%), and alternative donor sites (AD, 14.01%) (Zhu et al. 2018a). Thus, tissue-specific AS in tea plants might perform various functions during development. We obtained alternative isoforms that might regulate anthocyanin biosynthesis in tea. We found that 98 genes annotated with anthocyanin biosynthesis pathways could have undergone AS events ( Figure 2B), producing a total of 238 isoforms (Table 2). In YK, 55, 19, two, two and one alternatively spliced genes were associated with the following KEGG pathways related to biosynthesis of anthocyanidins, respectively: phenylpropanoid biosynthesis (ko00940), flavonoid biosynthesis (ko00941), anthocyanin biosynthesis (ko00942), isoflavonoid biosynthesis (ko00943), and flavone and flavonol biosynthesis (ko00944). In ZJ, 50, 14, three, six and one alternatively spliced genes were assigned to the same KEGG pathways mentioned above. These alternatively spliced transcripts encode key enzymes involved in biosynthesis of anthocyanidin, such as PAL, 4CL, LAR, C4H, F3H, CHS, ANS, ANR, LAR, FLS, CCOAOMT, UDP-glycosyltransferase 75L12, beta-glucosidase and peroxidase/cationic peroxidase (Table  2). In addition, alternatively spliced transcripts of GL3, AOG, 5AT, MYB113 and ANL2 were detected. Interestingly, apart from DFR and 5AT (only detected in YK), AS events occurred in genes of both ZJ and YK, and the alternatively spliced isoforms were different, such as those for transcripts of PAL, 4CL, F39H, CHS, ANS, ANR, LAR and FLS ( Figure. 2C). Therefore, AS events are widespread among biosynthetic genes of anthocyanidin in C. sinensis var. assamica, suggesting they are involved in the post-transcriptional regulation of anthocyanin biosynthesis as discussed previously (Jia et al. 2015). This hypothesis is confirmed by previous reports in other plants showing that AS events make important contributions to gene regulation (Zhu et al. 2018) and nearly all structural flavonoid pathway genes generate multiple splice variants (Huang et al. 2013).

Correlation analysis of anthocyanin biosynthesis and gene expression
After multiple sequence alignment, some genes and their partial alternatively spliced transcripts mentioned above were chosen to test their expression levels using qRT-PCR. The expression profiling shown in Supplementary Figure S5  GL3, F3H and MYB113 exhibited lower expression levels in one leaf and two buds of ZJ than in the same tissues of YK. However, expression of the F395'H, F3H, FLS, LAR, CHS, GL3 and MYB113 genes increased in mature leaves of ZJ compared with that in one leaf and two buds. The alternatively spliced transcripts MYB113-1 and MYB113-2 showed opposite patterns of expression. In addition, alternatively spliced transcripts of C4H, PAL and MYB113 showed discrepant expression in tea plants, with expression of C4H1, FLS1, PAL2, UDP75L12 and CCR2 be higher than that of other isoforms.
On the basis of the data above, AS isoforms C4H1, FLS1, PAL2, UDP75L122 and CCR2 are actively expressed in YK, they show a significant negative correlation with total anthocyanin biosynthesis and accumulation, and show a positive correlation with EGC and EGCG contents; we suggest that these are critical AS transcripts for regulating anthocyanin biosynthesis in Camellia sinensis var. assamica.
Most intriguingly, MYB and bHLH TFs were regulated by AS and may serve as activators for mediating downstream genes to affect anthocyanin content (Albert et al. 2014;Zhu et al. 2018). The expression level of MYB113 and GL3 was higher in YK than in ZJ, Figure 5 Construction of biosynthesis pathways for anthocyanins in C. sinensis var. Asssamica. The colorful font represents the genes regulated by alternative splicing evens. the green font represents the genes' expression are negtive correlation with the total anthocyanins contents, and red font represents the genes' expression are positive correlation with the total anthocyanins contents.
n■ consistent with the expression pattern described in Arabidopsis (Matsui et al. 2008;Zhu et al. 2009;Albert et al. 2014), so they may downregulate anthocyanin biosynthesis. What is more noteworthy is that AS of MYB113 TF genes (MYB113-1 and MYB113-2) in this study transcriptionally regulated the chemical composition in opposite patterns: higher expression level of MYB113-1 in one leaf and two buds of YK than ZJ, but lower expression level of MYB113-2 in YK. Only MYB113-1 showed significant positive correlation (P , 0.05) with anthocyanin contents in ZJ, and higher expression of GL3-2 might contribute to more EGC produced in YK than in ZJ. In summary, we carried out a varietal-specific transcriptome analysis of C. sinensis var. asssamica using a PacBio long-read sequencing approach to determine the regulatory mechanism of anthocyanin biosynthesis. Alternatively spliced transcripts of genes may act as major mediators at the post-transcriptional level during anthocyanin accumulation in different colorful leaves and cultivars of tea. We speculate here that anthocyanin accumulation in tea plants is most likely due to regulation, i.e., differential expression and AS of transcripts such as C4H1, FLS1, FLS3, 4CL, ANR, ANS, CYP98A3/ C3H, CHI, DFR, PAL1, PAL2, CCR2, CYP73A13, UDP75L12 (UDP75L121, UDP75L122 and UDP75L123), MYB113-1, MYB113-2, GL3-2 and UDP78A15. C4H1, FLS1, PAL2, CCR2, UDP75L122 and MYB113-1 are critical AS transcripts for regulating anthocyanin biosynthesis in Camellia sinensis var. assamica. The biosynthesis pathways for anthocyanins in C. sinensis var. asssamica are shown in Figure 5. However, our mechanistic understanding of the roles of n■ Table 2 The annotated isoforms and their AS transcripts of anthocyanidin biosynthesis genes in ZJ and YK AS in tea has remained extremely poor. Continued studies on the functions of regulators will provide new insights into the regulation of anthocyanin metabolism in tea plants.