Terpene Synthases and Terpene Variation in Cannabis sativa1[OPEN]

Judith K. Booth,a Macaire M.S. Yuen,a Sharon Jancsik,a Lufiani L. Madilao,a Jonathan E. Page,b,c and Jörg Bohlmanna,b,d,2,3 Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4 Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4 Aurora Cannabis, Vancouver, British Columbia, Canada V6B 3J5 Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4

The pistillate flowers of cannabis (Cannabis sativa) are densely covered with glandular trichomes that produce and accumulate a resin that is rich in cannabinoids as well as monoterpenes and sesquiterpenes (Turner et al., 1978;Brenneisen and elSohly, 1988;Livingston et al., 2020). Cannabinoids are responsible for the various medicinal and psychoactive properties of cannabis. The terpenes of cannabis resin, which include more than a dozen different monoterpenes and over a hundred different sesquiterpenes, account for much of the diverse organoleptic impressions of cannabis products ( Fig. 1; Fischedick et al., 2010;Casano et al., 2011;Booth and Bohlmann, 2019). Cannabis is broadly categorized into three major chemotypes based on the ratio of D 9 -tetrahydrocannabinolic acid (THCA) to cannabidiolic acid (CBDA). Type I has high amounts of THCA; type II has approximately equal amounts of THCA and CBDA, and type III is CBDA-dominant (de Meijer et al., 2003). Across these three major chemotypes, terpene profiles show much variation between different cultivars, with myrcene, limonene, a-pinene, a-terpinene, or b-caryophyllene as major variable components (Fischedick et al., 2010;Fischedick, 2017;Richins et al., 2018;Reimann-Philipp et al., 2019).
Terpene synthases (TPSs), which are encoded in large TPS gene families with several subfamilies, produce the diversity of cyclic and acyclic terpene core structures found in plants (Chen et al., 2011). In angiosperms, the TPS-a subfamily generally contains sesquiterpene synthases (sesqui-TPSs), and the TPS-b subfamily contains primarily monoterpene synthases (mono-TPSs) and hemiterpene synthases. Acyclic monoterpenes are also produced by members of the TPS-g subfamily. The TPS gene family has undergone lineage-specific expansions, leading to blooms of related TPS enzymes, as shown for example in grapevine (Vitis vinifera; Martin et al., 2010), eucalyptus (Eucalyptus grandis; Külheim et al., 2015), and tomato (Solanum lycopersicum; Falara et al., 2011). Terpenes and cannabinoids share common isoprenoid precursors (Fig. 1). The most abundant cannabinoids in different cannabis cultivars are THCA and CBDA, which are produced by cannabinoid synthases from cannabigerolic acid (CBGA; Sirikantaramas et al., 2004;Taura et al., 2007). CBGA is formed by condensation of the monoterpene precursor geranyl diphosphate (GPP) with the aromatic polyketide olivetolic acid (OA; Fellermeier and Zenk, 1998).
At least 55 different CsTPS gene models have previously been reported (Supplemental Table S1), but only 14 have been functionally characterized, including eight mono-TPSs and six sesqui-TPSs (Gunnewich et al., 2007;Booth et al., 2017;Allen et al., 2019;Zager et al., 2019;Livingston et al., 2020). The 14 functionally characterized CsTPSs account for some of the major terpenes in cannabis (e.g. a-pinene, limonene, myrcene, b-caryophyllene) as well as some of the rare compounds (e.g. terpinene, hedycaryol, and alloaromadendrene). However, much of the terpene variation in cannabis remains to be explored. While this article was in preparation, Zager et al. (2019) reported gene networks associated with terpenoid biosynthesis in seven different cannabis cultivars, revealing relationships between gene expression and terpenoid accumulation.
Cv Purple Kush (PK) has been established as a reference for genomic research in cannabis (van Bakel et al., 2011;Booth et al., 2017;Laverty et al., 2019). Here, we report the terpene profile of cv PK and its genome annotation for CsTPS genes and other genes of terpenoid and cannabinoid biosynthesis. We investigated variations in terpene profiles in flowers (Fig. 2) of six different cannabis cultivars, including cv PK, based on metabolite analysis, trichome-specific RNA-sequencing (RNA-seq) transcriptome analysis, and functional characterization of CsTPSs.

Annotation of the cv PK Reference Genome
As a foundation for our study of CsTPS genes and their role in terpenoid variation in different cannabis cultivars, we annotated CsTPS genes and other genes involved in isoprenoid and cannabinoid biosynthesis in the cv PK reference genome. We identified 19 complete CsTPS gene models in cv PK (Fig. 3), including four clusters of two to five genes that are more similar in sequence to one another than they are to those of any other gene model. Sequences for CsTPS6 were identified at two adjacent loci. In addition, five partial CsTPS genes were found in the cv PK genome, likely representing pseudogenes. We also located gene models for all known steps in isoprenoid and cannabinoid biosynthesis, including the plastidial methylerythritol phosphate (MEP) pathway leading to cannabinoids and monoterpenes. Many of the isoprenoid pathway genes, notably 1-deoxy-D-xylulose-5-phosphate synthase (DXS) and 1-deoxy-D-xylulose-5-phosphate reductase (DXR), have multiple copies. Similar to the CsTPSs, several other isoprenoid and cannabinoid pathway genes are arranged in multicopy clusters, namely genes encoding DXS, DXR, two copies of the polyketide synthase (PKS) responsible for producing OA (Taura et al., 2009), OA cyclase (OAC), aromatic prenyltransferases involved in cannabinoid and cannflavin biosynthesis (aPTs; Page and Boubakir, 2011;Luo et al., 2019;Rea et al., 2019), and cannabinoid synthases THCAS and CBDAS. None of the CsTPS gene models clustered with any other genes known to be related to terpenoid or cannabinoid biosynthesis. While there are no obvious biosynthetic clusters, CBDAS, GPP synthase (GPPS) small subunit, and CsTPS9 are positioned within a 10-megabase region of the cv PK genome.

Variation of Foliar Terpene Profiles within and between Cultivars
To explore variation of terpene biosynthesis in cultivars with different terpene profiles, we initially grew plants from 32 seeds, which according to the supplier's information, represented eight different cultivars: cv Lemon Skunk (LS), cv CBD Skunk Haze (CSH), cv Blue Cheese (BC), cv Afghan Kush (AK), cv Chocolope (Choc), cv Blueberry, cv Vanilla Kush, and cv Jack Herer. The initial metabolite analysis was done with leaf samples to enable subsequent selection of individual plants for clonal propagation. Once plants have reached the flowering stage, propagation from cuttings becomes inefficient.
In total, we detected 48 different terpene peaks in the gas chromatography-mass spectrometry (GC/MS) analysis of foliar extracts across all 32 individuals, of which 11 were annotated as monoterpenes and 37 as sesquiterpenes (Supplemental Fig. S1). Of these, only three monoterpenes, namely myrcene, a-pinene, and limonene, and two sesquiterpenes, b-caryophyllene and a-humulene, were present in every individual. To select plants representing the most contrasting terpene profiles for further study, we performed a principal component analysis (PCA). Principal components (PCs) 1 and 2 account for 26.4% and 19.7%, respectively, of the terpene variation among the 32 plants (Fig. 4A). Most plants cluster toward the lower end of PC2. All plants labeled as cv CSH clustered together. Only one cv Jack Herer seed germinated, so variability and clustering could not be assessed for this cultivar. For the other plants, there was as much variation among plants with the same cultivar label as between plants labeled as Representative photographs are of a cv PK inflorescence at four different stages, from youngest (1) to oldest (4). Different stages are characterized as follows: (1) very pale pistils and few to no stalked trichomes; (2) no browned pistils and ;50% stalked trichomes; (3) pistils beginning to brown, with entirely stalked trichomes; (4) entirely browned pistils, with brown or amber trichome heads. For this study, metabolite analyses were performed at stages 1 and 3. different cultivars. Five individual plants, one from each quadrant and one from near the center of the PCA plot, were selected for clonal propagation and detailed characterization including terpene and cannabinoid analysis of flowers, floral trichome transcriptome sequencing, transcript expression analysis, and CsTPS discovery and characterization. The selected individuals represent plants identified as belonging to the cultivars cv AK, cv BC, cv Choc, cv CSH, and cv LS.
Hierarchical cluster analysis of foliar terpenes from the 32 plants was used to determine which compounds account for most of the differences between individuals, and to identify compounds that co-occur. Of the 48 total terpene peaks identified, 23 were found to account for the significant variation in seven groups. Bisabolol contributed the most to differentiation between cultivars, followed by (E)-b-farnesene (Fig. 4B). Two guaiane-type sesquiterpenes clustered together and apart from other compounds. A guaiane-and an eremophilane-type sesquiterpene also clustered together and apart from other compounds. The two sesquiterpenes b-caryophyllene and a-humulene, which are produced by the same CsTPS (Booth et al., 2017), formed a unique cluster. Myrcene was a Figure 3. Genome locations of genes related to terpenoid and cannabinoid biosynthesis. Scaffolds are from Laverty et al. (2019). TPSs are shown in pink, UbiA family prenyltransferases in blue, MEP pathway genes in green, and cannabinoid biosynthetic genes in black. Loci with identical labels represent duplicated genes. Figure 4. Foliar terpene profiles differentiate cannabis plants grown from seeds. A, First two dimensions (Dim) of a PCA of foliar terpene profiles from 32 cannabis plants. Dim1 accounts for 26.35% of the variance between individuals and Dim2 accounts for 19.73%. Colors indicate the names under which seeds were obtained. Boxed points are individuals that were chosen for clonal propagation and further characterization. B, Unsupervised hierarchical cluster analysis of 46 terpenoid peaks (x axis) in 32 cannabis seedlings. Ward's minimum variance was used as the clustering method. Seven clusters, indicated by the colored boxes, were determined by inertia gain. member of this clade, but did not cluster with any other compounds. The remaining 39 compounds grouped into a larger cluster. The monoterpenes camphene and a-pinene clustered with a guaianetype sesquiterpene and three cadinane-type compounds. The same group also included b-bisabolene and eudesma-3,7(11)-diene, which were closely related to the largest cluster consisting of another bisabolane-type sesquiterpene, an unidentified sesquiterpene, terpinolene, linalool, limonene, and a himachalane-type sesquiterpene. The remaining compounds did not account for a significant proportion of the variation between terpene profiles.  Table S2). The five cultivars included four that are THCA-dominant and one with approximately equal amounts of THCA and CBDA (Supplemental Table S3). Foliar terpenes were dominated by sesquiterpenes ( Fig. 5; Supplemental Fig. S1), with a total terpene content between 0.5 and 1.1 mg g 21 dry weight (DW). In contrast, terpene levels were between 4.9 and 7.3 mg g 21 DW in juvenile flowers ). The proportion of monoterpene increased as flowers developed from juvenile to mature. In total, 15 different monoterpenes and 27 different sesquiterpenes were separated by GC/MS and quantified in mature flowers of the five cultivars (Table 1; Supplemental Fig. S2). In addition, several other terpenes were below the limit of quantification. Myrcene was the most abundant terpene in cv CHS, cv AK, and cv BC. In cv LS and cv Choc, the most abundant monoterpenes were (1)-a-pinene and (2)-limonene, respectively. In four of the five cultivars, b-caryophyllene was the dominant sesquiterpene. In cv AK, germacrene B was the dominant sesquiterpene. (E)b-farnesene was present in all samples and was a major component of cv Choc and cv AK.
A terpene profile for cv PK (Table 1) was produced with three clones of the cv PK plant that was sequenced for the reference genome; however, these plants were grown under different conditions. The terpene content of floral trichomes of cv PK, induced to flower after 4 weeks of vegetative growth, peaked at 21 mg g 21 DW.

Transcriptomes of Floral Trichomes Are Enriched for Terpene and Cannabinoid Biosynthesis
We produced 15 separate trichome-specific transcriptomes from three plants for each of the five cultivars.  (Livingston et al., 2020), and with .80% of pistils turning from white/green to brown. Total RNA was extracted from isolated trichome heads and used for RNA-seq. We initially assembled sequences from all five cultivars into a single pooled transcriptome. The normalization of a pooled transcriptome allows quantitative comparison among cultivars. The pooled assembly contained 599,285 nonredundant contigs with an average length of 511 bp (Supplemental Table S4). The trichome transcriptome raw sequence data are deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession no. PRJNA599437).

Table 1. Amounts of terpenes in mature flowers of different cannabis cultivars
Each value is the mean of three replicates from three clones of each cultivar, with the SD of nine samples in parentheses. Metabolites marked with an asterisk have been verified using authentic standards; others were identified based on MS and RI. Compounds below the limit of quantification across all samples are omitted. RI was calculated on a DB-Wax GC column. Note that cv PK data were obtained from plants grown for a separate study, and data may not be directly comparable with those of the other five cultivars. n.d., Not detected; tr, trace (,1 mg). To ensure that the time point of floral and glandular trichome development selected for RNA isolation represented active terpene and cannabinoid biosynthesis, we examined the transcriptome for genes of these pathways. In general, the pooled transcriptome assembly included at least one full-length transcript corresponding to each known step in the cannabinoid and terpene biosynthetic pathways. The 200 most highly expressed genes in the trichome transcriptome included isoprenoid biosynthesis enzymes [(E)-4-hydroxy-3methyl-but-2-enyl pyrophosphate (HMB-PP) synthase (HDS), HMB-PP reductase (HDR), isopentenyl diphosphate isomerase, and GPPS], six cannabinoid biosynthetic enzymes (CBDAS, PKS, OAC, THCAS, and two CBGA synthases [aPT1 and aPT4]), and seven CsTPSs (Supplemental Table S5). Three contigs annotated as fatty acid desaturase, which may be involved in biosynthesis of cannabinoid fatty acid precursors or cell membranes, were also highly expressed. Additionally, several contigs annotated as lipid transfer proteins or ABCG transporters were highly abundant.

RI
PCA was done on the complete set of 15 trichome transcriptomes. The three replicates of each cultivar clustered together, and cultivars were well differentiated (Fig. 6A). cv LS and cv Choc were the two cultivars most similar to each other, while cv AK had the most distance from the other cultivars. We used unsupervised cluster analysis to test for patterns of expression of contigs annotated as terpene or cannabinoid biosynthesis. Contigs were selected by mutual best tBLASTn hit against known sequences involved in isoprenoid and cannabinoid biosynthesis (Fig. 6B). The 26 contigs identified as putatively involved in resin biosynthesis clustered into four groups. Contigs associated with the core MEP pathway (DXS, DXR, HDS, and GPPS) clustered with cannabinoid biosynthetic genes acyl activating enzyme, aPT1, aPT4, and THCAS. Mevalonate (MEV) pathway genes grouped into two clusters, which also included the MEP pathway gene methylerythritol phosphate cytidyltransferase, CBDAS, and an aPT4 contig. A second cluster of MEV contigs were much less highly expressed on average, and also included isopentenyl phosphate kinase (IPK) and CBDAS. The final cluster had the highest average expression levels, and included cannabinoid biosynthetic genes PKS and OAC, as well as the MEP pathway gene HDR and a version of IPK.
Next, we performed a differential gene expression analysis across the five cultivars with all contigs that had expression levels of at least 100 counts per million (CPM). cv BC was used as the reference, because it placed near the center of the PCA of foliar terpene variation (Fig. 4A), and the other four cultivars were compared against cv BC. Differential gene expression analysis was performed with an adjusted P-value cutoff of 0.05 and a log 2 fold change cutoff of 3. In total, across the cultivars, 25,218 contigs were differentially expressed relative to cv BC; 19,987 were upregulated and 19,263 were downregulated. Contigs were identified with at least 95% identity to known enzymes involved in cannabinoid and terpene biosynthesis, but most were not significantly differentially expressed in any cultivar compared to cv BC (Fig. 6C). Most notably, CBDAS was highly upregulated in cv CSH, the only cultivar to produce CBD as a major cannabinoid. CsPT1 was downregulated in cv CSH, and CsPT4 was downregulated in cv LS. HMGR, a component of the MEV pathway, was upregulated in cv LS. OAC was downregulated in cv Choc.

CsTPS Gene Discovery
For discovery and quantification of CsTPS transcripts, separate transcriptomes were assembled for each cultivar (Supplemental Table S4). While separate transcriptomes do not permit quantitative comparison between cultivars, they eliminate the risk of quantitation errors from mapping kmers from similar transcripts between cultivars. We used the RNA-Bloom assembler (Nip et al., 2019), designed for single-cell RNA-seq libraries, to capture the diversity of sequences across the five cultivars while reducing the possibility of chimeric contigs. Contigs with .98% predicted amino acid sequence identity were collapsed under the longest representative sequence. These five single-cultivar transcriptomes and the previously published cv PK trichome transcriptome (van Bakel et al., 2011) were searched with BLASTX to identify known (Gunnewich et al., 2007;Booth et al., 2017;Zager et al., 2019;Livingston et al., 2020) as well as new CsTPS sequences. Sequences representing all but three of the previously functionally characterized and unique CsTPSs (a total of 18; Supplemental Table S1) were present in the transcriptomes of at least one of the five cultivars (Booth et al., 2017;Allen et al., 2019;Zager et al., 2019;Livingston et al., 2020). The three missing CsTPSs were CsTPS13, CsTPS14, and CsTPS33. When we screened the transcriptomes of the six different cultivars, we found a total of 33 unique and apparently fulllength CsTPS sequences, including two that we annotated as copalyl diphosphate synthase (CsTPS65) and entkaurene synthase (CsTPS66) of gibberellin biosynthesis.
A phylogeny of the predicted amino acid sequences of the 33 CsTPSs together with TPSs from other plant species placed CsTPSs into the subfamilies TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g ( Fig. 7; Supplemental Table S6). Within the TPS-a subfamily, all CsTPSs fall into one cluster with TPSs from hops (Humulus lupulus) as the nearest noncannabis members. Within TPS-b, the CsTPSs fall into two clades, which we named the CsTPS-b1 and the CsTPS-b2 clades, with two hop mono-TPSs as the nearest relatives. Three CsTPSs fall into TPS-g, but do not cluster together. TPS-c and TPS-e/f each contain one CsTPS.

CsTPS Gene Expression in Five Different Cultivars
We used the separate trichome transcriptome assemblies to determine for each cultivar expression of CsTPSs and to correlate CsTPS gene expression and terpene profiles in each of the five different cultivars. The analysis was limited to predicted CsTPS sequences of 400 amino acids or longer to reduce quantification ambiguity, which allowed expression analysis for 18 different CsTPSs (Fig. 8). Transcript abundance was calculated within each cultivar relative to mean transcripts per million (tpm) values of all contigs across three clonal replicates. Each of the 18 different CsTPSs was highly expressed in at least one cultivar (Fig. 8).
CsTPS18, CsTPS29, and CsTPS35, which belong to the TPS-g subfamily, were the only CsTPSs with abovemean transcript abundance in all five cultivars.
In cv AK trichomes, CsTPS5, CsTPS16, CsTPS18, and CsTPS35 showed the highest expression, while CsTPS2 and CsTPS25 transcripts were barely detected. Expression levels of CsTPS3, CsTPS4, CsTPS9, CsTPS17, CsTPS20, CsTPS21, CsTPS22, CsTPS23, CsTPS29, CsTPS32, and CsTPS36 were similar to the mean trichome transcript abundance, defined as within 4-fold log 2 CPM of the mean. In cv BC, CsTPS5, CsTPS36, CsTPS18, and CsTPS9 were highly expressed, CsTPS25 transcripts were barely detected, and CsTPS23 and CsTPS20 transcript levels were low relative to mean trichome transcript abundance. The other 11 CsTPSs were expressed at levels similar to the mean transcript abundance. In cv Choc trichomes, CsTPS35 was the most highly expressed CsTPS. CsTPS22, CsTPS25, and CsTPS32 transcripts were detected at low levels. The remaining 14 CsTPSs were expressed at levels similar to the mean trichome transcript abundance. In cv CSH, the most highly expressed CsTPS transcripts were CsTPS4 and CsTPS32. CsTPS25 was detected at low levels, with the remaining 15 CsTPSs expressed at levels similar to the mean trichome transcript abundance. cv LS was the only cultivar with above-mean expression of CsTPS25 and the only one with below-mean expression of CsTPS1. The most highly expressed CsTPS in cv LS was CsTPS21. Aside from CsTPS21, all CsTPSs in cv LS were similar to the mean transcript abundance.
Within the TPS-b subfamily, the CsTPS-b1 clade contains three members that had not been previously described, specifically CsTPS17, CsTPS23, and The P-values were determined using a modified Student's t test with the R package "limma" (Ritchie et al., 2015). Significance categories were not significant (NS; gray), significant at a log 2 fold change of 2 (Log2 FC; green); significant at an adjusted P-value of 0.05 (P; blue); and significant by both fold change and adjusted P-value (P & Log 2 FC; red). Contigs labeled with names are those shown with yellow diamonds, representing transcripts that may be associated with resin biosynthesis. Green numbers indicate the number of transcript contigs in each cultivar with abundance significantly higher compared to cv BC, and red numbers the number of transcript contigs significantly lower compared to cv BC. AAE1, Acyl activating enzyme; PKS, polyketide synthase; OAC, olivetolic acid cyclase; 1-deoxy-D-xylulose-5-phosphate reductase: MCT, 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase; HDS, HMB-PP synthase; HDR, HMB-PP reductase; GPPS lsu, GPPS large subunit; GPPS ssu: GPPS small subunit; HMGS, 3-HMG-CoA synthase; HMGR, HMG-CoA reductase; MK, MEV kinase; PMK, MEV-3-phosphate kinase; FPPS, farnesyl diphosphate synthase.

DISCUSSION The CsTPS Gene Family
Previous estimates of the size of the CsTPS family varied from ;30 to 50 different genes (Booth et al., 2017;Allen et al., 2019). The present analysis of the cv PK reference genomes identified 19 complete and five partial CsTPS genes. The transcriptomes reported here and in two recent studies (Booth et al., 2017;Zager et al., 2019) cover 11 different cannabis cultivars. Screening of these cultivar-specific transcriptomes for CsTPS genes revealed variations of the CsTPS gene family, variations of CsTPS transcript expression, and variations of CsTPS enzyme functions with respect to their mono-and sesquiterpene products. Among the different cultivars, some CsTPS genes were more variable in their transcriptome representation across cultivars than others. For example, the CsTPS9 gene, which encodes a b-caryophyllene/a-humulene synthase, was expressed in the transcriptomes of all cultivars reported to date, including this study. The same is the case for CsTPS5, which encodes an enzyme that uses both GPP and FPP and produces multiple monoterpenes and sesquiterpenes, respectively. By contrast, the CsTPS2 gene, which encodes an a-pinene synthase, was not found in the cv PK genome or in the cv PK and cv AK transcriptomes, but was present in the transcriptomes of other cultivars. CsTPS8, which encodes a multiproduct sesquiterpene synthase, was only detected in transcriptomes of cv FN, cv Choc, and cv CSH. Considering these variations, which are based on the analysis of 11 different cultivars, we expect that the full suite of CsTPS genes that differ by sequence, expression, and function, and which contribute to different terpene profiles, will be substantially larger across the many cannabis cultivars that exist around the world. In this study, we used a conservative cutoff of 95% amino acid identity to assign sequences to the same CsTPS identifier to avoid separating minor variants. However, it should be noted that even at this cutoff, minor sequence variation may result in variation of enzyme function. Assigning unique gene identifiers to transcript sequences based on 100% identity would result in a larger number of apparently different CsTPSs (Allen et al., 2019;Zager et al., 2019).
Within the plant TPS phylogeny, CsTPSs of the TPS-a and TPS-b subfamilies cluster with TPS sequences of its close relative hops (Fig. 7). In both subfamilies, we found cannabis-specific expansions, suggesting that the diversity of CsTPSs described here for mono-and sesquiterpene biosynthesis may have resulted from progressive and relatively recent multiplications of a few ancestral CsTPSs. The TPS-b subfamily has two distinct CsTPS blooms, identified here as CsTPS-b1 and CsTPS-b2. The CsTPS-b2 group includes four members, of which all but one (CsTPS30) lack a predicted plastid target peptide. Two of these TPSs produce a-bisabolol, a sesquiterpene found in many cannabis cultivars, but which is not a product of any of the functionally characterized CsTPS-a group enzymes. These results suggested that indeed some CsTPS-b members contribute to sesquiterpene production in cannabis trichomes, although the TPS-b group has been previously described as including mostly mono-TPSs in other plant species (Chen et al., 2011). In sandalwood (Santalum spp.), a member of TPS-b also functions as a sesquiterpene synthase (Jones et al., 2011). It is striking that in both cannabis and sandalwood, these TPS-b members produce bisabolane-type sesquiterpenes, which may be due to similar routes of active site evolution from their respective monoterpene synthase ancestors (Gao et al., 2012).

Relatedness of CsTPS Functions
The expansion of CsTPSs provided an opportunity to assess whether different products of closely related CsTPSs may arise through similar cyclization cascades. We tested this hypothesis with a focus on sesqui-TPSs because of their usually complex cyclization cascades. The sesquiterpenes identified in the different cannabis cultivars of this study, including cv PK, belong to 11 sesquiterpene parent skeletons that may originate from six central carbocationic intermediates (Fig. 10A; Degenhardt et al., 2009). The farnesane, elemane, and germacrene sesquiterpenes of the cannabis resin may be formed by CsTPSs via either a farnesyl or nerolidyl cation. The eudesmane and humulane sesquiterpenes, Figure 10. Proposed routes of sesquiterpene formation by CsTPS and correlation with CsTPS sequence relatedness. A, Schematic of carbocation intermediates and sesquiterpene classes (according to Degenhardt et al. [2009]) for sesquiterpenes identified in Cannabis floral trichomes. B, Intermediates and major and minor products of CsTPSs described in this article. Intermediates include all major proposed cationic intermediates, and "major product" is the class of the most abundant sesquiterpene product of each enzyme. 1, (E,E)farnesyl diphosphate; 2, (E,E)-farnesyl cation; 3, farnesane skeleton; 4, nerolidyl cation; 5, bisabolyl cation; 6, (E,E)-germacranedienyl cation; 7, (E,E)humulyl cation; 8, (Z,E)-germacranedienyl cation; 9, (Z,E)-humulyl cation; 10, bisabolane skeleton; 11, elemane skeleton; 12: eudesmane skeleton; 13, humulane skeleton; 14, cadinane skeleton; 15, germacrane skeleton; 16, guaiane skeleton; 17, aromadendrane skeleton; 18, himachalane skeleton. which were abundant in the leaf and flower metabolite profiles, most likely arise from (E,E)-germacranedienyl and (E,E)-humulyl cations, respectively, which are formed by 10,1 or 11,1 closure of the farnesyl cation. Four different types of cannabis sesquiterpenes may be formed via the (Z,E)-germacranedienyl cation, the elemane and germacrene compounds, and the cadinane and guaiane skeletons. The nerolidyl cation can also cyclize into the (Z,E)-humulyl cation via 11,1 closure, leading to formation of the himachalane and aromadendrane sesquiterpenes. While the latter compounds are generally present in cannabis terpene profiles, they were not abundant in the cultivars of this study. The bisabolane sesquiterpenes are likely formed from the bisabolane carbocation, which is generally the result of 6,1 closure of the nerolidyl cation.
We attempted to correlate CsTPS positions in the TPS phylogeny ( Fig. 7) with their assumed cyclization reactions (Fig. 10B). CsTPS18 and CsTPS35 are related enzymes that each produce acyclic farnesane compounds. CsTPS5, CsTPS31, and CsTPS32 are related enzymes in the CsTPS-b1 group and share many of the same products and likely the same intermediates, bisabolyl and (Z,E)-humulyl cations. The more closely related CsTPS5 and CsTPS32 share three of four of the same product skeletons and are likely to share the same four potential intermediates. CsTPS8(FN), CsTPS28(PK), and CsTPS21(PK) share only four products between them, but their major and secondary products could all be formed from the (E,E)-germacranedienyl cation. Similarly, CsTPS7(FN) and CsTPS22(PK), which are closely related, may also share three of four intermediate carbocations. CsTPS25(LS), which groups with CsTPSs that have mostly cyclic primary products, produces predominantly acyclic sesquiterpenes. Its secondary product, however, is a cadinane sesquiterpene, which may be a result of its recent evolution from a cyclicproduct sesqui-TPS. Overall, we found that similar proposed cyclization routes are more commonly shared between closely related CsTPSs than between more distantly related CsTPSs.

Assessing CsTPS Expression and CsTPS Products to Explain Cannabis Metabolite Profiles
Of the total 61 apparently unique CsTPSs, only 14 had been functionally characterized prior to this work (Supplemental Table S1; Gunnewich et al., 2007;Booth et al., 2017;Zager et al., 2019;Livingston et al., 2020). Here we describe the functional characterization of 13 additional CsTPSs and validation of the functions of several others. The CsTPSs described here, together with those previously reported, account for most of the terpenes identified in the cannabis cultivars of this study. One of the objectives of this work was to explore to what extent information on CsTPS expression and CsTPS function can be used to predict terpene profiles in cannabis trichome extracts. We found that with current knowledge, metabolite profiles can only be partially predicted, and substantially more information is required about the CsTPS proteome, enzyme kinetics, and substrate availability. Across the different cultivars, CsTPSs and other genes for terpene biosynthesis, as well as cannabinoid biosynthesis genes, were highly expressed in floral trichomes (Fig. 6B). This observation is in agreement with previous reports on the cannabis MEP and MEV pathways and selected CsTPS genes previously reported by Booth et al., (2017), Braich et al. (2019, and Livingston et al. (2020).
A single-time point transcript assessment is likely to be insufficient to explain the accumulation of terpene profiles, which occurs over longer periods of time. However, at a qualitative level, we found some general agreement between the presence of terpene products of CsTPSs expressed in a given cultivar (Fig. 8) and the metabolites that accumulate in the trichomes of that cultivar (Fig. 9). For example, cv AK and cv PK, which had no detectable transcript expression of the a-pinene synthase CsTPS2, also had the lowest proportion of a-pinene compared to the other cultivars (Figs. 8 and 9). Similarly, cv AK has a high proportion of nerolidol and relatively high expression of the linalool/nerolidol synthase CsTPS35. An example of a case where current knowledge of CsTPS expression and CsTPS function could not quantitatively explain metabolite profiles is the high proportion of (E)-b-farnesene in the metabolite profile of cv Choc. This cultivar did not reveal high levels of transcripts of any of the three CsTPSs known to encode enzymes that produce (E)-b-farnesene as a major product (CsTPS5, CsTPS25, and CsTPS32). Some possible explanations are that one or more of these CsTPSs may be a highly efficient enzyme, this protein may be highly stable, or additional (E)-b-farnesene synthases may exist to account for the level of (E)-b-farnesene in cv Choc. Similarly, the b-caryophyllene/a-humulene synthase CsTPS9 did not show particularly high transcript levels in any of the cultivars, although these two sesquiterpenes are commonly among the most abundant in cannabis. There are also several terpenes in the metabolite profiles that cannot yet be accounted for by products of known CsTPS functions. These compounds may be the products of CsTPSs that remain to be characterized or minor products of CsTPSs that were below the detection level under assay conditions but accumulate to detectable levels in trichomes over the course of flower development. We also observed the opposite, where a CsTPS product is not found in the metabolite profile despite high transcript levels. Notably, the hedycaryol synthase CsTPS20 was highly expressed across several cultivars, but hedycaryol was not observed in the metabolite profile in any of the cultivars. The labile hedycaryol may be subject to modification (Hattan et al., 2016).

TPS Gene Family Variation and Variation of Terpene Metabolite Profiles
The CsTPS gene family appears to be of a size similar to that reported for other plant species with extensive terpene diversity (Chen et al., 2011). Variation of the composition of the TPS gene family, or variation of TPS gene expression, within a given plant species has been linked to variation of terpene profiles in a number of different systems. This includes both cultivated and noncultivated plants, as well as angiosperms and gymnosperms. For example, in grapevine, members of a large VvTPS gene family are differentially expressed between tissues, developmental stages, and cultivars, leading to differences in terpene profiles depending on the specific combination of TPS genes that are expressed during flowering and fruit ripening (Martin et al., 2009(Martin et al., , 2010Drew et al., 2016;Smit et al., 2019). In rice (Oryza spp.), lineage-specific blooms of similar TPS genes contributed to variation of terpene defenses between different rice species (Chen et al., 2020). Similarly, in corn (Zea mays), variation of expression of ZmTPS genes encoding b-caryophyllene synthase is central to the variation of terpene-mediated indirect defense against corn borer (Ostrinia nubilalis; Köllner et al., 2008). In Sitka spruce (Picea sitchensis), a gymnosperm, copy number variation and variation of expression of PsTPS genes encoding (1)-3-carene synthase caused variation of monoterpene composition associated with insect resistance (Hall et al., 2011;Roach et al., 2014).
The variation of terpene metabolite profiles and CsTPS gene family expression described here for different cannabis cultivars highlights the apparently unlimited opportunity for humans to expand, design, and shape interesting terpene profiles in cannabis. While some of this potential has already been realized through traditional selection and propagation of cannabis strains over past decades and centuries, knowledge of the CsTPS gene family and its contribution to terpene variation will allow cannabis breeders to substantially increase the symphonic diversity of terpene compositions.

CONCLUSION
By identifying suites of CsTPS genes in six cannabis cultivars, we demonstrated variations of expression and function that contribute to the different terpene profiles in cannabis cultivars. The enzymes described here, together with other recent studies on terpene biosynthesis in cannabis (Booth et al., 2017;Livingston et al., 2020;Zager et al., 2019), bring the number of characterized cannabis CsTPSs to 30 across 14 cultivars.

Plant Material
Cannabis (Cannabis sativa) seeds were provided by Anandia Labs, a subsidiary of Aurora Cannabis (www.auroramj.com) under a Health Canada research license, and plants were grown at their laboratory. Seeds were surface sterilized in 5% (w/v) Plant Preservative Mixture (www.plantcelltechnology. com) and placed in petri dishes between filter paper soaked with 0.5% (w/v) of this mixture. Germination occurred within 2 to 10 d. Germinated seeds were planted in soil (Sunshine Mix 4, Sun Gro Horticulture; www.sungro.com) supplemented with Florikote 14-14-14 controlled-release fertilizer (www. americanhort.com). During the vegetative growth stage, plants were kept under an 18 h/6 h light/dark cycle under T5 HO light bulbs. Plants were fertilized twice weekly with Peter's Excel 15-15-15 water-soluble fertilizer (pH 5.6-5.8; www.domyown.com). After ;2 weeks of growth under the 18 h/6 h light/dark cycle, plants were moved to a surface under high-pressure sodium light bulbs and a 12 h/12 h light/dark cycle to induce flowering. During the flowering stage, plants were fertilized twice weekly with MaxiBloom 15-15-14 watersoluble fertilizer (pH 5.6-5.8; www.generallyhydroponics.ca). Between fertilizations, plants were continuously watered with tap water in hydroponic chambers.
For clonal propagation of plants of five different cultivars, cv LS, cv CSH, cv BC, cv AK, and cv Choc, cuttings were taken from well-established stock plants in the vegetative stage and surface sterilized with 5% (v/v) bleach. Cut ends were dipped in 0.4% (w/v) indole-3-butyric acid rooting hormone (www. valleyindoor.com), placed in rockwool cubes soaked for 1 h in pH 6 water, and kept in trays under a clear plastic dome to maintain humidity and promote rooting. Rooted cuttings in rockwool cubes were moved into hydroponic chambers. Female cv PK plants were clonally propagated and grown as described above, but cuttings in rockwool were transferred directly into soil. All plants were grown in growth chambers (BC Northern Lights) under LED lights (3000K 80 CRI spectrum, 1200 W equivalent, BC Northern Lights). The plants were subjected to vegetative growth for 2 to 3 weeks using an 18 h/6 h light/ dark cycle and watered with Peter's Excel (15-5-15). To induce flower development, the light cycle was switched to 12 h/12 h, and plants were watered with Maxibloom (5-15-14).

Harvesting of Leaf and Flower Samples
Leaves (three per plant) were removed from plants 4 weeks post germination with scissors and placed into 50-mL Falcon tubes. Flowers were harvested for trichome isolation by removal of entire inflorescences of plants at two stages, 1 week post induction of flowering and at midstage maturity, between 51 and 60 d post induction of flowering. The time of midstage maturity harvest was based on three criteria: (1) all glandular trichomes had matured to have a stalk; (2) 50% of pistils had begun to brown; and (3) trichome heads were translucent and had not changed color to appear amber or brown (Fig. 2). Flowers were taken from several nodes along the stem. For metabolite analysis, individual florets were removed using scissors and forceps and placed in a 1.5-mL Eppendorf tube. Fresh weight of harvested plant material was recorded and plant material was kept on ice for up to 60 min prior to extractions. After extraction, plant material was dried at 60°C for 16 h and DW was determined.

Terpene Extraction and Analysis
Intact plant material was extracted with three washes using 0.5 mL pentane per 100 mg fresh weight. For the first extraction, plant material was vortexed for 30 s in pentane to disrupt trichomes and then shaken at room temperature for 4 h. For the second and third extractions, the same plant material was shaken in pentane at room temperature for 1 h. The three pentane extracts were combined, centrifuged at 4,300g for 10 min, filtered through a 0.45-mm nylon membrane (Gelman Sciences/Thermo Fisher Scientific) to remove precipitated waxes and starch, and used for terpene analysis.
For the initial screening of terpene profiles in foliage harvested from plants in vegetative growth at 4 weeks post germination, each extract was analyzed by GC/MS on an Agilent 7890A GC coupled with an Agilent 7000A triple-quad MS. An Agilent HP-5 column (5% [w/w] phenyl methylpolysiloxane; 30-m length, 0.25-mm i.d., and 0.25-mm film thickness; 19091S-433HP-5MS, Agilent) was used. The injector was operated in pulsed-splitless mode at 250°C. He gas was used as the carrier with a flow rate of 1 mL min 21 and 30-s pulse at 25 psi. The oven program was 50°C for 3 min, increased by 10°C min 21 to 90°C, then by 20°C min 21 to 120°C, by 10°C min 21 to 150°C, and by 15°C min 21 to 320°C, then held at 320°C for 5 min, giving a total run time of 27.8 min. The mass spectrometer was operated in electron ionization mode at 70 eV and data acquisition was made in full-scan mode with a mass range of 40 to 500 atomic mass units.
Analysis of terpenes in extracts from flowers and TPS assay products was done on an Agilent 6890 GC coupled with an Agilent 5973 mass selective detector. An Agilent DB-Wax column (60-m length, 0.25-mm i.d., and 0.25-mm film thickness; 122-7062, Agilent) was used. The injector was operated in pulsedsplitless mode at 250°C; except for cold injection of the CsTPS28 products, the injector temperature was set to 50°C. He gas was used as the carrier with a flow rate of 1 mL min 21 and 30-s pulses at 25 psi. The initial oven temperature was 40°C, which was increased by 10°C min 21 to 100°C, by 3°C min 21 to 130°C, and by 30°C min 21 to 250°C, then held for 12 min. The mass spectrometer was operated in electron ionization mode at 70 eV and data acquisition was made in full-scan mode with a mass range of 40 to 500 atomic mass units. Chiral analysis was done using a Cyclodex-B column (30-m length, 250-mm internal diameter, 0.25-mm film thickness; 122-2532E, Agilent). The injector was operated in pulsed-splitless mode at 240°C. He gas was used as the carrier with a flow rate of 1 mL min 21 and 30-s pulses at 25 psi. The initial oven temperature was 40°C for 1 min, increased by 3°C min 21 to 80°C, then increased by 25°C min 21 to 240°C, then held for 5 min. The mass spectrometer was operated in electron ionization mode at 70 eV.

Trichome Isolation
Flowers were collected at midstage maturity from all branches of three clonal plants for each cultivar and incubated in water containing 5 mM aurintricarboxylic acid and 1 mM thiourea for 1 to 4 h on ice. After incubation, tissue abrasion to remove trichomes was achieved using a BeadBeater with 30 to 60 g of tissue with 100 g of 1-mm-diameter zirconia/silica beads and 20 g of XAD-4 in enough trichome RNA purification buffer (TRPB; 25 mM HEPES [pH 7.3], 200 mM sorbitol, 10 mM Suc, 5 mM dithiothreitol, 5 mM aurintricarboxylic acid, 1 mM thiourea, 0.6% [w/v] methyl cellulose, and 1% [w/v] polyvinylpyrrolidone 40,000) to fill the BeadBeater chamber completely (total volume 350 mL). Floral tissue was abraded 33 for 15 s with a 30-s rest on ice between abrasions.
Tissue was filtered through 350-and 105-mM nylon mesh, and filtrate was collected on 40-mM mesh. Purified trichome heads were then collected in a 15-mL Falcon tube and rinsed 33 with TRPB without methyl cellulose and polyvinylpyrrolidone 40,000. Purity of the trichome head preparation was determined by light microscopy (Supplemental Fig. S3). Trichome heads were pelleted by centrifugation at 200 g for 1 min. Pellets were weighed and then flash-frozen in liquid nitrogen and stored at 280°C.

RNA Isolation, Transcriptome Sequencing, and Assembly
Trichome pellets (200 mg) were used for RNA isolation. RNA was isolated using PureLink Plant RNA Reagent (Thermo Fisher Scientific) according to the manufacturer's protocol. RNA concentration, purity, and integrity were determined using an Agilent 2100 Bioanalyzer microchip. Three replicates of trichome RNA from each clone were used for RNA-seq. Total RNA in a volume of 15 mL at 100 ng mL 21 was used for each sample. Sequencing was performed by the McGill University and Génome Québec Innovation Centre (Montreal, Canada), who performed strand-specific library preparation without heating the samples. Sequencing was performed on an Illumina HiSeq2000 platform using 100 bp paired-end sequencing. All samples were pooled and sequenced on four lanes, generating ;1.5 billion paired-end reads in total. Quality of the sequences was assessed with FastQC (www.bioinformatics.babraham.ac.uk/ projects/fastqc/). Reads that mapped to cannabis ribosomal RNA sequences, downloaded from NCBI, using Bowtie2 were removed. Adapters were trimmed with BBDuk from the BBTools software suite (www.sourceforge.net/ projects/bbmap/). To improve the contiguity of the assembly, overlapping paired-end reads were joined by BBMerge to generate longer single-end reads. All merged and unmerged reads were pooled and first assembled with Trinity (version 2.6.5) to generate 599,285 nonredundant contigs with an average length of 511 bp. To gain insight into cultivar-specific sequences, we reassembled all of the unmerged sequences using RNA-Bloom (version 0.9.8; Nip et al., 2019) to generate five separate assemblies, one per cultivar, with an average of 260,000 nonredundant contigs and average length of 1,400 bp.
TransDecoder (version 5.5.0) predicted on average 170,000 open reading frames (ORFs) for each assembly. Predicted peptides translated from the ORFs were clustered at 95% amino acid identity, using CD-HIT (version 4.8.1; Fu et al., 2012) to collapse possible allelic variants. Predicted peptides from each assembly were then pooled together and clustered again at 98% amino acid identity to further reduce variations between cultivars to a total of 55,550 sequences. Salmon (version 0.14; Patro et al., 2017) was used to quantify the level of expression on the corresponding ORFs for downstream differential expression analysis.

CsTPS Gene Identification and Genome Annotation
CsTPS candidate genes were identified using the transcriptome assemblies described above as the subject of a tBLASTn search using 100 previously characterized TPS genes from cannabis and other plant species. The completeness of the CsTPS predictions was confirmed by a hmmscan domain search. Gene and splice site prediction on the cv PK reference genome was performed using the Exonerate algorithm (Curwen et al., 2004) from a list of all characterized CsTPS sequences. N-terminal transit peptides were predicted using the TargetP and LOCALIZER tools (Emanuelsson et al., 2007;Sperschneider et al., 2017).

CsTPS cDNA Cloning and Functional Characterization
Complementary DNA (cDNA) was made from trichome RNA using the Maxima First Strand cDNA synthesis kit (Thermo Fisher Scientific). cDNA was amplified using gene-specific primer, and ligated into a pJET vector (Clontech). Sequences were verified by Sanger sequencing, and full-length or N-terminally truncated sequences were subcloned into expression vectors pET28b1 (EMD Millipore) or pASK-IBA37 (IBA Lifesciences), which both carry an N-terminal 6-HIS tag. Full-length CsTPS36BC synthesis was done by IDT (www.idtdna. com). Plasmids were transformed into Escherichia coli strain BL21DE3 for heterologous protein expression, as previously described (Roach et al., 2014). Heterologous protein production was induced using 200 mM isopropylthiob-galactoside (pET28) or 200 ng mL 21 anhydrotetracycline in methanol (IBA37), and protein was expressed at 18°C overnight. Cells were harvested by centrifugation and lysed by freeze-thaw cycles, warming the pellet to 4°C then freezing in liquid N 2 . Recombinant protein was purified using the GE healthcare HIS SpinTrap kit (www.gehealthcare.com). The binding buffer for purification was 20 mM HEPES (pH 7.5), 500 mM NaCl, 25 mM imidazole, and 5% (v/v) glycerol. Cells were lysed in binding buffer supplemented with Roche complete protease inhibitor tablets and 0.1 mg mL 21 lysozyme. The elution buffer was 20 mM HEPES (pH 7.5), 500 mM NaCl, 500 mM imidazole, and 5% (v/v) glycerol. Purified protein was desalted through Sephadex into TPS assay buffer: 25 mM HEPES (pH 7.3), 100 mM KCl, 10 mM MgCl 2 , 5% (v/v) glycerol, and 5 mM dithiothreitol. Protein purity was determined by western blotting using mouse monoclonal anti-polyHis antibody from Sigma-Aldrich (www. sigmaaldrich.com). In vitro assays were performed using 50 to 100 mL of freshly purified protein and TPS assay buffer to a final volume of 500 mL. Isoprenoid diphosphate substrates (www.isoprenoids.com) were dissolved in 50% (v/v) methanol and added to assays at a final concentration of 16 mM GPP or 13 mM FPP. Assays were overlaid with 500 mL pentane with 1.25 mM isobutyl benzene as internal standard. Assays were shaken at 40 rpm at 30°C for 4 h. Reactions were stopped, and products were extracted by vigorous vortexing of the assay vial for 30 s and then centrifuged at 4,300g for 15 min to separate phases. Assay products were determined using the same GC/MS equipment, program, and identification method as for floral terpene extracts described above (see "Terpene Extraction and Analysis").

Phylogenetic Analysis
ClustalW alignment of translated CsTPS and TPS sequences from other plants and maximum-likelihood phylogeny construction were done in CLC Plant Physiol. Vol. 184, 2020 Main Workbench 7. Phylogeny construction used the neighbor-joining method, with 100 bootstrap replicates. Tree visualization and labeling were performed on iTOL (Letunic and Bork, 2019).

Hierarchical Clustering, PCA, Heatmaps, and Differential Expression Analysis
Hierarchical clustering and PCA of the initial 32 seedlings used peak area for each compound normalized to tissue DW and internal standard isobutyl benzene. Clustering was performed using the R function hclust (Kaufman and Rousseeuw, 1990), with Pearson's correlation as a distance measure for metabolites (rows) and Spearman correlation for individual plants (columns). Dendrogram clusters were determined with the number of clusters set to the maximum, where inertia gain is .1. PCA and visualization used the R package "FactoMineR" (Lê et al., 2008) with default settings. Heatmaps were generated using the R package gplots (https://cran.r-project.org/web/packages/gplots/ ), with scale 5 "row" and z-scores used to normalize rows. Transcript abundance was calculated as the mean normalized counts per million of three replicates for each clone. Read counts estimated with Sailfish were normalized using DESeq2 R package version 1.6.1 for differential expression analysis (Love et al., 2014). Transcripts with a normalized CPM , 100 in three or more samples were discarded. The false discovery rate was set at 5%. Differentially expressed genes were defined by an adjusted log 2 fold-change .2 and a normalized P-value ,0.05.

Supplemental Data
The following supplemental materials are available.
Supplemental Figure S1. Representative extracted ion chromatograms of foliar terpene extracts from five cannabis plants.
Supplemental Figure S2. Representative mass spectra for all compounds identified in floral terpene extracts.
Supplemental Figure S3. Trichome head isolates from five cultivars.
Supplemental Figure S4. Representative mass spectra for all CsTPS products.
Supplemental Figure S5. Extracted ion chromatograms showing stereochemical determination of monoterpenes in cannabis juvenile floral terpene extracts and CsTPS enzyme assays.
Supplemental Table S2. Identification and amounts of foliar terpene in 32 different cannabis seedlings.
Supplemental Table S4. Assembly statistics for six transcriptome assemblies used.
Supplemental Table S5. Highly expressed contigs in five cannabis cultivars.
Supplemental Table S6. Accession numbers of TPS sequences used to construct phylogeny.