Xu Lu, Lijin Huang, Henrik V Scheller, Jay D Keasling, Medicinal terpenoid UDP-glycosyltransferases in plants: recent advances and research strategies, Journal of Experimental Botany, Volume 74, Issue 5, 13 March 2023, Pages 1343–1357, https://doi.org/10.1093/jxb/erac505
Abstract
Terpenoid glycosides have significant curative effects on many kinds of diseases. Most of these compounds are derived from medicinal plants. Glycosylation is a key step in the biosynthesis of medicinal terpenoids. In plants, UDP-dependent glycosyltransferases comprise a large family of enzymes that catalyze the transfer of sugars from donor to acceptor to form various bioactive glycosides. In recent years, numerous terpenoid UDP-glycosyltransferases (UGTs) have been cloned and characterized in medicinal plants. We review the typical characteristics and evolution of terpenoid-related UGTs in plants and summarize the advances and research strategies of terpenoid UGTs in medicinal plants over the past 20 years. We provide a reference for the study of glycosylation of terpenoid skeletons and the biosynthetic pathways for medicinal terpenoids in plants.
Introduction
Terpenoids, also known as isoprenoids, are a large group of plant natural products. At present, >90 000 terpenoids have been described, most of which are secondary metabolites in medicinal plants (Avalos et al., 2022). Terpenoids are composed of isoprene units and have the general formula (C5H8)n. According to the number of isoprene units, terpenoids can be divided into monoterpenoids (C10), sesquiterpenoids (C15), diterpenoids (C20), triterpenoids (C30), etc. (Lange and Ahkami, 2013). Several terpenoids have extensive application in the medical field and have remarkable curative effects on various diseases. Protopanaxatriol (PPT)-type ginsenosides are considered key compounds to improve blood circulation and alleviate blood stasis (X. Li et al., 2021). Asiaticoside has strong wound healing properties and effects on cardiovascular diseases (Lee et al., 2012; Razali et al., 2019).
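The relationship between the general formula (C5H8)n and the class names above can be illustrated with a minimal sketch. The function names are hypothetical, and the formula is the idealized hydrocarbon skeleton only; oxidized or glycosylated terpenoids deviate from it.

```python
# Terpenoid classes named in the text, keyed by the number of C5 isoprene units.
TERPENOID_CLASSES = {
    2: "monoterpenoid",    # C10
    3: "sesquiterpenoid",  # C15
    4: "diterpenoid",      # C20
    6: "triterpenoid",     # C30
}


def skeleton_formula(n_units: int) -> str:
    """Return the idealized (C5H8)n formula for n isoprene units."""
    if n_units < 1:
        raise ValueError("n_units must be a positive integer")
    return f"C{5 * n_units}H{8 * n_units}"


def classify(n_carbons: int) -> str:
    """Map a skeleton carbon count to the class name used in the text."""
    units, remainder = divmod(n_carbons, 5)
    if units < 1 or remainder != 0:
        return "not a whole number of isoprene units"
    return TERPENOID_CLASSES.get(units, f"other terpenoid ({units} isoprene units)")
```

For example, `classify(30)` gives the triterpenoid class, which covers the ginsenoside and mogroside aglycones discussed later in this review.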
The biosynthetic pathways for terpenoid precursors, isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), are relatively well understood and include the mevalonic acid (MVA) pathway and 2C-methyl-d-erythritol 4-phosphate (MEP) pathway. The direct precursors of terpenoids, geranyl pyrophosphate (GPP), farnesyl pyrophosphate (FPP), and geranylgeranyl pyrophosphate (GGPP), are converted by terpene synthases (TPSs) into mono-, sesqui-, di-, triterpenes, and higher. GPP, the precursor of monoterpenoids, is formed by condensation of IPP and DMAPP. FPP, formed by the condensation of DMAPP and two IPPs, is the precursor of sesquiterpenoids, and two molecules of FPP can be condensed to squalene (the precursor of triterpenoids) under the action of squalene synthase (SQS). GGPP is the precursor of diterpenoids, and two molecules of GGPP can form a tetraterpenoid precursor (Banerjee and Sharkey, 2014; Liao et al., 2016; Lu et al., 2016). Terpenoid skeletons are further functionalized, through methylation, glycosylation, hydroxylation, epoxidation, isomerization, and other modifying reactions, to yield a wide range of natural products. The glycosylated products of terpenoids have found important application in food and healthcare products, such as the antitumor, blood pressure-lowering, and immunity-enhancing effects of ginsenoside, the immunostimulatory effects of QS-21 saponin, the anti-inflammatory effects of paeoniflorin, and the antioxidant effects of salidroside (Ju et al., 2017; Mancuso and Santangelo, 2017; Lacaille-Dubois, 2019).
Glycosylation is a key step in the biosynthesis of medicinal terpenoids. Glycosyltransferases (GTs) catalyze the transfer of a glycosyl residue from an activated sugar donor to the aglycone or an already glycosylated acceptor and form various glycoside compounds (Mackenzie et al., 1997). Based on amino acid sequence similarities, 1 046 617 GTs have been classified into 116 families in the Carbohydrate-Active Enzyme (CAZy) database (http://www.cazy.org/GlycosylTransferases.html) as of 25 November 2022 (Drula et al., 2022). In plants, the largest family is GT family 1, UDP-dependent glycosyltransferases (UGTs), which use UDP sugars such as UDP-d-glucose (UDP-Glc), UDP-d-glucuronic acid (UDP-GlcA), UDP-l-rhamnose (UDP-Rha), UDP-l-arabinose (UDP-Ara), UDP-d-xylose (UDP-Xyl), and UDP-d-galactose (UDP-Gal) as sugar donors (Vogt and Jones, 2000; Kurosawa et al., 2002; Yonekura-Sakakibara et al., 2008). UGTs attach the sugar to the recipient molecule through O–C, N–C, C–C, or S–C glycosidic bonds (Liang et al., 2015). Most of the identified plant UGTs are O–C UGTs. With the development of automated sequencing techniques, more and more UGTs have been identified and functionally verified. In this review, the function of and research strategies used to discover terpenoid UGTs in medicinal plants are summarized in detail. The typical characteristics and evolutionary relationship of medicinal terpenoid UGTs are also briefly described. We also provide references for the screening and identification of UGTs involved in the biosynthesis of medicinal terpenoids.
Typical characteristics of UGTs
Despite the low amino acid sequence identity, UGTs have a highly conserved secondary and tertiary structure, and they all adopt a GT-B structural fold. The GT-B fold consists of two separate Rossmann-like β/α/β domains (N- and C-terminal domains), and the deep cleft between the two domains is thought to be the catalytic center (Bourne and Henrissat, 2001; Shao et al., 2005; Osmani et al., 2009; Malik and Black, 2012; Albesa-Jové et al., 2014; Rahimi et al., 2019). Several crystal structures of plant UGTs have been solved (Shao et al., 2005; Li et al., 2007; Modolo et al., 2009; Hiromoto et al., 2015; Yang et al., 2019; Zong et al., 2019). UGT71G1s, which glycosylate terpenoid compounds, can be used as structural templates to establish the three-dimensional (3D) structure of candidate UGTs by homology modeling (Fig. 1A). In plants, most UGTs contain a characteristic consensus motif composed of 44 amino acid residues called the plant secondary product GT (PSPG) motif, as shown in Fig. 1B. The highly conserved PSPG motif is located at the C-terminus and is involved in the binding of UDP sugar (Mackenzie et al., 1997; Paquette et al., 2003; Bowles et al., 2006). Some UGTs have promiscuous activity for several UDP sugars but still have a preference for one specific UDP sugar, which may be related to the last two residues of the PSPG motif (Osmani et al., 2009). Glutamine and histidine residues, as the last amino acids of the PSPG box, are considered to be the key residues for the recognition of UDP-Glc and UDP-Gal, respectively (Kubo et al., 2004). Interestingly, as shown in Fig. 1C, multiple sequence alignments indicate that a glutamine residue is also found in glucuronosyltransferases, rhamnosyltransferases, xylosyltransferases, and others, whereas the histidine residue indicates that a given UGT might have galactosyl- or arabinosyltransferase activity (Sawada et al., 2005; Montefiori et al., 2011; Han et al., 2014; Louveau et al., 2018; Chen et al., 2019).
AsUGT99D1 from Avena strigosa shows high specificity for UDP-Ara, and the histidine residue (H404) at the end of the PSPG box is critical for sugar donor specificity. The AsUGT99D1-H404Q mutant has reduced arabinosyltransferase activity and increased xylosyl- and glucosyltransferase activities (Louveau et al., 2018). GuUGT73F17 from Glycyrrhiza uralensis showed high promiscuity to sugar donors, but its activities with UDP-Gal and UDP-Ara were markedly lower than those with UDP-Glc and UDP-Xyl, perhaps because the terminal residue of its PSPG motif is glutamine rather than histidine (He et al., 2018). Similarly, although the rhamnosyltransferase AtUGT89C1 also has a histidine at the end of the PSPG motif, the H357Q mutation slightly increases its activity with UDP-β-l-rhamnose (Zong et al., 2019). Other UGT regions have also been implicated in the determination of sugar donor specificity, such as the interdomain linker and the N5 loop region (Osmani et al., 2009). In conclusion, the residues interacting with the sugar donor are mainly located at the C-terminus of UGT proteins, but residues positioned within the N-terminal domain may also affect the binding of the sugar donor. UGT specificity is currently very difficult to predict solely from the amino acid sequence, but prediction should improve in the future with a deeper understanding of UGT structure–function relationships.
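The terminal-residue rule of thumb discussed above can be written down as a small lookup. This is a sketch only: the function name is hypothetical and the 44-residue strings in the usage note are synthetic placeholders, not real UGT sequences. As the AtUGT89C1 example shows, the last residue is a hint and not a reliable predictor of donor specificity.

```python
PSPG_LENGTH = 44  # length of the PSPG motif as given in the text

# Associations reported in the text for the last residue of the PSPG box.
TERMINAL_HINTS = {
    "Q": ("terminal Gln: associated with UDP-Glc recognition "
          "(also seen in glucuronosyl-, rhamnosyl-, and xylosyltransferases)"),
    "H": ("terminal His: associated with UDP-Gal recognition "
          "(also seen in arabinosyltransferases)"),
}


def pspg_terminal_hint(pspg_box: str) -> str:
    """Return a donor-preference hint based on the last PSPG residue.

    The input is assumed to be an already extracted PSPG box in
    one-letter amino acid code; locating the box in a full-length
    protein is outside the scope of this sketch.
    """
    box = pspg_box.strip().upper()
    if len(box) != PSPG_LENGTH:
        raise ValueError(f"expected {PSPG_LENGTH} residues, got {len(box)}")
    return TERMINAL_HINTS.get(box[-1], "no hint from the terminal residue")
```

A synthetic placeholder such as `"W" + "A" * 42 + "Q"` returns the glutamine hint, and replacing the final `Q` with `H` returns the histidine hint. Any such call should be followed by biochemical validation.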

Fig. 1. Typical characteristics of UGTs. (A) Crystal structure of Medicago truncatula MtUGT71G1 complexed with UDP-glucose (PDB ID: 2ACW). The PSPG box is marked in green. (B) The PSPG motif among triterpenoid-related UGTs. (C) Multiple sequence alignment of PSPG boxes in various UGTs. MtUGT71G1 (AY747627.1), BvUGT73C10 (AFN26666.1), BvUGT73C11 (AFN26667.1), BvUGT73C12 (AFN26668.1), BvUGT73C13 (AFN26669.1), and CsUGT78A14 (KP682360.1) are glucosyltransferases. CsUGT78A15 (KP682361.1), AcF3GT1 (ADC34700.1), AcGaT (AB103471.1), and PhF3GalTase (AAD55985.1) are galactosyltransferases. VvGT5 (BAI22846.1), BpUGT94B1 (AB190262.1), and GuUGAT (KT759000.1) are glucuronosyltransferases. AtUGT78D3 (AED92375.1), AsUGT99D1 (MH244526.1), and GmSSAT1 (XM_003532226.4) are arabinosyltransferases. CmC12RT1 (AY048882.2), AtUGT89C1 (Q9LNE6.1), and GmUGT91H4 (BAI99585.1) are rhamnosyltransferases. GmUGT73F4 (BAM29363.1), PgUGT94Q13, and Pn3-32-i5 are xylosyltransferases. The important terminal residues of PSPG boxes are indicated in a red block.
While the C-terminal domain of UGTs mainly interacts with the sugar donor, the N-terminal domain is involved in acceptor substrate recognition (Albesa-Jové et al., 2014). The crystal structures of UGTs show that the N-terminal domain is less conserved than the C-terminal domain, consistent with the great diversity of the acceptors of different UGTs (Akere et al., 2020). MtUGT71G1 recognizes both triterpenoids and flavonoids as glycosyl acceptors (Achnine et al., 2005). SrUGT76G1 is highly promiscuous toward substrates, glycosylating aliphatic and branched alcohols, phenols, flavonoids, and terpenoids (Dewitte et al., 2016). UGT promiscuity is also reflected by the acceptor-binding sites, which consist primarily of a deep and open cavity within the N-terminal domain (Louveau and Osbourn, 2019). Interestingly, the acceptor promiscuity of UGTs tends to show regiospecificity, that is, a strong preference for the same site on related substrates. For example, BvUGT73C10-13 from Barbarea vulgaris glycosylates the C3–OH of similar triterpenoids including oleanolic acid, hederagenin, lupane sapogenin, and betulinic acid (Augustin et al., 2012).
Recent advances in the study of medicinal terpenoid UGTs in plants
Glycosylation of hydroxyl or carboxyl groups in terpenoids can produce various types of terpenoid glycosides. To date, multiple UGT enzymes have been functionally characterized for catalyzing medicinal terpenoid glycosylation, as summarized in Supplementary Table S1.
Monoterpenoid and sesquiterpenoid UGTs
Monoterpenoids, sesquiterpenoids, and their derivatives, existing mainly in the form of essential oils, are widespread in the plant kingdom. Volatile oils have a wide range of biological and pharmacological activities, including antimicrobial, anti-inflammatory, analgesic, anxiolytic, and antidepressive, and are frequently used in natural therapies, cosmetics, and the pharmaceutical industry (Sarmento-Neto et al., 2016; Zárybnický et al., 2018; Isah, 2019). Volatile terpenoids, such as citronellol, nerolidol, linalool, and geraniol, accumulate as water-soluble glycosylated forms in many land plant species. Glycosylated terpenoids are stored in relatively larger amounts in plants compared with their non-glycosylated counterparts due to lower volatility or otherwise less release from the plant, and are easily hydrolyzed to aromatic compounds during industrial processing as important natural sources of flavors and fragrances. However, only a few related UGTs have been reported. Many monoterpenyl β-d-glucosides accumulate in grape berries, which can be hydrolyzed during vinification to enrich the wine flavor. VvGT7 and VvGT14–VvGT16 play important roles in the formation of these glycosides, and show catalytic activity toward monoterpenoids (geraniol, citronellol, nerol, etc.) (Bönisch et al., 2014a, b). CsUGT85K11 and CsUGT94P1 from Camellia sinensis can produce aroma β-primeverosides by sequential glucosylation and xylosylation, respectively (Ohgami et al., 2015). CsUGT91Q2, involved in the modulation of plant cold stress tolerance, was shown to catalyze the glucosylation of nerolidol (Zhao et al., 2020). OfUGT85A84 was found to be responsible for the glycosylation of linalool and its oxides using UDP-Glc as a specific sugar donor (R. Zheng et al., 2019). PpUGT85A2 controls glycosylation of linalool for enhanced aroma and defense in peach (Wu et al., 2019).
Diterpenoid UGTs
Diterpenoid steviol glycosides from Stevia rebaudiana are intense natural sweeteners with various therapeutic benefits, such as preventing diabetes mellitus, hypertension, inflammation, diarrhea, obesity, and tooth decay (Chatsudthipong and Muanprasat, 2009; Momtazi-Borojeni et al., 2017; Ilić et al., 2017). Steviol glycosides are composed of a steviol aglycone core and several molecules of glucose. To date, at least 35 of these compounds have been isolated and identified, including the representative components tri-glycoside stevioside and the tetra-glycoside rebaudioside A (Richman et al., 2005; Ceunen and Geuns, 2013). As shown in Fig. 2B, four SrUGTs related to the biosynthesis of steviol glycosides have been successfully elucidated. SrUGT85C2 catalyzes the addition of glucose to the C-13 hydroxyl group of steviol to form a β-d-glucoside. SrUGT74G1 is responsible for glucosylation of the C19-carboxylic acid-activated group. SrUGT91D2 and SrUGT76G1 catalyze the formation of 1,2-β-d- and 1,3-β-d-glucosidic linkages, respectively (Richman et al., 2005; Dewitte et al., 2016; Olsson et al., 2016; Libik-Konieczny et al., 2021). After identifying the key enzymes in the biosynthesis pathway of steviol glycosides, Wang et al. (2016) successfully reconstructed a novel pathway for de novo synthesis of these compounds in Escherichia coli. Rubus suavissimus (also called ‘Chinese sweet tea’) and Angelica keiskei have been traditionally used for a long time as practical edible and medicinal plants, which possess significant bioactivities such as alleviating hypoglycemia, lowering blood sugar, and anti-inflammatory and antiangiogenic activities (Liu et al., 2006; Koh et al., 2009; H. Zheng et al., 2019). The main active ingredients are the diterpenoid glycosides, dominated by rubusoside. The biosynthesis of rubusoside is similar to that of stevioside and starts with steviol (Fig. 2B). 
Six novel UGTs were characterized, which are involved in the addition of two glucose residues to produce the final product rubusoside. RsUGT75L20, RsUGT75T4, AkUGT75L21, and AkUGT75W2 can convert the aglycone steviol to steviol 19-O-β-d-glucopyranoside (S19G), and then RsUGT85A57 and AkUGT85A58 subsequently act as 13-O-glucosyltransferases converting S19G to rubusoside. With the functional UGTs, de novo rubusoside production in E. coli has been achieved using whole-cell biocatalysis (Sun et al., 2018). These studies provide examples of successful engineering of E. coli into a chassis cell for heterologous biosynthesis, and also lay the foundation for the biomanufacturing of important diterpenoid glycosides. Crocins have different structures (crocin I, II, III, IV, and V) depending on the number and location of their glycosyl groups, and a series of UGTs related to biosynthesis of crocins have been identified from crocin-producing plants (Nagatoshi et al., 2012; Demurtas et al., 2018; Xu et al., 2020; López et al., 2021).
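The two-step glucosylation from steviol to rubusoside described above can be captured as a small lookup table. This is an illustrative data structure only: the step table restates what the text reports, whereas the `route` helper is hypothetical and handles only linear, unbranched pathways.

```python
# (substrate, product) -> UGTs reported in the text for that step.
RUBUSOSIDE_STEPS = {
    ("steviol", "S19G"): ["RsUGT75L20", "RsUGT75T4", "AkUGT75L21", "AkUGT75W2"],
    ("S19G", "rubusoside"): ["RsUGT85A57", "AkUGT85A58"],
}


def route(start, target, steps):
    """Walk a linear pathway from start to target.

    Returns a list of (substrate, product, enzymes) tuples, or None
    if the target cannot be reached by following the listed steps.
    """
    path = []
    current = start
    seen = {start}
    while current != target:
        options = [(s, p) for (s, p) in steps if s == current and p not in seen]
        if not options:
            return None
        substrate, product = options[0]
        path.append((substrate, product, steps[(substrate, product)]))
        seen.add(product)
        current = product
    return path
```

Calling `route("steviol", "rubusoside", RUBUSOSIDE_STEPS)` lists the 19-O-glucosylation step followed by the 13-O-glucosylation step, each with its candidate enzymes.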

Fig. 2. Examples of the glycosylation of several medicinal terpenoids. (A) Non-volatile aromatic terpenoids. (B) Steviol glycosides. (C) Dammarane-type ginsenosides.
Triterpenoid UGTs
Ginsenosides, abundant in Panax species, have a wide range of pharmacological activities and excellent prospects for new drug development. Based on their aglycone backbone structures, ginsenosides are classified into three subgroups: dammarane type, ocotillol type, and oleanane type. Dammarane types are the most common ginsenosides and can be further classified into protopanaxadiol (PPD) type and protopanaxatriol (PPT) type. Some ginsenosides arise through the action of gut microbes after digestion. Ginsenoside Rh2 and compound K (CK), a partially deglycosylated protopanaxadiol, are involved in regulating inflammation by the phosphorylation and degradation of IκBα, while ginsenosides Rg1 and Rb1 have antiaging activities. Ginsenosides Rg3 and Rh2 have antitumor activities, and ginsenoside Re has antioxidation and heart protection effects (Choi et al., 2007; Yu et al., 2017; Kim, 2018). As illustrated in Fig. 2C, multiple UGTs are often involved in a single glycosylation step. UGTs catalyze the addition of saccharides to triterpenoid aglycones, mainly at the C3/C20 positions for PPD-type ginsenosides or at the C6/C20 positions for PPT-type ginsenosides. UGTs frequently prefer to act on a specific position; for example, UGTPg1 specifically glycosylates the C20 position to produce CK, Rd, and F1 (Yan et al., 2014). PgUGT74AE2 can transfer a glucose moiety from UDP-Glc to the C3 position of PPD and CK to form ginsenoside Rh2 and F2, respectively (Jung et al., 2014). PgUGT71A29 glycosylates C20–OH of ginsenosides Rh1 and Rd, producing PPT-type and PPD-type ginsenoside, respectively, which also seems to indicate that some UGTs are universal to PPD and PPT substrates (Lu et al., 2018). UGTPg101 acts on multiple positions of PPT (both C6–OH and C20–OH) and transforms PPT into ginsenoside Rg1 by a two-step glucosylation (Wei et al., 2015).
Although UDP-Glc is the most common donor, the transfer of other sugars (including rhamnose, xylose, and arabinose) to ginsenosides has been reported. D. Wang et al. (2020) first identified that Pn3-32-i5 uses UDP-Xyl as the donor to convert ginsenoside Rg1 to notoginsenoside R1. Recently, X. Li et al. (2021) successfully obtained 10 UGT94 enzymes from P. ginseng and P. notoginseng, all of which could transfer xylose to C6-O-Glc and convert ginsenoside Rg1/Rh1 to notoginsenoside R1/R2, respectively. Le et al. (2021) identified five UGTs of Gynostemma pentaphyllum involved in ginsenoside biosynthesis: two GpUGT71s that glycosylate the C20–OH of PPD- and PPT-type ginsenosides, GpUGT74AC3, which glycosylates the C3–OH of PPD-type ginsenosides, and two GpUGT94s that add a glucose to the C3-O-glucosides of PPD-type ginsenosides. PzOAGT1-3 from Panax zingiberensis and PjOAGT from Panax japonicus were found to catalyze the conversion of oleanolic acid to oleanolic acid 3-O-β-glucuronide, which is a major breakthrough in the biosynthesis of oleanane-type ginsenosides (Tang et al., 2019). PjmUGT1 and PjmUGT2 regiospecifically glycosylate the C-28 carboxyl and C-3 glucuronic acid groups of oleanolic acid 3-O-β-glucuronide, respectively, and participate in the production of chikusetsusaponin IVa, zingibroside R1, and ginsenoside Ro (Tang et al., 2021).
Mogrosides, a family of cucurbitane-type tetracyclic triterpenoid saponins, are widely used as natural sweeteners and possess various notable pharmacological activities. The mogrol aglycone is successively glycosylated at C3/C24 positions to yield mogrosides containing a different number of glucosyl groups. SgUGT74AC1 showed specificity for mogrol by transferring a glucose moiety to the C3 hydroxyl to produce mogroside IE (Dai et al., 2014). SgUGT720-269-1 first glycosylates the C24–OH of mogrol to produce mogroside I-A1, and then glycosylates C3–OH to produce mogroside IIE; SgUGT94-289-3 is responsible for the addition of multiple glucosyl groups, generating mogroside IIIx, mogroside IV-A, siamenoside, and mogroside V (Itkin et al., 2016). Oleanane-type and ursane-type pentacyclic triterpenoids are widely distributed in the plant kingdom and have remarkable biological activities, such as the anti-inflammatory and neuroprotective effects of glycyrrhizin (C. Wang et al., 2020) and the wound healing and cardiovascular disease treatment of asiaticoside (Lee et al., 2012; Razali et al., 2019). GuUGAT can catalyze the glycosylation of glycyrrhetinic acid to glycyrrhizin via a two-step glucuronosylation, while GuUGT73P12 can only catalyze the second step by transferring a glucuronosyl moiety of UDP-GlcA to glycyrrhetinic acid 3-O-monoglucuronide (Xu et al., 2016). CaUGT73AH1 can glycosylate the C28–COOH of asiatic acid to the corresponding monoglycoside, and CaUGT73AD1 can specifically glycosylate the C28–COOH of asiatic acid and madecassic acid. However, the UGTs that ultimately generate asiaticoside and madecassoside require further study (de Costa et al., 2017; Kim et al., 2017). IaUGT74AG5 can glycosylate the C28–COOH of ursolic acid to form ursolic acid 28-O-β-d-glucopyranoside, and has regiospecificity for catalyzing the 28-O-glucosylation of oleanolic acid, hederagenin, and ilexgenin A to produce their corresponding monoglycosides (Ji et al., 2020). 
Furthermore, SoUGT74BB2 from spinach acts directly on medicagenic acid 3-O-glucuronic acid to link d-fucose to the carboxyl group at the C28 position of the aglycone, giving rise to Yossoside I; this is the only report of a d-fucosyltransferase in plants, presumably using UDP-d-fucose as substrate. Subsequently, SoUGT79K1, SoUGT79L2, and SoUGT73BS1 use UDP-Rha, UDP-Glc, and UDP-Xyl, respectively, as sugar donors to obtain a series of spinach saponins with antibacterial and insecticidal activities (Jozwiak et al., 2020).
Research strategies for identifying medicinal terpenoid UGTs in plants
The screening and functional verification of candidate UGTs usually require numerous studies. Several continuously updated databases covering plant genomes, transcriptomes, and metabolites, such as MPOD (He et al., 2022), TCMPG (Meng et al., 2022), and 1K-MPGD (Su et al., 2022), greatly facilitate the screening of potential UGTs. Candidate screening strategies include transcriptome analysis, gene-to-metabolite network analysis, chemical proteomics analysis, and genome and gene cluster screening, which are generally used in combination with each other. Validation of candidate UGTs mainly utilizes microbial expression systems, both prokaryotic and eukaryotic, as well as plant expression systems, as shown in Fig. 3.

Fig. 3. Research strategies for identifying medicinal terpenoid UGTs in plants.
Transcriptome analysis
With the rapid development of high-throughput sequencing technology, transcriptome studies of plants have progressed remarkably, and increasing numbers of researchers have started to publish omics data (Wang et al., 2018; He et al., 2022). Currently, transcriptome sequencing combined with bioinformatics analysis is one of the most common and efficient methods to screen UGT genes. Based on the phylogenetic analysis, secondary structure prediction, and multiple sequence alignment of the Tripterygium wilfordii transcriptome, Ma et al. (2019) cloned a full-length gene encoding the glycosyltransferase TwUGT1, and in vitro enzyme activity assays confirmed its catalytic activity with triptophenolide, liquiritigenin, pinocembrin, 4-methylumbelliferone, phloretin, and rhapontigenin. Following annotation and KEGG pathway analysis, Cheng et al. (2020) were able to identify 64 unigenes related to triterpenoid skeleton biosynthesis and 122 UGTs from the transcriptome of Aralia elata. Obviously, the number of putative UGTs screened by gene annotation is quite large. The expression patterns of genes involved in the same biosynthetic pathway are often strongly correlated with each other in particular organs or under specific environmental conditions (Fukushima et al., 2011). Co-expression analysis and differential expression analysis are powerful tools to identify functional genes involved in plant secondary metabolism, greatly reducing the scope of candidate UGTs. For example, 434 putative UGTs from G. uralensis were screened by deep transcriptome sequencing and gene annotation. Through co-expression and differential expression analysis with the βAS (beta-amyrin synthase) gene, the key role of GuUGAT in the biosynthesis of glycyrrhizin was finally confirmed (Xu et al., 2016). Analogously, 17 candidate UGT genes most probably involved in the triterpenoid saponin biosynthesis pathway were discovered from the transcriptome sequencing of Platycodon grandiflorum (Ma et al., 2016). 
Transcriptome sequencing of Andrographis paniculata under methyl jasmonate (MeJA) treatment was performed by multiple research teams. Several candidate ApUGTs whose expression patterns were consistent with neoandrographolide accumulation were selected. Among these candidates, ApUGT73AU1 was demonstrated to convert andrograpanin to the bioactive neoandrographolide by glucosylation (Sun et al., 2019), and ApUGT and ApUGT86C11 were verified as glucosyltransferases that could glucosylate the C19 hydroxyl moiety of andrograpanin, 14-deoxyandrographolide, andrographolide, and 14-deoxy-11,12-didehydroandrographolide in vitro (Y. Li et al., 2018; Srivastava et al., 2021).
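The co-expression screening described in this section can be sketched as a correlation filter against a bait gene of the pathway (for example, a βAS gene). The sketch below assumes expression values are already normalized per sample; the function names, the Pearson statistic, and the 0.9 cut-off are illustrative choices and are not taken from the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_candidates(bait, candidates, min_r=0.9):
    """Return (name, r) pairs for candidates correlated with the bait profile.

    bait       -- expression of the bait gene across samples
    candidates -- dict mapping candidate UGT name to its expression profile
    min_r      -- hypothetical correlation cut-off
    """
    hits = [(name, pearson(bait, profile)) for name, profile in candidates.items()]
    hits = [(name, r) for name, r in hits if r >= min_r]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In practice, such a filter is combined with differential expression across tissues or treatments, as in the MeJA experiments above, before candidates are cloned and assayed.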
Gene-to-metabolite network analysis
Gene-to-metabolite network analysis integrates metabolomics and transcriptomics. The key UGT genes related to the differential metabolites can be found by constructing the regulatory network of the transcriptome and metabolome. This integrative network controls plant metabolism at both the catalytic and regulatory levels and reveals the correlation between gene expression and metabolite accumulation (Hirai et al., 2005; Goossens, 2015). At present, gene and metabolite correlation analysis has emerged as a strategy for screening new functional genes involved in plant secondary metabolism and has the potential to reveal candidate UGT genes for modifying terpenoids (Zhang et al., 2018). Zhao et al. (2020) used metabolite–gene correlation analysis to screen the sesquiterpenoid UGTs. The correlation between the 218 putative UGTs and the content of nerolidol glucoside in C. sinensis was analyzed to narrow down the number of assumed UGTs. Six UGTs showed a significant positive correlation with the formation of nerolidol glucoside. In Paris polyphylla, six candidate UGTs were identified in the polyphyllin correlation matrix, which might be involved in polyphyllin-specific glycosylations (Li et al., 2020a, Preprint).
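The metabolite–gene correlation step can be illustrated with a rank-based (Spearman) correlation between each putative UGT and the content of a target glycoside across samples. This is a minimal sketch: the cited studies do not specify their statistic here, so the choice of Spearman correlation and all function names are assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_ugts_by_metabolite(expression, metabolite):
    """Sort UGTs by correlation with metabolite content, highest first."""
    scores = {name: spearman(profile, metabolite) for name, profile in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

UGTs at the top of the returned list are candidates for functional testing; a significance test and multiple-testing correction would normally be applied before narrowing the list.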
Chemical proteomics analysis
Chemical proteomics is a method for identifying target proteins by the interaction between small molecules and specific enzymes, and generally relies on MS for analysis (Drewes and Knapp, 2018; Zhu et al., 2019). Glycosyltransferase activity happens to be a specific interaction between UGTs and small molecule substrates (UDP sugars). An important tool in chemical proteomics is based on well-designed activity-based probes, which contain three parts: a reactive ligand, a linker region, and a reporter group. Only active enzymes will react with the ligand and therefore be identified (Savino et al., 2012). A photoaffinity probe containing both a terminal alkyne and an alkyl diazirine group was developed for the specific labeling and proteome-wide profiling of UGTs involved in steviol glycoside biosynthesis in S. rebaudiana. This platform could be a rapid tool for mining potential UGTs and, through this platform, SrUGT76G3 was identified as having the potential to glycosylate terpenes, xanthone, and flavone (W. Li et al., 2018). Zhou et al. (2018) established a steviol-derived photoaffinity alkyne-tagged probe to detect glycosyltransferase activity, and then discovered the corresponding UGT of steviol in Arabidopsis thaliana, which illustrates a straightforward strategy for rapidly screening potential target proteins.
Genome and gene clusters analysis
In recent years, the advances in next-generation sequencing technology have transformed whole-genome sequencing, particularly for plants. Genome mining accelerates the identification of the underlying enzymes for biosynthesis of terpenoid glycosides. Bioinformatics tools, such as BLAST and HMMER, can be used for homology analysis of the plant genome to seek out functional genes. Three terpenoid UGTs were characterized in the genome of the ‘Valencia’ sweet orange (Citrus sinensis L. Osbeck) (Fan et al., 2010). Jiang et al. (2021) constructed a high-quality chromosome-level genome and identified five UGT genes involved in the notoginsenoside biosynthetic pathway. As more genomes are being published, there are numerous examples from diverse plant species in which the genes encoding the biosynthetic pathways of certain natural products are chromosomally clustered (Chu et al., 2011; Nützmann et al., 2016; Smit and Lichman, 2022). It has become apparent that the phenomenon of gene aggregation is not rare or exceptional, with >30 plant biosynthetic gene clusters (BGCs) having been identified (Polturak and Osbourn, 2021). BGCs reported to date range in size from ~35 kb to several hundred kilobases and consist of 3–12 genes. These clusters contain genes for a particular molecular scaffold and various types of tailoring enzymes such as cytochrome P450s (CYPs), glycosyltransferases, methyltransferases, and acyltransferases (Nützmann et al., 2018). Oats (Avena spp.) synthesize the antimicrobial triterpenoid saponins avenacins in the roots, and the biosynthesis of these compounds is encoded by a gene cluster. Previous studies showed that five of the characterized avenacin biosynthesis genes (βAS, CYP51H10, SCPL1, MT1, and AsUGT74H5) are contiguous on an ~300 kb chromosome contig (Mugford et al., 2013; Owatworakit et al., 2013). Studies of the sequenced A. strigosa genome further revealed that the other three characterized avenacin pathway genes (AsAAT1, CYP72A475, and AsUGT74H7) are clustered with these five genes (Owatworakit et al., 2013; Louveau et al., 2018; Leaveau et al., 2019). Two additional genes encoding cytochrome P450 enzymes (CYP94D65 and CYP72A476) are also located on this scaffold and are co-expressed with the other avenacin pathway genes (Y. Li et al., 2021). AsTG1 and AsUGT91G16 are located within the extended gene cluster, and together realize the glycosylation of the antifungal triterpenoid saponin avenacin A-1 (Orme et al., 2019). This 12-gene avenacin cluster contains genes encoding multiple different types of enzymes involved in the biosynthesis of avenacin and lies in a subtelomeric region of chromosome 1. Interestingly, the gene order of the cluster approximates the order of the biosynthetic steps in the pathway (Y. Li et al., 2021).
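The idea of screening a genome annotation for physically clustered pathway genes can be sketched as a simple proximity grouping. The sketch assumes that genes have already been assigned to enzyme families (for example, by BLAST or HMMER searches). The `Gene` record, the family labels, and the 100 kb gap threshold are hypothetical; only the minimum of three genes per cluster follows the BGC sizes quoted above. Dedicated cluster-mining tools use more elaborate rules.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # bp
    end: int     # bp
    family: str  # enzyme family label, e.g. "OSC", "CYP", "UGT"


def find_clusters(genes, pathway_families, max_gap=100_000,
                  min_genes=3, min_families=2):
    """Group pathway-family genes lying close together on one chromosome.

    Consecutive selected genes are merged into one group while the gap
    between them is at most max_gap bp. Groups are kept only if they
    contain at least min_genes genes from at least min_families families.
    """
    selected = sorted(
        (g for g in genes if g.family in pathway_families),
        key=lambda g: (g.chrom, g.start),
    )
    groups, current = [], []
    for gene in selected:
        if current and (gene.chrom != current[-1].chrom
                        or gene.start - current[-1].end > max_gap):
            groups.append(current)
            current = []
        current.append(gene)
    if current:
        groups.append(current)
    return [
        group for group in groups
        if len(group) >= min_genes
        and len({g.family for g in group}) >= min_families
    ]
```

UGTs that fall inside such a group alongside a scaffold-forming enzyme and CYPs become priority candidates, in the same way that AsUGT74H5 and AsUGT91G16 sit within the avenacin cluster.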
Microbial expression of heterologous genes
Escherichia coli and Saccharomyces cerevisiae have typically been employed for identifying the biochemical characteristics of functional genes related to terpenoid glycoside biosynthesis because of their simple genetic manipulation, clear genetic background, rapid growth, and mature large-scale fermentation processes. Escherichia coli is well suited to overproduction of enzymes (Cravens et al., 2019) and is usually adopted for in vitro enzymatic reaction experiments; that is, adding exogenous UDP sugar and substrate to the purified UGT recombinant protein and testing the catalytic function and conversion rate to identify the gene’s function. Itkin et al. functionally expressed ~100 UGT genes in E. coli and tested for the possible activity with mogroside substrates in vitro, which successfully elucidated the biosynthetic pathway of mogroside V (Itkin et al., 2016). UGTs tend to be more promiscuous in the selection of sugar donors and recipients, and the in vitro assays of recombinant UGT proteins allow convenient and rapid identification of different sugar donors and substrates. Using UDP-Glc as the sugar donor, the activity of purified UGT73F17 with a library of 63 structurally diverse substrates was tested, and its strict substrate specificity for the C30/29–COOH of pentacyclic triterpenoids was confirmed (He et al., 2018).
In vitro functional verification of candidate enzymes usually requires the feeding of appropriate substrates. However, some of the substrates are either extremely expensive or not commercially available, which limits the functional characterization of UGTs (Facchini et al., 2012). To solve this problem, S. cerevisiae, which can be engineered to produce the corresponding substrate and grow with simple carbon sources, can be used as a host organism for in vivo characterization of candidate terpenoid UGTs (Dai et al., 2015). Saccharomyces cerevisiae also has the advantage of being able to secrete the resulting metabolic products to reduce toxicity. Furthermore, large-scale fermentation of S. cerevisiae is simple and low cost, and has excellent economic potential (Chu et al., 2020). Squalene-2,3-oxide, which S. cerevisiae produces through its native MVA pathway, is the precursor of ginsenosides, so this yeast is widely used in the characterization of ginsenoside-related UGTs and for the biosynthesis of valuable ginsenosides (Ren et al., 2017). Multiple UGTs involved in ginsenoside glycosylation have been successfully verified in yeast. Recombinant yeast has been used to identify UGTs associated with ginsenosides F1, Rh1, Rh2, and Rg3, and simultaneously to achieve heterologous biosynthesis of these saponins (Yan et al., 2014; Wang et al., 2015; Wei et al., 2015). D. Wang et al. (2020) created a series of yeast strains capable of producing expensive intermediates (PPD, PPT, ginsenoside Rg1, Rh2, Rd, and UDP-Xyl). With the help of this plug-and-play synthetic biology platform, five key enzymes involved in the glycosylation of saponins in P. notoginseng were successfully elucidated. X. Li et al. (2021) constructed and optimized a yeast cell factory in which the de novo production of ginsenoside Rg1 and notoginsenoside R1 reached 1.95 g l–1 and 1.62 g l–1, respectively.
In planta expression system of heterologous genes
Among higher plants, the protein synthesis machinery, post-translational modifications, cellular compartments, and protein targeting are all conserved. Furthermore, general substrates such as diverse UDP sugars are present in all plants, whereas most microbes have a much more limited set of potential substrates for plant enzymes. Unlike microbial expression systems, in planta expression systems offer several advantages, such as the capacity to produce correctly folded and active enzymes and to express native plant genes without changing the coding sequence. Nicotiana benthamiana, as a model plant, can be a powerful tool when used in combination with transient expression to verify the function of candidate terpenoid biosynthesis-related genes (Sayama et al., 2012; Brückner and Tissier, 2013; Louveau and Osbourn, 2019). Transient overexpression of candidate OfUGTs in tobacco indicated that OfUGT85A84 glycosylates linalool oxides in planta (R. Zheng et al., 2019). Furthermore, transient expression in N. benthamiana can be scaled up by vacuum infiltration of whole plants (Reed et al., 2017). The function of key genes in biosynthetic pathways can also be clarified using Agrobacterium-mediated stable expression systems. PgUGT71A29 was integrated into the P. ginseng genome and overexpressed. Consistent with PgUGT71A29 overexpression, ginsenoside Rb1 and Rg1 contents were both higher than those in wild-type lines (Lu et al., 2018). Overexpression of Pq3-O-UGT1 led to increased accumulation of Pq3-O-UGT1 mRNA and PPD-type ginsenosides in transgenic hairy roots, whereas RNAi against Pq3-O-UGT1 resulted in decreased Pq3-O-UGT1 transcript levels and a lower content of the ginsenoside Rh2 (Lu et al., 2017a). However, plants grow more slowly than microbial hosts, and many plants lack convenient stable genetic engineering methods (George et al., 2018). Fortunately, genome editing techniques, such as clustered regularly interspaced short palindromic repeats (CRISPR), enable stable knockouts of plant UGT genes. CRISPR knockout mutant lines of AtUGT76B1 show impaired conversion of N-hydroxy-pipecolic acid (NHP) to NHP-O-Glc, which enhances disease resistance against biotrophic pathogens (Mohnike et al., 2021).
Evolution of medicinal terpenoid UGTs
Plants synthesize plentiful secondary metabolites to interact with their living environment and to adapt to high-intensity competition. The number of UGTs has generally increased during plant evolution. Bryophytes have more putative UGTs than algae, and the UGTs of ferns and seed plants are further expanded, suggesting that the adaptation of plants to life on land is related to the expansion of this multigene family (Caputi et al., 2012; Wilson and Tian, 2019). Considering the diversity of sequences and functions, the evolutionary relationship of UGTs has garnered much attention. Most of the current evolutionary studies of UGTs focus on certain species or certain functions (Liang et al., 2015). AtUGTs were divided into 14 phylogenetic groups (A–N) according to sequence homology. Here, several AtUGTs and the characterized functional medicinal terpenoid UGTs from different species were selected to jointly construct a phylogenetic tree (Fig. 4A). Five phylogenetic groups, A, D, E, G, and L, seem to have expanded more than the others during the evolution of higher plants, which is consistent with the larger groups in this evolutionary tree (Caputi et al., 2012).

Fig. 4. Evolutionary analysis of characterized plant glycosyltransferases. (A) The phylogenetic tree was generated from UGTs of A. thaliana and other plants that produce medicinal terpenoids such as P. ginseng, M. truncatula, B. vulgaris, S. rebaudiana, and G. uralensis by ClustalW with the Neighbor–Joining method. Different families of UGTs are color-coded, as shown in the inset. (B) Rationally designed directed evolution of plant UGTs.
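The Neighbor–Joining method named in the caption can be sketched in a few lines of Python. This is a minimal illustration of the general algorithm under stated assumptions, not the ClustalW implementation used to build the tree: the taxon labels and the distance matrix are a hypothetical toy example, whereas a real analysis would start from distances estimated from a multiple sequence alignment of UGT sequences.

```python
from itertools import combinations


def neighbor_joining(names, pair_dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names:     list of taxon labels
    pair_dist: dict mapping (label_a, label_b) tuples to pairwise distances
    """
    d = {frozenset(pair): value for pair, value in pair_dist.items()}
    nodes = list(names)

    def get(a, b):
        return 0.0 if a == b else d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(get(a, k) for k in nodes) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * get(p[0], p[1]) - r[p[0]] - r[p[1]],
        )
        dij = get(i, j)
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        rest = [k for k in nodes if k not in (i, j)]
        # Distances from the new node to every remaining node.
        for k in rest:
            d[frozenset((u, k))] = 0.5 * (get(i, k) + get(j, k) - dij)
        nodes = rest + [u]

    a, b = nodes
    return f"({a}:{get(a, b):.3f},{b}:0.000);"


# Hypothetical toy distances between five taxa (not real UGT data).
toy_names = ["a", "b", "c", "d", "e"]
toy_dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}
toy_tree = neighbor_joining(toy_names, toy_dist)
print(toy_tree)
```

The printed Newick string can be loaded into any tree viewer. In the toy matrix, taxa a and b are joined first because their pair has the lowest Q value, even though d and e have the smallest raw distance, which is the behavior that distinguishes neighbor joining from simple nearest-pair clustering.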
Group A enzymes include three UGT families: UGT91, UGT94, and UGT79. Notably, every group A enzyme that has been characterized so far extends a sugar chain on a plant specialized metabolite (including terpenoids, flavonoids, etc.), and members of all three group A UGT families mentioned here are involved in triterpene biosynthesis. Several genes in group D (UGT73s and UGT99s) have similar activities. Soybean has been broadly studied as a model plant for triterpenoid saponin biosynthesis, and the glycosylation process of soybean saponins is a good example, indicating that the screening of UGTs involved in the biosynthesis of terpenoid glycosides might be focused on groups A and D. GmUGT73P2 transfers a galactosyl group from UDP-Gal to soyasapogenol B monoglucuronide to obtain soyasaponin III with a disaccharide, and then GmUGT91H4 further transfers a rhamnosyl group to generate soyasaponin I with a trisaccharide. These UGTs were the first examples of enzymes that transfer the second and third sugars to a biosynthetic intermediate of a triterpenoid saponin (Shibuya et al., 2010). GmUGT73P10 uses the same acceptor as GmUGT73P2, but uses UDP-arabinopyranose as the sugar donor (Takagi et al., 2018).
Highly similar UGTs may have different functions. Gene duplication can retain paralogous genes with the same function, and paralogous copies might acquire better or novel functions over time (Caputi et al., 2012). A small number of key amino acid residues may play a decisive role in the choice of substrates, which may be related to the fact that UGTs evolved from an ancestral UGT and rapidly evolved to form sugar donor or substrate specificity (Sayama et al., 2012; Louveau and Osbourn, 2019). GmUGT73F4 and GmUGT73F2, which are alleles of a single locus and encode closely related enzymes with 98.3% identity, exhibit distinct sugar donor specificity for UDP-Xyl and UDP-Glc, respectively, with Ser138 and Gly138 as the pivotal residues. These residues are in the N5 loop and are also predicted to interact with sugar donors in the crystal structures of MtUGT71G1 and MtUGT85H2, supporting the importance of the N5 loop region in UDP sugar specificity (Osmani et al., 2009; Sayama et al., 2012). Conversely, UGTs that have no close evolutionary relationship might have evolved the same biochemical function through convergent evolution. Figures 2B and 4A show that the SrUGT74G1 and UGT75 enzymes, which are not closely related in sequence, have the same activity for glycosylation of steviol (Richman et al., 2005; Sun et al., 2018). A similar evolutionary trend has been observed for other gene families involved in the biosynthesis of secondary metabolites, such as cytochrome P450-dependent monooxygenases, sesterterpene synthases, acyltransferases, and terpene synthases (Banks et al., 2011).
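Comparisons of closely related UGTs, such as the allelic pair described above, typically begin with percentage identity and a list of the positions at which the aligned sequences differ. The following minimal sketch assumes that the two sequences are already aligned to the same length, with "-" marking gaps. The fragments are hypothetical and are not the real GmUGT73F4 or GmUGT73F2 sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def differing_positions(a: str, b: str):
    """Return (column, residue_a, residue_b) for every differing column.

    Columns are 1-based alignment positions; gap columns count as differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Hypothetical aligned fragments differing at a single Ser/Gly position.
frag_a = "MKTSLAVHPG"
frag_b = "MKTGLAVHPG"

print(percent_identity(frag_a, frag_b))     # 90.0
print(differing_positions(frag_a, frag_b))  # [(4, 'S', 'G')]
```

The short list of differing columns is the natural starting point for site-directed mutagenesis aimed at identifying residues that determine sugar donor specificity.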
UGTs can usually function with multiple acceptor substrates and sugar donors. This promiscuity makes them well suited to the optimization of enzyme activity by rationally designed directed evolution (Fig. 4B). Rational design, irrational design (directed evolution), and semi-rational design are the three major strategies to enhance enzyme activity. Rational design requires an in-depth knowledge of the molecular basis of the protein’s properties, and the complexity of the structure–function relationship in enzymes limits its general application (Chica et al., 2005). Computational modeling of the canonical/variant GuUGT73P12 proteins clearly suggests that only Arg/Ser32 can directly access the sugar moiety of UDP sugars. Replacing Arg32 with Ser32 by site-directed mutagenesis gave a variant that displayed 7- and 73-fold increases in the consumption of UDP-Glc and UDP-Gal, respectively, indicating the plasticity of GuUGT73P12 in UDP sugar specificity (Nomura et al., 2019). Directed evolution is a powerful strategy to modify the function of UGTs without crystal structures. Wang et al. (2019) screened for more effective and compatible UGTs from mutants derived from the directed evolution of UGTPg45. In addition, they increased the copy number of the UGTPg45 gene and engineered its promoter to improve its catalytic efficiency. Using these strategies, the yields of ginsenoside Rh2 reached 179.3 mg l–1 in shake flasks and 2.25 g l–1 in 10 liter fed-batch fermentation. Compared with the other two strategies, semi-rational design guided by structural information reduces the size of the mutant library and greatly increases the possibility of obtaining positive mutants. The semi-rationally designed ScUGT51 mutant showed a significantly increased catalytic efficiency for ginsenoside Rh2 synthesis, and its kcat/Km value was increased by ~1800-fold (Zhuang et al., 2017).
Error-prone PCR, structure-based semi-rational design, and activity-based sequence conservative analysis approaches have been used to improve the activity of SgUGT74AC1 (Li et al., 2020b). Three characterized UGTs were further engineered and introduced into yeast; mogrosides IE, IIE, IIIA, and IVA were produced at a total of 79.30 mg l–1 (Li et al., 2022).
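The benefit of restricting mutagenesis to a few structure-guided positions can be made concrete with a short calculation. The sketch below is a generic illustration and is not taken from the cited studies. It assumes saturation mutagenesis with NNK degenerate codons (32 codons per site) and equal representation of every variant, and uses the standard sampling estimate for the number of clones that must be screened to observe a given variant with a chosen probability.

```python
import math


def nnk_library_size(n_sites: int) -> int:
    """Number of distinct codon combinations when n_sites are randomised with NNK."""
    return 32 ** n_sites


def clones_for_coverage(library_size: int, coverage: float = 0.95) -> int:
    """Clones to screen so that any given variant is sampled with probability `coverage`.

    Assumes all variants are equally likely (an idealisation).
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be between 0 and 1 (exclusive)")
    if library_size <= 1:
        return 1
    return math.ceil(math.log(1.0 - coverage) / math.log(1.0 - 1.0 / library_size))


for sites in (1, 2, 3, 4):
    size = nnk_library_size(sites)
    print(f"{sites} site(s): {size} variants, "
          f"~{clones_for_coverage(size)} clones for 95% coverage")
```

Under these assumptions the screening effort is roughly three times the library size, so each additional randomised position multiplies the workload by 32. This is why structural information that narrows the choice of positions greatly improves the chance of finding positive mutants with a practical screening capacity.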
Perspectives
UGTs comprise a large group of enzymes with complex preferences for acceptor and sugar donors. In recent years, with the rapid development of synthetic biology, metabolomics, and high-throughput sequencing technology, the genome and transcriptome information of non-model plants has been continuously analyzed, and plant UGT research has made great progress. Summarizing the characteristics of known terpenoid UGTs is of great theoretical and practical significance to the basic research of medicinal terpenoids.
Despite the acquisition of novel efficient key enzymes, there are some problems in the application of plant UGTs, including narrow substrate selectivity, low catalytic activity in engineered microorganisms, and the fact that they do not display the full range of desirable properties for the biosynthesis of target products (McArthur and Chen, 2016). Due to the lack of sufficient structural information, the determinants of UGT substrate specificity are difficult to understand. Artificial intelligence (AI)-based protein structure prediction tools, such as AlphaFold2, generate structures with accuracy comparable with that of experimentally determined crystal structures (Senior et al., 2020). This removes the limitation imposed by unavailable 3D structures and improves the chances of enhancing the catalytic activity of wild-type UGTs by protein engineering. Although the industrial production of medicinal terpenoid glycosides is still in its infancy, with the continued analysis of medicinal terpenoid biosynthetic pathways and more extensive research into UGTs, the use of synthetic biology and metabolic engineering techniques to produce medicinal terpenoid glycosides has gradually become feasible. Recent achievements in heterologous production of terpenoid glycosides, especially the great breakthroughs in the biosynthesis of ginsenosides and natural sweeteners, are shown in Supplementary Table S2.
Another interesting phenomenon is that the glycosylation of plant secondary metabolites is not exclusively executed by enzymes in GT family 1, and there are a few examples from other carbohydrate-active enzyme families. Unlike classical UGTs, AsTG1 is an unusual vacuolar transglucosidase in glycosyl hydrolase family 1 (GH1) of the carbohydrate-active enzymes, and can use acyl sugars as sugar donors, which provides evidence for the involvement of GH1 enzymes in triterpenoid glycosylation (Orme et al., 2019). Chung et al. (2020) reported that a cellulose synthase superfamily-derived glycosyltransferase (CSyGT) catalyzed the 3-O-glucuronosylation of triterpenoid aglycones. Almost simultaneously, another enzyme (SOAP5) of this superfamily catalyzing the transfer of GlcA to the C3 position of triterpenoids was identified in spinach (Jozwiak et al., 2020). These non-GT1 enzymes independently evolved to glycosylate terpenoids, which indicates a broader catalytic mechanism in the glycosylation of medicinal terpenoids. This review provides a reference for the discovery of new UGTs in the biosynthesis of medicinal terpenoids, and hopefully will help to advance the more effective production of medicinal terpenoid glycosides in the future.
Supplementary data
The following supplementary data are available at JXB online.
Table S1. Medicinal terpenoid UGTs in plants.
Table S2. Recent achievements in heterologous production of terpenoid glycosides.
Acknowledgements
Thanks are due to Professors Xiaojian Yin and Ping Li for their guidance and revision of the manuscript.
Author contributions
XL and LH: drafting and writing the manuscript; HVS and JDK: suggesting revisions of the manuscript.
Conflict of interest
We declare that all the authors have no conflicts of interest.
Funding
This work was supported by a grant from the GSK project and the National Natural Science Foundation of China [grant no. 81973414].
Data availability
No new data were generated in this paper.
References
Author notes
These authors contributed equally to this work.