Cláudia P Godinho, Margarida Palma, Jorge Oliveira, Marta N Mota, Miguel Antunes, Miguel C Teixeira, Pedro T Monteiro, Isabel Sá-Correia, The N.C.Yeastract and CommunityYeastract databases to study gene and genomic transcription regulation in non-conventional yeasts, FEMS Yeast Research, Volume 21, Issue 6, September 2021, foab045, https://doi.org/10.1093/femsyr/foab045
ABSTRACT
Responding to the recent interest of the yeast research community in non-Saccharomyces cerevisiae species of biotechnological relevance, N.C.Yeastract (http://yeastract-plus.org/ncyeastract/) was associated with YEASTRACT + (http://yeastract-plus.org/). The YEASTRACT + portal is a curated repository of known regulatory associations between transcription factors (TFs) and target genes in yeasts. N.C.Yeastract gathers all published regulatory associations and TF-binding sites for Komagataella phaffii (formerly Pichia pastoris), the oleaginous yeast Yarrowia lipolytica, the lactose-fermenting species Kluyveromyces lactis and Kluyveromyces marxianus, and the remarkably weak acid-tolerant food spoilage yeast Zygosaccharomyces bailii. The objective of this review paper is to present the update of the information made since the release of N.C.Yeastract in 2019, and to raise awareness in the community about its potential to help the day-to-day work on these species, exploring all the information available in the global YEASTRACT + portal. Using simple and widely used examples, a guided exploitation is offered for several tools: (i) inference of orthologous genes; (ii) search for putative TF binding sites and (iii) inter-species comparison of transcription regulatory networks and prediction of TF-regulated networks based on documented regulatory associations available in YEASTRACT + for well-studied species. The potential uses of the new CommunityYeastract platform by the yeast community are also discussed.
INTRODUCTION
The interest of the yeast research community in non-Saccharomyces cerevisiae species is gaining momentum, although S. cerevisiae remains the major eukaryotic model and the most used yeast cell factory. The large and heterogeneous group of non-conventional yeasts includes species/strains with desirable properties for utilization in the biotechnology industry, including the synthesis of a wide range of added-value products due to unusual metabolic features, the capacity to use a wide range of carbon sources and a high tolerance to bioprocess-related stresses (Radecka et al. 2015; Martins et al. 2020). For all these reasons, non-conventional yeasts have recently become the focus of active research to better explore their biotechnological potential. Genome sequences are being released and exploited, suitable genetic engineering tools are either available or under development (Abdel-Banat et al. 2010; Gao et al. 2016; Weninger et al. 2016; Cao et al. 2017; Löbs et al. 2017; Schwartz et al. 2017), and genetic and metabolic engineering strategies are being applied to improve the performance of several non-conventional yeasts as cell factories. Most of the efforts envisage the efficient production of relevant bioproducts, the complete (co)-utilization of different carbon sources present in the hydrolysates of various residual biomasses and the enhancement of yeast tolerance to the multiple stresses associated with specific bioprocesses. However, studies on the molecular biology of non-conventional yeast species and their metabolic engineering face several challenges, such as the limited availability of stable expression plasmids and the lack of highly efficient approaches for foreign DNA integration into the host's genome. Nevertheless, the perspectives are encouraging, supported by the increasing availability of genome sequences and the development and availability of dedicated genome editing and bioinformatics tools.
Among these tools is N.C.Yeastract (http://yeastract-plus.org/ncyeastract/), which gathers all published regulatory associations and transcription factor (TF) binding sites for five non-conventional yeast species relevant in biotechnology (Monteiro et al. 2020): (i) Komagataella phaffii (formerly Pichia pastoris), a species recently gaining attention as a model organism in fundamental research, extensively applied for the production of heterologous proteins in the pharmaceutical and biotechnological industries and capable of efficiently assimilating methanol as a sole carbon and energy source (Zahrl et al. 2017; Bernauer et al. 2021); (ii) Yarrowia lipolytica, an oleaginous yeast (Zieniuk and Fabiszewska 2019; Mamaev and Zvyagilskaya 2021); (iii) Kluyveromyces lactis, a lactose-fermenting species (Spohner et al. 2016); (iv) Kluyveromyces marxianus, a thermotolerant yeast species with a high growth rate and the ability to use sugars such as lactose and inulin (Lane and Morrissey 2010) and (v) Zygosaccharomyces bailii, a remarkably weak acid-tolerant food spoilage yeast (Palma, Guerreiro and Sá-Correia 2018).
The N.C.Yeastract database was recently associated to the YEASTRACT + portal (http://yeastract-plus.org/), a curated repository of known regulatory associations between TFs and target genes in yeasts (Monteiro et al. 2020). YEASTRACT + is based on more than 2500 bibliographic references and dedicated to the analysis and prediction of transcription regulatory associations in yeasts (Monteiro et al. 2020). Currently, YEASTRACT + comprises three distinct yet interconnected databases: (i) the well-known and worldwide used YEASTRACT (http://yeastract-plus.org/yeastract/scerevisiae/) focused on the model yeast S. cerevisiae (Teixeira et al. 2006, 2018); (ii) PathoYeastract (http://yeastract-plus.org/pathoyeastract/) dedicated to Candida pathogenic species (Monteiro et al. 2017) and (iii) N.C.Yeastract dedicated to non-conventional yeasts of biotechnological relevance (Monteiro et al. 2020). CommunityYeastract (http://yeastract-plus.org/community/), also connected to the other databases in the YEASTRACT + portal, is a repository of automatically generated YEASTRACT-like databases for yeast species or strains that can be constructed according to the request of community members. Currently, it includes information on the oleaginous yeast species Rhodotorula toruloides (Oliveira et al. 2021) and the probiotic yeast S. cerevisiae var. boulardii (Pais et al. 2021).
In this minireview, a guide to the exploitation of the N.C.Yeastract database is offered, aimed at molecular biology studies and cross-species comparative genomics of transcription regulation in five non-conventional yeast species of relevance in biotechnology. Moreover, the regulatory information included in the database by the end of 2019 (Monteiro et al. 2020) is updated. The potential uses of CommunityYeastract by the yeast community are also discussed.
UPDATE OF THE INFORMATION IN N.C.YEASTRACT
Since the first release of N.C.Yeastract in the January 2020 Nucleic Acids Research Database Issue (Monteiro et al. 2020), 7688 additional regulatory associations were included in the database. Currently, N.C.Yeastract contains 17668 regulatory associations between TFs and target genes. This remarkable increase in the number of regulatory associations available in N.C.Yeastract for the five non-conventional yeast species indicates that they are also being studied at this level. For Ko. phaffii, 5771 regulatory associations were added to the database for the TFs already included; they are mostly based on indirect evidence, particularly coming from transcriptomic analyses (Fig. 1). Regulatory associations between 10 additional TFs and their target genes were included for Ko. phaffii. These TFs comprise, among others, Mig1 (PAS_chr1-4_0526 and PAS_chr4_0334) and Nrg1 (PAS_chr3_1242), involved in the repression of AOX1 gene transcription in the presence of glucose or glycerol (Wang et al. 2016a, 2017), Cat8 (PAS_chr2-1_0757), involved in the regulation of ethanol catabolism (Barbay et al. 2021), and the regulators Prm1 (PAS_chr4_0203), Mxr1 (PAS_chr4_0487) and Mit1 (PAS_chr3_0836), involved in methanol-inducible transcription (Wang et al. 2016b; Vogl et al. 2018). A total of six additional Kl. lactis TFs were also included in this update, such as Msn2 (KLLA0_F26961g), reported to be involved in stress-induced mating and mating-type switching (Barsoum, Rajaei and Åström 2011), Sum1 (KLLA0_C14696g), reported to repress cryptic mating-type loci, cell-type–specific genes and sporulation genes (Humphrey et al. 2020), and Upc2 (KLLA0_A04169g), which is necessary for maintaining sterol homeostasis and for the transcriptional regulation of several other genes in response to azoles (Humphrey et al. 2020). Novel regulatory associations for Y. lipolytica (+583 for five TFs), Kl. marxianus (+1144 for one TF) and Z. bailii (+16 for one TF) were also included in the database (Fig. 1). Most of the regulatory associations are based on indirect evidence, in general resulting from transcriptomic analyses (microarrays or RNA-seq). In these studies, the genome-wide transcript levels in strains in which the TFs are either deleted or overexpressed, under specific environmental conditions, are compared to those of the corresponding wild-type parental strain. The number of consensus sequences recognized as TF binding sites in the non-conventional yeasts included in N.C.Yeastract is still very low (between 0 and 3 TF binding sites per species), reflecting the very limited data available (Fig. 1).

Figure 1. Current update (June 2021). Number of regulatory associations included in N.C.Yeastract during this work, based on direct or indirect evidence, number of environmental conditions under which the reported regulatory associations were retrieved, number of TFs containing regulatory information and number of TFs with consensuses included in N.C.Yeastract. Light blue corresponds to the data included in the first release (Monteiro et al. 2020), and dark blue indicates the additions made during the current update.
INFERENCE OF ORTHOLOGS
The bioinformatics tool ‘Search for Orthologs’ is available in N.C.Yeastract to facilitate the identification of orthologous genes, i.e. genes in different species that evolved from a common ancestral gene, in the yeast species currently included in the database. Ortholog identification is essential for reliable functional annotation and for inferring regulatory networks for under-explored non-conventional yeasts, based on the knowledge gathered for more thoroughly studied species. Ortholog inference is here exemplified using the list of methanol tolerance genes identified by a chemogenomic analysis in the model yeast S. cerevisiae (Mota, Martins and Sá-Correia 2021). The genome-wide identification of these tolerance genes/gene orthologs is a starting point for the development of more robust yeasts, in particular robust methylotrophic yeast strains for biorefinery processes, since methanol is a promising feedstock for the production of biofuels, specialty chemicals, polymers and other added-value bioproducts due to its abundance and relatively low cost (Zhang et al. 2018; Fabarius et al. 2020; Frazão and Walther 2020; Zhu et al. 2020). Methanol is also the major impurity of crude glycerol, a biodiesel industry residue that can be considered a relevant substrate for biorefineries (Posada, Rincón and Cardona 2012), and is present in hydrolysates from pectin-rich agro-industrial residues, generated in high amounts worldwide as a result of the industrial processing of fruits and vegetables (Yapo et al. 2007; Müller-Maatsch et al. 2016; Martins et al. 2020). Although methanol is a promising carbon source for metabolically competent yeast strains, its toxicity can limit the productivity of methanol-based biomanufacturing (Fabarius et al. 2020). Therefore, the improvement of methanol tolerance is essential for the feasibility of methanol-based bioprocesses. The methylotrophic yeast Ko. phaffii was chosen to exemplify the inference of orthologs for a large dataset of S. cerevisiae genes since it is capable of catabolizing methanol for the production of added-value bioproducts (Luo et al. 2018; Peña et al. 2018; Guo et al. 2021), and of using crude glycerol as a substrate (Eda Çelik et al. 2008).
The inference of orthology in the YEASTRACT + portal uses a BLAST Reciprocal Best Score approach: the proteome of each species is used as the input for a BLASTp (E-value threshold of 1e-5) performed in a reciprocal way between all the species of the database. The BLAST hit with the highest score for each protein sequence is considered, but a tolerance of 10% relative to this highest score is applied to ensure that alignments with a score almost identical to the best are not lost. Only the gene pairs that are also the best score in the reverse BLAST are considered. This information can be complemented with synteny information by the automatic comparison of the genes adjacent to each homolog locus, where 15 neighbors are considered in each direction. Besides the BLAST best-score only level, three levels of synteny are considered in this tool, corresponding to at least one, two or three neighbor genes in common to each pair of homologous genes.
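The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the YEASTRACT + implementation: the hit tuples `(query, subject, score)` are assumed to be BLASTp results already filtered at the E-value threshold, the 10% tolerance is read as "score at least 90% of the best score for that query", each genome is simplified to a single ordered list of genes, and all function names and gene identifiers are hypothetical.

```python
from collections import defaultdict


def best_hits(hits, tolerance=0.10):
    """Map each query to the subjects scoring within `tolerance` of its best hit.

    `hits` is an iterable of (query, subject, score) tuples, assumed to be
    pre-filtered by E-value (e.g. 1e-5).
    """
    by_query = defaultdict(list)
    for query, subject, score in hits:
        by_query[query].append((subject, score))
    kept = {}
    for query, scored in by_query.items():
        top = max(score for _, score in scored)
        kept[query] = {s for s, score in scored if score >= (1 - tolerance) * top}
    return kept


def reciprocal_best_hits(forward, reverse, tolerance=0.10):
    """Keep pairs (a, b) where b is a best hit of a AND a is a best hit of b."""
    fwd = best_hits(forward, tolerance)
    rev = best_hits(reverse, tolerance)
    return sorted((a, b) for a, subjects in fwd.items()
                  for b in subjects if a in rev.get(b, ()))


def neighbours(order, gene, window=15):
    """Genes located up to `window` positions on either side of `gene`."""
    i = order.index(gene)
    return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])


def shared_neighbours(pair, order_a, order_b, orthologs, window=15):
    """Count neighbours of `a` that have an ortholog among the neighbours of `b`."""
    a, b = pair
    near_a = neighbours(order_a, a, window)
    near_b = neighbours(order_b, b, window)
    return sum(1 for x in near_a if any((x, y) in orthologs for y in near_b))


# Toy data with made-up identifiers (species A -> B and B -> A BLASTp hits).
forward = [("ScA", "KpA", 200.0), ("ScA", "KpA2", 185.0),
           ("ScA", "KpX", 90.0), ("ScB", "KpB", 150.0)]
reverse = [("KpA", "ScA", 198.0), ("KpA2", "ScC", 300.0),
           ("KpA2", "ScA", 100.0), ("KpB", "ScB", 149.0)]

pairs = reciprocal_best_hits(forward, reverse)
level = shared_neighbours(("ScA", "KpA"),
                          ["ScB", "ScA", "ScC"], ["KpA", "KpQ", "KpB"],
                          set(pairs))
print(pairs, level)
```

In this toy case `KpA2` passes the forward tolerance for `ScA` but is discarded because its own best hit is `ScC`, which is the behaviour the reciprocal filter is meant to capture. A pair would then satisfy the "at least one neighbor" level when `shared_neighbours` returns 1 or more, and so on for two and three neighbors.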
From the user's point of view, the first step is to select the tool ‘Search for Orthologs’, which is available on the panel ‘Utilities’ on the left side of the YEASTRACT page (Fig. 2, step 1). The following steps include the insertion of the gene list that will be used for the search of orthologs (Fig. 2, step 2) and the selection of the synteny level, which can range from only considering homology (lowest stringency) to the conservation of at least three neighbors (highest stringency; Fig. 2, step 3). Almost instantaneously after clicking the Run button (Fig. 2, step 4), a table with the correspondences between the ORF/gene names of S. cerevisiae and Ko. phaffii is provided (Fig. 2, step 5). Using as input the list of 402 genes in the methanol tolerance dataset (Mota, Martins and Sá-Correia 2021), 304 Ko. phaffii genes show homology to the input dataset: 202 of these genes with at least one gene neighbor, 144 with at least two neighbors and 96 with at least three neighbors.

Figure 2. Steps for using ‘Search for Orthologs’, applied to the orthologs in Ko. phaffii of S. cerevisiae methanol tolerance genes. Step 1: select ‘Utilities > Search for Orthologs’ on the left panel. Step 2: fill in the box with your input dataset. Step 3: choose the level of synteny (only homology or at least one, two or three neighbors). Step 4: run the tool. Step 5: a table with the correspondences between S. cerevisiae genes and the genes from yeast species included in the database, namely Ko. phaffii, is provided and the results are downloadable as a .csv file. The results shown in the figure are representative of the example explored in the main text.
From the input dataset, 12 S. cerevisiae genes were selected, belonging to the functional classes autophagy, DNA repair, reserve polysaccharides, cell wall and membrane biosynthesis, and transcription regulation, all found to contribute to methanol tolerance in S. cerevisiae (Mota, Martins and Sá-Correia 2021). The corresponding orthologs in Ko. phaffii (Table 1), identified through the procedure described above, are good candidates to improve methanol tolerance when overexpressed in Ko. phaffii.
Table 1. List of Ko. phaffii orthologs of methanol tolerance genes in S. cerevisiae, clustered in functional classes, and their synteny level. Results were obtained using YEASTRACT + and the selected S. cerevisiae gene dataset is from Mota, Martins and Sá-Correia (2021).
| Biological function in S. cerevisiae | ORF/Gene name in S. cerevisiae | ORF/Gene name in Ko. phaffii, only homology | ORF/Gene name in Ko. phaffii, at least one neighbor | ORF/Gene name in Ko. phaffii, at least two neighbors | ORF/Gene name in Ko. phaffii, at least three neighbors |
|---|---|---|---|---|---|
| Autophagy | ATG11 | PAS_chr1-4_0555 | | | |
| | RAS2 | PAS_chr4_0680 | | | |
| | VPS36 | PAS_chr3_0312 | | | |
| DNA repair | CDC55 | PAS_chr2-1_0831 | | | |
| | RAD27 | PAS_chr1-4_0633 | | | |
| | RAD51 | PAS_chr3_0904 | | | |
| Reserve polysaccharides, cell wall and membrane biosynthesis | ELO2 | PAS_chr3_0602 | | | |
| | FKS1 | PAS_chr2-1_0661 | | | |
| | TPS2 | PAS_chr1-3_0054 | | | |
| Transcription regulators | OPI1 | PAS_chr1-1_0033 | | | |
| | SFP1 | PAS_chr4_0169 | | | |
| | UME6 | PAS_chr4_0480 | | | |
| | | PAS_chr4_0252 | | | |
| | | PAS_chr1-4_0290 | | | |
SEARCH FOR PUTATIVE TF BINDING SITES
N.C.Yeastract allows the search for potential regulators of a given gene based on the occurrence of TF binding sites in its promoter region, set in the database as the 1000 bp upstream of the initiation codon. Since very few TF binding sites have so far been identified in the species considered in N.C.Yeastract, it is possible to select a TF binding site from another species deposited in YEASTRACT + and search for these regulatory elements in the promoters of specific genes. To illustrate this feature, the Haa1-TPO3 TF-target gene pair was considered. In S. cerevisiae, TPO3 encodes a plasma membrane transporter of the Major Facilitator Superfamily (MFS) proposed to mediate the efflux of acetate when cells are under acetic acid stress (Fernandes et al. 2005). This gene was identified among those regulated by the Haa1 TF, the main player in reprogramming yeast genomic expression in response to acetic acid stress (Mira, Becker and Sá-Correia 2010). Moreover, it has been demonstrated that Haa1 binds, in vivo, to the TPO3 promoter in S. cerevisiae (Mira et al. 2011). The occurrence of Haa1 binding sites in the promoter regions of TPO3 orthologs from the species present in N.C.Yeastract was examined in this example (Fig. 3). In N.C.Yeastract, the user can choose one of the species from the database, for example Z. bailii, and then select the tool ‘Cross species > Promoter analysis’ on the left panel (Fig. 3, step 1). Thereafter, the species from which the TF consensus sequence is to be used in the analysis should be selected (Fig. 3, step 2). The user can also choose yeast species other than Z. bailii for which the target genes will be analysed (Fig. 3, step 3). Finally, the user should input the target genes of interest (using the gene nomenclature corresponding to Z. bailii genes, in the provided example) and select the synteny level, as described in the previous example (Fig. 3, steps 4 and 5). By clicking ‘Search’ (Fig. 3, step 6) the user is directed to a new page where the TF binding sites, either unique or common in the gene ortholog promoters in the selected species, are listed in a table. By clicking on ‘Promoter’ on the right column of this table (Fig. 3, step 7), a figure illustrating the distribution of the TF binding sites in the promoter region of the gene orthologs under analysis is displayed. The user has the option to select all TFs or specific TFs (Fig. 3, step 8). In this example, Haa1 binding sites are identified in the promoter of TPO3 orthologs in all species analyzed except for Y. lipolytica. In addition, a table including the exact TF binding sequences considered and their exact position and strand within the promoter region is also shown. It must be stressed that TF binding sites also vary between species, so a consensus defined in one species may not be recognized in another. For this reason, the results must be carefully analyzed and validated experimentally. Nevertheless, this tool is already being explored by the community, for example for the comparative analysis of TF binding sites in S. cerevisiae, Kl. lactis and Ko. phaffii promoters (Barbay et al. 2021).
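The kind of search performed by the ‘Promoter analysis’ tool can be illustrated with a minimal Python sketch that scans a promoter for a degenerate consensus on both strands. It is not the code used by YEASTRACT +: the function names are hypothetical, the consensus `SMGGSG` and the short promoter strings are placeholders chosen for illustration (the consensus actually used should be taken from the database), and positions are reported relative to the initiation codon under the assumption that the promoter string ends immediately upstream of it.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def to_regex(consensus):
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(promoter, consensus):
    """Return sorted (position, strand, matched sequence) tuples.

    Positions are negative offsets from the start codon: the last base of
    the promoter is -1. Minus-strand sites are reported with the sequence
    as read on the plus strand.
    """
    promoter = promoter.upper()
    n = len(promoter)
    sites = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # A lookahead group lets overlapping occurrences be reported too.
        pattern = re.compile("(?=(" + to_regex(motif) + "))")
        for match in pattern.finditer(promoter):
            sites.append((match.start() - n, strand, match.group(1)))
    return sorted(sites)


# Placeholder consensus and toy promoters (not real sequences).
print(find_sites("AAACAGGGGTTT", "SMGGSG"))
print(find_sites("TTCCCCTGAA", "SMGGSG"))
```

A palindromic consensus would be reported once per strand at the same position with this sketch; whether such hits should be merged is a design choice left open here.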

Figure 3. Steps for using the ‘Promoter analysis’ tool, applied to TPO3 orthologs in all yeast species included in the N.C.Yeastract database. Step 1: select ‘Cross species > Promoter analysis’. Step 2: select the TF binding site from a specific species; in this example, the S. cerevisiae Haa1 TF binding site was used as query. Step 3: select the species to include in the analysis; in this example, all species from N.C.Yeastract were considered. Step 4: type in the regulated gene(s); the user can add one or several genes. Step 5: select the ‘strength’ of synteny (only homology or at least one, two or three neighbors). Step 6: click on ‘Search’; a table with the unique TF binding sites in each ortholog and with the TF binding sites common to all is shown. Step 7: click on ‘Promoter’ to visualize all TF binding sites on the promoters of the selected orthologs. Step 8: select the TF binding sites to be visualized. The results shown in the figure are representative of the example explored in the text.
PREDICTING A TF-REGULATED NETWORK
Using N.C.Yeastract, the set of genes regulated by a given TF in a non-conventional yeast species can be predicted based on the documented regulatory associations available for other well-studied species. The regulon of Haa1, the major transcription regulator of the acetic acid response in S. cerevisiae (Mira, Becker and Sá-Correia 2010; Mira, Teixeira and Sá-Correia 2010), is the case study examined here. The objective is to exemplify the use of this information system to predict the corresponding regulons in the non-conventional yeast species included in N.C.Yeastract. Haa1 was also chosen because the Haa1-regulon under acetic acid stress was reported not only for S. cerevisiae (Fernandes et al. 2005; Mira, Becker and Sá-Correia 2010; Mira, Teixeira and Sá-Correia 2010; Swinnen et al. 2017), but also for the food spoilage yeast Z. bailii, highly tolerant to acetic acid (Palma et al. 2017; Antunes, Palma and Sá-Correia 2018), and for the human opportunistic pathogen Candida glabrata (Bernardo et al. 2017). The regulatory associations documented for these three yeast species, both for the full Haa1-regulon and for the Haa1-regulon active in cells under acetic acid stress, will be used to predict the corresponding regulons in Ko. phaffii, Kl. lactis, Kl. marxianus and Y. lipolytica.
To reach this goal, the Haa1 orthologs in those four species were first identified based on BLAST best scores using the ‘Utilities > Search for Orthologs’ tool in YEASTRACT, by submitting Haa1 as query, as explained in the section ‘Inference of orthologs’. The genes/ORFs PAS_chr3_0232, KLLA0_A03047g, KLMA_30616 and YALI0_B08206g were identified as the Haa1 orthologs in Ko. phaffii, Kl. lactis, Kl. marxianus and Y. lipolytica, respectively. Then, for each species, in the corresponding N.C.Yeastract page, the ‘Regulatory Associations > Search for Genes’ tool can be used to predict the corresponding regulon, by submitting the Haa1 ortholog gene of that species and selecting the yeast species on which the prediction is to be based (using the ‘Search for Homologous Regulations in’ box). The results are provided as a built-in table and can also be downloaded as a .csv file. The total number of target genes predicted for each Haa1 ortholog in the selected non-conventional yeast species, based on the total documented regulatory associations for S. cerevisiae, C. glabrata or Z. bailii, is shown in Table 2.
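Conceptually, this prediction transfers the documented targets of a TF in a reference species to their orthologs in the species of interest, optionally keeping only associations with a given type of evidence or environmental condition. The Python sketch below illustrates that idea only. The record layout, the function name and the toy association records (including `GENE_X`, `KP_GENE_X` and `OTHER_TF`) are assumptions made for illustration; the ortholog assignments for TPO2/TPO3 and HRK1 follow those mentioned in this text.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """One documented TF -> target regulation in a reference species."""
    tf: str
    target: str
    evidence: str   # e.g. "DNA binding" or "Expression"
    condition: str  # environmental condition label (illustrative)


def predict_regulon(tf, associations, orthologs, evidence=None, condition=None):
    """Map each predicted target gene to the reference genes supporting it.

    `orthologs` maps a reference-species gene to the set of its orthologs
    in the species of interest; reference targets without an ortholog are
    silently dropped.
    """
    predicted = {}
    for assoc in associations:
        if assoc.tf != tf:
            continue
        if evidence is not None and assoc.evidence != evidence:
            continue
        if condition is not None and assoc.condition != condition:
            continue
        for gene in orthologs.get(assoc.target, ()):
            predicted.setdefault(gene, set()).add(assoc.target)
    return predicted


# Toy records: evidence and condition labels are illustrative only.
associations = [
    Association("HAA1", "TPO2", "Expression", "weak acid stress"),
    Association("HAA1", "TPO3", "DNA binding", "weak acid stress"),
    Association("HAA1", "HRK1", "Expression", "weak acid stress"),
    Association("HAA1", "GENE_X", "Expression", "other condition"),
    Association("OTHER_TF", "TPO3", "Expression", "weak acid stress"),
]
orthologs = {
    "TPO2": {"PAS_chr1-4_0090"},
    "TPO3": {"PAS_chr1-4_0090"},
    "HRK1": {"PAS_chr3_1091"},
    "GENE_X": {"KP_GENE_X"},
}

full = predict_regulon("HAA1", associations, orthologs)
stress = predict_regulon("HAA1", associations, orthologs,
                         condition="weak acid stress")
print(len(full), len(stress))
```

Because ortholog relationships need not be one-to-one, the number of predicted genes and the number of reference genes supporting them can differ, in either direction.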
Table 2. Total number of genes predicted as targets (#RGs, number of regulated genes) for Haa1 in four non-conventional yeast species, based on the documented regulatory associations from S. cerevisiae, C. glabrata and Z. bailii. The results consider all the documented regulatory associations from all environmental conditions.
| Species | #RGs using S. cerevisiae regulatory associations | #RGs using C. glabrata regulatory associations | #RGs using Z. bailii regulatory associations |
|---|---|---|---|
| Komagataella phaffii | 436 | 255 | 30 |
| Kluyveromyces lactis | 498 | 284 | 34 |
| Kluyveromyces marxianus | 499 | 292 | 35 |
| Yarrowia lipolytica | 427 | 255 | 39 |
For each predicted set of genes, an interactive network can be plotted directly from the ‘Menu’ button on the results page. Although this is possible for the Haa1 regulon, for regulons of TFs with a very high number of documented regulatory associations the time required to load the interactive network exceeds the time allowed by the server. In these cases, the results list (built-in table or .csv file) is the only output available.
Another valuable feature is the prediction of a regulon under a given environmental condition of interest. In the current release, the user can filter the predicted regulatory associations by type of evidence (DNA binding and/or Expression) and by environmental condition, directly in the ‘Regulatory Associations > Search for Genes’ tool of the non-conventional yeast page under study. Using this tool, we obtained the network of the Ko. phaffii Haa1-regulon predicted based on S. cerevisiae, C. glabrata and Z. bailii regulatory data, and filtered for weak acid stress (Fig. 4A). The dataset of Haa1 target genes filtered for the weak acid stress condition (Fig. 4A) comprises 33, 255 and 13 Ko. phaffii genes corresponding to 33, 240 and 12 unique genes in S. cerevisiae, C. glabrata and Z. bailii, respectively. It is noteworthy that, for the Ko. phaffii genes identified based on C. glabrata regulatory associations, the total number of genes predicted as Haa1 targets is the same as when the results are filtered for regulations under weak acid stress. This is because, so far, all regulatory associations documented for C. glabrata involving CgHaa1 originate from a single study, which reported CgHaa1-dependent regulatory associations only in the presence of acetic acid stress (Bernardo et al. 2017).

Figure 4. The predicted Ko. phaffii Haa1 regulon, under weak acid stress. (A) Predicted networks for the PAS_chr3_0232 gene, which encodes the closest homolog of ScHaa1 in Ko. phaffii, based on the regulatory associations documented for S. cerevisiae, C. glabrata and Z. bailii, and filtered to contain only regulations documented in weak acid stress; (B) Venn diagram constructed using the lists of predicted Haa1-regulated genes in Ko. phaffii shown in (A). The diagram was built using InteractiVenn (Heberle et al. 2015).
The Venn diagram (Fig. 4B) shows that two Ko. phaffii genes are common to the predictions based on the three yeast species: PAS_chr3_1091 and PAS_chr1-4_0090. The PAS_chr3_1091 gene encodes the Ko. phaffii ortholog of the S. cerevisiae Hrk1, an Npr1/Hal5 kinase found to mediate, directly or indirectly, the phosphorylation of about 40% of the membrane-associated acetic acid-responsive proteins in S. cerevisiae (Mira, Becker and Sá-Correia 2010; Guerreiro et al. 2017). The PAS_chr1-4_0090 gene encodes the ortholog of the S. cerevisiae plasma membrane transporters Tpo2 and Tpo3, the importance of which was already discussed in the section ‘Search for putative TF binding sites’.
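The same comparison can be reproduced offline from the downloaded .csv lists with plain set operations. The Python sketch below is an illustration only: the function name is hypothetical and the three toy lists are invented, apart from the two shared genes named above; the real lists contain the numbers of genes given in the text.

```python
def venn_regions(named_sets):
    """Group genes by the exact combination of lists in which they occur.

    Returns a dict mapping a sorted tuple of list names to the set of
    genes found in exactly those lists (one entry per Venn region).
    """
    regions = {}
    all_genes = set().union(*named_sets.values())
    for gene in all_genes:
        key = tuple(sorted(name for name, genes in named_sets.items()
                           if gene in genes))
        regions.setdefault(key, set()).add(gene)
    return regions


# Toy prediction lists; "kp_a", "kp_b" and "kp_c" are made-up identifiers.
predictions = {
    "S. cerevisiae": {"PAS_chr3_1091", "PAS_chr1-4_0090", "kp_a"},
    "C. glabrata": {"PAS_chr3_1091", "PAS_chr1-4_0090", "kp_a", "kp_b"},
    "Z. bailii": {"PAS_chr3_1091", "PAS_chr1-4_0090", "kp_c"},
}

regions = venn_regions(predictions)
shared_by_all = regions[("C. glabrata", "S. cerevisiae", "Z. bailii")]
print(sorted(shared_by_all))
```

Genes supported by several reference species, such as the central region here, are natural first candidates for experimental validation.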
COMMUNITY YEASTRACT
CommunityYeastract is the fourth and most recent database included in the YEASTRACT + portal (Oliveira et al. 2021). It was created so that any member of the yeast community can exploit all the information and bioinformatics tools available in the three sister databases for a yeast species or strain of specific interest that is not yet covered by YEASTRACT +. Currently, CommunityYeastract includes information on, and was tested for, the oleaginous yeast Rhodotorula toruloides (Oliveira et al. 2021) and the probiotic yeast S. cerevisiae var. boulardii (Pais et al. 2021). A case study of the use of the database for the basidiomycete yeast R. toruloides was described for the TF Haa1, a key regulator of yeast resistance to acetic acid (Mira, Becker and Sá-Correia 2010; Mira, Teixeira and Sá-Correia 2010). Acetic acid is an important inhibitor of industrial fermentations in biorefineries based on hydrolysates of lignocellulosic and pectin-rich residues (Palma, Guerreiro and Sá-Correia 2018; Martins et al. 2020), in particular those involving lipid and carotenoid production by R. toruloides (Martins et al. 2021). Cross-species comparative genomics of transcription regulation, exploring the information and bioinformatics tools available in YEASTRACT + for three ascomycete yeast species, led to the prediction of an RtHaa1 regulon for the basidiomycete R. toruloides under weak acid stress (Oliveira et al. 2021). This knowledge might be useful to guide studies envisaging the optimization of R. toruloides robustness for lignocellulosic biorefinery processes. The reader is invited to follow the detailed description of this analysis in Oliveira et al. (2021).
In practical terms, a community member can benefit from CommunityYeastract in two distinct ways. The first is to contact the YEASTRACT + team to add a genome of interest available in GenBank. The team will load the genome into the database, perform the necessary BLASTs and make the new species available through the CommunityYeastract website. The main advantage of this first option is automatic access to all the species of the YEASTRACT + portal for cross-species comparison while, if requested, publicly providing the community with additional resources on new yeast strains/species. Alternatively, a more computationally experienced community member can install their own copy of CommunityYeastract, load the genome of interest, perform the necessary BLASTs and use CommunityYeastract on a local machine/server. This second option provides an additional level of privacy, at the cost of carrying out all the local configuration specified in Oliveira et al. (2021) and of having only S. cerevisiae available for cross-species comparison. Regardless of the option chosen by the user, YEASTRACT + does not store any information related to user queries, thereby ensuring anonymity and the necessary degree of independence.
OUTLOOK
For the past 15 years, the YEASTRACT database has been a useful bioinformatics tool for researchers working in molecular biology, functional genomics and systems biology of S. cerevisiae. The usefulness of YEASTRACT results from continuous updates and improvements, some suggested and requested by the yeast community, of which the YEASTRACT team is part. An equivalent database for pathogenic yeasts, PathoYeastract, has meanwhile been developed. Research and development focused on non-Saccharomyces yeasts of biotechnological interest has recently increased rapidly, attracting many researchers from the S. cerevisiae community, among whom we include ourselves, yet a database dedicated to those non-conventional yeasts was lacking. N.C.Yeastract was released by the end of 2019 and has been updated regularly ever since. This review article aimed to raise awareness in the community about its potential to help the day-to-day work of researchers in the fundamental and applied biology of non-conventional yeasts. Clear examples were used to offer a guide to the exploitation of all the information available in the YEASTRACT + portal.
The N.C.Yeastract/YEASTRACT + team is committed to continuing to offer updated, reliable and complete information on transcriptional regulation in yeasts, together with appropriate bioinformatics tools to exploit such information. The possibility of running a systematic inter-species comparison of transcription regulatory networks in different yeasts will continue to be improved. However, the future development of N.C.Yeastract also depends on the new information to be obtained by the international community, as well as on the community's contribution to the expansion of CommunityYeastract to other species/strains according to specific interests.
ACKNOWLEDGMENTS
We acknowledge all those who have, over the years, contributed to the YEASTRACT, PathoYeastract, N.C.Yeastract and CommunityYeastract projects. We are also grateful to colleagues and friends from the yeast community for their encouragement and suggestions to improve the YEASTRACT + information system. The information about yeast genes other than documented regulations, potential regulations and the transcription factor binding sites contained in YEASTRACT + was gathered from SGD, CGD, CGOB, and the GO Consortium.
FUNDING
The N.C.Yeastract and the CommunityYeastract databases are part of the YEASTRACT + portal, a Node Service of the Portuguese distributed infrastructure for biological data BioData.pt, funded by Programa Operacional Regional de Lisboa 2020 (LISBOA-01-0145-FEDER-022231), and of the European distributed infrastructure for biological data ELIXIR. This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) under projects UIDB/04565/2020 and UIDP/04565/2020 (iBB multi-annual funding), project LA/P/0140/2020 of Associate Laboratory Institute for Health and Bioeconomy - i4HB, project UIDB/50021/2020 (INESC-ID multi-annual funding), the research contract to MP (IST-ID/092/2018), and the PhD fellowships to MA (DP_BIOTECnico - PhD programme - PD/BD/142944/2018) and to MNM (DP_AEM - PhD programme - PD/BD/146167/2019). This review paper was prepared in the framework of the EU COST Action CA18229 ‘Non-Conventional Yeasts for the Production of Bioproducts’ (YEAST4BIO).
Conflict of Interest
The authors declare no conflict of interest.
REFERENCES
Author notes
Cláudia P. Godinho, Margarida Palma and Jorge Oliveira are co-first authors