AYbRAH: a curated ortholog database for yeasts and fungi spanning 600 million years of evolution

Abstract

Budding yeasts inhabit a range of environments by exploiting various metabolic traits. The genetic bases for these traits are mostly unknown, preventing their addition or removal in a chassis organism for metabolic engineering. Insight into the evolution of orthologs, paralogs and xenologs in the yeast pan-genome can help link these traits to their genotypes; however, existing phylogenomic databases do not span diverse yeasts, and sometimes cannot distinguish between these classes of homologs. To help understand the molecular evolution of these traits in yeasts, we created Analyzing Yeasts by Reconstructing Ancestry of Homologs (AYbRAH), an open-source database of predicted and manually curated ortholog groups for 33 diverse fungi and yeasts in Dikarya, spanning 600 million years of evolution. OrthoMCL and OrthoDB were used to cluster protein sequences into ortholog and homolog groups, respectively; MAFFT and PhyML were then used to align the sequences and reconstruct the phylogeny of each homolog group. Ortholog assignments for enzymes and small-metabolite transporters were compared to their phylogenetic reconstructions and curated to resolve any discrepancies. Information on homolog and ortholog groups can be viewed in the AYbRAH web portal (https://lmse.github.io/aybrah/), including functional annotations, predictions of mitochondrial localization and transmembrane domains, literature references and phylogenetic reconstructions. Ortholog assignments in AYbRAH were compared to HOGENOM, KEGG Orthology, OMA, eggNOG and PANTHER. PANTHER and OMA had the ortholog groups most congruent with AYbRAH, while the other phylogenomic databases showed more under-clustering, more over-clustering, or more proteins without ortholog annotations. Future plans for AYbRAH are discussed, and recommendations are made for other research communities seeking to create curated ortholog databases.
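The comparison in terms of congruent, under-clustered and over-clustered ortholog groups can be made concrete with a short sketch. The Python below is a minimal illustration, not the procedure used to build AYbRAH: it assumes both the curated assignments and a public database's assignments are available as plain protein-to-group dictionaries, and the function name `compare_groupings` and its category labels are hypothetical. Here a curated group is called under-clustered if its proteins are split across several database groups, and over-clustered if any of those database groups also contains proteins from outside it.

```python
from collections import defaultdict


def compare_groupings(curated, database):
    """Classify each curated ortholog group against a database's grouping.

    curated, database: dicts mapping protein ID -> group ID (hypothetical
    input format). Proteins missing from `database` count as unannotated.
    """
    # Invert both mappings: group ID -> set of member proteins.
    db_members = defaultdict(set)
    for protein, group in database.items():
        db_members[group].add(protein)
    curated_members = defaultdict(set)
    for protein, group in curated.items():
        curated_members[group].add(protein)

    labels = {}
    for group, proteins in curated_members.items():
        db_groups = {database[p] for p in proteins if p in database}
        if not db_groups:
            labels[group] = "no annotation"
            continue
        # Curated group split across several database groups.
        under = len(db_groups) > 1
        # A database group that also holds proteins outside this curated group.
        over = any(db_members[g] - proteins for g in db_groups)
        if under and over:
            labels[group] = "under- and over-clustered"
        elif under:
            labels[group] = "under-clustered"
        elif over:
            labels[group] = "over-clustered"
        else:
            labels[group] = "congruent"
    return labels


# Toy example with hypothetical protein and group IDs.
curated = {"p1": "OG1", "p2": "OG1", "p3": "OG2", "p4": "OG2", "p5": "OG3"}
database = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
print(compare_groupings(curated, database))
# {'OG1': 'over-clustered', 'OG2': 'under- and over-clustered', 'OG3': 'no annotation'}
```

Counting the labels per database would give a rough tally of the kind of congruence the abstract summarizes, though any real comparison would also need to decide how to treat proteins that are absent from the curated set.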

Table S1: Comparison of manually curated acetyl-coenzyme A synthetase (ACS) orthologs in AYbRAH to highly cited ortholog databases. N/A indicates genomes that have no orthology relationships in the public database but have been assigned orthologs in AYbRAH. Omission indicates genomes that have annotations in the public ortholog database but no annotation for the given gene. PANTHER is the only database that can distinguish between the three ACS ortholog groups; ACS3 is assigned to a different PANTHER family despite the shared ancestry of all the orthologs. KEGG can only differentiate between the ACS1 and ACS3 ortholog groups, while the ortholog assignments of all other databases are polyphyletic. eggNOG also includes FOG07524 and FOG07525 in the ACS ortholog group; both have predicted acetoacetate-CoA ligase activity.
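The N/A and omission conventions in these table captions amount to a simple per-genome rule. The sketch below assumes a hypothetical nested dictionary (genome, then protein ID, then database group) standing in for a public ortholog database; `table_cell` and all IDs are illustrative, and an omission is returned as an empty cell.

```python
def table_cell(genome, curated_proteins, db_annotations):
    """Return a table entry for one genome in one curated ortholog group.

    genome: genome (strain) identifier.
    curated_proteins: IDs of this genome's proteins in the curated group.
    db_annotations: dict genome -> dict protein ID -> database group ID,
        a hypothetical structure for a public ortholog database.
    """
    annotations = db_annotations.get(genome)
    if not annotations:
        # No orthology relationships at all for this genome in the database.
        return "N/A"
    groups = sorted({annotations[p] for p in curated_proteins if p in annotations})
    if not groups:
        # Genome is annotated, but not for this gene: leave the cell blank.
        return ""
    # One or more database groups; more than one signals a split assignment.
    return ", ".join(groups)


# Toy example with hypothetical genome, protein and group IDs.
db = {"genomeA": {"a1": "G7", "a2": "G9"}, "genomeB": {"b9": "G7"}}
for genome, proteins in [("genomeA", ["a1", "a2"]), ("genomeB", ["b1"]), ("genomeC", ["c1"])]:
    print(genome, repr(table_cell(genome, proteins, db)))
```

Keeping N/A distinct from a blank cell matters because the two cases point to different gaps: missing genome coverage versus a missed gene.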
Table S2: Comparison of manually curated type II NADH dehydrogenase (NDH2) orthologs in AYbRAH to highly cited ortholog databases. N/A indicates genomes that have no orthology relationships in the public database but have been assigned orthologs in AYbRAH. Omission indicates genomes that have annotations in the public ortholog database but no annotation for the given gene. PANTHER can distinguish between most orthologs in the NDH2 family, with the exception of NDE1 and NDE2; NDE0 is in a different PANTHER family from the rest of the NDH2 genes. KEGG is the only other database that can differentiate between some NDH2 genes; it splits them between the older NDI0/NDE0 ortholog group and the more recent NDE1/NDI1 ortholog group. PANTHER and eggNOG contain additional genes not included in other ortholog databases, which may be ancient paralogs with lower sequence similarity than the other NDH2 paralogs. In PANTHER, AIF1, which can localize to the mitochondria in S. cerevisiae, is in the same subfamily as NDE0; in eggNOG, the other inconsistency is an Aspergillus niger gene (FOG07265) that is paralogous to a characterized external NADH dehydrogenase in Neurospora crassa (FOG07264) (Carneiro et al., 2007).
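Whether a database can distinguish two curated ortholog groups, as in the NDE1/NDE2 case described for PANTHER, can be checked by asking whether any database family contains members of both. The following is a hedged sketch under that assumed criterion; `indistinguishable_pairs`, the protein IDs and the family labels F1, F2 and F3 are hypothetical and are not actual PANTHER identifiers.

```python
from collections import defaultdict
from itertools import combinations


def indistinguishable_pairs(curated, database):
    """Return pairs of curated groups that share at least one database group.

    curated, database: dicts mapping protein ID -> group ID (hypothetical
    input format). Proteins without a database annotation are ignored.
    """
    # Database group -> set of curated groups with members in it.
    curated_by_db = defaultdict(set)
    for protein, curated_group in curated.items():
        if protein in database:
            curated_by_db[database[protein]].add(curated_group)
    pairs = set()
    for curated_groups in curated_by_db.values():
        # Every pair of curated groups merged into one database group.
        pairs.update(combinations(sorted(curated_groups), 2))
    return sorted(pairs)


# Toy mapping loosely mirroring the pattern described for PANTHER;
# protein IDs and families F1-F3 are hypothetical.
curated = {"p1": "NDE1", "p2": "NDE2", "p3": "NDI1", "p4": "NDE0"}
database = {"p1": "F1", "p2": "F1", "p3": "F2", "p4": "F3"}
print(indistinguishable_pairs(curated, database))  # [('NDE1', 'NDE2')]
```

This criterion only detects merging; a curated group split across several families (as with NDE0 or ACS3 in PANTHER) would need a separate check.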
Figure S1: Distributions of BLASTP percent identities for proteins identified as orthologous to Saccharomyces cerevisiae in AYbRAH.
Figure S2: Distributions of the logarithm of BLASTP bitscores for proteins orthologous to Saccharomyces cerevisiae in AYbRAH.
Figure S3: Distributions of the negative logarithm of BLASTP expect-values for proteins orthologous to Saccharomyces cerevisiae in AYbRAH.
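The three quantities plotted in Figures S1 to S3 can be derived directly from BLASTP tabular output. This sketch assumes BLAST+ `-outfmt 6` with its default 12 columns (percent identity in column 3, E-value in column 11, bitscore in column 12), assumes base-10 logarithms (the captions do not state the base), and clamps E-values reported as 0 to a hypothetical floor of 1e-180 so that the negative logarithm stays finite. The function name `blast_metrics` is illustrative.

```python
import math


def blast_metrics(lines, evalue_floor=1e-180):
    """Yield (query, subject, pident, log10 bitscore, -log10 E-value) per hit.

    Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns:
    qaccver saccver pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. `evalue_floor` is an assumed clamp for E-values of 0.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident = float(fields[2])
        evalue = max(float(fields[10]), evalue_floor)
        bitscore = float(fields[11])
        yield query, subject, pident, math.log10(bitscore), -math.log10(evalue)


# Two hypothetical hits; the second reports an E-value of 0.
rows = [
    "q1\ts1\t48.0\t250\t130\t0\t1\t250\t1\t250\t1e-50\t250.0",
    "q2\ts2\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t1000",
]
for hit in blast_metrics(rows):
    print(hit)
```

The resulting values would then be grouped by genome and plotted as distributions; that plotting step is left out here, and the choice of floor affects only hits whose E-value underflows to 0.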
2 Sample webpages for homolog groups

Genes: 34

Protein description
Acetyl-CoA synthetase isoform expressed with non-fermentable carbon sources. Spo gene expressed with fermentable carbon sources.

SGD Description
Acetyl-CoA synthetase isoform; along with Acs2p, acetyl-CoA synthetase isoform is the nuclear source of acetyl-CoA for histone acetylation; expressed during growth on nonfermentable carbon sources and under aerobic conditions

References
Armitt S, et al. (1976 Feb). Analysis of acetate non-utilizing (acu) mutants in Aspergillus nidulans.
Payton M, et al. (1976 May). Agar as a carbon source and its effect on the utilization of other carbon sources by acetate non-utilizing (acu) mutants of Aspergillus nidulans.
Frenkel EP, et al. (1977 Jan 25). Purification and properties of acetyl coenzyme A synthetase from bakers' yeast.
Hynes MJ, et al. (1977 Sep). Induction of the acetamidase of Aspergillus nidulans by acetate metabolism.
Kelly JM, et al. (1981 Apr). The regulation of phosphoenolpyruvate carboxykinase and the NADP-linked malic enzyme in Aspergillus nidulans.
Sandeman RA, et al. (1989 Jul). Isolation of the facA (acetyl-coenzyme A synthetase) and acuE (malate synthase) genes of Aspergillus nidulans.
Connerton IF, et al. (1990 Mar). Comparison and cross-species expression of the acetyl-CoA synthetase genes of the Ascomycete fungi, Aspergillus nidulans and Neurospora crassa.
Sandeman RA, et al. (1991 Sep). Molecular organisation of the malate synthase genes of Aspergillus nidulans and Neurospora crassa.
Birch PR, et al. (1992). Nucleotide sequence of a gene from Phanerochaete chrysosporium that shows homology to the facA gene of Aspergillus nidulans.
Kerscher SJ, et al. (1999 Jul). A single external enzyme confers alternative NADH:ubiquinone oxidoreductase activity in Yarrowia lipolytica.
Dudin O, et al. (2017 Apr). A systematic screen for morphological abnormalities during fission yeast sexual reproduction identifies a mechanism of actin aster formation for cell fusion.
Lee J, et al. (2017 Feb 20). Chromatin remodeller Fun30Fft3 induces nucleosome disassembly to facilitate RNA polymerase II elongation.

SGD Description
Mitochondrial external NADH dehydrogenase; type II NAD(P)H:quinone oxidoreductase that catalyzes the oxidation of cytosolic NADH; Nde1p and Nde2p provide cytosolic NADH to the mitochondrial respiratory chain; NDE1 has a paralog, NDE2, that arose from the whole genome duplication

References
encode separate mitochondrial NADH dehydrogenases catalyzing the oxidation of cytosolic NADH.
Kerscher SJ, et al. (2000 Aug 15). Diversity and origin of alternative NADH:ubiquinone oxidoreductases.
Overkamp KM, et al. (2000 May). In vivo analysis of the mechanisms for oxidation of cytosolic NADH by Saccharomyces cerevisiae mitochondria.
Joseph-Horne T, et al. (2001 Apr 2). Fungal respiration: a fusion of standard and alternative components.
Yeast mitochondrial dehydrogenases are associated in a supramolecular complex.
Davidson JF, et al. (2001 Dec). Mitochondrial respiratory electron carriers are involved in oxidative stress during heat stress in Saccharomyces cerevisiae.
Bakker BM, et al. (2001 Jan). Stoichiometry and compartmentation of NADH metabolism in Saccharomyces cerevisiae.
Påhlman IL, et al. (2002 Aug 2). Kinetic regulation of the mitochondrial glycerol-3-phosphate dehydrogenase by the external NADH dehydrogenase in Saccharomyces cerevisiae.

SGD Description
Mitochondrial external NADH dehydrogenase; catalyzes the oxidation of cytosolic NADH; Nde1p and Nde2p are involved in providing the cytosolic NADH to the mitochondrial respiratory chain; NDE2 has a paralog, NDE1, that arose from the whole genome duplication

References
Davidson JF, et al. (2001 Dec). Mitochondrial respiratory electron carriers are involved in oxidative stress during heat stress in Saccharomyces cerevisiae.
Bakker BM, et al. (2001 Jan). Stoichiometry and compartmentation of NADH metabolism in Saccharomyces cerevisiae.
Påhlman IL, et al. (2002 Aug 2). Kinetic regulation of the mitochondrial glycerol-3-phosphate dehydrogenase by the external NADH dehydrogenase in Saccharomyces cerevisiae.

SGD Description
NADH:ubiquinone oxidoreductase; transfers electrons from NADH to ubiquinone in the respiratory chain but does not pump protons, in contrast to the higher eukaryotic multisubunit respiratory complex I; phosphorylated; involved in Mn- and H2O2-induced apoptosis; upon apoptotic stress, Ndi1p is activated in the mitochondria by N-terminal cleavage, and the truncated protein translocates to the cytoplasm to induce apoptosis; homolog of human AMID