Abstract

Summary: Systematically extracting biological meaning from omics data is a major challenge in systems biology. Enrichment analysis is often used to identify characteristic patterns in candidate lists. FungiFun is a user-friendly Web tool for functional enrichment analysis of fungal genes and proteins. The novel tool FungiFun2 uses a completely revised data management system and thus allows enrichment analysis for 298 currently available fungal strains published in standard databases. FungiFun2 offers a modern Web interface and creates interactive tables, charts and figures, which users can directly adapt to their needs.

Availability and implementation: FungiFun2, examples and tutorials are publicly available at https://elbe.hki-jena.de/fungifun/.

Contact: [email protected] or [email protected]

1 INTRODUCTION

Fungi form an extremely diverse kingdom of organisms with different lifestyles and interesting human applications (Blackwell, 2011). Fungi are important not only for food production but also as producers of bioactive compounds known as secondary metabolites (Brakhage, 2013), which are important for the pharmaceutical and chemical industries. On the other hand, there are many pathogenic fungi that destroy crops and infect humans. The growing amount of omics data from the fungal community will help to identify virulence factors as well as interesting bioactive compounds. Enrichment analysis is often applied along with omics data analysis. Here, candidate genes/proteins are assigned to categories from structured vocabularies (ontologies). Afterward, statistical tests identify those categories that are significantly enriched with the given candidates. These enriched categories may represent the molecular functions, pathways or cellular locations most affected by the experiment. A number of easy-to-use online tools exist, e.g. YeastMine (Balakrishnan et al., 2012), and are reviewed by Huang et al. (2009). However, no user-friendly online tool for the systematic analysis of long candidate lists existed for most fungal species.

Our group implemented the tool FungiFun (Priebe et al., 2010), which supports enrichment analysis for 28 species with a focus on fungal pathogens. In this article, we present the novel tool FungiFun2, which allows the systematic analysis of candidate lists from all the currently available fungal strains published in standard databases (Fig. 1). Users can choose from 298 strains of 240 species. For data collection, FungiFun2 uses a semi-automatic procedure, which downloads gene to category associations and annotations (names and functions) from online databases. This procedure allows the database to be kept up to date and simplifies the addition of further species. In comparison to the previous version, which worked with flat files for annotations, FungiFun2 parses annotations into a standardized database, allowing higher data connectivity and flexibility, e.g. alternative input identifiers (IDs), gene annotation and complex search queries. Finally, FungiFun2 offers a modern and user-friendly interface.

Fig. 1. Overview of FungiFun2 functionality. (A) With the help of a semi-automatic procedure, gene to category associations for three ontologies as well as gene names and functions are downloaded. The numbers in the Venn diagram indicate the number of available strains. (B) The user selects a strain and ontology. (C) On the Web server, gene to category association and significance tests are performed. (D) Schematic visualization of the output (dynamic figures, charts and tables) is shown

2 METHODS AND IMPLEMENTATION

Figure 1A illustrates the three functional ontologies that are integrated into FungiFun2, i.e. Gene Ontology (GO; Ashburner et al., 2000), Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa and Goto, 2000) and Functional Catalogue (FunCat; Rüpp et al., 2004). FunCat gene to category associations were downloaded from MIPS through the PEDANT database (Walter et al., 2009). For GO, several data sources were used: Candida Genome Database (CGD; Inglis et al., 2012), Aspergillus Genome Database (AspGD; Cerqueira et al., 2014), Saccharomyces Genome Database (SGD; Cherry et al., 2012), the UniProt-GOA project at the European Bioinformatics Institute (EBI) and Ensembl Fungi (Kersey et al., 2010). Additionally, we included GO gene to category associations obtained by applying Blast2GO (Conesa et al., 2005). To do so, proteomes were obtained from BROAD, NCBI or in-house data (Schwartze et al., 2014; Linde et al., 2014). Finally, KEGG gene to pathway associations were obtained from the KEGG FTP server.

With the help of a semi-automatic procedure, all available strains in the databases used are listed. A strain may be covered by several data sources, in which case the preferred version needs to be selected manually. Afterward, flat files are automatically downloaded and parsed into a MySQL database using Python and R scripts. These scripts allow the database to be kept up to date with little effort. Currently, data for the FunCat, KEGG and GO ontologies have been downloaded from nine different sources. GO gene/protein to category associations, obtained primarily from EBI, are available for 258 strains. FunCat gene/protein to category associations are available for 180 strains. Finally, KEGG pathway associations are available for 71 strains.
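The parsing step described above can be illustrated with a minimal sketch. It is not the FungiFun2 code: the tab-separated file layout, the table and column names, the gene and category IDs and the strain label are all hypothetical, and Python's built-in sqlite3 module stands in for the MySQL database so that the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: one gene-to-category association per line (tab-separated).
FLAT_FILE = (
    "gene_id\tcategory_id\tontology\n"
    "gene_0001\tcat_A\tGO\n"
    "gene_0001\tcat_X\tFunCat\n"
    "gene_0002\tcat_A\tGO\n"
)


def load_associations(conn, handle, strain):
    """Parse a tab-separated association file and store its rows for one strain."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS association (
               strain TEXT, gene_id TEXT, category_id TEXT, ontology TEXT,
               UNIQUE (strain, gene_id, category_id, ontology))"""
    )
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [(strain, r["gene_id"], r["category_id"], r["ontology"]) for r in reader]
    # INSERT OR IGNORE skips rows that are already present, so re-importing
    # an unchanged file does not create duplicates.
    conn.executemany("INSERT OR IGNORE INTO association VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def genes_per_category(conn, strain, ontology):
    """Count distinct annotated genes per category for one strain and ontology."""
    cur = conn.execute(
        "SELECT category_id, COUNT(DISTINCT gene_id) FROM association "
        "WHERE strain = ? AND ontology = ? GROUP BY category_id",
        (strain, ontology),
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
n_rows = load_associations(conn, io.StringIO(FLAT_FILE), "example_strain")
go_counts = genes_per_category(conn, "example_strain", "GO")
```

Storing associations in a relational table rather than in flat files is what makes queries such as the per-category gene count a single statement; the uniqueness constraint is one possible way to keep repeated updates from duplicating rows.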

Figure 1B illustrates the main features of the user interface. To run FungiFun2, users need to choose a strain, select an ontology, supply the tool with a list of candidate IDs and choose a P-value cutoff. After strain selection, the user may check for available (alternative) IDs. Only those ontologies for which annotation is currently available can be used. Advanced options allow for alternative P-value calculations and multiple test corrections, for upload of a background list, for in/exclusion of categories and for the selection of GO evidence codes.
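How the advanced options might act on the annotation before any test is run can be sketched as follows. This is an illustrative assumption rather than the FungiFun2 implementation: the function name, its parameters and the gene and category IDs are hypothetical, while IDA, IMP and IEA are standard GO evidence codes.

```python
from collections import defaultdict


def build_category_map(annotations, background=None, evidence_codes=None,
                       excluded_categories=()):
    """Return {category: set of genes} after applying the optional filters.

    annotations is an iterable of (gene, category, evidence_code) tuples.
    """
    category_to_genes = defaultdict(set)
    for gene, category, evidence in annotations:
        # Restrict the gene universe to a user-supplied background list.
        if background is not None and gene not in background:
            continue
        # Keep only annotations supported by the selected evidence codes.
        if evidence_codes is not None and evidence not in evidence_codes:
            continue
        # Drop categories the user chose to exclude.
        if category in excluded_categories:
            continue
        category_to_genes[category].add(gene)
    return dict(category_to_genes)


# Hypothetical annotation records.
annotations = [
    ("gene1", "cat_A", "IDA"),
    ("gene2", "cat_A", "IEA"),
    ("gene3", "cat_B", "IDA"),
    ("gene4", "cat_B", "IMP"),
]

# Only experimentally supported annotations (no electronic inference).
curated = build_category_map(annotations, evidence_codes={"IDA", "IMP"})

# Custom background of three genes, with one category excluded.
restricted = build_category_map(
    annotations,
    background={"gene1", "gene2", "gene3"},
    excluded_categories={"cat_B"},
)
```

Each filter changes the counts that enter the significance test, which is why the choice of background list and evidence codes can alter which categories are reported as enriched.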

Figure 1C illustrates the main aspects of the calculation of enriched categories as well as of the results, graphs and tables. On the server side, a PHP script parses the user input, controls the calculation of statistics, graphs and tables, and finally creates the data for the result page. P-values indicating the significance of the enrichment are calculated with Fisher's exact test or the hypergeometric test. Multiple test correction may be performed, e.g. via FDR (Benjamini and Hochberg, 1995). The R package RamiGO (Schröder et al., 2013) is used to visualize significantly enriched GO categories within the GO hierarchy. Bar, pie and column charts are created with the help of the JavaScript library Highcharts, whereas customizable result tables are created with the JavaScript library DataTables.
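The statistics named above can be written down compactly. The following is a minimal sketch under stated assumptions, not the server-side code: it computes the upper tail of the hypergeometric distribution for over-representation and the Benjamini-Hochberg adjustment, and all function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of drawing at least k category members.

    N: genes in the background, K: background genes in the category,
    n: genes in the candidate list, k: candidate genes in the category.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 for impossible draws, so no extra bounds check is needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 10 background genes, 4 of them in the category,
# 3 candidates, 2 of which fall into the category.
p_example = hypergeom_pvalue(k=2, n=3, K=4, N=10)
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In practice, one P-value is computed per category and the whole vector is then adjusted, after which categories with an adjusted P-value below the user-chosen cutoff are reported as enriched.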

Figure 1 D illustrates parts of the results of a FungiFun2 run. Each output can be customized directly in the Web interface as well as downloaded in commonly used formats. The number of enriched categories as well as the number of genes within enriched and non-enriched categories give an overview of the results. Specific pie and bar charts allow users to visualize the number of genes in the significant categories compared with the number of genes in the input list. Finally, graphs highlighting enriched categories within the hierarchies of the ontologies are available. Results are displayed in tables focusing on categories or genes, which can be interactively rearranged and filtered.

Funding: J.L. and S.P. were supported by the Deutsche Forschungsgemeinschaft (DFG) CRC/Transregio 124 ‘Pathogenic fungi and their human host: Networks of interaction’, subproject INF.

Conflict of interest: none declared.

REFERENCES

Ashburner M, et al. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat. Genet., 2000, vol. 25 (pg. 25-29)
Balakrishnan R, et al. Yeastmine–an integrated data warehouse for Saccharomyces Cerevisiae data as a multipurpose tool-kit. Database (Oxford), 2012, vol. 2012 (pg. bar062)
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodol., 1995, vol. 57 (pg. 289-300)
Blackwell M. The fungi: 1, 2, 3 … 5.1 million species? Am. J. Bot., 2011, vol. 98 (pg. 426-438)
Brakhage AA. Regulation of fungal secondary metabolism. Nat. Rev. Microbiol., 2013, vol. 11 (pg. 21-32)
Cerqueira GC, et al. The Aspergillus genome database: multispecies curation and incorporation of rna-seq data to improve structural gene annotations. Nucleic Acids Res., 2014, vol. 42 (pg. D705-D710)
Cherry JM, et al. Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res., 2012, vol. 40 (pg. D700-D705)
Conesa A, et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 2005, vol. 21 (pg. 3674-3676)
Huang DW, et al. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res., 2009, vol. 37 (pg. 1-13)
Inglis DO, et al. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata. Nucleic Acids Res., 2012, vol. 40 (pg. D667-D674)
Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 2000, vol. 28 (pg. 27-30)
Kersey PJ, et al. Ensembl genomes: extending ensembl across the taxonomic space. Nucleic Acids Res., 2010, vol. 38 (pg. D563-D569)
Priebe S, et al. FungiFun: a web-based application for functional categorization of fungal genes and proteins. Fungal Genet. Biol., 2010, vol. 48 (pg. 353-358)
Linde J, et al. De Novo Whole-Genome Sequence and Genome Annotation of Lichtheimia ramosa. Genome Announc., 2014, vol. 2 (pg. e00888-14)
Rüpp A, et al. The funcat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res., 2004, vol. 32 (pg. 5539-5545)
Schröder MS, et al. RamiGO: an R/Bioconductor package providing an AmiGO visualize interface. Bioinformatics, 2013, vol. 29 (pg. 666-668)
Schwartze VU, et al. Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina). PLoS Genet., 2014, vol. 10 (pg. e1004496)
Walter MC, et al. PEDANT covers all complete RefSeq genomes. Nucleic Acids Res., 2009, vol. 37 (pg. D408-D411)

Author notes

Associate Editor: Janet Kelso

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]