Abstract

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450 904 orthologous groups (62.5%).

INTRODUCTION

Orthology, defined as homology via speciation (1), is a crucial concept in evolutionary biology and is essential for disciplines such as comparative genomics, metagenomics and phylogenomics. The concepts of orthology and paralogy, with the latter being defined as homology via duplication (1), have been used as a foundation to introduce the concept of clusters of orthologous groups: proteins that have evolved from a single ancestral sequence existing in the last common ancestor (LCA) of the species that are being compared, through a series of speciation and duplication events (2). Orthologous groups (OGs) have proven useful for functional analyses and the annotation of newly sequenced genomes (3–5) as orthologs tend to have equivalent functions (6).

A number of orthology prediction methods have been introduced recently. They can be classified into (i) graph-based methods, ranging from the reciprocal-best-hit approach (7) to more sophisticated methods such as the identification of best-hit triangles (2,8–11) and other clustering-based approaches (12–15), and (ii) tree-based methods, which can be further divided into those that use tree reconciliation to infer orthologs (16–19) and those that do not (20,21). Their methodological advantages and disadvantages have been reviewed elsewhere (22–24).
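To illustrate the simplest of the graph-based approaches, the reciprocal-best-hit criterion might be sketched as follows. This is a minimal sketch and not the eggNOG procedure: the function name, the input layout (similarity scores keyed by query and subject) and the toy scores are assumptions made for this example only.

```python
from collections import defaultdict


def reciprocal_best_hits(scores, species_of):
    """Return protein pairs from different species that are each other's best hit.

    scores: dict mapping (query, subject) -> similarity score (higher is better).
    species_of: dict mapping protein id -> species id.
    """
    # best[query][subject_species] = (score, subject) for the top-scoring hit
    best = defaultdict(dict)
    for (query, subject), score in scores.items():
        query_species = species_of[query]
        subject_species = species_of[subject]
        if query_species == subject_species:
            continue  # hits within one species are not ortholog candidates
        current = best[query].get(subject_species)
        if current is None or score > current[0]:
            best[query][subject_species] = (score, subject)

    pairs = set()
    for query, per_species in best.items():
        for _score, subject in per_species.values():
            # the hit is reciprocal if the subject's best hit in the
            # query's species is the query itself
            back = best.get(subject, {}).get(species_of[query])
            if back is not None and back[1] == query:
                pairs.add(tuple(sorted((query, subject))))
    return pairs


# Hypothetical toy data: two proteins in each of two species.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "b1"): 90, ("b1", "a1"): 88,
    ("a2", "b1"): 50, ("b1", "a2"): 45,
    ("a2", "b2"): 70, ("b2", "a2"): 72,
    ("a1", "b2"): 30, ("b2", "a1"): 20,
}
rbh_pairs = reciprocal_best_hits(scores, species_of)
```

With these toy scores, a1 pairs with b1 and a2 with b2. Triangle-based and clustering-based methods build on such pairwise links to form groups spanning many species.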

An important point is that OGs depend on their taxonomic context. The broader the taxonomic range, the deeper the LCA is placed, resulting in larger OGs with lower resolution of the orthologous relationships. Conversely, a narrower taxonomic range yields more fine-grained groups. For this reason, the first and most successful resource, COG (2), provided OGs for specific taxonomic ranges, namely COGs for all three domains of life, KOGs for Eukaryotes (8) and arCOGs for Archaea (9). Some automatic orthology prediction methods also provide distinct sets of OGs for an increasing number of taxonomic groups [e.g. OrthoDB (10), eggNOG (11) and OMA (12)].

The functional annotation of OGs is particularly necessary, as functional insights from well-studied proteins/species can be transferred to uncharacterized orthologs. Moreover, several genome annotation tools [e.g. (25)] use the functional annotations of OGs to automatically map function information to large-scale genomic data. The most common form of orthologous group annotation is a consensus-based (longest common string) approach (9,12,18,21,26) in which the description of the OG is derived from available annotations of the member proteins. Only a few available resources conduct a more robust manual annotation of the groups (8) or incorporate multiple annotation sources for the description and annotate the groups with functional categories (8,11).
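The idea behind a consensus-based (longest common string) description can be illustrated with a short sketch. It is an illustration only: the function name, the word-level matching and the example descriptions are assumptions, and the annotation pipelines cited above are considerably more elaborate.

```python
def longest_common_phrase(descriptions):
    """Longest run of consecutive words shared by all descriptions (case-insensitive)."""
    token_lists = [d.lower().split() for d in descriptions if d.strip()]
    if not token_lists:
        return ""
    shortest = min(token_lists, key=len)
    # Pad with spaces so that substring matching respects word boundaries.
    padded = [" " + " ".join(tokens) + " " for tokens in token_lists]
    # Try candidate phrases from the shortest description, longest first.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            phrase = " ".join(shortest[start:start + length])
            if all(f" {phrase} " in text for text in padded):
                return phrase
    return ""


# Hypothetical member annotations of one orthologous group.
member_descriptions = [
    "Putative DNA mismatch repair protein MutS",
    "DNA mismatch repair protein",
    "DNA mismatch repair protein MutS2",
]
og_description = longest_common_phrase(member_descriptions)
```

A group whose members share no word at all receives an empty description in this sketch, which is one reason why consensus approaches leave some groups unannotated.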

Here, we describe the third version of eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups), a database that provides orthologous groups for 943 Bacteria, 69 Archaea and 121 Eukaryotes. In total, 721 801 OGs have been computed, assigning orthology to about twice as many genes as the previous version. Most importantly, it contains considerably more taxonomically restricted OGs with higher resolution, covering 41 taxonomically relevant ranges such as Proteobacteria or Metazoans.

SELECTION OF GENOMES

We downloaded complete proteomes from RefSeq (27), Ensembl (28), UniProt (29), GiardiaDB (30), JGI (http://genome.jgi-psf.org/) and TAIR (31). This particular set of genomes also forms the basis for the most recent STRING (32) and STITCH (33) databases, allowing for easy integration across these databases.

The analyses were performed on 1133 complete genomes, encoding 5 214 234 proteins. The genomes were selected based on pertinence and quality. Apart from the many model organisms that were included in the database, the species were selected based on their taxonomic position to ensure a dense sampling of 41 selected taxonomical ranges (see below) as well as a broad coverage of the tree of life. As genome quality significantly affects the accuracy of orthology assignment (34,35), all genomes in eggNOG v3 were manually selected for genomic quality based on sequencing coverage and on genome completeness, judged by the coverage of 40 phylogenetic marker genes (36,37).
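A completeness check based on marker genes could look roughly like the sketch below. The marker identifiers and the example genome are hypothetical placeholders, and no acceptance threshold is implied, since the selection described above was manual.

```python
def marker_coverage(genome_families, marker_families):
    """Fraction of marker families found at least once in a genome."""
    markers = set(marker_families)
    if not markers:
        raise ValueError("marker set is empty")
    return len(markers & set(genome_families)) / len(markers)


# Hypothetical identifiers standing in for the 40 phylogenetic marker genes.
markers = {f"M{i:02d}" for i in range(1, 41)}
# A hypothetical genome containing 38 of the 40 markers plus unrelated families.
genome = {f"M{i:02d}" for i in range(1, 39)} | {"X01", "X02"}
coverage = marker_coverage(genome, markers)
```

A genome missing several near-universal markers is more likely to be incomplete or poorly annotated, which is why such a measure is a useful quality indicator.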

CONSTRUCTION OF ORTHOLOGOUS GROUPS AT DIFFERENT TAXONOMIC LEVELS

The first step of the eggNOG pipeline is an all-against-all similarity search. Because the computational cost of such a search grows quadratically with the number of sequences, eggNOG v3 now uses the SIMAP database (38) for the required homology comparisons. SIMAP uses the FASTA heuristics (39), which are better than BLAST (40), the method previously used in eggNOG, at capturing sequences with a lower degree of similarity, at the cost of slower searches.
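The quadratic growth can be made concrete with a short calculation. The protein count is the one given in this article; the function name is an assumption of this sketch, and counting unordered pairs is a simplification of what a real search engine computes.

```python
def pairwise_comparisons(n_proteins):
    """Number of unordered protein pairs in an all-against-all search."""
    return n_proteins * (n_proteins - 1) // 2


n_v3 = 5_214_234  # proteins encoded by the 1133 genomes in eggNOG v3
pairs_v3 = pairwise_comparisons(n_v3)

# Doubling the number of proteins roughly quadruples the number of pairs.
growth_factor = pairwise_comparisons(2 * n_v3) / pairs_v3

print(f"{pairs_v3:.3e} pairs; doubling the input multiplies this by {growth_factor:.2f}")
```

For the v3 dataset this amounts to roughly 1.4 × 10^13 protein pairs, which illustrates why relying on precomputed similarities is attractive.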

After the homology searches and the subsequent clustering step (11), 4 396 591 (84%) of all proteins investigated were assigned to at least one of the 721 801 orthologous groups generated by eggNOG (Figure 1). We extended the COGs, KOGs and arCOGs (8,9) to include the 1133 organisms, 121 eukaryotic and 69 archaeal species, respectively. As an enhancement to the 4873 COGs, 4850 KOGs and 7538 arCOGs, additional groups have been created as non-supervised OGs (NOGs), eukaryote-specific NOGs (euNOGs) and archaea-specific NOGs (arNOGs), extending the original COGs/KOGs/arCOGs by 101 208 NOGs, 41 267 euNOGs and 11 387 arNOGs. To provide a higher resolution of orthologous groups in frequently used taxonomic ranks, we applied our procedure to several subsets of organisms separately. In addition to the eukaryotic (euNOGs) and archaeal (arNOGs) levels, we now provide newly derived bacteria-specific NOGs (bactNOGs), so that all three domains of life are covered. Orthology is then further resolved for 22 bacterial levels such as Firmicutes (firmNOGs), Proteobacteria (proNOGs) and Actinobacteria (actNOGs) (Figure 1), as well as for 14 major levels in the eukaryotic clade, including Animals (meNOGs) and Fungi (fuNOGs).

Figure 1.

In addition to the over 100 000 orthologous groups in the last universal common ancestor (LUCA), eggNOG v3 also provides orthologous groups and functional annotation for an additional 40 taxonomic levels. Here we display each level with its abbreviated name, species count, orthologous group count and annotation coverage. The annotation coverage for both the functional description of the groups as well as the functional category (in parentheses) is given.

AUTOMATED ANNOTATION OF PROTEIN FUNCTION

An important feature of eggNOG v3 is the automatic functional annotation of the OGs. The groups are annotated with a function description based on the functional annotations of each protein member within the group (26) and in parallel with one of 25 functional categories (11) compatible with those provided by the COG and KOG databases (8).

In eggNOG v3, the functional annotation pipeline has similarly been optimized to scale to the large amount of data. This has led to a significant improvement in computation time while simultaneously increasing the total number of functionally annotated OGs. Between eggNOG v2 and eggNOG v3, for corresponding taxonomic levels, the total number of annotated OGs increased by 28.8% and 10.0% for function description and functional category, respectively. In summary, of the 721 801 OGs in eggNOG v3, 62.5% have a functional annotation and 47.6% have been classified into a functional category (for details see Figure 1).

FURTHER IMPROVEMENTS

As the exponential growth in the number of genomes, and of the genes therein, creates considerable performance issues, a number of technical improvements and speedups have been introduced; for example, the parallelization of some key aspects of the OG pipeline has contributed to the performance enhancement.

One important step in the eggNOG pipeline is the inference of in-paralogs. Proteins that belong to a given subset of species and are more similar to each other than to proteins belonging to species outside that subset are defined as in-paralogs. In this release, we determined the aforementioned subsets automatically: for the universal, domain- and phylum-specific OGs, we grouped organisms within the same taxonomic order. For taxonomical ranges between the phylum and class, we used the taxonomical family, while for ranges below the class level we grouped given species together.
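The rule for choosing these subsets can be summarized in a few lines. This is a hedged sketch rather than the actual implementation: the rank list, the function names and the example lineages are assumptions, as is the reading that the class level itself falls into the middle tier and that "below the class level" means that each species forms its own subset.

```python
from collections import defaultdict

# Taxonomic ranks from broad to narrow (an assumption of this sketch).
RANKS = ["universal", "domain", "kingdom", "phylum", "subphylum",
         "class", "order", "family", "genus"]


def inparalog_grouping_rank(level_rank):
    """Rank at which species are pooled when inferring in-paralogs for a level."""
    depth = RANKS.index(level_rank)
    if depth <= RANKS.index("phylum"):
        return "order"    # universal, domain- and phylum-specific OGs
    if depth <= RANKS.index("class"):
        return "family"   # ranges between phylum and class
    return "species"      # ranges below the class level


def group_species(lineages, level_rank):
    """Pool species into the subsets used for in-paralog inference.

    lineages: dict mapping species name -> {rank: taxon name}.
    """
    rank = inparalog_grouping_rank(level_rank)
    groups = defaultdict(set)
    for species, lineage in lineages.items():
        # Fall back to the species itself when the lineage lacks the rank.
        key = species if rank == "species" else lineage.get(rank, species)
        groups[key].add(species)
    return dict(groups)


# Illustrative lineages for three proteobacterial species.
lineages = {
    "Escherichia coli": {"order": "Enterobacterales", "family": "Enterobacteriaceae"},
    "Salmonella enterica": {"order": "Enterobacterales", "family": "Enterobacteriaceae"},
    "Vibrio cholerae": {"order": "Vibrionales", "family": "Vibrionaceae"},
}
phylum_level_groups = group_species(lineages, "phylum")
order_level_groups = group_species(lineages, "order")
```

In this toy case, a phylum-level analysis pools the two Enterobacterales species into one subset, whereas an order-level analysis treats each species separately.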

QUALITY ASSESSMENT OF eggNOG v3.0

So far, the majority of quality assessment tests have been based on the functional conservation of predicted orthologs (41–44); however, it has been acknowledged that a phylogeny-based benchmarking approach would be more appropriate (44,45). We therefore manually curated a set of orthologous groups exemplifying multiple caveats of orthology prediction (35), named Reference OGs (RefOGs), and used them to compare the quality of this release with that of eggNOG v2. As many as 95% of the reference orthologs can be detected in the new release compared to only 75% in the previous version (Figure 2). This is mainly due to the updated genome annotations in eggNOG v3. We estimated the impact of four error sources: (i) false assignments, (ii) missing orthologs, (iii) fusions and (iv) fissions (for details see Figure 2). eggNOG v3 is less affected than eggNOG v2 by false assignments and missing orthologs. In particular, missing orthologs affect only 41% of the RefOGs in this release compared to 57% in the previous one. The high coverage of the benchmark set (95%) due to new genome annotations is the major contributor to this observation, highlighting the importance of frequent database updates, which is one of our goals. On the other hand, the previous release contains slightly fewer artificial fusions and fissions. As the coverage of the compared species affects the accuracy of orthology assignment (35), it is to be expected that adding more species does not always improve every benchmark parameter.

Figure 2.

Quality assessment of eggNOG v3. We used 70 manually curated families (RefOGs) to test the accuracy of orthology prediction of the new release compared to eggNOG v2. For each release, we identified the orthologous group (OG) with the largest overlap with each RefOG and calculated how many genes were not predicted in the OG (missing orthologs) and how many genes were over-predicted in the OG (false assignments). Additionally, we checked whether members of the same RefOG have been separated into multiple OGs (RefOG fission) and how many of those OGs include more than three false assignments (RefOG fusion). Missing orthologs influence 41% of the RefOGs; however, this is significantly less than the 57% in eggNOG v2. Similarly, fewer RefOGs include false assignments in eggNOG v3 (60%) compared to version 2 (66%). However, there are slightly fewer artificial OG fusions and fissions in eggNOG v2. Given that the addition of species can introduce false assignments, our results suggest that the eggNOG methodology can tolerate a large number of species and at the same time improve its coverage of the tested benchmark dataset.
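The four measures described in this caption can be expressed compactly. The sketch below is an illustration under simplifying assumptions: the function and key names are hypothetical, every gene of an overlapping OG that is absent from the RefOG is counted as a false assignment, and the gene identifiers are toy data.

```python
def assess_refog(refog, ogs, fusion_threshold=3):
    """Compare one reference group (RefOG) against predicted orthologous groups.

    refog: set of gene ids in the curated reference group.
    ogs: dict mapping OG id -> set of gene ids.
    """
    overlapping = {og_id: members for og_id, members in ogs.items()
                   if refog & members}
    if not overlapping:
        return {"best_og": None, "missing": len(refog),
                "false_assignments": 0, "fission": False, "fusion_ogs": 0}
    # OG with the largest overlap; ties are broken by OG id for determinism.
    best_id = max(sorted(overlapping),
                  key=lambda og_id: len(refog & overlapping[og_id]))
    best = overlapping[best_id]
    return {
        "best_og": best_id,
        "missing": len(refog - best),            # reference genes not in the best OG
        "false_assignments": len(best - refog),  # over-predicted genes in the best OG
        "fission": len(overlapping) > 1,         # RefOG split over several OGs
        "fusion_ogs": sum(1 for members in overlapping.values()
                          if len(members - refog) > fusion_threshold),
    }


# Hypothetical toy example.
refog = {"g1", "g2", "g3", "g4", "g5"}
ogs = {
    "OG_A": {"g1", "g2", "g3", "x1"},
    "OG_B": {"g4", "y1", "y2", "y3", "y4"},
    "OG_C": {"z1", "z2"},
}
result = assess_refog(refog, ogs)
```

Here OG_A is the best match, two reference genes are missing from it, one gene is over-predicted, the RefOG is split across two OGs, and OG_B counts as a fusion because it carries four genes that do not belong to the reference.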

ACCESS OPTIONS

To improve the usability of eggNOG v3, a new, modernized web interface was developed. As with the previous versions, the new interface provides data that can be downloaded under the Creative Commons Attribution 3.0 License at http://eggnog.embl.de. The available data include the OGs, protein sequences, multiple sequence alignments, precomputed gene trees (Figure 3) as well as the annotation of 62% of the OGs. Queries may combine multiple OG names, gene names and/or protein names. One goal of the new interface is to simplify the navigation of the various OGs through (i) a cleaner, more intuitive layout and (ii) an interactive species tree on the right side of the search results. The interactive species tree facilitates navigation across different hierarchical levels by following the orthologs through the taxonomic levels. Homo sapiens serves as the default species for protein name queries; however, this can be changed to one of several common species within the search results. The multiple sequence alignments can be displayed using the Jalview applet (46) or downloaded in aligned or unaligned form. Precomputed phylogenetic trees are also provided and can be viewed together with any assigned PFAM (47) and SMART (48) domain via iTOL (49) or downloaded in Newick format.

Figure 3.

Screenshot of a results page. The eggNOG database was queried for the term ‘smoothened’. The top left picture demonstrates the simplified navigation of multiple search terms and species selection. The navigation tree at the top right of the page allows the user to change the view to more coarse-grained orthologous groups, for example, the mammalian orthologous groups. The group features, such as member proteins, alignments (green arrow) and phylogenetic trees with SMART domains (orange arrow), can be accessed inline and do not require a page refresh.

CONCLUSIONS/PERSPECTIVES

With eggNOG v3, we provide one of the most comprehensive and up-to-date databases of orthologous groups available that delivers protein function annotation for 1133 genomes across the three domains of life. Not only does eggNOG v3 cover a broad taxonomic spectrum, but it also supplies orthologous groups for 41 manually selected taxonomic ranges. The modern, easy-to-use web interface facilitates the usage of the database with novel extended functionalities, such as an interactive species tree to assist the navigation through the increased number of hierarchical levels. Our future plans include the ongoing improvement of the quality of orthology and functional assignments, a further increase of taxonomic ranges and technical improvements to manage the computational challenges that come along with the expected exponential increase of available genomes.

FUNDING

EMBL; MetaHit RTD EC (201052); Novo Nordisk Foundation Center for Protein Research; Swiss Institute of Bioinformatics; and the University of Zurich through its Research Priority Program ‘Systems Biology and Functional Genomics’. Funding for open access charge: EMBL (internal).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We would like to thank Yan Yuan for all his help and support on all technical and infrastructure issues we encountered during this project.

REFERENCES

1. Fitch WM. Distinguishing homologous from analogous proteins. Syst. Zool., 1970, vol. 19 (pg. 99-113)
2. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science, 1997, vol. 278 (pg. 631-637)
3. Eisen JA. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res., 1998, vol. 8 (pg. 163-167)
4. Huynen MA, Snel B, von Mering C, Bork P. Function prediction and protein networks. Curr. Opin. Cell. Biol., 2003, vol. 15 (pg. 191-198)
5. von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, Jouffre N, Huynen MA, Bork P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res., 2005, vol. 33 (pg. D433-D437)
6. Koonin EV. Orthologs, paralogs and evolutionary genomics. Annu. Rev. Genet., 2005, vol. 39 (pg. 309-338)
7. Östlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S, Frings O, Sonnhammer EL. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res., 2010, vol. 38 (pg. D196-D203)
8. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 2003, vol. 4 pg. 41
9. Makarova KS, Sorokin AV, Novichkov PS, Wolf YI, Koonin EV. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea. Biol. Direct., 2007, vol. 2 pg. 33
10. Waterhouse RM, Zdobnov EM, Tegenfeldt F, Li J, Kriventseva EV. OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011. Nucleic Acids Res., 2011, vol. 39 (pg. D283-D288)
11. Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, Kuhn M, Powell S, von Mering C, Doerks T, Jensen LJ, et al. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res., 2010, vol. 38 (pg. D190-D195)
12. Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C. OMA 2011: orthology inference among 1000 complete genomes. Nucleic Acids Res., 2011, vol. 39 (pg. D289-D294)
13. Chen F, Mackey AJ, Stoeckert CJ Jr, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res., 2006, vol. 34 (pg. D363-D368)
14. Uchiyama I. MBGD: a platform for microbial comparative genomics based on the automated construction of orthologous groups. Nucleic Acids Res., 2007, vol. 35 (pg. D343-D346)
15. Linard B, Thompson JD, Poch O, Lecompte O. OrthoInspector: comprehensive orthology analysis and visual exploration. BMC Bioinform., 2011, vol. 12 pg. 11
16. Wapinski I, Pfeffer A, Friedman N, Regev A. Automatic genome-wide reconstruction of phylogenetic gene trees. Bioinformatics, 2007, vol. 23 (pg. i549-i58)
17. Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T. PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res., 2008, vol. 36 (pg. D491-D496)
18. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res., 2009, vol. 19 (pg. 327-35)
19. Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, Hériché JK, Hu Y, Kristiansen K, Li R, et al. TreeFam: 2008 Update. Nucleic Acids Res., 2008, vol. 36 (pg. D735-D740)
20. van der Heijden RT, Snel B, van Noort V, Huynen MA. Orthology prediction at scalable resolution by phylogenetic tree analysis. BMC Bioinform., 2007, vol. 8 pg. 83
21. Datta RS, Meacham C, Samad B, Neyer C, Sjölander K. Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res., 2009, vol. 37 (pg. W84-W89)
22. Kuzniar A, van Ham RC, Pongor S, Leunissen JA. The quest for orthologs: finding the corresponding gene across genomes. Trends Genet., 2008, vol. 24 (pg. 539-551)
23. Gabaldon T. Large-scale assignment of orthology: back to phylogenetics? Genome Biol., 2008, vol. 9 pg. 235
24. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV. Computational methods for Gene Orthology inference. Brief Bioinform., 2011, vol. 12 (pg. 379-391)
25. Kuzniar A, Lin K, He Y, Nijveen H, Pongor S, Leunissen JA. ProGMap: an integrated annotation resource for protein orthology. Nucleic Acids Res., 2009, vol. 37 (pg. W428-W434)
26. Jensen LJ, Julien P, Kuhn M, von Mering C, Muller J, Doerks T, Bork P. eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res., 2008, vol. 36 (pg. D250-254)
27. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 2007, vol. 35 (pg. D61-D65)
28. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2011. Nucleic Acids Res., 2011, vol. 36 (pg. D491-496)
29. The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res., 2011, vol. 39 (pg. D214-D219)
30. Aurrecoechea C, Brestelli J, Brunk BP, Carlton JM, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, et al. GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res., 2009, vol. 37 (pg. D526-D530)
31. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 2008, vol. 36 (pg. D1009-D1014)
32. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res., 2011, vol. 39 (pg. D561-D568)
33. Kuhn M, Szklarczyk D, Franceschini A, Campillos M, von Mering C, Jensen LJ, Beyer A, Bork P. STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res., 2010, vol. 38 (pg. D552-D556)
34. Milinkovitch MC, Helaers R, Depiereux E, Tzika AC, Gabaldón T. 2x genomes–depth does matter. Genome Biol., 2010, vol. 11 pg. R16
35. Trachana K, Larsson TA, Powell S, Chen WH, Doerks T, Muller J, Bork P. Orthology prediction methods: a quality assessment using curated protein families. Bioessays, 2011, vol. 33 (pg. 769-780)
36. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science, 2006, vol. 311 (pg. 1283-1287)
37. Creevey CJ, Doerks T, Fitzpatrick DA, Raes J, Bork P. Universally distributed single-copy genes indicate a constant rate of horizontal transfer. PLoS One, 2011, vol. 6 pg. e22099
38. Rattei T, Tischler P, Götz S, Jehl MA, Hoser J, Arnold R, Conesa A, Mewes HW. SIMAP–a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters. Nucleic Acids Res., 2010, vol. 38 (pg. D223-D226)
39. Pearson WR. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol., 1990, vol. 183 (pg. 63-98)
40. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, vol. 25 (pg. 3389-3402)
41. Pryszcz LP, Huerta-Cepas J, Gabaldon T. MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Res., 2010, vol. 39 pg. e32
42. Hulsen T, Huynen MA, de Vlieg J, Groenen PM. Benchmarking ortholog identification methods using functional genomics data. Genome Biol., 2006, vol. 7 pg. R31
43. Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One, 2007, vol. 2 pg. e383
44. Altenhoff AM, Dessimoz C. Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput. Biol., 2009, vol. 5 pg. e1000262
45. Boeckmann B, Robinson-Rechavi M, Xenarios I, Dessimoz C. Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees. Brief Bioinform., 2011, vol. 12 (pg. 423-435)
46. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics, 2009, vol. 25 (pg. 1189-1191)
47. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res., 2008, vol. 36 (pg. D281-D288)
48. Letunic I, Doerks T, Bork P. SMART 6: recent updates and new developments. Nucleic Acids Res., 2009, vol. 37 (pg. D229-D232)
49. Letunic I, Bork P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res., 2011, vol. 39 (pg. W475-W78)

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
