Abstract

This article introduces the second release of the Gypsy Database of Mobile Genetic Elements (GyDB 2.0), a research project devoted to the evolutionary dynamics of viruses and transposable elements based on their phylogenetic classification (per lineage and protein domain). The Gypsy Database (GyDB) is a long-term project in continuous progress that, owing to the high molecular diversity of mobile elements, needs to be completed in several stages. GyDB 2.0 has been equipped with a wiki to allow other researchers to participate in the project. The current database stage and scope are long terminal repeat (LTR) retroelements and their relatives. GyDB 2.0 is an update based on the analysis of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants. Among other features, this update adds: (i) a variety of descriptions and reviews distributed across multiple web pages; (ii) protein-based phylogenies, where phylogenetic levels are assigned to distinct classified elements; (iii) a collection of multiple alignments, lineage-specific hidden Markov models and consensus sequences, called the GyDB collection; (iv) updated RefSeq databases and BLAST and HMM servers to facilitate sequence characterization of new LTR retroelement and caulimovirus queries; and (v) a bibliographic server. GyDB 2.0 is available at http://gydb.org.

INTRODUCTION

Mobile genetic elements (MGEs) are ubiquitous, autonomous genetic units that often constitute a significant part of their host genomes. It is commonly accepted that mobile DNA elements are powerful vectors for disease and evolution, from which distinct host genes have evolved during the history of life (1, 2). The emergence of viruses and MGEs, and the role they have subsequently played in the history of life, is an exciting topic that requires further investigation. In this respect, researchers aim to discern relevant aspects of the molecular changes responsible for various characteristics in organisms related to horizontal transfer, infection and disease. Among the distinct initiatives launched with the aim of investigating the diversity of MGEs (see for example 3–5) was the Gypsy Database (GyDB) of MGEs (6), a research project devoted to the evolutionary dynamics of viruses and MGEs (and their related host proteins), which was launched in 2008. The GyDB project is a database established within an evolutionary context of classification, where each piece of research delivers a conclusion that leads towards the next goal. The most captivating aspect of this project is that a share of our efforts is dedicated to the interpretation of analyses, paying particular attention to non-redundant elements displaying a certain degree of distance and investigating how they can be collectively aligned or related, in terms of protein domain architecture, with other lineages and elements. Because of the impressive molecular diversity of viruses and MGEs, the GyDB is a long-term project that has been arranged as a database in continuous progression, and must be completed in stages. The current database stage and scope are retroviruses and retrotransposons with long terminal repeats (LTR retroelements) and their relatives.
Following the outline of the earlier release (the study of Ty3/Gypsy and Retroviridae LTR retroelements), this article presents the GyDB update based on the phylogenetic evaluation of the most representative LTR retroelement families and the plant caulimoviruses. This update, called GyDB 2.0, is available at http://gydb.org and includes sequence phylogenetic classification in addition to significant bioinformatic improvements. In particular, the new infrastructure implements a wiki management system constructed with the aim of promoting a world-wide community of researchers collaborating in the analysis and classification of MGEs and viruses inhabiting (or circulating in) living organisms.

THE UPDATE: NEW FEATURES

GyDB 2.0 consists of 1234 web pages addressing the phylogenetic study of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements. Caulimoviruses (Caulimoviridae) are formally plant DNA pararetroviruses, but they were considered in GyDB 2.0 owing to their relationship with LTR retroelements based on the common gag/coat and pol regions [for more details, see (7) and references therein]. Table 1 summarizes the topics addressed in this update, as well as the servers and database sections it offers. The sequences on which GyDB 2.0 is based were retrieved from GenBank (8) and the methodologies employed were the same as those described earlier in references (6, 7, 9). At GyDB we evaluate the phylogenetic signal of the distinct classified elements and create hidden Markov model (HMM) profiles (10) per lineage and protein domain. In addition, the project is concerned with the evolutionary relationships between MGEs and their host genomes, based on the analysis of common protein families. In this regard, GyDB 2.0 focuses on two protein superfamilies that include protein products commonly encoded by LTR retroelements and their host genomes: the chromodomain superfamily (11) and clan AA of aspartic peptidases (12, 13). This second release is accompanied by bibliographic data-mining from PubMed databases hosted at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) to document up-to-date information regarding the distinct classified elements.

Table 1.

GyDB 2.0 new features: topics and contents

Systems Families Lineages Elements Protein domains Accessory proteins LTRs 
LTR retroelements Ty3/Gypsy 34 96 Yes 
LTR retroelements Ty1/Copia 19 69 – Yes 
LTR retroelements Retroviridae 50 41 Yes 
LTR retroelements Bel/Pao 23 – Yes 
LTR retroelements a Caulimoviridae 30 10 27 No 
Related families Clan AA 35 323 – No 
Related families Chromodomains 123 – No 
Topics Sections Availability 
Systematics Side menu 
Domains 14 Side menu 
Database Side menu 
Servers Top menu: BLAST, HMM, Literature 
Wiki tools and utilities Top menu 
Databases | Items | Sections
Genomes (full-length genomes) | 271 sequences | BLAST search and RefSeq DBs
LTRs (nucleotide sequences) | 413 sequences | BLAST search and RefSeq DBs
Cores (protein cores sequences) | 1895 sequences | BLAST search and RefSeq DBs
HMMs | 314 HMM profiles | HMM search and GyDB collection
Multiple alignments | 131 alignments | GyDB collection
Consensus sequences | 314 MRC sequences | GyDB collection
Phylogenetic trees | 70 trees | Phylogenies
Clan AA ancestral reconstruction | 70 alignments | CAARD database
Literature | 100797 references | Literature server

a We included caulimoviruses in the second release in view of their relationship with LTR retroelements based on the common gag/coat and pol region.

DATABASE ORGANIZATION

GyDB 2.0 is deployed over a Linux-MySQL-Apache-PHP (LAMP) stack, with additional Ajax programming to minimize server responses to client browsers. The design is similar to that of the previous release but implements various changes to the web interface. As shown in Figure 1, the database organization is founded upon two major menus: a top menu and a side menu. The top menu allows access to the three servers:

  • BLAST server: implements a BLAST search powered by the NCBI BLAST package (14), allowing protein and DNA comparisons with the GENOMES, LTRs and CORES databases. These databases collect the full-length genomes, the LTR sequences and all the protein sequences on which the second release is based, respectively.

  • HMM server: implements the HMMER3 package (http://hmmer.janelia.org) and allows protein comparisons against a database of lineage-specific protein domain HMM profiles created for this update. This server also provides comparisons between HMM profiles and the aforementioned CORES database.

  • LITERATURE server: allows users to search the bibliography relevant to the topic.

An additional new tool in GyDB 2.0 is its wiki, powered by the MediaWiki content management system (http://www.mediawiki.org/). This tool has been implemented to allow other users to participate in the project by editing or creating topics. Access to this wiki is free but requires registration. The rationale behind this choice is twofold: first, edits are registered by date and author in order to credit contributions; second, we have programmed a revision mechanism to review all changes constructively before making them public. The top menu includes three sections to log in and manage the distinct wiki resources. Finally, to the right of the top menu, GyDB 2.0 includes a text field to search the whole project under two modes (detailed in Figure 1).

The side menu divides the distinct GyDB sections into three major demarcations (emphasized with boxes in Figure 1). The first collects sections associated with the systematics applied at GyDB. The second provides information concerning the domains typically observed in the genomic structure of the elements we classify. The third demarcation offers free access to distinct databases, which are organized into three sections:

  • Trees and Networks: the collection of inferred phylogenetic trees based on distinct protein domains encoded by the classified elements, or based on their concatenation (when they are parts of polyproteins). Notably, inferred pol polyprotein phylogenies based on the concatenation of the protease, reverse transcriptase, RNaseH and integrase domains are the major criterion for assigning phylogenetic levels at GyDB 2.0 [results introduced in (7)]. Phylogenetic trees provide links to the corresponding element pages at GyDB 2.0: clicking any element name in any tree opens the entry assigned to that element. These tree image maps were created using Phylograph 1.0 (15). This section also includes the clan AA reference database (CAARD) of ancestral maximum likelihood (ML) reconstructions (13), which is implemented and maintained at GyDB.

  • GyDB collection (16): the repository of multiple alignments, HMMs and majority rule consensus (MRC) sequences offered at GyDB 2.0. When a deposited alignment, profile or MRC sequence is associated with a journal publication, its entry in the collection includes citation information.

  • REF SEQ DATABASES: the repository for downloading the databases (GENOMES, CORES and LTRs) implemented in the BLAST server.

Finally, a variety of links to other database initiatives relevant to the topic are included in the side menu.
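As an illustration of what a majority rule consensus (MRC) sequence in the GyDB collection represents, the following minimal Python sketch derives a consensus from a toy multiple alignment. It is not the procedure used to build the GyDB collection: the function name, the 50% threshold, the use of "X" for columns without a majority and the dropping of gap-majority columns are all assumptions made for this example.

```python
from collections import Counter


def majority_rule_consensus(aligned, threshold=0.5, gap="-", ambiguous="X"):
    """Return a consensus string for equal-length aligned sequences.

    Hypothetical helper for illustration only. A symbol is kept for a
    column only if its frequency is strictly greater than `threshold`;
    otherwise `ambiguous` is emitted. Columns whose majority symbol is
    the gap character are dropped from the consensus.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):  # iterate over alignment columns
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(aligned) > threshold:
            if symbol != gap:  # skip columns dominated by gaps
                consensus.append(symbol)
        else:
            consensus.append(ambiguous)  # no strict majority in this column
    return "".join(consensus)


# Toy alignment (made-up sequences, not GyDB data).
toy_alignment = [
    "MKV-LDA",
    "MKI-LDA",
    "MRV-LEA",
    "MKVQLDG",
]
print(majority_rule_consensus(toy_alignment))
```

With the toy alignment above, the fourth column is dominated by gaps and is therefore omitted from the consensus; a different gap or tie policy would give a different sequence.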

Figure 1.

GyDB 2.0 organization and implementation.


FUTURE PERSPECTIVES

Sequencing projects constantly deliver new types of MGEs [for example (17–22)]; hence the classification of non-redundant elements based on their phylogenetic signal is an open issue at GyDB, and results in the preparation of new sections. For example, we are committed to improving the understanding of the diversity and evolutionary dynamics of MGEs in eukaryotic and prokaryotic organisms. With regard to eukaryotic LTR retroelements (the current database scope), the sequence repertoire at GyDB will be extended with representative elements retrieved from recently sequenced marine secondary endosymbionts, including the brown alga Ectocarpus siliculosus (heterokont) and the coccolithophore Emiliania huxleyi (haptophyte). In terms of other research topics in preparation, one concerns the construction of a server devoted to the study of the complete set of MGEs and repeats (the mobilome) of biological genomes. This server will be introduced with two forthcoming publications focusing on the LTR retroelements and their related transposases of the pea aphid Acyrthosiphon pisum genome [see (23)]. At the technical level, we are exploring the application of formal grammars and machine learning algorithms to automate, as far as possible, the management and classification of the sequence data. We are also committed to developing solutions for other non-trivial difficulties that arise with the growing size of the databases. Viruses and MGEs usually show different rates of evolution and high variability depending on the evaluated protein or region. Therefore, we aim to implement more than one method of phylogenetic reconstruction to offer the user different perspectives based on different methods (or the opportunity to upload updated phylogenies via the wiki).
On the other hand, the traditional view of the origin and evolution of biological systems is that they are usually monophyletic, but this assumption has been challenged by increasing evidence suggesting that natural evolution frequently proceeds not only by gradual and vertical means but also through distinct modular, saltatory and reticulate events (24–36). In this respect, we are investigating appropriate protocols to combine phylogenetic inference with new tendencies in network biology [see also (7)].

FUNDING

Centro de Desarrollo Tecnológico Industrial (CDTI) (grant IDI-20100007, partial); Empresa Nacional de Innovación, S.A (ENISA) (17092008, partial); IMPIVA (IMIDTA/2009/118 and IMDTA/2010/740, partial); European Regional Development Fund (ERDF); Ministerio de Ciencia e Innovación (MICINN) (Torres-Quevedo grants PTQ-09-01-00020, PTQ-09-01-00670 and PTQ-10-03552, partial). Funding for open access charge: University of Valencia.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank all the colleagues detailed in the list available at http://gydb.org/index.php/Acknowledgments for their support in contributing images of biological host organisms. We are also grateful to Senior NAR Editor Dr Michael Galperin and to the two anonymous reviewers for their constructive comments, which helped to improve this article. Finally, we also thank Denys Wheatley and Angela Panther from Biomedes for copyediting this article.

REFERENCES

1. Hurst GDD, Schilthuizen M. Selfish genetic elements and speciation. Heredity, 1998, vol. 80 (pg. 2-8)
2. Volff JN, Brosius J. Modern genomes with retro-look: retrotransposed elements, retroposition and the origin of new genes. Genome Dyn., 2007, vol. 3 (pg. 175-190)
3. Fauquet CM, Mayo MA, Desselberger U, Ball LA. Virus Taxonomy, VIIIth Report of the ICTV, 2005. London: Elsevier/Academic Press
4. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res., 2005, vol. 110 (pg. 462-467)
5. Leplae R, Hebrant A, Wodak SJ, Toussaint A. ACLAME: a CLAssification of Mobile genetic Elements. Nucleic Acids Res., 2004, vol. 32 (pg. D45-D49)
6. Llorens C, Futami R, Bezemer D, Moya A. The Gypsy Database (GyDB) of mobile genetic elements. Nucleic Acids Res., 2008, vol. 36 (pg. 38-46)
7. Llorens C, Munoz-Pomer A, Bernad L, Botella H, Moya A. Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees. Biol. Direct., 2009, vol. 4 pg. 41
8. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res., 2009, vol. 37 (pg. D26-D31)
9. Llorens C, Fares MA, Moya A. Relationships of Gag–pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis. BMC Evol. Biol., 2008, vol. 8 pg. 276
10. Eddy SR. Profile hidden Markov models. Bioinformatics, 1998, vol. 14 (pg. 755-763)
11. Koonin EV, Zhou S, Lucchesi JC. The chromo superfamily: new members, duplication of the chromo domain and possible role in delivering transcription regulators to chromatin. Nucleic Acids Res., 1995, vol. 23 (pg. 4229-4233)
12. Rawlings ND, Barrett AJ, Bateman A. MEROPS: the peptidase database. Nucleic Acids Res., 2010, vol. 38 (pg. D227-D233)
13. Llorens C, Futami R, Renaud G, Moya A. Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases. Biol. Direct., 2009, vol. 4 pg. 3
14. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, vol. 25 (pg. 3389-3402)
15. Llorens C, Futami R, Vicente-Ripolles M, Moya A. Phylograph: a multifunction Java editor for handling phylogenetic trees. Biotechvana Bioinformatics, 2008. Biotechvana, Valencia, SOFT: Phylograph
16. Llorens C, Muñoz-Pomer A, Futami R, Moya A. The GyDB Collection of Viral and Mobile Genetic Element Models. Biotechvana Bioinformatics, 2009. Biotechvana, Valencia, CR: GyDB Collection
17. Piskurek O, Nishihara H, Okada N. The evolution of two partner LINE/SINE families and a full-length chromodomain-containing Ty3/Gypsy LTR element in the first reptilian genome of Anolis carolinensis. Gene, 2008, vol. 441 (pg. 111-118)
18. Novikova O, Mayorov V, Smyshlyaev G, Fursov M, Adkison L, Pisarenko O, Blinov A. Novel clades of chromodomain-containing Gypsy LTR retrotransposons from mosses (Bryophyta). Plant J., 2008, vol. 56 (pg. 562-574)
19. Bae YA, Ahn JS, Kim SH, Rhyu MG, Kong Y, Cho SY. PwRn1, a novel Ty3/gypsy-like retrotransposon of Paragonimus westermani: molecular characters and its differentially preserved mobile potential according to host chromosomal polyploidy. BMC Genomics, 2008, vol. 9 pg. 482
20. Gao D, Gill N, Kim HR, Walling JG, Zhang W, Fan C, Yu Y, Ma J, SanMiguel P, Jiang N, et al. A lineage-specific centromere retrotransposon in Oryza brachyantha. Plant J., 2009, vol. 60 (pg. 820-831)
21. Gottlieb AM, Poggio L. Genomic screening in dioecious “yerba mate” tree (Ilex paraguariensis A. St. Hill., Aquifoliaceae) through representational difference analysis. Genetica, 2010, vol. 138 (pg. 567-578)
22. Maumus F, Allen AE, Mhiri C, Hu H, Jabbari K, Vardi A, Grandbastien MA, Bowler C. Potential impact of stress activated retrotransposons on genome evolution in a marine diatom. BMC Genomics, 2009, vol. 10 pg. 624
23. The International Aphid Genomics Consortium. Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol., 2010, vol. 8 pg. e1000313
24. Malik HS, Eickbush TH. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol., 1999, vol. 73 (pg. 5186-5190)
25. Lerat E, Brunet F, Bazin C, Capy P. Is the evolution of transposable elements modular? Genetica, 1999, vol. 107 (pg. 15-25)
26. Goodwin TJ, Poulter RT. A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol. Genet. Genomics, 2002, vol. 267 (pg. 481-491)
27. Eickbush TH, Malik HS. Origin and evolution of retrotransposons. In: Craig NL, Craigie R, Gellert M, Lambowitz AM (eds). Mobile DNA II, 2002. Washington DC: ASM Press (pg. 1111-1144)
28. Malik HS, Eickbush TH. Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res., 2001, vol. 11 (pg. 1187-1197)
29. Marco A, Marin I. How Athila retrotransposons survive in the Arabidopsis genome. BMC Genomics, 2008, vol. 9 pg. 219
30. Rambaut A, Posada D, Crandall KA, Holmes EC. The causes and consequences of HIV evolution. Nat. Rev. Genet., 2004, vol. 5 (pg. 52-61)
31. Flavell AJ. Long terminal repeat retrotransposons jump between species. Proc. Natl Acad. Sci. USA, 1999, vol. 96 (pg. 12211-12212)
32. Jordan IK, Matyunina LV, McDonald JF. Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc. Natl Acad. Sci. USA, 1999, vol. 96 (pg. 12621-12625)
33. Bousalem M, Douzery EJ, Seal SE. Taxonomy, molecular phylogeny and evolution of plant reverse transcribing viruses (family Caulimoviridae) inferred from full-length genome and reverse transcriptase sequences. Arch. Virol., 2008, vol. 153 (pg. 1085-1102)
34. Koonin EV, Mushegian AR, Ryabov EV, Dolja VV. Diverse groups of plant RNA and DNA viruses share related movement proteins that may possess chaperone-like activity. J. Gen. Virol., 1991, vol. 72, Pt 12 (pg. 2895-2903)
35. Llorens JV, Clark JB, Martinez-Garay I, Soriano S, deFrutos R, Martinez-Sebastian MJ. Gypsy endogenous retrovirus maintains potential infectivity in several species of Drosophilids. BMC Evol. Biol., 2008, vol. 8 pg. 302
36. de Setta N, Van Sluys MA, Capy P, Carareto CM. Multiple invasions of Gypsy and Micropia retroelements in genus Zaprionus and melanogaster subgroup of the genus Drosophila. BMC Evol. Biol., 2009, vol. 9 pg. 279

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
