Val F Lanza, Fernando Baquero, Fernando de la Cruz, Teresa M Coque, AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks, Bioinformatics, Volume 33, Issue 2, January 2017, Pages 283–285, https://doi.org/10.1093/bioinformatics/btw601
Abstract
AcCNET (Accessory genome Constellation Network) is a Perl application for comparing the accessory genomes of a large number of genomic units at both qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows phylogenetic and functional information about the genomes under study to be merged, thus extending the capabilities of current network analysis methods. The AcCNET bipartite network opens a new perspective for exploring the pangenome of bacterial species, focusing on the accessory genome that underlies the idiosyncrasy of a particular strain and/or population.
AcCNET is available under the GNU General Public License version 3.0 (GPLv3) from http://sourceforge.net/projects/accnet
Supplementary data are available at Bioinformatics online.
1 Introduction
Recent developments in network-based methods (sequence-similarity networks, genome networks, gene family graphs) have significantly improved the usability of molecular datasets in comparative genomics. Furthermore, they open a way to explore reticulate evolutionary processes (Corel et al., 2016). A wide range of bioinformatic tools is available for characterizing the core genome and pangenome of bacterial species or for carrying out phylogenomic analysis. Unfortunately, only a fraction of them allow a specific analysis of accessory genomes (Agren et al., 2012; Contreras-Moreira and Vinuesa, 2013). Here, we present AcCNET (Accessory Genome Constellation Network), a tool based on bipartite networks that permits the simultaneous analysis of the accessory genes of a large number of genomes. Because AcCNET works with proteomes, it does not require synteny information, which makes it well suited to the contig sets produced by de novo genome sequencing. AcCNET offers a comprehensive 2D view of the genome-proteome relationships associated with the adaptive attributes of single strains and/or specialized populations. AcCNET therefore helps to elucidate the dynamics of species with high plasticity and lateral gene exchange, from which populations can emerge or evolve (Georgiades and Raoult, 2010).
2 Implementation
AcCNET is a Perl application based exclusively on freeware. From a given set of accessory genomes, it builds a bipartite network with two types of nodes: Genomic Units (GUs, the vehicles or containers of genes, i.e. chromosomes, plasmids or viruses) and Homologous protein Clusters (HpCs, sets of gene-encoded proteins clustered according to identity, coverage and e-value). Edges link each HpC to the GUs that contain its member proteins, and edge weights reflect the phylogenetic distance between the corresponding protein in a given GU and the consensus of the HpC. Edge weights are used by some layout and clustering algorithms, and certain clustering methods require them as mandatory input.
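The bipartite structure described above can be illustrated with a minimal Python sketch (AcCNET itself is written in Perl, and this is not its code). It assumes clustering has already been done: a hypothetical `membership` mapping assigns each protein to its (GU, HpC) pair, and a hypothetical `distance` mapping gives each protein's precomputed distance to its HpC consensus. Because every edge joins a GU to an HpC, the graph is bipartite by construction. How to collapse several proteins from one GU into a single edge weight is also an assumption here: the sketch keeps the smallest distance.

```python
from typing import Dict, Tuple


def build_bipartite_edges(
    membership: Dict[str, Tuple[str, str]],
    distance: Dict[str, float],
) -> Dict[Tuple[str, str], float]:
    """Return {(gu, hpc): weight} for a GU-HpC bipartite network.

    membership maps each protein ID to its (GU, HpC) pair; distance maps each
    protein ID to its (precomputed) distance from the HpC consensus. When one
    GU contributes several proteins to the same HpC, this sketch keeps the
    smallest distance, a hypothetical rule chosen for illustration only.
    """
    edges: Dict[Tuple[str, str], float] = {}
    for protein, (gu, hpc) in membership.items():
        d = distance[protein]
        key = (gu, hpc)  # every edge joins one GU node to one HpC node
        if key not in edges or d < edges[key]:
            edges[key] = d
    return edges


# Hypothetical toy data: two plasmids sharing one protein cluster.
membership = {
    "p1": ("plasmidA", "HpC_1"),
    "p2": ("plasmidA", "HpC_1"),
    "p3": ("plasmidB", "HpC_1"),
    "p4": ("plasmidB", "HpC_2"),
}
distance = {"p1": 0.10, "p2": 0.05, "p3": 0.20, "p4": 0.0}
edges = build_bipartite_edges(membership, distance)
```

With this toy input, `plasmidA` keeps a single edge to `HpC_1` weighted 0.05, because its closer protein wins.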
2.1 Input data
AcCNET works with proteome files in FASTA format, with each proteome in an individual file. A proteome can correspond to an individual genomic element (a chromosome, plasmid, virus…) or to a complex set (a genome, pangenome, genomic community…). In either case, each file is treated as a single GU node in the network.
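The one-file-one-GU convention can be sketched in Python as follows. This is an illustration, not AcCNET's input parser: the minimal FASTA reader takes the first whitespace-delimited token of each header as the protein ID, and naming each GU after its file stem is a hypothetical choice.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA records into {protein_id: sequence}.

    The ID is the first whitespace-delimited token of the header line;
    multi-line sequences are concatenated and blank lines are skipped.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            parts = line[1:].split()
            current = parts[0] if parts else ""
            chunks = []
        elif current is not None:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def load_proteomes(paths: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Treat each proteome file as one GU node, named after its file stem
    (a hypothetical naming rule used only for this sketch)."""
    proteomes: Dict[str, Dict[str, str]] = {}
    for path in map(Path, paths):
        with path.open() as handle:
            proteomes[path.stem] = read_fasta(handle)
    return proteomes
```

For example, `load_proteomes(["plasmidA.faa", "chromosomeB.faa"])` (hypothetical file names) would yield two GUs, `plasmidA` and `chromosomeB`, regardless of whether each file holds a single plasmid or a whole genome.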
2.2 Output data
AcCNET produces three files: a network, a table and a FASTA file. The network file is tabular, with four columns (Source, Target, Weight and Type), and is compatible with most graph visualization software, such as Cytoscape (Smoot et al., 2011), Gephi (Bastian et al., 2009) or R (https://www.r-project.org/). The table file contains the attributes of the network nodes in four columns: ID, Type, TwinGroup and Description. The Type attribute distinguishes HpC nodes from GU nodes. The TwinGroup attribute groups HpCs that have the same set of neighbours (Corel et al., 2016). For the Description field of each HpC node, AcCNET uses the annotation of the cluster's representative sequence. The FASTA file contains the representative sequence of each HpC, with headers adapted to the network index.
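A minimal Python sketch of the two tabular outputs, assuming edges are held as a `{(GU, HpC): weight}` dictionary. The column names follow the text; everything else is an assumption: filling the network Type column with Gephi's `"Undirected"` edge type, the node Type labels `"GU"` and `"HpC"`, leaving TwinGroup empty for GU nodes, and numbering twin groups from 1. TwinGroups are computed as described above, by giving HpCs with identical neighbour sets the same group ID.

```python
import csv
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Edges = Dict[Tuple[str, str], float]  # (GU, HpC) -> edge weight


def network_rows(edges: Edges, edge_type: str = "Undirected") -> List[List[str]]:
    """Rows of the network table: Source, Target, Weight, Type."""
    rows = [["Source", "Target", "Weight", "Type"]]
    for (gu, hpc), weight in sorted(edges.items()):
        rows.append([gu, hpc, str(weight), edge_type])
    return rows


def twin_groups(edges: Edges) -> Dict[str, int]:
    """Assign the same group ID to HpCs sharing exactly the same GU neighbours."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for gu, hpc in edges:
        neighbours[hpc].add(gu)
    group_ids: Dict[FrozenSet[str], int] = {}
    assignment: Dict[str, int] = {}
    for hpc in sorted(neighbours):
        key = frozenset(neighbours[hpc])
        group_ids.setdefault(key, len(group_ids) + 1)  # new neighbour set -> new ID
        assignment[hpc] = group_ids[key]
    return assignment


def node_rows(edges: Edges, descriptions: Dict[str, str]) -> List[List[str]]:
    """Rows of the node table: ID, Type, TwinGroup, Description."""
    twins = twin_groups(edges)
    rows = [["ID", "Type", "TwinGroup", "Description"]]
    for gu in sorted({gu for gu, _ in edges}):
        rows.append([gu, "GU", "", ""])  # TwinGroup is defined for HpCs only
    for hpc in sorted(twins):
        rows.append([hpc, "HpC", str(twins[hpc]), descriptions.get(hpc, "")])
    return rows


def write_tsv(rows: List[List[str]], path: str) -> None:
    """Write rows as a tab-separated file readable by Cytoscape, Gephi or R."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
```

Grouping on a `frozenset` of neighbours makes twin detection a single pass over the HpCs, since identical neighbour sets hash to the same dictionary key.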
3 Biological application
Lateral gene transfer contributes extensively to the evolution and diversification of Archaea and Bacteria through the exchange of adaptive traits. Many bacterial species of medical, veterinary or agricultural relevance are able to adapt to disparate lifestyles. Core genomes and pangenome dynamics are therefore increasingly analysed to identify mechanisms of evolution and the traits that make populations functionally different, and ultimately to understand the concept of species. AcCNET can be configured not only to analyse accessory genomes but also to represent whole pangenomes. This feature is especially useful for studying the dynamics of mobile genetic elements. In fact, AcCNET facilitates the analysis of large collections of plasmids, making it well suited to the large-scale characterization of conjugative elements such as plasmids or integrative and conjugative elements (ICEs), for example in the surveillance of antibiotic resistance, the study of highly pathogenic species or the characterization of emerging lineages. As an example of AcCNET's applicability, we analysed a set of 138 Escherichia coli genomes taken from the NCBI database. The example is explained in detail in the Supplementary material.
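The difference between an accessory-genome network and a whole-pangenome network can be sketched by splitting HpCs into core and accessory sets. This Python sketch uses one common operational definition: an HpC is core when it is linked to at least a given fraction of all GUs. The `core_fraction` parameter and its default of 1.0 (present in every GU) are assumptions for illustration; the criterion AcCNET itself applies is not described in this section.

```python
from typing import Dict, Iterable, Set, Tuple


def split_core_accessory(
    edges: Iterable[Tuple[str, str]],
    core_fraction: float = 1.0,
) -> Tuple[Set[str], Set[str]]:
    """Split HpCs into (core, accessory) sets from (GU, HpC) edge pairs.

    An HpC is called core when it is linked to at least core_fraction of all
    GUs in the network (1.0 means present in every GU); all other HpCs are
    accessory. The threshold is a hypothetical parameter of this sketch.
    """
    pairs = list(edges)
    gus = {gu for gu, _ in pairs}
    presence: Dict[str, Set[str]] = {}
    for gu, hpc in pairs:
        presence.setdefault(hpc, set()).add(gu)
    threshold = core_fraction * len(gus)
    core = {hpc for hpc, members in presence.items() if len(members) >= threshold}
    return core, set(presence) - core


pairs = [("g1", "A"), ("g2", "A"), ("g3", "A"), ("g1", "B"), ("g2", "C")]
core, accessory = split_core_accessory(pairs)
# Keeping only accessory edges gives an accessory-genome network;
# keeping all edges represents the whole pangenome.
accessory_edges = [(gu, hpc) for gu, hpc in pairs if hpc in accessory]
```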
4 Conclusion
AcCNET is a novel free software tool based on bipartite networks that aims to analyse accessory genomes with high resolution and accuracy. By providing an interactive visualization tool, AcCNET allows in-depth inspection of the network. Although initially designed for the analysis of bacterial genome sequences, its features allow the exploration of all types of GUs, from transposons and viruses to mammals, including primates.
Acknowledgements
The authors thank Rocio Rodríguez-Villanueva for revising the English grammar.
Funding
This work was supported by the European Commission, Seventh Framework Programme (EVOTARFP7-HEALTH-282004 for VFL, FB, FdlC and TMC (PI15-00466) and PLASWIRES-612146/FP7-ICT-2013-10 for FdlC). The authors also acknowledge the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) for co-funding the Plan Nacional de I+D+I 2012-2015 (PI12-01581 to TMC and BFU2014-55534-C2-1-P for FdlC) and CIBER actions (CIBER in Epidemiology and Public Health, CIBERESP; CB06/02/0053 to FB) and the Regional Government of Madrid (PROMPT-S2010/BMD2414). Val F. Lanza is further supported by a Research Award Grant 2016 of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID).
Conflict of Interest: none declared.
References