Nikolas Dovrolis, George Kolios, George M Spyrou, Ioanna Maroulakou, Computational profiling of the gut–brain axis: microflora dysbiosis insights to neurological disorders, Briefings in Bioinformatics, Volume 20, Issue 3, May 2019, Pages 825–841, https://doi.org/10.1093/bib/bbx154
Abstract
Almost 2500 years after Hippocrates’ observations on health and its direct association to the gastrointestinal tract, a paradigm shift has recently occurred, making the gut and its symbionts (bacteria, fungi, archaea and viruses) a point of convergence for studies. It is nowadays well established that the gut microflora’s compositional diversity regulates via its genes (the microbiome) the host’s health and provides preliminary insights into disease progression and regulation. The microbiome’s involvement is evident in immunological and physiological studies that link changes in its biodiversity to its contributions to the host’s phenotype but also in neurological investigations, substantiating the aptly named gut–brain axis. The definitive mechanisms of this last bidirectional interaction will be our main focus because it presents researchers with a new conundrum. In this review, we prospect current literature for computational analysis methodologies that accommodate the need for better understanding of the microbiome–gut–brain interactions and neurological disorder onset and progression, through cross-disciplinary systems biology applications. We will present bioinformatics tools used in exploring these synergies that help build and interpret microbial 16S ribosomal RNA data sets, produced by shotgun and high-throughput sequencing of healthy and neurological disorder samples stored in biological databases. These approaches provide alternative means for researchers to form hypotheses for their inquiries faster, cheaper and with precision. The goal of these studies relies on the integration of combined metagenomics and metabolomics assessments. An accurate characterization of the microbiome and its functionality can support new diagnostic, prognostic and therapeutic strategies for neurological disorders, customized for each individual host.
The host and its microflora: an interesting symbiosis
The philosophical expression ‘no man is an island’ takes on a whole new meaning when one considers the fact that, from the time of birth, each of us coexists with an assortment of bacteria, fungi, archaea and viruses. These ∼10¹⁴ microorganisms constitute the human microflora [1] (also known as microbiota) colonizing the skin, mouth, lungs, reproductive and gastrointestinal (GI) tract of everyone, creating a mutualistic biological interaction, a symbiosis. The gut especially, with its physiology and large surface, acts as the perfect host environment for the microflora’s development, exhibiting the greatest diversity and abundance of bacterial populations. The composition of the human microflora, although evolving through the early stages of life and being perturbed by habitat, lifestyle, medication and health, is unique in each individual, creating a form of personal ‘fingerprint’ [2]. This evolution includes interactions between the members of the microflora fighting for ‘dominance’ among themselves. There are of course similarities across the field, with bacterial phyla like Bacteroidetes, Firmicutes and Actinobacteria being present in every host [3], but the difference lies in the abundance of their subpopulations. Interestingly enough, in 2009, Turnbaugh et al. [4] observed that even though the microflora composition may vary between individuals, its core function remains the same in similar pathophysiological conditions.
In recent years, the combined genetic composition of the microflora, called the microbiome, has been implicated directly with numerous aspects of human health in ways that previously were, and in many cases still remain, unknown [5]. The beneficial role of the host–microflora relationship is dependent on a semi-stable homeostasis which, when disturbed, leads to dysbiosis [6], a status inducing or signifying pathological conditions. Under homeostasis, the functional role [7] of the microbiome includes defense against pathogens and inflammation via its interactions with the mucosa, vitamin synthesis, energy production, metabolism alteration and dietary modifications like turning fibers into short-chain fatty acids (SCFAs), while contributing to neurodevelopment [8], adult brain function [9] and longevity [10, 11]. During dysbiosis, on the other hand, certain microbial populations become differentially abundant, driving their metabolic contributions to follow accordingly and strongly affecting the host epigenome [12–14]. The gut microflora actively contributes to the development and maintenance of the gut immune system [15, 16] and to the permeability of the blood brain barrier (BBB) [17], and its imbalance has already been linked to various pathological conditions like inflammatory bowel diseases (IBDs) [18], cardiovascular conditions [19], atherosclerosis [20], diabetes [21], cancer [22], metabolic syndrome [23], human immunodeficiency virus (HIV) [24], chronic kidney disease [25], antiphospholipid syndrome [26] and, most importantly for the premise of this review, various neurological [27] and neuropsychiatric [28] conditions.
The gut–brain–microbiome axis
It has been known for a while now that the enteric nervous system acts as a kind of second ‘brain’ [29, 30] providing a bridge between the gut, the mucosal immune system, the neuroendocrine system, the autonomic nervous system, the vagus nerve and by extension the brain [31]. Previous hypotheses pointed at the brain as the instigator of this relationship trying to ‘control’ the gut, but later studies pointed at a bidirectional relationship. These observations provided the basis for the investigations of the gut–brain axis on a more advanced level revealing four distinct signaling pathways composed of neural, immunological, endocrinological and microbial communications [32]. With the newfound knowledge of the microflora’s implication in human health, the axis expanded to include the microbiome among its components forming what can be found in literature as the microbiome–gut–brain axis [33, 34]. Microbial metabolites interact with the host environment, controlling immune responses via the mucosa, reaching the brain via the bloodstream and modulating neural responses. It is clear that there is a whole ecosystem that affects the homeostasis and pathological conditions alike, via known and unknown mechanisms [35]. For example, the microbiome’s contribution to the metabolism of tryptophan, an essential amino acid for the synthesis of serotonin in the central nervous system (CNS), leads to its absorption by the gut and the crossing of the BBB [36]. The SCFAs, which are immunoregulating metabolites of gut microflora, influence microglia homeostasis and shape brain development [37]. Nitric oxide inhibition via microbial metabolites contributes to microglia maturation [14]. Recently, Bellono et al. [38] have shown that enterochromaffin cells express chemosensors that regulate serotonin-sensitive nerve fibers and establish a direct communication between the gut ecosystem and the nervous system.
Current knowledge has linked the gut–brain axis to various systemic pathological conditions like obesity [39–41], irritable bowel syndrome [42–44], upper GI disorders [45] like gastroparesis, dyspepsia and anorexia, infant colic [46] but mainly to neurological conditions affecting mental state and development, memory and behavior [47]. Clinical and preclinical studies have delved into characterizing the gut microflora dysbiosis in neurological conditions, pointing at differentially abundant microbial genera. From the early stages of life through adolescence, the gut microflora appears to influence not only normal neurological development but also the onset and/or the progression of pathological conditions like autism, schizophrenia, psychosis and bipolar disorder in both animal models and patients [48–50]. Autism spectrum disorders (ASDs), which are characterized by pathological neurodevelopment, have been linked to altered microbiome states in recent studies [51–56]. Increases of the population of bacteria of the genus Lactobacillus have been identified in patients exhibiting first episodes of psychosis and correlated positively with symptom severity (whereas Lachnospiraceae and Ruminococcaceae correlated negatively) in a study by Schwarz et al. [57]. These kinds of differences in microbial composition could possibly provide future strategies in the development of diagnostic tools for various disorders. A longitudinal study performed by Evans et al. [58] highlighted the population loss of Faecalibacterium as important in bipolar disorder, after excluding covariant factors. In 2013, Nieto et al. [59] used oral antibiotics in mice to alter the gut microbial composition, leading to an increase in the hippocampal expression of brain-derived neurotrophic factor, which is implicated in cognitive impairment, morphological and functional synaptic pathology and contribution to N-methyl-D-aspartate receptor dysfunction. This dysfunction has been associated with schizophrenia.
The gut–brain axis continues to shape our neurological and mental health beyond adolescence. Stress [60], insomnia [61], depression [62], anxiety [63] and even fear-related signaling [64], although not fatal in most cases, directly affect the quality of life of millions daily, regardless of age. As an example, Zheng et al. [65] presented a four-part study in a 2016 paper. They first tested germ-free mice and observed a reduction of depression-like symptoms, suggesting a microbiota–gut–brain axis involvement in depression. They then continued the experiment on patients exhibiting major depressive disorder (MDD) versus healthy controls and found significant differences in the abundance of the bacterial phyla Firmicutes, Actinobacteria and Bacteroidetes. The third step was fecal microflora transplantation from both MDD patients and healthy controls to germ-free mice; after 2 weeks, the recipients of the ‘MDD microflora’ showed increased depression-like and anxiety-like symptomatology. Finally, by applying functional shotgun metagenomics, they investigated the metabolic effects of microbiota on ‘MDD microflora’ mice and identified several dysregulated metabolic pathways, especially those involved with carbohydrate metabolism and its function in depression.
When it comes to quality of life and in some cases even mortality, strokes and progressive neurodegenerative diseases show dramatic percentages in the ageing population [36]. The microbiota–gut–brain axis has been implicated in the outcome of ischemic brain injury [66] and also in amyotrophic lateral sclerosis [67], multiple sclerosis [68], Parkinson’s [69] and Alzheimer’s disease (AD) [70]. A few months before this review, Bonfili et al. [71], using 3xTg-AD mouse models (transgenic mice with three mutations associated with familial AD), investigated the role of microflora regulation via administration of SLAB51 probiotics (a mixture of lactic acid bacteria and bifidobacteria) in the etiopathology of AD. Their experiments provided insights into regulating amyloid load, counteracting cognitive decline and brain damage, increasing gut hormone concentrations and regulating proteasomal and autophagic pathways. They calculated statistically significant microflora compositional and functional changes between wild-type and AD models, after probiotic treatment, specifically attributed to the increase in Bifidobacterium spp., the reduction in Campylobacterales and their role in inflammation via the regulation of pro-inflammatory cytokines.
As evident from the above examples, preclinical and clinical studies can be enhanced significantly by bioinformatics approaches, enriching our understanding of the microbiome’s involvement. The findings of such approaches provide a unique perspective on the composition and functional role of the microbiome, allowing researchers to theorize on dysbiosis as a cause or an effect of specific conditions and at the same time to investigate the effects of intervention on the microflora (Figure 1). The next chapters of this review highlight these technology-based methodologies and outline how the in silico process is formulated in microbiome studies.

Figure 1. A graphical abstract of this review highlighting the gut–brain axis communication pathways, the host mechanisms the microflora regulates and some of its major perturbagens. It also presents a basic pipeline of computational analysis found in contemporary microbiome publications.
Computational metagenomic approaches
In the field of metagenomics research, some fundamental questions often arise: How do we know so much about the microbiome and how did we get there so fast after decades of speculation? How exactly do we know what the microbiome is composed of? Can we identify interactions between populations of the microflora? How did we associate specific members of the microflora and their metabolic products with a diverse spectrum of health conditions? The answer lies in the technological advantage that gene sequencing and next-generation sequencing (NGS) [72, 73] have provided for studying the uncultured microflora, and in the fast strides of bioinformatics.
Before delving into the functional role of microbial populations in the pathophysiology of disorders, we must be able to identify them with high sensitivity and specificity. NGS has provided these capabilities to a large extent by introducing shotgun sequencing alongside 16S ribosomal RNA (rRNA) sequencing [74, 75]. The 16S rRNA gene is considered to be the de facto housekeeping gene of bacterial and archaeal populations. At this point, the first concession of studying the microbiome is introduced: the focus falls on the implications of the bacteriome (bacterial microbiome), often forgoing those of the mycobiome (fungal microbiome) [76, 77] and the virome (viral microbiome) [78, 79], both of which have been associated with pathological conditions but are still largely understudied. This concession is largely based on the richness (the number of distinct species) and abundance (the number of members each species has) of bacterial populations over those of the fungi and viruses, but also on their ease of detection and the better understanding of their biological processes.
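The two quantities defined above, richness and abundance, can be illustrated with a minimal Python sketch. It assumes each read has already been assigned a taxon label; the function name and the example labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def richness_and_abundance(assignments):
    """Return (richness, relative_abundance) for a list of per-read taxon labels.

    richness: number of distinct taxa observed.
    relative_abundance: fraction of reads assigned to each taxon.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    richness = len(counts)
    relative = {taxon: n / total for taxon, n in counts.items()}
    return richness, relative


# Hypothetical per-read assignments for a single sample.
reads = ["Bacteroides", "Bacteroides", "Faecalibacterium", "Lactobacillus"]
richness, relative = richness_and_abundance(reads)
print(richness, relative)
```

Relative rather than absolute abundance is used here because sequencing depth usually differs between samples, so raw counts are not directly comparable.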
Metagenomics [80, 81] is the term introduced to specify the study of the metagenome, which is the combined DNA composition of environmental samples. In the case of human microflora, in fecal and histological biopsy samples, it refers to the identification and quantification of the genetic contributions of microbial subpopulations [82]. Shotgun metagenomics, although more expensive, provides higher resolution and accuracy, but the results become more complex because they include all the microorganisms of a sample [83], as well as host DNA. 16S rRNA metagenomics, on the other hand, is more accessible and faster to achieve in a laboratory setting when the focus of the study is the bacteria and archaea in multiple control and patient samples. Both approaches rely on NGS library construction for RNA or DNA [84], a practice that introduces an amount of variance between different studies. Additionally, the 16S rRNA standard operating procedure requires another step in the library building with a fair amount of uncertainty, the amplification of hypervariable regions of the 16S rRNA gene via multiplex polymerase chain reaction (PCR) primers [85]. In both cases, whether it is the sequencing of a whole sample or of the 16S rRNA amplicons, we end up with short reads (25–500 base pairs), which allow microorganisms that are unknown or present in small abundances to be detected. These reads require extensive bioinformatics preprocessing with specialized tools for read trimming, merging, assembly, scaffolding and mapping [86]. Table 1 provides an overview of preprocessing tools and their ability to perform the following steps: read preprocessing (trimming, merging, scaffolding and assembly), quality control, denoising and chimera detection.
Tool | Trimming, merging, scaffolding, assembly | Quality control | Denoising | Chimera detection | Reference |
---|---|---|---|---|---|
Abyss 2.0 | ✓ | [88] | |||
Bambus 2 | ✓ | [89] | |||
BBAP | ✓ | [90] | |||
CATCh | ✓ | [91] | |||
ChimeraSlayer | ✓ | [92] | |||
dupRadar | ✓ | [93] | |||
EP_metagenomic | ✓ | [94] | |||
IDBA-UD | ✓ | [95] | |||
IM-TORNADO | ✓ | ✓ | ✓ | ✓ | [96] |
InteMAP | ✓ | [97] | |||
IPED | ✓ | [98] | |||
MAP | ✓ | [99] | |||
MeFiT | ✓ | [100] | |||
MEGAHIT | ✓ | [101] | |||
MESER | ✓ | [102] | |||
MetAMOS | ✓ | ✓ | ✓ | ✓ | [103] |
metaSPAdes | ✓ | [104] | |||
MetaVelvet | ✓ | [105] | |||
mothur | ✓ | ✓ | ✓ | ✓ | [106] |
NoDe | ✓ | [107] | |||
OCToPUS | ✓ | ✓ | ✓ | ✓ | [108] |
Orione | ✓ | ✓ | ✓ | ✓ | [109] |
PRICE | ✓ | [110] | |||
QIIME | ✓ | ✓ | ✓ | ✓ | [111] |
Qualimap2 | ✓ | [112] | |||
Ray Meta | ✓ | [113] | |||
ROP | ✓ | [114] | |||
Sequins | ✓ | [115] | |||
sleuth | ✓ | [116] | |||
Snowball | ✓ | ✓ | [117] | ||
Trimmomatic | ✓ | ✓ | [118] | ||
UCHIME | ✓ | [119] | |||
VSEARCH | ✓ | [120] | |||
Xander | ✓ | [121] |
Note: These steps precede the microbial characterization (binning/OTU picking).
The preprocessing steps covered in Table 1 are:
- Read preprocessing (trimming, merging, scaffolding and assembly)
- Quality control, to ensure that erroneous reads, artifacts and bias are detected and corrected
- Denoising, to remove the noise often introduced by DNA/RNA preparation and PCR
- Chimera detection, to identify and remove chimeras, which are artificial recombinants formed during the PCR amplification stage [87]
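As an illustration of the trimming and quality control steps, the following Python sketch cuts a read at the first low-quality window. It is a simplified sliding-window trimmer and not the implementation of any tool in Table 1; the Phred+33 quality encoding, the window size of 4 and the mean quality threshold of 20 are assumptions chosen for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(sequence, quality_string, window=4, min_mean_quality=20):
    """Cut the read at the start of the first window whose mean quality is too low.

    Returns the (possibly shortened) sequence and quality string.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have the same length")
    scores = phred_scores(quality_string)
    for start in range(len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_quality:
            return sequence[:start], quality_string[:start]
    return sequence, quality_string


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
trimmed_seq, trimmed_qual = trim_read("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual)
```

In a real workflow, reads shorter than a minimum length after trimming would typically be discarded as well; that filter is omitted here for brevity.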
It is obvious that there is no clear winner among sequencing methodologies but rather one better suited to the job at hand. The products of the sequencing process, regardless of the technology used, are distinct sequences of the microflora members of the samples reported in fasta or fastq files and a mapping file containing all the necessary metadata for the samples. These files will be the input of the next steps, which identify the species the sequences belong to and assign them taxonomies. Operational taxonomic unit (OTU) is a term introduced to describe clusters of similar sequences, which might represent a species. Although not necessarily flawless, this approach typically uses a 97% sequence similarity threshold for the clustering and leads to the selection of one sequence per OTU to represent the taxon it belongs to via phylogenetic alignment. Various bioinformatics approaches and algorithms exist for this process, which is also known as binning, either in workflows or in individual implementations of homology- and prediction-based methods, both for shotgun and 16S rRNA metagenomics.
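The clustering idea behind OTUs can be sketched in a few lines of Python. This is a toy greedy scheme, not the algorithm of any specific tool: it assumes the sequences are already aligned to the same length, measures identity as the fraction of matching positions, and treats the first sequence of each cluster as its representative.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Group sequences into OTUs.

    Each sequence joins the first OTU whose representative it matches at or
    above the threshold; otherwise it founds a new OTU and becomes its
    representative. Returns a list of (representative, members) pairs.
    """
    otus = []
    for seq in sequences:
        for representative, members in otus:
            if identity(seq, representative) >= threshold:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))
    return otus


# Hypothetical 100-bp sequences: 'near' differs from 'base' at 2 positions
# (98% identity), 'far' differs at 8 positions (92% identity).
base = "ACGT" * 25
near = "TT" + base[2:]
far = "T" * 10 + base[10:]
otus = greedy_otu_clustering([base, near, far])
print(len(otus), [len(members) for _, members in otus])
```

The result of a greedy scheme like this depends on the input order, which is one reason the 97% OTU approach is described above as not necessarily flawless.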
Most of these algorithms rely primarily on two specific practices and hybrid implementations of them: de novo and closed reference OTU picking for 16S rRNA data, or homology-independent and homology-dependent binning for shotgun data, respectively. De novo OTU picking is largely based on prediction-based implementations like Infernal [122], UPARSE [123], UCLUST [124], CD-HIT [125], PyNAST [126], METAXA2 [127], CLUSTOM-CLOUD [128], SWARM [129], OptiClust [130] and NINJA-OPS [131], which when clustering do not take into account any existing database for reference sequences but rather try to construct their own phylogenetic tree and assign taxonomies to OTUs after aligning them. The same concept applies to homology-independent binning through applications like CONCOCT [132], GroopM [133], MetaFast [134], MetaBAT [135], MaxBin [136], VizBin [137], COCACOLA [138] and MetaProb [139]. This methodology is better suited when trying to identify metagenomes of habitats with largely unknown members or trying to identify pathogenic microorganisms of unknown origin. It is by far the most computationally demanding approach, albeit the most accurate, as no reads are disregarded. On the contrary, when the host environment contains by and large known species, like the gut microflora, a closed reference OTU picking strategy (or a homology-dependent one for shotgun data) can provide accurate results quickly by using algorithms that look up reference sequences in the latest versions of databases like RDP [140], GreenGenes [141], SILVA [142], RefSeq [143], HPMCD [144], etc., and cluster the data according to their similarity with those. Implementations of this approach include Taxonomer [145], IMSA-A [146], BLCA [147] and SPINGO [148] for closed reference OTU picking, and MetaPhlAn [149], MEGAN6 [150], Centrifuge [151], MGMapper [152] and OPAL [153] for homology-dependent binning. The output of these pipelines, independent of the methodology used, is usually an OTU table, which contains all the OTUs found in a sample, how many times each was found and their assigned taxonomy, among various other metadata. The processes described above are summarized visually in Figure 2.
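A closed reference strategy and the resulting OTU table can be sketched as follows. Everything in this Python example is a hypothetical miniature: the two 10-bp reference sequences, their taxonomy strings and the sample names are invented for illustration, and the identity threshold is lowered to 0.9 only because the toy sequences are so short.

```python
from collections import defaultdict

# Hypothetical miniature reference database: sequence -> taxonomy.
REFERENCE = {
    "ACGTACGTAC": "Bacteria; Firmicutes",
    "TTGGCCAATT": "Bacteria; Bacteroidetes",
}


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def closest_reference(read, reference, threshold):
    """Return the best-matching reference sequence, or None if below the threshold."""
    best_ref, best_score = None, 0.0
    for ref_seq in reference:
        if len(ref_seq) != len(read):
            continue  # this toy comparison only handles equal-length sequences
        score = identity(read, ref_seq)
        if score > best_score:
            best_ref, best_score = ref_seq, score
    return best_ref if best_score >= threshold else None


def build_otu_table(samples, reference, threshold=0.97):
    """Count reads per reference OTU and sample.

    samples maps a sample identifier to a list of reads. Reads without a
    sufficiently similar reference are discarded, as in closed reference picking.
    """
    table = {}
    for sample_id, reads in samples.items():
        for read in reads:
            ref = closest_reference(read, reference, threshold)
            if ref is None:
                continue
            row = table.setdefault(
                ref, {"taxonomy": reference[ref], "counts": defaultdict(int)}
            )
            row["counts"][sample_id] += 1
    return table


samples = {
    "control": ["ACGTACGTAC", "ACGTACGTAC", "TTGGCCAATT"],
    "patient": ["ACGTACGTAA", "GGGGGGGGGG"],
}
otu_table = build_otu_table(samples, REFERENCE, threshold=0.9)
for otu, row in otu_table.items():
    print(otu, row["taxonomy"], dict(row["counts"]))
```

The read `GGGGGGGGGG` matches neither reference and is dropped, which illustrates the trade-off described above: closed reference picking is fast, but reads from unknown organisms are disregarded.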

Figure 2. 16S rRNA and shotgun metagenomics pipelines for extracting information on the host's gut microbiome.
Because different tools are required for shotgun and 16S rRNA approaches, researchers can turn to specialized platforms for bioinformatics resources like OMICtools [154] to create their own workflows by combining applications from any of the aforementioned categories, or use standardized ones like QIIME, mothur and many others [103, 106, 109, 155–164], which perform multiple tasks of data preparation and downstream analysis. Standardized workflows are the easiest way for scientists to acquire and analyze their microbiome data, with the added benefit of creating standardized, reproducible results. At this point, we should highlight the fact that metagenome bioinformatics is computationally cumbersome and requires copious amounts of processing power, memory and storage, but is rapidly advancing because of its rising popularity, the employment of bioinformatics scientists and its open-source nature.
It is widespread practice today for researchers to store their sequence and OTU data on online platforms after their publication to help promote knowledge of the microbiome. These platforms are in fact supported and sometimes financed by organizations and global microbiome initiatives like the Human Microbiome Project [165], whose goal is to standardize the process and disseminate the necessity of similar studies. This way we are rapidly acquiring not only the tools but also the actual data to perform evaluations between different approaches and meta-analyses to infer answers for hypotheses the original authors might not have considered. This is highly dependent on the correct metadata annotation of the stored data, which makes it crucial for reuse and repurposing. There is a variety of online solutions for metagenomics data publishing, a nonexhaustive list of which is included in Table 2. Users of these databases should take note that comparing studies or samples created via different methodologies can be problematic in principle, as the data might not be directly comparable and may need further analysis.
Table 2. Repositories containing public data sets of sequence/OTU data that can be used for metagenomics studies
Database | URL | Description | References |
---|---|---|---|
EBI-metagenomics | https://www.ebi.ac.uk/metagenomics/ | Part of the European Nucleotide Archive, it offers a pipeline for raw sequence analysis and archiving of metagenomic data. The added value is the fact that users can view the analysis results of each sample | [166] |
Human Microbiome Project Data Portal | https://portal.hmpdacc.org/ | Perhaps the most daunting of the databases, hmpdacc provides a way for users to browse and download data from the Human Microbiome Project. The interface is hard to navigate to find what you are looking for regarding specific conditions. The iHMP spin-off website which focuses on three specific health conditions (pregnancy, IBD and diabetes type 2) makes things a little easier just for those conditions | [167] |
Human Pan-Microbe Community database | http://www.hpmcd.org/index.php | Taking an approach similar to IMG/M, HPMCD is offering comparison metagenomics based on microbial populations. The samples are based on EBI metagenomics samples | [144] |
IMG/M | https://img.jgi.doe.gov/cgi-bin/m/main.cgi | The Integrated Microbial Genomes and Microbial Samples database takes a unique approach of providing microbial genomes from different studies and the ability to compare them. Perhaps not the most intuitive of the databases for reanalyses of specific conditions but rather the role of specific organisms | [168] |
iMicrobe | https://www.imicrobe.us/ | iMicrobe provides an intuitive, user-friendly search for their data sets based on metadata. One drawback, shared with MG-RAST, is that whole studies cannot be downloaded at once, only their individual samples. | [169] |
MG-RAST | http://metagenomics.anl.gov/ | A constantly updated database and pipeline for NGS metagenomics. Data can be accessed via http, ftp and directly via their API. Perhaps a small drawback is the inability to download a whole study from their website, something that is possible via ftp | [170] |
QIITA | https://qiita.ucsd.edu/ | Web-based metagenomic database and pipeline of tools for 16S rRNA and shotgun data sets, originally created for the American Gut Project. QIITA offers data sets in various states of assembly from raw sequences to OTU tables. End user-friendly with resources, which can easily be added in a different pipeline for reanalysis | [171] |
Repositive | https://repositive.io/ | Repositive is an all-purpose repository of genomic data created as a central hub for genomic data, but it contains metagenomic studies as well. Requires a free account to get started on the data | [172] |
Repositive | https://repositive.io/ | Repositive is an all-purpose repository of genomic data created as a central hub for genomic data, but it contains metagenomic studies as well. Requires a free account to get started on the data | [172] |
Information overload and microbiome analytics
As with all -omics approaches, metagenomics is plagued by vast amounts of data which, although characterized using the techniques above, still need to be analyzed, comprehended and rationalized. Beyond being machine-readable, these data must be presented in ways that humans can easily understand, so that researchers can form conjectures about their involvement in human health. Certain metrics and visualization techniques were introduced toward that goal as bioinformatics advanced. Most of the standardized workflows mentioned previously, like QIIME, analyze microbiome data and export the results as diagrams and figures. The analyses and feedback that bioinformatics applications can provide fall into the following categories:
Microbial community composition, hierarchy and quantitative representation (taxa abundance)
These tools focus on representing which taxa are abundant and at what percentage, either in individual samples or in sample groupings based on their metadata. Abundance is derived from counting the reads assigned to each OTU in a sample; dividing that count by the sample's total read count gives the relative abundance, which makes samples comparable. Following the biological taxonomy of phylum → class → order → family → genus → OTU (species), we can visualize the microbial composition at distinct levels, and even as hierarchies, using phylogenetic trees, concentric diagrams and bar plots.
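The counting and collapsing steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any specific tool: the OTU identifiers, read counts and taxonomy assignments are hypothetical, and the function name `relative_abundance` is introduced here only for the example. It assumes every sample has at least one read.

```python
from collections import defaultdict

# Hypothetical OTU table: OTU id -> read counts per sample (illustrative numbers only).
otu_counts = {
    "OTU_1": {"sample_A": 60, "sample_B": 10},
    "OTU_2": {"sample_A": 30, "sample_B": 50},
    "OTU_3": {"sample_A": 10, "sample_B": 40},
}

# Hypothetical taxonomy: OTU id -> (phylum, genus).
taxonomy = {
    "OTU_1": ("Firmicutes", "Lactobacillus"),
    "OTU_2": ("Firmicutes", "Clostridium"),
    "OTU_3": ("Bacteroidetes", "Bacteroides"),
}

# Position of each supported rank inside the taxonomy tuples above.
RANKS = {"phylum": 0, "genus": 1}


def relative_abundance(counts, taxa, rank):
    """Collapse OTU read counts to one taxonomic rank and return per-sample percentages."""
    idx = RANKS[rank]
    collapsed = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for otu, per_sample in counts.items():
        taxon = taxa[otu][idx]
        for sample, n in per_sample.items():
            collapsed[sample][taxon] += n  # sum reads of all OTUs in the same taxon
            totals[sample] += n            # total reads per sample (the denominator)
    return {
        sample: {taxon: 100.0 * n / totals[sample] for taxon, n in per_taxon.items()}
        for sample, per_taxon in collapsed.items()
    }


phylum_profile = relative_abundance(otu_counts, taxonomy, "phylum")
genus_profile = relative_abundance(otu_counts, taxonomy, "genus")
```

The resulting per-sample dictionaries are the kind of input a bar plot or concentric diagram would be drawn from, one ring or bar segment per taxon at the chosen rank.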
Diversity analysis
There are two basic metrics in the diversity analysis of microbial samples: α-diversity, which represents the biodiversity within a sample (how rich a sample is in different microbial taxa), and β-diversity, which characterizes how different the microbiome composition is between samples across the metadata groupings that describe the environment (e.g. healthy controls versus patients). α-Diversity is usually calculated via rarefaction [173] and indices like Chao1 and Shannon, and is represented via rarefaction curves or box plots, while β-diversity is predominantly calculated using UniFrac distance metrics [174] and illustrated with principal coordinates analysis plots. For the latter, a jackknifing algorithm [175] can also be used.
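As a rough illustration of these metrics, the sketch below computes the Shannon index and the bias-corrected form of the Chao1 richness estimator for α-diversity. For β-diversity it uses Bray–Curtis dissimilarity as a stand-in, because UniFrac additionally needs a phylogenetic tree that a short example cannot supply. The two count vectors are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Hypothetical read counts for the same five taxa in two samples.
control = [20, 20, 20, 20, 20]  # evenly distributed community
patient = [94, 1, 1, 2, 2]      # community dominated by a single taxon

h_control = shannon(control)
h_patient = shannon(patient)
richness_patient = chao1(patient)
distance = bray_curtis(control, patient)
```

In this toy case the evenly distributed sample has the higher Shannon index, and the Bray–Curtis value is the kind of pairwise distance that a principal coordinates analysis would take as input.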
Multivariate statistical analysis of microbiome composition in correlation to sample metadata
This category focuses on inferring biological associations between microbial species and specific sample groupings. Researchers testing a specific hypothesis need to know the differential abundance between sample groupings to see which taxa contribute to dysbiosis to a statistically significant degree. Negative binomial models (DESeq2), random forests, the Kruskal–Wallis test, the Wilcoxon rank test, analysis of variance, the t-test and other parametric and non-parametric statistical methods are used to that effect. As metagenomics analysis involves multiple testing, correcting the P-values is important, via procedures like Bonferroni (which controls the family-wise error rate), Benjamini–Hochberg (which controls the false discovery rate) or the more recent StructFDR [176], which is specialized for metagenomic data. Guides like GUSTA ME [177] and Statistics How To (http://www.statisticshowto.com/) help researchers understand these statistical strategies faster and decide which one fits their needs. In addition, methods like MixMC [178], Pearson’s correlation heatmaps, canonical correspondence analysis, redundancy analysis, etc. [179] measure how quantitatively different the microbial composition is between groupings and what changes researchers can expect to find while studying them.
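To make the multiple-testing step concrete, the following sketch adjusts a list of P-values with the Bonferroni and Benjamini–Hochberg procedures. The P-values are hypothetical, one per taxon tested for differential abundance; StructFDR is not reproduced here because it also uses the phylogenetic tree.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values, one per tested taxon.
raw = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

# Taxa still significant at the 0.05 level after each correction.
significant_bonf = [i for i, p in enumerate(bonf) if p < 0.05]
significant_bh = [i for i, p in enumerate(bh) if p < 0.05]
```

With only four tests both corrections keep the same taxon, but the Benjamini–Hochberg values are never larger than the Bonferroni ones, and the gap widens as the number of tested taxa grows.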
Network analysis
Network metrics are used to detect microbial species that co-occur, are mutually exclusive or point to specific associations with the sample metadata. This helps researchers model microbial community interactions and infer relationships. Networks are visualized in their traditional node–edge form, where nodes usually represent individual taxa and edges represent their relationships. Pearson’s correlation, Spearman’s rho or the recent mLDM [180] are some of the algorithms used to calculate these relationships. Specialized network construction and analysis tools for microbe–microbe and microbe–host interactions, like MMinte [181], have been created to provide a semantic point of view on the microbiome. Additionally, external all-purpose network analysis and visualization applications like Cytoscape [182], Gephi [183] and the Network Workbench Tool [184] can be used, as many of the microbiome applications can export their constructed networks in appropriate formats.
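A minimal sketch of such a network, assuming Spearman’s rho as the association measure and a hypothetical cutoff of |rho| ≥ 0.8, is given below. The taxon names and abundances are invented for illustration, and the code assumes every taxon varies across samples (a constant profile would give a zero denominator).

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def build_network(abundance, threshold=0.8):
    """Return edges (taxon, taxon, rho, kind) for pairs with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            kind = "co-occurrence" if rho > 0 else "mutual exclusion"
            edges.append((a, b, round(rho, 3), kind))
    return edges


# Hypothetical abundances of four taxa across five samples.
abundance = {
    "taxon_A": [1, 2, 3, 4, 5],
    "taxon_B": [2, 4, 5, 8, 9],
    "taxon_C": [9, 7, 6, 3, 1],
    "taxon_D": [3, 1, 4, 1, 5],
}
edges = build_network(abundance)
```

Each tuple in `edges` maps directly onto one row of a node–edge table, which is a format that general-purpose network viewers can usually import. Real analyses would also test the correlations for significance and correct for multiple testing before drawing an edge.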
Biomarker discovery
Biomarker discovery in metagenomics aims to identify which specific microbial taxa, and which combinations of them, act as explanatory variables for a condition. Once again, parametric or non-parametric tests are applied to OTU tables, and their results are represented in various forms, such as odds ratio diagrams. These tests usually apply when one wants to compare two different states against each other. In recent years, implementations such as LEfSe [185] have been introduced, which can analyze multiple factors simultaneously to discover biomarkers of dysbiosis.
Functional analysis of the microbiome—metabolomics
Even though quantification of the microflora’s composition is important for understanding the parties involved in dysbiosis and their association with pathophysiology, their actual functionality is the key to examining whether they are the cause or mere casualties of disorders. As showcased earlier in the discussion of the gut–brain axis, microbial metabolic processes, the preeminent way for microbes to interact with the host, play a vital role in health. Metabolomics is the large-scale study that identifies and quantifies metabolites, which can provide insights into the host environment during homeostasis or disease. Studies can focus either on cellular processes that affect the microbiome by creating a nurturing or hostile environment for the microflora, or on the extragenomic perturbations caused by microbial metabolites on the host. Modern studies usually focus on the latter, trying to prove or disprove a correlation between certain microbial populations and host disorders. Metabolite identification can occur either by analyzing the results of traditional methods like chromatography, mass spectrometry and nuclear magnetic resonance [186–189] or by using metagenomics tools that infer the metabolic products of microbial populations via their genes. As with the OTU classification process, functional metagenomics requires its own approaches to the analysis and visualization of results.
Because metagenomics downstream analysis tools often cover several of the above categories, Table 3 summarizes some stand-alone implementations and R packages along with their functionalities. Most of the applications require appropriate input in the form of sequences or OTU tables, which they analyze and whose results they visualize. Even though Tool A might offer a wider variety of operations than Tool B and be preferred for that reason, the truth is that most of them are interchangeable, and their usage relies on scientific community adoption and subjective ease of use. Some might argue that the speed and computational requirements of some of the implementations are not subjective and that there are clear winners, but it all depends on the computational power of the end user’s equipment. Bioinformaticians may even choose to adapt some of them to their own needs, as they are open source, and create their own mix-and-match pipelines. What is important, though, is that the interpretation of the statistical analyses remains in the hands of the researchers, and the analyses should be used appropriately for each hypothesis. Statistics that are not viewed critically can lead to skewed conclusions, especially in metagenomics, where so many variables are relevant and should be considered. Some researchers might even choose to run their data through multiple applications with the same functionality to verify their findings and use each tool’s resolution and specificity to their benefit. Figure 3 also summarizes frequently asked questions that may arise during metagenomics research and which of these categories of tools can answer them.
Tool . | Microbial community composition, hierarchy and quantitative representation . | Diversity analysis . | Multivariate statistical analysis of microbiome composition in correlation to sample metadata . | Network analysis . | Biomarker discovery . | Functional analysis/ metabolomics . | Reference . |
---|---|---|---|---|---|---|---|
Stand-alone implementations | |||||||
BugBase | ✓ | ✓ | ✓ | ✓ | [190] | ||
Calypso | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | [191] |
COGNIZER | ✓ | [192] | |||||
EMPeror | ✓ | ✓ | [193] | ||||
Explicet | ✓ | ✓ | ✓ | [194] | |||
FishTaco | ✓ | [195] | |||||
FMAP | ✓ | [196] | |||||
FragGeneScan | ✓ | [197] | |||||
FuncTree | ✓ | [198] | |||||
Galaxy/Hutlab | N/A | ||||||
Genboree Microbiome Toolset | ✓ | ✓ | ✓ | ✓ | [199] | ||
Glimmer-MG | ✓ | [200] | |||||
GraPhlAn | ✓ | [201] | |||||
HUMAnN2 | ✓ | [202] | |||||
IMP | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | [203] |
Krona | ✓ | [204] | |||||
LEfSe | ✓ | ✓ | [185] | ||||
MEGAN6 | ✓ | ✓ | ✓ | ✓ | [150] | ||
MetaCoMET | ✓ | ✓ | ✓ | [205] | |||
METAGENassist | ✓ | ✓ | ✓ | [206] | |||
MetaShot | ✓ | [161] | |||||
Metaviz | ✓ | ✓ | ✓ | [207] | |||
MG-RAST | ✓ | ✓ | ✓ | [170] | |||
Microbiome Analyst | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | [208] |
Mminte | ✓ | ✓ | [181] | ||||
MOCAT 2 | ✓ | ✓ | [209] | ||||
mothur | ✓ | ✓ | ✓ | [106] | |||
Parallel-META 3 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | [210] |
Phoenix 2 | ✓ | ✓ | [211] | ||||
PICRUSt | ✓ | [212] | |||||
Prodigal | ✓ | [213] | |||||
QIIME | ✓ | ✓ | ✓ | ✓ | [111] | ||
Rhea | ✓ | ✓ | ✓ | [214] | |||
SAMSA | ✓ | [215] | |||||
ShortBRED | ✓ | [216] | |||||
STAMP | ✓ | ✓ | ✓ | ✓ | [217] | ||
Tax4Fun | ✓ | [218] | |||||
Taxonomer | ✓ | [145] | |||||
VAMPS | ✓ | ✓ | [219] | ||||
Vikodak | ✓ | [220] | |||||
R packages | |||||||
ade4 | ✓ | ✓ | [221] | ||||
enveomics | ✓ | ✓ | ✓ | [222] | |||
metaDprof | ✓ | ✓ | [223] | ||||
metagenomeSeq | ✓ | ✓ | [224] | ||||
MMiRKAT | ✓ | [225] | |||||
mmnet | ✓ | ✓ | ✓ | [226] | |||
phyloseq | ✓ | ✓ | ✓ | ✓ | [227] | ||
RAIDA | ✓ | [228] | |||||
RevEcoR | ✓ | ✓ | [229] | ||||
ShotgunFunctionalizeR | ✓ | [230] | |||||
vegan | ✓ | ✓ | ✓ | [231] |
Note: These tools use microbial sequences and/or OTU tables to extract information on the microflora’s composition and functionality.

Figure 3. Common questions in metagenomics research and the specific categories of downstream analysis that can provide answers.
Finally, it is worth mentioning that many commercial solutions, which in some cases come bundled with sequencing equipment, provide functionality similar to that of the tools mentioned above, with the added benefit of training and troubleshooting support but the disadvantage of their cost. These solutions include products like ERA-7 (https://era7bioinformatics.com/), CLC Genomics Workbench (https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/), Strand NGS (http://www.strand-ngs.com/) and NovoWorx (http://www.novocraft.com/products/novoworx/).
Computational systems have catered to the needs of the life sciences for many years now, progressing and evolving in parallel with them. Algorithms have been developed, applications coded and hardware constructed specifically for bioinformatics and medical informatics, as demonstrated here. The goal of these efforts is to enhance research and to accommodate new and complex hypotheses that can be examined with speed and precision. Future efforts will bring scientists closer to a complete modeling and emulation of the brain and the gut, allowing us to see, in silico, the machinations and evolution of the gut–brain axis, even in real time. Recent efforts toward that goal have shown great potential, like the works of Cockrell et al. [232], Leber et al. [233], Abedi et al. [234] and others. It is our belief that these computational analyses will drive not only the identification but also the treatment of various conditions.
Treating the disease, the patient or the patient–microflora complex. Will precision medicine be treating all of them?
In 2015, the Precision Medicine Initiative (recently renamed ‘All of Us’ [235, 236]) was announced by the US government to facilitate a better focus on personalized health and the type of treatment that accounts for variability and identifies the unique features of each individual. Given everything this review has shown about the microbiome, and how close we are today to characterizing it uniquely for everyone because of our achievements in bioinformatics, we believe that the parallel with this initiative is clear. If we are to talk about a person’s diagnosis, prognosis and therapy, it seems almost imperative to consider the whole microflora–host system. It is the entire system that suffers and, perhaps, therein lies the correct course of treatment or the necessary diagnostic and prognostic biomarkers. After all, the microbiome has been implicated in regulating pharmacokinetics and pharmacodynamics and in driving pharmacogenetics [237–239], providing added value to our investigations of drug metabolism and response.
Exercise, diet and a non-sedentary lifestyle have long been known to promote health for assorted reasons, especially concerning the cardiovascular system [240, 241]. Today, we know that these factors perturb the gut’s microflora [242–244], influencing homeostasis and, by extension, systemic health. Our diet and our medication regimen regulate our microflora’s composition on a larger scale, by adding new microorganisms or creating a hostile environment for others, affecting, among other systems, our gut–brain axis [245, 246]. By using the knowledge acquired via the downstream analysis of the microbiome, we can discuss targeted practices of diet and antibiotic usage, customized for everyone according to their microbial profile. It is an innovative approach to the well-known expression ‘We are what we eat’. There is also a special category of intervention, which includes probiotics, prebiotics and synbiotics, that can influence the microflora, can be used as treatment for various conditions and has been the focus of many studies [247–252].
These terms, although popular in the literature and gaining popularity in everyday life, are not well understood by the public. Probiotics are live organisms (bacteria, yeasts, etc.) that can supplement a person’s microflora when they are introduced into the diet. Prebiotics are ingredients that help specific microorganisms, already present in the organism, flourish and fight off pathogens and/or reach the numbers needed to counter dysbiosis. Finally, synbiotics are a mixture of the previous two groups. Owing to their mechanism of action, these dietary supplements can be used to target specific populations, which the current insights into dysbiosis have already identified by methods such as the ones described previously in this review. For example, Mehta et al. [253] have proposed the use of lactic acid bacteria probiotics to reduce the oxidative stress implicated in AD, by suppressing D-galactose, which is implicated in increased reactive oxygen species production and nerve growth factor suppression. Finally, in recent years, a new term has emerged, psychobiotics [254], which refers to living organisms (gut bacteria) introduced into the host’s system to treat mental disorders. Their mode of action specifically targets the gut–brain axis via the neurotropic metabolic products of these microorganisms [255].
Although probiotics and prebiotics may be valuable additions to a personalized treatment regimen, they rely on daily consumption to be useful and contribute to homeostasis. In the past few years, the more targeted and permanent solution of fecal microbiota transplantation [256–258] has been successfully deployed to help repopulate the host’s microflora with ‘healthy’ symbionts. Based on what we know, for a transplantation to be successful, a plethora of confounding factors must be considered. What can be deemed a ‘healthy’ donor and a ‘normal’ microflora? Is a transplant from someone living in the United States appropriate for someone in Asia? Considering location and different lifestyles, we must rely on our knowledge of the functional role of the microbiome, as discussed previously. Also, is the host’s lack of clinical symptomatology enough to consider a transplant ‘healthy’, or do we have to test for ‘dormant’ GI pathogens [259]? Is fecal material a reliable source of microflora, as it can change constantly because of external factors [260]? Despite the many difficulties, recent studies have shown promise in treating a variety of pathological conditions, including neuropsychological ones. For example, microflora transplantation was successful in alleviating autism symptomatology in a recent study by Kang et al. [261], where ASD-related behavior improved by 22% following transplantation and by up to 24% in an 8-week period after that (according to the Childhood Autism Rating Scale).
Treating the microflora is not the only thing one must consider when trying to combat dysbiosis. One of the major reasons for microbial population loss is the broad usage of antibiotics [262]. Although critical for our health, the extended usage of these drugs has caused issues that go beyond the creation of antibiotic-resistant bacteria [263]. Especially during early life, antibiotics can help combat pathogens introduced into the host but are also responsible for dysbiosis [264]. Once more, the need for targeted precision antibiotics comes to the foreground, requiring an extensive understanding of their implications for the microflora’s composition and of how populations vital to homeostasis can be spared. Complementing antibacterial treatments with probiotics that are not susceptible to the antibiotics themselves [265] can prove useful for approaches customized to the needs of patients [266, 267].
In the past 5 years, the microbiome has seen a significant boost in scientific interest and publications. A search for related terms (microbiome, microbiota, microflora) in PubMed alone yields over 35 000 results for just this period, with exponential growth each passing year. Some researchers [268] have even characterized 2016 as the ‘banner year’ for microbiome research, something that can be directly attributed to the bioinformatics approaches at our disposal and the constant flow of information linking the microbiome to systemic health. This shift toward a better understanding of all the mechanisms that describe, and are perturbed by, our microbiome is driven by our need to better understand the host–microflora relationship. The acquisition of this knowledge can lead not only to a more precise definition of the pathophysiological attributes of disorders but also to the customization of treatment for individuals or specific patient groups. Several aspects of today’s medicine are being driven by genomics, proteomics, epigenomics, metabolomics, microbiomics and their integration via systems biology, allowing researchers to accurately predict the onset, progression and pharmacological response of a pathological condition [269, 270]. Scientists are now able not only to precisely identify and evaluate the microbiome but also to track its changes, and the ones it provokes, through time, dynamically following differences in bacterial population abundance and metabolite production [271–273].
The complexity of the gut–brain–microbiome axis makes it an interesting target for our research efforts and a perfect candidate for support by integrated multidisciplinary approaches [274]. Starting as early as the embryonic stage, via the maternal microbiome, and developing rapidly in the first 3–5 years of life [275, 276], our microbial partners help shape the development of our CNS and behavior. Throughout our lifespan, the gut microbiome contributes to neurological and mental health. The cross talk between the microflora’s ecology and the host’s physiology is based on interactions at the genetic, protein and metabolic levels on both sides.
The studies previously mentioned in this review highlight the gut microbiome as a modulator of brain development and neurotransmitter signaling systems but also as a mediator of neurological, mental and behavioral function in adults. We are confronted with vast networks of signals and interactions, in which we are called to identify the essential components of homeostasis and understand what perturbations are introduced by dysbiosis. In such a dynamic ecosystem, it is important that research focuses on the factors that drive permanent or reversible changes that are essential in a variety of functions, and on their involvement in molecular mechanisms. These fundamental biological mechanisms can be explored via novel high-throughput computational methodologies that combine and analyze the evolution of the microbial communities and their genetic composition, the interaction of microbial and host biological systems and the effects of external environmental factors on the microbial–host ecosystem. More specifically, cross-analysis of computational metagenomics with host genetic susceptibility/genomic background will provide new insights into the onset and progression of CNS disease. In addition, the characterization and quantification of the genomic composition of the microbiome under different environmental factors can provide information on the microbiome’s role as a cause or effect of disease, something that is currently under investigation. By translating the biological networks into computational ones, which include host-omics, meta-omics and related phenotypes in tandem, we can construct prediction models that can reveal valuable information on metabolic and other molecular components as well as signaling pathways involved in brain health and disease. The development of new combinational databases [277], which disseminate the knowledge derived from our research, will help to make it accessible and usable by other investigators.
These novel bioinformatics avenues lead to a better understanding of neurological and mental disease by pinpointing the modifiable factors that influence the microbiome and act as regulators of health. The outcome of this knowledge can be new therapeutic strategies that complement a possible prognostic and diagnostic role of the gut microbiome, in medicine at a personalized as well as a general population level.
Gut–brain axis is a complex communication system mediating human health.
Microflora–gut–brain axis is based on a bidirectional relationship.
Shotgun and 16S rRNA sequencing precision is essential for our data.
Computational downstream analysis of the microbiome provides answers regarding its composition and function.
Microbiome research could offer a novel approach to precision medicine.
Funding
G.M.S. holds the Bioinformatics ERA Chair position funded by the European Commission Research Executive Agency (REA) Grant BIORISE (grant number 669026), under the Spreading Excellence, Widening Participation, Science with and for Society Framework.
Nikolas Dovrolis is a PhD candidate of Pharmacology at the Democritus University of Thrace. He is a Computer Science graduate with a Master’s Degree in Molecular Biology and Genetics.
George Kolios, MD, PhD, is a Professor of Pharmacology at Democritus University of Thrace, Greece. He is a clinical Gastroenterologist, with extensive research in mucosal immunology, focused on intestinal inflammation and microbiota.
George M. Spyrou, PhD, holds the Bioinformatics ERA Chair and is the Head of the Bioinformatics Group at the Cyprus Institute of Neurology and Genetics.
Ioanna Maroulakou, PhD, is a Professor of Genetics at Democritus University of Thrace and has extensive experience and expertise in Translational Research and Acquired genetic disorders including neurodegenerative diseases.