Connections between the human gut microbiome and gestational diabetes mellitus

Abstract The human gut microbiome can modulate metabolic health and affect insulin resistance, and it may play an important role in the etiology of gestational diabetes mellitus (GDM). Here, we compared the gut microbial composition of 43 GDM patients and 81 healthy pregnant women via whole-metagenome shotgun sequencing of their fecal samples, collected at 21–29 weeks, to explore associations between GDM and the composition of microbial taxonomic units and functional genes. A metagenome-wide association study identified 154 837 genes, which clustered into 129 metagenome linkage groups (MLGs) for species description, with significant relative abundance differences between the 2 cohorts. Parabacteroides distasonis, Klebsiella variicola, etc., were enriched in GDM patients, whereas Methanobrevibacter smithii, Alistipes spp., Bifidobacterium spp., and Eubacterium spp. were enriched in controls. The ratios of the gross abundances of GDM-enriched MLGs to control-enriched MLGs were positively correlated with blood glucose levels. A random forest model shows that fecal MLGs have excellent discriminatory power to predict GDM status. Our study discovered novel relationships between the gut microbiome and GDM status and suggests that changes in microbial composition may potentially be used to identify individuals at risk for GDM.

We plan to submit our datasets to the GigaDB database, but the submission is not yet complete.

Abstract Background
The human gut microbiome can modulate metabolic health and affect insulin resistance, and may play an important role in the etiology of gestational diabetes mellitus (GDM). Here, we compared the gut microbial composition of 43 GDM patients and 81 healthy pregnant women via whole-metagenome shotgun sequencing of their fecal samples, collected at 21-29 weeks, to explore associations between GDM and the composition of microbial taxonomic units and functional genes.

Results
A metagenome-wide association study (MGWAS) identified 154,837 genes, which clustered into 129 metagenome linkage groups (MLGs) for species description, with significant abundance differences between the two cohorts. Parabacteroides distasonis, Klebsiella variicola, etc., were enriched in GDM patients, whereas Methanobrevibacter smithii, Alistipes spp., Bifidobacterium spp. and Eubacterium spp. were enriched in controls. The GDM-associated species showed correlations with maternal blood glucose, indicating a potential relationship between gut microbes and blood glucose tolerance. We further evaluated the performance of the gut microbiota as a biomarker for identifying GDM status with a Random Forest model and demonstrated that fecal MLGs may offer new indicators for prognosis of GDM.

Conclusions
Our study discovered novel relationships between the gut microbiome and GDM status, and suggests that changes in microbial composition may potentially be used to identify individuals at risk for GDM.

Background
The increasing prevalence of gestational diabetes mellitus (GDM), and its subsequent health outcomes, are a significant public health concern and a major challenge for obstetric practice [1].
GDM represents a heterogeneous group of metabolic disorders [2] which affects 3-14% of pregnancies, and 20-50% of these affected women are expected to develop type 2 diabetes (T2D) within 5 years [3,4]. Emerging evidence has revealed a link between the gut microbiome and human metabolic health [5,6], leading us to hypothesize that the gut microbiome may impact gestational metabolism and development of GDM.
Microbial dysbiosis in the human gut may be an important environmental risk factor for abnormal host metabolism, as recently exemplified in studies of obesity and T2D (reviewed by Karlsson et al. [7]). A study using an experimental animal model revealed that reduced numbers of Bifidobacteria led to enhanced endogenous lipopolysaccharide production, endotoxemia, and associated obesity and insulin resistance [8]. In humans, excessive weight gain and obesity in pregnancy resulted in deteriorated glucose tolerance and increased risk of GDM [9,10]. Prevotella copri and Bacteroides vulgatus have been identified as the main species driving the association between biosynthesis of branched-chain amino acids, insulin resistance, and glucose intolerance [11], and Bacteroides spp. and Staphylococcus aureus are significantly more abundant in overweight women than in normal-weight women [12].
While the majority of previous studies have focused on associations between intestinal microbiota and obese states or T2D [6,13-15], some recent studies have sought to characterize microbiota changes during pregnancy, with the goal of providing novel insights into the relationship between these changes and potential metabolic consequences [16]. Studies based on sequencing of 16S ribosomal RNA have revealed novel relationships between gut microbiome composition and the metabolic hormonal environment in overweight and obese pregnant women in early gestation [17]. Koren et al. found that maternal gut microbiota changed from the first to the third trimester, with a decline in butyrate-producing bacteria and increased Bifidobacteria, Proteobacteria, and lactic acid-producing bacteria [16]. Further, transplants of fecal material obtained during different trimesters were sufficient to confer different phenotypes in mouse models, with third-trimester fecal transplants leading to increased adiposity and inflammation [16]. These studies suggest that pregnancy is associated with major shifts in the gut microbiome, which may play an important role in observed increases in gestational inflammation, thereby potentially contributing to development of GDM. However, studies focusing on changes in the gut microbiome during pregnancy and development of GDM have not been reported so far.
Metagenomic shotgun sequencing, in which the full complement of genes present in the microbiome is sequenced, can furnish information about the relative abundance of genes in functional pathways and at all taxonomic levels [18]. In this study, we used whole-metagenome shotgun sequencing analyses of the gut microbiome during pregnancy to explore associations between GDM and the composition and abundance of microbial taxonomic units and functional genes. The objective was to obtain a comprehensive understanding of the gut microbiome's role in the etiopathogenesis of GDM.

Data description
We obtained fecal samples from 124 pregnant women, including 43 GDM patients and 81 healthy control individuals, during their second trimester at Guangzhou Women and Children's Medical Center (GWCMC). Whole-metagenome shotgun sequencing of the samples was performed on the Illumina HiSeq2000 platform at BGI-Shenzhen, China. We constructed a paired-end library with an insert size of 350 base pairs (bp) for every sample and sequenced 100 bp from each end. Sequencing reads for each fecal sample were independently processed for quality control and removal of host sequences using an in-house pipeline (see Methods), yielding a total of 795 Gbp of high-quality metagenomic data (average per sample, 6.4 Gbp) for further analysis. We performed de novo assembly and gene calling on the data of each sample, and constructed a non-redundant gene catalogue of all pregnant women samples containing 4,344,984 genes. This gene catalogue provided a suitable reference for metagenomic gene quantification, microbial diversity analysis, and the metagenome-wide association study for the pregnant women samples.

Comparison of the gut microbiota between GDM patients and healthy pregnant women
First, we explored potential differences in the gut microbiome between 43 GDM patients and 81 healthy pregnant women. In order to perform this analysis, we obtained 795.3 Gb of high-quality data (6.4 ± 1.3 Gb per sample, Table S1) via metagenomic shotgun sequencing of their fecal samples. We aligned the sequencing reads (43.8%) against available microbial genomes from the National Center for Biotechnology Information and generated the taxonomic composition for all samples at the levels of phylum, class, order, family, genus and species. Multivariate analysis based on Bray-Curtis distances between microbial genera revealed significant differences between GDM patients and healthy controls (Figure 1a). We then performed the Mann-Whitney U test to identify phylogenetic differences between GDM patients and healthy controls. No significant differences were found at the phylum and class levels; however, the order Clostridiales was enriched in healthy controls. At the genus level, GDM patients had a significantly higher abundance of Parabacteroides, Megamonas and Phascolarctobacterium, while healthy controls were significantly enriched for Ruminiclostridium, Roseburia, Eggerthella, Fusobacterium, Haemophilus, Mitsuokella, and Aggregatibacter (Figure 1b). We also found a number of bacterial species that differed significantly between GDM patients and healthy controls, consistent with the genus-level observations (Table S2). These findings suggest dysbiosis of the gut microbiota based on GDM status.

Identification of GDM-associated markers from gut microbiome
To explore detailed signatures of the gut microbiome in GDM patients and healthy controls, we constructed a non-redundant gene catalogue consisting of 4.34 million genes, which allowed an average read mapping rate of 79.5% for the sequenced samples. We identified 154,837 genes that displayed significant abundance differences between the two groups (Mann-Whitney U test, q<0.05; Figure S1 shows the P-value distribution between GDM patients and healthy pregnant women for all genes tested). Approximately 68% of these genes were clustered into 129 MLGs (Table S3) [...] between each other), representing a cooperative promoting function of Enterobacteriaceae to GDM development. Of particular interest, we also observed that the relative abundance of Enterobacteriaceae was positively associated with pre-gestational body mass index (PBMI, Figure S2).

Correlations between maternal blood glucose and gut microbiota
In order to explore the potential clinical paths by which changes in the microbiome might lead to GDM, we investigated whether the MLGs were associated with blood glucose tolerance. The gross abundances of GDM-enriched and control-enriched MLGs were clearly associated with the blood glucose level during the second trimester of pregnancy (as measured by OGTT, Figure 3).

Functional characterization of gut microbiota in GDM
Next, we utilized KEGG pathway comparisons to explore potential differences in the functional composition of the microbiomes of GDM patients vs. controls. Although the functional compositions of GDM patients and control subjects were highly similar (Figure 5a), the microbiome of GDM patients showed a greater abundance of membrane transport and energy metabolism pathways, while the microbiome of control subjects was enriched in amino acid metabolic pathways. We also found that KEGG modules involving the phosphotransferase system (PTS, a major component of membrane transport) and lipopolysaccharide (LPS, a major component of the outer membrane of Gram-negative bacteria that induces a profound inflammatory response) biosynthesis and export systems were associated with glucose tolerance levels (Figure 5b). These findings are consistent with those of a previous study [19] that reported an increase in microbial functions for membrane transport and LPS metabolism in the microbiome of patients with type 2 diabetes.

Gut microbiota-based classification of GDM
Finally, we utilized random forest models to assess the ability of MLGs and species abundance profiles to predict GDM status. We found that 20 MLGs provided the best discriminatory power, as indicated by an area under the ROC curve (AUC) of 0.91 (95% CI 0.87-0.96); this was higher than that achieved using species profiles with this model (the best AUC was 0.80; 95% CI 0.73-0.86, using 40 species) (Figure 6a). The increased AUC for the MLG-based model may be due to the fact that MLGs furnish taxonomic and functional information for unknown or unanalyzable species. Bacterial species providing the highest discriminatory power were primarily members of the Bacteroides or Parabacteroides genera (Figure 6b-c), consistent with our earlier observation that Parabacteroides is the predominant genus accounting for differences in the gut microbiome between GDM patients and control subjects (Figure 1b). When age and PBMI of pregnant women were included along with the 129 MLGs, PBMI was selected as a marker together with the most discriminatory MLGs, but the performance of the model did not appreciably improve (Figure S3 and Figure 6d). Therefore, fecal MLGs, by themselves, may provide a mechanism and biomarkers for early detection of GDM or, potentially, for identifying risk of developing GDM.
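As an illustration of the AUC statistic reported above, the following minimal Python sketch computes the area under the ROC curve as the probability that a randomly chosen GDM sample receives a higher classifier score than a randomly chosen control sample. The function name `roc_auc` and the toy scores are hypothetical and are not taken from the study; the study's own values came from Random Forest models in R.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of GDM for three patients and three controls.
toy_gdm_scores = [0.9, 0.8, 0.4]
toy_control_scores = [0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_gdm_scores, toy_control_scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two groups.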

Discussion
To identify and understand alterations in the gut microbiome associated with GDM, we characterized the genic, taxonomic, and functional repertoire of microbiomes of 43 GDM patients and 81 healthy pregnant women. To our knowledge, this is the first metagenomics study on stools of GDM patients, revealing significant dysbiosis, taxonomic shifts and functional changes in their microbiome as compared with healthy pregnant women.
Our study furnished a powerful set of microbial markers for GDM prediction, which achieved an AUC of 0.91 for identifying GDM status based on 20 species-level MLGs. The discriminatory power of this set of markers was higher than that of prediction models based on genomic markers identified by genome-wide association studies (GWAS) (AUC 0.5-0.7) [20]. Thus, as demonstrated in the current study, analysis of fecal microbiota could be used for the early diagnosis of GDM or, potentially, for identifying individuals at risk for developing GDM. Future systematic assessment of the key species and gene markers identified here will be required to further develop this tool.
In the current study, microbes found to be enriched in the gut of control subjects included [...] have also found these bacteria to be enriched in the gut microbiome of healthy control subjects.
This suggests that alterations in gut microbiota resulting in decreased relative abundance of these species may be a considerable risk factor for multiple metabolic syndromes, including GDM.
Thus, it is conceivable that alterations in the microbiome associated with GDM contribute to the pathogenesis of GDM.
Other bacteria identified in the current study as being over- or under-represented in GDM patients have also been previously demonstrated to play important roles in the human gut, with potential functional relevance to GDM. For example, patient-enriched Bacteroides spp. and Parabacteroides distasonis are considered to be opportunistic pathogens in infectious diseases with potential for developing antimicrobial drug resistance [26], and control-enriched Alistipes spp.
The main limitations of our study are the relatively limited sample size and the fact that we analyzed only one stool sample per subject, collected in the second trimester of pregnancy. It is well known that immune and metabolic changes occur throughout pregnancy, and that the gut microbiota shifts from the first to the third trimester [16]. Consequently, associations between the microbiome and GDM status need to be examined at other time points during pregnancy to provide further insights into when the changes we observed at 21-29 weeks develop, and whether they are sustained for the remainder of the pregnancy. In addition, the available metadata on maternal GDM status and changes in microbiota composition in pregnancy were limited. Confounding factors such as lifestyle, diet, and antibiotic treatment may further affect both blood glucose levels and gut microbiota composition. In order to more definitively establish the associations observed in the current study, a large cohort investigation, with analysis of other potentially significant variables, will be necessary. Further, the observational, cross-sectional design of the current study precluded examination of potential causality. Whether the microbiome impacts blood glucose levels, glucose levels impact the microbiome, or the relationship is bidirectional cannot be proven without further experimentation, most likely involving animal models. Furthermore, to identify fecal metagenomic markers with sufficient predictive power to identify GDM, future work will be necessary to refine the diagnostic approach developed in our study, to identify additional markers with improved predictive value and, eventually, to validate them in other, larger cohorts.
In summary, our findings extend those of earlier studies showing a correlation between gut microbiota and various metabolic derangements. Specifically, we demonstrated an important association between the gut microbiota and GDM. Our results suggest that changes in the composition of the gut microbiome may be used to identify individuals with GDM, could potentially be used to identify individuals at risk for GDM, and may contribute to the pathogenesis of GDM.

De novo assembly, gene calling and gene catalogue construction
To determine the best assembly method for the high-quality Illumina sequencing reads, we compared the performance of two assemblers, SOAPdenovo v2.04 (as previously used in the MetaHIT and IGC projects) [31, 32] and IDBA-UD v1.1.1 (a de novo assembler for metagenomic sequences) [33]. For SOAPdenovo, we tested k-mer lengths ranging from 23 bp to 123 bp in steps of 10 bp for each sample, and selected the assembled contig set with the longest N50 length. For IDBA-UD, the parameters "--mink 21 --maxk 81 --step 20 --pre_correction" were used. For most samples, IDBA-UD obtained a better assembled contig set than SOAPdenovo. This could be attributable to the relative efficiency of IDBA-UD in assembling bacterial genomes within regions of highly uneven depth in metagenomic samples. As a result, we obtained contig sets averaging 197.9 ± 50.3 Mbp (mean ± SD) for each pregnant woman's sample, with an N50 length of 8.8 ± 3.9 kbp.

Quantification of metagenomic genes
The abundance of genes in the combined non-redundant gene catalogue (combining the pregnant women gene catalogue and IGC) was quantified as relative abundance of reads. First, high-quality reads from each sample were aligned against the gene catalogue using SOAP2.21 [30], with thresholds that allowed a maximum of two mismatches in the initial 32 bp seed sequence and required 90% similarity over the whole read. Only two types of alignments were accepted: (1) the entire paired-end read could be mapped onto a gene with the correct insert size; (2) one end of the paired-end read could be mapped onto the end of a gene, provided that the other end was mapped outside the genic region. The relative abundance of a gene in a sample was estimated by dividing the number of reads that uniquely mapped to that gene by the length of the gene region and by the total number of reads from the sample that uniquely mapped to any gene in the catalogue. The resulting set of gene relative abundances of a sample was its gene profile.
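The relative abundance calculation described above can be sketched in Python as follows. This is a literal reading of the text, in which the abundance of gene i is x_i / (L_i · N), with x_i the uniquely mapped read count, L_i the gene length, and N the total number of uniquely mapped reads in the sample. The function name, the dictionary-based inputs, and the toy values are assumptions for illustration, not the authors' pipeline.

```python
def gene_relative_abundance(read_counts, gene_lengths):
    """Length- and depth-normalised abundance for each gene in one sample.

    read_counts  : dict mapping gene id -> number of uniquely mapped reads
    gene_lengths : dict mapping gene id -> gene length in bp
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no uniquely mapped reads")
    return {
        gene: count / gene_lengths[gene] / total_reads
        for gene, count in read_counts.items()
    }


# Hypothetical toy sample with three genes.
toy_counts = {"gene_a": 30, "gene_b": 60, "gene_c": 10}
toy_lengths = {"gene_a": 1000, "gene_b": 3000, "gene_c": 500}
toy_gene_profile = gene_relative_abundance(toy_counts, toy_lengths)
```

In the toy sample, gene_b has twice the reads of gene_a but is three times longer, so its abundance is lower. Under this literal reading the values do not necessarily sum to 1 within a sample.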

Richness
We used the gene count and Shannon index to represent the richness and evenness of the gut microbiota for each sample. As defined previously [5], the gene counts of a metagenomic sample were calculated based on their reads mapping number on the non-redundant gene catalogue. To eliminate the influence of sequencing depth fluctuation, an equal number of 11 million reads for all samples were randomly extracted for mapping, and then, the mean number of genes over 30 random drawings was generated. The Shannon index (within sample diversity) was calculated as previously described [19].
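A minimal Python sketch of the two measures is given below. It assumes the Shannon index uses the natural logarithm and that reads are subsampled without replacement; neither detail is stated in the text. The function names and the representation of mapped reads as a list of gene identifiers are hypothetical.

```python
import math
import random


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must have a positive sum")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def rarefied_gene_count(read_gene_ids, depth, n_draws=30, seed=0):
    """Mean number of distinct genes seen in `n_draws` random subsamples.

    read_gene_ids : list with one gene id per mapped read
    depth         : number of reads drawn per subsample (11 million in the text)
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_draws):
        subsample = rng.sample(read_gene_ids, depth)  # without replacement
        counts.append(len(set(subsample)))
    return sum(counts) / n_draws
```

Fixing the subsampling depth across samples keeps the gene count comparable between samples sequenced to different depths.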

Taxonomical and functional analyses
Taxonomical classification of genes. Reference microbial genomes were downloaded from the NCBI-genome database (version May-2015), which included 8,953 bacterial/archaeal genomes (of which 2,785 were complete and 6,168 were draft genomes) and 4,400 viral genomes.

Functional annotation of genes. The Kyoto Encyclopedia of Genes and Genomes (KEGG orthologous, version Apr-2015) and evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG, v4) databases were used for functional annotation of genes. Translated amino acid sequences of genes were searched against these databases using USEARCH v8.0.1616 [38] (evalue < 1e-5, query_cov > 0.70) with a minimum similarity of 30%. Each protein was assigned a KEGG orthologue (KO) or eggNOG orthologue group (OG) based on the best-hit gene in the database. Using this approach, 43.6% and 71.9% of the genes in the combined gene catalogue could be assigned a KO or OG, respectively. As a final step, the abundance profiles of KEGG and eggNOG were calculated by summing up the relative abundance of genes annotated to a feature.
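The final summation step can be sketched in Python as below. The function name, the dictionary inputs, and the placeholder identifiers ("KO_A", "KO_B") are hypothetical; genes without an annotation are simply skipped, which is an assumption about how unannotated genes were handled.

```python
from collections import defaultdict


def functional_profile(gene_abundance, gene_to_feature):
    """Sum gene relative abundances per annotated feature (KO or OG).

    gene_abundance  : dict mapping gene id -> relative abundance
    gene_to_feature : dict mapping gene id -> feature id (best-hit annotation)
    """
    profile = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        feature = gene_to_feature.get(gene)
        if feature is not None:  # unannotated genes contribute to no feature
            profile[feature] += abundance
    return dict(profile)


# Hypothetical toy data: g4 has no annotation.
toy_abundance = {"g1": 0.2, "g2": 0.3, "g3": 0.1, "g4": 0.4}
toy_annotation = {"g1": "KO_A", "g2": "KO_A", "g3": "KO_B"}
toy_ko_profile = functional_profile(toy_abundance, toy_annotation)
```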

Metagenome-wide association study (MGWAS)
We used the MGWAS methodology to identify gene markers that showed significant abundance differences between the GDM and control individuals. The MGWAS was performed using methodology developed by Qin et al. [19]. Briefly, gene relative abundance profiles were initially adjusted for population stratification using the modified EIGENSTRAT method [39], which allows the use of covariance matrices estimated from abundance levels instead of genotypes. Then, a two-tailed Mann-Whitney U test was performed on the adjusted gene profiles, and the Benjamini-Hochberg procedure [40] was subsequently used to correct the p-values to generate the false discovery rate (FDR, known as "q-value") for each gene.
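The multiple-testing correction step can be illustrated with a short, dependency-free Python sketch of the standard Benjamini-Hochberg adjustment. The function name and the toy p-values are hypothetical; the p-values themselves would come from the per-gene Mann-Whitney U tests, which are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum of
    # p * m / rank over its own rank and all higher ranks (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four genes.
toy_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Genes with a q-value below the chosen threshold (q<0.05 in this study) are retained as markers.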

Metagenomic linkage group (MLG) analysis
Co-abundance genes were clustered into MLGs based on the previously described methodology [19]. Taxonomic assignment and abundance profiling of the MLGs were performed according to the taxonomy and the relative abundance of their constituent genes as previously described [19].

Statistical analysis
Statistical analysis was implemented using the R platform. Distance-based redundancy analysis (dbRDA) was performed using the "vegan" package [42] based on the Bray-Curtis distances on normalized taxa abundance matrices, then visualized using the "ggplot2" package. Permutational multivariate analysis of variance (PERMANOVA) was performed using the "vegan" package, and the permuted p-value was obtained by 10,000 permutations. The Random Forest model has been shown [6] to be a suitable model for exploiting metagenomic data. Random Forest models were trained using the "randomForest" package (default parameters and 10,000 trees) to identify GDM status in a subset of GDM patients and control subjects by using the abundance profiles of species and MLGs. Performance of the predictive model was evaluated with cross-validation error. Variable importance by mean decrease in accuracy was calculated for the Random Forest models using the full set of species or MLGs. By ranking the variables by importance, smaller models were constructed that contained only the most important variables.
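For reference, the Bray-Curtis distance underlying the dbRDA and PERMANOVA analyses can be written as BC(u, v) = Σ|u_i − v_i| / Σ(u_i + v_i). The Python sketch below illustrates this formula on toy abundance vectors; it is not the "vegan" implementation used in the study, and the function name and the convention of returning 0 for two empty samples are assumptions.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        return 0.0  # assumed convention: two empty samples are identical
    return numerator / denominator


# Hypothetical genus-level relative abundances for two samples.
toy_sample_1 = [0.5, 0.5, 0.0]
toy_sample_2 = [0.5, 0.0, 0.5]
toy_distance = bray_curtis(toy_sample_1, toy_sample_2)
```

The value ranges from 0 (identical composition) to 1 (no shared taxa).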

Acknowledgements
We thank all the pregnant women who participated in the Born in Guangzhou Cohort Study and all staff in the cohort team for their contribution to this study, particularly the research nurses and midwives and other recruiting staff for their excellent work.
This study is supported by the National Natural Science Foundation of China (81673181) and Guangzhou Science and Technology Bureau, Guangzhou, China (201508030037). The sponsors had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; or in preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

Author contributions
XQ and HX designed the birth cohort on which this study was based. XQ and HX designed the study and directed its implementation. YG, YK, MY, JH, JL, NC, WX, SS, LQ, YW, CH, QC, WL and YW were involved in study design and sample collection. YG, YK and SL analyzed the data and drafted the manuscript. XQ, HD, JL and CP revised the manuscript. All authors critically revised the manuscript, and approved the final version.

[Figure legend fragments] IQR from the first and third quartiles, respectively. The circles represent outliers beyond the whiskers. Only MLGs or species with average relative abundances greater than 0.001% and correlated (P<0.05) with at least one index are shown for clarity.

Dear Dr. Laurie Goodman,
Enclosed please find our manuscript entitled "Connections between human gut microbiome and gestational diabetes mellitus" for your consideration of publication as an Article in GigaScience.
Emerging evidence has revealed a link between the gut microbiome and human metabolic health.
The gut microbiome can modulate metabolic health and affect insulin resistance, and may play an important role in the etiology of gestational diabetes mellitus (GDM). However, studies focusing on changes in the gut microbiome during pregnancy and development of GDM have not been reported so far.
In the present study, we performed whole-metagenome shotgun sequencing analyses of the gut microbiome during pregnancy to explore associations between GDM and the composition and abundance of microbial taxonomic units and functional genes (43 GDM patients versus 81 healthy control subjects). The objective was to obtain a comprehensive understanding of the gut microbiome's role in the etiopathogenesis of GDM.
Our data showed that GDM patients had significantly higher abundance of Parabacteroides, Megamonas and Phascolarctobacterium. Metagenome-wide association study (MGWAS) identified 154,837 genes with significant abundance differences between GDM patients and healthy pregnant women. Furthermore, the GDM microbiome showed greater abundance of membrane transport and energy metabolism pathways, involving the phosphotransferase system, and the lipopolysaccharide biosynthesis and export system, that may contribute to the potential relationship between gut microbes and blood glucose tolerance. Our data also imply that fecal metagenome linkage groups may provide a mechanism and biomarkers for early detection of GDM or, potentially, for identifying risk of developing GDM.
We believe that findings from our study represent a novel relationship between gut microbiome composition and GDM status, by which the gut microbiome may potentially be used to identify individuals at risk for GDM, and may contribute to the pathogenesis of GDM.