A large-scale metagenomic survey dataset of the post-weaning piglet gut lumen

Abstract
Background: Early weaning and intensive farming practices predispose piglets to infectious and often lethal diseases, against which antibiotics are used. Besides contributing to the build-up of antimicrobial resistance, antibiotics are known to modulate the gut microbial composition. As an alternative to antibiotic treatment, previous studies have investigated the potential of probiotics for the prevention of post-weaning diarrhea. To describe the post-weaning gut microbiota, and to study the effects of two probiotic formulations and of intramuscular antibiotic treatment on it, we sampled and processed over 800 faecal time-series samples from 126 piglets and 42 sows.
Results: Here we report on the largest shotgun metagenomic dataset of the pig gut lumen microbiome to date, consisting of >8 Tbp of shotgun metagenomic sequencing data. The animal trial, the workflow from sample collection to sample processing, and the preparation of libraries for sequencing are described in detail. We provide a preliminary analysis of the dataset, centered on a taxonomic profiling of the samples and a 16S-based beta diversity analysis of the mothers and the piglets in the first 5 weeks after weaning.
Conclusions: This study was conducted to generate a publicly available databank of the faecal metagenome of weaner piglets aged between 3 and 9 weeks, treated with different probiotic formulations and intramuscular antibiotic treatment. Besides investigating the effects of the probiotic and intramuscular antibiotic treatments, the dataset can be explored to address a wide range of ecological questions with regard to antimicrobial resistance, host-associated microbial and phage communities, and their dynamics during the aging of the host.

this dataset can find all relevant details on experimental design, experimental approach and primary data processing in this manuscript.
-The authors have removed most of the concerning analysis details. A few comments: 1) Data description / OTUs. Are OTUs used, or "ASVs"? I understand SortMeRNA was used, which includes QIIME v1.x, but current methods use a DADA2-like approach that moves away from OTUs (as QIIME v2.x has done for a long time already).
RE: Although we are aware of the advances that ASV inference methods have led to in the analysis of 16S amplicon sequencing data, there are no rigorous performant methods, that we are aware of, to obtain ASVs from shotgun metagenomic reads. We therefore left the existing analysis unchanged.
2) Data description: to describe the overall picture, a simple alpha-diversity plot and a beta-diversity ordination plot would be an easy and quick way to bring more meaningful insight into the overall samples and the individual variation observed. Currently, all data of the various groups are averaged per piglet or mother in a bar diagram/Krona plot, which does not make sense since there is a large diversity within those treatment groups.
RE: With regard to diversity, an analysis included in our first submission suggested that differences in microbial community composition (alpha and beta diversity) between treatment groups were mild, while more prominent shifts in diversity were detected between samples from distinct time points (age of the piglets). The revised manuscript now includes a PCoA to describe the beta diversity of samples. We removed the Krona plot of the mothers, and merged the Krona plot of the piglets into a panel of a combined figure with the PCoA (beta diversity). The PCoA plots highlight the strong effect of time/aging, and the (dis)similarity between the mothers and piglets at distinct time points in the trial. In the revised submission we also report alpha diversity indices.

3) Supplement Fig2; make more clear that frequency is N samples. Also add bin size in legend for both sub figures.
RE: Done.
4) Supplement Fig4; would this be a possible result of using CFU, which does not take into account dead/alive ratios? That discussion/mentioning seems missing in the current text.
RE: Yes, that is true. This has now been added to the manuscript.

Pig trial and sample collection
Animal studies were conducted at the Elizabeth Macarthur Agricultural Institute (EMAI), NSW, Australia, and were approved by the EMAI Ethics Committee (Approval M16/04). The trial animals comprised 4-week-old male weaner pigs (n=126) derived from a commercial swine farm and transferred to the study facility in January 2017. These were cross-bred animals of "Landrace", "Duroc" and "Large White" breeds and had been weaned at approximately 3 weeks of age (Supplementary Figure 1). Each room had nine pens, consisting of a set of six and a set of three pens, designated a-f and g-i respectively. The two sets of pens were physically separate, i.e. animals could come in contact with each other through the pen bars within each set of pens, but not between sets. The rooms were physically separated by concrete walls, and contamination between rooms was minimized by using separate equipment (boots, gloves, coveralls) for each room. In addition, under-floor drainage was flushed twice weekly and the flushed faeces/urine was retained in under-floor channels that ran the length of the facility, so that Rooms 1, 2 were separate from Rooms 3, 4 and flushing was in the direction 1 to 2 and 3 to 4.
The pigs were fed ad libitum, via self-feeders, an antibiotic-free commercial pig grower mix containing 17.95% protein. On the day of arrival (day 1), 30, 18, 18 and 60 pigs were allocated randomly to Rooms 1, 2, 3 and 4 respectively, in groups of 6, 6, 6 and 6-7 pigs per pen respectively (Supplementary Figure 1A). Pigs were initially weighed on day 2, and some pigs were moved between pens to achieve an initial mean pig weight per treatment of approximately 6.5 kg (range: 6.48-6.70; mean±SD: 6.53±0.08). Pigs were weighed weekly throughout the trial, and behaviour and faecal consistency scores were taken daily over the 6-week period of the trial (Supplementary Table 2). Developmental and commercial probiotic paste preparations ColiGuard® and D-Scour™ from International Animal Health were used in some treatment groups.
The animals were acclimatised for 2 days before the following treatments were   Table 2), were obtained throughout this study. At the end of the trial period, all samples were transported from EMAI to the University of Technology Sydney (UTS) for further processing. The experimental workflow is schematically represented in Figure 3.

Positive controls
As a positive control "mock community" for this study, four Gram positive (Bacillus

DNA extraction
Piglet and sow faecal samples, mock community samples, negative controls and probiotic samples (D-Scour™ and ColiGuard® paste) were allocated to a randomized block design to control for batch effects in DNA extraction and library preparation.
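As an illustration of how a randomized block allocation of this kind can be set up, the following minimal sketch spreads each sample type evenly across processing plates and then shuffles the order within each plate. It is not the allocation procedure used in this study: the function name, the 96-well plate size, the seed and the sample identifiers are all assumptions made for the example.

```python
import random
from collections import defaultdict


def allocate_to_plates(samples, plate_size=96, seed=0):
    """Spread samples over plates so that each plate gets a similar share of every type.

    samples: list of (sample_id, sample_type) tuples (hypothetical identifiers).
    Returns a list of plates, each a list of sample_ids in randomized order.
    """
    rng = random.Random(seed)
    n_plates = -(-len(samples) // plate_size)  # ceiling division

    # Group sample IDs by type (e.g. faecal, mock community, negative control).
    by_type = defaultdict(list)
    for sample_id, sample_type in samples:
        by_type[sample_type].append(sample_id)

    plates = [[] for _ in range(n_plates)]
    next_plate = 0
    for sample_type in sorted(by_type):
        ids = by_type[sample_type]
        rng.shuffle(ids)
        # Deal the shuffled IDs round-robin so each type is balanced across plates.
        for sample_id in ids:
            plates[next_plate % n_plates].append(sample_id)
            next_plate += 1

    # Randomize the position of samples within each plate.
    for plate in plates:
        rng.shuffle(plate)
    return plates


if __name__ == "__main__":
    demo = [(f"F{i}", "faecal") for i in range(200)]
    demo += [(f"M{i}", "mock") for i in range(8)]
    demo += [(f"N{i}", "negative") for i in range(8)]
    for number, plate in enumerate(allocate_to_plates(demo), start=1):
        print(number, len(plate))
```

Dealing each sample type round-robin keeps controls and faecal samples represented in every batch, so that a batch effect would not be confounded with sample type.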
The faecal samples were thawed on ice first, followed by the probiotic and mock community samples. MetaPolyzyme (Sigma-Aldrich) treatment was performed according to the manufacturer's instructions, except for the dilution factor, which we allowed to be 4.6 times higher. Immediately after incubation, DNA extraction was performed with the MagAttract PowerMicrobiome DNA/RNA EP kit (Qiagen) according to the manufacturer's instructions. DNA was quantified using PicoGreen (Thermofisher), with measurements taken on a plate reader (Tecan, Life Sciences) at gain settings of 50 and 80. All samples were diluted to 10 ng/µL.
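The dilution to a common concentration follows the standard relation C1·V1 = C2·V2. The sketch below shows the arithmetic only; the 20 µL final volume and the handling of samples already at or below the target concentration are assumptions for illustration and are not taken from the protocol.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=10.0, final_volume_ul=20.0):
    """Return (dna_volume_ul, diluent_volume_ul) to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The final volume is a hypothetical value.
    Samples at or below the target cannot be diluted further and are used neat.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul <= target_ng_per_ul:
        return final_volume_ul, 0.0
    dna_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_volume, final_volume_ul - dna_volume


if __name__ == "__main__":
    # A 40 ng/µL extract needs 5 µL of DNA and 15 µL of diluent for 20 µL at 10 ng/µL.
    print(dilution_volumes(40.0))
```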

Library preparation
Sample index barcodes were designed using a previously introduced method [4], yielding a set of 96 sequences of 8 nt each, with a mean GC content of 0.5 and no barcode containing 3 or more identical bases in a row. Nine hundred sixty different combinations of i5 and i7 primers were used to create a uniquely barcoded library for each sample. The detailed sample-to-barcode assignment is given in Supplementary Table 3. Library preparation was carried out using a modification of the Nextera Flex protocol to produce low bias, called Hackflex, that allows the production of low cost shotgun
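The two barcode properties stated above (mean GC content of the set and absence of runs of 3 or more identical bases) can be checked with a few lines of code. This is a minimal sketch for verifying a barcode list, not the design method cited in the text; the function names and example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a barcode."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def has_homopolymer(seq, run_length=3):
    """True if the barcode contains run_length or more identical bases in a row."""
    seq = seq.upper()
    run = 1
    for previous, current in zip(seq, seq[1:]):
        run = run + 1 if current == previous else 1
        if run >= run_length:
            return True
    return False


def check_barcode_set(barcodes, length=8):
    """Return (mean GC fraction, list of barcodes failing the length or run check)."""
    failing = [b for b in barcodes if len(b) != length or has_homopolymer(b)]
    mean_gc = sum(gc_fraction(b) for b in barcodes) / len(barcodes)
    return mean_gc, failing


if __name__ == "__main__":
    # Hypothetical 8 nt barcodes, not those listed in Supplementary Table 3.
    example = ["ACGTACGT", "AGCTTCAG", "AAATCGCG"]
    print(check_barcode_set(example))
```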

Normalization and sequencing
The master pool was sequenced on an Illumina MiSeq v2 300 cycle nano flow cell (Illumina, USA). Read counts were obtained and used to normalise libraries. The liquid handling robot OT-One (Opentrons) was programmed to re-pool libraries based on read counts obtained from the previous MiSeq run. The code used to achieve the normalization is available through our Github repository.
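The principle behind read-count-based re-pooling can be sketched as follows. This is not the code from the repository mentioned above: it assumes that libraries were first pooled at equal volumes, that read count is proportional to the amount of library in the pool, and that the robot has a limited pipetting range. All names and default volumes are hypothetical.

```python
def repool_volumes(read_counts, base_volume_ul=2.0, min_ul=0.5, max_ul=10.0):
    """Compute per-library pooling volumes that even out read counts.

    read_counts: dict mapping library name to reads from the shallow run.
    A library at the reference (median) read count gets base_volume_ul;
    others are scaled inversely to their read count and clamped to [min_ul, max_ul].
    """
    positive = sorted(c for c in read_counts.values() if c > 0)
    if not positive:
        raise ValueError("no library produced any reads")
    reference = positive[len(positive) // 2]  # upper median of non-zero counts

    volumes = {}
    for library, count in read_counts.items():
        if count <= 0:
            volume = max_ul  # no reads observed: pool the maximum volume
        else:
            volume = base_volume_ul * reference / count
        volumes[library] = min(max(volume, min_ul), max_ul)
    return volumes


if __name__ == "__main__":
    print(repool_volumes({"lib_a": 1000, "lib_b": 2000, "lib_c": 4000}))
```

Scaling inversely to read count gives under-represented libraries more volume in the final pool, while clamping keeps the volumes within a range the liquid handler can dispense.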
The read count distribution after normalisation is displayed in Supplementary Figure 2. The normalized and purified pooled library was sequenced on an Illumina NovaSeq 6000 S4 flow cell at the Ramaciotti Centre for Genomics (Sydney, NSW, Australia), generating a total of 27 billion read pairs from 911 samples.

Comparison of the expected and the observed taxonomic profile of the positive controls
All the mock community members were detected by MetaPhlAn2 (version 2.7.7) in seven of the eight technical replicates (Supplementary Figure 3). An additional 25 taxa were detected, of which 18 and 7 were identified at the species and at the genus level, respectively. Contaminants were present at a higher concentration in three technical replicates (R3, R7, R8), with the most frequent contaminant (Methanobrevibacter spp.) being present in 5 of the 8 replicates (Supplementary Figure 5). (Supplementary Figure 3). ColiGuard® contained a total of 20 contaminants, of which 16 and 4 were identified at the species and the genus level, respectively. Contaminants were present at a higher level in two technical replicates (R5, R7), with R7 displaying the most diverse and highest contamination rate (R7: 14 taxa; total contaminating reads: 2.67%; R5: 9 taxa; total contaminating reads: 0.30%).
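Per-replicate figures of this kind (number of contaminating taxa and their combined share of reads) can be derived from a taxonomic profile by comparing it against the expected members of the control. The sketch below assumes a profile given as a mapping from taxon name to relative abundance in percent; the taxon names and abundances are placeholders, not values from this study.

```python
def summarise_contamination(profile, expected_taxa):
    """Summarise unexpected taxa in a positive-control profile.

    profile: dict mapping taxon name to relative abundance (percent).
    expected_taxa: set of taxon names that belong to the control.
    Returns (number of contaminant taxa, total contaminant abundance, missing members).
    """
    contaminants = {t: a for t, a in profile.items() if t not in expected_taxa and a > 0}
    missing = sorted(t for t in expected_taxa if profile.get(t, 0) <= 0)
    return len(contaminants), sum(contaminants.values()), missing


if __name__ == "__main__":
    # Placeholder names and abundances for illustration only.
    replicate = {"member_A": 60.0, "member_B": 37.5, "taxon_X": 2.0, "taxon_Y": 0.5}
    print(summarise_contamination(replicate, {"member_A", "member_B", "member_C"}))
```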

Technical controls in metagenomic studies and methodological limitations
Taxonomic assignment of the raw reads from the positive controls was performed with MetaPhlAn2 [8], which relies on ca. 1M unique clade-specific markers derived from 17,000 reference genomes. Such a database suffices for mapping the positive controls because these organisms are cultivable and therefore widely studied, so their sequences are known. This is not the case for real-world samples, where mapping against a database (the completeness of which relies on studied and often cultivable organisms) would narrow the view of the true diversity within the sample.
Positive controls with well-studied members and known ratios within the samples have proven to be a valuable approach to assess consistency among technical replicates across batches and to detect possible biases derived from the DNA extraction method.
Systematic taxonomic bias in microbiome studies, resulting from differences in cell wall structure between Gram positive and Gram negative bacteria, has previously been reported; bead beating and sample treatment with enzymatic cocktails can modestly reduce this bias [9-12]. Although we implemented such steps in our workflow, it seems that, from the read abundance of our mock community, which contained three
In terms of contamination we concluded that: a) contamination in our study was not batch specific; b) sample cross-contamination may have occurred at the DNA extraction step between neighbouring wells. During the bead-beating step of DNA extraction, the deep-well plate is sealed with a rubber sealing mat, rotated and placed in a plate shaker for the bead beating to take place. As leakage was observed around the wells despite the presence of the sealing mat, we consider that sample cross-contamination is most likely to have occurred during this step.

Taxonomic profiling of samples
All

Alpha and beta diversity
The abundance profile of all samples, based on the 16S rRNA reads that passed

Potential uses
This dataset can be utilised to assess a broad range of ecological questions pertaining to host-associated microbial communities of the post-weaning piglet. These include the assessment of:
1. the compositional and functional core faecal microbiome of the post-weaning piglet,
2. the microbial changes that piglets undergo between the first and the 5th week after weaning,
3. the degree of strain-host specificity,
4. the variability of microbiomes within or between host species,
5. the variability of microbiomes between different cross-breeds and small age differences of the hosts,
6. the degree of strain transfer from mothers to piglets,
7. the effects of two probiotic treatments and of intramuscular antibiotic treatment on the post-weaning pig faecal microbiome,
8. species co-occurrence and co-exclusion,
9. the repertoire of antimicrobial resistance genes and how it is impacted by antibiotic and probiotic treatment,
10. the extent of within-host and population evolution of microbes over a 5-week period.

Data availability
The sequencing reads from each sequencing library have been deposited at NCBI     Scholarships. NSW DPI approved the paper before submission for publication.

Competing interests
D-Scour™ was sourced from International Animal Health Products (IAHP).
ColiGuard® was developed in a research project with NSW DPI, IAHP and AusIndustry Commonwealth government funding.

Author contributions
Pig