Niche differentiation in microbial communities with stable genomic traits over time in engineered systems

Abstract Microbial communities in full-scale engineered systems undergo dynamic compositional changes. However, mechanisms governing assembly of such microbes and succession of their functioning and genomic traits under various environmental conditions are unclear. In this study, we used the activated sludge and anaerobic treatment systems of four full-scale industrial wastewater treatment plants as models to investigate the niches of microbes in communities and the temporal succession patterns of community compositions. High-quality representative metagenome-assembled genomes revealed that taxonomic, functional, and trait-based compositions were strongly shaped by environmental selection, with replacement processes primarily driving variations in taxonomic and functional compositions. Plant-specific indicators were associated with system environmental conditions and exhibited strong determinism and trajectory directionality over time. The partitioning of microbes in a co-abundance network according to groups of plant-specific indicators, together with significant between-group differences in genomic traits, indicated the occurrence of niche differentiation. The indicators of the treatment plant with rich nutrient input and high substrate removal efficiency exhibited a faster predicted growth rate, lower guanine–cytosine content, smaller genome size, and higher codon usage bias than the indicators of the other plants. In individual plants, taxonomic composition displayed a more rapid temporal succession than functional and trait-based compositions. The succession of taxonomic, functional, and trait-based compositions was correlated with the kinetics of treatment processes in the activated sludge systems. This study provides insights into ecological niches of microbes in engineered systems and succession patterns of their functions and traits, which will aid microbial community management to improve treatment performance.


Kinetics modeling
The first-order [1] and Grau second-order [2] models were applied to the combined temporal COD data of all of the AS systems, and the model giving the best fit was adopted to estimate the COD removal kinetics of the AS system of each plant.The first-order model is , where Si and Se are the influent and effluent COD concentration, respectively; Q is the influent flow rate, V is the volume of the tank, and K1 is the coefficient of the first-order COD removal rate.Under pseudo-steady-state conditions with the rate of change of COD concentration (dS/dt) being negligible, the model simplifies to where K2 is the coefficient of the secondorder COD removal rate and Xb is the biomass concentration.After linearization, the model simplifies to The fits of the two kinetics models were evaluated by linear modeling using the R package lme4 (v1., with plant as the additional fixed effect.Specifically, for the first-order model, the linear model structure was Q(Si -Se)/V ∼ Se + plant; for the Grau second-order model, the linear model structure was VSi/Q(Si-Se) ∼V/Q + plant [3,4].The model with the higher R 2 value was utilized to estimate the COD removal kinetics in the AS system of each plant using the R package nls.mutltstart(v1.2.0).

Sample collection and metagenomic sequencing
From October 2018 to October 2019, samples were collected from the AS and AT tanks of the four plants (Figure S1a).Planktonic wastewater samples (500 mL) were collected biweekly from the well-mixed tanks by filtration (0.22 µm, 47 mm; Durapore, Germany), and biofilm samples were obtained monthly by scraping ~50 g of material from the cotton carriers.
Additionally, two months of intensive sampling (daily to weekly) were carried out at plant IG to obtain planktonic samples (from December 2018 to January 2019).All of the samples were stored at −80°C until genomic DNA was extracted using the DNeasy PowerSoil Kit (Qiagen, Germantown, MD, USA) according to the manufacturer's instructions.Three new autoclaved filters were processed in parallel with the samples and served as negative controls.
Metagenomic sequencing of a total of 146 AS and 186 AT samples from the four WWTPs was performed according to the manufacturer's workflow on an Illumina NovaSeq platform to generate 150-bp paired-end reads (Novogene, China).An average of ~14.8 million raw pairedend reads were generated per AS or AT sample.The quality of the sequencing reads was evaluated using FastQC [5] (v0.11.5), and illumina-utils [6] (v2.4.1) with the default parameters was used for quality filtering.An average of ~14.3 million paired-end reads per sample remained after quality control.In contrast, the three negative controls yielded only ~58,000 paired-end reads per sample, which was two-to-three orders of magnitude lower than the WWTP samples.Due to the negligible number of reads generated in the negative controls, no further decontamination procedures were deemed necessary for the AS and AT samples.

Microbial community assembly mechanism and trajectory directionality
The Sloan neutral model [7], which predicts the relationship between the occurrence frequencies of taxa and their relative abundances across meta-communities, was used to determine the assembly mechanisms of plant-specific indictors and non-indicators in individual plants during temporal succession.Based on the deviation from the 95% confidence interval around the neutral prediction, all of the HQ rMAGs of each plant were classified into the three predicted partitions: above neutral, below neutral, or neutral.The normalized stochasticity ratio (NST) of the microbial communities, based on all of the HQ rMAGs or non-indicators in each S3 plant, were quantified using a null model-based approach in the R package NST [8] (v3.0.6).
A NST greater than 0.5 indicates that a community is likely to be neutral.Trajectory directionality analysis of the plant-specific indicators and non-indicators in individual plants was performed based on the Bray-Curtis dissimilarity in the R package ecotraj (v0.1.0)to evaluate the temporal dynamics of microbial community members.In the trajectory directionality analysis, random sampling was applied to the non-indicators to ensure that the sample size between the plant-specific indicators and non-indicators was the same.

Community weighted means (CWMs) of genomic traits and trait-based composition
Genomic traits at the community level of samples from each plant were calculated using all of the plant-specific indicators and the CWMs method [9] as follows: ∑ , / 0 /1& -/ , where pi is the relative abundance of the plant-specific indicator i (i = 1, 2, …, S), and xi is the trait value of plant-specific indicator i.Furthermore, the dissimilarity in trait-based compositions between samples was calculated based on the CWMs of the four genomic traits of plant-specific indicators or all HQ rMAGs using the Bray-Curtis dissimilarity.Specifically, the trait-based composition dissimilarity was estimated by first normalizing the values of genomic traits, then summing the CWMs of all four genomic traits, and finally determining the Bray-Curtis dissimilarity distance between two communities [9].

Time-decay slope and halving-time
The temporal succession of taxonomic, functional, and trait-based compositions in each plant were evaluated by the time-decay slope [10] and halving-time [11].The time-decay slope is an estimated rate of succession per unit time that is obtained via a linear regression equation that models the relationship between logarithm community dissimilarity (DS) and logarithm time interval (T) according to log DS = a + b log10 (T), where T is the difference between two sampling dates, a is the intercept, and b represents the succession rate.A high value of b indicates a rapid succession.
The halving-time is an estimate of the time at which the community similarity becomes half of what it was initially.This metric was determined using the logarithmic decay model S = c ln (T) + int, where S represents the community similarity in a time interval (i.e., the difference between two sampling dates; denoted as T), c is the rate of time decay, and int is the intercept of the model.By assuming a community similarity of 1 when the time interval is 0, the corresponding halving-time (HT) is ./= 0 ( $ % # -!() * ) , where S0 represents the initial community similarity in the shortest time interval [12].A long halving time indicates a slow succession over time.
Fig. S1 Schematic diagram of the wastewater treatment plants and environemental condtiondions of the activated sludge (AS) and anaerobic treatment (AT) systems in four plants.(a) Schematic diagram of the wastewater treatment system setup.(b) Principal coordinate analysis (PCoA) of environmental conditions of the samples from the respective AS and AT systems in the four plants over time.In the PCoA, the points are colored according to plants and ellipses are colored based on the multivariate normal distribution at a 95% confidence interval for each plant.(c) Assessment of the effects of individual environmental parameters on the community variations of AS and AT systems using metagenomic short reads and high-quality representative metagenome-assembled genomes (HQ rMAGs) through PERMANOVA analysis.

2 Fig.
Fig. S6 Quality of the metagenome-assembled genomes (MAGs) and the numbers of classified and unclassified representative metagenome-assembled genomes (rMAGs) recovered from the activated sludge and anaerobic treatment systems.(a) Numbers of low-, medium-, and high-quality MAGs recovered.(b) Distribution of the quality metrics of the medium-and high-quality MAGs recovered.Each point represents a MAG.(c) Number of high-quality rMAGs whose 5S, 16S, and/or 23S rRNA genes were detected.(d) Number of classified and unclassified medium-and high-quality rMAGs based on GTDB-Tk at various taxonomic ranks.
c c y tc c b b 3 B e n z o a te d e g r a d a ti o n B e n z e n e d e g r a d a ti o n S a li c y la te d e g r a d a ti o n T r a n sc in n a m a te d e g r a d a ti o n

Fig. S8 F r u c t o s e d e g r a d a t i o n A c e t o c l a s t i c m e t h a n o g e n e s i(
Fig. S8 Profiles of the potential functions of the high-quality representative metagenome-assembled genomes (HQ rMAGs) in bacterial lineages from the (a) activated sludge and (b) anaerobic treatment systems.Complete pathways are indicated by solid circles, and the categories of pathways are color-coded.The bar chart on the left indicates the taxonomy of HQ rMAGs at the phylum level.Scale bar indicates the tree scale.

Fig. S15
Fig. S15 Potential metabolic functions of the high-quality representative metagenome-assembled genomes (HQ rMAGs) in the activated sludge and anaerobic treatment systems related to (a, c) carbon and (b, d) nitrogen cycles.The thickness of an arrow is proportional to the number of HQ rMAGs capable of carrying out each metabolic step.The numbers of HQ rMAGs and proportions of HQ rMAGs (%) capable of completing each metabolic pathway are indicated.The distribution of the metabolic pathways in each group of plant-specific indicators is shown in a bar plot.