Mixed waste contamination selects for a mobile genetic element population enriched in multiple heavy metal resistance genes

Abstract Mobile genetic elements (MGEs) like plasmids, viruses, and transposable elements can provide fitness benefits to their hosts for survival in the presence of environmental stressors. Heavy metal resistance genes (HMRGs) are frequently observed on MGEs, suggesting that MGEs may be an important driver of adaptive evolution in environments contaminated with heavy metals. Here, we report the meta-mobilome of the heavy metal-contaminated regions of the Oak Ridge Reservation subsurface. This meta-mobilome was compared with one derived from samples collected from unimpacted regions of the Oak Ridge Reservation subsurface. We assembled 1615 unique circularized DNA elements that we propose to be MGEs. The circular elements from the highly contaminated subsurface were enriched in HMRG clusters relative to those from the nearby unimpacted regions. Additionally, we found that these HMRGs were associated with Gamma and Betaproteobacteria hosts in the contaminated subsurface and potentially facilitate the persistence and dominance of these taxa in this region. Finally, the HMRGs were associated with conjugative elements, suggesting their potential for future lateral transfer. We demonstrate how our understanding of MGE ecology, evolution, and function can be enhanced through the genomic context provided by completed MGE assemblies.


Introduction
An environmental mobilome, or meta-mobilome, consists of all mobile genetic elements (MGEs) found within an environmental sample [1].MGEs are pieces of DNA that can move within the genome of an organism or between the genomes of two different organisms.Prokaryotic MGEs include transposable elements, plasmids, and viruses/phages.MGEs are the major vehicle for the movement of genetic material between prokaryotes, a process known as horizontal gene transfer (HGT) [2].The spread of MGEs within microbial communities allows for the acquisition of novel traits that may facilitate adaptation to f luctuating resources and environmental stressors, driving the ecological diversification of close microbial relatives [3].Common MGE-encoded traits include novel carbon substrate utilization [4], antibiotic resistance [5], toxin production [6], and heavy metal resistance [7].
Anthropogenic compounds that induce cellular stress responses can increase rates of MGE mobilization in microbial populations [8].Thus, heavily polluted environments may be major hot spots of HGT.The Oak Ridge Reservation (ORR), located in Oak Ridge, TN, is a well-characterized experimental site for examining the ecological impacts of legacy industrial waste [9].The near-source subsurface is highly acidic and contaminated with a mixture of nitrate, uranium, and other heavy metals (e.g., Ni, Co, Cu, Fe, Al, Cd, Mn, Hg).Prior studies of ORR microbial populations have suggested that their adaptive evolution may have been facilitated by the lateral transfer of MGEs.For example, metal eff lux pumps and mercury resistance genes were shown to be highly mobilizable among Rhodanobacter species that predominate in the contaminated ORR groundwater [10,11].Martinez et al. [12] found evidence for horizontal transfer of heavy metal-translocating P-type ATPases among a large collection of isolates from the ORR.
High-throughput sequencing has enabled the untargeted, culture-independent analysis of MGEs from diverse environments [13][14][15][16][17].However, only a subset of these studies reported circularized MGEs.Recently, using a method to enrich for plasmids during DNA extractions, Kothari et al. [18] recovered 615 circularized MGEs from an aquifer, and Kirstahler et al. [19] recovered 159 322 circularized MGEs from global sewage samples.However, in the latter case, the circularized MGEs skewed toward shorter sizes (all <17 400 bp).Similarly, Kothari et al. recovered a relatively small number of MGEs >20 000 bp in length (5.7%, 35/615 circular elements) with only three (0.5%) being >100 000 bp in length.These results may represent a filtering effect of the targeted enrichments for MGEs from environmental samples resulting in a loss in the diversity of mid-(∼20 000-100 000 bp) to larger-sized (≥100 000 bp) MGEs that are known to be widely distributed in microbial populations [20,21].However, assembling complete, circular MGEs from metagenomes without enrichment is challenging because (i) MGEs are often in low abundance and (ii) MGEs often share sequences with bacterial genomes and other MGEs [22].Their low abundance means that the MGEs may not be fully covered by short-read sequencing data.The shared sequences result in complex and tangled assembly graphs, resulting in fragmented assemblies [23].
The cataloging of MGE gene function predictions and host range is essential to accurately model microbial species abundances, population dynamics, and functional output of environmental microbiomes at the ecosystem level l [24].Thus, there is a continued need for method optimization for the recovery of diverse MGEs across a representative range of sizes from metagenomes.Here, we use SCAPP, or Sequence Contents-Aware Plasmid Peeler, an algorithm designed for the specific purpose of reconstructing plasmid sequences from metagenomic data [22].SCAPP leverages biological knowledge of plasmid sequences and uses plasmidspecific genes to annotate the assembly graphs.Nodes in the graph are assigned weights based on the likelihood that they represent plasmids.SCAPP prioritizes peeling off circular paths in the assembly graph that include plasmid genes and highly probable plasmid sequences.It also utilizes plasmid-specific genes and plasmid scores to filter out potential chromosomal false positives.Despite its initial design for plasmid circularization, SCAPP is also very good at circularizing other MGEs, likely due to the genetic similarities between all MGEs [25] and its ability to extract circular DNA elements from assembly graphs.We hypothesized that distinct populations of MGEs would be present in the highand low-contamination regions of the ORR subsurface.We further predicted that MGEs from the highly contaminated regions of the ORR subsurface would be enriched in clusters of multiple HMRGs due to their role in the adaptive evolution of microorganisms in these anthropogenically disturbed environments.

Sampling and geochemical data
The majority of the sub-surface well geochemistry data were obtained from publicly available datasets, either from Smith et al. [26], Wilpiszeski et al. [27], Gushgari-Doyle et al. [28], or from the publicly accessible US Department of Energy Office of Science Subsurface Biogeochemical Research web page (https://www.esd.ornl.gov/orifrc/)[29].For this study, we performed additional mercury analyses by ICP-MS (Supplemental Methods).Not all well samples collected for metagenome sequencing had a complete set of associated geochemical metadata.However, uranium (U) measurements were available for all samples and were used to distinguish between high and low contamination sites.We divided the sampling sites into two categories based on the US Environmental Protection Agency (EPA) maximum contaminant level U in drinking water [30]: (i) highly contaminated sites with [U] > 0.126 μM and (ii) low contamination sites with [U] < 0.126 μM).
For the sediment samples, contamination levels were inferred from the [U] of the groundwater of adjacent wells.

Metagenome sequencing and mobile genetic element assembly
Samples for metagenome sequencing and assembly were collected from well water pumped from 17 sites across the ORR located within the Bear Creek Valley in Oak Ridge, TN, USA (Fig. 1A, Tables S1 and S2).Groundwater samples from wells DP16D, FW021, FW104, FW106, FW215, FW300, FW301, FW303, FW305, FW602, GW715, and GW928 were pumped between November 2012 and February 2013 and were processed and initially sequenced as part of a study by Tian et al. [31].Sediment samples from borehole FW306 were collected in June 2015 and the metagenomes were initially sequenced as part of a study by Wu et al. [32].Groundwater samples from FW115, GW271, FW106, and sediment samples from boreholes EB271 and EB106 were collected in March and April 2017 as described in Lui et al. [33].All metagenomics sequencing data for this current study were re-processed through the same bioinformatics pipeline as described in Lui et al. [33].We only used samples that were sequenced using the same technology (2x150bp Illumina reads).We also ensured that the samples were sequenced deeply (>5Gbp per assembly, except for EB271-05-01 and EB106-05-06).Brief ly, the Illumina reads were quality-filtered and trimmed using BBTools 38.86 [34] and assembled with SPAdes Version 3.15.4[35,36].Samples were co-assembled if they were replicates taken from the same physical groundwater or sediment sample.Co-assemblies are outlined in Supplementary Table 1 of Lui et al. [33].Reads for the 2017 EB106, EB271, FW115, GW271, and FW106 metagenomes were deposited in NCBI's Sequence Read Archive in BioProject PRJNA1001011 under accession numbers SAMN36786281-SAMN36786357. The assembly graphs and sequencing reads were used as inputs into SCAPP [22] using default parameters to obtain circular DNA elements.The circular elements were de-replicated within the high and low [U] sample sets before further analyses.

Classification of circular elements
Circular elements representing viral genomes were predicted using VirSorter (v1.0.5) [44] implemented in the DOE KBase [45].Low confidence category 3 and 6 predictions were removed from the analysis.These circular elements are referred to as "viral genomes" in the text.Taxonomic classification of categories 1, 2, 4, and 5 assemblies was performed using vConTACT2 (v0.9.19) [46] implemented in KBase with default parameters.PhaGCN with default parameters was utilized for viral taxonomy classification based on ICTV2022 [47,48].PhaTYP with default parameters was utilized for phage lifestyle prediction [49].We compared the remaining non-viral circular elements to known plasmids in the Plasmid Database (PLSDB, v2021_06_23_v2) using the default search parameters for mash dist, P = 0.1, distance = 0.1.The circular elements with similarity to known plasmids are referred to as "plasmids" in the text.The circular elements that were not already classified by PLSDB or VirSorter were further analyzed by searching the eggNOG-mapper annotation files for mobile-genetic element associated Pfams (Table S3).These circular elements are referred to as "Unclassified MGEs" in the text.For the remaining circular elements, we manually examined the annotation files from the comparisons to CDD models, SMART, NCBI's Protein Clusters models, TIGRFAMs, Pfam hidden Markov models [50], and PGAP hidden Markov models.This allowed us to identify additional "Unclassified MGEs".All remaining circular elements are referred to as "cryptic circular elements" in the text.

Host prediction
Viral hosts were predicted using two parallel methods: (i) k-mer similarity: as a broader primary method, hosts for all assemblies were predicted using the Prokaryotic Virus Host Predictor (PHP) tool [51].PHP has an accuracy of 80% at the phylum level.These phylum-level assignments were used for further analysis.(ii) CRISPR spacer matches: a BLASTn search of viral assemblies was performed against the IMG/VR v4 databases of isolate and uncultivated microorganism CRISPR spacers [52].Default parameters were used.The results were filtered to allow for 0 or 1 mismatches between the sequences.
Phylum-and class-level taxonomic assignments were performed for the non-viral circular elements using a gene taxonomy-based approach.Phylogenetic assignments for each annotated coding sequence were extracted from the eggNOGmapper annotation files.The phylogenetic assignments were tabulated for each non-viral assembly.A taxonomic assignment at the phylum level was made if >50% of CDS belonged to the same phylum or class.A similar analysis was performed for class-level predictions.For all downstream analyses, viral hosts were considered separately from non-viral hosts, and different taxonomic levels were analyzed independently.

Functional analyses
To examine HMRGs and antibiotic resistance genes (ARGs) on the circular elements, COGs annotations were extracted from the eggNOG-mapper annotations for all analyzed circular elements.From the COG database, we curated lists of heavy metal resistance and antibiotic resistance-related COGs (Table S4).We then searched the annotation files for these curated COGs.These COG counts were normalized against the total number of coding sequences (CDS) found on the circular elements in each sample set (i.e.high [U], low [U] sets).For the identification of conjugative elements, we curated a list of conjugative transfer system COGs and searched the eggNOG-mapper annotation files for these COGs (Table S5).Normalization was performed by the same method described above.Functional enrichment calculations, using the absolute gene counts, were performed using a two-tailed Fisher exact test.The Fisher exact test has been applied previously to examine functional enrichment within microbial genomic datasets [53][54][55][56].For these statistical comparisons, the Benjamini-Hochberg false discovery rate (FDR) multiple test correction was applied [57].

Metagenome-assembled putative mobile genetic elements from the Oak Ridge Reservation subsurface
From a total of 32 sediment and 25 groundwater metagenomes from the ORR [31,32,33], we input the assembly graphs into SCAPP to obtain 1713 unique, circular elements (>1000 bp in length) representing putative MGEs [62].We divided these circular elements into two contamination levels: those originating from (i) 7 high [U] or (ii) 10 low [U] sites (Fig. 1A and B, Table S1).For the sediment samples, contamination levels were inferred from the groundwater of adjacent wells.Compared with low [U] sites (median [U] = 0.0 μM), the high [U] (median [U] = 58.9μM) had higher levels of other metal contaminants including manganese (Mn), cobalt (Co), nickel (Ni), copper (Cu), cadmium (Cd), and mercury (Hg) as well as higher concentrations of nitrate and lower pH (Fig. 1C).
We manually examined the annotation files of these circular elements to identify false positive MGEs, removing 98 assemblies that included (i) partial mitochondrial/chloroplast genomes or (ii) likely long repeat regions that were inappropriately circularized.This resulted in 338 putative MGEs from high [U] sites and 1277 putative MGEs from low [U] sites (Table S6).Out of the 1615 circular elements, 259 (16%) were greater than 20 000 bp in length and 22 (2%) were greater than 100 000 bp in length.A bimodal distribution with peaks at ∼3000 bp and ∼70 000 bp was observed in both sets (Fig. 1D).However, the distribution of the circular elements from the low [U] sites was skewed toward the smaller sizes (i.e., the ∼3000 bp peak).The average length of the circular elements from the high [U] samples was significantly longer (22 247 vs. 11 880 bp; two-tailed Welch's t-test; P = 0.0008) than those from the low [U] samples.We next examined the structural and functional implications of this differential size distribution between these two groups of putative MGEs.

Classification of circular elements
Expected circular MGEs in our dataset include plasmids and viruses as well as integrative and conjugative elements (ICEs), integrons, and transposons (Fig. 2A, Table S6).We identified 111 unique viral genomes (7% of all circular elements) with 39 from the high [U] sites and 72 from low [U] sites (Fig. 2B, Table S7).The size distribution of viral assemblies was similar (two-tailed Welch's t-test; P > 0.05) between the two datasets (Fig. S1A).A similar proportion of predicted viral assemblies from high and low [U] sites clustered with known viral taxa within a gene-sharing network (Fig. S1B and C).Lifestyle analysis predicted that, from the high [U] regions, 23/39 (59%) of the genomes were temperature phages while 15/39 (38%) of the genomes were virulent (i.e.lytic) phages.While, from the low [U] regions, 20/72 genomes (28%) and 51/72 genomes (71%) were predicted temperate and virulent phages, respectively (Fig. S1C).
We then compared the 1504 non-viral circular elements with the PLSDB.Only 23 circular elements (16 from the high [U] sites and 7 from the low [U] sites) had similarity to known plasmids (Fig. 2A and B, Table S8).These known plasmids were associated with various bacterial hosts and originated from both animal and environmental samples.The most common hosts were Acinetobacter and Sphingobium for the high and low [U] regions, respectively.The remaining unclassified circular elements were then analyzed for known MGE-related Pfam domains (Tables S3 and S6) [19].From the high [U] sites, 121 circular elements were identified with MGE-related Pfams, including 75 (62%) with plasmid replication domains.From the low [U] sites, an additional 279 circular elements were identified, including 54 (19%) with plasmid replication domains (Table S9).As highlighted with the plasmid replication domains, these "unclassified MGE" populations are distinct between the high and low [U] sites (Fig. S2).We further annotated the remaining 1104 circular elements using several additional protein domain databases (see Methods section for complete list), allowing us to identify 15  S6).
The remaining 43 and 64% of the circular elements from the high and low [U] sites, respectively, are cryptic, with no known domains involved in MGE replication or mobilization (Fig. 2A, Table S6).While these cryptic circular elements were significantly shorter (AVG: 3026 bp vs. 30 288 bp; two-tailed Welch's t-test; P = 4E-11) (Fig. S3) than the non-cryptic elements, these sequences nonetheless encode genes that may serve important ecological functions (Table S10).For example, the cryptic circular element AA_WF_A-C-Q_17, originating from high [U] groundwater, carries nine genes with putative metal resistance functions.While AA_WF_A-C-Q_17 does not encode any known MGE-related Pfams, a gene encoding a DUF6088 family protein is present on this assembly (Table S11).This uncharacterized protein family is distantly related to the AbiEi antitoxin family of proteins, suggesting that this cryptic circular element could be a true novel MGE.Among the most common domains annotated on these cryptic circular elements included non-ribosomal peptide synthases involved in the biosynthesis of various secondary metabolites, chemotaxis proteins, pilus assembly proteins, and a variety of response regulators (Table S10).Overall, we propose that the difference in circular element size distributions (Fig. 1D) between the two sample sets is attributable to the distinct populations of MGEs present.

Predicted hosts of circular elements
Of the viral genomes from the high [U] regions, 47% were predicted by k-mer similarity to infect Proteobacteria hosts (Fig. 3A, Table S7) compared with 29% of the viral genomes from the low [U] regions.In the high [U] regions of the ORR, Proteobacteria predominate, while pristine regions are characterized by greater microbial diversity [63,64].Several Proteobacteria genera in the high [U] regions of the ORR are believed to play significant roles in nitrogen cycling in the subsurface, a process of critical interest due to the elevated levels of nitrate contaminating these areas of the ORR [63].For example, denitrifying Rhodanobacter spp.dominate in the most contaminated regions of the ORR subsurface.A previous study found that 82% of the microbial community in contaminated (i.e.high [U]) groundwater well FW106 was Rhodanobacter [64].
In parallel, we performed CRISPR spacer analysis against the IMG/VR database to determine more specific taxonomic assignments.Five out of 111 viral genomes were matched to an isolate host.An additional, 13 viral genomes were matched to metagenome-associated hosts.As an example, we identified a putative Rhodanobacter phage (EB106_02_03_10) (Fig. 3B) in the high [U] samples.EB106_02_03_10 is a predicted temperate phage with XerC and RecT-type recombinases that may facilitate integration in the host genome (Table S11).Interestingly, the EB106_02_03_10 genome encodes a BglII endonuclease that may provide host immunity against simultaneous infection when this phage is in its lysogenic life stage [65].Also present in the genome is a gene encoding a putative oxidoreductase, but it is unclear if this represents a novel auxiliary metabolic gene (AMG) or a phage lifecycle-related gene.This phage genome also encodes two different cell wall hydrolases that can promote cell lysis and drive biomass turnover in the ORR subsurface [66][67][68].
Host predictions for the remaining (i.e.non-viral) circular elements were performed using gene taxonomy information extracted from the eggNOG-mapper annotation files (Table S6).Due to the absence of known genes, a host could not be determined for 53% of the non-viral circular elements.Like the viral genomes, the most frequently identified host phylum in both sets was the Proteobacteria, with a greater proportion observed among the circular elements originating from high [U] samples (Fig. 3C).Where matches were previously made to plasmids in the PLSDB (Table S8), host predictions were cross-checked with

Functional analysis of mobile genetic element assemblies
We next compared the functional gene content of the circular elements from high and low [U] regions of the ORR subsurface.We hypothesized that both HMRGs and ARGs would be enriched in the high [U] circular elements due to the selective pressure of the heavy metal contamination.

Minimal enrichment of antibiotic resistance genes in circular elements from high [U] sites
In some environments, heavy metal contamination can co-select for ARGs alongside HMRGs [69][70][71].However, the ARG count in our complete circular element dataset was low (Table S12).The ARG counts were normalized against the total CDS count for each sample set (i.e., high [U] and low [U] circular elements) to account for the size differences in the circular elements noted in the prior section.The overall ARG abundance was similar between the high and low [U] sample sets (two-tailed Fisher's exact test, P > 0.05) (Fig. S4A).When considering individual genes, no enrichment pattern was observed (Fig. S5B).We repeated this analysis in a more conservative manner by first removing the cryptic circular elements from the dataset.This calculation yielded similar results to the analysis that had included the cryptic circular elements (Table S13).Overall, these data do not support a substantial enrichment of ARG among our putative MGEs from the high [U] samples.

Assemblies from high [U] sites are enriched in heavy metal resistance gene clusters
As MGEs are frequently vectors of HMRGs [12,72], we next examined our dataset for genes involved in heavy metal resistance (Tables S4, S11, S12).We found a significant (two-tailed Fisher's exact test, P < 0.0001) overabundance of HMRG content on the circular elements from the high [U] sites (Fig. S5A).The assemblies from the high [U] regions encoded 32 HMRGs per 1 million base pairs (∼6.4% of total CDS) while the assemblies from low [U] sites encoded 6 HMRGs per 1 million base pairs (∼1.5% CDS).This trend was largely replicated in the analysis of individual HMRGs (Fig. 4A).Major exceptions included genes involved in arsenic (acr3, arsB, arsC, arsA) and tellurium resistance (terC, tehB) where no enrichment was observed in either direction.No HMRGs were enriched on the circular elements from the low [U] sites.We repeated this analysis using datasets that excluded the cryptic elements which yielded the same results as the previous analysis with the cryptic elements, except for cusF which was no longer significantly enriched in the high [U] circular elements (Table S14).
We predicted that MGEs from higher [U] regions may be more likely to encode multiple HMRGs than those from the low [U] regions due to the multi-metal components of the contaminant plume.We found that HMRG on the MGEs from the high [U] regions are more likely to co-occur in proximity in the form of gene clusters than those from the low [U] regions (Fig. 4B).We next analyzed specific HMRG co-occurrence patterns on the circular elements (Fig. 4C).Among the high [U] circular elements, we observed frequent co-occurrences of merR, chrB1, chrA, copZ, merA, arsR, cusA, cusF, pcoB, czcD, and zntA with each other.The HMRGs confer resistance to a wide range of metals.An example of this HMRG-clustering phenomenon is seen on the plasmid EB106_03_01_3 from the high [U] region.This plasmid contains merA, merR, zntA, czcD, and arsR in close physical proximity to each other (Fig. S5B).In contrast, the low [U] circular elements primarily had co-occurrences of individual HMRGs and their corresponding metal-responsive transcriptional regulators, merR and arsR (Fig. S5B).

Partitioning of HMRGs between classes of MGEs
We hypothesized that both plasmids and viral genomes would be significant vectors of HMRGs in the ORR subsurface.Viruses that infect bacteria or archaea may significantly augment host metabolic potential during infection via AMGs.AMGs encode diverse functions such as nutrient metabolism [73], virulence factors [74], oxidative stress defense [75], and toxicant (e.g.antibiotics and heavy metals) resistance [76].We identified AMGs in our predicted viral assemblies by excluding genes involved in viral replication, structure, and function (Table S15).No AMGs related to heavy metal resistance were found in the assemblies from the low [U] regions.Within the high [U] regions, a single viral assembly (AA_WF_G-H-W_1) carried AMGs that may function in heavy metal resistance including genes encoding TerD domain-containing proteins and TelA family protein, which have annotated functions in tellurium resistance but likely represent general cell envelope stress resistance proteins (Table S15) [77,78].Additionally, a gene encoding a TerC family protein was present; however, TerC was recently found to be involved in the metalation of exoenzymes during protein secretion [79].While Proteobacteria predominate in the contaminated ORR subsurface [26], AA_WF_G-H-W_1 is predicted to infect a member of the phylum Caldiserica.Thus, the phages in our dataset more are likely to impact Proteobacteria population dynamics in the contaminated ORR subsurface through biomass turnover rather than the introduction of novel adaptive genes via HGT.
In contrast to the viral genomes, assemblies with similarity to known plasmids in the PLSDB (i.e., predicted plasmids) carried a substantial proportion of the HMRGs (19% of HMRG-encoding MGEs) despite representing only 1% of the total MGEs in our dataset (Fig. 5).The remaining HMRGs were associated with (i) the "unclassifiable" circular elements that encode MGE-related protein domains (70% of HMRG-encoding MGEs) and (ii) the cryptic circular elements (9%) (Fig. S3).We examined the host predictions for these non-viral MGEs that carry HMRGs.At the class level, Beta/Gammaproteobacteria are the most common hosts for the HMRGs from the high [U] regions while Alphaproteobacteria are the most common hosts of these genes in the low [U] regions (Fig. 5B).
We next explored the possible mechanisms by which the MGEassociated HMRGs could move within the microbial populations in the ORR subsurface.As described above, the viral genomes in our dataset are not the main vectors for the movement of HMRG within the ORR microbial populations.However, 54 of the non-viral circular elements carried at least one COG associated with conjugal transfer systems found on conjugative (i.e., selftransmissible) plasmids and integrative and conjugative elements (ICEs).We examined the co-occurrence patterns of HMRGs and genes encoding conjugative transfer machinery (Table S5).We found that the HMRGs in our dataset were significantly more likely to be associated with a conjugative element than a nonconjugative element (two-tailed Fisher's exact test, P < 0.0001) (Fig. 5C).In fact, 90% of the identified HMRG were located on a conjugative element.These trends remained even when the data were analyzed by contamination levels (i.e., high vs. low [U]).Thus, most HMRGs in our dataset have a high potential for future conjugative transfer within the ORR community, independent of the current level of contamination at their origin site.

Discussion
This is the first study to characterize the meta-mobilome of a highly contaminated subsurface environment.Through the usage of short-read sequences alone, we recovered 1615 unique circular elements that we propose represent MGEs.In this dataset, we simultaneously identified and characterized viral genomes, plasmids, and other types of MGEs, including potentially novel families.Thus, this work diverges from prior studies that have primarily focused on one class of MGEs (e.g.plasmids, viruses) within the meta-mobilome.This agnostic approach allowed for the examination of the partitioning of key functional genes between different classes of MGEs.For example, we found that the HMRGs in our dataset were exclusively associated with non-viral contigs-many of which were predicted conjugative elements.Additionally, we speculate that the lack of a "plasmid safe" DNA digestion step employed by previous studies allowed us to recover a larger number of substantially longer putative MGEs than prior metamobilome studies [19].
Environmental stressors may select for a pool of MGEs that confer fitness advantages to their hosts within the specific environmental context [13,19,80,81].We observed an enrichment of multiple HMRGs in our meta-mobilome extracted from samples from high [U] sites.The HMRG enrichment pattern was consistent with the contaminant profile of the ORR subsurface.For example, we observe the enrichment of genes involved in Cd, Cu, and Co resistance-all of which are found at sufficiently elevated concentrations in the contaminated ORR subsurface to be toxic to certain native microbiota [82][83][84].In contrast, the abundances of genes involved in Te and As resistance were similar between the two datasets.In the contaminated regions of the ORR, arsenic is slightly elevated relative to the background ([As] Median = 5 nM vs. 0.5 nM); however, these values are orders of magnitude below typical toxicity thresholds for arsenic in microorganisms [82].Likewise, tellurium is an exceptionally rare element that is not a component of the ORR contamination plume [26].Additionally, several studies have also suggested that the classical "tellurium resistance genes" (e.g.ter, teh) may have primary functions unrelated to tellurium resistance, such as phage and antibiotic resistance [85,86].
A distinctive feature of the MGE-associated HMRGs from the high [U] regions of the ORR was their physical clustering on the circular elements.The non-random associations of functionally similar genes are well-characterized in bacteria [87].However, the evolutionary mechanisms driving this clustering remain controversial and are likely to be context specific.The Selfish Operon Model proposes that gene clusters are formed and maintained through the emergent benefit to the genes themselves rather than their host organism [87].In this model, the clustering of genes is essential for the successful horizontal transfer of genes that individually confer minimal selectable function.For example, the clustering of multiple ARGs on plasmids can be explained by an extension of the Selfish Operon Model [3].Co-localization of ARGs on a single plasmid may be a successful strategy for longterm ARG persistence in human pathogens that are frequently targeted by multi-antibiotic therapy.We can apply a similar line of logic to the HMRGs in our dataset: individual HMRGs may provide minimal selectable function in a complex high-stress environment, such as the ORR subsurface, where multiple metal contaminants co-exist.Successfully retained HMRGs may be frequently co-localized with other HMRGs.In a larger-scale example of this phenomenon, Staehlin et al. [88] used a molecular clock to link the origin and diversification of a 19-gene copper homeostasis and silver resistance transposon in Enterobacteriaceae to increases in human metallurgical activity throughout history.
In our study, a substantial proportion of the MGE-encoded HMRGs (all sites: 94%; H: 94%; L: 93%) were associated with Proteobacteria hosts.A recent study by Finks and Martiny [89] found that plasmid traits varied significantly with their host's taxonomic assignment at the phylum level using a large dataset of publicly available plasmids.However, certain trait variation was still controlled, in part, by differences in the environment of origin.When we only consider circular elements with predicted Proteobacteria hosts, the HMRG content remains significantly enriched (two-tailed Fisher's exact test, P < 0.0001) among those originating from high [U] regions of the ORR subsurface compared with those from the low [U] regions.Considering our findings here in the context of prior work [89,90], we propose a model for our system where host taxonomy establishes a baseline "genetic potential" for the acquisition of novel MGE-encoded genes.This genetic potential may be controlled, in part, by positive or negative interactions between horizontally acquired resistance genes and cellular metabolic genes in the host genome [91].However, selective pressures in the host's environment control the enrichment of certain MGE-encoded traits within a taxonomic group.As a final point of consideration: HMRG are best characterized in readily culturable Proteobacteria [92].Thus, the association between the two in our environment may ref lect, to an extent, this undersampling in the scientific literature.However, this association is not entirely spurious as Proteobacteria are indeed highly enriched within the high [U] ORR subsurface relative to low [U] regions of the site [63].The ability to rapidly acquire and maintain MGEs carrying clusters of HMRGs may have been one factor that has facilitated the success of members of this phylum in the ORR subsurface.
Viral genomes have been proposed as significant vectors for the shuttling of metabolic genes between different host cells [73].However, we identified only a singular instance of a HMRG encoded on a viral genome.Instead, we found that the HMRGs in our dataset were largely associated with conjugative elements.These "conjugative elements" likely include both conjugative plasmids and ICEs [93].These HMRG-carrying conjugative elements included both (i) circular elements with high similarity to known plasmids in the PLSDB as well as (ii) potentially novel conjugative elements.Our results mirror those of Finks and Martiny [89] who found a significant association between resistance genes (i.e., antibiotic, metal, and biocide resistance) and the MOB family relaxase, which is encoded on both conjugative and mobilizable plasmids.The underlying mechanism for this association between resistance genes and conjugation/mobilization genes is unclear and warrants further investigation.One possibility is that simple genetic traits conferring a strong and immediate selective advantage (i.e.resistance genes) may increase the fitness of the new host strain, offsetting the fitness costs of maintaining these larger plasmids during the initial period of plasmid-host adaptation following acquisition, promoting the linkage of resistance genes to these classes of larger plasmids [94].This initial observation reported here could be further addressed with a future large-scale analysis of global metagenomic datasets.Based on our findings and the recent report from Finks and Martiny, we predict that a similar association between HMRGs and conjugative elements will be observed in the meta-mobilomes from diverse environments.
Another significant distinction between the high and low [U] sites is the size distribution of the circular elements.We speculate on two possible (and not mutually exclusive) origins of this size differential.First, there are significant differences in the community compositions between high and low [U] sites.The high [U] sites are very simplified communities predominated by Gamma/-Betaproteobacteria [26,63,64].The low [U] sites have much greater diversity.Plasmid size distributions, for example, are known to vary between bacterial families [20,89].These differences in community diversity likely also contributed to the lower number of circular elements recovered from the high [U] sites.A second possibility is a filtering effect due to the various environmental stressors at the high [U] sites.For example, prior studies have found that environmental conditions may differentially impact plasmid stability within microbial communities [95,96].Under metal exposure, a broad-host-range plasmid lacking HMRGs had reduced transmissibility within an agricultural soil community [95].We speculate that smaller plasmids lacking HMRGs may be readily lost within the high [U] sites of the ORR subsurface due to both reduced transmissibility and a high metabolic burden without any offsetting fitness benefit.
Consistent with other metagenomic studies, we have uncovered an enormous diversity of potentially novel MGEs in the ORR subsurface.Across our entire dataset, 60% of the circular DNA elements did not encode any identifiable MGE-related genes.This is similar to the results of Kirstahler et al. [19] who found that 53% of the circular DNA elements in their global sewage meta-mobilome did not carry any identifiable MGE-related Pfam domains.However, several studies have identified small cryptic plasmids in bacterial isolates that contain no recognizable replication machinery [97][98][99][100].It has been speculated that cryptic plasmids may play roles in viral defense [101] or functionally serve as empty backbones for the acquisition of novel genes [102] Additionally, some of these cryptic elements may represent currently undescribed classes of MGEs.In recent years, numerous new classes of MGEs have been identified and described in the literature [103,104].We expect that as time goes on, our ability to better classify these currently cryptic elements will improve.One limitation of this work is our usage of shortread sequencing technology, increasing the likelihood that some of these cryptic elements may represent misassembles due to failure to span repetitive elements of larger sequences [105].However, we did remove the obvious instances of this occurrence from our analysis.Nonetheless, our work here highlights the value of assembling completed MGEs from metagenomic datasets.These data provide useful genomic and taxonomic context to key resistance traits often considered in isolation in metagenomic studies.

Figure 1 .
Figure 1.Origin geochemistry and size distribution of ORR MGEs.For all panels, red colors indicate a high [U] area while blue colors indicate a low [U] area.The color scheme for the entire figure is shown under the graph in panel B. (A) Map showing the sampling locations within the ORR.(B) Distributions of uranium concentrations between high and low [U] regions of the site.Map is from Google Maps (Map Data ©2022 Imagery ©2022, Airbus, Landsat/Copernicus, Maxar Technologies.(C) Distributions of other metal concentrations, nitrate concentrations, and pH between high and low [U] regions of the site.(D) Violin plots of the size distribution of de-replicated MGE assemblies between the high and low [U] regions of the site.A Welch's two-sided t-test was used to test for significant differences between the two distributions.

Figure 2 .
Figure 2. Classification of ORR MGEs.(A) Analysis workf low for the ORR mobilome.Illumina short reads were assembled using SPAdes and SCAPP was used to peel off putative circular MGEs from the SPAdes assembly graphs.Assemblies were then sorted by whether they originated from a high or low [U] sample and were de-replicated within those samples.Downstream analyses included MGE type classification, functional gene analysis, and host prediction.(B) Proportions of classified circular elements (left to right): green indicates viral genomes that were predicted using VirSorter; pink indicates assemblies predicted to be plasmids based off similarities to known plasmids in the PLSDB; light blue represents assemblies not classified by the prior two methods, but that carry MGE-related protein domains; and gray represents otherwise unknown (i.e."cryptic") circular elements.The color key is also present on the figure.

Figure 3 .
Figure 3. Prediction of circular element hosts.(A) Predictions of viral host phyla using a k-mer based method (PHP).Proportions were determined by normalizing against the total number of unique circular elements in each sample set.The color scheme key is shown on the figure.(B) Viral genome map with CRISPR-spacer-predicted Rhodanobacter host.(C) Predictions of host phyla for non-viral circular elements using a gene phylogeny-based approach.Proportions were determined by normalizing against the total number of unique circular elements in each sample set.The color scheme is the same as shown in panel A.

Figure 4 .
Figure 4. Analysis of MGE-associated HMRGs.(A) Relative abundance of individual HMRGs normalized against the total annotated CDS in each de-replicated dataset.Statistical comparisons were performed using a two-sided Fisher's exact test ( * P < 0.05; n.s = not significant, with Benjamini-Hochberg FDR correction).The associated grid indicates metals that each gene confers resistance to.(B) Neighborhood analysis of individual HMRGs.The histogram displays the probability that an individual HMRG is immediately adjacent (+/− 4 ORFs) to another HMRG.(C) The heat map displays co-occurrence patterns of HMRG pairings on circular elements.Only half of each matrix is shown for simplicity.Co-occurrence >0 are shown on the heat map.Clustering was performed using a Euclidian distance metric.

Figure 5 .
Figure 5. Partitioning of HMRGs between classes of MGEs and hosts.(A) The top bar shows the classifications of the circular elements that encode at least one HMRG (n = 47).The bottom bar shows the classifications of all circular elements in the dataset (n = 1615).Proportions of classified circular elements (left to right): green indicates viral genomes that were predicted using VirSorter; pink indicates assemblies predicted to be plasmids based off similarities to known plasmids in the PLSDB; light blue represents assemblies not classified by the prior two methods, but that carry MGE-related protein domains; and gray represents otherwise unknown (i.e."cryptic") circular elements.The color key is also shown in panel A. (B) Chord diagrams linking HMRG (top of diagram) in the dataset to associated host class (bottom).The numbers on the diagram represent the ordering (and, by extension, counts) of the genes in the dataset.(C) Proportions of conjugative and non-conjugative elements (left to right) that carry 1+ HMRG (dark purple) or no HMRG (light purple).Proportions are normalized against the total number of unique assemblies.Viral genomes were excluded from this analysis.The color key is also shown in panel C.