Practical implications of erythromycin resistance gene diversity on surveillance and monitoring of resistance

Abstract Use of antibiotics in human and animal medicine has applied selective pressure for the global dissemination of antibiotic-resistant bacteria. Therefore, it is of interest to develop strategies to mitigate the continued amplification and transmission of resistance genes in environmental reservoirs such as farms, hospitals and watersheds. However, the efficacy of mitigation strategies is difficult to evaluate because it is unclear which resistance genes are important to monitor, and which primers to use to detect those genes. Here, we evaluated the diversity of one type of macrolide antibiotic resistance gene (erm) in one type of environment (manure) to determine which primers would be most informative to use in a mitigation study of that environment. We analyzed all known erm genes and assessed the ability of previously published erm primers to detect the diversity. The results showed that all known erm resistance genes group into 66 clusters, and 25 of these clusters (40%) can be targeted with primers found in the literature. These primers can target 74%–85% of the erm gene diversity in the manures analyzed.


INTRODUCTION
Antibiotic resistance is a global challenge, with increasing resistance to antibiotics threatening our ability to treat both human and animal diseases (WHO). Antibiotic use in human medicine and animal agriculture has increased environmental reservoirs of antibiotic resistance genes, which in turn has increased the risk of transmission of antibiotic-resistant bacteria to both humans and animals (McEwen and Fedorka-Cray 2002;Vaz-Moreira et al. 2014). This linkage has resulted in the prioritization of understanding how resistance moves from environmental sources to clinical pathogens and the associated influence of human activity. To understand the movement of antibiotic resistance in the environment, we need accessible tools that can provide large-scale surveillance of resistance in diverse environmental samples. Molecular microbiology advances have allowed us to leverage amplification and subsequent sequencing of DNA that encodes for antibiotic resistance genes, resulting in our awareness of an incredibly diverse global reservoir of environmental "resistomes". Generally, metagenomic shotgun sequencing is a costly tool for antibiotic gene surveillance as it provides information on 'all' genes in an environmental sample. Among these genes, only a fraction (0.01%-1%) are related to antibiotic resistance, resulting in a significant majority of sequences from metagenomes not readily usable for resistance detection (Shi et al. 2013;Li et al. 2015). A promising alternative to metagenomic sequencing is high-throughput amplicon qPCR assays, such as the Wafergen Smartchip that has been previously used for several resistance surveillance studies (Shi et al. 2013;Wang et al. 2014;Karkman et al. 2016;Muziasari et al. 2016;Stedtfeld et al. 2017). Unlike the broad scope of metagenomic sequencing, high-throughput qPCR assays target a suite of genes using primers and can quantify hundreds of targeted resistance genes and multiple samples simultaneously (e.g. one Wafergen Smartchip contains 5184 assays). Consequently, the price per gene or sample of these assays for resistance gene detection is orders of magnitude less than metagenomic sequencing, making it more conducive to largescale surveillance. A significant limitation of this technology is the need to develop primer-based assays for each targeted gene of interest that are effective for high-throughput amplification conditions.
We are increasingly aware that certain genes may be more related to the risks of the emergence or persistence of resistance than others. For example, integrons and sulfonamide resistance genes have been used to detect anthropogenic contaminants (Wang et al. 2014;Gillings et al. 2015). Further, specific environments (mammalian gut, manure, wastewater, etc.) have been observed to be enriched in antibiotic resistance genes relative to soil or water environments (Chee-Sanford et al. 2009;Koike et al. 2010;Garder, Moorman and Soupir 2014;Joy et al. 2014;Luby, Moorman and Soupir 2016), suggesting that these environments are potential reservoirs of resistance genes. Among the hundreds of genes associated with antibiotic resistance that are observed in environmental metagenomes, selecting the key targets relevant to the spread of resistance is a significant and important opportunity. In this study, we demonstrate how we have chosen specific genes that are the most effective among previously targeted genes to serve as indicators for antibiotic resistance and to understand resistance hotspots and transmission. This framework, while developed for agriculturally impacted environments, can be broadly applied to the selection of genes from varying resistance gene classes and environments. Specifically, this effort focuses on understanding the diversity of erythromycin ribosomal methylase (erm) gene and the most relevant gene targets for understanding the spread of erm-associated resistance from manure sources to the environment.
Erm genes encode resistance to macrolide antibiotics, which have long been used to treat Gram-positive and certain Gramnegative pathogens infecting humans, swine and cattle (Roberts et al. 2008;Pyörälä et al. 2014). Broadly, macrolide antibiotics act by binding to the 23S subunit of the bacterial ribosome, causing premature release of peptides during translation. The erm genes cause resistance by methylating rRNA at the active site, reducing the ability of macrolide antibiotics to bind to the ribosome (Weisblum 1998;Vester and Douthwaite 2001). Erm-mediated resistance to macrolides has also been observed to confer resistance against other antibiotics, including lincosamide and streptogramin B (MLS B resistance) (Leclercq and Courvalin 1991). The widespread use of macrolides and their relevance for both animal and human health has resulted in a research emphasis on erm genes and their bacterial hosts as key targets for understanding the development of resistance and its spread in agricultural environments. Previously, erm genes have been detected in various agricultural settings, including swine manure, lagoon water, soils, surface and subsurface drainage from fields, and groundwater surrounding and underlying animal production facilities (Chen et al. 2007;Knapp et al. 2010;Koike et al. 2010;Joy et al. 2013Joy et al. , 2014Whitehead and Cotta 2013;Fahrenfeld et al. 2014;Garder, Moorman and Soupir 2014;Soni et al. 2015;Luby, Moorman and Soupir 2016).
Most of our previous knowledge of erm genes and their associated amplicon targets have stemmed from the characterization and sequencing of bacterial isolates and their phenotypic resistance to MLS B antibiotics (Pyörälä et al. 2014). A total of 21 unique classes of erm genes have been identified based on sequence homology to protein-coding erm sequences from cultured bacteria (Roberts et al. 1999). More recently, metagenomic analyses of DNA from the total microbial community in environmental samples has expanded what is known about erm diversity beyond these 21 classes, showing that the erm class of genes is comprised of numerous sequence variants from diverse bacterial hosts (Fang et al. 2015;Li et al. 2015). These sequence variants are present in a range of abundances depending on their environment of origin. The focus of this study was to better understand the diversity of erm genes and to target the gene variants that could be indicative of resistance originating from manure and spreading to agricultural soil and water environments.

Phylogenetic analysis of erm genes
Gene sequencing sharing high similarity to ermA, ermB, ermC and ermF were obtained from publicly available databases. The Ribosomal Database Project Fungene Repository (Fish et al. 2013) was used to obtain ermB-and ermC-associated sequences. It was required that sequences share 97% amino acid sequence coverage to established HMM protein models for Fungene gene families "Resfam ermA", "Resfam ermB" and "Resfam ermC" (Version 8.8). Additionally, ermF gene nucleotide sequences were obtained from proteins listed in the ARDB-Antibiotic Resistance Genes Database (version 1.1, July 3, 2009) (Liu and Pop 2009) and associated with the annotation "ermF". All erm-associated sequences were combined and clustered at 99% nucleotide similarity using CD-HIT (v4.6.1c) (Li and Godzik 2006;Fu et al. 2012), resulting in 66 unique clusters. One representative sequence for each cluster was identified by CD-HIT and was aligned using Muscle (v3.8.31) (Edgar 2004) with the following parameters: gap open -400, gap extend 0, clustering method UPGMB. A maximum-likelihood phylogenetic tree was constructed from this alignment using Fast-Tree (v2.1.8) (Price, Dehal and Arkin 2010) with default parameters. Taxonomy was identified based on annotations in the NCBI non-redundant nucleotide database (NCBI Resource Coordinators 2017).
To consider an erm gene sequence to be associated with a previously targeted PCR primer sequence, both forward and reverse primers were required to share 100% nucleotide similarity over a minimum of 17 bp of the primer length.  Sequences were compared to representatives of erm genes described above (BLAST, v2.4.0+) (Camacho et al. 2009). Sequences were annotated as erm genes if they matched the representative sequence within a cluster with a minimum e-value of 1e-5 and if both paired-end reads matched the same representative target. The abundance of erm sequences in each sample was calculated as the total number of reads meeting these criteria.
Cattle manure metagenomes were obtained from a previously published study of antibiotic resistant genes in commercial cattle as they moved through the process of beef production from feedlot entry to slaughter (Noyes et al. 2016). The presence of erm sequences in these samples was determined by the total number of reads that shared sequence homology (BLAST, v2.4.0+, e-value 1e-5) to the best matched erm representative sequence. Similarly, metagenomes from human-impacted (Fitzpatrick and Walsh 2016) and pristine environment (Staley et al. 2013) were aligned against erm sequences and considered a match if alignment scores resulted in e-value scores of at least 1e-5.

RESULTS
A total of 5648 erm DNA sequences were identified from annotated genes based on sequence similarity to well-characterized erm genes and were clustered at 99% nucleotide similarity to Figure 1. Maximum likelihood phylogenetic tree of 66 erm sequence clusters based on 99% nucleotide similarity of 5648 DNA sequences extracted from known erm genes described in existing databases. Clusters that contain gene targets from existing PCR primers (see Table 2) are highlighted in color. The relative number of sequences comprising each cluster among the 5648 DNA sequences is also shown.
identify 66 unique erm variant clusters. A representative sequence of each cluster was defined as the longest consensus sequence in each cluster as determined by a greedy incremental clustering algorithm (see Methods, Table 1). These representative sequences were aligned and used to construct a phylogenetic tree describing the diversity of erm genes (Fig. 1). Based on sequence homology, the resulting erm gene clusters encompass the majority of erm genes studied in previous literature: ermA, ermB, ermC, ermF, ermG and ermT (reviewed in Roberts et al. 2008). Among the gene clusters, a cluster associated with ermA was the most represented in our erm gene database (Cluster 15, 3542 genes), followed by an ermB cluster (Cluster 18, 1387 genes), and then an ermC cluster (Cluster 30, 399 genes). These three gene clusters comprise 94% of erm genes and are evidence to biases in the previous characterization of erm genes towards specific gene variants. Beyond the three most abundant gene clusters, the next most represented cluster (Cluster 11, 50 genes) is not well-characterized (e.g. most similar to unannotated erm gene clusters in our database) and is most closely related to genes belonging to Streptococcus agalactiae strain TR7 (100% nucleotide identity). Most clusters (53 of 66) are associated with five or less gene sequences, demonstrating that much of what we know of specific erm gene families is based on very few characterized representatives.
the order Lactobacillales, while ermA and ermT genes were associated with members of the Bacillales order (Fig S2, Supporting  Information). These results demonstrate a wide range of potential host diversity for erm genes and highlight the impact of the choice of primer gene targets selecting for or against specific host bacteria. Historically, erm genes have been extensively targeted for qPCR quantification of gene abundances in the environment (Table 2), and we evaluated the ability of previously published PCR primers to detect the erm gene diversity described above by computationally hybridizing the primer sequences from the literature with the representative erm gene sequences in our database. Overall, published primer pairs were 100% similar to 25 of the representative sequences of erm clusters (Fig. 1). Generally, well-characterized gene clusters (e.g. containing the most known gene sequences) were observed to be associated with previous primer development. Several clusters were not associated with previously published primer targets, very likely due to the few well-characterized erm sequences within these clusters. Previously, observed diversity in natural samples have weak correlations with well-characterized genes (Choi et al. 2017), suggesting that primer targets selected based on the most wellstudied genes may not be effective in environmental samples.
We next evaluated the diversity of erm genes in 12 947 environmental metagenomes (Table S1, Supporting Information), resulting in the observation that significantly more erm genes are present in human-impacted environments (feces-and animalassociated soil and water) than in natural environments (Fig. 2). We also searched an additional 39 metagenomes originating from relatively pristine freshwaters along the Upper Mississippi River (Staley et al. 2013 , Table S1, Supporting Information), resulting in only 3 reads out of 716 million, sharing similarity (evalue < 1e-5) to erm genes. Combined, these results demonstrate that erm genes are rare in environments with minimal human impact and suggest that erm genes associated with feces or manure are ideal for tracking the spread and persistence of resistance through the environment. These results are also consistent with previous observations that manure contains abundant genes related to erm resistance and is a source of these genes into the environment (e.g. soil and water)  Table S1, Supporting Information). * For ermC gene, the average number of reads in animal-associated soil metagenomes was 1665 ± 659 reads. (Chee-Sanford et al. 2009;Koike et al. 2010;Heuer, Schmitt and Smalla 2011;Joy et al. 2013;Luby, Moorman and Soupir 2016).
Consequently, we next identified erm genes in manure metagenomes. We aligned erm gene sequences against metagenomes derived from two large manure metagenomic studies (requiring n manure > 3): swine manure collected near Nashua, IA (Luby, Moorman and Soupir 2016) and cattle manure from a previously published study (Noyes et al. 2016). These manure metagenomes were strategically selected based on the number of biological replicates and sequencing depth. Three erm clusters comprised 46% and 45% of the total abundance of erm genes in swine and cattle manure, respectively (Table S1, Supporting Information). The genes associated with these most abundant clusters differed between swine and cattle manures. In swine metagenomes, sequences associated with the ermB gene cluster (Fig. 1, sharing 93%-99% similarity) captured 26% of all erm sequences, followed by ermG-associated sequences capturing 11% and ermA-associated sequences capturing 9%. In cattle metagenomes, sequences associated with ermF represented 23.5% of all erm abundances, followed by sequences associated with ermG capturing 12.4% and sequences associated with ermB capturing 9%.
Only a subset of erm genes detected in manure are targeted by existing primer sets. Overall, a total of 25 out of the 66 erm clusters (40%) could be computationally detected with known primers (Table 1, Supporting Information), and these genes also encompass much of the total erm abundances observed in manure metagenomes. Collectively, if all primers were used, 74% and 85% of the total erm gene sequence diversity observed in swine and cattle metagenomes, respectively, could be detected, suggesting good coverage of these genes for PCR or qPCR assays. Specifically, in swine manure metagenomes, ermB primers could detect 29% of erm sequences, followed by ermF primers capturing 14% and ermG primers capturing 12% (Fig. 3). In cattle, ermF primers are the most effective, capturing 30% of erm sequences, followed by 21% with ermG primers, and 15% with ermB primers. Consequently, depending on the environmental sample in a study, in this case swine versus cattle manure, the choice of erm gene targets can significantly alter erm abundance estimations. For example, in swine manure, two times more erm gene abundance would be estimated if ermB primers were used instead of ermF primers. Even within the same gene clade, different primers could result in significant differences in abundance estimations, and this result is observed especially for ermC primers where a near two-fold difference in abundance estimations would result based on selection of primers from Patterson et al. (2007) versus Jensen, Frimodt-Moller and Aarestrup (1999). The selection of Patterson primers would result in the detection of genes from up to seven erm gene clusters over the two to three gene cluster detected with Jensen, Frimodt-Moller and Aarestrup (1999) or Koike et al. (2010) primers. Similar results are noted in the cattle manure, where ermC primers designed by Patterson capture 13% of the total abundance of erm sequences in the metagenomes, while Koike and Jensen primers only capture 4.4% and 2.1%, respectively. These results emphasize that the targeting of a specific erm gene, even within closely related gene variants, can significantly alter estimations of associated resistance in manures.
Thus, overall, for swine manure, the most effective gene target based on abundance in swine metagenomes (26% of erm genes) originates from an ermB cluster (Cluster 25) and is associated with Clostridium perfringens. The next most abundant ermB cluster in swine (Cluster 19, most similar to a gene in Streptococcus pneumoniae CGSP14) represented only 2% of erm abundances. These results indicate that while ermB primers can target multiple strains ( Fig. S1 and S2, Supporting Information), in these swine metagenomes, it is one gene cluster that specifically dominates. This gene cluster is also abundant in cattle manure metagenomes, though comprising less of total erm gene abundance (9%). Within our erm gene database, this particular sequence cluster is represented by a single gene representative and shares 100% similarity to experimental Clostridium acetobutylicum strains in the NCBI non-redundant gene database (mutant HQ683763.1 and clone HQ25744.1). The overall lack of similar homologous genes in NCBI nr suggests that this specific ermB gene is abundant in manures but is a gene for which we have few sequenced representatives. We identified this gene during our exploration of the effectiveness of current primers on manure metagenomes, and our observations suggest that this gene would benefit from further study given its prevalence.

DISCUSSION
Over the past 20 years, an abundance of literature has been published quantifying macrolide resistance in agricultural landscapes using qPCR approaches. However, these previous studies often use primers for erm genes designed in only a handful of publications (Table 2). Our study found that current published primer sets, used on their own, are effective at capturing only a subset of the erm diversity in manure samples. For example, if only one primer set were used, less than one-third of erm genes would be detected. To increase our ability to detect erm genes in agricultural systems, we identified the most abundant erm clusters in both swine and cattle manures, identifying the best gene targets for future studies. These genes and their associated primers are recommended for high-throughput qPCR assays that can scale the detection and quantification of these genes for antibiotic gene surveillance.
In all amplicon assays, quantifying environmental abundances of gene targets is limited by the effectiveness of primer design. The results presented here emphasize that estimates of abundances of a gene of interest cannot simply be based on primers to genes that have previously been successfully detected. Rather, genes appropriate for antibiotic gene surveillance should be indicative of the spread of resistance (e.g. originate from manure but lacking from pristine environments), representative of diverse hosts (especially those with clinical risks) and accurately represent gene abundances in environmental samples. Our specific effort targeted the erm gene and evaluated the effectiveness of previously published primers sets. The increasing availability of metagenomes makes these evaluations possible, as demonstrated in this study. Although metagenomic sequencing advances will continue to provide powerful tools to understand the broad diversity of resistance in environments, metagenomes are limited by both detection rate and resolution. Short read lengths, the difficulty of assembling many resistance genes (because of their common association with mobile elements containing repeated sequences) and their presence in multiple bacterial hosts challenges the detection of resistance genes using metagenomics. Going forward, high-throughput amplicon assays with strategic gene targets and primer designs are a complementary alternative to help fill these gaps and help us understand the movement of resistance genes among complex environments.

SUPPLEMENTARY DATA
Supplementary data are available at FEMSEC online.