Vittorio Tracanna, Anne de Jong, Marnix H. Medema, Oscar P. Kuipers, Mining prokaryotes for antimicrobial compounds: from diversity to function, FEMS Microbiology Reviews, Volume 41, Issue 3, May 2017, Pages 417–429, https://doi.org/10.1093/femsre/fux014
Abstract
The bacterial kingdom provides a major source of antimicrobials that can either be directly applied or used as scaffolds to further improve their functionality in the host. The rapidly increasing amount of bacterial genomic, metabolomic and transcriptomic data offers unique opportunities to apply a variety of approaches to mine for existing and novel antimicrobials. Here, we discuss several powerful mining approaches to identify novel molecules with antimicrobial activity across structurally diverse natural products, including ribosomally synthesized and posttranslationally modified peptides, nonribosomal peptides and polyketides. We not only discuss the direct mining of genomes based on identification of biosynthetic gene clusters, but also describe more advanced and integrative approaches in ecology-based mining, functionality-based mining and mode-of-action-based mining. These efforts are likely to accelerate the discovery and development of novel antimicrobial drugs.
INTRODUCTION
Microbial specialized metabolites are the major source of antimicrobials currently used in the clinic, in agriculture and in food manufacturing. Due to the rapid development and spread of resistance against these molecules, there is an urgent need for novel compounds that can supplement the current arsenal. Since the ‘golden age of antibiotics’ in the sixties and seventies, there has been a steady decrease of novel antibiotic molecules entering the market. However, the recent development of computational genomic approaches to natural product discovery is replenishing hopes that this trend can be turned around: in prokaryotic genome sequences, tens of thousands of biosynthetic gene clusters (BGCs) have been identified (Cimermancic et al.2014; Doroghazi et al.2014; Skinnider et al.2016). Thousands of these BGCs are likely to encode the biosynthesis of thus far unknown molecules (Dejong et al.2016). To uncover these, many innovative approaches are being developed to link them to metabolomic data with high throughput (Kersten et al.2011, 2013; Medema et al.2014; Mohimani et al.2014) or to refactor synthetic versions of them for heterologous expression (Chang et al.2013; Shao et al.2013; Yamanaka et al.2014; Kang, Charlop-Powers and Brady 2016; Montalbán-López and Kuipers 2016; van Heel et al.2016). However, detailed biochemical characterization of biosynthetic pathways and their products is still painstakingly slow and laborious, and many BGCs encode the production of natural products without any (useful) antimicrobial activity. Therefore, targeted approaches are needed to selectively mine genomes for natural products with antimicrobial properties, and to narrow down from tens of thousands of potentially interesting BGCs to manageable numbers that can be tested in the laboratory. 
Here, we outline and discuss several currently emerging computational and experimental strategies to this end, based on the analysis of chemical diversity, ecology and evolution, organismal function and/or modes of action.
DIVERSITY-BASED MINING FOR ANTIMICROBIALS
Genome-based identification of biosynthetic pathways for antimicrobials
Genomic information has become more and more important in the process of identifying novel biosynthetic pathways for the production of antimicrobials. To identify the potential of a bacterium to produce bioactive natural products, mining for BGCs is particularly useful. A wide range of bioinformatic tools (e.g., antiSMASH (Weber et al.2015), BAGEL3 (van Heel et al.2013) and PRISM (Skinnider et al.2016)) are available to identify these, mostly based on shared properties among known classes of biosynthetic pathways (Boddy 2014; Medema and Fischbach 2015; Pi et al.2015; Ziemert, Alanjary and Weber 2016). For instance, the modification enzymes for the production of lanthipeptides are well conserved (Knerr and van der Donk 2012); as such, they can be used as ‘anchors’ or ‘signatures’ for genome mining.
Taxonomically, there are significant differences in biosynthetic richness and diversity across various branches of the tree of life (Fig. 1). Traditionally rich sources of natural products, such as Streptomyces, Bacillus (Zhao and Kuipers 2016) and Pseudomonas, are now being supplemented with newly discovered genera. A great example of taxonomic diversity-based mining for antimicrobials is provided by Entotheonella, a group of sponge-microbiota-derived organisms that can produce extremely highly modified peptides with over six different posttranslational modifications, including D-amino acids, methylated amino acids and dehydrated amino acids, and with dozens of residues modified (Wilson et al.2014). Several other microbial taxa with high biosynthetic potential, such as Clostridium, Burkholderia, Pseudonocardia, Photorhabdus, Xenorhabdus, Chitinophaga, Herpetosiphon and Planctomyces, are emerging as novel target genera (Challinor and Bode 2015). Most of these genera are characterized by large genome sizes, a feature that has been shown to correlate with an increased percentage of genomic capacity devoted to specialized metabolism (Cimermancic et al.2014). However, a very large percentage of extant biosynthetic diversity in the biosphere is still found in the smaller genomes of the majority of bacteria, which should not be discarded in the search for antimicrobials. For example, the biosynthesis of the recently discovered non-ribosomal peptide antibiotic lugdunin (Zipperer et al.2016) is encoded in the genome of the human commensal Staphylococcus lugdunensis, which harbors just four other BGCs (as identified in antiSMASH-DB, Blin et al.2017). The wide diversity of BGCs found in large as well as small genomes emphasizes the almost boundless chemical diversity of nature, and it is not at all unlikely that new classes of antimicrobials will be discovered from many novel sources in the next decades.
Figure 1. Taxonomic diversity of BGCs across bacteria and archaea. Bar plots indicating gene cluster counts as detected by antiSMASH+ClusterFinder (Cimermancic et al.2014; Weber et al.2015) are plotted onto prokaryotic taxonomy. Circles on the tree indicate the amount of within-taxon variation, estimated using the quadratic entropy index. Figure adapted from Cimermancic et al. (2014) with permission.
Even though natural product BGCs can thus be accurately identified and quantified, the question still remains which of these are most likely to encode the production of potent antimicrobials. Not only the taxonomic origins, but also the chemical structures of antimicrobials are highly diverse. Indeed, known antimicrobial compounds represent a cross-section of several chemical and biosynthetic classes, such as ribosomally synthesized and posttranslationally modified peptides (RiPPs), non-ribosomal peptides, polyketides, terpenoids and even oligosaccharides (see Table 1 for a comprehensive schematic overview). Although the percentage of antimicrobials is probably higher among RiPPs than among saccharides, for example, each chemical class of natural products comprises both many antimicrobials and many compounds with different biological activities. To prioritize specifically for antimicrobials, it is necessary to go beyond the genome sequences and couple the genomic information to ecological and functional data.
Table 1. Overview of known antimicrobial compound classes. For information on the biosynthetic diversity of all these compounds the reader is referred to several excellent recent reviews (Sanchez, Chiang and Wang 2008; Strieker and Marahiel 2009; Arnison et al.2013; Fisch 2013; McCranie and Bachmann 2014; Yim et al.2014; Dickschat 2016; Helfrich and Piel 2016; Ortega and van der Donk 2016).
ENVIRONMENTAL AND ECOLOGY-BASED MINING FOR ANTIMICROBIALS
Using metagenomics to chart biosynthetic diversity
From soil to ocean, from plant roots to animal guts, the ecosystems in which natural products are found are highly diverse. Also within these ecosystems, the diversity is enormous: a gram of soil is estimated to contain hundreds to thousands of different species (Curtis, Sloan and Scannell 2002) that form an extremely intertwined society. The metabolic potential hidden in those communities is immense, and systematic analysis of soils across the globe shows very little overlap between the secondary metabolite repertoire of similar soils (Charlop-Powers et al.2014). Potentially, understanding of ecology and microbial communities can be used to chart this variation and prioritize BGCs that are likely to encode the synthesis of molecules that function as potent antimicrobials. Metagenomics is a key technology that allows surveying BGCs and their abundances across varying communities (Wilson and Piel 2013; Charlop-Powers, Milshteyn and Brady 2014).
In the hunt for novel antimicrobials from the environment, there are two important strategies: searching for novel chemical scaffolds and searching for novel congeners. Compounds with novel scaffolds are more difficult to discover (also due to rediscovery of known molecules for which no BGCs have yet been characterized), and predicting chemical structures and biological activities from BGC sequence data alone is very challenging. BGCs encoding the biosynthesis of potential congeners, variants upon an existing (and often extensively studied) molecular scaffold, however, can easily be identified based on sequence homology. While sharing the biosynthetic origin with their well-known counterparts, the small differences in the structures of some congeners can have a major effect on different characteristics of the molecule. Notably, those changes may affect the potency, toxicity profile, or the target of the compound, or even the resistance of microorganisms against it. Phylogenetic studies on novel variants of known BGCs can potentially be used to infer the substrates involved and the final products synthesized by the encoded pathway. Regardless of the product type and its discovery method, engineering expression of identified BGCs in a native or heterologous host is frequently necessary for both novel scaffolds and congeners, and is currently a rate-limiting step in antibiotic discovery. Another bottleneck lies in the charting of biosynthetic diversity, as this is often limited to organisms that are easy to culture in the laboratory.
Metagenomics can overcome culture restrictions by sampling material directly from the environment of interest. Alternatively, functional amplicon sequencing approaches can achieve high sequencing depth for BGCs by targeting the shared conserved regions of, e.g., polyketide synthases or non-ribosomal peptide synthetases with degenerate primers based on previously characterized genes. The type of molecule whose biosynthesis is encoded by an underlying BGC can be predicted by assessing the sequence similarity between the amplicon in question and entries in curated data aggregation platforms such as MIBiG (Medema et al.2015) and eSNaPD (Reddy et al.2014). Primer pairs designed for BGC classes that specifically encode the biosynthesis of antimicrobial compounds can be used to target BGCs for potential congeners of known antimicrobials. The enormous cost decrease of high-throughput sequencing technologies now also allows generation of the immense quantity of data necessary to assemble BGCs directly from microbial communities. In shotgun metagenomics, environmental DNA (eDNA) is extracted from a community sample and sequenced with short-read NGS technology. Sequence information can be assembled with metagenomics-specific assemblers such as metaSPAdes (Nurk et al.2016) or Ray Meta (Boisvert et al.2012). Additional sequencing platforms such as PacBio and, more recently, Oxford Nanopore can be used to produce long sequence reads, which can aid in assembling contigs long enough to harbor complete BGCs, even in complex communities. Perhaps even more powerful to this end are artificial long-read technologies: due to their high throughput, platforms such as 10X Genomics and TruSeq may start a new era in metagenomics studies. These synthetic long-read platforms can be used to reconstruct large numbers of long eDNA stretches with low error rates from complex metagenomes. 
eDNA is first digested into high molecular weight fragments, which are sorted and barcoded in different pools. Standard shotgun sequencing of each pool is then followed by an assembly of each high molecular weight DNA molecule. Although some species bias is introduced during library preparation, the approach has been shown to enable the assembly of synthetic long reads from relatively rare microorganisms in soil (Sharon et al.2015). TruSeq has the read length necessary to assemble whole-length BGCs even in complex metagenomes, aided by specialized algorithms such as TruSPAdes (Bankevich and Pevzner 2016).
Metagenome sequencing is able to find novel BGCs regardless of their conservation or representation across known genomes. The main advantage of these techniques compared to amplicon-based approaches lies in their unbiased nature (Table 2). The main disadvantage of this approach lies in the data complexity. However, a plethora of different solutions are becoming available to tackle this. One direct solution that allows untangling of the community is represented by single-cell sequencing. Although the protocols vary and evolve over time, the scope is unchanged: a single cell is isolated, its DNA extracted and amplified to undergo a PCR screen or genome sequencing step. There are examples of successful application of single-cell sequencing to identify important secondary metabolite gene clusters such as the apratoxin pathway from a filamentous cyanobacterium (Grindberg et al.2011).
Table 2. Strong and weak points of the different sequencing methods are described in this table. In the figures, arrows represent genes that are part of a BGC, and genes with the same colors originate from the same operon. The pins in the top right figure indicate conserved stretches targeted by custom primers.
Important steps are being taken to apply the latest chromatin fixation techniques, such as Hi-C and 3C, to improve metagenomic assemblies. So far, these techniques have been used successfully to aid the assembly of synthetic metagenomes (Beitel et al.2014; Burton et al.2014). Potentially, fixation techniques can become powerful tools to validate metagenomics assemblies, as they provide an extra layer of information that is not used by current binning tools. In addition, information on contiguity or genomic distance between contigs will make it possible to reconstruct longer and more complete clusters from raw sequence data.
Metagenomics-derived BGCs are not easy to revive in heterologous hosts. For instance, sufficient amounts of physical DNA containing the BGC of interest are often not readily available; hence, the isolation of the natural producer or synthetic refactoring of the gene cluster is required. Also, the transcriptional regulation and the precursors required for the encoded pathway are sometimes not functionally available in classic hosts. Synthetic DNA costs have decreased substantially in recent years, opening the path to high-throughput expression of BGCs through synthesis and refactoring in order to match the impressive output of metagenomics analysis. Refactored BGCs are designed to achieve better control over expression levels in the heterologous host, by replacing native regulation with synthetic promoters, ribosome-binding sites and terminators (Medema et al.2011; Smanski et al.2016). Also, codon usage can be redesigned to match the host to increase mRNA translation speed. Further manipulation of refactored BGCs is much easier compared to the original, greatly reducing the development time for BGC-derived products. Key challenges that need to be overcome here are DNA synthesis costs, tuning stoichiometry of gene expression and avoiding the introduction of unexpected functions into synthetic DNA.
Ecology-based prioritization for antimicrobial function
To understand bacteria and harness their metabolic potential, it is important to consider them within a microbial community framework. Indeed, bacteria that thrive under physicochemical conditions in which they have an elaborate interaction network with other species are excellent targets for identification of secondary metabolites. Accordingly, complex communities are reported to be more resistant to invasion from alien species (van Elsas et al.2012). Multiple factors can influence community resilience, including the production of secondary metabolites with a negative interspecies interaction function, i.e. antibiotics (Cordero et al.2012).
Given the ubiquitous nature of potential target communities, the selection of promising candidates is vital during the experimental design phase. Biologists can use knowledge of an unexpected phenotype to deduce the presence of potent antimicrobial compounds. Marine sponges are a great example of prime candidates when hunting for novel and potent antibacterial compounds. Sponges have a strong endosymbiotic relation with bacteria, to the point that up to 40% of their biomass is composed of bacterial cells (Friedrich et al.2001). The microbial community inside the sponge is radically different from that of the surrounding water. Recently, sponge communities have been successfully targeted for identification of novel natural products with antibacterial properties, such as polytheonamides (Trindade-Silva et al.2012), which might play a role in protecting the sponge host against predators. Suppressive soils are another great example of ecology-inspired mining. These soils provide protection to plants against specific pathogens. Conducive soils, which do not restrain the development of a disease, can be gradually transformed into suppressive soils by infecting plants that grow in them over multiple cycles (Berendsen, Pieterse and Bakker 2012). Hence, by investigating the BGCs that are abundant in or expressed by a community in specific conditions (e.g. treatment of suppressive soil with a pathogen compared to control), candidates responsible for the phenotype can be prioritized: this type of strategy has led to the discovery of the lipopeptide thanamycin, which suppresses fungal root pathogens (Mendes et al.2011; Watrous et al.2012).
Principled approaches to prioritize environmental BGCs based on ecology require more than just bare metagenome sequences. Specifically, metatranscriptomics and/or environmental metadata can be used to map and understand differences in BGC abundance and expression, in order to prioritize for those that are most likely to function as antimicrobials. Also, metadata on environmental and physicochemical conditions pertaining to microbial communities can potentially be used to direct the search for BGC hotspots. Localized nutritional hotspots, such as organic particles in the ocean, may create highly competitive environmental niches, where microbes utilize antimicrobials to secure resources against competitors. Therefore, metadata on, e.g., nutrient availability could potentially be exploited to identify priority targets. When metatranscriptomics data are generated for diverse samples for which metadata are also recorded, one could even identify BGCs that are expressed specifically under such conditions; these would then have an elevated probability to be involved in generating antimicrobial activity.
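As an illustration of the expression-based prioritization described above, the following minimal sketch ranks BGCs by their induction in a condition of interest (e.g. pathogen-treated suppressive soil) relative to a control. It is not a published pipeline: the function name, the input layout, the use of a per-BGC normalized expression value and the pseudocount are all assumptions made for this example, and a real analysis would use replicate-aware statistics.

```python
import math
from statistics import mean

def rank_bgcs_by_induction(expr, pseudocount=1.0):
    """Rank BGCs by log2 fold change between treated and control samples.

    expr maps a BGC id to {"treated": [...], "control": [...]}, where the
    lists hold normalized expression values per sample (assumed input,
    e.g. expression summed over all genes of the cluster).
    Returns (bgc_id, log2_fold_change) tuples, most induced first.
    """
    scores = []
    for bgc, groups in expr.items():
        # The pseudocount avoids division by zero for silent clusters.
        treated = mean(groups["treated"]) + pseudocount
        control = mean(groups["control"]) + pseudocount
        scores.append((bgc, math.log2(treated / control)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy values, invented purely for illustration.
toy = {
    "bgc_A": {"treated": [7, 7], "control": [1, 1]},
    "bgc_B": {"treated": [3, 3], "control": [3, 3]},
}
ranking = rank_bgcs_by_induction(toy)
```

BGCs at the top of such a ranking would be candidates for follow-up, under the assumption stated in the text that condition-specific expression raises the probability of an antimicrobial role.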
Several tools and databases are available to study metagenomes in the context of metadata: e.g. the eSNaPD webserver (Owen et al.2013) is a metagenomics atlas that holds BGC distributions in soil across the globe. Different physicochemical characteristics and secondary metabolite repertoires are aggregated for over 100 samples of different biomes. The tools integrated in the webserver allow a quick and intuitive inspection of the data. Similarly, the Tara Oceans webserver (https://www.embl.de/tara-oceans/) hosts metagenomic information and metadata on a wide range of marine environments. Particular attention was directed towards the different sampling depths, as they are related to the light intensity, an essential driver of community composition. For the human microbiome, which also hosts a wide range of BGCs (including antimicrobials, see Donia et al.2014), several datasets are available with rich clinical metadata, such as the Belgian Flemish Gut Flora Project (Falony et al.2016) and the Dutch LifeLines-DEEP study (Zhernakova et al.2016).
As more and more metagenomes become available with better assemblies (and including more and more metagenome-assembled genomes), richer metadata and rapidly rising amounts of comprehensive (time-series) metatranscriptomics data, the opportunities for ecology-based antibiotic discovery are likely to increase drastically.
FUNCTION-BASED MINING FOR ANTIMICROBIALS
Use of protein domains for genome mining based on predicted function
In any strategy for genome-based mining for antimicrobials, a key step is the identification of genes involved in their biosynthesis. A robust method to achieve this, especially for proteins that do not have high similarity to proteins with known function, is through the detection of one or more conserved protein domains using curated models. Certain protein domains or domain combinations are indicative of biochemical functions that are specific to certain biosynthetic pathways; hence, they can be used as ‘anchors’ or ‘signatures’ to identify certain classes of BGCs. The gene cluster identification algorithms in tools such as antiSMASH, BAGEL3 and PRISM are based on this principle. As we learn more about the enzymology of novel classes of natural products, this will allow the addition of more ‘domain markers’ that can be used for new mining strategies. Recently, such strategies have been used to systematically mine genomes for biosynthetic pathways encoding cyanobactins (Leikoski et al.2013), thiazole/oxazole-modified microcins (Cox, Doroghazi and Mitchell 2015) and enediynes (Shen et al.2015).
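The 'anchor' principle described above can be sketched in a few lines. The example below is a minimal illustration and not the algorithm used by antiSMASH, BAGEL3 or PRISM: it assumes that genes have already been annotated with protein-domain hits (e.g. from a profile HMM search), and the rule set, domain names and window size are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int            # start coordinate on the contig (bp)
    end: int              # end coordinate on the contig (bp)
    domains: frozenset    # domain names hit in this gene (assumed precomputed)

# Hypothetical rules: a class is called when all listed domains occur
# within one window. Curated rule sets in real tools are far richer.
RULES = {
    "lanthipeptide-like": {"LanB_dehydratase", "LanC_cyclase"},
    "NRPS-like": {"Condensation", "AMP-binding", "PCP"},
}

def find_candidate_clusters(genes, window=20_000):
    """Return (class, first_gene, last_gene) for each window in which
    every signature domain of a rule is present."""
    genes = sorted(genes, key=lambda g: g.start)
    hits = []
    for i, anchor in enumerate(genes):
        # Genes whose start lies within `window` bp of the anchor gene.
        nearby = [g for g in genes[i:] if g.start - anchor.start <= window]
        seen = set().union(*(g.domains for g in nearby))
        for cls, required in RULES.items():
            # Require the first gene to carry a signature domain, so that
            # windows merely starting upstream of a cluster are not reported.
            if required <= seen and anchor.domains & required:
                contributing = [g for g in nearby if g.domains & required]
                hits.append((cls, anchor.name, contributing[-1].name))
    return hits

# Toy contig, invented for illustration.
toy_genes = [
    Gene("a", 0, 1000, frozenset({"LanB_dehydratase"})),
    Gene("b", 1500, 2500, frozenset()),
    Gene("c", 3000, 4000, frozenset({"LanC_cyclase"})),
    Gene("d", 50000, 51000, frozenset({"Condensation"})),
]
clusters = find_candidate_clusters(toy_genes)
```

In this toy case only genes a to c form a candidate, because the lone condensation domain in gene d does not satisfy the full hypothetical NRPS-like rule.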
Identifying natural products by phenotypic and metabolic profiling
Several types of high-throughput profiling methods have become available to screen large collections of organisms for their natural-product-producing potential. Besides (meta)genomic information, these also leverage phenotypic and metabolomic data.
Traditionally, phenotypic information has been key in the discovery of antimicrobials, as strain collections were screened for activity against various pathogens. Besides conventional growth inhibition assays, an interesting emerging technique is cytological profiling, which allows ‘function-first’ mining of natural products with desired biological activities by predicting the mechanisms of action based on microscopy analysis of the cellular responses of target strains (Nonejuie et al.2013; Potts et al.2013; Schulze et al.2013; Woehrmann et al.2013). When applied onto microbial extracts, this technique can be used to specifically search for natural products with certain functional profiles (i.e. activity profiles that match those of molecules with known modes of action) (Ochoa et al.2015). A proof-of-principle study in Bacillus subtilis showed that through activity-guided purification, molecules with multiple mechanisms of action can even be separated based on this technology (Nonejuie et al.2016). Moreover, connecting molecules to activities can even be automated through a procedure called Compound Activity Mapping (Zhang et al.2016), which applies networking analysis to correlate specific molecules to each activity.
Metabolic profiling can be done, for example, through large-scale mass spectrometric analysis of strain collections in order to identify all the molecular families (Watrous et al.2012; Nguyen et al.2013) that are produced under typical laboratory conditions. This is facilitated by public platforms such as Global Natural Products Social Molecular Networking (http://gnps.ucsd.edu) (Wang et al.2016), which allow connecting metabolomic data from multiple sources. A recent study showed how comprehensive molecular networking analysis of hundreds of strains from the same genus (Pseudomonas) can effectively identify novel natural products by comparing all molecules observed across these strains with a set of reference molecules (Nguyen et al.2016). For some compound types, such as RiPPs and NRPs, dereplication (i.e. identification of novelty) can also be automated with algorithms such as DEREPLICATOR (Mohimani et al.2016). A disadvantage of metabolomics-based natural product discovery is that one can only observe molecules that are produced under the conditions tested; yet, this can be partially mitigated by performing molecular networking on co-cultivated strains (Traxler et al.2013; Briand et al.2016).
Although the throughput of metabolic and cytological profiling methods is increasing, smart selection strategies for strains to be used as input are still very important. Genome-based analysis of biological features that correlate with natural product biosynthetic potential could be an interesting strategy for this. For example, bacteria that have a large capacity for the biosynthesis of posttranslationally modified (and thus more stable) ribosomally synthesized natural products (instead of unmodified bacteriocins) would be expected to show relatively high proteolytic activity. Hence, genomic profiling of proteolytic capacity could enable selection of taxonomic groups that are more likely to yield novel RiPPs, as the RiPP BGC types that can be detected with current bioinformatic methods are likely to be just the tip of the iceberg. Such an analysis could help identify bacteria that are good candidates for discovery of novel RiPP classes. Figure 2 shows that, indeed, for a number of taxa, the presence of proteases and peptidases in the genome correlates clearly with the presence of lanthipeptide BGCs. To analyze this, we selected 180 peptidase and 30 protease motifs from the Pfam database (Table S1, Supporting Information) and screened all complete bacterial genomes for the number of motifs present to get a rough indication of the proteolytic activity of each bacterium. One would expect highly proteolytic bacteria such as Bacillales and Actinomycetes to produce a high fraction of modified peptides relative to unmodified ones, to avoid proteolytic degradation. Based on simple matching, the correlation between the presence of predicted lantibiotics and high protease or peptidase abundance (number of motifs above the median) is 0.75 and 0.62, respectively. Although the expected correlation can be shown in several cases, it is not always consistent. 
Staphylococci and corynebacteria show low proteolytic enzyme-coding gene abundance, but do encode the biosynthesis of many putative lantibiotics. Clearly, there are other factors also involved in the evolution of RiPP repertoires. One of them could be the nature of the environment the producing bacteria are thriving in, which could also be very proteolytic. Still, exploration of these and other genomic and phenotypic features that correlate with biosynthetic potential could be useful for strain selection purposes in large-scale genomic and metabolomic experiments.
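To make the 'simple matching' comparison above concrete, the sketch below computes the fraction of genomes in which two binary traits agree: presence of a predicted lanthipeptide BGC, and a proteolytic motif count above the median. We assume here that 'simple matching' refers to this simple matching coefficient; the function name and the toy data are hypothetical and do not reproduce the values in Table S1.

```python
from statistics import median

def simple_matching(has_bgc, motif_counts):
    """Fraction of genomes in which 'lanthipeptide BGC present' agrees
    with 'motif count above the median' (both treated as binary traits).

    has_bgc:      genome id -> bool (predicted BGC present or not)
    motif_counts: genome id -> number of protease/peptidase motifs found
    """
    cutoff = median(motif_counts.values())
    agree = sum(
        has_bgc[g] == (motif_counts[g] > cutoff) for g in motif_counts
    )
    return agree / len(motif_counts)

# Toy data, invented for illustration only.
motifs = {"g1": 10, "g2": 40, "g3": 55, "g4": 5}
bgc = {"g1": False, "g2": True, "g3": True, "g4": True}
score = simple_matching(bgc, motifs)
```

In the toy data, genome g4 plays the role of the exceptions noted in the text: it carries a predicted BGC despite a low motif count, and therefore lowers the score.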
Figure 2. Correlation between the presence of lanthipeptide BGCs and the predicted proteolytic activity. Top: taxonomic tree of Gram-positive bacteria, in which green branches indicate the putative presence of lanthipeptide biosynthetic genes. The purple bars below the tree indicate the predicted abundance of proteolytic enzyme-coding genes. Organism names are shown in gray and are listed in Table S1. The orange bars indicate the position of groups of bacteria.
MODE-OF-ACTION-BASED GENOME MINING FOR ANTIMICROBIALS
As indicated earlier, it is difficult to predict which BGCs encode the production of natural products with antimicrobial activity from sequence alone. Yet, there are at least two potential options to do so: resistance-based genome mining and mining for synergistic antibiotics.
Resistance-based mining (recently also reviewed in some detail by Ziemert, Alanjary and Weber 2016) utilizes the fact that many BGCs encoding the biosynthesis of antibiotics will also encode one or more genes that confer self-resistance to the molecule produced, in order to avoid suicide. These self-resistance genes are frequently the same as or very similar to the resistance genes used by other bacteria to evade antibiotics, and are probably acquired by the latter through horizontal gene transfer (Mak et al.2014; Ogawara 2016); they include transporters, drug-modifying enzymes and, most interestingly, paralogous genes encoding ‘resistant’ copies of housekeeping proteins that are targeted by the antibiotic. Hence, self-resistance genes identified in BGCs can be very effective predictors of the antibiotic function of their products. Moreover, if these genes are resistant paralogs of housekeeping genes that are targeted by the antibiotic, they can even be used to predict the target. Recently, Tang et al. (2015) used this approach to identify the BGC for the thiotetronate antibiotic thiolactomycin in Salinispora genomes based on the presence of a resistant copy of a fatty acid synthase gene. Similarly, Yeh et al. (2016) identified the biosynthetic genes for the proteasome inhibitor fellutamide B through the identification of a proteasome subunit-encoding gene inside a gene cluster. Intriguingly, the strategy can also be used to predict natural products with new modes of action; Johnston et al. (2016) recently used resistance-based mining to show that the telomycin family of natural products targets the phospholipid cardiolipin.
Several good libraries of known self-resistance genes and detection models for them are now available (Jia et al.2016). In particular, the ResFams database (Gibson, Forsberg and Dantas 2015) contains a rich set of profile Hidden Markov Models (pHMMs) that allow detection of resistance genes against a wide range of antibiotics; in their abovementioned study, Johnston et al.(2016) supplemented these pHMMs with an additional 91 models for additional resistance genes found in the literature. Additionally, phylogenetic analysis (reminiscent of the EvoMining approach recently described by Cruz-Morales et al.2016) may make it possible to hunt for additional resistant paralogs of housekeeping genes, by identifying aberrant copy numbers of such genes in genomes, where the additional copy has long branch lengths and resides inside a BGC. By doing this systematically, while using inclusive and broad probabilistic BGC predictions as offered by algorithms such as ClusterFinder (Cimermancic et al.2014), large numbers of both novel modes of action and novel antibiotic biosynthetic classes may be discovered.
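The copy-number heuristic outlined above can be sketched as follows. This is a simplified illustration under stated assumptions, not the EvoMining method or the procedure of Johnston et al.: gene family assignments and BGC coordinates are taken as precomputed inputs, the expected copy numbers are hypothetical, and the branch-length criterion mentioned in the text is omitted.

```python
from collections import defaultdict

def extra_copies_in_bgcs(genes, bgcs, expected_copies):
    """Flag surplus copies of housekeeping genes that lie inside a BGC.

    genes:           list of (gene_id, family, position) on one contig
    bgcs:            list of (bgc_id, start, end) intervals
    expected_copies: family -> usual copy number (hypothetical input)
    Returns (gene_id, family, bgc_id) for copies located inside a BGC in
    genomes carrying more copies of the family than expected.
    """
    by_family = defaultdict(list)
    for gene_id, family, pos in genes:
        by_family[family].append((gene_id, pos))

    candidates = []
    for family, members in by_family.items():
        if family not in expected_copies:
            continue  # not a housekeeping family being tracked
        if len(members) <= expected_copies[family]:
            continue  # no surplus copy in this genome
        for gene_id, pos in members:
            for bgc_id, start, end in bgcs:
                if start <= pos <= end:
                    candidates.append((gene_id, family, bgc_id))
    return candidates

# Toy genome, invented for illustration; identifiers are placeholders.
toy_genes = [
    ("fabF_1", "FabF", 1000),
    ("fabF_2", "FabF", 52000),
    ("rpoB_1", "RpoB", 3000),
]
toy_bgcs = [("bgc1", 50000, 70000)]
flagged = extra_copies_in_bgcs(toy_genes, toy_bgcs, {"FabF": 1, "RpoB": 1})
```

A flagged gene is only a candidate resistant paralog; a phylogenetic check of the kind described in the text would still be needed to separate divergent resistant copies from recent duplications.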
The second possible bioinformatic strategy to prioritize BGCs for their potential to encode novel antibiotics is based on synergistic interactions. Many cases are known in which a combination of two bioactive compounds acts in a synergistic way to obtain higher antimicrobial efficacy or circumvent resistance mechanisms in the target strains. A famous example of this, that is still used in the clinic, is Augmentin®, a combination of the beta-lactam antibiotic amoxicillin with the beta-lactamase inhibitor clavulanic acid. In theory, synergistic pairs of natural products have a major advantage in the battle against antimicrobial resistance, as it is more difficult for pathogens to develop resistance against both molecules. Synergistic pairs of natural products also occur in nature. In fact, Augmentin was modeled after the combination of cephamycin and clavulanic acid, which is naturally produced by Streptomyces clavuligerus. In the genome of S. clavuligerus, the BGCs for the two compounds are intertwined in a ‘supercluster’ configuration, allowing the coordinated regulation of both biosynthetic pathways (Ward and Hodgson 1993; Medema et al.2010). Intriguingly, multiple other cases have been identified where two synergistic natural products are jointly encoded in such superclusters: examples include the synergistic antibiotics lankamycin and lankacidin (Suwa et al.2000), which bind complementary sites on the large ribosomal subunit (Belousoff et al.2011), and the synergistic antibiotics griseoviridin and viridogrisein (Xie et al.2016). Hence, it is likely that intertwined supercluster configurations are indicative of synergistic interactions between natural products, which thus far has only been linked to antibiotic function. 
Automated comparative genomic analysis of exceptionally large ‘hybrid’ BGCs (encoding enzymes to generate multiple different scaffold types), as predicted by tools such as antiSMASH, may make it possible to trace the evolutionary histories of their genes. This would allow assessing whether such a BGC is in fact a ‘supercluster’ that directs the production of multiple different (and possibly synergistic) molecules, based on whether it originated relatively recently by the merger of two previously independent gene clusters. Furthermore, analysis of regulatory motifs for transcription factor binding sites (Wolf et al.2016) can potentially distinguish such superclusters from pairs of gene clusters that merely happen to be encoded next to each other on the genome, by assessing the likelihood that genes on both sides of the putative supercluster are in fact coregulated.
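As a rough illustration of how such a prioritization might be set up, the sketch below filters large multi-scaffold BGCs and ranks them by the number of regulatory motifs shared between their two flanks. It assumes that cluster length, scaffold types and per-flank motif identifiers have already been extracted from upstream tools. The `Bgc` record, the function name and the 100 kb length threshold are hypothetical choices made for this example, not values taken from antiSMASH or from the cited studies. It does not attempt the evolutionary analysis of cluster mergers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bgc:
    bgc_id: str
    length_bp: int
    scaffold_types: frozenset   # e.g. {"nrps", "t1pks"}; labels are assumptions
    left_motifs: frozenset      # motif IDs found upstream of genes on one flank
    right_motifs: frozenset     # motif IDs found upstream of genes on the other


def supercluster_candidates(bgcs, min_length_bp=100_000, min_types=2):
    """Return (bgc_id, shared_motifs) for large hybrid BGCs.

    Both thresholds are arbitrary illustrative defaults. Candidates whose
    flanks share more regulatory motifs (a crude proxy for coregulation)
    are listed first.
    """
    candidates = []
    for bgc in bgcs:
        if bgc.length_bp < min_length_bp:
            continue
        if len(bgc.scaffold_types) < min_types:
            continue
        shared = sorted(bgc.left_motifs & bgc.right_motifs)
        candidates.append((bgc.bgc_id, shared))
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))
    return candidates


# Toy input with made-up identifiers.
bgcs = [
    Bgc("bgc_a", 150_000, frozenset({"nrps", "t1pks"}),
        frozenset({"m1", "m2"}), frozenset({"m2"})),
    Bgc("bgc_b", 40_000, frozenset({"nrps", "t1pks"}),
        frozenset(), frozenset()),
    Bgc("bgc_c", 120_000, frozenset({"nrps", "lanthipeptide"}),
        frozenset({"m3"}), frozenset({"m4"})),
]

ranked = supercluster_candidates(bgcs)
print(ranked)  # [('bgc_a', ['m2']), ('bgc_c', [])]
```

A shared motif is only weak evidence of coregulation; a statistical assessment of motif co-occurrence, as alluded to above, would be needed to separate true superclusters from chance neighbours.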
All in all, at least two strategies based on sequence analysis alone are available to mine the vast numbers of extant BGCs for novel antibiotics. Yet, when sequence analysis is complemented by ecological information, e.g. from metagenomics, many more identification possibilities become available.
DISCUSSION AND OUTLOOK
Considering the wealth of opportunities offered by ever-increasing sequencing power, which allows studies ranging from community mining to the elucidation of expression circuitries and protein functional analyses, we can expect a steadily growing body of information on known and novel antimicrobials. Major tasks ahead are the functional and structural characterization of these compounds, the assessment of their potential synergistic or antagonistic activities, and the determination of the roles of (cross)immunity against them. Because completely novel biosynthetic routes and new peptide modifications are also being discovered at a regular pace, we need to codevelop high-throughput methodologies for activity screening, employing elicitors of production where necessary. The interplay of bacteria with their environment, with other organisms and with hosts (e.g. gut microbiota, plant microbiota) will certainly involve antimicrobials of all sorts, and assessing their role in population dynamics will be an important area of research in the coming decade. Moreover, synthetic biology and novel chemistry approaches will enable the production of new-to-nature compounds to fight the growing problem of antibiotic resistance in major human pathogens. A combination of mining approaches at all levels, followed by biochemical and pharmaceutical studies, will be of great benefit in finding solutions to the major societal problems ahead.
SUPPLEMENTARY DATA
Supplementary data are available at FEMSRE online.
Acknowledgments
VT is supported by the research programme NWO-Groen, which is jointly funded by the Netherlands Organisation for Scientific Research (NWO), BASF SE and Baseclear BV, under project number ALWGR.2015.1. MHM is supported by VENI grant 863.15.002 from the Netherlands Organisation for Scientific Research (NWO).
Conflict of interest. None declared.



