Phylogenomic Analyses of Non-Dikarya Fungi Supports Horizontal Gene Transfer Driving Diversification of Secondary Metabolism in the Amphibian Gastrointestinal Symbiont, Basidiobolus

Research into secondary metabolism (SM) production by fungi has resulted in the discovery of diverse, biologically active compounds with significant medicinal applications. The fungi rich in SM production are taxonomically concentrated in the subkingdom Dikarya, which comprises the phyla Ascomycota and Basidiomycota. Here, we explore the potential for SM production in Mucoromycota and Zoopagomycota, two phyla of nonflagellated fungi that are not members of Dikarya, by predicting and identifying core genes and gene clusters involved in SM. The majority of non-Dikarya have few genes and gene clusters involved in SM production except for the amphibian gut symbionts in the genus Basidiobolus. Basidiobolus genomes exhibit an enrichment of SM genes involved in siderophore, surfactin-like, and terpene cyclase production, all these with evidence of constitutive gene expression. Gene expression and chemical assays also confirm that Basidiobolus has significant siderophore activity. The expansion of SMs in Basidiobolus are partially due to horizontal gene transfer from bacteria, likely as a consequence of its ecology as an amphibian gut endosymbiont.

Basidiobolus zygomycetes secondary metabolism horizontal gene transfer siderophores Fungi produce a wealth of biologically active small moleculessecondary or specialized metabolitesthat function in interactions with other organisms, environmental sensing, growth and development, and numerous other processes (Rokas et al. 2020). Several of these compounds have led to the successful development of pharmaceuticals (e.g., antibiotics, immunosuppressants, statins, etc.) that have had dramatic and positive impacts on human health. Understanding the evolution of fungal secondary metabolites and linking them with their ecological and physiological functions in nature can inform searches for compounds with applications in human society.
Secondary metabolism (SM) is imprecisely defined but can be characterized generally as the production of bioactive compounds that are not part of primary metabolism and that are not required for growth and survival in the laboratory (Keller et al. 2005;Brakhage 2013;Rokas et al. 2020). In fungi, the genes responsible for the synthesis of secondary metabolites are frequently co-located in biosynthetic gene clusters (Smith et al. 1990;Brakhage 2013), which primarily includes alkaloids, peptides, polyketides, and terpenes (Collemare et al. 2008;Helaly et al. 2018). Each of these groups of compounds are synthesized by core genes that are characteristic of the pathways and include, but are not limited to, dimethylallyl tryptophan synthases (DMAT), non-ribosomal peptide synthetases (NRPS), polyketide synthetases (PKS), and terpene cyclases (TC). These bioactive compounds fulfill various roles that are hypothesized to increase the fitness of the fungus by promoting better recognition and adaptation to environmental cues.
Biosynthesis of secondary metabolites is heterogeneous across the fungal tree of life, but the vast majority of discovered and predicted secondary metabolites are reported within the fungal phyla Ascomycota and Basidiomycota of the subkingdom Dikarya (Wisecaver et al. 2014;Rokas et al. 2018). Filamentous ascomycetes are the major producers of polyketide and peptidic secondary metabolites (Rokas et al. 2018) (e.g., penicillin, cyclosporin, etc.), with the majority of genes, gene clusters, and enzymes involved with fungal SM found in the subphylum Pezizomycotina (Collemare et al. 2008;Wisecaver et al. 2014;Helaly et al. 2018;Rokas et al. 2018). Although less than Ascomycota, Basidiomycota is also a prominent producer of SM, including diverse terpene genes (e.g., sesquiterpenes, Quin et al. 2014), some of the better-known hallucinogens (e.g., psilocybin of Psilocybe; Reynolds et al. 2018), and compounds toxic to humans (e.g., amanitin of Amanita; Luo et al. 2010).
For reasons that are unclear, the remainder of kingdom Fungi is characterized by a paucity of secondary metabolites (Voigt et al. 2016). This includes the zoosporic fungi and relatives classified in Blastocladiomycota, Chytridiomycota and Rozellomycota, and the nonflagellated, zygomycete fungi of Mucoromycota and Zoopagomycota. This pattern of SM diversity supports the hypothesis that diversification of secondary metabolism is a characteristic of Ascomycota and Basidiomycota (subkingdom Dikarya), which share a more recent common ancestor relative to the other phyla. Recent genome sampling efforts have focused on increased sequencing of non-Dikarya species Kohler et al. 2015;Spatafora et al. 2016;Quandt et al. 2017;Ahrendt et al. 2018). These efforts have provided a better understanding of the relationships of the phyla of kingdom Fungi (e.g., Spatafora et al. 2016), and processes and patterns that shaped the evolution of morphologies (e.g., Nagy et al. 2014) and ecologies (e.g., Chang et al. 2019;Quandt et al. 2017) within the kingdom. The availability of a diversity of these genomes provides an opportunity to characterize and focus on the secondary metabolism composition of non-Dikarya taxa, which have remained relatively unexplored.
While the majority of non-Dikarya taxa have low SM diversity, genomic sequencing of the genus Basidiobolus (Phylum Zoopagomycota) revealed that it possesses an unusually large composition of SM gene clusters. Species of Basidiobolus have complex life cycles, that involve the production of multiple spore types that occur in various environmental niches. Species have been found in the digestive tracts and feces of fish, reptiles, and amphibians (Nickerson and Hutchison 1971), where their function and impact on the host remains unknown. They have also been isolated from cadavers of mites and collembola species (Manning et al. 2007;Werner et al. 2012Werner et al. , 2018, centipedes (Fonseca et al. 2019), spiders (Henriksen et al. 2018), and associated with the gut microbiota of rove beetles (Stefani et al. 2016). Basidiobolus is known from leaf litter (Smith and Callaghan 1987;Manning and Callaghan 2008) and it has been identified as part of the microbiome of aquatic carnivorous plants (Sirová et al. 2018). Finally, species of the genus have also been implicated in opportunistic infections of humans and other mammals (Vikram et al. 2012;Geramizadeh et al. 2015;Okada et al. 2015). Across these environments Basidiobolus is dispersed by both forcibly discharged asexual spores (blastoconidia) and passively dispersed asexual spores (capilloconidia) that adhere to exoskeletons of small insects. Basidiobolus also reproduces sexually through the production of zygospores (meiospores) either by selfing (homothallic) or outcrossing (heterothallic) according to species. Based on these diverse ecologies, Basidiobolus must have adapted for survival in numerous environmental niches including the digestive systems of diverse animal species, feces, insect phoresis, and on decaying plant matter or leaf litter.
In this study we demonstrate that the genomes of Basidiobolus contain a larger number of genes related to SM than predicted by phylogeny and that in several cases the evolution of many of these SM genes is inconsistent with vertical evolution. Our objectives were to: i) characterize the diversity of SM in Basidiobolus, ii) identify the phylogenetic sources of this diversity, and iii) determine which classes of SM gene clusters are functional and may predict the secondary metabolites produced by species of Basidiobolus. Finally, we propose a model in which the amphibian gastrointestinal system is an environment that promotes noncanonical evolution of its fungal inhabitants.

Data collection
Annotated genome and amino-acid translation of predicted gene model sequences for three isolates of two species within the genus Basidiobolus were used in this study: Basidiobolus meristosporus CBS 931.73 (Mondo et al. 2017), isolated from gecko dung in the locality of Lamco, Ivory Coast; B. meristosporus B9252 (Chibucos et al. 2016) isolated from human eye in Saudi Arabia; and B. heterosporus B8920 (Chibucos et al. 2016) isolated from plant debris in India. The genomic sequence of B. meristosporus CBS 931.73 was sequenced with PacBio and annotated by Mondo et al. (2017) and obtained from the US Department of Energy Joint Genome Institute MycoCosm genome portal (https://mycocosm.jgi.doe.gov; Grigoriev et al. 2014). Genomic sequences and annotation of B. meristosporus B9252 and B. heterosporus B8920 sequenced by Chibucos et al. (2016) were obtained directly from the authors. The raw reads for these two species are available in GenBank (Accession numbers GCA_000697375.1 and GCA_000697455.1). Additional genomes of 66 Mucoromycota and Zoopagomycota species were used in this study (Ma et al. 2009, Tisserant et al. 2013, Wang et al. 2013, Schwartze et al. 2014, Chang et al. 2015, Chibucos et al. 2016, Corrochano et al. 2016, Lastovetsky et al. 2016, Wang et al. 2016, Mondo et at. 2017, Uehling et al. 2017, Ahrendt et al. 2018, Chen et al. 2018. Data sources for the genomes included in these analyses are available in Table S1 Secondary metabolite gene cluster prediction Predicted SM proteins were retrieved for genomes of 66 Mucoromycota and Zoopagomycota species (not including Basidiobolus) based on the Secondary Metabolite Unique Regions Finder (SMURF) (Khaldi et al. 2010) predictions available at MycoCosm for NRPS and PKS, and using local prediction with AntiSMASH V4.2.0 (Blin et al. 2017) for Terpene Cyclases. Secondary Metabolite gene clusters for the three genomes of Basidiobolus were predicted with Anti-SMASH v4.2.0 and SMURF from the annotated genomes of the three isolates used in this study. The AntiSMASH prediction was performed on local HPC, while SMURF predictions were obtained by submission of the genomes to SMURF web server (http://smurf.jcvi.org/). Predictions were contrasted manually to include all clusters of secondary metabolites from both sources. Orthologous sets of core SM genes across the three genomes of Basidiobolus were identified using OrthoFinder (Emms and Kelly 2015).

Secondary metabolite expression analysis
To assess the expression of predicted SM proteins, we calculated summarized counts of RNA transcript per million (TPM) for genes in B. meristosporus CBS 931.73 isolate by aligning RNA-Seq reads to the assembled B. meristosporus CBS 931.73 genome with HiSat v2.1.0 (Kim et al. 2019). The aligned sequence reads were processed with HTS-seq (Anders et al. 2015) to generate the counts of overlapping reads found for each gene and the normalized TPM for the genes was calculated using the cpm function in edgeR (Robinson et al. 2010). A distribution of RNA-Seq read counts per gene was plotted using the ggplot2 package in the R statistical framework (R Core Team 2018).

Phylogenomic analyses
Phylogenetic analyses were used to assess the evolutionary relationships of the NRPS, PKS, and terpene cyclase/synthase predicted proteins. For the NRPS genes, the adenylation domains (A-domains) were identified by hmmsearch from HMMer 3.0 suite (Eddy 2004), using the A-domain profile reported by Bushley and Turgeon (2010) as a reference profile HMM. The predicted A-domains were extracted from the resulting HMMER table into a FASTA file using the eslreformat program included in the HMMER suite. The predicted A-domains for all Basidiobolus, Mucoromycota and Zoopagomycota species were added to an A-domain amino-acid alignment reported by Bushley and Turgeon (2010) with MAFFT v7 (Katoh et al. 2017) (File S1). This reference alignment contains A-domains from NRPS proteins from nine major subfamilies of fungal and bacterial NRPS proteins. The phylogenetic domain tree was constructed using a maximum likelihood approach implemented in RAxML v. 8.2.11 (Stamatakis 2014) with the JTT amino acid substitution matrix, after model selection using the PROTGAMMAAUTO option, and 1000 bootstrap replicates (raxmlHPC-PTHREADS -T 12 -n NRPS -s infile.fasta -f a -x 12345 -p 12345 -m PROTCATJTT -N 1000). A graphical representation of the A-domains from the NRPS/NRPS-like core gene models was constructed by coloring the A-domain position in the core gene according to its phylogenetic origin using the ggplot2 R package (Wickham 2016).

Horizontal gene transfer in Basidiobolus species
Identification of genic regions with evidence for horizontal gene transfer (HGT) from bacteria was performed by searching all translated proteins from the predicted gene models of the Basidiobolus isolates against a custom BLAST protein database. This database included all amino-acid sequences available in the NCBI RefSeq proteomics database and all available amino-acid sequences for Mucoromycota and Zoopagomycota species at the DOE-JGI Myco-Cosm genome portal. The proteins were searched with BLASTP against the combined RefSeq/Mucoromycota/Zoopagomycota custom database, using an e-value cutoff of 1e -10 and the BLOSUM62 substitution matrix. A summary of the taxonomy identifier for all best hits was obtained by identifying whether the best hit was a Mucoromycota/Zoopagomycota protein, or a RefSeq protein. We used the rentrez package (Winter 2017) in the R statistical framework to extract the top-ten taxonomic identifiers from the NCBI database when hits did not correspond to Mucoromycota/Zoopagomycota. Proteins that had no hits to a fungal protein and only hits to a bacterial protein were considered candidate HGT genes.
To increase the accuracy of the prediction of genes product of HGT, we tested whether HGT candidate genes showed signs of errant assembly or in silica incorporation into the Basidiobolus genomes. The mean genomic read coverage of each candidate gene was calculated and compared to the mean coverage of the scaffold harboring the HGT candidate gene for all three Basidiobolus isolates. A z-score was calculated in R to determine the number of standard deviations of the HGT candidate from the mean coverage of the harboring scaffold. All HGT candidates with standard deviations greater or less than two were removed from the analysis. The genomic reads were mapped to each reference genome using BWA (Li and Durbin 2009). Coverage per HGT and harboring scaffold was estimated using samtools depth . A summary plot of the proportion of genes with bacterial best hits was constructed using the ggplot2 package in R. Lastly, the best taxonomic hit of genes that co-occur on common scaffolds with HGT candidate genes was assayed to determine if the HGT genes are located in areas with fungal genes. HGT candidates that had no fungal genes either upstream or downstream from the HGT candidates were removed from subsequent HGT analyses.
To identify differences in the GC content between HGT candidates and genes of fungal origin, ANOVA was performed between the mean GC content for HGT candidates, fungal genes on common scaffolds with HGT genes, and all fungal genes. ANOVA was performed on Basidiobolus meristosporus CBS 931.73, as this genome possessed the longest contigs and highest quality assembly. A Tukey's HSD test was performed to test significant differences between each set of genes. In addition, to identify if the HGT candidate genes had a reduced number of introns than the fungal genes, a summary of the number of introns and normalized intron length (intron length divided by gene model length) was performed in the GenomicRanges R package (Lawrence et al. 2013). A Kruskal-Wallis test was performed in R to identify significant differences in the number of introns and the normalized intron length for the HGT candidates and the fungal genes. A nucleotide composition analysis based on 5-mers and codon usage was performed to observe differences between candidate HGT genes and fungal genes for Basidiobolus meristosporus CBS 931.73. The 5-mer analysis was conducted in all predicted coding sequences from the annotated gene models using the oligonucleotideFrequency function of the Biostring R package (Pagès et al. 2019). Codon usage was estimated using the uco function of the seqinr R package (Charif and Lobry 2007). Both these indices were divided by gene length to normalize the nucleotide composition by the effect of gene length. Principal component analyses were performed in R for both 5-mer and codon usage analysis to compare the HGT candidates to the fungal candidate genes. Lastly, the putative functions of the genes with evidence for HGT were summarized using the functional annotations available in the gene format file (GFF) of B. meristosporus CBS 931.73.

Measuring siderophore activity in Basidiobolus
To measure the siderophore activity (chelation of ferric ions) of Basidiobolus, an assay of detection of siderophore activity based on the universal chrome azurol S (CAS) assay (Andrews et al. 2016) was performed for the strain Basidiobolus meristosporus CBS 931.73. This colorimetric assay uses a complex of Fe(III) -CAS -DDAPS (Surfactant). When this complex is combined with acetate yeast agar (AY agar), it results in a greenish-blue color, where the color changes to yellow upon the removal of the iron. A total of 450 ml of AY agar (Andrews et al. 2016) was mixed with 50 ml of autoclaved 10X CAS assay (20 ml of 10 mM Fe(NO 3 ) 3 , 40 ml of 10 mM chrome azurol S, and 100 ml of 10 mM DDAPS). A 10 cm layer of the AY-CAS media was poured in small petri dishes and cooled. An upper layer of 10 cm of AY agar was poured after cooling, and the media was left to diffuse overnight and stored at 4°for 24 hr. B. meristosporus, Conidiobolus thromboides FSU 785 (negative control), and Cladosporium sp. from the herbarum species complex PE-07 (positive control) were transferred into the AY-CAS plates by transferring a small amount of mycelium via a sterile toothpick and piercing the media in the center. The assay was performed in triplicate for each isolate used. Cultures were grown at room temperature for 12 days. Siderophore activity was measured as the area of the plate that has changed to yellow color, when compared to a negative control and an empty petri dish with AY-CAS agar. Pictures of the plates were taken. These images were imported into Adobe Photoshop CC 2019 and size-corrected to 5.5 cm (diameter of the small petri dish). All images were concatenated in the same file. The Color Range tool of Adobe Photoshop CC tool was used to measure the yellow area for all plates. The area measurements were exported into R, where an analysis of variance was performed to determine significant differences across the siderophore activity of the strains.

Data availability
File S1 includes the ortholog groups of SM for Basidiobolus species. File S2 includes the alignment of all A-domains from NRPS sequences used in the study. File S3 includes the alignment of A-domains from Zoopagomycota and Mucoromycota NRPS sequences. File S4 includes the KS alignment for all PKS sequences used in this study. File S5 includes all terpene cyclase core genes used in this study. Figure S1 contains maximum likelihood reconstruction of NRPS-A domains. Figure S2 contains graphical representation of cluster 5 from B. meristosporus CBS 931.73. Figure S3 contains phylogenetic sources and abundance of KS domains. Figure S4 contains maximum likelihood reconstruction of PKS KS domains. Figure S5 contains presence and absence of domains characteristic of PKS core gene models in non-zygomycete fungal species. Figure S6 contains presence and absence of domains characteristic of PKS core gene models in zygomycete fungal species. Figure S7 contains phylogenetic sources and abundance of terpene cyclases. Figure S8 includes the proof of concept for HGT assays. Figure S9 includes intron number comparisons for HGT genes. Figure S10 includes a PCA 5-mer analysis for HGT genes. Figure S11 includes a PCA for codon usage analysis for HGT genes. Figure S12 includes siderophore activity assays. Table S1 includes all Mucoromycota and Zoopagomycota isolates used in the manuscript. Table S2 is a summary of secondary metabolite genes predicted for Basidiobolus species. Table S3 includes the isolates of Dikarya used for the terpene cyclase assay. Table S4 includes the summary of hits of HGT genes against NBCI taxonomy. Table S5 is a summary of gene annotations from HGT candidates. Supplementary files, tables and figures have been deposited to figshare. All additional files included as supplementary materials in this manuscript and custom scripts used for this analysis can be found in the ZyGoLife GitHub repository (https://github.com/zygolife/Basi-diobolus_SM_repo). Supplemental material available at figshare: https://doi.org/10.25387/g3.12691859.
The highest number of SM gene clusters were predicted for Basidiobolus. A total of 38 gene clusters and 44 SM core genes were predicted in the B. meristosporus CBS 931.73 genome, 40 SM gene clusters and 44 SM core genes for B. meristosporus B9252, and 23 SM gene clusters and 23 SM core genes for B. heterosporus B8920 (Table  1; Table S2). Seventy-eight percent of the SM gene models predicted were found to be shared across the Basidiobolus isolates. Ten SM core genes were found to be unique to B. meristosporus CBS 931.73, 10 SM genes unique to B. meristosporus B9252, and 5 SM genes unique to B. heterosporus B8920 (File S1).
Expression of core SM genes A total of 83.45% of the RNA sequenced reads were mapped uniquely to the reference genome of B. meristosporus CBS 931.73, while 12.08% of the reads were mapped in more than one location. Only 4.47% of the RNA sequenced reads did not map to the reference genome. The majority of predicted SM for B. meristosporus CBS 931.73 were expressed at the same or higher levels than constitutive housekeeping genes, such as Beta-tubulin, Elongation Factor 1, Actin, and Ubiquitin

Phylogenetic analysis of NRPS/NRPS-Like A-domains
The phylogenetic reconstruction of A-domains was performed with 951 A-domains from a combined dataset including the A-domain dataset from Bushley and Turgeon (2010) and the predictions of NRPS/NRPS-like A-domains from Mucoromycota and Zoopagomycota genome sequences (File S2). A total of 395 NRPS A-domains were predicted for Basidiobolus, Mucoromycota and Zoopagomycota genome sequences (File S3). The phylogenetic analyses recovered the nine major families reported by Bushley and Turgeon (2010) with the addition of the two new clades, surfactin-like and ChNSP 12-11-like, reported here. The total number of Mucoromycota/Zoopagomycota A-domains and their distribution across these clades are as follows: 74 to the AAR clade; 115 to the major bacterial clade (MBC), including 104 A-domains with the surfactin-like clade with and an additional 11 A-domains scattered elsewhere in the MBC; the CYCLO clade with eight A-domains; 65 A-domains to the ChNSP 12-11-like clade; 25 A-domains to the ChNSP 12-11 clade; 11 Adomains to the PKS/NRPS clade; 16 A-domains to the SID clade; 76 A-domains to the SIDE clade; and 5 A-domains to the EAS clade ( Figure 3). No A-domains were clustered into the ACV clade or the ChNSP 10 clade, and the remaining A-domains grouped within the outgroup clade (Figures 3 and 4, Figure S1). Similar phylogenetic origins for multiple A-domains were found in a single NRPS core gene (Figure 4) Gene model 387529 was predicted as a tetra-modular protein, with two N-methyltransferase domains and a thioesterase domain. In addition, AntiSMASH analyses predict a six gene cluster (cluster 5) that includes gene model 387529, a zinc finger transcription factor (312194), carrier protein (277549), transporter (277552), peptidase (277554), and a SNARE associated protein (334759) ( Figure S2). All of these gene models have evidence of gene expression (Figure 2 The major clades of PKS proteins reported by Kroken et al. (2003) were recovered by this study, including the fungal reducing (R) PKS, animal fatty acid synthases (FAS), fungal non-reducing (NR) PKS, and bacterial PKS, as well as fungal FAS I and II clades ( Figure S3 and S4). The Fungal R PKS1 clade included domains from B. meristosporus   Figure S3 and S4).
To better understand the predicted PKSs of Mucoromycota and Zoopagomycota that clustered in the fungal FAS clade, the patterns of domains that comprise fungal PKS protein sequences were analyzed. Predicted PKS of Ascomycota contained AT, KS and PP domains in all sequences ( Figure S5). Predicted PKS from Basidiomycota contained AT and KS domains in all sequences, while KR and PP domains were present on 36% of the sequences.  Figure S6). Some Basidiobolus PKS were differentiated from the other Mucoromycota and Zoopagomycota PKS as they possessed one or more of the DH or PP domains ( Figure S6).
Evolutionary relationships of predicted terpene cyclase gene models A total of 1,108 terpene cyclase or terpene cyclase-like gene models were predicted and used for phylogenetic analyses (Additional file 5). These include 253 identified in the Mucoromycota species genomes, 59 in the Zoopagomycota genomes, and 401 ortholog proteins from 58 fungal genomes of Basidiomycota and Ascomycota. An additional 395 TC candidates were identified in bacterial genomes from RefSeq  via BLASTP. The phylogenetic reconstruction of TC resulted in two main clades: an outgroup clade that comprised predicted TC annotated as tRNA threonylcarbamoyladenosine dehydratase and phytoene synthases, and a main ingroup TC core clade ( Figure 5, Figure  S7). The tRNA threonylcarbamoyladenosine dehydratase subclade contained mostly bacterial sequences plus one TC gene model of B. meristosporus B9252 and B. meristosporus CBS 931.73 each. The phytoene synthases clade comprised two subclades, one clade that grouped most bacterial gene models, and a second clade clustering bacterial and Mucoromycota predicted TC gene models, but no Basidiobolus TC core gene models were found in this clade. The TC core clade ( Figure 5 and Figure S7) comprised a bacterial exclusive clade (Bacteria I), followed by four highly supported subclades containing TC core genes from Mucoromycota, Zoopagomycota, and Dikarya, and are referred to here as Fungi TC clades I -IV. The majority of TC genes from Mucoromycota and Zoopagomycota clustered within clade Fungi TC II, which included no other fungal or bacterial TC. These genes were annotated as Squalene synthetases by the Mycocosm genome portal.
The Fungi TC clades I, III and IV included mostly Mucoromycota TCs. The Fungi TC I clade contained TC genes from Mucoromycota isolates, as well as TC genes from the ascomycete isolates Fusarium verticillioides 7600 and Hypoxylon sp. EC38 v1.0, and the basidiomycete Suillus luteus UH-Slu-Lm8-n1 v2.0. These gene models were annotated as associated with the Ubiquitin C-terminal hydrolase UCHL1 according to the MycoCosm portal. Fungi TC III clade also comprised mostly Mucoromycota isolates, with additional TCs from Zoopagomycota isolates Piptocephalis cylindrospora RSA 2659 single-2cell v3.0 and Syncephalis pseudoplumigaleata Benny S7121 single-2cell v1.0. Annotations of genes in Fungi TC III clade in Mycocosm indicated that these proteins are part of the isoprenoid/propenyl synthetases, responsible for synthesis of isoprenoids. Isoprenoids play a role on synthesis of various compounds such as cholesterol, ergosterol, dolichol, ubiquinone or coenzyme Q (Finn et al. 2009). Finally, Fungi TC IV clade followed a similar Mucoromycotaenriched pattern with the exceptions of Dimargaris cristalligena RSA 468 single2cell v1.0, Linderina pennispora ATCC 12442 v1.0, M. pterosporus CBS 209.56 v1.0, and Syncephalis fuscata S228 v1.0. Fungi TC IV clade also included TC genes from Ascomycota sequenced genomes. The majority of these genes were annotated as containing a terpene synthase family, metal binding domain, and a polyprenyl synthetase domain in the MycoCosm genome portal.
Within the fungal clades, the NADH dehydrogrenase (ubiquinone) complex clade represented a highly supported clade clustering bacterial, Mucoromycota and Zoopagomycota TCs. Annotations associated with gene models found in this clade indicated that this clade comprised TC associated to NADH dehydrogrenase (ubiquinone) complex.
The terminal clades of TC clusters include the Bacteria II + Basidiobolus clade that included bacterial TC and six TC SM core genes predicted from Basidiobolus (two from each genome), and the clade comprising TC core genes solely from Dikarya and one gene from Mortierella verticillata NRRL 6337 and Rhizopus microsporus ATCC11559 v1.0 (Figures 5 and Figure S7).

Signatures for HGT in Basidiobolus genomes
The identity search for genes with evidence for HGT identified 934 genes in B. meristosporus CBS 931.73, 620 genes of B. meristosporus B9252, and 382 genes of and B. heterosporus B8920 with zero BLASTP hits to fungal proteins. These genes were used for the coverage assay to identify gene model coverage deviation from the harboring scaffold median coverage ( Figure S8). This assay resulted in 810 genes of B. meristosporus CBS 931.73, 532 genes of B. meristosporus B9252, and 301 genes of and B. heterosporus B8920 with z-scores under 2 standard deviations from the harboring scaffold median coverage. Finally, 4 HGT genes from B. meristosporus CBS 931.73, 106 genes from B. meristosporus B9252, and 89 genes from B. heterosporus B8920 were not located in scaffolds surrounded by genes with best fungal hits. All other HGT genes were located in scaffolds surrounded by fungal genes: 806 from from B. meristosporus CBS 931.73, 426 genes from, B. meristosporus B9252, and 212 genes of of and B. heterosporus B8920. These candidate genes with evidence for HGT represented 5%, 3% and 2% of the gene content of B. meristosporus CBS 931.73, B. meristosporus B9252, and B. heterosporus B8920, respectively. The HGT candidates showed significant differences in GC content (F-value = 6.395, df = 2, p-value = 0.00167) when contrasted to all genes considered of fungal origin (Mean fungal GC content = 0.4865, Mean HGT GC content = 0.4911, Tukey HSD p-value = 0.001), but no differences were detected in GC content between HGT candidates and fungal genes that co-occur on common scaffolds (Mean surrounding gene GC content = 0.488, Tukey HSD p-value = 0.30). Similarly, no differences in codon usage, or in 5-mer nucleotide composition (Figures S10 and S11, respectively), were found between HGT candidates and genes of fungal origin. Differences were found, however, in intron number and normalized intron length between genes HGT candidate genes and fungal genes, where the distribution of intron number showed a median of 0 introns for HGT candidates and 2 for fungal genes (Kruskal-Wallis test; x 2 = 272.88, df = 1, p-value , 2.2e-16; Figure S9). The majority of HGT candidate genes (61% of the genes) have no introns, compared to 36% of genes from fungal origin with no introns ( Figure S9). The candidate HGT genes have a significantly smaller normalized intron length than the genes with a fungal origin (Kruskal-wallis test; x 2 = 272.88, df = 1, p-value , 2.2e-16).
A large percentage of SM core genes appeared to be the product of HGT from bacterial species into Basidiobolus. The SM core genes identified as candidate HGT genes is 61%, 27% and 30% for B. meristosporus CBS 931.73, B. meristosporus B9252, and B. heterosporus B8920, respectively (Table 2). NRPS/NRPS-like SM core genes represent the largest percentage of HGT evidence, while TC have the lowest percentage of HGT evidence ( Table 2, Table 3). The identification of taxonomic sources for HGT into Basidiobolus indicated that most HGT comes from bacteria in the phylum Proteobacteria with 232, 135 and 69 gene models in B. meristosporus CBS 931.73, B. meristosporus B9252, and B. heterosporus B8920, respectively. Firmicutes (112, 68, and 35 gene models, respectively) and Actinobacteria/ High Gram C+ (127, 53, and 25 gene models, respectively) ( Figure 6, Table S4) were the second and third most abundant source of HGT from bacteria into Basidiobolus.

Siderophore activity in Basidiobolus meristosporus
The siderophore activity assay resulted in visible activity for the three replicates for B. meristosporus and Cladosporium sp., and a small halo for one of the replicates of C. thromboides. No visible siderophore activity was detected for the other replicates of C. thromboides or for the empty AY-CAS plates (Figure 7, Figure S12). The analysis of Figure 4 Graphical representation of the A-domains of each NRPS core gene predicted for Basidiobolus genomes. Horizontal gray lines represent the length of the predicted NRPS core gene. Arrows represent A-domains and are located in the position within the gene model. Colors represent the NRPS cluster assigned in Fig. 3. Numbers represent the name of each gene model and predicted SM cluster for each genome.
variance for the area of siderophore activity measured showed significant differences across isolates/empty plates (ANOVA, F value = 19.62, P = 0.0004). Post-hoc tests showed no differences between B. meristosporus and Cladosporium sp. siderophore activity (Tukey HSD, P = 0.9801489) nor between C. thromboides and the empty AY-CAS plates (Tukey HSD, P = 0.9396603), but significant differences were found for all other comparisons (Tukey HSD, P , 0.001 for all remaining comparisons).

DISCUSSION
Secondary, or specialized, metabolite (SM) production is an important element of fungal metabolism. It has resulted in numerous natural products with human health implications, such as mycotoxins in our food supply, and medical applications, including antibiotics, immunosuppressants, and antitumor agents. The majority of the known genetic and chemical diversity of fungal SM has been described from the phyla Ascomycota and Basidiomycota with few SM gene clusters, and limited SM production, reported for fungi in Mucoromycota and Zoopagomycota. This observation has led to the dogma that 'zygomycete' species are depauperate of these chemical pathways (Voigt et al. 2016). Here we report gene clusters involved in SM production in the largest survey to date of Mucoromycota and Zoopagomycota species using genomics approaches and estimate the SM potential of these fungi.
Genome sequencing of a total of 69 isolates across diverse lineages of Mucoromycota and Zoopagomycota enabled detailed identification of SM clusters. These results support the hypothesis that zygomycete fungi have a low abundance of secondary metabolism (Figure 1, Table S1) and agree with previous reports (Voigt et al. 2016). Outliers to this pattern exist, however, and are particularly true of the genus Basidiobolus (Zoopagomycota), which possesses a large number of SM gene clusters predicted for the NRPS, PKS and TC families. This discovery of abundant candidate genes for production of secondary metabolite in Basidiobolus is novel and its presence is most consistent with a signal of horizontal gene transfer from bacteria to fungi, a phenomenon we propose is facilitated by living in the amphibian gut environment.

Distribution and evolution of NRPS genes across Mucoromycota and Zoopagomycota
A deeper examination of Basidiobolus SM gene clusters indicates that this genus surpasses the number of expected NRPS genes for zygomycete species (Figure 1). Most of these SM genes also show evidence for transcription, indicating that the majority of these genes are expressed constitutively under laboratory conditions (Figure 2). Several of these core genes, such as the NRPS gene model 387529, appear to be expressed at a higher rate than several housekeeping genes.  A more detailed census of core genes reveals unique patterns of evolution of NRPS genes across Mucoromycota and Zoopagomycota when compared to Dikarya fungi ( Figure 3, Table 1). The only NRPS A-domains found throughout the two phyla are members of the AAR clade (with the exception of Rhizopus delemar 99-80 and Piptocephalis cylindrospora RSA 2659). These A-domains are from the genes that encode a-aminoadipate reductases (AAR), an enzyme responsible for the reduction of alpha-aminoadipic acid, which is essential for the lysine biosynthesis pathway and is present in all fungal phyla (Bushley and Turgeon 2010). In contrast, the remaining NRPS genes, and their respective A-domains, show discontinuous and patchy distributions across Mucoromycota and Zoopagomycota (Figure 3).
The most pronounced NRPS diversification in Basidiobolus are for genes that are predicted to encode for siderophores, iron chelating metabolites. A-domains for predicted siderophores are distributed throughout four clades including the three clades of major bacterial genes and the fungal SID and SIDE clades (Figure 3, Figure S1). Major bacterial clades (MBC) exclusively comprise bacterial siderophore synthases, such as pyoverdine, yersiniabactin and pyochelin (Bushley and Turgeon 2010), with the exception of the surfactin-like clade. Our results show that genomes of Mortierella and Basidiobolus contain A-domains that are members of the MBC, and that they are the only fungal representatives of these clades. SID clade contains all NRPS associated with siderophore production from Ascomycota and Basidiomycota species. All Basidiobolus isolates contain one NRPS A-domain in this clade, as well as three A-domains from three NRPS gene models of Conidiobolus coronatus NRRL28638. Finally, SIDE, a clade comprising NRPS genes responsible for the production of siderophores in filamentous ascomycetes is expanded in Basidiobolus (Figure 3), which is also the only zygomycete with A-domains n■ clustered in this clade. These findings are consistent with enrichment of both bacterial and fungal siderophores and are suggestive of the importance of iron metabolism in Basidiobolus.
The CYCLO clade contains the A-domains for core genes associated with biosynthesis of cyclic peptides, such as beauvericin and cyclosporin (Bushley and Turgeon 2010;Bushley et al. 2013). Sister to all other CYCLO clade A-domains are the A-domains that comprise the NRPS core gene model 387529 from B. meristosporus ( Figure S2). 387529 is expressed at the highest rate of any SM gene under laboratory conditions (Figure 2). It is annotated as a tetra-modular gene model, that includes two N-methylation domains, four adenylation domains, four condensation domains, and a TE domain. When compared to simA, the NRPS responsible for biosynthesis of cyclosporin, structural similarities can be found, such as the presence of the N-methylation domains and the TE terminator domain. Its phylogenetic and structural similarities to simA suggest that 387529 results in the synthesis of a cyclic peptide with methylated amino acid residues. Cyclopentapeptides were recently described from B. meristosporus and were hypothesized to be produced by gene model 387529 (listed as gene model ORX93211.1 in Luo et al. 2020), but their linkage to an NRPS gene cluster requires genetic confirmation.
The surfactin-like clade contains A-domains for bacterial core genes with similarities, but not identical, to the Bacillus subtilis surfactin termination module (srfA-C gene; Peypoux et al. 1999) including the third, fourth and fifth A-domains of the five A-domain gene model NP_930489.1 of Photorhabdus luminescens gene, two domains of the gene PvdD (AAX16295.1; pyoverdine synthetase) of Pseudomonas aeruginosa, and a single A-domain of the bimodular NRPS dbhF protein of Bacillus subtilis (AAD56240.1) which is involved in the biosynthesis of the siderophore bacillibactin. This surfactin-like clade contains eleven A-domains predicted from Basidiobolus genomes including the gene models 298977, 368581 and 372991 from B. meristosporus CBS 931.73. Gene model 298977 showed no evidence for gene expression, while gene models 368581 and 372991 show high rates of expression ( Figure 2). This is the first report of the prediction of a surfactin-like gene in fungi, but surfactant production was recently reported in Mortierella alpina (Baldeweg et al. 2019). These include malpinins, amphiphilic acetylated hexapeptides that function as natural emulsifiers during lipid secretion, and malpibaldins, hydrophobic cyclopentapeptides. This finding is consistent with the genomic data and reveals that Mortierella, in addition to Basidiobolus, possesses homologs in the surfactin-like clade that are phylogenetically different from B. subtilis surfactins.
Surfactins, encoded by the srfA gene cluster in Bacillus subtilis, are functionally active as surfactants, as well as toxins and antibiotics. However, the surfactin genes from B. subtilis (SrfA-AA, SrfA-AB and SrfA-AC) were included in our analysis and clustered in a different clade within the MBC. A-domains from single module NRPS-like protein from B. meristosporus CBS 931.73 (Gene model 146993) and from a NRPS-PKS hybrid A-domain from of B. meristosporus B9252 (Gene model N161_14278) clustered with the B. subtilis SrfA genes. The placement of this NRPS-PKS A-domain is interesting because surfactins are lipopeptides, which contain a hydrophobic fatty acid chain, whose biosynthesis is consistent with an NRPS-PKS hybrid. The A-domains from the remaining Mucoromycota and Zoopagomycota NRPS-PKS hybrids clustered as sister to the original clade of Dikarya NRPS-PKS hybrids, supporting the hypothesis of a single origin of fungal NRPS-PKS hybrids (Bushley and Turgeon 2010).

Mucoromycota and Zoopagomycota lack PKS diversity
Polyketide synthases (PKS) are abundant SM of Ascomycota and Basidiomycota and are involved in antibiotic production, carotenoid biosynthesis and other functional roles. In contrast, literature on PKS diversity for Mucoromycota and Zoopagomycota is limited. Our analyses update the phylogenetic reconstruction reported by Kroken et al. (2013) by adding genomic information from Chytridiomycota, Mucoromycota and Zoopagomycota (Figures S3 and S4). Overall, we Figure 6 Plausible taxonomic sources of HGT genes. Bar-plot represents the proportion of diversity of HGT candidates for each Basidiobolus genome. Colors represent the taxonomy term for the RefSeq best hit from BLAST. Overall, between 2-5% of the gene models predicted for Basidiobolus species appear to be product of HGT from taxonomic groups of bacteria associated to reptilian and amphibian gut tracts (Proteobacteria, firmicutes, and CFB/bacteroidetes).

Figure 7
Siderophore activity of Basidiobolus meristosporus CBS 931.73 in a universal CAS assay using layered AY-CAS plates after 12 days. Bars represent mean siderophore activity measured per strain as the yellow area in AY-CAS plates for three replicates. Error bars represent the standard deviation for each replicate. Cladosporium sp. PE-07 represents the positive control. Conidiobolus thromboides FSU 785 represents a zygomycete with no evidence for siderophore NRPS expansion.
report the discovery of two new clades: A clade of non-reducing PKS proteins (clade IV) comprising Neocastimastigomycota and Basidiomycota, and a novel FAS II clade consisting of Neocastimastigomycota, Chytridiomycota and Basidiomycota KS domains. The results of our PKS prediction revealed a number of potential PKS core genes in zygomycete species (Figures S3 and S4), but the results of our phylogenetic reconstruction show that the KS domain of the predicted PKS gene models are fungal fatty acid synthases (FAS). Only Basidiobolus meristosporus genomes possessed KS domains associated with fungal PKS, either reducing or non-reducing, and only B. meristosporus CBS 931.73 possessed a PKS in the Fungal FAS clade.
A domain-by-domain presence/absence analysis of PKS genes models (Figures S5 and Figure S6) shows that in addition to AT and KT domains, a third domain (either KR, DH or PP) is found in the majority of fungal PKSs ( Figure S5). Conversely, the zygomycete genes in the FAS clade only possess the AT and/or KT domains, which are domains common between FAS and PKS. However, Basidiobolus KS domains are found in fungal reducing PKS clades. Gene models 292783 and 237744 from both isolates of B. meristosporous cluster with fungal reducing PKS II and possess the DH domain ( Figure S4), and the expression levels of 292783 is consistent with an actively transcribed gene ( Figure 2).
Terpene cyclase-like genes are expanded in zygomycetes Terpene cyclases (TC) are the most common SM predicted for zygomycete species (Figure 1, Table S1). Phylogenetic reconstruction shows that zygomycete TC core genes are clearly distinct from Ascomycota and Basidiomycota TC, where at least 4 new clades of predominantly zygomycete TC are found ( Figure 5, Figure S7). Fungi II TC clade comprises TC from all zygomycete genomes analyzed with the exception of Piptocephalis cylindrospora RSA 2659 single-2cell v3.0, Phycomyces blakesleeanus NRRL1555 v2.0, and five genomes of Rhizophagus irregularis, which show no prediction of TC in their genomes. No Dikarya TC genes cluster within this clade. Fungi I, III and IV group TC genes are found almost exclusively in Mucoromycota species, as well as some Dikarya species. TC genes from Fungi I are present only in Mucoromycotina genomes. Functional annotations of TC genes from this clade indicate that these TC genes code for proteins associated with squalene and phytoene synthases and are part of the synthesis of carotenoids. Carotenoids are important compounds for the synthesis pathway of trisporic acid, the main molecule responsible for initiating sexual reproduction in zygomycetes (Burmester et al. 2007). Finally, Basidiobolus is the only non-Dikarya genus with TC genes clustered within the Bacteria II clade of TC. Both these SM core genes show evidence for expression comparable to housekeeping genes or other SM core genes (Figure 2). The presence of bacterial-like TC genes in Basidiobolus present more evidence on the plasticity of the genome of Basidiobolus and its ability to integrate and possibly express foreign SM associated DNA.

Horizontal gene transfer of SM genes to Basidiobolus
Basidiobolus is a genus with a complex biology, alternating ecologies, and multiple spore types. This complex biology is also reflected at the genomic scale. The sequenced genomes show a larger genome size than other zygomycetes, as well as a higher number of genes than any other Zoopagomycota genomes sequenced to date. Our SM prediction assay is concordant with these patterns in which Basidiobolus has an excess of SM gene clusters when compared to other zygomycetes. Evidence points to HGT as a main driver of SM diversity in Basidiobolus as supported by the phylogenetic reconstructions of NRPS and TC gene clusters with bacterial homologs. Basidiobolus and Mortierella are the only fungi with genes associated with the bacterial clades in each of these SM phylogenetic reconstructions. Moreover, these HGT candidates are integrated into the Basidiobolus genome assembly and do not show evidence of artifactual assembly as evidenced by discontinuous coverage ( Figure S8). The most abundant functional group of SM core genes overall are siderophores and their overall functionality is supported by both the RNA expression analyses ( Figure 2) and siderophore plate assays (Figure 7).
In one stage of the Basidiobolus life cycle, the fungus lives as a gut endosymbiont where it co-occurs with bacteria and other organisms that comprise the gut microbiome. Animal gut environments can facilitate HGT between bacteria and fungi, as previously reported for the zoosporic species of Neocallimastigomycetes (Chytridiomycota), which live in the ruminant gut environment and whose genomes exhibit a 2-3.5% frequency of genes with HGT evidence (Wang et al. 2019;Murphy et al. 2019). Phylogenomic analyses of the Basidiobolus NRPS A-domains support a phylogenetic affinity with A-domains from bacterial taxa and more rarely other fungi, a pattern most consistent with HGT. Our HGT survey comprised an extensive search of reference genomes across the tree of life available in NCBI RefSeq, as well as all of the gene models predicted for the Mucoromyocta and Zoopagomycota genomes. We find that 2-5% of all predicted gene models present in Basidiobolous genomes are consistent with signatures of HGT from bacteria. However, the percentages change from 27 to 61% of predicted SM core gene models with bacterial HGT evidence. It should be noted that the two genomes with the lower percentage of predicted HGT SM genes, B. meristosporus B9252 (27%) and B. heterosporus B8920 (30%), are more fragmented, Illumina-only genomes, which complicates assembly of biosynthetic gene clusters. The SM gene models with bacterial signatures are highly abundant, with NRPS and NRPS-like genes comprising the top 25 ontology categories of HGT genes in B. meristosporus (Table S5). These results are consistent with the life history of Basidiobolus, where the fungus lives in close proximity with other microorganisms associated with the amphibian gut environment.
The additional analyses to discover distinct genetic features for HGT candidates showed that the largest differences found are in intron number and intron length, but not in nucleotide composition or by codon usage. A significantly smaller number of introns and smaller normalized intron length in HGT candidates provide more support to the HGT hypothesis, where we expected that bacterially transferred genes would maintain a smaller number and length of introns. Intronic expansion of transferred genes into fungal species after an HGT event appears to be rapid in order to reflect the genetic makeup across the genome (Da Lage et al. 2013) and can explain the introns in some of the HGT candidates. However, up to 60% of the HGT candidates still maintain absence of introns as expected for genes of bacterial origin. Finally, the nucleotide composition of HGT candidates and fungal genes were indistinguishable. Reports show that foreign genes with similar codon usage are more likely to become fixed on the receiving genome (Medrano-Soto et al. 2004;Amorós-Moya et al. 2010;Tuller 2011). We interpreted these results to indicate that horizontally transferred genes are evolving toward a similar nucleotide composition of the fungal genome based on the 5-mer/codon usage assay, but still maintain high protein similarity to and group with donor lineage copy in phylogenetic reconstructions.
The taxonomic survey of our HGT analysis shows that a diverse array of bacteria may have consistently contributed genetic information into the Basidiobolus genomes ( Figure 6 and Table S4).
The most abundant bacterial taxonomic groups associated with HGT are the Proteobacteria, Firmicutes, Actinobacteria/high GC gram positive bacteria, and Bacteroidetes ( Figure 6, Table S4). The proportion of HGT for each taxonomic group appears to be consistent among the three Basidiobolus genomes, and there are consistencies between the most common taxonomic groups responsible for HGT in Basidiobolus and the reported composition of bacteria associated with the gut microbiome in amphibians (Bletz et al. 2016;Kohl et al. 2013) and reptiles (Colston and Jackson 2016;Costello et al. 2010).

CONCLUSIONS
Our results confirm that the majority of zygomycete fungi classified in Mucoromycota and Zoopagomycota do not possess a large genomic potential for secondary metabolism. Significant departures from this pattern exist, however, as exemplified by Basidiobolus, a genus with a complex genomic evolution and potential for considerable and diverse secondary metabolite production. First, it possesses larger than average genome with less than 8% content of repetitive regions, but a genetic plasticity to integrate and express extrinsic DNA. Second, the incorporation of extrinsic DNA is consistent with selection for increased SM production, especially gene models that are related to the capture of resources available in anaerobic conditions (iron chelation by siderophores) and metabolites that may play roles in antibiosis (surfactin-like genes) or host interaction. Third, the amphibian gut environment predisposes Basidiobolus to the acquisition of these SM core genes via HGT from co-inhabiting bacterial species. More information is needed to further test these hypotheses, including sequencing of additional Basidiobolus species with long read technologies; more accurate characterization of amphibian microbiomes that test positive for Basidiobolus; and Liquid Chromatography -Tandem Mass Spectrometry (LC -MSMS) characterization of the Basidiobolus metabolome.