-
PDF
- Split View
-
Views
-
Cite
Cite
Nicole K Reynolds, Jason E Stajich, Gerald L Benny, Kerrie Barry, Stephen Mondo, Kurt LaButti, Anna Lipzen, Chris Daum, Igor V Grigoriev, Hsiao-Man Ho, Pedro W Crous, Joseph W Spatafora, Matthew E Smith, Mycoparasites, Gut Dwellers, and Saprotrophs: Phylogenomic Reconstructions and Comparative Analyses of Kickxellomycotina Fungi, Genome Biology and Evolution, Volume 15, Issue 1, January 2023, evac185, https://doi.org/10.1093/gbe/evac185
- Share Icon Share
Abstract
Improved sequencing technologies have profoundly altered global views of fungal diversity and evolution. High-throughput sequencing methods are critical for studying fungi due to the cryptic, symbiotic nature of many species, particularly those that are difficult to culture. However, the low coverage genome sequencing (LCGS) approach to phylogenomic inference has not been widely applied to fungi. Here we analyzed 171 Kickxellomycotina fungi using LCGS methods to obtain hundreds of marker genes for robust phylogenomic reconstruction. Additionally, we mined our LCGS data for a set of nine rDNA and protein coding genes to enable analyses across species for which no LCGS data were obtained. The main goals of this study were to: 1) evaluate the quality and utility of LCGS data for both phylogenetic reconstruction and functional annotation, 2) test relationships among clades of Kickxellomycotina, and 3) perform comparative functional analyses between clades to gain insight into putative trophic modes. In opposition to previous studies, our nine-gene analyses support two clades of arthropod gut dwelling species and suggest a possible single evolutionary event leading to this symbiotic lifestyle. Furthermore, we resolve the mycoparasitic Dimargaritales as the earliest diverging clade in the subphylum and find four major clades of Coemansia species. Finally, functional analyses illustrate clear variation in predicted carbohydrate active enzymes and secondary metabolites (SM) based on ecology, that is biotroph versus saprotroph. Saprotrophic Kickxellales broadly lack many known pectinase families compared with saprotrophic Mucoromycota and are depauperate for SM but have similar numbers of predicted chitinases as mycoparasitic.
Environmental sequencing efforts indicate that much undiscovered fungal diversity lies within early diverging clades, but these species are often cryptic and challenging to collect. Molecular tools are important for understanding the biology of these fungi, but due to difficulties associated with culturing, obtaining adequate DNA can be a limiting factor. Our analyses of Kickxellomycotina fungal genomes show that useful genetic data can be obtained from samples with as little as 600 ng of DNA. These data can be used as input for phylogenomic inference as well as preliminary, clade-based comparisons of functional annotations. Our results illustrate the utility of low coverage genomic data for uncovering cryptic species diversity, generating new evolutionary hypotheses, and inferring potential ecological roles.
Introduction
Understanding fungal evolution is challenging due to the paucity of fungal fossils, the cryptic and microscopic nature of many fungi, and the large number of estimated fungal taxa (2.2–3.8 million species—Hawksworth and Lücking 2017). Even the most obvious macroscopic form, the fungal fruiting body, is a visible yet ephemeral portion of a lifecycle that mainly takes place below-ground or in other hidden habitats (e.g., inside another organism as a symbiont). DNA sequencing technology has transformed fungal systematics and simultaneously increased our estimates of fungal diversity. Molecular phylogenetic studies have also revealed prolific convergent evolution in many ecological niches (e.g., mycoparasitism, lichenization, ectomycorrhizal associations) (Grube and Wedin 2016; Ahrendt et al. 2018; Chang et al. 2019) and physical structures (e.g., hypogeous truffle-like forms, losses of flagella) (Bonito et al. 2013; Galindo et al. 2021). Fungal genome sequencing efforts like the 1,000 Fungal Genomes Project (1000.fungalgenomes.org) and the Zygomycetes Genealogy of Life Project (zygolife.org) have made great strides towards an inclusive fungal phylogeny, particularly by increasing taxon sampling among the early diverging fungal lineages. Species of zygomycetes are the earliest diverging fungi to evolve a hyphal growth form and were previously classified within a single phylum (Zygomycota), but phylogenomic analyses demonstrated that they constitute two monophyletic phyla: Mucoromycota and Zoopagomycota (Spatafora et al. 2016), although other phylum-level relationships have been proposed (e.g., Li et al. 2021; Strassert and Monaghan 2022). Mucoromycota comprise primarily plant-associated and saprotrophic species, whereas Zoopagomycota includes mostly parasites of fungi and animals. Zoopagomycota species are less frequently collected, and many are difficult to culture and maintain in the lab, so members of this group have been understudied and less frequently included in phylogenetic analyses (Benny et al. 2016). However, technological advances such as single-cell genome sequencing have generated additional data for challenging parasitic taxa (Ahrendt et al. 2018; Davis et al. 2019). Genome-scale data have also facilitated new insights into the biology of Zoopagomycota fungi by elucidating metabolic pathways and secondary metabolite production (Ahrendt et al. 2018; Tabima et al. 2020), the evolution of plant cell wall degrading enzymes and small secreted proteins (Chang et al. 2015, 2022), and horizontal gene transfer events (Tabima et al. 2020; Wang et al. 2016a). Zoopagomycota are further divided into three subphyla: Entomophthoromycotina (insect parasites), Zoopagomycotina (parasites of fungi and small animals such as nematodes), and Kickxellomycotina (several trophic strategies, see below) (Hibbett et al. 2007; Spatafora et al. 2016). Phylogenomic analyses incorporating these taxa found that Kickxellomycotina and Zoopagomycotina are both monophyletic lineages, whereas Entomophthoromycotina is polyphyletic (Davis et al. 2019; Li et al. 2021). However, most studies have included only a few species from each lineage.
Kickxellomycotina are an interesting target for evolutionary studies because the group includes a monophyletic lineage of mycoparasites (Dimargaritales) and a mixture of nonmonophyletic saprotrophs (Kickxellales, Ramicandelaberales, Spiromycetales) and arthropod gut symbiont lineages (“trichomycetes”) (Asellariales, Barbatosporales, Harpellales, and Orphellales) (Doweld 2014; Tretter et al. 2014; White et al. 2018). Most Kickxellales species are apparently rare in the environment and some taxa have been reported only once or a few times (e.g., Dipsacomyces acuminosporus, Mycoëmilia yatsukahoi, Spirodactylon aureum) (Benjamin 1959, 1961; Kurihara et al. 2004). The exception is the genus Coemansia, which is the most commonly encountered genus of Kickxellales from soil and dung samples (Benny et al. 2016). Species of Dimargaritales parasitize fungal hosts in the Mucoromycota (e.g., Cokeromyces) and Ascomycota (e.g., Chaetomium) via a specialized apparatus that includes a penetration cell (appressorium) and a biotrophic absorption cell (haustorium) (Benjamin 1959, 1961). Dimargaritales species are almost exclusively found on dung rather than soil and are less common than Kickxellales species. Species of Harpellales live as commensals in the digestive tracts of immature aquatic insects where they attach to the host via a specialized holdfast cell or glue-like material (Lichtwardt et al. 2001). Asellariales, Barbatosporales, Harpellales, and Orphellales are generally ecologically and morphologically similar to Harpellales. However, these fungi inhabit different hosts, have some unique morphological characters, and all four arthropod-associated groups are hypothesized to represent three independent evolutionary events leading to arthropod symbiosis (Tretter et al. 2014; Valle and Cafaro 2008; White et al. 2006b, 2018).
Phylogenetic studies based on rDNA or multigene datasets support the sister relationship between Kickxellales and Orphellales and also between Harpellales and Asellariales, whereas Dimargaritales and Ramicandelaberales have been resolved in the earliest-diverging clade (Tretter et al. 2014; White et al. 2006a, 2006b, 2018). However, phylogenetic placement of Dimargaritales has been hampered by limited taxon sampling in genomic studies (Li et al. 2021; Ahrendt et al. 2018; Davis et al. 2019) and topology conflicts in multigene studies (Tretter et al. 2014). Phylogenies of Kickxellales based on rDNA data have also found polyphyly among genera, with the monotypic genera Kickxella (the type genus of the order) and Spirodactylon nested within the speciose genus Coemansia (Chuang et al. 2017; White et al. 2018). Additionally, rDNA data were not able to fully resolve relationships among some Coemansia taxa (Chuang et al. 2017).
Recent comparative genomic analyses indicate that species within different lineages of Kickxellomycotina have vastly different genome sizes and predicted secondary metabolites (Ahrendt et al. 2018; Tabima et al. 2020). Kickxellales and the mycoparasites in Dimargaritales have notably smaller estimated genome sizes (∼20–30 Mb) than the insect-symbiotic Harpellales (∼70–100 Mb). Dimargaris cristalligena (Dimargaritales), Linderina pennispora (Kickxellales), and Martensiomyces pterosporus (Kickxellaes) had more predicted secondary metabolites than all other analyzed Zoopagomycota except for species of Basidiobolus (Entomophthoromycotina). In contrast, Coemanisa mojavensis, C. reversa, and C. spiralis (Kickxellales) had among the fewest secondary metabolites (Tabima et al. 2020). The predicted secondary metabolite classes were also variable among the different lineages, with D. cristalligena having more nonribosomal peptide synthetases (NRPS) and Kickxellales having more polyketide synthases (PKS) (Tabima et al. 2020). Ahrendt et al. (2018) analyzed genomes from various mycoparasites and found that some subtilase genes of D. cristalligena formed a distinct group, whereas the remaining sequences grouped with other Zoopagomycota taxa. Similar to other mycoparasites, D. cristalligena lacked several enzymes usually present in the metabolic pathways of related saprobic fungi (e.g., enzymes in the thiamine synthesis and sulfate reduction pathways). Finally, Zoopagomycota fungi are generally assumed to be haploid, but single-nucleotide polymorphism comparisons between single-cell genomes versus genomes generated from multiple cells indicated that D. cristalligena is likely not haploid (Ahrendt et al. 2018). These findings point to the uniqueness of D. cristalligena relative to other Kickxellomycotina and illustrate various genomic features among the different lineages which might be related to their diverse ecological niches.
Although full coverage reference sequences paired with transcriptomic data are the ultimate goal for genome sequencing, this requires large amounts of pure, high-quality DNA and RNA. Single-cell genomics methods are a useful alternative for unculturable or fastidious species, but this approach is relatively expensive, sensitive to contamination, and prone to various sequencing artifacts (Pinard et al. 2006; Sabina and Leamon 2015; Gawad et al. 2016). An alternative approach is low coverage genome sequencing (LCGS) using a high-throughput, short-read technology, such as Illumina (e.g., Zhang et al. 2019; Liimatainen et al. 2022). Here we define LCGS as genome sequencing based on short-read technology without a corresponding transcriptome or long read scaffold backbone. We do not include the number of reads per base in our definition due to the small expected genome size of our species; therefore, we anticipate high reads per base coverage that may not correspond to a high-quality assembly (e.g., low BUSCO complete scores, large number of contigs). The quality of assembly can also depend on the repetitive sequence content, which is quite low for the Kickellomycotina (2–6%) (Ahrendt et al. 2018; Amses et al. 2022; Chang et al. 2015; Chang et al. 2022; Mondo et al. 2017; Wang et al. 2016a, 2016b). There are several conditions under which LGCS would be a more appropriate choice than other genome sequencing methods, including: 1) phylogenetic reconstruction is the main goal of data analysis, 2) the genome sizes of the target species are small and high coverage can be achieved with short read sequencing, and 3) many isolates will be sequenced (and therefore cost is an important consideration). The LGCS method has proven useful in phylogenomic studies of another zygomycete lineage, Mortierellaceae (Mucoromycota) (Vandepol et al. 2020). Most Mortierellaceae species have a ∼40 Mb genome and conflicts between morphology and molecular phylogenies had caused taxonomic confusion within the group (Petkovits et al. 2011). A combination of LGCS and multigene data from 318 Mortierellaceae isolates was used to resolve the phylogeny and disentangle several longstanding taxonomic problems (Vandepol et al. 2020).
Kickxellomycotina are an ideal candidate for phylogenomic analyses using LGCS. Most saprotrophic Kickxellomycotina species have genomes of ∼15–25 Mb, many species can be grown in pure or dual cultures, and relationships within and between several major lineages remain unresolved. We used Illumina sequencing to generate LGCS data for 171 Kickxellomycotina isolates (including 158 Coemansia representatives) and we also used PacBio to generate a full coverage, reference genome (including a transcriptome) for Pinnaticoemansia coronantispora CBS 131509 (Kickxellales). We had several hypotheses regarding methodology, phylogenetic topology, and functional annotation. As far as methodology, we hypothesized: 1) we would achieve high coverage due to small predicted genomes thus allowing us to retrieve hundreds of marker genes for phylogenetic reconstruction and preliminary functional analyses, and 2) reconstruction using different marker sets (i.e., BUSCO vs. Orthofinder) and analysis methods (i.e., maximum likelihood vs. coalescent-based) would yield the same topology due to the high number of orthologs recovered. Regarding phylogenetic reconstruction, we hypothesized: 1) Coemansia would be polyphyletic (with Kickxella and Spirodactylon nested within), 2) species complexes within Coemansia and the placement of Dimargaritales and Ramicandelaberales would be resolved, and 3) arthropod-associated taxa would remain as separate monophyletic clades (as found previously). For functional annotation, we hypothesized: 1) all species of Dimargaritales would have greater numbers of predicted secondary metabolites and proteases than other Kickxellomycotina, 2) Coemansia species would be depauperate in these enzyme predictions, and 3) the enzymatic profiles of Coemansia species would be consistent with a saprotrophic lifestyle (i.e., higher proportion of carbohydrate active enzymes than proteases).
Results
Nine-gene phylogeny
To create a dataset that could incorporate sequences from species lacking genome data, we mined our LGCS data (supplementary table S1, Supplementary Material online) for a set of eight marker genes (18S, 28S, β tubulin, EF1α, MCM7, RPB1, RPB2, and TSR1) used by Tretter et al. (2014) (plus one additional gene, actin) using the aTRAM (Allen et al. 2015) and PHYling pipelines (Stajich, http://github.com/stajichlab/PHYling_unified). The PHYling pipeline recovered most of the nine target genes from the majority of isolates in the LGCS dataset. Of the 171 analyzed genomes, PHYling identified RPB1 from the most isolates (n = 171) and EF1α from the fewest (n = 147). Using the aTRAM pipeline, 18S and 28S rDNA data were recovered from all Coemansia and Kickxella isolates, but not from the Dimargaritales. Only host genes were recovered from Dimargaritales samples, so no rDNA data were included for those isolates. After adding sequences downloaded from GenBank, the final alignment contained 227 representatives of Kickxellomycotina. Once unaligned regions were excluded, the final concatenated alignment contained 5,571 characters and 24.1% missing data. A few taxa (e.g., Myconymphaea yatsukahoi) only had rDNA sequences available. The best Maximum-Likelihood (ML) tree is shown in fig. 1. Four main clades of Coemansia were recovered and we refer to them as Coemansia, Kickxellales1, Kickxellales2, and Kickxella. The subtending nodes of the Coemansia, Kickxellales1, and Kickxellales2 clades were unsupported. Spirodactylon aureum NRRL 2810 was placed within the Coemansia clade (sister to Coemansia sp. RSA 552), recapitulating the topology of the rDNA tree from Chuang et al. (2017). The isopod-associated Asellaria ligiae (Asellariales) and blackfly-associated Barbatospora ambicaudata (Barbatosporales) were placed as early diverging branches within the insect-associated Harpellales clade and the Harpellales + Asellariales + Barbatosporales were recovered as sister to the coprophilous Spiromycetales. The stonefly-associated Orphellales were sister to the Harpellales + Spiromycetales. The mycoparasitic Dimargaritales were the earliest diverging lineage in the Kickxellomycotina, followed by the putatively saprotrophic Ramicandelaberales as sister to all other orders. Although the Dimargaritales were recovered as a monophyletic group, Dimargaris and Dispira were polyphyletic. Taxonomic emendations are in the supplemental files, Supplementary material online.

RAxML tree inferred from seven protein coding genes (actin, β tubulin, EF1α, MCM7, RPB1, RPB2, TSR1) and two rDNA (18S and 28S) genes. The LG model was used for amino acids and GTRCAT for nucleotides. Bolded branches had ≥75% bootstrap support. Major clades are outlined with boxes, isolates represented by low coverage genome data are highlighted and data downloaded from the JGI or GenBank are in black. The arrows highlight the placement of S. aureum as well as the type species of Coemansia and Kickxella. The star highlights the branch subtending the insect-associated groups (Asellariales, Barbatosporales, Harpellales, Orphellales). Images depict examples of host organisms of selected clades.
Genome assembly
Low coverage genome data were obtained for 171 isolates, including 158 isolates originally identified as Coemansia and six Dimargaritales, with 18 of the 171 being type cultures. Assembly quality was variable (fig. 2, supplementary table S1, Supplementary Material online), with the number of contigs ranging from 273 to 10,284, BUSCO completeness ranging from 38.8% to 91.8% (median 73.2), and average coverage ranging from 17.9 × to 827.5 × (an outlier), median 181×. Genome assembly length was relatively consistent for the Kickxellales species, with most in the expected ∼15–25 Mb range. The assembly size for the Dimargaritales species was larger, up to 50.78 Mb (Tieghemiomyces parasiticus), even after VizBin filtering (supplementary fig. S1). Lower quality (i.e., more fragmented) and higher quality assemblies were more-or-less randomly distributed across the tree (supplementary fig. S2, Supplementary Material online). Correlation plots between the genome assembly variables showed that several variables were significantly correlated (fig. 3). The number of predicted genes was significantly positively correlated with the number of contigs and BUSCO fragmented and duplicated scores. The BUSCO duplicate score was also positively correlated with the BUSCO complete scores. Average coverage was positively correlated with both the BUSCO complete scores and the number of PHYling markers recovered but was negatively correlated with the number of contigs in the assembly and the number of predicted genes. The BUSCO fragmented scores were negatively correlated with the number of PHYling markers recovered.

Genome assembly summary for the low coverage genome data, including number of contigs in the assembly, total length of the assembly (in megabases, Mb), the number of fragmented and complete genes from BUSCO analyses, average coverage, number of predicted genes from the annotation, and number of marker genes recovered by PHYling out of a total of 758.

Correlation matrix showing pairwise Pearson correlation values for the relationships between average coverage, contig count, BUSCO scores (percent complete, fragmented, and duplicated), number of predicted genes, and number of marker genes recovered by PHYling out of a total of 758. The correlation coefficient is shown and the asterisks indicate significant values (*** = P < 0, ** = P < 0.001, * P < 0.01, . = P < 0.05). The line indicates the regression line with shaded 95% confidence intervals. Graphs along the diagonal show the distribution of each variable. Coemansia sp. IMI 209128 was an outlier in the average coverage (827.5 × coverage) so it was removed prior to analyses.
Phylogenomic reconstruction
We used two different reconstruction methods including FastTree approximately ML (Price et al. 2010) and ASTRAL coalescent-based analyses (Sayyari and Mirarab 2016; Zhang et al. 2018a). In addition, we used OrthoFinder (Emms and Kelly 2019) to generate an alternate set of orthologs that was analyzed with FastTree. Single-gene alignments and gene trees from the BUSCO ortholog set were evaluated using PHYkit (Steenwyk et al. 2021). Correlation plots showing relationships between alignment and gene tree variables are shown in supplementary fig S3, Supplementary Material online. The mean bipartition support of the gene trees was most strongly positively correlated with alignment length, followed by the percent of variable sites and the percent of parsimony informative sites. Most alignments were 1500 bp long or less and included >100 taxa. Alignments with more spurious taxa (defined as taxa on branches that are ≥ 20 times the median length of all branches, Shen et al. 2018) generally included fewer total taxa than those with fewer spurious taxa. Likewise, trees with fewer spurious taxa had higher mean bipartition support. Greater numbers of spurious taxa were also associated with lower percentages of variable and parsimony informative sites and lower treeness scores. The majority of the dataset had no spurious taxa (supplementary fig. S3, Supplementary Material online).
The FastTree phylogeny based on the concatenated amino acid alignment of 511 BUSCO marker genes and 193 taxa is shown in supplementary fig. S4, The ASTRAL coalescent-based species tree was made from the best ML trees from 461 of the 511 gene trees. There were 50 fewer input gene trees than alignments due to the poor quality of some gene trees (e.g., too few taxa, no resolution among clades). The same four main clades of Kickxellaceae found in the nine-gene phylogeny were also recovered with this larger dataset using both the concatenated ML and coalescent-based analyses. The topologies of the ASTRAL and concatenated ML trees were similar overall to the nine-gene phylogeny (fig. 1) except that Spiromycetales were placed as sister to the “other Kickxellales” + Kickxellaceae clades (rather than sister to Harpellales). All branches received full support in both the ASTRAL and concatenated analyses, except where noted (fig. 4, supplementary figs. S4 and S5, Supplementary Material online). ASTRAL quartet scores for the deep nodes of the tree indicated that a large proportion of gene trees had alternative topologies (fig. 4). Nonetheless, in all cases the main topology received ≥98 local posterior support.

ASTRAL coalescent-based species tree compiled from 461 RAxML gene trees. Major clades are indicated by boxes and isolate names in bold are type cultures. Isolates in black text are high-quality reference genomes downloaded from the JGI or GenBank, whereas those that are highlighted are low coverage genomes. Arrows indicate the type species for the genera Coemansia and Kickxella. All unlabeled branches are fully supported. Because ASTRAL analyzes quartets, each node has three possible topologies. Pie charts at selected nodes represent the proportion of gene trees with the main topology (q1), alternate topology 1 (q2), and alternate topology 2 (q3).
The Orthofinder analyses found that 98.3% of genes (out of 1,370,791 total for all isolates) were placed among 21,993 orthogroups. One percent of genes were placed in species-specific orthogroups, whereas 1.7% of genes were unplaced. Seventy-eight orthogroups had representative genes from 193 taxa and the concatenated alignment was comprised of 101 orthogroups. Unlike the ASTRAL and concatenated ML trees, the OrthoFinder analyses (fig. 5) recovered the same placement of Spiromycetales as the nine-gene phylogeny (sister to Harpellales). However, the remaining topology of the OrthoFinder tree was the same as the other trees. Venn diagrams depicting the overlap in orthogroups that were detected in each clade found 5,263 (50.4%) orthogroups in common among the Kickxellaceae clades (supplementary fig. S5A, Supplementary Material online) but only 3,648 (18.5%) orthgroups in common among all of the Kickxellomycotina taxa (supplementary fig. S5B, Supplementary Material online). The Kickxellaceae clades had between 883 (8.5%) and 1,329 (12.7%) clade-specific orthogroups (supplementary fig. S5A, Supplementary Material online), whereas the other orders of Kickxellomycotina had between 805 (4.1%) and 6,173 (31.3%) clade-specific orthogroups (supplementary fig. S5B, Supplementary Material online). The rarefaction plot of sampled orthogroups by clade (supplementary fig. S6, Supplementary Material online) showed that sampling for the Coemansia clade was the most complete, followed by the Kickxellales2 clade. The remaining clades were undersampled in comparison.

OrthoFinder phylogeny based on 101 orthogroups aligned with MUSCLE and reconstructed with FastTree Approximately ML. All unlabeled branches are fully supported. Isolates in black text are high-quality reference genomes downloaded from the JGI and GenBank, whereas highlighted text are low coverage genomes. Arrows indicate the type species for the genera Coemansia and Kickxella.
Genome annotation
Functional annotation of secondary metabolites (SM), select proteases, and CAZymes were analyzed for each major clade (fig. 6). All four of the Kickxellaceae clades (i.e., Coemansia, Kickxellales1, Kickxellales2, and Kickxella) had similar types and numbers of SM (fig. 6A). Each of these four clades had low numbers of NRPS, NRPS-like, siderophores, terpenes, and beta-lactone-containing protease inhibitors. The number of identified subtilases and fungalysins (0–9) was likewise consistent across these four clades. The “other Kickxellales” also had few SM identified but varied widely in the number of proteases and subtilases. As expected based on the one previously analyzed genome sequence (Ahrendt et al. 2018), the mycoparasitic Dimargaritales had much higher numbers of NRPS and NRPS-like SM, as well as higher numbers of subtilases. However, Ramicandelaber brevisporus (putative saprotroph) had the highest number of subtilases of any isolate. The number of fungalysins detected in the mycoparasitic Dimargaritales genomes was not much greater than for the “other Kickxellales” (fig. 6A). The arthropod-associated Harpellales had low numbers of all SM and fungalysins, but relatively high numbers of subtilases. The most fungalysins (n = 19) were detected from D. acuminosporus (putative saprotroph, “other Kickxellales”), followed by Tieghemiomyces parasiticus (n = 16) and Dimargaris xerosporica (n = 12) (mycoparasites, Dimargaritales). Tieghemiomyces parasiticus had the most subtilases (n = 40) followed by D. cristalligena RSA 1219 (n = 28) (Dimargaritales). No beta-lactone containing protease inhibitors were predicted for any isolates outside of the four Kickxellaceae clades.

(A) Box plots of predicted secondary metabolites and selected proteases for the major clades of Kickxellomycotina. (B) Bar graphs showing the numbers of genes annotated to various CAZyme families for each of the major clades. Note that the y axis scale differs for each plot in A and B and that siderophores are mostly encoded by NRPS domains. CBM, carbohydrate binding module.
The CAZyme profiles were also similar across the four Kickxellaceae clades (fig 6B). Although the total numbers differed due to sampling intensity and varying assembly quality within each clade, the relative proportion of each CAZyme family was consistent. Of the families containing known cellulases and pectinases, only glycoside hydrolase (GH) family GH5 and GH3 were detected. However, GH5 was one of the most abundant families among the Kickxellaceae clades. None of the major pectinase-containing gene families (GH28, GH53, GH93), polysaccharide lyases (PL) (PL1, PL3, PL4, PL11), or carbohydrate esterases (CE) (CE8 and CE13) were detected from the Kickxellaceae clades. Kickxellaceae isolates had unexpectedly high numbers of chitinases and laccase-like oxidases, but completely lacked any PL (supplementary table S1, Supplementary Material online). In the entire low coverage dataset, only Linderina, Mycoëmilia, and Spiromyces had any PL identified. The Dimargaritales had lower numbers of CAZymes than Kickxellales across all families of enzymes except for carbohydrate binding module 18 and auxiliary activity family 11, both of which have known chitinase functions (Hartl et al. 2012).
We plotted the CAZyme:protease ratio of each major lineage (fig. 7) to compare results between lineages and also compare our results to those obtained by Ahrendt et al. (2018), who found parasitic species generally had a higher ratio of proteases to CAZymes. The linear model tests found significant relationships between these variables for all clades except “other Kickxellales”, a nonmonophyletic group for which there were only five representatives. Furthermore, the R2 values were 0.6267 or greater for three of the Kickxellaceae clades, Harpellales, and Dimargaritales, but the correlation was lower for the Coemansia clade (R2 = 0.3663).

Linear regressions for the number of CAZymes (as predicted by InterProScan against the dbCAN database) versus the number of proteases (as predicted by InterProScan against the MEROPS database) for low coverage genome data by clade. The R2 value is given for each association and bolded values are statistically significant (*** = P < 0, ** = P < 0.001, * P < 0.01).
Discussion
LCGS data for phylogenomics and comparative analyses
We used LCGS methods to generate data for 171 Kickxellomycotina isolates, including six mycoparasitic species of Dimargaritales grown from dual cultures that included both the parasites and their host fungi. Our assembly statistics (figs. 2 and 3) and alignment statistics from the BUSCO marker genes (supplementary fig. S3, Supplementary Material online) reveal several interesting features. First, there was a wide variation in quality of both the genome assemblies and the gene alignments. Although average coverage was significantly negatively correlated with the number of contigs and BUSCO fragmented scores, the correlation (∼ −0.23 for both) was weaker than anticipated (fig. 3). Zhang et al. (2018a) also found a positive correlation between sequencing coverage and BUSCO complete scores, but with high variation among taxa depending on genome size. These findings demonstrate that even for genomes that are small and in a similar size range (∼15–30 Mb), higher coverage does not automatically result in a high-quality assembly. Despite having a median coverage of 181×, our BUSCO complete scores had a median of 73.2. One possible explanation for this discrepancy is amplification and sequencing bias resulting in unequal representation across the genome, which is significantly affected by G:C ratio and can vary based on the specific kits and methods used (e.g., Rhodes et al. 2014, Sato et al. 2019, Modlin et al. 2021). Future work evaluating these biases for fungal genomes specifically will help improve methodology and interpret patterns of coverage versus assembly.
Second, the number of predicted genes was significantly positively correlated with the BUSCO duplicate scores, BUSCO fragmented scores, and contig count, but negatively correlated with average coverage and BUSCO complete scores (fig. 3). This suggests that some of the more fragmented assemblies likely had more erroneous gene predictions. However, several of the isolates with the highest BUSCO duplicate scores were species of Dimargaritales that also had relatively low fragmented scores (Dispira simplex: 59.4% dup., 2.4% frag., Tieghemiomyces parasiticus: 66.4% dup., 2.4% frag.). Dispira simplex and T. parasiticus were the only two isolates with >50% duplicate scores. The one previously published Dimargaritales genome (D. cristalligena RSA 468) was suggested to be nonhaploid (Ahrendt et al. 2018); therefore, it seems probable that these high duplicate scores for the Dimargaritales may indicate a nonhaploid state for these taxa. Although there are still relatively few reports of whole-genome duplications across Fungi (Albertin and Marullo 2012), there is evidence for this phenomenon among Mucoromycota fungi (Ma et al. 2009; Corrochano et al. 2016). Wang et al. (2018) also suggested genome duplication as a possible explanation for the much larger genome sizes of Harpellales gut fungi compared with other Kickxellomycotina. Recently, estimates of heterozygosity based on genome sequence data suggested that several Zoopagomycota species were at least diploid, including C. reversa, L. pennispora, M. pterosporus, and R. brevisporus (Amses et al. 2022).
Finally, although only 36.3% of our assemblies had BUSCO complete scores ≥80% (supplementary fig. S2, Supplementary Material online), we were still able to retrieve hundreds of marker genes from even the poorest assemblies for phylogenomic reconstruction (fig. 2). Quality among the 511 marker gene alignments was also variable (supplementary fig. S3, Supplementary Material online), but the different phylogenetic reconstruction methods nonetheless largely resulted in the same topologies for the various phylogenies we reconstructed (fig. 4, supplementary fig. S4, Supplementary Material online). The LCGS data also worked well as input for OrthoFinder analyses, which provided a different set of loci for reconstruction (fig. 5). Interestingly, the OrthoFinder analysis was the only method that recovered the same relationships between “other Kickxellales” and placement for Spiromycetales (sister to Harpellales) as the nine-gene dataset (which had a more complete taxonomic representation), supporting the idea that OrthoFinder analyses may be more robust to missing taxa in addition to missing genes (Emms and Kelly 2015).
Phylogenetic relationships among Kickxellomycotina
Our nine-gene phylogeny reconstructed some unexpected relationships among the trichomycetes (Asellariales, Barbatosporales, Harpellales, Orphellales) and the Spiromycetales. Contrary to the eight-gene analyses by Tretter et al. (2014) which resolved the insect-symbiotic Kickxellomycotina in several clades, our analyses recovered a monophyletic clade of insect gut fungi, with Asellariales, Barbatosporales, Harpellales, and Orphellales resolved as one evolutionary unit (fig. 1). Spiromycetales are nested within the gut fungi clade in our analyses, rather than as a separate clade as found previously (Tretter et al. 2014; White et al. 2006b). Species of Spiromyces and Mycoëmilia scoparia are considered saprotrophic due to their growth in axenic culture, but their trophic modes require further empirical testing. Mycoëmilia scoparia was originally isolated from soil containing dead isopods and grew vigorously on several different nutrient media (Kurihara et al. 2004). This rapid growth in axenic culture suggests that M. scoparia is a saprobe, but there is wide variation in the saprobic ability of insect-symbiotic fungi in Kickxellomycotina. For example, some Harpellales gut fungi that are considered obligate symbionts of insects can grow well in axenic culture (i.e., Smittium spp., Wang et al. 2017), whereas Asellariales and Orphellales gut fungi that inhabit isopod and stonefly nymph hosts have not been successfully cultured (Valle and Cafaro 2008; White et al. 2018). It is possible that M. scoparia is an intermediate form that is a facultative symbiont of isopods with saprotrophic capabilities. A recently described genus and species, Unguispora rhaphidophoridarum, matches this intermediate form which the authors term an “amphibious” life cycle (Ri et al. 2022). Unguispora adheres to the proventriculus of cave crickets and produces secondary spores shed into the cricket gut which subsequently grow saprophytically on cricket dung (Ri et al. 2022). However, U. rhaphidophoridarum was placed within the Kickxellaes (sister to Linderina) rather than the Spiromycetales in the molecular phylogeny. On the other hand, Spiromyces spp. have only been isolated from rodent dung, and they exhibit fastidious growth in culture but improved growth in mixed cultures containing bacteria and other dung-associated organisms (Benjamin 1963; O’Donnell et al. 1998). Interestingly, S. aspiralis had a higher protease:CAZyme ratio compared with M. scoparia (fig. 6), which is a profile more similar to biotrophs, but additional data are needed to interpret this finding.
If our topology more accurately reflects the relationships among these orders of arthropod-associated and putative saprotrophic fungi, then it implies a different evolutionary scenario than what was inferred by Tretter et al. (2014). The topology recovered by Tretter et al. (2014) suggested that the arthropod gut fungi likely evolved independently at least three times within Kickxellomycotina. In contrast, our topology suggests alternative hypotheses: 1) a single origin of biotrophic gut fungi with a putative reversion to saprotrophy (Spiromycetales), or 2) two independent origins of gut fungi with the Spiromycetales retaining the ancestral saprotrophic state. Either of these scenarios would be more parsimonious than three independent origins of biotrophy with arthropods. However, given the poor taxon sampling of taxa in the “other Kickxellales”clade (supplementary fig. S6, Supplementary Material online) and a lack of sequence data for many arthropod gut fungi, inference about the evolutionary origins of these fungi will benefit from additional genome sequencing. Future sequencing efforts paired with ancestral state reconstructions will help to resolve these questions.
One of the other main hypotheses we aimed to test was regarding the relationship between Dimargaritales, Ramicandelaberales, and the Kickxellales. Previous analyses suggested that either Dimargaritales and Ramicandelaberales were sister taxa (Tretter et al. 2014) or that Dimargaritales were the earliest diverging lineage in Kickxellomycotina and that Ramicandelaberales was sister to the remaining taxa (Davis et al. 2019). However, these previous phylogenomic analyses (e.g., Ahrendt et al. 2018; Chang et al. 2015; Davis et al. 2019; Spatafora et al. 2016) included sampling from across Kingdom Fungi and had limited taxon sampling from Kickxellomycotina. In an eight-gene analysis focused on Kickxellomycotina, Tretter et al. (2014) had greater taxon sampling, but were still unable to resolve the placement of Dimargaritales and Ramicandelaberales. Additionally, we aimed to test the topology of the species complex comprised of several Coemansia species found by Chuang et al. (2017). All analyses of our 171 low coverage genomes resolve Dimargaritales as the earliest diverging lineage in the subphylum and Ramicandelaberales as sister to the rest of Kickxellomycotina (figs. 1, 4, 5, supplementary fig. S4, Supplementary Material online). As far as the species complex, Coemansia pectinata (IMI 142377), C. furcata, and C. “aciculifera” from Taiwan were all placed in separate, monophyletic clades (Kickxellales2 clade). Unfortunately, we were unable to obtain successful LGCS for C. pennisetoides but the rDNA data included in the nine-gene dataset placed it in a separate clade from the rest.
Among the other saprotrophic species, our analyses revealed that Kickxellaceae isolates are divided among four large, monophyletic clades (“Kickxellaceae clades”). These relationships solidify previous findings that Kickxella and Spirodactylon are nested within Coemansia sensu lato (fig. 1) (Chuang et al. 2017). There were no broad phylogenetic patterns associated with morphology, geography, or substrate across these clades. There is a possibility that some species are endemic to certain regions because a few clades were comprised entirely of isolates from one country (supplementary table S2, Supplementary Material online), but this cannot be confirmed due to sampling bias. Nonetheless, as with Mortierellaceae, LCGS data provided resolution where morphology and rDNA data could not (Vandepol et al. 2020). We have begun the process of reorganizing the taxonomy of Kickxellaceae to reflect these phylogenetic relationships by emending the genera Coemansia and Kickxella and transferring the type of Spirodactylon (S. aureum) to Coemansia (i.e., synonymizing Spirodactylon to Coemansia) (Supplementary Table S2, Supplementary Material online). However, much work remains to be done, including creating new genera for the Kickxellales1 and Kickxellales2 clades, new species descriptions for the many unidentified Kickxellaceae, and reorganization of Dimargaritales genera to recognize monophyletic relationships.
Finally, the orthogroup rarefaction plot (supplementary fig. S6, Supplementary Material online) indicates that all the clades except Coemansia are undersampled. Our sampling includes representatives of all putative saprotrophic genera of Kickxellales (except Myconymphaea). Therefore, the incomplete sampling indicated in the rarefaction plot suggests that there is likely significant undiscovered diversity in these groups. For example, several of the genera within the “other Kickxellales” clade are monotypic and/or are rarely reported in the literature. The apparent rarity of these taxa may be tied to their geographic distributions; many of these fungi are from tropical sites that remain highly undersampled. The monotypic D. acuminosporus was collected from Honduras (Benjamin 1961), Linderina spp. have been collected from China, India, and Liberia (Raper and Fennell 1952; Chang 1967), and the monotypic M. pterosporus is from the Democratic Republic of the Congo (Meyer 1957) (supplementary table S2, Supplementary Material online). Any future microfungal biodiversity surveys in these areas will certainly uncover many new Kickxellomycotina species and lineages.
Functional annotations of genomes from across the Kickxellomycotina phylogeny
Previous reports proposed that some species of Kickxellales may not be saprotrophic. Mycoparasitic interactions have been suggested for C. reversa, which was found growing on Isaria spp. (Ascomycota) (Bainier 1906; Linder 1943) after its original description from rat dung (van Tieghem and Le Monnier, 1873). Martensella corticii is considered a mycoparasite of Vesiculomyces citrinus (=Corticium radiosum) (Basidiomycota) (Linder 1943) and was consistently found only on this host during a large survey of Corticium species (Jackson and Dearden 1948). Although no molecular data or cultures exist for M. corticii, its morphology unambiguously places it within the Kickxellales rather than Dimargaritales due to the formation of multicelled sporocladia with pseudophialides (Linder 1943). Conversely, potential arthropod associations have been previously suggested for other Kickxellales species. For example, Linderina macrospora (NBRC 105416 = BTCC-F30) and C. javaensis (NBRC 105414 = BTCC-F33) grew and sporulated better on media supplemented with aphids (Kurihara et al. 2008) and Pinnaticoemansia coronantispora may inhabit the foregut of earwigs (Dermaptera) (unpublished data, Y. Degawa personal communication). All of these observations led us to investigate differences in the functional annotations of our Kickxellomycotina genomes to better understand the likely trophic modes among the clades.
Proteases and Secondary Metabolites
Fungalysins (MEROPS family M36) are metalloproteases that have been implicated in pathogenesis of opportunistic human pathogens (e.g., Aspergillus fumigatus, Cryptococcus neoformans) and are suggested to degrade extracellular matrix proteins like elastin and keratin (Markaryan et al. 1994; Brouta et al. 2002; Pombejra et al. 2018). Batrachochytrium dendrobatidis, the chytrid pathogen of amphibians, has a large expansion of the M36 gene family that were differentially expressed during different stages of the life cycle (Rosenblum et al. 2008). In plant pathogenic species (e.g., Colletotrichum graminicola), fungalysins function to cleave plant chitinases and may act as effectors (Sanz-Martín et al. 2016; Ökmen et al. 2018). An expansion of genes belonging to the M36 family was also detected in the nonpathogenic Coprinopsis cinerea (Basidiomycota) (Lilly et al. 2007) although the function in this saprotroph is unclear. In our dataset, the mycoparasitic Dimargaritales had more fungalysins than other Kickxellomycotina, which is consistent with the findings that the M36 family is often associated with pathogens. However, the species with the most predicted fungalysins (n = 19) was D. acuminosporus, a putative saprotroph. The D. acuminosporus assembly had a BUSCO complete score of 77.8, a duplicate score of 0.5, and a 6.1 fragmented score, so the assembly was of moderate quality. Due to the correlation between BUSCO fragmented scores and number of predicted genes (fig. 3), some of these predicted fungalysins may be erroneous. On the other hand, little is known about D. acuminosporus because it has only been isolated once from soil (Benjamin 1961; Young 1999), highlighting the need for further studies of the poorly understood ecology of many Kickxellomycotina.
The subtilase family (Pfam PF00082) is diverse and is the second largest serine protease family (Siezen and Leunissen 1997). Within fungi these enzymes have been associated with many different roles. Some of the best-characterized functions have been described for fungal pathogens and parasites (e.g., mycoparasitic and nematode-parasitic Hypocreales, Ascomycota) (Iqbal et al. 2018). Results from large-scale comparative genomics analyses have led to the hypothesis that subtilase gene family expansions are adaptations to utilize animal tissues as a food source, at least among Ascomycota (Muszewska et al. 2011). For example, the entomopathogenic fungus Metarhizium robertsii (Hypocreales) had 48 subtilase genes, which was the most out of 83 analyzed genomes (Li et al. 2017). A comparison of Zoopagomycota subtilases also found expansions among mycoparasitic species (Ahrendt et al. 2018), a result that was recapitulated with our data (fig. 6A). Kickxellales taxa only had a small number of predicted subtilases, but on average had more predicted subtilases than fungalysins. Species of Harpellales had the second highest average number of subtilases. The Harpellales are associated with insects but are considered commensal rather than pathogenic (except in the case of Smittium morbosum, Wang et al. 2013), and the putative role of subtilases in these associations remains unknown.
Fungi are also important producers of secondary metabolites (SM) that perform a variety of functions (e.g., toxins), some of which have application in pharmaceuticals (e.g., antibiotics, antifungals, statins) (Cox and Simpson 2009; Rokas et al. 2018). It has also been suggested that SM can function defensively to deter fungal grazers, such as invertebrates (Kempken and Rohlfs 2009). Most groups of SM are NRPS and NRPS-like, PKS and PKS-like, or terpene cyclases (Macheleidt et al. 2016). Before many genome sequences of early diverging fungi (i.e., Blastocladiomycota, Chytridiomycota, Microsporidia, Mucoromycota, Rozellomycota, Zoopagomycota) were available, little was known about SM in those groups, but they were generally thought to be deficient in SM (Bushley and Turgeon 2010). However, recent analyses of Zoopagomycota fungi revealed that Dimargaris cristalligena RSA 468 (Dimargaritales) had the second highest proportion of predicted SM in Zoopagomycota (after Basidiobolus meristosporus), followed by L. pennispora ATCC 12442 and M. pterosporus CBS 209.56 (Kickxellales) (Tabima et al. 2020). We found support for the hypothesis that Coemansia and Kickxella isolates would have relatively few predicted SM, with increased numbers in other Kickxellales, and the most SM among the Dimargaritales (Ahrendt et al. 2018; Tabima et al. 2020) (fig. 6A). We found that SM predictions were consistent across the Kickxellaceae and “other Kickxellales” clades with only one or two NRPS and NRPS-like clusters identified in all taxa. Almost none of the predicted SM from any isolate had similarity to known clusters (i.e., the antiSMASH “KnownClusterBlast” found no matches). As expected, Dimargaritales had far more predicted SM than all other Kickxellomycotina. Tieghemiomyces parasiticus had the most NRPS and NRPS-like proteins (n = 89), whereas Dispira simplex had the least (n = 25). In comparison, species of Aspergillus (Ascomycota) (characterized as a SM-rich genus) were predicted to have between 39 and 81 SM clusters (Inglis et al. 2013). Kickxellalaceae clades consistently had 1–3 terpenes and 1–5 siderophores. Forty-nine Kickxellalaceae isolates had one beta-lactone containing protease inhibitor detected. No beta-lactone containing protease inhibitors were predicted for any Kickxellomycotina in other clades, so these proteins may be an adaptation specific to Kickxellaceae. Many beta-lactone compounds have characterized antimicrobial and antitumor properties and they can target a wide array of substrates (Robinson et al. 2019), so these enzymes are of particular interest in pharmacological studies.
Carbohydrate Active Enzymes (CAZymes)
To gain a better understanding of the substrates Kickxellales fungi might be able to utilize for nutrition and growth, we analyzed the predicted CAZymes present in each major clade of Kickxellomycotina (fig. 6B). CAZymes are categorized into CE, carbohydrate binding modules (CBM), glycoside hydrolases (GH), glycosyl transferase (GT), polysaccharide lyases (PL), and auxiliary activities (AA). Similar to the results for SM and proteases, the relative proportion of predicted CAZyme families was fairly constant across taxa in Kickxellaceae clades, although the total numbers differed. We were specifically interested in pectinases, cellulases, and chitinases. Although previous analyses have found a complete loss of the pectinase-containing family GH28 in C. reversa NRRL 1564 (Chang et al. 2015) our analyses extended this finding and suggest a complete loss of GH28 across the entire subphylum (supplementary table S1, Supplementary Material online). Indeed, none of the major pectinase families (GH53, GH93, PL1, PL3, PL4, PL11, CE8, CE13) were detected among Kickxellomycotina genomes and no PL families of any kind were found from any isolates except Linderina spp., Mycoëmilia scoparia, and Spiromyces aspiralis. Similarly, out of the main cellulase families (AA9, GH5, GH6, GH7, GH12), only AA9 and GH5 were predicted across all the isolates. However, enzymes in the GH5 family are diverse and facilitate the degradation of other substrates, such as starches and other polysaccharides (Davies and Attia 2021). β-glucosidases from the GH1 and GH3 families are known to degrade cellobiose in solution (Sørensen et al. 2013; Payne et al. 2015) and the GH3 family was detected in nearly all isolates.
Chitinases are important for endogenous cell wall remodeling as well as utilization of exogenous chitin compounds (Seidl 2008; Hartl et al. 2012). However, chitinases are thought to play a role in mycoparasitism, for example among Trichoderma species (Ascomycota) (Benítez et al. 2004). Chitinases are known from families CE4, CE9, GH18, GH19, GH20, GH46, GH75, GH80, and AA11 (Latgé 2007; Seidl 2008; Battaglia et al. 2011). Ahrendt et al. (2018) reported family GH19 from Zoopagomycota for the first time and found that the AA11 family was only detected among mycoparasites. Furthermore, D. cristalligena RSA 468 was unique in having several CBM18 genes. Once again, CBM18 was found almost exclusively among the mycoparasitic Dimargaritales, but Coemansia mojavensis RSA 71 and Spiromyces aspiralis RSA 2271 also had one predicted CBM18 gene. Most isolates had several GH19 and CE9 and also numerous GH46 genes predicted, but GH75 and GH80 genes were absent. However, unlike the results from Ahrendt et al. (2018), we found an expansion of AA11 across all Kickxellomycotina, except in Harpellales which had few predicted AA11 genes (fig. 6B). Chitinases had not been previously characterized for Kickxellales fungi, but we found diverse chitinase enzymes in numbers similar to those found in Rhizopus oryzae (Mucoromycota) (Battaglia et al. 2011). Rhizopus oryzae is a generalist saprotroph capable of utilizing a wide variety of substrates. Finally, we also unexpectedly found numerous AA1 genes predicted in the Kickxellaceae clades. This family of enzymes contains known fungal laccases as well as ferroxidases and laccase-like multicopper oxidases. Fungal laccases are best characterized among Dikarya and can degrade a wide variety of compounds including phenolics (Baldrian 2005). The most well-known laccases are those from Basidiomycota white and brown rot fungi which break down lignin (families AA1_1 and AA2) and hemicellulose (Hage and Rosso 2021). The predictions for Kickxellomycotina did not specify which subfamily the AA1 genes might belong to, but the detection of AA2 (fungal class II peroxidases) was surprising because lignin degradation among early diverging fungi has been poorly characterized and has been considered unlikely (Floudas et al. 2012; Janusz et al. 2017). However, some reports have shown activity and/or the presence of these and other ligninolytic enzymes in Mucoromycota (e.g., R. oryzae—Freitas et al. 2009, Mucor racemosus—Bonugli-Santos et al. 2009) and Chytridiomycota (Lange et al. 2019) but were missing from mycorrhizal Endogonaceae (Mucoromycota) genomes (Chang et al. 2019). Unfortunately, the specific enzyme families involved with lignin decay have not been well characterized in most early diverging fungal species. Interestingly, this is the first report of these putative lignin-degrading enzymes among any Zoopagomycota taxa and therefore warrants further investigation.
The patterns that we detected in the CAZyme:protease ratios (fig. 7) were consistent with the known trophic ecology of Kickxellomycotina. The ratios for the Kickxellales more closely match saprotrophic profiles (with greater CAZymes than proteases), whereas Dimargaritales and Harpellales have the opposite profile which aligns with other parasitic and animal-associated species (Ahrendt et al. 2018). Taken together, these results suggest that Kickxellales (particularly taxa in Kickxellaceae) have generalist saprotrophic capabilities, although the potential for facultative interactions with other fungi or insects cannot be ruled out. Species in Kickxellaceae commonly grow on dung, and the predicted enzymatic profiles fit the utilization of both plant and potentially fungal or insect substrates that are often present in the diets of herbivorous and omnivorous animals. Another interesting niche is within fungal necromass communities, which play an important role in C and N cycling while degrading mycelium (Zhang et al. 2018b). For example, Ramicandelaber was detected among necromass communities in Minnesota and was found to be significantly impacted by necromass substrate quality (Beidler et al. 2020). Whether other Kickxellomycotina are important in fungal decomposition remains to be explored.
Material and Methods
Cultures
Lyophilized cultures were obtained from the University of Florida (R.K. Benjamin (RSA) and G.L. Benny collections), United States Department of Agriculture Research Service Culture Collection (NRRL), the Westerdijk Fungal Biodiversity Institute, Netherlands (CBS), Bioresource Collection and Research Center, Taiwan (BCRC), National Institute of Technology and Evaluation Biological Resource Center, Japan (NBRC), and the Centre for Agriculture and Biosciences International, UK (IMI) culture collections (supplementary table S2, Supplementary Material online). We refer to individual cultures as isolates (e.g., RSA 532 is an isolate of Coemansia). Lyophililzed cultures previously preserved in sealed glass vials were scored with a file, broken open, and dried spores and hyphae of each fungal isolate was placed into ∼20 mL sterile MEYE broth (3 g malt extract, 3 g yeast extract, 5 g peptone, 10 g dextrose, and distilled water (dH2O) up to 1 L). Once hyphal growth was observed in the broth, tissues were transferred to either MEYE agar (as above but with 18 g agar), V8 agar (163 mL V8 juice, 3 g CaCO3, 18 g agar, and dH2O up to 1 L), or YGCH agar (10 g yeast extract, 15 mL glycerol, 15 g casein hydrolysate, 1.0 g K2HPO4, 0.5 g MgSO4•7H2O, 18 g agar, and dH2O up to 1 L) plates supplemented with mixtures of different antibiotics (Benny 2008; Benny et al. 2016). Various antibiotic combinations were used in an attempt to clear bacterial contamination found in a few cultures. Multiple replicate plates were grown for each isolate in either an 18° C incubator or at room temperature, depending on the optimal growth conditions for each isolate. Once each isolate was well established on an agar plate, sporulating hyphae were scraped from the surface and placed in 2× CTAB (cetyltrimethylammonium bromide) buffer for DNA extraction. Most isolates required several replicate plates to obtain sufficient material for DNA extraction. Species identification was based on the identification made by the collector at the time of isolation.
DNA and RNA extraction and Genome sequencing
DNA extraction followed the CTAB protocol of Gardes and Bruns (1993), but with the following modifications: tissues were ground using a micropestle attached to a drill press and subjected to two rounds of freezing and thawing, the phenol:chloroform wash step was followed by an additional chloroform wash step, and samples were left at −20° C overnight for the isopropanol precipitation step. After extraction, DNA was treated with RNase A and quantified by both Nanodrop 2000 spectrophotometer (ThermoFisher Scientific, Waltham, MA) and Qubit 4.0 Fluorometer (Invitrogen, Waltham, MA). The large subunit rDNA (28S) of several representative Coemansia isolates was amplified with primers LROR/LR5 (Hopple and Vilgalys 1994; Vilgalys and Hester 1990) and sent for Sanger sequencing at GeneWiz (South Plainfield, NJ). RNA extraction for Pinnaticoemansia coronantispora was performed using a Zymo Direct-zol RNA kit (Irvine, CA) and TRI Reagent (Sigma-Aldrich, St. Louis, MO). RNA was treated with DNase I, quantified with Qubit using the RNA kit, and stored in a −80° C freezer until shipment to the Joint Genome Institute (JGI).
Most genomic DNA samples were sent to the JGI for either low coverage Illumina (San Diego, CA) sequencing or full coverage reference genome sequencing with PacBio (Pacific Biosciences, Menlo Park, CA) (see details below). A set of 25 samples were sent to Novogene (Sacramento, CA) for low coverage Illumina sequencing (supplementary table S1, Supplementary Material online). At Novogene, sequencing libraries were generated using NEBNext DNA Library Prep Kit (New England BioLabs, Ipswich, MA) following manufacturer's recommendations using 1.0 μg DNA per sample and indices added to each sample. Genomic DNA was randomly fragmented to a size of 350 bp by shearing, then DNA fragments were end polished, A-tailed, and ligated with the NEBNext adapter for Illumina sequencing, and further PCR-enriched by P5 and indexed P7 oligos. The PCR products were purified (AMPure XP system) and resulting libraries were analyzed for size distribution by Agilent 2100 Bioanalyzer and quantified using real-time PCR. Sequencing was performed on the Illumina NovaSeq 6000 sequencing platform with NovaSeq XP v1 reagent kits, following 2 × 150 indexed chemistry.
Library preparation for Illumina sequencing at the JGI utilized plate-based DNA preparation performed on the PerkinElmer (Waltham, MA) Sciclone NGS robotic liquid handling system using Kapa Biosystems (Wilmington, MA) library preparation kit. A Covaris (Woburn, MA) LE220 focused-ultrasonicator sheared 200 ng of sample DNA to 600 bp. Sheared DNA fragments were size-selected by double-SPRI (solid-phase reversible immobilization beads) and then the selected fragments were end-repaired, A-tailed, and ligated with Illumina compatible sequencing adapters from Integrated DNA Technologies (Coralville, IA) containing a unique molecular index barcode for each sample library. Libraries were quantified using Kapa Biosystem's next-generation sequencing library qPCR kit and run on a Roche Diagnostics (Indianapolis, IN) LightCycler 480 real-time PCR instrument. Quantified libraries were then multiplexed with other libraries, and the pool of libraries was prepared for sequencing on the Illumina NovaSeq 6000 sequencing platform using NovaSeq XP v1 reagent kits, S4 flow cell, following a 2 × 150 indexed paired end run recipe.
The Pinnaticoemansia coronantispora CBS 131509 genome was prepared for PacBio sequencing using >10 kb Blue Pippin Size Selection, unsheared preparation. Using SMRTbell Template Prep Kit 1.0 (Pacific Biosciences), 1500 ng of genomic DNA was directly treated first with exonuclease to remove single-stranded ends and then with DNA damage repair mix followed by end repair and ligation of blunt adapters. The final library was size-selected with BluePippin system (Sage Science, Beverly, MA) with a 6 kb cutoff size and purified with AMPure PB beads. PacBio Sequencing primers were then annealed to the SMRTbell template library and sequencing polymerase was bound to them using Sequel II Binding kit 2.0. The prepared SMRTbell template libraries were then sequenced on a Pacific Biosystems’ Sequel II sequencer using v4 sequencing primer, 8 M v1 SMRT cells, and Version 2.0 sequencing chemistry with 1 × 1800 min sequencing movie run times. Filtered PacBio CCS reads were then assembled with Flye version 2.7.1-b1590 (https://github.com/fenderglass/Flye) to generate an assembly and polished with gcpp –algorithm arrow version SMRTLINK v8.0.0.80529 (https://www.pacb.com/support/software-downloads). The genome was annotated using the JGI Annotation pipeline (Grigoriev et al. 2014).
For the transcriptome, plate-based RNA sample prep was performed on the PerkinElmer Sciclone NGS robotic liquid-handling system using Illumina's TruSeq Stranded mRNA HT sample prep kit utilizing poly-A selection of mRNA following the protocol outlined by Illumina (https://support.illumina.com/sequencing/sequencing_kits/truseq-stranded-mrna.html). The total RNA starting material was 1 ug per sample and 8 PCR cycles were run for library amplification. The prepared library was then quantified and sequenced with Illumina NovaSeq 6000 as above. Filtered RNA-Seq reads were assembled into consensus sequences using Trinity v.2.11.0 (Grabherr et al. 2011).
Low coverage genome assembly and annotation
Sequence data were assembled using the Automatic Assembly for the Fungi (AAFTF) pipeline (Stajich et al. 2021). This pipeline trims reads using trimmomatic v0.33 (Bolger et al. 2014), filters contaminating reads (e.g., PhiX) with BBmap v38.16 (Bushnell 2014) and performs de novo assembly with SPAdes v3.9.0 (Bankevich et al. 2012; Prjibelski et al. 2020). The assembly is then further cleaned by removing additional contaminating sequences with sourmash (Brown and Irber 2016) and duplicate contigs with minimap2 (Li 2018), followed by polishing with Pilon (Walker et al. 2014). Assemblies were then annotated using Funannotate v1.8.1 (Palmer and Stajich 2020). Funannotate first masks repetitive regions of the genome with RepeatMasker (Smit et al. 2015) and creates ab initio gene prediction consensus models with EVidenceModeler (Haas et al. 2008) after training Augustus v3.3 (Stanke et al. 2006) using the BUSCO (Simão et al. 2015) fungi_odb10 data set and GeneMark (Ter-Hovhannisyan et al. 2008). Exon locations were inferred from protein alignments against the SwissProt (The UniProt Consortium 2021) database with BLASTX and exonerate v2.4.0 (Slater and Birney 2005). Functional annotations were added by searching against antiSMASH (Blin et al. 2019), dbCAN (Yin et al. 2012), eggNOG (Huerta-Cepas et al. 2019), MEROPs (Rawlings et al. 2018), and Pfam (Finn et al. 2014) databases with HMMER (Eddy 2009) and DIAMOND (Buchfink et al. 2021). Of particular interest were proteases (fungalysins MER0001400 and subtilases PF00082) and secondary metabolites that have been associated with parasitism. Additionally, CAZyme (carbohydrate active enzyme) families containing known chitinases, cellulases, and pectinases were evaluated to gain an understanding of the putative trophic mode of Kickxellales fungi. Box plots and bar graphs of these data were made using the R (R core team 2020) package ggplot2 (Wickham 2016) with the cowplot add-on (Wilke 2020). We also plotted the number of CAZymes versus the number of proteases for each clade. The linear model function in R was used to evaluate the significance of the relationships and calculate the R2 values.
Assembly statistics (e.g., average coverage, contig counts, etc.) were assessed with bbmap and BUSCO. BUSCO outputs scores for the number of complete, missing, fragmented, and duplicate benchmarking genes found in the genome assemblies as a method of estimating completeness and accuracy of the assembly. The predicted number of genes for each genome was based on the annotation outputs. Correlation plots depicting the relationships between these variables were made in R using the GGally extension (Schloerke et al. 2021) of the ggplot2 package.
Filtering contaminant sequences
Genomes from Dimargaritales mycoparasites cocultured on host fungi were expected to be contaminated with host sequences. In order to identify and remove host sequences, taxonomic identification and coverage for assembled contigs was assessed with BlobTools v1 (Challis et al. 2020). The taxonomic and coverage data from BlobTools and the assembly fasta files were then input to VizBin (Laczny et al. 2015). The graphical interface of VizBin allows the user to select sequences of interest by drawing polygons around clusters of sequences, thereby filtering the dataset. For each mycoparasite genome, polygons were drawn to exclude clusters of sequences taxonomically identified as host (i.e., Mucoromycota or Ascomycota) (Supplementary fig. S1, Supplementary Material online). The resulting filtered fasta files were reanalyzed with Funannotate to obtain the corrected annotations and subsequently used in downstream analyses. Likewise, bacterial contamination of a few Coemansia isolates was expected due to observed bacterial growth on some agar plates despite the use of several different antibiotics and transfer attempts. Additionally, we were interested in evidence of putative endohyphal symbionts, as have been identified across Mucoromycota species (Chang et al. 2019; Deveau et al. 2018) but have not been confirmed in any Zoopagomycota. To screen the genomes for bacterial sequences, assemblies were processed with the Autometa pipeline (Miller et al. 2019). The first step is to bin contigs according to their Kingdom-level classification using Prodigal gene prediction (Hyatt et al. 2010) and identity searches against the NCBI NR database. The output fasta files containing contigs binned as eukaryotic were then reannotated with Funannotate as above and used in downstream analyses.
Nine-gene dataset
To build a phylogeny including taxa for which genome data were unavailable (species of Asellaria, Barbatospora, Myconymphaea, Orphellales, Spirodactylon), we obtained reference sequences of seven protein-coding genes (actin, β tubulin, EF1α, MCM7, RPB1, RPB2, and TSR1) and the small subunit (18S) and large subunit (28S) ribosomal DNA from NCBI. These markers (except actin) were generated for many Kickxellomycotina taxa by Tretter et al. (2014) and we wanted to build upon this dataset by capturing these loci from our isolates. The automated Target Restricted Assembly Method (aTRAM) (Allen et al. 2015) was first used to retrieve 18S and 28S rDNA sequences from the low coverage genome data. The aTRAM pipeline creates BLAST-formatted databases for the forward reads of each sample and indexes the corresponding paired reverse reads. A reference sequence is then used as a query against these read databases to retrieve the best matches. This process is repeated iteratively until a de novo contig matching the reference sequence is obtained. Coemansia and Kickxella reference 18S and 28S sequences included those downloaded from GenBank and Sanger sequences generated as part of this study. Resulting contigs were further refined into scaffolds with CAP3 (Huang and Madan 1999) and ragtag (Alonge et al. 2019).
Next, Hidden Markov models (HMM) profiles were built using HMMER v3.3.2 (Eddy 2011; Eddy and the HMMER development team 2020) for the protein coding sequences of actin, β tubulin, EF1α, MCM7, RPB1, RPB2, and TSR1 based on reference sequences. The PHYling pipeline was then used to retrieve matching sequences from the LGCS data. This pipeline uses hmmsearch and hmmalign to select the best match to the models and assumes that the highest hits are approximate to orthologs. The pipeline then creates sequence alignments for each locus. Output fasta files of recovered loci from each of the low coverage datasets were imported into Mesquite (Maddison and Maddison 2021), aligned with MUSCLE (Edgar 2004) and adjusted by eye, with unaligned regions manually excluded. ML inference of the partitioned dataset was performed with RAxML v8 (Stamatakis 2014) using the GTRCAT substitution model for nucleotides, the LG model for amino acids, and the extended majority rule criterion for automatically determining bootstrap replicates. Tree output was visualized using FigTree v1.4.3 (Rambaut 2016) and edited in InkScape v1.0.2 (https://inkscape.org/en/).
Phylogenomic reconstruction
Three separate methods were used for phylogenetic reconstruction: the PHYling pipeline (utilizing FastTree) (Price et al. 2010), OrthoFinder v2.3.8 (Emms and Kelly 2019), and the ASTRAL coalescent-based method (Sayyari and Mirarab 2016; Zhang et al. 2018a). The PHYling pipeline searches the assembled genomes for a set of 758 putatively single-copy markers from the BUSCO dataset (fungi_odb10). Both the PHYling and ASTRAL analyses were based on the same set of 511 recovered (out of 758 total) marker genes. The protein fasta file output for each isolate was searched for the best hit to these markers, and the resulting hits were aligned to the original HMM using hmmalign. Alignments were trimmed with ClipKIT (Steenwyk et al. 2020) and gene trees were reconstructed in RAxML. All individual gene trees output from RAxML were then used as input for analysis in ASTRAL with detailed branch support output. The concatenated alignment was analyzed with FastTree approximately-ML using the LG and Gamma20 models (Price et al. 2010). All 511 BUSCO marker gene alignments (and their corresponding RAxML gene trees) identified by PHYling were analyzed with PHYkit (Steenwyk et al. 2021). Alignments were assessed for length, number of taxa, parsimony informative sites, variable sites, and relative composition variability (lower values may indicate lower composition bias—Phillips and Penny 2003). Gene trees were analyzed for spurious taxa (taxa on branches that are ≥ 20 times the median length of all branches—Shen et al. 2018), treeness (higher treeness values are thought to indicate a greater signal to noise ratio—Phillips and Penny 2003), and bipartition support values.
OrthoFinder was used to identify a different set of marker genes: orthologs identified from among the input taxa. The OrthoFinder analysis was run using DIAMOND for sequence similarity searching and the multiple sequence alignment method for tree reconstruction using MUSCLE and FastTree for reconstruction. The results were analyzed with KinFin v1.0 (Laetsch and Blaxter 2017) to obtain an orthogroup rarefaction plot by clade and Venny v2.1 (Oliveros 2015) was used to create Venn diagrams of orthogroup overlap between clades. Supplementary Table S1, Supplementary Material online lists all isolates, including full coverage reference genomes, that were included in all analyses.
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
Acknowledgements
This work was supported by the National Science Foundation (DEB 1441677 to M.E.S.; DEB 1441715 to J.E.S.; DEB-1441604 to J.W.S.), the University of Florida Department of Plant Pathology and the Institute for Food and Agricultural Sciences, and the USDA-National Institute of Food and Agriculture (FLA-PLP-005289). The work (proposal 10.46936/10.25585/60001062) conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02–05CH11231. We would also like to thank undergraduate researchers who helped along the way: Sophia Torla, Adrian Emery, and Elizabeth Dautel.
Data Availability
All genome data have been deposited to Genbank (accession numbers in supplementary Table S1, Supplementary Material online) and R scripts and other analysis files are available at the Center for Open Science OSF site: DOI 10.17605/OSF.IO/6ZNP7. Cultures not already deposited in collections were submitted to the CBS (Centraal Bureau voor Schimmelcultures, Westerdijk Fungal Biodiversity Institute) collection.