A highly contiguous genome assembly reveals sources of genomic novelty in the symbiotic fungus Rhizophagus irregularis

Abstract The root systems of most plant species are aided by the soil-foraging capacities of symbiotic arbuscular mycorrhizal (AM) fungi of the Glomeromycotina subphylum. Despite recent advances in our knowledge of the ecology and molecular biology of this mutualistic symbiosis, our understanding of the AM fungi genome biology is just emerging. Presented here is a close to T2T genome assembly of the model AM fungus Rhizophagus irregularis DAOM197198, achieved through Nanopore long-read DNA sequencing and Hi-C data. This haploid genome assembly of R. irregularis, alongside short- and long-read RNA-Sequencing data, was used to produce a comprehensive annotation catalog of gene models, repetitive elements, small RNA loci, and DNA cytosine methylome. A phylostratigraphic gene age inference framework revealed that the birth of genes associated with nutrient transporter activity and transmembrane ion transport systems predates the emergence of Glomeromycotina. While nutrient cycling in AM fungi relies on genes that existed in ancestor lineages, a burst of Glomeromycotina-restricted genetic innovation is also detected. Analysis of the chromosomal distribution of genetic and epigenetic features highlights evolutionarily young genomic regions that produce abundant small RNAs, suggesting active RNA-based monitoring of genetic sequences surrounding recently evolved genes. This chromosome-scale view of the genome of an AM fungus genome reveals previously unexplored sources of genomic novelty in an organism evolving under an obligate symbiotic life cycle.


Introduction
Uprooting almost any terrestrial plant reveals the arbuscular mycorrhizal (AM) symbiosis, a mutually beneficial interaction between most land plant species and members of the fungal Glomeromycotina subphylum (Parniske 2008). AM fungi are multinucleate, obligate symbionts that exist in all terrestrial ecosystems (Davison et al. 2015) and engage in symbioses with a wide range of plant species, often simultaneously (Bever 2002). While ecological and molecular mechanistic evidence suggest that the AM symbiosis relies on the reciprocal transfer of organic and inorganic nutrients through a permeable membranous interface (Bonfante and Genre 2010), our understanding of the genomic basis of this symbiotic lifestyle remains limited by the fact that whole-genome sequencing data are available for a limited number of AM species (Trepanier et al. 2005;Kobayashi et al. 2018;Morin et al. 2019;Singh et al. 2019Singh et al. , 2021Sun et al. 2019;Venice et al. 2020;Malar et al. 2021;Montoliu-Nerin et al. 2021;Sahraei et al. 2022). These include genome assemblies of multiple isolates of the model species, Rhizophagus irregularis, and the homokaryotic laboratory strain DAOM197198 ( Fig. 1) (Tisserant et al. 2013;Lin et al. 2014;Chen, Mathieu, et al. 2018;Chen, Morin, et al. 2018;Maeda et al. 2018;Yildirir et al. 2022). The most recent genome assembly of DAOM197198 represented a sizeable step-up in genome contiguity and quality (Yildirir et al. 2022); however, the contig N50 of 2.3 Mb and quantity of gaps in this assembly is lagging behind recent fungal genome assemblies (Chung et al. 2021;Liu et al. 2021). The de novo assembly of a reference genome is a crucial step for the genetic research of a given organism. To best support genomic and transcriptomic research, the ideal resource is a fully sequenced, contiguous genomic assembly with few gaps (Church et al. 2011;Rhie et al. 2021). Recent attention has been paid to the contribution of epigenetic and transposable element landscapes of R. irregularis to the adaptation and evolution of this species (Chaturvedi et al. 2021;Dallaire et al. 2021;Yildirir et al. 2022). The production of a higher-quality genome for R. irregularis will further enable research into the repetitive landscape and the genomic organization of Glomeromycotina fungi and their relatives, providing crucial insights into the biology and evolutionary history of the AM symbiosis.
This study presents a highly contiguous and near-gapless longread assembly of the R. irregularis isolate DAOM197198 achieved using long Nanopore reads, Hi-C data, and manual curation. Nanopore RNA-Sequencing was generated for R. irregularis, producing long reads that span entire transcripts to guide and improve Illumina short-read-based gene model predictions and to enable the annotation of untranslated (UTR) regions, prediction of poly(A) signals, and analysis of poly(A) tail length. Repetitive element, small RNA loci, and DNA cytosine methylome annotations are also provided. These datasets were combined with a tree-of-life scale analysis of gene birth events (Barrera-Redondo et al. 2023), which assigns an evolutionary age to protein-coding genes of R. irregularis and identifies taxonomically restricted genes that have no detectable homologs in other organisms. This analysis identifies molecular functions that are ancestral to the Glomeromycotina and describes an important gene birth event coinciding with their emergence. The chromosomal distribution of genetic and epigenetic features uncovers evolutionarily young regions of the genome that are potential cradles for new genes and small RNA production.

DNA preparation and sequencing
High-molecular-weight DNA was extracted from 2 g of R. irregularis DAOM197198 Grade A spores (Agronutrition) (Schwessinger and McDonald 2017). About 100 mg of ground spore material was resuspended in lysis buffer and processed as indicated. Two successive rounds of cleanup were performed using a 0.45× volume of Ampure XP beads in DNA-Lo-Bind tubes following the manufacturer's protocol. DNA was finally eluted in 50 µL of 10 mM Tris-pH8. DNA quality was assessed by running on a 0.5% agarose gel. Sequencing libraries were prepared using the Oxford Nanopore Rapid DNA sequencing kit SQK-RAD004 and sequenced on MinION flow cells R9.4.1 following the accompanying protocol. Genomic Nanopore reads were basecalled with Guppy Basecalling Software version 5.0.11 + 2b6dbff (Oxford Nanopore Technologies, Limited).
The raw assembly was then trimmed of contigs smaller than 500 bp (removing 2 contigs). Subsequent polishing of this trimmed assembly was carried out using the PEPPER-Margin-DeepVariant pipeline as described in Shafin et al. (2020Shafin et al. ( , 2021. Broadly, the Nanopore reads described above were aligned against the raw, trimmed R. irregularis assembly using minimap2 (parameters: -ax map-ont). About 83.7 Gb of Illumina reads obtained from Maeda et al. (2018) was also aligned against this assembly using BWA-MEM with default parameters (Li 2018). Alignments of the Nanopore and Illumina reads produced variant calls that were corrected in the assembly using the PEPPER-Margin-DeepVariant pipeline and Merfin (Formenti et al. 2022).
To assemble the telomeric regions of this genome, 1964 reads containing the telomeric repeat TTAGGG 8 were extracted from trimmed Nanopore reads. These repeat-containing reads were then used to assemble 62 telomeric contigs using Shasta with parameters as described above, with the exception of --Assembly.consensusCaller Bayesian:guppy-5.0.7-a and --Kmers.k 14. The 62 telomeric contigs were polished using the same polishing pipeline as described above, mapping the initial telomere repeat-containing reads and genomic Illumina reads to the telomeric contigs and polishing using the PEPPER-Margin-DeepVariant pipeline. The full genome contigs and the telomeric contigs were then manually fused based on overlapping sequence identified following minimap2 alignment (parameters: -ax map-ont). The QV score of the raw assembly was Q29.49, increasing to Q32.6 following polishing with PEPPER, and finally Q36.27 after polishing with DeepVariant and fusing with separately assembled and polished telomeric contigs.
The assembly process resulted in the assembly of a complete, circular mitochondrial genome of 70,793 bp. The circularity of the mitochondrial assembly graph was visualized using Bandage ( Supplementary Fig. 1a) (Wick et al. 2015). MitoHifi v.2.2 (Laslett and Canback 2008;Allio et al. 2020;Uliano-Silva et al. 2021) was used to annotate the mitochondrial genome (Fig. 2a). This mitochondrial genome was removed from the nuclear genome assembly for manual curation. Hi-C read data for R. irregularis DAOM197198 (Yildirir et al. 2022) were aligned to the remaining 42 contigs using BWA-mem (Li and Durbin 2009) and the subsequent alignment file was used to produce a PretextView Map (Harry 2020). The PretextView Hi-C contact map and the assembled contigs were manually curated (as described in Howe et al. 2021) to produce chromosome-scale scaffolds (Table 1).

Quality assessment of the R. irregularis DAOM 197198 assembly
The genome assembly was scored by BUSCO version 5.2.2 (Simao et al. 2015) as 95.8% complete using the fungi_odb10 database. In total, 726 complete BUSCOs were identified out of a total of 758 BUSCO groups searched, of which 13 were duplicated. All trimmed Nanopore reads were mapped to the assembly using Minimap2 (parameters: -ax -map-ont) (Li 2018), resulting in the mapping of 1,279,771 reads to the final assembly (99.32% of total trimmed reads). Mosdepth was used to examine the cumulative distribution of read coverage for each contig. Average Nanopore read coverage was highly uniform, between 77 and 85× across all nuclear contigs ( Supplementary Fig. 1b). To assess Illumina read coverage uniformity, BWA-MEM was used to align Illumina genomic DNA reads  to the assembly, with 200,768,646 of 211,520,841 (94.61%) reads mapping successfully.

Genome annotation
Rhizophagus irregularis DAOM197198 RNA samples (plates of 50,000 spores/sample) used for genome annotation and the protocol for the production of rice exudates to treat spore plates were the same as described previously (Dallaire et al. 2021). The Illumina RNA-Seq samples used were an untreated spore plate, a 24-hour rice exudate-treated sample, a 48-hour rice exudate-

Fig. 2.
Nuclear and mitochondrial genome assemblies of Rhizophagus irregularis. a) Circular map of mitochondrial genome with annotated genes (pink), tRNAs (black), and rRNAs (green). b) Hi-C contact map visualized in PretextView. Chromosomes are displayed in size order from left to right . c) Physical map of 32 chromosomes numbered according to size (Mb). Grey coloring of the ideogram highlights contigs that were scaffolded together. Telomeric sequences are represented by dark blue squares at the ends of ideograms. Nanopore read coverage is shown as a purple histogram. treated sample (48e_1), and a sample of R. irregularis-colonized maize root (growth conditions with RNA extraction as described for rice plants in Dallaire et al. 2021). Additional Illumina RNA-Seq samples from a further experiment described in the same publication used for genome annotation were 2 Nicotiana benthamiana root samples colonized by R. irregularis and 2 germinated spore samples. Short-read library preparation, sequencing, and adapter trimming were carried out on paired-end polyA+ RNA by Novogene UK Co. Ltd. with read lengths of 150 bp.
To produce the long-read RNA-Seq data used to refine Illumina-based gene models, Nanopore RNA-Seq was carried out using a sample of R. irregularis DAOM197198 pre-germinated spore plate (50,000 spores/plate) ( Table 2 and described below). About 1 µg of total RNA from 3 samples was individually poly(A) selected using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB #E7490). Poly(A)+ concentration and rRNA depletion were assessed using Qubit and TapeStation. Approximately 20 ng from each sample (total of 70 ng of poly(A)-selected RNA) was barcoded using the Nanopore PCR-cDNA barcoding kit (Kit SQK-PCB109). PCR was performed with 14 cycles and 6.5-minute extensions. About 33.3 fmol of each barcoded sample was pooled together and prepared for sequencing on an R9.4.1 flow cell. Sequence reads were demultiplexed and basecalled using Guppy Basecalling Software version 5.0.11 + 2b6dbff (Oxford Nanopore Technologies, Limited). Following basecalling, Pychopper version 2.5 was used to trim the reads and rescue fused reads. These reads were provided as evidence to the update script of funannotate version 1.8.7 (https://zenodo.org/record/4054262#.Yv4hJy8w3AY), which uses PASA (Haas et al. 2003, to refine gene models of the Illumina-based gene annotation; to predict 5′UTR, 3′UTR, and polyadenylation signal sequences; and to extract poly(A) tail sequences. Transposon-related protein domains were removed from the updated gene models, leading to the final Illumina + Nanopore-based gene annotation presented in this study.
Illumina-based and Illumina + Nanopore-based gene models were functionally annotated separately using funannotate version 1.8.7, ran with the BUSCO database "fungi" and the UniProt DB version 2022_01, and with protein domain prediction evidence from (1) (3) Secondary metabolism and transmembrane domain prediction using antiSMASH version 6.0.1 (Blin et al. 2021). Gene annotations were scored by BUSCO with the fungi_odb9 database. GO terms associated with the Illumina + Nanopore-based annotation were processed with g:Profiler's GMT tool, and GO term analyses were performed using the g:Profiler web server (Raudvere et al. 2019) using the token "gp__xfGY_dQeI_yx4," or the GMT file provided as Supplemental File.

Repeat and transposable element annotation
Repeats were modeled using EDTA (parameter --sensitive 1) (Ou et al. 2019). Rhizophagus irregularis multi-copy coding genes sometimes get detected as repetitive and wrongly end up in repeat libraries. Protein domains were, therefore, predicted from repeat consensus sequences using InterProScan 5.55-88.0, and consensus sequences containing gene-related InterPro domains were filtered out. The remaining consensus sequences were used to mask the genome assembly using RepeatMasker (parameters -s -no_is -norna -nolow -div 40) (Smit et al. 2015).

Phylostratigraphy analyses
GenEra (Barrera-Redondo et al. 2023) was run using DIAMOND in ultra-sensitive mode (Buchfink et al. 2021). An E-value threshold of 1E −5 was chosen to balance the detection of distant homologs while minimizing the amount of false positives against the NR database ( Barrera-Redondo et al. 2023 Genes with taxonomic representativeness scores below 30% were flagged as possible contamination or horizontal gene transfer and were not included in subsequent analyses. Some phyloranks were corrected: strain level ranks were moved to species level ("R. irregularis DAOM 197198" to "R. irregularis" and "Linnemannia elongata AG-77" to "Linnemannia elongata" and "Fungi incertae sedis" was moved to the kingdom level "Fungi"). Several phyloranks were collapsed due to insufficient genomic data (Supplementary Fig. 2a) or unresolved phylogenetic placement of subphyla ( Supplementary Fig. 2, b-d). GenEra's homology detection failure test (Weisman et al. 2020;Barrera-Redondo et al. 2023) was run by using the pairwise evolutionary distances from a phylogenomic tree ) to obtain a list of genes in R. irregularis whose ages cannot be explained by gene untraceability from the genus to the kingdom phyloranks.

Chromosomal distribution of genomic features and expression
Nanopore RNA-Seq reads were trimmed of adapters and cleaned with seqclean (Chen et al. 2007) to remove a percentage of undetermined bases, polyA tails, overall low complexity sequences, and short terminal matches. Cleaned sequences were then mapped using minimap2 (options -G max intron length = 3000, -ax, map-ont) (Li 2018). Small RNA-Seq reads were aligned to the genome using bowtie (options --mmap r) (Langmead and Salzberg 2012). Nanopore and small RNA RPKM were calculated using bamCoverage (options --bam --binSize 200 --ignoreDuplicates --normalizeUsing RPKM) (Ramirez et al. 2016). A general additive model was used to regress feature values across chromosome lengths (gam(<gene age or RPKM> ∼ s(Chrom.start, bs = "cs", by = Chrom))), using the R package mgcv v.1.8-40. Gene age fits were plotted using the start position of each gene and RPKM fits were plotted using the start position of every 200 bp bin. Gene ages were randomly permuted 1,000 times, and the mean was plotted using the unchanged start position of each gene. A paired t-test grouped by chromosome was used to test the significance of the observed gene age distributions relative to random permutations.

Sequence data retrieval, alignment, and phylogenetic analyses
A BlastP search was performed using the Saccharomyces cerevisiae protein sequence of FAS1 (CAA82025.1) and FAS2 (CAA97948.1) as the query sequence against fungal genomes. Sequences with >95% coverage and >40% identity were selected, and a total of 147 FAS homologs from 94 species were subjected to alignment and phylogenetic analysis. The Rozellomycota Paramicrosporidium saccamoebae FAS (PJF17744.1) was selected as an outgroup. Amino acids were aligned using MUSCLE5 (Edgar 2021). A maximum likelihood phylogenetic tree was inferred using RAxML-NG with 20 distinct starting trees using the best-fit model (LG + I + G4) selected by ModelTest-NG (Kozlov et al. 2019;Darriba et al. 2020). Bootstrapping converged after 100 replicates, branch support was assessed with Felsenstein's bootstraps, and bootstrapping convergence was tested using the autoMRE criterion within RAxML-NG. The ML tree was rooted using pxrr v1.2 within the phyx package .

De novo assembly of the R. irregularis genome
Assembly using trimmed Nanopore reads resulted in 44 contigs that were polished using Illumina reads . Two of these contigs were filtered out due to their size of <500 bp, resulting in a polished and filtered assembly of 42 contigs ( Supplementary Fig. 1a). The assembly process produced a complete, circular mitochondrial genome of 70,793 bp, within the size range of other AM fungal mitochondrial genomes (Fig. 2a) (Lee and Young 2009;Nadimi et al. 2016). This mitochondrial genome was annotated using MitoHifi and contains sequences encoding transfer RNAs (tRNAS), ribosomal subunits, and genes typically identified on a fungal mitochondrial genome. Manual curation based on Hi-C read alignment to the nuclear genome assembly was used to assign the remaining 42 contigs to 32 chromosomal units (Fig. 2b). Prior to manual curation, contig N50 and L50 were 3,900,757 bp (∼3.9 Mb) and 15, respectively, rising to a scaffold N50 and L50 of 5,085,394 (∼5 Mb) and 13 postcuration (Table 1). Twenty-three of these scaffolds were complete and gapless chromosomes (Fig. 2c). Seventeen of the 32 chromosome-scale scaffolds of R. irregularis were produced telomere-to-telomere, with telomeric repeats of sequence TTAGGG n identified at both 5′ and 3′ ends of the scaffolds, and an additional 14 containing 1 telomere (Fig. 2c). Average Illumina and Nanopore read coverage were highly uniform across all scaffolds, indicating that repetitive sequences are fully resolved (Supplementary Fig. 1b). The 32 chromosomes display extensive macro-synteny to a recent assembly of this species, except for stretches of chromosomes 1 and 5 ( Supplementary Fig.  1c). This assembly suggests a misjoin in a previous assembly of this species, which would result in the potential overestimation of the number of R. irregularis chromosomes (Table 1). Research into the location of centromeric repeats of this symbiotic fungus may aid further analyses into chromosome number and structure of these organisms. The final haploid nuclear assembly following removal of the circular mitochondrial contig is 146,773,001 bp in size.

Genome annotation using short-and long-read sequencing
Following modeling, curation, and masking of repetitive sequences and transposable elements, protein-coding genes were annotated using published Illumina RNA-Sequencing (RNA-Seq) reads from multiple life stages (Table 2). This Illumina-based gene annotation was manually curated to remove transposable elements, leaving 30,230 gene models (Illumina-based gene annotation). Gene models were then refined with long Nanopore RNA-Seq reads, improving the support of exon-intron boundaries by sequencing reads (Fig. 3a, Illumina + Nanopore-based annotation) and increasing gene and exon length (Fig. 3, b and c). Updated gene models were manually curated to remove transposable elements, resulting in a final annotation of 30,209 genes. This gene count is consistent with previous studies into genes encoded by AM fungal genomes (Morin et al. 2019). Long-read data did not change the overall BUSCO score (96.9%) but moved one duplicated BUSCO gene to the single-copy category (Fig. 3d). Functional annotation of gene models indicated that long reads increased the number of genes with assigned Gene Ontology (GO) terms (+101 genes), PFAM domains (+44 genes), InterPro domains (+54 genes), and secretion signals (+11 genes), while the number of biosynthetic genes and CAZymes remained constant (Fig. 3e). Refining gene models with long reads, therefore, resulted in more accurate gene models and a higher number of functionally annotated genes. Examples of updated gene models include glucosamine-6-phosphate isomerase (NAG1) and Crinkler effector 10 (CRN10), 2 genes thought to be involved in arbuscule development and function (Kobae et al. 2015;Voss et al. 2018). Compared to previous accessions, long-read data revealed 2 novel transcript isoforms of NAG1 that contain an additional exon ( Fig. 3f; g17052-T1 and g17052-T2). A misannotated first intron of CRN10 was fixed, and the updated gene sequence is identical to the one described in Voss et al. (2018).

Untranslated regions, poly(A) tails, and the poly(A) signal of R. irregularis
Long RNA-Seq reads provided evidence for untranslated region (UTR) prediction, polyadenylation site detection, and poly(A) tail length analyses; 5′UTR and 3′UTR length distributions have respective means of 116 and 250 nucleotides (nt) (Fig. 3, g and h), which are comparable to the fungal averages (134 and 237 nt, respectively) and within the known ranges of eukaryotic UTR lengths (100-200 nt 5′ UTR, 200-1,000 nt 3′UTR) (Pesole et al. 2001;Mignone et al. 2002;Bruno et al. 2010;Lin and Li 2012). A MEME motif search in the 50 bp preceding the poly(A) tails of 242,742 unique poly(A) sites yielded one significantly enriched hexanucleotide motif, the canonical AAUAAA (Table 3, E-value 1.3e−24) (Bailey and Elkan 1994). This sequence accounts for 56.7% of detected poly(A) sites, indicating high sequence conservation to the mammalian polyadenylation signal, compared to yeast (13.2%), Aspergillus oryzae (6%), Arabidopsis thaliana (10%), and Oryza sativa (7%) ( Table 3) (Graber et al. 1999;Loke et al. 2005;Shen et al. 2008;Tanaka et al. 2011). Additional derivatives such as AUUAAA and AAUAUA were also detected but were not significantly enriched. The distribution of poly(A) tail lengths in spore transcripts ranged from 10 to 473 nt, with a mean of 42 nt (Fig. 3i), which is comparable to the 50 nt average observed in S. cerevisiae using similar methods (Tudek et al. 2021).

A burst of gene novelty with the emergence of Glomeromycotina fungi
A tree of life scale comparative genomics analysis was used to estimate the evolutionary ages of R. irregularis genes, tracing gene birth events to the last universal common ancestor (Barrera-Redondo et al. 2023). This analysis suggests that 34% (n = 10,250) of R. irregularis genes have homologs across taxonomic levels and date back to the origin of cellular organisms (Fig. 4a, all genes). This most ancient phylorank (phylorank 1) is enriched for basic cellular functions and primary metabolic processes such as transcription, translation, and regulation of cell cycle (Supplementary Table 1), which are expected to be conserved across the tree of life. Notably, 2,373 out of 2,533 members of R. irregularis' expanded kinase gene repertoire are found at phylorank 1, consistent with protein phosphorylation as a fundamental mechanism of cell signaling (Supplementary Tables 1 and 2) (Kwon et al. 2019). All phosphate transporters (PT1 to PT7), ammonium transporters (AMT1, AMT2, AMT3), and monosaccharide transporters (MST2, MST3, MST4) are found at phylorank 1 ( Table 4). As may be expected, this analysis suggests that phosphate, nitrogen, and carbohydrate efflux and homeostasis are ancestral molecular functions that emerged long before AM fungi. Our analysis revealed comparable numbers of highly conserved genes in the Glomeromycotina fungi Gigaspora margarita (40%, n = 11,731) and Geosiphon pyriformis (46%, n = 6,875) ( Supplementary Fig. 2a, phylorank 1). The Glomeromycotina, Mucoromycotina, and Mortierellomycotina species analyzed here share similar gene age distributions until the emergence of the Mucoromycota, where each lineage displays their independent historical patterns of gene emergence ( Supplementary Fig.  2a, phyloranks 1 to 5). A peak of gene birth events at the Glomeromycotina phylorank indicates that the emergence of this fungal lineage is marked by a burst of lineage-restricted evolutionary novelties (Fig. 4a, phylorank 6, all genes). One caveat of phylostratigraphy is that gene age is often underestimated because of the inability of pairwise aligners to trace back homologs in outgroups that are too evolutionarily distant. Robust assessment of gene birth events, therefore, relies on testing the null hypothesis of homology detection failure (HDF) in order to achieve high-confidence predictions (Barrera-Redondo et al. 2023). A more stringent analysis taking into account HDF of recently evolved genes confirmed the burst of gene birth in Glomeromycotina (Fig. 4a, phylorank 6, high confidence). Confidently ranked genes born in the Glomeromycotina include an HTH APSES-type transcription factor (g4815), a Zn(2)-C6 fungal-type transcription factor (g25112), an Opy2-like membrane anchor protein (g2640), 2 uncharacterized Crinkler-type effectors (g11050, g27662), a Complex 1 LYR protein (g6617), and many F-box and Leucine repeat genes (Fig. 4b, and Supplementary Tables 2 and 3). Two GO terms related to replication were enriched at the high confidence phylorank 6 and are linked to genes of potential viral origin ( Fig. 4b and Table 5). These genes have putative replication-origin-binding domains (InterPro domain IPR003450) and were most likely acquired through horizontal transfer in the common ancestor of Glomeromycotina and subsequently inherited vertically throughout the whole lineage. Genes born at the emergence of Glomeromycotina may encode functions that were crucial for their evolutionary success and diversification, such as developmental innovation for symbiosis or obligate biotrophy.
Although the ages of most genes at the Mucoromycota phylorank may be underestimated without accounting for HDF, general shifts in protein sequence space can still be captured (Domazet-Lošo et al. 2022). GO term enrichment analyses were performed to investigate molecular functions that are ancestral to Glomeromycotina. GO terms related to ion transport, transmembrane transporter activity, and membrane components are  Table 1, phylorank 5). The genes underlying this enrichment mainly consisted of a group of 32 transient receptor channel subfamily V-like genes with predicted permeability to Ca 2+ (Fig. 4b and Table 6) (Nilius and Szallasi 2014). Innovation in transmembrane ion transport, therefore, precedes Glomeromycotina and may be a feature that marked the evolutionary transition of Mucoromycota fungi.

Variation in gene age along chromosomes reveals that evolutionarily young loci produce abundant small RNAs
To investigate genome-wide patterns of feature distribution, a series of datasets were mapped to R. irregularis chromosomal scaffolds using nonparametric linear regressions. Normalized reads per kilobase per million reads mapped (RPKM) of Nanopore RNA-Seq (full-length, poly(A)-selected) and small RNA-Seq (∼24 nt long) were reported in 200 bp genomic bins, and gene ages (pre-HDF test) were measured across the chromosomal length ( Fig. 6 and Supplementary Fig. 4a). An uneven distribution of gene age was observed for all chromosomes, distinguishing regions enriched with evolutionarily ancient genes (low phyloranks) from regions with evolutionarily young genes (high phyloranks) ( Fig. 6 and Supplementary Fig. 4a). Genomic regions with evolutionarily ancient genes tend to have high poly(A)+ RNA and low small RNA expression levels, and these patterns are reversed in regions with evolutionarily young genes ( Fig. 6 and Supplementary Fig. 4, a-e). However, a small number of genes produce small RNAs (Dallaire et al. 2021), and when examined at the scale of individual genes, per-gene small RNA expression levels did not correlate with gene age (Supplementary Fig. 4d). This led to the conclusion that sequence surrounding young genes, rather than young genes themselves, drive the observed pattern of small RNA expression. Two particular loci of ∼2 Mbps in length were identified (Fig. 6, regions shaded in blue) that collectively contain evolutionarily young coding regions and the most abundant concentration of highly expressed small RNA loci. These data suggest that the genome of R. irregularis presents highly transcribed regions harboring highly conserved genes, and lesser transcribed, small RNA-producing regions with evolutionarily younger genes.

Discussion
The number of fungal species with highly contiguous, long-read, and chromosome-scale assemblies lags behind that of animals and plants (Marks et al. 2021;Rhie et al. 2021). This work presents a chromosome-scale assembly of the symbiotic fungus R. irregularis, isolate DAOM197198, the model species for molecular research into AM fungi. This assembly of 32 chromosomal scaffolds is highly contiguous, with only 10 gaps and a contig N50 of 3.9 Mb. Nuclear chromosomes display a very high synteny with those of a previous assembly of R. irregularis DAOM197198 ( Supplementary Fig. 1c) (Yildirir et al. 2022), though this assembly assigns sequence to 32 chromosomal scaffolds, in contrast to the 33 chromosomal scaffolds previously presented. A complete, gapless, circular mitochondrial genome of 70,793 bp was also Fig. 6. Genome-wide patterns of gene age and expression. Per-chromosome genomic distribution of Nanopore polyA+ RNA-Seq (top line graph, long RNA), expression of small non-coding RNAs (RPKM, second line graph, sRNA), gene age (third line graph, old to young corresponding to phyloranks 1 to 9), small RNA loci (top ideogram, sRNA), and highly methylated CGs (bottom ideogram, 5 mC values > 80% are shown, 5 mC). Color gradients of line graphs match the y-axis scales. Grey line graphs overlapping with gene ages represent the mean of 1,000 random permutations of gene ages, and a paired t-test (grouped by chromosome) was used to test the significance of observed gene age distributions relative to random permutations P adj ). Values used for nonparametric linear regressions of long and small RNA expression are normalized RPKM calculated in 200 bp bins. Gene ages are regressed and plotted following chromosomal gene distribution (not binned). Blue-shaded regions highlight 2 loci containing the youngest genes and the highest concentration of highly expressed small RNA loci. The first 8 chromosomes are shown here, and chromosomes 9-32 are shown in Supplementary Fig. 4a.
assembled, a size consistent with a previous assembly of the R. irregularis mitochondria (Lee and Young 2009). This novel assembly, alongside a high-quality genome annotation of R. irregularis, consisting of gene models with corrected structures, splice junctions, and untranslated regions, will further aid research into R. irregularis and AM fungal biology, as well as comparative genomics approaches. This highly contiguous genome assembly enabled an analysis of chromosomal distributions of R. irregularis genomic features and gene and small RNA expression. This supports a previous observation of functional and evolutionary genome compartmentalization in R. irregularis (Yildirir et al. 2022) and builds on this work by showing that chromosomes contain highly expressed regions with highly conserved genes, and lowly expressed regions hosting more recently evolved genes. This is reminiscent of the 2-speed genome model, which has been described in filamentous phytopathogens (Torres et al. 2020) and proposed to exist in AM fungi (Reinhardt et al. 2021;Yildirir et al. 2022). According to this model, fast-evolving virulence-associated genes are compartmentalized into repeat-rich genomic regions or accessory chromosomes that are depleted of conserved housekeeping genes. In the plant pathogenic fungus Sclerotinia sclerotiorum, small RNAs originate from transposable elements in polymorphic genome compartments (Derbyshire et al. 2019). In R. irregularis, quantitative evidence for differential evolutionary speed and sequence variation in genomics compartments is lacking. Nevertheless, evolutionary patterns of genomic architecture can be observed, as well as small RNA production in regions with evolutionarily young coding spaces (Fig. 6). Evolutionary and functional compartmentalization of genes is likely not limited to species with pathogenic lifestyles, and future work will further elucidate its role as a general evolutionary feature. Analysis of genome-wide patterns of small RNA expression may point to loci that encode the basis for lineagespecific adaptations and diversification in AM fungi and may facilitate studies into adaptive structural and sequence variation.
With the increasing number of reference genomes available for Earth's biodiversity (Lewin et al. 2018) and the development of efficient algorithms for sequence analysis (Buchfink et al. 2021;Jumper et al. 2021), characterization of genes and genomes can harness comparisons at tree-of-life scale. This study used a phylostratigraphic gene age inference tool that performs alignments against the entire NCBI non-redundant protein database to trace back the emergence of R. irregularis genes ( Barrera-Redondo et al. 2023). Genetic machinery for phosphate, ammonium, monosaccharide transport, ion transmembrane transport, and a group of transmembrane ion channels were found to have evolved at or before the Mucoromycota phylorank, thereby predating the emergence of Glomeromycotina. The evolution of ion transporters in AMF's ancestors may have been crucial for maintaining intracellular ion balance in organisms that harvest high levels of negatively charged phosphate from the soil. A similar phenomenon was observed in the genomes of saprotrophic fungi, which encode the symbiosis toolkit of their successor ectomycorrhizal species (Hess et al. 2018;Miyauchi et al. 2020). Similarly in plants, the genetic basis for symbiont perception, nodule organogenesis, and nitrogen-fixation genes already existed in the common ancestor of nitrogen-fixing legumes and diversified in downstream nitrogenfixing lineages (Libourel et al. 2022 The detection of a gene birth event accompanying the emergence of Glomeromycotina highlights the existence of previously undescribed lineage-restricted innovation. Gene birth events associate with the emergence of ectomycorrhizal lifestyles (Hess et al. 2018;Miyauchi et al. 2020) and of rhizoid and root development in land plants ( Barrera-Redondo et al. 2023). While the loss of the gene encoding de novo FAS activity likely played a major role in creating dependence to externally supplied carbon (Trepanier et al. 2005;Bravo et al. 2017;Jiang et al. 2017;Keymer et al. 2017;Luginbuehl et al. 2017;Malar et al. 2021), the birth of lineage-restricted genes such as the transcription factors identified here may also underlie an evolutionary transition in the Glomeromycotina subphylum.
Supplemental material available at G3 online.

Funding
This work was supported in whole or in part by Cancer Research UK (C13474/A18583, C6946/A14492) and the Wellcome Trust (219475/Z/19/Z, 092096/Z/10/Z) to EAM. JSL was supported by the Max Planck Society and JB-R by the European Research Council (grant agreement 864038 to Susana Coelho).

Conflicts of interest
The author(s) declare no conflict of interest.

Author contributions
BFM: conceptualization, formal analysis, investigation, visualization, methodology, writing-original draft, writing-review and editing. JSL: formal analysis, investigation, writing-review and editing. JB-R: formal analysis, investigation, writing-review and editing. TL: formal analysis, investigation, writing-review and editing. GY: resources, writing-review and editing. JS: writingreview and editing. NC: writing-review and editing. UP: writing -review and editing. EAM: funding acquisition, writing-review and editing. AD: conceptualization, formal analysis, investigation, visualization, methodology, project administration, writing-original draft, writing-review and editing.